Code: pylprotpredictor module
CDS class
-
class pylprotpredictor.cds.CDS(seq_id='', origin_seq=None, origin_seq_id='', start=-1, end=-1, strand='forward', seq=None, alternative_ends=[], alternative_cds=[], alignments=[], conserved_cds=None, rejected_cds=[], status='')
Class to describe a CDS.
-
add_alignment(alignment)
Add an alignment object to the list of alignments.
Parameters: alignment – an alignment object
-
add_alternative_cds(alternative_cds)
Add an alternative CDS to the list of possible alternative CDS.
Parameters: alternative_cds – a CDS object
-
add_id_alignment(seq_id, alignment)
Add an alignment to the correct CDS object, identified by its id.
Parameters: - seq_id – id of the CDS
- alignment – alignment object to add
-
add_rejected_cds(rejected_cds)
Add a rejected CDS to the list of rejected CDS.
Parameters: rejected_cds – a CDS object
-
extract_possible_alternative_seq()
Extract the start, end and sequence of the different possible sequences for a CDS identified as a potential PYL CDS.
-
find_alternative_ends()
Find alternative ends (on the same ORF) for a CDS, up to the next STOP codon found on the genome (or on its complement if the CDS is on the reverse strand).
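The read-through described above can be pictured with a minimal forward-strand sketch. The function name `find_alternative_ends_forward`, the 0-based coordinates with an exclusive `end`, and the rule of reading through further TAG codons until the first TAA or TGA are assumptions made for illustration; the documented method works on the CDS object and also covers the reverse strand.

```python
# Hypothetical sketch, not the pylprotpredictor implementation.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_alternative_ends_forward(genome: str, end: int) -> list[int]:
    """Return candidate alternative ends downstream of ``end``.

    Assumes a forward-strand CDS occupying genome[start:end] (0-based,
    exclusive end) that ends with TAG. Reading continues codon by codon in
    the same frame; the end of every in-frame STOP codon is recorded,
    reading through further TAG codons and stopping at the first TAA/TGA.
    """
    alternative_ends = []
    # Bound check in the spirit of test_to_continue (assumed convention):
    # only continue while a complete codon remains in the genome.
    while end + 3 <= len(genome):
        codon = genome[end:end + 3]
        end += 3
        if codon in STOP_CODONS:
            alternative_ends.append(end)
            if codon != "TAG":
                break
    return alternative_ends


# CDS = ATG AAA TAG; downstream: CCC TAG GGG TAA AAA
print(find_alternative_ends_forward("ATGAAATAGCCCTAGGGGTAAAAA", 9))
```

If the genome ends before another STOP codon is found, the sketch simply returns the ends collected so far.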
-
get_alternative_cds()
Return the list of possible alternative CDS if the CDS ends with a TAG STOP codon.
Returns: list of CDS objects
-
get_alternative_end()
Return the list of alternative CDS ends.
Returns: list of the ends of the alternative CDS
-
get_alternative_ends()
Return the list of possible alternative ends if the CDS ends with a TAG STOP codon.
Returns: list of int corresponding to the alternative ends
-
get_alternative_start()
Return the list of alternative CDS starts.
Returns: list of the starts of the alternative CDS
-
get_conserved_cds()
Return the CDS object of the conserved CDS as the correct CDS (start, end, sequence).
Returns: CDS object of the conserved CDS
-
get_end()
Return the end position of the CDS on the origin sequence.
Returns: int corresponding to the end position
-
get_lowest_evalue_alignment()
Return the alignment with the lowest evalue.
Returns: alignment object
-
get_origin_seq()
Return the SeqRecord object corresponding to the origin sequence of the CDS.
Returns: SeqRecord object
-
get_origin_seq_id()
Return the id of the origin sequence of the CDS.
Returns: string corresponding to the id of the origin sequence
-
get_origin_seq_size()
Return the length of the origin sequence.
Returns: int corresponding to the length of the origin sequence
-
get_origin_seq_string()
Return the string of the origin sequence.
Returns: string corresponding to the origin sequence
-
get_rejected_cds()
Return a list of the rejected CDS objects as correct CDS (start, end, sequence).
Returns: list of CDS objects
-
get_start()
Return the start position of the CDS on the origin sequence.
Returns: int corresponding to the start position
-
get_strand()
Return the strand of the CDS on the origin sequence.
Returns: string corresponding to the strand (forward or reverse)
-
get_translated_alternative_seq()
Return a list of the translated sequences of the alternative sequences.
Returns: list of SeqRecord objects
-
get_translated_seq()
Return the translated sequence of the CDS.
Returns: SeqRecord object corresponding to the translated sequence
-
identify_cons_rej_cds()
Identify which alternative CDS to conserve or reject based on the evalue and the alignment length: keep the sequence with the lowest evalue and the longest alignment.
Returns: the best alignment
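A minimal sketch of the selection rule, assuming the two criteria are applied lexicographically (lowest evalue first, longest alignment as tie-breaker); the wording above leaves open how they are combined. `Hit` and `pick_best` are hypothetical names; `Hit` only mirrors the `sseqid`, `evalue` and `length` fields of the `Alignment` class in this module.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical stand-in for an Alignment, keeping only three fields."""
    sseqid: str
    evalue: float
    length: int


def pick_best(hits_by_cds: dict[str, list[Hit]]) -> tuple[str, Hit]:
    """Return (cds_id, hit) for the lowest-evalue hit, longest alignment on ties.

    Raises ValueError if no CDS has any hit.
    """
    candidates = [
        (cds_id, hit)
        for cds_id, hits in hits_by_cds.items()
        for hit in hits
    ]
    # Sort key: smaller evalue first, then larger alignment length.
    return min(candidates, key=lambda item: (item[1].evalue, -item[1].length))


hits = {
    "cds_tag_end": [Hit("ref1", 1e-5, 80)],
    "cds_alt_1": [Hit("ref1", 1e-20, 150), Hit("ref2", 1e-20, 200)],
}
print(pick_best(hits))
```

With this ordering, a hit with a much longer alignment never beats a hit with a strictly lower evalue; a weighted score would be needed for that trade-off.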
-
init_from_dict(in_dict)
Initialize a CDS instance from a dictionary.
Parameters: in_dict – dictionary with the attributes of a CDS object
-
init_from_record(record)
Initialize a CDS instance from a SeqRecord object.
Parameters: record – a SeqRecord object
-
set_alternative_ends(alternative_ends)
Change the list of alternative ends.
Parameters: alternative_ends – list of int corresponding to the new alternative ends
-
set_conserved_cds(conserved_cds)
Change the conserved CDS.
Parameters: conserved_cds – CDS object of the conserved CDS
-
set_origin_seq(origin_seq)
Change the SeqRecord object corresponding to the origin sequence of the CDS.
Parameters: origin_seq – SeqRecord object
-
set_origin_seq_id(origin_seq_id)
Change the id of the origin sequence of the CDS.
Parameters: origin_seq_id – new origin seq id value
-
set_seq(seq)
Change the sequence object of the CDS.
Parameters: seq – new Seq object with the sequence of the CDS
-
-
pylprotpredictor.cds.extract_seq_desc(desc)
Extract the seq id, the origin sequence id, the start, the end and the strand of a predicted CDS from its description.
Parameters: desc – description of a CDS predicted with Prodigal
Returns: - id of the predicted CDS
- id of the origin sequence
- start position of the predicted CDS
- end position of the predicted CDS
- strand of the predicted CDS
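Prodigal writes gene headers as `<id> # <start> # <end> # <strand> # ID=...`, where the gene id is the origin sequence id followed by `_` and a gene number. Assuming that layout, the parsing can be sketched as follows; `parse_prodigal_desc` is a hypothetical name, and the strand is returned as Prodigal's raw `1`/`-1`, the kind of value `transform_strand` is documented to convert.

```python
def parse_prodigal_desc(desc: str) -> tuple[str, str, int, int, int]:
    """Split a Prodigal header into (seq_id, origin_seq_id, start, end, strand).

    Assumed layout: '<origin>_<n> # <start> # <end> # <strand> # ID=...'
    """
    fields = [field.strip() for field in desc.split("#")]
    seq_id = fields[0]
    # Drop the trailing gene number to recover the origin sequence id.
    origin_seq_id = seq_id.rsplit("_", 1)[0]
    start, end, strand = int(fields[1]), int(fields[2]), int(fields[3])
    return seq_id, origin_seq_id, start, end, strand


print(parse_prodigal_desc("NC_000913.3_1 # 337 # 2799 # 1 # ID=1_1;partial=00"))
```

Splitting on the last `_` keeps origin ids that themselves contain underscores or dots intact.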
-
pylprotpredictor.cds.find_stop_codon_pos_in_seq(seq)
Find the positions of STOP codons inside a sequence (excluding the last position).
Parameters: seq – string sequence of amino acids
Returns: list of positions of possible STOP codons in the sequence
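A one-line sketch, assuming the amino acid string marks STOP codons with `*` (the convention of Biopython translations) and that positions are 0-based; `find_internal_stops` is a hypothetical name.

```python
def find_internal_stops(protein: str, stop_char: str = "*") -> list[int]:
    """Return 0-based positions of stop symbols, ignoring the final residue."""
    # The last position is the expected terminal STOP, so it is skipped.
    return [pos for pos, aa in enumerate(protein[:-1]) if aa == stop_char]


print(find_internal_stops("M*KL*"))
```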
-
pylprotpredictor.cds.test_to_continue(end, origin_seq_size)
Test whether the next codon can be extracted, i.e. whether the position is still within the genome.
Parameters: - end – int corresponding to the current end
- origin_seq_size – size of the origin sequence
Returns: boolean
-
pylprotpredictor.cds.transform_strand(strand_id)
Transform a strand from its numerical value to its string value.
Parameters: strand_id – numerical value representing a strand (1 or -1)
Returns: string value (forward or reverse) for the strand
-
pylprotpredictor.cds.translate(seq)
Translate a sequence into amino acids, replacing any TAG-encoded STOP codon with a Pyl (pyrrolysine) amino acid.
Parameters: seq – a Seq object
Returns: string with the corresponding amino acid sequence, in which TAG-encoded STOP codons are replaced by the Pyl amino acid
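The TAG-as-Pyl translation can be sketched in pure Python on a plain DNA string (the documented function takes a `Seq` object). The sketch builds the standard genetic code (NCBI table 1) and emits `O`, the IUPAC one-letter code for pyrrolysine; which symbol pylprotpredictor actually uses for Pyl, and the `X` emitted for unknown codons, are assumptions.

```python
from itertools import product

# NCBI standard genetic code (table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_with_pyl(dna: str, pyl: str = "O") -> str:
    """Translate a DNA string, reading TAG as pyrrolysine instead of STOP."""
    dna = dna.upper()
    protein = []
    # Walk complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        protein.append(pyl if codon == "TAG" else CODON_TABLE.get(codon, "X"))
    return "".join(protein)


print(translate_with_pyl("ATGTAGAAATAA"))
```

TAA and TGA still translate to `*`, so the output can be passed to a STOP-position search such as `find_stop_codon_pos_in_seq`.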
Alignment class
-
class pylprotpredictor.alignment.Alignment(sseqid='', pident=0, length=0, mismatch=0, gapopen=0, qstart=0, qend=0, sstart=0, send=0, evalue=10, bitscore=0)
Class to describe a DIAMOND alignment.
Predict
-
pylprotpredictor.predict.extract_potential_pyl_cds(pred_cds, pot_pyl_cds_filepath, pot_pyl_cds_info_filepath, pred_cds_obj_filepath)
Extract potential PYL CDS from TAG-ending CDS.
Parameters: - pred_cds – a dictionary with the predicted CDS represented as CDS objects
- pot_pyl_cds_filepath – path to fasta file in which the protein sequences of the potential PYL CDS are saved
- pot_pyl_cds_info_filepath – path to a CSV file in which information about the potential PYL CDS is saved
- pred_cds_obj_filepath – path to generated JSON file to store the list of predicted CDS objects
-
pylprotpredictor.predict.extract_predicted_cds(pred_cds_path, pred_cds_info_path, tag_ending_cds_info_path, genome_filepath)
Extract the list of predicted CDS and identify the CDS ending with a TAG STOP codon.
Parameters: - pred_cds_path – path to the output of CDS prediction (Prodigal)
- pred_cds_info_path – path to a CSV file in which the information (start, end, strand, origin) is collected for each predicted CDS
- tag_ending_cds_info_path – path to CSV file to export the information about the TAG ending CDS
- genome_filepath – path to reference genome
Returns: a dictionary with the predicted CDS represented as CDS objects
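The TAG-ending check itself reduces to a suffix test. A sketch, assuming the predicted CDS are available as nucleotide strings keyed by id and already written in coding orientation (reverse-strand CDS reverse-complemented); `select_tag_ending` is a hypothetical helper.

```python
def select_tag_ending(cds_seqs: dict[str, str]) -> dict[str, str]:
    """Keep the CDS whose nucleotide sequence ends with the TAG STOP codon."""
    return {
        cds_id: seq
        for cds_id, seq in cds_seqs.items()
        # Case-insensitive comparison on the last codon.
        if seq.upper().endswith("TAG")
    }


print(select_tag_ending({"a": "ATGAAATAG", "b": "ATGAAATAA", "c": "atgtag"}))
```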
-
pylprotpredictor.predict.extract_seqs(seq_filepath)
Extract the sequences in a FASTA file.
Parameters: seq_filepath – path to a FASTA file
Returns: a dictionary with all sequences indexed by their id, together with their length and their complement sequence
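A pure-Python sketch of the returned structure, assuming one entry per record id holding the sequence, its length and its complement under the hypothetical keys `seq`, `length` and `complement`. The documentation does not say whether the complement is reversed; the sketch computes the plain complement (as Biopython's `Seq.complement()` does), whereas `Seq.reverse_complement()` would give the reverse-strand reading direction.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta_with_complement(path: str) -> dict[str, dict]:
    """Map each record id to its sequence, length and (non-reversed) complement."""
    chunks_by_id: dict[str, list[str]] = {}
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # The id is the first whitespace-separated word of the header.
                current = line[1:].split()[0]
                chunks_by_id[current] = []
            elif current is not None:
                chunks_by_id[current].append(line)
    result = {}
    for seq_id, chunks in chunks_by_id.items():
        seq = "".join(chunks)
        result[seq_id] = {
            "seq": seq,
            "length": len(seq),
            "complement": seq.translate(COMPLEMENT),
        }
    return result
```

Multi-line FASTA records are joined before the length and complement are computed.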
-
pylprotpredictor.predict.predict_pyl_proteins(genome_filepath, pred_cds_filepath, pot_pyl_seq_filepath, log_filepath, pred_cds_info_filepath, tag_ending_cds_info_filepath, pot_pyl_seq_info_filepath, pred_cds_obj_filepath)
Run the prediction of potential PYL CDS:
- Extraction of predicted CDS into a dictionary
- Identification of TAG-ending proteins
- Extraction of potential PYL sequences
Parameters: - genome_filepath – path to file with genome sequence
- pred_cds_filepath – path to the output of CDS prediction (Prodigal)
- pot_pyl_seq_filepath – path to fasta file with potential PYL CDS sequence
- log_filepath – path to log file
- pred_cds_info_filepath – path to CSV file with predicted CDS info
- tag_ending_cds_info_filepath – path to CSV file with TAG-ending CDS info
- pot_pyl_seq_info_filepath – path to CSV file with potential PYL CDS info
- pred_cds_obj_filepath – path to generated JSON file to store the list of predicted CDS objects
Check
-
pylprotpredictor.check.check_pyl_proteins(pot_pyl_similarity_search, pred_cds_obj_filepath, cons_pred_cds_seq, info_filepath)
Check predicted PYL CDS:
- Get the potential PYL CDS
- Parse the similarity search report
- Identify and extract the correct CDS sequence (the one with the lowest evalue and longest alignment for potential PYL)
Parameters: - pot_pyl_similarity_search – path to similarity search report of potential PYL CDS against a reference database
- pred_cds_obj_filepath – path to generated JSON file to store the list of predicted CDS objects
- cons_pred_cds_seq – path to a FASTA file for the conserved CDS sequences
- info_filepath – path to a CSV file with final information about the CDS
-
pylprotpredictor.check.extract_correct_cds(pred_cds, cons_pred_cds_seq, info_filepath)
Identify and extract the correct CDS sequence.
Parameters: - pred_cds – dictionary of the predicted CDS
- cons_pred_cds_seq – path to a FASTA file for the conserved CDS sequences
- info_filepath – path to a CSV file with final information about the CDS
-
pylprotpredictor.check.get_cds_obj(cds_id, pred_cds)
Find the CDS object given its id.
Parameters: - cds_id – id of the CDS to find
- pred_cds – dictionary of the predicted CDS
Returns: a CDS object
-
pylprotpredictor.check.import_cds(cds_obj_filepath)
Import a collection of CDS objects from a JSON file.
Parameters: cds_obj_filepath – path to JSON file with collection of CDS objects
Returns: dictionary of the CDS objects
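A sketch of the import step, assuming the JSON file holds a single object keyed by CDS id whose values are attribute dictionaries such as those accepted by `CDS.init_from_dict`. `MiniCDS` and `load_cds_objects` are hypothetical and keep only four attributes, with defaults taken from the documented `CDS` constructor.

```python
import json


class MiniCDS:
    """Hypothetical stand-in holding a few of the CDS attributes."""

    def __init__(self):
        self.seq_id = ""
        self.start = -1
        self.end = -1
        self.strand = "forward"

    def init_from_dict(self, in_dict: dict) -> None:
        # Copy only the keys this stand-in knows about.
        for key in ("seq_id", "start", "end", "strand"):
            if key in in_dict:
                setattr(self, key, in_dict[key])


def load_cds_objects(path: str) -> dict[str, MiniCDS]:
    """Rebuild CDS-like objects from a JSON object keyed by CDS id."""
    with open(path) as handle:
        raw = json.load(handle)
    cds_objects = {}
    for cds_id, attributes in raw.items():
        cds = MiniCDS()
        cds.init_from_dict(attributes)
        cds_objects[cds_id] = cds
    return cds_objects
```

Attributes missing from the JSON keep the constructor defaults, which mirrors how a `CDS` built with keyword defaults behaves.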
-
pylprotpredictor.check.parse_similarity_search_report(pot_pyl_similarity_search, pred_cds)
Parse the similarity search report and add information to the list of potential PYL CDS.
Parameters: - pot_pyl_similarity_search – path to similarity search report of potential PYL CDS against a reference database
- pred_cds – dictionary of the predicted CDS
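DIAMOND's tabular output (`--outfmt 6`) uses the BLAST default columns `qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore`. Assuming the report uses that default layout, grouping hits per potential PYL CDS can be sketched as below; `parse_outfmt6` is a hypothetical name.

```python
import csv
from collections import defaultdict

# Default column order of DIAMOND/BLAST tabular output (--outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def parse_outfmt6(path: str) -> dict[str, list[dict]]:
    """Group the hits of a tabular report by query id."""
    hits = defaultdict(list)
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row:
                continue
            record = dict(zip(COLUMNS, row))
            # Convert numeric columns from strings.
            for field in INT_FIELDS:
                record[field] = int(record[field])
            for field in FLOAT_FIELDS:
                record[field] = float(record[field])
            hits[record.pop("qseqid")].append(record)
    return dict(hits)
```

Apart from `qseqid`, the keys of each hit match the parameter names of the `Alignment` constructor documented above, so a record could in principle be turned into an object with `Alignment(**record)`.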
Write report
-
pylprotpredictor.write_report.extract_row_number(csv_filepath)
Extract the number of rows in a CSV file.
Parameters: csv_filepath – path to a CSV file
Returns: an integer corresponding to the number of lines in the CSV file
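A sketch with the `csv` module; `count_csv_rows` is a hypothetical name, and whether the header row should be counted is not stated above, so it is left as a flag. Note that `csv.reader` counts records, so a quoted field spanning several lines is counted once, unlike a raw line count.

```python
import csv


def count_csv_rows(path: str, skip_header: bool = False) -> int:
    """Count the rows of a CSV file, optionally ignoring the header row."""
    with open(path, newline="") as handle:
        n_rows = sum(1 for _ in csv.reader(handle))
    # Never return a negative count for an empty file.
    return max(n_rows - 1, 0) if skip_header else n_rows
```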
-
pylprotpredictor.write_report.write_report(pred_cds, tag_ending_cds, pot_pyl_cds, final_cds, report_filepath)
Write an HTML report summarizing the full analysis.
Parameters: - pred_cds – path to CSV file with predicted CDS info
- tag_ending_cds – path to CSV file with TAG-ending CDS info
- pot_pyl_cds – path to CSV file with potential PYL CDS info
- final_cds – path to a CSV file with final information about the CDS
- report_filepath – path to the HTML file in which the report is written