Usage within Python

To use ENASearch within Python:

>>> import enasearch

Functions

enasearch.build_retrieve_url(ids, display, result=None, download=None, file=None, offset=None, length=None, subseq_range=None, expanded=False, header=False)

Build the URL to retrieve data or taxon

This function builds the URL to retrieve data or taxon on ENA. It takes several arguments and checks their validity before combining them to build the URL.

Parameters:
  • ids – comma-separated identifiers for records other than Taxon
  • display – display option to specify the display format (accessible with get_display_options)
  • offset – first record to get
  • length – number of records to retrieve
  • download – download option to specify that records are to be saved in a file (used with file option, accessible with get_download_options)
  • file – filepath to save the content of the search (used with download option)
  • subseq_range – range for subsequences (limit separated by a -)
  • expanded – boolean to determine if a CON record is expanded
  • header – boolean to obtain only the header of a record
Returns:

a string with the built URL

enasearch.check_display_option(display)

Check if a display id is in the list of output formats for a query on ENA

This function raises an error if the id is not in the list of possible display formats

Parameters: display – display to check
enasearch.check_download_file_options(download, file)

Check that download and file options are correctly defined

This function checks that:

  • A filepath is given
  • A download option is given
  • The download option is in the list of options for download of data from ENA
Parameters:
  • download – download option to specify that records are to be saved in a file (used with file option, accessible with get_download_options)
  • file – filepath to save the content of the data (used with download option)
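
The three checks above can be sketched in plain Python. This is a minimal illustration, not the ENASearch implementation: the function name, the `allowed` set and the use of `ValueError` are assumptions, and the real list of options is the one returned by get_download_options.

```python
# Hypothetical set of download options, used only for this illustration.
# In ENASearch the real list comes from get_download_options().
ASSUMED_DOWNLOAD_OPTIONS = {"gzip", "txt"}


def check_download_file_options_sketch(download, file,
                                       allowed=ASSUMED_DOWNLOAD_OPTIONS):
    """Raise ValueError unless download and file are both given and valid."""
    # Check 1: a filepath is given
    if file is None:
        raise ValueError("A filepath is needed to save the records")
    # Check 2: a download option is given
    if download is None:
        raise ValueError("A download option is needed with a filepath")
    # Check 3: the download option is one of the known options
    if download not in allowed:
        raise ValueError("Unknown download option: %s" % download)
```

Running the checks in this order means a missing filepath is reported before an unknown download option.
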
enasearch.check_download_option(download)

Check if an option is in the list of options for download of data from ENA

This function raises an error if the option is not in the list of possible download options

Parameters: download – download format to check
enasearch.check_length(length)

Check if length (number of results for a query) is below the maximum

This function raises an error if the given length (or number of results for a query) is above the maximum value <lengthLimit>

Parameters: length – length value to test
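
As an illustration, the length check comes down to one comparison against the limit. In this sketch the limit is passed in as a parameter because the value of <lengthLimit> is not stated here. The function name, the `ValueError`, and the choice to accept a length equal to the limit are assumptions.

```python
def check_length_sketch(length, length_limit):
    """Raise ValueError if length is above length_limit."""
    if length > length_limit:
        raise ValueError(
            "Length must be at most %d (got %d)" % (length_limit, length)
        )
```
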
enasearch.check_result(result)

Check if a result id is in the list of possible results accessible on ENA

This function raises an error if the result is not in the list of possible results

Parameters: result – id of result to check
enasearch.check_returnable_fields(fields, result)

Check that some field ids correspond to returnable fields for a result

This function raises an error if one of the ids is not in the list of possible returnable fields for the given result

Parameters:
  • fields – list of fields to check
  • result – id of the result (partition of ENA db), accessible with get_results
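
A field check of this kind can be sketched as a membership test. The helper below is hypothetical: it takes the list of allowed fields as an argument instead of looking it up, and it raises `ValueError`, which is an assumption about the error type. With ENASearch, the allowed list could come from get_returnable_fields.

```python
def check_fields_sketch(fields, allowed_fields, result):
    """Raise ValueError naming every field that is not allowed for result."""
    unknown = [field for field in fields if field not in allowed_fields]
    if unknown:
        raise ValueError(
            "Fields not available for result %s: %s"
            % (result, ", ".join(unknown))
        )
```

Collecting all unknown fields before raising lets the caller fix them in one pass.
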
enasearch.check_sortable_fields(fields, result)

Check that some field ids correspond to sortable fields for a result

This function raises an error if one of the ids is not in the list of possible sortable fields for the given result

Parameters:
  • fields – list of fields to check
  • result – id of the result (partition of ENA db), accessible with get_results
enasearch.check_subseq_range(subseq_range)

Check that a range of sequences to extract is well defined

This function checks that:

  • The range is correctly built: 2 integer values separated by a -
  • The second value is higher than the first one
Parameters: subseq_range – range for subsequences (limits separated by a -)
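
The two conditions on the range can be illustrated with a small standalone parser. This is a sketch under assumptions, not the library code: the name `parse_subseq_range`, the returned tuple and the `ValueError` are choices made for the example.

```python
def parse_subseq_range(subseq_range):
    """Return (start, end) for a range such as "34-123", or raise ValueError."""
    parts = subseq_range.split("-")
    # The range must be made of exactly two values separated by a "-"
    if len(parts) != 2:
        raise ValueError("Expected two values separated by '-': %r" % subseq_range)
    try:
        start, end = int(parts[0]), int(parts[1])
    except ValueError:
        raise ValueError("Range limits must be integers: %r" % subseq_range)
    # The second value must be higher than the first one
    if end <= start:
        raise ValueError("Second value must be higher than the first: %r" % subseq_range)
    return start, end
```

For instance, `parse_subseq_range("34-123")` gives the pair `(34, 123)`, while `"123-34"`, `"34"` and `"a-b"` are all rejected.
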
enasearch.check_taxonomy_result(result)

Check if a result id is in the list of possible results in ENA Taxon Portal

This function raises an error if the result is not in the list of possible taxonomy results

Parameters: result – id of result to check
enasearch.format_seq_content(seq_str, out_format)

Format a string with sequences into a list of BioPython sequence objects (SeqRecord)

Parameters:
  • seq_str – string with sequences to format
  • out_format – fasta or fastq
Returns:

a list of SeqRecord objects with the sequences in the input string

enasearch.get_display_options(verbose=False)

Return the possible formats to display the result of a query on ENA

Parameters: verbose – boolean to define the printing info
Returns: dictionary with the keys being the formats and the values a description of the formats
enasearch.get_download_options(verbose=False)

Return the options for download of data from ENA

Parameters: verbose – boolean to define the printing info
Returns: dictionary with the keys being the options and the values a description of the options
enasearch.get_filter_fields(result, verbose=False)

Return the filter fields of a result

This function returns the fields that can be used to build a query on a result on ENA. Each field is described in a dictionary with a short description and its type (text, number, etc).

Parameters:
  • result – id of the result (partition of ENA db), accessible with get_results
  • verbose – boolean to define the printing info
Returns:

dictionary with the keys being the field ids and the values dictionaries describing the fields
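
A dictionary of this shape is easy to filter. The sketch below uses a hand-written stand-in for the returned dictionary. The inner key names `description` and `type`, and the field ids, are assumptions made for the example and should be checked against the real output of get_filter_fields.

```python
def fields_of_type(filter_fields, wanted_type, type_key="type"):
    """Return the sorted ids of the fields whose type equals wanted_type."""
    return sorted(
        field_id
        for field_id, info in filter_fields.items()
        if info.get(type_key) == wanted_type
    )


# Hand-written stand-in for the dictionary returned for one result
example_fields = {
    "accession": {"description": "accession number", "type": "text"},
    "base_count": {"description": "number of base pairs", "type": "number"},
    "tax_id": {"description": "taxonomic id", "type": "number"},
}

print(fields_of_type(example_fields, "number"))
```

Knowing the type of a field helps when writing a query, since the type determines which filters apply (see get_filter_types).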

enasearch.get_filter_types(verbose=False)

Return the filters that can be used for the different types of data in a query on ENA

This function returns the filters that can be used for the different types of data (the type of each field is given with the information on the filter fields). For each type of data, it gives the applicable operations and a description of the type of expected values.

Parameters:
  • verbose – boolean to define the printing info
Returns:

dictionary with the keys being the types of data and the values dictionaries describing the filters for each type of data

enasearch.get_result(result, verbose=False)

Return the description of a result (description, returnable and filter fields)

Parameters:
  • result – id of the result (partition of ENA db), accessible with get_results
  • verbose – boolean to define the printing info
Returns:

dictionary with a description of the result, the list of returnable fields and a dictionary with the filter fields

enasearch.get_results(verbose=True)

Return the possible results (type of data) in ENA (other than taxonomy)

Each result is described in a dictionary with a description of the result, the list of returnable fields associated with the result and a dictionary with the filter fields associated with the result

Parameters: verbose – boolean to define the printing info
Returns: a dictionary with the keys being the result ids and the values dictionaries describing the results
enasearch.get_returnable_fields(result, verbose=False)

Return the returnable fields of a result

This function returns the list of fields that can be extracted for a result in a query on ENA

Parameters:
  • result – id of the result (partition of ENA db), accessible with get_results
  • verbose – boolean to define the printing info
Returns:

list of fields that can be extracted for a result

enasearch.get_search_result_number(free_text_search, query, result, need_check_result=True)

Get the number of results for a query on a result

This function builds a query on ENA to extract the number of results matching the query on ENA

Parameters:
  • free_text_search – boolean to describe the type of query
  • query – query string, made up of filtering conditions, joined by logical ANDs, ORs and NOTs and bound by double quotes
  • result – id of the result (partition of ENA db), accessible with get_results
Returns:

an integer corresponding to the number of results of a query on ENA

enasearch.get_search_url(free_text_search)

Get the prefix for the URL to search ENA database

Parameters: free_text_search – boolean to describe the type of query
Returns: a string with the prefix of a URL to search the ENA database
enasearch.get_sortable_fields(result, verbose=False)

Return the sortable fields of a result

This function returns the fields that can be used to sort the output of a query for a result on ENA. Each field is described in a dictionary with a short description and its type (text, number, etc).

Parameters:
  • result – id of the result (partition of ENA db), accessible with get_results
  • verbose – boolean to define the printing info
Returns:

dictionary with the keys being the field ids and the values dictionaries describing the fields

enasearch.get_taxonomy_results(verbose=False)

Return description about the possible results accessible via the taxon portal.

Each taxonomy result is described with a short description.

Parameters: verbose – boolean to define the printing info
Returns: a dictionary with the keys being the result ids and the values dictionaries describing the results
enasearch.load_object(filepath)

Load object from a pickle file

Parameters: filepath – path to pickle file with serialized data
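
Loading an object from a pickle file typically looks like the sketch below, which uses only the standard `pickle` module. The function name is hypothetical, and the body is an assumption about what such a loader does, not a copy of the library code.

```python
import pickle


def load_object_sketch(filepath):
    """Return the object stored in the pickle file at filepath."""
    # Pickle files are binary, so the file is opened in "rb" mode
    with open(filepath, "rb") as handle:
        return pickle.load(handle)
```

Unpickling can execute arbitrary code, so this should only be used on files from a trusted source, such as the data files shipped with the package.
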
enasearch.request_url(url, display, file=None)

Run the URL request and return content or status

This function takes a URL built to query or extract data from ENA and requests it. If a filepath is given, the function writes the result to the file and returns the status of the request. Otherwise, the function returns the results of the request, in a format that depends on the display format.

Parameters:
  • url – URL to request on ENA
  • display – display option
  • file – filepath to save the content of the search
Returns:

status of the request or the result of the request (in different format)

enasearch.retrieve_analysis_report(accession, fields=None, file=None)

Retrieve analysis report from ENA

Parameters:
  • accession – accession id
  • fields – comma-separated list of fields to have in the report (accessible with get_returnable_fields with result=analysis)
  • file – filepath to save the content of the report
Returns:

requested analysis report

enasearch.retrieve_data(ids, display, download=None, file=None, offset=None, length=None, subseq_range=None, expanded=False, header=False)

Retrieve ENA data (other than taxon)

This function retrieves data (other than taxon) from ENA by:

  • Building the URL based on the ids to retrieve and some parameters to format the results
  • Requesting the URL to extract the data
Parameters:
  • ids – comma-separated identifiers for records other than Taxon
  • display – display option to specify the display format (accessible with get_display_options)
  • offset – first record to get
  • length – number of records to retrieve
  • download – download option to specify that records are to be saved in a file (used with file option, accessible with get_download_options)
  • file – filepath to save the content of the search (used with download option)
  • subseq_range – range for subsequences (limit separated by a -)
  • expanded – boolean to determine if a CON record is expanded
  • header – boolean to obtain only the header of a record
Returns:

data corresponding to the requested ids and formatted given the parameters

enasearch.retrieve_filereport(accession, result, fields=None, file=None)

Retrieve a file (run or analysis) report

This function builds a URL to retrieve a file (run or analysis) report from ENA and returns the result of the request.

Parameters:
  • accession – accession id
  • result – read_run for a run report or analysis for an analysis report
  • fields – comma-separated list of fields to have in the report
  • file – filepath to save the content of the report
Returns:

requested file report

enasearch.retrieve_run_report(accession, fields=None, file=None)

Retrieve run report from ENA

Parameters:
  • accession – accession id
  • fields – comma-separated list of fields to have in the report (accessible with get_returnable_fields with result=read_run)
  • file – filepath to save the content of the report
Returns:

requested run report

enasearch.retrieve_taxons(ids, display, result=None, download=None, file=None, offset=None, length=None, subseq_range=None, expanded=False, header=False)

Retrieve data from the ENA Taxon Portal

This function retrieves data from the ENA Taxon Portal by:

  • Formatting the ids to query them on the Taxon Portal
  • Building the URL based on the ids to retrieve and some parameters to format the results
  • Requesting the URL to extract the data
Parameters:
  • ids – comma-separated taxon identifiers
  • display – display option to specify the display format (accessible with get_display_options)
  • result – taxonomy result to display (accessible with get_taxonomy_results)
  • offset – first record to get
  • length – number of records to retrieve
  • download – download option to specify that records are to be saved in a file (used with file option, accessible with get_download_options)
  • file – filepath to save the content of the search (used with download option)
  • subseq_range – range for subsequences (limit separated by a -)
  • expanded – boolean to determine if a CON record is expanded
  • header – boolean to obtain only the header of a record
Returns:

data corresponding to the requested ids and formatted given the parameters

enasearch.search_all_data(free_text_search, query, result, display, download=None, file=None)

Search ENA data and get all results (not size limited)

This function

  • Extracts the number of possible results for the query
  • Extracts all the results of the query (potentially by running the search function several times)
Parameters:
  • free_text_search – boolean to describe the type of query
  • query – query string, made up of filtering conditions, joined by logical ANDs, ORs and NOTs and bound by double quotes
  • result – id of the result (partition of ENA db), accessible with get_results
  • display – display option to specify the display format
  • download – download option to specify that records are to be saved in a file (used with file option)
  • file – filepath to save the content of the search (used with download option)
Returns:

all results of the request in a format defined in the parameters
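
The two steps above amount to a paging loop: count the matches, then request consecutive pages until all of them are collected. The sketch below shows that pattern with the counting and searching functions passed in as arguments. All names, the 0-based offset and the list-based pages are assumptions made for the example, so this is not necessarily how ENASearch implements it.

```python
def search_all_sketch(count_results, search_page, page_size):
    """Collect every result of a query by requesting consecutive pages.

    count_results() returns the total number of matches.
    search_page(offset, length) returns one page of results as a list.
    page_size is the largest number of results allowed per request.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    total = count_results()
    collected = []
    offset = 0  # assumed 0-based here; the service may count differently
    while offset < total:
        # The last page may be shorter than page_size
        length = min(page_size, total - offset)
        collected.extend(search_page(offset, length))
        offset += length
    return collected
```

With ENASearch, `count_results` could wrap get_search_result_number and `search_page` could wrap search_data with its offset and length arguments, using a page size no larger than <lengthLimit>.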

enasearch.search_data(free_text_search, query, result, display, offset=None, length=None, download=None, file=None, fields=None, sortfields=None)

Search ENA data

This function

  • Builds the URL for a given query to search/extract data on ENA database
  • Formats the results given the option defined

The number of results for the query is limited to <lengthLimit>

Parameters:
  • free_text_search – boolean to describe the type of query
  • query – query string, made up of filtering conditions, joined by logical ANDs, ORs and NOTs and bound by double quotes
  • result – id of the result (partition of ENA db), accessible with get_results
  • display – display option to specify the display format (accessible with get_display_options)
  • offset – first record to get
  • length – number of records to retrieve
  • download – download option to specify that records are to be saved in a file (used with file option)
  • file – filepath to save the content of the search (used with download option)
  • fields – comma-separated list of fields to return (only if display=report)
  • sortfields – comma-separated list of fields to sort the results (only if display=report)
Returns:

results of the request in a format defined in the parameters

Data

The fields and their descriptions, the formats, etc. were extracted manually and stored in CSV files in enasearch_data. They were then serialized so that the enasearch script can import them quickly.

To update them, you can check the corresponding section in Contributing.
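
As a rough illustration of the idea, a CSV file can be turned into a serialized dictionary with the standard `csv` and `pickle` modules. This sketch assumes pickle as the serialization format (load_object reads pickle files), and the function name, column names and file layout are hypothetical. The actual update procedure is the one described in Contributing.

```python
import csv
import pickle


def serialize_csv(csv_path, pickle_path, key_column):
    """Read a CSV file into a dictionary keyed by key_column and pickle it."""
    with open(csv_path, newline="") as handle:
        # Each row becomes a dictionary mapping column names to values
        rows = {row[key_column]: row for row in csv.DictReader(handle)}
    with open(pickle_path, "wb") as handle:
        pickle.dump(rows, handle)
    return rows
```

Reading a pickled dictionary back avoids parsing the CSV files on every import, which is presumably why the data are serialized.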