BLAST

BLAST stands for Basic Local Alignment Search Tool (Altschul et al. 1990). It lets you query a sequence database with a sequence in order to find entries in the database that contain similar sequences. Query sequences can be either nucleotide (DNA or RNA) or protein sequences. Sequences can be BLAST-ed against databases held at NCBI (see NCBI BLAST), or against sequences contained within your local Geneious database (see Custom BLAST).

To run a BLAST search in Geneious Prime, select your query sequence or sequences and click the BLAST button in the toolbar. This operation can also be accessed from the Tools menu, or by right-clicking (Ctrl+click on macOS) on a sequence document and choosing BLAST. You can choose to BLAST either your currently selected sequence documents or a sequence you enter manually. If you choose to enter your sequence manually, Geneious will display a large text box in which you can enter your query sequence either as unformatted text or in FASTA format.

Select your database using the first drop-down box. Databases are grouped together under their respective services. Then choose which kind of BLAST search you wish to run under Program. The available programs will depend on the database you have chosen.

Geneious Prime can perform seven different kinds of BLAST search:

  • blastn: Compares a nucleotide query sequence against a nucleotide sequence database.

  • Megablast: A variation on blastn that is faster but only finds matches with high similarity.

  • Discontiguous Megablast: A variation on blastn that is slower but more sensitive. It will find more dissimilar matches so it is ideal for cross-species comparison.

  • blastp: Compares an amino acid query sequence against a protein sequence database.

  • blastx: Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. You could use this option to find potential translation products of an unknown nucleotide sequence.

  • tblastn: Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.

  • tblastx: Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
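The list above amounts to a lookup from query type and database type to the available programs. The following Python sketch restates it as a table. The function name `candidate_programs` and the type labels "nucleotide" and "protein" are illustrative choices, not part of Geneious Prime.

```python
from typing import Dict, List, Tuple

# (query type, database type) -> programs, transcribed from the list above.
PROGRAMS: Dict[Tuple[str, str], List[str]] = {
    ("nucleotide", "nucleotide"): [
        "blastn",
        "Megablast",
        "Discontiguous Megablast",
        "tblastx",
    ],
    ("protein", "protein"): ["blastp"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "nucleotide"): ["tblastn"],
}


def candidate_programs(query_type: str, database_type: str) -> List[str]:
    """Return the BLAST programs that fit a query/database type pair."""
    key = (query_type.lower(), database_type.lower())
    if key not in PROGRAMS:
        raise ValueError("types must be 'nucleotide' or 'protein'")
    # Return a copy so callers cannot modify the lookup table.
    return list(PROGRAMS[key])


if __name__ == "__main__":
    print(candidate_programs("nucleotide", "protein"))
```

For nucleotide-against-nucleotide searches the choice among the four candidates still depends on how similar you expect the hits to be, as described in the list.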

Three options are available for displaying your results:

  • Hit table: Returns one alignment for every hit against the database and displays them in a hit table. Each query has its own table, which can also be viewed as a query-centric alignment. This option is suitable for fewer than 100 queries.

  • Query-centric alignment: Returns one alignment for each query, showing the hits aligned against the query sequence. This is well suited for large batch searches but it doesn't display a hit table.

  • Bin into 'has hit' vs. 'no hit': Returns two sequence lists: one containing queries which get a hit in the database, the other containing queries which don't. Details about the hits and alignments are discarded. This can be used to filter contamination (e.g. human) from sequencing reads.

You can also specify how much of each matching sequence to retrieve from your database:

  • Matching region: Just the region of the database sequence which matches the query.

  • Matching region with annotations: The region of the database sequence which matches the query, plus any annotations on that sequence.

  • Extended region with annotations: The matching region plus additional flanking regions upstream and downstream.

  • Full sequence with annotations: The entire database sequence (this could be large and slow).

Geneious also allows you to specify most of the advanced options that are available in BLAST. To access the advanced options click the More Options button which is in the bottom left of the BLAST options. The available options vary depending on the kind of BLAST search you have selected. For details on each of the options you can hover your mouse over the option to see a short description or refer to the BLAST documentation from NCBI.

BLAST results

Once a search has started, a results subfolder will be created in the same folder as your query sequence. Search progress is shown in the document table. The search can be cancelled by clicking on the red square labelled Stop.

BLAST hit table

If you chose to return your results in a hit table, each search hit is displayed separately in the document table sorted by bit score. The bit score gives an indication of how good the alignment is; the higher the score, the better the alignment. In general terms, this score is calculated from a formula that takes into account the alignment of similar or identical residues, as well as any gaps introduced to align the sequences.
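As a rough illustration of how a bit score relates to the underlying raw alignment score, the sketch below uses the standard Karlin-Altschul relationships from NCBI's BLAST statistics: S' = (lambda * S - ln K) / ln 2 and E = m * n * 2^(-S'). Geneious simply reports the values returned by BLAST, so this is not Geneious code. The function names are hypothetical, the lambda and K values in the demo are illustrative only (the real values depend on the scoring matrix and gap penalties), and BLAST itself uses length-corrected "effective" query and database lengths, which are ignored here.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalise a raw alignment score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits: float, query_length: int, database_length: int) -> float:
    """Expected number of chance hits scoring at least `bits`.

    E = m * n * 2^(-S'), using raw lengths as a simplification.
    """
    return query_length * database_length * 2.0 ** (-bits)


if __name__ == "__main__":
    # Illustrative parameters only; not taken from any particular search.
    bits = bit_score(raw_score=100, lam=0.267, k=0.041)
    print(round(bits, 1))
    print(expect_value(bits, query_length=250, database_length=5_000_000))
```

Each additional bit halves the number of hits expected by chance, which is why higher bit scores indicate better alignments regardless of the scoring system used.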

Search hits can also be sorted by other columns by clicking on the column header. Columns that may be useful to sort by include E-value, Percent Identity, Query Coverage and Grade. The E-value, or "Expect value", represents the number of hits with at least this score that you would expect purely by chance, given the size of the database and query sequence. The lower the E-value, the more likely it is that the hit is real. The Grade column is a percentage calculated by Geneious by combining the query coverage, E-value and identity values for each hit with weights 0.5, 0.25 and 0.25 respectively. This allows you to sort hits such that the longest, highest-identity hits are at the top.

Specifically, Grade = 50 * fractionCoverage + 25 * maximum(0, 1 - eValue / 10^-20) + 25 * maximum(0, (percentIdentity - minGradedIdentity) / (100 - minGradedIdentity)), where minGradedIdentity is 50 for nucleotide sequences and 25 for protein sequences.
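The Grade formula can be worked through with a short Python sketch. This is an illustration rather than Geneious's own code: the function name `blast_grade` and its parameter names are hypothetical, and it assumes the E-value term divides by 10^-20 (ten to the power of minus twenty).

```python
def blast_grade(fraction_coverage: float, e_value: float,
                percent_identity: float, is_protein: bool = False) -> float:
    """Combine coverage, E-value and identity into a 0-100 Grade."""
    # minGradedIdentity: 50 for nucleotide, 25 for protein sequences.
    min_graded_identity = 25.0 if is_protein else 50.0

    # Query coverage carries half of the total weight.
    coverage_term = 50.0 * fraction_coverage

    # Assumption: the divisor is 10^-20, so any E-value of 1e-20 or
    # larger contributes nothing to the Grade.
    e_value_term = 25.0 * max(0.0, 1.0 - e_value / 1e-20)

    # Identity at or below minGradedIdentity contributes nothing.
    identity_term = 25.0 * max(
        0.0,
        (percent_identity - min_graded_identity) / (100.0 - min_graded_identity),
    )
    return coverage_term + e_value_term + identity_term


if __name__ == "__main__":
    # A full-length, identical hit with an E-value of 0.
    print(blast_grade(1.0, 0.0, 100.0))
    # A nucleotide hit covering half the query at 75% identity.
    print(blast_grade(0.5, 1e-5, 75.0))
```

Under this reading, only hits with extremely small E-values gain anything from the E-value term, so for most hits the Grade is driven by coverage and identity.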

You can also download the full database sequence that corresponds to a BLAST hit. To retrieve the full sequence select a BLAST alignment and go to File → Download Documents or click the Download Full Sequence(s) button located above the viewer tabs. The full sequence will be available in the Sequence View tab once the download has completed and the region that matches the query sequence will be annotated as BLAST Hit. In addition the annotations from the full sequence will be transferred over to the BLAST alignment and can be viewed in Alignment view.

Query-centric view

This view displays all of the hits to your query in a single alignment. Results of single BLAST searches can be viewed in query-centric view instead of a hit table by clicking the Query Centric View tab at the top of the document table. The query sequence is shown in yellow at the top, and the hits are aligned underneath.

Alternatively, you can choose to only return a query-centric alignment when you set up the BLAST search. This option is particularly useful for batch BLAST, as only one alignment per query is returned and all the results are displayed in a single folder. In this view each hit sequence in each alignment is annotated with a Search hit annotation. If you mouse over the annotation you can bring up the values for E-value, pairwise identity, Grade etc. To display these values in a table, switch to the Annotations tab in the sequence viewer and add these columns to the table by clicking the Columns button.

NCBI BLAST

Geneious Prime is able to BLAST to many different databases held at NCBI (see the tables below). These can be selected in the Databases drop down menu in the BLAST set up dialog. You must be able to connect to the internet from within Geneious Prime to BLAST to NCBI, and if you are behind a proxy server you may need to enter your proxy server settings under Tools → Preferences → Connection Settings.

Nucleotide BLAST databases

Database | Nucleotide searches
Nucleotide collection (nt) | All non-redundant GenBank+EMBL+DDBJ+PDB sequences (no EST, STS, GSS or HTGS sequences)
16S ribosomal RNA | 16S rRNA sequences from bacteria and archaea
18S ribosomal RNA | 18S rRNA sequences (Fungal)
28S ribosomal RNA | 28S rRNA sequences (Fungal)
Environmental samples (env_nt) | Nucleotide sequences from large environmental sequence projects
Expressed sequence tags (est) | Database of GenBank + EMBL + DDBJ sequences from EST Divisions
EST human | Human subset of est
EST mouse | Mouse subset of est
EST others | Non-Human, non-mouse subset of est
Genomic Survey Sequences (gss) | Genome Survey Sequence, includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences
High Throughput Genomic Sequences (htgs) | Unfinished HTGS: phases 0, 1 and 2 (finished, phase 3 HTG sequences are in nr)
Human ALU repeat elements (alu_repeats) | A small database of Human ALU repeat elements
Human RefSeqGene (RefSeq_Gene) | NCBI transcript reference sequences from human
Internal transcribed spacer region (ITS) | ITS region from fungal type and reference material
NCBI Genomes (chromosome) | Complete genomes and chromosomes from the NCBI Reference Sequence project.
NCBI Reference Genomic Sequences (refseq_genomic) | Genomic Reference sequences
Patented Protein Sequences (pat) | Nucleotide sequences derived from the Patent division of GenBank
Protein Data Bank (PDB) | Sequences derived from the 3D-structures of proteins from PDB
Reference RNA (refseq_rna) | NCBI Transcript Reference Sequences
RefSeq Representative genomes | Best quality and minimum redundancy genomes from NCBI Refseq Genomes
Sequence Tagged Sites (dbsts) | Database of GenBank+EMBL+DDBJ sequences from STS Divisions
WGS Human | Whole-genome shotgun contigs for Homo sapiens

Protein BLAST databases

Database | Protein searches
Non-redundant protein sequences (nr) | All non-redundant GenBank coding region (CDS) translations+PDB+SwissProt+PIR+PRF
Metagenomic proteins (env_nr) | Translations of sequences in env_nt
Patented Protein Sequences (pat) | Protein sequences derived from the Patent division of GenBank
Protein Data Bank (PDB) | Sequences derived from 3D structure Brookhaven PDB
Reference Proteins (refseq_protein) | NCBI protein reference sequences
UniProtKB/SwissProt | Non-redundant protein sequences information from EMBL

Edit BLAST Databases

You can edit display settings for NCBI BLAST databases, and change which BLAST databases are available via Geneious Prime by clicking on Edit Databases in the Tools → Add/Remove Databases → Set Up BLAST Services window. The actual databases on the BLAST server will not be changed by any edits made via this window. The following fields are available and may be edited:

  • Database Name: A unique, case-sensitive name for the database, as specified by the NCBI or other database server. This must be correct for Geneious to be able to find and search the database. The database name may be composed of multiple parts, e.g. 'wgs:9606' to access WGS sequences for Homo sapiens.

  • Display Name: The name that is displayed in Geneious for that database. This can be any unique and non-empty value.

  • Description: Additional information to describe the database.

  • Nucleotide / Protein: This option specifies the molecule type of the sequences contained in the database. Either Nucleotide, Protein, or both options must be selected.

Custom BLAST

Custom BLAST allows you to create your own custom database from either FASTA files or sequences in your local folders, and BLAST against it. The Custom BLAST plugin requires access to NCBI BLAST+ binary files.

Setting up the Custom BLAST files through Geneious Prime

Geneious Prime provides a download manager to help you download and extract the Custom BLAST files. To use it, go to Tools → Add/Remove Databases → Set Up BLAST Services and select Custom BLAST from the Service drop-down box. Make sure Let Geneious do the setup is checked. Then click 'OK'. After a few seconds the compressed file containing all the files needed to run Custom BLAST will start downloading. You can click 'Pause' to pause the download. You can add and search Custom BLAST databases as soon as it has finished downloading and extracting. If you shut down Geneious with the file partially downloaded, you will need to start downloading it again from the beginning.

Setting up the Custom BLAST files yourself

It is also possible to install the NCBI BLAST+ binary files manually. You can download the latest version of BLAST+ from here:

https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/

Choose the appropriate installer for your operating system, download and extract it, and install BLAST+ in an appropriate location on your computer.

You will then need to let Geneious know the location of the BLAST+ installation. To do this, go to Tools → Add/Remove Databases → Set Up BLAST Services and set Service to Custom BLAST. Enter your data location or click Browse to point Geneious to the location of the BLAST+ folder. Uncheck the option Let Geneious do the setup and click OK. Geneious will now use your manually installed BLAST+ executables.
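Before pointing Geneious at a manually installed folder, it can help to confirm that the extraction actually produced the BLAST+ programs. The Python sketch below searches a folder for a handful of them. The list of tool names is an assumption about which programs matter, since exactly which executables Custom BLAST calls is not stated here, and the function names are hypothetical.

```python
from pathlib import Path

# Assumed set of BLAST+ programs to look for; adjust as needed.
EXPECTED_TOOLS = ("blastn", "blastp", "blastx", "tblastn", "tblastx", "makeblastdb")


def find_blast_tools(install_dir):
    """Map tool name -> path for each expected program found under install_dir."""
    root = Path(install_dir)
    found = {}
    if not root.is_dir():
        return found
    for path in root.rglob("*"):
        # Accept bare names (Linux, macOS) and .exe files (Windows).
        if (path.is_file() and path.stem in EXPECTED_TOOLS
                and path.suffix in ("", ".exe")):
            found.setdefault(path.stem, path)
    return found


def missing_blast_tools(install_dir):
    """List the expected programs that were not found, in EXPECTED_TOOLS order."""
    found = find_blast_tools(install_dir)
    return [tool for tool in EXPECTED_TOOLS if tool not in found]


if __name__ == "__main__":
    # Hypothetical location; replace with your own BLAST+ folder.
    print(missing_blast_tools("/path/to/ncbi-blast"))
```

An empty list means every program in the assumed set was found somewhere under the folder.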

Adding Databases

Now that you have set up the executables, you can add databases to Custom BLAST.

Creating a database from local documents

To create a BLAST database from sequences in your local documents folders, first select the documents in Geneious that you want to use. Then go to Tools → Add/Remove Databases → Add BLAST Database and select Custom BLAST from the Service drop-down box. Enter a name for the database, and click 'OK'.

Creating a database from a FASTA file

To create a database from the sequences in a FASTA file, go to Tools → Add/Remove Databases → Add BLAST Database and select Custom BLAST from the Service drop-down box. Choose Create from file on disk and then click Browse to navigate to the FASTA file that contains the sequences you want to BLAST against. Enter a name for the database and click 'OK'. A FASTA file must meet two requirements to be suitable for creating a database:

  • The FASTA file must contain only one type of sequence (i.e. all Nucleotide or all Amino Acid)

  • The sequences in the FASTA file must all have unique names

If the file meets these requirements it will be added as a database, otherwise you will be informed of the problem.
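The two requirements above can be checked before importing with a small Python sketch. It is not the check Geneious performs. It assumes the sequence name is the first whitespace-delimited word of each header line, and it guesses the sequence type with a simple heuristic: a sequence made up only of IUPAC nucleotide codes is treated as nucleotide, so a short protein made up only of those letters would be misclassified. All function names are hypothetical.

```python
from collections import Counter

# IUPAC nucleotide codes plus the gap character.
NUCLEOTIDE_CODES = set("ACGTUNRYSWKMBDHV-")


def read_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].strip()
            # Assumption: the name is the first word of the header.
            name, chunks = (header.split()[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def guess_type(sequence):
    """Heuristic: only nucleotide codes -> 'nucleotide', else 'protein'."""
    return "nucleotide" if set(sequence.upper()) <= NUCLEOTIDE_CODES else "protein"


def check_fasta(text):
    """Return a list of problems; an empty list means both checks passed."""
    records = list(read_fasta(text))
    problems = []
    counts = Counter(name for name, _ in records)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        problems.append("duplicate names: " + ", ".join(duplicates))
    if len({guess_type(seq) for _, seq in records if seq}) > 1:
        problems.append("mixed nucleotide and protein sequences")
    if not records:
        problems.append("no sequences found")
    return problems


if __name__ == "__main__":
    print(check_fasta(">seq1\nACGTACGT\n>seq1\nGGCCTTAA\n"))
```

To check a file on disk, read its contents into a string first and pass that string to `check_fasta`.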

Using Custom BLAST

Once you have added one or more databases, they will appear under Custom BLAST in the BLAST database drop down. These can be used in exactly the same way as the NCBI BLAST ones.
