Annotating your sequence from a custom annotation database.

Updated: October 12, 2023 01:22

The Annotate from Database tool automatically annotates one or more sequences by transferring annotated features from similar sequences in your database. Geneious detects similarity using a BLAST-like comparison.

By default, the Geneious Plasmid Features database is used as the annotation Source. This database contains a comprehensive list of common plasmid features, including promoters, terminators, tags, origins of replication and marker genes.

You can also create and use your own annotation databases. We recommend placing your personal annotation databases in the Reference Features folder so they are easy to find. The Reference Features folder is always located at the bottom of your Local folder.

Screen_Shot_2020-11-27_at_8.59.28_AM.png

To create a custom database to "annotate from", create a new folder in Geneious and place in it the sequences you want to use as the source. These can be annotated or unannotated* nucleotide or protein sequences, such as reference genomes downloaded from GenBank, lists of peptides, BLAST hits, or your own previously annotated sequences.

To annotate your sequences from your custom database folder:

1. Select the sequence(s) you want to annotate, go to the Live Annotate & Predict tab on the right-hand side of the sequence view (or alignment view) window, and tick the box next to Annotate From...
Note: if you are annotating a sequence list that contains more than 100 sequences, you will need to open Annotate from Database via the Annotate & Predict menu instead.

2. Click the folder name next to the Source label.

3. In the window that appears, select the folder containing the sequences you want to annotate from and click OK.

4. Choose Best Match to transfer only the best match. If multiple annotations of the same type in the source database overlap the same region of the target sequence, only the closest match is annotated. Primer annotations are the exception: all primer annotations covering the same region are always annotated.

5. If necessary, adjust the Similarity slider until you see a preview of the annotations on your sequence. These will appear faded out when they are previewed.

6. Click Apply to add the annotations to your sequences (the annotations should then appear in a darker color). If you only want to apply some of the annotations, select the annotations you want (either directly on the sequence or in the annotations table) before you click Apply.
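The Best Match rule from step 4 can be illustrated with a short Python sketch. This is not how Geneious implements it internally (the article does not say); the `Hit` type, its `similarity` field, the `"primer_bind"` type name and the greedy strategy are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A candidate annotation on the target sequence (hypothetical type)."""
    name: str
    type: str          # e.g. "CDS", "promoter", "primer_bind"
    start: int         # 0-based, inclusive
    end: int           # 0-based, exclusive
    similarity: float  # percent similarity to the source annotation


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two hits share at least one base on the target."""
    return a.start < b.end and b.start < a.end


def best_matches(hits: list[Hit], primer_type: str = "primer_bind") -> list[Hit]:
    """Keep only the most similar hit among overlapping hits of the same type.

    Primer hits are always kept, mirroring the rule described in step 4.
    """
    kept: list[Hit] = []
    # Visit the strongest hits first so weaker overlapping hits are dropped.
    for hit in sorted(hits, key=lambda h: h.similarity, reverse=True):
        if hit.type == primer_type:
            kept.append(hit)
            continue
        clash = any(k.type == hit.type and overlaps(k, hit) for k in kept)
        if not clash:
            kept.append(hit)
    return sorted(kept, key=lambda h: (h.start, h.end))
```

For example, given two overlapping CDS hits at 98% and 91% similarity, only the 98% hit survives, whereas two overlapping primer hits are both kept.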

Screen_Shot_2020-11-27_at_8.56.09_AM.png

Transferred annotations will then appear with a comprehensive selection of Annotation qualifiers, detailing the source of the transferred annotation, and a hyperlink allowing you to view an alignment of the target region and matching annotation.

Screen_Shot_2020-11-27_at_9.06.31_AM.png

Advanced Options

The Advanced button (next to the Apply button) lets you restrict the operation to particular annotation types, set the Index Length, and adjust the parameters for CDS boundaries and best matches.

*To use unannotated sequences in the Source folder (Geneious Prime 2019.2 onwards), turn on the option “Unannotated sequences (transferred as Misc Feature type annotations)”. Geneious will then treat sequences without any annotations as though they have an annotation of type “misc feature” across the full length of the sequence, with the same name as the sequence name, and this will be transferred if there is a match.
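As a rough mental model of this option, the following Python sketch shows an unannotated source sequence being treated as if it carried a single full-length annotation named after the sequence. The `Annotation` and `SourceSequence` types and the `effective_annotations` function are hypothetical and are not part of any Geneious API.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    name: str
    type: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


@dataclass
class SourceSequence:
    name: str
    residues: str
    annotations: list[Annotation] = field(default_factory=list)


def effective_annotations(seq: SourceSequence,
                          include_unannotated: bool = True) -> list[Annotation]:
    """Return the annotations a source sequence contributes to the search.

    An unannotated sequence contributes one synthetic "misc_feature"
    spanning its full length, but only when the option is switched on.
    """
    if seq.annotations or not include_unannotated:
        return list(seq.annotations)
    return [Annotation(seq.name, "misc_feature", 0, len(seq.residues))]
```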

The boundaries of CDS annotations can be automatically adjusted to fit the closest open reading frame, if this is within a specified distance of the Source CDS. This option can be configured under Adjust CDS boundaries by up to x bp to match nearest ORF. The Advanced preferences also include an adjustable value that determines the Best match threshold.
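The idea behind the CDS boundary adjustment can be sketched in Python as follows. This is an illustration only: it scans the forward strand alone, uses the standard stop codons, and assumes that "within x bp" means each end of the CDS may move by at most `max_shift` bases. The article does not specify how Geneious measures the distance, and `find_orfs` and `adjust_cds` are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_len: int = 30) -> list[tuple[int, int]]:
    """Return forward-strand ORFs as (start, end), from ATG through the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3))
                start = None
    return orfs


def adjust_cds(cds: tuple[int, int],
               orfs: list[tuple[int, int]],
               max_shift: int) -> tuple[int, int]:
    """Snap a transferred CDS to the nearest ORF if both ends move <= max_shift bp.

    If no ORF is close enough, the CDS is returned unchanged.
    """
    best, best_cost = cds, None
    for orf in orfs:
        d_start = abs(orf[0] - cds[0])
        d_end = abs(orf[1] - cds[1])
        if d_start <= max_shift and d_end <= max_shift:
            cost = d_start + d_end
            if best_cost is None or cost < best_cost:
                best, best_cost = orf, cost
    return best
```

With a small `max_shift`, a transferred CDS that is slightly offset from a real ORF is snapped onto it, while one that lies far from any ORF keeps its original coordinates.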

Screen_Shot_2020-11-27_at_9.03.42_AM.png

If you are annotating small genomes with features that are longer than 50 bp, we recommend setting the Index Length to its maximum value (15 for nucleotides or 6 for proteins). This will speed up the search on larger sequences. Conversely, if your features are very short (less than 20 bp), you may need to lower the Index Length in order to find the matches.
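To see why the Index Length matters, here is a minimal Python sketch of a word-index search. It assumes the Index Length behaves like the seed word size in BLAST-like tools; the article does not describe the actual Geneious index, and `build_index` and `seed_hits` are hypothetical names.

```python
from collections import defaultdict


def build_index(target: str, k: int) -> dict[str, list[int]]:
    """Map every word of length k in the target to its start positions."""
    index = defaultdict(list)
    for i in range(len(target) - k + 1):
        index[target[i:i + k]].append(i)
    return index


def seed_hits(feature: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Return (feature_offset, target_offset) pairs of exact k-length word matches."""
    hits = []
    for j in range(len(feature) - k + 1):
        for i in index.get(feature[j:j + k], []):
            hits.append((j, i))
    return hits


target = "ACGTACGTTTGACCAGGATTACA"
feature = "GACCAGGA"  # an 8 bp feature present in the target

# A short word length finds seeds for the short feature.
short_k = seed_hits(feature, build_index(target, 6), 6)

# A word length longer than the feature yields no seeds at all,
# so the feature cannot be found even though it matches exactly.
long_k = seed_hits(feature, build_index(target, 12), 12)
```

Longer words produce fewer chance seeds, which is why a high Index Length speeds up the search, but a feature shorter than the word length can never be seeded.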

Annotating nucleotide sequences using a protein sequence database.

If you wish to use a set of protein sequences to annotate your nucleotide sequences, open the Advanced options and ensure that the Protein Sequences option is checked. Your nucleotide query sequence(s) will be translated in all six reading frames for comparison to the protein sequences in the Source folder.
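For readers unfamiliar with six-frame translation, the following self-contained Python sketch shows what it produces. It assumes the standard genetic code; which translation table Geneious applies here is not stated in this article, and the function names are illustrative.

```python
# Standard genetic code, built in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    """Reverse-complement an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq: str) -> dict[str, str]:
    """Translate a nucleotide sequence in frames +1..+3 and -1..-3."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

Each of the six translations can then be compared against the protein sequences in the Source folder, so a match is found regardless of the strand or frame in which the gene lies.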

Screen_Shot_2020-11-27_at_9.07.44_AM.png

Using BLAST to annotate your sequences.

If you wish to annotate your genome by BLASTing previously identified ORFs, you can use Annotate from Database to transfer the results of the BLAST back onto your genome. This procedure can also be used if you are annotating a list of nucleotide sequences by BLASTing to a protein database with blastx.

The screenshot below shows a set of ORFs annotated on a mitochondrial genome. To annotate these via BLAST, select all the ORFs (either from the Annotations table or directly on the sequence), and perform a batch BLAST search, returning the matching region with annotations (see bottom screenshot).

Screen_Shot_2020-11-27_at_9.12.57_AM.png

The annotations from the BLAST results cannot be directly transferred back onto the original mitochondrial sequence, as the link between the BLAST result and the original genome is broken by extracting the ORFs during the BLAST process. However, the BLAST result folder can be used as the Source for Annotate from Database.

Select your original genome sequence where your ORFs were annotated, enable Annotate from Database, and select the BLAST result folder as the Source. As the results for each ORF are contained in a subfolder, open the Advanced options and ensure “Include subfolders” is checked. Also, add the Source annotation type to the list of types not to annotate (as this is specific to the BLAST hit and not the query sequence).

Screen_Shot_2020-11-27_at_9.14.11_AM.png

You should then see the annotations from your BLAST hits appear on the sequence. Don’t forget to click Apply to add them to the sequence.