Error from Vector NTI parser in multiple sequence GenBank File
Hello,
I am getting an error when trying to open a GenBank file that contains multiple sequences.
The file is being created on the fly by selecting different GenBank files from our LIMS database.
Some of the GenBank files have been created in Vector NTI and I am having a problem when a Vector NTI sequence is included in the combined GenBank file.
I have created a simple example (removing sequence details and comments) that still reproduces the problem I am seeing:
LOCUS test_seq1 1 bp DNA circular SYN 15-JUN-2021
DEFINITION -.
ACCESSION -
KEYWORDS -.
SOURCE -.
ORGANISM
COMMENT This file is created by Vector NTI
http://www.invitrogen.com/
COMMENT ORIGDB|GenBank
COMMENT LSOWNER|
COMMENT VNTNAME|ABC1.1|
FEATURES Location/Qualifiers
ORIGIN
1 g
//
LOCUS test_seq2 1 bp DNA circular UNA 15-JUN-2021
DEFINITION
ACCESSION
VERSION
KEYWORDS .
SOURCE
ORGANISM .
ORIGIN
1 t
//
LOCUS test_seq3 1 bp DNA circular UNA 15-JUN-2021
DEFINITION
ACCESSION
VERSION
KEYWORDS .
SOURCE
ORGANISM .
ORIGIN
1 a
//
If I try to open the above GenBank file containing 3 sequences, the first sequence loads and then the import fails with an error.
There is no error if none of the included sequences have Vector NTI comments, and no error if all of them do. The error only seems to occur when there is a mixture of sequences from different origins.
If my theory is correct, what would you recommend to solve this issue?
Should I either remove the comments from the Vector NTI sequences or add comments to the non-Vector NTI sequences so that they can all be loaded from the same file?
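One way to check the "mixture" theory before importing is to split the combined file into records and flag which ones carry Vector NTI comments. The following is a minimal sketch only: the function names are hypothetical, and the rule used to recognise a Vector NTI record (a COMMENT line starting with `VNT` or `Vector_NTI`, or mentioning "Vector NTI") is an assumption based on the sample above, not on how Geneious decides.

```python
import re

# Assumption: these COMMENT patterns mark a record as Vector NTI-derived.
# They are inferred from the sample in this post, not from Geneious itself.
VNTI_COMMENT = re.compile(r"^COMMENT\s+(?:VNT|Vector_NTI|.*Vector NTI)")


def split_records(text):
    """Split multi-record GenBank text into lists of lines, one per record."""
    records, current = [], []
    for line in text.splitlines():
        current.append(line)
        if line.strip() == "//":  # GenBank record terminator
            records.append(current)
            current = []
    if any(line.strip() for line in current):
        records.append(current)  # trailing record without a terminator
    return records


def classify_records(text):
    """Return a (locus_name, has_vnti_comment) pair for each record."""
    result = []
    for record in split_records(text):
        locus = "?"
        for line in record:
            parts = line.split()
            if line.startswith("LOCUS") and len(parts) > 1:
                locus = parts[1]
                break
        has_vnti = any(VNTI_COMMENT.match(line) for line in record)
        result.append((locus, has_vnti))
    return result


def is_mixed(text):
    """True if some records have Vector NTI comments and others do not."""
    return len({flag for _, flag in classify_records(text)}) > 1
```

Calling `is_mixed(open(path).read())` on the combined file would then tell you whether it falls into the mixed case that seems to trigger the error.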
Thanks very much,
James
The technical details from the error are below:
com.biomatters.geneious.publicapi.plugin.DocumentImportException
at com.biomatters.iseek.plugin.fileimport.D.a(ImportExceptionData.java:84)
at com.biomatters.iseek.plugin.fileimport.E.a(Importage.java:266)
at com.biomatters.iseek.plugin.fileimport.FileImporterManager.a(FileImporterManager.java:683)
at com.biomatters.iseek.plugin.fileimport.FileImporterManager.a(FileImporterManager.java:650)
at com.biomatters.iseek.plugin.fileimport.FileImporterManager.a(FileImporterManager.java:535)
at com.biomatters.iseek.plugin.fileimport.FileImporterManager.a(FileImporterManager.java:388)
at com.biomatters.iseek.plugin.fileimport.v.run(FileImporterManager.java:309)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.biomatters.geneious.publicapi.plugin.DocumentImportException: reached end of file inside author information
at com.biomatters.iseek.plugin.fileimport.DocumentAggregatingImportCallback.a(DocumentAggregatingImportCallback.java:415)
at com.biomatters.iseek.plugin.fileimport.E.a(Importage.java:149)
... 6 more
Caused by: com.biomatters.geneious.publicapi.plugin.DocumentImportException: reached end of file inside author information
at com.biomatters.plugins.fileimportexport.vectorntiimporter.sequenceImporter.a.a(AuthorParser.java:119)
at com.biomatters.plugins.fileimportexport.vectorntiimporter.sequenceImporter.a.a(AuthorParser.java:102)
at com.biomatters.plugins.fileimportexport.vectorntiimporter.sequenceImporter.e.<init>(VectorNtiCommentsParser.java:59)
at com.biomatters.plugins.fileimportexport.vectorntiimporter.sequenceImporter.VectorNtiSequenceImporter.importDocuments(VectorNtiSequenceImporter.java:80)
at com.biomatters.geneious.publicapi.plugin.DocumentFileImporter.importDocuments(DocumentFileImporter.java:311)
at com.biomatters.iseek.plugin.fileimport.DocumentAggregatingImportCallback.a(DocumentAggregatingImportCallback.java:404)
... 7 more
Official comment
Hi James,
You're right, some mixes of VNTI and GenBank-style gb files can cause problems.
I'm afraid that, given how your particular file looks, it would not be safe to add VNTI comments to all sequences: if those comment blocks are incomplete or contain duplicate data, the parser might not work properly.
It will work more reliably to remove any line that starts with either of these:
COMMENT Vector_NTI_Display_Data[...]
COMMENT VNTI[...]
Also, if the first sequence has a FEATURES section (and no COMMENT VNTI), it should be recognised as non-VNTI GenBank format and imported correctly.
Hope that helps.
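The line removal suggested above could be scripted along these lines. This is a minimal sketch, not an official tool: the function name and the `prefixes` parameter are hypothetical, the default prefixes are simply the two named in the reply, and any continuation lines belonging to a removed comment block are not handled.

```python
import re

# The two prefixes named in the reply above. Assumption: they follow the
# COMMENT keyword after one or more spaces.
DEFAULT_PREFIXES = ("Vector_NTI_Display_Data", "VNTI")


def strip_vnti_comments(text, prefixes=DEFAULT_PREFIXES):
    """Drop every COMMENT line whose content starts with one of `prefixes`."""
    alternatives = "|".join(re.escape(p) for p in prefixes)
    pattern = re.compile(r"^COMMENT\s+(?:%s)" % alternatives)
    kept = [line for line in text.splitlines() if not pattern.match(line)]
    # Re-join with a trailing newline so the output is still a normal text file.
    return "\n".join(kept) + ("\n" if kept else "")
```

Note that the sample in the original post uses `COMMENT VNTNAME|...`, which does not literally start with `VNTI`. Passing `prefixes=("Vector_NTI_Display_Data", "VNT")` would also catch those lines, but whether that matches what the reply intended is an assumption worth checking against your own files.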
Great, I'll try stripping out those VNTI lines before loading.
Thanks very much for your help, Jonas.
Hi,
I have got a couple of follow up questions related to the VNTI comments that I hope you can help me with.
1) Are you planning a fix that will allow files containing a mix of VNTI and non VNTI sequences to be parsed by Geneious?
2) What do you use the VNTI comments for when you parse a VNTI file?
3) What would be the consequence of removing all the VNTI comments from a file? Is there any information or functionality that would be lost once it is loaded in Geneious?
Thanks very much,
James
Hi James,
Could you let us know where you got your VNTI and GenBank files from? Both examples you gave look incomplete: some elements are missing that we would expect to be present in all VNTI or non-VNTI files.
(Feel free to contact our support team (support@geneious.com) with more details and some full files if you don't want to upload them to the forum.)
If VNTI and non-VNTI files are 'complete' according to what we have observed in the past, then sequence mixes should be processed properly.
The latest version of Geneious only parses the information present in
COMMENT VNTI[...]
and sets it as metadata on the imported sequences. This might include some user-defined fields (VNTUDF) and a few pre-defined fields (Author, Dates, Address, ...).
Removing those comments will remove that metadata, but otherwise the results should be the same.
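If that metadata matters, it could be saved separately before the comment lines are stripped. The sketch below is illustrative only: it assumes the pipe-delimited `COMMENT FIELD|value|` layout seen in the sample from the original post, uses a hypothetical function name, and does not reproduce how Geneious itself parses these comments.

```python
import re

# Assumption: VNT fields look like "COMMENT VNTNAME|ABC1.1|", i.e. a field
# name starting with "VNT" followed by pipe-separated values.
FIELD = re.compile(r"^COMMENT\s+(VNT[A-Z0-9_]*)\|(.*)$")


def collect_vnt_fields(text):
    """Map each LOCUS name to a list of (field, values) pairs from VNT comments."""
    fields, locus = {}, None
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            parts = line.split()
            locus = parts[1] if len(parts) > 1 else "?"
            fields[locus] = []
        elif locus is not None:
            match = FIELD.match(line)
            if match:
                # Drop the empty strings produced by trailing pipes.
                values = [v for v in match.group(2).split("|") if v]
                fields[locus].append((match.group(1), values))
    return fields
```

The resulting dictionary could be written out (for example as JSON) alongside the cleaned GenBank file, so that the values are not lost even though they no longer reach Geneious as metadata.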