Error from Vector NTI parser in multiple sequence GenBank File

Hello,

I am getting an error when trying to open a GenBank file that contains multiple sequences.

The file is being created on the fly by selecting different GenBank files from our LIMS database.

Some of the GenBank files have been created in Vector NTI and I am having a problem when a Vector NTI sequence is included in the combined GenBank file.

I have created a simple example (with sequence details and comments removed) that still reproduces the problem I am seeing:

LOCUS       test_seq1        1 bp    DNA     circular SYN 15-JUN-2021
DEFINITION -.
ACCESSION -
KEYWORDS -.
SOURCE -.
ORGANISM
COMMENT This file is created by Vector NTI
http://www.invitrogen.com/
COMMENT ORIGDB|GenBank
COMMENT LSOWNER|
COMMENT VNTNAME|ABC1.1|
FEATURES Location/Qualifiers
ORIGIN
1 g
//
LOCUS test_seq2 1 bp DNA circular UNA 15-JUN-2021
DEFINITION
ACCESSION
VERSION
KEYWORDS .
SOURCE
ORGANISM .
ORIGIN
1 t
//
LOCUS test_seq3 1 bp DNA circular UNA 15-JUN-2021
DEFINITION
ACCESSION
VERSION
KEYWORDS .
SOURCE
ORGANISM .
ORIGIN
1 a
//

If I try to open the above GenBank file containing 3 sequences, the first sequence loads and then I get the error detailed at the end of this post.

There is no error if none of the included sequences have Vector NTI comments, and no error if all of them do. The error seems to occur only when the file mixes sequences from different origins.
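A quick way to check whether a combined file falls into this mixed case is sketched below. It is only an illustration: the function names are hypothetical, and the test for a "Vector NTI record" (a COMMENT line mentioning Vector NTI, or one whose text starts with VNT or Vector_NTI) is an assumption based on the sample file above, not on how Geneious actually decides.

```python
def split_records(text):
    """Split a multi-record GenBank file on its '//' terminator lines."""
    records, current = [], []
    for line in text.splitlines():
        current.append(line)
        if line.strip() == "//":
            records.append("\n".join(current))
            current = []
    # Keep a trailing record that lacks a terminator, if it has any content.
    if any(l.strip() for l in current):
        records.append("\n".join(current))
    return records


def looks_like_vector_nti(record):
    """Assumed heuristic: the record has a Vector NTI style COMMENT line."""
    for line in record.splitlines():
        if line.startswith("COMMENT"):
            body = line[len("COMMENT"):].strip()
            if "Vector NTI" in body or body.startswith(("VNT", "Vector_NTI")):
                return True
    return False


def has_mixed_origins(text):
    """True when some, but not all, records look like Vector NTI records."""
    flags = [looks_like_vector_nti(r) for r in split_records(text)]
    return any(flags) and not all(flags)
```

Running `has_mixed_origins` on the text of the combined file before import would flag the case described above.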

If my theory is correct, what would you recommend to solve this issue?

Should I either remove the comments from the Vector NTI sequences, or add comments to the non-Vector NTI sequences, so that they can all be loaded from the same file?

Thanks very much,

James

The technical details from the error are below

com.biomatters.geneious.publicapi.plugin.DocumentImportException
at com.biomatters.iseek.plugin.fileimport.D.a(ImportExceptionData.java:84)
at com.biomatters.iseek.plugin.fileimport.E.a(Importage.java:266)
at com.biomatters.iseek.plugin.fileimport.FileImporterManager.a(FileImporterManager.java:683)
at com.biomatters.iseek.plugin.fileimport.FileImporterManager.a(FileImporterManager.java:650)
at com.biomatters.iseek.plugin.fileimport.FileImporterManager.a(FileImporterManager.java:535)
at com.biomatters.iseek.plugin.fileimport.FileImporterManager.a(FileImporterManager.java:388)
at com.biomatters.iseek.plugin.fileimport.v.run(FileImporterManager.java:309)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.biomatters.geneious.publicapi.plugin.DocumentImportException: reached end of file inside author information
at com.biomatters.iseek.plugin.fileimport.DocumentAggregatingImportCallback.a(DocumentAggregatingImportCallback.java:415)
at com.biomatters.iseek.plugin.fileimport.E.a(Importage.java:149)
... 6 more
Caused by: com.biomatters.geneious.publicapi.plugin.DocumentImportException: reached end of file inside author information
at com.biomatters.plugins.fileimportexport.vectorntiimporter.sequenceImporter.a.a(AuthorParser.java:119)
at com.biomatters.plugins.fileimportexport.vectorntiimporter.sequenceImporter.a.a(AuthorParser.java:102)
at com.biomatters.plugins.fileimportexport.vectorntiimporter.sequenceImporter.e.<init>(VectorNtiCommentsParser.java:59)
at com.biomatters.plugins.fileimportexport.vectorntiimporter.sequenceImporter.VectorNtiSequenceImporter.importDocuments(VectorNtiSequenceImporter.java:80)
at com.biomatters.geneious.publicapi.plugin.DocumentFileImporter.importDocuments(DocumentFileImporter.java:311)
at com.biomatters.iseek.plugin.fileimport.DocumentAggregatingImportCallback.a(DocumentAggregatingImportCallback.java:404)
... 7 more

 

James Morris

Official comment

Hi James, 

You're right, some mixes of VNTI and GenBank-style gb files can cause problems.
Given how your particular file looks, I'm afraid it would not be safe to add VNTI comments to all sequences: if those comment blocks are incomplete or contain duplicate data, the parser might not work properly.

It will work more reliably to remove any line that starts with either of these:

COMMENT     Vector_NTI_Display_Data[...]
COMMENT     VNTI[...]
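As a rough illustration of that clean-up step, the sketch below drops COMMENT lines whose text starts with one of the given prefixes. The function name is hypothetical, and the default prefixes are taken literally from the two patterns above; whether "VNTI[...]" is also meant to cover tags such as VNTNAME from the sample file is not stated, so the prefixes are left as a parameter. It also assumes every affected line carries its own COMMENT keyword (no continuation lines).

```python
import re


def strip_comment_lines(text, prefixes=("Vector_NTI_Display_Data", "VNTI")):
    """Remove COMMENT lines whose content starts with any of `prefixes`.

    Assumption: each line to drop begins with the COMMENT keyword itself;
    continuation lines without the keyword are left untouched.
    """
    kept = []
    for line in text.splitlines(keepends=True):
        match = re.match(r"COMMENT\s+(.*)", line)
        if match and match.group(1).startswith(tuple(prefixes)):
            continue  # drop this Vector NTI comment line
        kept.append(line)
    return "".join(kept)
```

Passing a broader prefix, for example `prefixes=("Vector_NTI_Display_Data", "VNT")`, would also remove lines like `COMMENT VNTNAME|ABC1.1|`; that choice depends on which tags actually trigger the problem.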

 

Also, if the first sequence has a FEATURES section (and no COMMENT VNTI lines), it should be recognised as non-VNTI GenBank format and imported correctly.

Hope that helps.

Jonas Kuhn

Great, I'll try stripping out those VNTI lines before loading.

Thanks very much for your help, Jonas.

 

James Morris

Hi,

I have a couple of follow-up questions about the VNTI comments that I hope you can help me with.

1) Are you planning a fix that will allow files containing a mix of VNTI and non-VNTI sequences to be parsed by Geneious?

2) What do you use the VNTI comments for when you parse a VNTI file?

3) What would be the consequence of removing all the VNTI comments from a file? Is there any information or functionality that would be lost once it is loaded in Geneious?

Thanks very much,

James

James Morris

Hi James,

Could you let us know where your VNTI and GenBank files came from? Both examples you gave look incomplete: some elements are missing that we would expect to be present in all VNTI or non-VNTI files.
(If you'd rather not upload them to the forum, feel free to contact our support team at support@geneious.com with more details and some full files.)

If VNTI & non-VNTI files are 'complete' according to what we have observed in the past, then sequence mixes should be processed properly.

The latest version of Geneious only parses the information present in

COMMENT     VNTI[...]

and sets those as metadata on the imported sequences. These might contain some user-defined fields (VNTUDF) and a few predefined fields (Author, Dates, Address, ...).

Removing those comments will remove that metadata, but otherwise the results should be the same.
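To see which fields a record would lose before stripping them, a small listing helper along these lines might be useful. It is a sketch only: the function name is hypothetical, and the `KEY|value|` layout is an assumption inferred from lines such as `COMMENT VNTNAME|ABC1.1|` in the sample file, not a description of the Geneious parser.

```python
def list_vnt_fields(record_text):
    """Return (key, value) pairs for COMMENT lines shaped like 'VNT...|value|'.

    Assumption: Vector NTI metadata lines are pipe-delimited and their key
    starts with 'VNT', as in the sample file in this thread.
    """
    fields = []
    for line in record_text.splitlines():
        if not line.startswith("COMMENT"):
            continue
        body = line[len("COMMENT"):].strip()
        if body.startswith("VNT") and "|" in body:
            key, _, rest = body.partition("|")
            fields.append((key, rest.rstrip("|")))
    return fields
```

Comparing this list before and after the clean-up gives a simple record of what metadata was dropped.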

Jonas Kuhn