PluginUtilities importDocuments with a stream

Hello,

I am creating a DatabaseService plugin to search for and return GenBank records from an Oracle database.

I am able to search the database and return records in Geneious; however, the GenBank information in the database is stored as strings.

I can read the records by creating a temporary file using FileUtilities.createTempFile and writing each GenBank string to the temp file with FileUtilities.writeTextToFile.

This works but, as you can imagine, it is not a scalable solution.

Is it possible to create an AnnotatedPluginDocument with PluginUtilities.importDocuments using a stream rather than a File?

Thanks,

James

James Morris

5 comments

Hi James,

There isn't currently a method to import documents from a stream rather than a file.

If you could parse the data you need from a GenBank format string you could create a SequenceDocument and add that to your Geneious folder using WritableDatabaseService.addDocumentCopy(), but I'm guessing that the conversion of, e.g., a BioJava RichSequence to a SequenceDocument is more trouble than it's worth.

Have you really found that writing GenBank records to temp files is a huge bottleneck? How many sequences are you moving at once? In practice we find that disk I/O isn't much of a limit as long as it's done in a background thread.
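A minimal sketch of the temp-file approach with the disk work moved to a background thread. It assumes the records arrive as plain strings and that the importer accepts a multi-record GenBank file (the flat-file format allows several records per file, each terminated by `//`), so one temp file and one import call could serve a whole result set. `GenBankTempFileBatcher` and `writeBatch` are hypothetical names, the records are abbreviated placeholders rather than complete GenBank entries, and the call to PluginUtilities.importDocuments is left as a comment because it needs the Geneious runtime.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of the Geneious API.
final class GenBankTempFileBatcher {

    /** Writes all records to a single temp file, ensuring each one ends with the "//" terminator. */
    static Path writeBatch(List<String> genBankRecords) throws IOException {
        Path tempFile = Files.createTempFile("genbank-batch-", ".gb");
        try (BufferedWriter writer = Files.newBufferedWriter(tempFile, StandardCharsets.UTF_8)) {
            for (String record : genBankRecords) {
                String trimmed = record.trim();
                writer.write(trimmed);
                writer.newLine();
                if (!trimmed.endsWith("//")) {
                    // Add the record terminator if the stored string lacks it.
                    writer.write("//");
                    writer.newLine();
                }
            }
        }
        return tempFile;
    }

    public static void main(String[] args) throws Exception {
        // Abbreviated placeholder records standing in for strings read from the database.
        List<String> records = Arrays.asList(
                "LOCUS       REC1\nORIGIN\n        1 acgt\n//",
                "LOCUS       REC2\nORIGIN\n        1 ttga");

        ExecutorService background = Executors.newSingleThreadExecutor();
        try {
            Callable<Path> task = () -> writeBatch(records);
            Future<Path> pending = background.submit(task);
            Path batchFile = pending.get(); // blocks only this demo thread
            // In a plugin, batchFile.toFile() would be handed to
            // PluginUtilities.importDocuments at this point (not called in this sketch).
            System.out.println("Wrote " + Files.size(batchFile) + " bytes to " + batchFile);
            Files.deleteIfExists(batchFile);
        } finally {
            background.shutdown();
        }
    }
}
```

Batching keeps the number of temp files constant regardless of result-set size; whether that helps in practice depends on how the import itself scales.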

Jessica

Hi Jessica,

Thanks very much for your answer.

I have tried returning a DefaultNucleotideSequence object constructed with a list of SequenceAnnotation objects created by parsing a GenBank file. This works fine for extracting basic information such as each feature's start, end and name, but extracting all the complex information available in the GenBank file is, as you say, quite a bit of work.
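To illustrate the "basic information" case, here is a rough sketch in plain Java that pulls the feature key, start, end and a name out of a GenBank FEATURES table. It is not the parser used in the plugin: `SimpleGenBankFeatures` and `Feature` are hypothetical names, only `start..end` and `complement(start..end)` locations are handled, and the name is taken from the first `/gene`, `/label` or `/product` qualifier. Features with `join(...)`, fuzzy ends or remote references are skipped, which is exactly the part that becomes a lot of work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Geneious API.
final class SimpleGenBankFeatures {

    /** Basics of one feature; coordinates are 1-based and inclusive, as in GenBank. */
    static final class Feature {
        final String type;
        final int start;
        final int end;
        final boolean complement;
        String name; // defaults to the feature key until a naming qualifier is seen

        Feature(String type, int start, int end, boolean complement) {
            this.type = type;
            this.start = start;
            this.end = end;
            this.complement = complement;
            this.name = type;
        }
    }

    // Feature key after five spaces, then a simple location.
    private static final Pattern FEATURE_LINE =
            Pattern.compile("^ {5}(\\S+)\\s+(complement\\()?(\\d+)\\.\\.(\\d+)\\)?\\s*$");
    // Qualifiers used as a display name.
    private static final Pattern NAME_QUALIFIER =
            Pattern.compile("^\\s+/(?:gene|label|product)=\"([^\"]*)\"\\s*$");

    static List<Feature> parse(String genBankText) {
        List<Feature> features = new ArrayList<>();
        boolean inFeatureTable = false;
        boolean named = false;
        Feature current = null;
        for (String line : genBankText.split("\\r?\\n")) {
            if (!inFeatureTable) {
                inFeatureTable = line.startsWith("FEATURES");
                continue;
            }
            if (!line.isEmpty() && line.charAt(0) != ' ') {
                break; // next top-level section (ORIGIN, CONTIG, //) ends the table
            }
            Matcher feature = FEATURE_LINE.matcher(line);
            if (feature.matches()) {
                current = new Feature(feature.group(1),
                        Integer.parseInt(feature.group(3)),
                        Integer.parseInt(feature.group(4)),
                        feature.group(2) != null);
                features.add(current);
                named = false;
            } else if (line.length() > 5 && line.startsWith("     ") && line.charAt(5) != ' ') {
                current = null; // feature with an unsupported location: skip it and its qualifiers
            } else if (current != null && !named) {
                Matcher qualifier = NAME_QUALIFIER.matcher(line);
                if (qualifier.matches()) {
                    current.name = qualifier.group(1);
                    named = true;
                }
            }
        }
        return features;
    }

    public static void main(String[] args) {
        String text = "LOCUS       DEMO\n"
                + "FEATURES             Location/Qualifiers\n"
                + "     gene            10..50\n"
                + "                     /gene=\"abcD\"\n"
                + "     CDS             complement(60..90)\n"
                + "                     /product=\"kinase\"\n"
                + "ORIGIN\n//";
        for (Feature f : parse(text)) {
            System.out.println(f.type + " " + f.name + " " + f.start + ".." + f.end
                    + (f.complement ? " (complement)" : ""));
        }
    }
}
```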

 

The test setup I have, using temp files and the importDocuments method, works well and is quick enough with queries that return ~100 GenBank sequences. I just wonder how this setup will scale in production if a user wants to browse larger result sets, where the number of sequences could be in the thousands.

Using a database service plugin, is it possible to return a list of placeholder sequences where the user sees only the name and description, and the sequence and features are loaded only when the user clicks on the sequence?

Thanks,

James

James Morris

Hi James,

Are you aware that we support various database systems (including Oracle) for Geneious data storage? It might make more sense to import the whole thing once into a Geneious-formatted Oracle (or other SQL) database. However...

We have a SummaryDocument interface that you could implement as the sort of placeholder you're talking about. If you open a big sequence from GenBank in Geneious you usually get a SummaryDocument instance. Take a look at the E. coli genome in the Sample Documents under Genomes/Bacteria (most of the sample genome sequences are SummaryDocuments until you download them).

However, you'd need to implement the behaviour yourself. You might even need to implement your own DocumentViewer, depending on what you want it to look like. SummaryDocuments don't normally download the sequence when users click on them; they require some explicit action from the user (e.g., clicking a "download sequence" button). I believe the main idea of SummaryDocuments is that the original summary document is replaced automatically when you download the full document. But I haven't ever had to write my own SummaryDocument implementation, so I'm not sure about this, and I don't have much advice right off!

That said, I think writing a SummaryDocument implementation is probably how I'd approach browsing a database's contents without importing the whole thing. But it's a lot more work than importing the whole database into a Geneious-formatted SQL database.
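The placeholder idea itself can be sketched independently of Geneious. The class below is hypothetical and does not implement the SummaryDocument interface (its methods are not shown in this thread); it only illustrates the pattern of holding a name and description up front and fetching the full GenBank text from the database on first request. The `Supplier` stands in for whatever query loads the record.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical placeholder type, not a Geneious SummaryDocument implementation.
final class LazyGenBankRecord {
    private final String name;
    private final String description;
    private final Supplier<String> loader; // assumed to run the database query for the full record
    private String fullText;               // null until the record is first requested

    LazyGenBankRecord(String name, String description, Supplier<String> loader) {
        this.name = name;
        this.description = description;
        this.loader = loader;
    }

    String getName() {
        return name;
    }

    String getDescription() {
        return description;
    }

    synchronized boolean isLoaded() {
        return fullText != null;
    }

    /** Loads the full GenBank text on the first call and caches it for later calls. */
    synchronized String getFullText() {
        if (fullText == null) {
            fullText = loader.get();
        }
        return fullText;
    }

    public static void main(String[] args) {
        AtomicInteger queries = new AtomicInteger();
        LazyGenBankRecord record = new LazyGenBankRecord("REC1", "placeholder demo", () -> {
            queries.incrementAndGet(); // stands in for one round trip to the database
            return "LOCUS       REC1\n//";
        });
        System.out.println(record.getName() + " loaded? " + record.isLoaded());
        record.getFullText();
        record.getFullText();
        System.out.println("Database queries made: " + queries.get());
    }
}
```

Listing thousands of such placeholders costs only the name and description columns; the expensive import would then happen per record, on an explicit user action.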

Jessica

Jessica

Hello,

As a follow-up to your initial response, could the ability to import documents from a stream rather than a file be added to the next release of your public API?

On that subject, how often do you release new versions of your public API? And when is the next one due?

Thanks,

James

James Morris

Hi James,

While this is an improvement we would love to make to our API, it's a lot harder for us to implement than it might seem. I'll log a feature request for adding stream support, but I'm afraid we are not likely to add it soon.

A new version of the API is released with every new version of Geneious Prime. We don't have a publicised release schedule but you can get a pretty good idea of our frequency from the Geneious Prime release notes.

Regards,

Richard

Richard Moir