Export document fields to CSV
I want to export certain document fields from several sequences, or from a sequence list, to a CSV file.
In the GUI:
I can select several sequence documents and export the selected fields to one CSV file. But when I include an export step in a workflow, the file gets overwritten for each sequence. I checked “Export all documents to a single file”, and I selected “Select these documents when the operation completes” in the previous step, which generates the sequence documents. Is there a setting I missed?
I also tried running the export step on a sequence list instead, but the sequence list does not seem to have the same fields as the individual sequences.
Using custom java code:
I am using the performOperation method. With a single sequence document I can access a field like this, for instance:
documents.get(0).getFieldValue(DocumentField.AMBIGUITIES)
How can I access this field from all sequences in a sequence list?
-
Official comment
Hello Ida,
It sounds like you are trying to extract fields from selected documents, which may include Sequence Lists as well as individual Sequence documents, via a Workflow.
Step 1: Extract the sequences from the documents by adding the step ‘For Each Sequence/Extract Sequences From List’.
Step 2: Collate the sequences so that the next operation receives all of them, by then adding the step ‘Group Sequences’.
Step 3: Export your fields to CSV by then adding the Operation ‘Export’. Selecting the option ‘Expose all options’ lets you choose the fields to export each time you run the workflow. You can also set default export fields via ‘Options’.
Kind Regards,
Neil (Prime Developer)
-
Hello Neil and thanks for the suggestions.
When using 'Group Sequences' and 'Export' in a workflow I do get a CSV file with data from all sequences. The problem is that the file has the following header:
Name, # Source Sequences, % Identical Sites, % Pairwise Identity, Command (BBDuk), Created Date, Description, Free end gaps, Mean Coverage, Modified, Output (BBDuk), Read Technology, Sample, Sequence, Sequence List Name, Topology, URN
But what I selected in the export step was:
# Source Sequences, Ambiguities, Mean Coverage, Sample
How can I export these data instead?
-
Hello Ida,
Thank you for getting back to us again - sorry it has taken us so long to reply.
Unfortunately, it turns out there is a bug in the Export to CSV function that makes it always export every field. This has been fixed upstream, but the fix is not scheduled for release until 2023.1, early next year.
As that does not help you in the meantime, we are working on a workflow to address this and will hopefully post it here soon.
Neil (Prime Developer).
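Until a fix or the promised workflow is available, one possible stopgap is to trim the exported file after the fact. The following is only a minimal sketch in plain Java, not part of Geneious or its plugin API: the class name CsvColumnFilter and its methods are hypothetical, and it assumes the export is comma-delimited, has its header on the first line, and has no line breaks inside quoted values. It keeps only the columns named in the second argument.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Geneious: keeps selected columns of an exported CSV.
public class CsvColumnFilter {

    // Splits one CSV line into fields, honouring double-quoted fields and "" escapes.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    inQuotes = false;    // closing quote
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;         // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    // Re-quotes a field only when it contains a comma or a double quote.
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Returns the lines reduced to the wanted columns, in the order requested.
    // Assumption: the first line is the header. Throws if a wanted column is absent.
    static List<String> filterColumns(List<String> lines, List<String> wanted) {
        List<String> out = new ArrayList<>();
        if (lines.isEmpty()) {
            return out;
        }
        List<String> header = parseLine(lines.get(0));
        List<Integer> indices = new ArrayList<>();
        for (String name : wanted) {
            int idx = header.indexOf(name);
            if (idx < 0) {
                throw new IllegalArgumentException("Column not found: " + name);
            }
            indices.add(idx);
        }
        for (String line : lines) {
            List<String> fields = parseLine(line);
            List<String> kept = new ArrayList<>();
            for (int idx : indices) {
                // Short rows are padded with empty values instead of failing.
                kept.add(idx < fields.size() ? quote(fields.get(idx)) : "");
            }
            out.add(String.join(",", kept));
        }
        return out;
    }
}
```

The lines could be read with java.nio.file.Files.readAllLines and written back with Files.write. Note that this can only keep columns that are present in the export: the header quoted above does not contain Ambiguities, so asking for that column would raise the "Column not found" error rather than produce the value.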