blobtools add - genomehubs/blobtoolkit GitHub Wiki
Additional data can be added to an existing BlobDir by parsing analysis output files into one or more fields using the blobtools add command. This command can also be used to add metadata, including links to external resources and full taxonomic information, to a dataset. Currently supported analysis outputs include BLAST/Diamond sequence similarity searches, BAM/SAM/CRAM read mappings and BUSCO genome completeness assessments. Parsers are implemented as Python modules that convert the data to one of several generic datatypes (identifier, variable, category, array, array of arrays), so new analyses can be supported by adding an appropriate parser.
- blobtools create is a synonym for blobtools add intended for use when creating a new dataset
- blobtools replace calls blobtools add --replace to allow fields to be updated
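To make the parser idea concrete, here is a minimal Python sketch that converts a two-column text file into a generic variable field, with one value per sequence identifier. The function name parse_variable_field and the layout of the returned dictionary are hypothetical and chosen for illustration only; they are not BlobToolKit's actual parser interface or on-disk format.

```python
import io


def parse_variable_field(handle, identifiers, field_id, default=0.0):
    """Read '<identifier> <value>' lines and return one value per identifier.

    Hypothetical helper: the returned dict layout is an assumption made for
    illustration, not the real BlobDir field format.
    """
    values = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        seq_id, raw = line.split()[:2]
        values[seq_id] = float(raw)
    # Emit values in the same order as the dataset identifiers,
    # filling in the default for sequences missing from the file.
    return {
        "id": field_id,
        "type": "variable",
        "values": [values.get(seq_id, default) for seq_id in identifiers],
    }


# Made-up input: file order differs from identifier order, contig_3 is absent.
example = io.StringIO("contig_2 0.41\ncontig_1 0.38\n")
field = parse_variable_field(example, ["contig_1", "contig_2", "contig_3"], "gc")
print(field)
```

The point of the sketch is that a parser only needs to map analysis output onto the dataset's identifier order; once the values are in a generic shape, the rest of the tooling does not need to know which analysis produced them.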
Add data to a BlobDir.
Usage:
blobtools add [--bed BED...] [--beddir DIRECTORY] [--bedtsv TSV...] [--bedtsvdir DIRECTORY]
[--busco TSV...] [--cov BAM...] [--hits TSV...] [--fasta FASTA] [--hits-cols LIST]
[--key path=value...] [--link path=url...] [--taxid INT] [--skip-link-test]
[--blobdb JSON] [--meta YAML] [--synonyms TSV...] [--trnascan TSV...]
[--text TXT...] [--text-delimiter STRING] [--text-cols LIST] [--text-header]
[--text-no-array] [--taxdump DIRECTORY] [--taxrule bestsum|bestsumorder[=prefix]]
[--threads INT] [--evalue NUMBER] [--bitscore NUMBER] [--hit-count INT]
[--update-plot] [--pileup-args key=value...] [--create] [--replace] DIRECTORY
Arguments:
DIRECTORY Existing Blob directory.
Options:
--bed BED BED format file.
--beddir DIRECTORY Directory containing one or more BED format files.
--bedtsv TSV TSV file with header row and bed-format columns 1-3.
--bedtsvdir DIRECTORY Directory containing one or more BED-like tsv files.
--busco TSV BUSCO full_table.tsv output file.
--cov BAM BAM/SAM/CRAM read alignment file.
--fasta FASTA FASTA sequence file.
--hits TSV Tabular BLAST/Diamond output file.
--hits-cols LIST Comma separated list of <column number>=<field name>. [Default: 1=qseqid,2=staxids,3=bitscore,5=sseqid,10=qstart,11=qend,14=evalue]
--taxid INT Add ranks to metadata for a taxid.
--key path=value Set a metadata key to value.
--link path=URL Link to an external resource.
--skip-link-test Skip test to see if link URL can be resolved.
--meta YAML Dataset metadata.
--blobdb JSON Blobtools v1 blobDB.
--synonyms TSV TSV file containing current identifiers and synonyms.
--taxdump DIRECTORY Location of NCBI new_taxdump directory.
--taxrule rulename[=prefix] Rule to use when assigning BLAST hits to taxa (bestsum, bestsumorder, bestdistsum, bestdistsumorder, blastp). An alternate prefix may be specified. [Default: bestsumorder]
--threads INT Number of threads to use for multithreaded tasks. [Default: 1]
--evalue FLOAT Set evalue cutoff when parsing hits file. [Default: 1]
--bitscore FLOAT Set bitscore cutoff when parsing hits file. [Default: 1]
--hit-count INT Number of hits to parse when inferring taxonomy. [Default: 10]
--update-plot Flag to use new taxrule as default category.
--text TXT Generic text file.
--text-delimiter STRING Text file delimiter. [Default: whitespace]
--text-cols LIST Comma separated list of <column number>[=<field name>].
--text-header Flag to indicate first row of text file contains field names.
--text-no-array Flag to prevent fields in files with duplicate identifiers being loaded as array fields.
--trnascan TSV tRNAscan-SE output file.
--pileup-args key=val Key/value pairs to pass to samtools pileup.
--create Create a new BlobDir.
--replace Replace existing fields with matching ids.
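The <column number>=<field name> syntax used by --hits-cols and --text-cols can be illustrated with a short Python sketch. It assumes column numbers are 1-based, which is consistent with the default value shown above. The helper names parse_cols and row_to_record are hypothetical, and the example row is made up; this is not how blobtools itself parses the option.

```python
DEFAULT_HITS_COLS = "1=qseqid,2=staxids,3=bitscore,5=sseqid,10=qstart,11=qend,14=evalue"


def parse_cols(spec):
    """Turn '<column number>=<field name>' pairs into {field name: 0-based index}."""
    mapping = {}
    for pair in spec.split(","):
        number, name = pair.split("=", 1)
        mapping[name.strip()] = int(number) - 1  # assumed 1-based column numbers
    return mapping


def row_to_record(line, cols):
    """Pick the named columns out of one tab-separated line."""
    parts = line.rstrip("\n").split("\t")
    return {name: parts[index] for name, index in cols.items()}


cols = parse_cols(DEFAULT_HITS_COLS)

# Made-up 14-column row for illustration; "x" marks columns that are not used.
row = "\t".join(
    ["contig_1", "9606", "250.0", "x", "sp|P1|A",
     "x", "x", "x", "x", "10", "300", "x", "x", "1e-50"]
)
record = row_to_record(row, cols)
print(record)
```

If a search was run with a different column order, the same mapping idea applies: only the numbers in the --hits-cols value need to change so that each field name points at the right column.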
Examples:
# 1. Add BUSCO scores to BlobDir
blobtools add --busco busco.full_table.tsv BlobDir
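When blobtools add is called from a pipeline, it can help to assemble the argument list programmatically. The Python sketch below builds a command using only flags listed in the usage above. The helper build_add_command and the file names (blast.out, taxdump) are hypothetical, and it assumes that options shown with "..." in the usage are repeated once per file. The command is printed rather than executed.

```python
import shlex


def build_add_command(blobdir, busco=(), cov=(), hits=(),
                      taxdump=None, taxrule=None, replace=False):
    """Return a 'blobtools add' argument list (hypothetical helper)."""
    args = ["blobtools", "add"]
    for path in busco:
        args += ["--busco", path]
    for path in cov:
        args += ["--cov", path]
    for path in hits:
        args += ["--hits", path]  # assumed: one --hits flag per file
    if taxdump is not None:
        args += ["--taxdump", taxdump]
    if taxrule is not None:
        args += ["--taxrule", taxrule]
    if replace:
        args.append("--replace")
    args.append(blobdir)  # the BlobDir directory is the final positional argument
    return args


cmd = build_add_command(
    "BlobDir",
    hits=["blast.out"],
    taxdump="taxdump",
    taxrule="bestsumorder",
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the list form to subprocess.run avoids shell quoting problems with paths that contain spaces.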