Additional data can be added to an existing BlobDir by parsing analysis output files into one or more fields with the blobtools add command. The same command can also add metadata to a dataset, including links to external resources and full taxonomic information. Currently supported analysis outputs include BLAST/Diamond sequence similarity searches, BAM/SAM/CRAM read mappings and BUSCO genome completeness assessments. Parsers are implemented as Python modules that convert the data to one of several generic datatypes (identifier, variable, category, array, array of arrays), so new analyses can be supported by adding an appropriate parser. The blobtools replace command calls blobtools add --replace to allow fields to be updated.

Command line

blobtools create is a synonym for blobtools add, intended for use when creating a new dataset.
blobtools replace calls blobtools add --replace, so that existing fields can be updated.
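
The relationship between blobtools add and blobtools replace can be illustrated with a short Python sketch that assembles the argument list for each case. The helper build_add_command is hypothetical and is not part of blobtools; it uses only the --busco and --replace flags and the DIRECTORY argument documented on this page, and it builds the command without running it.

```python
from typing import List, Optional


def build_add_command(
    blobdir: str,
    busco: Optional[List[str]] = None,
    replace: bool = False,
) -> List[str]:
    """Assemble an argv list for `blobtools add` (hypothetical helper)."""
    argv = ["blobtools", "add"]
    # --busco may be given more than once (shown as TSV... in the usage text).
    for path in busco or []:
        argv += ["--busco", path]
    # Per this page, `blobtools replace` calls `blobtools add --replace`.
    if replace:
        argv.append("--replace")
    # The BlobDir directory is the final positional argument.
    argv.append(blobdir)
    return argv


add_cmd = build_add_command("BlobDir", busco=["busco.full_table.tsv"])
replace_cmd = build_add_command("BlobDir", busco=["busco.full_table.tsv"], replace=True)
print(" ".join(add_cmd))
print(" ".join(replace_cmd))
```

The resulting list could be passed to subprocess.run on a system where blobtools is installed; that step is left out here so the sketch runs without the tool.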

Add data to a BlobDir.

Usage:
    blobtools add [--bed BED...] [--beddir DIRECTORY] [--bedtsv TSV...] [--bedtsvdir DIRECTORY]
                  [--busco TSV...] [--cov BAM...] [--hits TSV...] [--fasta FASTA] [--hits-cols LIST]
                  [--key path=value...] [--link path=url...] [--taxid INT] [--skip-link-test]
                  [--blobdb JSON] [--meta YAML] [--synonyms TSV...] [--trnascan TSV...]
                  [--text TXT...] [--text-delimiter STRING] [--text-cols LIST] [--text-header]
                  [--text-no-array] [--taxdump DIRECTORY] [--taxrule bestsum|bestsumorder[=prefix]]
                  [--threads INT] [--evalue NUMBER] [--bitscore NUMBER] [--hit-count INT]
                  [--update-plot] [--pileup-args key=value...] [--create] [--replace] DIRECTORY

Arguments:
    DIRECTORY             Existing Blob directory.

Options:
    --bed BED             BED format file.
    --beddir DIRECTORY    Directory containing one or more BED format files.
    --bedtsv TSV          TSV file with header row and bed-format columns 1-3.
    --bedtsvdir DIRECTORY Directory containing one or more BED-like tsv files.
    --busco TSV           BUSCO full_table.tsv output file.
    --cov BAM             BAM/SAM/CRAM read alignment file.
    --fasta FASTA         FASTA sequence file.
    --hits TSV            Tabular BLAST/Diamond output file.
    --hits-cols LIST      Comma separated list of <column number>=<field name>.
                          [Default: 1=qseqid,2=staxids,3=bitscore,5=sseqid,10=qstart,11=qend,14=evalue]
    --taxid INT           Add ranks to metadata for a taxid.
    --key path=value      Set a metadata key to value.
    --link path=URL       Link to an external resource.
    --skip-link-test      Skip test to see if link URL can be resolved.
    --meta YAML           Dataset metadata.
    --blobdb JSON         Blobtools v1 blobDB.
    --synonyms TSV        TSV file containing current identifiers and synonyms.
    --taxdump DIRECTORY   Location of NCBI new_taxdump directory.
    --taxrule rulename[=prefix]
                          Rule to use when assigning BLAST hits to taxa (bestsum, bestsumorder,
                          bestdistsum, bestdistsumorder, blastp).
                          An alternate prefix may be specified. [Default: bestsumorder]
    --threads INT         Number of threads to use for multithreaded tasks. [Default: 1]
    --evalue FLOAT        Set evalue cutoff when parsing hits file. [Default: 1]
    --bitscore FLOAT      Set bitscore cutoff when parsing hits file. [Default: 1]
    --hit-count INT       Number of hits to parse when inferring taxonomy. [Default: 10]
    --update-plot         Flag to use new taxrule as default category.
    --text TXT            Generic text file.
    --text-delimiter STRING
                          Text file delimiter. [Default: whitespace]
    --text-cols LIST      Comma separated list of <column number>[=<field name>].
    --text-header         Flag to indicate first row of text file contains field names.
    --text-no-array       Flag to prevent fields in files with duplicate identifiers being
                          loaded as array fields.
    --trnascan TSV        tRNAscan2-SE output
    --pileup-args key=val Key/value pairs to pass to samtools pileup.
    --create              Create a new BlobDir.
    --replace             Replace existing fields with matching ids.

Examples:
    # 1. Add BUSCO scores to BlobDir
    blobtools add --busco busco.full_table.tsv BlobDir
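
The --hits-cols option takes a comma separated list of <column number>=<field name> pairs. As a minimal sketch of how such a list maps onto the columns of a tabular hits file, the Python below parses the default value shown above and picks the named fields out of one row. The function names parse_cols and pick_fields are hypothetical and are not taken from the blobtools source; the sketch assumes 1-based column numbers and does not cover the optional field name allowed by --text-cols.

```python
from typing import Dict, List

# Default value of --hits-cols, copied from the help text above.
DEFAULT_HITS_COLS = "1=qseqid,2=staxids,3=bitscore,5=sseqid,10=qstart,11=qend,14=evalue"


def parse_cols(spec: str) -> Dict[int, str]:
    """Map 1-based column numbers to field names (hypothetical helper)."""
    mapping: Dict[int, str] = {}
    for item in spec.split(","):
        number, _, name = item.strip().partition("=")
        if not number.isdigit() or not name:
            raise ValueError(f"expected <column number>=<field name>, got {item!r}")
        mapping[int(number)] = name
    return mapping


def pick_fields(row: List[str], cols: Dict[int, str]) -> Dict[str, str]:
    """Select the named fields from one tab-split row of a hits file."""
    return {name: row[number - 1] for number, name in cols.items()}


cols = parse_cols(DEFAULT_HITS_COLS)
# Placeholder row with 14 columns; the values are not real hit data.
row = [f"col{i}" for i in range(1, 15)]
fields = pick_fields(row, cols)
print(fields["qseqid"], fields["evalue"])  # col1 col14
```

A custom column layout would be described the same way, for example parse_cols("1=qseqid,3=bitscore") returns {1: 'qseqid', 3: 'bitscore'}.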

blobtools add - genomehubs/blobtoolkit GitHub Wiki
