blobtools create - genomehubs/blobtoolkit GitHub Wiki
The minimum requirement to create a new dataset with blobtools is an assembly FASTA file. This is enough to create a new BlobDir, a directory containing a collection of JSON files. Further data can be added as part of the blobtools create command or using blobtools add. The BlobDir format can be processed using blobtools filter, or visualised using blobtools view and in the interactive BlobToolKit Viewer.
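As a minimal sketch of the FASTA-only case, the command below creates a BlobDir from a single assembly file using the --fasta option listed in the usage text. The names assembly.fasta and BlobDir are placeholders, and the DRY_RUN switch is a convenience added for this example (not a blobtools feature) so the snippet prints the command unless told to run it.

```shell
# Placeholder names (assumptions): substitute your own assembly and output directory.
assembly="assembly.fasta"
blobdir="BlobDir"

# DRY_RUN is specific to this example: 1 (default) prints the command, 0 runs it.
DRY_RUN="${DRY_RUN:-1}"

if [ "$DRY_RUN" = "1" ]; then
    echo "blobtools create --fasta $assembly $blobdir"
else
    blobtools create --fasta "$assembly" "$blobdir"
fi
```

Run with DRY_RUN=0 once blobtools is installed and the assembly file exists.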
- blobtools create is a synonym for blobtools add, intended for use when creating a new dataset.
Add data to a BlobDir.
Usage:
blobtools add [--bed BED...] [--beddir DIRECTORY] [--bedtsv TSV...] [--bedtsvdir DIRECTORY]
[--busco TSV...] [--cov BAM...] [--hits TSV...] [--fasta FASTA] [--hits-cols LIST]
[--key path=value...] [--link path=url...] [--taxid INT] [--skip-link-test]
[--blobdb JSON] [--meta YAML] [--synonyms TSV...] [--trnascan TSV...]
[--text TXT...] [--text-delimiter STRING] [--text-cols LIST] [--text-header]
[--text-no-array] [--taxdump DIRECTORY] [--taxrule bestsum|bestsumorder[=prefix]]
[--threads INT] [--evalue NUMBER] [--bitscore NUMBER] [--hit-count INT]
[--update-plot] [--pileup-args key=value...] [--create] [--replace] DIRECTORY
Arguments:
DIRECTORY Existing Blob directory.
Options:
--bed BED BED format file.
--beddir DIRECTORY Directory containing one or more BED format files.
--bedtsv TSV TSV file with header row and bed-format columns 1-3.
--bedtsvdir DIRECTORY Directory containing one or more BED-like tsv files.
--busco TSV BUSCO full_table.tsv output file.
--cov BAM BAM/SAM/CRAM read alignment file.
--fasta FASTA FASTA sequence file.
--hits TSV Tabular BLAST/Diamond output file.
--hits-cols LIST Comma separated list of <column number>=<field name>.
[Default: 1=qseqid,2=staxids,3=bitscore,5=sseqid,10=qstart,11=qend,14=evalue]
--taxid INT Add ranks to metadata for a taxid.
--key path=value Set a metadata key to value.
--link path=URL Link to an external resource.
--skip-link-test Skip test to see if link URL can be resolved.
--meta YAML Dataset metadata.
--blobdb JSON Blobtools v1 blobDB.
--synonyms TSV TSV file containing current identifiers and synonyms.
--taxdump DIRECTORY Location of NCBI new_taxdump directory.
--taxrule rulename[=prefix]
Rule to use when assigning BLAST hits to taxa (bestsum, bestsumorder,
bestdistsum, bestdistsumorder, blastp).
An alternate prefix may be specified. [Default: bestsumorder]
--threads INT Number of threads to use for multithreaded tasks. [Default: 1]
--evalue FLOAT Set evalue cutoff when parsing hits file. [Default: 1]
--bitscore FLOAT Set bitscore cutoff when parsing hits file. [Default: 1]
--hit-count INT Number of hits to parse when inferring taxonomy. [Default: 10]
--update-plot Flag to use new taxrule as default category.
--text TXT Generic text file.
--text-delimiter STRING
Text file delimiter. [Default: whitespace]
--text-cols LIST Comma separated list of <column number>[=<field name>].
--text-header Flag to indicate first row of text file contains field names.
--text-no-array Flag to prevent fields in files with duplicate identifiers being
loaded as array fields.
--trnascan TSV tRNAscan2-SE output
--pileup-args key=val Key/value pairs to pass to samtools pileup.
--create Create a new BlobDir.
--replace Replace existing fields with matching ids.
Examples:
# 1. Add BUSCO scores to BlobDir
blobtools add --busco busco.full_table.tsv BlobDir
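Other inputs from the option list can be added in the same way. The sketch below adds read coverage and then BLAST/Diamond hits to an existing BlobDir, using only flags shown in the usage text (--cov, --hits, --taxdump, --taxrule). The file and directory names are hypothetical, and the run helper with its DRY_RUN switch is an addition for this example rather than part of blobtools.

```shell
# Hypothetical inputs (assumptions): replace with your own files.
blobdir="BlobDir"
cov="reads.bam"      # read alignments against the assembly
hits="blast.out"     # tabular hits matching the default --hits-cols layout
taxdump="taxdump"    # NCBI new_taxdump directory

# DRY_RUN is specific to this example: 1 (default) prints commands, 0 runs them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# 2. Add read coverage to BlobDir
run blobtools add --cov "$cov" "$blobdir"

# 3. Add BLAST hits and assign taxonomy with the default taxrule
run blobtools add --hits "$hits" --taxdump "$taxdump" --taxrule bestsumorder "$blobdir"
```

Because bestsumorder is the documented default for --taxrule, that flag could be dropped here; it is spelled out only to show where an alternative rule would go.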