blobtools create - genomehubs/blobtoolkit GitHub Wiki
The minimum requirement to create a new dataset with blobtools is an assembly FASTA file. This is enough to create a new BlobDir, a directory containing a collection of JSON files. Further data can be added as part of the blobtools create command or afterwards using blobtools add. A BlobDir can be processed using blobtools filter or visualised using blobtools view and the interactive BlobToolKit Viewer.
- blobtools create is a synonym for blobtools add, intended for use when creating a new dataset.
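As a minimal sketch of the workflow described above, the script below builds one command that creates a BlobDir from an assembly alone and a second that adds read coverage to it afterwards. The file names (assembly.fasta, reads.bam) and the directory name BlobDir are hypothetical placeholders; only flags listed in the usage text on this page are used. The commands are run only if blobtools and both input files are present, otherwise they are printed.

```shell
#!/bin/sh
# Hypothetical inputs: replace these with your own files.
ASSEMBLY="assembly.fasta"   # assembly FASTA, the only required input
BAM="reads.bam"             # optional read alignments for coverage
BLOBDIR="BlobDir"           # name of the BlobDir to create

# Step 1: create the BlobDir from the assembly alone.
CREATE_CMD="blobtools create --fasta $ASSEMBLY $BLOBDIR"
# Step 2: add coverage to the existing BlobDir.
ADD_CMD="blobtools add --cov $BAM $BLOBDIR"

# Run only when blobtools and the inputs are available; otherwise print the commands.
# The unquoted expansions rely on the names above containing no spaces.
if command -v blobtools >/dev/null 2>&1 && [ -f "$ASSEMBLY" ] && [ -f "$BAM" ]; then
    $CREATE_CMD && $ADD_CMD
else
    printf '%s\n' "$CREATE_CMD" "$ADD_CMD"
fi
```

Splitting creation and later additions into separate commands keeps each step repeatable; the same data could instead be passed to a single blobtools create call, since both accept the same options.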
Add data to a BlobDir.
Usage:
    blobtools add [--bed BED...] [--beddir DIRECTORY] [--bedtsv TSV...] [--bedtsvdir DIRECTORY]
                  [--busco TSV...] [--cov BAM...] [--hits TSV...] [--fasta FASTA] [--hits-cols LIST]
                  [--key path=value...] [--link path=url...] [--taxid INT] [--skip-link-test]
                  [--blobdb JSON] [--meta YAML] [--synonyms TSV...] [--trnascan TSV...]
                  [--text TXT...] [--text-delimiter STRING] [--text-cols LIST] [--text-header]
                  [--text-no-array] [--taxdump DIRECTORY] [--taxrule bestsum|bestsumorder[=prefix]]
                  [--threads INT] [--evalue NUMBER] [--bitscore NUMBER] [--hit-count INT]
                  [--update-plot] [--pileup-args key=value...] [--create] [--replace] DIRECTORY
Arguments:
    DIRECTORY                 Existing Blob directory.
Options:
    --bed BED                 BED format file.
    --beddir DIRECTORY        Directory containing one or more BED format files.
    --bedtsv TSV              TSV file with header row and bed-format columns 1-3.
    --bedtsvdir DIRECTORY     Directory containing one or more BED-like tsv files.
    --busco TSV               BUSCO full_table.tsv output file.
    --cov BAM                 BAM/SAM/CRAM read alignment file.
    --fasta FASTA             FASTA sequence file.
    --hits TSV                Tabular BLAST/Diamond output file.
    --hits-cols LIST          Comma separated list of <column number>=<field name>.
                              [Default: 1=qseqid,2=staxids,3=bitscore,5=sseqid,10=qstart,11=qend,14=evalue]
    --taxid INT               Add ranks to metadata for a taxid.
    --key path=value          Set a metadata key to value.
    --link path=URL           Link to an external resource.
    --skip-link-test          Skip test to see if link URL can be resolved.
    --meta YAML               Dataset metadata.
    --blobdb JSON             Blobtools v1 blobDB.
    --synonyms TSV            TSV file containing current identifiers and synonyms.
    --taxdump DIRECTORY       Location of NCBI new_taxdump directory.
    --taxrule rulename[=prefix]
                              Rule to use when assigning BLAST hits to taxa (bestsum, bestsumorder,
                              bestdistsum, bestdistsumorder, blastp).
                              An alternate prefix may be specified. [Default: bestsumorder]
    --threads INT             Number of threads to use for multithreaded tasks. [Default: 1]
    --evalue FLOAT            Set evalue cutoff when parsing hits file. [Default: 1]
    --bitscore FLOAT          Set bitscore cutoff when parsing hits file. [Default: 1]
    --hit-count INT           Number of hits to parse when inferring taxonomy. [Default: 10]
    --update-plot             Flag to use new taxrule as default category.
    --text TXT                Generic text file.
    --text-delimiter STRING
                              Text file delimiter. [Default: whitespace]
    --text-cols LIST          Comma separated list of <column number>[=<field name>].
    --text-header             Flag to indicate first row of text file contains field names.
    --text-no-array           Flag to prevent fields in files with duplicate identifiers being
                              loaded as array fields.
    --trnascan TSV            tRNAscan2-SE output
    --pileup-args key=val     Key/value pairs to pass to samtools pileup.
    --create                  Create a new BlobDir.
    --replace                 Replace existing fields with matching ids.
Examples:
    # 1. Add BUSCO scores to BlobDir
    blobtools add --busco busco.full_table.tsv BlobDir
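
The default --hits-cols value above maps column numbers in a tabular hits file to field names. The sketch below illustrates that mapping on a single synthetic line, assuming the hits file was written with the 15-column output format "6 qseqid staxids bitscore std" (an assumption, not something stated on this page). All values in the line are made up for illustration.

```shell
#!/bin/sh
# One synthetic, tab-separated hit with 15 columns (values are invented):
# qseqid staxids bitscore qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
line=$(printf 'contig1\t6239\t250.0\tcontig1\tsp|P12345|X\t98.5\t120\t2\t0\t1\t360\t5\t124\t1e-50\t250.0')

# Pick out the columns named in the default --hits-cols:
# 1=qseqid, 2=staxids, 3=bitscore, 5=sseqid, 10=qstart, 11=qend, 14=evalue
fields=$(printf '%s\n' "$line" | awk -F'\t' '{print $1, $2, $3, $5, $10, $11, $14}')
echo "$fields"
```

If a hits file uses a different column order, the same fields can be pointed at other positions by passing an explicit list to --hits-cols in the <column number>=<field name> form shown in the options.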