Usage - Saskia-Oosterbroek/decona GitHub Wiki

Usage

Decona processes all fastq files in your working directory, so it is a good idea to start from an otherwise empty directory that contains only the files you want to run. After a successful run, a results folder appears in your working directory. Examples:

# Example 1:
$ decona -d -l 800 -m 1200 -q 10 -c 0.80 -n 100 -M 
# Example 2:
$ decona -n 50 -M -r -B North_Sea_fish.fasta

Example 1 will: demultiplex, filter for read length 800-1200 bp and quality score 10, cluster reads at 80% identity, build consensus sequences from clusters larger than 100 sequences, and polish them with Medaka.

Example 2 will: filter with the defaults (quality score 10, read lengths longer than 300 bp), cluster at the default 80% read identity, create consensus sequences from clusters larger than 50 sequences, and polish them with Medaka (-M). The consensus sequences are then re-clustered at 99% identity, with the original data used to form the new clusters. Finally, the consensus sequences are BLASTed against "North_Sea_fish.fasta", for which a new BLAST database is created.
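Both examples assume you start from a directory that holds only the fastq files you want to process. A minimal sketch of how such a directory could be prepared is shown here. The helper `prepare_run_dir` and the paths in the comments are hypothetical and not part of Decona, and the sketch assumes your read files end in `.fastq`.

```shell
# prepare_run_dir SRC DEST
# Hypothetical helper (not part of Decona): copy only the .fastq files from SRC
# into a new, empty DEST so that Decona finds nothing else in its working directory.
prepare_run_dir() (
    src=$1
    dest=$2

    # Refuse to reuse a directory that already has content.
    if [ -d "$dest" ] && [ -n "$(ls -A "$dest")" ]; then
        echo "error: $dest exists and is not empty" >&2
        exit 1
    fi

    mkdir -p "$dest" || exit 1

    # Copy fastq files only; other file types stay behind.
    found=0
    for f in "$src"/*.fastq; do
        [ -e "$f" ] || continue    # the glob matched nothing
        cp "$f" "$dest"/ || exit 1
        found=$((found + 1))
    done

    if [ "$found" -eq 0 ]; then
        echo "error: no .fastq files found in $src" >&2
        exit 1
    fi
    echo "copied $found fastq file(s) to $dest"
)

# Possible usage (paths are placeholders):
#   prepare_run_dir ~/minion_run/fastq_pass ~/decona_run1 && cd ~/decona_run1
#   decona -d -l 800 -m 1200 -q 10 -c 0.80 -n 100 -M
```

The function body runs in a subshell, so a failure ends only the helper and not your interactive shell.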

Command Function
-h help
-v version
-T multithreading (default 4)
-p plot read length distribution histogram. Not sure what your average read length is? Try this: $ decona -p (plots then exits program)
-f folder structure: use this when your fastq files are already demultiplexed and stored in barcode folders, for example data demultiplexed by the MinION Mk1C.
Filtering
-d demultiplex samples
-q quality score (default 10)
-l minimum length (default 300; this is also the absolute minimum, because Decona was not designed for shorter reads)
-m maximum length
Clustering
-c clustering percentage, 0.8 = 80% identity
-w clustering word length (default 5) [ -w 7 for thresholds 0.88 ~ 0.9 / -w 6 for thresholds 0.85 ~ 0.88 / -w 5 for thresholds 0.80 ~ 0.85 ]
-n cluster size: minimum number of reads in a cluster to continue to the consensus step (default 100)
-i gives info about % sequences assigned to clusters
-r re-cluster consensus sequences (a second round of clustering). Several clusters that contain the same species may arise. Reclustering clusters the original FASTA files at 99%, based on the polished result. This may be especially important if you want to do variant calling.
-g clustering algorithm: 1 or 0 (default 1).
If set to 1, the program clusters each read into the most similar cluster that meets the threshold (accurate but slow mode).
If set to 0, a sequence is clustered into the first cluster that meets the threshold (fast mode).
Polishing
-M polish consensus sequences with Medaka
-s SNP/variant calling with Medaka
BLAST
-B yourblastdatabase.fasta : a fasta file to use as BLAST database; a new BLAST database is created from it. (The NCBI BLAST+ tool needs to be installed!)
-b /path/to/existing/blast/database/existing-data-base-file.fasta : use this option if you already have a BLAST+ database on your system.
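
Two of the options above depend on values you have to choose yourself: the length cut-offs (-l, -m) and the word length (-w). The sketch below shows one way to pick them and is not part of Decona. The helper names `fastq_length_stats` and `suggest_word_length` are hypothetical. The first helper assumes plain four-line fastq records. The second encodes only the word lengths listed under -w, and treating each lower bound as inclusive is an assumption.

```shell
# fastq_length_stats FILE...
# Hypothetical helper: print the number of reads and the min / mean / max read
# length. Assumes 4-line fastq records (sequence on every 2nd line of 4).
fastq_length_stats() {
    awk 'FNR % 4 == 2 {
             len = length($0)
             if (n == 0 || len < min) min = len
             if (len > max) max = len
             sum += len
             n++
         }
         END {
             if (n == 0) { print "no reads found"; exit 1 }
             printf "reads=%d min=%d mean=%.1f max=%d\n", n, min, sum / n, max
         }' "$@"
}

# suggest_word_length THRESHOLD
# Hypothetical helper: map a clustering threshold (the value given to -c) to the
# word length listed for -w. Thresholds outside 0.80-0.90 get no suggestion.
suggest_word_length() {
    awk -v c="$1" 'BEGIN {
        if      (c >= 0.88 && c <= 0.90) print 7
        else if (c >= 0.85 && c <  0.88) print 6
        else if (c >= 0.80 && c <  0.85) print 5
        else { print "no suggestion listed for " c > "/dev/stderr"; exit 1 }
    }'
}

# Possible usage:
#   fastq_length_stats *.fastq
#   decona -c 0.87 -w "$(suggest_word_length 0.87)"
```

The numeric summary complements the histogram from -p: the minimum and maximum give a first idea of sensible -l and -m values.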