Usage - Saskia-Oosterbroek/decona GitHub Wiki
Usage
Decona processes every fastq file in your working directory, so it is a good idea to start from an otherwise empty directory that holds only the files you want to run. After a successful run, a results folder appears in your working directory. Examples:
# Example 1:
$ decona -d -l 800 -m 1200 -q 10 -c 0.80 -n 100 -M
# Example 2:
$ decona -n 50 -M -r -B North_Sea_fish.fasta
Example 1 will: demultiplex, filter for read lengths of 800-1200 bp and quality score 10, cluster reads at 80% identity, build consensus sequences from clusters larger than 100 sequences, and polish them with Medaka.
Example 2 will: filter with the defaults (quality score 10, read lengths longer than 300), cluster at the default 80% read identity, create consensus sequences from clusters larger than 50 sequences, and polish them with Medaka. The consensus sequences are then reclustered at 99% identity, with the original data used to form the new clusters. Finally, the consensus sequences are BLASTed against "North_Sea_fish.fasta", from which a new BLAST database is created.
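Because Decona picks up every fastq file in the working directory, it can help to stage the input files into a fresh directory first. The following is a minimal sketch of that idea, not part of Decona itself: `prepare_run_dir` is a hypothetical helper, the paths in the usage comment are placeholders, and it assumes the input files end in `.fastq`.

```shell
#!/usr/bin/env bash
# Sketch: copy only the fastq files you want into a clean run directory.
# prepare_run_dir is a hypothetical helper, not a Decona command.
prepare_run_dir() {
    local src="$1" dest="$2"
    mkdir -p "$dest"
    # Copy fastq files from the top level of src; everything else stays behind.
    find "$src" -maxdepth 1 -type f -name '*.fastq' -exec cp {} "$dest"/ \;
}

# Example usage (placeholder paths; the decona call repeats Example 1):
#   prepare_run_dir ~/raw_reads ~/decona_run
#   cd ~/decona_run
#   decona -d -l 800 -m 1200 -q 10 -c 0.80 -n 100 -M
```

Copying rather than moving keeps the raw data untouched, so a run can be repeated with different settings in another empty directory.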
Command | Function |
---|---|
-h | help |
-v | version |
-T | number of threads for multithreading (default 4) |
-p | plot a read length distribution histogram. Not sure what your average read length is? Try this: $ decona -p (plots, then exits the program) |
-f | folder structure: use this when your fastq files are already demultiplexed and stored in barcode folders, for example data demultiplexed by a MinION Mk1C. |
Filtering | |
-d | demultiplex samples |
-q | quality score (default 10) |
-l | minimum length (default 300, this is also the absolute minimum. Decona was not designed for shorter reads.) |
-m | maximum length |
Clustering | |
-c | clustering percentage, 0.8 = 80% identity |
-w | clustering word length (default 5). Suggested values: 7 for thresholds 0.88 ~ 0.9, 6 for thresholds 0.85 ~ 0.88, 5 for thresholds 0.80 ~ 0.85 |
-n | cluster size: minimum number of reads in a cluster to continue to the consensus step (default 100) |
-i | gives info about % sequences assigned to clusters |
-r | re-cluster consensus sequences (a second round of clustering). Multiple clusters may arise that contain the same species. Reclustering clusters the original FASTA files at 99% identity, based on the polished result. This may be especially important if you would like to do variant calling. |
-g | clustering algorithm: 1 or 0 (default 1). If set to 1, a read is clustered into the most similar cluster that meets the threshold (accurate but slow mode). If set to 0, a read is clustered into the first cluster that meets the threshold (fast mode). |
Polishing | |
-M | polish consensus sequences with Medaka |
-s | SNP/variant calling with Medaka |
BLAST | |
-B | yourblastdatabase.fasta : a FASTA file to use as the BLAST database; a new database is created from it. (The NCBI BLAST+ tool needs to be installed!) |
-b | /path/to/existing/blast/database/existing-data-base-file.fasta : Use this option if you already have a BLAST+ database on your system |
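Two rows in the table suit a small pre-flight check before a long run: the word-length guidance for `-w` and the BLAST+ requirement for `-B`. The sketch below is illustrative only. `suggest_word_length` and `require_tool` are hypothetical helpers, not Decona commands. Two assumptions are made: a threshold on a range boundary (such as 0.88) takes the larger word length, and BLAST+ puts `makeblastdb` on the `PATH`.

```shell
#!/usr/bin/env bash
# Sketch: helpers for choosing -w and checking that BLAST+ is installed.
# Both functions are hypothetical and only encode the table's guidance.

# Print the suggested word length for a clustering threshold (e.g. 0.86).
# Returns non-zero for thresholds outside the ranges listed in the table.
suggest_word_length() {
    awk -v c="$1" 'BEGIN {
        if      (c >= 0.88 && c <= 0.90) print 7
        else if (c >= 0.85 && c <  0.88) print 6
        else if (c >= 0.80 && c <  0.85) print 5
        else exit 1
    }'
}

# Succeed only if the named executable can be found on PATH.
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "missing required tool: $1" >&2
        return 1
    fi
}

# Example usage (commented out so the sketch has no side effects):
#   c=0.86
#   w=$(suggest_word_length "$c") && require_tool makeblastdb &&
#       decona -c "$c" -w "$w" -n 50 -M -B North_Sea_fish.fasta
```

Failing early on a missing tool or an out-of-range threshold avoids finding the problem only after filtering and clustering have already run.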