De novo assembly analysis mode - eresearchqut/ontvisc GitHub Wiki
You can run a de novo assembly using either Flye or Canu.
If the data analysed was derived using RACE reactions, a final primer check can be performed after the de novo assembly step using the --final_primer_check
option. The pipeline will check for the presence of any residual universal RACE primers at the end of the assembled contigs.
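As a rough sketch, the primer check could be combined with a Canu assembly as follows. It only uses flags named on this page. Whether --final_primer_check needs further parameters, and which other parameters your run requires, is not covered here, so treat this as an assumption to verify. The snippet prints the command for review instead of launching it.

```shell
# Sketch only: assemble the command as a string and print it for review.
# Every flag below appears on this page; the profile choice is an example.
cmd="nextflow run eresearchqut/ontvisc -resume -profile singularity \
  --analysis_mode denovo_assembly --canu \
  --canu_genome_size 0.01m \
  --final_primer_check"

printf '%s\n' "$cmd"

# Once the command looks right, it can be launched with:
#   eval "$cmd"
```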
- Canu (--canu): Canu options can be specified using the --canu_options parameter. If you do not know the size of your targeted genome, you can omit the --canu_genome_size [genome size of target virus] parameter. However, if your sample is likely to contain a lot of plant RNA/DNA material, we recommend providing an approximate genome size. For instance, RNA viruses are on average 10 kb in size (see Holmes 2009), which would correspond to --canu_genome_size 0.01m.
  By default, the pipeline will perform a homology search of the generated contigs against NCBI nt. Alternatively, you can perform a homology search against a viral genome reference (using --blast_vs_ref) or a viral database (using --blast_mode localdb).
  Example:
    # Check for the presence of adapters
    # Perform de novo assembly with Canu
    # Blast the resulting contigs to a reference.
    nextflow run eresearchqut/ontvisc -resume -profile {singularity, docker} \
      --adapter_trimming \
      --analysis_mode denovo_assembly --canu \
      --canu_options 'useGrid=false' \
      --canu_genome_size 0.01m \
      --blast_vs_ref \
      --reference /path/to/reference/reference.fasta
- Flye (--flye): The running mode for Flye can be specified using --flye_mode '[mode]'. Since Flye was primarily developed to run on uncorrected reads, the mode is set by default to --flye_mode 'nano-raw' in the nextflow.config file, for regular ONT reads, pre-Guppy5 (i.e. <20% error). Alternatively, you can specify the nano-corr mode for ONT reads that were corrected with other methods (i.e. <3% error) and the nano-hq mode for ONT high-quality reads: Guppy5+ SUP or Q20 (i.e. <5% error).
  If additional Flye parameters are required, list them under the --flye_options parameter. Please refer to the Flye manual for available options. For instance, use --genome-size [genome size of target virus] to specify the estimated genome size (e.g. 0.01m), --meta for metagenome samples with uneven coverage, and --min-overlap to specify a minimum overlap between reads (automatically derived by default).
  Example:
    # Perform de novo assembly with Flye
    # Blast the resulting contigs against NCBI nt.
    nextflow run eresearchqut/ontvisc -resume -profile {singularity, docker} \
      --analysis_mode denovo_assembly --flye \
      --flye_options '--genome-size 0.01m --meta' \
      --flye_mode 'nano-raw' \
      --blast_threads 8 \
      --blastn_db /path/to/ncbi_blast_db/nt
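The genome size arithmetic and the choice of Flye mode described above can be sketched with two small shell helpers. Both functions (genome_size_mb and flye_mode_for) and the read type labels (raw, corrected, hq) are hypothetical names made up for this illustration and are not part of ontvisc. Only the resulting flag values come from this page.

```shell
# Hypothetical helpers for illustration; not part of ontvisc.

# Convert a genome size in base pairs to the megabase notation used on this
# page (10000 bp, i.e. 10 kb, becomes 0.01m). Intended for kb-scale genomes:
# very small inputs would be printed in exponent notation by %g.
genome_size_mb() {
  awk -v bp="$1" 'BEGIN { printf "%gm\n", bp / 1000000 }'
}

# Map a read type label (labels invented for this sketch) to one of the
# Flye modes listed above.
flye_mode_for() {
  case "$1" in
    raw)       echo "nano-raw"  ;;  # regular ONT reads, pre-Guppy5
    corrected) echo "nano-corr" ;;  # ONT reads corrected with other methods
    hq)        echo "nano-hq"   ;;  # Guppy5+ SUP or Q20 reads
    *)         echo "unknown read type: $1" >&2; return 1 ;;
  esac
}

size=$(genome_size_mb 10000)
mode=$(flye_mode_for raw)

# Print the corresponding pipeline arguments for a 10 kb target virus.
printf '%s\n' "--flye_mode '$mode' --flye_options '--genome-size $size --meta'"
```

The same size string can be passed to --canu_genome_size when assembling with Canu.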