De novo assembly analysis mode - eresearchqut/ontvisc GitHub Wiki

To run a de novo assembly, set --analysis_mode denovo_assembly and select either Canu (--canu) or Flye (--flye).

If the data were generated from RACE reactions, you can add a final primer check after the de novo assembly step with the --final_primer_check option. The pipeline then checks the ends of the assembled contigs for any residual universal RACE primers.
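The primer check described above can be pictured as a search of each contig's ends for a primer or its reverse complement. The bash sketch below is a minimal illustration under assumptions, not ontvisc's own implementation: the function names revcomp and primer_at_ends, the 50-base default window and the example primer GATTACA are all hypothetical, since the actual RACE primer sequences and search window are not given on this page.

```bash
# Hypothetical sketch, NOT ontvisc's implementation.
# Expects upper-case DNA sequences (A, C, G, T).

revcomp() {
  # Reverse the sequence character by character, then complement each base
  local seq="$1" out="" i
  for (( i=${#seq}-1; i>=0; i-- )); do
    out+="${seq:i:1}"
  done
  printf '%s' "$out" | tr 'ACGT' 'TGCA'
}

primer_at_ends() {
  # $1 = contig sequence, $2 = primer sequence, $3 = end window in bases (assumed default 50)
  # Returns 0 if the primer or its reverse complement lies within either end window
  local contig="$1" primer="$2" window="${3:-50}"
  local len=${#contig}
  local start=$(( len > window ? len - window : 0 ))
  local head="${contig:0:$window}"
  local tail="${contig:$start}"
  local rc seg
  rc="$(revcomp "$primer")"
  for seg in "$head" "$tail"; do
    if [[ "$seg" == *"$primer"* || "$seg" == *"$rc"* ]]; then
      return 0
    fi
  done
  return 1
}

# Example with a made-up contig and primer
if primer_at_ends "GATTACAGGGTTTCCCAAA" "GATTACA" 10; then
  echo "residual primer found at a contig end"
fi
```

Checking the reverse complement matters because an assembler may output a contig in either orientation, so a primer at the 5' end of the original transcript can appear reverse-complemented at the 3' end of the contig.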

  • Canu (--canu):

    Canu options can be specified using the --canu_options parameter. If you do not know the size of your targeted genome, you can omit the --canu_genome_size [genome size of target virus] parameter. However, if your sample is likely to contain a lot of plant RNA/DNA material, we recommend providing an approximate genome size. For instance, RNA viruses are on average 10 kb in size (see Holmes 2009), which corresponds to --canu_genome_size 0.01m.

    By default, the pipeline performs a homology search of the assembled contigs against the NCBI nt database. Alternatively, you can search the contigs against a viral genome reference (using --blast_vs_ref with --reference) or against a local viral database (using --blast_mode localdb).

    Example:

    # Check for the presence of adapters and trim them
    # Perform de novo assembly with Canu
    # Blast the resulting contigs against a reference
    # Use -profile docker instead of -profile singularity if you run Docker
    nextflow run eresearchqut/ontvisc -resume -profile singularity \
                                --adapter_trimming \
                                --analysis_mode denovo_assembly --canu \
                                --canu_options 'useGrid=false' \
                                --canu_genome_size 0.01m \
                                --blast_vs_ref \
                                --reference /path/to/reference/reference.fasta
    
  • Flye (--flye):

    The Flye running mode can be specified using --flye_mode '[mode]'. Since Flye was primarily developed to run on uncorrected reads, the mode is set by default to --flye_mode 'nano-raw' in the nextflow.config file; this mode is intended for regular ONT reads basecalled before Guppy5 (i.e. <20% error). Alternatively, you can specify the nano-corr mode for ONT reads that were corrected with other methods (i.e. <3% error), or the nano-hq mode for high-quality ONT reads (Guppy5+ SUP or Q20, i.e. <5% error).

    If additional Flye parameters are required, list them under the --flye_options parameter. Please refer to the Flye manual for available options.
    For instance, use --genome-size [genome size of target virus] to specify the estimated genome size (e.g. 0.01m), --meta for metagenome samples with uneven coverage, and --min-overlap to specify a minimum overlap between reads (derived automatically by default).

    Example:

    # Perform de novo assembly with Flye
    # Blast the resulting contigs against the NCBI nt database
    # Use -profile docker instead of -profile singularity if you run Docker
    nextflow run eresearchqut/ontvisc -resume -profile singularity \
                                --analysis_mode denovo_assembly --flye \
                                --flye_options '--genome-size 0.01m --meta' \
                                --flye_mode 'nano-raw' \
                                --blast_threads 8 \
                                --blastn_db /path/to/ncbi_blast_db/nt
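Both --canu_genome_size and the Flye --genome-size option (passed through --flye_options) accept sizes with a suffix such as m for megabases, which is why a 10 kb RNA virus becomes 0.01m in the examples. The bash sketch below does that conversion; the function name to_genome_size is hypothetical and not part of ontvisc.

```bash
# Hypothetical helper: convert a genome length in bases to megabase notation (e.g. 0.01m)
to_genome_size() {
  # awk performs the floating-point division; %g drops trailing zeros
  awk -v b="$1" 'BEGIN { printf "%gm\n", b / 1000000 }'
}

to_genome_size 10000   # average RNA virus size of 10 kb
```

For example, --canu_genome_size "$(to_genome_size 10000)" expands to --canu_genome_size 0.01m.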
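The --flye_mode guidance in the Flye section maps the type of ONT reads to a mode. The bash sketch below encodes that mapping as a lookup; the function name pick_flye_mode and its read-type labels (raw, corrected, sup, q20) are hypothetical, and the error rates in the comments are the ones given in the Flye mode paragraph.

```bash
# Hypothetical lookup from read type to the value passed to --flye_mode
pick_flye_mode() {
  case "$1" in
    corrected) echo "nano-corr" ;;  # ONT reads corrected with other methods (<3% error)
    sup|q20)   echo "nano-hq"   ;;  # Guppy5+ SUP or Q20 reads (<5% error)
    *)         echo "nano-raw"  ;;  # regular pre-Guppy5 reads (<20% error), pipeline default
  esac
}

pick_flye_mode sup
```

For example, --flye_mode "$(pick_flye_mode sup)" expands to --flye_mode nano-hq. Unknown labels fall back to nano-raw, mirroring the pipeline default.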
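The two homology search targets used in the examples (a single reference via --blast_vs_ref and --reference, or a BLAST database such as NCBI nt via --blastn_db) differ in roughly the same way as the -subject and -db options of standalone BLAST+ blastn. The bash sketch below only prints such a command for each case. It is an assumption for illustration, not the command ontvisc runs internally, and the function name blast_cmd and the file names are hypothetical.

```bash
# Hypothetical sketch: build (and print) a standalone blastn command for the contigs
blast_cmd() {
  # $1 = contigs FASTA, $2 = "ref" or "db", $3 = reference FASTA or database prefix, $4 = threads
  local contigs="$1" target_type="$2" target="$3" threads="${4:-1}"
  local -a cmd=(blastn -query "$contigs" -outfmt 6)   # tabular output
  if [[ "$target_type" == "ref" ]]; then
    cmd+=(-subject "$target")                        # pairwise search against one reference FASTA
  else
    cmd+=(-db "$target" -num_threads "$threads")     # search against a formatted BLAST database
  fi
  # Print for inspection; run it instead with: "${cmd[@]}"
  printf '%s\n' "${cmd[*]}"
}

blast_cmd contigs.fasta ref reference.fasta
blast_cmd contigs.fasta db /path/to/ncbi_blast_db/nt 8
```

Threads are only added for the database search because BLAST+ does not use -num_threads when comparing against a -subject sequence.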