ONT Documentation - TheJacksonLaboratory/mmrSVD GitHub Wiki

mmrSVD Oxford Nanopore Data Pipeline

Summary of methods for processing long-read data (--workflow ont)

Input data can be raw FASTQ reads or a previously mapped BAM file. If a BAM file is provided, the workflow skips alignment and begins at the variant calling steps.

Parameters for ONT methods

  • --sampleID
    • Default: <STRING>
    • The sample ID for the input data (required).
  • --pubdir
    • Default: /<PATH>
    • Description: The directory in which the saved outputs will be stored.
  • --cacheDir
    • Default: /projects/omics_share/meta/containers
    • Description: The directory that contains cached Singularity containers. JAX users should not change this parameter.
  • -w
    • Default: /<PATH>
    • The directory used for all intermediate files and Nextflow processes. This directory can become quite large, so it should be a location on /fastscratch or another directory with ample storage.
  • --keep_intermediate
    • Default: FALSE
    • Controls whether intermediate analysis files are copied to the pubdir. For this pipeline, this includes unsorted and sorted alignment files.

The following parameters are mutually exclusive options for specifying input data: use either --fastq1 or --bam.

  • --fastq1
    • Default: /<PATH>
    • The path to a single FASTQ file, or one of a pair of FASTQs for paired-end data.
  • --bam
    • Default: /<PATH>
    • The path to BAM input data if alignment has already been performed outside this pipeline.
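
As a rough illustration of the mutually exclusive inputs, the sketch below assembles the pipeline arguments in Python and rejects a call that sets both or neither of --fastq1 and --bam. The helper name `build_ont_args` and the example paths are hypothetical; only the flag names come from this page, and the pipeline's own validation may behave differently.

```python
def build_ont_args(sample_id, pubdir, fastq1=None, bam=None):
    """Return ONT workflow arguments, requiring exactly one input type.

    Hypothetical helper: it only assembles flags documented on this page.
    """
    # Exactly one of fastq1 / bam must be supplied.
    if (fastq1 is None) == (bam is None):
        raise ValueError("Provide exactly one of --fastq1 or --bam")

    args = ["--workflow", "ont", "--sampleID", sample_id, "--pubdir", pubdir]
    if fastq1 is not None:
        args += ["--fastq1", fastq1]
    else:
        args += ["--bam", bam]
    return args


if __name__ == "__main__":
    # Example paths are placeholders, not real locations.
    print(" ".join(build_ont_args("sample1", "/path/to/pubdir",
                                  fastq1="/path/to/reads.fastq")))
```

The returned list can be appended to whatever Nextflow launch command is used at your site.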

The following parameters refer to the reference genome.

  • --fasta
    • Default: /<PATH>
    • Path to the reference genome in FASTA format.
  • --fasta_index
    • Default: /<PATH>
    • Optional parameter to specify the index for the reference genome. If not provided, the pipeline will generate an index.

Filtering and trimming parameters:

  • --quality
    • Default: 10
    • NanoFilt parameter for minimum average read quality score
  • --length
    • Default: 400
    • NanoFilt parameter for minimum read length
  • --headcrop
    • Default: 10
    • NanoFilt parameter to trim N nucleotides from the start of the read
  • --tailcrop
    • Default: 20
    • NanoFilt parameter to trim N nucleotides from the end of the read
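
To make the effect of these four settings concrete, here is a simplified Python sketch of quality and length filtering followed by head and tail cropping. It is an illustration only and not NanoFilt's implementation: the function names are hypothetical, the quality average is computed from per-base error probabilities as an assumption about how a mean Phred score is usually derived, and the exact comparison operators NanoFilt uses may differ.

```python
import math


def mean_quality(phred_scores):
    """Mean read quality, averaged over per-base error probabilities."""
    probs = [10 ** (-q / 10) for q in phred_scores]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_and_trim(seq, phred_scores, quality=10, length=400,
                    headcrop=10, tailcrop=20):
    """Return the trimmed (seq, quals) pair, or None if the read is dropped.

    Defaults mirror the pipeline defaults listed above.
    """
    # Drop reads that are too short or whose average quality is too low.
    if len(seq) < length or mean_quality(phred_scores) < quality:
        return None
    # Remove `headcrop` bases from the start and `tailcrop` from the end.
    end = len(seq) - tailcrop
    return seq[headcrop:end], phred_scores[headcrop:end]


if __name__ == "__main__":
    read = "ACGT" * 125          # 500 bp toy read
    quals = [12] * len(read)     # uniform Phred 12
    trimmed = filter_and_trim(read, quals)
    print(len(trimmed[0]))
```

With the defaults, a 500 bp read that passes the filters loses 10 bases at the start and 20 at the end.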

The following optional parameters allow specification of targeted regions for adaptive sequencing:

  • --targ_chr
    • Default: null
    • Specify targeted chromosome if data were generated using adaptive sequencing mode.
  • --targ_start
    • Default: null
    • Specify targeted start coordinate if data were generated using adaptive sequencing mode.
  • --targ_end
    • Default: null
    • Specify targeted end coordinate if data were generated using adaptive sequencing mode.
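
The three target parameters together describe one genomic interval. The sketch below shows one way to check them before launching a run and to format them as a region string. The helper `target_region` is hypothetical, and the all-or-none rule it enforces is an assumption rather than documented pipeline behavior.

```python
def target_region(targ_chr=None, targ_start=None, targ_end=None):
    """Return 'chr:start-end' for a targeted run, or None if no target is set.

    Assumes (not confirmed by the pipeline docs) that the three values
    must be supplied together.
    """
    values = (targ_chr, targ_start, targ_end)
    if all(v is None for v in values):
        return None  # whole-genome run, nothing to validate
    if any(v is None for v in values):
        raise ValueError("Set --targ_chr, --targ_start and --targ_end together")

    start, end = int(targ_start), int(targ_end)
    if start < 0 or end <= start:
        raise ValueError("Target coordinates must satisfy 0 <= start < end")
    return f"{targ_chr}:{start}-{end}"


if __name__ == "__main__":
    # Coordinates below are arbitrary example values.
    print(target_region("chr7", 1000000, 2000000))
```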

The following parameters specify reference data and annotations. The --genome_build parameter controls whether the default versions are GRCm38 or GRCm39:

  • --genome_build
    • Default: GRCm38
    • Parameter that controls the reference data used for alignment and annotation. GRCm38 is the default value; GRCm39 is an accepted alternative.
  • --tandem_repeats
    • Default: /ref_data/ucsc_mm10_trf_chr_sorted.bed
    • BED file that lists the coordinates of centromeres and telomeres to exclude as alignment targets. Note: the default path refers to a location within the containers quay.io/jaxcompsci/pbsv-td_refs:2.8.0--refv0.2.0 and quay.io/jaxcompsci/sniffles-td_refs:2.0.7--refv0.2.0, which require this file.
  • --sv_ins_ref
    • Default: /ref_data/variants_freeze5_sv_INS_mm39_to_mm10_sorted.bed.gz
    • BED file that lists previously identified insertion SVs. Note: the default path refers to a location within the container quay.io/jaxcompsci/bedtools-sv_refs:2.30.0--refv0.2.0, which requires this file.
  • --sv_del_ref
    • Default: /ref_data/variants_freeze5_sv_DEL_mm39_to_mm10_sorted.bed.gz
    • BED file that lists previously identified deletion SVs. Note: the default path refers to a location within the container quay.io/jaxcompsci/bedtools-sv_refs:2.30.0--refv0.2.0, which requires this file.
  • --sv_inv_ref
    • Default: /ref_data/variants_freeze5_sv_INV_mm39_to_mm10_sorted.bed.gz
    • BED file that lists previously identified inversion SVs. Note: the default path refers to a location within the container quay.io/jaxcompsci/bedtools-sv_refs:2.30.0--refv0.2.0, which requires this file.
  • --reg_ref
    • Default: /ref_data/mus_musculus.GRCm38.Regulatory_Build.regulatory_features.20180516.gff.gz
    • File that lists regulatory features (the default is a gzipped GFF). Note: the default path refers to a location within the container quay.io/jaxcompsci/bedtools-sv_refs:2.30.0--refv0.2.0, which requires this file.
  • --genes_bed
    • Default: /ref_data/Mus_musculus.GRCm38.102.gene_symbol.bed
    • BED file that lists gene symbol IDs and coordinates. Note: default path refers to a location within the container quay.io/jaxcompsci/bedtools-sv_refs:2.30.0--refv0.2.0, which requires this file.
  • --exons_bed
    • Default: /ref_data/Mus_musculus.GRCm38.102.exons.bed
    • BED file that lists exons and coordinates. Note: default path refers to a location within the container quay.io/jaxcompsci/bedtools-sv_refs:2.30.0--refv0.2.0, which requires this file.

SURVIVOR merging parameters:

  • --surv_dist
    • Default: 1000
    • Maximum distance between breakpoints for merging SVs.
  • --surv_supp
    • Default: 1
    • The number of callers (out of 4) required to support an SV.
  • --surv_type
    • Default: 1
    • Boolean (0/1) that requires SVs to be the same type for merging.
  • --surv_strand
    • Default: 1
    • Boolean (0/1) that requires SVs to be on the same strand for merging.
  • --surv_min
    • Default: 30
    • Minimum length (bp) to output SVs.
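
SURVIVOR's merge subcommand takes its settings as positional arguments, so the order matters. The sketch below shows how the five parameters above could map onto that call. The helper name, the input list file, the output file name, and the value of the "estimate distance" argument (set to 0 here) are assumptions and are not taken from the pipeline.

```python
def survivor_merge_cmd(vcf_list, out_vcf, surv_dist=1000, surv_supp=1,
                       surv_type=1, surv_strand=1, surv_min=30,
                       estimate_distance=0):
    """Build a `SURVIVOR merge` argument list from the pipeline parameters.

    Positional order: file list, max breakpoint distance, min supporting
    callers, same-type flag, same-strand flag, estimate-distance flag,
    min SV size, output VCF.
    """
    return [
        "SURVIVOR", "merge", vcf_list,
        str(surv_dist), str(surv_supp), str(surv_type), str(surv_strand),
        str(estimate_distance),  # assumption: not exposed on this page
        str(surv_min), out_vcf,
    ]


if __name__ == "__main__":
    # File names are placeholders for illustration.
    print(" ".join(survivor_merge_cmd("vcf_list.txt", "merged.vcf")))
```

Raising --surv_supp above 1 keeps only SVs reported by more than one caller, which trades sensitivity for fewer caller-specific artifacts.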

Pipeline Default Outputs

| Naming Convention | Description |
| --- | --- |
| mmrSVD_ont_report.html | Nextflow autogenerated report |
| trace.txt | Nextflow trace of processes |
| ${sampleID}/${sampleID}_ONT_NS_struct_var.vcf | VCF output combining merged PBSV and Sniffles calls annotated for overlap with exonic regions |
| ${sampleID}/${sampleID}.survivor_joined_results.csv | Table of SVs annotated with overlaps of previously identified SVs (beck), genes, exons, regulatory regions |
| ${sampleID}/alignments/${sampleID}.q30.bam | Analysis-ready alignment of reads |
| ${sampleID}/alignments/${sampleID}.q30.bam.bai | Index for analysis-ready alignment of reads |
| ${sampleID}/stats/nanostat_${fastq1}.fastq_pass.fastq_${sampleID} | NanoStat pre-Porechop log |
| ${sampleID}/stats/nanostat_${sampleID}_porechop_NanoFilt_${sampleID} | NanoStat post-Porechop log |
| ${sampleID}/unmerged_calls/${sampleID}_nanosv_sorted_prefix.vcf | SV calls from NanoSV |
| ${sampleID}/unmerged_calls/${sampleID}_sniffles_sorted_prefix.vcf | SV calls from Sniffles |
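
For downstream scripting, the naming convention above can be expanded into concrete paths. The sketch below does this for the outputs whose names depend only on the sample ID. It assumes the `${sampleID}` directory sits directly under --pubdir, and the helper name `expected_outputs` is hypothetical.

```python
from pathlib import PurePosixPath


def expected_outputs(pubdir, sample_id):
    """Map short labels to expected output paths for one sample.

    Assumes outputs are written to <pubdir>/<sampleID>/ as listed above.
    """
    base = PurePosixPath(pubdir) / sample_id
    paths = {
        "merged_vcf": base / f"{sample_id}_ONT_NS_struct_var.vcf",
        "joined_csv": base / f"{sample_id}.survivor_joined_results.csv",
        "bam": base / "alignments" / f"{sample_id}.q30.bam",
        "bai": base / "alignments" / f"{sample_id}.q30.bam.bai",
        "nanosv_vcf": base / "unmerged_calls" / f"{sample_id}_nanosv_sorted_prefix.vcf",
        "sniffles_vcf": base / "unmerged_calls" / f"{sample_id}_sniffles_sorted_prefix.vcf",
    }
    return {label: str(path) for label, path in paths.items()}


if __name__ == "__main__":
    # "/path/to/pubdir" and "sample1" are placeholder values.
    for label, path in expected_outputs("/path/to/pubdir", "sample1").items():
        print(f"{label}: {path}")
```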