ONT Documentation - TheJacksonLaboratory/mmrSVD GitHub Wiki
- Filtering and QC with NanoStat, Porechop, NanoQC, and NanoFilt
- Map to reference genome using minimap2
- SV calling with Sniffles and NanoSV
- Merge SV calls with SURVIVOR
- Annotation of results based on intersection with previously identified mouse SVs, genic and exonic regions.
Input data can be raw FASTQ reads or a previously mapped BAM file. If a BAM file is provided, the workflow begins at the variant calling steps.
- `--sampleID`
  - Default: `<STRING>`
  - The sample ID for the input data (required).
- `--pubdir`
  - Default: `/<PATH>`
  - The directory where saved outputs will be stored.
- `--cacheDir`
  - Default: `/projects/omics_share/meta/containers`
  - The directory that contains cached Singularity containers. JAX users should not change this parameter.
- `-w`
  - Default: `/<PATH>`
  - The directory used for all intermediate files and Nextflow processes. This directory can become quite large, so it should be a location on /fastscratch or another directory with ample storage.
- `--keep_intermediate`
  - Default: `FALSE`
  - Controls whether intermediate analysis files are copied to the pubdir. For this pipeline, this includes unsorted and sorted alignment files.
The following parameters are mutually exclusive options for specifying input data. Provide either `--fastq1` or `--bam`:
- `--fastq1`
  - Default: `/<PATH>`
  - The path to a single FASTQ file, or one of a pair of FASTQs for paired-end data.
- `--bam`
  - Default: `/<PATH>`
  - The path to a BAM input file, if alignment has already been performed outside this pipeline.
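The mutual exclusivity of `--fastq1` and `--bam` can be checked before launching a run. The following is a minimal sketch, not part of the pipeline: the helper name `build_input_args` and the example paths are hypothetical, and only the parameter names come from this page.

```python
from typing import List, Optional


def build_input_args(sample_id: str,
                     fastq1: Optional[str] = None,
                     bam: Optional[str] = None) -> List[str]:
    """Return pipeline arguments for exactly one input source.

    Hypothetical helper: it only assembles the flags documented above.
    """
    if not sample_id:
        raise ValueError("--sampleID is required")
    # --fastq1 and --bam are mutually exclusive: exactly one must be given.
    if (fastq1 is None) == (bam is None):
        raise ValueError("provide exactly one of --fastq1 or --bam")
    args = ["--sampleID", sample_id]
    if fastq1 is not None:
        args += ["--fastq1", fastq1]
    else:
        args += ["--bam", bam]
    return args


# Example with a made-up sample ID and path.
print(build_input_args("sample01", fastq1="/data/sample01.fastq.gz"))
```

The returned list can be appended to whatever launch command is used for the pipeline; that command itself is not specified on this page.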
The following parameters refer to the reference genome.
- `--fasta`
  - Default: `/<PATH>`
  - Path to the reference genome in FASTA format.
- `--fasta_index`
  - Default: `/<PATH>`
  - Optional parameter to specify an index for the reference genome. If not provided, the pipeline will generate an index.
Filtering and trimming parameters:
- `--quality`
  - Default: `10`
  - NanoFilt parameter for minimum average read quality score.
- `--length`
  - Default: `400`
  - NanoFilt parameter for minimum read length.
- `--headcrop`
  - Default: `10`
  - NanoFilt parameter to trim N nucleotides from the start of the read.
- `--tailcrop`
  - Default: `20`
  - NanoFilt parameter to trim N nucleotides from the end of the read.
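To illustrate what the four filtering and trimming parameters control, here is a simplified sketch of this kind of read filter. It is not NanoFilt's implementation: the function names are hypothetical, and two points are assumptions, namely that the mean quality is derived from the mean error probability and that filtering is applied to the untrimmed read before cropping.

```python
import math
from typing import List, Optional, Tuple


def mean_quality(phred: List[int]) -> float:
    """Average Phred score computed from the mean error probability (assumption)."""
    mean_error = sum(10 ** (-q / 10) for q in phred) / len(phred)
    return -10 * math.log10(mean_error)


def filter_and_trim(seq: str,
                    phred: List[int],
                    quality: float = 10,
                    length: int = 400,
                    headcrop: int = 10,
                    tailcrop: int = 20) -> Optional[Tuple[str, List[int]]]:
    """Return the trimmed read, or None if the read is filtered out."""
    if not phred or len(seq) != len(phred):
        return None
    # Assumption: thresholds are checked on the read before cropping.
    if len(seq) < length or mean_quality(phred) < quality:
        return None
    end = len(seq) - tailcrop
    # Drop `headcrop` bases from the start and `tailcrop` bases from the end.
    return seq[headcrop:end], phred[headcrop:end]


# A synthetic 500-base read with uniform quality 20 passes and loses 30 bases.
kept = filter_and_trim("A" * 500, [20] * 500)
print(len(kept[0]))
```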
The following optional parameters allow specification of targeted regions for adaptive sequencing:
- `--targ_chr`
  - Default: `null`
  - Specify the targeted chromosome if data were generated using adaptive sequencing mode.
- `--targ_start`
  - Default: `null`
  - Specify the targeted start coordinate if data were generated using adaptive sequencing mode.
- `--targ_end`
  - Default: `null`
  - Specify the targeted end coordinate if data were generated using adaptive sequencing mode.
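The three targeting parameters describe a single genomic interval, so it can be useful to sanity-check them together before a run. The sketch below is illustrative only: the helper `target_region` is hypothetical, the requirement that all three values be set together is an assumption, and the `chr:start-end` string is just one common way to write a region, not necessarily how the pipeline consumes these values.

```python
from typing import Optional


def target_region(targ_chr: Optional[str],
                  targ_start: Optional[int],
                  targ_end: Optional[int]) -> Optional[str]:
    """Return 'chr:start-end', or None when no target region is set."""
    values = (targ_chr, targ_start, targ_end)
    if all(v is None for v in values):
        return None  # defaults (null): no adaptive-sequencing target
    # Assumption: a partial specification is treated as a user error.
    if any(v is None for v in values):
        raise ValueError("set --targ_chr, --targ_start and --targ_end together")
    if targ_start > targ_end:
        raise ValueError("--targ_start must not exceed --targ_end")
    return f"{targ_chr}:{targ_start}-{targ_end}"


# Example with made-up coordinates.
print(target_region("chr1", 1000, 2000))
```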
The following parameters specify reference data and annotations. The `--genome_build` parameter controls whether the default versions are GRCm38 or GRCm39:
- `--genome_build`
  - Default: `GRCm38`
  - Parameter that controls the reference data used for alignment and annotation. GRCm38 is the default value; GRCm39 is an accepted alternate value.
- `--tandem_repeats`
  - Default: `/ref_data/ucsc_mm10_trf_chr_sorted.bed`
  - BED file that lists the coordinates of centromeres and telomeres to exclude as alignment targets. Note: default path refers to a location within the containers `quay.io/jaxcompsci/pbsv-td_refs:2.8.0--refv0.2.0` and `quay.io/jaxcompsci/sniffles-td_refs:2.0.7--refv0.2.0`, which require this file.
- `--sv_ins_ref`
  - Default: `/ref_data/variants_freeze5_sv_INS_mm39_to_mm10_sorted.bed.gz`
  - BED file that lists previously identified insertion SVs. Note: default path refers to a location within the container `quay.io/jaxcompsci/bedtools-sv_refs:2.30.0--refv0.2.0`, which requires this file.
- `--sv_del_ref`
  - Default: `/ref_data/variants_freeze5_sv_DEL_mm39_to_mm10_sorted.bed.gz`
  - BED file that lists previously identified deletion SVs. Note: default path refers to a location within the container `quay.io/jaxcompsci/bedtools-sv_refs:2.30.0--refv0.2.0`, which requires this file.
- `--sv_inv_ref`
  - Default: `/ref_data/variants_freeze5_sv_INV_mm39_to_mm10_sorted.bed.gz`
  - BED file that lists previously identified inversion SVs. Note: default path refers to a location within the container `quay.io/jaxcompsci/bedtools-sv_refs:2.30.0--refv0.2.0`, which requires this file.
- `--reg_ref`
  - Default: `/ref_data/mus_musculus.GRCm38.Regulatory_Build.regulatory_features.20180516.gff.gz`
  - BED file that lists regulatory features. Note: default path refers to a location within the container `quay.io/jaxcompsci/bedtools-sv_refs:2.30.0--refv0.2.0`, which requires this file.
- `--genes_bed`
  - Default: `/ref_data/Mus_musculus.GRCm38.102.gene_symbol.bed`
  - BED file that lists gene symbol IDs and coordinates. Note: default path refers to a location within the container `quay.io/jaxcompsci/bedtools-sv_refs:2.30.0--refv0.2.0`, which requires this file.
- `--exons_bed`
  - Default: `/ref_data/Mus_musculus.GRCm38.102.exons.bed`
  - BED file that lists exons and coordinates. Note: default path refers to a location within the container `quay.io/jaxcompsci/bedtools-sv_refs:2.30.0--refv0.2.0`, which requires this file.
SURVIVOR merging parameters:
- `--surv_dist`
  - Default: `1000`
  - Maximum distance between breakpoints for merging SVs.
- `--surv_supp`
  - Default: `1`
  - The number of callers (out of 4) required to support an SV.
- `--surv_type`
  - Default: `1`
  - Boolean (0/1) that requires SVs to be the same type for merging.
- `--surv_strand`
  - Default: `1`
  - Boolean (0/1) that requires SVs to be on the same strand for merging.
- `--surv_min`
  - Default: `30`
  - Minimum length (bp) to output SVs.
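These five parameters correspond to positional arguments of `SURVIVOR merge`. The sketch below shows one plausible way they could be assembled into a command; it is not taken from the pipeline source. The helper name, the file names, and the value `0` for SURVIVOR's sixth positional argument (distance estimation, which this page does not expose as a parameter) are assumptions.

```python
from typing import List


def survivor_merge_cmd(vcf_list: str,
                       out_vcf: str,
                       surv_dist: int = 1000,
                       surv_supp: int = 1,
                       surv_type: int = 1,
                       surv_strand: int = 1,
                       surv_min: int = 30) -> List[str]:
    """Assemble a SURVIVOR merge command from the documented parameters."""
    for name, flag in (("surv_type", surv_type), ("surv_strand", surv_strand)):
        if flag not in (0, 1):
            raise ValueError(f"--{name} must be 0 or 1")
    estimate_distance = 0  # assumption: not configurable on this page
    return [
        "SURVIVOR", "merge",
        vcf_list,                # text file listing the VCFs to merge
        str(surv_dist),          # max distance between breakpoints
        str(surv_supp),          # min number of supporting callers
        str(surv_type),          # require same SV type (0/1)
        str(surv_strand),        # require same strand (0/1)
        str(estimate_distance),
        str(surv_min),           # min SV length in bp
        out_vcf,
    ]


# Example with hypothetical file names.
print(" ".join(survivor_merge_cmd("vcf_list.txt", "merged.vcf")))
```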
| Naming Convention | Description |
|---|---|
| `mmrSVD_ont_report.html` | Nextflow autogenerated report |
| `trace.txt` | Nextflow trace of processes |
| `${sampleID}/${sampleID}_ONT_NS_struct_var.vcf` | VCF output combining merged NanoSV and Sniffles calls annotated for overlap with exonic regions |
| `${sampleID}/${sampleID}.survivor_joined_results.csv` | Table of SVs annotated with overlaps of previously identified SVs (beck), genes, exons, regulatory regions |
| `${sampleID}/alignments/${sampleID}.q30.bam` | Analysis-ready alignment of reads |
| `${sampleID}/alignments/${sampleID}.q30.bam.bai` | Index for analysis-ready alignment of reads |
| `${sampleID}/stats/nanostat_${fastq1}.fastq_pass.fastq_${sampleID}` | NanoStat pre-Porechop log |
| `${sampleID}/stats/nanostat_${sampleID}_porechop_NanoFilt_${sampleID}` | NanoStat post-Porechop log |
| `${sampleID}/unmerged_calls/${sampleID}_nanosv_sorted_prefix.vcf` | SV calls from NanoSV |
| `${sampleID}/unmerged_calls/${sampleID}_sniffles_sorted_prefix.vcf` | SV calls from Sniffles |
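The naming conventions above make it straightforward to locate outputs for a given sample. As a minimal sketch, the hypothetical helper below expands the `${sampleID}` patterns into paths relative to the pubdir; the dictionary keys are made up for illustration, and the NanoStat logs are omitted because their names also depend on the input FASTQ name.

```python
from typing import Dict


def expected_outputs(sample_id: str) -> Dict[str, str]:
    """Map short labels (hypothetical) to output paths relative to pubdir."""
    s = sample_id
    return {
        "merged_vcf": f"{s}/{s}_ONT_NS_struct_var.vcf",
        "annotated_table": f"{s}/{s}.survivor_joined_results.csv",
        "bam": f"{s}/alignments/{s}.q30.bam",
        "bam_index": f"{s}/alignments/{s}.q30.bam.bai",
        "nanosv_vcf": f"{s}/unmerged_calls/{s}_nanosv_sorted_prefix.vcf",
        "sniffles_vcf": f"{s}/unmerged_calls/{s}_sniffles_sorted_prefix.vcf",
    }


# Example with a made-up sample ID.
for label, path in expected_outputs("sample01").items():
    print(f"{label}: {path}")
```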