Alignment Pipeline - a-lud/nf-pipelines GitHub Wiki
This sub-workflow aligns short-read DNA sequence data to reference genomes. It aims to be a general-purpose pipeline that enables quick and easy alignment with either of two current aligners, BWA-MEM2 or Minimap2.
The pipeline aligns each sample to its specified genome, filters out alignments below a user-specified mapping quality, and removes unmapped reads to reduce the size of the output files. Duplicate alignments are then marked using Sambamba. After alignment, alignment statistics are generated using mosdepth and flagstat and presented in a MultiQC report.
The current version of the alignment pipeline has the following arguments:
--seqdir string Directory path to paired-end reads.
--sheet string CSV file of two columns '<sample.basename>,<reference>'.
--platform string Specify the sequencing platform. Options: illumina, mgi.
--aligner string Aligner to use for short-read mapping. Options: bwa2, minimap2.
--mapq integer Minimum mapping quality threshold. Default 10.
The --seqdir argument requires the path to the directory containing the FASTQ files listed in the sample sheet (see --sheet below).
For --sheet, provide the file path to a CSV file that contains two columns (without column names):
- Basename of the FASTQ files in the seqdir (i.e. whatever comes before _R?.fastq.gz)
- The file path to the reference genome you want to align the sample to
The pipeline searches the seqdir for files that match each basename you provide and creates a data channel from them.
An example of the CSV file is shown below:
sample-AA,/home/a1645424/al/hydrophis-major/hydmaj-chromosome/reference-1.fa
sample-BB,/home/a1645424/al/hydrophis-major/hydmaj-chromosome/reference-2.fa
sample-CC,/home/a1645424/al/hydrophis-major/hydmaj-chromosome/reference-3.fa
Here, sample-AA would match files named sample-AA_R?.fastq.gz, where ? stands for a single character (typically 1 or 2 for paired-end reads).
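The basename matching described above can be sketched in Python. This is only an illustration of the naming convention, not the pipeline's own code (the pipeline itself is a Nextflow workflow); the function name `match_samples` and the returned dictionary layout are hypothetical.

```python
import csv
from glob import escape
from pathlib import Path


def match_samples(sheet_path, seqdir):
    """Pair each sample-sheet row with files named <basename>_R?.fastq.gz.

    Hypothetical helper: it mirrors the convention described in the text,
    not the pipeline's actual implementation.
    """
    matches = {}
    with open(sheet_path, newline="") as handle:
        # The sheet has two columns and no header row.
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines
            basename, reference = row[0].strip(), row[1].strip()
            # '?' matches exactly one character, e.g. the 1 or 2 in _R1/_R2.
            # escape() protects any glob metacharacters in the basename.
            pattern = f"{escape(basename)}_R?.fastq.gz"
            reads = sorted(p.name for p in Path(seqdir).glob(pattern))
            matches[basename] = {"reference": reference, "reads": reads}
    return matches
```

A sample whose `reads` list comes back empty has no matching FASTQ files in the seqdir, which is worth checking before launching a run.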
For --platform, provide the sequencing platform on which the sequence data was generated so that it can be added to the BAM read group.
For --aligner, choose which alignment tool you'd like to use. The current options are BWA-MEM2 and Minimap2. Both are fast, proven alignment tools that are suitable for most data types.
For --mapq, specify the minimum mapping quality that alignments must meet to be retained.
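To give a feel for what a mapping quality threshold means, the sketch below uses the SAM specification's definition of MAPQ as -10 log10 of the probability that the mapping position is wrong, so the default of 10 corresponds to roughly a 1 in 10 chance of a misplaced read. The helper names are hypothetical, and treating the threshold as inclusive (MAPQ >= threshold is kept) is an assumption about the pipeline's behaviour, not something the wiki states.

```python
def mapq_error_probability(mapq):
    """Convert a MAPQ score to the implied probability of a wrong mapping."""
    return 10 ** (-mapq / 10)


def keep_alignment(sam_line, min_mapq=10):
    """Decide whether a SAM record passes the filters described in the text.

    Hypothetical helper: drops unmapped reads (FLAG bit 0x4) and alignments
    whose MAPQ is below min_mapq (inclusive threshold is an assumption).
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2 of a SAM record: bitwise FLAG
    mapq = int(fields[4])   # column 5 of a SAM record: MAPQ
    is_unmapped = bool(flag & 0x4)
    return (not is_unmapped) and mapq >= min_mapq
```

Raising the threshold trades coverage for confidence: for example, MAPQ 20 implies about a 1% chance of a wrong placement and MAPQ 30 about 0.1%.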
