Multiple Sequence Alignment - a-lud/nf-pipelines GitHub Wiki
The multiple sequence alignment (MSA) pipeline aligns the sequences in multi-FASTA files, converts the aligned peptides to nucleotides using cogent3, and finally trims low-quality blocks from the alignments using GBlocks.
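The peptide-to-nucleotide step rests on a simple idea: each aligned residue corresponds to one codon of the original coding sequence, so the nucleotide alignment can be rebuilt by walking the aligned peptide and emitting either the next codon or a three-character gap. The pipeline does this with cogent3; the pure-Python sketch below is not cogent3's API, and `back_translate` and `back_translate_alignment` are hypothetical names. It assumes records are matched by name and that each coding sequence has exactly one codon per residue, optionally followed by a stop codon.

```python
# Hypothetical sketch of codon-aware back-translation (not cogent3's API).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def back_translate(aligned_pep: str, cds: str, gap: str = "-") -> str:
    """Return `cds` with a codon-sized gap wherever `aligned_pep` has a gap.

    Assumes one codon per non-gap residue, plus an optional trailing stop codon.
    """
    n_residues = sum(1 for aa in aligned_pep if aa != gap)
    # Drop a trailing stop codon that has no matching residue in the peptide
    if len(cds) == 3 * (n_residues + 1) and cds[-3:].upper() in STOP_CODONS:
        cds = cds[:-3]
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length does not match the number of residues")

    codons = iter([cds[i:i + 3] for i in range(0, len(cds), 3)])
    # Each residue consumes the next codon; each peptide gap becomes "---"
    return "".join(gap * 3 if aa == gap else next(codons) for aa in aligned_pep)


def back_translate_alignment(pep_aln: dict[str, str],
                             cds: dict[str, str]) -> dict[str, str]:
    """Back-translate every aligned peptide, matching records by name."""
    return {name: back_translate(seq, cds[name]) for name, seq in pep_aln.items()}


if __name__ == "__main__":
    pep_aln = {"sp1": "MK-L", "sp2": "MKAL"}
    cds = {"sp1": "ATGAAACTGTAA", "sp2": "ATGAAAGCTCTG"}
    for name, seq in back_translate_alignment(pep_aln, cds).items():
        print(name, seq)
```

Because every peptide gap becomes exactly three nucleotide gaps, all back-translated sequences keep equal length and stay in reading frame.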
Input
The pipeline is simple to run; users need only provide the following:
- Aligner: which aligner to use, out of muscle, clustal-omega, mafft and t-coffee
- Whether to convert the peptide alignments to nucleotide alignments
- Whether to trim alignments using GBlocks
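For a rough picture of what trimming does, the sketch below removes alignment columns in which too many sequences have gaps. It is a hypothetical, much simplified stand-in and not the GBlocks algorithm, which also weighs conservation, flanking positions and minimum block length. The name `trim_gappy_columns` and its `max_gap_frac` and `unit` parameters are illustrative; `unit=3` drops whole codons so a back-translated nucleotide alignment stays in frame.

```python
# Hypothetical gap-based column filter; GBlocks' real criteria are NOT modelled.


def trim_gappy_columns(aln: dict[str, str], max_gap_frac: float = 0.5,
                       unit: int = 1, gap: str = "-") -> dict[str, str]:
    """Keep only units (1 = single column, 3 = codon) in which every column
    has a gap fraction of at most max_gap_frac."""
    if not aln:
        return {}
    seqs = list(aln.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned (equal length)")
    if length % unit:
        raise ValueError("alignment length is not a multiple of unit")

    def gap_frac(col: int) -> float:
        # Fraction of sequences with a gap at this column
        return sum(s[col] == gap for s in seqs) / len(seqs)

    # Start positions of the units (columns or codons) that pass the filter
    keep = [
        start for start in range(0, length, unit)
        if all(gap_frac(c) <= max_gap_frac for c in range(start, start + unit))
    ]
    return {name: "".join(s[start:start + unit] for start in keep)
            for name, s in aln.items()}


if __name__ == "__main__":
    codon_aln = {"a": "ATG---CTG", "b": "ATG---CTG", "c": "ATGAAACTG"}
    # The middle codon is gapped in 2 of 3 sequences, so it is removed
    print(trim_gappy_columns(codon_aln, unit=3))
```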
Pipeline logistics
Multiple sequence alignment is generally quick, although run time depends on the number of samples per alignment. So far I have been running alignments with no more than 6-8 samples each, so I implemented a single Nextflow process that aligns every file through GNU Parallel.
This avoids submitting an excessive number of jobs to the cluster and generally runs considerably faster, as there is no need to wait for each job to be scheduled.
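The single-process design can be illustrated outside Nextflow. The sketch below uses Python's `concurrent.futures` as a stand-in for GNU Parallel: all files are aligned inside one job with a bounded number of concurrent workers, and a failure in one file is recorded rather than aborting the rest. `align_all` and its `align_one` callback are hypothetical names, not part of the pipeline.

```python
# Hypothetical sketch: one job aligns many files concurrently
# (concurrent.futures standing in for GNU Parallel).
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def align_all(files: Iterable[str], align_one: Callable[[str], str],
              jobs: int = 4) -> dict[str, tuple[bool, str]]:
    """Run align_one on every file with at most `jobs` running at once.

    Returns {file: (succeeded, output_or_error_message)}.
    """
    def run(path: str) -> tuple[str, bool, str]:
        try:
            return path, True, align_one(path)
        except Exception as exc:  # record the failure and keep going
            return path, False, str(exc)

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return {path: (ok, msg) for path, ok, msg in pool.map(run, files)}
```

In practice `align_one` could wrap a subprocess call such as `subprocess.run(["mafft", "--auto", path], capture_output=True, text=True, check=True).stdout`, since MAFFT writes its alignment to standard output. Threads suffice here because the heavy lifting happens in the external aligner processes.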
Output
The output of each step of the pipeline is a set of FASTA alignment files. A log file is produced for each GNU Parallel command, and these logs are also copied to the output directories.
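If the per-command logs are GNU Parallel job logs (written with its `--joblog` option; the wiki does not say which log format the pipeline uses, so this is an assumption), failed alignments can be spotted by checking the `Exitval` and `Signal` columns. `failed_jobs` is a hypothetical helper, and the commands in the example log are made up for illustration.

```python
# Sketch, assuming GNU Parallel --joblog files (tab-separated, header:
# Seq Host Starttime JobRuntime Send Receive Exitval Signal Command).
import csv
import io
from typing import Iterable


def failed_jobs(joblog: Iterable[str]) -> list[tuple[int, int, str]]:
    """Return (seq, exit_value, command) for every job that exited non-zero
    or was killed by a signal."""
    # QUOTE_NONE: quote characters inside commands must be kept verbatim
    reader = csv.DictReader(joblog, delimiter="\t", quoting=csv.QUOTE_NONE)
    failures = []
    for row in reader:
        exitval, signal = int(row["Exitval"]), int(row["Signal"])
        if exitval != 0 or signal != 0:
            failures.append((int(row["Seq"]), exitval, row["Command"]))
    return failures


if __name__ == "__main__":
    example = io.StringIO(
        "Seq\tHost\tStarttime\tJobRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n"
        "1\t:\t1700000000.000\t1.234\t0\t0\t0\t0\tmafft --auto a.fa > a.aln\n"
        "2\t:\t1700000001.000\t0.500\t0\t0\t1\t0\tmafft --auto b.fa > b.aln\n"
    )
    print(failed_jobs(example))  # [(2, 1, 'mafft --auto b.fa > b.aln')]
```

For a real log, open the file with `newline=""` and pass the file handle to `failed_jobs`.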