Multiple Sequence Alignment - a-lud/nf-pipelines GitHub Wiki

The multiple sequence alignment (MSA) pipeline aligns sequences in multi-fasta files, converts the aligned peptides to nucleotides using cogent3, and finally trims low-quality blocks from the alignments using GBlocks.
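To illustrate what the peptide-to-nucleotide step does conceptually, here is a minimal pure-Python sketch. It is not the cogent3 API and not the pipeline's own code: the function names `backtranslate` and `backtranslate_alignment` are hypothetical. It assumes each peptide was translated from its coding sequence in frame, and it tolerates one trailing stop codon.

```python
def backtranslate(aligned_peptide, cds, gap="-"):
    """Thread the codons of an unaligned CDS through an aligned peptide.

    Each residue is replaced by its original codon and each gap by three
    gap characters, so the nucleotide alignment keeps the peptide layout.
    """
    residues = sum(1 for aa in aligned_peptide if aa != gap)
    # Assumption: the CDS is in frame, optionally with one trailing stop codon.
    if len(cds) not in (3 * residues, 3 * residues + 3):
        raise ValueError(
            f"CDS length {len(cds)} does not match {residues} residues"
        )
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    pieces = []
    for aa in aligned_peptide:
        pieces.append(gap * 3 if aa == gap else next(codons))
    return "".join(pieces)


def backtranslate_alignment(aligned_peptides, cds_by_name):
    """Apply backtranslate to every sequence in a {name: aligned_peptide} dict."""
    return {
        name: backtranslate(pep, cds_by_name[name])
        for name, pep in aligned_peptides.items()
    }


# Toy input: seqA lacks the third residue present in seqB.
peptides = {"seqA": "MK-V", "seqB": "MKLV"}
cds = {"seqA": "ATGAAAGTT", "seqB": "ATGAAACTGGTT"}
codon_alignment = backtranslate_alignment(peptides, cds)
```

Because gaps are inserted in multiples of three, the reading frame is preserved in the resulting nucleotide alignment.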

Input

The pipeline is simple to run; users need only provide the following:

  • Aligner: Which aligner to use: muscle, clustal-omega, mafft or t-coffee
  • Whether to convert the peptide alignments to nucleotide
  • Whether to trim alignments using GBlocks

Pipeline logistics

Multiple sequence alignment is generally quick, although run time varies with the number of samples per alignment. So far, I've been running alignments with no more than 6-8 samples, so I have implemented a single Nextflow process that performs the alignment for each file through GNU-parallel.

This avoids excessive job submissions to the cluster and generally runs considerably faster, because there is no waiting for each job to be submitted.
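The idea of handling many small alignments inside one job can be sketched in Python as follows. This is only an illustration of the pattern, using a thread pool as a stand-in for GNU-parallel; it is not how the pipeline itself is written. The names `run_one`, `run_all` and `make_command` are hypothetical, and the aligner command line is deliberately left to the caller.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_one(command, log_path):
    """Run one command, send stdout and stderr to a log file, return the exit code."""
    with open(log_path, "w") as log:
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


def run_all(inputs, make_command, log_dir, workers=4):
    """Run make_command(path) for every input file, at most `workers` at a time.

    Returns {input file name: exit code}. One log file is written per input.
    Assumption: input file names are unique, since they are used as keys.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {}
        for item in inputs:
            path = Path(item)
            log_path = log_dir / (path.stem + ".log")
            futures[path.name] = pool.submit(run_one, make_command(path), log_path)
        # Collect exit codes once every command has finished.
        return {name: future.result() for name, future in futures.items()}
```

Here `make_command` would return the argument list for whichever aligner is selected. Threads are sufficient because each worker only waits on an external process, and all commands share the resources of a single cluster job rather than each waiting in the queue.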

Output

The output files for each step of the process are fasta alignment files. Log files are produced for each GNU-parallel command, which are also copied to the output directories.