AB Generate alignment - mendessoares/BuddySuite GitHub Wiki
--generate_alignment, -ga
Description
Generate a multiple sequence alignment using third-party alignment tools. Basic default parameters are built into the wrapper for 'quick-and-dirty' alignments with any of the supported tools, or you can specify further parameters as desired. AlignBuddy handles all necessary format conversions and returns the output in the same format as the input (unless overridden with the -o flag, as usual). This is particularly useful when aligning sequences in a richly annotated format like GenBank, because the annotations are re-mapped onto the new alignment at the end of the job.
As the job runs, any output the tool normally generates will be streamed to stderr for your reference (suppressible with the '-q' flag). If the tool generates files as part of its normal operation, these are sent to a temporary directory and deleted once AlignBuddy finishes the job. To save these files, specify a directory with the '-k' flag.
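The temporary-directory behaviour described above follows a common pattern, sketched here in Python with the standard library only. This is an illustration, not AlignBuddy's own code: `run_with_scratch_dir`, `job`, and `keep_dir` are hypothetical names, with `keep_dir` standing in for the directory passed to '-k'. It assumes Python 3.8 or later for `dirs_exist_ok`.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_scratch_dir(job, keep_dir=None):
    """Run `job(scratch_path)` inside a temporary directory.

    The temporary directory is always deleted when the job finishes.
    If `keep_dir` is given, its contents are copied there first.
    """
    with tempfile.TemporaryDirectory() as scratch:
        scratch_path = Path(scratch)
        result = job(scratch_path)
        if keep_dir is not None:
            # copytree creates the destination (and parents) if needed.
            destination = Path(keep_dir).expanduser()
            shutil.copytree(scratch_path, destination, dirs_exist_ok=True)
    return result
```

Copying before the `with` block exits is the important detail: once the block ends, the scratch directory and everything the tool wrote into it are gone.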
Supported Tools
The following alignment programs are currently supported by AlignBuddy:
Program | $PATH names |
---|---|
MAFFT | mafft |
PRANK | prank |
PAGAN | pagan |
MUSCLE | muscle |
ClustalW2 | clustal, clustalw, clustal2, or clustalw2 |
ClustalOmega | clustalomega or clustalo |
The binaries for these programs are not included with BuddySuite, so they must be installed separately and be resolvable through the system $PATH. AlignBuddy will attempt to run tools that are not officially supported, but tread carefully if you try this. If you regularly use an unsupported tool, let us know; we can probably add support for it.
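Before running -ga, it can help to check which of the binaries in the table actually resolve on your $PATH. The Python sketch below does this with the standard library's `shutil.which`. The helper `find_aligners` and its `which` parameter are hypothetical names for illustration; this is not AlignBuddy's detection code, and the order in which AlignBuddy itself probes for tools is not assumed here.

```python
import shutil

# $PATH names taken from the table above, grouped by program.
SUPPORTED_TOOLS = {
    "MAFFT": ["mafft"],
    "PRANK": ["prank"],
    "PAGAN": ["pagan"],
    "MUSCLE": ["muscle"],
    "ClustalW2": ["clustal", "clustalw", "clustal2", "clustalw2"],
    "ClustalOmega": ["clustalomega", "clustalo"],
}


def find_aligners(which=shutil.which):
    """Return {program: resolved path} for every program found on $PATH.

    `which` is injectable so the lookup can be exercised on a machine
    with no aligner installed.
    """
    found = {}
    for program, names in SUPPORTED_TOOLS.items():
        for name in names:
            path = which(name)
            if path:
                found[program] = path
                break  # the first name that resolves wins for this program
    return found


if __name__ == "__main__":
    for program, path in find_aligners().items():
        print(f"{program}: {path}")
```

An empty result means none of the listed names resolve, in which case -ga has nothing to run until one of the tools is installed.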
Arguments
Both arguments are optional, but 'Tool' must be given whenever 'Arguments' is used, and the tool name must come first.
Tool ( str )
Optional. If not set, AlignBuddy will try to find an alignment program on your system and will execute the first one it detects. Otherwise, specify the name of the alignment tool you wish to run.
Arguments ( str )
Optional. This can only be used if an alignment tool is specified as the first argument. Enclose all additional tool-specific arguments in double quotes (so AlignBuddy doesn't try to interpret them itself).
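The double quotes matter because the shell then hands the whole string to AlignBuddy as one argument, which can be forwarded to the tool untouched. A rough Python sketch of that idea follows. It assumes the quoted string is split with `shlex.split`, which may differ from what the wrapper really does, and `build_command` is a hypothetical helper. How the input file is passed varies by tool, so it is left out.

```python
import shlex


def build_command(tool, extra_args=""):
    """Combine a tool name with a single quoted string of extra arguments.

    shlex.split breaks the string into separate arguments using
    POSIX shell rules, so inner quoting is respected.
    """
    return [tool] + shlex.split(extra_args)


print(build_command("clustalomega", "--iter=2"))
# ['clustalomega', '--iter=2']
```

Without the quotes, a flag such as --iter=2 would reach AlignBuddy's own argument parser instead of the alignment tool.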
Examples
Input file: Mnemiopsis_Panxs.gb
LOCUS Mle-Panxα3 200 aa UNA 02-JAN-2015
DEFINITION cDNA - ML036514a.
ACCESSION Mle-Panxα3
VERSION Mle-Panxα3
KEYWORDS .
SOURCE
ORGANISM .
.
FEATURES Location/Qualifiers
CDS order(1..50,51..111,112..152,153..183,184..200)
/created_by="User"
/label="ML036514a"
/modified_by="User"
TMD1 29..49
TMD2 132..152
ORIGIN
1 mlllgslgti knlsifkdls lddwldqmnr tfmflllcfm gtivavsqyt gkniscdgft
61 kfgedfsqdy cwtqglytik eaydlpesqi pypgiipenv pacrehalkn ggkivcpped
121 qvkpltrarh lwyqwipfyf wviapvfylp ymfvkrmgld rmkpllkims dyyhcttetp
181 seeiivkcad wvynsivdrl
//
LOCUS Mle-Panxα4 200 aa UNA 02-JAN-2015
DEFINITION cDNA and genomic - ML129317a.
ACCESSION Mle-Panxα4
VERSION Mle-Panxα4
KEYWORDS .
SOURCE
ORGANISM .
.
FEATURES Location/Qualifiers
TMD1 28..48
TMD2 131..151
ORIGIN
1 mviellagyk glspfkdatv ddswdqinrc yvfiamvvmg avttmrqysg tliacdgftk
61 fhpqfaedyc wsigmytvre aydlpssmva ypgvipwdmp acvprllkng trtkcgsekd
121 vmpsekiyhl wyqwasfyfw ivailyyapy imfkqlggge ykplikllcl asgspeqqmq
181 diqervvkwl ffrfktyifa
//
LOCUS Mle-Panxα6 200 aa UNA 02-JAN-2015
DEFINITION cDNA - ML25993a.
ACCESSION Mle-Panxα6
VERSION Mle-Panxα6
KEYWORDS .
SOURCE
ORGANISM .
.
FEATURES Location/Qualifiers
CDS order(1..42,43..92,93..125,126..171,172..200)
/created_by="User"
/label="ML25993"
/modified_by="User"
TMD1 28..48
TMD2 131..151
ORIGIN
1 mlleilanfk gatpfkeivl ddkwdqinrc ymfllcvifg tvvtfrqytg giiacdgltk
61 fsaafaedyc wtqglytike aydivdnslp ypgllpedap pclsrrlvsg griecppadl
121 yleptrvhht wyqwipfyfw visiafigpy ivykqlgvne lkpilamlhn pvdgddvtkd
181 qiskvsrwla iklnifiqek
//
LOCUS Mle-Panxα5 200 aa UNA 02-JAN-2015
DEFINITION cDNA - ML223536a.
ACCESSION Mle-Panxα5
VERSION Mle-Panxα5
KEYWORDS .
SOURCE
ORGANISM .
.
FEATURES Location/Qualifiers
CDS order(1..49,50..94,95..135,136..200)
/created_by="User"
/label="ML223536a"
/modified_by="User"
TMD1 28..48
TMD2 133..153
ORIGIN
1 miywvwavfk rmapfkvvtl ddrwdqmnrs fmmpltmsfa ylidygiiag stikctgfed
61 sfrseafvde ycwtqgiytl reaydlentk ipypgiipeg fpncmpyerw dgmkvecpke
121 eqylkptrvy hlyyqhiqly fwlvctlfyl pymvgiclgf nytkplinll hnpltrdeee
181 lealldkaar slrlrldiys
//
Usage example 1
If no arguments are passed in, AlignBuddy will try to find an alignment program on your system. In this example, MAFFT is found.
$: alb Mnemiopsis_Panxs.gb -ga
Output
nseq = 4
distance = ktuples
iterate = 0
cycle = 2
nguidetree = 2
nthread = 0
sueff_global = 0.100000
done.
scoremtx = 1
Gap Penalty = -1.53, +0.00, +0.00
tuplesize = 6, dorp = p
Making a distance matrix ..
1 / 4
done.
Constructing a UPGMA tree ...
0 / 4
done.
Progressive alignment 1/2...
STEP 1 / 3 f
Reallocating..done. *alloclen = 1404
STEP 3 / 3 d
done.
Constructing a UPGMA tree ...
0 / 4
done.
Progressive alignment 2/2...
STEP 1 / 3 f
Reallocating..done. *alloclen = 1404
STEP 3 / 3 d
done.
disttbfast (aa) Version 7.186 alg=A, model=BLOSUM62, 1.53, -0.00, -0.00, noshift, amax=0.0
0 thread(s)
Strategy:
FFT-NS-2 (Fast but rough)
Progressive method (guide trees were built 2 times.)
If unsure which option to use, try 'mafft --auto input > output'.
For more information, see 'mafft --help', 'mafft --man' and the mafft page.
The default gap scoring scheme has been changed in version 7.110 (2013 Oct).
It tends to insert more gaps into gap-rich regions than previous versions.
To disable this change, add the --legacygappenalty option.
Returning to AlignBuddy...
LOCUS Mle-Panxα3 212 aa UNK 01-JAN-1980
DEFINITION
ACCESSION Mle-Panxα3
VERSION Mle-Panxα3
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
CDS order(1..50,51..113,114..154,155..192,193..209)
/created_by="User"
/label="ML036514a"
/modified_by="User"
TMD1 29..49
TMD2 134..154
ORIGIN
1 mlllgslgti knlsifkdls lddwldqmnr tfmflllcfm gtivavsqyt gkniscdgft
61 k--fgedfsq dycwtqglyt ikeaydlpes qipypgiipe nvpacrehal knggkivcpp
121 edqvkpltra rhlwyqwipf yfwviapvfy lpymfvkrmg ldrmkpllki msdyyhctte
181 tp-------s eeiivkcadw vynsivdrl- --
//
LOCUS Mle-Panxα4 212 aa UNK 01-JAN-1980
DEFINITION
ACCESSION Mle-Panxα4
VERSION Mle-Panxα4
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
TMD1 29..49
TMD2 134..154
ORIGIN
1 -mviellagy kglspfkdat vddswdqinr cyvfiamvvm gavttmrqys gtliacdgft
61 k--fhpqfae dycwsigmyt vreaydlpss mvaypgvipw dmpacvprll kngtrtkcgs
121 ekdvmpseki yhlwyqwasf yfwivailyy apyimfkqlg ggeykplikl lc----lasg
181 sp----eqqm qdiqervvkw lffrfktyif a-
//
LOCUS Mle-Panxα6 212 aa UNK 01-JAN-1980
DEFINITION
ACCESSION Mle-Panxα6
VERSION Mle-Panxα6
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
CDS order(2..43,44..95,96..128,129..182,183..212)
/created_by="User"
/label="ML25993"
/modified_by="User"
TMD1 29..49
TMD2 134..154
ORIGIN
1 -mlleilanf kgatpfkeiv lddkwdqinr cymfllcvif gtvvtfrqyt ggiiacdglt
61 k--fsaafae dycwtqglyt ikeaydivdn slpypgllpe dappclsrrl vsggriecpp
121 adlyleptrv hhtwyqwipf yfwvisiafi gpyivykqlg vnelkpilam l--------h
181 npv-dgddvt kdqiskvsrw laiklnifiq ek
//
LOCUS Mle-Panxα5 212 aa UNK 01-JAN-1980
DEFINITION
ACCESSION Mle-Panxα5
VERSION Mle-Panxα5
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
CDS order(2..50,51..95,96..136,137..209)
/created_by="User"
/label="ML223536a"
/modified_by="User"
TMD1 29..49
TMD2 134..154
ORIGIN
1 -miywvwavf krmapfkvvt lddrwdqmnr sfmmpltmsf aylidygiia gstikctgfe
61 dsfrseafvd eycwtqgiyt lreaydlent kipypgiipe gfpncmpyer wdgmkvecpk
121 eeqylkptrv yhlyyqhiql yfwlvctlfy lpymvgiclg fnytkplinl l--------h
181 npltrdeeel ealldkaars lrlrldiys- --
//
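Comparing the input file with the output above shows the annotation re-mapping at work: in Mle-Panxα3, TMD2 moves from 132..152 to 134..154 because two gap columns were inserted after residue 61. The Python sketch below reproduces that coordinate shift by counting non-gap characters. It is a minimal illustration, not AlignBuddy's implementation, and `ungapped_to_column` and `remap_feature` are hypothetical names.

```python
# Aligned Mle-Panxα3 sequence, copied from the ORIGIN block above.
ALIGNED = "".join("""
mlllgslgti knlsifkdls lddwldqmnr tfmflllcfm gtivavsqyt gkniscdgft
k--fgedfsq dycwtqglyt ikeaydlpes qipypgiipe nvpacrehal knggkivcpp
edqvkpltra rhlwyqwipf yfwviapvfy lpymfvkrmg ldrmkpllki msdyyhctte
tp-------s eeiivkcadw vynsivdrl- --
""".split())


def ungapped_to_column(aligned_seq, residue_index, gap_chars="-"):
    """Map a 1-based residue index to its 1-based alignment column."""
    if residue_index < 1:
        raise ValueError("residue_index must be 1 or greater")
    seen = 0
    for column, char in enumerate(aligned_seq, start=1):
        if char not in gap_chars:
            seen += 1
            if seen == residue_index:
                return column
    raise ValueError("residue_index exceeds the number of residues")


def remap_feature(aligned_seq, start, end):
    """Shift a start..end feature from sequence to alignment coordinates."""
    return (ungapped_to_column(aligned_seq, start),
            ungapped_to_column(aligned_seq, end))


print(remap_feature(ALIGNED, 29, 49))    # TMD1: (29, 49)
print(remap_feature(ALIGNED, 132, 152))  # TMD2: (134, 154)
```

TMD1 is unchanged because no gaps precede it in this sequence, while the other three records each shift by one extra column because of their leading gap.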
Usage example 2
Pass in extra parameters to further refine your alignment.
$: alb Mnemiopsis_Panxs.gb -ga clustalomega "--iter=2" -o clustal
Output
Using 24 threads
Read 4 sequences (type: Protein) from /Volumes/Zippy/.sysTemp/tmpijrd7orz/tmp.fa
not more sequences (4) than cluster-size (100), turn off mBed
Calculating pairwise ktuple-distances...
Ktuple-distance calculation progress done. CPU time: 0.00u 0.01s 00:00:00.01 Elapsed: 00:00:00
Guide-tree computation done.
Progressive alignment progress done. CPU time: 0.02u 0.00s 00:00:00.02 Elapsed: 00:00:00
Iteration step 1 out of 2
Computing new guide tree (iteration step 1032320)
Calculating pairwise aligned identity distances...
Pairwise identity calculation progress done. CPU time: 0.00u 0.00s 00:00:00.00 Elapsed: 00:00:00
Guide-tree computation done.
Computing HMM from alignment
Progressive alignment progress done. CPU time: 0.06u 0.01s 00:00:00.06 Elapsed: 00:00:00
Iteration step 2 out of 2
Computing new guide tree (iteration step 1032320)
Calculating pairwise aligned identity distances...
Pairwise identity calculation progress done. CPU time: 0.00u 0.00s 00:00:00.00 Elapsed: 00:00:00
Guide-tree computation done.
Computing HMM from alignment
Progressive alignment progress done. CPU time: 0.07u 0.00s 00:00:00.07 Elapsed: 00:00:00
Alignment written to /Volumes/Zippy/.sysTemp/tmpijrd7orz/result
Returning to AlignBuddy...
CLUSTAL X (1.81) multiple sequence alignment
Mle-Panxα3 MLLLGSLGTIKNLSIFKDLSLDDWLDQMNRTFMFLLLCFMGTIVAVSQYT
Mle-Panxα4 -MVIELLAGYKGLSPFKDATVDDSWDQINRCYVFIAMVVMGAVTTMRQYS
Mle-Panxα6 -MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQYT
Mle-Panxα5 -MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGIIA
Mle-Panxα3 GKNISCDGFTK--FGEDFSQDYCWTQGLYTIKEAYDLPESQIPYPGIIPE
Mle-Panxα4 GTLIACDGFTK--FHPQFAEDYCWSIGMYTVREAYDLPSSMVAYPGVIPW
Mle-Panxα6 GGIIACDGLTK--FSAAFAEDYCWTQGLYTIKEAYDIVDNSLPYPGLLPE
Mle-Panxα5 GSTIKCTGFEDSFRSEAFVDEYCWTQGIYTLREAYDLENTKIPYPGIIPE
Mle-Panxα3 NVPACREHALKNGGKIVCPPEDQVKPLTRARHLWYQWIPFYFWVIAPVFY
Mle-Panxα4 DMPACVPRLLKNGTRTKCGSEKDVMPSEKIYHLWYQWASFYFWIVAILYY
Mle-Panxα6 DAPPCLSRRLVSGGRIECPPADLYLEPTRVHHTWYQWIPFYFWVISIAFI
Mle-Panxα5 GFPNCMPYERWDGMKVECPKEEQYLKPTRVYHLYYQHIQLYFWLVCTLFY
Mle-Panxα3 LPYMFVKRMGLDRMKPLLKIMSDYYHCTTETPSEEIIVKCADWVYNSIVD
Mle-Panxα4 APYIMFKQLGGGEYKPLIKLLCLAS-GSPEQQMQDIQERVVKWLFFRFKT
Mle-Panxα6 GPYIVYKQLGVNELKPILAMLHNPVDGDD--VTKDQISKVSRWLAIKLNI
Mle-Panxα5 LPYMVGICLGFNYTKPLINLLHNPLTRDE-EELEALLDKAARSLRLRLDI
Mle-Panxα3 RL---
Mle-Panxα4 YIFA-
Mle-Panxα6 FIQEK
Mle-Panxα5 YS---
Usage example 3
Keep all temporary files by specifying a directory with the -k flag.
$: alb Mnemiopsis_Panxs.gb -ga clustalw2 -o phylip-sequential -k ~/alignment_files
Output
Returning to AlignBuddy...
4 205
Mle-Panxα4 -MVIELLAGYKGLSPFKDATVDDSWDQINRCYVFIAMVVMGAVTTMRQYSGTLIACDGFTK--FHPQFAEDYCWSIGMYTVREAYDLPSSMVAYPGVIPWDMPACVPRLLKNGTRTKCGSEKDVMPSEKIYHLWYQWASFYFWIVAILYYAPYIMFKQLGGGEYKPLIKLLCLAS-GSPEQQMQDIQERVVKWLFFRFKTYIFA-
Mle-Panxα6 -MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQYTGGIIACDGLTK--FSAAFAEDYCWTQGLYTIKEAYDIVDNSLPYPGLLPEDAPPCLSRRLVSGGRIECPPADLYLEPTRVHHTWYQWIPFYFWVISIAFIGPYIVYKQLGVNELKPILAMLHNPVDGDD--VTKDQISKVSRWLAIKLNIFIQEK
Mle-Panxα3 MLLLGSLGTIKNLSIFKDLSLDDWLDQMNRTFMFLLLCFMGTIVAVSQYTGKNISCDGFTK--FGEDFSQDYCWTQGLYTIKEAYDLPESQIPYPGIIPENVPACREHALKNGGKIVCPPEDQVKPLTRARHLWYQWIPFYFWVIAPVFYLPYMFVKRMGLDRMKPLLKIMSDYYHCTTETPSEEIIVKCADWVYNSIVDRL---
Mle-Panxα5 -MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGIIAGSTIKCTGFEDSFRSEAFVDEYCWTQGIYTLREAYDLENTKIPYPGIIPEGFPNCMPYERWDGMKVECPKEEQYLKPTRVYHLYYQHIQLYFWLVCTLFYLPYMVGICLGFNYTKPLINLLHNPLTRDE-EELEALLDKAARSLRLRLDIYS---
A new directory was created:
$: ls ~/alignment_files
>>> result tmp.dnd tmp.fa