CCMgen - susannvorberg/CCmpredPy GitHub Wiki
Generate a multiple sequence alignment from the MRF probability model specified by coupling potentials and a user-specified phylogenetic tree.
ccmgen.py [options] rawfile outalnfile
rawfile should be a MessagePack-formatted raw coupling potential file as generated by the -b option in CCMpred.
outalnfile is the filename where the sampled alignment should be written, in the format specified by --aln-format (default: FASTA).
- -n <nseq>, --num-sequences <nseq>: Set the number of sequences to generate.
- --like-aln <reference_msa>: Set --num-sequences and --mutation-rate-neff to have the same number of sequences and number of effective sequences as reference_msa.
- --aln-format <format>: Parse and write all subsequent alignment files specified on the command line in another format. Supports all BioPython Bio.SeqIO file formats plus psicov.
Initial sequence:
- User-specified (--seq0-file <seq_file>): Provide the initial sequence from a file (useful if you have e.g. an ancestral sequence reconstruction). Make sure that the sequence identifier matches the name of the root node in the sampling phylogeny.
- MRF-generated (--seq0-mrf <generations>): Start out with an all-alanine sequence and use the MRF model to mutate the sequence for several (e.g. 500) generations.
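To give an intuition for the MRF-generated option, here is a minimal toy sketch of Gibbs sampling from a pairwise MRF, starting from an all-alanine sequence. It is not CCMgen's implementation. The potentials are random toy values rather than values read from a raw coupling file, the helper names (`random_toy_model`, `gibbs_sweep`, `sample_seq0`) are hypothetical, and treating one "generation" as one full sweep over all positions is an assumption.

```python
import math
import random

ALPHABET = "ARNDCQEGHILKMFPSTWYV"  # 20 amino acids; index 0 is alanine in this sketch


def random_toy_model(L, rng, scale=0.5):
    """Random single potentials v[i][a] and symmetric couplings w[i][j][a][b] (toy data)."""
    q = len(ALPHABET)
    v = [[rng.gauss(0, scale) for _ in range(q)] for _ in range(L)]
    w = [[[[0.0] * q for _ in range(q)] for _ in range(L)] for _ in range(L)]
    for i in range(L):
        for j in range(i + 1, L):
            for a in range(q):
                for b in range(q):
                    w[i][j][a][b] = rng.gauss(0, scale)
                    w[j][i][b][a] = w[i][j][a][b]  # keep the couplings symmetric
    return v, w


def gibbs_sweep(seq, v, w, rng):
    """Resample every position once from its conditional distribution under the MRF."""
    L = len(seq)
    q = len(ALPHABET)
    for i in range(L):
        # Unnormalized log-probability of each residue a at position i,
        # given the current residues at all other positions.
        log_p = [
            v[i][a] + sum(w[i][j][a][seq[j]] for j in range(L) if j != i)
            for a in range(q)
        ]
        m = max(log_p)  # subtract the maximum for numerical stability
        weights = [math.exp(x - m) for x in log_p]
        seq[i] = rng.choices(range(q), weights=weights)[0]
    return seq


def sample_seq0(v, w, generations, seed=0):
    """Start from all-alanine and apply `generations` Gibbs sweeps (assumed meaning)."""
    rng = random.Random(seed)
    seq = [0] * len(v)  # all-alanine start
    for _ in range(generations):
        gibbs_sweep(seq, v, w, rng)
    return "".join(ALPHABET[a] for a in seq)


model_rng = random.Random(42)
v, w = random_toy_model(L=8, rng=model_rng)
seq0 = sample_seq0(v, w, generations=50)
print(seq0)
```

With zero generations the sketch returns the all-alanine start unchanged. More sweeps move the sequence further from that starting point and closer to a typical sample from the model.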
Mutation rate:
- User-specified (--mutation-rate <rate>): Give a user-specified mutation rate, measured in number of substitutions per unit of evolutionary distance on the phylogenetic tree.
- Target Neff (--mutation-rate-neff <neff>): Set the mutation rate to approximately hit a target number of effective sequences (Neff, calculated as $N_{\mathrm{eff}} = \sum_{n=1}^{N} \frac{1}{1 + ID_n}$, where $ID_n$ is the number of other sequences in the MSA with at least 80% sequence identity to sequence $n$).
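The Neff formula above can be sketched in a few lines of plain Python. This is an illustrative reimplementation, not the code CCMgen uses. The function names are hypothetical, and the sketch makes two assumptions: gap characters are compared like any other symbol, and the threshold is inclusive (identity >= 0.8).

```python
def sequence_identity(a, b):
    """Fraction of aligned columns in which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def neff(msa, threshold=0.8):
    """Neff = sum over n of 1 / (1 + ID_n).

    ID_n is the number of OTHER sequences whose identity to sequence n
    is at least `threshold` (assumed inclusive).
    """
    total = 0.0
    for n, seq_n in enumerate(msa):
        id_n = sum(
            1
            for m, seq_m in enumerate(msa)
            if m != n and sequence_identity(seq_n, seq_m) >= threshold
        )
        total += 1.0 / (1 + id_n)
    return total


toy_msa = [
    "ACDEFGHIKL",
    "ACDEFGHIKL",  # identical to the first sequence
    "ACDEFGHIKV",  # 90% identical to the first two
    "VWYTSRQPNM",  # shares no residue with the others
]
print(neff(toy_msa))
```

In the toy alignment, the three near-identical sequences each contribute 1/3 and the unrelated sequence contributes 1, so Neff is 2 even though the alignment holds four sequences.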
Phylogenetic tree:
- User-specified (--tree-newick <tree_file>): Evolve the sequences according to an evolutionary tree, e.g. from a phylogenetic reconstruction program.
- Binary tree (--tree-binary): A binary tree with equally distributed branch lengths.
- 'Star-shaped' tree (--tree-star): A tree where all leaf nodes are direct descendants of the root node.
Both the binary and the star-shaped tree are generated with a total evolutionary depth of 1 by default. You can adjust the diversity of the sampled alignment through the mutation rate parameters.
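To make the two built-in topologies concrete, the following sketch writes them as Newick strings with a total root-to-leaf depth of 1. It only illustrates the shapes and is not how CCMgen builds its trees internally. The helper names, the leaf labels `seq0`, `seq1`, ..., the root label `root`, and the reading of "equally distributed branch lengths" as "every level gets depth / number of levels" are all assumptions.

```python
def star_newick(n_leaves, depth=1.0):
    """Star tree: every leaf hangs directly off the root with branch length `depth`."""
    if n_leaves < 1:
        raise ValueError("need at least one leaf")
    leaves = ",".join(f"seq{i}:{depth:g}" for i in range(n_leaves))
    return f"({leaves})root;"


def binary_newick(n_levels, depth=1.0):
    """Balanced binary tree with 2**n_levels leaves.

    Each level gets branch length depth / n_levels, so the distance
    from the root to any leaf sums to `depth`.
    """
    if n_levels < 1:
        raise ValueError("need at least one level")
    branch = depth / n_levels
    counter = iter(range(2 ** n_levels))  # hands out leaf indices left to right

    def build(level):
        if level == n_levels:
            return f"seq{next(counter)}"
        left = build(level + 1)
        right = build(level + 1)
        return f"({left}:{branch:g},{right}:{branch:g})"

    return build(0) + "root;"


print(star_newick(3))
print(binary_newick(2))
```

In the star tree every sequence evolves independently from the root, whereas in the binary tree sibling leaves share part of their evolutionary history.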
Generate sequences using potentials in data/1atzA.braw.gz and write results to data/sampled.fasta. Set mutation rate and number of sequences to match the alignment in data/1atzA.fasta:
ccmgen.py --like-aln data/1atzA.fasta \
--tree-star \
--seq0-mrf 500 \
data/1atzA.braw.gz \
data/sampled.fasta
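If you prefer to drive the tool from a script, the same invocation can be assembled in Python. This is a small convenience sketch: `build_ccmgen_command` is a hypothetical helper, not part of CCMgen. It only uses flags described on this page and assumes `ccmgen.py` is on your PATH.

```python
import shlex


def build_ccmgen_command(rawfile, outalnfile, like_aln=None,
                         tree_flag="--tree-star", seq0_mrf=None,
                         aln_format=None):
    """Assemble an argv list for ccmgen.py from flags documented on this page."""
    argv = ["ccmgen.py"]
    if aln_format is not None:
        # Placed first because it applies to the alignment files that follow it.
        argv += ["--aln-format", aln_format]
    if like_aln is not None:
        argv += ["--like-aln", like_aln]
    if tree_flag is not None:
        argv.append(tree_flag)
    if seq0_mrf is not None:
        argv += ["--seq0-mrf", str(seq0_mrf)]
    argv += [rawfile, outalnfile]
    return argv


cmd = build_ccmgen_command(
    "data/1atzA.braw.gz",
    "data/sampled.fasta",
    like_aln="data/1atzA.fasta",
    seq0_mrf=500,
)
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting issues with file names that contain spaces.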