CCMgen - susannvorberg/CCmpredPy GitHub Wiki

CCMgen

Generate a multiple sequence alignment by sampling from the Markov random field (MRF) probability model specified by coupling potentials, evolving the sequences along a phylogenetic tree.

ccmgen.py [options] rawfile outalnfile

rawfile should be a MessagePack-formatted raw coupling potential file as generated by the -b option in CCMpred.

outalnfile is the filename where the sampled alignment should be written, in the format specified by --aln-format (default: FASTA).

Options

  • -n <nseq>, --num-sequences <nseq>: Set the number of sequences to generate
  • --like-aln <reference_msa>: Set --num-sequences and --mutation-rate-neff so that the sampled alignment has the same number of sequences and the same number of effective sequences (Neff) as reference_msa.
  • --aln-format <format>: Parse and write all subsequent alignment files specified on the command line in another format. Supports all BioPython Bio.SeqIO file formats plus psicov.

Initial Sequence Options

  • User-specified (--seq0-file <seq_file>): Provide the initial sequence from a file (useful if you have e.g. an ancestral sequence reconstruction). Make sure that the sequence identifier matches the name of the root node in the sampling phylogeny.
  • MRF-generated (--seq0-mrf <generations>): Start out with an all-alanine sequence and use the MRF model to mutate the sequence for several (e.g. 500) generations.
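
To illustrate what "mutating an all-alanine sequence with the MRF model" can look like, here is a minimal sketch using Gibbs sampling on a toy model. It is not CCMgen's code: the helper names (`random_potentials`, `coupling`, `gibbs_sweep`, `sample_seq0`), the random toy potentials, the convention that larger potentials mean higher probability, and the reading of one "generation" as one resampling pass over all positions are all assumptions made for illustration.

```python
import math
import random

ALPHABET = "ARNDCQEGHILKMFPSTWYV"  # 20 amino acids; index 0 is alanine


def random_potentials(length, rng, scale=0.5):
    """Toy stand-in for real potentials: single terms v[i][a] and
    pair terms w[(i, j)][a][b], stored only for i < j."""
    q = len(ALPHABET)
    v = [[rng.gauss(0, scale) for _ in range(q)] for _ in range(length)]
    w = {}
    for i in range(length):
        for j in range(i + 1, length):
            w[i, j] = [[rng.gauss(0, scale) for _ in range(q)] for _ in range(q)]
    return v, w


def coupling(w, i, a, j, b):
    """Symmetric lookup of the pair potential between (i, a) and (j, b)."""
    return w[i, j][a][b] if i < j else w[j, i][b][a]


def gibbs_sweep(seq, v, w, rng):
    """Resample every position once, conditioned on all other positions."""
    q = len(ALPHABET)
    for i in range(len(seq)):
        logits = []
        for a in range(q):
            score = v[i][a]
            for j, b in enumerate(seq):
                if j != i:
                    score += coupling(w, i, a, j, b)
            logits.append(score)
        # Subtract the maximum before exponentiating for numerical stability.
        top = max(logits)
        weights = [math.exp(x - top) for x in logits]
        seq[i] = rng.choices(range(q), weights=weights)[0]
    return seq


def sample_seq0(v, w, generations, seed=None):
    """Start from an all-alanine sequence and run `generations` sweeps."""
    rng = random.Random(seed)
    seq = [ALPHABET.index("A")] * len(v)
    for _ in range(generations):
        gibbs_sweep(seq, v, w, rng)
    return "".join(ALPHABET[a] for a in seq)


toy_v, toy_w = random_potentials(8, random.Random(0))
print(sample_seq0(toy_v, toy_w, generations=50, seed=1))
```

With zero generations the function returns the all-alanine start unchanged; more generations let the sequence drift away from that start toward sequences favoured by the model.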

Mutation Rate Options

  • User-specified (--mutation-rate <rate>): Give a user-specified mutation rate, measured in number of substitutions per unit of evolutionary distance on the phylogenetic tree.
  • Target Neff (--mutation-rate-neff <neff>): Set the mutation rate to approximately hit a target number of effective sequences (Neff, calculated as $N_{\mathrm{eff}} = \sum_{n=1}^{N} \frac{1}{1 + ID_n}$, where $N$ is the number of sequences in the MSA and $ID_n$ is the number of other sequences in the MSA with at least 80% sequence identity to sequence $n$).
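
The Neff formula can be made concrete with a small sketch. This is not CCMgen's implementation: the function names are hypothetical, identity is taken as the fraction of matching aligned columns with gaps treated like any other character, and the 80% cutoff is applied as "greater than or equal to", all of which are assumptions.

```python
def sequence_identity(a, b):
    """Fraction of aligned columns in which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same aligned length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def neff(msa, threshold=0.8):
    """Neff = sum over sequences n of 1 / (1 + ID_n), where ID_n counts the
    other sequences whose identity to sequence n is at least `threshold`."""
    total = 0.0
    for n, seq in enumerate(msa):
        id_n = sum(
            1
            for m, other in enumerate(msa)
            if m != n and sequence_identity(seq, other) >= threshold
        )
        total += 1.0 / (1 + id_n)
    return total


# Three near-identical sequences and one unrelated sequence.
toy_msa = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIKV", "WWWWWWWWWW"]
print(neff(toy_msa))
```

In the toy alignment the three similar sequences each have two neighbours above the cutoff and contribute 1/3 each, while the unrelated sequence contributes 1, so Neff comes out at about 2 even though the alignment holds four sequences.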

Phylogenetic Tree Options

  • User-specified (--tree-newick <tree_file>): Evolve the sequences according to an evolutionary tree in Newick format, e.g. from a phylogenetic reconstruction program.
  • Binary tree (--tree-binary): A binary tree with equally distributed branch lengths.
  • 'Star-shaped' tree (--tree-star): A tree where all leaf nodes are direct descendants of the root node.

By default, both the binary and the star-shaped tree are generated with a total evolutionary depth of 1. To change the diversity of the sampled alignment, adjust the mutation rate parameters.
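
The two generated topologies can be pictured as Newick strings. The sketch below is illustrative only and does not reproduce CCMgen's tree builder: the function names and leaf labels (`seq0`, `seq1`, ...) are hypothetical, "depth 1" is read as a root-to-leaf distance of 1, and "equally distributed" is read as every branch of the binary tree having the same length.

```python
def star_tree_newick(n_leaves, depth=1.0):
    """Newick string for a star tree: every leaf hangs directly off the root."""
    if n_leaves < 2:
        raise ValueError("a star tree needs at least two leaves")
    leaves = ",".join(f"seq{i}:{depth:g}" for i in range(n_leaves))
    return f"({leaves})root;"


def binary_tree_newick(levels, depth=1.0):
    """Newick string for a balanced binary tree with 2**levels leaves.
    Every branch has length depth / levels, so each root-to-leaf path
    sums to `depth`."""
    if levels < 1:
        raise ValueError("levels must be at least 1")
    branch = depth / levels
    leaf_ids = iter(range(2 ** levels))

    def subtree(level):
        if level == 0:
            return f"seq{next(leaf_ids)}"
        left = subtree(level - 1)
        right = subtree(level - 1)
        return f"({left}:{branch:g},{right}:{branch:g})"

    return subtree(levels) + "root;"


print(star_tree_newick(3))    # (seq0:1,seq1:1,seq2:1)root;
print(binary_tree_newick(2))  # ((seq0:0.5,seq1:0.5):0.5,(seq2:0.5,seq3:0.5):0.5)root;
```

In the star tree every sequence evolves independently from the root, whereas in the binary tree sibling leaves share part of their path from the root and so share part of their mutation history.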

Examples

Simple example

Generate sequences using potentials in data/1atzA.braw.gz and write results to data/sampled.fasta. Set mutation rate and number of sequences to match the alignment in data/1atzA.fasta:

ccmgen.py --like-aln data/1atzA.fasta \
	--tree-star \
	--seq0-mrf 500 \
	data/1atzA.braw.gz \
	data/sampled.fasta