Automatic model selection (MOOSE)
Since v2.0-beta3, RAxML-NG offers automatic model selection capabilities similar to ModelFinder in IQ-TREE.
We call this feature MOOSE, which stands for MOdel Optimization and SElection.
Standalone model testing and selection
Command line:
raxml-ng --moose --msa prot21.fa --data-type AA
In this mode, MOOSE determines the best-fit model and creates the following output files, which you can use for subsequent analyses such as an ML tree search:
Model testing results saved to: /home/alex/prot21.fa.raxml.moose.xml
Best-fit model saved to: /home/alex/prot21.fa.raxml.moose.bestModel
Binary MSA file saved to: /home/alex/prot21.fa.raxml.rba
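As a minimal sketch of how the two steps could be chained, the function below runs standalone model selection and then starts a tree search with the resulting best-fit model file. Treat it as an illustration under assumptions, not as documented behavior: it assumes that --model accepts the MOOSE best-model file in the same way it accepts a model string, and the output prefix `prot21.fa.search` is a hypothetical name chosen only to keep the two runs apart. The file name pattern is taken from the log lines above.

```shell
# Input alignment and the best-model file name (pattern taken from the MOOSE log output).
msa="prot21.fa"
best_model="${msa}.raxml.moose.bestModel"

select_then_search() {
  # Step 1: standalone model selection (command from this page).
  raxml-ng --moose --msa "$msa" --data-type AA || return 1

  # Stop early if MOOSE did not produce a non-empty best-model file.
  [ -s "$best_model" ] || { echo "missing $best_model" >&2; return 1; }

  # Step 2: ML tree search with the selected model.
  # ASSUMPTION: --model accepts the MOOSE best-model file.
  # The prefix below is a hypothetical name, not something MOOSE prescribes.
  raxml-ng --msa "$msa" --model "$best_model" --prefix "${msa}.search"
}
```

Call `select_then_search` from the directory that contains the alignment. If the assumption about --model does not hold in your version, copy the model string from the best-model file into the command line by hand.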
Automatic model selection before ML tree search
To do this, specify --model DNA or --model AA on the command line, e.g.:
raxml-ng --msa dna56.phy --model DNA
raxml-ng --fast --msa prot21.fa --model AA
MOOSE options
You can customize model selection with the --moose-options flag. In standalone mode, options can also be given directly after the --moose command.
The following options are available:
| Option | Meaning |
|---|---|
| criterion=AIC \| AICc \| BIC | information criterion to use for model selection (default: BIC) |
| ic-delta=VALUE | significance threshold for IC score difference (default: 10.0) |
| freerate-categories=n[-m] | test FreeRate models with n categories (optionally up to and including m) |
| rhas=VAL1,VAL2,... | list of rate heterogeneity across sites (RHAS) models (default: R,I+R,E,I,G,I+G) |
| substitution-models=VAL1,VAL2,... | list of substitution models (default: all supported) |
| heuristics=VAL1,VAL2,... \| OFF | heuristics to skip models (default: rhas,freerate) |
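For orientation, the three criteria named in the criterion option have the following textbook definitions. This is a general reminder rather than a description of MOOSE internals: the page does not state how MOOSE counts parameters or defines the sample size, so the symbols below are assumptions. Here $\hat{L}$ is the maximized likelihood, $k$ the number of free parameters, and $n$ the sample size (in phylogenetics commonly the number of alignment sites).

```math
\begin{aligned}
\mathrm{AIC}  &= 2k - 2\ln\hat{L} \\
\mathrm{AICc} &= \mathrm{AIC} + \frac{2k(k+1)}{n-k-1} \\
\mathrm{BIC}  &= k\ln n - 2\ln\hat{L}
\end{aligned}
```

Under all three definitions a lower score indicates a better trade-off between fit and complexity, and the ic-delta threshold applies to differences between such scores.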
Examples:
- Run tree search with the best-fit DNA model, determined using the AIC criterion with a significance delta of 5.0, considering Gamma and FreeRate heterogeneity models with up to 5 categories (+G, +R2, +R3, +R4, +R5):
raxml-ng --msa dna56.phy --model DNA --moose-options criterion=aic/ic-delta=5/rhas=E,R,G/freerate-categories=2-5
- Run standalone model selection, without using heuristics and only considering three matrices (LG, WAG, and FLAVI):
raxml-ng --moose heuristics=OFF/substitution-models=LG,WAG,FLAVI --msa prot21.fa --model AA
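The examples above join individual settings with "/" into a single option string. When such strings are generated in scripts, a small helper can keep them readable. The sketch below is illustrative only: `moose_options` and `expand_freerate` are hypothetical helper names, not part of RAxML-NG, and the command is printed rather than executed.

```shell
# Join key=value settings with "/", the separator used in the examples above.
# The function body runs in a subshell so the IFS change does not leak out.
moose_options() (
  IFS=/
  printf '%s\n' "$*"
)

# Expand a freerate-categories value such as "2-5" into the FreeRate
# models it covers (+R2,+R3,+R4,+R5); a single number "n" yields "+Rn".
expand_freerate() (
  min=${1%-*}
  max=${1#*-}
  out=""
  i=$min
  while [ "$i" -le "$max" ]; do
    out="${out:+$out,}+R$i"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
)

opts=$(moose_options criterion=aic ic-delta=5 rhas=E,R,G freerate-categories=2-5)

# Print the resulting command instead of running it.
echo "raxml-ng --msa dna56.phy --model DNA --moose-options $opts"
echo "FreeRate models covered: $(expand_freerate 2-5)"
```

Building the string from separate arguments makes it easy to add or drop a setting without editing one long slash-separated token by hand.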
Heuristics
TODO
Parallelization
TODO
Further info
For more comprehensive model testing, please use ModelTest-NG.
For massively parallel model testing and phylogenetic inference with multiple HPC compute nodes and thousands of genes, please consider ParGenes.