Automatic model selection (MOOSE) - amkozlov/raxml-ng GitHub Wiki

Since v2.0-beta3, RAxML-NG offers automatic model selection similar to ModelFinder in IQ-TREE.

We call this feature MOOSE, which stands for MOdel Optimization and SElection.

Standalone model testing and selection

Command line:

raxml-ng --moose --msa prot21.fa --data-type AA

In this mode, MOOSE determines the best-fit model and creates the following output files, which you can use in subsequent analyses (e.g., ML tree search):

Model testing results saved to: /home/alex/prot21.fa.raxml.moose.xml
Best-fit model saved to: /home/alex/prot21.fa.raxml.moose.bestModel
Binary MSA file saved to: /home/alex/prot21.fa.raxml.rba

Automatic model selection before ML tree search

To select the model automatically before the ML tree search, specify --model DNA or --model AA on the command line instead of a specific model, e.g.:

raxml-ng --msa dna56.phy --model DNA

raxml-ng --fast --msa prot21.fa --model AA

MOOSE options

You can customize model selection with the --moose-options flag. Multiple options are separated by a slash (/), as in the examples below. In standalone mode, options can also be given directly after --moose.

The following options are available:

Option                               Meaning
criterion=AIC|AICc|BIC               information criterion used for model selection (default: BIC)
ic-delta=VALUE                       significance threshold for the IC score difference (default: 10.0)
freerate-categories=n[-m]            test FreeRate models with n categories (or with n to m categories, inclusive, if a range is given)
rhas=VAL1,VAL2,...                   list of rate heterogeneity across sites (RHAS) models (default: R,I+R,E,I,G,I+G)
substitution-models=VAL1,VAL2,...    list of substitution models (default: all supported)
heuristics=VAL1,VAL2,...|OFF         heuristics used to skip models (default: rhas,freerate)
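The three values accepted by criterion= are the usual information criteria: each penalizes the maximized log-likelihood lnL by the number of free parameters k, and AICc and BIC also use the sample size n. The Python sketch below applies the textbook formulas and picks the model with the lowest score. It is not RAxML-NG code. Assumptions: n is the number of alignment sites, lower scores are better, and the names ModelFit and best_model as well as the likelihood values in the usage lines are hypothetical. How MOOSE applies ic-delta is not described on this page, so the sketch leaves it out.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelFit:
    name: str    # model string, e.g. "LG+G4" (illustrative only)
    lnl: float   # maximized log-likelihood of this model
    k: int       # number of free parameters (how MOOSE counts them is not specified here)


def aic(lnl: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 lnL."""
    return 2 * k - 2 * lnl


def aicc(lnl: float, k: int, n: int) -> float:
    """AIC with small-sample correction; n is the sample size (assumed: alignment sites)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(lnl, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(lnl: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 lnL."""
    return k * math.log(n) - 2 * lnl


def best_model(fits: list, criterion: str = "BIC", n: int = 1) -> ModelFit:
    """Return the fit with the lowest score under the chosen criterion."""
    crit = criterion.upper()

    def score(f: ModelFit) -> float:
        if crit == "AIC":
            return aic(f.lnl, f.k)
        if crit == "AICC":
            return aicc(f.lnl, f.k, n)
        if crit == "BIC":
            return bic(f.lnl, f.k, n)
        raise ValueError(f"unknown criterion: {criterion}")

    return min(fits, key=score)


if __name__ == "__main__":
    # Hypothetical fits for a hypothetical 500-site alignment.
    fits = [ModelFit("LG+G4", -10000.0, 50), ModelFit("LG+R4", -9990.0, 56)]
    print(best_model(fits, "AIC", n=500).name)  # LG+R4
    print(best_model(fits, "BIC", n=500).name)  # LG+G4
```

With these made-up numbers, AIC prefers the better-fitting but more parameter-rich model, while the larger per-parameter penalty of BIC (k ln n instead of 2k) favors the simpler one; this is why the choice of criterion can change the selected model.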

Examples:

  1. Run a tree search with the best-fit DNA model, selected using the AIC with a significance delta of 5.0 and considering equal rates (E), Gamma (+G), and FreeRate models with 2 to 5 categories (+R2, +R3, +R4, +R5).

raxml-ng --msa dna56.phy --model DNA --moose-options criterion=aic/ic-delta=5/rhas=E,R,G/freerate-categories=2-5

  2. Run standalone model selection without heuristics, considering only three matrices (LG, WAG, and FLAVI):

raxml-ng --moose heuristics=OFF/substitution-models=LG,WAG,FLAVI --msa prot21.fa --model AA
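Both examples pass several options in one slash-separated string. The Python sketch below is a hypothetical re-implementation, not RAxML-NG's parser: it reads such a string, fills in the defaults from the options table, and expands the RHAS list into candidate model names (the product of substitution models and RHAS variants). Assumptions, all unconfirmed by this page: E means equal rates and adds no suffix, every RHAS code containing R is expanded over the freerate-categories range, and a bare +R is left as a placeholder when no range is given, because the default number of categories is not stated here. The "all supported" substitution-model list is not reproduced, so candidate_models requires an explicit list. The set of models MOOSE actually evaluates may differ, e.g. because of heuristics.

```python
from __future__ import annotations

from itertools import product

# Defaults taken from the options table; None for substitution-models stands
# for "all supported", which this sketch does not enumerate.
DEFAULTS = {
    "criterion": "BIC",
    "ic-delta": 10.0,
    "freerate-categories": None,
    "rhas": ["R", "I+R", "E", "I", "G", "I+G"],
    "substitution-models": None,
    "heuristics": ["rhas", "freerate"],
}


def parse_moose_options(spec: str) -> dict:
    """Parse a slash-separated key=value string such as 'criterion=aic/ic-delta=5'."""
    opts = dict(DEFAULTS)
    for item in filter(None, spec.split("/")):
        key, sep, value = item.partition("=")
        if not sep or key not in DEFAULTS:
            raise ValueError(f"bad option: {item!r}")
        if key == "criterion":
            opts[key] = value.upper()
        elif key == "ic-delta":
            opts[key] = float(value)
        elif key == "freerate-categories":
            lo, _, hi = value.partition("-")  # "n" or "n-m"
            opts[key] = (int(lo), int(hi or lo))
        elif key == "heuristics" and value.upper() == "OFF":
            opts[key] = []
        else:
            opts[key] = value.split(",")
    return opts


def expand_rhas(rhas: list[str], freerate: tuple[int, int] | None) -> list[str]:
    """Turn RHAS codes into model suffixes (assumption: 'E' adds no suffix)."""
    suffixes = []
    for code in rhas:
        parts = code.split("+")
        if "R" in parts and freerate is not None:
            lo, hi = freerate
            for n in range(lo, hi + 1):
                suffixes.append("".join("+" + (f"R{n}" if p == "R" else p) for p in parts))
        elif code == "E":
            suffixes.append("")
        else:
            suffixes.append("".join("+" + p for p in parts))
    return suffixes


def candidate_models(opts: dict) -> list[str]:
    """Cartesian product of substitution models and RHAS suffixes."""
    models = opts["substitution-models"]
    if models is None:
        raise ValueError("this sketch needs an explicit substitution-models list")
    suffixes = expand_rhas(opts["rhas"], opts["freerate-categories"])
    return [m + s for m, s in product(models, suffixes)]
```

For the options of example 1, expand_rhas(["E", "R", "G"], (2, 5)) yields the suffixes "", "+R2", "+R3", "+R4", "+R5", and "+G", matching the models listed in that example plus the equal-rates case.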

Heuristics

TODO

Parallelization

TODO

Further info

For more comprehensive model testing, please use ModelTest-NG.

For massively parallel model testing and phylogenetic inference with multiple HPC compute nodes and thousands of genes, please consider ParGenes.