Cov phylogenetic tree quality by monophylicity - ababaian/serratus GitHub Wiki
The goal is to assess agreement between a tree and Cov taxonomy by measuring the degree of monophylicity. In an optimal tree, every species would be monophyletic. Given a number of candidate trees, we would pick the tree with the most monophyletic species, or possibly a tree in which the more important species (e.g. SARS) are monophyletic.
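As a rough illustration of this selection rule, the sketch below scores each candidate tree by its monophyletic species and optionally weights some species more heavily. The function names, the `weights` mapping and the example tree names are hypothetical and are not part of the Serratus scripts.

```python
def score_tree(mono_taxa, weights=None, default_weight=1.0):
    """Sum the weights of the monophyletic taxa; unweighted taxa count as default_weight."""
    weights = weights or {}
    return sum(weights.get(taxon, default_weight) for taxon in mono_taxa)


def pick_best_tree(candidates, weights=None):
    """Return the name of the candidate tree with the highest score.

    candidates maps a tree name to the set of taxa that are monophyletic in it.
    """
    return max(candidates, key=lambda name: score_tree(candidates[name], weights))


# Hypothetical candidates: tree_a has more monophyletic taxa, tree_b has SARS.
candidates = {
    "tree_a": {"taxon_x", "taxon_y", "taxon_z"},
    "tree_b": {"SARS", "taxon_x"},
}
best_unweighted = pick_best_tree(candidates)
best_weighted = pick_best_tree(candidates, weights={"SARS": 5.0})
```

With equal weights the rule reduces to counting monophyletic species; a larger weight on one species lets a tree that resolves it win even with a lower count.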
TaxId    Seqs  Taxa  Mono  Names
28295    159   1     mono  Porcine epidemic diarrhea virus
694014   283   1     mono  Avian coronavirus
...
694007   4     2     POLY  Tylonycteris bat coronavirus HKU4, Tylonycteris pachypus bat coronavirus HKU4-related
1335626  12    3     POLY  Middle East respiratory syndrome-related coronavirus, Bat coronavirus, Hypsugo bat coronavirus HKU25
11137    8     2     POLY  Human coronavirus 229E, Rousettus aegyptiacus bat coronavirus 229E-related

51 taxa, 42 mono, 9 polyphyletic
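One way to arrive at a mono/POLY call like those in the report is to find the smallest clade that contains every sequence of a taxon and check whether that clade also contains other taxa. The sketch below assumes this definition, a rooted tree held as nested Python tuples, and a hypothetical `leaf_to_taxon` mapping. It is not the code behind the table, and it does not read usearch tabbed files.

```python
from collections import defaultdict


def leaves(node):
    """Return the leaf labels under a node (a leaf is a str, an internal node a tuple)."""
    if isinstance(node, str):
        return [node]
    out = []
    for child in node:
        out.extend(leaves(child))
    return out


def smallest_clade(node, targets):
    """Return the smallest subtree whose leaves include every label in targets."""
    if not isinstance(node, str):
        for child in node:
            if targets <= set(leaves(child)):
                return smallest_clade(child, targets)
    return node


def monophyly_report(tree, leaf_to_taxon):
    """Map each taxon to ("mono" or "POLY", sorted taxa found in its smallest clade)."""
    members = defaultdict(set)
    for leaf, taxon in leaf_to_taxon.items():
        members[taxon].add(leaf)
    report = {}
    for taxon, taxon_leaves in members.items():
        clade = smallest_clade(tree, taxon_leaves)
        taxa_in_clade = {leaf_to_taxon[leaf] for leaf in leaves(clade)}
        call = "mono" if taxa_in_clade == {taxon} else "POLY"
        report[taxon] = (call, sorted(taxa_in_clade))
    return report


# Toy rooted tree: taxon A forms a clade, taxon B is split across the root.
tree = ((("a1", "a2"), "b1"), ("b2", "c1"))
leaf_to_taxon = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
report = monophyly_report(tree, leaf_to_taxon)
```

In this toy tree, taxa A and C come out as mono, while B is POLY because the smallest clade holding both of its sequences is the whole tree. The sketch recomputes leaf sets at every step, which is fine for small trees but would need caching for large ones.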
s3://serratus-public/rce/monophy/
See runme.bash for an example.
To run the analysis, you need a rooted tree in usearch tabbed format.
Root placement is not important; if you have an unrooted tree, you can root it with any convenient method. With raxml:
raxml -f I -m GTRCAT -t $intree -n rooted
To convert a rooted Newick tree to usearch tabbed:
usearch -tree_cvt tree.newick -tabbedout tree.tsv