metric robinson foulds distance - tahiri-lab/KMeansPhyloTreesClustering GitHub Wiki
🌳 Robinson-Foulds (RF) Distance
🔬 Definition
The Robinson-Foulds (RF) distance is a metric used to measure the topological difference between two phylogenetic trees.
It compares how trees partition the same set of taxa.
🧠 Mathematical Formulation
Let:
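- T1 and T2 be two trees on the same set of taxa
- S1 = set of non-trivial splits in T1 (bipartitions of the taxa induced by internal edges)
- S2 = set of non-trivial splits in T2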
Then:
RF(T1, T2) = |S1 - S2| + |S2 - S1|
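Equivalent form:
RF(T1, T2) = |S1| + |S2| - 2 × |S1 ∩ S2|
If both trees are fully resolved (binary) and share the same taxa, then |S1| = |S2| and this reduces to RF(T1, T2) = 2 × (number of splits in T1 that are absent from T2).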
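The formula can be illustrated with a minimal Python sketch. It assumes each tree has already been reduced to its set of non-trivial splits. The names `canonical_split` and `rf_distance` are hypothetical helpers written for this illustration and are not taken from the project code.

```python
def canonical_split(side, taxa):
    """Return one fixed side of a bipartition so that equal splits compare equal.

    Assumption: a split is stored as the side containing the smallest taxon name.
    """
    side, taxa = frozenset(side), frozenset(taxa)
    ref = min(taxa)
    return side if ref in side else taxa - side


def rf_distance(splits1, splits2):
    """RF distance: number of splits found in exactly one of the two sets."""
    s1, s2 = set(splits1), set(splits2)
    return len(s1 - s2) + len(s2 - s1)


taxa = "ABCDE"
# Tree 1 has splits AB|CDE and ABC|DE; tree 2 has AB|CDE and ABD|CE.
t1 = {canonical_split("AB", taxa), canonical_split("DE", taxa)}
t2 = {canonical_split("AB", taxa), canonical_split("CE", taxa)}

rf = rf_distance(t1, t2)
# The equivalent form gives the same value.
rf_equivalent = len(t1) + len(t2) - 2 * len(t1 & t2)
```

Here the two trees share the split AB|CDE and differ in one split each, so both expressions evaluate to 2.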
🌳 Interpretation
- RF = 0 → trees have identical topologies (branch lengths are ignored)
- Low RF → trees share most of their splits
- High RF → trees share few or no splits
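Whether a value counts as low or high depends on tree size. For two unrooted, fully resolved trees on n taxa, each tree has n − 3 non-trivial splits, so RF is at most 2(n − 3). A common way to compare values across tree sizes is to divide by this maximum. The sketch below assumes unrooted binary trees; `normalized_rf` is a hypothetical helper name, not part of the project.

```python
def normalized_rf(rf, n_taxa):
    """Scale an RF distance to [0, 1] for unrooted binary trees on n_taxa leaves."""
    if n_taxa < 4:
        # With fewer than 4 taxa there are no non-trivial splits to compare.
        raise ValueError("normalized RF needs at least 4 taxa")
    max_rf = 2 * (n_taxa - 3)
    return rf / max_rf


score_quartet = normalized_rf(2, 4)   # maximally different 4-taxon trees
score_five = normalized_rf(2, 5)      # one differing split per tree out of two
```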
📌 Example
Tree 1:
(A, B) | (C, D)
Tree 2:
(A, C) | (B, D)
👉 Each tree has exactly one non-trivial split, and the two splits differ
RF = 1 + 1 = 2
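The example can be checked with a short Python sketch that extracts non-trivial splits from trees written as nested tuples. The tuple encoding and the helpers `leaves` and `nontrivial_splits` are assumptions made for this illustration; the project may represent trees differently (for example, as Newick strings).

```python
def leaves(node):
    """Return the set of leaf names below a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def nontrivial_splits(tree):
    """Collect the non-trivial splits of a tree given as nested tuples."""
    taxa = leaves(tree)
    ref = min(taxa)  # each split is stored as the side containing this taxon
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        # A split is non-trivial when both sides contain at least two taxa.
        if 1 < len(clade) < len(taxa) - 1:
            splits.add(clade if ref in clade else taxa - clade)
        for child in node:
            visit(child)

    visit(tree)
    return splits


tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))

s1 = nontrivial_splits(tree1)  # the single split AB|CD
s2 = nontrivial_splits(tree2)  # the single split AC|BD
rf = len(s1 - s2) + len(s2 - s1)
```

Both clades of each tree describe the same bipartition, so each split set has one element and `rf` is 2.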
🧬 Biological Meaning
- Each split corresponds to an internal branch separating one group of taxa from the rest
- RF measures disagreement between evolutionary hypotheses
👉 Low RF:
- similar ancestry assumptions
👉 High RF:
- different evolutionary scenarios
⚙️ Use in This Project
RF distance is used to:
- quantify the topological dissimilarity between phylogenetic trees
- build a distance matrix
- serve as input for clustering
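The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's actual implementation: each tree is assumed to be already reduced to a set of canonical splits, and `rf_matrix` is a hypothetical helper name.

```python
from itertools import combinations


def rf_distance(splits1, splits2):
    """RF distance as the size of the symmetric difference of two split sets."""
    return len(splits1 ^ splits2)


def rf_matrix(split_sets):
    """Build a symmetric pairwise RF distance matrix as a list of lists."""
    n = len(split_sets)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = rf_distance(split_sets[i], split_sets[j])
        matrix[i][j] = matrix[j][i] = d
    return matrix


# Three trees on taxa A..E; each split is stored as the side containing "A".
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AC"), frozenset("ACD")},
]
matrix = rf_matrix(trees)
```

The resulting matrix has zeros on the diagonal and is symmetric, so it can be passed to any clustering routine that accepts precomputed pairwise distances.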