metric robinson foulds distance - tahiri-lab/KMeansPhyloTreesClustering GitHub Wiki

🌳 Robinson-Foulds (RF) Distance

🔬 Definition

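The Robinson-Foulds (RF) distance is a metric that measures the topological difference between two phylogenetic trees built on the same set of taxa.

It compares the splits (bipartitions) that the two trees induce on those taxa: removing an internal branch divides the taxa into two groups, and each such division is one split. Branch lengths are ignored.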
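To make "split" concrete, the sketch below lists the non-trivial splits of a small tree. It assumes a hypothetical nested-tuple format (a leaf is a string, an internal node is a tuple of subtrees), not the input format the project actually uses, and treats the tree as unrooted, so where the tuple nesting places the root does not change the result.

```python
# Minimal sketch: a tree is a leaf name (str) or a tuple of subtrees.
# This nested-tuple format is an illustrative assumption, not the project's format.

def splits(tree):
    """Return the non-trivial splits (bipartitions) of an unrooted tree.

    Each split is a frozenset holding its two sides, so AB|CDE == CDE|AB.
    """
    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaves(child) for child in node))

    taxa = leaves(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        rest = taxa - clade
        # Skip trivial splits (one taxon vs. the rest) and the root itself.
        if len(clade) >= 2 and len(rest) >= 2:
            found.add(frozenset([clade, rest]))
        return clade

    visit(tree)
    return found

tree = (("A", "B"), "C", ("D", "E"))
lines = sorted(
    " | ".join("".join(sorted(side)) for side in sorted(split, key=sorted))
    for split in splits(tree)
)
print("\n".join(lines))
# AB | CDE
# ABC | DE
```

Storing each split as an unordered pair of sides means the same bipartition found from either side of an edge is counted once.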


🧠 Mathematical Formulation

Let:

  • T1 and T2 be two trees
  • S1 = set of splits in T1
  • S2 = set of splits in T2

Then:

RF(T1, T2) = |S1 - S2| + |S2 - S1|
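The formula translates directly into set operations. This sketch assumes each tree's splits are already available as a Python set; the string labels used here (sides sorted, side containing A written first) are a hypothetical encoding chosen so that equal splits compare equal.

```python
def rf_distance(s1, s2):
    """Robinson-Foulds distance between two split sets: |S1 - S2| + |S2 - S1|."""
    s1, s2 = set(s1), set(s2)
    return len(s1 - s2) + len(s2 - s1)

# Hypothetical split sets for two 5-taxon trees ((A,B),C,(D,E)) and ((A,B),D,(C,E)).
S1 = {"AB|CDE", "ABC|DE"}
S2 = {"AB|CDE", "ABD|CE"}
print(rf_distance(S1, S2))  # 2: one split unique to each tree
```

The sum of the two one-sided differences is simply the size of the symmetric difference, so `len(S1 ^ S2)` gives the same value.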

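Equivalent forms:

RF(T1, T2) = |S1| + |S2| - 2 × |S1 ∩ S2|

For fully resolved (binary) trees on the same n taxa, each tree has n - 3 non-trivial splits, so |S1| = |S2| and:

RF(T1, T2) = 2 × |S1 - S2| (twice the number of splits of T1 that are missing from T2)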
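The identity behind these forms is |S1 - S2| + |S2 - S1| = |S1| + |S2| - 2 × |S1 ∩ S2|; when both split sets have the same size, the two one-sided differences are equal, giving 2 × |S1 - S2|. A quick check, again assuming the hypothetical string encoding of splits:

```python
def rf_from_common(s1, s2):
    """RF via shared splits: |S1| + |S2| - 2|S1 ∩ S2|."""
    return len(s1) + len(s2) - 2 * len(s1 & s2)

def rf_binary(s1, s2):
    """Shortcut valid only when both trees are fully resolved on the same taxa."""
    if len(s1) != len(s2):
        raise ValueError("split sets differ in size; trees are not both fully resolved")
    return 2 * len(s1 - s2)

S1 = {"AB|CDE", "ABC|DE"}
S2 = {"AB|CDE", "ABD|CE"}
print(len(S1 - S2) + len(S2 - S1))  # 2
print(rf_from_common(S1, S2))       # 2
print(rf_binary(S1, S2))            # 2
```

The size check in `rf_binary` guards against applying the shortcut to trees with polytomies, where |S1| and |S2| can differ.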

🌳 Interpretation

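  • RF = 0 → the trees have the same topology (branch lengths are ignored)
  • Low RF → the trees share most of their splits
  • High RF → the trees share few splits; for binary trees on n taxa the maximum is 2 × (n - 3)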
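Whether a given RF value counts as "low" or "high" depends on the number of taxa. A common convention divides by the maximum possible value for fully resolved unrooted trees, 2 × (n - 3). The sketch below assumes that convention; this wiki does not say whether the project normalizes RF.

```python
def normalized_rf(rf, n_taxa):
    """Scale an RF distance to [0, 1] for fully resolved unrooted trees.

    Each such tree on n taxa has n - 3 non-trivial splits, so the largest
    possible RF is 2 * (n - 3), reached when the trees share no split.
    """
    if n_taxa < 4:
        raise ValueError("need at least 4 taxa for a non-trivial split")
    return rf / (2 * (n_taxa - 3))

print(normalized_rf(2, 4))   # 1.0: two 4-taxon trees with no common split
print(normalized_rf(4, 10))  # 4 / 14, about 0.286
```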

📌 Example

Tree 1:

(A, B) | (C, D)

Tree 2:

(A, C) | (B, D)

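👉 Each 4-taxon tree has a single non-trivial split, and the two splits differ, so there are no common splits: |S1 - S2| = 1 and |S2 - S1| = 1

RF = 1 + 1 = 2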
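The example can be reproduced end to end. This sketch writes the two trees in a hypothetical nested-tuple format (leaves are strings, internal nodes are tuples), extracts their non-trivial splits, and applies the RF formula.

```python
def splits(tree):
    """Non-trivial splits of an unrooted tree given as nested tuples of leaf names."""
    found = set()

    def clade(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(clade(child) for child in node))

    taxa = clade(tree)

    def visit(node):
        if isinstance(node, str):
            return
        side = clade(node)
        rest = taxa - side
        if len(side) >= 2 and len(rest) >= 2:
            found.add(frozenset([side, rest]))  # unordered pair of sides
        for child in node:
            visit(child)

    visit(tree)
    return found

def rf_distance(t1, t2):
    """RF(T1, T2) = |S1 - S2| + |S2 - S1|."""
    s1, s2 = splits(t1), splits(t2)
    return len(s1 - s2) + len(s2 - s1)

tree1 = (("A", "B"), ("C", "D"))  # split AB | CD
tree2 = (("A", "C"), ("B", "D"))  # split AC | BD
print(rf_distance(tree1, tree2))  # 2
```

Because splits ignore the root, `("A", ("B", ("C", "D")))` has the same split AB | CD as `tree1`, and its RF distance to `tree1` is 0.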

🧬 Biological Meaning

  • Each split = a hypothesized grouping of taxa (one internal branch of the tree)
  • RF measures disagreement between evolutionary hypotheses

👉 Low RF:

  • the trees largely agree on which taxa share a common ancestor

👉 High RF:

  • the trees propose conflicting groupings, i.e., different evolutionary scenarios

⚙️ Use in This Project

RF distance is used to:

  • compute pairwise dissimilarity between phylogenetic trees
  • build a pairwise distance matrix over all trees
  • serve as input for clustering
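A pairwise RF matrix can be assembled as sketched below. It assumes each tree's splits have already been extracted into a set, using a hypothetical string encoding (sides sorted, side containing A written first). How the project's clustering step consumes the matrix is not described on this page, so the sketch stops at the matrix.

```python
def rf_matrix(split_sets):
    """Pairwise RF distance matrix (list of lists) from per-tree split sets."""
    n = len(split_sets)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # RF is the size of the symmetric difference of the split sets.
            d = len(split_sets[i] ^ split_sets[j])
            matrix[i][j] = matrix[j][i] = d  # RF is symmetric
    return matrix

# Hypothetical split sets for three 5-taxon trees.
trees = [
    {"AB|CDE", "ABC|DE"},  # ((A,B),C,(D,E))
    {"AB|CDE", "ABD|CE"},  # ((A,B),D,(C,E))
    {"AC|BDE", "ACD|BE"},  # ((A,C),D,(B,E))
]
for row in rf_matrix(trees):
    print(row)
# [0, 2, 4]
# [2, 0, 4]
# [4, 4, 0]
```

Only the upper triangle is computed; symmetry fills the rest, and the diagonal stays 0 because every tree shares all of its splits with itself.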