Robinson Foulds (RF) Distance - tahiri-lab/KMeansPhyloTreesClustering GitHub Wiki

🔬 Robinson-Foulds (RF) Distance

General Definition

The Robinson-Foulds (RF) distance is a metric that measures how much two trees built on the same set of elements (leaves) differ in structure.

It works by comparing how two trees partition the same set of elements.


How it works

Each tree can be represented as a set of splits (partitions).

A split divides the elements into two groups. Removing a single edge from the tree produces exactly one such split.

Example:

(A, B) | (C, D)
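As a minimal sketch of how splits could be read off a tree, the snippet below assumes the tree is given as nested Python tuples (a hypothetical representation chosen for illustration, not the format used by this project). The helper names `leaves` and `splits` are likewise illustrative. Trivial splits that separate a single leaf from the rest are skipped, since every tree on the same leaves contains them.

```python
def leaves(node):
    """Return the set of leaf labels under a node (a label or a tuple of children)."""
    if isinstance(node, tuple):
        return frozenset().union(*(leaves(child) for child in node))
    return frozenset([node])


def splits(tree):
    """Collect the non-trivial splits of a tree given as nested tuples."""
    all_leaves = leaves(tree)
    found = set()

    def visit(node):
        if not isinstance(node, tuple):
            return
        side = leaves(node)
        other = all_leaves - side
        # Keep only splits with at least two leaves on each side.
        if len(side) > 1 and len(other) > 1:
            # Store the split as an unordered pair of sides, so that
            # (A, B) | (C, D) and (C, D) | (A, B) compare equal.
            found.add(frozenset([side, other]))
        for child in node:
            visit(child)

    visit(tree)
    return found


tree = (("A", "B"), ("C", "D"))
for split in splits(tree):
    print(sorted(sorted(side) for side in split))
```

For the four-leaf tree above, the only non-trivial split is the one from the example: (A, B) | (C, D).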

The RF distance counts the splits that appear in one tree but not in the other.


Formula

RF(T1, T2) = (number of splits in T1 not in T2) + (number of splits in T2 not in T1)
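The formula can be sketched directly with Python sets, assuming each tree has already been reduced to its set of splits. The function names `split` and `rf_distance` and the two five-leaf trees are hypothetical examples, not code or data from this project.

```python
def split(side_a, side_b):
    """Build an unordered split from two groups of leaf labels."""
    return frozenset([frozenset(side_a), frozenset(side_b)])


def rf_distance(splits_t1, splits_t2):
    """Count the splits found in exactly one of the two trees."""
    only_in_t1 = splits_t1 - splits_t2
    only_in_t2 = splits_t2 - splits_t1
    return len(only_in_t1) + len(only_in_t2)


# Two illustrative trees on the leaves A, B, C, D, E.
t1 = {split("AB", "CDE"), split("ABC", "DE")}
t2 = {split("AB", "CDE"), split("ABD", "CE")}

print(rf_distance(t1, t2))  # 2: one split unique to each tree
print(rf_distance(t1, t1))  # 0: identical split sets
```

Both trees share the split (A, B) | (C, D, E), so only the remaining split of each tree contributes to the distance.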

Properties

  • RF = 0 → trees have the same topology
  • Higher RF → trees are more different (for two unrooted binary trees with n leaves, the maximum is 2(n - 3))
  • Only topology is considered (not branch lengths)

🌳 RF Distance in Bioinformatics

In bioinformatics, phylogenetic trees represent evolutionary relationships between species or genes.

Each split corresponds to:

  • a separation between groups of species
  • a hypothesis about common ancestry

Interpretation

  • If two trees have similar splits → they represent similar evolutionary hypotheses
  • If RF distance is high → the trees suggest different evolutionary histories

Role in this project

RF distance is used to:

  • quantify differences between phylogenetic trees
  • provide a distance (dissimilarity) measure for clustering
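To illustrate how RF distances could feed a clustering step, the sketch below builds a pairwise distance matrix for a small collection of trees. It assumes each tree is already represented as a set of splits. The names `split`, `rf_distance`, and `pairwise_rf` and the three example trees are hypothetical, and this is not the project's own implementation.

```python
from itertools import combinations


def split(side_a, side_b):
    """Build an unordered split from two groups of leaf labels."""
    return frozenset([frozenset(side_a), frozenset(side_b)])


def rf_distance(splits_t1, splits_t2):
    """Size of the symmetric difference between two split sets."""
    return len(splits_t1 ^ splits_t2)


def pairwise_rf(trees):
    """Return a symmetric matrix of RF distances between all pairs of trees."""
    n = len(trees)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        distance = rf_distance(trees[i], trees[j])
        matrix[i][j] = matrix[j][i] = distance
    return matrix


# Three illustrative trees on the leaves A, B, C, D, E.
trees = [
    {split("AB", "CDE"), split("ABC", "DE")},
    {split("AB", "CDE"), split("ABD", "CE")},
    {split("AC", "BDE"), split("ACD", "BE")},
]

for row in pairwise_rf(trees):
    print(row)
```

In this example the first two trees are closer to each other (distance 2) than either is to the third (distance 4), which is the kind of signal a clustering algorithm can use to group similar trees.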