Matches over 50 cM - jonathanbrecher/sharedclustering GitHub Wiki

Matches over 50 cM

By default, Shared Clustering will generate clusters that include matches over 50 cM. Most people have a reasonable number of matches over 50 cM. The clusters generated from matches over 50 cM are also usually fairly easy to interpret, often showing four distinct regions that include matches related to the test taker through each of their four grandparents.

Clusters over 50 cM

In this example, the test taker happened to have more matches on their maternal side than on their paternal side, so the two maternal-side clusters are larger than the two on the paternal side. The test taker also had several maternal half-siblings and first cousins among their matches; those relatives share both maternal grandparents with the test taker and form a thick band connecting the two clusters of other relatives who share only one of the test taker's maternal grandparents.

Clusters over 50 cM, annotated

In rare cases, the matches over 50 cM may form eight clusters that represent great-grandparents instead of four clusters that represent grandparents.

The great-grandparent clusters are very impressive. Unfortunately, clusters are rarely that clear. This needs an unusual combination of grandparents who came from different geographic areas (or ethnicities), and a good number of relatives descended from each great-grandparent who happened to get themselves tested.

The example shown here is the clearest that I've seen. It was provided by a member of the Shared Clustering User Group on Facebook. He also shared that his parents met in the military and originally came from very different areas. His grandparents came from fairly different areas as well.

The tester here actually identified all 9 of these clusters, representing all 8 of his great-grandparents, plus one great-great-grandparent. I'm impressed!

Ideal case of eight(ish) great-grandparents

The science behind 50 cM matches

The 50 cM cutoff is not arbitrary. Visible clusters form when the matches in a cluster match each other better than they match anyone outside the cluster. The matches might not all share the exact same DNA segments, but most of the different pairs of matches within the cluster should share some DNA segment between them.
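One way to picture "match each other better than anyone else" is to compare how many pairs share DNA inside a group with how many pairs share DNA across groups. The sketch below is only an illustration of that idea, not the algorithm Shared Clustering itself uses. The match names and the shared-match pairs are invented for the example.

```python
from itertools import combinations

# Hypothetical shared-match pairs (invented names, not real data).
shared = {
    frozenset(pair) for pair in [
        ("m1", "m2"), ("m1", "m3"), ("m2", "m3"),  # one tight group
        ("p1", "p2"), ("p1", "p3"), ("p2", "p3"),  # a second tight group
        ("m3", "p1"),                              # a single stray cross-link
    ]
}

def density_within(group):
    """Fraction of pairs inside one group that are shared matches."""
    pairs = list(combinations(sorted(group), 2))
    hits = sum(frozenset(p) in shared for p in pairs)
    return hits / len(pairs)

def density_between(group_a, group_b):
    """Fraction of cross-group pairs that are shared matches."""
    pairs = [(a, b) for a in group_a for b in group_b]
    hits = sum(frozenset(p) in shared for p in pairs)
    return hits / len(pairs)

group_m = {"m1", "m2", "m3"}
group_p = {"p1", "p2", "p3"}

within_m = density_within(group_m)
within_p = density_within(group_p)
between = density_between(group_m, group_p)

print(f"within group m: {within_m:.2f}")
print(f"within group p: {within_p:.2f}")
print(f"between groups: {between:.2f}")
```

In this toy data every pair inside a group is a shared match, while only one of the nine cross-group pairs is. A gap like that is what makes a cluster stand out as a distinct block.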

Sometimes the DNA sharing is quite complex within a cluster. Match A might match B, B match C, C match D and E, while D and E both match A.
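The A-through-E example can be written down as a small graph to show that the five matches still hang together as one group even though not every pair matches directly. This is a minimal sketch for illustration only. The breadth-first search here is a generic technique, not a description of how Shared Clustering builds its clusters.

```python
from collections import deque

# Shared-match pairs taken from the example in the text:
# A-B, B-C, C-D, C-E, and D and E both match A.
pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "E"), ("D", "A"), ("E", "A")]

def build_graph(pairs):
    """Build an undirected adjacency map from a list of matching pairs."""
    graph = {}
    for x, y in pairs:
        graph.setdefault(x, set()).add(y)
        graph.setdefault(y, set()).add(x)
    return graph

def connected_group(graph, start):
    """Return every match reachable from `start` through shared matches."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node] - seen:
            seen.add(neighbor)
            queue.append(neighbor)
    return seen

graph = build_graph(pairs)
group = connected_group(graph, "A")

direct_pairs = len(pairs)
possible_pairs = len(group) * (len(group) - 1) // 2

print(sorted(group))
print(f"{direct_pairs} of {possible_pairs} possible pairs match directly")
```

All five matches end up in one connected group, but only 6 of the 10 possible pairs match each other directly. For instance, A and C do not match each other, and neither do B and D.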

Matches over 50 cM almost always share more than one DNA segment with the test taker and with each other. The clusters "hold together" best when the matches share more than one DNA segment.

Matches under 50 cM may share more than one DNA segment with the test taker. The further under 50 cM, the less likely that the match will share more than one segment with the test taker.

When generating clusters over 20 cM, most of the matches will share only a single segment with the test taker. It's still possible to generate clusters, but at the 20 cM level each cluster usually represents a single segment. In other words, all members of a 20 cM cluster share the exact same DNA segment with one another.
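To see how a cutoff changes the mix of single-segment and multi-segment matches in your own match list, a quick tally like the one below can help. This is a rough sketch under stated assumptions: the `Match` record, its field names, and the toy numbers are all invented for illustration and do not come from Shared Clustering or from real test results. Treating the cutoff as inclusive (`>=`) is also an assumption.

```python
from dataclasses import dataclass

@dataclass
class Match:
    # Hypothetical record layout; field names are assumptions.
    name: str
    shared_cm: float   # total shared centimorgans with the test taker
    segments: int      # number of shared DNA segments with the test taker

def matches_over(matches, min_cm):
    """Keep matches at or above the cutoff (inclusive cutoff is an assumption)."""
    return [m for m in matches if m.shared_cm >= min_cm]

def single_segment_fraction(matches):
    """Fraction of the given matches that share exactly one segment."""
    if not matches:
        return 0.0
    return sum(m.segments == 1 for m in matches) / len(matches)

# Invented toy values, for illustration only.
toy_matches = [
    Match("cousin_a", 212.0, 9),
    Match("cousin_b", 88.0, 4),
    Match("cousin_c", 54.0, 2),
    Match("cousin_d", 31.0, 1),
    Match("cousin_e", 24.0, 1),
    Match("cousin_f", 22.0, 2),
]

over_50 = matches_over(toy_matches, 50)
over_20 = matches_over(toy_matches, 20)
between_20_and_50 = [m for m in over_20 if m.shared_cm < 50]

print(len(over_50), single_segment_fraction(over_50))
print(len(between_20_and_50), single_segment_fraction(between_20_and_50))
```

The toy numbers are arranged to mirror the pattern described above and prove nothing on their own. The useful part is running the same tally against a real match list to see where single-segment matches start to dominate.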

At the 20 cM level, each cluster might contain extremely distant matches, including matches whose most recent common ancestor is 8-15 generations distant.

The 50 cM clusters are easier to interpret because the members of each cluster pretty reliably have a common ancestor no more distant than the second or third great-grandparents of the test taker.