Home - jonathanbrecher/sharedclustering GitHub Wiki

Shared Clustering is a tool that allows an advanced or expert genetic genealogist to extract more information, and more useful information, from Ancestry DNA shared match lists. Rather than providing a single list of matches ordered only by the strength of the match, Shared Clustering divides that list into smaller clusters of matches that are likely related to each other. Knowing which matches are related to each other can often be a huge help in figuring out how those people are related to you, the test taker.

Click here to download the Shared Clustering application for Windows

If the previous link doesn't work for you, click here to download the same thing in *.zip format. But try the main link first. It seems to work fine for most people.

See also an introductory video, about 15 minutes long.

What is a cluster?

Each cluster contains a group of people who are more similar to each other than to other matches. Similarity is determined by comparing the shared match lists of each match. In a perfect cluster, every member of the cluster appears in the shared match lists of every other member. Not all clusters are perfect. Clusters reflect biology, and biology is not perfect. The idea behind clusters is similarity, not perfection.
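The "perfect cluster" definition can be made concrete with a short sketch. This is only an illustration in Python, not the code Shared Clustering itself runs: the function names, the `shared_matches` dictionary, and the people in it are all hypothetical. It checks whether every member of a group appears in the shared match list of every other member, and reports how close an imperfect group comes to that ideal.

```python
from itertools import permutations


def is_perfect_cluster(members, shared_matches):
    """Return True if every member appears in every other member's shared match list."""
    return all(
        b in shared_matches.get(a, set())
        for a, b in permutations(members, 2)
    )


def cluster_density(members, shared_matches):
    """Fraction of ordered pairs (a, b) where b appears in a's shared match list.

    A perfect cluster has density 1.0; lower values mean a looser cluster.
    """
    pairs = list(permutations(members, 2))
    if not pairs:
        return 1.0  # a group of zero or one person is trivially perfect
    hits = sum(b in shared_matches.get(a, set()) for a, b in pairs)
    return hits / len(pairs)


# Hypothetical shared match lists, keyed by match name (invented data).
shared_matches = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Alice", "Carol"},
    "Carol": {"Alice", "Bob", "Dave"},
    "Dave": {"Carol"},
}

print(is_perfect_cluster(["Alice", "Bob", "Carol"], shared_matches))
print(is_perfect_cluster(["Alice", "Bob", "Carol", "Dave"], shared_matches))
print(cluster_density(["Alice", "Bob", "Carol", "Dave"], shared_matches))
```

In this invented data, Alice, Bob, and Carol form a perfect cluster. Adding Dave makes the group imperfect, because Dave shares a match list entry only with Carol, yet the group as a whole is still fairly similar.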

In most cases, a cluster represents a group of people who have some DNA segment shared among them. That makes sense. Clusters are built from the shared match lists reported by Ancestry, and Ancestry builds those lists from the DNA data.

Clusters are displayed visually as red squares in a spreadsheet, along with other identifying information about each match within the cluster:

[Figure: Clusters overview]
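To see why clusters tend to show up as squares, it can help to draw a tiny match-by-match grid by hand. The sketch below is an assumption-laden illustration, not the application's actual output format: `render_matrix`, the single-letter match names, and the `#`/`.` characters are all made up here. Each row and column is one match, and a cell is filled when the column's match appears in the row's shared match list (or is the same person).

```python
def render_matrix(ordered_names, shared_lists):
    """Return one text row per match; '#' marks a shared match, '.' marks none."""
    rows = []
    for a in ordered_names:
        cells = "".join(
            "#" if a == b or b in shared_lists.get(a, set()) else "."
            for b in ordered_names
        )
        rows.append(cells)
    return rows


# Hypothetical data: A, B, C all share matches with each other, as do D and E.
shared_lists = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"E"},
    "E": {"D"},
}

# Listing members of the same group next to each other is what makes
# each group appear as a filled square along the diagonal.
for row in render_matrix(["A", "B", "C", "D", "E"], shared_lists):
    print(row)
```

With the matches ordered so that related people sit side by side, the grid shows a 3x3 filled block for A, B, and C, followed by a 2x2 block for D and E.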