Matches over 90 cM - jonathanbrecher/sharedclustering GitHub Wiki
Many people are tempted to start exploring their cluster diagrams by generating clusters that show only their closest matches, for example only their matches over 90 cM. This is usually a bad idea. Limiting the matches included in a cluster diagram will indeed produce a smaller diagram. A smaller diagram is not always a simpler diagram, though.
For many people, there simply aren't enough matches over 90 cM to produce a diagram worth looking at. These, like all examples on this page, are real examples from real people:
Looking at only the matches over 90 cM "should" limit the matches to third cousins and closer. That means a cluster diagram "should" show four distinct clusters, one for the matches related to the test taker through each of the test taker's four grandparents. This is the idea behind the Leeds Method.
That's the idea. It doesn't always work.
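The grouping idea behind the Leeds Method can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Shared Clustering builds its diagrams: it assumes a hypothetical data model where each match maps to a `(cm, shared_matches)` pair, assumes shared-match lists are symmetric, and mimics the manual method of taking the strongest not-yet-grouped match and starting a new group with it plus all of its shared matches in range. The function name `leeds_groups`, its `min_cm`/`max_cm` parameters, and the `EXAMPLE` data are all made up for illustration.

```python
"""Minimal Leeds-style grouping sketch (hypothetical data model)."""

# Hypothetical data: match name -> (total shared cM, names of shared matches).
# Shared-match lists are assumed to be symmetric.
EXAMPLE = {
    "A": (300.0, {"B", "C", "G"}),
    "B": (150.0, {"A"}),
    "C": (95.0, {"A"}),
    "D": (210.0, {"E"}),
    "E": (120.0, {"D"}),
    "F": (100.0, set()),
    "G": (60.0, {"A"}),
}


def leeds_groups(matches, min_cm=90.0, max_cm=None):
    """Group matches in a cM range, strongest unassigned match first."""
    in_range = {
        name
        for name, (cm, _) in matches.items()
        if cm >= min_cm and (max_cm is None or cm <= max_cm)
    }
    # Process the closest matches first, as in the manual method.
    ordered = sorted(in_range, key=lambda name: matches[name][0], reverse=True)
    groups = []
    assigned = set()
    for name in ordered:
        if name in assigned:
            continue
        # New group: this match plus every in-range shared match.
        # A match can land in more than one group (like getting two "colors").
        group = {name} | (matches[name][1] & in_range)
        groups.append(group)
        assigned |= group
    return groups


if __name__ == "__main__":
    for threshold in (90.0, 50.0):
        print(threshold, leeds_groups(EXAMPLE, min_cm=threshold))
```

With this made-up data, the 90 cM cutoff yields only three groups rather than four, which is the kind of result the examples on this page illustrate. Lowering `min_cm` to 50 pulls the 60 cM match `G` into the first group.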
Some people happen to have more relatives who have been tested on one branch of the family than on others. This has nothing to do with the test taker, and nothing to do with genealogy. It's mostly a matter of random chance and of which relatives happened to get tested.
In this example, the cluster at the top left probably represents relatives through one grandparent. The cluster in the middle has a large amount of gray showing an association with the first cluster, so it probably represents relatives through that grandparent's spouse. By process of elimination, the smaller area at the bottom right likely represents relatives through the other two grandparents in some way:
This example shows two clear clusters at the top left and a large one at the bottom right. That could mean that there are no relatives via the fourth grandparent who got tested:
This example is even less clear, with one large cluster at top left and maybe as few as one or as many as three clusters (depending on how you count) at bottom right:
This example is from an Ashkenazi test taker with significant endogamy. The two clusters in the middle of the diagram really do represent relatives through two different grandparents. The rest of the diagram has basically no information. The sort-of cluster at the bottom right is simply endogamic noise with no genealogical significance:
Sometimes a tester will get lucky, and they really will be able to see their matches nicely divided into four clear, distinct clusters:
Even when there are four clusters, the clusters are rarely split evenly:
And sometimes there might even be more than four clusters:
There is certainly no harm in generating cluster diagrams that contain only your closest matches. Just keep in mind that diagrams of this sort tend to be very difficult to interpret, and that they also tend to look very different from cluster diagrams generated from your full set of matches. Don't let yourself get distracted by spending too long on an over-90 cM cluster diagram. There is MUCH more information to be learned when you include the rest of your matches in your clusters.
Clusters over 90 cM can be generated using the Advanced Options section of the Cluster tab. By default, the closest clusters that Shared Clustering generates are those for matches over 50 cM.