Interpreting clusters with red overlap, Example 1 - jonathanbrecher/sharedclustering GitHub Wiki

Here is an excellent example showing red overlap between clusters. It was provided by a member of the Shared Clustering User Group and is reposted with permission.

At first glance, this might look like a very complex cluster, one that would be intimidating to analyze. In fact, a lot of useful information can be gleaned directly from the cluster diagram.

Full example

There are three areas of red overlap in this diagram, each forming a plus shape. If you don't see them right away, here is each one highlighted:

Overlap area #1

Overlap area #2

Overlap area #3

These plus-shaped areas are not accidents. They are an important part of the information in the clusters. One easy way to see them is to look at the values in the Shared Centimorgans column. Almost all of the matches in this diagram have shared centimorgan values between 20 and 30 cM, except in the plus-shaped areas. You can enlarge one of the images above to see the raw data, or see the values for the three regions enlarged here:
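The same check can be done on exported data instead of by eye. The sketch below assumes the matches are available as (name, shared cM) pairs, for example copied from the Shared Centimorgans column. The function name `flag_above_typical` and the 30 cM cutoff are illustrative assumptions based on the 20-30 cM range in this example, not part of Shared Clustering itself.

```python
def flag_above_typical(matches, typical_max_cm=30.0):
    """Return the names of matches whose shared cM exceeds the typical range.

    matches: iterable of (name, shared_cm) pairs (hypothetical input format).
    typical_max_cm: assumed upper end of the usual range for this diagram.
    """
    return [name for name, shared_cm in matches if shared_cm > typical_max_cm]


# Hypothetical sample data, not taken from the diagram above.
sample = [("Match A", 22.4), ("Match B", 27.9), ("Match C", 46.1), ("Match D", 25.0)]
print(flag_above_typical(sample))  # prints ['Match C']
```

Matches flagged this way are only candidates for the overlap areas; the cutoff would need adjusting for a diagram built at a different cM threshold.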

Centimorgan values in overlap areas

At the 20 cM level used for this diagram, each cluster usually represents a single DNA segment. The plus-shaped regions are areas of overlap between two clusters, and the matches in these regions have segments associated with both clusters. It follows naturally that the matches who share both segments also have higher shared centimorgan values than the matches who share only one of those segments, since the shared total includes both segments.
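To make this reasoning concrete, here is a minimal sketch that separates overlap matches from single-cluster matches and compares their average shared cM. It assumes you have already recorded, for each match, its shared cM and the set of clusters its red cells fall in; that data layout, the function `split_by_overlap`, and the sample values are all hypothetical.

```python
from statistics import mean


def split_by_overlap(memberships):
    """Split matches into overlap (2+ clusters) and single-cluster groups.

    memberships: dict mapping match name -> (shared_cm, set of cluster ids).
    Returns two dicts mapping match name -> shared_cm.
    """
    overlap, single = {}, {}
    for name, (shared_cm, clusters) in memberships.items():
        # A match with red cells in two or more clusters sits in an overlap area.
        target = overlap if len(clusters) >= 2 else single
        target[name] = shared_cm
    return overlap, single


# Hypothetical data, not taken from the diagram in this example.
matches = {
    "A": (24.0, {1}),
    "B": (27.5, {2}),
    "C": (21.0, {2}),
    "D": (48.0, {1, 2}),
}
overlap, single = split_by_overlap(matches)
print(round(mean(overlap.values()), 1), round(mean(single.values()), 1))  # prints 48.0 24.2
```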

That information also translates to genealogy. This tester was fortunate to identify several of the matches in this cluster and place them in their tree exactly:

Identified relatives in overlap areas

Generally speaking, the matches within the overlap areas are genealogically closer relatives than the ones in a single cluster on either side of the overlap. In this example, most of the identified matches in single clusters turned out to be 6Cs and 7Cs (sixth and seventh cousins), while the ones in overlap areas tended to be 5Cs.

It's unfortunate that none of the matches in the largest overlap area, near the top of the diagram, have been identified. This overlap area is both the physically largest (it stretches the entire height and width of this cluster) and the one with the highest shared centimorgan values. Based on the clusters alone, it would make sense if these matches turned out to be 4Cs or 3Cs to the test taker. Maybe further research will confirm that prediction!
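The prioritization suggested here (look first at the overlap area that is largest and has the highest shared cM) can be written as a simple sort. This is only a sketch of that heuristic: the area labels and cM values below are made up, `rank_overlap_areas` is a hypothetical helper rather than a Shared Clustering feature, and each area is assumed to contain at least one match.

```python
from statistics import mean


def rank_overlap_areas(areas):
    """Return area labels ordered from most to least promising.

    areas: dict mapping a label -> non-empty list of shared cM values.
    Areas with more matches come first; ties are broken by higher mean cM.
    """
    return sorted(
        areas,
        key=lambda label: (len(areas[label]), mean(areas[label])),
        reverse=True,
    )


# Hypothetical overlap areas, not the values from this example.
areas = {
    "area X": [41.0, 44.5],
    "area Y": [52.0, 58.5, 49.0, 61.0],
    "area Z": [39.0, 42.0, 40.5],
}
print(rank_overlap_areas(areas))  # prints ['area Y', 'area Z', 'area X']
```

Other weightings are possible; this ordering simply mirrors the two signals named above, size first and shared cM second.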