Interpreting clusters with red overlap - jonathanbrecher/sharedclustering GitHub Wiki

The key to getting the most information out of a cluster analysis is understanding the various ways that clusters can overlap. Overlap is easiest to interpret when it involves the red areas of the clusters.

Overlapping clusters with small overlap

[Image: Overlapping clusters with a small overlap]

Red overlap between clusters represents matches that share more than one DNA segment with the test taker. These are some of the most useful parts of a clustering analysis for several reasons.

First, because the matches within the overlap area share (at least) two DNA segments with the test taker, they tend to have much higher total shared centimorgans. In the example above, nearly all of the matches have shared centimorgans in the 20-25 cM range -- except for the one match that overlaps both clusters. That one came in at 160 cM, comfortably in the second to third cousin range for this test taker. That's strong enough that you might have already identified the shared ancestor for that match. That's a huge help for identifying the two clusters that overlap.
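As a rough illustration of the point above, the following Python sketch picks out the matches that sit in more than one cluster and lists them strongest first. The record layout (`name`, `shared_cm`, `clusters`) and the match names are hypothetical, not the format of any particular tool's output; the centimorgan values simply mirror the example described above.

```python
# Hypothetical match records: the field names and match names are
# assumptions made for this sketch.
matches = [
    {"name": "Match A", "shared_cm": 22.0, "clusters": {1}},
    {"name": "Match B", "shared_cm": 24.0, "clusters": {1}},
    {"name": "Match C", "shared_cm": 160.0, "clusters": {1, 2}},
    {"name": "Match D", "shared_cm": 21.0, "clusters": {2}},
    {"name": "Match E", "shared_cm": 23.0, "clusters": {2}},
]


def overlap_matches(records):
    """Return the matches that belong to more than one cluster, strongest first."""
    in_overlap = [m for m in records if len(m["clusters"]) > 1]
    return sorted(in_overlap, key=lambda m: m["shared_cm"], reverse=True)


for match in overlap_matches(matches):
    print(match["name"], match["shared_cm"], sorted(match["clusters"]))
```

The matches at the top of this list are usually the best candidates to research first, since they are the ones most likely to have an identifiable shared ancestor.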

Second, if you can identify the common ancestor you share with the overlapping match, you know that the DNA segments represented by those two clusters coexisted in that ancestral pair. This is a GOLDEN tool for breaking through brick walls. You don't know immediately whether the two segments co-descended, both from the nth-great grandfather or grandmother, or whether they descended independently, one each from the paternal and maternal branches. That's a subject for further research!

Overlapping clusters with large overlap

[Image: Overlapping clusters with a large overlap]

Overlapping clusters with a large overlap tend to be much less useful than ones with small overlap. These clusters typically represent a largish segment shared among the matches in the overlap area, with the two non-overlapping clusters representing slightly smaller segments on the "left side" or the "right side" of the larger ones. In the example above, matches in the overlap area had shared centimorgans in the 25-30 cM range, while the rest were in the 20-25 cM range.

Clusters of this sort likely represent a long period of descent, giving time for the presumed original longer segment to get shorter in various ways on either end. Members of this sort of overlapping cluster likely are related through a single common line of descent. That's a big difference from the clusters with small overlap shown earlier. There may be a large spread of common ancestors for the matches within these clusters, with a single cluster possibly including a mix of third cousins through eighth or even more distant cousins.

Overlapping clusters with large overlap but only a single segment

[Image: Overlapping clusters with a large overlap but only a single segment]

The Shared Segments column can often tell the story. In this example, nearly all of the 57 matches in this diagram share only one segment with the test taker. That includes the matches in the overlap area!
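One way to run this check on your own data is to count how many of the overlap-area matches share a single segment and how many share several. The sketch below assumes each match is a dictionary with hypothetical `clusters` and `shared_segments` fields; the sample values are invented for illustration.

```python
# Hypothetical records: "shared_segments" stands in for the Shared Segments
# column, and "clusters" lists the clusters each match appears in.
matches = [
    {"name": "Match A", "shared_segments": 1, "clusters": {1}},
    {"name": "Match B", "shared_segments": 1, "clusters": {1, 2}},
    {"name": "Match C", "shared_segments": 1, "clusters": {1, 2}},
    {"name": "Match D", "shared_segments": 2, "clusters": {1, 2}},
    {"name": "Match E", "shared_segments": 1, "clusters": {2}},
]


def summarize_overlap_segments(records):
    """Count overlap-area matches by whether they share one segment or several."""
    overlap = [m for m in records if len(m["clusters"]) > 1]
    single = sum(1 for m in overlap if m["shared_segments"] == 1)
    return {
        "overlap_matches": len(overlap),
        "single_segment": single,
        "multi_segment": len(overlap) - single,
    }


print(summarize_overlap_segments(matches))
```

If nearly all of the overlap-area matches land in the single-segment count, the cluster probably belongs to the case described in this section rather than the two-segment case described earlier.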

The matches in the overlap region share 23-28 cM with the test taker, while the others share 20-24 cM with the test taker. This has to be an example of a single 28 cM segment that was shortened at one end or the other. When two matches carry versions of the segment that were shortened at opposite ends by just enough, they share less than 20 cM with each other -- below the cutoff, so Ancestry does not report them as shared matches at all.
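The arithmetic behind this can be sketched with simple intervals. The example below makes a simplifying assumption that positions along the original 28 cM segment can be measured directly in centimorgans, and the start and end values for each match are invented for illustration. The 20 cM cutoff is the one mentioned above.

```python
SHARED_MATCH_CUTOFF_CM = 20.0  # cutoff mentioned in the text above


def overlap_cm(a, b):
    """Length in cM of the overlap between two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def are_shared_matches(a, b, cutoff=SHARED_MATCH_CUTOFF_CM):
    """True if two matches overlap by at least the cutoff."""
    return overlap_cm(a, b) >= cutoff


# Hypothetical surviving pieces of one original 28 cM segment (0 to 28).
left_piece = (0.0, 23.0)   # lost 5 cM from the right end, 23 cM remain
right_piece = (5.0, 28.0)  # lost 5 cM from the left end, 23 cM remain
long_piece = (2.0, 28.0)   # lost only 2 cM, 26 cM remain

print(overlap_cm(left_piece, right_piece))          # 18.0, below the cutoff
print(are_shared_matches(left_piece, right_piece))  # False
print(are_shared_matches(left_piece, long_piece))   # True (21 cM overlap)
print(are_shared_matches(right_piece, long_piece))  # True (23 cM overlap)
```

The match carrying the long piece appears as a shared match to both of the others, so it lands in the overlap area, while the two shortened pieces fall into separate clusters even though all three descend from the same segment.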

Of course the next question is 'What does this mean to me?'

In the 'normal' case of two segments, the matches in the overlap area usually indicate more recent relatives to the test taker than the rest.

In this case, since there is definitely only one segment involved, it might mean that the matches in the overlap area indicate more distant matches, following a path of descent where the original longer segment didn't get shortened.

More research is needed in this area. Do you have an example like this in your own clusters, with enough identified matches to say what this sort of cluster means in your case?