Interpreting clusters with red overlap, Example 2 - jonathanbrecher/sharedclustering GitHub Wiki

Here is another excellent example showing red overlap between clusters. It was provided by a member of the Shared Clustering User Group and is reposted with permission.

This is another example of a cluster that might look very complex and intimidating to analyze. As with the first example, a lot of very useful information can be gleaned directly from the cluster diagram.

(The examples here are very large. You can right-click on these images and expand them at full size in a new window, or you can download the original Excel data.)

[Figure: Full example]

There are several important details here, all tied together.

The entire cluster represents only a single segment

The most important detail is that most of the matches in the entire cluster (85 out of 125) share only a single segment with the test taker. Without access to a chromosome browser we can't tell which segment this cluster represents, but we know that we're talking about only one segment, somewhere.

It is true that some of the matches share more than one segment. That's OK. It just means that those people share this segment plus some other segment(s) elsewhere. The important part is that this cluster can be explained as tied to only a single segment. In fact, if you exclude all of the matches that share more than one segment, the cluster containing only the remaining single-segment matches looks essentially unchanged from the original:

[Figure: Single-segment matches]
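If the match list has been exported with a segment count for each match, the filtering step described above is simple to sketch. The `Match` record and its field names below are hypothetical stand-ins, not the columns of any particular export, and the sample values are invented:

```python
from dataclasses import dataclass


@dataclass
class Match:
    # Hypothetical record; the field names are illustrative, not a real export format.
    name: str
    shared_cm: float
    shared_segments: int


def single_segment_matches(matches):
    """Keep only the matches that share exactly one segment with the test taker."""
    return [m for m in matches if m.shared_segments == 1]


# Invented sample data, not taken from the example cluster.
matches = [
    Match("A", 48.0, 1),
    Match("B", 26.0, 1),
    Match("C", 35.5, 2),
    Match("D", 30.9, 1),
]

kept = single_segment_matches(matches)
print([m.name for m in kept])  # ['A', 'B', 'D']
print(f"{len(kept)} of {len(matches)} matches share a single segment")
```

For the cluster on this page, the same count gives 85 of 125 matches, or 68%.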

The matches in the long, thick band near the bottom also share only a single segment with the test taker. The largest of those is 48.0 cM. So we know that everything here is coming from a single, large 48.0 cM segment.

[Figure: Largest single-segment match]

Ancestry requires at least 20 cM to report a shared match

Ancestry reports a shared match only when every pair among the three people (the test taker, the match, and the shared match) shares at least 20 cM. If any pair of those people shares less than 20 cM, that pair is reported as "not a match".
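The rule as described here can be written as a small predicate. This is a sketch of the rule as this page states it, not Ancestry's actual implementation; `THRESHOLD_CM` and the function names are hypothetical:

```python
THRESHOLD_CM = 20.0  # shared-match threshold as stated on this page


def pair_counts(shared_cm: float) -> bool:
    """True if one pair of people shares enough DNA to count as a match."""
    return shared_cm >= THRESHOLD_CM


def reported_as_shared(tester_to_match: float,
                       tester_to_shared: float,
                       match_to_shared: float) -> bool:
    """A shared match is reported only if all three pairs clear the threshold."""
    return all(pair_counts(cm) for cm in (tester_to_match, tester_to_shared, match_to_shared))


print(reported_as_shared(48.0, 26.0, 26.0))  # True
print(reported_as_shared(48.0, 26.0, 18.0))  # False: one pair is below 20 cM
```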

That all leads to some interesting results.

Obviously, if one person shares the first 20 cM of a 48 cM segment, and another shares the last 20 cM, those people don't have anything in common and won't match at all.

[Figure: Largest single-segment match]

Less obviously, if one person shares the first 20 cM of a 48 cM segment, and another person skips the first 2 cM and shares only the stretch from 2 cM to 22 cM, those people won't show as a match either. They DO match, with an overlap of 18 cM. But when it comes to shared matches, Ancestry reports an 18 cM match as "not a match".

[Figure: Largest single-segment match]
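Treating each shared piece as an interval of positions (in cM) along the 48 cM segment, the two cases above come down to a simple overlap calculation. This sketch assumes, as the text does, that the overlap is all the two matches share with each other; the function name `overlap_cm` is hypothetical:

```python
THRESHOLD_CM = 20.0  # shared-match threshold as stated on this page


def overlap_cm(a, b):
    """Length of the overlap between two (start, end) intervals, in cM."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0.0, end - start)


first = (0.0, 20.0)    # the first 20 cM of the 48 cM segment
last = (28.0, 48.0)    # the last 20 cM
shifted = (2.0, 22.0)  # skips the first 2 cM

for other in (last, shifted):
    cm = overlap_cm(first, other)
    print(cm, "reported" if cm >= THRESHOLD_CM else "not a match")
# 0.0 not a match
# 18.0 not a match
```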

All of this means that it is quite possible to have a collection of segments that all overlap a single larger segment of 48 cM, while the matches with the smaller segments may or may not be reported as shared matches with each other.
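A small sketch makes this concrete. The three positions below are invented; only their lengths (26.0, 30.9, and 26.7 cM) echo the sizes mentioned later on this page. All three pieces lie inside the same 48 cM segment and all three overlap each other, yet only one pair overlaps by 20 cM or more:

```python
from itertools import combinations

THRESHOLD_CM = 20.0  # shared-match threshold as stated on this page


def overlap_cm(a, b):
    """Length of the overlap between two (start, end) intervals, in cM."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


# Hypothetical pieces of one 48 cM segment, as (start, end) positions in cM.
segments = {
    "P": (0.0, 26.0),
    "Q": (5.0, 35.9),
    "R": (21.3, 48.0),
}


def reported_pairs(segs, threshold=THRESHOLD_CM):
    """Pairs whose overlap is large enough to be reported as shared matches."""
    return {
        (x, y)
        for x, y in combinations(sorted(segs), 2)
        if overlap_cm(segs[x], segs[y]) >= threshold
    }


for x, y in combinations(sorted(segments), 2):
    print(x, y, round(overlap_cm(segments[x], segments[y]), 1))
print(reported_pairs(segments))
# P Q 21.0
# P R 4.7
# Q R 14.6
# {('P', 'Q')}
```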

Analysis

You can now go back to the original cluster diagram and draw some conclusions based on the subclusters that are visible:

[Figure: Boxed subclusters]

Based on the largest single-segment matches within the non-overlapping parts of the subclusters, this cluster diagram likely represents a single segment of 48 cM, with smaller segments of roughly 26.0, 30.9, 26.7 cM, and so on.
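Reading off those sizes amounts to taking the largest single-segment match in each subcluster. The rows below are hypothetical: the subcluster labels and cM values are placed for illustration only, and the sketch assumes the rows have already been limited to each subcluster's non-overlapping part:

```python
from collections import defaultdict

# Hypothetical (subcluster, shared_cm, shared_segments) rows, already limited
# to the non-overlapping part of each subcluster. Values are illustrative.
rows = [
    ("A", 26.0, 1),
    ("A", 22.4, 1),
    ("B", 30.9, 1),
    ("B", 41.0, 2),  # shares two segments, so it is ignored
    ("C", 26.7, 1),
]


def largest_single_segment(rows):
    """Largest single-segment match per subcluster, a rough size for that piece."""
    best = defaultdict(float)
    for subcluster, shared_cm, segments in rows:
        if segments == 1:
            best[subcluster] = max(best[subcluster], shared_cm)
    return dict(best)


print(largest_single_segment(rows))  # {'A': 26.0, 'B': 30.9, 'C': 26.7}
```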

You can also make predictions by looking at which subclusters overlap which other subclusters. For example, the top left subcluster (A) includes matches that do not overlap with any other subcluster. This subcluster is likely at one of the two ends of the overall 48 cM segment.

The second subcluster (B) overlaps (A), (C), and (F), but it does not overlap (D) or (E). Or at least it has no reported overlap with (D) or (E), which means that it could still overlap them by less than 20 cM.
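Each reported or unreported overlap therefore only bounds the true overlap. A minimal sketch, assuming the 20 cM threshold described above; the `observed` table simply restates the relationships of (B) given in the text, and the function name is hypothetical:

```python
THRESHOLD_CM = 20.0  # shared-match threshold as stated on this page


def overlap_range(reported: bool, threshold: float = THRESHOLD_CM):
    """Range of overlap (cM) consistent with the cluster diagram.

    Reported overlaps are at least `threshold` (the diagram gives no upper
    bound, shown as None). Unreported overlaps lie somewhere in [0, threshold).
    """
    return (threshold, None) if reported else (0.0, threshold)


# Overlaps of subcluster (B) with the others, as described in the text.
observed = {"A": True, "C": True, "F": True, "D": False, "E": False}

for other, reported in observed.items():
    low, high = overlap_range(reported)
    if high is None:
        print(f"B-{other}: at least {low} cM")
    else:
        print(f"B-{other}: {low} cM up to, but not including, {high} cM")
```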

This analysis can be continued, eventually predicting that these matches actually represent portions of the chromosome like this:

[Figure: Arrangement of subclusters]
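One way to test a proposed arrangement is to place each subcluster as an interval and check that it reproduces every reported and unreported overlap. In this sketch the positions are invented for illustration (they are not read from the figure), `observed` holds only the relationships for (B) stated above, and `inconsistent_pairs` is a hypothetical helper:

```python
THRESHOLD_CM = 20.0  # shared-match threshold as stated on this page


def overlap_cm(a, b):
    """Length of the overlap between two (start, end) intervals, in cM."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def inconsistent_pairs(layout, observed, threshold=THRESHOLD_CM):
    """Pairs whose overlap in `layout` disagrees with what the diagram reports."""
    bad = []
    for (x, y), reported in observed.items():
        predicted = overlap_cm(layout[x], layout[y]) >= threshold
        if predicted != reported:
            bad.append((x, y))
    return bad


# Reported (True) or unreported (False) overlaps for subcluster (B), per the text.
observed = {
    ("B", "A"): True, ("B", "C"): True, ("B", "F"): True,
    ("B", "D"): False, ("B", "E"): False,
}

# Hypothetical positions (start, end) in cM along the 48 cM segment.
layout = {
    "A": (0.0, 26.0), "B": (5.0, 35.9), "C": (10.0, 36.7),
    "F": (12.0, 40.0), "D": (20.0, 48.0), "E": (30.0, 48.0),
}

print(inconsistent_pairs(layout, observed))  # []
```

Moving (D) to start at 10.0 cM, for instance, would give it a 25.9 cM overlap with (B), and the check would flag `("B", "D")`.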

Of course, this is just a prediction. It could be confirmed exactly if all of the matches were willing to transfer their results to a site with a chromosome browser.

Conclusion

Unfortunately, only a few conclusions can be drawn from the cluster diagram itself.

With 4 matches sharing over 40 cM with the test taker, it is very likely that the large segment came from a specific shared ancestor. In this case, the large size of the segment also suggests that the shared ancestor might be recent enough to be easily traceable.

As for the other matches, the only certain information is that they don't share the full 48 cM segment. Somewhere along their path of descent, this segment recombined, and only shorter lengths remain shared with the test taker. This might be a clue leading to other conclusions. For example, it's possible that some distant common ancestor carried the full 48.0 cM segment and each of their children inherited a slightly smaller piece of it. In that case, each subcluster might contain a group of people who are more closely related to each other than they are to the test taker.

This cannot be proved from the cluster diagram alone. The cluster diagram simply reports which segments are present now, with no knowledge of how those segments got there in the first place. The rest needs to be confirmed by the test taker, using traditional sources to confirm or disprove what the clusters suggest.