Caveats - jonathanbrecher/sharedclustering GitHub Wiki

There can be exceptions to the general rule that clusters represent groups of people who share the same DNA segment.

Fortunately, exceptions seem to be, well, exceptions. Most clusters do seem to highlight people who share a DNA segment and are related through the genealogical path taken by that segment.

Small clusters

Small clusters can be influenced by quirks of the algorithm that identifies shared matches. If a cluster includes only a few matches, there is a chance that those people could be seen as shared matches for different reasons. For example, consider a three-person cluster containing Sam, Peter, and Mary. Suppose that Sam is related to Peter through some paternal segment, while Sam is related to Mary through some maternal segment. Then suppose that Peter and Mary are related to each other through some entirely different third segment. In this situation, all three people do match each other, so all three could be in the same cluster even though there is no single DNA segment that is shared by all three.
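The Sam, Peter, and Mary scenario can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not how the clustering tool works internally: the `shared_segments` table, the segment labels, and both helper functions are hypothetical names invented for this example.

```python
from itertools import combinations

# Hypothetical data: the segment labels that each pair of people share.
shared_segments = {
    frozenset({"Sam", "Peter"}): {"paternal segment"},
    frozenset({"Sam", "Mary"}): {"maternal segment"},
    frozenset({"Peter", "Mary"}): {"third segment"},
}


def all_pairs_match(people, shared):
    """True if every pair of people shares at least one segment."""
    return all(shared.get(frozenset(pair)) for pair in combinations(people, 2))


def segments_common_to_all(people, shared):
    """Segments that appear in every pairwise comparison of the group."""
    pair_sets = [shared.get(frozenset(pair), set())
                 for pair in combinations(people, 2)]
    return set.intersection(*pair_sets) if pair_sets else set()


cluster = ["Sam", "Peter", "Mary"]
print(all_pairs_match(cluster, shared_segments))         # True
print(segments_common_to_all(cluster, shared_segments))  # set()
```

Every pair matches, so the three people look like a tidy cluster, yet the set of segments common to all of them is empty.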

This concern is only a factor in very small clusters. As more people are included in a cluster, the commonalities increasingly reinforce themselves and force out the quirks. This ability to highlight the good information while minimizing the bad information is one of the great strengths of clustering.
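One rough way to see why size helps is to count pairs. The sketch below assumes, as a simplification that the text above does not state, that everyone in a cluster appears as a shared match of everyone else. Under that assumption, a cluster with no common segment would, in the worst case, need a separate coincidental link for every pair, and the number of pairs grows quickly. The function name `pairwise_links` is made up for this illustration.

```python
from math import comb


def pairwise_links(cluster_size):
    """Number of distinct pairs in a cluster of the given size."""
    return comb(cluster_size, 2)


for n in (3, 5, 10, 20):
    print(n, pairwise_links(n))
# 3 3
# 5 10
# 10 45
# 20 190
```

Three coincidences are plausible; 45 or 190 unrelated coincidences lining up are much less so.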

Large clusters

Even in large clusters, a shared segment is not guaranteed to be identical by descent (IBD). The segment could have been inherited from different sources and be merely identical by state (IBS). Clustering can't tell the difference.