Identifying maternal and paternal matches, example - jonathanbrecher/sharedclustering GitHub Wiki

This is a good example of using Shared Clustering in combination with other tools and information to identify an unknown father.

In the DNA Detectives Facebook group, a member (Rachel) asked about cases where all matches were maternal. She couldn't find any paternal matches at all. I was skeptical.

After she reached out to me offline, I'm pretty sure I've figured out what was going on. This is "the rest of the story", posted with her permission.

Cluster analysis

The first thing I did was generate clusters for the matches at Ancestry. Since she was looking for a bio-father, I limited the clusters to include only the matches at 90 cM or higher:

Clusters over 90 cM

I could easily see that there was one big cluster at the top left of the diagram and a double-cluster at the bottom right.
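The idea of thresholding the match list and then grouping people by their shared matches can be sketched in a few lines of Python. This is not how Shared Clustering builds its diagrams; it swaps in plain connected components over the shared-match graph as a much cruder stand-in. The match names, centimorgan values, and shared-match pairs below are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical data: match name -> shared centimorgans with the tester.
MATCHES = {"A": 800, "B": 410, "C": 120, "D": 396, "E": 150, "F": 95, "G": 45}

# Hypothetical shared-match pairs (the two people also match each other).
SHARED = [("A", "B"), ("B", "C"), ("D", "E"), ("E", "F"), ("C", "G")]


def cluster_matches(matches, shared, min_cm=90):
    """Keep matches at or above min_cm, then group them by connected components."""
    kept = {name for name, cm in matches.items() if cm >= min_cm}
    neighbors = defaultdict(set)
    for a, b in shared:
        # Ignore links that involve anyone below the threshold.
        if a in kept and b in kept:
            neighbors[a].add(b)
            neighbors[b].add(a)

    clusters, seen = [], set()
    for start in sorted(kept):
        if start in seen:
            continue
        # Depth-first walk collects everyone reachable from `start`.
        stack, group = [start], set()
        while stack:
            person = stack.pop()
            if person in group:
                continue
            group.add(person)
            stack.extend(neighbors[person] - group)
        seen |= group
        clusters.append(sorted(group))
    return clusters


clusters = cluster_matches(MATCHES, SHARED)
print(clusters)
```

With this grouping rule, a single link between two groups is enough to merge them, so it is far less forgiving of coincidental shared matches than a full cluster diagram.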

Without knowing anything about the tester, my first guess was that one cluster was maternal and one was paternal. If that was true, then it would follow that the double-cluster at bottom represented two grandparents, and maybe the other two grandparents weren't well separated in the cluster at top. Rachel knew that the top cluster was definitely maternal, so I focused on the bottom double-cluster.

Then I looked at trees. Amazingly, 16 out of 18 people in the bottom cluster had trees on Ancestry. The same uncommon name was present in four of those trees, so I could quickly throw those trees into the excellent What Are the Odds tool by DNA Painter.

What Are the Odds

That gives a strong hypothesis that the test taker is a grandchild of Sampson. According to the other public trees on Ancestry, Sampson had 9 children, but only two boys besides Thomas and John. Since the test taker is looking for a father, my guess is that it's down to one of those other two sons of Sampson. That's as far as I can get with the DNA. The next step would be to reach out to individual people.

I was able to build a tree for one other match in the lower double-cluster, who turned out to be a third cousin related through Sampson's wife. That supports the idea that the double-cluster includes matches through the tester's paternal grandparents. The four matches in the thick band in the middle of that double-cluster are likely 1C1R and 1C2R to the test taker, sharing both grandparents.

Cluster analysis hypothesis

"All maternal"?

All of which raises an interesting question... Why did it look like all of the matches were maternal?

It turns out that Rachel hit a bit of bad luck. She looked at the highest matches, and their shared matches. Even though the clusters look very clear, they're not quite completely separate. There is one stray dot in the cluster diagram, where an 800 cM match in the large upper cluster is also a shared match to a 396 cM person in the lower double-cluster. I didn't even see it at first, but it's there. Those two people must be related to each other through some different branch, unrelated to the test taker. It's hard to discount a stray shared match like that when looking only at the shared matches. It's very, very easy to discount it when looking at the full cluster diagram.

False match
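One way to see why the stray dot is easy to discount in context is to count links within clusters against links between them. The sketch below assumes the cluster assignments are already known; the names, cluster labels, and shared-match pairs are hypothetical, with one pair standing in for the stray dot.

```python
from collections import Counter

# Hypothetical cluster assignment for a handful of matches.
CLUSTER_OF = {"A": "upper", "B": "upper", "C": "upper",
              "D": "lower", "E": "lower", "F": "lower"}

# Hypothetical shared-match pairs; ("A", "D") plays the role of the stray dot.
SHARED = [("A", "B"), ("A", "C"), ("B", "C"),
          ("D", "E"), ("D", "F"), ("E", "F"),
          ("A", "D")]


def summarize_links(cluster_of, shared):
    """Count links per cluster pair and list the links that cross clusters."""
    counts = Counter()
    stray = []
    for a, b in shared:
        # Sort the labels so (upper, lower) and (lower, upper) count together.
        pair = tuple(sorted((cluster_of[a], cluster_of[b])))
        counts[pair] += 1
        if pair[0] != pair[1]:
            stray.append((a, b))
    return counts, stray


counts, stray = summarize_links(CLUSTER_OF, SHARED)
print(counts)
print(stray)
```

A lone cross-cluster link sitting next to many within-cluster links is the numeric version of one stray dot in an otherwise clean diagram.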

It's also worth pointing out that this analysis could have been done without clusters at all. The clusters didn't change the centimorgans, or the trees. Clustering simply provided a spotlight on exactly which matches were most interesting to look at, and the centimorgans and trees made a lot more sense with that little bit of guidance.

Maternal matches

Although Rachel was focused on researching the paternal side, I also looked at the maternal side a little bit. Clusters generated over 90 cM would normally show matches divided into four groups, one for each grandparent. This diagram has two clusters that I could attribute to paternal grandparents. Why was there only one maternal cluster?

The answer is... not clear.

Clusters are often easy to interpret by themselves. Sometimes they need some extra help. In this case, the Shared Centimorgans column is interesting. Within the large maternal cluster, it's clear that the strongest matches are grouped in a band in the middle, similar to what was seen with the paternal matches.

False match

Using that band of strongest matches as a hint, I can sort of see some white areas at the top right and bottom left, as seen with the overlapping clusters for the paternal matches. The white areas are nowhere near as clear as on the paternal side. That's a hint that something else is going on here (the lower matches not shown in this diagram suggest some endogamy several generations back). But it still gives enough of a hint to let me make a hypothesis about the maternal matches as well:

Maternal hypothesis

This maternal analysis would have been very difficult without clustering. Possible, yes. There isn't anything that clustering does that couldn't be done by hand. In this case, the close overlap would be hard to spot without the rest of the hints from the cluster diagram. In fact, it would be difficult to perform this analysis even with other clustering techniques. I don't think that the Leeds method would be able to resolve the maternal grandparents here. Credit one for Shared Clustering!