The saved similarity report - jonathanbrecher/sharedclustering GitHub Wiki

Shared Clustering saves its similarity output in the first several columns of a .xlsx spreadsheet. The headers of each row and column provide other information about the matches in each similarity group.

Spreadsheet Overview

The results shown here are for a known third cousin of mine, with names changed. Due to severe endogamy, I had downloaded her results using the 'Endogamy special' downloading option: all matches, but only top 200 shared matches. Denise, Melissa, Bonnie, my father, cgussie, and myself are all confirmed cousins and those are the Test IDs that I used to calculate similarity. Sarah is a first cousin to the test taker; I did not include her in the Test IDs because I'm most interested in more distant matches to the test taker.

Larry was not among the Test IDs that I provided, but he is indeed also a confirmed cousin, so Shared Clustering was right to find him even with his relatively low shared centimorgans.

Rose, M.S., Alane, and ssenden are not known to me. Those would be good leads for continued research on this family branch. They might be related to the test taker and to me, or they might be false results. Sometimes a match will have a public tree that makes things clear right away. More often, there's no way to know without more research.

Shared matches total

This is the total number of shared matches between the test taker and this match, down to the lowest shared centimorgans limit specified when generating the similarity report.

Shared matches with overlap

This is the number of this match's shared matches that were also specified as Test IDs, down to the lowest shared centimorgans limit specified when generating the similarity report.

The 'most similar' matches are the ones that have high numbers of shared matches with overlap, and where the shared matches with overlap are a large fraction of the total shared matches. By default, the report has the most similar matches at the top of the list.

It can also be useful to re-sort the list to look at the matches with the highest overlap, ignoring the total shared matches.
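The two orderings described above can be sketched in a few lines of Python. This is only an illustration: the `Row` type, the sample rows, and in particular the `similarity_score` formula (overlap multiplied by overlap fraction) are assumptions made for the example, not the formula that Shared Clustering itself uses.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One hypothetical row of the similarity report."""
    name: str
    shared_total: int    # "Shared matches total"
    shared_overlap: int  # "Shared matches with overlap"


def overlap_fraction(row: Row) -> float:
    """Fraction of the total shared matches that overlap the Test IDs."""
    return row.shared_overlap / row.shared_total if row.shared_total else 0.0


def similarity_score(row: Row) -> float:
    # Assumed score: rewards both a high overlap count and a high fraction.
    return row.shared_overlap * overlap_fraction(row)


def by_similarity(rows):
    """Most similar first, under the assumed score above."""
    return sorted(rows, key=similarity_score, reverse=True)


def by_overlap_only(rows):
    """Highest overlap first, ignoring the total shared matches."""
    return sorted(rows, key=lambda r: r.shared_overlap, reverse=True)


# Made-up sample data, not taken from any real report.
rows = [Row("A", 100, 5), Row("B", 10, 5), Row("C", 200, 6)]
print([r.name for r in by_similarity(rows)])
print([r.name for r in by_overlap_only(rows)])
```

With this sample, match C has the highest raw overlap but drops to the bottom of the similarity ordering, because its six overlapping matches are a tiny fraction of its 200 total shared matches.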

Name

This is the name of the match, as shown in the Ancestry DNA results.

Test ID

Test Guid

The Test ID is the identifier (a GUID) that Ancestry uses to identify each test uniquely. Two people can have the same name, but no two tests can have the same Test ID. Even if one person takes the test twice, each test will have a different ID.

Shared Clustering can use the test IDs to produce custom reports showing a subset of the total matches.
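As a rough illustration of why a subset is best selected by Test ID rather than by name, here is a minimal Python sketch. The dictionary keys, the helper name `select_by_test_id`, and the placeholder IDs are all hypothetical; real Test IDs are longer GUID strings.

```python
def select_by_test_id(rows, wanted_ids):
    """Return only the rows whose Test ID is in wanted_ids."""
    wanted = set(wanted_ids)
    return [row for row in rows if row["test_id"] in wanted]


# Two different people who happen to share a name (made-up data).
rows = [
    {"name": "J. Smith", "test_id": "ID-0001"},
    {"name": "J. Smith", "test_id": "ID-0002"},
    {"name": "Rose", "test_id": "ID-0003"},
]

subset = select_by_test_id(rows, ["ID-0002", "ID-0003"])
print([(row["name"], row["test_id"]) for row in subset])
```

Selecting by name would have returned both "J. Smith" rows; selecting by Test ID returns exactly the one intended.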

Link

The Link is a clickable hyperlink that will open the test results for that match in your default browser.

Shared centimorgans

The Shared centimorgans value is a standard measure of the strength of a match. Higher values indicate stronger matches. Most values tend to be fairly low.

The Ancestry DNA website only shows matches of at least 20 centimorgans in the shared match lists. Shared Clustering can include matches down to 6 centimorgans, which is the absolute lowest limit of the results returned by Ancestry.

Shared segments

While the shared centimorgans value shows the total strength of each match, the shared segments value shows how many segments those centimorgans are split into. Weak matches tend to share only a single segment. Strong matches can share a dozen or more segments.

Tree type

It can be helpful to know if each match has a tree linked to their DNA results. If you are trying to do research based on family names or locations and the match doesn't have any tree linked to their results at all, there's no point even opening the test results.

Possible tree types include:

None
Unlinked
Private
Public

Tree size

Sometimes trees have only one or two people. That's hardly any more helpful than having no tree at all.

Unlinked trees always report a zero tree size.
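The tree type and tree size columns can be combined into a simple screening rule for deciding which matches to open first. The sketch below is one possible rule, not part of Shared Clustering: the function name `worth_opening` and the `min_size` threshold of 3 are assumptions chosen for illustration. Because unlinked trees always report a size of zero, the rule does not apply the size test to them.

```python
def worth_opening(tree_type: str, tree_size: int, min_size: int = 3) -> bool:
    """Guess whether a match's tree is likely to help name-based research."""
    if tree_type == "None":
        # No tree at all: nothing to look at.
        return False
    if tree_type == "Unlinked":
        # Size is always reported as zero, so it tells us nothing either way.
        return True
    # Public or Private: skip trees with only one or two people.
    return tree_size >= min_size


# Made-up examples covering each tree type.
for tree_type, tree_size in [("None", 0), ("Unlinked", 0), ("Public", 2), ("Private", 40)]:
    print(tree_type, tree_size, worth_opening(tree_type, tree_size))
```

Whether an unlinked or private tree is actually useful still has to be checked by hand; the rule only avoids discarding unlinked trees because of their reported size.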

Starred

Any matches marked as Starred on the Ancestry site will be shown with an asterisk.

This column will only be shown if at least one match is starred.

Shared ancestor hint

Any matches with a Shared Ancestor Hint on the Ancestry site will be shown with an asterisk.

This column will only be shown if at least one match has a hint.

Note

If you used the note feature on the Ancestry website to attach comments to your matches, those notes will be repeated here for convenience. This can save a lot of time when interpreting the clusters, since you have easy access to the research you have already completed.