Quick start for similarity - jonathanbrecher/sharedclustering GitHub Wiki

Similarity is a weaker type of analysis than clustering and should be used only when clustering itself is not an option. If you have fewer than 100,000 total matches, you definitely should try clustering first.

If you have significant endogamy, similarity may be able to isolate the matches that are most similar to known relatives. This can provide additional leads in cases where clustering produces only a handful of very large clusters that are not helpful for further research.

Similarity is typically useful only when you have already identified some relatives that you can use as a filter. The matches that are most similar to the ones you already know may be related to you in ways that are similar to the known ones.
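
To make the idea of "most similar to the ones you already know" concrete, here is a minimal sketch that ranks matches by how much their shared-match lists overlap with those of known relatives. It is an illustration only and is not taken from Shared Clustering's own code: the Jaccard measure, the function names, and the sample IDs are all assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Return |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_similarity(
    shared: Dict[str, Set[str]], known_ids: List[str]
) -> List[Tuple[str, float]]:
    """Rank every non-known match by its average Jaccard similarity to the known matches."""
    known = [k for k in known_ids if k in shared]
    if not known:
        return []
    scores = []
    for match_id, neighbours in shared.items():
        if match_id in known:
            continue  # do not rank the known relatives against themselves
        avg = sum(jaccard(neighbours, shared[k]) for k in known) / len(known)
        scores.append((match_id, avg))
    # Highest average similarity first, mirroring a "most similar listed earliest" report.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: each match ID maps to the set of match IDs it shares with you.
shared_matches = {
    "known_1": {"a", "b", "c"},
    "known_2": {"a", "b", "d"},
    "lead_x": {"a", "b", "c", "d"},
    "other_y": {"e", "f"},
}
ranking = rank_by_similarity(shared_matches, ["known_1", "known_2"])
print(ranking)
```

In this toy data, "lead_x" shares most of its matches with both known relatives and so ranks first, while "other_y" shares none and ranks last. The sketch also hints at why the known relatives need to be related to you in similar ways: averaging over unrelated groups would dilute every score.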

Download

Go to the Download tab:


  1. Enter your Ancestry DNA name and password, then click Sign In.
  2. Select the 'Endogamy special' option.
  3. Click Get DNA Matches. You will be prompted for a location to save the downloaded data.
  4. Wait until the progress bar reports that it is Done. For people with heavy endogamy, even the 'Endogamy special' downloading option can easily take over 24 hours.

Similarity

Go to the Similarity tab:


  1. Select the file that you just downloaded.
  2. Select the file where you will save the similarity report.
  3. In the 'Test IDs to compare for similarity' section, enter the IDs for one or more matches that you have already identified. You will get better results if you enter the IDs for at least 4-5 matches, but they MUST all be related to each other in similar ways (for example, all of them mutual third cousins) or the results will be meaningless.
  4. Click Process Saved Data.
  5. Wait until the progress bar reports that it is Done.
  6. Review the saved report file. The matches most similar to the ones you provided will be listed earliest in the file. Most likely only the first few matches will be strong leads, but it's hard to predict where the exact cutoff will fall for useful matches.
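
Because the cutoff for useful matches is hard to predict, one possible aid when reviewing the list is to look for the largest drop between consecutive scores. The sketch below assumes you have a numeric similarity score for each row, already sorted from most to least similar; whether and where the report exposes such a value is an assumption here, and the function name and sample scores are hypothetical. Treat the result as a starting point for manual review, not as a rule.

```python
from typing import List


def largest_gap_cutoff(scores: List[float]) -> int:
    """Return how many leading entries sit above the largest drop between neighbours.

    Assumes `scores` is sorted in descending order.
    """
    if len(scores) < 2:
        return len(scores)
    # Drop between each entry and the one that follows it.
    drops = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    # Keep everything up to and including the entry just before the biggest drop.
    return drops.index(max(drops)) + 1


# Hypothetical scores for six matches, most similar first.
example_scores = [0.9, 0.85, 0.8, 0.4, 0.35, 0.3]
keep = largest_gap_cutoff(example_scores)
print(keep)
```

With these sample scores, the biggest drop falls between the third and fourth entries, so the first three matches would be the ones to examine first.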