Data privacy - jonathanbrecher/sharedclustering GitHub Wiki

The Shared Clustering application runs on your computer, using your username and password to request information from the Ancestry web site. Your username and password go to Ancestry. Information about your DNA matches comes back to you, and your cluster diagram is generated on your computer. None of your information goes anywhere else.

Ancestry password

Your Ancestry password is a private matter between you and Ancestry. The Shared Clustering application doesn't do anything with your password except send it to Ancestry. Your password isn't even saved on your own computer -- you need to type it in each time you use Shared Clustering.

There are many good reasons not to share your password, and they apply to any site. If you ever need to share access to your DNA results, Ancestry has a special way to do so without sharing your password. That is a good approach when you want to share your results with someone else. When you use the Shared Clustering application, you can use your own password to access your own data, confident that nobody besides you and Ancestry will see it.

Shared match data

There is no private information in the shared match data used to create your cluster diagram. Everything in the cluster diagram is visible to you when looking at your matches on the Ancestry website directly. Some of the information shown in the cluster diagram is information that you entered yourself, including your notes and any "colored dot" assignments. The rest of the information in the cluster diagram simply repeats the matches shown on the Ancestry site, with their order rearranged a bit.

All of the "magic" of clustering comes from rearranging the order of your matches so that similar matches are near each other. The default ordering of matches on the Ancestry website is simply by decreasing shared centimorgans. There is nothing private about a descending order. The new ordering shown in the cluster diagram didn't exist before the cluster diagram was created, so it is not private either.
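The point about ordering can be illustrated with a small sketch. This is not the Shared Clustering code: the match records, the `default_order` and `reorder` helpers, and the example cluster order are all hypothetical, chosen only to show that a reordered list holds exactly the same records as the default list.

```python
# Hypothetical match records; real data has many more fields.
matches = [
    {"name": "A", "shared_cm": 45.0},
    {"name": "B", "shared_cm": 210.0},
    {"name": "C", "shared_cm": 98.5},
]


def default_order(matches):
    """Sort by shared centimorgans, largest first (the default described above)."""
    return sorted(matches, key=lambda m: m["shared_cm"], reverse=True)


def reorder(matches, order):
    """Return the same records arranged in the given order of names."""
    by_name = {m["name"]: m for m in matches}
    return [by_name[name] for name in order]


ancestry_view = default_order(matches)
# An arbitrary order standing in for whatever a clustering step might produce.
cluster_view = reorder(matches, ["C", "A", "B"])

# Both views contain exactly the same records; only the position differs.
same_records = sorted(m["name"] for m in ancestry_view) == sorted(
    m["name"] for m in cluster_view
)
```

In this sketch `same_records` is true for any permutation passed to `reorder`, which is the sense in which reordering adds no new information about the matches.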

Names of matches

Some people feel that the names of matches constitute private information. That is not how private information is normally defined. Although Ancestry has not provided an official statement on this matter, it apparently does not consider the names of matches to be private information; otherwise it would not have shared those names with you in the first place.

There is also the important point that the names of matches shown by Ancestry are not necessarily names of real people. Anyone who tests on Ancestry can choose to display any name that they want, or any initials, or any made-up phrase, or they can display a name that is totally blank. The names shown by Ancestry are names that each person has already chosen that they want to share.

Ancestry also provides a way for a test taker to keep their results completely private. Those testers do not participate in DNA matching. They do not appear in your match list or among your shared matches when you view your test results on the Ancestry website. Those matches are not available to Shared Clustering. They will not appear in any cluster diagrams produced by Shared Clustering, under any name or in any form at all.

Despite all of that, some people are still not comfortable sharing cluster diagrams that contain any information about their matches at all. For that reason, the Shared Clustering application allows you to create fully anonymized cluster diagrams that remove or obfuscate any information that might be used to identify your matches, even in theory. That includes not only the names of the matches, but also the links to the Ancestry website, any common ancestors, and all user-entered information such as the notes and "colored dot" labels. An anonymized cluster diagram still preserves the overall "shape" of the clusters (the red squares) and the shared centimorgans and shared segments. None of that information can be used to identify any match or the test taker, even in theory.
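As a rough illustration of what such anonymization involves, here is a minimal sketch. It is not the application's actual implementation: the `Match` record, its field names, the `anonymize` helper, and the "Match N" placeholder labels are all assumptions made for this example. It blanks the identifying fields listed above and keeps the shared centimorgans, shared segments, and the shared-match relationships that give the clusters their shape.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Match:
    # Hypothetical record layout, for illustration only.
    name: str
    profile_url: str
    common_ancestors: tuple
    note: str
    dot_labels: tuple
    shared_cm: float
    shared_segments: int
    shared_with: frozenset  # positions of other matches in the list


def anonymize(matches):
    """Return copies with identifying fields removed and numeric data kept."""
    return [
        replace(
            m,
            name=f"Match {i}",      # placeholder instead of the displayed name
            profile_url="",         # drop the link to the Ancestry website
            common_ancestors=(),    # drop any common ancestors
            note="",                # drop user-entered notes
            dot_labels=(),          # drop "colored dot" labels
        )
        for i, m in enumerate(matches, start=1)
    ]


sample = [
    Match("Jane Doe", "https://example.invalid/jane", ("John Doe",),
          "2nd cousin?", ("blue",), 210.0, 9, frozenset({1})),
    Match("J. D.", "https://example.invalid/jd", (),
          "", (), 98.5, 5, frozenset({0})),
]
anonymized = anonymize(sample)
```

Because `shared_with` refers to positions in the list rather than to names, the relationships between matches survive unchanged while nothing in the output points back to a person.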

Open Source

Because Shared Clustering is open source, there is no private information even in the application itself. Anyone who cares to look at the source code for Shared Clustering can confirm for themselves that nothing secret or surprising is happening to your data, your name, your password, or anything else anywhere in the clustering process.