accreditation - nextstrain/flora GitHub Wiki

This table aims to give credit for sequences to the appropriate entities or people. I refer to this as accreditation, but I am open to better terminology. I foresee 3 different kinds of "entity":

1. Published paper
2. Individual point of contact (e.g. Shirley for the Mass. mumps samples)
3. Institution (e.g. those seen on nextstrain.org/flu)

Maintaining multiple sources for a single sequence will be hard, so I propose using the highest available entry in the above list. Designing a schema that covers different entities is hard - below is a first attempt. The primary key is especially hard. Note that missing values are fine, and will not be displayed in auspice. The primary key here will be stored as a field in the sequences table, which is then used to look up this information. (I believe this is more maintainable than keeping the list of sequences here, but I am open to different thoughts on this.)
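The "highest available" rule could be sketched roughly as follows. This is only an illustration: the `kind` field, its three values, and the `pick_credit` name are assumptions made for the example and are not part of the proposed schema.

```python
# Priority order taken from the list of entities above: earlier entries win.
# The "kind" labels are hypothetical; the schema below has no such field.
PRIORITY = ["paper", "contact", "institution"]


def pick_credit(candidates):
    """Return the highest-priority candidate source, or None if there is none.

    `candidates` is a list of dicts, each with a hypothetical "kind" key.
    Candidates with an unrecognised kind are ignored.
    """
    known = [c for c in candidates if c.get("kind") in PRIORITY]
    if not known:
        return None
    return min(known, key=lambda c: PRIORITY.index(c["kind"]))
```

With a paper and an institution both available for a sequence, this would keep only the paper, so each sequence ends up with a single credit key.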

primary key: PubMed ID (if available), or just use name (see next line)? If there's no PubMed ID we will normally be adding this information manually. I don't have a good name for this field.
name: e.g. Black et al, CDC, Jen Gardy (for unpublished work).
link: a way to contact the entity. E.g. a mailto, the URL of the paper, etc.
title: The publication title, or a working title
journal: Only useful for published papers. E.g. bioRxiv (2017)
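As a rough sketch of the fields above, one row of this table could be modelled like this. The class name and the `display_fields` helper are assumptions for illustration, and the primary key is called `credit` only to match the JSON example further down this page.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Accreditation:
    credit: str                    # primary key: PubMed ID if available, else the name
    name: str                      # e.g. "Black et al", "CDC", "Jen Gardy"
    link: Optional[str] = None     # mailto, URL of the paper, etc.
    title: Optional[str] = None    # publication title or working title
    journal: Optional[str] = None  # only meaningful for published papers

    def display_fields(self):
        """Return only the fields that have a value, so missing ones are not shown."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```

For an institution such as `Accreditation(credit="CDC", name="CDC")`, `display_fields()` would return just the `credit` and `name` entries, which is the "missing values are fine" behaviour described above.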

(We could have a field with the list of accessions (the primary key of the sequences table) to which this credit is linked; however, this would require keeping the two in sync...)

We could write simple scripts to associate a number of sequences with a source, or to update that association. The JSON input to such a script would look like:

{
  "sequences": [
    { "accession": "mass123", "credit": "shirley wohl" },
    { "accession": "mass124", "credit": "shirley wohl" }
  ],
  "accreditation": [
    { "credit": "shirley wohl", "name": "shirley wohl", "link": "mailto:..." }
  ]
}
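A minimal sketch of such a script, assuming the input is strict JSON (quoted keys) and using plain in-memory dicts as stand-ins for the two database tables. The function names `apply_update` and `lookup_credit` are hypothetical.

```python
import json

# Same payload as above, written as strict JSON so json.loads accepts it.
payload = json.loads("""
{
  "sequences": [
    {"accession": "mass123", "credit": "shirley wohl"},
    {"accession": "mass124", "credit": "shirley wohl"}
  ],
  "accreditation": [
    {"credit": "shirley wohl", "name": "shirley wohl", "link": "mailto:..."}
  ]
}
""")


def apply_update(sequences_table, accreditation_table, payload):
    """Upsert accreditation rows, then point each sequence at its credit key."""
    for row in payload["accreditation"]:
        accreditation_table[row["credit"]] = row
    for seq in payload["sequences"]:
        # Refuse to link a sequence to a credit that has no accreditation row.
        if seq["credit"] not in accreditation_table:
            raise KeyError(f"unknown credit: {seq['credit']}")
        sequences_table.setdefault(seq["accession"], {})["credit"] = seq["credit"]


def lookup_credit(sequences_table, accreditation_table, accession):
    """Follow the credit key stored on a sequence to its accreditation row."""
    return accreditation_table[sequences_table[accession]["credit"]]


# Dicts keyed by primary key stand in for the real tables (an assumption).
sequences_table, accreditation_table = {}, {}
apply_update(sequences_table, accreditation_table, payload)
```

Because each sequence stores only the credit key, correcting a name or link means touching one accreditation row rather than every linked sequence.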