DB Schema for 529Renew - CameronD73/529renew GitHub Wiki
Schema for use with Sqlite-WASM
The change imposed by the demise of WebSQL requires substantial code change, so I might as well remove a few kludges and perhaps prepare for a few improvements.
Whether the added complexity makes up for the more efficient storage remains to be seen. In particular, the "chromosome 100 and 200" special entries have been shifted to the DNAmatches table.
A Profile person refers to a DNA test kit that is managed under this account. The DNArelative table contains data relating to a match between a profile person and one other. The DNAmatch table contains match data between any two testers of interest, which may or may not include a profile person.
tables as at DB structure V5 (Mar 2025)
profiles
IDprofile TEXT (P.K.)
pname TEXT UNIQUE not NULL
Simply the UIDs of each kit managed under this account, along with a short name the user can use to identify the kit (does not have to match the name in the idalias table)
idalias
Unique data for a single person who has done a DNA test with 23 and Me.
IDText TEXT primary key, -- the tester's unique ID
name TEXT, -- the tester's name
"date" TEXT, -- the date they were added to the DB
hapMat TEXT, -- maternal haplogroup
hapPat TEXT, -- maternal haplogroup
bYear INTEGER -- a birth year, if they have shared it.
sex TEXT
familySurnames TEXT -- null if no values are given, otherwise json-encoded array of names
familyLocations TEXT -- null if no values are given, otherwise json-encoded array of places
familyTreeURL TEXT -- null if no values are given, the url listed for a person's external tree
lastUpdated TEXT -- value auto-updated by trigger when any part of record is updated - used to decide on saving
DNArelatives
Information about relationship between a profile person and another tester (everybody who has a line on 23andMe's "DNA Relatives" page. There is exactly one record for each such pair of people. Comments entered on 23 and Me are specific to a relative in relation to a profile, so cannot be in idalias table because that table only has one record per tester.
IDprofile TEXT PK, -- the ID of the profile person
IDrelative TEXT PK, -- the unique ID of the tester who shares DNA with the profile
ICWScanned INTEGER -- null if unscanned; bit-mapped value:
-- 1 if ICWs between this relative and profile have been scanned
-- 2 if we have done a "triangulation" run of ICWs for this profile/relative pair. (not yet implemented)
-- 8 if user requested a forced rescan
datescanned TEXT, -- the date this pair's relatives in common were last checked for triangulations.
side TEXT, -- if maternal/paternal has been indicated.
comment TEXT, -- the comment field from the import
knownRel TEXT -- if tester has assigned a known family relationship between profile and tester - else emepty string
lastUpdated TEXT -- value auto-updated by trigger when any part of record is updated - used to decide on saving
DNAmatches
data for any two people who share DNA. There is usually one record for each pair of people, but can be two if they switch between hiding and revealing their segments.
ID1 TEXT PK, -- the unique ID of tester 1
ID2 TEXT PK, -- the unique ID of tester 2
ishidden INTEGER PK, -- 0 if segments are available, 1 if segment details are hidden
pctshared REAL, -- the percent shared dna
cMtotal REAL -- the calculated or supplied centiMorgan value
nsegs INTEGER -- the number of shared segments
hasSegs INTEGER -- boolean: 1 if we have stored matching segment data; 0 otherwise
lastUpdated TEXT -- value auto-updated by trigger when any part of record is updated - used to decide on saving
largestSeg REAL -- the size in cM of the largest shared segment between these two testers.
predictedRelOld TEXT -- the original "predicted relationship" 23 and me suggests between these two
predictedRel TEXT -- the "predicted relationship" 23 and me suggests (algorithm updated early 2025 and slowly rolled out).
ibdsegs, ibdsegsFull
Data for one shared segment of DNA between two testers. There is one record for each segment reported by 23andMe . 23andMe reports "any" and "full-match" segments separately where I use "any" to mean at least half-IBD match. Where a particular segment is reported as a "full-IBD" match, such as between siblings, there will also be a separate entry in the "any" category that covers the same range as the full segment (and is often larger). There is no plan to use the full-ibd information, but at least keeping track of them separately can avoid conflicts in primary keys and will reduce redundant information in GDAT.
ID1 and ID2 are always saved in alphabetical order to make later search easier.
ID1 TEXT PK, -- the unique ID of tester 1
ID2 TEXT PK, -- the unique ID of tester 2
chromosome INTEGER, -- chromosome number, or 23 for X
start INTEGER, -- the MBase address of the start of the shared segment region
end INTEGER, -- the MBase address of the end of the shared segment region
cM REAL -- the calculated or supplied centiMorgan value
snps INTEGER -- the number of SNPs in this shared segment
ICWsets
A set of 3-way in-common matches. There is no real distinction between id2 and id3, so they are inserted in alphabetical order to make later search easier (sometimes).
to be decided: how to handle if more than one is a profile person? at the moment two separate entries should exist for p1-p2-rel and p2-p1-rel
IDprofile TEXT, -- the ID of the profile person
ID2 TEXT, -- the ID of the first relative
ID3 TEXT, -- the ID of the 3rd tester
chromosome INTEGER -- one of the following: a chromosome on which the 3-way overlap (quasi-triangulation) occurs
-- 0 - there is known to be no overlap
-- -1 - we don't know (from the days we could see DNA segs),
-- -2 - at least one tester has hidden dna
start INTEGER, -- the MBase address of the start of the mutual overlap region (or zero)
end INTEGER -- the MBase address of the end of the mutual overlap region (or zero)
This table records all the 3-way ICW values reported by 23 and Me, and whether there is actually any overlapping match for all 3 testers. We have to allow for the possibility of multiple records for one set of 3 people, especially when two may be closely related. Consequently, we leave ROWID as the primary key and impose no uniqueness constraints (otherwise it would have to be p1, id2, id3, chromosome, start)
Now that all DNA is hidden, I don't think the distinction between chromosome -2 and -1 is clear-cut.
ICWSegXref
ICWset_row INTEGER -- a ROWID value in the ICWsets table
segment_row INTEGER -- a ROWID value in the ibdsegs table
This is a cross-reference table identifying all matching segment pairs in a 3-way ICW
settings
A copy of the settings normally kept in local storage.
DBversion
just a version number in case we ever need to modify the schema.