DB Schema for 529Renew - CameronD73/529renew GitHub Wiki

Schema for use with Sqlite-WASM

The change imposed by the demise of WebSQL requires substantial code change, so I might as well remove a few kludges and perhaps prepare for a few improvements.

Whether the added complexity makes up for the more efficient storage remains to be seen. In particular, the "chromosome 100 and 200" special entries have been shifted to the DNAmatches table.

A Profile person refers to a DNA test kit that is managed under this account. The DNArelative table contains data relating to a match between a profile person and one other. The DNAmatch table contains match data between any two testers of interest, which may or may not include a profile person.

tables as at DB structure V5 (Mar 2025)

profiles

    IDprofile   TEXT (P.K.)
    pname       TEXT UNIQUE not NULL

Simply the UIDs of each kit managed under this account, along with a short name the user can use to identify the kit (does not have to match the name in the idalias table)

idalias

Unique data for a single person who has done a DNA test with 23 and Me.

    IDText   TEXT primary key,          -- the tester's unique ID
    name     TEXT,          -- the tester's name
    "date"   TEXT,          -- the date they were added to the DB
    hapMat   TEXT,          -- maternal haplogroup
    hapPat   TEXT,          -- maternal haplogroup
    bYear    INTEGER        -- a birth year, if they have shared it.
    sex      TEXT
    familySurnames   TEXT   -- null if no values are given, otherwise json-encoded array of names
    familyLocations  TEXT   -- null if no values are given, otherwise json-encoded array of places
    familyTreeURL    TEXT   -- null if no values are given, the url listed for a person's external tree
    lastUpdated      TEXT   -- value auto-updated by trigger when any part of record is updated - used to decide on saving

DNArelatives

Information about relationship between a profile person and another tester (everybody who has a line on 23andMe's "DNA Relatives" page. There is exactly one record for each such pair of people. Comments entered on 23 and Me are specific to a relative in relation to a profile, so cannot be in idalias table because that table only has one record per tester.

    IDprofile    TEXT PK,       -- the ID of the profile person
    IDrelative   TEXT  PK,      -- the unique ID of the tester who shares DNA with the profile
    ICWScanned   INTEGER        -- null if unscanned; bit-mapped value:
                                    -- 1 if ICWs between this relative and profile have been scanned
                                    -- 2 if we have done a "triangulation" run of ICWs for this profile/relative pair. (not yet implemented)
                                    -- 8 if user requested a forced rescan
    datescanned  TEXT,          -- the date this pair's  relatives in common were last checked for triangulations.
    side         TEXT,          -- if maternal/paternal has been indicated.
    comment      TEXT,          -- the comment field from the import
    knownRel     TEXT           -- if tester has assigned a known family relationship between profile and tester - else emepty string
    lastUpdated  TEXT           -- value auto-updated by trigger when any part of record is updated - used to decide on saving

DNAmatches

data for any two people who share DNA. There is usually one record for each pair of people, but can be two if they switch between hiding and revealing their segments.

    ID1     TEXT  PK,       -- the unique ID of tester 1
    ID2     TEXT  PK,       -- the unique ID of tester 2
    ishidden  INTEGER PK,      -- 0 if segments are available, 1 if segment details are hidden
    pctshared  REAL,        -- the percent shared dna
    cMtotal    REAL         -- the calculated or supplied centiMorgan value
    nsegs      INTEGER      -- the number of shared segments
    hasSegs    INTEGER      -- boolean: 1 if we have stored matching segment data; 0 otherwise
    lastUpdated  TEXT       -- value auto-updated by trigger when any part of record is updated - used to decide on saving
    largestSeg   REAL       -- the size in cM of the largest shared segment between these two testers.
    predictedRelOld  TEXT   -- the original "predicted relationship" 23 and me suggests between these two
    predictedRel  TEXT      -- the "predicted relationship" 23 and me suggests (algorithm updated early 2025 and slowly rolled out).

ibdsegs, ibdsegsFull

Data for one shared segment of DNA between two testers. There is one record for each segment reported by 23andMe . 23andMe reports "any" and "full-match" segments separately where I use "any" to mean at least half-IBD match. Where a particular segment is reported as a "full-IBD" match, such as between siblings, there will also be a separate entry in the "any" category that covers the same range as the full segment (and is often larger). There is no plan to use the full-ibd information, but at least keeping track of them separately can avoid conflicts in primary keys and will reduce redundant information in GDAT.

ID1 and ID2 are always saved in alphabetical order to make later search easier.

    ID1     TEXT  PK,       -- the unique ID of tester 1
    ID2     TEXT  PK,       -- the unique ID of tester 2
    chromosome INTEGER,     -- chromosome number, or 23 for X
    start      INTEGER,     -- the MBase address of the start of the shared segment region
    end        INTEGER,     -- the MBase address of the end of the shared segment region
    cM         REAL         -- the calculated or supplied centiMorgan value
    snps       INTEGER      -- the number of SNPs in this shared segment

ICWsets

A set of 3-way in-common matches. There is no real distinction between id2 and id3, so they are inserted in alphabetical order to make later search easier (sometimes).

to be decided: how to handle if more than one is a profile person? at the moment two separate entries should exist for p1-p2-rel and p2-p1-rel

    IDprofile   TEXT,          -- the ID of the profile person
    ID2         TEXT,          -- the ID of the first relative 
    ID3         TEXT,          -- the ID of the 3rd tester
    chromosome  INTEGER        -- one of the following: a chromosome on which the 3-way overlap (quasi-triangulation) occurs
                               --                       0 - there is known to be no overlap
                               --                       -1 - we don't know (from the days we could see DNA segs),
                               --                       -2 - at least one tester has hidden dna
    start       INTEGER,       -- the MBase address of the start of the mutual overlap region (or zero)
    end         INTEGER        -- the MBase address of the end of the mutual overlap region (or zero)

This table records all the 3-way ICW values reported by 23 and Me, and whether there is actually any overlapping match for all 3 testers. We have to allow for the possibility of multiple records for one set of 3 people, especially when two may be closely related. Consequently, we leave ROWID as the primary key and impose no uniqueness constraints (otherwise it would have to be p1, id2, id3, chromosome, start)

Now that all DNA is hidden, I don't think the distinction between chromosome -2 and -1 is clear-cut.

ICWSegXref

    ICWset_row    INTEGER      -- a ROWID value in the ICWsets table
    segment_row   INTEGER      -- a ROWID value in the ibdsegs table

This is a cross-reference table identifying all matching segment pairs in a 3-way ICW

settings

A copy of the settings normally kept in local storage.

DBversion

just a version number in case we ever need to modify the schema.