Test Metadata and how we store it - CDCgov/prime-simplereport GitHub Wiki

What is test metadata?

Test metadata is the data that tells us about the test that was conducted, beyond the raw "positive, negative, inconclusive" result data. Specifically, though not necessarily exhaustively:

  • the type of device (or the specific device) on which the test was run
  • the type of specimen that was collected to run the test, including where it was taken from
  • the name of the actual biochemical assay being run
  • the class of biochemical marker being assayed (e.g. antigen/antibody/RNA)
  • the disease being diagnosed by the test

We currently store all of this, implicitly or explicitly, in the DeviceType object.

And that's why we need a wiki page to explain it.

SimpleReport Data Model Evolution

Original Data Model: most things are implicit, the DeviceType owns everything else

  • the type of device on which the test was run is a first-class object. It is identified by its make and model (this is a thing some of us misunderstood because we were confused about the different taxonomy codes that exist)
  • the type of specimen is implicit: it is always a nasal swab
  • the identity of the actual biochemical assay being run, as described by its LOINC code, is a property of the device type (each device type can only be used to record one assay in this configuration)
  • the class of biochemical marker being assayed is implicit in the LOINC code, but we assumed everything was an antigen test (this was close to correct)
  • the disease being diagnosed is in all cases COVID-19
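
To make the original model concrete, here is a minimal sketch, not the actual SimpleReport code. The class, field, and function names are assumptions made for illustration, and the device is hypothetical. The only stored metadata is the device's make, model, and LOINC code; everything else is a constant baked into the system.

```python
from dataclasses import dataclass

# Metadata the original model never stored, because it was assumed.
IMPLICIT_SPECIMEN = "nasal swab"
IMPLICIT_MARKER_CLASS = "antigen"
IMPLICIT_DISEASE = "COVID-19"


@dataclass(frozen=True)
class DeviceType:
    """Hypothetical stand-in for the original DeviceType."""
    manufacturer: str  # make
    model: str
    loinc_code: str  # the single assay this device type can record


def describe_test(device: DeviceType) -> dict:
    """Collect all metadata for a test run on the given device type."""
    return {
        "device": f"{device.manufacturer} {device.model}",
        "loinc_code": device.loinc_code,
        "specimen": IMPLICIT_SPECIMEN,
        "marker_class": IMPLICIT_MARKER_CLASS,
        "disease": IMPLICIT_DISEASE,
    }


# Illustrative device; 94558-4 is a LOINC code for a rapid
# SARS-CoV-2 antigen immunoassay.
example_device = DeviceType("ExampleCo", "Model X", "94558-4")
print(describe_test(example_device))
```

Note that two tests run on the same device type are indistinguishable in their metadata here, which is the limitation the later models address.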

Next: swab types and assay names sneak in

  • As an intermediate step before making specimen types fully functional, we added a "swab type" field to DeviceType, allowing each DeviceType to have its own swab type, but still requiring all users of the same type of device to be using the same type of swab
  • In order to enable printing with an assay name instead of a LOINC code, we implemented a ghastly shim that provides all the human-readable names that go along with the LOINC codes that are currently active in production
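
The assay-name shim amounts to a hard-coded lookup table. The sketch below shows the general shape only: the function name, the table contents, and the fallback to the raw code are all assumptions, and the display name is an abbreviated illustration rather than the official LOINC long name.

```python
# Hypothetical lookup from LOINC code to a human-readable assay name.
# The real shim covers whichever codes are active in production.
ASSAY_NAMES = {
    "94558-4": "SARS-CoV-2 antigen, rapid immunoassay",  # illustrative name
}


def assay_display_name(loinc_code: str) -> str:
    """Return a printable assay name, falling back to the code itself."""
    return ASSAY_NAMES.get(loinc_code, loinc_code)


print(assay_display_name("94558-4"))
```

Falling back to the raw code for unknown entries is one possible choice; it keeps printing from failing when a new LOINC code goes live before the table is updated.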

First class SpecimenTypes

  • In this model, SpecimenType has a many-to-many relationship to DeviceType: a given method of specimen collection (finger stick, nasal swab, etc.) can be used for many device types, and a given device type (Abbott IDNow, for example) may be able to use several types of specimen collection.
  • each test now links to a specimen-type/device-type tuple (imaginatively titled "DeviceSpecimenType")
  • each facility now has a set of DeviceSpecimenTypes representing the combinations that they employ
  • SpecimenType has the collection location code as a secondary value because some public health departments insist on it, even though it's implied by a sufficiently precise specimen type code
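
The relationships above can be sketched roughly as follows. This is a simplified illustration, not the actual SimpleReport entities: the field names, the Facility.can_run helper, and the placeholder code strings are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass(frozen=True)
class SpecimenType:
    name: str
    type_code: str  # specimen type code
    # Secondary value; implied by a precise type_code but sometimes required.
    collection_location_code: Optional[str] = None


@dataclass(frozen=True)
class DeviceType:
    manufacturer: str
    model: str


@dataclass(frozen=True)
class DeviceSpecimenType:
    """Join object for the many-to-many device/specimen relationship."""
    device_type: DeviceType
    specimen_type: SpecimenType


@dataclass
class Facility:
    name: str
    # The device/specimen combinations this facility employs.
    device_specimen_types: Set[DeviceSpecimenType] = field(default_factory=set)

    def can_run(self, device: DeviceType, specimen: SpecimenType) -> bool:
        return DeviceSpecimenType(device, specimen) in self.device_specimen_types


# Placeholder codes below are not real taxonomy codes.
id_now = DeviceType("Abbott", "IDNow")
nasal = SpecimenType("nasal swab", "PLACEHOLDER-NASAL", "PLACEHOLDER-NOSE")
finger = SpecimenType("finger stick", "PLACEHOLDER-FINGER")

facility = Facility("Example Clinic")
facility.device_specimen_types.add(DeviceSpecimenType(id_now, nasal))
print(facility.can_run(id_now, nasal), facility.can_run(id_now, finger))
```

Because each test links to a DeviceSpecimenType rather than to a DeviceType alone, two facilities using the same device with different specimen collection methods can now be told apart.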

Multiplex devices

We then expanded to support multiplex devices, which can test for multiple diseases. The engineering design for this was pretty complicated, but is documented here

A note on how we represent positive / negative / inconclusive results

In day-to-day language, we use "positive" and "negative" to refer to the outcomes of a test we're taking. In public health, however, more technical and nuanced terms are used for test outcomes.

To public health, there is an important difference between a test device reporting a positive result and a device detecting the presence of a biochemical marker. This difference goes back to the CDC's condition case definitions, but the TL;DR is that only certain tests can tell conclusively that a patient has a disease, while other tests only "presume" positive results. For instance, a rapid COVID test can't say definitively that you have COVID: you need something like a PCR test to confirm a positive case.

Since SimpleReport deals primarily with rapid tests, most (though not all!) of our test devices can only report whether a disease is detected / not detected, rather than positive / negative. An exception is some of our RT-PCR devices for COVID, which can report positives. We've made the design decision not to expose this complexity to the frontend and instead to use the colloquial positive / negative language in all our user-facing copy, but this may change in the future.

We represent test status and other clinical entities in the backend / in data using SNOMED codes, a taxonomy used in public health to map clinical terms and their relationships to computer-parsable strings. There are separate SNOMED codes for positive vs reactive / presumptive positive / detected (and the corresponding counterparts for negative). In the app as of this writing, we don't differentiate these codes: all "positives" are mapped to "detected - 260373001" and all "negatives" to "not detected - 260415000", even when this is technically incorrect.
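
As a rough illustration of that mapping (a sketch, not the app's actual code; the function and constant names are assumptions), the colloquial result is collapsed onto the two SNOMED codes named above. Other results, such as inconclusive, are deliberately left out because this page doesn't say how they are coded.

```python
# SNOMED codes named on this page.
SNOMED_DETECTED = "260373001"
SNOMED_NOT_DETECTED = "260415000"

_RESULT_TO_SNOMED = {
    "positive": SNOMED_DETECTED,
    "negative": SNOMED_NOT_DETECTED,
}


def snomed_for_result(result: str) -> str:
    """Map a user-facing result label to its SNOMED code."""
    key = result.strip().lower()
    if key not in _RESULT_TO_SNOMED:
        raise ValueError(f"no SNOMED mapping sketched for result: {result!r}")
    return _RESULT_TO_SNOMED[key]


print(snomed_for_result("Positive"))
print(snomed_for_result("negative"))
```

The mapping ignores the device entirely, which is exactly the simplification described: an RT-PCR device that could report a true positive still gets the "detected" code.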

In April 2024, ahead of our STI expansion, the team determined that using only the "detected / not detected" codes is good enough, since all the rapid STI tests we'd be supporting in our pilot can only indicate detected / not detected. The team may revisit this in the future as the pilot unfolds.