N3C - sporedata/researchdesigneR GitHub Wiki

General description

  • The National COVID Cohort Collaborative (N3C) is a large national data repository designed to support analysis of patient-level data from multiple clinical centers and reveal patterns in COVID-19 patients. N3C aggregates electronic health record (EHR) data for all patients who were tested for COVID-19 (positive, negative, and uncertain results) from multiple health systems across the US, making it the largest COVID database in the United States and one of the largest in the world. As of August 2022, N3C has data on over 12.5 million individuals, made available as an EHR-based limited data set with information on COVID-19 patients and controls.

  • N3C data come from numerous medical facilities and healthcare organizations, so there is a risk that they are collected inconsistently. To mitigate this risk, N3C uses the OMOP-CDM (Observational Medical Outcomes Partnership Common Data Model), a data standard designed to standardize the content and structure of observational data. Because information is captured the same way across all participating institutions, the OMOP-CDM enables efficient analyses and reliable evidence.

  • Naming a disease entails defining it, and assigning it a standard code improves care, patient engagement, and research by facilitating patient classification and information exchange. Phecodes are manually curated diagnostic codes used in phenome-wide association studies (PheWAS) (https://www.vumc.org/wei-lab/phecode). They group relevant ICD-10 and ICD-10-CM codes into clinically meaningful phenotypes, allowing researchers to leverage accumulated ICD-10 and ICD-10-CM data in the electronic health record (EHR) for PheWAS. N3C is a partnership of over 290 organizations, and its scale and demographic and geographic diversity make it uniquely well-suited to describe early adoption of the Long COVID ICD-10-CM code. For this reason, a machine learning-based computable phenotype definition is applied to Long COVID using N3C data.

  • N3C can be used for simulations in Bayesian adaptive trial planning, causal analyses, predictive models for decision support (with regular updates), site searches based on estimated case volume, pharmacovigilance, and spatio-temporal analyses.

  • Video Athena Search

  • OHDSI for CDM - Vocabulary table.

Logic Liaison templates

ECMO_during_covid_hospitalization_indicator, integer: Value of 1 if patient had observation, procedure, device_exposure, or condition related to "Kostka - ECMO" between first_COVID_hospitalization_start_date and first_COVID_hospitalization_end_date

The hospitalization timeframe can be adjusted using the template parameters in the visits_of_interest node, or modified in more detail by altering the code of that node itself. If you wish to find ECMO regardless of the hospitalization timeframe while still looking only after the COVID index date, you can change the 4th column (pre_during_post) of the LL_concept_set_fusion_SNOMED dataset from “during” to “during, post” or just “post”.
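The "during" and "post" logic described above can be sketched in plain Python. This is only an illustration of the flag definition, not the Logic Liaison template code: the function names are hypothetical, the inputs are assumed to be the dates of ECMO-related records already matched to the concept set, and treating both window boundaries as inclusive is an assumption.

```python
from datetime import date
from typing import Iterable, Optional


def ecmo_during_hospitalization(
    ecmo_event_dates: Iterable[date],
    hosp_start: Optional[date],
    hosp_end: Optional[date],
) -> int:
    """Return 1 if any ECMO-related record falls inside the hospitalization window.

    Assumption: both ends of the window are inclusive.
    """
    if hosp_start is None or hosp_end is None:
        # No COVID hospitalization on record, so nothing can be "during" it.
        return 0
    return int(any(hosp_start <= d <= hosp_end for d in ecmo_event_dates))


def ecmo_post_index(ecmo_event_dates: Iterable[date], covid_index_date: date) -> int:
    """Return 1 if any ECMO-related record occurs on or after the COVID index date.

    Assumption: the index date itself counts as "post".
    """
    return int(any(d >= covid_index_date for d in ecmo_event_dates))


# Made-up dates, for illustration only.
events = [date(2021, 3, 20)]
during_flag = ecmo_during_hospitalization(events, date(2021, 3, 1), date(2021, 3, 10))
post_flag = ecmo_post_index(events, date(2021, 2, 25))
```

In this example the record falls after discharge, so the "during" flag is 0 while the "post" flag is 1, which is the difference the pre_during_post setting controls.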

The num_day_after_index parameter sets the number of days after the index event within which a hospitalization must start, and the COVID diagnosis must appear, for the visit to qualify as a COVID-associated hospitalization. It applies only to that definition: the visits_of_interest node looks only at ED and hospital visits and places no time limit on when a comorbidity occurs. Each comorbidity flag is set whenever the condition is mentioned anywhere in the EHR, and is split only into occurrence before or after the COVID index date.
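A minimal sketch of one reading of these two rules follows. The function and argument names are hypothetical, and several details are assumptions rather than documented template behavior: that the window starts on the index date, that both ends are inclusive, and that a comorbidity mention on the index date counts as "post".

```python
from datetime import date
from typing import Dict, Iterable


def is_covid_associated_hospitalization(
    index_date: date,
    hosp_start: date,
    diagnosis_date: date,
    max_days_after_index: int,
) -> bool:
    """True if both the admission and the COVID diagnosis fall in the window.

    Assumed window: index_date through index_date + max_days_after_index, inclusive.
    """
    def within_window(d: date) -> bool:
        return 0 <= (d - index_date).days <= max_days_after_index

    return within_window(hosp_start) and within_window(diagnosis_date)


def comorbidity_flags(mention_dates: Iterable[date], index_date: date) -> Dict[str, int]:
    """Split comorbidity mentions into before / post index, with no other time limit."""
    dates = list(mention_dates)
    return {
        "before": int(any(d < index_date for d in dates)),
        "post": int(any(d >= index_date for d in dates)),
    }


# Made-up dates, for illustration only.
index = date(2021, 1, 10)
qualifies = is_covid_associated_hospitalization(index, date(2021, 1, 15), date(2021, 1, 16), 16)
flags = comorbidity_flags([date(2015, 6, 1)], index)
```

Note that the comorbidity mention from 2015 still sets the "before" flag, because no lookback limit is applied to comorbidities.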

Factors to consider when using database (for research)

Considerations for Using Real-World Dates from LL Source Data:

Dates are shifted by +/- 180 days in de-identified L2 data. If you use L2 data, avoid any reasoning that depends on actual calendar dates (such as "recent" occurrences, time series analysis, policy changes, the COVID-19 era, etc.). A few sites shift dates even in the limited data set (L3). To identify those sites, consult the manifest table (stored with the OMOP tables), and then decide as a research team whether to exclude them.
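The manifest check described above could look roughly like the sketch below. It uses plain Python dictionaries in place of the actual tables. The column names `data_partner_id` and `shift_date_yn`, and the "Y" value marking a site that shifts dates, are assumptions to verify against the manifest table in your own workspace; the rows shown are made up.

```python
from typing import Dict, Iterable, List, Set


def sites_with_shifted_dates(manifest_rows: Iterable[Dict]) -> Set:
    """Collect site identifiers whose manifest row reports date shifting.

    Assumed columns: 'data_partner_id' and 'shift_date_yn' ('Y' means shifted).
    """
    return {
        row["data_partner_id"]
        for row in manifest_rows
        if str(row.get("shift_date_yn", "")).strip().upper() == "Y"
    }


def drop_shifted_sites(records: Iterable[Dict], manifest_rows: Iterable[Dict]) -> List[Dict]:
    """Keep only records from sites that do not report date shifting."""
    shifted = sites_with_shifted_dates(manifest_rows)
    return [r for r in records if r["data_partner_id"] not in shifted]


# Made-up rows, for illustration only.
manifest = [
    {"data_partner_id": 1, "shift_date_yn": "N"},
    {"data_partner_id": 2, "shift_date_yn": "Y"},
]
visits = [
    {"data_partner_id": 1, "visit_id": "a"},
    {"data_partner_id": 2, "visit_id": "b"},
]
usable_visits = drop_shifted_sites(visits, manifest)
```

Whether to drop such sites entirely or only exclude them from calendar-dependent analyses remains a decision for the research team.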

Use cases and companion methods

Variable categories

  • Health system-related

    • Location
    • Care site
    • Provider
  • Patient-related

    • Death
    • Person
    • Specimen
    • Observation
    • Measurements
    • Drug exposure
    • Device exposure
    • Visit occurrence
    • Observation period
    • Condition occurrence

Limitations

Related publications / Literature

Data access

Institutions

An Institutional Data Use Agreement (DUA) with the National Center for Advancing Translational Sciences (NCATS) is required to get institutional access to N3C data [https://ncats.nih.gov/files/NCATS_N3C_Data_Use_Agreement.pdf]. Importantly, DUAs must be signed by Authorized Institutional Officials who have the authority to bind all users at their institution to the terms of the DUA.

Researchers

For researchers to request access to N3C data, their home institutions must have an Institutional Data Use Agreement (DUA) in place with the National Center for Advancing Translational Sciences (NCATS) [See https://covid.cd2h.org/duas for the list of institutions with active DUAs]. Once a DUA is in place, researchers must register with N3C [https://labs.cd2h.org/registration/]. Importantly, researchers must register using the same email address used for their DUA. Upon approval, researchers will receive an email with directions for signing into their N3C Data Enclave account.

To access N3C data, researchers will need to complete the required training, including a course titled “Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher, Emergency Preparedness Refresher” before submitting a Data Use Request (DUR) [See https://irtsectraining.nih.gov/public.aspx]. Also, researchers who request access to de-identified data or the Limited Data Set (LDS) must have completed their institution’s recommended training requirements on human subjects research. Upon completing the training requirements, researchers must fill out and submit a DUR through the N3C Data Enclave.

For more information, visit https://ncats.nih.gov/n3c or https://covid.cd2h.org/

SporeData data dictionaries

References