Process Overview - National-Clinical-Cohort-Collaborative/Data-Ingestion-and-Harmonization GitHub Wiki

Data Ingestion and Harmonization Process Overview

Receive the data and reply with a "data-received" handshake notification confirming receipt. Dataset details are noted in the manifest.csv file.
Reconstitute the CDM data into its native database structure.
Apply COVID-19-specific code translation/harmonization.
TBD: Generate the drug_era, dose_era, and condition_era derived tables.
Run CDM-specific data quality checks: conduct baseline data integrity checks and, at minimum, check the data counts. Notify "data-ready-for-ingestion" status.
Check the currency of the value sets and dynamically generate the code lookup table that maps them into the OMOP vocabulary.
Run the ETL, using the value set crosswalk table, to load the data into the OMOP database structure.
Run the OMOP Data Characterization (DC) query to validate the data for the final "data-accepted-for-research" status.
Contribute the dataset to the N3C FedHub data repository.
Contribute the dataset to the N3C FedHub Safe Harbor data repository, applying the Safe Harbor rules.
Provide the OMOP DC dashboard metrics in the N3C FedHub SFTP outbound folder for the data partner's access.
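
The baseline count check and the status it gates can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the manifest layout (`table_name,row_count` columns) and the function names are hypothetical, and only the status strings come from the steps above.

```python
import csv
import io


def read_expected_counts(manifest_text):
    """Parse expected row counts from manifest text.

    Assumes a hypothetical layout with 'table_name' and 'row_count' columns;
    the real manifest.csv format may differ.
    """
    reader = csv.DictReader(io.StringIO(manifest_text))
    return {row["table_name"]: int(row["row_count"]) for row in reader}


def check_counts(expected, actual):
    """Return a list of mismatch messages; an empty list means all counts match."""
    problems = []
    for table, want in sorted(expected.items()):
        got = actual.get(table)
        if got is None:
            problems.append(f"{table}: missing from loaded data")
        elif got != want:
            problems.append(f"{table}: expected {want} rows, loaded {got}")
    return problems


def status_after_count_check(expected, actual):
    """Advance to 'data-ready-for-ingestion' only when every count matches."""
    problems = check_counts(expected, actual)
    status = "data-ready-for-ingestion" if not problems else "data-received"
    return status, problems


if __name__ == "__main__":
    manifest = "table_name,row_count\nperson,3\nvisit_occurrence,5\n"
    expected = read_expected_counts(manifest)
    loaded = {"person": 3, "visit_occurrence": 5}  # counts from the reconstituted database
    print(status_after_count_check(expected, loaded))
```

Keeping the dataset at "data-received" when a count differs means a failed check never silently promotes data toward the ETL step; the returned messages can go back to the data partner.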