Process Overview - National-Clinical-Cohort-Collaborative/Data-Ingestion-and-Harmonization GitHub Wiki
Data Ingestion and Harmonization Process Overview
-
Receive the data and reply with a “data-received” handshake notification confirming receipt. Dataset details are recorded in manifest.csv.
-
Reconstitute the CDM data into its native database structure.
-
Apply COVID-19-specific code translation/harmonization.
-
TBD: Generate the drug_era, dose_era, and condition_era derived tables.
-
Run CDM-specific data quality checks: conduct baseline data integrity checks and, at minimum, check the data counts. Then send the "data-ready-for-ingestion" status notification.
-
Check that the value sets are current and dynamically generate the code lookup table that maps them to the OMOP vocabulary.
-
Run the ETL, using the value set crosswalk table, to load the data into the OMOP database structure.
-
Run the OMOP Data Characterization (DC) query to validate the data for the final “data-accepted-for-research” status.
-
Contribute the dataset to the N3C FedHub data repository.
-
Contribute the dataset to the N3C FedHub Safe Harbor data repository, applying the Safe Harbor rules.
-
Provide the OMOP DC dashboard metrics in the N3C FedHub data SFTP outbound folder for data partners to access.
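
The baseline count check and the status handoff described in the steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the wiki does not say how counts are compared or what manifest.csv contains, so the `check_counts` and `next_status` helpers, the table names, and the row counts are all hypothetical. Only the three status strings are taken from the steps themselves.

```python
from enum import Enum


class Status(Enum):
    # Status strings are quoted from the process steps; treating them as an
    # ordered progression is an assumption made for this sketch.
    DATA_RECEIVED = "data-received"
    DATA_READY_FOR_INGESTION = "data-ready-for-ingestion"
    DATA_ACCEPTED_FOR_RESEARCH = "data-accepted-for-research"


def check_counts(expected, actual):
    """Compare two {table: row_count} maps and return a list of mismatch messages."""
    problems = []
    for table, want in sorted(expected.items()):
        got = actual.get(table)
        if got is None:
            problems.append(f"{table}: table missing after load")
        elif got != want:
            problems.append(f"{table}: expected {want} rows, loaded {got}")
    return problems


def next_status(current, problems):
    """Move from data-received to data-ready-for-ingestion only when no problems remain."""
    if current is Status.DATA_RECEIVED and not problems:
        return Status.DATA_READY_FOR_INGESTION
    return current


if __name__ == "__main__":
    # Hypothetical counts: what the site reported versus what was loaded.
    expected = {"person": 1200, "visit_occurrence": 5400}
    actual = {"person": 1200, "visit_occurrence": 5398}

    problems = check_counts(expected, actual)
    for line in problems:
        print(line)
    print(next_status(Status.DATA_RECEIVED, problems).value)
```

Returning the mismatches as a list, rather than raising on the first one, lets a single run report every table that needs attention before the dataset is marked ready.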