Process Overview - National-Clinical-Cohort-Collaborative/Data-Ingestion-and-Harmonization GitHub Wiki

Data Ingestion and Harmonization Process Overview

  • Receive the data and reply with a "data-received" handshake notification confirming receipt. Dataset details are recorded in manifest.csv.

  • Reconstitute the CDM data into its native database structure.

  • Apply COVID-19-specific code translation/harmonization.

  • TBD: Generate the derived tables drug_era, dose_era, and condition_era.

  • Run CDM-specific data quality checks: conduct baseline data integrity checks and, at minimum, check the data counts. Send a "data-ready-for-ingestion" status notification.

  • Check that the value sets are current, and dynamically generate the code lookup table that maps them to the OMOP vocabulary.

  • Run the ETL into the OMOP database structure using the value set crosswalk table.

  • Run the OMOP Data Characterization (DC) query to validate the data for the final "data-accepted-for-research" status.

  • Contribute the dataset to the N3C FedHub data repository.

  • Contribute the dataset to the N3C FedHub Safe Harbor data repository, applying the Safe Harbor rules.

  • Provide the OMOP DC dashboard metrics in the N3C FedHub SFTP outbound folder for data partners to access.
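
Two of the steps above, the baseline data count check and the mapping of source codes through the value set crosswalk, can be illustrated with a minimal Python sketch. This is not the N3C pipeline's actual code. The function names, the table layout, the expected-count mapping, and the example codes and concept IDs are all hypothetical, and the real manifest.csv format is not assumed here. The only convention taken from OMOP itself is that concept ID 0 marks a code with no matching concept.

```python
import csv
import io


def count_rows(csv_text):
    """Return the number of data rows (header excluded) in CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row, if any
    return sum(1 for _ in reader)


def check_counts(tables, expected_counts):
    """Compare actual row counts with expected ones.

    `tables` maps a table name to its CSV text and `expected_counts` maps a
    table name to the row count the submitter reported (both hypothetical
    structures). Returns {table: (expected, actual)} for every mismatch;
    `actual` is None when the table is missing altogether.
    """
    mismatches = {}
    for name, expected in expected_counts.items():
        actual = count_rows(tables[name]) if name in tables else None
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches


def map_to_omop(source_codes, crosswalk):
    """Map source codes to OMOP concept IDs via a crosswalk dictionary.

    Codes missing from the crosswalk map to 0, the OMOP convention for
    "no matching concept".
    """
    return [crosswalk.get(code, 0) for code in source_codes]


if __name__ == "__main__":
    # Placeholder data for illustration only.
    tables = {"person": "person_id\n1\n2\n"}
    expected = {"person": 2, "visit": 5}
    print(check_counts(tables, expected))

    crosswalk = {"SRC_A": 1001, "SRC_B": 1002}  # made-up concept IDs
    print(map_to_omop(["SRC_A", "SRC_X"], crosswalk))
```

An empty result from check_counts would correspond to a dataset that passes the minimum count check. Codes mapped to 0 are worth reviewing before the ETL, since they may indicate a value set that is out of date.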