Initial EHR Ingestion [DRAFT]

1. Schema validation

  • Use Python + DuckDB (a sketch follows this list)
  • Confirm the delivery has all expected OMOP CDM v5.3 tables and fields
  • Identify any tables and fields in the delivery that are not part of v5.3
  • Identify rows with invalid data types
  • Basic checks to ensure DQD will run (e.g. at least one row in the cdm_source table, vocabulary tables are present)
  • Generate report:
    • Failing fields and the rows with invalid data types
    • Vocabulary version (`SELECT vocabulary_version FROM vocabulary WHERE vocabulary_id = 'None'`)
    • Row counts per table
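A minimal sketch of what this validation could look like. The `EXPECTED_V53` column map (only a small subset is shown here; the full map would be generated from the OHDSI CDM v5.3 DDL), the `delivery/` one-CSV-per-table layout, and the table names are illustrative assumptions, and row-level type checks are omitted for brevity:

```python
import duckdb

# Hypothetical subset of the v5.3 spec; in practice this table -> column map
# would be generated from the OHDSI CDM v5.3 DDL.
EXPECTED_V53 = {
    "person": {"person_id", "gender_concept_id", "year_of_birth",
               "race_concept_id", "ethnicity_concept_id"},
    "cdm_source": {"cdm_source_name", "cdm_version", "vocabulary_version"},
    "vocabulary": {"vocabulary_id", "vocabulary_name", "vocabulary_version"},
}

con = duckdb.connect()
report = {}

for table, expected in EXPECTED_V53.items():
    path = f"delivery/{table}.csv"  # assumed one-CSV-per-table layout
    try:
        # all_varchar=True so type problems surface in later explicit casts
        # instead of aborting the load outright
        rel = con.read_csv(path, all_varchar=True)
    except Exception as exc:
        report[table] = {"error": str(exc)}
        continue
    delivered = set(rel.columns)
    report[table] = {
        "missing_fields": sorted(expected - delivered),
        "unexpected_fields": sorted(delivered - expected),
        "row_count": rel.aggregate("count(*)").fetchone()[0],
    }

# DQD precondition: cdm_source must contain at least one row
if report.get("cdm_source", {}).get("row_count", 0) == 0:
    print("WARNING: cdm_source is empty; DQD will not run")

# Vocabulary version for the report
row = con.sql(
    "SELECT vocabulary_version FROM read_csv_auto('delivery/vocabulary.csv') "
    "WHERE vocabulary_id = 'None'"
).fetchone()
print("vocabulary version:", row[0] if row else "not found")
print(report)
```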

2. Load CSV files into BQ tables

  • Use Python + DuckDB to automate (a sketch follows this list)
  • Can be done manually for now
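When this gets automated, a sketch along these lines could work. It uses the google-cloud-bigquery client rather than DuckDB for the actual load; the project ID, dataset name, and `delivery/` layout are placeholders:

```python
from pathlib import Path
from google.cloud import bigquery

client = bigquery.Client(project="my-ehr-project")  # hypothetical project ID
dataset = "omop_cdm_53"                             # hypothetical dataset name

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # header row
    autodetect=True,       # let BQ infer the schema from the CSV
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

for csv_path in Path("delivery").glob("*.csv"):
    # table name taken from the file name, per the assumed delivery layout
    table_id = f"{client.project}.{dataset}.{csv_path.stem}"
    with csv_path.open("rb") as f:
        job = client.load_table_from_file(f, table_id, job_config=job_config)
    job.result()  # block until the load job finishes
    print(f"loaded {csv_path.name} -> {table_id}")
```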

3. Execute DataQualityDashboard

  • Collect and store the results.json and results.csv output files
  • Write results to BQ manually, as the package's built-in write doesn't work for this setup (a sketch follows this list)
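A sketch of the manual write, assuming the DQD results.json layout with a top-level CheckResults array; the project, dataset, and table IDs are placeholders:

```python
import json
from google.cloud import bigquery

client = bigquery.Client(project="my-ehr-project")   # hypothetical project ID
table_id = "my-ehr-project.dqd.check_results"        # hypothetical table

# Assumes the standard DQD output shape: {"CheckResults": [...], ...}
with open("results.json") as f:
    check_results = json.load(f)["CheckResults"]

job_config = bigquery.LoadJobConfig(
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
job = client.load_table_from_json(check_results, table_id, job_config=job_config)
job.result()
print(f"wrote {len(check_results)} check results to {table_id}")
```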

4. Execute Achilles

  • Run the package as-is (a sketch follows this list)
  • Run the achilles_results_concept_counts script (pulled from the Atlas package)
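One way to script the "run as-is" step is to drive the R package from Python via Rscript. This is a sketch only: it assumes R, Achilles, and DatabaseConnector (plus the relevant JDBC driver) are installed on the runner, and the connection string and schema names are placeholders:

```python
import subprocess

r_code = """
library(Achilles)
# Placeholder connection setup; driver path and auth details omitted
connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "bigquery",
  connectionString = Sys.getenv("BQ_CONNECTION_STRING")
)
Achilles::achilles(
  connectionDetails,
  cdmDatabaseSchema = "omop_cdm_53",          # hypothetical schema names
  resultsDatabaseSchema = "achilles_results",
  cdmVersion = "5.3"
)
"""

# Fails loudly if the R session errors out
subprocess.run(["Rscript", "-e", r_code], check=True)
```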

5. Refresh OHDSI tools

  • Make API calls to refresh ATLAS and Ares (a sketch follows)
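For the ATLAS half, WebAPI exposes a source-refresh endpoint; the base URL below is a placeholder. The Ares index is usually rebuilt with the AresIndexer R package rather than a REST call, so that part is left as a comment:

```python
import requests

WEBAPI_BASE = "https://atlas.example.org/WebAPI"  # placeholder URL

# Ask WebAPI to re-read its configured CDM sources so ATLAS
# picks up the new delivery
resp = requests.get(f"{WEBAPI_BASE}/source/refresh", timeout=60)
resp.raise_for_status()
print("ATLAS source refresh returned:", resp.status_code)

# TODO: Ares refresh, likely via the AresIndexer R package
```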

Future state TODOs:

  • Harmonize vocabulary versions between different deliveries (e.g. how to map older deliveries to newer standard vocabularies)
  • Set up Broadsea (ATLAS, HADES, Ares, etc.) on cloud VM
  • Containerize the R, RStudio, Java, rJava, and OHDSI library installation