# Initial EHR Ingestion [DRAFT]
1. Schema validation (see the sketch after this step)
- Use Python + DuckDB
- Confirm the delivery has all expected v5.3 CDM fields and tables
- Identify any fields and tables in the delivery that do not exist in v5.3
- Identify rows with invalid data types
- Basic checks to ensure DataQualityDashboard (DQD) will run (e.g., at least one row in the cdm_source table, vocabulary tables are present, etc.)
- Generate report:
- Failing fields and rows with invalid data types
- Vocabulary version (`SELECT vocabulary_version FROM vocabulary WHERE vocabulary_id = 'None'`)
- Row counts per table
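A minimal sketch of the Python + DuckDB check, assuming the delivery is a directory with one CSV per CDM table and that the expected v5.3 table/field list has been exported to a reference CSV (`cdm_v5.3_fields.csv` is hypothetical; the OHDSI CommonDataModel repository publishes field-level CSVs that could fill this role):

```python
import csv
from pathlib import Path

import duckdb

DELIVERY_DIR = Path("delivery")                 # assumption: one <table>.csv per CDM table
EXPECTED_FIELDS = Path("cdm_v5.3_fields.csv")   # hypothetical reference with table,field columns

# Build the expected v5.3 schema as {table: {field, ...}}
expected: dict[str, set[str]] = {}
with open(EXPECTED_FIELDS, newline="") as f:
    for row in csv.DictReader(f):
        expected.setdefault(row["table"].lower(), set()).add(row["field"].lower())

con = duckdb.connect()
for csv_path in sorted(DELIVERY_DIR.glob("*.csv")):
    table = csv_path.stem.lower()
    # DESCRIBE over read_csv_auto yields the delivered column names without a full load
    cols = {
        r[0].lower()
        for r in con.execute(f"DESCRIBE SELECT * FROM read_csv_auto('{csv_path}')").fetchall()
    }
    n_rows = con.execute(f"SELECT count(*) FROM read_csv_auto('{csv_path}')").fetchone()[0]
    if table not in expected:
        print(f"{table}: not a v5.3 CDM table ({n_rows} rows)")
        continue
    missing = sorted(expected[table] - cols)
    extra = sorted(cols - expected[table])
    print(f"{table}: {n_rows} rows; missing fields: {missing}; fields not in v5.3: {extra}")
```

Per-table row counts and the field comparison feed the report above; data-type checks can be layered in by `TRY_CAST`-ing each column to its expected v5.3 type in DuckDB and counting the rows that fail.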
2. Load CSV files into BigQuery (BQ) tables (see the sketch after this step)
- Use Python + DuckDB to automate
- Can be done manually for now
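A sketch of the automated load, assuming the validated CSVs from step 1 sit in `delivery/` and the target dataset already exists; `PROJECT` and `DATASET` are placeholders. The load itself uses the official `google-cloud-bigquery` client, with DuckDB reserved for the local CSV checks in step 1:

```python
from pathlib import Path

from google.cloud import bigquery

PROJECT = "my-gcp-project"   # assumption: replace with the real project ID
DATASET = "ehr_omop"         # assumption: target BQ dataset for this delivery

client = bigquery.Client(project=PROJECT)
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # header row
    autodetect=True,       # let BQ infer types; swap in explicit schemas once stable
)

for csv_path in sorted(Path("delivery").glob("*.csv")):
    table_id = f"{PROJECT}.{DATASET}.{csv_path.stem.lower()}"
    with open(csv_path, "rb") as f:
        job = client.load_table_from_file(f, table_id, job_config=job_config)
    job.result()  # wait for completion; raises on load errors
    print(f"Loaded {table_id}")
```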
3. Execute DataQualityDashboard
- Collect/store results.json and results.csv files
- Write results to BQ (manually, as the package does not support writing them to BQ automatically; see the sketch after this step)
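Until the package writes to BigQuery directly, a short script can store both artifacts. A sketch, assuming DQD wrote its output to `./dqd_output`; the bucket and table names are placeholders:

```python
from google.cloud import bigquery, storage

PROJECT = "my-gcp-project"                    # assumption
TABLE_ID = f"{PROJECT}.ehr_omop.dqd_results"  # assumption: our choice of results table
BUCKET = "ehr-dqd-artifacts"                  # assumption: GCS bucket for raw artifacts

# Archive the raw results.json alongside the tabular results
storage.Client(project=PROJECT).bucket(BUCKET).blob("results.json") \
    .upload_from_filename("dqd_output/results.json")

# Load results.csv into BQ, replacing any previous run for this delivery
client = bigquery.Client(project=PROJECT)
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
with open("dqd_output/results.csv", "rb") as f:
    client.load_table_from_file(f, TABLE_ID, job_config=job_config).result()
```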
4. Execute Achilles (see the sketch after this step)
- Run package as-is
- Run the achilles_results_concept_counts script (pulled from the Atlas package)
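Achilles is an R package, so one option that keeps orchestration in Python is shelling out to Rscript. Here `scripts/run_achilles.R` is a hypothetical wrapper that would build the DatabaseConnector connection details and call `Achilles::achilles()` plus the concept-counts script; only the subprocess plumbing is sketched:

```python
import subprocess

# scripts/run_achilles.R is hypothetical; it would run Achilles::achilles()
# and the achilles_results_concept_counts script against the CDM schema.
result = subprocess.run(
    ["Rscript", "scripts/run_achilles.R", "--cdm-schema", "ehr_omop"],  # schema name is a placeholder
    capture_output=True,
    text=True,
)
print(result.stdout)
if result.returncode != 0:
    raise RuntimeError(f"Achilles run failed:\n{result.stderr}")
```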
5. Refresh OHDSI tools
- Make API calls to refresh ATLAS and Ares (see the sketch after this step)
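ATLAS reads from WebAPI, so the refresh can be plain HTTP. A minimal sketch, assuming a Broadsea-style WebAPI at `http://localhost:8080/WebAPI`; the `/source/refresh` endpoint reloads the configured sources, while the `cdmresults` cache-refresh path is an assumption to verify against the deployed WebAPI version:

```python
import requests

WEBAPI_URL = "http://localhost:8080/WebAPI"  # assumption: Broadsea default
SOURCE_KEY = "EHR_OMOP"                      # assumption: source key registered in WebAPI

# Reload WebAPI's list of CDM sources so ATLAS sees the refreshed schema
requests.get(f"{WEBAPI_URL}/source/refresh", timeout=60).raise_for_status()

# Rebuild the cached Achilles results ATLAS reads for this source
# (endpoint name is an assumption; check the deployed WebAPI version)
requests.get(f"{WEBAPI_URL}/cdmresults/{SOURCE_KEY}/refreshCache", timeout=300).raise_for_status()
```

Ares itself is refreshed by regenerating its data index, typically via the AresIndexer R package, which could be wrapped with the same Rscript pattern used for Achilles above.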
Future state TODOs:
- Harmonize vocabulary versions across different deliveries (e.g., how to map to newer standard concepts)
- Set up Broadsea (ATLAS, HADES, Ares, etc.) on cloud VM
- Containerize the R, RStudio, Java, rJava, and OHDSI library installation