Linkage and Deidentification - gpcnetwork/grouse-cms GitHub Wiki

Following the principles described in the README's Linkage and Deidentification section, the linkage and deidentification process is summarised in the diagram below:

[Diagram: linkage-deid]

  • A: [load source] The source SDAs files are first uploaded to a designated, encrypted S3 bucket via secure upload (TLS/SSL)
  • B: [configuration and preparation] The same as steps B to D from the ELT process above
  • C: [decrypt and decompress] Run ./src/stage/decrypt.py in the configured developer environment
  • D: [extract and load] Run ./src/stage/stage_cms_care.py, ./src/stage/stage_xwalk.py, and ./src/stage/stage_cdm.py to stage all needed source files onto Snowflake
  • E-F: [match and align] Run stored procedures ./src/link_deid/stored_procedures/cdm_link_deid_stg.sql and ./src/link_deid/dml/cdm_link_deid_stg.sql to create intermediate tables specifying all de-identification parameters
  • G-I: [deidentify and secure share] Run stored procedures ./src/link_deid/stored_procedures/cdm_link_deid.sql and ./src/link_deid/dml/cdm_link_deid.sql to create LDS and de-identified tables and views of all the CDM data
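
The ordering of steps C through I can be written down as a small checklist script. The following is only a sketch, not part of the repository: the `Step` type, `PLAN` list, and both helper functions are hypothetical names introduced here, and it assumes the Python scripts are run from the repository root and need no command-line arguments (the wiki does not say). The SQL files are only listed, since how they are executed in Snowflake is not described above.

```python
import subprocess
import sys
from typing import List, NamedTuple


class Step(NamedTuple):
    label: str  # step letter(s) from the diagram
    kind: str   # "python" (run locally) or "sql" (run in Snowflake)
    path: str   # path relative to the repository root


# Paths are copied from the step list above, in the order they are mentioned.
PLAN: List[Step] = [
    Step("C", "python", "./src/stage/decrypt.py"),
    Step("D", "python", "./src/stage/stage_cms_care.py"),
    Step("D", "python", "./src/stage/stage_xwalk.py"),
    Step("D", "python", "./src/stage/stage_cdm.py"),
    Step("E-F", "sql", "./src/link_deid/stored_procedures/cdm_link_deid_stg.sql"),
    Step("E-F", "sql", "./src/link_deid/dml/cdm_link_deid_stg.sql"),
    Step("G-I", "sql", "./src/link_deid/stored_procedures/cdm_link_deid.sql"),
    Step("G-I", "sql", "./src/link_deid/dml/cdm_link_deid.sql"),
]


def python_commands(plan: List[Step]) -> List[List[str]]:
    """Build the command lines for the locally run Python steps, in order."""
    return [[sys.executable, s.path] for s in plan if s.kind == "python"]


def run_python_steps(plan: List[Step], dry_run: bool = True) -> List[List[str]]:
    """Print (dry run) or execute each Python step, stopping on the first failure."""
    commands = python_commands(plan)
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # Assumption: the scripts take no arguments and read their own config.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_python_steps(PLAN, dry_run=True)
    for step in PLAN:
        if step.kind == "sql":
            print(f"step {step.label}: execute {step.path} in Snowflake")
```

With `dry_run=True` the script only prints the commands, which makes it safe to use as a checklist before touching any data; passing `dry_run=False` would actually invoke the staging scripts.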