Linkage and Deidentification - gpcnetwork/grouse-cms GitHub Wiki
Following the principles described in the "Linkage and Deidentification" section of the README, the linkage and deidentification process can be summarised in the diagram below:
- A: [load source] The source SDA files are first uploaded to a designated, encrypted S3 bucket via secure upload (TLS/SSL)
- B: [configuration and preparation] The same as steps B to D from the ELT process above
- C: [decrypt and decompress] Run `./src/stage/decrypt.py` in the configured developer environment
- D: [extract and load] Run `./src/stage/stage_cms_care.py`, `./src/stage/stage_xwalk.py`, and `./src/stage/stage_cdm.py` to stage all needed source files onto Snowflake
- E-F: [match and align] Run stored procedures `./src/link_deid/stored_procedures/cdm_link_deid_stg.sql` and `./src/link_deid/dml/cdm_link_deid_stg.sql` to create intermediate tables specifying all de-identification parameters
- G-I: [deidentify and secure share] Run stored procedures `./src/link_deid/stored_procedures/cdm_link_deid.sql` and `./src/link_deid/dml/cdm_link_deid.sql` to create LDS and de-identified tables and views of all the CDM data.
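
The run order of steps C through G-I can be sketched as a small Python checklist. This is only an illustration, not part of the repository: the names `Step`, `PIPELINE`, `plan`, and `describe` are hypothetical, and the sketch assumes that files are run in the order listed above and that the `.sql` files are executed inside Snowflake rather than from the local shell. It does not run anything; it only lays out the sequence.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    """One step of the process (hypothetical helper type)."""
    label: str               # step letter(s), e.g. "C" or "E-F"
    name: str                # short description of the step
    files: Tuple[str, ...]   # files to run, in the order listed


# Steps C through G-I; file paths are copied from the list above.
PIPELINE: Tuple[Step, ...] = (
    Step("C", "decrypt and decompress", ("./src/stage/decrypt.py",)),
    Step("D", "extract and load", (
        "./src/stage/stage_cms_care.py",
        "./src/stage/stage_xwalk.py",
        "./src/stage/stage_cdm.py",
    )),
    Step("E-F", "match and align", (
        "./src/link_deid/stored_procedures/cdm_link_deid_stg.sql",
        "./src/link_deid/dml/cdm_link_deid_stg.sql",
    )),
    Step("G-I", "deidentify and secure share", (
        "./src/link_deid/stored_procedures/cdm_link_deid.sql",
        "./src/link_deid/dml/cdm_link_deid.sql",
    )),
)


def plan(pipeline: Tuple[Step, ...] = PIPELINE) -> List[Tuple[str, str, str]]:
    """Flatten the pipeline into (step label, kind, path) rows in run order."""
    rows = []
    for step in pipeline:
        for path in step.files:
            kind = "python" if path.endswith(".py") else "sql"
            rows.append((step.label, kind, path))
    return rows


def describe(rows: List[Tuple[str, str, str]]) -> List[str]:
    """Render each row as a human-readable checklist line."""
    lines = []
    for label, kind, path in rows:
        if kind == "python":
            # Assumption: the staging scripts need no command-line arguments.
            lines.append(f"[{label}] python {path}")
        else:
            # Assumption: SQL files are executed in a Snowflake session.
            lines.append(f"[{label}] run in Snowflake: {path}")
    return lines
```

Calling `print("\n".join(describe(plan())))` prints one checklist line per file, which can be handy for confirming that staging (C, D) completes before the link and de-identification SQL (E-F, G-I) is run.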