Database anonymization - CDCgov/prime-simplereport GitHub Wiki

Anonymize a database

Overview

  1. Check for sensitive fields using the detect script from PostgreSQL Anonymizer. (Our script is working, but PostgreSQL Anonymizer is not; I've reported a bug [here](https://gitlab.com/dalibo/postgresql_anonymizer/-/issues/300).)
  2. Generate fake data (if needed)
  3. Create a database dump from your source database using the steps described below
  4. Restore the anonymized dump to a new database
  5. Sync or create users in Okta and the non-production testing environment

To ensure access to any database created from an anonymized dump, please make sure you have an account in the source database. In the future, we can use the Okta API to grant proper permissions based on need (see https://github.com/CDCgov/prime-simplereport/issues/3962).

Create an anonymized local database dump

  1. Start your database
  2. Create the anonymized dump
    1. Docker DB: yarn anon:dump
    2. Local DB: yarn anon:dump:localdb

Restore an anonymized local PostgreSQL dump

  1. Start your database
  2. Restore the snapshot
    1. yarn anon:restore
  3. Restart your apps
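
The local dump and restore steps above can be chained into one script. This is a minimal sketch, assuming it is run from the repository root where the `anon:*` yarn scripts are defined. The `DB_KIND` and `DRY_RUN` variables and the `run_step`, `dump_script`, and `anon_round_trip` helpers are hypothetical and not part of the repository. By default it only prints the commands it would run.

```shell
#!/bin/sh
set -eu

# Hypothetical knobs, not defined by the repository:
#   DB_KIND  "docker" (default) or "local"; selects which dump script to use
#   DRY_RUN  "1" (default) only prints each command; "0" also runs it
DB_KIND="${DB_KIND:-docker}"
DRY_RUN="${DRY_RUN:-1}"

# Print a command, and execute it only when DRY_RUN is "0".
run_step() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Map the database kind to the yarn script named in the steps above.
dump_script() {
  case "$1" in
    docker) echo "anon:dump" ;;
    local)  echo "anon:dump:localdb" ;;
    *)      echo "unknown DB_KIND: $1" >&2; return 1 ;;
  esac
}

# Dump, then restore, then remind the user about the manual last step.
anon_round_trip() {
  anon_dump_name="$(dump_script "$DB_KIND")"
  run_step yarn "$anon_dump_name"
  run_step yarn anon:restore
  echo "Restart your apps to pick up the restored data."
}

anon_round_trip
```

Saved under a name of your choice, it could be run as `DRY_RUN=0 DB_KIND=local sh <script>` once the printed commands look right. Starting the database is left as a manual step because how you start it depends on your setup.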

Automated report of potentially sensitive columns (only to be used as an aid and not a source of truth)

  1. Start your database
  2. Run the detect script
    1. yarn anon:detect
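
Because the report is only an aid, it can help to cross-check it with a simple name-based pass over your own column list. This is a minimal sketch, assuming a plain-text list of `table.column` names on stdin. The `flag_sensitive_columns` function, its keyword list, and the sample names are all made up for illustration, and this is not what `yarn anon:detect` runs.

```shell
# Hypothetical keyword list; extend it for your own schema.
SENSITIVE_PATTERN='name|email|phone|birth|address|street|zip|ssn'

# Reads "table.column" lines on stdin and prints those whose column part
# contains one of the keywords (case-insensitive). Prints nothing when no
# column matches.
flag_sensitive_columns() {
  awk -F. -v pat="$SENSITIVE_PATTERN" 'tolower($2) ~ pat { print }'
}

# Made-up sample input; only the first and third entries contain a keyword.
printf '%s\n' patient.first_name patient.internal_id site.street visit.result \
  | flag_sensitive_columns
```

A name match only suggests that a column deserves a look. Free-text columns with neutral names can still hold sensitive values, so neither this pass nor the automated report replaces a manual review.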

Completely remove anon from a database

  1. Start your database
  2. Remove anon
    1. yarn anon:remove

At this point, it's a pile of scripts and some local Docker changes. Nothing here gets used in our automated processes or remote environments yet.