Backup Plan - FaqiangMei/MHA-Survey-Portal GitHub Wiki
This document outlines what data must be backed up, how often backups run, where artifacts are stored, and how to verify restores for the Health application.
- Protect production data (users, survey responses, feedback, notifications) against loss or corruption.
- Provide auditable retention windows for compliance with TAMU policies.
- Enable rapid recovery during outages or misconfigured deployments.
| Asset | Location | Notes |
|---|---|---|
| PostgreSQL database | Heroku Postgres | Primary system of record; includes roles, survey data, metrics |
| Uploaded files | Active Storage local/S3 (depending on environment) | If S3 is used, rely on bucket-level versioning; otherwise archive `/storage/` |
| Application configuration | `config/credentials.yml.enc`, `.env` files | Store copies in secure vaults/password managers |
| Infrastructure configs | `docker-compose.yml`, `Procfile`, scripts | Version-controlled in Git; tag releases before major deployments |
- Daily automated snapshot: `heroku pg:backups:schedule DATABASE_URL --at '02:00 UTC'`
- Weekly full export: Download Monday’s snapshot to encrypted storage (`heroku pg:backups:download`)
- Before major releases: Run `heroku pg:backups:capture --app <app-name>`
- Retention policy: Keep 7 daily, 4 weekly, and 3 monthly snapshots; purge older dumps quarterly
- Responsible role: Primary on-call administrator (rotating), documented in runbook
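The retention counts above can be enforced on downloaded dumps with a small pruning helper. This is a minimal sketch, not the project's actual tooling: it assumes dumps are sorted into hypothetical `backups/daily`, `backups/weekly`, and `backups/monthly` directories and that file names embed a sortable date (for example `health-20250131.dump`), so alphabetical order equals chronological order.

```shell
# prune_dumps DIR KEEP: delete all but the KEEP newest *.dump files in DIR.
# Assumption: file names sort chronologically (e.g. health-YYYYMMDD.dump).
prune_dumps() {
  dir=$1
  keep=$2

  # Count the dumps currently present.
  total=0
  for f in "$dir"/*.dump; do
    [ -e "$f" ] && total=$((total + 1))
  done

  # Glob results are sorted, so the oldest files come first;
  # remove them until only KEEP remain.
  excess=$((total - keep))
  for f in "$dir"/*.dump; do
    [ "$excess" -gt 0 ] || break
    [ -e "$f" ] || continue
    rm -- "$f"
    excess=$((excess - 1))
  done
}
```

Under those assumptions, the policy would map to `prune_dumps backups/daily 7`, `prune_dumps backups/weekly 4`, and `prune_dumps backups/monthly 3`. The helper only touches local files; snapshots held by Heroku are managed separately.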
- Weekly snapshot: `heroku pg:backups:capture --app <staging-app>`
- Retention: Keep last 4 snapshots to reproduce test data if needed
- Optional: Mirror production daily schedule if staging stores critical pilot data
- Ad-hoc: Developers may dump local data when reproducing issues: `docker compose exec -T db pg_dump -U dev_user -F c health_development > backups/health-dev-$(date +%Y%m%d).dump` (`-T` disables the pseudo-TTY so the binary dump is written unaltered)
- Ensure `backups/` is ignored by Git and stored securely when containing real user data
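One way to make sure local dumps never reach version control is to add the ignore rule idempotently before dumping. The helper below is an illustrative sketch; the function name `ensure_ignored` is hypothetical and it assumes the rule belongs in the repository's top-level `.gitignore`.

```shell
# ensure_ignored REPO_DIR: add "backups/" to REPO_DIR/.gitignore unless an
# identical line is already there. Safe to run repeatedly.
ensure_ignored() {
  ignore_file="$1/.gitignore"
  touch "$ignore_file"

  if ! grep -qxF 'backups/' "$ignore_file"; then
    # If the file does not end with a newline, add one so the new rule
    # is not glued onto the last existing line.
    if [ -s "$ignore_file" ] && [ -n "$(tail -c 1 "$ignore_file")" ]; then
      printf '\n' >> "$ignore_file"
    fi
    printf '%s\n' 'backups/' >> "$ignore_file"
  fi
}
```

Calling `ensure_ignored .` from the repository root before the `pg_dump` command would cover the ad-hoc case. `git check-ignore -q backups/` can then confirm that Git actually treats the path as ignored.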
- Encrypted S3 bucket (recommended) with limited IAM access and object versioning
- On-premises secure file share managed by TAMU IT, if cloud storage is restricted
- Track download locations and access logs for auditability
- Monthly restore test: Use staging to restore the most recent production snapshot and run smoke tests (sign in, load dashboards, generate reports)
- Checksum verification: After download, run `shasum` or equivalent to record integrity hashes
- Alerting: Enable notifications for failed Heroku backups via add-ons (e.g., PGBackups to Slack/email)
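The checksum step can be sketched as a pair of helpers that record a SHA-256 hash next to each downloaded dump and re-check it later, for instance before a restore test. The function names and the `.sha256` sidecar-file convention are assumptions for illustration, not something the plan prescribes.

```shell
# sha256_of FILE: print the SHA-256 hex digest of FILE, using whichever
# tool is installed (sha256sum on most Linux systems, shasum on macOS).
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}

# record_checksum FILE: store the digest alongside the dump as FILE.sha256.
record_checksum() {
  sha256_of "$1" > "$1.sha256"
}

# verify_checksum FILE: succeed only if FILE still matches its recorded digest.
verify_checksum() {
  [ "$(sha256_of "$1")" = "$(cat "$1.sha256")" ]
}
```

A typical flow would be `record_checksum latest.dump` right after download and `verify_checksum latest.dump` before using the file. A non-zero exit status means the dump changed since the hash was recorded.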
- Declare incident and notify stakeholders (product owner, advisors if impacted).
- Stop writes (scale web dynos to zero or enable maintenance mode).
- Identify the latest usable backup (`heroku pg:backups --app <app-name>`).
- Restore snapshot: `heroku pg:backups:restore b123 DATABASE_URL --app <app-name> --confirm <app-name>`
- Monitor restore status (`heroku pg:backups:info b123`).
- Run post-restore migrations: `heroku run rails db:migrate --app <app-name>`.
- Validate functionality (sign-in, survey workflows, reporting exports).
- Re-enable web dynos and communicate resolution.
- Log the incident and lessons learned in the operations runbook.
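The database-facing steps of the procedure above could be wrapped in a script that defaults to a dry run, so the exact commands can be reviewed during an incident before anything executes. This is a sketch under stated assumptions: `restore_backup`, `run`, and the `DRY_RUN` variable are hypothetical names, maintenance mode is chosen as the way to stop writes, and validation plus re-enabling traffic are deliberately left as manual steps.

```shell
# run CMD...: print the command when DRY_RUN is 1 (the default), else execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# restore_backup APP BACKUP_ID: stop writes, restore the snapshot, check its
# status, then run migrations. Stops at the first failing step.
restore_backup() {
  app=$1
  backup_id=$2
  run heroku maintenance:on --app "$app" &&
  run heroku pg:backups:restore "$backup_id" DATABASE_URL --app "$app" --confirm "$app" &&
  run heroku pg:backups:info "$backup_id" --app "$app" &&
  run heroku run rails db:migrate --app "$app"
}
```

Running `restore_backup <app-name> b123` prints the four commands in order; `DRY_RUN=0 restore_backup <app-name> b123` would execute them. Maintenance mode is left on at the end so functionality can be validated before traffic is restored.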
- Maintain current on-call contacts and access credentials in `doc/operations.md` (private repository).
- Review this plan twice per semester to adjust for infrastructure changes or policy updates.
- Cross-reference with System Administration for broader maintenance procedures.