Production Database Migration Guide - sgajbi/portfolio-analytics-system GitHub Wiki
Database migrations in production must be safe, tested, and reversible. This guide outlines:
- Pre-checks before migration.
- Applying migrations with minimal downtime.
- Rollback and recovery strategy.
- Forward-only migrations in production (downgrades are for dev/staging only).
- Zero downtime goal — migrations should not block event processing for long periods.
- Pre-tested migrations — all changes must be validated in staging with production-like data before live deployment.
- Confirm migration script exists in
alembic/versions/
with descriptive message. - Validate migration correctness by running in staging.
- Check models match DB schema after applying migration.
- Ensure database backups are taken:
pg_dump -Fc -h <prod-host> -U <user> <db-name> > backup_before_migration.dump
-
Notify:
- DevOps team
- Application owners
- Operations team (Ops will monitor during migration)
-
Schedule maintenance window if downtime is expected.
-
For schema changes that affect write operations:
- Pause Kafka consumers (to avoid writes during migration).
- Disable CI/CD auto-deploy temporarily.
- Execute Alembic upgrade:
alembic upgrade head
- Check table definitions:
\d processed_events;
- Confirm new columns/indexes exist.
- Check existing records for integrity.
- Restart Kafka consumers.
- Re-enable CI/CD deployments.
- Trigger a test transaction in ingestion.
- Confirm correlation ID and idempotent processing still work.
- Query API to confirm results:
curl -H "X-Correlation-ID: TEST:uuid" https://<api>/portfolios/1001/positions
- Monitor logs in Splunk for errors.
- Check Kafka consumer lag.
- Verify no increase in DB error rates.
⚠ Direct downgrade is discouraged in production. Instead:
-
Apply forward-only fix migration if rollback is needed.
-
If data corruption occurs:
- Restore from backup.
- Redeploy previous stable application version.
- Group non-breaking changes (adding columns, indexes) separately from breaking changes (renaming/dropping columns).
- Use default values + nullable columns for safe schema evolution.
- Large data changes (backfills) should be handled via batch jobs, not Alembic migrations.