Disaster recovery plan - codemagic-ci-cd/company-handbook GitHub Wiki

Disaster recovery objectives

RTO (Recovery Time Objective)

  • 8 hours (based on the largest SLA)

RPO (Recovery Point Objective)

  • 4 hours for user and team data and application configurations.
  • 7 days for build history.
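As a rough illustration of how the RPO targets above could be checked against the age of the newest backup, here is a minimal sketch. The category names and the `rpo_satisfied` helper are hypothetical and not part of any existing tooling; only the 4-hour and 7-day limits come from this plan.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# RPO limits taken from the objectives above.
# The category names are illustrative assumptions.
RPO_TARGETS = {
    "users_teams_config": timedelta(hours=4),
    "build_history": timedelta(days=7),
}


def rpo_satisfied(category: str, last_backup: datetime,
                  now: Optional[datetime] = None) -> bool:
    """Return True if the newest backup is no older than the RPO for its category."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_backup <= RPO_TARGETS[category]
```

For example, a user data backup taken 5 hours ago would fail the check, while a build history backup taken 6 days ago would pass.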

Major goals of a disaster recovery plan

Hardware and Software Inventory

Information services backup overview

Storage: disaster recovery

Database: Backup and Restore policy

International escalations procedure

DRP steps

This plan assumes that all MongoDB nodes are unavailable and describes how to replace the cluster and restore its data from backup files.

  1. Checklist before starting:
    • The GCP us-east1-b zone is operating as expected
    • Access to download the latest backup files has been granted
  2. Start new instances for the new cluster using the same instance type and MongoDB version as the existing cluster.
  3. Provision the new nodes and set up monitoring (see https://github.com/codemagic-dev/ansible/blob/main/setup_grafana_monitoring.yml).
  4. Ensure MongoDB is connected to the new cluster.
  5. Download the latest backup files to the master host.
  6. Restore the files in the following order:
    1. backup file with the "all" prefix
    2. applications file
    3. teams file
    4. users file
    5. audit_log file
  7. Start the cluster and verify that data is available in the app and vmm databases.
  8. Update the DNS settings to point to the new IP addresses.
  9. Restart the backend and worker services and monitor the logs to confirm that the MongoDB connection is established successfully.
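The restore order in step 6 is easy to get wrong when done by hand, so it may help to script it. The sketch below sorts a set of backup files into that order and builds one `mongorestore` command per file. It rests on several assumptions that this plan does not state: that each file name starts with its collection name (only the "all" prefix is mentioned above), that the files are gzip-compressed `mongodump` archives, and that the helper names `order_backups` and `restore_commands` are hypothetical.

```python
from pathlib import Path

# Restore order from step 6. Matching files by name prefix is an assumption.
RESTORE_ORDER = ["all", "applications", "teams", "users", "audit_log"]


def order_backups(files):
    """Return one file per prefix, in restore order; fail if any is missing."""
    ordered = []
    for prefix in RESTORE_ORDER:
        matches = [f for f in files if Path(f).name.startswith(prefix)]
        if not matches:
            raise FileNotFoundError(f"no backup file found for prefix '{prefix}'")
        # If several files match, take the last one in lexicographic order
        # (the newest one, provided the names embed a sortable date).
        ordered.append(sorted(matches)[-1])
    return ordered


def restore_commands(files, uri):
    """Build mongorestore invocations (assumes gzip-compressed archives)."""
    return [
        ["mongorestore", f"--uri={uri}", f"--archive={f}", "--gzip"]
        for f in order_backups(files)
    ]
```

The commands are returned as argument lists rather than executed, so they can be reviewed first and then passed to `subprocess.run` one at a time, stopping at the first failure.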

DRP test

  • The test should be conducted using a standalone MongoDB configuration.
  • The test should be conducted in a temporary VPC whose default firewall settings prohibit any outside connection to the MongoDB ports.
  • The test doesn’t include steps related to the production environment, such as configuring monitoring (step 3), updating DNS (step 8), and restarting production services (step 9).
  • The hosts and VPC should be deleted after the test results are recorded in this document.
  • Disk requirements: ~100 GB of available disk space.
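One way to confirm that the temporary VPC's firewall really blocks outside access to MongoDB is to attempt a TCP connection from a machine outside the VPC. The following is a minimal sketch using only the Python standard library. The `port_is_open` helper is hypothetical, and port 27017 is MongoDB's default, which is assumed here because this plan does not list the ports in use.

```python
import socket

# MongoDB's default port; an assumption, adjust if the test hosts use another.
MONGODB_PORT = 27017


def port_is_open(host: str, port: int = MONGODB_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run from outside the VPC against a test host's external address, the function is expected to return `False`. Run from a host inside the VPC, it should return `True` once MongoDB is up.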