MongoDB backup and restore procedure - dmwm/WMCore GitHub Wiki

MongoDB backup and restore procedure

MongoDB as a Service, deployed on the cmsweb k8s cluster, will use the following procedure to back up its databases.

Backup procedure

The backup procedure is automatic and will run inside the cmsmongo image (see [1] for the image and [2] for its sources):

  • a cron job running inside the cmsmongo Docker image will back up specific databases
    • databases to back up:
      • preprod databases: msPileupDBPreProd, msOutputDBPreProd and msUnmergedDBPreProd
      • production databases: msPileupDBProd, msOutputDBProd and msUnmergedDBProd
    • frequency: every 4 hours (production) and every 12 hours (preproduction)
    • retention policy: 10 days
    • EOS location: /eos/project-c/cmsweb-http/mongodb/backups/<preprod,prod>
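
The schedule and retention settings above could be expressed roughly as follows. This is a minimal sketch, not the actual contents of the cmsmongo image: the cron expressions, the per-environment `.ini` file names, the `prune_backups` helper, and the assumption that every backup is a regular file under the EOS directory are all illustrative.

```shell
#!/bin/bash
# Hypothetical crontab entries matching the stated frequencies
# (mongo_prod.ini and mongo_preprod.ini are illustrative file names):
#   0 */4  * * * /data/tools/mongo_manage.sh backup mongo_prod.ini
#   0 */12 * * * /data/tools/mongo_manage.sh backup mongo_preprod.ini

# Retention period from the policy above, in days.
RETENTION_DAYS=10

# prune_backups DIR [DAYS]: delete regular files under DIR whose
# modification time is older than DAYS days (assumes one file per backup).
prune_backups() {
    local dir="$1"
    local days="${2:-$RETENTION_DAYS}"
    # -mmin takes minutes, so the cut-off is roughly DAYS * 24 hours
    find "$dir" -type f -mmin "+$((days * 24 * 60))" -delete
}
```

Under these assumptions, `prune_backups /eos/project-c/cmsweb-http/mongodb/backups/prod` would remove production backups older than 10 days.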

Restore procedure

The restore procedure will be manual and will be done by CMSWEB operators. It will include the following steps:

  • stop running MongoDB pods
    • if MongoDB pods are not accessible
      • remove MongoDB deployment
      • mount MongoDB PVC to some node (cinder volume)
      • remove existing MongoDB database file(s) on PVC storage
      • restore MongoDB database file(s) from backup
      • redeploy MongoDB
    • if MongoDB pods are accessible
      • log in to the pod
      • stop the mongodb process
      • restore database files from backup
      • start the mongodb process
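
The two branches above can be sketched as a dry-run helper that prints the cluster-level commands an operator might issue. Everything here is an assumption for illustration: the namespace, the deployment name, the manifest file, and the choice of plain `kubectl` commands are placeholders, and the file-level steps are left as manual reminders because their details are not specified on this page.

```shell
#!/bin/bash
# All names below are hypothetical placeholders, not the real cmsweb values.
NAMESPACE="${NAMESPACE:-mongodb}"
DEPLOYMENT="${DEPLOYMENT:-mongodb}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = execute them

# run CMD...: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# pod_accessible POD: succeeds if a trivial command can be run in the pod.
pod_accessible() {
    kubectl exec -n "$NAMESPACE" "$1" -- true >/dev/null 2>&1
}

# restore_inaccessible MANIFEST: branch for pods that cannot be reached.
restore_inaccessible() {
    run kubectl delete deployment "$DEPLOYMENT" -n "$NAMESPACE"
    echo "# manual: mount the MongoDB PVC (cinder volume) on a node"
    echo "# manual: remove old database files, restore files from backup"
    run kubectl apply -n "$NAMESPACE" -f "$1"
}

# restore_accessible POD: branch for pods that can still be reached.
restore_accessible() {
    run kubectl exec -it -n "$NAMESPACE" "$1" -- /bin/bash
    echo "# manual (inside the pod): stop mongodb, restore files, start mongodb"
}
```

With the default `DRY_RUN=1`, calling `restore_inaccessible mongodb.yaml` only prints the planned commands, which makes it possible to review them before running anything against the cluster.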

Tools

All tools needed to perform MongoDB backups are provided in the cmsmongo image [1]:

  • mongo_manage.sh script to back up MongoDB databases to, and restore them from, the CERN EOS storage partition; for example, as a crontab entry:
    0 2 * * * export AGE_KEY="/data/tools/age-key.txt" && /data/tools/mongo_manage.sh backup mongo.ini
  • the mongo.ini file contains necessary settings for MongoDB: URI, HOST, PORT, AUTHDB, USERNAME, PASSWORD, BACKUP_DIR, RS_NAME
  • age and sops encryption tools to manage credentials
  • amtool to send alerts to CMS Monitoring
    • alert.sh, a wrapper script around amtool that sends alerts to CMS Monitoring
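
To illustrate how the mongo.ini settings listed above might be consumed, here is a minimal sketch. It assumes the file holds plain `KEY=VALUE` lines, which is a guess about its layout rather than its documented format, and the `read_ini` and `build_uri` helpers are hypothetical names that are not part of mongo_manage.sh.

```shell
#!/bin/bash
# read_ini FILE KEY: print the value of KEY from a file of KEY=VALUE lines.
# The KEY=VALUE layout is an assumption about mongo.ini.
read_ini() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | head -n 1
}

# build_uri FILE: return the URI setting if present, otherwise assemble a
# MongoDB connection string from the individual settings.
# Note: special characters in USERNAME or PASSWORD would need percent-encoding.
build_uri() {
    local file="$1"
    local uri user pass host port authdb rs
    uri=$(read_ini "$file" URI)
    if [ -n "$uri" ]; then
        echo "$uri"
        return 0
    fi
    user=$(read_ini "$file" USERNAME)
    pass=$(read_ini "$file" PASSWORD)
    host=$(read_ini "$file" HOST)
    port=$(read_ini "$file" PORT)
    authdb=$(read_ini "$file" AUTHDB)
    rs=$(read_ini "$file" RS_NAME)
    echo "mongodb://${user}:${pass}@${host}:${port}/?authSource=${authdb}&replicaSet=${rs}"
}
```

A connection string built this way could then be passed to a tool such as `mongodump --uri=...`; whether mongo_manage.sh does so internally is not stated here.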

In addition, the CMSWEB group will monitor the liveness of MongoDB pods and nodes via PodManager [3] and the expiration of the keytab file via cert-checker [4].

Failures

If a MongoDB backup fails because of node or pod issues, an alert will be sent to the CMS Monitoring Alert Manager and routed to two channels:

  • alerts-mongodb-k8s-cluster MM channel
  • cms-service-webtools e-group
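
The failure path could look roughly like the sketch below, which wraps a backup command and raises an alert through amtool when it exits with a non-zero status. This is not the alert.sh script from the image: the Alertmanager URL, the alert name, the label values, and both helper names are hypothetical, and routing to the two channels is assumed to be configured on the Alert Manager side.

```shell
#!/bin/bash
# Hypothetical settings: the Alertmanager URL is a placeholder.
AMTOOL="${AMTOOL:-amtool}"
ALERTMANAGER_URL="${ALERTMANAGER_URL:-http://alertmanager.example.org:9093}"

# send_alert SUMMARY: raise an alert via amtool (label names are illustrative).
send_alert() {
    "$AMTOOL" alert add \
        --annotation=summary="$1" \
        --alertmanager.url="$ALERTMANAGER_URL" \
        alertname=mongodb_backup_failed severity=high service=mongodb
}

# backup_or_alert CMD...: run the backup command; on a non-zero exit status
# send an alert and propagate the failure to the caller (for example cron).
backup_or_alert() {
    local status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        send_alert "MongoDB backup failed: '$*' exited with $status"
    fi
    return "$status"
}
```

For example, a cron job could call `backup_or_alert /data/tools/mongo_manage.sh backup mongo.ini` so that a failed run both alerts and keeps its failing exit status.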

References:

  1. https://registry.cern.ch/harbor/projects/1771/repositories/cmsmongo/artifacts-tab
  2. https://github.com/dmwm/CMSKubernetes/tree/master/docker/mongodb
  3. https://github.com/vkuznet/PodManager
  4. https://github.com/vkuznet/cert-checker