Database Restore

Reference documentation:

Disaster Recovery and Cloning

Step 1: Identify Your Repository

You can restore the COMS test and production databases from either:

  • repo1 (backed up on PVC in OpenShift)
  • repo2 (stored in an S3 bucket)
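Before choosing a repository, you can check which backups each one holds. A sketch using pgbackrest info, assuming the default Crunchy operator naming where the repo host pod is postgres-master-repo-host-0 and the stanza is db (both may differ in your cluster):

# List the backups available in each configured repository
oc exec -n <namespace> -it postgres-master-repo-host-0 -- pgbackrest info --stanza=db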

Step 2: Update the Helm Chart

To prepare for restoration:

  1. Access the corresponding namespace in your environment (see the sketch after this list)
  2. Modify the Helm chart to enable restoration by setting enabled to true in the restore section shown in Step 3
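A minimal sketch of logging in and switching to the right project with the oc CLI (the API server URL is an assumption based on the Silver cluster route used in Step 4):

oc login --token=<token> --server=https://api.silver.devops.gov.bc.ca:6443
oc project <namespace>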

Step 3: Configure Restoration Options

In your Helm values file, locate the restoration section and modify it as follows:

restore:
  enabled: true  # Set to true to enable restoration
  repoName: repo1 # Change to `repo2` if restoring from S3
  options:
    - --type=time
    - --target="2024-10-28 14:15:11-04"  # Specify your desired timestamp
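Point-in-time recovery with --type=time and --target, as shown above, is the typical case, but pgBackRest can also restore a specific backup set. A sketch, where the set label is hypothetical and would come from the pgbackrest info output in Step 1:

restore:
  enabled: true
  repoName: repo2
  options:
    - --type=immediate        # stop as soon as the database reaches consistency
    - --set=20241028-141511F  # hypothetical backup set label from `pgbackrest info`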

Step 4: Run the Helm Upgrade Command

Execute the following Helm command to initiate the restoration:

helm upgrade --install \
  --atomic master common-object-management-service \
  --namespace <namespace> \
  --repo https://bcgov.github.io/common-object-management-service \
  --values ./.github/environments/values.<env>.yaml \
  --set image.repository=ghcr.io/bcgov \
  --set image.tag=sha-$(git rev-parse --short HEAD) \
  --set route.host=coms-<env>-master.apps.silver.devops.gov.bc.ca \
  --set postgres.name=postgres-master \
  --timeout 15m --wait
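Replace <namespace> and <env> with the target namespace and environment (for example, test or prod, matching the values files under .github/environments). While the upgrade runs, you can watch the rollout from another terminal:

# Watch pods in the namespace as the upgrade proceeds
oc get pods -n <namespace> -w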

Step 5: Trigger the Restoration

To finalize the restoration process, run the following command to annotate the OpenShift namespace:

oc annotate -n <namespace> postgrescluster postgres-master --overwrite postgres-operator.crunchydata.com/pgbackrest-restore="$(date)"

This will trigger the restore operation using the specified repository and options.
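The restore itself runs as a Kubernetes Job created by the Crunchy operator. A sketch for following its progress (the Job name shown is an assumption based on default operator naming and may carry a suffix in your cluster):

# Find the restore job, then tail its logs
oc get jobs -n <namespace>
oc logs -n <namespace> -f job/postgres-master-pgbackrest-restore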

Once the restore is complete, set restore.enabled back to false (Step 3) and run the Helm upgrade from Step 4 again.
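The values change is just the following, followed by the same helm upgrade command as in Step 4:

restore:
  enabled: false  # disable restoration once the restore has completed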

In some database restore scenarios, a replica pod may fail to start successfully. If that happens, you can try temporarily adjusting the number of replicas in the postgres-master PostgresCluster resource.

Steps to Modify Replicas:

  1. Set Replicas to 1:

    • Update the replicas field under the instance definition in the postgres-master PostgresCluster configuration to 1. This allows the primary pod to run without the additional replicas.
    spec:
      instances:
        - name: <instance-name>
          replicas: 1
  2. Wait for Pods to Decommission:

    • Monitor the status of the pods to ensure that the failed replicas are fully decommissioned.
  3. Reset Replicas to Original Value:

    • Once the failed pods have been removed, reset the replicas field back to its original value (e.g., 2 or 3); a patch sketch for doing this without editing files follows this list.
    spec:
      instances:
        - name: <instance-name>
          replicas: 2  # or 3, as needed
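If you prefer patching the live resource over editing YAML, a minimal sketch (assuming the instance set to scale is the first entry under spec.instances):

# Temporarily scale down to the primary pod only
oc patch postgrescluster postgres-master -n <namespace> --type=json \
  -p '[{"op": "replace", "path": "/spec/instances/0/replicas", "value": 1}]'

# Once the failed replicas are decommissioned, scale back up
oc patch postgrescluster postgres-master -n <namespace> --type=json \
  -p '[{"op": "replace", "path": "/spec/instances/0/replicas", "value": 2}]'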

Following these steps can help mitigate issues during the recovery process and ensure the primary database instance is operational.
