backups vault - juancamilocc/virtual_resources GitHub Wiki
In this guide, you will learn how to set up automated backups for a Vault server in a Kubernetes environment, storing them in cloud storage for different providers: AWS, GCP, Azure, and OCI.
The Bash script that automates the backup process is shown below.
NOTE: This script assumes that you have deployed a Vault server in a vault namespace.
#!/bin/bash

SNAPSHOT_FILE="vault-snapshot-$(date +%F).snap"
VAULT_ADDR="vault.vault.svc.cluster.local:8200"
BUCKET_NAME="backup-vault-juancamilocc"
CLOUD_PROVIDER=${1:-"aws"}

echo "Finding Vault leader pod..."
VAULT_PODS=$(kubectl get pods -n vault -l app.kubernetes.io/name=vault -o jsonpath='{.items[*].metadata.name}')
LEADER_POD=""

for POD in $VAULT_PODS; do
    if kubectl exec -n vault "$POD" -- vault operator raft list-peers 2>/dev/null | grep -q "leader"; then
        LEADER_POD=$POD
        break
    fi
done

if [ -z "$LEADER_POD" ]; then
    echo "Error: No leader pod found. Exiting."
    exit 1
fi

echo "Leader pod found: $LEADER_POD"

echo "Logging into Vault..."
kubectl -n vault exec "$LEADER_POD" -- vault login "$VAULT_TOKEN"
if [ $? -ne 0 ]; then
    echo "Error: Vault login failed. Exiting."
    exit 1
fi

echo "Creating snapshot file..."
kubectl -n vault exec "$LEADER_POD" -- vault operator raft snapshot save "/tmp/$SNAPSHOT_FILE"
if [ $? -ne 0 ]; then
    echo "Error: Snapshot creation failed. Exiting."
    exit 1
fi

echo "Copying snapshot file from leader pod..."
kubectl cp "vault/$LEADER_POD:/tmp/$SNAPSHOT_FILE" "$SNAPSHOT_FILE"
if [ $? -ne 0 ]; then
    echo "Error: Failed to copy snapshot file. Exiting."
    exit 1
fi

echo "Uploading snapshot to bucket ($CLOUD_PROVIDER)..."
case "$CLOUD_PROVIDER" in
    aws)
        aws s3 cp "$SNAPSHOT_FILE" "s3://$BUCKET_NAME/"
        ;;
    gcp)
        gsutil cp "$SNAPSHOT_FILE" "gs://$BUCKET_NAME/"
        ;;
    azure)
        az storage blob upload --account-name <account-name> --container-name "$BUCKET_NAME" --file "$SNAPSHOT_FILE" --name "$(basename "$SNAPSHOT_FILE")"
        ;;
    oci)
        oci os object put --bucket-name "$BUCKET_NAME" --file "$SNAPSHOT_FILE" --name "$(basename "$SNAPSHOT_FILE")"
        ;;
    *)
        echo "Error: Unsupported cloud provider '$CLOUD_PROVIDER'. Exiting."
        exit 1
        ;;
esac

if [ $? -ne 0 ]; then
    echo "Error: Failed to upload snapshot. Exiting."
    exit 1
fi

echo "Backup completed successfully!"
Let's go through each step of the script.
First, we define the important variables: the snapshot file name with its date, the Vault internal address in Kubernetes, the bucket name where we will save the snapshot file, and the default cloud provider, in this case AWS.
SNAPSHOT_FILE="vault-snapshot-$(date +%F).snap"
VAULT_ADDR="vault.vault.svc.cluster.local:8200"
BUCKET_NAME="backup-vault-juancamilocc"
CLOUD_PROVIDER=${1:-"aws"}
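Since `date +%F` expands to an ISO date (YYYY-MM-DD), each run produces a date-stamped file name, which you can check locally:

```shell
# Sketch of how the snapshot name is built; run this locally to see the expansion.
SNAPSHOT_FILE="vault-snapshot-$(date +%F).snap"
echo "$SNAPSHOT_FILE"   # e.g. vault-snapshot-2025-06-01.snap
```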
Next, we get the Vault pods and filter for the leader. We do this because Raft storage only allows snapshots to be taken from the leader pod, and the leader can change dynamically due to unexpected pod restarts or failures.
echo "Finding Vault leader pod..."
VAULT_PODS=$(kubectl get pods -n vault -l app.kubernetes.io/name=vault -o jsonpath='{.items[*].metadata.name}')
LEADER_POD=""

for POD in $VAULT_PODS; do
    if kubectl exec -n vault "$POD" -- vault operator raft list-peers 2>/dev/null | grep -q "leader"; then
        LEADER_POD=$POD
        break
    fi
done

if [ -z "$LEADER_POD" ]; then
    echo "Error: No leader pod found. Exiting."
    exit 1
fi
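The grep above only checks that some peer reports the leader state. If you want the leader's node name explicitly, the peer table can be parsed by column. A sketch against hypothetical `list-peers` output (the node names and addresses here are assumptions, not real cluster output):

```shell
# Hypothetical output of `vault operator raft list-peers` for a 2-node cluster.
PEERS='Node       Address                        State       Voter
----       -------                        -----       -----
vault-0    vault-0.vault-internal:8201    leader      true
vault-1    vault-1.vault-internal:8201    follower    true'

# Print the node name (first column) of the row whose State column is "leader".
LEADER=$(printf '%s\n' "$PEERS" | awk '$3 == "leader" {print $1}')
echo "$LEADER"   # vault-0
```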
Then we log into the Vault leader pod and create a snapshot.
echo "Logging into Vault..."
kubectl -n vault exec "$LEADER_POD" -- vault login "$VAULT_TOKEN"
if [ $? -ne 0 ]; then
    echo "Error: Vault login failed. Exiting."
    exit 1
fi

echo "Creating snapshot file..."
kubectl -n vault exec "$LEADER_POD" -- vault operator raft snapshot save "/tmp/$SNAPSHOT_FILE"
if [ $? -ne 0 ]; then
    echo "Error: Snapshot creation failed. Exiting."
    exit 1
fi
Finally, we copy the snapshot and upload it to the appropriate cloud provider's bucket.
echo "Copying snapshot file from leader pod..."
kubectl cp "vault/$LEADER_POD:/tmp/$SNAPSHOT_FILE" "$SNAPSHOT_FILE"
if [ $? -ne 0 ]; then
    echo "Error: Failed to copy snapshot file. Exiting."
    exit 1
fi

echo "Uploading snapshot to bucket ($CLOUD_PROVIDER)..."
case "$CLOUD_PROVIDER" in
    aws)
        aws s3 cp "$SNAPSHOT_FILE" "s3://$BUCKET_NAME/"
        ;;
    gcp)
        gsutil cp "$SNAPSHOT_FILE" "gs://$BUCKET_NAME/"
        ;;
    azure)
        az storage blob upload --account-name <account-name> --container-name "$BUCKET_NAME" --file "$SNAPSHOT_FILE" --name "$(basename "$SNAPSHOT_FILE")"
        ;;
    oci)
        oci os object put --bucket-name "$BUCKET_NAME" --file "$SNAPSHOT_FILE" --name "$(basename "$SNAPSHOT_FILE")"
        ;;
    *)
        echo "Error: Unsupported cloud provider '$CLOUD_PROVIDER'. Exiting."
        exit 1
        ;;
esac

if [ $? -ne 0 ]; then
    echo "Error: Failed to upload snapshot. Exiting."
    exit 1
fi
echo "Backup completed successfully!"
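The provider dispatch can be exercised without any cloud CLI by stubbing out the uploads, which also shows how an unsupported provider is rejected:

```shell
# Minimal sketch of the provider dispatch with the uploads stubbed out
# (no cloud CLIs are invoked; each echo stands in for the real command).
upload() {
    case "$1" in
        aws)   echo "would run: aws s3 cp ..." ;;
        gcp)   echo "would run: gsutil cp ..." ;;
        azure) echo "would run: az storage blob upload ..." ;;
        oci)   echo "would run: oci os object put ..." ;;
        *)     echo "Error: Unsupported cloud provider '$1'." >&2; return 1 ;;
    esac
}

upload aws                               # would run: aws s3 cp ...
upload digitalocean || echo "rejected"   # rejected
```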
You can execute the script according to your cloud provider, as follows.
./backup-vault.sh # For AWS S3
./backup-vault.sh gcp # For Google Cloud Storage
./backup-vault.sh azure # For Azure Blob Storage
./backup-vault.sh oci # For OCI Object Storage
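The fallback to "aws" when no argument is given comes from Bash's `${1:-default}` parameter expansion, which you can verify locally:

```shell
# Demonstrate the default-argument behavior used by backup-vault.sh.
set --            # simulate running with no arguments
CLOUD_PROVIDER=${1:-"aws"}
echo "$CLOUD_PROVIDER"   # aws

set -- gcp        # simulate running with "gcp" as the first argument
CLOUD_PROVIDER=${1:-"aws"}
echo "$CLOUD_PROVIDER"   # gcp
```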
The snapshot file will look similar to this (for example, vault-snapshot-2025-06-01.snap).
To package the script as a container image, use the following Dockerfile. You must uncomment the lines that correspond to your cloud provider.
# Choose your cloud provider

# AWS Cloud
# FROM amazon/aws-cli

# GCP Cloud
# FROM google/cloud-sdk

# Azure Cloud
# FROM chainguard/az:latest-dev

# OCI Cloud
# FROM ghcr.io/oracle/oci-cli:latest
# ENV OCI_CLI_SUPPRESS_FILE_PERMISSIONS_WARNING=True

USER root

COPY backup-vault.sh backup-vault.sh

# AWS Cloud
# RUN yum install -y unzip curl

# Azure or GCP Cloud
# RUN apt-get update && apt-get install -y curl

# OCI Cloud
# RUN dnf update -y && dnf install -y curl bash-completion

RUN curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl" && \
    chmod +x kubectl && \
    mv kubectl /usr/local/bin/

ENTRYPOINT ["bash", "backup-vault.sh", "<YOUR_CLOUD>"]
In this section, we will create a token with permissions limited to taking snapshots.
Exec into the Vault pod, as follows.
kubectl -n vault exec -it vault-0 -- sh
/ $ vault login
# Token (will be hidden): <root_token>
# Success! You are now authenticated. The token information displayed below
# is already stored in the token helper. You do NOT need to run "vault login"
# again. Future Vault requests will automatically use this token.
Now, let's create a limited token.
vault policy write backup-policy - <<EOF
# Allow creating and reading snapshots
path "sys/storage/raft/snapshot" {
  capabilities = ["create", "read"]
}

# Allow restoring snapshots
path "sys/storage/raft/snapshot/restore" {
  capabilities = ["create"]
}

# Deny access to all other paths
path "*" {
  capabilities = []
}
EOF
# Success! Uploaded policy: backup-policy
And create a token based on the previous policy. Passing -ttl="0" leaves the TTL at the server default (768h in the output below).
vault token create -policy="backup-policy" -ttl="0"
# Key Value
# --- -----
# token hvs.CAESIAdIgTXydtk1nF7gqJXYNzSA-E3oIiDetED8G4oV3vUdGh4KHGh2cy5rMFUyeFdDdkl6U24zZFQza3VnbFVoSXU
# token_accessor s0RsNTj6USHtOVEV3lyks4sX
# token_duration 768h
# token_renewable true
# token_policies ["backup-policy" "default"]
# identity_policies []
# policies ["backup-policy" "default"]
In this case the token is hvs.CAESIAdIgTXydtk1nF7gqJXYNzSA-E3oIiDetED8G4oV3vUdGh4KHGh2cy5rMFUyeFdDdkl6U24zZFQza3VnbFVoSXU.
We need to define an RBAC configuration to allow a ServiceAccount to execute commands on the Vault pods.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vault-admin
  namespace: vault
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: vault
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["create", "get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: vault
subjects:
  - kind: ServiceAccount
    name: vault-admin
    namespace: vault
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
Define a configmap to store the kube-config.
kube-config-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-configmap
  namespace: vault
data:
  CONFIG: |
    <here kubeconfig>
Define a secret for the Vault token created previously.
apiVersion: v1
kind: Secret
metadata:
  name: vault-token
  namespace: vault
data:
  VAULT_TOKEN: <encoded-base64 limited vault token>
type: Opaque
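The VAULT_TOKEN value must be base64-encoded before it goes into the Secret's data field. A sketch with a hypothetical token (printf avoids encoding the trailing newline that echo would add):

```shell
# Encode a hypothetical token for the Secret's data field.
TOKEN='hvs.EXAMPLE-LIMITED-TOKEN'
ENCODED=$(printf '%s' "$TOKEN" | base64)
echo "$ENCODED"

# Round-trip check: decoding yields the original token.
printf '%s' "$ENCODED" | base64 -d
```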
For the AWS config file, as aws-conf.yaml.
NOTE: You should take into account that the AWS config file looks like this:
[default]
aws_access_key_id = <your_access_key_id>
aws_secret_access_key = <your_secret_access_key>
region = <region>
base64 -w 0 <aws-config-file>
# base64 content
aws-conf.yaml
apiVersion: v1
kind: Secret
metadata:
  name: aws-config-secret
  namespace: vault
data:
  config: <encoded-base64 aws config-file>
type: Opaque
Then, we can set up the cronjob, as follows.
aws-backup-vault-cj.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: vault-backup-cron
  namespace: vault
spec:
  schedule: "0 23 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: vault-admin
          containers:
            - name: backup-vault
              image: <aws-backup-image> # This is defined in the Dockerfile section
              imagePullPolicy: Always
              env:
                - name: VAULT_TOKEN
                  valueFrom:
                    secretKeyRef:
                      name: vault-token
                      key: VAULT_TOKEN
              volumeMounts:
                - name: aws-config-secret
                  mountPath: /root/.aws/config
                  subPath: config
                - name: kube-configmap
                  mountPath: /root/.kube/
          restartPolicy: OnFailure
          volumes:
            - name: aws-config-secret
              secret:
                secretName: aws-config-secret
                items:
                  - key: config
                    path: config
            - name: kube-configmap
              configMap:
                name: kube-configmap
                items:
                  - key: CONFIG
                    path: config
                defaultMode: 0600
Deploy all resources.
kubectl apply -f aws-conf.yaml
kubectl apply -f kube-config-cm.yaml
kubectl apply -f aws-backup-vault-cj.yaml
For the GCP config file, as gcp-conf.yaml.
NOTE: You should take into account that the GCP service-account JSON file looks like this:
{
  "type": "service_account",
  "project_id": "PROJECT_ID",
  "private_key_id": "KEY_ID",
  "private_key": "-----BEGIN PRIVATE KEY-----\nPRIVATE_KEY\n-----END PRIVATE KEY-----\n",
  "client_email": "SERVICE_ACCOUNT_EMAIL",
  "client_id": "CLIENT_ID",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://accounts.google.com/o/oauth2/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/SERVICE_ACCOUNT_EMAIL"
}
base64 -w 0 <gcp-config-file>
# base64 content
gcp-conf.yaml
apiVersion: v1
kind: Secret
metadata:
  name: gcp-config-secret
  namespace: vault
data:
  service-account.json: <encoded-base64 gcp-config-file>
type: Opaque
Then, we can set up the cronjob, as follows.
gcp-backup-vault-cj.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: vault-backup-cron
  namespace: vault
spec:
  schedule: "0 23 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: vault-admin
          containers:
            - name: backup-vault
              image: <gcp-backup-image> # This is defined in the Dockerfile section
              imagePullPolicy: Always
              env:
                - name: VAULT_TOKEN
                  valueFrom:
                    secretKeyRef:
                      name: vault-token
                      key: VAULT_TOKEN
              volumeMounts:
                - name: gcp-config-secret
                  mountPath: /root/.config/gcloud/application_default_credentials.json
                  subPath: service-account.json
                - name: kube-configmap
                  mountPath: /root/.kube/
          restartPolicy: OnFailure
          volumes:
            - name: gcp-config-secret
              secret:
                secretName: gcp-config-secret
                items:
                  - key: service-account.json
                    path: service-account.json
            - name: kube-configmap
              configMap:
                name: kube-configmap
                items:
                  - key: CONFIG
                    path: config
                defaultMode: 0600
Deploy all resources.
kubectl apply -f gcp-conf.yaml
kubectl apply -f kube-config-cm.yaml
kubectl apply -f gcp-backup-vault-cj.yaml
For the Azure config file, as az-conf.yaml.
NOTE: You should take into account that the Azure config file looks like this:
{
  "clientId": "CLIENT_ID",
  "clientSecret": "CLIENT_SECRET",
  "tenantId": "TENANT_ID",
  "subscriptionId": "SUBSCRIPTION_ID"
}
base64 -w 0 <azure-config-file>
# base64 content
az-conf.yaml
apiVersion: v1
kind: Secret
metadata:
  name: azure-config-secret
  namespace: vault
data:
  azure-config.json: <encoded-base64 azure-config-file>
type: Opaque
Then, we can set up the cronjob, as follows.
az-backup-vault-cj.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: vault-backup-cron
  namespace: vault
spec:
  schedule: "0 23 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: vault-admin
          containers:
            - name: backup-vault
              image: <az-backup-image> # This is defined in the Dockerfile section
              imagePullPolicy: Always
              env:
                - name: VAULT_TOKEN
                  valueFrom:
                    secretKeyRef:
                      name: vault-token
                      key: VAULT_TOKEN
              volumeMounts:
                - name: azure-config-secret
                  mountPath: /root/.azure/azure.json
                  subPath: azure-config.json
                - name: kube-configmap
                  mountPath: /root/.kube/
          restartPolicy: OnFailure
          volumes:
            - name: azure-config-secret
              secret:
                secretName: azure-config-secret
                items:
                  - key: azure-config.json
                    path: azure-config.json
            - name: kube-configmap
              configMap:
                name: kube-configmap
                items:
                  - key: CONFIG
                    path: config
                defaultMode: 0600
Deploy all resources.
kubectl apply -f az-conf.yaml
kubectl apply -f kube-config-cm.yaml
kubectl apply -f az-backup-vault-cj.yaml
For the OCI config file, as oci-conf.yaml.
NOTE: You should take into account that the OCI config file looks like this:
[DEFAULT]
user=<userid>
fingerprint=<fingerprint>
key_file=/root/.oci/oci_api_key.pem
tenancy=<tenancy>
region=<region>
base64 -w 0 <oci-config-file>
# base64 content
oci-conf.yaml
apiVersion: v1
kind: Secret
metadata:
  name: oci-conf
  namespace: vault
data:
  oci_config: <encoded-base64 oci-conf-file>
type: Opaque
For the OCI key secret, as oci-key-secret.yaml.
apiVersion: v1
kind: Secret
metadata:
  name: oci-key
  namespace: vault
data:
  oci_api_key.pem: <base64-encoded key-secret>
type: Opaque
Then, we can set up the cronjob, as follows.
oci-backup-vault-cj.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: vault-backup-cron
  namespace: vault
spec:
  schedule: "0 23 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: vault-admin
          containers:
            - name: backup-vault
              image: <oci-backup-image> # This is defined in the Dockerfile section
              imagePullPolicy: Always
              env:
                - name: VAULT_TOKEN
                  valueFrom:
                    secretKeyRef:
                      name: vault-token
                      key: VAULT_TOKEN
              volumeMounts:
                - name: oci-key
                  mountPath: /root/.oci/oci_api_key.pem
                  subPath: oci_api_key.pem
                - name: oci-conf
                  mountPath: /root/.oci/config
                  subPath: config
                - name: kube-configmap
                  mountPath: /root/.kube/
          restartPolicy: OnFailure
          volumes:
            - name: oci-key
              secret:
                secretName: oci-key
                items:
                  - key: oci_api_key.pem
                    path: oci_api_key.pem
                defaultMode: 0400
            - name: oci-conf
              secret: # oci-conf is a Secret (defined above), not a ConfigMap
                secretName: oci-conf
                items:
                  - key: oci_config
                    path: config
            - name: kube-configmap
              configMap:
                name: kube-configmap
                items:
                  - key: CONFIG
                    path: config
                defaultMode: 0600
Deploy all resources.
kubectl apply -f oci-conf.yaml
kubectl apply -f oci-key-secret.yaml
kubectl apply -f kube-config-cm.yaml
kubectl apply -f oci-backup-vault-cj.yaml
Finally, we can verify that the snapshot file appears in the bucket.
To restore a backup, download the snapshot stored in the bucket and follow these steps:
# Copy the snapshot from your local machine to the Vault pod
kubectl cp snapshot.snap vault/vault-0:/tmp/snapshot.snap
# Restore the snapshot
kubectl -n vault exec vault-0 -- vault operator raft snapshot restore /tmp/snapshot.snap
You will most likely receive an error message stating that the operation is not possible due to mismatched keys. In this case, simply add the -force flag. Don't worry: Vault will restore the data without issues, since this is a new Vault server.
kubectl -n vault exec vault-0 -- vault operator raft snapshot restore -force /tmp/snapshot.snap
Automating Vault server backups with Raft storage in a Kubernetes environment ensures data security and disaster recovery preparedness. This guide demonstrated a Bash script to create snapshots from the Vault leader pod and store them in a cloud provider of choice, enhancing resilience and compliance.