Secrets Vault - department-of-veterans-affairs/abd-vro GitHub Wiki

Secrets are stored in LHDI's HashiCorp Vault, which resides in the VA network. Secrets include credentials, tokens, and certificates for all deployment environments. Scripts and Helm configurations have been created to formalize and reproduce the deployment of secrets to all LHDI environments.

HashiCorp Vault

Secrets for all LHDI deployment environments are stored in a single vault at https://ldx-mapi.lighthouse.va.gov/vault, which requires VA network access. Following the security principle of least privilege, only members of the VRO Admins GitHub Team can log in, using their GitHub credentials. Log in to the web UI by following these instructions, entering vro-admins (which corresponds to the VRO Admins GitHub Team) as the "Role".

(Context: A separate VRO Admins GitHub Team was created to limit access to secrets. By default, LHDI allows all members of VA-ABD-RRD GitHub Team to have access to a vault store, which is contrary to the principle of least privilege. There's a vault store for va-abd-rrd but it is unused.)

In the Vault, secrets are organized under the deploy/ folder. Subfolders for each environment are used as follows:

  • default: provides default secrets for all environments; used for the LHDI dev environment
  • qa, sandbox, prod-test, prod: used for the respective LHDI environment and overrides any default secrets
    • Only differences from default secrets need to be present. As a result, there are few secrets in the qa environment and there is no dev subfolder.

Within each environment subfolder are further subfolders, referred to here as "groups". Each group contains key-value pairs. Typically the key is an environment variable name that is mapped verbatim for use by containers. Occasionally, Helm configurations map the secret to a different environment variable name expected by a different container -- for an example, search for DB_CLIENTUSER_NAME. In summary, the full Vault path to a group is $TEAM_NAME/deploy/$ENV/$GROUP. There are no subfolders deeper than the group level.
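As a minimal sketch, assuming a hypothetical team name of va-abd-rrd, the path to a group can be assembled and read as follows (the vault CLI call requires VA network access and a valid token, so it is shown only as a comment):

```shell
# Hypothetical values -- the actual team name and group names live in Vault.
TEAM_NAME="va-abd-rrd"
ENV="qa"
GROUP="db"

# Full Vault path to a group: $TEAM_NAME/deploy/$ENV/$GROUP
VAULT_PATH="$TEAM_NAME/deploy/$ENV/$GROUP"
echo "$VAULT_PATH"   # va-abd-rrd/deploy/qa/db

# With VA network access and a Vault token, the group's key-value pairs
# could then be read with the Vault CLI:
#   vault kv get "$VAULT_PATH"
```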

The groups are as follows:

  • db: secrets for VRO's database; maps to Kubernetes secret named vro-db
  • mq: secrets for the message queue; maps to Kubernetes secret named vro-mq
  • redis: secrets for the Redis cache; maps to Kubernetes secret named vro-redis
  • VRO_SECRETS_API, VRO_SECRETS_LH, VRO_SECRETS_MAS, ...: secrets used by VRO components; these VRO_SECRETS_* groups map to Kubernetes secrets named vro-secrets-....
    • These VRO_SECRETS_* groups are treated differently than the above groups to allow new secrets to be added without having to update Helm configurations, thereby minimizing maintenance. Most new secrets will be added in these groups.
    • Unlike the other groups, each VRO_SECRETS_* group is passed as a single aggregate environment variable to the VRO containers that use it, as specified in Helm configurations. For example, the VRO_SECRETS_API group maps to the VRO_SECRETS_API environment variable for the app container. The aggregate environment variable contains multiple export commands like export APIAUTH_KEY01=.... Upon startup, the container runs set-env-secrets.src to execute the export commands in the aggregate environment variable, resulting in exported environment variables (such as APIAUTH_KEY01) being available to the application.
    • To handle multiline strings and special characters, secret values can be base64-encoded. These secrets use a key name ending in _BASE64 so that the set-k8s-secrets.sh script decodes the value properly and sets an environment variable without the _BASE64 suffix.
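The aggregate-variable and _BASE64 mechanisms can be illustrated with a small sketch. The values below are made up; in reality Helm injects the aggregate variable from Vault, and the decoding is done by set-env-secrets.src and set-k8s-secrets.sh:

```shell
# Simulated aggregate secret: one environment variable holding export commands.
VRO_SECRETS_API='export APIAUTH_KEY01=abc123
export APIAUTH_KEY02=def456'

# set-env-secrets.src effectively evaluates the export commands at startup,
# making each key available as its own environment variable:
eval "$VRO_SECRETS_API"
echo "$APIAUTH_KEY01"   # abc123

# The _BASE64 convention: a multiline value is stored base64-encoded ...
MY_CERT_BASE64=$(printf 'line1\nline2\n' | base64)
# ... and decoded into a variable without the _BASE64 suffix:
MY_CERT=$(printf '%s' "$MY_CERT_BASE64" | base64 --decode)
echo "$MY_CERT"
```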

Unique key names to avoid collisions

While key-value pairs are organized in separate subfolders, the key names (which are typically used as environment variable names) should be unique within each LHDI environment to avoid collisions when they are mapped to environment variables for Docker containers. For example, if a MY_SECRET key existed in both the redis and VRO_SECRETS_API groups AND a container used both groups, the container would end up with only one environment variable rather than the desired two. Note that this collision can also occur between MY_SECRET and MY_SECRET_BASE64 key names because the _BASE64 suffix is elided from the container's environment variable name.
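A minimal sketch of the collision (the group and key names here are hypothetical):

```shell
# Suppose both the redis group and the VRO_SECRETS_API group define MY_SECRET.
# When both groups are mapped into the same container, the exports collide
# and the last one evaluated silently wins:
export MY_SECRET="from-redis-group"
export MY_SECRET="from-api-group"
echo "$MY_SECRET"   # from-api-group -- the redis value is lost
```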

Adding/Modifying a secret

Ask a VRO Admin to add, remove, or update the secret in Vault. Securely provide the secret for each LHDI environment -- minimally, one secret value for dev and another for prod.

  • If the secret is added to an existing VRO_SECRETS_* group, no Helm configuration changes are needed.
  • If the secret is added to another group, Helm configurations should be updated to use the new secret.

Run Deploy secrets from Vault for each LHDI environment to update the Kubernetes secrets. The Docker containers will not use the new secrets until they are redeployed. Note: this action is currently broken.

Updates to secrets not being propagated?

There are circumstances where Kubernetes logs "Error: couldn't find key VRO_SECRETS_... in Secret va-abd-rrd-.../vro-secrets" -- see the Slack thread for screenshots. This occurred because a single aggregate vro-secrets secret was used for all VRO_SECRETS_* groups, which introduced issues with propagating secret updates because containers still referenced the old aggregate secret:

  • Symptom: Sometimes redeploying the pod works and sometimes it fails with this error.
  • Current hypothesis for this inconsistent error: if other running pods reference the vro-secrets secret, old versions of it may remain available and be used by new pods. This article prompted the hypothesis.
  • Workaround: Restart all old pods that reference the vro-secrets secret, then start everything back up. If a restart isn't sufficient, a complete shutdown of all pods may be necessary to remove all references to the old secret.
  • Additionally, marking the secret immutable may contribute to the use of old secrets: immutable secrets aren't expected to change, so any changes (including a destroy and re-create) are not propagated. As a result, the vro-secrets secret is marked mutable in set-k8s-secrets.sh.

Now the vro-secrets-* secrets are individual secrets, each used by one or a very small number of containers. This reduces the number of containers that need to be shut down simultaneously to release all references to an old secret, and should reduce the probability of this problem.

Adding a non-secret environment variable

To set a non-secret environment variable for a container in an LHDI environment, add it to the relevant Helm chart(s) under helm/. If the variable value is different for each environment, also add it to helm/values-for-*.yaml files.

With that said, before adding an environment variable, please read the next section.

Configuration setting vs. Environment variable

It is preferred to use a configuration file scoped to only the application/microservice/container (e.g., Configuration Settings#springs-applicationyml).

An environment variable is needed when any of the following are true:

  • it is a secret (username, password, token, private certificate, ...) -- use Hashicorp Vault (as described on this page)
  • used by multiple containers -- set it in helm/values*.yaml files and reference it in Helm charts (under helm/)
  • needs to be manually changed in deployment environments -- let's discuss

We should minimize the number of unnecessary Helm configurations, which will reduce DevOps maintenance and overhead, and reduce the number of factors that can cause VRO deployments to fail.

Setting the Vault-token secret

A Vault token is needed to access the Vault. The automation (a self-hosted GitHub Runner) expects the Vault token to be stored in a Kubernetes secret named vro-vault in the LHDI dev environment. The token expires monthly. Run scripts/set-secret-vault-token.sh "$VAULT_TOKEN" to set the token, where $VAULT_TOKEN is the string copied from the Vault web UI (click "Copy token" in the upper-right drop-down menu).

Setting Kubernetes access tokens

Kubernetes access tokens for each cluster (i.e., non-prod and prod) are needed to deploy the secrets to the LHDI environments. The access tokens expire after 90 days. Run scripts/set-secret-kube-config.sh to set the devops-kubeconfig secret.

Setting GHCR secret

A GHCR secret named devops-ghcr needs to be set in Kubernetes for LHDI to pull images. Run scripts/set-secret-ghcr.sh "$ENV" "$PAT" for each LHDI environment, where $ENV is dev, qa, etc., and $PAT is a GitHub personal access token generated using the abd-vro-machine account. This only needs to be run once (or whenever the PAT expires).

FAQ

Why use Vault?

It is a centralized, secure location in the VA's network designed to hold secrets. From the Vault, secrets can be quickly and consistently redeployed to various LHDI environments in case they need to be reset or rotated.

Why use a self-hosted GitHub Runner?

Our GitHub Action workflow starts a self-hosted runner within our LHDI dev environment to pull Vault secrets and set Kubernetes secrets, all within the VA's network. This is more secure than using HashiCorp's GitHub Action, which would pull Vault secrets outside the VA network into the GitHub Action workflow environment. The runner is a container in the vro-set-secrets-... Kubernetes pod and can deploy secrets to any LHDI environment when initiated by the Deploy secrets from Vault GitHub Action workflow. The pod is deployed to the dev LHDI environment (because that environment doesn't require SecRel-signed images) and can deploy secrets to other environments.

Why are some Kubernetes secrets immutable?

There have been unexplained occurrences where Kubernetes secrets changed and caused problems. Making them immutable aims to reduce (but does not entirely prevent) this problem.