Secrets are stored in [LHDI's HashiCorp Vault](https://animated-carnival-57b3e7f5.pages.github.io/platform-tools/vault/), which resides in the VA network. Secrets include credentials, tokens, and certificates for all deployment environments. Scripts and Helm configurations have been created to formalize and reproduce the deployment of secrets to all LHDI environments.

## HashiCorp Vault

Secrets for all LHDI deployment environments are stored in a single vault at https://ldx-mapi.lighthouse.va.gov/vault, which requires VA network access. Following the security [principle of least privilege](https://csrc.nist.gov/glossary/term/least_privilege), only members of the [VRO Admins GitHub Team](https://github.com/orgs/department-of-veterans-affairs/teams/vro-admins/members) can log in using their GitHub credentials. Log in to the web UI using [these instructions](https://animated-carnival-57b3e7f5.pages.github.io/platform-tools/vault/#using-the-vault-ui), with `vro-admins` (which corresponds to the [VRO Admins GitHub Team](https://github.com/orgs/department-of-veterans-affairs/teams/vro-admins/members)) as the "Role".

(Context: A separate [VRO Admins GitHub Team](https://github.com/orgs/department-of-veterans-affairs/teams/vro-admins/members) was created to limit access to secrets. By default, LHDI allows [all members of the VA-ABD-RRD GitHub Team](https://github.com/orgs/department-of-veterans-affairs/teams/va-abd-rrd/members) to have access to a vault store, which is contrary to the principle of least privilege. There is a vault store for `va-abd-rrd`, but it is unused.)

In the Vault, secrets are organized under the `deploy/` folder. Subfolders for each environment are used as follows:
- `default`: provides default secrets for all environments; used for the LHDI `dev` environment
- `qa`, `sandbox`, `prod-test`, `prod`: used for the respective LHDI environment; overrides any default secrets
  - Only differences from the default secrets need to be present.
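The default-plus-override rule above can be sketched as a lookup. This is a hypothetical illustration only -- the real deployment scripts differ, and the paths and values below are made up:

```shell
#!/bin/bash
# Hypothetical sketch of the default-plus-override rule; the real deployment
# scripts differ, and these paths and values are made up.
declare -A vault=(
  ["deploy/default/db/DB_PASSWORD"]="default-pass"
  ["deploy/prod/db/DB_PASSWORD"]="prod-pass"   # prod overrides the default
)

# lookup ENV GROUP KEY: prefer the environment's own value, else fall back to default
lookup() {
  local v="${vault["deploy/$1/$2/$3"]}"
  echo "${v:-${vault["deploy/default/$2/$3"]}}"
}

lookup qa db DB_PASSWORD    # prints "default-pass" (qa has no override)
lookup prod db DB_PASSWORD  # prints "prod-pass"
```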
As a result, there are few secrets in the `qa` environment, and there is no `dev` subfolder.

Within each environment subfolder are other subfolders, referred to here as "groups". Each group contains key-value pairs. Typically, the key is an environment variable name that is mapped verbatim for use by containers. Occasionally, Helm configurations map the secret to a different environment variable name as expected by a different container -- for an example, search for `DB_CLIENTUSER_NAME`. In summary, the full Vault path to a group is `$TEAM_NAME/deploy/$ENV/$GROUP`; there are no subfolders deeper than the group level.

The groups are as follows:
- `db`: secrets for VRO's database; maps to the Kubernetes secret named `vro-db`
- `mq`: secrets for the message queue; maps to the Kubernetes secret named `vro-mq`
- `redis`: secrets for the Redis cache; maps to the Kubernetes secret named `vro-redis`
- `VRO_SECRETS_API`, `VRO_SECRETS_LH`, `VRO_SECRETS_MAS`, ...: secrets used by VRO components; these `VRO_SECRETS_*` groups map to Kubernetes secrets named `vro-secrets-...`.
  - These `VRO_SECRETS_*` groups are treated differently than the groups above so that new secrets can be added without having to update Helm configurations, thereby minimizing maintenance. Most new secrets will be added in these groups.
  - Unlike the other groups, each `VRO_SECRETS_*` group is passed as a single aggregate environment variable to the VRO containers that use it, as specified in Helm configurations. For example, the `VRO_SECRETS_API` group maps to the `VRO_SECRETS_API` environment variable for the `app` container. The aggregate environment variable contains _multiple_ export commands like `export APIAUTH_KEY01=...`. Upon startup, the container runs `set-env-secrets.src` to execute the export commands in the aggregate environment variable, making the exported environment variables (such as `APIAUTH_KEY01`) available to the application.
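The aggregate-variable mechanism can be sketched as follows. This is a hypothetical illustration -- the real `set-env-secrets.src` may differ, and the variable names and values are made up:

```shell
#!/bin/bash
# Hypothetical sketch of how an aggregate VRO_SECRETS_* variable is consumed;
# the real set-env-secrets.src may differ, and these names/values are made up.

# One environment variable holding multiple export commands:
VRO_SECRETS_API='export APIAUTH_KEY01=abc123
export APIAUTH_KEY02=def456'

# At startup, the container executes the export commands the variable contains:
eval "$VRO_SECRETS_API"

echo "$APIAUTH_KEY01"  # prints "abc123"
echo "$APIAUTH_KEY02"  # prints "def456"
```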
To handle multiline strings and special characters, secret values can be base64-encoded. These secrets use a key name that ends with `_BASE64` so that the `set-k8s-secrets.sh` script will decode the value and set an environment variable *without* the `_BASE64` suffix.

## Unique key names to avoid collisions

While key-value pairs are organized in separate subfolders, the key names (which are typically used as environment variable names) should be unique within each LHDI environment to avoid collisions when they are mapped to environment variables for Docker containers. For example, if a `MY_SECRET` key name existed in both the `redis` and `VRO_SECRETS_API` group subfolders AND a container used both groups, then the container would have only one environment variable rather than the desired two. Note that this collision can also occur between `MY_SECRET` and `MY_SECRET_BASE64` key names because the `_BASE64` suffix is elided from the container's environment variable name.

## Adding/Modifying a secret

Ask a [VRO Admin](https://github.com/orgs/department-of-veterans-affairs/teams/vro-admins/members) to add, remove, or update the secret in Vault. Securely provide the secret for each LHDI environment -- minimally, one secret value for `dev` and another for `prod`.
- If the secret is added to an existing `VRO_SECRETS_*` group, no Helm configuration changes are needed.
- If the secret is added to another group, Helm configurations should be updated to use the new secret.

~~Run [Deploy secrets from Vault](https://github.com/department-of-veterans-affairs/abd-vro-internal/actions/workflows/deploy-secrets.yml) for each LHDI environment to update the Kubernetes secrets. The Docker containers will not use the secrets until they are redeployed.~~ This action is broken.

### Updates to secrets not being propagated?

There are circumstances where Kubernetes logs `Error: couldn't find key VRO_SECRETS_... in Secret va-abd-rrd-.../vro-secrets` -- see this [Slack thread](https://dsva.slack.com/archives/C04QLHM9LR0/p1689634792401989?thread_ts=1689604659.611619&cid=C04QLHM9LR0) for screenshots. This occurred because a single aggregate `vro-secrets` secret was used for all `VRO_SECRETS_*` groups, which introduces issues with the propagation of secret updates because containers still reference the old aggregate secret:
- **Symptom**: Sometimes redeploying the pod works and sometimes it fails with this error.
- **Current hypothesis** for this inconsistent error: If other running pods reference the `vro-secrets` secret, then old versions of it may remain available and be used by new pods. [This article](https://medium.com/devops-dudes/how-to-propagate-a-change-in-kubernetes-secrets-by-restarting-dependent-pods-b71231827656) prompted the hypothesis.
- **Workaround**: Restart all old pods that reference the `vro-secrets` secret, then start everything back up. If a restart isn't sufficient, a complete shutdown of all pods may be necessary to remove all references to the old secret.
- Additionally, marking the secret [immutable](https://kubernetes.io/docs/concepts/configuration/secret/#secret-immutable-create) may contribute to the use of old secrets: immutable secrets aren't expected to change, so any changes (including a destroy and re-create) are not propagated. As a result, the `vro-secrets` secret is marked mutable in `set-k8s-secrets.sh`.

Now the `vro-secrets-*` secrets are individual secrets, each used by one or a very small number of containers. This reduces the number of containers that need to be shut down simultaneously to release all references to an old secret, which should mitigate this problem.

## Adding a non-secret environment variable

To set a non-secret environment variable for a container in an LHDI environment, add it to the relevant Helm chart(s) under `helm/`.
If the variable value is different for each environment, also add it to the `helm/values-for-*.yaml` files. With that said, before adding an environment variable, please read the next section.

## Configuration setting vs. Environment variable

It is preferred to use a configuration file scoped to only the application/microservice/container (e.g., [[Configuration Settings#springs-applicationyml]]). An environment variable is needed when any of the following are true:
- it is a secret (username, password, token, private certificate, ...) -- use HashiCorp Vault (as described on this page)
- it is used by multiple containers -- set it in `helm/values*.yaml` files and reference it in Helm charts (under `helm/`)
- it needs to be manually changed in deployment environments -- let's discuss

We should minimize the number of unnecessary Helm configurations, which will reduce DevOps maintenance and overhead, and reduce the number of factors that can cause VRO deployments to fail.

## Setting the Vault-token secret

A Vault token is needed to access the Vault. The automation (a self-hosted GitHub Runner) expects the Vault token to be a Kubernetes secret named `vro-vault` in the LHDI `dev` environment. The token expires monthly. Run `scripts/set-secret-vault-token.sh "$VAULT_TOKEN"` to set the token, where `$VAULT_TOKEN` equals the string copied from the Vault web UI (click "Copy token" in the upper-right corner drop-down menu).

## Setting Kubernetes access tokens

Kubernetes access tokens for each cluster (i.e., non-prod and prod) are needed to deploy the secrets to the LHDI environments. The access tokens expire in 90 days. Run `scripts/set-secret-kube-config.sh` to set the `devops-kubeconfig` secret.

## Setting GHCR secret

A GHCR secret in Kubernetes named `devops-ghcr` needs to be set for LHDI to pull images. Run `scripts/set-secret-ghcr.sh "$ENV" "$PAT"` for each LHDI environment, where `$ENV` is `dev`, `qa`, etc.
The `$PAT` is a GitHub personal access token -- generate one using the `abd-vro-machine` account. This only needs to be run once (or every time the PAT expires).

## FAQ

### Why use Vault?

It is a centralized, secure location in the VA's network designed to hold secrets. From the Vault, secrets can be quickly and consistently redeployed to the various LHDI environments in case they need to be reset or rotated.

### Why use a self-hosted GitHub Runner?

Our GitHub Action workflow starts a self-hosted runner within our LHDI `dev` environment to pull Vault secrets and set Kubernetes secrets, all within the VA's network. This is more secure than using HashiCorp's GitHub Action, which would pull Vault secrets outside the VA network and into the GitHub Action workflow environment. The runner is a container in the `vro-set-secrets-...` Kubernetes pod and can deploy secrets to any LHDI environment when initiated by the [Deploy secrets from Vault](https://github.com/department-of-veterans-affairs/abd-vro-internal/actions/workflows/deploy-secrets.yml) GitHub Action workflow. The pod is deployed to the `dev` LHDI environment (because that environment doesn't require SecRel-signed images) and can deploy secrets to the other environments.

### Why are some Kubernetes secrets immutable?

There have been unexplained occurrences where Kubernetes secrets changed and caused problems. Making them immutable aims to reduce (but does not entirely prevent) this problem.