Openshift - Migration to a new cluster

Guide to migrate

Part 1 - One-time setup for a new cluster (not repeated for each environment)

  • Create service account: Create a service account in the tools namespace by executing the following make command from the devops folder.
make create-service-account SERVICE_ACCOUNT_NAMESPACE="$TOOLS_NAMESPACE" SERVICE_ACCOUNT_NAME="github-action"
  • Generate service account token: Generate a token for the service account by executing the following make command from the devops folder.
make generate-service-account-token SERVICE_ACCOUNT_NAMESPACE="$TOOLS_NAMESPACE" SERVICE_ACCOUNT_NAME="github-action"
    Note: The token is set to expire every six months. When it expires, regenerate it and update the GitHub secret SA_TOKEN in the repository.
  • SFTP access from the new cluster: Request that the new OC cluster IPs be allowed to access the SFTP server.
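
Because the token expires every six months, it can help to record the renewal month when the token is generated. The following is a minimal sketch in plain POSIX shell. `token_renewal_month` is a hypothetical helper, not part of the devops Makefile, and it relies only on the six-month lifetime stated in the note above.

```shell
# Hypothetical helper (not part of the devops Makefile): given the month in
# which the token was generated (YYYY-MM), print the month in which it must be
# regenerated, assuming the six-month lifetime noted above.
token_renewal_month() (
  year=${1%-*}          # text before the dash, e.g. 2025
  month=${1#*-}         # text after the dash, e.g. 09
  month=${month#0}      # drop a leading zero so 08 and 09 are not read as octal
  total=$(( year * 12 + month - 1 + 6 ))
  printf '%04d-%02d\n' $(( total / 12 )) $(( total % 12 + 1 ))
)

token_renewal_month 2025-09   # prints 2026-03
```

The printed month can be added to a team calendar or to the description of the SA_TOKEN secret as a reminder.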

Part 2 - Pre migration setup

Section 1: Provide roles for the target namespace

The GitHub service account requires permission to create, update, and delete resources in the target namespace.

  • Add the edit role for the service account on the target namespace: Execute the following command from the devops folder.
# NAMESPACE="e0a504-dev | e0a504-test | e0a504-prod" (Target namespace)
make add-role-edit-to-service-account NAMESPACE="$NAMESPACE" SERVICE_ACCOUNT_NAMESPACE="e0a504-tools" SERVICE_ACCOUNT_NAME="github-action"
  • Add the image-puller role for the target namespace on tools: Execute the following command from the devops folder.
# NAMESPACE="e0a504-dev | e0a504-test | e0a504-prod" (Target namespace)
make add-role-image-puller-to-target-namespace TARGET_NAMESPACE="$NAMESPACE" IMAGE_SOURCE_NAMESPACE="e0a504-tools"
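
The two role bindings above are needed once per target namespace, so a dry run that prints every command before anything is executed can make the sequence easier to review. This is a sketch only: `print_role_commands` is a hypothetical helper, and the namespace list and the tools namespace are copied from the comments above.

```shell
# Dry run: print the two role-binding commands for every target namespace.
# Nothing is executed; the printed lines are meant to be reviewed and then run
# from the devops folder.
print_role_commands() (
  for namespace in "$@"; do
    printf 'make add-role-edit-to-service-account NAMESPACE="%s" SERVICE_ACCOUNT_NAMESPACE="e0a504-tools" SERVICE_ACCOUNT_NAME="github-action"\n' "$namespace"
    printf 'make add-role-image-puller-to-target-namespace TARGET_NAMESPACE="%s" IMAGE_SOURCE_NAMESPACE="e0a504-tools"\n' "$namespace"
  done
)

print_role_commands e0a504-dev e0a504-test e0a504-prod
```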

Section 2: Network policies

  • Run the following command from the devops folder:
# NAMESPACE="e0a504-dev | e0a504-test | c85dee-test | e0a504-prod" (Target namespace)
make create-network-policies NAMESPACE="$NAMESPACE"

Section 3: Temporary GitHub environment

  • Create or reuse a temporary GitHub environment (e.g. DEV-TEMP | DEV-GOLD).
  • Add the minimal secrets and variables to the temporary environment, with values copied from the equivalent existing environment (DEV, TEST, STG, or PROD).

Minimal secret setup:

Secrets:

OPENSHIFT_ENV_NAMESPACE=e0a504-dev
SA_TOKEN=
HOST_PREFIX=
MONGODB_URI=
FORMS_SA_USER_NAME=
FORMS_SA_PASSWORD=
S3_ACCESS_KEY_ID=
S3_SECRET_ACCESS_KEY=
FORMS_DB_USER=
FORMS_DB_PASSWORD=

Variables:

BUILD_NAMESPACE=e0a504-tools
OPENSHIFT_CLUSTER_URL=
OPENSHIFT_LICENSE_PLATE=e0a504
FORMIO_CPU_REQUEST=
FORMIO_MEMORY_REQUEST=
FORMIO_MEMORY_LIMIT=
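
Before the temporary environment is used, it may be worth checking that none of the names above was left without a value. The sketch below assumes the settings are kept in a local, uncommitted NAME=VALUE list like the one shown above. `empty_settings` is a hypothetical helper, and the sample input is illustrative only.

```shell
# Hypothetical pre-flight check: read NAME=VALUE lines on stdin and print the
# names that have no value yet. Blank lines and headings without "=" are ignored.
empty_settings() {
  awk '{
    i = index($0, "=")                       # position of the first "="
    if (i > 0 && substr($0, i + 1) == "")    # nothing after it: value is missing
      print substr($0, 1, i - 1)
  }'
}

# Illustrative input: two of the four names still need a value.
empty_settings <<'EOF'
OPENSHIFT_ENV_NAMESPACE=e0a504-dev
SA_TOKEN=
BUILD_NAMESPACE=e0a504-tools
OPENSHIFT_CLUSTER_URL=
EOF
```

For the sample input the helper prints SA_TOKEN and OPENSHIFT_CLUSTER_URL, one per line.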

Section 4: Deploy the artifactory-creds secret on the target namespace (use the latest main tag as the Gitref for the GHA)

  • Deploy secrets: artifactory-creds: Run the GHA Env Setup - Deploy SIMS Secrets to Openshift on the temporary GitHub environment and select artifactory-creds as the secret.

Section 5: Deploy Redis, ClamAV, Mongo and Forms Server (use the latest main tag as the Gitref when running these GHAs)

  • Install Redis HA: Run the GHA Redis Cluster - Install/Upgrade on the temporary GitHub environment and select install.
  • Install ClamAV: Run the GHA ClamAV - Install/Upgrade/Remove on the temporary GitHub environment and select install.
  • Install Mongo HA: Run the GHA Env Setup - Deploy Mongo HA in Openshift for Formio on the temporary GitHub environment.
  • Deploy Forms Server: Run the GHA Env Setup - Deploy Forms Server with form.io tag v4.3.2 on the temporary GitHub environment.

Section 6: Add new cluster IPs to S3 bucket Policy for the user

  • Dev user name: sims-dev
  • Test user name: sims-test
  • Staging user name: sims-staging
  • Prod user name: sims-prod

Refer to wiki: https://github.com/bcgov/SIMS/wiki/ObjectStorage-Config

Section 7: Add additional secrets to the existing GitHub environment being migrated (DEV, TEST, STG, or PROD)

ZONE_B_SFTP_SERVER=
ZONE_B_SFTP_SERVER_PORT=
ZONE_B_SFTP_USER_NAME=
ZONE_B_SFTP_PRIVATE_KEY_PASSPHRASE=
ZONE_B_SFTP_PRIVATE_KEY_VALUE=

Part 3 - Migrate Database

Section 1: Shut down applications in the current environment

  • Shut down $env-api-sims, $env-queue-consumers-sims, $env-workers-sims and $env-web-sims.

Section 2: Back up the current database to S3 (full backup)

Note: Before taking the backup, identify the most recent activity in the DB (for lower environments this can be a manually entered note), so that the DB can later be queried for it to verify the restored data.

  • Execute a full backup to S3 (repo 2): Execute the following command from the simsdb-repo-host pod terminal: pgbackrest --stanza=db backup --type=full --repo=2
  • Verify the completion of the full backup: Execute the following command from the simsdb-repo-host pod terminal: pgbackrest info
  • Shut down the current DB after the backup: Run the GHA Crunchy Postgres - Shutdown/Startup with action shutdown on the current environment.

    The shutdown of the simsdb-ha-**** instances can take some time.
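
When verifying the backup, the label of the newest full backup can be picked out of the `pgbackrest info` output instead of being read by eye. The sketch below assumes the text output lists each full backup on a line of the form `full backup: <label>`, with the most recent backup last. `latest_full_backup` is a hypothetical helper, and the sample labels are invented for illustration.

```shell
# Hypothetical helper: read `pgbackrest info` text output on stdin and print
# the label of the last full backup listed (assumes "full backup: <label>" lines).
latest_full_backup() {
  awk '$1 == "full" && $2 == "backup:" { label = $3 }
       END { if (label != "") print label }'
}

# Intended use inside the simsdb-repo-host pod (commented out so the sketch can
# be read without pgBackRest installed):
#   pgbackrest --stanza=db backup --type=full --repo=2
#   pgbackrest info | latest_full_backup

# Illustrative input only; the labels are invented.
latest_full_backup <<'EOF'
stanza: db
    status: ok
        full backup: 20250101-010000F
        incr backup: 20250101-010000F_20250102-010000I
        full backup: 20250108-010000F
EOF
```

If the helper prints nothing, no full backup line was found and the backup should be checked again before the DB is shut down.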

Section 3: Install the Postgres cluster from the data source in the target environment

  • Install Postgres: Run the GHA Crunchy Postgres - Install/Upgrade with datasource with action install-from-datasource and S3 path name habackup (this value is set to habackup-silver only for the DEV environment migration).
  • Wait for the installation to complete and for all DB instances to be up.
  • Once the HA instances are up, wait for the one-time backup job to complete before disabling the data source (this is not a scheduled job).
  • Connect to the Postgres cluster using a SQL client (e.g. DBeaver, pgAdmin) and verify the restored data (a non-superuser can be used).
  • Disable the data source: Run the GHA Crunchy Postgres - Install/Upgrade with datasource with action disable-datasource; no path is required. (To cross-verify, inspect the Postgres cluster YAML in Openshift.)
  • Connect as app-database-user and read-only-user to verify access (the users and their grants are expected to be migrated implicitly).

Section 4: Upgrade the Postgres DB (applicable only if the migration to the new cluster involves a DB upgrade)

  • Run the GHA Crunchy DB - Postgres Version Upgrade with from version 17, to version 18, and image tag ubi9-18.1-2547.
  • Observe the upgrade process and wait until all the Postgres HA instances are up. Executing postgres --version confirms the version of the Postgres DB.
  • Connect to the database using a SQL client (e.g. DBeaver, pgAdmin) to smoke test the database.
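
The version check can be reduced to a single comparison against the expected major version. This sketch assumes `postgres --version` prints a line of the form `postgres (PostgreSQL) 18.1`. `postgres_major_version` is a hypothetical helper, and the sample line stands in for real output from an instance pod.

```shell
# Hypothetical helper: extract the major version from `postgres --version`
# output, which is assumed to look like "postgres (PostgreSQL) 18.1".
postgres_major_version() {
  # The third field is the full version; keep the part before the first dot.
  awk '{ split($3, parts, "."); print parts[1] }'
}

# Inside an instance pod this would be: postgres --version | postgres_major_version
echo 'postgres (PostgreSQL) 18.1' | postgres_major_version   # prints 18
```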

Section 5: Execute the post-upgrade steps as per the guidelines

The steps below are based on the guidelines from the Crunchy Postgres documentation and the SIMS Wiki notes from the previous version upgrade.

  • Analyze the new cluster: Connect to the master instance pod and execute: vacuumdb --all --analyze-in-stages
  • If the previous command reports a collation version mismatch, execute the following SQL from the master pod using psql.
ALTER DATABASE postgres REFRESH COLLATION VERSION;
ALTER DATABASE simsdb REFRESH COLLATION VERSION;
ALTER DATABASE template1 REFRESH COLLATION VERSION;

Then repeat vacuumdb --all --analyze-in-stages.

  • Remove the old data directory: Connect to the master instance pod and execute: ./pgdata/delete_old_cluster.sh
  • Remove the old WAL files: Connect to the master instance pod and execute: rm -rf pgdata/pg17_wal
  • Re-create the pgaudit extension: Connect to the master instance pod and execute: vi /pgdata/update_extensions.sql

Update the file to:

\connect template1
DROP EXTENSION pgaudit;
CREATE EXTENSION pgaudit;
\connect postgres
DROP EXTENSION pgaudit;
CREATE EXTENSION pgaudit;
\connect simsdb
DROP EXTENSION pgaudit;
CREATE EXTENSION pgaudit;

and then execute psql -f /pgdata/update_extensions.sql
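
As an alternative to editing the file by hand in vi, the same statements can be generated with a short loop. This is a sketch, not the documented procedure: `print_pgaudit_sql` is a hypothetical helper, and the database names are the three used in the file contents above.

```shell
# Print the pgaudit re-creation statements for each database passed in.
print_pgaudit_sql() (
  for database in "$@"; do
    printf '\\connect %s\n' "$database"
    printf 'DROP EXTENSION pgaudit;\n'
    printf 'CREATE EXTENSION pgaudit;\n'
  done
)

print_pgaudit_sql template1 postgres simsdb
```

Redirecting the output to /pgdata/update_extensions.sql produces the file shown above, which can then be executed with psql -f as described.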

Part 4 - GitHub environment cutover

Update and create the GitHub environment secrets and variables to cut over to the new environment.

  • Create and update the following GitHub environment secrets and variables with the new values:
########## Secrets ############

# Update.
OPENSHIFT_ENV_NAMESPACE=
# Create.
SA_TOKEN=

########## Variables ############

# Create.
BUILD_NAMESPACE=
# Create.
OPENSHIFT_CLUSTER_URL=
# Create.
OPENSHIFT_LICENSE_PLATE=e0a504

Rollback to the old OC cluster (only if required): Update and remove the GitHub environment secrets and variables to roll back to the old OC cluster.

  • Delete the following GitHub environment secret: SA_TOKEN
  • Delete the following GitHub environment variables: BUILD_NAMESPACE, OPENSHIFT_CLUSTER_URL, OPENSHIFT_LICENSE_PLATE
  • Update the following GitHub environment secret to its old value: OPENSHIFT_ENV_NAMESPACE
########## Secrets ############

# Update.
OPENSHIFT_ENV_NAMESPACE=
# Delete.
SA_TOKEN=

########## Variables ############

# Delete.
BUILD_NAMESPACE=
# Delete.
OPENSHIFT_CLUSTER_URL=
# Delete.
OPENSHIFT_LICENSE_PLATE=e0a504
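
If the rollback is done from a terminal rather than the GitHub UI, the changes above map onto the GitHub CLI. The sketch below only prints the commands so they can be reviewed first. It assumes the `gh` CLI is installed and authenticated against the repository. `print_rollback_commands` is a hypothetical helper, and the environment name passed in is whichever GitHub environment was cut over.

```shell
# Dry run: print the GitHub CLI commands for the rollback described above.
# Nothing is executed; the environment name (e.g. DEV) is the first argument.
print_rollback_commands() (
  environment="$1"
  printf 'gh secret delete SA_TOKEN --env %s\n' "$environment"
  for variable in BUILD_NAMESPACE OPENSHIFT_CLUSTER_URL OPENSHIFT_LICENSE_PLATE; do
    printf 'gh variable delete %s --env %s\n' "$variable" "$environment"
  done
  # Without --body, gh prompts for the value, so the old namespace is typed in
  # interactively rather than stored in shell history.
  printf 'gh secret set OPENSHIFT_ENV_NAMESPACE --env %s\n' "$environment"
)

print_rollback_commands DEV
```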

Part 5 - Deploy secrets (use the currently deployed tag/gitref to run the GHA)

  • Run the GHA Env Setup - Deploy SIMS Secrets to Openshift with the option sims-creds.
  • Run the GHA Env Setup - Deploy SIMS Secrets to Openshift with the option sftp-creds.

Part 6 - Build and deploy

  • Execute only if required: Run the GHA Release - Build All if a build needs to be generated for the current branch of the environment, e.g. DEV: main is always the current branch; TEST: based on the currently deployed tag (mostly a version branch); STG and PROD: the version branch.

  • Deploy all: Run the GHA Release - Deploy All with the latest tag for the environment.

Part 7 - DNS Cutover

The host name pattern for the migrated environment must route to the new cluster IP. For example, when migrating DEV, dev.$(hostname) must be configured to route to the new cluster IP.

Part 8 - Post migration tasks

Section 1: PR with the following updates

  • Update the Crunchy Postgres namespace-values.yml file for the migrated environment if any change was made to the cluster during the cutover (e.g. for a Postgres version upgrade, crunchyImage and postgresVersion must be updated).

Section 2: Create Sysdig alerts for the new environment (namespace)

  • Sysdig alerts - Create/Update Sysdig teams: Deploy the SysdigTeam configuration on the tools namespace of the new cluster. This can be done by updating the make command update-sysdig-team with the new license plate and executing the make command or the GHA for Sysdig. During the migration (until the production cutover), if the make command includes both the old and new license plates, a single OC service account may not have edit access on both. In that case, a temporary make command (e.g. update-sysdig-team-{new env}) can be used for the new license plate until production is migrated.
  • Create notification channels for the new Sysdig teams: The notification channels (Microsoft Teams and email) must be created for the new Sysdig teams, mirroring the notification channels of the existing teams. The webhook URL can be copied from the corresponding existing channel, e.g. the existing Microsoft Teams webhook URL for the non-prod alerts channel can be reused to create a non-prod alerts channel in the new Sysdig teams.
  • Copy alerts from existing channel: Copy the alerts from the existing Sysdig team for a similar environment to the new Sysdig team. An alert can be copied from its more options menu.