DevOps and Running the Application - bcgov/SIMS GitHub Wiki

DevOps

This document explains the DevOps setup and utilities for the AEST/SIMS project.

N.B.: ROOT refers to the repository root directory.

Table of Contents

Prerequisites

  1. OpenShift namespace
  2. OpenShift Client (OC CLI)
  3. Keycloak realm
  4. Docker (for local development only)
  5. Make cmd (for local development only - windows users)
  • 5.1 Install Chocolatey first in order to install 'make'. In a CMD terminal execute:
@"%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe" -NoProfile -InputFormat None -ExecutionPolicy Bypass -Command "[System.Net.ServicePointManager]::SecurityProtocol = 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))" && SET "PATH=%PATH%;%ALLUSERSPROFILE%\chocolatey\bin"

or refer to: https://docs.chocolatey.org/en-us/choco/setup

  • 5.2 Install make using the command:
choco install make
  6. Test the make command (Windows users)

Local Development

  1. Clone the repo to your local machine: git clone https://github.com/bcgov/SIMS
  2. Create a .env file in the repository root dir; for reference, check /config/env-example in the Microsoft Teams files. Then run all make commands from the /sources dir (see below; a typical first run is sketched after this list)
  3. To build the application: make local-build
  4. To run all web+api+DB: make local
  5. To stop the application stack: make stop
  6. To clean all applications including storage: make local-clean
  7. To run database only: make postgres
  8. To run api with database: make local-api
  9. Shell into local api container: make api
  10. Run api test on Docker: make test-api
  11. To run local redis: make local-redis or make redis (local redis is required to run the application)
  12. To run queue-consumers in local docker: make queue-consumers
  13. To run forms in local docker: make forms
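
For example, a typical first run on a fresh clone might look like this (a minimal sketch using only the make targets listed above):

make local-build   # build the application images
make local-redis   # start the local redis required to run the application
make local         # run web + api + DB
make stop          # stop the stack when done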

For Camunda

  1. To run Camunda: make camunda from the /sources folder
  2. Run npm i from the packages/backend/workflows folder
  3. To deploy the workflows: make deploy-camunda-definitions from SIMS/sources, or npm run deploy from the packages/backend/workflows folder (see the sequence sketched below)
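
Put together, the Camunda flow looks like this (paths as documented above):

make camunda                      # from the /sources folder
cd packages/backend/workflows
npm i
npm run deploy                    # or: make deploy-camunda-definitions from SIMS/sources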

For Web (If not using make local)

  1. Run npm i from the packages/web folder
  2. To run the web app: npm run serve from the packages/web folder

For Backend (API + Workers + Others) (If not using make local)

  1. Run npm i from packages/backend
  2. Run npm run start:[dev|debug] [workers|api|other], picking the mode (dev or debug) and the target package (workers, api, etc.); see the sketch below
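
For example (the exact script and package names are assumptions based on the pattern above; check packages/backend/package.json for the real ones):

cd packages/backend
npm i
npm run start:dev -- api        # hypothetical: run the API in watch mode
npm run start:debug -- workers  # hypothetical: run the workers with the debugger attached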

OpenShift

OpenShift is a cloud-native deployment platform that runs all our application stacks. The OpenShift CLI (oc) is required to run any OpenShift operation from a local machine, as an alternative to the OpenShift web console.

OpenShift Login

  • Developers need an account on the OpenShift 4 cluster managed by BC Gov.

  • Copy a temporary token from the web console and use oc login --token=#Token --server=https://api.silver.devops.gov.bc.ca:6443

  • After login, verify the attached namespaces: oc projects

  • Select a project: oc project #ProjectName (the full sequence is combined below)
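
As a single sequence (replace the placeholders with your token and project name):

oc login --token=<token> --server=https://api.silver.devops.gov.bc.ca:6443
oc projects                # list the namespaces you can access
oc project <project-name>  # switch to the target namespace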

OpenShift Infrastructure

  • Application images are built in a single namespace (the tools namespace).

  • Images are promoted to the different environments using deployment configs.

  • All application secrets and configs are kept in OpenShift secrets and config maps. These values are injected into the target application through the deployment config.

  • BC Government OpenShift DevOps Security Considerations

OpenShift Template files

Under ROOT/devops/openshift/, all the OpenShift related template files are stored.

  • api-deploy.yml: Api deployment config template.
  • db-migrations-job.yml: DB migrations job template.
  • docker-build.yml: Generic builder template.
  • forms-build.yml: Formio builder template.
  • forms-deploy.yml: Formio deployment config template.
  • init-secrets.yml: Template to create the initial secrets when setting up an environment on OpenShift.
  • networkpolicy.yml: Network policy security template for OpenShift.
  • queue-consumers-deploy.yml: Queue Consumers deployment config template.
  • security-init.yml: Network and security policies template to enable a namespace for application dev.
  • web-deploy.yml: Web app deployment config template.
  • workers-deploy.yml: Workers deployment config template.

Database

Under ROOT/devops/openshift/database/, all the database related template files are stored.

  • createdb-job.yml: Job template to create a separate database in the Patroni Postgres database.
  • mongo-ha-param.yml: Parameter file to run the mongo template file mongo-ha.yml.
  • mongo-ha.yml: HA Mongo StatefulSet deployment config template.
  • patroni-deploy.yml: Patroni (Postgres) StatefulSet deployment template (enhanced from the BCGov sample deploy.yml).
  • patroni-pre-req.yml: OpenShift secret creation template for the Patroni app.
  • redis-ha-deploy.yml: Redis StatefulSet deployment config template.
  • redis-secrets.yml: Redis secrets template.

Backups

Under ROOT/devops/openshift/database-backup, all the OpenShift database backup related template files are stored. The templates were copied and adapted from the BCDevOps backup-container.

The backups are executed in a container for the Postgres databases. Different backup configurations use different containers to execute the backups for the different database types.

  • backup-build.yaml: build template used by Postgres.
  • backup-deploy.yaml: deploy template used by Postgres.
  • backup-example.conf: sample configuration for backups.
  • patroni-simsdb-backup: current backup configuration used by Postgres.

OpenShift Setup

We have created a set of make helper commands; with them, the following steps set up any namespace.

  • Set up your env variables in the ROOT/.env file or in ROOT/devops/Makefile; a sample env file is available under ROOT/configs/env-example. The essential env variables are listed below (a sketch of a .env follows the list):

    1. NAMESPACE
    2. BUILD_NAMESPACE
    3. HOST_PREFIX (optional)
    4. BUILD_REF (optional, git branch/tag to use for building images)
    5. BUILD_ID (optional, default is 1)
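
A minimal sketch of those variables in ROOT/.env (the values below are illustrative placeholders; the real sample lives under ROOT/configs/env-example):

NAMESPACE=<license-plate>-dev
BUILD_NAMESPACE=<license-plate>-tools
HOST_PREFIX=dev
BUILD_REF=main
BUILD_ID=1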

Initial Setup

  • Login and select namespace

  • Set up the network and security policies (OpenShift network policies, the ClusterRole addition for the image puller, and the github-action rolebinding): make init-oc

Patroni Setup

  • Add to the existing env variables in the ROOT/.env file or in ROOT/devops/Makefile; a sample env file is available under ROOT/configs/env-example. The essential env variables are:

    1. PVC_SIZE (optional; otherwise each Patroni pod's PVC size defaults to 2Gi)
  • Setup Patroni secrets: make init-patroni

  • Deploy Patroni: make oc-deploy-patroni (the combined sequence is sketched below)
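
In sequence (passing NAMESPACE and PVC_SIZE on the command line is an assumption modeled on the Redis targets later in this document; both can also come from the .env file):

make init-patroni NAMESPACE=$namespace
make oc-deploy-patroni NAMESPACE=$namespace PVC_SIZE=5Gi  # PVC_SIZE optional, defaults to 2Gi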

Assign privileges for Database user

  • Log in to OpenShift in a terminal using the OpenShift token.

  • Go to the appropriate project: oc project $(NAMESPACE)

  • Connect to the Patroni service using the port-forward command: oc port-forward services/patroni-master 5433:5432

  • In your local pgAdmin/DBeaver application, connect to the database using the connection parameters below:

    • Host name - localhost
    • Port - 5433
    • Database - SIMSDB
    • Username - postgres
    • Password - Log in to OpenShift -> go to the assigned namespace -> Secrets -> patroni-creds -> copy superuser-password
  • Open the Query Tool and run the privilege assignment commands below; an optional verification query follows them.

GRANT pg_read_all_settings TO "app_database_user";
GRANT pg_read_all_stats TO "app_database_user";
GRANT CONNECT ON DATABASE "SIMSDB" TO "app_database_user";
GRANT USAGE ON SCHEMA sims TO "app_database_user";
GRANT SELECT ON ALL TABLES IN SCHEMA sims TO "app_database_user";
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA sims TO "app_database_user";
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA sims TO "app_database_user";
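
To sanity-check the grants afterwards, a standard information_schema query can be run from the same Query Tool (this check is a suggestion, not part of the documented setup):

-- list the table privileges held by app_database_user in the sims schema
SELECT grantee, table_name, privilege_type
FROM information_schema.role_table_grants
WHERE grantee = 'app_database_user' AND table_schema = 'sims'
ORDER BY table_name, privilege_type;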

SFTP Setup

  • Add to the existing env variables in the ROOT/.env file or in ROOT/devops/Makefile; a sample env file is available under ROOT/configs/env-example. The essential env variables are:

    1. INIT_ZONE_B_SFTP_SERVER=
    2. INIT_ZONE_B_SFTP_SERVER_PORT=
    3. INIT_ZONE_B_SFTP_USER_NAME=
    4. INIT_ZONE_B_SFTP_PRIVATE_KEY_PASSPHRASE=
  • Add the private key for the Zone B SFTP server to the file ROOT/devops/openshift/zone-b-private-key.cer

  • Setup Zone B SFTP secrets: make init-zone-b-sftp-secret

Secrets Setup

  • Add the appropriate OpenShift secrets in GitHub Secrets -> Environments.

  • Run the GitHub action Env Setup - Deploy SIMS Secrets to Openshift to create all the secrets in OpenShift.

    • Select the Environment in which to create the secrets.
    • Input the tag as Build Ref in the workflow and click "Run workflow".

FORMIO Setup

Build and Deploy
  • Populate the mongo-ha-param.yml with the required values for the Mongo DB creation.

  • Create Mongo DB: make oc-deploy-ha-mongo

Note: For a fresh install you may need to Build Forms in the tools namespace and then deploy; when deploying into a new environment where the build is already available, Deploy Forms is enough.

Build Forms
  • Run the GitHub action Env Setup - Build Forms Server to build formio (the Forms server) in the tools namespace of OpenShift.
    • The minimum version of the formio server to be deployed is v2.5.3. Please refer to the formio tags URL for any updates needed.
    • Input the tag as Build Ref in the workflow and click "Run workflow".
Deploy Forms
  • Fetch the mongo-url secret from the mongodb-ha-creds created as part of the previous oc-deploy-ha-mongo make command and update GitHub secrets -> Environment -> MONGODB_URI.

  • Run the GitHub action Env Setup - Deploy Forms Server to deploy formio (the Forms server) in the tools namespace of OpenShift.

    • Select the Environment in which to deploy the Forms server and its related secrets, service, and routes.
    • The minimum version of the formio server to be deployed is v2.5.3. Please refer to the formio tags URL for any updates needed.
    • Input the tag as Build Ref in the workflow and click "Run workflow".
Formio Definition Deployment
  • Fetch the secrets from the {HOST_PREFIX}-forms secret created as part of the previous GitHub action and update GitHub secrets -> Environment with:

    • FORMIO_ROOT_EMAIL : FORMS_SA_USER_NAME
    • FORMIO_ROOT_PASSWORD : FORMS_SA_PASSWORD
    • FORMS_URL : FORMS_URL
    • FORMS_SECRET_NAME : {HOST_PREFIX}-forms
  • Run the GitHub action Release - Deploy Form.io resources to deploy the forms resources to the formio server.

    • Select the Environment in which to deploy the forms resources.
    • Input the tag as Build Ref in the workflow and click "Run workflow".

Redis Setup through Make

  • Setup Redis secrets:

    • make init-redis NAMESPACE=$namespace
  • Deploy Redis with 6 replicas:

    • make deploy-redis NAMESPACE=$namespace
  • Initialize the Redis Cluster

    • Make sure that all the redis pods are up and running before initializing the cluster:
      • make init-redis-cluster NAMESPACE=$namespace REDIS_PORT=$redis_port
    • When prompted, type 'yes' (the end-to-end flow is sketched below)
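
End to end, the flow is (as documented above; wait for all pods between the deploy and init steps):

make init-redis NAMESPACE=$namespace
make deploy-redis NAMESPACE=$namespace
# wait until all 6 redis pods are up and running, then:
make init-redis-cluster NAMESPACE=$namespace REDIS_PORT=$redis_port
# answer 'yes' when prompted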

Redis Setup through Github Actions

[Screenshots of the corresponding Env Setup GitHub Actions workflows for the Redis setup]

Deploy API, Web, Workers, Queue-consumers

  • Run the GitHub action Release - Deploy to deploy the API, Web, Workers, and Queue-consumers in the namespace.
    • Input the tag as Git Ref in the workflow.
    • Select the Environment in which to deploy the API, Web, Workers, and Queue-consumers.
    • Click "Run workflow".

Database Backups

  • If you already set the env variables as part of the previous steps, skip the step below.

  • Set up your env variables in the ROOT/.env file or in ROOT/devops/Makefile; a sample env file is available under ROOT/configs/env-example. The essential env variables are:

    1. NAMESPACE
    2. BUILD_NAMESPACE
    3. HOST_PREFIX (optional)
  • Create backup build for Postgres: make oc-db-backup-patroni-simsdb-build

  • Deploy the Postgres DB backup container: make oc-db-backup-patroni-simsdb-deploy

Note: Additional commands to delete the build and deploy containers for the DB backup

  • Delete backup build for Postgres: make oc-db-backup-patroni-simsdb-build-delete

  • Delete the deployed Postgres DB backup container: make oc-db-backup-patroni-simsdb-deploy-delete

Troubleshooting and Additional Context

Patroni Backup and Restore

  • Log in to OpenShift and go to the backup container patroni-simsdb-backup.

  • Select a pod running in the backup container and open the terminal.

  • Run ./backup.sh -l to list the available backups.

  • Run ./backup.sh -s to take a current backup of the db.

  • Run ./backup.sh -r patroni-master/SIMSDB -f <specific backup filename from the backup list> to restore the Postgres SIMSDB from that backup file.

  • Note: When a new Patroni is spun up and restored after deleting the old Patroni (including the PVCs), please follow the troubleshooting below.

    • During the restore process there can be a mismatch in the database role: the backup may reference the role previously created in the old Patroni, while the newly created Patroni has the role app_database_user.

    • Connect to the new Patroni from your local pgAdmin/DBeaver using the postgres user/password and run the command below.

    • ALTER ROLE "app_database_user" RENAME TO "<Old_postgres_role>";

    • Old_postgres_role is the role name that errors when the restore tries to run. If you cannot find the name, run ./backup.sh -r patroni-master/SIMSDB -f <specific backup filename from the backup list>; it will throw an error stating the restore cannot be done because the role name was not found. Note: once you run the restore command, it might drop SIMSDB as its first step, so you may need to delete the old Patroni and start a new one again.

    • Once the role rename is successful, run the restore; it should succeed. Then run the commands below to rename the user back to app_database_user and apply the new 'database-password' from the OpenShift secret 'patroni-creds'.

    • ALTER ROLE "<Old_postgres_role>" RENAME TO "app_database_user";

    • ALTER ROLE "app_database_user" WITH PASSWORD '<new database-password from the openshift secret patroni-creds>';

    • Take a backup again with ./backup.sh -s and remove any old backups, as they might contain the old role.

    • Verify the backup again by restoring it.

    • Subsequent backups taken after this will contain the new database role and should work seamlessly.

Patroni

  • The key patronictl commands you will want to be familiar with are:
    • failover
    • history
    • list
    • pause
    • reinit
    • resume
    • version
  • General Diagnosis Steps
    • patronictl list
    • To reset a specific node, run patronictl reinit <cluster_name> <node_name>
    • If the cluster is not starting or a master is not assigned, follow the steps below to find the master.
      • This step takes a bit of guesswork, but we need to know which node was the real master in order to recover correctly without losing data.
      • One way to determine this is to manually start Postgres out of band with pg_ctl -D /home/postgres/pgdata/pgroot/data start.
      • If the node starts up without showing any errors, it may have been the former master node. If it crashes or exhibits other errors, it is likely a corrupted replica.
      • Alternatively, if there are still Kibana logs, you can search there to determine which node was the master before the cluster failed.
    • After finding the master, restart the Postgres DB manually on it, e.g. on patroni-1 with patronictl restart patroni patroni-1.
    • Once the Patroni master is started, restore Patroni cluster management with patronictl resume.
    • With patroni-1 as the elected leader, we can let Patroni manage things again.
    • After running this command, the other cluster nodes should realign with the elected leader. Use patronictl list frequently to keep tabs on the situation.
    • Otherwise, hard-reset the defective node patroni-1 with patronictl reinit patroni patroni-1 and try the resume and list commands again. A condensed version of this sequence is sketched below.
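
A condensed recovery sequence (node names follow the examples above; adapt them to whichever node you identified as the master):

patronictl list                       # inspect the cluster state
patronictl restart patroni patroni-1  # restart Postgres on the identified master
patronictl resume                     # hand cluster management back to Patroni
patronictl list                       # confirm the nodes realign with the leader
patronictl reinit patroni patroni-1   # last resort: hard-reset a defective node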

Note: In case problems still exist with Patroni, please refer to this link: https://github.com/bcgov/common-service-showcase/wiki/Patroni-Troubleshooting

Some Additional Commands

  • Create new database: make create-new-db NEW_DB=newdbname JOB_NAME=openshift-jobname

  • Delete the config map for the databases config: make oc-db-backup-configmap-delete

  • Delete the resources associated with the Postgres database (PVCs are not deleted): make oc-db-backup-delete-postgresql

  • Delete the resources associated with the Mongo database (PVCs are not deleted): make oc-db-backup-delete-mongodb

Redis cluster failing to recover

Partial cluster failure recovery

When the redis cluster is restarted, or pods are brought down and back up, the cluster may not recover gracefully in the OpenShift environment, or a Redis node may fail to connect to another node and join the cluster. In that case, follow the make commands below.

  • STEP 1: Bring down the slave pods by scaling the redis pods from 6 to 3 in OpenShift.

  • STEP 2: Run the GitHub action Env Setup - Redis recovery in Openshift and check that the masters complete the cluster meet successfully.

  • STEP 3: Scale the redis pod replicas back to 6. This should automatically get the queue-consumers passing again.

Total cluster failure recovery

When the partial cluster failure recovery above did not work and the redis nodes are unable to join the cluster, we have to delete and redeploy the redis instances and create the cluster again to bring redis back up and running normally.

Note: This is just a temporary solution as we do not have a permanent solution in place to recover the redis cluster. This process will result in deleting all the redis data as we have to delete the stateful set.

Follow the instructions below to deploy the redis instance and create the cluster.

  • STEP 1: Run the following command to delete the stateful set and other redis dependencies: make delete-redis NAMESPACE=$namespace

    Or go to Env Setup - Delete Redis in Openshift, select a branch or tag, select the environment, and click on "Run Workflow".

  • STEP 2: Follow the instructions from the sections above, Redis Setup through Make or Redis Setup through Github Actions, to set up redis.

  • STEP 3: Restart queue-consumers and api.

  • STEP 4: If this activity is performed in the DEV environment (!!ONLY the DEV environment!!), please pause the schedulers used for the file integrations.

Pod disruption budget and Pod replicas

  • All pods are created with 2 replicas and scale to a maximum of 10 replicas as load increases.
  • A pod disruption budget is set for all the deployment config pods (except the DB backup container) with a maxUnavailable value of 1.
  • For the databases, patroni and mongo have a maxUnavailable value of 1, while redis has a maxUnavailable of 2.
  • With this configuration, when nodes are drained for maintenance, only one node is drained at a time and the application stays live.
  • A sample PDB for our API is shown below:
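
As a minimal sketch of what such a PDB looks like (the name and selector labels are assumptions; match them to the actual API deployment config):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb              # hypothetical name
spec:
  maxUnavailable: 1          # matches the value described above
  selector:
    matchLabels:
      app: api               # hypothetical label; use the API deployment's labels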