Creating a New Environment Within a Resource Group - CDCgov/prime-simplereport GitHub Wiki


Warning

If this is used for production regional recovery, create a new state file for the new region and update the new directory to point to the new region.


Prerequisites

  • A functional CDC superuser account with Azure access.
  • A fresh copy of your SU password, obtained at the beginning of your change window.
  • terraform installed on your development environment. Installation instructions can be found at this link.
  • The Azure CLI installed on your development environment. Installation instructions can be found at this link.
  • A repository branch on your local machine from which to work. Use a branch based on main, with no other commits or feature work.

This document assumes the use of a *nix-based system. Windows users may need to tweak command syntax to suit their needs. As an alternative, Windows users can also leverage the Windows Subsystem for Linux to run commands from a *nix environment within their system.

Process

Phase 1: Terraform Changes

Each environment within SimpleReport has its own folder within the ops subdirectory. These environments are not currently grouped by level; each one is unique. There are future plans to group these into environment level folders, at which time this document will be updated. (Note to self: create this)

At the time of writing, each environment is slightly different in its Terraform structure, with certain elements that are present or missing depending on the environment's purpose. There are plans to refactor our Terraform to enforce standardization across environments, at which time this document will be updated; in the meantime, it is recommended that an environment from the same hierarchical level be used as a source for TF code.

As an example, if a new dev environment is to be created, you can source TF code from dev2, dev3, or dev4. Do not use something like prod or test, as these will produce unpredictable results.

Complete the following steps to finish Phase 1.

Code Copying

  1. Copy an entire environment folder from ops that you wish to use for a base. (As an example, if you wish to create dev5, copy dev or dev2.)
  2. Paste this folder into ops, renaming it to the name of the new environment of your choosing.
  3. Scan each *.tf file in the new environment folder, as well as the <env>/persistent folder. Look for any non-parameterized instances of variable values, resource names, or data names that use the source environment name. Change them accordingly.

NOTE: Pay special attention to any key vault values in your vault or _data files. Non-parameterized values found in these files may correspond to dev/prod keys, and should NOT be changed, as they are shared across environments.
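
The copy-and-scan steps above can be sketched in shell as follows. This is a minimal sketch under stated assumptions, not the team's tooling: the function name copy_env is hypothetical, dev2 and dev5 are only the example names used above, and it assumes it is run from the repository root. Removing copied .terraform working directories is an added assumption, so that the new folder is initialized from scratch. The output is only a list of candidate lines; each hit still needs a manual decision, because shared key vault values must stay unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy ops/<source> to ops/<new>, then list every line in
# the copied *.tf files that still mentions the source environment name.
copy_env() {
  local source_env="$1" new_env="$2"

  # Refuse to overwrite an existing environment folder.
  if [ -e "ops/${new_env}" ]; then
    echo "ops/${new_env} already exists" >&2
    return 1
  fi

  cp -R "ops/${source_env}" "ops/${new_env}"

  # Assumption: local Terraform working directories from the source
  # environment should not be carried over into the new folder.
  find "ops/${new_env}" -type d -name .terraform -prune -exec rm -rf {} +

  # -r recurses into <env>/persistent, -n prints line numbers for review,
  # -F treats the name as a literal string. grep exits non-zero when nothing
  # matches, which is not an error here.
  grep -rnF --include='*.tf' "${source_env}" "ops/${new_env}" || true
}

# Example (run from the repository root):
#   copy_env dev2 dev5
```

Each reported line is a candidate for renaming, except for the shared key vault values described in the note above.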

Terraform Init/Plan

  1. Open an instance of your favorite terminal or console program.
  2. From the repository root, run az login. Your browser should automatically open and guide you through the authentication process. Use your SU credentials. (If you would rather use an incognito tab, or a browser other than your default to prevent account-crossing, run az login --use-device-code and follow the steps.)
  3. Execute cd ops/<env>/persistent, where <env> is the name of your new environment.
  4. Set up Terraform's internals by running terraform init.
  5. Run terraform plan.
  6. If there are errors, inspect the output. You may need to rename resources or make changes to TF code in other folders, like ops/services. If you do, make a note of what you have changed; you will need to verify your changes against multiple environments later, to ensure your changes do not produce deleterious effects on other environments. Re-run terraform plan, and repeat this step as necessary. Proceed when no further errors are produced.
  7. Inspect the results. Satisfactory output will indicate Plan: <nn> to add, 0 to change, 0 to destroy. The number of resources to be added varies based on environment level. If any resources are reporting as targets for change or destruction, STOP. Verify your code; ensure that any changes made are intentional.
  8. Execute cd .. to return to the <env> folder.
  9. Set up Terraform's internals by running terraform init.
  10. Run terraform plan.
  11. If there are errors, inspect the output. You may need to rename resources or make changes to TF code in other folders, like ops/services. If you do, make a note of what you have changed; you will need to verify your changes against multiple environments later, to ensure your changes do not produce deleterious effects on other environments. Re-run terraform plan, and repeat this step as necessary. Proceed when no further errors are produced.
  12. Inspect the results. Satisfactory output will indicate Plan: <nn> to add, 0 to change, 0 to destroy. The number of resources to be added varies based on environment level. If any resources are reporting as targets for change or destruction, STOP. Verify your code; ensure that any changes made are intentional.
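
Steps 3 through 12 can be sketched as one helper that is run once per folder. This is a minimal sketch, not an official script: the function names plan_dir and plan_is_additive are hypothetical, and the check assumes the summary line has the form quoted in steps 7 and 12. The -no-color and -input=false flags are additions that make the output easier to capture and prevent the command from waiting on a prompt.

```shell
#!/usr/bin/env bash
# Succeeds only if the plan output on stdin contains a summary line that
# reports nothing to change and nothing to destroy.
plan_is_additive() {
  grep -Eq 'Plan: [0-9]+ to add, 0 to change, 0 to destroy'
}

# Hypothetical helper: run init and plan in one folder, print the plan, and
# check its summary line. The plan is printed only after it has finished.
plan_dir() {
  local dir="$1" log
  log="$(mktemp)"

  if ! (cd "$dir" && terraform init -input=false \
        && terraform plan -no-color -input=false) > "$log"; then
    cat "$log"
    rm -f "$log"
    echo "terraform init/plan failed in ${dir}; fix the errors and re-run" >&2
    return 1
  fi

  cat "$log"
  if plan_is_additive < "$log"; then
    rm -f "$log"
    echo "OK: ${dir} only adds resources"
  else
    rm -f "$log"
    echo "STOP: ${dir} plans changes or destroys; verify your code" >&2
    return 1
  fi
}

# Example (run from the repository root after az login):
#   plan_dir ops/dev5/persistent
#   plan_dir ops/dev5
```

The helper does not replace reading the plan. It only flags the case where resources are reported as targets for change or destruction.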

Once your TF changes are complete, it is highly recommended to repeat steps 3-12 on another environment from a different hierarchical level in the ops folder. Make sure that your changes haven't caused any unexpected planned actions on existing infrastructure.

As a guide, compare the results of your terraform plan to the same environment by using the Ad-Hoc Terraform Plan GitHub Action, making sure you use the code from main. Your local results should be identical to those that the action outputs. (Remember: the action outputs plan results from both the <env> folder AND the <env>/persistent folder.)

Once your results are satisfactory, commit your changes to your local branch.

Phase 2: GitHub Action Changes (This is very much out of date and needs attention)

  1. Copy a deploy<Env>.yml file from .github/workflows that you wish to use for a base. (As an example, if you wish to create dev5, copy deployDev.yml or deployDev2.yml.)
  2. Paste this file into .github/workflows, renaming it to incorporate the name of the new environment of your choosing.
  3. Scan the file for any non-parameterized instances of the source environment name, changing them as necessary.
  4. Update .github/workflows/terraform_checks.yml by adding <env> and <env>/persistent to the TERRAFORM_DIRS environment variable within the check-terraform-validity process.
  5. Commit your changes.

Phase 3: Persistent Infrastructure Creation

  1. From the repository root, execute cd ops/<env>/persistent, where <env> is the name of your new environment.
  2. Set up Terraform's internals by running terraform init.
  3. Run terraform apply.
  4. Terraform will automatically generate a plan. Inspect the results, and confirm they match the result of your terraform plan run in Phase 1. If they do, type yes when prompted.
  5. Wait for the apply process to complete. This may take up to 10 minutes.

NOTE: You may see an error like the following:

Error: waiting for creation of the Postgresql Flexible Server "simple-report-dev4-flexible-db" (Resource Group "prime-simple-report-dev"): Code="VirtualNetworkNotLinkedToPrivateDnsZone" Message="The virtual network simple-report-dev4-network is not linked to private DNS zone privatelink.dev4.postgres.database.azure.com. Please link the virtual network to zone and retry."

This is usually a timing issue. To resolve, simply wait 5 minutes, and repeat steps 2-5.

If the above error persists, or if additional errors result, modify your Terraform code accordingly, perform the terraform plan process again, and attempt another terraform apply.
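
To tell the timing issue apart from other failures, the apply output can be saved and searched for the error code quoted above. This is a minimal sketch: the function name is_dns_link_timing_error and the file name apply.log are hypothetical, and it assumes the apply output was captured to a file, for example with tee.

```shell
#!/usr/bin/env bash
# Succeeds if the saved apply output contains the private DNS zone linking
# error quoted above.
is_dns_link_timing_error() {
  grep -q 'VirtualNetworkNotLinkedToPrivateDnsZone' "$1"
}

# Example, run from ops/<env>/persistent:
#   terraform apply 2>&1 | tee apply.log
#   if is_dns_link_timing_error apply.log; then
#     sleep 300                          # wait 5 minutes, as suggested above
#     terraform init && terraform apply  # repeat steps 2-5
#   fi
```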

Phase 4: Stateless Infrastructure Creation

Azure Key Vault Setup

  1. Log into the Azure console.
  2. Navigate to the simple-report-global key vault.
  3. Copy the value of the following secret: simple-report-<source_env>-db-jdbc, where <source_env> represents the environment on which you based your Terraform changes in Phase 1.
  4. Create a new secret in the key vault with the following name: simple-report-<env>-db-jdbc, where <env> represents the name of the new environment you have created. The value should be the secret value you copied in the previous step. Save your changes when finished.
  5. Copy the value of the following secret: simple-report-<source_env>-metabase-uri, where <source_env> represents the environment on which you based your Terraform changes in Phase 1.
  6. Create a new secret in the key vault with the following name: simple-report-<env>-metabase-uri, where <env> represents the name of the new environment you have created. The value should be the secret value you copied in the previous step. Save your changes when finished.
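
The steps above use the Azure console. As an alternative, the same copy could be done with the Azure CLI from the prerequisites. This is a minimal sketch, not a documented team procedure: the function names secret_name and copy_secret are hypothetical, and it assumes your SU account may read and write secrets in the simple-report-global vault from the command line.

```shell
#!/usr/bin/env bash
# Builds a secret name following the simple-report-<env>-<suffix> pattern
# used in the steps above.
secret_name() {
  printf 'simple-report-%s-%s' "$1" "$2"
}

# Hypothetical helper: read one secret under the source environment's name
# and write the same value under the new environment's name.
copy_secret() {
  local source_env="$1" new_env="$2" suffix="$3" value

  value="$(az keyvault secret show \
    --vault-name simple-report-global \
    --name "$(secret_name "$source_env" "$suffix")" \
    --query value --output tsv)" || return 1

  # --output none keeps the secret value out of the terminal output.
  az keyvault secret set \
    --vault-name simple-report-global \
    --name "$(secret_name "$new_env" "$suffix")" \
    --value "$value" \
    --output none
}

# Example (after az login):
#   copy_secret dev2 dev5 db-jdbc
#   copy_secret dev2 dev5 metabase-uri
```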

Terraform Apply

  1. From the repository root, execute cd ops/<env>, where <env> is the name of your new environment.
  2. Set up Terraform's internals by running terraform init.
  3. Run terraform apply.
  4. Terraform will automatically generate a plan. Inspect the results, and confirm they match the result of your terraform plan run in Phase 1. If they do, type yes when prompted.
  5. Wait for the apply process to complete. This may take up to 10 minutes.

You will encounter the following error during this process (with environment names that match those of your new environment):

Error: creating/updating Monitor Smart Detector Alert Rule: (Name "dev4-failure-anomalies" / Resource Group "prime-simple-report-dev"): alertsmanagement.SmartDetectorAlertRulesClient#CreateOrUpdate: Failure responding to request: StatusCode=409 -- Original Error: autorest/azure: Service returned an error. Status=409 Code="ScopeInUse" Message="A FailureAnomaliesDetector alert rule with id '/subscriptions/<GUID>/resourcegroups/prime-simple-report-dev/providers/microsoft.alertsmanagement/smartdetectoralertrules/failure anomalies - prime-simple-report-dev4-insights' is already defined on the resource '/subscriptions/<GUID>/resourcegroups/prime-simple-report-dev/providers/microsoft.insights/components/prime-simple-report-dev4-insights'. Only a single FailureAnomaliesDetector alert rule can be define for the same resource."

A FailureAnomaliesDetector is automatically created with the App Gateway and Application Insights objects. A future TF change will take care of this manual step. In the meantime, to address this error, follow these steps:

  1. Log into the Azure console.
  2. Navigate to the prime-simple-report-<env>-insights resource, where <env> is the name of your new environment.
  3. Click "Alerts" in the left-side menu pane.
  4. At the top of the new pane, click "Rules".
  5. In the list presented, delete the existing Smart Detector (FailureAnomaliesDetector) object.
  6. Re-run the above terraform apply process.

When the second terraform apply process runs, you will be met with the following error (with environment names that match those of your new environment):

Error: retrieving Custom Hostname Binding "api-dev4.simplereport.gov" (App Service "simple-report-api-dev4" / Resource Group "prime-simple-report-dev"): web.AppsClient#GetHostNameBinding: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="NotFound" Message="Cannot find HostName with name api-dev4.simplereport.gov." Details=[{"Message":"Cannot find HostName with name api-dev4.simplereport.gov."},{"Code":"NotFound"},{"ErrorEntity":{"Code":"NotFound","ExtendedCode":"51004","Message":"Cannot find HostName with name api-dev4.simplereport.gov.","MessageTemplate":"Cannot find {0} with name {1}.","Parameters":["HostName","api-dev4.simplereport.gov"]}}]

To address this error, complete the following steps:

  1. Complete Phase 6 of this document.
  2. Re-run the above terraform apply process.

If the above errors persist, or if additional errors result, modify your Terraform code accordingly, perform the terraform plan process again, and attempt another terraform apply.

Phase 5: Cleanup and Code Review

  1. Before committing your final TF code, run terraform fmt -recursive from the ops directory. This will catch any formatting mistakes before PR creation.
  2. Update README.md with information about the new environment.
  3. Update the Makefile to include the new environment within the .valid-env-% target.
  4. Within GitHub, update the Wiki page "Cloud Environments" with information about the new environment.
  5. Push your completed branch to the prime-simplereport repository.
  6. Create a new PR for your changes. Creating this PR as a draft is highly recommended to allow the suite of automated checks to run.
  7. If checks fail, push the required changes. Pay special attention to the Terraform Checks workflow that is run; errors with terraform fmt or terraform validate will appear here.
  8. Closely inspect the results of the terraform plan check run as part of the Terraform Checks workflow. This check runs against prod, and we do NOT want changes made to production unless they are intentional. If something unexpected has changed here, STOP, verify your code, and see what has caused the change. (Please don't be afraid to ask for a second set of eyes from a teammate; we're all here to help!)
  9. Once all checks are passing, mark your PR as "Ready for Review", and add reviewers.
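
The formatting and validation errors mentioned in step 7 can be caught locally before pushing. This is a minimal sketch, assuming it is run from the repository root; the function name precheck_env is hypothetical, and it only approximates what the Terraform Checks workflow runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: format everything under ops, then validate the two
# folders that belong to one environment.
precheck_env() {
  local env="$1" dir

  # Rewrites any *.tf file under ops that is not in canonical format.
  (cd ops && terraform fmt -recursive) || return 1

  # terraform validate needs an initialized working directory.
  for dir in "ops/${env}/persistent" "ops/${env}"; do
    (cd "$dir" && terraform init -input=false && terraform validate) || return 1
  done
}

# Example (run from the repository root):
#   precheck_env dev5
```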

Phase 6: DNS Association/SSL Binding

The CDC controls DNS entries for all SimpleReport environments. Modifying these values requires intervention from a CDC representative.

To complete this process, contact SimpleReport's DevSecOps lead for assistance. (Currently, that individual is Alis Akers; previously, it was Rin Concordia.)

Detailed steps for this part of the process will be added in a future update to this document.

Phase 7: App Service Deployment

Now that all infrastructure is in place, SimpleReport can be properly deployed. The process differs based on whether this environment is part of the CI/CD pipeline.

For Pipelined Environments

For environments that are part of the pipeline (expected to update with each push to main), no further action is necessary. The app will auto-deploy on the next merge to main.

For Non-Pipelined Environments

To deploy non-pipelined environments, execute the Deploy GitHub Action you created in Phase 2.
