Week 9 – Summary Task: Terraform on Azure

This project demonstrates how to provision and manage real-world infrastructure on Microsoft Azure using Terraform, following Infrastructure as Code (IaC) best practices. The solution includes networking, VM provisioning, remote backend configuration, CI/CD automation with GitHub Actions, and post-deployment health checks.

Project Goals

  • Use Terraform to provision a complete infrastructure on Azure.
  • Configure remote state management with an Azure Storage Account.
  • Automate deployment and testing with GitHub Actions workflows.
  • Run the app on an Azure VM using Docker Compose.
  • Document and log the full provisioning process.

Project Structure
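
A rough sketch of the layout, inferred from the files and modules referenced in the steps below (the actual tree may differ):

week9/week9_summery/Terraform/
├── main.tf
├── variables.tf
├── outputs.tf
├── backend.tf            # remote state configuration (Step 3) - assumed file name
└── modules/
    ├── resource_group/
    ├── network/
    └── vm/
        ├── main.tf
        └── variables.tf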

Steps Overview

Step 1–2: Initialize Project and Define Infrastructure

  • Created modules for:
    • Virtual Network (VNet)
    • Subnet
    • Network Security Group (NSG)
    • Public IP
    • Linux VM with SSH access

Full definition in main.tf, variables.tf, and outputs.tf.

Project Wiki for Task 1–2

Update modules/vm/main.tf:

resource "azurerm_linux_virtual_machine" "this" {
  name                = var.vm.name
  resource_group_name = var.resource_group_name
  location            = var.location
  size                = var.vm.size
  admin_username      = var.vm.admin_user

  network_interface_ids = [var.network_interface_id]

  admin_ssh_key {
    username   = var.vm.admin_user
    public_key = var.ssh_public_key
  }

  os_disk {
    caching              = var.vm.disk_caching
    storage_account_type = var.vm.disk_storage_type
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts-gen2"
    version   = "latest"
  }

  tags = var.tags

  custom_data = base64encode(<<EOF
#!/bin/bash
# cloud-init user data: runs once on first boot
  sudo apt-get update -y
  sudo apt-get install -y docker.io

  sudo systemctl enable docker
  sudo systemctl start docker

  # Docker Compose
  sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
  sudo chmod +x /usr/local/bin/docker-compose
  sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose

  # Setup Swap (optional)
  if ! swapon --show | grep -q '/swapfile'; then
    sudo fallocate -l 1G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile
    echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
  fi
EOF
)
}

Update modules/vm/variables.tf:

# modules/vm/variables.tf

variable "resource_group_name" {
  type        = string
  description = "Resource group name"
}

variable "location" {
  type        = string
  description = "Azure region"
}

variable "subnet_id" {
  description = "ID of the subnet to associate with the NIC"
  type        = string
}

variable "tags" {
  type        = map(string)
  description = "Common tags"
}

variable "vm" {
  description = "Virtual machine configuration"
  type = object({
    name                = string
    size                = string
    admin_user          = string
    public_ip_name      = string
    public_ip_alloc     = string
    nic_name            = string
    ip_config_name      = string
    private_ip_alloc    = string
    disk_caching        = string
    disk_storage_type   = string
  })
}

variable "network_interface_id" {
  description = "The ID of the network interface to attach to the VM"
  type        = string
}

variable "ssh_public_key" {
  description = "SSH public key for VM"
  type        = string
}

Update the root main.tf:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
  }
}

provider "azurerm" {
  features {
    resource_group {
      prevent_deletion_if_contains_resources = false
    }
  }
}

# resource "azurerm_resource_group" "imported" {
#   name     = "new_resource"
#   location = "israelcentral"
# }


module "resource_group" {
  source         = "./modules/resource_group"
  resource_group = var.resource_group
  tags           = var.common_tags
}

module "network" {
  source              = "./modules/network"
  resource_group_name = module.resource_group.resource_group_name
  location            = module.resource_group.resource_group_location
  tags                = var.common_tags

  virtual_network = {
    name          = var.virtual_network.name
    address_space = var.virtual_network.address_space
  }

  subnet = {
    name           = var.subnet.name
    address_prefix = var.subnet.address_prefix
  }

  nsg = {
    name      = var.network_security_group.name
    rule_name = "ssh-rule"
  }

  public_ip         = var.public_ip
  network_interface = var.network_interface
}


module "vm" {
  source               = "./modules/vm"
  resource_group_name  = module.resource_group.resource_group_name
  location             = module.resource_group.resource_group_location
  subnet_id            = module.network.subnet_id
  tags                 = var.common_tags
  vm                   = var.virtual_machine
  network_interface_id = module.network.network_interface_id
  ssh_public_key       = var.ssh_public_key
}
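
The root module above consumes several child-module outputs (resource_group_name, resource_group_location, subnet_id, network_interface_id, and later public_ip_address). Those module output files aren't reproduced on this page; a minimal sketch of what they would need to expose, assuming the resource names that appear in the apply log in Step 7 (azurerm_resource_group.this, azurerm_subnet.this, and so on):

# modules/resource_group/outputs.tf (sketch)
output "resource_group_name" {
  value = azurerm_resource_group.this.name
}

output "resource_group_location" {
  value = azurerm_resource_group.this.location
}

# modules/network/outputs.tf (sketch)
output "subnet_id" {
  value = azurerm_subnet.this.id
}

output "network_interface_id" {
  value = azurerm_network_interface.this.id
}

output "public_ip_address" {
  value = azurerm_public_ip.this.ip_address
}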

Update the root variables.tf:

# Resource Group
variable "resource_group" {
  description = "Resource group configuration"
  type = object({
    name     = string
    location = string
  })
  default = {
    name     = "mtc-resources"
    location = "Israel Central"
  }
}

variable "common_tags" {
  description = "Tags applied to all resources"
  type        = map(string)
  default = {
    environment = "dev"
  }
}


# Virtual Network
variable "virtual_network" {
  description = "Virtual network configuration"
  type = object({
    name          = string
    address_space = list(string)
  })
  default = {
    name          = "mtc-network"
    address_space = ["10.123.0.0/16"]
  }
}


# Subnet
variable "subnet" {
  description = "Subnet configuration"
  type = object({
    name           = string
    address_prefix = list(string)
  })
  default = {
    name           = "mtc-subnet"
    address_prefix = ["10.123.1.0/24"]
  }
}


# Network Security Group
variable "network_security_group" {
  description = "NSG configuration"
  type = object({
    name = string
  })
  default = {
    name = "mtc-nsg"
  }
}

# Public IP
variable "public_ip" {
  description = "Public IP configuration"
  type = object({
    name              = string
    allocation_method = string
  })
  default = {
    name              = "mtc-ip"
    allocation_method = "Static"
  }
}

# Network Interface
variable "network_interface" {
  description = "NIC configuration"
  type = object({
    name                  = string
    ip_configuration_name = string
    private_ip_allocation = string
  })
  default = {
    name                  = "mtc-nic"
    ip_configuration_name = "internal"
    private_ip_allocation = "Dynamic"
  }
}

variable "virtual_machine" {
  description = "Virtual machine configuration"
  type = object({
    name              = string
    size              = string
    admin_user        = string
    public_ip_name    = string
    public_ip_alloc   = string
    nic_name          = string
    ip_config_name    = string
    private_ip_alloc  = string
    disk_caching      = string
    disk_storage_type = string
  })
  default = {
    name              = "mtc-vm"
    size              = "Standard_B1s"
    admin_user        = "azureuser"
    public_ip_name    = "mtc-ip"
    public_ip_alloc   = "Static"
    nic_name          = "mtc-nic"
    ip_config_name    = "internal"
    private_ip_alloc  = "Dynamic"
    disk_caching      = "ReadWrite"
    disk_storage_type = "Standard_LRS"
  }
}

variable "ssh_public_key" {
  description = "SSH public key for VM"
  type        = string
}
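
Note that ssh_public_key has no default, so it must be supplied at plan/apply time. In CI it is injected with -var (see Step 4); for a local run, a hypothetical terraform.tfvars works as well:

# terraform.tfvars (example only - substitute your own public key)
ssh_public_key = "ssh-rsa AAAAB3NzaC1yc2E... user@host"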

Update the root outputs.tf:

output "resource_group_name" {
  description = "The name of the resource group"
  value       = module.resource_group.resource_group_name
}

output "public_ip_address" {
  description = "Public IP address of the virtual machine"
  value       = module.network.public_ip_address
}

output "virtual_machine_id" {
  description = "ID of the deployed virtual machine"
  value       = module.vm.virtual_machine_id
}

output "virtual_machine_name" {
  description = "Name of the deployed virtual machine"
  value       = module.vm.virtual_machine_name
}

output "ssh_connection_command" {
  description = "Command to SSH into the VM"
  value       = "ssh ${var.virtual_machine.admin_user}@${module.network.public_ip_address}"
}
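
The virtual_machine_id and virtual_machine_name outputs above assume the VM module exposes matching outputs; a minimal sketch of modules/vm/outputs.tf (not shown on this page):

# modules/vm/outputs.tf (sketch)
output "virtual_machine_id" {
  value = azurerm_linux_virtual_machine.this.id
}

output "virtual_machine_name" {
  value = azurerm_linux_virtual_machine.this.name
}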

Summary of Changes:

  1. Automatic Docker & Docker Compose Installation

    What: Added a custom_data script to the azurerm_linux_virtual_machine resource in the VM module.
    Why: Every new VM created by Terraform installs Docker and Docker Compose on first boot, so it is ready for container workloads without manual intervention.

  2. Passing SSH Public Key as a String

    What: The VM module now receives the SSH public key as a string variable (ssh_public_key) instead of a file path.
    Why: This approach is more CI/CD-friendly: the key can be injected dynamically (for example, from GitHub Actions secrets), making the infrastructure code more portable and automated.

  3. Outputs for Automation

    What: Added outputs for the resource group name, public IP address, VM ID, VM name, and a ready-to-use SSH connection command.
    Why: These outputs make it easy to retrieve key information about the deployed infrastructure for subsequent automation steps (such as deployment scripts or notifications).

  4. Removed ssh_key_path from Variables

    What: The ssh_key_path variable was removed from the VM module.
    Why: Since the public key is now passed as a string, there is no need to reference a file path, which simplifies the module and makes it more robust for automated pipelines.

  5. (Optional) Swap File Creation

    What: The custom_data script also creates a 1 GB swap file if one does not exist.
    Why: This can help improve VM performance, especially for small VM sizes or memory-intensive workloads.

Step 3 – Configure Remote State

  • Resource Group (mtc-resources)
  • Storage Account (mtcstatetf)
  • Container (tfstate)

These backend resources are created idempotently by a dedicated GitHub Actions workflow (each create step is skipped when the resource already exists):

name: Terraform Backend Setup

on:
  workflow_dispatch:
  workflow_call:

jobs:
  setup-backend:
    name: Create Storage Account + Container for Terraform State
    runs-on: ubuntu-latest

    steps:
      - name: Azure Login
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      - name: Create Backend Storage Resources
        run: |
          RESOURCE_GROUP="mtc-resources"
          STORAGE_ACCOUNT="mtcstatetf" # MUST be globally unique
          CONTAINER_NAME="tfstate"
          LOCATION="israelcentral"

          echo "Checking for existing resource group..."
          az group show --name $RESOURCE_GROUP || \
          az group create --name $RESOURCE_GROUP --location $LOCATION

          echo "Checking for existing storage account..."
          az storage account show --name $STORAGE_ACCOUNT --resource-group $RESOURCE_GROUP || \
          az storage account create \
            --name $STORAGE_ACCOUNT \
            --resource-group $RESOURCE_GROUP \
            --location $LOCATION \
            --sku Standard_LRS

          echo "Getting storage account key..."
          ACCOUNT_KEY=$(az storage account keys list \
            --resource-group $RESOURCE_GROUP \
            --account-name $STORAGE_ACCOUNT \
            --query '[0].value' -o tsv)

          echo "Checking for existing container..."
          az storage container show \
            --name $CONTAINER_NAME \
            --account-name $STORAGE_ACCOUNT \
            --account-key $ACCOUNT_KEY || \
          az storage container create \
            --name $CONTAINER_NAME \
            --account-name $STORAGE_ACCOUNT \
            --account-key $ACCOUNT_KEY

          echo "Backend storage is ready for Terraform."

Step 4 – Apply Infrastructure

Used terraform apply via GitHub Actions (Terraform Deploy Task9):

  • Automatically writes SSH key
  • Initializes and applies Terraform
  • Imports existing resource group if needed
  • Outputs the VM's public IP for next steps

Output is consumed by other workflows via workflow_call.

name: Terraform Deploy Task9

on:
  workflow_call:
    outputs:
      vm_ip:
        description: "Public IP of the VM"
        value: ${{ jobs.terraform.outputs.vm_ip }}
    secrets:
      AZURE_CREDENTIALS:
        required: true
      VM_SSH_KEY:
        required: true

jobs:
  terraform:
    name: Terraform Setup
    runs-on: ubuntu-latest
    outputs:
      vm_ip: ${{ steps.vm_ip.outputs.vm_ip }}
    defaults:
      run:
        working-directory: week9/week9_summery/Terraform
    env:
      ARM_CLIENT_ID: ${{ fromJson(secrets.AZURE_CREDENTIALS).clientId }}
      ARM_CLIENT_SECRET: ${{ fromJson(secrets.AZURE_CREDENTIALS).clientSecret }}
      ARM_SUBSCRIPTION_ID: ${{ fromJson(secrets.AZURE_CREDENTIALS).subscriptionId }}
      ARM_TENANT_ID: ${{ fromJson(secrets.AZURE_CREDENTIALS).tenantId }}

    steps:
      - name: Checkout Code
        uses: actions/checkout@v3

      - name: Azure Login (CLI)
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.6.6

      - name: Write SSH Private Key
        run: |
          mkdir -p ~/.ssh
          echo "${{ secrets.VM_SSH_KEY }}" > ~/.ssh/id_rsa
          chmod 600 ~/.ssh/id_rsa

      - name: Derive SSH Public Key
        id: ssh
        run: |
          ssh-keygen -y -f ~/.ssh/id_rsa > ~/.ssh/id_rsa.pub
          echo "ssh_public_key=$(cat ~/.ssh/id_rsa.pub)" >> "$GITHUB_OUTPUT"

      - name: Terraform Init
        run: terraform init

      - name: Conditionally Import Resource Group
        run: |
          RG_NAME="mtc-resources"
          SUB_ID="${{ env.ARM_SUBSCRIPTION_ID }}"
          MODULE_PATH="module.resource_group.azurerm_resource_group.this"

          echo "Checking if resource group is already in Terraform state..."
          if terraform state list | grep -q "$MODULE_PATH"; then
            echo "Resource group already managed in Terraform state. Skipping import."
          else
            echo "Checking if resource group exists in Azure..."
            EXISTS=$(az group exists --name "$RG_NAME")
            if [ "$EXISTS" == "true" ]; then
              echo "Resource group exists. Importing into Terraform state..."
              terraform import -input=false -lock=false \
                -var="ssh_public_key=${{ steps.ssh.outputs.ssh_public_key }}" \
                "$MODULE_PATH" "/subscriptions/$SUB_ID/resourceGroups/$RG_NAME"
            else
              echo "Resource group does not exist. Terraform will create it during apply."
            fi
          fi

      - name: Terraform Apply
        run: |
          terraform apply -auto-approve \
            -var="ssh_public_key=${{ steps.ssh.outputs.ssh_public_key }}"

      - name: Terraform Output
        id: vm_ip
        run: |
          IP=$(terraform output -raw public_ip_address)
          echo "Public IP from Terraform: $IP"
          echo "vm_ip=$IP" >> $GITHUB_OUTPUT

(Screenshot: successful execution of the Terraform Deploy workflow)

Steps 5–6 – Healthcheck Script and Automatic Deployment (Optional)

This GitHub Action automates the deployment of the app to the Azure VM and performs a basic health check.

Application Stack:

  • Node.js frontend – exposed on port 3000
  • MongoDB backend – accessible on port 8080
  • Version-controlled docker-compose.yml

App URL: http://51.4.113.244:3000/
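
For the frontend (port 3000) and backend (port 8080) to be reachable from outside, the NSG must allow inbound traffic on those ports in addition to SSH. The network module itself isn't shown on this page; a hypothetical rule of the kind it would need (rule name and priority are assumptions):

resource "azurerm_network_security_rule" "app_ports" {
  name                        = "allow-app-ports" # assumed name
  priority                    = 110               # assumed priority
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_ranges     = ["3000", "8080"]  # frontend and backend ports
  source_address_prefix       = "*"
  destination_address_prefix  = "*"
  resource_group_name         = var.resource_group_name
  network_security_group_name = azurerm_network_security_group.this.name
}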

name: Deploy to Azure VM Task8

on:
  workflow_dispatch:
  workflow_call:
    inputs:
      vm_ip:
        required: true
        type: string

jobs:
  deploy:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Write SSH key
        run: |
          echo "${{ secrets.VM_SSH_KEY }}" > key.pem
          chmod 600 key.pem

      - name: Create .env file
        run: echo "${{ secrets.ENV_FILE_TASK8 }}" > week8/week8_summery/app/.env

      - name: Clean Docker on VM
        run: |
          ssh -i key.pem -o StrictHostKeyChecking=no azureuser@${{ inputs.vm_ip }} "
            echo 'Cleaning Docker environment...'

            containers=\$(docker ps -q)
            if [ -n \"\$containers\" ]; then
              echo 'Stopping running containers...'
              docker stop \$containers
            else
              echo 'No running containers to stop.'
            fi

            sudo docker container prune -f
            sudo docker image prune -af
            sudo docker network prune -f

            volumes=\$(docker volume ls -q)
            if [ -n \"\$volumes\" ]; then
              echo 'Removing all Docker volumes...'
              docker volume rm \$volumes
            else
              echo 'No Docker volumes to remove.'
            fi
          "

      - name: Debug SSH command
        run: echo "ssh -i key.pem -o StrictHostKeyChecking=no azureuser@${{ inputs.vm_ip }}"

      - name: Sync app folder to Azure VM
        run: |
          ssh -i key.pem -o StrictHostKeyChecking=no azureuser@${{ inputs.vm_ip }} "mkdir -p /home/azureuser/week9summery/app"
          rsync -az --delete --exclude='.git' --exclude='node_modules' -e "ssh -i key.pem -o StrictHostKeyChecking=no" ./week8/week8_summery/app/ azureuser@${{ inputs.vm_ip }}:/home/azureuser/week9summery/app/

      - name: Deploy with Docker Compose
        run: |
          ssh -i key.pem -o StrictHostKeyChecking=no azureuser@${{ inputs.vm_ip }} "
            cd /home/azureuser/week9summery/app &&
            sudo docker-compose down --remove-orphans &&
            sudo docker-compose up -d --build
          "

      - name: Healthcheck and get logs
        run: |
          ssh -i key.pem -o StrictHostKeyChecking=no azureuser@${{ inputs.vm_ip }} "
            sudo docker ps
          " > remote_logs.txt

      - name: Logs from Azure VM
        run: |
          ssh -i key.pem -o StrictHostKeyChecking=no azureuser@${{ inputs.vm_ip }} "
            cd /home/azureuser/week9summery/app
            sudo docker-compose ps
            sudo docker-compose logs --tail=50
          " > remote_logs.txt

      - name: Upload logs
        uses: actions/upload-artifact@v4
        with:
          name: remote-logs
          path: remote_logs.txt

      - name: Cleanup SSH key
        run: rm key.pem

      - name: Cleanup .env file
        if: always()
        run: rm -f week8/week8_summery/app/.env

(Screenshot: result of the deployment and healthcheck run)

Step 7 – Logging and Documentation

All Terraform commands (init, plan, apply) are logged automatically into:

  • deployment_log.md

Excerpt from deployment_log.md:

terraform init

Initializing Terraform...

Initializing the backend...
Successfully configured the backend "azurerm"!
Terraform will automatically use this backend unless the backend configuration changes.

Initializing modules...
- network in modules/network
- resource_group in modules/resource_group
- vm in modules/vm

Initializing provider plugins...
Finding hashicorp/azurerm versions matching "~> 3.0"...
Installing hashicorp/azurerm v3.117.1...
Installed hashicorp/azurerm v3.117.1 (signed by HashiCorp)

Terraform has created a lock file `.terraform.lock.hcl` to record the provider selections it made above.
Include this file in your version control repository to guarantee consistent provider selections in future runs.

Terraform has been successfully initialized!

You may now begin working with Terraform.
Try running `terraform plan` to see any changes required for your infrastructure.

If you ever set or change modules or backend configuration, rerun `terraform init`.
Other commands will remind you if reinitialization is necessary.

---

terraform apply

Applying Terraform configuration...
Acquiring state lock. This may take a few moments...

Refreshing state:
- module.resource_group.azurerm_resource_group.this
- module.network.azurerm_network_security_group.this
- module.network.azurerm_virtual_network.this
- module.network.azurerm_public_ip.this
- module.network.azurerm_subnet.this
- module.network.azurerm_network_security_rule.this
- module.network.azurerm_subnet_network_security_group_association.this
- module.network.azurerm_network_interface.this
- module.vm.azurerm_linux_virtual_machine.this

Terraform execution plan:

Resource actions:
~ Update in-place

Resources to modify:
- module.vm.azurerm_linux_virtual_machine.this

Details of modification:
  identity block will be removed:
    - identity_ids: []
    - principal_id: "24b72eeb-2c65-4acb-946f-be0e6ee7c4ca"
    - tenant_id: "485d9998-0bfe-4500-89f1-6d8e49183499"
    - type: "SystemAssigned"

Plan summary:
- 0 resources to add
- 1 resource to change
- 0 resources to destroy

Applying changes:
- module.vm.azurerm_linux_virtual_machine.this: Modifying...
- module.vm.azurerm_linux_virtual_machine.this: Modifications complete after 23s

Releasing state lock. This may take a few moments...

Apply complete!
- Resources: 0 added, 1 changed, 0 destroyed.

Outputs:
- public_ip_address = "51.4.113.244"
- resource_group_name = "mtc-resources"
- ssh_connection_command = "ssh azureuser@51.4.113.244"
- virtual_machine_id = "/subscriptions/f9f71262-67c6-48a1-ad2f-75ed5b29135b/resourceGroups/mtc-resources/providers/Microsoft.Compute/virtualMachines/mtc-vm"
- virtual_machine_name = "mtc-vm"

---

terraform output

Public IP from Terraform: 51.4.113.244

Step 8 – Resilience Test

After the VM is rebooted, the following workflow verifies that the application ports are reachable again:

name: Post-Reboot Healthcheck on App Ports Task9

on:
  workflow_dispatch:
  workflow_call:
    inputs:
      vm_ip:
        required: true
        type: string
  

jobs:
  check-access:
    runs-on: ubuntu-latest

    steps:
      - name: Check HTTP access on port 3000 (Frontend)
        run: |
          echo "Checking http://${{ inputs.vm_ip }}:3000 ..." > access-check.log
          if curl --fail --silent http://${{ inputs.vm_ip }}:3000; then
            echo "Port 3000 is accessible." >> access-check.log
          else
            echo "Port 3000 is NOT accessible." >> access-check.log
            exit 1
          fi

      - name: Check HTTP access on port 8080 (Backend)
        run: |
          echo "Checking http://${{ inputs.vm_ip }}:8080 ..." >> access-check.log
          if curl --fail --silent http://${{ inputs.vm_ip }}:8080; then
            echo "Port 8080 is accessible." >> access-check.log
          else
            echo "Port 8080 is NOT accessible." >> access-check.log
            exit 1
          fi

      - name: Upload access check log
        uses: actions/upload-artifact@v4
        with:
          name: post-reboot-healthcheck-log
          path: access-check.log


Step 9 – User Experience and Validation

The deployed application is reachable at http://51.4.113.244:3000/.