Infrastructure Guide - jra3/mulm GitHub Wiki

Infrastructure Guide

Complete reference for AWS infrastructure, critical resources, and deployment procedures.

Quick Reference

  • AWS Profile: basny
  • Region: us-east-1
  • Stack Name: BasnyInfrastructureStack
  • Instance Type: t3.micro
  • Public IP: 98.91.62.199 (Elastic IP)

⚠️ CRITICAL RESOURCES - DO NOT DELETE ⚠️

The following production resources contain live data and MUST NEVER be deleted.

Production EBS Volume

Volume ID: vol-0aba5b85a1582b2c0

  • Size: 8 GB (gp3)
  • Mount Point: /mnt/basny-data (on EC2 instance)
  • Device: /dev/xvdf
  • Contains:
    • Production SQLite database (/mnt/basny-data/app/database/database.db)
    • Production config with secrets (/mnt/basny-data/app/config/config.production.json)
    • Let's Encrypt SSL certificates (/mnt/basny-data/nginx/certs/)
    • Nginx logs (/mnt/basny-data/nginx/logs/)

Protection Measures:

  • CDK deletion policy set to RETAIN
  • Protected tag: DoNotDelete=true
  • UserData script checks for existing data before formatting
  • Stack termination protection enabled

Production Elastic IP

Allocation ID: eipalloc-01f29c26363e0465a

  • IP Address: 98.91.62.199
  • DNS: bap.basny.org points to this IP
  • Purpose: Stable public IP address for production application

Protection Measures:

  • CDK uses existing EIP (does not create new one)
  • CDK deletion policy set to RETAIN
  • Protected tag: DoNotDelete=true
  • Stack termination protection enabled

Resource Protection Strategy

Five Layers of Protection

  1. Visual Identification: Resources tagged with DoNotDelete=true and descriptive names
  2. CDK Deletion Policies: RETAIN policies prevent CloudFormation from deleting resources
  3. Stack Termination Protection: Blocks stack deletion (including cdk destroy) until protection is explicitly disabled
  4. UserData Safety Checks: Prevents accidental formatting of volumes with existing data
  5. Documentation: This guide and warnings in project files
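
The layer-4 UserData check can be sketched as follows. This is an illustrative reconstruction, not the actual contents of scripts/ec2-userdata.sh; the device path, mount point, and initialization flag come from elsewhere in this guide:

```shell
#!/bin/bash
# Illustrative sketch of the UserData safety checks (see scripts/ec2-userdata.sh
# for the real script).
FLAG=/var/lib/cloud/basny-initialized
DEVICE=/dev/xvdf

# Skip all initialization when an already-initialized instance reboots
if [ -f "$FLAG" ]; then
  echo "Instance already initialized; skipping"
  exit 0
fi

# Refuse to format a device that already carries a filesystem signature.
# blkid exits 0 (and prints filesystem info) when one is present.
if blkid "$DEVICE" >/dev/null 2>&1; then
  echo "Existing filesystem detected on $DEVICE; mounting without formatting"
else
  mkfs -t xfs "$DEVICE"
fi

mkdir -p /mnt/basny-data
mount "$DEVICE" /mnt/basny-data
touch "$FLAG"
```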

⚠️ Data Loss History

On October 6, 2025, the production EBS volume was accidentally formatted due to a race condition in the UserData script. This resulted in:

  • Complete loss of production database
  • Loss of SSL certificates
  • Loss of production config

Lesson Learned: Always test infrastructure changes with detached volumes first.

SSM Parameter Store

Critical resource IDs are stored in AWS Systems Manager Parameter Store. The CDK stack reads these parameters at synth time to reference the production resources.

Parameter Names

  • /basny/production/data-volume-id: vol-0aba5b85a1582b2c0
  • /basny/production/elastic-ip-allocation-id: eipalloc-01f29c26363e0465a
  • /basny/production/elastic-ip-address: 98.91.62.199

Why SSM Parameter Store?

  • Single source of truth for resource IDs
  • Human-readable parameter names instead of hardcoded IDs in code
  • Can update resource IDs without modifying code (if resources need to be recreated)
  • Version history tracked by SSM
  • Parameters are tagged with Protected=true

Working with Parameters

# View all production parameters
aws --profile basny ssm get-parameters \
  --names /basny/production/data-volume-id \
          /basny/production/elastic-ip-allocation-id \
          /basny/production/elastic-ip-address

# Update a parameter (ONLY if resource is recreated)
aws --profile basny ssm put-parameter \
  --name /basny/production/data-volume-id \
  --value vol-NEW_VOLUME_ID \
  --overwrite

⚠️ IMPORTANT: Only update these parameters if you've intentionally recreated the resources. Never change them to point to a different resource unless you're absolutely sure.

Architecture Overview

AWS Infrastructure
├── VPC (10.0.0.0/16)
│   └── Public Subnet (10.0.1.0/24)
│       └── EC2 Instance (t3.micro)
│           ├── Root Volume (20GB gp3) - Replaceable
│           └── Data Volume (8GB gp3) - CRITICAL - vol-0aba5b85a1582b2c0
├── Elastic IP (98.91.62.199) - CRITICAL - eipalloc-01f29c26363e0465a
├── Security Group
│   ├── Port 22 (SSH) - 0.0.0.0/0
│   ├── Port 80 (HTTP) - 0.0.0.0/0
│   └── Port 443 (HTTPS) - 0.0.0.0/0
├── IAM Role (EC2 instance permissions)
│   ├── SSM access (for key retrieval)
│   ├── CloudWatch logs
│   └── S3 access (for backups)
└── CloudWatch Log Groups
    ├── /basny/application
    └── /basny/nginx

Initial Deployment

Prerequisites

  1. AWS CLI configured with BASNY profile:

    aws configure --profile basny
    
  2. AWS CDK CLI installed globally:

    npm install -g aws-cdk
    
  3. Infrastructure dependencies installed:

    cd infrastructure
    npm install
    

Deployment Steps

1. Bootstrap CDK (first time only)

cd infrastructure
npm run cdk bootstrap -- --profile basny

This creates the CDK toolkit stack in your AWS account (S3 bucket, ECR repo, IAM roles).

2. Build the stack

npm run build

3. Preview changes

npm run cdk diff -- --profile basny

Review the changes that will be made.

4. Deploy the stack

npm run cdk deploy -- --profile basny

The deployment creates:

  • VPC with public subnet
  • EC2 t3.micro instance
  • 20GB root volume (gp3)
  • 8GB data volume (gp3, persistent)
  • Elastic IP for static address
  • Security groups (ports 22, 80, 443)
  • IAM role with necessary permissions
  • CloudWatch log groups
  • SSH key pair (stored in SSM Parameter Store)

5. Note the outputs

After deployment, CDK outputs:

  • InstanceId: EC2 instance identifier
  • PublicIP: Elastic IP address (98.91.62.199)
  • SSHCommand: Command to SSH into instance
  • KeyPairId: ID of the SSH key pair

6. Retrieve SSH private key

cd infrastructure
./scripts/get-private-key.sh

This retrieves the private key from AWS Systems Manager and saves it to ~/.ssh/basny-ec2-keypair.pem with correct permissions (400).

7. Update DNS

Point bap.basny.org A record to the Elastic IP address (98.91.62.199).

8. Configure SSH

Add to ~/.ssh/config:

Host BAP
  HostName 98.91.62.199
  User ec2-user
  IdentityFile ~/.ssh/basny-ec2-keypair.pem
  StrictHostKeyChecking no

Now you can connect with: ssh BAP

Redeploying Infrastructure

⚠️ IMPORTANT: Your Elastic IP and data volume will be preserved!

The Elastic IP and data volume are referenced (not created) by the CDK stack, so they persist even when the instance is replaced.

Steps to Redeploy

1. Create snapshot before changes

aws --profile basny ec2 create-snapshot \
  --volume-id vol-0aba5b85a1582b2c0 \
  --description "Pre-deployment backup $(date +%Y%m%d-%H%M%S)" \
  --tag-specifications 'ResourceType=snapshot,Tags=[{Key=Name,Value=BASNY-PreDeployment-Backup},{Key=DoNotDelete,Value=true}]'

2. Build the CDK stack

cd infrastructure
npm run build

3. Preview changes

npm run cdk diff -- --profile basny

Review what will change:

  • EC2 instance may be REPLACED (if configuration changed)
  • Elastic IP will remain UNCHANGED
  • Data volume will remain UNCHANGED

4. Deploy the updated stack

npm run cdk deploy -- --profile basny

Note: If the instance is replaced, this will terminate your current instance and create a new one. The Elastic IP automatically attaches to the new instance.

5. Verify deployment

# Check instance is running
aws --profile basny ec2 describe-instances \
  --filters "Name=tag:Name,Values=BASNY-Production" \
  --query 'Reservations[0].Instances[0].State.Name'

# SSH to verify
ssh BAP

# Check containers are running
sudo docker ps

Recovery Procedures

If Database is Lost

1. Locate most recent backup

# SSH to server
ssh BAP

# Check for local backups
ls -lah /tmp/*.sqlite /tmp/*.db

# Check for manual backups
ls -lah ~/backups/*.sqlite ~/backups/*.db

2. Restore database

# Stop application
cd /opt/basny
sudo docker-compose -f docker-compose.prod.yml down

# Copy backup to data volume
sudo cp /path/to/backup.sqlite /mnt/basny-data/app/database/database.db

# Fix permissions (CRITICAL - must be owned by nodejs user UID 1001)
sudo chown 1001:65533 /mnt/basny-data/app/database/database.db
sudo chmod 644 /mnt/basny-data/app/database/database.db

# Restart application
sudo docker-compose -f docker-compose.prod.yml up -d

3. Verify data integrity

sqlite3 /mnt/basny-data/app/database/database.db "PRAGMA integrity_check;"

If Config is Lost

1. Check for local backup

  • Look in /tmp/config.production.json (a developer may have saved a copy)
  • Check password manager for credentials

2. Restore config

# Copy config to data volume
sudo cp /tmp/config.production.json /mnt/basny-data/app/config/config.production.json

# Fix permissions (CRITICAL - must be 600 owner-only)
sudo chown 1001:65533 /mnt/basny-data/app/config/config.production.json
sudo chmod 600 /mnt/basny-data/app/config/config.production.json

# Restart application
cd /opt/basny
sudo docker-compose -f docker-compose.prod.yml restart

If SSL Certificates are Lost

1. Create temporary HTTP-only nginx config

# Temporarily disable HTTPS in nginx config
# Edit nginx/conf.d/default.conf to comment out SSL server block
sudo docker-compose -f docker-compose.prod.yml restart nginx

2. Verify DNS is pointing to current IP

dig bap.basny.org +short
# Should return: 98.91.62.199
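
If the record hasn't propagated yet, a small wait loop can poll until DNS returns the expected address. This is a hypothetical helper (not part of the project scripts); it uses getent so it works even where dig is not installed:

```shell
# Poll until bap.basny.org resolves to the Elastic IP (max ~30 minutes)
EXPECTED=98.91.62.199
for attempt in $(seq 1 60); do
  CURRENT=$(getent ahostsv4 bap.basny.org | awk 'NR==1 {print $1}')
  if [ "$CURRENT" = "$EXPECTED" ]; then
    echo "DNS OK after $attempt check(s)"
    break
  fi
  echo "DNS returned '${CURRENT:-nothing}'; retrying in 30s"
  sleep 30
done
```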

3. Re-issue SSL certificates

Wait for DNS propagation (usually 5-10 minutes), then:

cd /opt/basny
sudo ./scripts/init-letsencrypt.sh

This will:

  • Request new certificates from Let's Encrypt
  • Store them in /mnt/basny-data/nginx/certs/
  • Reload nginx with SSL enabled

If Entire Volume is Lost

Prevention (ALWAYS do this before infrastructure changes)

# Create volume snapshot
aws --profile basny ec2 create-snapshot \
  --volume-id vol-0aba5b85a1582b2c0 \
  --description "Pre-deployment backup $(date +%Y%m%d-%H%M%S)" \
  --tag-specifications 'ResourceType=snapshot,Tags=[{Key=Name,Value=BASNY-PreDeployment-Backup},{Key=DoNotDelete,Value=true}]'

Recovery (if snapshot exists)

  1. Create new volume from snapshot:

    aws --profile basny ec2 create-volume \
      --snapshot-id snap-XXXXXXXXX \
      --availability-zone us-east-1a \
      --volume-type gp3 \
      --tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=BASNY-Data-Restored},{Key=DoNotDelete,Value=true}]'
    
  2. Update SSM parameter:

    aws --profile basny ssm put-parameter \
      --name /basny/production/data-volume-id \
      --value vol-NEW_VOLUME_ID \
      --overwrite
    
  3. Redeploy CDK stack:

    cd infrastructure
    npm run cdk deploy -- --profile basny
    
  4. Verify data integrity:

    ssh BAP
    ls -la /mnt/basny-data/app/
    sqlite3 /mnt/basny-data/app/database/database.db "PRAGMA integrity_check;"
    

Backup Strategy

Recommended Backup Schedule

  • Daily: Automated database backups to S3 (not yet implemented)
  • Weekly: Full EBS volume snapshots
  • Pre-deployment: Manual snapshot before any infrastructure changes

Creating Manual Backup

# Database backup
ssh BAP "sqlite3 /mnt/basny-data/app/database/database.db '.backup /tmp/backup-$(date +%Y%m%d-%H%M%S).db'"

# Copy to local machine
scp BAP:/tmp/backup-*.db ~/backups/

# EBS snapshot via AWS CLI
aws --profile basny ec2 create-snapshot \
  --volume-id vol-0aba5b85a1582b2c0 \
  --description "Manual backup $(date +%Y%m%d-%H%M%S)" \
  --tag-specifications 'ResourceType=snapshot,Tags=[{Key=Name,Value=BASNY-Manual-Backup},{Key=DoNotDelete,Value=true}]'

Restoring from Snapshot

# List available snapshots
aws --profile basny ec2 describe-snapshots \
  --owner-ids self \
  --filters "Name=tag:Name,Values=BASNY-*" \
  --query 'Snapshots[*].[SnapshotId,StartTime,Description]' \
  --output table

# Create volume from snapshot
aws --profile basny ec2 create-volume \
  --snapshot-id snap-XXXXXXXXX \
  --availability-zone us-east-1a \
  --volume-type gp3 \
  --size 8

Testing Infrastructure Changes Safely

NEVER test infrastructure changes with the production volume attached!

Safe Testing Procedure

  1. Create test volume:

    aws --profile basny ec2 create-volume \
      --availability-zone us-east-1a \
      --size 8 \
      --volume-type gp3 \
      --tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=BASNY-Test}]'
    
  2. Update SSM parameter temporarily:

    aws --profile basny ssm put-parameter \
      --name /basny/test/data-volume-id \
      --value vol-TEST_VOLUME_ID \
      --overwrite
    
  3. Deploy to separate stack:

    cd infrastructure
    # Modify stack name in bin/infrastructure.ts to use test name
    npm run cdk deploy -- --profile basny
    
  4. Verify behavior: Confirm the UserData script mounts the test volume and refuses to format a volume that already contains data

  5. Delete test resources:

    npm run cdk destroy -- --profile basny
    aws --profile basny ec2 delete-volume --volume-id vol-TEST_VOLUME_ID
    
  6. Deploy to production: Only after thorough testing

Pre-Deployment Checklist

Before running ANY cdk deploy or infrastructure changes:

  • Create snapshot of production EBS volume
  • Verify production volume is NOT attached to test instance
  • Review UserData script for safety checks
  • Verify RETAIN deletion policies are set
  • Confirm stack termination protection is enabled
  • Have recent database backup available locally
  • Test changes on separate stack first
  • Review cdk diff output carefully

Cost Management

Current Monthly Costs (Approximate)

  • EC2 t3.micro: ~$8/month (730 hours)
  • EBS storage (28GB total): ~$2.80/month
    • Root volume: 20GB gp3 = $1.60
    • Data volume: 8GB gp3 = $0.64
    • Snapshots: Variable (~$0.50 per snapshot/month)
  • Elastic IP: ~$3.65/month (since February 2024, AWS bills all public IPv4 addresses at $0.005/hour)
  • Data transfer: Variable (first 100GB free)

Total: ~$11-15/month

Cost Optimization Tips

  1. Stop instance during off-hours (if acceptable):

    aws --profile basny ec2 stop-instances --instance-ids i-XXXXXXXXX
    

     Note: The Elastic IP keeps accruing its hourly public IPv4 charge while the instance is stopped

  2. Delete old snapshots:

    # List old snapshots
    aws --profile basny ec2 describe-snapshots --owner-ids self
    
    # Delete specific snapshot
    aws --profile basny ec2 delete-snapshot --snapshot-id snap-XXXXXXXXX
    
  3. Monitor CloudWatch metrics to ensure right-sizing
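
Tip 2 can be narrowed with a server-side date filter. This sketch uses standard describe-snapshots JMESPath filtering to list BASNY snapshots older than 30 days for review (review the output before deleting anything):

```shell
# List BASNY snapshots created more than 30 days ago
CUTOFF=$(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%S)
aws --profile basny ec2 describe-snapshots \
  --owner-ids self \
  --filters "Name=tag:Name,Values=BASNY-*" \
  --query "Snapshots[?StartTime<'${CUTOFF}'].[SnapshotId,StartTime,Description]" \
  --output table
```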

Security Considerations

Network Security

  • Security Group: Restricts inbound traffic to ports 22, 80, 443
  • SSH: Key-based authentication only (no passwords)
  • Consider: Restricting SSH to specific IP addresses

IAM Permissions

The EC2 instance has an IAM role with:

  • SSM Parameter Store read access (for SSH key retrieval)
  • CloudWatch logs write access
  • S3 access for backups (when implemented)

Principle of least privilege: Role only has necessary permissions

Secrets Management

  • Current: Production config stored in /mnt/basny-data/app/config/config.production.json with 600 permissions
  • Future: Consider migrating to AWS Secrets Manager or Parameter Store (Issue #80)

Updates and Patching

  • OS updates: UserData script enables automatic security updates
  • Docker images: Rebuild regularly to get latest base image updates
  • Dependencies: Dependabot monitors npm packages (enabled in Issue #83)

Additional Resources

Emergency Contacts

  • Infrastructure Issues: Check CloudWatch alarms and EC2 instance health
  • DNS Management: Contact domain administrator
  • AWS Support: File support case if needed (requires support plan)

Additional Notes

  • The UserData script (scripts/ec2-userdata.sh) will NOT format a volume if it detects existing data
  • The initialization flag /var/lib/cloud/basny-initialized prevents re-initialization on instance reboot
  • All Docker volumes are mounted from the persistent EBS volume, not the root volume
  • Root volume (/dev/xvda) can be safely replaced - it contains no persistent data
  • SSH key pair is automatically created by CDK and stored in SSM Parameter Store
  • Private key is retrievable via infrastructure/scripts/get-private-key.sh
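
To confirm that the application directories really live on the persistent data volume rather than the root volume, standard findmnt/df usage works on the instance:

```shell
# Show which device backs the mount point; it should be /dev/xvdf,
# not the root device (/dev/xvda)
findmnt /mnt/basny-data

# Size and usage should match the 8 GB data volume
df -h /mnt/basny-data
```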