Infrastructure Guide - jra3/mulm GitHub Wiki
Infrastructure Guide
Complete reference for AWS infrastructure, critical resources, and deployment procedures.
Quick Reference
- AWS Profile: basny
- Region: us-east-1
- Stack Name: BasnyInfrastructureStack
- Instance Type: t3.micro
- Public IP: 98.91.62.199 (Elastic IP)
⚠️ CRITICAL RESOURCES - DO NOT DELETE ⚠️
The following production resources contain live data and MUST NEVER be deleted.
Production EBS Volume
Volume ID: vol-0aba5b85a1582b2c0
- Size: 8 GB (gp3)
- Mount Point: /mnt/basny-data (on EC2 instance)
- Device: /dev/xvdf
- Contains:
  - Production SQLite database (/mnt/basny-data/app/database/database.db)
  - Production config with secrets (/mnt/basny-data/app/config/config.production.json)
  - Let's Encrypt SSL certificates (/mnt/basny-data/nginx/certs/)
  - Nginx logs (/mnt/basny-data/nginx/logs/)
Protection Measures:
- CDK deletion policy set to RETAIN
- Protected tag: DoNotDelete=true
- UserData script checks for existing data before formatting
- Stack termination protection enabled
Production Elastic IP
Allocation ID: eipalloc-01f29c26363e0465a
- IP Address: 98.91.62.199
- DNS: bap.basny.org points to this IP
- Purpose: Stable public IP address for production application
Protection Measures:
- CDK uses existing EIP (does not create new one)
- CDK deletion policy set to RETAIN
- Protected tag: DoNotDelete=true
- Stack termination protection enabled
Resource Protection Strategy
Five Layers of Protection
- Visual Identification: Resources tagged with DoNotDelete=true and descriptive names
- CDK Deletion Policies: RETAIN policies prevent CloudFormation from deleting resources
- Stack Termination Protection: Prevents cdk destroy from running without explicit disable
- UserData Safety Checks: Prevents accidental formatting of volumes with existing data
- Documentation: This guide and warnings in project files
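The UserData safety check can be pictured as a small decision function. The sketch below is an assumption about how such a check might be structured, not the contents of scripts/ec2-userdata.sh; the function name decide_volume_action and its two inputs are hypothetical. It treats a device that has not appeared yet as "wait" rather than "blank", because a filesystem probe against a missing device returns nothing, just as it does for an empty one.

```shell
# Hypothetical sketch: decide what to do with the data device.
#   $1: "yes" if the block device exists (e.g. `[ -b /dev/xvdf ]` succeeded), else "no"
#   $2: filesystem type reported for the device (e.g. from
#       `blkid -o value -s TYPE /dev/xvdf`); empty when none was detected
decide_volume_action() {
  device_present="$1"
  fs_type="$2"

  if [ "$device_present" != "yes" ]; then
    # Device not attached yet: an empty probe result proves nothing.
    echo "wait"
  elif [ -n "$fs_type" ]; then
    # A filesystem already exists: mount it, never format it.
    echo "mount"
  else
    # Device is present and carries no filesystem signature.
    echo "format"
  fi
}

decide_volume_action yes ext4   # prints "mount"
decide_volume_action no ""      # prints "wait"
```

Only the "format" outcome should ever lead to a mkfs call; every other outcome leaves the device untouched.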
⚠️ Data Loss History
On October 6, 2025, the production EBS volume was accidentally formatted due to a race condition in the UserData script. This resulted in:
- Complete loss of production database
- Loss of SSL certificates
- Loss of production config
Lesson Learned: Always test infrastructure changes with detached volumes first.
SSM Parameter Store
Critical resource IDs are stored in AWS Systems Manager Parameter Store. The CDK stack reads these parameters at synth time to reference the production resources.
Parameter Names
- /basny/production/data-volume-id → vol-0aba5b85a1582b2c0
- /basny/production/elastic-ip-allocation-id → eipalloc-01f29c26363e0465a
- /basny/production/elastic-ip-address → 98.91.62.199
Why SSM Parameter Store?
- Single source of truth for resource IDs
- Human-readable parameter names instead of hardcoded IDs in code
- Can update resource IDs without modifying code (if resources need to be recreated)
- Version history tracked by SSM
- Parameters are tagged with Protected=true
Working with Parameters
# View all production parameters
aws --profile basny ssm get-parameters \
--names /basny/production/data-volume-id \
/basny/production/elastic-ip-allocation-id \
/basny/production/elastic-ip-address
# Update a parameter (ONLY if resource is recreated)
aws --profile basny ssm put-parameter \
--name /basny/production/data-volume-id \
--value vol-NEW_VOLUME_ID \
--overwrite
⚠️ IMPORTANT: Only update these parameters if you've intentionally recreated the resources. Never change them to point to a different resource unless you're absolutely sure.
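Because a mistyped value here redirects the whole stack, it may be worth checking the shape of an ID before writing it. The helpers below are a hypothetical sketch (is_volume_id and is_allocation_id are not part of this repository). They assume the ID formats AWS has issued so far: a fixed prefix followed by 8 or 17 lowercase hexadecimal characters.

```shell
# Hypothetical helpers: check that a value has the shape of an EBS volume ID
# or an Elastic IP allocation ID before it is written to Parameter Store.
is_volume_id() {
  printf '%s\n' "$1" | grep -Eq '^vol-([0-9a-f]{8}|[0-9a-f]{17})$'
}

is_allocation_id() {
  printf '%s\n' "$1" | grep -Eq '^eipalloc-([0-9a-f]{8}|[0-9a-f]{17})$'
}

# The placeholder from the example above is rejected.
if is_volume_id "vol-NEW_VOLUME_ID"; then
  echo "looks like a volume ID"
else
  echo "placeholder or malformed ID, refusing to update"
fi
```

A format check only catches typos and leftover placeholders. It cannot tell whether the ID refers to the intended resource, so the warning above still applies.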
Architecture Overview
AWS Infrastructure
├── VPC (10.0.0.0/16)
│ └── Public Subnet (10.0.1.0/24)
│ └── EC2 Instance (t3.micro)
│ ├── Root Volume (20GB gp3) - Replaceable
│ └── Data Volume (8GB gp3) - CRITICAL - vol-0aba5b85a1582b2c0
├── Elastic IP (98.91.62.199) - CRITICAL - eipalloc-01f29c26363e0465a
├── Security Group
│ ├── Port 22 (SSH) - 0.0.0.0/0
│ ├── Port 80 (HTTP) - 0.0.0.0/0
│ └── Port 443 (HTTPS) - 0.0.0.0/0
├── IAM Role (EC2 instance permissions)
│ ├── SSM access (for key retrieval)
│ ├── CloudWatch logs
│ └── S3 access (for backups)
└── CloudWatch Log Groups
├── /basny/application
└── /basny/nginx
Initial Deployment
Prerequisites
- AWS CLI configured with BASNY profile:
  aws configure --profile basny
- AWS CDK CLI installed globally:
  npm install -g aws-cdk
- Infrastructure dependencies installed:
  cd infrastructure
  npm install
Deployment Steps
1. Bootstrap CDK (first time only)
cd infrastructure
npm run cdk bootstrap -- --profile basny
This creates the CDK toolkit stack in your AWS account (S3 bucket, ECR repo, IAM roles).
2. Build the stack
npm run build
3. Preview changes
npm run cdk diff -- --profile basny
Review the changes that will be made.
4. Deploy the stack
npm run cdk deploy -- --profile basny
The deployment creates:
- VPC with public subnet
- EC2 t3.micro instance
- 20GB root volume (gp3)
- 8GB data volume (gp3, persistent)
- Elastic IP for static address
- Security groups (ports 22, 80, 443)
- IAM role with necessary permissions
- CloudWatch log groups
- SSH key pair (stored in SSM Parameter Store)
5. Note the outputs
After deployment, CDK outputs:
- InstanceId: EC2 instance identifier
- PublicIP: Elastic IP address (98.91.62.199)
- SSHCommand: Command to SSH into instance
- KeyPairId: ID of the SSH key pair
6. Retrieve SSH private key
cd infrastructure
./scripts/get-private-key.sh
This retrieves the private key from AWS Systems Manager and saves it to ~/.ssh/basny-ec2-keypair.pem with correct permissions (400).
7. Update DNS
Point bap.basny.org A record to the Elastic IP address (98.91.62.199).
8. Configure SSH
Add to ~/.ssh/config:
Host BAP
HostName 98.91.62.199
User ec2-user
IdentityFile ~/.ssh/basny-ec2-keypair.pem
StrictHostKeyChecking no
Now you can connect with: ssh BAP
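For readers curious about what step 6 involves, here is a rough sketch of a key-retrieval function. It is not the contents of scripts/get-private-key.sh. It assumes the key sits in Parameter Store under /ec2/keypair/<KeyPairId>, the location EC2 uses for key pairs created through CloudFormation, and the function names are hypothetical.

```shell
# Hypothetical sketch: build the Parameter Store name for a key pair ID.
keypair_parameter_name() {
  printf '/ec2/keypair/%s\n' "$1"
}

# Hypothetical sketch: download the private key for the given KeyPairId.
# The body runs in a subshell so the umask change does not leak to the caller.
fetch_private_key() (
  key_pair_id="$1"                                # the KeyPairId stack output
  key_file="$HOME/.ssh/basny-ec2-keypair.pem"

  umask 077                                       # new file gets no group/other access
  rm -f "$key_file"                               # an old read-only copy would block the write
  aws --profile basny ssm get-parameter \
    --name "$(keypair_parameter_name "$key_pair_id")" \
    --with-decryption \
    --query 'Parameter.Value' \
    --output text > "$key_file" || exit 1
  chmod 400 "$key_file"
)

# Usage (needs AWS credentials, so it is not run here):
#   fetch_private_key <KeyPairId from the stack outputs>
```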
Redeploying Infrastructure
⚠️ IMPORTANT: Your Elastic IP and data volume will be preserved!
The Elastic IP and data volume are referenced (not created) by the CDK stack, so they persist even when the instance is replaced.
Steps to Redeploy
1. Create snapshot before changes
aws --profile basny ec2 create-snapshot \
--volume-id vol-0aba5b85a1582b2c0 \
--description "Pre-deployment backup $(date +%Y%m%d-%H%M%S)" \
--tag-specifications 'ResourceType=snapshot,Tags=[{Key=Name,Value=BASNY-PreDeployment-Backup},{Key=DoNotDelete,Value=true}]'
2. Build the CDK stack
cd infrastructure
npm run build
3. Preview changes
npm run cdk diff -- --profile basny
Review what will change:
- EC2 instance may be REPLACED (if configuration changed)
- Elastic IP will remain UNCHANGED
- Data volume will remain UNCHANGED
4. Deploy the updated stack
npm run cdk deploy -- --profile basny
Note: If the instance is replaced, this will terminate your current instance and create a new one. The Elastic IP automatically attaches to the new instance.
5. Verify deployment
# Check instance is running
aws --profile basny ec2 describe-instances \
--filters "Name=tag:Name,Values=BASNY-Production" \
--query 'Reservations[0].Instances[0].State.Name'
# SSH to verify
ssh BAP
# Check containers are running
sudo docker ps
Recovery Procedures
If Database is Lost
1. Locate most recent backup
# SSH to server
ssh BAP
# Check for local backups
ls -lah /tmp/*.sqlite /tmp/*.db
# Check for manual backups
ls -lah ~/backups/*.sqlite ~/backups/*.db
2. Restore database
# Stop application
cd /opt/basny
sudo docker-compose -f docker-compose.prod.yml down
# Copy backup to data volume
sudo cp /path/to/backup.sqlite /mnt/basny-data/app/database/database.db
# Fix permissions (CRITICAL - must be owned by nodejs user UID 1001)
sudo chown 1001:65533 /mnt/basny-data/app/database/database.db
sudo chmod 644 /mnt/basny-data/app/database/database.db
# Restart application
sudo docker-compose -f docker-compose.prod.yml up -d
3. Verify data integrity
sqlite3 /mnt/basny-data/app/database/database.db "PRAGMA integrity_check;"
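SQLite prints the single line "ok" when the integrity check passes and one or more descriptive lines otherwise. A small wrapper can turn that into an exit status for use in scripts. The function names below are hypothetical and the messages are only illustrative.

```shell
# Hypothetical helper: succeed only when the integrity check output is "ok".
integrity_passed() {
  [ "$1" = "ok" ]
}

# Hypothetical helper: print a verdict and return non-zero on damage.
report_integrity() {
  if integrity_passed "$1"; then
    echo "database OK"
  else
    echo "database DAMAGED: keep the application stopped and try an older backup"
    return 1
  fi
}

# Usage on the server (not run here):
#   report_integrity "$(sqlite3 /mnt/basny-data/app/database/database.db 'PRAGMA integrity_check;')"
```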
If Config is Lost
1. Check for local backup
- Look in /tmp/config.production.json (developer may have saved copy)
- Check password manager for credentials
2. Restore config
# Copy config to data volume
sudo cp /tmp/config.production.json /mnt/basny-data/app/config/config.production.json
# Fix permissions (CRITICAL - must be 600 owner-only)
sudo chown 1001:65533 /mnt/basny-data/app/config/config.production.json
sudo chmod 600 /mnt/basny-data/app/config/config.production.json
# Restart application
cd /opt/basny
sudo docker-compose -f docker-compose.prod.yml restart
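After restoring the config, the 600 requirement can be verified rather than assumed. The helper below is a hypothetical sketch that inspects the mode string printed by ls. It runs against a temporary file here so it can be tried without touching production.

```shell
# Hypothetical helper: succeed only if a regular file is readable and writable
# by its owner and inaccessible to everyone else (mode 600).
is_owner_only() {
  # `ls -ld` prints the mode string first, e.g. "-rw-------"
  mode=$(ls -ld "$1" | cut -c1-10)
  [ "$mode" = "-rw-------" ]
}

# Demonstration on a throwaway file.
demo_file=$(mktemp)
chmod 600 "$demo_file"
if is_owner_only "$demo_file"; then echo "600: OK"; fi
chmod 644 "$demo_file"
if ! is_owner_only "$demo_file"; then echo "644: too permissive"; fi
rm -f "$demo_file"
```

On the server the same check would be pointed at /mnt/basny-data/app/config/config.production.json, run with sudo if the file is not readable by the current user.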
If SSL Certificates are Lost
1. Create temporary HTTP-only nginx config
# Temporarily disable HTTPS in nginx config
# Edit nginx/conf.d/default.conf to comment out SSL server block
sudo docker-compose -f docker-compose.prod.yml restart nginx
2. Verify DNS is pointing to current IP
dig bap.basny.org +short
# Should return: 98.91.62.199
3. Re-issue SSL certificates
Wait for DNS propagation (usually 5-10 minutes), then:
cd /opt/basny
sudo ./scripts/init-letsencrypt.sh
This will:
- Request new certificates from Let's Encrypt
- Store them in /mnt/basny-data/nginx/certs/
- Reload nginx with SSL enabled
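Requesting certificates while DNS still points elsewhere wastes Let's Encrypt attempts, so the DNS check in step 2 can be turned into a gate. The helper below is a hypothetical sketch. It takes the text printed by dig +short (one address per line) and the expected address, and matches whole lines only.

```shell
# Hypothetical helper: succeed if the expected address appears among the
# records returned by `dig +short <name>`.
dns_points_to() {
  resolved="$1"    # output of dig +short
  expected="$2"    # expected IPv4 address
  printf '%s\n' "$resolved" | grep -Fxq "$expected"
}

# Usage on the server (not run here):
#   dns_points_to "$(dig bap.basny.org +short)" 98.91.62.199 \
#     && sudo ./scripts/init-letsencrypt.sh
```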
If Entire Volume is Lost
Prevention (ALWAYS do this before infrastructure changes)
# Create volume snapshot
aws --profile basny ec2 create-snapshot \
--volume-id vol-0aba5b85a1582b2c0 \
--description "Pre-deployment backup $(date +%Y%m%d-%H%M%S)" \
--tag-specifications 'ResourceType=snapshot,Tags=[{Key=Name,Value=BASNY-PreDeployment-Backup},{Key=DoNotDelete,Value=true}]'
Recovery (if snapshot exists)
- Create new volume from snapshot:
  aws --profile basny ec2 create-volume \
    --snapshot-id snap-XXXXXXXXX \
    --availability-zone us-east-1a \
    --volume-type gp3 \
    --tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=BASNY-Data-Restored},{Key=DoNotDelete,Value=true}]'
- Update SSM parameter:
  aws --profile basny ssm put-parameter \
    --name /basny/production/data-volume-id \
    --value vol-NEW_VOLUME_ID \
    --overwrite
- Redeploy CDK stack:
  cd infrastructure
  npm run cdk deploy -- --profile basny
- Verify data integrity:
  ssh BAP
  ls -la /mnt/basny-data/app/
  sqlite3 /mnt/basny-data/app/database/database.db "PRAGMA integrity_check;"
Backup Strategy
Recommended Backup Schedule
- Daily: Automated database backups to S3 (not yet implemented)
- Weekly: Full EBS volume snapshots
- Pre-deployment: Manual snapshot before any infrastructure changes
Creating Manual Backup
# Database backup
ssh BAP "sqlite3 /mnt/basny-data/app/database/database.db '.backup /tmp/backup-$(date +%Y%m%d-%H%M%S).db'"
# Copy to local machine
scp BAP:/tmp/backup-*.db ~/backups/
# EBS snapshot via AWS CLI
aws --profile basny ec2 create-snapshot \
--volume-id vol-0aba5b85a1582b2c0 \
--description "Manual backup $(date +%Y%m%d-%H%M%S)" \
--tag-specifications 'ResourceType=snapshot,Tags=[{Key=Name,Value=BASNY-Manual-Backup},{Key=DoNotDelete,Value=true}]'
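create-snapshot returns as soon as the snapshot is registered, while it is still pending. If a deployment should start only once the backup is usable, the call can be paired with the CLI's snapshot-completed waiter. The functions below are a hypothetical sketch along those lines; snapshot_tags and snapshot_and_wait are illustrative names.

```shell
# Hypothetical helper: build the tag specification used for backup snapshots.
snapshot_tags() {
  printf 'ResourceType=snapshot,Tags=[{Key=Name,Value=%s},{Key=DoNotDelete,Value=true}]\n' "$1"
}

# Hypothetical sketch: create a snapshot, wait until it completes, print its ID.
snapshot_and_wait() {
  volume_id="$1"
  name_tag="$2"
  snapshot_id=$(aws --profile basny ec2 create-snapshot \
    --volume-id "$volume_id" \
    --description "Manual backup $(date +%Y%m%d-%H%M%S)" \
    --tag-specifications "$(snapshot_tags "$name_tag")" \
    --query 'SnapshotId' --output text) || return 1
  aws --profile basny ec2 wait snapshot-completed \
    --snapshot-ids "$snapshot_id" || return 1
  echo "$snapshot_id"
}

# Usage (needs AWS credentials, so it is not run here):
#   snapshot_and_wait vol-0aba5b85a1582b2c0 BASNY-Manual-Backup
```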
Restoring from Snapshot
# List available snapshots
aws --profile basny ec2 describe-snapshots \
--owner-ids self \
--filters "Name=tag:Name,Values=BASNY-*" \
--query 'Snapshots[*].[SnapshotId,StartTime,Description]' \
--output table
# Create volume from snapshot
aws --profile basny ec2 create-volume \
--snapshot-id snap-XXXXXXXXX \
--availability-zone us-east-1a \
--volume-type gp3 \
--size 8
Testing Infrastructure Changes Safely
NEVER test infrastructure changes with the production volume attached!
Safe Testing Procedure
- Create test volume:
  aws --profile basny ec2 create-volume \
    --availability-zone us-east-1a \
    --size 8 \
    --volume-type gp3 \
    --tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=BASNY-Test}]'
- Store the test volume ID in a separate test parameter (--type is required when the parameter does not exist yet):
  aws --profile basny ssm put-parameter \
    --name /basny/test/data-volume-id \
    --type String \
    --value vol-TEST_VOLUME_ID \
    --overwrite
- Deploy to separate stack:
  cd infrastructure
  # Modify stack name in bin/infrastructure.ts to use test name
  npm run cdk deploy -- --profile basny
- Verify behavior: Ensure UserData script works correctly
- Delete test resources:
  npm run cdk destroy -- --profile basny
  aws --profile basny ec2 delete-volume --volume-id vol-TEST_VOLUME_ID
- Deploy to production: Only after thorough testing
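Since the cleanup step deletes a volume by ID, a guard that refuses the production ID adds a cheap safety net to any test script. The function below is a hypothetical sketch and not part of the repository. The production ID is the one listed in this guide.

```shell
# Production volume ID as listed in this guide.
PRODUCTION_VOLUME_ID="vol-0aba5b85a1582b2c0"

# Hypothetical guard: return non-zero when handed the production volume.
assert_not_production() {
  if [ "$1" = "$PRODUCTION_VOLUME_ID" ]; then
    echo "refusing to use production volume $1 in a test" >&2
    return 1
  fi
}

# Usage (not run here):
#   assert_not_production "$TEST_VOLUME_ID" \
#     && aws --profile basny ec2 delete-volume --volume-id "$TEST_VOLUME_ID"
```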
Pre-Deployment Checklist
Before running ANY cdk deploy or infrastructure changes:
- Create snapshot of production EBS volume
- Verify production volume is NOT attached to test instance
- Review UserData script for safety checks
- Verify RETAIN deletion policies are set
- Confirm stack termination protection is enabled
- Have recent database backup available locally
- Test changes on separate stack first
- Review cdk diff output carefully
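Part of the cdk diff review can be automated. The sketch below assumes the usual cdk diff text format, in which resources leaving the stack are listed on lines starting with "[-]". The function name has_removals is hypothetical, and the sample lines are illustrative rather than captured output. It does not replace reading the diff, since an instance replacement is expected during redeploys and is not flagged.

```shell
# Hypothetical helper: read `cdk diff` output on stdin and succeed if any
# resource is marked for removal from the stack.
has_removals() {
  grep -Eq '^[[:space:]]*\[-\]'
}

# Illustrative sample, not real output from this stack.
sample='[~] AWS::EC2::Instance Instance InstanceC1063A87 replace
[-] AWS::EC2::Volume DataVolume DataVolume5F3A1B2C destroy'

if printf '%s\n' "$sample" | has_removals; then
  echo "STOP: the diff removes at least one resource"
fi
```

In practice the function would be fed the saved output of the preview step, for example by piping npm run cdk diff -- --profile basny into it.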
Cost Management
Current Monthly Costs (Approximate)
- EC2 t3.micro: ~$8/month (730 hours)
- EBS storage (28GB total): ~$2.24/month
- Root volume: 20GB gp3 = $1.60
- Data volume: 8GB gp3 = $0.64
- Snapshots: Variable (~$0.50 per snapshot/month)
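- Elastic IP: ~$3.65/month (since February 2024, AWS bills every public IPv4 address at $0.005/hour, whether or not it is attached to a running instance)
- Data transfer: Variable (first 100GB free)
Total: ~$14-16/month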
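The storage figures follow from simple multiplication. The snippet below reproduces them using $0.08 per GB-month, the gp3 rate implied by the per-volume amounts listed above. The function name is hypothetical, and the rate should be checked against current pricing for the region.

```shell
# gp3 storage cost per month: size in GB times price per GB-month.
gp3_monthly_cost() {
  awk -v gb="$1" -v rate="0.08" 'BEGIN { printf "%.2f\n", gb * rate }'
}

gp3_monthly_cost 20   # root volume, prints 1.60
gp3_monthly_cost 8    # data volume, prints 0.64
gp3_monthly_cost 28   # both volumes, prints 2.24
```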
Cost Optimization Tips
- Stop instance during off-hours (if acceptable):
  aws --profile basny ec2 stop-instances --instance-ids i-XXXXXXXXX
  Note: The Elastic IP is still billed while the instance is stopped
- Delete old snapshots:
  # List old snapshots
  aws --profile basny ec2 describe-snapshots --owner-ids self
  # Delete specific snapshot
  aws --profile basny ec2 delete-snapshot --snapshot-id snap-XXXXXXXXX
- Monitor CloudWatch metrics to ensure right-sizing
Security Considerations
Network Security
- Security Group: Restricts inbound traffic to ports 22, 80, 443
- SSH: Key-based authentication only (no passwords)
- Consider: Restricting SSH to specific IP addresses
IAM Permissions
The EC2 instance has an IAM role with:
- SSM Parameter Store read access (for SSH key retrieval)
- CloudWatch logs write access
- S3 access for backups (when implemented)
Principle of least privilege: Role only has necessary permissions
Secrets Management
- Current: Production config stored in /mnt/basny-data/app/config/config.production.json with 600 permissions
- Future: Consider migrating to AWS Secrets Manager or Parameter Store (Issue #80)
Updates and Patching
- OS updates: UserData script enables automatic security updates
- Docker images: Rebuild regularly to get latest base image updates
- Dependencies: Dependabot monitors npm packages (enabled Issue #83)
Additional Resources
- Production Deployment - Application deployment procedures
- Monitoring & Logs - Observability and debugging
- Troubleshooting - Common issues and solutions
- Backup & Recovery - Detailed backup procedures
Emergency Contacts
- Infrastructure Issues: Check CloudWatch alarms and EC2 instance health
- DNS Management: Contact domain administrator
- AWS Support: File support case if needed (requires support plan)
Additional Notes
- The UserData script (scripts/ec2-userdata.sh) will NOT format a volume if it detects existing data
- The initialization flag /var/lib/cloud/basny-initialized prevents re-initialization on instance reboot
- All Docker volumes are mounted from the persistent EBS volume, not the root volume
- Root volume (/dev/xvda) can be safely replaced - it contains no persistent data
- SSH key pair is automatically created by CDK and stored in SSM Parameter Store
- Private key is retrievable via infrastructure/scripts/get-private-key.sh