Week 8 - 07.04.2025 - 13.04.2025 - Campus-Castolo/m300 GitHub Wiki

Week 8 - 07.04.2025 - 13.04.2025

Week 8 - 07.04.2025 - 13.04.2025 - Task list

| Task | Description | Notes | Status | Start Date | Completion Date | Hours Needed |
| --- | --- | --- | --- | --- | --- | --- |
| Finalize DRY refactoring in Terraform | Complete unfinished DRY optimization from Week 7 | Use `locals`, `variables`, avoid duplication | ✅ | 08.04.2025 | 09.04.2025 | 2.5 Hrs |
| Optimize ECS setup | Enhance ECS reliability, fine-tune task settings | More stable deployments using health checks, Fargate config | ✅ | 09.04.2025 | 10.04.2025 | 2 Hrs |
| Optimize RDS setup | Tune DB instance performance & security | Backup retention, storage autoscaling, correct subnet group | ✅ | 10.04.2025 | 11.04.2025 | 1.5 Hrs |
| Implement IAM policy for EC2-NIC mgmt | Create IAM policy for detaching ENIs for ECS ↔ RDS connectivity | Applied to user `rayan` | ✅ | 11.04.2025 | 11.04.2025 | 1 Hr |
| Lambda-based automated RDS snapshot backup | Create and schedule a Lambda function that creates timestamped RDS snapshots | Full integration with CloudWatch and IAM | ✅ | 11.04.2025 | 13.04.2025 | 3 Hrs |

Daily Log 08.04.2025

Daily Log 08.04.2025 - Activity

| Task | Description | Notes | Status | Start Date | Completion Date | Hours Needed |
| --- | --- | --- | --- | --- | --- | --- |
| Terraform DRY Refactor Completion | Finished simplifying the Terraform structure | Used `locals`, cleaner naming | ✅ | 08.04.2025 | 09.04.2025 | 2.5 Hrs |

Daily Log 08.04.2025 - Summary

Today I finalized the DRY optimization work started in Week 7. I introduced `locals` for repeated values (such as AZ names, tag sets, and CIDR blocks), grouped variables more clearly, and established a consistent naming convention (prefixes like `m300_`). This makes the configuration easier to maintain and reduces the risk of errors in future infrastructure changes.

Daily Log 10.04.2025

Daily Log 10.04.2025 - Activity

| Task | Description | Notes | Status | Start Date | Completion Date | Hours Needed |
| --- | --- | --- | --- | --- | --- | --- |
| ECS & RDS Optimization | Improved configuration and fault tolerance | Enabled health checks & subnet fix | ✅ | 09.04.2025 | 11.04.2025 | 3.5 Hrs |

Daily Log 10.04.2025 - Summary

ECS:
I refined the ECS task definition for better fault tolerance and performance:

Enabled detailed health checks and integrated them with the ALB.
Adjusted the task's CPU and memory settings for appropriate Fargate sizing.
Enabled container logging and sent the logs to CloudWatch.

RDS:
I reviewed and improved the RDS setup:

Enabled automated backups with a backup retention period.
Enabled storage autoscaling on top of the allocated storage.
Configured the DB subnet group so that it is compatible with multi-AZ failover.

Daily Log 11.04.2025

Daily Log 11.04.2025 - Activity

| Task | Description | Notes | Status | Start Date | Completion Date | Hours Needed |
| --- | --- | --- | --- | --- | --- | --- |
| EC2-NIC IAM Policy | Created IAM policy for managing EC2 ENIs | Applied to IAM user `rayan` | ✅ | 11.04.2025 | 11.04.2025 | 1 Hr |
| Lambda RDS Backup - Setup + Role + Events | Implemented full Lambda-based automated backup for RDS | CloudWatch integration & timestamped tagging | ✅ | 11.04.2025 | 13.04.2025 | 3 Hrs |

Daily Log 11.04.2025 - Summary

IAM Policy (`iam-ec2-nic-policy.tf`)

This file defines an IAM policy that allows describing, detaching, and deleting EC2 network interfaces (ENIs).
It is needed when the network interfaces that ECS services use for RDS connectivity have to be torn down or refreshed.

The policy was attached to my user (rayan) like so:

# Attach the ENI management policy to the existing IAM user "rayan".
resource "aws_iam_user_policy_attachment" "attach_policy_to_user" {
  user       = "rayan"
  policy_arn = aws_iam_policy.ec2_network_interface_management.arn
}
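
The policy document itself is not reproduced in this log. As a rough sketch only, a policy matching the description (describe, detach, delete ENIs) could look like the structure built below. The three action names and the `"*"` resource are my assumptions; the real `iam-ec2-nic-policy.tf` may be scoped differently. It is written in Python here simply to keep the examples in this log in one language.

```python
import json

# Assumption: these three EC2 actions correspond to "describing, detaching,
# and deleting" ENIs. The actual policy file may list other actions.
ENI_ACTIONS = [
    "ec2:DescribeNetworkInterfaces",
    "ec2:DetachNetworkInterface",
    "ec2:DeleteNetworkInterface",
]


def build_eni_policy(resource: str = "*") -> dict:
    """Return an IAM policy document that allows the ENI actions above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ManageNetworkInterfaces",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": ENI_ACTIONS,
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Print the document as JSON, the form IAM expects.
    print(json.dumps(build_eni_policy(), indent=2))
```

In Terraform the same structure would typically be passed to the `policy` argument of `aws_iam_policy`, for example through `jsonencode`.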

Lambda RDS Backup Functionality (`lambda-rds-backup.tf` + Python script)

This component automates daily snapshots of the RDS database using AWS Lambda and CloudWatch Events (now Amazon EventBridge):

IAM Role: Grants the Lambda function permission to create RDS snapshots and write logs to CloudWatch.
Lambda Function: A Python script that creates a timestamped RDS snapshot on each run.
CloudWatch Rule: Triggers the Lambda function every day at 02:00 UTC using a cron expression.
Permissions: A `lambda:InvokeFunction` permission allows EventBridge to invoke the function.
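
For reference, the daily 02:00 UTC schedule can be expressed as a six-field EventBridge cron expression. The helper below is a small illustration I added; `daily_cron` and `SCHEDULE` are hypothetical names and not part of the Terraform code.

```python
def daily_cron(hour: int, minute: int = 0) -> str:
    """Build an EventBridge cron expression that fires once a day (UTC)."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    # Field order: minutes hours day-of-month month day-of-week year.
    # Day-of-month and day-of-week cannot both be set, so one of them is "?".
    return f"cron({minute} {hour} * * ? *)"


# Schedule used in this log: every day at 02:00 UTC.
SCHEDULE = daily_cron(2)

if __name__ == "__main__":
    print(SCHEDULE)
```

The resulting string is the kind of value that goes into the `schedule_expression` argument of an `aws_cloudwatch_event_rule` resource.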

Python logic summary:

Reads the RDS instance identifier from an environment variable.
Creates a snapshot named `<instance-id>-snapshot-YYYYMMDD-HHMMSS`.
Adds metadata tags such as `Created`, `Owner`, and `Project`.
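
The three steps above can be sketched roughly as follows. This is not the deployed script, only a minimal version under some assumptions: the environment variable name `DB_INSTANCE_ID`, the tag values `rayan` and `m300`, and the optional `rds_client` parameter (added so the logic can be tried without AWS access) are all placeholders of mine.

```python
import os
from datetime import datetime, timezone


def snapshot_name(instance_id: str, now: datetime) -> str:
    """Return '<instance-id>-snapshot-YYYYMMDD-HHMMSS' for the given time."""
    return f"{instance_id}-snapshot-{now.strftime('%Y%m%d-%H%M%S')}"


def snapshot_tags(now: datetime, owner: str, project: str) -> list:
    """Build the Created / Owner / Project tags in the format RDS expects."""
    return [
        {"Key": "Created", "Value": now.isoformat()},
        {"Key": "Owner", "Value": owner},
        {"Key": "Project", "Value": project},
    ]


def lambda_handler(event, context, rds_client=None):
    # Assumption: the instance identifier is provided as DB_INSTANCE_ID.
    instance_id = os.environ["DB_INSTANCE_ID"]
    now = datetime.now(timezone.utc)

    if rds_client is None:
        # boto3 is provided by the Lambda Python runtime.
        import boto3
        rds_client = boto3.client("rds")

    name = snapshot_name(instance_id, now)
    rds_client.create_db_snapshot(
        DBSnapshotIdentifier=name,
        DBInstanceIdentifier=instance_id,
        # Assumption: placeholder owner and project values.
        Tags=snapshot_tags(now, owner="rayan", project="m300"),
    )
    return {"snapshot": name}
```

Keeping the name and tag logic in small pure functions makes the timestamp format easy to check locally before deploying.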

Result:
Snapshots are now created automatically, with consistent timestamped names and tags, and no manual action is needed.

Weekly Summary - 07.04.2025 - 13.04.2025

This week focused on completing unfinished tasks and improving infrastructure reliability and automation. The ECS and RDS components were tuned for production-like behavior with improved logging, health checks, and multi-AZ readiness. I also added a fully automated daily backup for RDS via Lambda, which improves resiliency and removes the need for manual snapshots.

Weekly Summary - 07.04.2025 - 13.04.2025 - Activity

| Task | Description | Status | Completion Date | Hours Spent |
| --- | --- | --- | --- | --- |
| Terraform DRY refactor completion | Finished implementing reusable structure and variable patterns | ✅ | 09.04.2025 | 2.5 Hrs |
| ECS configuration optimization | Improved health checks, autoscaling, and logging | ✅ | 10.04.2025 | 2 Hrs |
| RDS reliability optimization | Enabled backup retention and auto-storage scaling | ✅ | 11.04.2025 | 1.5 Hrs |
| EC2-NIC IAM policy | Created new IAM policy for ENI management | ✅ | 11.04.2025 | 1 Hr |
| Lambda RDS snapshot automation | Set up complete daily backup solution using Lambda + EventBridge | ✅ | 13.04.2025 | 3 Hrs |

Weekly Summary - 07.04.2025 - 13.04.2025 - Weekly Results

✅ Terraform now fully DRY with simplified maintenance.
✅ ECS tuned for higher reliability, lower manual intervention.
✅ RDS more fault-tolerant with snapshot automation in place.
✅ IAM policies created to support more secure and flexible infrastructure.
✅ First production-like Lambda service implemented (automated backup).

Weekly Summary - 07.04.2025 - 13.04.2025 - Problems

Lambda initially failed because of an incorrect handler name. Fixed by making the handler setting match the Python filename and function name.
The CloudWatch cron schedule syntax was tricky. I used the AWS docs to find the correct format.
The Lambda Python code had to be zipped manually before it could be deployed.
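
Two of these problems are related: the handler setting must be `<module name>.<function name>`, and the module must sit at the root of the zip. A small sketch of how the packaging could be scripted instead of done by hand is below. The file name `rds_backup.py` and both helper functions are hypothetical, not the names used in the repo.

```python
import tempfile
import zipfile
from pathlib import Path


def handler_string(source_file: Path, function_name: str) -> str:
    """Return the Lambda handler setting: '<module name>.<function name>'."""
    return f"{source_file.stem}.{function_name}"


def build_lambda_zip(source_file: Path, zip_path: Path) -> Path:
    """Write source_file into a new zip archive at the archive root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # arcname drops the directory part so the module is at the top level.
        archive.write(source_file, arcname=source_file.name)
    return zip_path


if __name__ == "__main__":
    # Demo with a throwaway file; the real script would live in the repo.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "rds_backup.py"  # hypothetical file name
        src.write_text("def lambda_handler(event, context):\n    return 'ok'\n")
        archive_path = build_lambda_zip(src, Path(tmp) / "rds_backup.zip")
        print(archive_path.name, handler_string(src, "lambda_handler"))
```

Terraform's `archive_file` data source would be another way to build the zip during `terraform apply`, which may be simpler than a separate script.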

Weekly Summary - 07.04.2025 - 13.04.2025 - Open Questions

Should I store the Lambda snapshot code in a separate Git repo and automate zip creation?
How can I notify via email or SNS after each successful snapshot?
Would ECR lifecycle policies and automated cleanup of old RDS snapshots be a good next step?