Week 8 ‐ 07.04.2025 ‐ 13.04.2025 - Campus-Castolo/m300 GitHub Wiki
Week 8 - 07.04.2025 ‐ 13.04.2025
Week 8 - 07.04.2025 ‐ 13.04.2025 - Task list
Task | Description | Notes | Status | Start Date | Completion Date | Hours Needed |
---|---|---|---|---|---|---|
Finalize DRY refactoring in Terraform | Complete unfinished DRY optimization from Week 7 | Use locals and variables, avoid duplication | ✅ | 08.04.2025 | 09.04.2025 | 2.5 Hrs |
Optimize ECS setup | Enhance ECS reliability, fine-tune task settings | More stable deployments using health checks, Fargate config | ✅ | 09.04.2025 | 10.04.2025 | 2 Hrs |
Optimize RDS setup | Tune DB instance performance & security | Backup retention, storage autoscaling, correct subnet group | ✅ | 10.04.2025 | 11.04.2025 | 1.5 Hrs |
Implement IAM policy for EC2-NIC mgmt | Create IAM policy for detaching ENIs for ECS ↔ RDS connectivity | Applied to user rayan | ✅ | 11.04.2025 | 11.04.2025 | 1 Hrs |
Lambda-based automated RDS snapshot backup | Create and schedule a Lambda function that creates timestamped RDS snapshots | Full integration with CloudWatch and IAM | ✅ | 11.04.2025 | 13.04.2025 | 3 Hrs |
Daily Log 08.04.2025
Daily Log 08.04.2025 - Activity
Task | Description | Notes | Status | Start Date | Completion Date | Hours Needed |
---|---|---|---|---|---|---|
Terraform DRY Refactor Completion | Finished simplifying the Terraform structure | Used locals, cleaner naming | ✅ | 08.04.2025 | 09.04.2025 | 2.5 Hrs |
Daily Log 08.04.2025 - Summary
Today I finalized the DRY optimization work started in Week 7. I introduced locals for repeated values (like AZ names, tag sets, CIDR blocks), grouped variables more clearly, and created reusable naming conventions (prefixes like m300_). This significantly improves maintainability and reduces the potential for errors in future infrastructure changes.
Daily Log 10.04.2025
Daily Log 10.04.2025 - Activity
Task | Description | Notes | Status | Start Date | Completion Date | Hours Needed |
---|---|---|---|---|---|---|
ECS & RDS Optimization | Improved configuration and fault tolerance | Enabled health checks & subnet fix | ✅ | 09.04.2025 | 11.04.2025 | 3.5 Hrs |
Daily Log 10.04.2025 - Summary
ECS:
I refined the ECS task definition for better fault tolerance and performance:
- Enabled detailed health checks and integrated them with the ALB.
- Updated resource configurations for optimal Fargate sizing (CPU, memory).
- Enabled logging and tied it into CloudWatch.
RDS:
I reviewed and improved the RDS setup:
- Enabled automated backups.
- Tuned allocated storage with autoscaling enabled.
- Configured subnet groups to ensure proper multi-AZ failover compatibility.
Daily Log 11.04.2025
Daily Log 11.04.2025 - Activity
Task | Description | Notes | Status | Start Date | Completion Date | Hours Needed |
---|---|---|---|---|---|---|
EC2-NIC IAM Policy | Created IAM policy for managing EC2 ENIs | Applied to IAM user rayan | ✅ | 11.04.2025 | 11.04.2025 | 1 Hrs |
Lambda RDS Backup - Setup + Role + Events | Implemented full Lambda-based automated backup for RDS | CloudWatch integration & timestamped tagging | ✅ | 11.04.2025 | 13.04.2025 | 3 Hrs |
Daily Log 11.04.2025 - Summary
IAM Policy (iam-ec2-nic-policy.tf)
This file defines an IAM policy that allows describing, detaching, and deleting EC2 network interfaces (ENIs). It is necessary when ECS services need to tear down or refresh network interfaces associated with RDS connectivity. The policy was attached to my user (rayan) like so:
# Attach the ENI management policy to the IAM user "rayan".
resource "aws_iam_user_policy_attachment" "attach_policy_to_user" {
  user       = "rayan"
  policy_arn = aws_iam_policy.ec2_network_interface_management.arn
}
Lambda RDS Backup Functionality (lambda-rds-backup.tf + Python script)
This component automates daily snapshots of the RDS database using AWS Lambda and CloudWatch Events (now Amazon EventBridge):
- IAM Role: Grants the Lambda function permission to snapshot RDS and log to CloudWatch.
- Lambda Function: Python script creates a timestamped RDS snapshot every day at 02:00 UTC.
- CloudWatch Rule: Triggers Lambda execution daily using a cron expression.
- Permissions: Lambda is allowed to be invoked by EventBridge using a lambda:InvokeFunction statement.
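For reference, EventBridge cron expressions use six fields (minutes, hours, day-of-month, month, day-of-week, year), and one of the two day fields has to be a question mark. The small helper below is only a sketch of how the daily 02:00 UTC schedule string can be built. The names daily_cron and DAILY_BACKUP_SCHEDULE are hypothetical and do not come from the repo, where the rule itself is defined in Terraform.

```python
def daily_cron(hour: int, minute: int = 0) -> str:
    """Build an EventBridge cron expression that fires once per day (UTC).

    Field order: minutes hours day-of-month month day-of-week year.
    Day-of-week is set to "?" because day-of-month is already "*".
    """
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    return f"cron({minute} {hour} * * ? *)"


# Hypothetical constant: the 02:00 UTC schedule described above.
DAILY_BACKUP_SCHEDULE = daily_cron(2)
```

The resulting string is what would go into the rule's schedule expression.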
Python logic summary:
- Reads RDS instance identifier from an environment variable.
- Creates a snapshot named instance-id-snapshot-YYYYMMDD-HHMMSS.
- Adds metadata tags like Created, Owner, and Project.
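The Python logic summarized above could look roughly like the sketch below. It is not the exact script from the repo. The environment variable name DB_INSTANCE_IDENTIFIER, the helper names, and the tag values "rayan" and "m300" are assumptions made for illustration. The optional rds_client parameter is also an addition, so the handler can be tried locally without AWS access.

```python
import os
from datetime import datetime, timezone


def build_snapshot_id(instance_id: str, now: datetime) -> str:
    """Return an identifier of the form <instance-id>-snapshot-YYYYMMDD-HHMMSS."""
    return f"{instance_id}-snapshot-{now.strftime('%Y%m%d-%H%M%S')}"


def build_tags(now: datetime, owner: str, project: str) -> list[dict]:
    """Return the Created, Owner, and Project tags in the Key/Value shape RDS expects."""
    return [
        {"Key": "Created", "Value": now.isoformat()},
        {"Key": "Owner", "Value": owner},
        {"Key": "Project", "Value": project},
    ]


def lambda_handler(event, context, rds_client=None):
    """Create one timestamped, tagged snapshot of the configured RDS instance."""
    if rds_client is None:
        # boto3 ships with the AWS Lambda Python runtime.
        import boto3

        rds_client = boto3.client("rds")

    # Assumed variable name; the real script may use a different one.
    instance_id = os.environ["DB_INSTANCE_IDENTIFIER"]
    now = datetime.now(timezone.utc)
    snapshot_id = build_snapshot_id(instance_id, now)

    rds_client.create_db_snapshot(
        DBSnapshotIdentifier=snapshot_id,
        DBInstanceIdentifier=instance_id,
        Tags=build_tags(now, owner="rayan", project="m300"),  # assumed tag values
    )
    return {"snapshot_id": snapshot_id}
```

Keeping the naming and tagging in small pure functions makes the timestamp format easy to check without calling AWS.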
Result:
Snapshots are now created automatically, with clean timestamp formatting and consistent tagging — no manual action needed!
Weekly Summary - 07.04.2025 - 13.04.2025
This week’s focus was on completing unfinished tasks and significantly optimizing infrastructure reliability and automation. The ECS and RDS components were tuned for production-like behavior with improved logging, health checks, and multi-AZ readiness. I also added a fully automated daily backup system for RDS via Lambda, improving resiliency and aligning with best practice.
Weekly Summary - 07.04.2025 - 13.04.2025 - Activity
Task | Description | Status | Completion Date | Hours Spent |
---|---|---|---|---|
Terraform DRY refactor completion | Finished implementing reusable structure and variable patterns | ✅ | 09.04.2025 | 2.5 Hrs |
ECS configuration optimization | Improved health checks, autoscaling, and logging | ✅ | 10.04.2025 | 2 Hrs |
RDS reliability optimization | Enabled backup retention and auto-storage scaling | ✅ | 11.04.2025 | 1.5 Hrs |
EC2-NIC IAM policy | Created new IAM policy for ENI management | ✅ | 11.04.2025 | 1 Hrs |
Lambda RDS snapshot automation | Set up complete daily backup solution using Lambda + EventBridge | ✅ | 13.04.2025 | 3 Hrs |
Weekly Summary - 07.04.2025 - 13.04.2025 - Weekly Results
- ✅ Terraform now fully DRY with simplified maintenance.
- ✅ ECS tuned for higher reliability, lower manual intervention.
- ✅ RDS more fault-tolerant with snapshot automation in place.
- ✅ IAM policies created to support more secure and flexible infrastructure.
- ✅ First production-like Lambda service implemented (automated backup).
Weekly Summary - 07.04.2025 - 13.04.2025 - Problems
- Lambda initially failed due to an incorrect handler name; fixed by aligning the filename and function definition.
- CloudWatch cron schedule syntax was tricky; used the AWS docs for the correct format.
- Needed to zip Lambda Python code manually for correct deployment.
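The handler and zip problems are related: for Python, the Lambda handler setting has the form file name without .py, a dot, then the function name, and the file has to sit at the root of the zip. A minimal sketch of a packaging helper, using only the standard library, is below. The function name package_lambda and the file name rds_backup.py in the usage note are hypothetical and not taken from the repo.

```python
import zipfile
from pathlib import Path


def package_lambda(source_file: Path, zip_path: Path,
                   function_name: str = "lambda_handler") -> str:
    """Zip a single handler file at the archive root.

    Returns the handler string Lambda expects: "<module>.<function>",
    where <module> is the file name without its .py extension.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # arcname drops any parent directories so the module is importable.
        archive.write(source_file, arcname=source_file.name)
    return f"{source_file.stem}.{function_name}"
```

For example, packaging a file called rds_backup.py that defines lambda_handler returns the handler string rds_backup.lambda_handler, which has to match the handler value in the function configuration.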
Weekly Summary - 07.04.2025 - 13.04.2025 - Open Questions
- Should I store the Lambda snapshot code in a separate Git repo and automate zip creation?
- How can I notify via email or SNS after each successful snapshot?
- Would adding lifecycle policies to ECR and RDS snapshot cleanup be a good next step?