Week 8 ‐ 07.04.2025 ‐ 13.04.2025 - Campus-Castolo/m300 GitHub Wiki

Week 8 - 07.04.2025 ‐ 13.04.2025

Week 8 - 07.04.2025 ‐ 13.04.2025 - Task list

| Task | Description | Notes | Status | Start Date | Completion Date | Hours Needed |
|---|---|---|---|---|---|---|
| Finalize DRY refactoring in Terraform | Complete unfinished DRY optimization from Week 7 | Use locals, variables, avoid duplication | | 08.04.2025 | 09.04.2025 | 2.5 Hrs |
| Optimize ECS setup | Enhance ECS reliability, fine-tune task settings | More stable deployments using health checks, Fargate config | | 09.04.2025 | 10.04.2025 | 2 Hrs |
| Optimize RDS setup | Tune DB instance performance & security | Backup retention, storage autoscaling, correct subnet group | | 10.04.2025 | 11.04.2025 | 1.5 Hrs |
| Implement IAM policy for EC2-NIC mgmt | Create IAM policy for detaching ENIs for ECS ↔ RDS connectivity | Applied to user rayan | | 11.04.2025 | 11.04.2025 | 1 Hrs |
| Lambda-based automated RDS snapshot backup | Create and schedule a Lambda function that creates timestamped RDS snapshots | Full integration with CloudWatch and IAM | | 11.04.2025 | 13.04.2025 | 3 Hrs |

Daily Log 08.04.2025

Daily Log 08.04.2025 - Activity

| Task | Description | Notes | Status | Start Date | Completion Date | Hours Needed |
|---|---|---|---|---|---|---|
| Terraform DRY Refactor Completion | Finished simplifying the Terraform structure | Used locals, cleaner naming | | 08.04.2025 | 09.04.2025 | 2.5 Hrs |

Daily Log 08.04.2025 - Summary

Today I finalized the DRY optimization work started in Week 7. I introduced locals for repeated values (such as AZ names, tag sets, and CIDR blocks), grouped variables more clearly, and adopted a consistent naming convention (prefixes like m300_). Each shared value is now defined in one place, which makes the configuration easier to maintain and reduces the risk of errors in future infrastructure changes.


Daily Log 10.04.2025

Daily Log 10.04.2025 - Activity

| Task | Description | Notes | Status | Start Date | Completion Date | Hours Needed |
|---|---|---|---|---|---|---|
| ECS & RDS Optimization | Improved configuration and fault tolerance | Enabled health checks & subnet fix | | 09.04.2025 | 11.04.2025 | 3.5 Hrs |

Daily Log 10.04.2025 - Summary

ECS:
I refined the ECS task definition for better fault tolerance and performance:

  • Enabled detailed health checks and integrated them with the ALB.
  • Updated resource configurations for optimal Fargate sizing (CPU, memory).
  • Enabled logging and tied it into CloudWatch.

RDS:
I reviewed and improved the RDS setup:

  • Enabled automated backups.
  • Tuned allocated storage with autoscaling enabled.
  • Configured subnet groups to ensure proper multi-AZ failover compatibility.

Daily Log 11.04.2025

Daily Log 11.04.2025 - Activity

| Task | Description | Notes | Status | Start Date | Completion Date | Hours Needed |
|---|---|---|---|---|---|---|
| EC2-NIC IAM Policy | Created IAM policy for managing EC2 ENIs | Applied to IAM user rayan | | 11.04.2025 | 11.04.2025 | 1 Hrs |
| Lambda RDS Backup - Setup + Role + Events | Implemented full Lambda-based automated backup for RDS | CloudWatch integration & timestamped tagging | | 11.04.2025 | 13.04.2025 | 3 Hrs |

Daily Log 11.04.2025 - Summary

IAM Policy (iam-ec2-nic-policy.tf)

This file defines an IAM policy that allows describing, detaching, and deleting EC2 network interfaces (ENIs).
It’s necessary when ECS services need to tear down or refresh network interfaces associated with RDS connectivity.

The policy was attached to my user (rayan) like so:

# Attach the ENI management policy to the existing IAM user "rayan".
resource "aws_iam_user_policy_attachment" "attach_policy_to_user" {
  user       = "rayan"
  policy_arn = aws_iam_policy.ec2_network_interface_management.arn
}
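
For reference, the three permissions named above (describe, detach, delete) map onto the EC2 IAM actions shown below. This is only a sketch written as a Python dict so the JSON can be generated and inspected; it is not the content of iam-ec2-nic-policy.tf. The Sid and the wildcard resource are assumptions (the wildcard is used because EC2 Describe actions do not support resource-level restrictions).

```python
import json

# Hypothetical policy document. The action names are real EC2 IAM actions;
# the Sid and the "*" resource are assumptions made for this sketch.
ENI_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ManageNetworkInterfaces",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeNetworkInterfaces",
                "ec2:DetachNetworkInterface",
                "ec2:DeleteNetworkInterface",
            ],
            "Resource": "*",
        }
    ],
}


def policy_json(policy=ENI_POLICY):
    """Serialize the policy dict into an IAM policy JSON document."""
    return json.dumps(policy, indent=2)
```

The resulting JSON has the same shape that a Terraform aws_iam_policy resource expects in its policy argument.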

Lambda RDS Backup Functionality (lambda-rds-backup.tf + Python script)

This component automates daily snapshots of the RDS database using AWS Lambda and an EventBridge (formerly CloudWatch Events) schedule:

  • IAM Role: Grants the Lambda function permission to create RDS snapshots and write logs to CloudWatch.
  • Lambda Function: A Python script that creates a timestamped RDS snapshot on each invocation.
  • EventBridge Rule: Triggers the Lambda function every day at 02:00 UTC using a cron expression.
  • Permissions: A lambda:InvokeFunction permission allows EventBridge to invoke the function.
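
The daily 02:00 UTC trigger relies on the six-field cron syntax that EventBridge schedules use. The helper below is a small sketch of how such an expression is assembled; the function name daily_schedule is hypothetical, and the actual expression in lambda-rds-backup.tf may be written differently.

```python
def daily_schedule(hour, minute=0):
    """Build an EventBridge cron expression that fires once a day (UTC)."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    # Field order: minutes hours day-of-month month day-of-week year.
    # Day-of-month and day-of-week cannot both be set, so one of them
    # must be "?"; here day-of-month is "*" and day-of-week is "?".
    return f"cron({minute} {hour} * * ? *)"
```

Calling daily_schedule(2) yields cron(0 2 * * ? *), which could be used as the schedule expression of the rule.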

Python logic summary:

  • Reads RDS instance identifier from an environment variable.
  • Creates a snapshot named: instance-id-snapshot-YYYYMMDD-HHMMSS.
  • Adds metadata tags like Created, Owner, and Project.
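
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the deployed script: the environment variable name DB_INSTANCE_ID, the tag values, and the function names are assumptions. Only the snapshot name format and the tag keys come from the summary above.

```python
import os
from datetime import datetime, timezone


def build_snapshot_id(instance_id, now):
    """Return '<instance-id>-snapshot-YYYYMMDD-HHMMSS' for the given time."""
    return f"{instance_id}-snapshot-{now.strftime('%Y%m%d-%H%M%S')}"


def build_tags(now):
    # Tag keys follow the summary; the values are placeholders (assumptions).
    return [
        {"Key": "Created", "Value": now.strftime("%Y-%m-%dT%H:%M:%SZ")},
        {"Key": "Owner", "Value": "rayan"},
        {"Key": "Project", "Value": "m300"},
    ]


def lambda_handler(event, context):
    # boto3 is available in the Lambda Python runtime; it is imported here
    # so the helpers above can be run locally without it installed.
    import boto3

    instance_id = os.environ["DB_INSTANCE_ID"]  # hypothetical variable name
    now = datetime.now(timezone.utc)
    snapshot_id = build_snapshot_id(instance_id, now)

    # Create the manual snapshot and tag it in the same call.
    boto3.client("rds").create_db_snapshot(
        DBInstanceIdentifier=instance_id,
        DBSnapshotIdentifier=snapshot_id,
        Tags=build_tags(now),
    )
    return {"snapshot": snapshot_id}
```

Keeping the name and tag construction in separate helpers makes the format easy to check without calling AWS.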

Result:
Snapshots are now created automatically, with clean timestamp formatting and consistent tagging — no manual action needed!


Weekly Summary - 07.04.2025 - 13.04.2025

This week’s focus was on completing unfinished tasks and significantly optimizing infrastructure reliability and automation. The ECS and RDS components were tuned for production-like behavior with improved logging, health checks, and multi-AZ readiness. I also added a fully automated daily backup system for RDS via Lambda, improving resiliency and aligning with best practice.


Weekly Summary - 07.04.2025 - 13.04.2025 - Activity

| Task | Description | Status | Completion Date | Hours Spent |
|---|---|---|---|---|
| Terraform DRY refactor completion | Finished implementing reusable structure and variable patterns | | 09.04.2025 | 2.5 Hrs |
| ECS configuration optimization | Improved health checks, autoscaling, and logging | | 10.04.2025 | 2 Hrs |
| RDS reliability optimization | Enabled backup retention and auto-storage scaling | | 11.04.2025 | 1.5 Hrs |
| EC2-NIC IAM policy | Created new IAM policy for ENI management | | 11.04.2025 | 1 Hrs |
| Lambda RDS snapshot automation | Set up complete daily backup solution using Lambda + EventBridge | | 13.04.2025 | 3 Hrs |

Weekly Summary - 07.04.2025 - 13.04.2025 - Weekly Results

  • Terraform now fully DRY with simplified maintenance.
  • ECS tuned for higher reliability, lower manual intervention.
  • RDS more fault-tolerant with snapshot automation in place.
  • IAM policies created to support more secure and flexible infrastructure.
  • ✅ First production-like Lambda service implemented (automated backup).

Weekly Summary - 07.04.2025 - 13.04.2025 - Problems

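  • Lambda initially failed because of an incorrect handler name; fixed by making the handler setting match the Python filename and function name (file.function).
  • The EventBridge cron schedule syntax was tricky; I used the AWS docs to find the correct format (six fields, with ? required in either the day-of-month or the day-of-week field).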
  • Needed to zip Lambda Python code manually for correct deployment.
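
The handler and packaging problems above are related: Lambda resolves the handler as "<module>.<function>" relative to the root of the zip archive. The sketch below shows one way to build the archive and derive the matching handler string. The file name rds_backup.py and both function names are hypothetical, not taken from the project.

```python
import zipfile
from pathlib import Path


def handler_string(source_file, function_name):
    """Return the Lambda handler setting in the form '<module>.<function>'."""
    return f"{Path(source_file).stem}.{function_name}"


def build_lambda_zip(source_file, zip_path):
    """Zip a single-file function with the file at the archive root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # arcname drops any directory prefix so the module sits at the
        # top level of the archive, where Lambda looks for it.
        archive.write(source_file, arcname=Path(source_file).name)
    return zip_path
```

For example, handler_string("rds_backup.py", "lambda_handler") gives rds_backup.lambda_handler, which is the value the function's handler setting would need for that file.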

Weekly Summary - 07.04.2025 - 13.04.2025 - Open Questions

  • Should I store the Lambda snapshot code in a separate Git repo and automate zip creation?
  • How can I notify via email or SNS after each successful snapshot?
  • Would adding ECR lifecycle policies and automated cleanup of old RDS snapshots be a good next step?

last revised on 13.04.2025