Project Idea ‐ EKS ECS - Campus-Castolo/m300 GitHub Wiki

🚀 Project Overview

This project aims to build and deploy a scalable and highly available application infrastructure using AWS services. A custom Docker image, built from a GitHub repository, is pushed to Amazon Elastic Container Registry (ECR) and deployed on either an ECS (Elastic Container Service) or EKS (Elastic Kubernetes Service) cluster.

The backend relies on a primary Amazon RDS instance, which is replicated to a read-only RDS read replica in a different Availability Zone for redundancy and read performance. Monitoring and logging are handled by Amazon CloudWatch, and a scheduled backup job regularly exports automated snapshots of the primary RDS database to an Amazon S3 bucket.

This infrastructure supports high availability, fault tolerance, continuous monitoring, and disaster recovery.

🛠️ Technologies and Tools

| Category | Tool/Service | Purpose |
|---|---|---|
| Version Control | GitHub | Source code repository and Docker image build trigger |
| Containerization | Docker | Containerizing the application |
| Container Registry | Amazon ECR | Storing and versioning Docker images |
| Orchestration | Amazon ECS or EKS | Deploying and managing containers at scale |
| Database (Primary) | Amazon RDS (MySQL/PostgreSQL) | Persistent data storage for the application |
| Database (Replica) | Amazon RDS Read Replica | Read-only copy of the primary DB in a second AZ; offloads reads and can be promoted if the primary fails |
| Monitoring | Amazon CloudWatch | Performance monitoring and logging |
| Backup | Amazon S3 + Lambda | Automated snapshot storage for disaster recovery |
| IaC | Terraform | Infrastructure automation and reproducibility |
| Networking | Amazon VPC, ALB | Secure traffic routing and load balancing |
| Security | IAM, Security Groups, Secrets Manager | Access control, secrets management, and secure communication |

🎯 Goals and Functionality

✅ Automated CI/CD Pipeline
- Build Docker image on GitHub push
- Push to Amazon ECR
- Deploy to ECS/EKS cluster
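The pipeline steps above can be sketched as a small Python helper that derives the ECR image reference and the commands a CI job would run. This is a minimal sketch, assuming a private ECR registry, a hypothetical account ID, region and repository name (`m300-app`), and the Git commit SHA as the image tag; the actual workflow file is not part of this page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EcrTarget:
    """Where images are pushed. All values used below are hypothetical."""

    account_id: str  # 12-digit AWS account ID
    region: str      # e.g. "eu-central-1"
    repository: str  # ECR repository name

    @property
    def registry(self) -> str:
        # Private ECR registry host: <account>.dkr.ecr.<region>.amazonaws.com
        return f"{self.account_id}.dkr.ecr.{self.region}.amazonaws.com"

    def image_uri(self, tag: str) -> str:
        return f"{self.registry}/{self.repository}:{tag}"


def push_commands(target: EcrTarget, commit_sha: str) -> list[str]:
    """Shell commands a CI job would run after checkout, in order."""
    uri = target.image_uri(commit_sha[:12])  # one immutable tag per commit
    return [
        # Log Docker in to ECR with a short-lived token from the AWS CLI
        f"aws ecr get-login-password --region {target.region}"
        f" | docker login --username AWS --password-stdin {target.registry}",
        f"docker build -t {uri} .",
        f"docker push {uri}",
    ]


if __name__ == "__main__":
    target = EcrTarget("123456789012", "eu-central-1", "m300-app")
    for cmd in push_commands(target, "3f2a9c1e7b4d5a6f"):
        print(cmd)
```

Tagging by commit SHA rather than `latest` keeps each deployed image traceable to a commit, and a rollback becomes a matter of pointing the ECS task definition or Kubernetes Deployment at an earlier tag.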
✅ Highly Available Database Layer
- Deploy primary RDS instance in AZ-1
- Deploy a read replica in AZ-2 to offload read traffic and serve as a failover target (by promoting it to primary)
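Because the replica is read-only, the application has to send writes to the primary endpoint and may send reads to the replica endpoint. A minimal sketch of that routing decision, assuming hypothetical hostnames, a 5-second lag threshold, and a naive classification by the first SQL keyword:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

READ_KEYWORDS = {"SELECT", "SHOW", "EXPLAIN"}


@dataclass
class DbRouter:
    primary_host: str  # RDS endpoint of the primary (hypothetical)
    replica_host: str  # RDS endpoint of the read replica (hypothetical)
    max_replica_lag_s: float = 5.0

    def is_read(self, sql: str) -> bool:
        words = sql.split()
        if not words:
            return False
        # Locking reads belong on the primary: their locks only matter
        # where the writes happen.
        return words[0].upper() in READ_KEYWORDS and "FOR UPDATE" not in sql.upper()

    def host_for(self, sql: str, replica_lag_s: Optional[float] = None) -> str:
        # Unknown or excessive lag: serve the read from the primary instead.
        replica_fresh = (
            replica_lag_s is not None and replica_lag_s <= self.max_replica_lag_s
        )
        if self.is_read(sql) and replica_fresh:
            return self.replica_host
        return self.primary_host
```

Statements the router does not recognise (for example `WITH ... SELECT`) go to the primary, which is the safe default; a production setup would more likely use separate connection pools or an ORM's read/write routing.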
✅ Centralized Monitoring
- Monitor cluster and RDS metrics using CloudWatch
- Log ingestion and alerts for performance anomalies
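One concrete alert for this layer is a CloudWatch alarm on the replica's `ReplicaLag` metric (seconds behind the primary, published in the `AWS/RDS` namespace). The sketch below assembles the arguments for boto3's `put_metric_alarm`; the replica identifier, SNS topic ARN, 30-second threshold and five one-minute evaluation periods are assumptions, not values from this project.

```python
from __future__ import annotations


def replica_lag_alarm(replica_id: str, sns_topic_arn: str,
                      threshold_s: float = 30.0) -> dict:
    """Keyword arguments for CloudWatch put_metric_alarm (values hypothetical)."""
    return {
        "AlarmName": f"{replica_id}-replica-lag-high",
        "Namespace": "AWS/RDS",
        "MetricName": "ReplicaLag",  # seconds behind the primary
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": replica_id}],
        "Statistic": "Average",
        "Period": 60,                # one-minute datapoints
        "EvaluationPeriods": 5,      # alarm after five breaching minutes
        "Threshold": threshold_s,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "breaching",  # a silent replica is also a problem
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(replica_id: str, sns_topic_arn: str) -> None:
    import boto3  # imported here so the dict builder runs without AWS access

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**replica_lag_alarm(replica_id, sns_topic_arn))
```

Because the project uses Terraform, the same settings would normally live in an `aws_cloudwatch_metric_alarm` resource; the Python form just makes the parameters explicit.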
✅ Automated Backups
- Take regular snapshots of the primary RDS
- Use Lambda to export snapshots to S3 with timestamped names and tags
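RDS does not place its snapshots in a bucket you own; the supported way to get snapshot data into your own S3 bucket is the snapshot export feature (`StartExportTask`), which writes the data as Apache Parquet files. A minimal Lambda sketch, assuming hypothetical environment variable names (`DB_INSTANCE_ID`, `BACKUP_BUCKET`, `EXPORT_ROLE_ARN`, `EXPORT_KMS_KEY_ID`) and an EventBridge schedule that invokes it:

```python
from __future__ import annotations

import os
from datetime import datetime, timezone
from typing import Optional


def pick_latest_available(snapshots: list[dict]) -> Optional[dict]:
    """Newest snapshot whose Status is 'available', or None."""
    ready = [s for s in snapshots if s.get("Status") == "available"]
    return max(ready, key=lambda s: s["SnapshotCreateTime"], default=None)


def export_task_id(db_instance_id: str, now: datetime) -> str:
    """Timestamped ID that starts with a letter and stays within 60 characters."""
    stamp = now.strftime("%Y%m%d-%H%M%S")  # 15 characters
    return f"exp-{db_instance_id[:40].rstrip('-')}-{stamp}"


def handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtime

    db_id = os.environ["DB_INSTANCE_ID"]  # hypothetical variable name
    rds = boto3.client("rds")
    # First page only; a full version would paginate.
    resp = rds.describe_db_snapshots(
        DBInstanceIdentifier=db_id, SnapshotType="automated"
    )
    snapshot = pick_latest_available(resp["DBSnapshots"])
    if snapshot is None:
        return {"exported": None}

    task_id = export_task_id(db_id, datetime.now(timezone.utc))
    rds.start_export_task(
        ExportTaskIdentifier=task_id,
        SourceArn=snapshot["DBSnapshotArn"],
        S3BucketName=os.environ["BACKUP_BUCKET"],
        S3Prefix=f"rds/{db_id}",
        IamRoleArn=os.environ["EXPORT_ROLE_ARN"],
        KmsKeyId=os.environ["EXPORT_KMS_KEY_ID"],
    )
    return {"exported": snapshot["DBSnapshotIdentifier"], "task": task_id}
```

The exported Parquet files are a data copy; they cannot be restored directly as an RDS instance, so keeping the native snapshots (or copying them with `CopyDBSnapshot`) remains part of the recovery plan.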
✅ Scalable and Secure Infrastructure
- Load balancing using ALB
- Secure access using IAM and environment secrets
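For the secrets part, the application can fetch its database credentials from AWS Secrets Manager at startup instead of baking them into the image. A minimal sketch, assuming a hypothetical `DB_SECRET_ID` environment variable and a JSON secret with `username`, `password`, `host`, `port` and optional `dbname` keys (an assumed layout; adjust it to the actual secret):

```python
from __future__ import annotations

import json
import os
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class DbCredentials:
    username: str
    password: str = field(repr=False)  # keep the password out of logs
    host: str
    port: int
    dbname: Optional[str] = None


def parse_db_secret(secret_string: str) -> DbCredentials:
    """Turn the secret's JSON string (assumed key layout) into credentials."""
    data = json.loads(secret_string)
    return DbCredentials(
        username=data["username"],
        password=data["password"],
        host=data["host"],
        port=int(data["port"]),
        dbname=data.get("dbname"),
    )


def load_db_credentials() -> DbCredentials:
    import boto3  # imported here so parsing can be tested without AWS access

    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=os.environ["DB_SECRET_ID"])
    return parse_db_secret(response["SecretString"])
```

On ECS, the task definition's `secrets` field can inject the same values as environment variables, which avoids the SDK call in application code; the IAM role then only needs read access to that one secret.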

⚠️ Probable Challenges

| Area | Potential Challenge | Suggested Mitigation |
|---|---|---|
| ECR Authentication | Authenticating GitHub Actions to ECR without storing long-lived AWS access keys | Use GitHub's OIDC provider with an IAM role, or run a self-hosted GitHub Actions runner inside AWS |
| RDS Replication Lag | Delay between primary and replica under high write loads | Monitor the replica's `ReplicaLag` metric and set CloudWatch alarms |
| Cluster Scaling | Handling unexpected traffic spikes | Enable auto-scaling for ECS tasks or EKS pods, and for the underlying nodes if they run on EC2 |
| Snapshot Automation | Ensuring consistent, timestamped backups and lifecycle policies for S3 | Use AWS Lambda + EventBridge Scheduler with tags, plus S3 lifecycle rules |
| Cost Management | ECS/EKS, RDS, and S3 can incur significant costs | Use budget alerts and clean up unused resources regularly |
| Terraform State Handling | Managing state files securely and collaboratively | Use a remote backend such as S3 with DynamoDB for state locking |
| Security Compliance | Handling secrets and enforcing least-privilege IAM policies | Use AWS Secrets Manager + strict IAM role definitions |
| Multi-AZ Network Design | Ensuring consistent connectivity and failover between zones | Carefully design subnets, route tables, and ALB configuration |
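Two of the mitigations above, OIDC for ECR authentication and least-privilege IAM, come down to two policy documents on the role that GitHub Actions assumes. A minimal sketch that builds them as Python dicts (ready for `json.dumps`, or as a model for Terraform's `jsonencode`), assuming a hypothetical account ID, region and ECR repository name, that the workflow runs in the `Campus-Castolo/m300` repository named in this wiki's title, and that only the `main` branch may assume the role. It also presumes an IAM OIDC identity provider for `token.actions.githubusercontent.com` already exists in the account.

```python
from __future__ import annotations

import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_trust_policy(account_id: str, repo: str, branch: str = "main") -> dict:
    """Trust policy letting only one repo/branch assume the role via OIDC."""
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": provider_arn},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                    # Only workflow runs on this repo and branch get credentials
                    f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
                },
            },
        }],
    }


def ecr_push_policy(account_id: str, region: str, ecr_repo: str) -> dict:
    """Permissions to push images to a single ECR repository."""
    repo_arn = f"arn:aws:ecr:{region}:{account_id}:repository/{ecr_repo}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            # GetAuthorizationToken cannot be scoped to a repository
            {"Effect": "Allow", "Action": "ecr:GetAuthorizationToken", "Resource": "*"},
            {
                "Effect": "Allow",
                "Action": [
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:InitiateLayerUpload",
                    "ecr:UploadLayerPart",
                    "ecr:CompleteLayerUpload",
                    "ecr:PutImage",
                ],
                "Resource": repo_arn,
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(github_trust_policy("123456789012", "Campus-Castolo/m300"), indent=2))
```

Scoping the `sub` condition to a single branch means a workflow on a fork or feature branch cannot obtain push credentials, which is the main benefit over a shared access key.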