Project Idea ‐ EKS ECS - Campus-Castolo/m300 GitHub Wiki
🚀 Project Overview
This project aims to build and deploy a scalable and highly available application infrastructure using AWS services. A custom Docker image, built from a GitHub repository, is pushed to Amazon Elastic Container Registry (ECR) and deployed on either an ECS (Elastic Container Service) or EKS (Elastic Kubernetes Service) cluster.
The backend relies on a primary Amazon RDS instance, which is replicated to a read-only RDS replica in a different Availability Zone for redundancy and read performance. Monitoring and logging are handled by Amazon CloudWatch, and automated snapshots of the primary RDS database are regularly exported to an Amazon S3 bucket as backups.
This infrastructure supports high availability, fault tolerance, continuous monitoring, and disaster recovery.
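As a rough illustration of the build-and-push step described above, the following Python sketch assembles the ECR image reference and the Docker commands a CI job might run. The account ID, region, repository name, and tag are hypothetical placeholders, and the helper names are assumptions made for this example, not part of the project. Registry login is deliberately left out.

```python
import re
from typing import List


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Return the full ECR image reference for a repository and tag."""
    # AWS account IDs are 12 digits; reject anything else early.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def build_and_push_commands(image_uri: str, context_dir: str = ".") -> List[List[str]]:
    """Return the docker commands to run, in order (login not included)."""
    return [
        ["docker", "build", "-t", image_uri, context_dir],
        ["docker", "push", image_uri],
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    uri = ecr_image_uri("123456789012", "eu-central-1", "m300-app", "abc1234")
    for command in build_and_push_commands(uri):
        print(" ".join(command))
```

Tagging the image with the Git commit SHA (the hypothetical `abc1234` here) is one way to keep every deployed version traceable to a commit.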
🛠️ Technologies and Tools
Category | Tool/Service | Purpose |
---|---|---|
Version Control | GitHub | Source code repository and Docker image build trigger |
Containerization | Docker | Containerizing the application |
Container Registry | Amazon ECR | Storing and versioning Docker images |
Orchestration | Amazon ECS or EKS | Deploying and managing containers at scale |
Database (Primary) | Amazon RDS (MySQL/PostgreSQL) | Persistent data storage for the application |
Database (Replica) | Amazon RDS Read Replica | Redundant, read-only copy of the primary DB for read scaling and failover (by promotion) |
Monitoring | Amazon CloudWatch | Monitoring performance and logging |
Backup | Amazon S3 + Lambda | Automated snapshot storage for disaster recovery |
IaC | Terraform | Infrastructure automation and reproducibility |
Networking | Amazon VPC, ALB | Secure traffic routing and load balancing |
Security | IAM, Security Groups, Secrets Manager | Access control, secrets management, and secure communication |
🎯 Goals and Functionality
- ✅ Automated CI/CD Pipeline
  - Build Docker image on GitHub push
  - Push to Amazon ECR
  - Deploy to ECS/EKS cluster
- ✅ Highly Available Database Layer
  - Deploy primary RDS instance in AZ-1
  - Deploy read replica in AZ-2 for load balancing and failover
- ✅ Centralized Monitoring
  - Monitor cluster and RDS metrics using CloudWatch
  - Log ingestion and alerts for performance anomalies
- ✅ Automated Backups
  - Take regular snapshots of the primary RDS
  - Use Lambda to copy snapshots to S3 with timestamped tags
- ✅ Scalable and Secure Infrastructure
  - Load balancing using ALB
  - Secure access using IAM and environment secrets
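
The automated backup goal could be sketched in Python roughly as follows. This is a minimal sketch, not the project's actual Lambda: the instance name, tag keys, and function names are hypothetical. The `rds_client` argument is assumed to be a boto3 RDS client (for example `boto3.client("rds")` inside the Lambda), and only its `create_db_snapshot` call is used.

```python
from datetime import datetime, timezone
from typing import Optional


def snapshot_request(db_instance_id: str, now: datetime) -> dict:
    """Build create_db_snapshot arguments with a timestamped name and tags."""
    utc_now = now.astimezone(timezone.utc)
    # Snapshot identifiers allow letters, digits, and single hyphens.
    stamp = utc_now.strftime("%Y-%m-%d-%H-%M")
    return {
        "DBInstanceIdentifier": db_instance_id,
        "DBSnapshotIdentifier": f"{db_instance_id}-{stamp}",
        "Tags": [
            {"Key": "CreatedBy", "Value": "backup-lambda"},  # hypothetical tag
            {"Key": "CreatedAt", "Value": utc_now.isoformat()},
        ],
    }


def take_snapshot(rds_client, db_instance_id: str,
                  now: Optional[datetime] = None) -> str:
    """Request a manual snapshot and return its identifier."""
    request = snapshot_request(db_instance_id, now or datetime.now(timezone.utc))
    rds_client.create_db_snapshot(**request)
    return request["DBSnapshotIdentifier"]
```

Getting the snapshot data into the project's own S3 bucket would be a separate step, for instance with the RDS client's `start_export_task` call once the snapshot is available. That step needs an IAM role and a KMS key and is not shown here.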
⚠️ Probable Challenges
Area | Potential Challenge | Suggested Mitigation |
---|---|---|
ECR Authentication | Authenticating GitHub Actions to ECR without long-lived access keys | Use GitHub OIDC to assume an IAM role, or run a self-hosted GitHub Actions runner inside AWS |
RDS Replication Lag | Delay between primary and replica under high write loads | Monitor replica lag and set CloudWatch alarms |
Cluster Scaling | Handling unexpected traffic spikes | Enable auto-scaling for ECS tasks or EKS pods |
Snapshot Automation | Ensuring consistent, timestamped backups and lifecycle policies for S3 | Trigger AWS Lambda on an EventBridge schedule, tag each snapshot, and apply S3 lifecycle rules |
Cost Management | ECS/EKS, RDS, and S3 can incur significant costs | Use AWS Budgets alerts and clean up unused resources regularly |
Terraform State Handling | Managing state files securely and collaboratively | Use a remote backend such as S3, with a DynamoDB table for state locking |
Security Compliance | Handling secrets, enforcing least-privilege IAM policies | Use AWS Secrets Manager + strict IAM role definitions |
Multi-AZ Network Design | Ensuring consistent connectivity and failover between zones | Carefully design subnets, route tables, and ALB configuration |
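
For the replication lag mitigation in the table above, a CloudWatch alarm definition might look like the sketch below. It builds the arguments for a boto3 CloudWatch client's `put_metric_alarm` call using the `ReplicaLag` metric in the `AWS/RDS` namespace. The replica name, threshold, evaluation window, and SNS topic ARN are hypothetical values chosen for illustration, and `cloudwatch_client` is assumed to be `boto3.client("cloudwatch")`.

```python
def replica_lag_alarm(replica_id: str, threshold_seconds: float,
                      sns_topic_arn: str) -> dict:
    """Build put_metric_alarm arguments for an RDS replica lag alarm."""
    return {
        "AlarmName": f"{replica_id}-replica-lag",
        "Namespace": "AWS/RDS",
        "MetricName": "ReplicaLag",  # reported in seconds
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": replica_id}],
        "Statistic": "Average",
        "Period": 60,              # one-minute datapoints
        "EvaluationPeriods": 5,    # assumption: alarm after 5 bad minutes
        "Threshold": threshold_seconds,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def create_replica_lag_alarm(cloudwatch_client, replica_id: str,
                             threshold_seconds: float, sns_topic_arn: str) -> None:
    """Create or update the alarm through the given CloudWatch client."""
    cloudwatch_client.put_metric_alarm(
        **replica_lag_alarm(replica_id, threshold_seconds, sns_topic_arn)
    )
```

A suitable threshold depends on how stale the application's reads are allowed to be, so the value passed in should come from that requirement and not from this sketch.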