Project Concept ‐ Monitoring Concept - Campus-Castolo/m300 GitHub Wiki
📊 Monitoring Concept – Terraform-Based AWS Infrastructure
graph TD
A[Monitoring Overview]
A --> Logs[Logging]
Logs --> ECSLogs["/ecs/wordpress Log Group"]
Logs --> LambdaLogs["/aws/lambda/rds-backup Log Group"]
A --> Metrics[Metrics Collection]
Metrics --> ECSMetrics[ECS CPU/Memory]
Metrics --> RDSMetrics[RDS Performance Metrics]
Metrics --> LambdaMetrics[Lambda Invocations & Duration]
A --> Alerts[CloudWatch Alarms]
Alerts --> HighCPU[Alarm: ECS High CPU]
Alerts --> SNS[Alerts sent to SNS Topic]
A --> Visualization[Visualization]
Visualization --> CWConsole[CloudWatch Console]
Visualization --> Insights[CloudWatch Logs Insights]
Visualization --> Dashboards[Optional Dashboards]
A --> Roles[Operational Responsibility]
Roles --> Developer[Developer: Log & Metric Review]
Roles --> DevOps[Admin: Alarm & Incident Response]
Roles --> CloudTeam[Cloud Team: Platform Reliability]
This document describes the monitoring and observability strategy implemented in the cloud environment, based on the Terraform configuration. It covers log collection, metrics, alerting, retention, and responsibility for system monitoring and recovery.
1. Goals of Monitoring
- Ensure continuous visibility into service health and performance
- Enable rapid detection of anomalies, failures, or threshold breaches
- Facilitate historical analysis for optimization and audits
- Provide automatic alerts and notifications for predefined events
2. Logging Infrastructure
🔹 Services and Resources
Service | Resource | Log Group |
---|---|---|
ECS Tasks | aws_cloudwatch_log_group.ecs_logs | /ecs/wordpress |
Lambda Backup | Defined in lambda-rds-backup.tf | /aws/lambda/rds-backup |
🔹 Retention Policy
- Log retention is explicitly set for ECS logs: 14 days
- Lambda logs use the CloudWatch Logs default retention (never expire) unless a retention period is configured
resource "aws_cloudwatch_log_group" "ecs_logs" {
  name              = "/ecs/wordpress"
  retention_in_days = 14
}
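The Lambda log group can be given an explicit retention in the same way. The following is a minimal sketch, not the project's actual configuration: the resource name `lambda_logs` and the 14-day value are assumptions, and the contents of lambda-rds-backup.tf are not shown in this document.

```hcl
# Hypothetical resource name; the log group name is taken from the table above.
resource "aws_cloudwatch_log_group" "lambda_logs" {
  name              = "/aws/lambda/rds-backup"
  retention_in_days = 14 # assumption: mirrors the ECS retention
}
```

If Lambda has already created this log group on its first invocation, it has to be imported into the Terraform state (`terraform import`) before applying. Otherwise the create call fails because the group already exists.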
3. Metrics Collection
📈 CloudWatch Metrics Used
Metric Category | Examples | Source |
---|---|---|
ECS Metrics | CPUUtilization, MemoryUtilization | ECS Fargate |
RDS Metrics | CPUUtilization, FreeStorageSpace | Amazon RDS |
Lambda Metrics | Invocations, Duration | AWS Lambda |
All of these metrics are published automatically by the respective AWS services to CloudWatch (namespaces AWS/ECS, AWS/RDS, and AWS/Lambda); no additional agent is required.
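Any metric in the table can drive an alarm. As an illustration, here is a minimal sketch of a low-storage alarm on the RDS `FreeStorageSpace` metric. The variable `db_instance_identifier`, the alarm name, and the 2 GiB threshold are assumptions for this example; `aws_sns_topic.alerts` is the topic defined in section 4.

```hcl
# Hypothetical input: identifier of the RDS instance to watch.
variable "db_instance_identifier" {
  type = string
}

resource "aws_cloudwatch_metric_alarm" "rds_low_storage" {
  alarm_name          = "rds-low-free-storage"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 2
  metric_name         = "FreeStorageSpace"
  namespace           = "AWS/RDS"
  period              = 300
  statistic           = "Average"
  threshold           = 2147483648 # 2 GiB in bytes; illustrative value

  # RDS publishes this metric per instance, so the alarm is scoped by dimension.
  dimensions = {
    DBInstanceIdentifier = var.db_instance_identifier
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
}
```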
4. Alarm Configuration
🚨 Defined Alarms
Alarms are defined via aws_cloudwatch_metric_alarm resources, for example:
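resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  alarm_name          = "ecs-high-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2 # two consecutive 60-second periods above the threshold
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = 60
  statistic           = "Average"
  threshold           = 80 # percent
  # Note: AWS/ECS publishes CPUUtilization per cluster and service, so a
  # dimensions block (ClusterName, ServiceName) is needed for the alarm to
  # match a metric.
  alarm_actions = [aws_sns_topic.alerts.arn]
}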
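For the alarm to evaluate real data it has to be scoped to a specific cluster and service. The following sketch shows one way to do that. The variables `ecs_cluster_name` and `ecs_service_name` and the resource name `high_cpu_scoped` are hypothetical; in the real configuration the values would more likely come from the existing ECS resources.

```hcl
# Hypothetical inputs (skip if the module already declares them).
variable "ecs_cluster_name" { type = string }
variable "ecs_service_name" { type = string }

resource "aws_cloudwatch_metric_alarm" "high_cpu_scoped" {
  alarm_name          = "ecs-high-cpu-wordpress"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = 60
  statistic           = "Average"
  threshold           = 80

  # Service-level ECS metrics are identified by these two dimensions.
  dimensions = {
    ClusterName = var.ecs_cluster_name
    ServiceName = var.ecs_service_name
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
  ok_actions    = [aws_sns_topic.alerts.arn] # also notify on recovery
}
```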
🔔 Notification Channel
resource "aws_sns_topic" "alerts" {
  name = "cloudwatch-alerts"
}
- Alarms trigger messages to an SNS topic.
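- Subscriptions (email/SMS/webhook) are added separately. Email and HTTP/S endpoints must confirm the subscription (for email, via the link in the confirmation message) before they receive alerts.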
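A subscription can also be declared in Terraform rather than added by hand. This is a minimal sketch, assuming a hypothetical `alert_email` input variable; it is not part of the configuration described here.

```hcl
# Hypothetical input: address that should receive alarm notifications.
variable "alert_email" {
  type = string
}

resource "aws_sns_topic_subscription" "alerts_email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = var.alert_email
}
```

An email subscription created this way stays pending until the recipient confirms it through the link in the confirmation message.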
5. Dashboard and Visualization
- ECS and RDS metrics are viewable in the CloudWatch Console.
- Developers can query the ECS and Lambda log groups with CloudWatch Logs Insights.
- Optional: Create CloudWatch Dashboards or integrate with external systems like Grafana.
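To make a recurring Logs Insights query available to every developer, it can be stored as a saved query. The sketch below is illustrative only: the resource name, the query name, and the error filter are assumptions, while the log group names come from section 2.

```hcl
resource "aws_cloudwatch_query_definition" "recent_errors" {
  name = "wordpress/recent-errors" # hypothetical query name

  log_group_names = [
    "/ecs/wordpress",
    "/aws/lambda/rds-backup",
  ]

  # Logs Insights query: the 50 most recent messages containing "error"
  # (case-insensitive) across both log groups.
  query_string = <<-EOT
    fields @timestamp, @log, @message
    | filter @message like /(?i)error/
    | sort @timestamp desc
    | limit 50
  EOT
}
```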
📸 Insert Screenshot of CloudWatch Dashboard or Metrics Graph Here
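If the optional dashboard is wanted, it can be defined alongside the rest of the infrastructure. The following is a rough sketch under several assumptions: the variables `aws_region`, `ecs_cluster_name`, and `ecs_service_name` and the dashboard name are hypothetical, and the Lambda function name `rds-backup` is inferred from the log group name /aws/lambda/rds-backup.

```hcl
# Hypothetical inputs (skip any the module already declares).
variable "aws_region" { type = string }
variable "ecs_cluster_name" { type = string }
variable "ecs_service_name" { type = string }

resource "aws_cloudwatch_dashboard" "overview" {
  dashboard_name = "wordpress-overview"

  dashboard_body = jsonencode({
    widgets = [
      {
        type   = "metric"
        x      = 0
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "ECS CPU and memory"
          region = var.aws_region
          stat   = "Average"
          period = 60
          metrics = [
            ["AWS/ECS", "CPUUtilization", "ClusterName", var.ecs_cluster_name, "ServiceName", var.ecs_service_name],
            ["AWS/ECS", "MemoryUtilization", "ClusterName", var.ecs_cluster_name, "ServiceName", var.ecs_service_name],
          ]
        }
      },
      {
        type   = "metric"
        x      = 12
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "Lambda backup duration"
          region = var.aws_region
          stat   = "Average"
          period = 300
          # Assumption: function name inferred from the log group name.
          metrics = [["AWS/Lambda", "Duration", "FunctionName", "rds-backup"]]
        }
      },
    ]
  })
}
```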
6. Operational Responsibility
Role | Responsibility |
---|---|
Developer | Reviews logs, traces issues, and interprets metrics |
Admin/DevOps | Responds to alerts, maintains alarm configurations |
Cloud Team | Ensures availability and scaling of observability stack |
✅ Summary
Component | Status | Notes |
---|---|---|
ECS Log Group | ✅ Configured | 14-day retention |
RDS Metrics | ✅ Available | Auto-enabled by RDS |
Alarms & Alerts | ✅ Defined | Integrated with SNS |
Dashboards | ⏳ Optional | Can be added in CloudWatch |
Lambda Logs | ✅ Functional | Uses default CloudWatch group |
This monitoring concept provides visibility into the application's performance and operational health through logs, metrics, and alarms. Additional tools (e.g., Prometheus, Grafana) can be integrated in the future.