
📊 Monitoring Concept – Terraform-Based AWS Infrastructure

graph TD
  A[Monitoring Overview]

  A --> Logs[Logging]
  Logs --> ECSLogs["/ecs/wordpress Log Group"]
  Logs --> LambdaLogs["/aws/lambda/rds-backup Log Group"]

  A --> Metrics[Metrics Collection]
  Metrics --> ECSMetrics[ECS CPU/Memory]
  Metrics --> RDSMetrics[RDS Performance Metrics]
  Metrics --> LambdaMetrics[Lambda Invocations & Duration]

  A --> Alerts[CloudWatch Alarms]
  Alerts --> HighCPU[Alarm: ECS High CPU]
  Alerts --> SNS[Alerts sent to SNS Topic]

  A --> Visualization[Visualization]
  Visualization --> CWConsole[CloudWatch Console]
  Visualization --> Insights[CloudWatch Logs Insights]
  Visualization --> Dashboards[Optional Dashboards]

  A --> Roles[Operational Responsibility]
  Roles --> Developer[Developer: Log & Metric Review]
  Roles --> DevOps[Admin: Alarm & Incident Response]
  Roles --> CloudTeam[Cloud Team: Platform Reliability]

This document describes the monitoring and observability strategy implemented in the cloud environment, based on the Terraform configuration. It covers log collection, metrics, alerting, retention, and responsibility for system monitoring and recovery.

1. Goals of Monitoring

  • Ensure continuous visibility into service health and performance
  • Enable rapid detection of anomalies, failures, or threshold breaches
  • Facilitate historical analysis for optimization and audits
  • Provide automatic alerts and notifications for predefined events

2. Logging Infrastructure

🔹 Services and Resources

| Service | Resource | Log Group |
|---|---|---|
| ECS Tasks | aws_cloudwatch_log_group.ecs_logs | /ecs/wordpress |
| Lambda Backup | Defined in lambda-rds-backup.tf | /aws/lambda/rds-backup |

🔹 Retention Policy

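  • Log retention is explicitly set to 14 days for the ECS log group.
  • The Lambda log group keeps the CloudWatch default (logs never expire) unless a retention period is configured.

# ECS task logs; events older than 14 days are deleted automatically
resource "aws_cloudwatch_log_group" "ecs_logs" {
  name              = "/ecs/wordpress"
  retention_in_days = 14
}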
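
If lambda-rds-backup.tf does not already set a retention period, the Lambda log group could be managed the same way as the ECS one. The following is a minimal sketch, not the project's actual configuration: the resource name lambda_backup_logs and the 14-day value are assumptions chosen to mirror the ECS setting.

```hcl
# Hypothetical resource name; the log group name is the one listed
# for the Lambda backup function in the table above.
resource "aws_cloudwatch_log_group" "lambda_backup_logs" {
  name              = "/aws/lambda/rds-backup"
  retention_in_days = 14 # assumed value, mirrors the ECS log group
}
```

If Lambda has already created this log group on its first invocation, the group would have to be imported into the Terraform state before this resource can manage it.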

3. Metrics Collection

📈 CloudWatch Metrics Used

| Metric Category | Examples | Source |
|---|---|---|
| ECS Metrics | CPUUtilization, MemoryUtilization | ECS Fargate |
| RDS Metrics | CPUUtilization, FreeStorageSpace | Amazon RDS |
| Lambda Metrics | Invocations, Duration | AWS Lambda |

All of these metrics are published automatically by the respective AWS services to CloudWatch (namespaces AWS/ECS, AWS/RDS, and AWS/Lambda); no agent or custom instrumentation is required.


4. Alarm Configuration

🚨 Defined Alarms

Alarms are defined via aws_cloudwatch_metric_alarm, for example:

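resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  alarm_name          = "ecs-high-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2 # two consecutive breaching periods before alarming
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = 60 # length of one evaluation period, in seconds
  statistic           = "Average"
  threshold           = 80 # percent CPU
  alarm_actions       = [aws_sns_topic.alerts.arn] # notify the SNS topic

  # Note: AWS/ECS publishes CPUUtilization per ClusterName (and ServiceName),
  # so a dimensions block is needed for this alarm to match a metric.
}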
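
ECS publishes CPUUtilization in the AWS/ECS namespace with ClusterName and ServiceName dimensions, so an alarm generally has to name both to match a service's metric. The sketch below shows one way this could look. The variables ecs_cluster_name and ecs_service_name, the resource name high_cpu_scoped, and the ok_actions setting are assumptions and are not taken from the project's Terraform files; aws_sns_topic.alerts is the topic defined in this document.

```hcl
# Assumed inputs: replace with references to the actual cluster and service.
variable "ecs_cluster_name" {
  type        = string
  description = "Name of the ECS cluster running WordPress (hypothetical input)"
}

variable "ecs_service_name" {
  type        = string
  description = "Name of the ECS service running WordPress (hypothetical input)"
}

resource "aws_cloudwatch_metric_alarm" "high_cpu_scoped" {
  alarm_name          = "ecs-high-cpu-scoped"
  alarm_description   = "Average ECS service CPU above 80% for two 60-second periods"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = 60
  statistic           = "Average"
  threshold           = 80

  # Scope the alarm to one service's metric.
  dimensions = {
    ClusterName = var.ecs_cluster_name
    ServiceName = var.ecs_service_name
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
  ok_actions    = [aws_sns_topic.alerts.arn] # assumed: also notify on recovery
}
```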

🔔 Notification Channel

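# SNS topic that receives all CloudWatch alarm notifications
resource "aws_sns_topic" "alerts" {
  name = "cloudwatch-alerts"
}

  • Alarms publish a message to this SNS topic when they change state.
  • Email and HTTP(S) webhook subscriptions stay pending until the endpoint confirms them; email subscribers confirm via the link in the confirmation message.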
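
A subscription can also be declared in Terraform instead of being created by hand. This is a minimal sketch under assumptions: the variable alert_email and the resource name alerts_email are hypothetical, and the recipient still has to confirm the subscription from the confirmation email.

```hcl
# Hypothetical input: address that should receive alarm notifications.
variable "alert_email" {
  type        = string
  description = "Email address subscribed to the alerts topic (assumed input)"
}

resource "aws_sns_topic_subscription" "alerts_email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = var.alert_email
}
```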

5. Dashboard and Visualization

  • ECS and RDS metrics are viewable in the CloudWatch Console.
  • Developers can query the ECS and Lambda log groups with CloudWatch Logs Insights.
  • Optional: Create CloudWatch Dashboards or integrate with external systems like Grafana.
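
To make Logs Insights queries reusable across the team, a query can be saved with Terraform. The sketch below is an illustration only: the resource name, the query name, and the error filter are assumptions, while the two log group names are the ones used in this document.

```hcl
# Saved Logs Insights query covering both log groups (hypothetical example).
resource "aws_cloudwatch_query_definition" "recent_errors" {
  name = "wordpress-recent-errors"

  log_group_names = [
    "/ecs/wordpress",
    "/aws/lambda/rds-backup",
  ]

  # Show the 50 most recent log events whose message contains "error"
  # (case-insensitive), newest first.
  query_string = <<-EOT
    fields @timestamp, @log, @message
    | filter @message like /(?i)error/
    | sort @timestamp desc
    | limit 50
  EOT
}
```

The saved query then appears in the Logs Insights console, where developers can run it against a chosen time range.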

📸 Insert Screenshot of CloudWatch Dashboard or Metrics Graph Here


6. Operational Responsibility

| Role | Responsibility |
|---|---|
| Developer | Reviews logs, traces issues, and interprets metrics |
| Admin/DevOps | Responds to alerts, maintains alarm configurations |
| Cloud Team | Ensures availability and scaling of observability stack |

✅ Summary

| Component | Status | Notes |
|---|---|---|
| ECS Log Group | ✅ Configured | 14-day retention |
| RDS Metrics | ✅ Available | Auto-enabled by RDS |
| Alarms & Alerts | ✅ Defined | Integrated with SNS |
| Dashboards | ⏳ Optional | Can be added in CloudWatch |
| Lambda Logs | ✅ Functional | Uses default CloudWatch group |

This monitoring concept provides visibility into the application's performance and operational health through CloudWatch logs, metrics, and alarms. Additional tools (e.g., Prometheus, Grafana) can be integrated in the future.