Monitoring Dashboard Documentation - VittorioDeMarzi/hero-beans GitHub Wiki

1. Overview

This CloudWatch Dashboard monitors the Hero Beans infrastructure. It covers key metrics for both EC2 (application server) and RDS (database).

🔹 Why Do We Need Monitoring?

Monitoring is the only way to detect when your app is:

overloaded,
failing silently,
responding slowly,
or about to crash due to resource exhaustion.

Monitoring helps during outages and also prevents incidents by revealing bottlenecks early. It also provides the usage data you need to scale efficiently and avoid overprovisioning.

2. EC2 Metrics (hero-beans-ec2)

🔹 CPU Utilization

Widget Title: EC2 CPU Usage (%)
Metric: AWS/EC2 – CPUUtilization
Description: Shows the percentage of CPU usage.
Purpose: Helps determine if the server can handle the workload or needs scaling.
What it shows: How busy the server’s CPU is.
Why it's important:
- Sustained high CPU (>80%) means the app may slow down or even fail to respond.
- It's a signal for scaling or optimization.
Business impact: High CPU = slower response times = user frustration = churn.
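
The "sustained" part of the >80% rule can be made concrete with a small check. The following is a minimal sketch, assuming the CPU datapoints have already been fetched as a list of percentages (one value per 5-minute period). The function name `is_sustained_high` and the three-datapoint window are hypothetical choices for illustration, not part of the dashboard itself.

```python
from typing import Sequence


def is_sustained_high(cpu_percent: Sequence[float],
                      threshold: float = 80.0,
                      min_consecutive: int = 3) -> bool:
    """Return True if the most recent `min_consecutive` datapoints all exceed `threshold`.

    A single spike above the threshold is ignored; only a run of
    consecutive high values at the end of the series counts.
    """
    if len(cpu_percent) < min_consecutive:
        # Not enough data to call the load "sustained".
        return False
    recent = cpu_percent[-min_consecutive:]
    return all(value > threshold for value in recent)
```

With 5-minute periods, three consecutive datapoints correspond to roughly 15 minutes of high CPU; the window can be widened if short bursts are normal for the workload.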

🔹 Network In/Out

Widget Title: EC2 Network In/Out
Metrics:
- NetworkIn → incoming traffic
- NetworkOut → outgoing traffic
Description: Tracks network activity in bytes.
Purpose: Detects network bottlenecks (e.g., DDoS spikes or large data transfers).
What it shows: Inbound and outbound traffic on the EC2 instance.
Why it's important:
- A sudden drop might indicate downtime or broken routing.
- A spike might indicate DDoS or infinite request loops.

Business impact: Helps detect silent downtime or suspicious traffic patterns early.
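
One simple way to reason about "sudden drop" and "spike" is to compare the latest datapoint with a baseline built from recent history. The sketch below assumes traffic values in bytes have already been fetched; the function name `classify_traffic` and the ratio thresholds (10% for a drop, 5x for a spike) are hypothetical and would need tuning against real traffic.

```python
from statistics import median
from typing import Sequence


def classify_traffic(history_bytes: Sequence[float],
                     latest_bytes: float,
                     drop_ratio: float = 0.1,
                     spike_ratio: float = 5.0) -> str:
    """Classify the latest datapoint as 'drop', 'spike', 'normal', or 'unknown'.

    The baseline is the median of the recent history, which is less
    sensitive to a single outlier than the mean.
    """
    if not history_bytes:
        return "unknown"
    baseline = median(history_bytes)
    if baseline == 0:
        # No meaningful baseline to compare against.
        return "unknown"
    ratio = latest_bytes / baseline
    if ratio <= drop_ratio:
        return "drop"
    if ratio >= spike_ratio:
        return "spike"
    return "normal"
```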

🔹 Disk I/O (Bytes)

Widget Title: EC2 Disk I/O (Bytes)
Metrics:
- EBSReadBytes → bytes read from disk
- EBSWriteBytes → bytes written to disk
Description: Measures EBS disk throughput.
Purpose: Monitors storage performance (e.g., read/write bottlenecks).
What it shows: The volume of data, in bytes, read from and written to the EC2 instance’s EBS volumes.
Why it's important:
- Useful for analyzing how log writes, image storage, or database activity affect app performance.
- High I/O may reveal disk bottlenecks.
Business impact: Disk issues can delay order saving, file uploads, etc.
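
Because these metrics report bytes per period rather than a rate, it can help to convert them to throughput when reading the graph. The sketch below assumes the widget uses the Sum statistic with the dashboard's 300-second period; the helper name `bytes_per_second` is hypothetical.

```python
def bytes_per_second(sum_bytes: float, period_seconds: int = 300) -> float:
    """Convert a per-period byte total into an average throughput in bytes/second."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return sum_bytes / period_seconds


# Example: 150 MiB written during one 5-minute period.
average_rate = bytes_per_second(150 * 1024 * 1024)  # 524288.0 bytes/s, i.e. 0.5 MiB/s
```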

3. RDS Metrics (hero-beans-db)

🔹 Current Snapshot

Widget Title: RDS (hero-beans-db) - Current
Metrics (singleValue + sparkline):
- CPUUtilization → current DB CPU load
- DatabaseConnections → number of active connections
- FreeStorageSpace → remaining storage in bytes
Description: Quick near-real-time status of the DB.
Purpose: Detects problems like too many connections or low storage space.

🔹 Historical Trend

Widget Title: RDS (hero-beans-db) – CPU / Connections / FreeStorage (Trend)
Metrics (timeSeries):
- CPUUtilization (Average, %)
- DatabaseConnections (Average, count)
- FreeStorageSpace (Average, bytes, right axis)
Description: Tracks metric changes over time.
Purpose: Helps analyze trends and plan scaling (e.g., rising DB connections, decreasing free storage).
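
To turn a decreasing FreeStorageSpace trend into a planning number, a rough linear estimate is often enough. The following is a minimal sketch, assuming two FreeStorageSpace readings (in bytes) taken a known number of days apart; the function name `days_until_full` is hypothetical, and real storage growth is rarely perfectly linear.

```python
from typing import Optional


def days_until_full(free_bytes_start: float,
                    free_bytes_end: float,
                    days_elapsed: float) -> Optional[float]:
    """Estimate days until free storage reaches zero, assuming linear consumption.

    Returns None when storage is not shrinking (no meaningful estimate).
    """
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    consumed_per_day = (free_bytes_start - free_bytes_end) / days_elapsed
    if consumed_per_day <= 0:
        return None
    return free_bytes_end / consumed_per_day


GIB = 1024 ** 3
# Free space fell from 20 GiB to 18 GiB over one week.
runway = days_until_full(20 * GIB, 18 * GIB, 7)  # about 63 days
```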

4. Dashboard Settings

Region: ap-northeast-2 (Seoul)
Period: 300 seconds (5 minutes)
Widgets: timeSeries and singleValue
Custom Labels: hero-beans-ec2, hero-beans-db for readability
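
As an illustration of how these settings fit together, the sketch below builds a CloudWatch dashboard body containing two of the widgets described above. It is not the actual dashboard source: the EC2 instance ID is a hypothetical placeholder, the RDS identifier is assumed to match the `hero-beans-db` label, and the widget positions and sizes are arbitrary.

```python
import json

REGION = "ap-northeast-2"
PERIOD = 300  # seconds
EC2_INSTANCE_ID = "i-0123456789abcdef0"  # hypothetical placeholder
DB_INSTANCE_ID = "hero-beans-db"         # assumed to match the widget label


def metric_widget(title, view, metrics, x, y, **extra):
    """Build one metric widget using the shared region and period."""
    properties = {
        "title": title,
        "view": view,  # "timeSeries" or "singleValue"
        "region": REGION,
        "period": PERIOD,
        "stat": "Average",
        "metrics": metrics,
    }
    properties.update(extra)
    return {"type": "metric", "x": x, "y": y, "width": 12, "height": 6,
            "properties": properties}


def build_dashboard_body() -> str:
    """Return the dashboard definition as a JSON string."""
    rds_dimension = ["DBInstanceIdentifier", DB_INSTANCE_ID]
    widgets = [
        metric_widget(
            "EC2 CPU Usage (%)", "timeSeries",
            [["AWS/EC2", "CPUUtilization", "InstanceId", EC2_INSTANCE_ID,
              {"label": "hero-beans-ec2"}]],
            x=0, y=0,
        ),
        metric_widget(
            "RDS (hero-beans-db) - Current", "singleValue",
            [["AWS/RDS", "CPUUtilization", *rds_dimension],
             ["AWS/RDS", "DatabaseConnections", *rds_dimension],
             ["AWS/RDS", "FreeStorageSpace", *rds_dimension]],
            x=12, y=0, sparkline=True,
        ),
    ]
    return json.dumps({"widgets": widgets}, indent=2)
```

The resulting string could be passed as the `DashboardBody` argument of boto3's `put_dashboard` call on a CloudWatch client; the dashboard name would be chosen by whoever deploys it.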

🔔 Log-Based Alarm for Error Tracking

A log-based metric filter was created to monitor error entries in application logs. This filter feeds into a CloudWatch Alarm that checks the number of error-level messages over a rolling window.

The alarm is configured to trigger if more than 10 errors are logged within any 15-minute period. When triggered, it sends an email notification to a pre-configured recipient via Amazon SNS.

This mechanism provides an automated way to detect and react to application issues in near real-time, enhancing operational awareness and response time.
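
The setup described above could be expressed as the following parameter sets. This is a sketch under stated assumptions, not the configuration actually deployed: the log group name, filter pattern, metric namespace, metric name, alarm name, and SNS topic ARN are all hypothetical. Only the threshold (more than 10 errors) and the 15-minute window come from the description above.

```python
# All names below are hypothetical placeholders.
LOG_GROUP = "/hero-beans/app"
METRIC_NAMESPACE = "HeroBeans/Logs"
METRIC_NAME = "ErrorCount"
SNS_TOPIC_ARN = "arn:aws:sns:ap-northeast-2:123456789012:hero-beans-alerts"


def metric_filter_params() -> dict:
    """Parameters for a log metric filter that counts lines containing ERROR."""
    return {
        "logGroupName": LOG_GROUP,
        "filterName": "hero-beans-error-filter",
        "filterPattern": "ERROR",  # assumed pattern; depends on the log format
        "metricTransformations": [{
            "metricName": METRIC_NAME,
            "metricNamespace": METRIC_NAMESPACE,
            "metricValue": "1",  # each matching log event adds 1
        }],
    }


def alarm_params() -> dict:
    """Parameters for an alarm that fires on more than 10 errors in 15 minutes."""
    return {
        "AlarmName": "hero-beans-error-rate",
        "Namespace": METRIC_NAMESPACE,
        "MetricName": METRIC_NAME,
        "Statistic": "Sum",
        "Period": 900,  # 15 minutes, in seconds
        "EvaluationPeriods": 1,
        "Threshold": 10.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no log errors means no alarm
        "AlarmActions": [SNS_TOPIC_ARN],
    }
```

With boto3 installed and credentials configured, these dictionaries could be passed as `boto3.client("logs").put_metric_filter(**metric_filter_params())` and `boto3.client("cloudwatch").put_metric_alarm(**alarm_params())`. Treating missing data as not breaching is a design choice: the filter emits no datapoints while no errors are logged, and that silence should not trigger the alarm.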

Security & Reliability

These metrics and alerts help you:

Detect bugs under load,
Detect traffic anomalies or abuse (e.g., bots hitting /favicon.ico every few seconds),
Raise alerts before incidents reach users.