Monitoring alarms - coopdevs/handbook GitHub Wiki


:warning: ATTENTION
This handbook is discontinued. The new address is: https://handbook.coopdevs.org/ca/home

This is a proposal for monitoring alarms at Coopdevs.

Disk

Thresholds

We will define two thresholds: warning and critical. The time between the warning notification and the critical one shows how fast the disk is filling up. If only 5 minutes pass between them, you had better hurry to fix the issue.

| Severity | Threshold |
|----------|-----------|
| Warning | Disk use exceeds 70% of total available |
| Critical | Disk use exceeds 85% of total available |
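The warning and critical thresholds are 15 percentage points apart, and the critical threshold is also 15 points short of a full disk. So, as long as the disk keeps filling at the same rate, the gap between the two notifications roughly predicts how much time is left after the critical alert. A minimal sketch of that arithmetic in Python; `severity` and `minutes_until_full` are hypothetical helper names, not part of any existing tool:

```python
WARNING_PCT = 70.0   # warning threshold from the table above
CRITICAL_PCT = 85.0  # critical threshold from the table above


def severity(used_pct: float) -> str:
    """Map a disk-use percentage to a severity level ("exceeds" = strictly above)."""
    if used_pct > CRITICAL_PCT:
        return "critical"
    if used_pct > WARNING_PCT:
        return "warning"
    return "ok"


def minutes_until_full(minutes_between_alerts: float) -> float:
    """Estimate the minutes left after the critical alert, assuming a constant fill rate.

    The disk filled (CRITICAL_PCT - WARNING_PCT) points in minutes_between_alerts,
    so at the same rate it needs (100 - CRITICAL_PCT) / rate more minutes.
    """
    rate_pct_per_min = (CRITICAL_PCT - WARNING_PCT) / minutes_between_alerts
    return (100.0 - CRITICAL_PCT) / rate_pct_per_min


if __name__ == "__main__":
    print(severity(72.0))
    print(minutes_until_full(5.0))
```

With a 5-minute gap between the two alerts, the estimate leaves about 5 more minutes before the disk is full.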

Absolute limit

In addition, since the thresholds above are relative, we suggest adding an absolute limit that guarantees there is always enough free disk space to fix the root cause of the issue.

This absolute limit could be 2 GB of free space.
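The relative thresholds and the absolute limit can be evaluated together. A minimal sketch, assuming the limit means 2 GB of free space, that 2 GB is taken as 2 GiB, and that breaching it raises a critical alert (the proposal does not say which severity it maps to); `disk_severity` is a hypothetical name. It reads usage with Python's standard `shutil.disk_usage`:

```python
import shutil

WARNING_PCT = 70.0
CRITICAL_PCT = 85.0
MIN_FREE_BYTES = 2 * 1024**3  # absolute limit: 2 GB free (assumed to mean GiB)


def disk_severity(total: int, used: int, free: int) -> str:
    """Combine the relative thresholds with the absolute free-space limit."""
    used_pct = 100.0 * used / total
    # Assumption: breaching the absolute limit is treated as critical.
    # On a small disk, free space can drop below it before any percentage fires.
    if free < MIN_FREE_BYTES or used_pct > CRITICAL_PCT:
        return "critical"
    if used_pct > WARNING_PCT:
        return "warning"
    return "ok"


if __name__ == "__main__":
    usage = shutil.disk_usage("/")  # named tuple (total, used, free), in bytes
    print(disk_severity(usage.total, usage.used, usage.free))
```

On a 4 GB disk, for example, 60% use leaves only 1.6 GB free, so the absolute limit fires before either relative threshold.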

Memory

Thresholds

| Severity | Threshold |
|----------|-----------|
| Warning | Memory use exceeds 70% of total available for 5 minutes |
| Critical | Memory use exceeds 85% of total available for 5 minutes |
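The 5-minute condition keeps a short spike from triggering an alarm. A minimal sketch of such a sustained check, assuming memory use is sampled periodically and must stay above the threshold on every sample for the whole 5 minutes (one sample at or below it resets the timer); `SustainedThreshold` and `MemoryAlarm` are hypothetical names:

```python
from typing import Optional

DURATION_S = 5 * 60  # the threshold must be exceeded for 5 minutes
THRESHOLDS = {"warning": 70.0, "critical": 85.0}  # % of total memory


class SustainedThreshold:
    """Track how long a metric has stayed above a single threshold."""

    def __init__(self, threshold_pct: float, duration_s: float):
        self.threshold_pct = threshold_pct
        self.duration_s = duration_s
        self.above_since: Optional[float] = None  # start of the current breach

    def update(self, ts: float, value_pct: float) -> bool:
        """Feed one sample; return True once the breach has lasted duration_s."""
        if value_pct > self.threshold_pct:
            if self.above_since is None:
                self.above_since = ts
            return ts - self.above_since >= self.duration_s
        self.above_since = None  # back under the threshold: reset the timer
        return False


class MemoryAlarm:
    """Keep one sustained tracker per severity and report the worst one firing."""

    def __init__(self, duration_s: float = DURATION_S):
        self.trackers = {
            name: SustainedThreshold(pct, duration_s) for name, pct in THRESHOLDS.items()
        }

    def update(self, ts: float, used_pct: float) -> str:
        # Every tracker must see every sample, so build the full set before checking it.
        fired = {name for name, tracker in self.trackers.items() if tracker.update(ts, used_pct)}
        if "critical" in fired:
            return "critical"
        if "warning" in fired:
            return "warning"
        return "ok"


if __name__ == "__main__":
    alarm = MemoryAlarm()
    for ts in range(0, 301, 60):  # one sample per minute at 80% use
        print(ts, alarm.update(ts, 80.0))
```

Feeding one sample per minute at 80% use returns "ok" until the breach has lasted 5 minutes (the sample at t=300 s), and "warning" from then on.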

CPU

Thresholds

| Severity | Threshold |
|----------|-----------|
| Warning | CPU use exceeds 70% of total available for 5 minutes |
| Critical | CPU use exceeds 85% of total available for 5 minutes |

Absolute limit

CPU use reaches 100% for 3 minutes.
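CPU uses the same kind of sustained check as memory, plus the absolute limit. A minimal self-contained sketch, assuming the absolute limit (100% for 3 minutes) raises a critical alert, which the proposal does not state explicitly; the `CPU_RULES` table, `SustainedRule` and `CpuAlarm` are hypothetical names:

```python
from typing import Callable, Optional

# (severity, condition on the CPU percentage, required duration in seconds)
CPU_RULES = [
    ("critical", lambda pct: pct >= 100.0, 3 * 60),  # absolute limit; severity assumed
    ("critical", lambda pct: pct > 85.0, 5 * 60),
    ("warning", lambda pct: pct > 70.0, 5 * 60),
]


class SustainedRule:
    """Fire once `condition` has held on every sample for `duration_s` seconds."""

    def __init__(self, severity: str, condition: Callable[[float], bool], duration_s: float):
        self.severity = severity
        self.condition = condition
        self.duration_s = duration_s
        self.since: Optional[float] = None  # when the condition started holding

    def update(self, ts: float, cpu_pct: float) -> bool:
        if self.condition(cpu_pct):
            if self.since is None:
                self.since = ts
            return ts - self.since >= self.duration_s
        self.since = None  # condition broken: reset the timer
        return False


class CpuAlarm:
    def __init__(self):
        self.rules = [SustainedRule(*rule) for rule in CPU_RULES]

    def update(self, ts: float, cpu_pct: float) -> str:
        # The comprehension calls update() on every rule, so each keeps its own timer.
        fired = [rule.severity for rule in self.rules if rule.update(ts, cpu_pct)]
        if "critical" in fired:
            return "critical"
        if "warning" in fired:
            return "warning"
        return "ok"


if __name__ == "__main__":
    alarm = CpuAlarm()
    for ts in range(0, 181, 60):  # CPU pinned at 100%, one sample per minute
        print(ts, alarm.update(ts, 100.0))
```

With the CPU pinned at 100% and one sample per minute, the absolute rule raises a critical alert at t=180 s, two minutes before the 85% rule alone would.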

Per-project settings

We propose applying these thresholds to all projects and adapting them per project only once we have enough monitoring data to do so.
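One way to keep a single set of thresholds for all projects while leaving room to tune them later is to store the values above as shared defaults and apply per-project overrides on top. A minimal sketch using Python's standard `dataclasses.replace`; the `Thresholds` structure, metric names and override format are hypothetical, and the absolute limits are left out for brevity:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Thresholds:
    warning_pct: float = 70.0
    critical_pct: float = 85.0
    duration_s: int = 5 * 60  # how long the threshold must be exceeded


# Shared defaults from this proposal (absolute limits omitted for brevity).
DEFAULTS = {
    "disk": Thresholds(duration_s=0),  # no duration is proposed for disk
    "memory": Thresholds(),
    "cpu": Thresholds(),
}


def thresholds_for(overrides: dict, metric: str) -> Thresholds:
    """Return the defaults for `metric`, with any per-project overrides applied on top."""
    return replace(DEFAULTS[metric], **overrides.get(metric, {}))


if __name__ == "__main__":
    # Hypothetical project that, after collecting data, tolerates more memory use.
    project_overrides = {"memory": {"warning_pct": 80.0}}
    print(thresholds_for(project_overrides, "memory"))
    print(thresholds_for(project_overrides, "cpu"))
```

Projects without overrides get the defaults unchanged, so the proposal applies everywhere until there is data to justify a change.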