On duty alerting
:warning: WARNING |
---|
This handbook has been discontinued. The new address is: https://handbook.coopdevs.org/ca/home |
The on duty alerting is configured in Grafana.
Dashboards and alerts
The dashboard that contains all the alerts is Alerting Dashboard consolidated (https://coopdevs.grafana.net/d/I36Xk2Pnk/alerting-dashboard-consolidated?orgId=1). The graphs and alerts configured are:
- Http Status Code
  - Alert: Status is outside the range 100 to 399
- TLS Expiration (Soft)
  - Alert: Certificate expires in less than 2 weeks; a rule prevents this alert from being sent on weekends
- TLS Expiration (Hard)
  - Alert: Certificate expires in less than 4 days
- Disc occupation %
  - Alert: Disc occupation is higher than 85%
- Disc occupation (free space)
  - Alert: There is less than 2 GB of free space
- Memory availability (%)
  - Alert: No alert has been configured yet
- CPU availability (%)
  - Alert: CPU availability is less than 15% over 5 minutes
All alerts are sent to a Zulip channel.
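
To make the thresholds above easier to reason about, here is a minimal Python sketch that restates them as plain conditions. It is not how Grafana evaluates the rules. The `Sample` class, its field names, and the alert labels are hypothetical. The sketch also assumes that the HTTP range is inclusive at both ends, that the CPU value has already been aggregated over the 5-minute window, and that "weekend" means Saturday and Sunday.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Sample:
    """Hypothetical snapshot of one monitored instance (illustrative field names)."""
    http_status: int
    cert_expires_at: datetime
    disk_used_percent: float
    disk_free_gb: float
    cpu_available_percent: float  # assumed: already aggregated over 5 minutes


def firing_alerts(sample: Sample, now: datetime) -> List[str]:
    """Return the (hypothetical) names of the alerts that would fire."""
    alerts = []

    # Http Status Code: anything outside 100..399 (assumed inclusive).
    if not 100 <= sample.http_status <= 399:
        alerts.append("http_status")

    remaining = sample.cert_expires_at - now
    is_weekend = now.weekday() >= 5  # Monday is 0, so 5 and 6 are Sat/Sun

    # TLS Expiration (Soft): less than 2 weeks left, not sent on weekends.
    if remaining < timedelta(weeks=2) and not is_weekend:
        alerts.append("tls_soft")

    # TLS Expiration (Hard): less than 4 days left, sent on any day.
    if remaining < timedelta(days=4):
        alerts.append("tls_hard")

    # Disc occupation: one relative and one absolute threshold.
    if sample.disk_used_percent > 85:
        alerts.append("disk_percent")
    if sample.disk_free_gb < 2:
        alerts.append("disk_free")

    # Memory availability has no alert configured, so nothing is checked.

    # CPU availability: less than 15% available.
    if sample.cpu_available_percent < 15:
        alerts.append("cpu")

    return alerts
```

The soft and hard TLS checks are kept independent here, so a certificate that expires in 3 days on a weekday triggers both, while on a weekend only the hard one fires.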
Stopping and activating alerts
By default, every graph includes all the instances that have to be monitored. If you want to stop an alert for one instance, modify the chart and add the instance to the Metrics browser formula. When you want to reactivate the alert, remove the instance from the Metrics browser formula.
Be careful: if you stop an alert from the general alert menu, you will stop it for all instances.
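
The exact formula used in each chart is not reproduced on this page. As an illustration only, the Python sketch below shows one way such a formula could exclude instances, assuming the panels query a Prometheus-style data source and that instances are identified by an `instance` label. The metric name `probe_http_status_code`, the host names, and the helper functions are hypothetical; the real charts may use other metrics, labels, or a regex matcher.

```python
from typing import Iterable


def quote_label_value(value: str) -> str:
    """Wrap a label value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def selector_excluding(metric: str, silenced: Iterable[str],
                       label: str = "instance") -> str:
    """Build a selector that matches every series except the silenced ones.

    Each silenced value becomes one `label!="value"` matcher. Several
    matchers on the same label are combined with a logical AND.
    """
    matchers = [f"{label}!={quote_label_value(v)}" for v in silenced]
    if not matchers:
        # Nothing is silenced, so the bare metric selects all instances.
        return metric
    return metric + "{" + ",".join(matchers) + "}"


# Stop the alert for one (hypothetical) instance:
print(selector_excluding("probe_http_status_code", ["staging.example.org"]))
# probe_http_status_code{instance!="staging.example.org"}
```

Under these assumptions, stopping an alert means adding one more value to the excluded list, and reactivating it means removing that value, so the other instances stay monitored throughout.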
Som Connexió
Som Connexió has a separate dashboard, Alerting Dashboard (https://coopdevs.grafana.net/d/QIeyRTyWk/alerting-dashboard?orgId=1), because its alerts are also sent to a Telegram group.