Sysdig - bcgov/common-service-showcase GitHub Wiki
This page gives a brief overview of our use of Sysdig.
- BCDevOps Sysdig Monitor Service: https://app.sysdigcloud.com/api/oauth/openid/bcdevops
- Sysdig documentation: https://docs.sysdig.com/en/docs/sysdig-monitor/
In the Sysdig UI, resources are available within the scope of a Team, and we have one Team per project. After logging in to Sysdig (using your Azure gov ID), switch the Team scope (from the lower left-hand link) to the team for the project whose resources you want to see. See the Sysdig docs for more details.
To enable access, an OpenShift SysdigTeam operator must be configured in the `*-tools` namespaces in OpenShift. The BC Gov platform team has a guide for [configuring access](https://docs.developer.gov.bc.ca/sysdig-monitor-setup-team/). Team settings in the Sysdig UI: https://app.sysdigcloud.com/#/settings/teams
Sysdig team for each project:
Project | Sysdig Team |
---|---|
CDOGS | 2250c5-team |
CHES | b160aa-team |
COMS | bb17f9-team |
BCBox | e7679d-team |
PCNS | d9d78e-team |
CHESS | 10d873-team |
DGRSC | bb0279-team |
CAVMS | 8035d1-team |
There is a REST API for managing our Sysdig configuration. Sysdig API spec: https://app.sysdigcloud.com/api/public/docs/index.html
To authenticate, get an API token from Sysdig by going to your user settings (the token is scoped to the current team), then add an `Authorization: Bearer <token>` header to your API requests.
For example, to list dashboards:
GET https://app.sysdigcloud.com/api/v3/dashboards/list HTTP/1.1
Authorization: Bearer abcdefg-a574-4359-b6c9-fa3e2dc30acb
Accept: application/json
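The same request can be scripted. The following is a minimal sketch using only the Python standard library; it assumes the endpoint shown in the example above, and `SYSDIG_API_TOKEN` is a hypothetical environment variable name chosen for illustration, not something Sysdig defines.

```python
import json
import os
import urllib.request

# URL taken from the example request above.
DASHBOARDS_URL = "https://app.sysdigcloud.com/api/v3/dashboards/list"


def build_request(token: str, url: str = DASHBOARDS_URL) -> urllib.request.Request:
    """Build an authenticated GET request for the Sysdig API."""
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


def fetch_json(token: str, url: str = DASHBOARDS_URL):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(token, url), timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # SYSDIG_API_TOKEN is a hypothetical variable name (an assumption).
    token = os.environ.get("SYSDIG_API_TOKEN")
    if token:
        print(json.dumps(fetch_json(token), indent=2))
    else:
        print("Set SYSDIG_API_TOKEN to a team-scoped Sysdig API token.")
```

Because the token is scoped to a team, the response should only cover the team whose token you supply; use a separate token per project.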
All Sysdig resources have been saved to our wiki
There is a library of pre-built dashboards for our projects. These show a range of data, including details of our deployments, HTTP traffic, application metrics and much more. We also have our own dashboards for our most useful metrics. To find a dashboard, log in to Sysdig (using your Azure gov ID) and switch to the team context in which the project exists; see the Access information above.
Note: Sysdig agents collect 1-second samples and report data at a 10-second resolution. This is the finest resolution at which Sysdig Monitor stores data. See https://docs.sysdig.com/en/docs/sysdig-monitor/using-monitor/metrics/data-aggregation/
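To illustrate what the 10-second resolution means when reading a dashboard, here is a rough sketch of rolling 1-second samples up into 10-second data points. It is not Sysdig's implementation; the `aggregate` function and the sample values are made up for illustration, and the set of roll-ups (avg, min, max, sum) is an assumption.

```python
from statistics import mean


def aggregate(samples, window=10):
    """Collapse 1-second samples into one data point per `window` seconds."""
    points = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        points.append({
            "avg": mean(chunk),
            "min": min(chunk),
            "max": max(chunk),
            "sum": sum(chunk),
        })
    return points


# Twenty hypothetical 1-second samples with a 1-second spike in the first window.
samples = [10] * 4 + [90] + [10] * 5 + [20] * 10
for point in aggregate(samples):
    print(point)
```

In this sketch the one-second spike to 90 is still visible in the `max` of the first data point, but the `avg` for that window is only 18, so the choice of aggregation matters when looking for short spikes.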
We have various alerts sent to our team's shared inbox (Email) as well as our #monitoring Discord channel (Custom Webhook):
- PVC usage over 85%
- PVC usage over 90%
- patroni workloads ready < 3
- patroni workloads ready < 2
- HTTP `5xx` errors from `app` containers*
- OpenShift container waiting
*This alert only goes to our Discord #monitoring channel. Sysdig does not expose the full access log including Client ID. We should use our fluent-bit > fluentd > discord alerting process where available.
Discord notification body template:
{
"content": "Alert Name: {{@alert_name}}\nSeverity: {{@alert_severity}}\nDescription: {{@alert_description}}\nNamespace: {{@event_labels.kube_namespace_name}}\nEvent Entity: {{@event_entity}}\nMore Details: {{@event_url}}"
}
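When editing the template, it can help to preview the resulting message locally before saving it in Sysdig. The sketch below is a stand-in for that preview only: it parses the template as JSON and fills the `{{@...}}` placeholders from a dictionary. The `render` helper and all sample values are hypothetical, and Sysdig's own substitution may behave differently.

```python
import json
import re

# Body template copied from above (raw string so "\n" stays a JSON escape).
TEMPLATE = r'''{
"content": "Alert Name: {{@alert_name}}\nSeverity: {{@alert_severity}}\nDescription: {{@alert_description}}\nNamespace: {{@event_labels.kube_namespace_name}}\nEvent Entity: {{@event_entity}}\nMore Details: {{@event_url}}"
}'''

PLACEHOLDER = re.compile(r"\{\{@([\w.]+)\}\}")


def render(template: str, values: dict) -> dict:
    """Parse the template as JSON, then fill placeholders in `content`."""
    body = json.loads(template)  # raises ValueError if the template is not valid JSON

    def substitute(match):
        # Leave unknown placeholders untouched so gaps are easy to spot.
        return str(values.get(match.group(1), match.group(0)))

    body["content"] = PLACEHOLDER.sub(substitute, body["content"])
    return body


if __name__ == "__main__":
    # Hypothetical sample values, for preview only.
    sample = {
        "alert_name": "PVC usage over 85%",
        "alert_severity": "Medium",
        "alert_description": "Example description",
        "event_labels.kube_namespace_name": "abc123-prod",
        "event_entity": "example-pvc",
        "event_url": "https://example.invalid/event",
    }
    message = render(TEMPLATE, sample)
    # Discord rejects message content longer than 2000 characters.
    assert len(message["content"]) <= 2000
    print(message["content"])
```

Parsing the template first and substituting afterwards means the sample values never need JSON escaping, and a malformed template fails immediately at `json.loads`.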
The following table contains links to our saved Sysdig configuration. Last updated 2023/12/05.
Project | Dashboard | Alerts |
---|---|---|
CDOGS | cdogs_dashboard | cdogs_alerts.json |
CHES | ches_dashboard.json | ches_alerts.json |
COMS | coms_dashboard.json | coms_alerts.json |
BCBox | bcbox_dashboard.json | bcbox_alerts.json |
PCNS | pcns_dashboard.json | pcns_alerts.json |
CHESS | chess_dashboard.json | chess_alerts.json |
DGRSC | dgrsc_dashboard.json | dgrsc_alerts.json |
CAVMS | cavms_dashboard.json | cavms_alerts.json |