Configuration Guide - fleXRPL/datadog-monitor-deployer GitHub Wiki
Configuration Guide
This guide provides detailed information about configuring monitors using the DataDog Monitor Deployer.
Configuration File Format
Monitor configurations can be defined in either YAML or JSON format. YAML is recommended for better readability.
Basic Structure
monitors:
  - name: "Monitor Name"
    type: "monitor_type"
    query: "monitor_query"
    message: "alert_message"
    tags: []
    options: {}
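Since the same configuration can be written as JSON, the following sketch builds the basic structure as a Python dict and serializes it. It only illustrates that the two formats carry the same data; it is not taken from the tool's own code.

```python
import json

# The basic structure from this guide, written as a plain Python dict.
config = {
    "monitors": [
        {
            "name": "Monitor Name",
            "type": "monitor_type",
            "query": "monitor_query",
            "message": "alert_message",
            "tags": [],
            "options": {},
        }
    ]
}

# Serialize to the JSON form of the same configuration.
json_text = json.dumps(config, indent=2)
print(json_text)
```

Loading `json_text` back with `json.loads` yields the same dict, so a YAML file and a JSON file with these keys should describe the same monitor.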
Monitor Properties
Required Fields
Field | Type | Description |
---|---|---|
name | string | Display name of the monitor |
type | string | Type of monitor (e.g., "metric alert", "log alert") |
query | string | Monitor query/condition |
message | string | Alert notification message |
Optional Fields
Field | Type | Description |
---|---|---|
tags | array | List of tags for categorization |
priority | integer | Alert priority (1-5) |
restricted_roles | array | Roles with access to the monitor |
options | object | Additional monitor options |
Monitor Types
Supported Types
- metric alert - Threshold alerts on metrics
- service check - Status-based monitoring
- event alert - Event-based monitoring
- query alert - Complex query monitoring
- composite - Combined monitor conditions
- log alert - Log-based monitoring
- process alert - Process monitoring
- trace-analytics alert - APM monitoring
- slo alert - SLO monitoring
- event-v2 alert - Enhanced event monitoring
- audit alert - Audit log monitoring
- rum alert - Real user monitoring
- ci-pipelines alert - CI pipeline monitoring
- error-tracking alert - Error tracking
Options Configuration
Common Options
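options:
  notify_no_data: true        # alert when the metric stops reporting
  no_data_timeframe: 10       # minutes without data before notifying
  notify_audit: false         # whether to notify on changes to the monitor itself
  timeout_h: 0                # hours before a triggered monitor auto-resolves
  evaluation_delay: 900       # seconds to delay evaluation
  new_host_delay: 300         # seconds to wait before evaluating new hosts
  include_tags: true          # include triggering tags in the notification title
  require_full_window: false  # evaluate even if the window is only partly filled
  renotify_interval: 60       # minutes between re-notifications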
Thresholds Configuration
options:
  thresholds:
    critical: 90
    warning: 80
    ok: 70
    critical_recovery: 85
    warning_recovery: 75
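The thresholds above only make sense in a particular order. As a minimal sketch, assuming a monitor that alerts when the value rises above the threshold, the check below verifies that `warning` sits below `critical` and that each recovery threshold sits below the threshold it recovers from. The function name `check_thresholds` is hypothetical and not part of the tool.

```python
def check_thresholds(thresholds, comparator=">"):
    """Return a list of ordering problems in a thresholds mapping.

    Hypothetical helper: for ">" alerts each (low, high) pair must satisfy
    low < high; for "<" alerts the ordering is mirrored.
    """
    # Negate values for "<" alerts so the same comparisons apply.
    sign = 1 if comparator in (">", ">=") else -1
    scaled = {key: sign * value for key, value in thresholds.items()}

    rules = [
        ("warning", "critical"),
        ("critical_recovery", "critical"),
        ("warning_recovery", "warning"),
    ]
    relation = "below" if sign == 1 else "above"
    problems = []
    for low, high in rules:
        # Only compare pairs where both thresholds are configured.
        if low in scaled and high in scaled and not scaled[low] < scaled[high]:
            problems.append(f"{low} must be {relation} {high}")
    return problems


example = {
    "critical": 90,
    "warning": 80,
    "ok": 70,
    "critical_recovery": 85,
    "warning_recovery": 75,
}
print(check_thresholds(example))
```

The `ok` threshold is deliberately left unchecked here, since its expected position depends on the monitor type.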
Notification Configuration
options:
  notification_preset_name: "custom"
  notification_targets:
    - type: "slack"
      channel: "#alerts"
    - type: "email"
      address: "[email protected]"
    - type: "pagerduty"
      service_key: "key123"
Template System
Basic Template
template:
  defaults:
    tags:
      - "team:platform"
      - "env:production"
    options:
      notify_no_data: true
      evaluation_delay: 900
monitors:
  - template: base
    name: "CPU Alert"
    type: "metric alert"
    query: "avg(last_5m):avg:system.cpu.user{*} > 80"
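One way to picture how template defaults combine with a monitor is a recursive merge in which the monitor's own values win. The sketch below is an assumption about the merge behavior, not the tool's actual implementation, and `apply_defaults` is a hypothetical name.

```python
import copy


def apply_defaults(defaults, monitor):
    """Return a new dict with monitor values layered over template defaults.

    Nested dicts (such as options) are merged key by key; any other value
    in the monitor replaces the default outright.
    """
    merged = copy.deepcopy(defaults)
    for key, value in monitor.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_defaults(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {
    "tags": ["team:platform", "env:production"],
    "options": {"notify_no_data": True, "evaluation_delay": 900},
}
monitor = {
    "name": "CPU Alert",
    "type": "metric alert",
    "query": "avg(last_5m):avg:system.cpu.user{*} > 80",
    "options": {"evaluation_delay": 60},  # overrides one default option
}
merged = apply_defaults(defaults, monitor)
print(merged["tags"], merged["options"])
```

In this sketch the monitor inherits both default tags and `notify_no_data`, while its own `evaluation_delay` replaces the default of 900.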
Variable Substitution
template:
  variables:
    threshold: 80
    service: "web"
    team: "platform"
monitors:
  - name: "{{ service }} CPU Usage"
    type: "metric alert"
    query: "avg(last_5m):avg:system.cpu.user{service:{{ service }}} > {{ threshold }}"
    tags:
      - "team:{{ team }}"
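To make the substitution concrete, here is a small sketch that replaces `{{ name }}` placeholders throughout a monitor definition. The regular expression and the `render` function are illustrative assumptions; the tool may well use a full template engine instead.

```python
import re

# Matches "{{ name }}" with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(value, variables):
    """Recursively substitute placeholders in strings, lists, and dicts."""
    if isinstance(value, str):
        # An unknown variable raises KeyError rather than passing silently.
        return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), value)
    if isinstance(value, list):
        return [render(item, variables) for item in value]
    if isinstance(value, dict):
        return {key: render(item, variables) for key, item in value.items()}
    return value


variables = {"threshold": 80, "service": "web", "team": "platform"}
monitor = {
    "name": "{{ service }} CPU Usage",
    "type": "metric alert",
    "query": "avg(last_5m):avg:system.cpu.user{service:{{ service }}} > {{ threshold }}",
    "tags": ["team:{{ team }}"],
}
rendered = render(monitor, variables)
print(rendered["query"])
```

Note that the single braces of the Datadog scope (`{service:...}`) are left alone; only the doubled braces are treated as placeholders.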
Advanced Configuration
Composite Monitors
monitors:
  - name: "Service Health"
    type: "composite"
    query: "12345 && 67890"
    message: "Multiple conditions met"
    options:
      notify_no_data: false
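A composite query refers to existing monitors by their numeric IDs, so a useful pre-deployment check is to confirm that every referenced ID is known. The sketch below extracts the IDs from the query string; both function names are hypothetical.

```python
import re


def referenced_monitor_ids(query):
    """Return the monitor IDs mentioned in a composite query, in order."""
    return [int(token) for token in re.findall(r"\d+", query)]


def missing_monitor_ids(query, known_ids):
    """Return referenced IDs that are not in the set of known monitor IDs."""
    return [mid for mid in referenced_monitor_ids(query) if mid not in known_ids]


ids = referenced_monitor_ids("12345 && 67890")
print(ids)
print(missing_monitor_ids("12345 && 67890", {12345}))
```

How the set of known IDs is obtained (for example, from a previous deployment) is left open here.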
Scheduled Downtime
downtime:
  scope: "env:production"
  start: "2024-03-01T00:00:00Z"
  end: "2024-03-02T00:00:00Z"
  message: "Scheduled maintenance"
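The start and end values above are ISO 8601 timestamps in UTC. As a sketch of a sanity check one might run before deploying, the snippet below parses both and confirms the window is not empty or reversed. The helper names are hypothetical.

```python
from datetime import datetime


def parse_utc(timestamp):
    """Parse an ISO 8601 timestamp with a trailing "Z" into an aware datetime."""
    # datetime.fromisoformat rejects "Z" before Python 3.11, so normalize it.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def downtime_duration(downtime):
    """Return the downtime length as a timedelta, or raise if end <= start."""
    start = parse_utc(downtime["start"])
    end = parse_utc(downtime["end"])
    if end <= start:
        raise ValueError("downtime end must be after start")
    return end - start


downtime = {
    "scope": "env:production",
    "start": "2024-03-01T00:00:00Z",
    "end": "2024-03-02T00:00:00Z",
    "message": "Scheduled maintenance",
}
duration = downtime_duration(downtime)
print(duration)
```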
Monitor Groups
groups:
  infrastructure:
    monitors:
      - name: "CPU Alert"
        type: "metric alert"
        query: "avg(last_5m):avg:system.cpu.user{*} > 80"
      - name: "Memory Alert"
        type: "metric alert"
        query: "avg(last_5m):avg:system.mem.used{*} > 90"
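Grouped monitors eventually have to be handled one by one. The sketch below flattens a `groups` mapping into a single list and records each monitor's group as a tag. The `group:<name>` tag is an illustrative assumption, not documented behavior of the tool.

```python
def flatten_groups(groups):
    """Return every monitor from every group as one flat list.

    Each monitor is copied and tagged with "group:<name>" (a hypothetical
    convention) so its origin stays visible after flattening.
    """
    flat = []
    for group_name, group in groups.items():
        for monitor in group.get("monitors", []):
            entry = dict(monitor)
            entry["tags"] = list(monitor.get("tags", [])) + [f"group:{group_name}"]
            flat.append(entry)
    return flat


groups = {
    "infrastructure": {
        "monitors": [
            {
                "name": "CPU Alert",
                "type": "metric alert",
                "query": "avg(last_5m):avg:system.cpu.user{*} > 80",
            },
            {
                "name": "Memory Alert",
                "type": "metric alert",
                "query": "avg(last_5m):avg:system.mem.used{*} > 90",
            },
        ]
    }
}
flat = flatten_groups(groups)
print([m["name"] for m in flat])
```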
Environment-Specific Configuration
Using Environment Variables
monitors:
  - name: "${SERVICE_NAME} Alert"
    type: "metric alert"
    query: "avg(last_5m):avg:system.cpu.user{service:${SERVICE_NAME}} > ${THRESHOLD}"
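The `${VAR}` syntax matches what Python's standard `string.Template` understands, so the expansion can be sketched with it. This assumes the tool behaves similarly; the `expand` helper is hypothetical, and a real run would pass `os.environ` instead of the sample mapping.

```python
from string import Template


def expand(value, env):
    """Replace ${NAME} references using env; a missing name raises KeyError."""
    return Template(value).substitute(env)


# Sample values standing in for os.environ.
env = {"SERVICE_NAME": "web", "THRESHOLD": "80"}

name = expand("${SERVICE_NAME} Alert", env)
query = expand(
    "avg(last_5m):avg:system.cpu.user{service:${SERVICE_NAME}} > ${THRESHOLD}", env
)
print(name)
print(query)
```

Using `substitute` rather than `safe_substitute` means an unset variable fails loudly instead of leaving a literal `${...}` in the deployed query.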
Environment Overrides
environments:
  production:
    threshold: 90
    notification_channel: "#prod-alerts"
  staging:
    threshold: 80
    notification_channel: "#staging-alerts"
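Selecting the overrides for one environment can be as simple as a dictionary lookup with a clear error for unknown names. The sketch below assumes the `environments` mapping shown above; `settings_for` is a hypothetical helper.

```python
def settings_for(environments, name):
    """Return a copy of the override settings for the named environment."""
    if name not in environments:
        known = ", ".join(sorted(environments))
        raise ValueError(f"unknown environment {name!r}; expected one of: {known}")
    return dict(environments[name])


environments = {
    "production": {"threshold": 90, "notification_channel": "#prod-alerts"},
    "staging": {"threshold": 80, "notification_channel": "#staging-alerts"},
}
prod = settings_for(environments, "production")
print(prod)
```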
Validation
Schema Validation
The tool validates configurations against a JSON schema that ensures:
- Required fields are present
- Field types are correct
- Values are within allowed ranges
- Enum values are valid
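The four checks above can be sketched in plain Python using the fields and monitor types listed in this guide. This is an illustration of the kinds of rules involved, not the tool's actual JSON schema, and `validate_monitor` is a hypothetical name.

```python
REQUIRED = {"name": str, "type": str, "query": str, "message": str}
OPTIONAL = {"tags": list, "priority": int, "restricted_roles": list, "options": dict}
MONITOR_TYPES = {
    "metric alert", "service check", "event alert", "query alert",
    "composite", "log alert", "process alert", "trace-analytics alert",
    "slo alert", "event-v2 alert", "audit alert", "rum alert",
    "ci-pipelines alert", "error-tracking alert",
}


def validate_monitor(monitor):
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    # Required fields must be present and correctly typed.
    for field, expected in REQUIRED.items():
        if field not in monitor:
            errors.append(f"missing required field: {field}")
        elif not isinstance(monitor[field], expected):
            errors.append(f"{field} must be of type {expected.__name__}")
    # Optional fields are only type-checked when present.
    for field, expected in OPTIONAL.items():
        if field in monitor and not isinstance(monitor[field], expected):
            errors.append(f"{field} must be of type {expected.__name__}")
    # Enum check on the monitor type.
    mtype = monitor.get("type")
    if isinstance(mtype, str) and mtype not in MONITOR_TYPES:
        errors.append(f"unsupported monitor type: {mtype}")
    # Range check on priority (1-5).
    priority = monitor.get("priority")
    if isinstance(priority, int) and not 1 <= priority <= 5:
        errors.append("priority must be between 1 and 5")
    return errors


good = {"name": "CPU Alert", "type": "metric alert", "query": "q", "message": "m", "priority": 3}
bad = {"name": "x", "type": "bogus", "priority": 9}
print(validate_monitor(good))
print(validate_monitor(bad))
```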
Query Validation
Queries are validated for:
- Syntax correctness
- Metric existence
- Tag validity
- Function support
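Of these checks, only syntax can be verified offline; metric and tag existence require the DataDog API. As a rough sketch of a syntax check, the pattern below recognizes only the simple query shape used in this guide's examples and would reject many valid queries. The pattern and `parse_metric_query` are illustrative assumptions.

```python
import re

# Recognizes queries shaped like: avg(last_5m):avg:system.cpu.user{*} > 80
METRIC_QUERY = re.compile(
    r"^(?P<time_agg>avg|sum|min|max)\((?P<window>last_\d+[mhd])\):"
    r"(?P<space_agg>avg|sum|min|max):"
    r"(?P<metric>[A-Za-z][\w.]*)"
    r"\{(?P<scope>[^{}]*)\}"
    r"\s*(?P<op>>=|<=|>|<)\s*"
    r"(?P<threshold>-?\d+(?:\.\d+)?)$"
)


def parse_metric_query(query):
    """Return the query's parts as a dict, or None if the shape is not recognized."""
    match = METRIC_QUERY.match(query)
    return match.groupdict() if match else None


parts = parse_metric_query("avg(last_5m):avg:system.cpu.user{*} > 80")
print(parts)
```

A `None` result from this sketch means only that the query falls outside the simple shape, not that DataDog would reject it.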
Best Practices
- Naming Conventions
  - Use descriptive names
  - Include environment/service
  - Be consistent
- Organization
  - Group related monitors
  - Use templates for common patterns
  - Maintain clear structure
- Version Control
  - Commit configurations
  - Use meaningful commits
  - Review changes
- Documentation
  - Comment complex queries
  - Include runbooks
  - Document variables
Additional Resources
- Monitor Types - Examples of different monitor types
- Templating Guide - Advanced templating usage
- Best Practices - Configuration best practices
- DataDog API Documentation