Getting Started - fleXRPL/datadog-monitor-deployer GitHub Wiki
This guide will help you get started with the DataDog Monitor Deployer tool, covering basic concepts and initial setup.
Monitor as Code is an approach where monitoring configurations are defined in version-controlled files rather than through a UI. Benefits include:
- Version control
- Reproducibility
- Automation
- Consistency
- Code review process
- Disaster recovery
DataDog Monitor Deployer supports all monitor types:
- Metric monitors
- Log monitors
- APM monitors
- Process monitors
- Network monitors
- Event monitors
- Custom monitors
pip install datadog-monitor-deployer
Set up your DataDog credentials:
export DD_API_KEY='your-api-key'
export DD_APP_KEY='your-app-key'
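A missing key usually only surfaces when the first API call fails, so it can help to check the environment up front. The following is a minimal sketch; the helper name `missing_credentials` is hypothetical and not part of the tool.

```python
import os

# The two variable names come from the export commands above.
REQUIRED_VARS = ("DD_API_KEY", "DD_APP_KEY")


def missing_credentials(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Report the result for the current shell environment.
missing = missing_credentials()
if missing:
    print("Missing DataDog credentials: " + ", ".join(missing))
else:
    print("DataDog credentials found in the environment.")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.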
mkdir my-monitors
cd my-monitors
Recommended project structure:
my-monitors/
├── monitors/
│ ├── infrastructure/
│ │ ├── cpu.yaml
│ │ └── memory.yaml
│ ├── application/
│ │ ├── errors.yaml
│ │ └── latency.yaml
│ └── business/
│ └── transactions.yaml
├── templates/
│ └── common.yaml
└── README.md
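The layout above can be created by hand, or scaffolded with a few lines of Python. This is only a sketch: the `scaffold` helper is hypothetical, and the files it creates are empty placeholders to be filled in later.

```python
from pathlib import Path

# Relative paths copied from the recommended project structure.
LAYOUT = [
    "monitors/infrastructure/cpu.yaml",
    "monitors/infrastructure/memory.yaml",
    "monitors/application/errors.yaml",
    "monitors/application/latency.yaml",
    "monitors/business/transactions.yaml",
    "templates/common.yaml",
    "README.md",
]


def scaffold(root):
    """Create the directory tree under root; return the files that were newly created."""
    root = Path(root)
    created = []
    for relative in LAYOUT:
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.touch()  # empty placeholder, never overwrites existing files
            created.append(path)
    return created
```

Calling `scaffold("my-monitors")` a second time creates nothing, so it is safe to rerun in an existing project.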
The dd-monitor command provides several operations:
# List all monitors
dd-monitor list
# Deploy monitors
dd-monitor deploy monitors/infrastructure/cpu.yaml
# Validate configuration
dd-monitor validate monitors/infrastructure/cpu.yaml
# Delete a monitor
dd-monitor delete <monitor-id>
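The validate and deploy commands are often chained so that a broken file never reaches DataDog. The sketch below wraps the two commands shown above; it assumes that `dd-monitor` exits with a non-zero status when validation or deployment fails, which this guide does not state. The function name `validate_then_deploy` is hypothetical.

```python
import subprocess


def validate_then_deploy(config_path, run=subprocess.run):
    """Deploy config_path only if validation succeeds.

    Assumption: dd-monitor signals failure through a non-zero exit status.
    The run parameter exists so the commands can be replaced in tests.
    """
    validation = run(["dd-monitor", "validate", config_path])
    if validation.returncode != 0:
        return False  # stop before deploying an invalid configuration
    deployment = run(["dd-monitor", "deploy", config_path])
    return deployment.returncode == 0
```

For example, `validate_then_deploy("monitors/infrastructure/cpu.yaml")` returns `True` only when both commands succeed.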
Basic monitor configuration:
monitors:
  - name: "Service CPU Usage"
    type: "metric alert"
    query: "avg(last_5m):avg:system.cpu.user{service:myapp} > 80"
    message: |
      Service is experiencing high CPU usage.
      {{#is_alert}}
      Current value: {{value}}
      {{/is_alert}}
    tags:
      - "team:platform"
      - "env:production"
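The `query` string packs several settings into one line: a time aggregation and window, a space aggregation, the metric name, a tag scope, and the alert condition. The sketch below splits a simple metric query of the shape used above into those parts. The pattern is illustrative only; it does not cover grouping (`by {...}`), arithmetic, or other query features, and `parse_query` is a hypothetical helper.

```python
import re

# Matches simple queries such as:
#   avg(last_5m):avg:system.cpu.user{service:myapp} > 80
QUERY_PATTERN = re.compile(
    r"^(?P<time_aggr>\w+)\((?P<window>\w+)\):"   # avg(last_5m):
    r"(?P<space_aggr>\w+):"                       # avg:
    r"(?P<metric>[\w.]+)"                         # system.cpu.user
    r"\{(?P<scope>[^}]*)\}"                       # {service:myapp}
    r"\s*(?P<comparator>>=|<=|>|<)\s*"            # >
    r"(?P<threshold>-?\d+(?:\.\d+)?)$"            # 80
)


def parse_query(query):
    """Split a simple metric query into named parts; raise ValueError otherwise."""
    match = QUERY_PATTERN.match(query.strip())
    if match is None:
        raise ValueError(f"unsupported query shape: {query!r}")
    parts = match.groupdict()
    parts["threshold"] = float(parts["threshold"])
    return parts


parts = parse_query("avg(last_5m):avg:system.cpu.user{service:myapp} > 80")
print(parts["metric"], parts["scope"], parts["comparator"], parts["threshold"])
```

Reading the example this way: alert when the 5-minute average of `system.cpu.user`, averaged across everything tagged `service:myapp`, is above 80.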
monitors:
  - name: "{{ service }} CPU Usage"
    type: "metric alert"
    query: "avg(last_5m):avg:system.cpu.user{service:{{ service }}} > {{ threshold }}"
    variables:
      service: "myapp"
      threshold: 80
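To make the substitution concrete, here is a small sketch of how `{{ name }}` placeholders could be filled from the `variables` mapping. This guide does not describe the tool's actual templating engine, so treat this as an illustration of the idea rather than its implementation; `render` is a hypothetical helper.

```python
import re

# Matches {{ name }} with optional spaces around a plain identifier.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(text, variables):
    """Replace {{ name }} with variables[name]; leave unknown placeholders untouched."""

    def substitute(match):
        name = match.group(1)
        if name in variables:
            return str(variables[name])
        return match.group(0)  # e.g. {{value}}, which DataDog fills in at alert time

    return PLACEHOLDER.sub(substitute, text)


variables = {"service": "myapp", "threshold": 80}
query = "avg(last_5m):avg:system.cpu.user{service:{{ service }}} > {{ threshold }}"
print(render(query, variables))
```

Leaving unknown placeholders in place matters because notification messages also use double-brace syntax such as `{{value}}` and `{{#is_alert}}`, which must reach DataDog unchanged.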
# templates/base.yaml
template:
  defaults:
    tags:
      - "team:platform"
      - "env:production"
    options:
      notify_no_data: true
      evaluation_delay: 900

# monitors/cpu.yaml
monitors:
  - template: base
    name: "CPU Alert"
    type: "metric alert"
    query: "avg(last_5m):avg:system.cpu.user{*} > 80"
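One way to picture template inheritance is as a merge: the template's `defaults` are copied, then the monitor's own fields are laid on top. The sketch below assumes that monitor values win over defaults and that nested mappings such as `options` are merged one level deep; how the tool actually combines them (for example, whether `tags` lists are replaced or concatenated) is not stated in this guide. `apply_template` is a hypothetical helper.

```python
import copy


def apply_template(defaults, monitor):
    """Return a new monitor definition with template defaults filled in.

    Assumptions: monitor fields override defaults, and dict values
    (such as options) are merged one level deep. Lists are replaced.
    """
    merged = copy.deepcopy(defaults)
    for key, value in monitor.items():
        if key == "template":
            continue  # the reference itself is not part of the final monitor
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = {**merged[key], **value}
        else:
            merged[key] = value
    return merged


defaults = {
    "tags": ["team:platform", "env:production"],
    "options": {"notify_no_data": True, "evaluation_delay": 900},
}
monitor = {
    "template": "base",
    "name": "CPU Alert",
    "type": "metric alert",
    "query": "avg(last_5m):avg:system.cpu.user{*} > 80",
}
print(apply_template(defaults, monitor))
```

Under these assumptions the resulting monitor carries the template's tags and options alongside its own name, type, and query.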
Organization:
- Use consistent naming conventions
- Group related monitors
- Use templates for common patterns
Version Control:
- Commit monitor configurations to Git
- Use branches for changes
- Review changes through PRs
Automation:
- Integrate with CI/CD pipelines
- Automate validation
- Use automated testing
Documentation:
- Document monitor purpose
- Include runbooks
- Maintain change history
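As an illustration of automated validation, a CI job could walk the `monitors/` directory and validate every file before anything is deployed. The sketch below reuses the `dd-monitor validate` command from this guide and assumes it returns a non-zero exit status for an invalid file, which is not confirmed here. `validate_all` is a hypothetical helper.

```python
import subprocess
from pathlib import Path


def validate_all(monitors_dir, run=subprocess.run):
    """Validate every .yaml file under monitors_dir; return the paths that failed.

    Assumption: dd-monitor exits with a non-zero status on invalid input.
    """
    failed = []
    for config in sorted(Path(monitors_dir).rglob("*.yaml")):
        result = run(["dd-monitor", "validate", str(config)])
        if result.returncode != 0:
            failed.append(config)
    return failed
```

A pipeline step can then fail the build whenever `validate_all("monitors")` returns a non-empty list, so invalid configurations are caught during review rather than at deploy time.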
1. Create configuration file
2. Validate configuration
3. Test in development
4. Deploy to production
5. Verify in DataDog UI

1. Modify configuration
2. Validate changes
3. Review differences
4. Deploy updates
5. Verify changes
- Learn about different monitor types
- Create your First Monitor
- Explore advanced templating
- Review best practices