DataDog monitoring - department-of-veterans-affairs/abd-vro GitHub Wiki

How to get access

The DOTS team is no longer responsible for Datadog and will not be resolving Datadog requests.

How to Submit a ServiceNow Ticket for DataDog Requests

The steps below show how to submit new monitoring requests, or updates to existing monitoring requests, to the ECC group. This group is now responsible for ALL Datadog issues, such as adding or modifying user roles, Enterprise monitoring, dashboards, or alerts for an Application Service, and adding or removing services, network devices, or other equipment to/from monitoring.

  1. Go to the ServiceNow Portal at ECC (Enterprise Command Center) Monitoring Services - your IT Service Portal (va.gov).
  2. Complete the HelpDesk form. Use the drop-down to select the application option where available.
  3. Follow the remaining steps as applicable.
  4. Attach any supporting documents, such as a list of servers, using the “paperclip” in the lower right-hand corner of the form.

Someone on the Monitoring Design Team will contact you to discuss your requirements. To check the status of your ticket/request, go to https://yourit.va.gov/va and click on your name in the upper right corner of the screen.

DataDog

Log in using LHDI's Okta sign-in page: https://ablevets-dots-va.okta.com/

Our deprecated DataDog account:

Custom Metrics

We've received conflicting feedback regarding the use of custom metrics and Datadog's REST API. Please see the clarified use-case information below. The REST API is available for judicious use, provided you are aware of the following:

  1. Get the Datadog API and APP Key Environment Variables:
  • VRO Tenants are encouraged to use the shared global helm template that has been populated in each LHDI deployment of VRO. To reference this shared global template, _datadog.tpl, in your project's helm, add the following to the "env" section of your deployment.yaml: {{- include "vro.datadog.envVars" . | nindent 12 }}
  • Use the EP Merge app as an example.
  • Please note that the environment variables as expected by the Datadog Python SDK are as follows:
  2. For more relevant documentation and additional API example code, please access the following docs:
Example call:
## Dynamic Points
# Post time-series data that can be graphed on Datadog's dashboards.
# Curl command
curl -X POST "https://api.ddog-gov.com/api/v2/series" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-H "DD-API-KEY: ${DD_API_KEY}" \
-d @- << EOF
{
  "series": [
    {
      "metric": "system.load.1",
      "type": 0,
      "points": [
        {
          "timestamp": 1703868203,
          "value": 0.6
        }
      ],
      "resources": [
        {
          "name": "dummyhost",
          "type": "host"
        }
      ]
    }
  ]
}
EOF
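
For tenants calling the API from Python rather than curl, the same request can be sketched with the standard library alone. This is a minimal sketch, not the Datadog Python SDK: the endpoint, headers, payload shape, and the DD_API_KEY variable are copied from the curl example above, while the helper names build_series_request and send and the metric name vro.example.failures are hypothetical and chosen only for illustration.

```python
import json
import os
import time
import urllib.request

# Endpoint taken from the curl example above.
SERIES_URL = "https://api.ddog-gov.com/api/v2/series"


def build_series_request(api_key, metric, value, host, timestamp=None):
    """Build (but do not send) a POST request carrying one metric point."""
    if timestamp is None:
        timestamp = time.time()
    payload = {
        "series": [
            {
                "metric": metric,
                "type": 0,  # same value as the curl example
                "points": [{"timestamp": int(timestamp), "value": value}],
                "resources": [{"name": host, "type": "host"}],
            }
        ]
    }
    return urllib.request.Request(
        SERIES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
        },
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Hypothetical metric name; the host mirrors the curl example.
# The key falls back to an empty string so the request can be built offline.
request = build_series_request(
    api_key=os.environ.get("DD_API_KEY", ""),
    metric="vro.example.failures",
    value=1,
    host="dummyhost",
)
print(request.get_method(), request.full_url)
```

Building the request separately from sending it lets you inspect the payload before anything leaves the pod. Calling send(request) performs the actual submission and assumes DD_API_KEY is set in the environment.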

Be Mindful - When Using Datadog Custom Metrics

If used incorrectly, custom metrics can become prohibitively expensive in Datadog.

The main issue arises when custom metrics are combined with highly variable tags (such as an ICN), which can greatly increase the cost. This is because we are charged for all the metric and tag combinations used during a billing period. For example, if we had a single failure metric tagged with ICN, and there were failures in a month for 1000 different users, we would be charged for 1000 metric/tag combinations. So, in general, we need to be mindful not to add unnecessary tags to any metrics we create.
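
The arithmetic in the example above can be sketched in a few lines of Python. This is a simplified model that just counts distinct metric/tag combinations as described in this section; it is not Datadog's actual billing calculation. The metric name, the tag values, and the helper count_metric_tag_combinations are all hypothetical, and the ICN values are placeholders rather than real identifiers.

```python
def count_metric_tag_combinations(submissions):
    """Count distinct (metric name, tag set) pairs among submitted points."""
    return len({(metric, frozenset(tags)) for metric, tags in submissions})


# 1000 failures, each tagged with a different (placeholder) ICN value.
tagged_by_icn = [
    ("vro.example.failures", [f"icn:placeholder-{n}"]) for n in range(1000)
]

# The same 1000 failures, tagged only with a low-cardinality environment tag.
tagged_by_env = [("vro.example.failures", ["env:prod"]) for _ in range(1000)]

print(count_metric_tag_combinations(tagged_by_icn))  # 1000
print(count_metric_tag_combinations(tagged_by_env))  # 1
```

The number of failures is identical in both cases. Only the choice of tag changes how many combinations are counted.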

Postgres RDS Metrics

LHDI now supports RDS metrics for Postgres. Once enabled, you can see Postgres metrics in Datadog using the metrics explorer.