Monitoring using graphite - datacratic/rtbkit GitHub Wiki
In this document we will cover the basics of monitoring RTBkit using Graphite and Zabbix. Graphite allows us to navigate and comb through the hundreds of metrics that the RTBkit stack generates every second. Those metrics help us understand what is going on under the hood and, with a good understanding of what they represent, they can tell us whether the stack is healthy or not. While Graphite is great at visually representing those values, it falls short when it is time to trigger events or alert on certain conditions. This is where we pair it with Zabbix, an open source tool for monitoring infrastructure and applications.
At first glance, the sheer amount of data available in Graphite might be overwhelming. Graphite displays the metrics in a way similar to a folder tree but, in reality, they are dot-separated keys. The keys have been named in a logical manner, but knowing exactly what the values mean will usually require a trip to the source code.
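To make the relationship between the flat keys and the folder-like view concrete, here is a minimal Python 3 sketch that nests dot-separated keys into a tree. The helper name `build_tree` is made up for this illustration and is not part of RTBkit or Graphite.

```python
def build_tree(keys):
    """Nest dot-separated metric keys into a dict-of-dicts tree."""
    tree = {}
    for key in keys:
        node = tree
        # Each dot-separated part becomes one level of the "folder" tree.
        for part in key.split("."):
            node = node.setdefault(part, {})
    return tree


keys = [
    "router.bid",
    "postAuction.bidResult.WIN.messagesReceived",
    "postAuction.delivery.IMPRESSION.delivered",
    "postAuction.delivery.CLICK.delivered",
]
tree = build_tree(keys)
```

Leaves end up as empty dictionaries, so `tree["postAuction"]["delivery"]` holds both the `IMPRESSION` and `CLICK` branches, much like two sub-folders in the Graphite browser.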
Here are some basic keys that will help you know whether you are bidding, winning, and seeing impressions and clicks:
Key | Description |
---|---|
router.bid | Total number of bids sent per second |
postAuction.bidResult.WIN.messagesReceived | Total number of wins received per second |
postAuction.delivery.IMPRESSION.delivered | Total number of impressions received per second |
postAuction.delivery.CLICK.delivered | Total number of clicks received per second |
A more detailed list is available here
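Any of these keys can be fetched over HTTP through Graphite's render endpoint. The Python 3 sketch below builds such a URL with the same `from` and `rawData` parameters used elsewhere on this page. The server name `graphite.example.com` and the helper `render_url` are placeholders invented for this example. Recent Graphite documentation lists `rawData` as deprecated in favor of `format=raw`, so check which one your installation expects.

```python
from urllib.parse import urlencode

# Hypothetical server name; replace it with your own Graphite host.
GRAPHITE_BASE = "https://graphite.example.com"


def render_url(target, window="-5minutes"):
    """Build a render URL that returns raw samples for one target."""
    query = urlencode({"from": window, "rawData": "true", "target": target})
    return "%s/render?%s" % (GRAPHITE_BASE, query)


url = render_url("router.bid")
```

Because the target is URL-encoded, a wildcard such as `postAuction.delivery.*.delivered` is sent as `%2A`, which the server decodes back to `*`.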
Zabbix offers a broad range of checks that are useful for monitoring the health of a system but doesn't support Graphite out of the box.
Meant to be used as an example, the following script can be run from cron and will take care of linking the two systems. It starts by retrieving the list of items monitored in Zabbix and extracts the ones whose key matches the pattern "graphite[(.*)]". The next step is to query Graphite for the metric captured by (.*), and the script finally pushes the resulting value back to Zabbix as a trapper message.
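The matching and averaging steps can be tried offline before touching either server. The Python 3 sketch below is an illustration only: the helper names and the sample response are invented, and it assumes the raw format `target,start,end,step|v1,v2,...` with `None` marking missing samples.

```python
import re

# Same pattern as the item keys described above.
ITEM_KEY = re.compile(r"^graphite\[(.*)\]$", re.I)


def graphite_target(item_key):
    """Return the target wrapped in a graphite[...] item key, or None."""
    match = ITEM_KEY.match(item_key)
    return match.group(1) if match else None


def series_means(raw_text):
    """Return the mean of each series in a raw response, skipping gaps."""
    means = []
    for line in raw_text.strip().splitlines():
        # Everything before "|" is the series header; samples follow it.
        _header, _, data = line.partition("|")
        samples = [float(s) for s in data.split(",")
                   if s.strip() not in ("", "None")]
        if samples:
            means.append(sum(samples) / len(samples))
    return means


# Invented sample response for a single series with two missing points.
sample = "router.bid,1700000000,1700000300,60|10.0,12.0,None,14.0,None"
target = graphite_target("graphite[router.bid]")
means = series_means(sample)
```

When a wildcard target returns several series, the per-series means can be averaged again to get a single value for Zabbix.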
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2 script: it relies on urllib2 and the print statement.

import re
import base64
import socket
import urllib2

from zabbix_api import ZabbixAPI

config = {
    'zabbix': {
        'tz': "UTC",
        'host': "127.0.0.1",
        'api': {
            'protocol': "http",
            'username': "api",
            'password': "!!!my-zabbix-password!!!",
            'rpcport': 10051,
            'path': "",
        },
    },
    'graphite': {
        'url': "https://!!!my-graphite-server!!!/render?from=-5minutes&rawData=true&target=",
    },
}


class Re(object):
    """Small wrapper that remembers the last match so it can be reused."""

    def __init__(self):
        self.lastMatch = None

    def match(self, pattern, text, opt=re.M | re.I):
        self.lastMatch = re.match(pattern, text, opt)
        return self.lastMatch

    def search(self, pattern, text):
        self.lastMatch = re.search(pattern, text)
        return self.lastMatch


class ZabbixSender():
    def Send(self, host, key, value):
        # Build the trap request; each field is base64 encoded
        request = """<req>
<host>%s</host>
<key>%s</key>
<data>%s</data>
</req>""" % (base64.b64encode(host),
             base64.b64encode(key),
             base64.b64encode(str(value)))
        return self.TcpSend(config['zabbix']['host'],
                            config['zabbix']['api']['rpcport'],
                            request)

    def TcpSend(self, host, port, message):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect((host, port))
        sock.sendall(message)
        reply = sock.recv(16384)  # Set a limit at 16k
        sock.close()
        return reply


if __name__ == "__main__":
    regex = Re()
    zabSender = ZabbixSender()

    # Connect to Zabbix
    zabApi = ZabbixAPI(server="%s://%s" % (config['zabbix']['api']['protocol'],
                                           config['zabbix']['host']),
                       path=config['zabbix']['api']['path'])
    zabApi.login(config['zabbix']['api']['username'],
                 config['zabbix']['api']['password'])

    # Query Zabbix to retrieve its list of hosts
    zabHosts = zabApi.host.get({'output': "extend"})

    # Query Zabbix to retrieve its list of items
    zabItems = zabApi.item.get({'output': "extend"})

    # Iterate through the items and find the ones we defined as "graphite[.*]"
    for item in zabItems:
        if regex.match(r"^graphite\[(.*)\]$", item['key_']):
            host = next(d for d in zabHosts if d['hostid'] == item['hostid'])['host']
            key = item['key_']

            # Request the corresponding data from Graphite
            http = urllib2.urlopen(config['graphite']['url'] + regex.lastMatch.group(1))
            lines = http.read().strip().split("\n")
            http.close()

            value = 0.0
            for line in lines:
                if "|" not in line:
                    continue  # Empty response or unexpected line
                # Graphite reports missing samples as "None"; skip them
                samples = [float(s) for s in line.split("|")[1].split(",")
                           if s != "None"]
                if samples:
                    value += sum(samples) / len(samples)

            # Average when the key contains stars
            if key.find("*") > -1 and len(lines) > 0:
                value /= len(lines)

            # Send the value back to Zabbix as a trap message
            print "Sending [value:%s] for [host:%s][key:%s]" % (value, host, key)
            zabSender.Send(host, key, value)
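The script above sends its value as a small base64-encoded XML request. To my understanding, more recent Zabbix servers expect trapper data as a JSON payload preceded by a `ZBXD` header, so whether the XML form is still accepted depends on your server version. The Python 3 sketch below only builds such a packet and does not open a socket. The function name `build_sender_packet` and the host name `rtbkit-host` are invented for this example, and the layout should be checked against the Zabbix documentation for your version.

```python
import json
import struct


def build_sender_packet(host, key, value):
    """Build a Zabbix sender packet carrying a single value."""
    payload = json.dumps({
        "request": "sender data",
        "data": [{"host": host, "key": key, "value": str(value)}],
    }).encode("utf-8")
    # Header: "ZBXD", protocol flag 0x01, then the payload length as an
    # 8-byte little-endian integer.
    return b"ZBXD\x01" + struct.pack("<Q", len(payload)) + payload


# "rtbkit-host" is a hypothetical host name as registered in Zabbix.
packet = build_sender_packet("rtbkit-host", "graphite[router.bid]", 42.5)
```

Keeping the packet construction separate from the socket code makes it easy to inspect the bytes before pointing the sender at a real trapper port.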
Here is an example of how a check on the key router.bid could be implemented in Zabbix:
- Define an Item that will gather data on our metric (for this example, an item whose key is graphite[router.bid]).
- Define a Trigger that will alert on certain conditions (such as value < 1).
- Launch the script outlined in the "Zabbix" section.
Additional checks can then be added by creating more of the graphite[my.key.here] items.