python prometheus - ghdrako/doc_snipets GitHub Wiki

Install the client library:

pip install prometheus_client

A minimal application that serves Hello World on port 8001 while exposing metrics on port 8000:

import http.server
from prometheus_client import start_http_server
class MyHandler(http.server.BaseHTTPRequestHandler):
  def do_GET(self):
    self.send_response(200)
    self.end_headers()
    self.wfile.write(b"Hello World")
if __name__ == "__main__":
  start_http_server(8000)
  server = http.server.HTTPServer(('localhost', 8001), MyHandler)
  server.serve_forever()

Configure prometheus.yml to scrape http://localhost:8000/metrics:

global:
  scrape_interval: 10s
scrape_configs:
  - job_name: example
    static_configs:
    - targets:
      - localhost:8000

Counter

Counters track either the number or size of events. They are mainly used to track how often a particular code path is executed.

REQUESTS tracks the number of Hello Worlds returned

from prometheus_client import Counter
REQUESTS = Counter('hello_worlds_total','Hello Worlds requested.')
class MyHandler(http.server.BaseHTTPRequestHandler):
  def do_GET(self):
    REQUESTS.inc()
    self.send_response(200)
    self.end_headers()
    self.wfile.write(b"Hello World")

Metrics are automatically registered with the client library in the **default registry**. A registry is a place where metrics are registered so they can be exposed; the default registry is the one used when serving /metrics.

You can then graph the per-second request rate with:

rate(hello_worlds_total[1m])

Counting Exceptions

import random
from prometheus_client import Counter
REQUESTS = Counter('hello_worlds_total','Hello Worlds requested.')
EXCEPTIONS = Counter('hello_world_exceptions_total','Exceptions serving Hello World.')
class MyHandler(http.server.BaseHTTPRequestHandler):
  def do_GET(self):
    REQUESTS.inc()
    with EXCEPTIONS.count_exceptions():
      if random.random() < 0.2:
        raise Exception
    self.send_response(200)
    self.end_headers()
    self.wfile.write(b"Hello World")

The per-second exception rate is then:

rate(hello_world_exceptions_total[1m])

The number of exceptions isn’t that useful without knowing how many requests are going through. You can calculate the more useful ratio of exceptions with:

rate(hello_world_exceptions_total[1m]) / rate(hello_worlds_total[1m])

You may notice gaps in the exception ratio graph for periods when there are no requests. This is because you are dividing by zero, which in floating-point math results in NaN, or Not a Number. Returning zero would be incorrect, as the exception ratio is not zero; it is undefined.

You can also use count_exceptions as a function decorator:

EXCEPTIONS = Counter('hello_world_exceptions_total','Exceptions serving Hello World.')
class MyHandler(http.server.BaseHTTPRequestHandler):
  @EXCEPTIONS.count_exceptions()
  def do_GET(self):
    ...

Counting Size

Prometheus uses 64-bit floating-point numbers for values, so you are not limited to incrementing counters by one. You can in fact increment counters by any non-negative number.

import random
from prometheus_client import Counter
REQUESTS = Counter('hello_worlds_total','Hello Worlds requested.')
SALES = Counter('hello_world_sales_euro_total','Euros made serving Hello World.')
class MyHandler(http.server.BaseHTTPRequestHandler):
  def do_GET(self):
    REQUESTS.inc()
    euros = random.random()
    SALES.inc(euros)
    self.send_response(200)
    self.end_headers()
    self.wfile.write("Hello World for {} euros.".format(euros).encode())

The Gauge

Gauges are a snapshot of some current state. For counters you care about how fast the value is increasing; for gauges you care about the actual value itself.

Gauges have three main methods you can use: inc, dec, and set. Similar to the methods on counters, inc and dec default to changing a gauge’s value by one.

import time
from prometheus_client import Gauge 
INPROGRESS = Gauge('hello_worlds_inprogress','Number of Hello Worlds in progress.')
LAST = Gauge('hello_world_last_time_seconds','The last time a Hello World was served.')
class MyHandler(http.server.BaseHTTPRequestHandler):
  def do_GET(self):
    INPROGRESS.inc()
    self.send_response(200)
    self.end_headers()
    self.wfile.write(b"Hello World")
    LAST.set(time.time())
    INPROGRESS.dec()

The main use case for such a metric is detecting when it has been too long since a request was handled. The PromQL expression time() - hello_world_last_time_seconds tells you how many seconds have passed since the last request. Both are very common use cases, so utility functions are provided for them: track_inprogress has the advantage of being shorter and of handling exceptions correctly for you, while set_to_current_time is a little less useful in Python, since time.time() already returns Unix time in seconds; in other languages' client libraries, however, the set_to_current_time equivalents make usage simpler and clearer.

from prometheus_client import Gauge
INPROGRESS = Gauge('hello_worlds_inprogress','Number of Hello Worlds in progress.')
LAST = Gauge('hello_world_last_time_seconds','The last time a Hello World was served.')
class MyHandler(http.server.BaseHTTPRequestHandler):
  @INPROGRESS.track_inprogress()
  def do_GET(self):
    self.send_response(200)
    self.end_headers()
    self.wfile.write(b"Hello World")
    LAST.set_to_current_time()

It is also strongly recommended that you include the unit of your metric at the end of its name. For example, a counter for bytes processed might be myapp_requests_processed_bytes_total.

Callbacks

In Python, gauges have a set_function method, which allows you to specify a function to be called at exposition time. Your function must return a floating-point value for the metric when called. For example, exposing the current time:

import time
from prometheus_client import Gauge
TIME = Gauge('time_seconds', 'The current time.')
TIME.set_function(lambda: time.time())

A complete counter application, exposing metrics on port 5555 while serving the application itself on port 5551:

import http.server
from prometheus_client import Counter, start_http_server
http_requests = Counter('my_app_http_request', 'Number of HTTP requests.')
class SampleServer(http.server.BaseHTTPRequestHandler):
  def do_GET(self):
    http_requests.inc()
    self.send_response(200)
    self.end_headers()
    self.wfile.write(b"Simple Counter Example")
if __name__ == "__main__":
  start_http_server(5555)
  server = http.server.HTTPServer(('localhost', 5551), SampleServer)
  server.serve_forever()

Save this code as counter.py and run it with python counter.py; it will start a simple HTTP server.

To ingest these metrics into Prometheus, add the application as a target under the scrape_configs section of prometheus.yml:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
  - job_name: node
    static_configs:
    - targets:
      - localhost:9100
  - job_name: my_application
    static_configs:
    - targets:
      - localhost:5555

Once you restart Prometheus, you will be able to query the metric in the expression browser with the PromQL expression rate(my_app_http_request_total[1m]) (the Python client appends the _total suffix to counters at exposition). The counter increases every time you hit the application URL at http://localhost:5551.

Gauge

from prometheus_client import Gauge
sample_gauge_1 = Gauge('my_increment_example_requests', 'Description of increment gauge')
sample_gauge_2 = Gauge('my_decrement_example_requests', 'Description of decrement gauge')
sample_gauge_3 = Gauge('my_set_example_requests', 'Description of set gauge')
sample_gauge_1.inc() # This will increment by 1
sample_gauge_2.dec(10) # This will decrement by given value of 10
sample_gauge_3.set(48) # This will set to the given value of 48

The inc() method increments my_increment_example_requests by 1, dec(10) decrements my_decrement_example_requests by 10, and set(48) sets my_set_example_requests to 48.

Summary

import http.server
import time
from prometheus_client import Summary, start_http_server
LATENCY = Summary('latency_in_seconds','Time for a request')
class SampleServer(http.server.BaseHTTPRequestHandler):
  def do_GET(self):
    start_time = time.time()
    self.send_response(200)
    self.end_headers()
    self.wfile.write(b"My application with a Summary metric")
    LATENCY.observe(time.time() - start_time)
if __name__ == "__main__":
  start_http_server(5555)
  server = http.server.HTTPServer(('localhost', 5551), SampleServer)
  server.serve_forever()

The metrics endpoint at http://localhost:5555/metrics will have two time series: latency_in_seconds_count and latency_in_seconds_sum. The former is the number of observe calls that were made and the latter is the sum of the values passed to observe. If you divide the rate of the sum by the rate of the count, rate(latency_in_seconds_sum[1m]) / rate(latency_in_seconds_count[1m]), you get the average latency over the last minute.

Histograms

Histograms track the size and number of events in buckets, and allow you to calculate quantiles in a way that can be aggregated. They are typically used to measure the duration of HTTP requests. Histogram instrumentation also uses the observe method, and you can combine it with timing code to track latency.

The following code adds 10 seconds as a request latency:

from prometheus_client import Histogram
req_latency = Histogram('request_latency_seconds', 'Description of histogram')
req_latency.observe(10) # Observe 10 (seconds in this case)

Instrumentation is usually applied at the service level or to the client libraries that are being used in the services. It’s important to note that there are limits to the level of instrumentation that you can/should add, even though Prometheus is efficient in handling multiple metrics.