# python prometheus - ghdrako/doc_snipets GitHub Wiki

Install the client library:

```shell
pip install prometheus_client
```
```python
import http.server
from prometheus_client import start_http_server

class MyHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello World")

if __name__ == "__main__":
    start_http_server(8000)
    server = http.server.HTTPServer(('localhost', 8001), MyHandler)
    server.serve_forever()
```
Configure prometheus.yml to scrape http://localhost:8000/metrics:
```yaml
global:
  scrape_interval: 10s
scrape_configs:
  - job_name: example
    static_configs:
      - targets:
          - localhost:8000
```
## Counter
Counters track either the number or size of events. They are mainly used to track how often a particular code path is executed.
Here REQUESTS tracks the number of Hello Worlds returned:
```python
from prometheus_client import Counter

REQUESTS = Counter('hello_worlds_total', 'Hello Worlds requested.')

class MyHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        REQUESTS.inc()
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello World")
```
Metrics are automatically registered with the client library in the **default registry**. A registry is a place where metrics are registered in order to be exposed. The default registry is the one used when querying /metrics.
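To see what the default registry holds, you can render it yourself with generate_latest, which produces the same text exposition format that /metrics serves. This is a minimal sketch; the metric name demo_requests is a made-up example, not from the original text:

```python
from prometheus_client import Counter, generate_latest, REGISTRY

# hypothetical metric for illustration; the client appends _total to counters
DEMO = Counter('demo_requests', 'Demo counter.')
DEMO.inc()

# generate_latest() renders every metric registered in the default registry
# in the text exposition format, exactly what /metrics would serve
print(generate_latest(REGISTRY).decode())
```

The output includes the HELP and TYPE comment lines alongside the sample `demo_requests_total 1.0`.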
```
rate(hello_worlds_total[1m])
```
## Counting Exceptions
```python
import random
from prometheus_client import Counter

REQUESTS = Counter('hello_worlds_total', 'Hello Worlds requested.')
EXCEPTIONS = Counter('hello_world_exceptions_total', 'Exceptions serving Hello World.')

class MyHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        REQUESTS.inc()
        with EXCEPTIONS.count_exceptions():
            if random.random() < 0.2:
                raise Exception
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello World")
```
```
rate(hello_world_exceptions_total[1m])
```
The number of exceptions isn’t that useful without knowing how many requests are going through. You can calculate the more useful ratio of exceptions with:
```
rate(hello_world_exceptions_total[1m]) / rate(hello_worlds_total[1m])
```
You may notice gaps in the exception ratio graph for periods when there are no requests. This is because you are dividing by zero, which in floating-point math results in a NaN, or Not a Number. Returning a zero would be incorrect, as the exception ratio is not zero; it is undefined.
You can also use count_exceptions as a function decorator:
```python
EXCEPTIONS = Counter('hello_world_exceptions_total', 'Exceptions serving Hello World.')

class MyHandler(http.server.BaseHTTPRequestHandler):
    @EXCEPTIONS.count_exceptions()
    def do_GET(self):
        ...
```
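count_exceptions also accepts an exception class, in which case only that type is counted; anything else still propagates but does not increment the counter. A minimal runnable sketch, assuming a made-up counter name and function (not from the original text):

```python
from prometheus_client import Counter, REGISTRY

# hypothetical counter for illustration
NEG_INPUTS = Counter('negative_inputs_total', 'Negative inputs seen.')

# only ValueError is counted; the exception is still re-raised either way
@NEG_INPUTS.count_exceptions(ValueError)
def greet(value):
    if value < 0:
        raise ValueError("negative value")
    return "Hello World"
```

After one failed call, REGISTRY.get_sample_value('negative_inputs_total') reads 1.0.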
## Counting Size
Prometheus uses 64-bit floating-point numbers for values, so you are not limited to incrementing counters by one. You can in fact increment counters by any non-negative number.
```python
import random
from prometheus_client import Counter

REQUESTS = Counter('hello_worlds_total', 'Hello Worlds requested.')
SALES = Counter('hello_world_sales_euro_total', 'Euros made serving Hello World.')

class MyHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        REQUESTS.inc()
        euros = random.random()
        SALES.inc(euros)
        self.send_response(200)
        self.end_headers()
        self.wfile.write("Hello World for {} euros.".format(euros).encode())
```
## The Gauge
Gauges are a snapshot of some current state. While for counters you care about how fast the value is increasing, for gauges it is the current value itself that matters. Gauges have three main methods you can use: inc, dec, and set. Similar to the methods on counters, inc and dec default to changing a gauge's value by one.
```python
import time
from prometheus_client import Gauge

INPROGRESS = Gauge('hello_worlds_inprogress', 'Number of Hello Worlds in progress.')
LAST = Gauge('hello_world_last_time_seconds', 'The last time a Hello World was served.')

class MyHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        INPROGRESS.inc()
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello World")
        LAST.set(time.time())
        INPROGRESS.dec()
```
The main use case for such a metric is detecting when it has been too long since a request was handled. The PromQL expression

```
time() - hello_world_last_time_seconds
```

will tell you how many seconds have passed since the last request.
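That expression lends itself to alerting. A minimal sketch of a hypothetical Prometheus alerting rule; the alert name and the 300-second threshold are assumptions, not from the original text:

```yaml
groups:
  - name: example
    rules:
      - alert: NoRecentHelloWorlds
        # fire if no Hello World has been served for over 5 minutes
        expr: time() - hello_world_last_time_seconds > 300
        for: 5m
```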
These are both very common use cases, so utility functions are provided for them, as shown in the next example. track_inprogress has the advantage of being both shorter and taking care of correctly handling exceptions for you. set_to_current_time is a little less useful in Python, as time.time() returns Unix time in seconds; but in other languages' client libraries, the set_to_current_time equivalents make usage simpler and clearer.
```python
from prometheus_client import Gauge

INPROGRESS = Gauge('hello_worlds_inprogress', 'Number of Hello Worlds in progress.')
LAST = Gauge('hello_world_last_time_seconds', 'The last time a Hello World was served.')

class MyHandler(http.server.BaseHTTPRequestHandler):
    @INPROGRESS.track_inprogress()
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello World")
        LAST.set_to_current_time()
```
It is also strongly recommended that you include the unit of your metric at the end of its name. For example, a counter for bytes processed might be myapp_requests_processed_bytes_total.
## Callbacks
In Python, gauges have a set_function method, which allows you to specify a function to be called at exposition time. Your function must return a floating-point value for the metric when called.
```python
import time
from prometheus_client import Gauge

TIME = Gauge('time_seconds', 'The current time.')
TIME.set_function(lambda: time.time())
```
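Callbacks are handy for exposing application state you do not want to update manually on every change. A sketch with a made-up gauge name and a plain deque standing in for real application state:

```python
import collections
from prometheus_client import Gauge, REGISTRY

# hypothetical application state: a work queue
queue = collections.deque(['a', 'b', 'c'])

QUEUE_SIZE = Gauge('myapp_queue_items', 'Items waiting in the queue.')
# the callback is evaluated at exposition time, so the gauge always
# reflects the queue length at the moment of the scrape
QUEUE_SIZE.set_function(lambda: len(queue))

print(REGISTRY.get_sample_value('myapp_queue_items'))
```

Pushing or popping items changes the reported value on the next scrape with no further instrumentation calls.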
```python
import http.server
from prometheus_client import Counter, start_http_server

http_requests = Counter('my_app_http_request', 'Number of HTTP requests.')

class SampleServer(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        http_requests.inc()
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Simple Counter Example")

if __name__ == "__main__":
    start_http_server(5555)
    server = http.server.HTTPServer(('localhost', 5551), SampleServer)
    server.serve_forever()
```
Run this code as python counter.py in your terminal and it will start a simple HTTP server. To ingest its metrics into Prometheus, configure scrape_configs in the prometheus.yml file as follows:
```yaml
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: node
    static_configs:
      - targets:
          - localhost:9100
  - job_name: my_application
    static_configs:
      - targets:
          - localhost:5555
```
Once you restart Prometheus, you will be able to access the metric through the expression browser via the PromQL expression

```
rate(my_app_http_request_total[1m])
```

The counter will increase every time you hit the application URL at http://localhost:5551.
## Gauge
```python
from prometheus_client import Gauge

sample_gauge_1 = Gauge('my_increment_example_requests', 'Description of increment gauge')
sample_gauge_2 = Gauge('my_decrement_example_requests', 'Description of decrement gauge')
sample_gauge_3 = Gauge('my_set_example_requests', 'Description of set gauge')

sample_gauge_1.inc()    # This will increment by 1
sample_gauge_2.dec(10)  # This will decrement by the given value of 10
sample_gauge_3.set(48)  # This will set to the given value of 48
```
The inc() method increments the value of my_increment_example_requests by 1, dec(10) decrements the value of my_decrement_example_requests by 10, and set(48) sets the value of my_set_example_requests to 48.
## Summary
```python
import http.server
import time
from prometheus_client import Summary, start_http_server

LATENCY = Summary('latency_in_seconds', 'Time for a request')

class SampleServer(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        start_time = time.time()
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"My application with a Summary metric")
        LATENCY.observe(time.time() - start_time)

if __name__ == "__main__":
    start_http_server(5555)
    server = http.server.HTTPServer(('localhost', 5551), SampleServer)
    server.serve_forever()
```
The metric endpoint at http://localhost:5555/metrics will have two time series: latency_in_seconds_count and latency_in_seconds_sum. The former represents the number of observe calls that were made, and the latter is the sum of the values passed to observe. If you divide the rate of the sum by the rate of the count:

```
rate(latency_in_seconds_sum[1m]) / rate(latency_in_seconds_count[1m])
```

you will get the average latency for the last minute.
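Instead of timing requests by hand with time.time(), you can let the Summary do it: its time() method works as a decorator (and as a context manager) and calls observe() with the elapsed time for you. A sketch with a made-up metric and function name:

```python
import time
from prometheus_client import Summary, REGISTRY

# hypothetical metric for illustration
FUNC_LATENCY = Summary('func_latency_seconds', 'Time spent in the function.')

# Summary.time() measures the wall-clock duration of each call
# and observes it automatically
@FUNC_LATENCY.time()
def slow_function():
    time.sleep(0.05)

slow_function()
slow_function()

count = REGISTRY.get_sample_value('func_latency_seconds_count')
total = REGISTRY.get_sample_value('func_latency_seconds_sum')
print(count, total / count)  # number of calls and their average latency
```

This avoids the bug-prone pattern of computing the delta yourself and handles exceptions inside the timed block correctly.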
## Histograms
Histograms help you track the size and number of events in buckets while allowing you to aggregate calculations of quantiles. Histograms can be used to measure request durations for a specific HTTP request call. Code instrumentation for histograms also uses the observe method, and you can combine it with time to track latency.
The following code adds 10 seconds as a request latency:
```python
from prometheus_client import Histogram

req_latency = Histogram('request_latency_seconds', 'Description of histogram')
req_latency.observe(10)  # Observe 10 (seconds in this case)
```
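By default the client uses a general-purpose set of bucket boundaries; you can pass your own via the buckets keyword. A sketch, where the metric name and the boundary values are only examples:

```python
from prometheus_client import Histogram, REGISTRY

# buckets= is part of the real API; these boundaries are just an example
REQ_LATENCY = Histogram('demo_latency_seconds', 'Request latency.',
                        buckets=[0.1, 0.5, 1, 5, 10])
REQ_LATENCY.observe(0.3)
REQ_LATENCY.observe(7)

# buckets are cumulative: le="0.5" counts every observation <= 0.5 seconds,
# and a +Inf bucket covering everything is always added for you
for le in ('0.5', '+Inf'):
    print(le, REGISTRY.get_sample_value('demo_latency_seconds_bucket', {'le': le}))
```

Choose boundaries around the latencies you actually need to distinguish, since quantiles computed with histogram_quantile are only as precise as the buckets.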
Instrumentation is usually applied at the service level or to the client libraries that are being used in the services. It’s important to note that there are limits to the level of instrumentation that you can/should add, even though Prometheus is efficient in handling multiple metrics.