prometheus PromQL - ghdrako/doc_snipets GitHub Wiki

Data model

Metrics

Every metric has – at a minimum – a name that identifies it. This name can contain letters, digits, underscores, and/or colons. To be valid, it must match the [a-zA-Z_:][a-zA-Z0-9_:]* regex. Additionally, metrics may also include a HELP text and a TYPE field. These are optional but highly recommended to improve usability.

  • The HELP line just provides some arbitrary information on what the metric is intended to represent, which can be helpful to consumers of your Prometheus data when determining if a particular metric is relevant to them.
  • The TYPE line indicates what type of metric your metric is. The type doesn’t affect how the data is stored in Prometheus’s TSDB at all – it’s all the same under the hood.
A full metric that’s been exposed to Prometheus may look like this:
# HELP mastering_prometheus_readers_total Number of readers of this book
# TYPE mastering_prometheus_readers_total counter
mastering_prometheus_readers_total 123467890

Every metric has a metric name to identify it and can have zero or more key/value pairs called labels.

<metric name>{<label name>=<label value>, ...}

A time series in Prometheus is represented as follows:

<metric_name>[{<label_1="value_1">,<label_N="value_N">}] <datapoint_numerical_value>
  • A metric name is nothing more than the value of a special label called __name__. So, if you have a metric named beverages_total, internally it's represented as __name__="beverages_total" (see the example below). Keep in mind that labels surrounded by __ are internal to Prometheus, and any label prefixed with __ is only available in some phases of the metrics collection cycle.
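
For illustration, using the hypothetical beverages_total metric from above: the metric name is just another matcher, so these two selectors are equivalent:

beverages_total
{__name__="beverages_total"}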

The combination of labels (key/values) and the metric name defines the identity of a time series.

Every metric name in Prometheus must match the following regular expression:

"[a-zA-Z_:][a-zA-Z0-9_:]*"
  • Labels - the key/value pairs associated with a given metric; they add dimensionality to metrics and are what you filter and aggregate on in PromQL.

Metric types

  • Counter
  • Gauge
  • Histogram
  • Summary

Gauge

Gauge metrics represent values that can both increase and decrease. They indicate the current state.

Gauges are a snapshot of state, and usually when aggregating them you want to take a sum, average, minimum, or maximum. Consider the metric node_filesystem_size_bytes from your Node Exporter, which reports the size of each of your mounted filesystems, and has device, fstype, and mountpoint labels. You can calculate total filesystem size on each machine with:

sum without(device, fstype, mountpoint)
(node_filesystem_size_bytes)

This works because without tells the sum aggregator to group together series that share all remaining labels, ignoring those three.

{instance="localhost:9100",job="node"} 32511390720

The metric name is also no longer present, as the result is no longer node_filesystem_size_bytes once math has been performed on it.

sum without(device, fstype, mountpoint, instance)
(node_filesystem_size_bytes)

Output

{job="node"} 32511390720
max without(device, fstype, mountpoint)
(node_filesystem_size_bytes)
{instance="localhost:9100",job="node"} 30792601600

Counter

Counter metrics represent cumulative, ever-growing values that never decrease.

  • rate - calculates the average per-second rate of change of a metric over a given time window. It is useful for observing how quickly a metric grows or shrinks within a given interval. Example: rate(http_requests_total[5m]) computes the average number of HTTP requests per second over the last 5 minutes.
  • increase - calculates the total change of a metric over a given time window, i.e., the difference between the metric's value at the start and at the end of the range. increase is useful for determining the total amount of change over a given period. Example: increase(http_requests_total[5m]) returns the total number of HTTP requests over the last 5 minutes (see the sketch after this list).
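
As a rough sketch of how the two relate (increase over a window is, by definition, rate multiplied by the window length in seconds):

rate(http_requests_total[5m])      # average requests per second over the last 5 minutes
increase(http_requests_total[5m])  # total requests over the last 5 minutes, i.e. rate * 300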

Counters track the number or size of events, and the value your applications expose on their /metrics is the total since it started. But that total is of little use to you on its own; what you really want to know is how quickly the counter is increasing over time. This is usually done using the rate function, though the increase and irate functions also operate on counter values.

For example, to calculate the amount of network traffic received per second, you could use:

rate(node_network_receive_bytes_total[5m])

The [5m] says to provide rate with 5 minutes of data, so the returned value will be an average over the last 5 minutes.

{device="lo",instance="localhost:9100",job="node"}
1859.389655172414
{device="wlan0",instance="localhost:9100",job="node"}
1314.5034482758622

The values here are not integers, as the 5-minute window that rate is looking at does not perfectly align with the samples that Prometheus has scraped. Some estimation is used to fill in the gaps between the data points you have and the boundaries of the range. The output of rate is a gauge, so the same aggregations apply as for gauges. The node_network_receive_bytes_total metric has a device label, so if you aggregate it away you will get the total bytes received per machine per second:

sum without(device)(rate(node_network_receive_bytes_total[5m]))

Running this query will give you a result like:

{instance="localhost:9100",job="node"} 3173.8931034482762

You can filter down which time series to request, so you could only look at eth0 and then aggregate it across all machines by aggregating away the instance label:

sum without(instance)
(rate(node_network_receive_bytes_total{device="eth0"}[5m]))

When you run this query the instance label is gone, but the device label remains as you did not ask for it to be removed:

{device="eth0",job="node"} 3173.8931034482762

Summary

A summary metric will usually contain both a _sum and a _count, and sometimes a set of time series with no suffix that carry a quantile label. The _sum and _count are both counters.
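
A sketch of what a summary exposition might look like (hypothetical metric name and values):

# TYPE http_request_duration_seconds summary
http_request_duration_seconds{quantile="0.5"} 0.052
http_request_duration_seconds{quantile="0.9"} 0.564
http_request_duration_seconds_sum 2613.7
http_request_duration_seconds_count 51240

Since _sum and _count are counters, the average event size (here, latency) over the last 5 minutes can be computed as:

rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])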

Histogram

Histogram metrics let you track the distribution of event sizes, so that you can calculate quantiles from them. For example, you can use histograms to calculate the 0.9 quantile (also known as the 90th percentile) latency.
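
For example, assuming a histogram named http_request_duration_seconds, the 0.9 quantile latency can be computed from the bucket counters with histogram_quantile (any aggregation must keep the le label):

histogram_quantile(0.9, rate(http_request_duration_seconds_bucket[5m]))
histogram_quantile(0.9, sum without(instance)(rate(http_request_duration_seconds_bucket[5m]))) # aggregated across instances; le is kept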

Selectors

Filter by labels

Limiting by labels is done using selectors.

For example:

process_resident_memory_bytes{job="node"}
jvm_memory_bytes_used {area="heap"}
jvm_memory_bytes_used {area!="heap"}
jvm_memory_bytes_used {job=~"fed.+"} # selects series where the "job" label value begins with fed
jvm_memory_bytes_used {job!~"fed.+"} # fetches all series except those where the "job" label value begins with fed
jvm_memory_bytes_used {job=~"f.+|j.+"}
jvm_memory_bytes_used {area=~"heap|nonheap"} # where the “area” label value is either heap or nonheap

The first example, process_resident_memory_bytes{job="node"}, is a selector that returns all time series with the name process_resident_memory_bytes and a job label of node. This particular selector is most properly called an instant vector selector, as it returns the values of the given time series at a given instant. Vector here basically means a one-dimensional list, as a selector can return zero or more time series, and each time series will have one sample. The job="node" is called a matcher, and you can have many matchers in one selector, which are ANDed together.

Selector type

  • Label matchers
  • Instant vectors
  • Range vectors

Matchers

Matchers are employed to restrict a query to a specific set of label values. There are four label matcher operators: =, !=, =~, and !~.

Example

| Matcher operator | Example |
| --- | --- |
| = | node_cpu_seconds_total{cpu="0"} |
| != | node_cpu_seconds_total{cpu!="0"} |
| =~ | {job=~"n.*"} |
| !~ | node_cpu_seconds_total{mode!~"(system\|user)"} |

The =~ and !~ operators are the PromQL regex matchers; both accept RE2 regex syntax.

Prometheus provides a way to negate the regex matcher, by using !~. This matcher excludes results that match the expression and allows all the remaining time series.

Filter by multiple labels

Example multiple matchers

{mode="idle", mode="iowait", mode="irq", mode="nice", mode="softirq", mode="steal", mode="user", mode="system"}
node_filesystem_size_bytes{job="node",mountpoint=~"/run/.*",mountpoint!~"/run/user/.*"}
jvm_memory_bytes_used {instance=~"10.1.150.12:8080",area!~"heap", job=~'j.+'}
jvm_memory_bytes_used {area=~"heap|nonheap", job=~'j.+'} # “job” starts with J and “area” = heap or “area” = nonheap

Internally, the metric name is stored in a label called __name__, so process_resident_memory_bytes{job="node"} is syntactic sugar for {__name__="process_resident_memory_bytes",job="node"}.

The selector {} returns an error, which is a safety measure to avoid accidentally returning all the time series inside the Prometheus server, as that could be expensive. To be more precise, at least one of the matchers in a selector must not match the empty string. So {foo=""} and {foo=~".*"} will return an error, while {foo="", bar="x"}, {foo!=""}, or {foo=~".+"} are permitted.

Instant vectors

Instant vector selectors are named as such because they return a list of samples, relative to the query evaluation time, for the time series that match them. This list is called an instant vector, as it's a result at a given instant. A sample is a data point of a time series, composed of a value and a timestamp. This timestamp, in most cases, reflects the time when the scrape occurred and that value was ingested, with the exception of metrics pushed to the Pushgateway, which, due to their nature, will never have timestamps. However, if functions are applied or operations are performed on the time series, the timestamp for the instant vector samples will reflect the query time and not the ingested time.

The way instant vectors operate – by only returning the most recent samples relative to query time that match the selector – means that Prometheus will not return time series that are considered stale. A stale marker (a special kind of sample that marks a time series as stale) is inserted when the originating target disappears from the discovery mechanism, or when the time series is no longer present in a scrape after the last successful one in which it existed. A time series with a stale marker as its last sample will not be returned when using instant vector selectors.

Every example in the Label Matchers section was an instant vector selector, and so every result was an instant vector.

Range vectors

A range vector selector is similar to an instant vector selector, but it returns a set of samples for a given time range, for each time series that matches it.

To define a range vector selector query, you have to set an instant vector selector and append a range using square brackets [ ].

Note that a range vector cannot be directly graphed, but it can be viewed in the console. Range vectors are almost always used with functions such as rate, for example:

rate(process_cpu_seconds_total[1m])

DURATIONS

| Abbreviation | Unit |
| --- | --- |
| s | seconds |
| m | minutes |
| h | hours |
| d | days |
| w | weeks |
| y | years |

Example

jvm_memory_bytes_used [1m]
http_requests_total{code="200"}[2m]
jvm_memory_bytes_used{area="heap", instance="10.1.150.150:30000",job="federate",exported_job="federate"}[1m]

Subqueries

max_over_time(rate(http_requests_total{handler="/health", instance="172.17.0.9:8000"}[5m])[1h:1m])

| Component | Description |
| --- | --- |
| rate(http_requests_total{handler="/health", instance="172.17.0.9:8000"}[5m]) | The inner query to be run, which in this case aggregates five minutes' worth of data into an instant vector. |
| [1h | Just like a range vector selector, this defines the size of the range relative to the query evaluation time. |
| :1m] | The resolution step to use. If not defined, it defaults to the global evaluation interval. |
| max_over_time | The subquery returns a range vector, which can now become the argument of this aggregation operation over time. |

Subqueries are fairly expensive to evaluate, so it is strongly discouraged to use them for dashboarding, as recording rules would produce the same result given enough time. Similarly, they should not be used in recording rules for the same reason. Subqueries are best suited for exploratory querying, where it is not known in advance which aggregations are needed to be looked at over time.
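
For instance, assuming a recording rule named handler_health:http_requests:rate5m (a hypothetical name) that precomputes rate(http_requests_total{handler="/health"}[5m]), the subquery above could be replaced with a plain range query:

max_over_time(handler_health:http_requests:rate5m[1h])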

A subquery is a part of a query that allows you to do a range query within a query. The syntax for a subquery uses square brackets, like range selectors. But it takes two different durations: the range and the resolution. The range is the range returned by the subquery, and the resolution acts as a step:

max_over_time( rate(http_requests_total[5m])[30m:1m] )

The preceding query runs rate(http_requests_total[5m]) every minute (1m) for the last 30 minutes (30m), then feeds the results into the max_over_time() function.

The resolution can be omitted, such as in [30m:]. In this case, the global evaluation interval is used as resolution.
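
For example, this evaluates the inner rate at the default interval over the last 30 minutes:

max_over_time(rate(http_requests_total[5m])[30m:])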

The offset modifier - Select Past/Historical Data

The offset modifier allows you to query data in the past. This means that we can offset the query time of an instant or range vector selector relative to the current time. It is applied on a per-selector basis, which means that offsetting one selector but not another effectively unlocks the ability to compare current behavior with past behavior for each of the matched time series. To use this modifier, we need to specify it right after the selector and add the offset time.

Example

jvm_memory_bytes_used offset 7d
jvm_memory_bytes_used[1m] offset 7d
jvm_memory_bytes_used{instance=~"10.1.150.150.*"}[1m] offset 7d
http_requests_total{code="200"}[2m] offset 1h

Operators

PromQL allows the use of binary, vector matching, and aggregation operators.

  • Binary operators. For example, suppose we have the following instant vector:
process_open_fds{instance="172.17.0.10:8000", job="hey-service"} 8
process_open_fds{instance="172.17.0.11:8000", job="hey-service"} 23

To that, we apply a comparison operator such as the following:

process_open_fds{job="hey-service"} > 10

The result will be as follows:

process_open_fds{instance="172.17.0.11:8000", job="hey-service"} 23

This operation shows that we have effectively filtered the results of the instant vector, which is fundamental for alerting.

Moreover, we can use the bool modifier to not only return all matched time series but also modify each returned sample to become 1 or 0, depending on whether the sample would be kept or dropped by the comparison operator.

Using the bool modifier is the only way to compare scalars; for example, 42 == bool 42. Therefore, we can apply the same query with the bool modifier to our previous example:

process_open_fds{job="hey-service"} > bool 10

This would return the following:

process_open_fds{instance="172.17.0.10:8000", job="hey-service"} 0
process_open_fds{instance="172.17.0.11:8000", job="hey-service"} 1

Aggregation Operators

Aggregation operators work only on instant vectors, and they also output instant vectors.

| Operator | Example | Description |
| --- | --- | --- |
| without | sum without(fstype, mountpoint)(node_filesystem_size_bytes) | groups the time series, ignoring the fstype and mountpoint labels |
| without | sum without()(node_filesystem_size_bytes) | valid; equivalent to node_filesystem_size_bytes, except that the metric name is removed |
| by | sum by(job, instance, device)(node_filesystem_size_bytes) | groups the time series by the job, instance, and device labels |
| by | sum by()(node_filesystem_size_bytes) | equivalent to sum(node_filesystem_size_bytes) |

Operators

All the aggregation operators use the same grouping logic.

| Operator | Example |
| --- | --- |
| sum | sum without(fstype, mountpoint, device)(node_filesystem_size_bytes) |
| count | count without(device)(node_disk_read_bytes_total) |
| avg | avg without(cpu)(rate(node_cpu_seconds_total{mode="idle"}[5m])) |
| group | group without(instance)(up) |
| stddev and stdvar | stddev without(instance)(node_filesystem_size_bytes) |
| min and max | max without(device, fstype, mountpoint)(node_filesystem_size_bytes) |
| topk and bottomk | topk without(device, fstype, mountpoint)(2, node_filesystem_size_bytes) |
| quantile | quantile without(cpu)(0.9, rate(node_cpu_seconds_total{mode="system"}[5m])) |
| count_values | count_values without(instance)("size", node_filesystem_size_bytes) |

Whereas without specifies the labels to remove, by specifies the labels to keep. Accordingly, some care is required when using by to ensure you don't remove target labels that you would like to propagate in your alerts or use in your dashboards. You cannot use both by and without in the same aggregation. Generally, you should prefer without rather than by.

There are two cases where you might find by more useful. The first is that unlike without, by does keep the __name__ label if told explicitly. This allows you to use expressions like:

sort_desc(count by(__name__)({__name__=~".+"}))

to investigate how many time series have the same metric names.

The second is cases where you do want to remove any labels you do not know about; for example, info metrics are expected to gain more labels over time.
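
For example, node_uname_info from the Node Exporter is an info metric; to count machines per kernel release without caring which other labels the metric carries now or in the future, you can aggregate with by (a sketch; the exact label set depends on your exporter version):

count by(release)(node_uname_info)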

Identifying the metric type

| Type | Keywords in the metric name |
| --- | --- |
| Counter | "total", "counter", "requests_total", etc. |
| Gauge | names indicating a current state, e.g. current_connections |
| Histogram | e.g. http_request_duration_seconds_bucket |
| Summary | "sum" |

To check a metric's type in Prometheus, you can use a PromQL query that pulls the metric together with its type. For example:

metric_name

The result returns the metric together with its type. If you use the Prometheus web UI, this can usually be found in the "Type" column next to the metric name.

PromQL

sum (calculate sum over dimensions)
min (select minimum over dimensions)
max (select maximum over dimensions)
avg (calculate the average over dimensions)
stddev (calculate population standard deviation over dimensions)
stdvar (calculate population standard variance over dimensions)
count (count number of elements in the vector)
count_values (count number of elements with the same value)
bottomk (smallest k elements by sample value)
topk (largest k elements by sample value)
quantile (calculate φ-quantile (0 ≤ φ ≤ 1) over dimensions)

Get the total memory in bytes:

node_memory_MemTotal_bytes

Get a sum of the total memory in bytes:

sum(node_memory_MemTotal_bytes)

Get a percentage of total memory used:

((sum(node_memory_MemTotal_bytes) - sum(node_memory_MemFree_bytes) - sum(node_memory_Buffers_bytes) - sum(node_memory_Cached_bytes)) / sum(node_memory_MemTotal_bytes)) * 100

Using a function with your query:

irate(node_cpu_seconds_total{job="node-exporter", mode="idle"}[5m])

Using an operation and a function with your query:

avg(irate(node_cpu_seconds_total{job="node-exporter", mode="idle"}[5m]))

Grouping your queries:

avg(irate(node_cpu_seconds_total{job="node-exporter", mode="idle"}[5m])) by (instance)
# Simple selectors
http_requests_total
http_requests_total{job="apiserver", handler="/api/comments"}

# Range query for 5min
http_requests_total{job="apiserver", handler="/api/comments"}[5m]

# Pattern matching
http_requests_total{job=~".*server"}
http_requests_total{status!~"4.."}

# Aggregate and group by
sum(rate(http_requests_total[5m])) by (job)
sum by(a,b)(mymetric{field="value"})

# Math
(instance_memory_limit_bytes - instance_memory_usage_bytes) / 1024 / 1024

# Top keys
topk(3, sum(rate(instance_cpu_time_ns[5m])) by (app, proc))

# Count and group
count(instance_cpu_time_ns) by (app)

Vector matching

Vector matching, as the name implies, is an operation only available between vectors.

  • One-to-one - Since binary operators require two operands, when vectors of the same size and label set appear on each side of an operator (that is, one-to-one), samples with exactly the same label/value pairs are matched together, while the metric name and all non-matching elements are dropped. We'll start by using the following instant vectors:
node_filesystem_avail_bytes{instance="172.17.0.13:9100", job="node-exporter-service", mountpoint="/Users"} 100397019136
node_filesystem_avail_bytes{instance="172.17.0.13:9100", job="node-exporter-service", mountpoint="/data"} 14120038400
node_filesystem_size_bytes{instance="172.17.0.13:9100", job="node-exporter-service", mountpoint="/Users"} 250685575168
node_filesystem_size_bytes{instance="172.17.0.13:9100", job="node-exporter-service", mountpoint="/data"} 17293533184

We'll then apply the following operation:

node_filesystem_avail_bytes{} / node_filesystem_size_bytes{} * 100

This will return the resulting instant vector:

{instance="172.17.0.13:9100", job="node-exporter-service", mountpoint="/Users"} 40.0489813060515
{instance="172.17.0.13:9100", job="node-exporter-service", mountpoint="/data"} 81.64923991971679

It might be useful to aggregate vectors with mismatching labels. In those situations, you can apply the ignoring keyword right after the binary operator to ignore the specified labels. Additionally, it is also possible to restrict which labels from both sides should be used in matching by using the on keyword after the binary operator.
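
A sketch of both keywords, assuming a hypothetical extra label (say fstype) on the left-hand side that would otherwise prevent one-to-one matching; you can either ignore it or match only on the labels both sides share:

node_filesystem_avail_bytes / ignoring(fstype) node_filesystem_size_bytes * 100
node_filesystem_avail_bytes / on(instance, job, mountpoint) node_filesystem_size_bytes * 100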

  • Many-to-one and one-to-many - used to perform operations where an element on one side is matched with several elements on the other side of the operation. If the higher cardinality is on the left-hand side of the operation, you can use the group_left modifier after either on or ignoring; if it's on the right-hand side, then group_right should be applied. The group_left operation is commonly used for its ability to copy labels over from the right side of the expression.
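
A sketch, assuming two hypothetical recording rules: method_code:http_errors:rate5m (labeled by method and code) and method:http_requests:rate5m (labeled by method only). The left side has the higher cardinality, so group_left lets each right-hand sample match several left-hand samples, yielding a per-method, per-code error ratio:

method_code:http_errors:rate5m / ignoring(code) group_left method:http_requests:rate5m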

Logical operators

node_filesystem_size_bytes and node_filesystem_size_bytes < 200000
node_filesystem_avail_bytes > 200000 or node_filesystem_avail_bytes < 2500000
node_filesystem_avail_bytes unless node_filesystem_avail_bytes < 200000

the unless logical operator will return the elements from the first expression that do not match the label name/value pairs from the second. In set theory, this is called a complement. Practically speaking, this operator works in the opposite way to and, which means it can also be used as an if not statement.

Aggregation operators

  • https://prometheus.io/docs/prometheus/latest/querying/operators/#aggregation-operators

| Operator | Description | Requirements |
| --- | --- | --- |
| sum | Sums the elements | |
| min | Selects the minimum element | |
| max | Selects the maximum element | |
| avg | Calculates the average of the elements | |
| stddev | Calculates the standard deviation of the elements | |
| stdvar | Calculates the standard variance of the elements | |
| count | Counts the number of elements | |
| count_values | Counts the number of elements with the same value | |
| bottomk | The lower k elements by sample value | Requires the number of elements (k) as a scalar |
| topk | The higher k elements by sample value | Requires the number of elements (k) as a scalar |
| quantile | Calculates the quantile of the elements | Requires the quantile (0 ≤ φ ≤ 1) as a scalar |

The operators that require a parameter (such as count_values, bottomk, topk, and quantile) need to specify it before the vector expression. There are two available modifiers to use in conjunction with aggregation operators that take a list of label names: without allows you to define which labels to aggregate away, effectively dropping those labels from the resulting vector, while by does exactly the opposite; that is, it allows you to specify which labels to keep from being aggregated. Only a single modifier can be used per aggregation operator. These modifiers will influence which dimensions will be aggregated by the operators.

rate - determines how quickly a counter grows within a time window: the average per-second increase over that window. It is applied to counters, since they only ever increase. increase - gives the total increase within the time window.

sum(jvm_memory_bytes_used)
sum by (area) (jvm_memory_bytes_used)  # group the data by area
sum by (job) (jvm_memory_bytes_used)
sum by (area, job) (jvm_memory_bytes_used)
topk(2, sum by (area, job) (jvm_memory_bytes_used)) #  find out the top two jobs and area that are consuming the most memory
bottomk(2, sum by (area, job) (jvm_memory_bytes_used)) #  bottom area and two jobs consuming the least memory

Until now, we have aggregated the instant vector, which actually aggregated the single latest timestamped value and did not take into consideration the range of data generated.

Range vectors return a range of all data collected, so the vector cannot be directly used in the aggregation operators. We will first have to use the varied functions offered by PromQL to fetch the most relevant data point from the range. Relevance depends on the characteristics of the data.

avg_over_time(jvm_memory_bytes_used[1m])
topk(5, avg by (area, job) (avg_over_time(jvm_memory_bytes_used[5m]))) # take all data from the last five minutes, average it per range, then use the avg aggregation operator to find the average memory consumed grouped by area and job; finally, topk returns the top five area/job combinations with the most memory used
sum by (app) (rate(message_execution_time_seconds_count[1m]))     # average per second, i.e. the per-second rate of increase within the window
sum by (app) (increase(message_execution_time_seconds_count[1m])) # total increase within the defined window, here 1 minute
sum(rate(http_requests_total[5m]))
sum by (handler) (rate(http_requests_total[5m]))

PromQL has almost 50 different functions for a variety of use cases, such as math; sorting; counter, gauge and histogram manipulation; label transformations; aggregations over time; type conversions; and finally, date and time functions.
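
A few examples across those categories (a sketch; the metric names are illustrative):

sort_desc(sum by(job)(rate(http_requests_total[5m])))   # sorting
label_replace(up, "host", "$1", "instance", "(.*):.*")  # label transformation: derive a host label from instance
avg_over_time(node_load1[1h])                           # aggregation over time
day_of_week()                                           # date and time: day of the week at query evaluation time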

Logical and Arithmetic Operators

PromQL enables us to apply varied operators on the result sets, allowing us to combine different datasets so as to compare and derive meaningful insights.

Use Case 1: Let’s begin with a use case wherein we compare current data with the historical data collected seven days back to identify any rise in memory consumed.

As shown in the following query, we use the comparison operator > between the two result sets to identify the labeled data where consumption is more than it was seven days before:

jvm_memory_bytes_used > 1 * (jvm_memory_bytes_used offset 7d)

As can be seen, we have simply used the operator between the two previously fetched vectors. We can use any selector criteria to select the data and then apply operators to it; in this case we are comparing the vectors and identifying the series where consumption has increased.

Use Case 2: As we know, the data returned by the metric jvm_memory_bytes_used is in bytes. In this use case, we will use a scalar arithmetic operation to convert the value to megabytes. The following query uses the multiplication operator to multiply the value by 0.000001 to convert it to megabytes:

jvm_memory_bytes_used * 0.000001

Use Case 3: Let's now use two different metrics' data. In this use case, we will consider the jvm_memory_bytes_used metric along with jvm_memory_bytes_committed. We will use the subtraction operator to identify the bytes remaining to consume, and further use scalar multiplication to convert the data into megabytes. The following query enables us to find the difference and returns the data in megabytes:

(jvm_memory_bytes_committed - jvm_memory_bytes_used) * 0.000001

Use Case 4: Next, let’s apply an aggregation operator to the output of Use Case 3 to return area- and job-wise bytes remaining. Use the following query; we also use the scalar multiplier on the final output to convert it to megabytes:

sum by (area,job) (jvm_memory_bytes_committed - jvm_memory_bytes_used) * 0.000001

We can also apply topk to return the top two with maximum bytes remaining, as in the following query:

topk(2, sum by (job) (jvm_memory_bytes_committed - jvm_memory_bytes_used)) * 0.000001

Use Case 5: Here we simply use the or operator between the two outputs, and it returns the expected result: rows where either the "job" value is jira or the "exported_job" value is jira.

(jvm_memory_bytes_used {job="jira"}) or (jvm_memory_bytes_used{exported_job="jira"})

Istio metrics

istio_requests_total

sum by (app) (increase(istio_requests_total{app="acp_service"}[5m])) # request growth over a 5-minute window - how many requests arrived per 5 minutes

istio_request_duration_milliseconds_count - request duration

Histogram with bucket metric

sum(rate(http_request_duration_seconds_bucket{le="0.3"}[5m])) by (job)
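
Dividing by the total count gives the fraction of requests per job served within 0.3 seconds (a common pattern; this assumes the standard _bucket/_count histogram suffixes):

  sum(rate(http_request_duration_seconds_bucket{le="0.3"}[5m])) by (job)
/
  sum(rate(http_request_duration_seconds_count[5m])) by (job)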