Handling Timeouts on Scraping

Symptom

When operating Promregator version 0.4.0 or later, you receive the following error message:

Not all targets could be scraped within the current promregator.endpoint.maxProcessingTime. Consider increasing promregator.endpoint.maxProcessingTime or promregator.endpoint.threads, but mind the implications.

The message appears sporadically or even regularly.

Background / Root Cause

In Single Endpoint Scraping mode, Promregator scrapes its complete set of configured targets using a thread pool. The size of the thread pool can be configured with the configuration option promregator.endpoint.threads. For each target, a thread is allocated from the pool. The thread prepares and sends the scraping request to the configured target, waits for the response (the thread is not returned to the pool during this time), and then processes the result retrieved from the target. Each request to a target must have finished its processing within promregator.endpoint.maxProcessingTime milliseconds. Any response retrieved after this timeout is discarded by Promregator. This ensures that hanging requests (for example, due to targets with high latency) do not block other, healthy scraping requests.

In Single Target Scraping mode, Promregator scrapes targets individually. There, each target subject to scraping must respond within promregator.endpoint.maxProcessingTime milliseconds. A similar approach to thread pooling (also using promregator.endpoint.threads threads) and timeout handling based on promregator.endpoint.maxProcessingTime is applied.
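
In both modes, the behavior described above is governed by the same two configuration parameters. The following is a minimal sketch of the corresponding part of Promregator's configuration file; the parameter names are the ones referenced in this article, while the values are purely illustrative assumptions, not recommended defaults.

```yaml
# Sketch of the relevant part of Promregator's configuration (YAML).
# Values are illustrative only, not recommended defaults.
promregator:
  endpoint:
    # size of the thread pool used for scraping requests
    threads: 5
    # maximum time (in milliseconds) a scraping request may take
    # before its response is discarded
    maxProcessingTime: 5000
```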

The error message above indicates that one or more threads have exceeded promregator.endpoint.maxProcessingTime. There may be multiple reasons for this:

  • The target was too slow in answering the scraping request.
  • There are too many targets to scrape compared to the number of threads available in the thread pool.

Solution

For a first brief analysis, you may use the metric promregator_scrape_duration_seconds to determine how long the scraping requests took in total.
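
If you want Prometheus to flag this condition proactively, a sketch of an alerting rule based on this metric could look as follows. The metric name is taken from the text above; the alert name, threshold, and durations are illustrative assumptions and should be aligned with your own promregator.endpoint.maxProcessingTime setting.

```yaml
# Illustrative Prometheus alerting rule (loaded via rule_files in prometheus.yml).
# The 4-second threshold assumes a maxProcessingTime of roughly 5000 ms; adjust to your setup.
groups:
  - name: promregator-scraping
    rules:
      - alert: PromregatorScrapeDurationNearTimeout
        expr: promregator_scrape_duration_seconds > 4
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Promregator scrape duration is approaching promregator.endpoint.maxProcessingTime"
```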

Depending on the root cause, there may be multiple possible solutions:

  1. If you encounter the error message only very sporadically, it is likely that just a small number of targets occasionally fail to answer the scraping requests in a timely manner. Usually, this can be ignored without further risk.

    In case you want to analyze further and determine which of the scraping targets is hitting the timeout, it is recommended to use the metric promregator_request_latency. Note that you have to enable this metric before you can use it (see the configuration sketch after this list). Further details can be found at https://github.com/promregator/promregator/blob/master/docs/enrichment.md

  2. If you encounter the error message often or even regularly, it is an indicator that Promregator is misconfigured.

    • If you have a larger number (>20) of targets and you are using Single Endpoint Scraping mode, consider switching to Single Target Scraping mode. The latter is more efficient, as it distributes the load more evenly across the scraping interval.
    • If you are already using Single Target Scraping mode, analyze whether there are targets involved which have a higher latency than the value you have set with promregator.endpoint.maxProcessingTime. For this analysis, it is recommended to use the metric promregator_request_latency. Note that you have to enable this metric before you can use it; further details can be found at https://github.com/promregator/promregator/blob/master/docs/enrichment.md . If you have such a target, consider improving the scraping performance of that target, or alternatively increase the configuration parameter promregator.endpoint.maxProcessingTime. When doing the latter, consider the constraints documented for this configuration parameter.
    • If you are already using Single Target Scraping mode and you cannot spot a slow-responding target, then the number of targets is too high for the number of threads available in the thread pool. Consider increasing the configuration parameter promregator.endpoint.threads (see the sketch after this list). It is recommended to do this only in small steps combined with interleaved tests, as even small increases in this value may already provide a significant benefit. If you increase this value, also consider the impact on your memory consumption, as described in the documentation of the configuration parameter promregator.endpoint.threads.
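
To pull the configuration-related suggestions from the list above into one place, here is an illustrative sketch. The promregator.endpoint.* parameter names are those used throughout this article; the option for enabling promregator_request_latency is an assumption based on the enrichment documentation linked above, so verify the exact name there before relying on it. All values are examples only.

```yaml
# Illustrative tuning sketch, not a recommendation.
promregator:
  metrics:
    # assumed option name for enabling promregator_request_latency;
    # check docs/enrichment.md for the exact spelling in your version
    requestLatency: true
  endpoint:
    # increase only in small steps, and mind the memory consumption
    threads: 8
    # increase only if a slow target cannot be made faster;
    # mind the constraints documented for this parameter
    maxProcessingTime: 8000
```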
