# Google Cloud Managed Service for Prometheus (ghdrako/doc_snipets GitHub Wiki)
- https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed
- https://cloud.google.com/stackdriver/docs/managed-prometheus#gmp-data-collection
- https://cloud.google.com/stackdriver/docs/managed-prometheus
You can use Managed Service for Prometheus in one of four modes:
- with managed data collection,
- with self-deployed data collection,
- with the OpenTelemetry Collector, or
- with the Ops Agent.
The first option, managed data collection, is recommended. Managed Service for Prometheus offers an operator for managed data collection in Kubernetes environments.
Managed collection runs Prometheus-based collectors as a DaemonSet and ensures scalability by scraping only targets on colocated nodes. You configure the collectors with lightweight custom resources to scrape exporters using pull collection; the collectors then push the scraped data to the central data store, Monarch. Google Cloud never directly accesses your cluster to pull or scrape metric data; your collectors push data to Google Cloud.
The Kubernetes operator fully handles the operation of Prometheus: generating scrape configurations, scaling ingestion, scoping rules to the right data, and so forth. Scraping and rules are configured with lightweight custom resources (CRs).
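As a sketch of the CR-based configuration, a recording rule can be declared with the managed service's `Rules` resource; the group name, metric names, and expression below are illustrative placeholders:

```yaml
# Illustrative Rules CR for managed collection; the rule names and the
# PromQL expression are examples, not part of the source page.
apiVersion: monitoring.googleapis.com/v1
kind: Rules
metadata:
  name: example-rules
spec:
  groups:
  - name: example
    interval: 30s
    rules:
    # record a per-job sum of the "up" series under a new metric name
    - record: job:up:sum
      expr: sum without(instance) (up)
```

The operator turns such resources into the rule files that the in-cluster components evaluate, so you never edit a Prometheus rule file directly.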
## Enable managed collection
- https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed
- https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed#enable-mgdcoll-gke
Managed collection is enabled by default for the following:
- GKE Autopilot clusters running GKE version 1.25 or greater.
- GKE Standard clusters running GKE version 1.27 or greater. You can override this default when creating the cluster; see Disable managed collection.
If you are running in a GKE environment that does not enable managed collection by default, then see Enable managed collection manually.
Managed collection on GKE is automatically upgraded when new in-cluster component versions are released.
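If your cluster does not enable managed collection by default, a sketch of enabling it manually with `gcloud` follows; `CLUSTER_NAME` and `COMPUTE_ZONE` are placeholders, and the linked setup page is authoritative for current flags:

```shell
# Enable managed collection on an existing GKE Standard cluster
gcloud container clusters update CLUSTER_NAME \
  --zone=COMPUTE_ZONE \
  --enable-managed-prometheus
```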
## Querying Cloud Monitoring from Grafana using PromQL
- https://cloud.google.com/stackdriver/docs/managed-prometheus/query
- https://cloud.google.com/monitoring/promql
- https://cloud.google.com/stackdriver/docs/managed-prometheus/best-practices/ingest-and-query
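Managed Service for Prometheus exposes a Prometheus-compatible HTTP API under the Cloud Monitoring endpoint, which is what Grafana's Prometheus data source points at. As a small sketch, the helper below builds the instant-query URL for that API; the project ID and PromQL expression are placeholders, and authentication (an OAuth2 bearer token) is omitted:

```python
from urllib.parse import urlencode

def build_promql_query_url(project_id: str, promql: str) -> str:
    """Build the instant-query URL for the Prometheus-compatible API
    exposed by Cloud Monitoring for Managed Service for Prometheus."""
    base = (f"https://monitoring.googleapis.com/v1/projects/{project_id}"
            "/location/global/prometheus/api/v1/query")
    return f"{base}?{urlencode({'query': promql})}"

# hypothetical project ID and a trivial PromQL expression
url = build_promql_query_url("my-project", "up")
print(url)
```

Grafana uses the same base URL (everything up to `/prometheus`) as the data source URL, so this helper mainly shows how the project-scoped path maps onto the standard Prometheus `api/v1/query` route.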
## PodMonitoring

```yaml
# The content of this page is licensed under the Creative Commons Attribution 4.0
# License, and code samples are licensed under the Apache 2.0 License
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: flask-example
  labels:
    app.kubernetes.io/name: flask-example
spec:
  selector:
    matchLabels:
      app: flask-example
  endpoints:
  - port: metrics
    interval: 5s
```
MQL query for the ingested metric:

```
fetch prometheus_target
| metric 'prometheus.googleapis.com/custom_collector_cpu_usage/gauge'
```
Example Flask exporter that exposes the `custom_collector_cpu_usage` gauge scraped by the `PodMonitoring` resource above:

```python
import logging
import random
import time

import psutil
from flask import Flask
from prometheus_client import Counter, Gauge, Histogram, REGISTRY, generate_latest

logger = logging.getLogger(__name__)
app = Flask(__name__)

CONTENT_TYPE_LATEST = 'text/plain; version=0.0.4; charset=utf-8'

number_of_requests = Counter(
    'number_of_requests',
    'The number of requests, a counter so the value can increase or reset to zero.'
)
custom_collector_cpu_usage = Gauge(
    'custom_collector_cpu_usage',
    'The current value of cpu usage, a gauge so it can go up or down.',
    ['server_name']
)
PYTHON_REQUESTS_COUNTER = Counter("python_requests", "total requests")
PYTHON_FAILED_REQUESTS_COUNTER = Counter("python_failed_requests", "failed requests")
PYTHON_LATENCIES_HISTOGRAM = Histogram(
    "python_request_latency", "request latency by path"
)


@app.route('/metrics', methods=['GET'])
def get_data():
    """Returns all metrics as plaintext in the Prometheus exposition format."""
    number_of_requests.inc()
    custom_collector_cpu_usage.labels('custom_collector_cpu_usage').set(
        int(psutil.cpu_percent() * 10)
    )
    return generate_latest(REGISTRY), 200, {'Content-Type': CONTENT_TYPE_LATEST}


@app.route("/")
@PYTHON_LATENCIES_HISTOGRAM.time()  # records request latency in the histogram
def homepage():
    # count the request
    PYTHON_REQUESTS_COUNTER.inc()
    # fail roughly 10% of the time
    if random.randint(0, 100) > 90:
        PYTHON_FAILED_REQUESTS_COUNTER.inc()
        return "error!", 500
    else:
        # delay for a bit to vary the latency measurement
        random_delay = random.randint(0, 5000) / 1000
        time.sleep(random_delay)
        return "home page"


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
```
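To sanity-check the exposition format the `/metrics` endpoint serves without starting the HTTP server, you can generate the text payload directly from a registry; the metric name below is a throwaway example, not one used by the app above:

```python
from prometheus_client import CollectorRegistry, Counter, generate_latest

# Use a separate registry so the demo does not touch the global REGISTRY.
registry = CollectorRegistry()
demo_requests = Counter('demo_requests', 'demo counter', registry=registry)
demo_requests.inc(3)

payload = generate_latest(registry).decode('utf-8')
# counters are exposed with a _total suffix in the text format
print('demo_requests_total 3.0' in payload)  # → True
```

This is the same `generate_latest` output the managed collector scrapes from the pod's `metrics` port.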