# Google Cloud Managed Service for Prometheus (ghdrako/doc_snipets GitHub Wiki)
- https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed
- https://cloud.google.com/stackdriver/docs/managed-prometheus#gmp-data-collection
- https://cloud.google.com/stackdriver/docs/managed-prometheus
You can use Managed Service for Prometheus in one of four modes:
- with managed data collection,
- with self-deployed data collection,
- with the OpenTelemetry Collector, or
- with the Ops Agent.
The first option, managed data collection, is recommended. Managed Service for Prometheus offers an operator for managed data collection in Kubernetes environments.
Managed collection runs Prometheus-based collectors as a DaemonSet and ensures scalability by scraping only targets on colocated nodes. You configure the collectors with lightweight custom resources to scrape exporters using pull collection; the collectors then push the scraped data to the central data store, Monarch. Google Cloud never directly accesses your cluster to pull or scrape metric data; your collectors push data to Google Cloud.
The Kubernetes operator fully handles the operation of Prometheus: generating scrape configurations, scaling ingestion, scoping rules to the right data, and so forth. Scraping and rules are configured with lightweight custom resources (CRs).
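As a sketch of the CR-based configuration, a recording rule can be declared with the managed service's `Rules` resource; the group name, metric names, and expression below are illustrative placeholders:

```yaml
# Illustrative Rules CR for managed collection; the rule names and the
# PromQL expression are examples, not part of the source page.
apiVersion: monitoring.googleapis.com/v1
kind: Rules
metadata:
  name: example-rules
spec:
  groups:
  - name: example
    interval: 30s
    rules:
    # record a per-job sum of the "up" series under a new metric name
    - record: job:up:sum
      expr: sum without(instance) (up)
```

The operator turns such resources into the rule files that the in-cluster components evaluate, so you never edit a Prometheus rule file directly.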
## Enable managed collection
- https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed
- https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed#enable-mgdcoll-gke
Managed collection is enabled by default for the following:
- GKE Autopilot clusters running GKE version 1.25 or greater.
- GKE Standard clusters running GKE version 1.27 or greater. You can override this default when creating the cluster; see Disable managed collection.
If you are running in a GKE environment that does not enable managed collection by default, then see Enable managed collection manually.
Managed collection on GKE is automatically upgraded when new in-cluster component versions are released.
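If your cluster does not enable managed collection by default, a sketch of enabling it manually with `gcloud` follows; `CLUSTER_NAME` and `COMPUTE_ZONE` are placeholders, and the linked setup page is authoritative for current flags:

```shell
# Enable managed collection on an existing GKE Standard cluster
gcloud container clusters update CLUSTER_NAME \
  --zone=COMPUTE_ZONE \
  --enable-managed-prometheus
```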
## Querying Cloud Monitoring from Grafana using PromQL
- https://cloud.google.com/stackdriver/docs/managed-prometheus/query
- https://cloud.google.com/monitoring/promql
- https://cloud.google.com/stackdriver/docs/managed-prometheus/best-practices/ingest-and-query
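Managed Service for Prometheus exposes a Prometheus-compatible HTTP API under the Cloud Monitoring endpoint, which is what Grafana's Prometheus data source points at. As a small sketch, the helper below builds the instant-query URL for that API; the project ID and PromQL expression are placeholders, and authentication (an OAuth2 bearer token) is omitted:

```python
from urllib.parse import urlencode

def build_promql_query_url(project_id: str, promql: str) -> str:
    """Build the instant-query URL for the Prometheus-compatible API
    exposed by Cloud Monitoring for Managed Service for Prometheus."""
    base = (f"https://monitoring.googleapis.com/v1/projects/{project_id}"
            "/location/global/prometheus/api/v1/query")
    return f"{base}?{urlencode({'query': promql})}"

# hypothetical project ID and a trivial PromQL expression
url = build_promql_query_url("my-project", "up")
print(url)
```

Grafana uses the same base URL (everything up to `/prometheus`) as the data source URL, so this helper mainly shows how the project-scoped path maps onto the standard Prometheus `api/v1/query` route.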
## PodMonitoring

```yaml
# The content of this page is licensed under the Creative Commons Attribution 4.0
# License, and code samples are licensed under the Apache 2.0 License
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: flask-example
  labels:
    app.kubernetes.io/name: flask-example
spec:
  selector:
    matchLabels:
      app: flask-example
  endpoints:
  - port: metrics
    interval: 5s
```
MQL query for the ingested metric:

```
fetch prometheus_target
| metric 'prometheus.googleapis.com/custom_collector_cpu_usage/gauge'
```
Example Flask exporter that exposes the `custom_collector_cpu_usage` gauge scraped by the `PodMonitoring` resource above:

```python
import logging
import random
import time

import psutil
from flask import Flask
from prometheus_client import Counter, Gauge, Histogram, REGISTRY, generate_latest

logger = logging.getLogger(__name__)
app = Flask(__name__)

CONTENT_TYPE_LATEST = 'text/plain; version=0.0.4; charset=utf-8'

number_of_requests = Counter(
    'number_of_requests',
    'The number of requests, a counter so the value can increase or reset to zero.'
)
custom_collector_cpu_usage = Gauge(
    'custom_collector_cpu_usage',
    'The current value of cpu usage, a gauge so it can go up or down.',
    ['server_name']
)
PYTHON_REQUESTS_COUNTER = Counter("python_requests", "total requests")
PYTHON_FAILED_REQUESTS_COUNTER = Counter("python_failed_requests", "failed requests")
PYTHON_LATENCIES_HISTOGRAM = Histogram(
    "python_request_latency", "request latency by path"
)


@app.route('/metrics', methods=['GET'])
def get_data():
    """Returns all metrics as plaintext in the Prometheus exposition format."""
    number_of_requests.inc()
    custom_collector_cpu_usage.labels('custom_collector_cpu_usage').set(
        int(psutil.cpu_percent() * 10)
    )
    return generate_latest(REGISTRY), 200, {'Content-Type': CONTENT_TYPE_LATEST}


@app.route("/")
@PYTHON_LATENCIES_HISTOGRAM.time()  # records request latency in the histogram
def homepage():
    # count the request
    PYTHON_REQUESTS_COUNTER.inc()
    # fail roughly 10% of the time
    if random.randint(0, 100) > 90:
        PYTHON_FAILED_REQUESTS_COUNTER.inc()
        return "error!", 500
    else:
        # delay for a bit to vary the latency measurement
        random_delay = random.randint(0, 5000) / 1000
        time.sleep(random_delay)
        return "home page"


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
```
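To sanity-check the exposition format the `/metrics` endpoint serves without starting the HTTP server, you can generate the text payload directly from a registry; the metric name below is a throwaway example, not one used by the app above:

```python
from prometheus_client import CollectorRegistry, Counter, generate_latest

# Use a separate registry so the demo does not touch the global REGISTRY.
registry = CollectorRegistry()
demo_requests = Counter('demo_requests', 'demo counter', registry=registry)
demo_requests.inc(3)

payload = generate_latest(registry).decode('utf-8')
# counters are exposed with a _total suffix in the text format
print('demo_requests_total 3.0' in payload)  # → True
```

This is the same `generate_latest` output the managed collector scrapes from the pod's `metrics` port.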