Varadhi Metrics Documentation - flipkart-incubator/varadhi GitHub Wiki
Overview
Varadhi exposes metrics that give visibility into application performance, resource utilization, and operational health. This document covers how to enable and use metrics in Varadhi, focusing on JVM and Vert.x metrics.
Varadhi leverages the following metrics technologies:
- Micrometer: Primary metrics facade providing a vendor-neutral API
- Prometheus: For metrics storage and querying
- Grafana: For metrics visualization
- OpenTelemetry: For metrics export and standards-compliant instrumentation
Metrics Categories
Varadhi exposes metrics in the following categories:
- JVM Metrics: Memory, garbage collection, threads, classloading
- Vert.x Metrics:
  - Event Loop metrics
  - Event Bus metrics
  - Worker Pool metrics
  - Vert.x Cluster metrics
- Application-Specific Metrics:
  - Producer metrics
  - HTTP API metrics
Enabling Metrics
Metrics collection is enabled by default in Varadhi. The configuration can be adjusted in the following ways:
Configuration File
In conf/configuration.yml:
# Producer metrics can be toggled
producerOptions:
  metricEnabled: true
Helm Configuration
When deploying with Helm, metrics can be configured in values.yaml:
varadhi:
  app:
    producerOptions:
      metricEnabled: true
    # OpenTelemetry configuration
    otlpConfig:
      otlp.url: "http://otel-collector:4318/v1/metrics"
      otlp.step: "20s"
      otlp.aggregationTemporality: "CUMULATIVE"
      otlp.resourceAttributes:
      otlp.headers:
Metrics Backends
Varadhi supports multiple metrics backends:
Prometheus
Prometheus is the recommended metrics backend for production deployments. Varadhi exposes metrics in Prometheus format through an OpenTelemetry collector.
Configuration:
- The OpenTelemetry collector is configured in setup/docker/configs/otel-collector.yml
- Prometheus scrapes metrics from the OpenTelemetry collector as configured in setup/docker/configs/prometheus.yml
JMX
JMX metrics are useful for local development and debugging. They're enabled by default and accessible through tools like JConsole or VisualVM.
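Tools such as JConsole read values by looking up an MBean by name and fetching one of its attributes. The sketch below does the same lookup in code against the JVM's built-in `java.lang:type=Memory` platform MBean, which every JVM exposes. It does not touch any Varadhi-specific MBean (those names are not listed in this document), and the class name `JmxHeapProbe` is only illustrative.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

final class JmxHeapProbe {

    // Reads the "used" field (bytes) of the HeapMemoryUsage attribute from the
    // standard java.lang:type=Memory platform MBean.
    static long heapUsedBytes() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName memory = new ObjectName("java.lang:type=Memory");
            CompositeData usage =
                    (CompositeData) server.getAttribute(memory, "HeapMemoryUsage");
            return (Long) usage.get("used");
        } catch (JMException e) {
            // Lookup failures are unexpected for a platform MBean; surface them unchecked.
            throw new IllegalStateException("JMX lookup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("heap used (bytes): " + heapUsedBytes());
    }
}
```

The same attribute appears in JConsole under the MBeans tab at java.lang > Memory > HeapMemoryUsage.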
OTLP (OpenTelemetry Protocol)
OTLP is a standardized protocol for transmitting telemetry data (metrics, logs, and traces) between services and observability backends.
- Provides a unified transport layer between the application and monitoring systems.
- Enables seamless integration with various backends.
- Reduces vendor lock-in while standardizing telemetry data collection.
JVM Metrics
JVM metrics provide insight into the Java Virtual Machine's performance and resource usage.
Available JVM Metrics
Metric Name | Type | Description |
---|---|---|
jvm.memory.used | Gauge | Used memory by memory pool |
jvm.memory.committed | Gauge | Committed memory by memory pool |
jvm.memory.max | Gauge | Max memory by memory pool |
jvm.gc.memory.allocated | Counter | Allocated memory (bytes) |
jvm.gc.memory.promoted | Counter | Promoted memory to old generation (bytes) |
jvm.gc.max.data.size | Gauge | Max size of old generation (bytes) |
jvm.gc.live.data.size | Gauge | Size of long-lived heap memory pool after GC (bytes) |
jvm.gc.pause | Timer | GC pause duration |
jvm.threads.peak | Gauge | Peak thread count |
jvm.threads.daemon | Gauge | Current daemon thread count |
jvm.threads.live | Gauge | Current thread count |
jvm.threads.states | Gauge | Thread count by state |
jvm.classes.loaded | Gauge | Number of loaded classes |
jvm.classes.unloaded | Counter | Total number of unloaded classes |
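Several of the gauges in the table above correspond to values that the JVM publishes through its `java.lang.management` MXBeans. As a rough illustration of where such numbers come from, the sketch below reads a few of them directly with the standard library. It is not Varadhi's or Micrometer's code; the class name `JvmSnapshot` and the `{id=...}` key format are made up for this example.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

final class JvmSnapshot {

    // Collects a point-in-time reading of a few JVM values, keyed by names
    // that mirror the metric names in the table above.
    static Map<String, Long> collect() {
        Map<String, Long> out = new LinkedHashMap<>();

        // One reading per memory pool; committed and max follow the same pattern.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                out.put("jvm.memory.used{id=" + pool.getName() + "}", usage.getUsed());
            }
        }

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        out.put("jvm.threads.live", (long) threads.getThreadCount());
        out.put("jvm.threads.daemon", (long) threads.getDaemonThreadCount());
        out.put("jvm.threads.peak", (long) threads.getPeakThreadCount());

        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        out.put("jvm.classes.loaded", (long) classes.getLoadedClassCount());
        out.put("jvm.classes.unloaded", classes.getUnloadedClassCount());
        return out;
    }

    public static void main(String[] args) {
        collect().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```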
Enabling JVM Metrics
JVM metrics are enabled by default in Varadhi. The configuration is in VaradhiApplication.java:
var metricsOptions = new MicrometerMetricsOptions()
    .setMicrometerRegistry(services.getMeterRegistry())
    .setMetricsNaming(MetricsNaming.v4Names())
    .setRegistryName("default")
    .addDisabledMetricsCategory(MetricsDomain.HTTP_SERVER)
    .setJvmMetricsEnabled(true)
    .setEnabled(true);
Vert.x Metrics
Vert.x metrics provide visibility into the internal operations of the Vert.x framework.
Event Loop Metrics
Metric Name | Type | Description |
---|---|---|
vertx.eventloop.delay | Timer | Event loop delay |
vertx.eventloop.usage | Gauge | Event loop usage |
vertx.eventloop.queue.delay | Timer | Event loop queue delay |
vertx.eventloop.queue.size | Gauge | Event loop queue size |
Event Bus Metrics
Metric Name | Type | Description |
---|---|---|
vertx.eventbus.handlers | Gauge | Number of event bus handlers |
vertx.eventbus.messages.received | Counter | Messages received on event bus |
vertx.eventbus.messages.sent | Counter | Messages sent on event bus |
vertx.eventbus.messages.published | Counter | Messages published on event bus |
vertx.eventbus.messages.delivered | Counter | Messages successfully delivered |
vertx.eventbus.messages.pending.local | Gauge | Pending local messages |
vertx.eventbus.messages.pending.remote | Gauge | Pending remote messages |
vertx.eventbus.messages.errors | Counter | Message errors |
Worker Pool Metrics
Metric Name | Type | Description |
---|---|---|
vertx.pool.tasks.submitted | Counter | Tasks submitted to the worker pool |
vertx.pool.tasks.completed | Counter | Tasks completed by the worker pool |
vertx.pool.tasks.active | Gauge | Active tasks in the worker pool |
vertx.pool.tasks.pending | Gauge | Pending tasks in the worker pool |
vertx.pool.tasks.time | Timer | Execution time for tasks |
vertx.pool.usage | Gauge | Worker pool thread usage |
vertx.pool.size | Gauge | Worker pool size |
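To make the difference between active, pending, and completed tasks concrete, the sketch below observes a plain `java.util.concurrent.ThreadPoolExecutor` while its workers are held busy. This is a standard-library stand-in chosen for illustration, not the Vert.x worker pool itself, and `PoolStats` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PoolStats {

    // Submits `tasks` jobs to a pool of `threads` workers and parks them on a
    // latch so the pool can be observed mid-flight.
    // Returns {active, pending, size, completed}; completed is read after the pool drains.
    static long[] observe(int threads, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch started = new CountDownLatch(Math.min(threads, tasks));
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    started.countDown();
                    try {
                        release.await(); // simulate work until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            started.await();                        // all workers are now busy
            long active = pool.getActiveCount();    // tasks currently running
            long pending = pool.getQueue().size();  // tasks waiting for a free worker
            long size = pool.getPoolSize();         // worker threads in the pool
            release.countDown();
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
            return new long[] {active, pending, size, pool.getCompletedTaskCount()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(observe(2, 5)));
    }
}
```

With two workers and five tasks, two tasks are active and three are pending while the workers are held. A pending count that keeps growing in production usually means the pool is undersized or its tasks run too long.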
Cluster Metrics
Metric Name | Type | Description |
---|---|---|
vertx.cluster.nodes | Gauge | Number of nodes in the cluster |
vertx.cluster.nodes.joined | Counter | Nodes that joined the cluster |
vertx.cluster.nodes.left | Counter | Nodes that left the cluster |
Enabling Vert.x Metrics
Vert.x metrics are configured in VaradhiApplication.java:
var metricsOptions = new MicrometerMetricsOptions()
    .setMicrometerRegistry(services.getMeterRegistry())
    .setMetricsNaming(MetricsNaming.v4Names())
    .setRegistryName("default")
    .addDisabledMetricsCategory(MetricsDomain.HTTP_SERVER)
    .setJvmMetricsEnabled(true)
    .setEnabled(true);
VertxOptions vertxOptions = config.getVertxOptions()
    .setTracingOptions(new OpenTelemetryOptions(services.getOpenTelemetry()))
    .setMetricsOptions(metricsOptions)
    .setEventBusOptions(eventBusOptions);
Application-Specific Metrics
Producer Metrics
Producer metrics provide insights into message production performance.
Configuration:
producerOptions:
  metricEnabled: true
HTTP API Metrics
HTTP API metrics track API usage and performance.
Monitoring Setup
Local Development
For local development, Varadhi includes a Docker Compose setup with Prometheus and Grafana:
- Start the monitoring stack:
  cd setup/docker
  docker-compose -f prometheus-compose.yml up -d
- Access Prometheus UI: http://localhost:9090
- Access Grafana: http://localhost:3000 (default credentials: admin/admin)
Production Deployment
For production deployments, configure Prometheus and Grafana using the provided Helm charts:
cd setup/helm
helm install varadhi-metrics ./varadhi -f values/metrics.values.yaml
Best Practices
- Alert on Critical Metrics: Configure alerts for critical metrics such as high memory usage, excessive GC pauses, or event loop delays.
- Retention Policy: Configure appropriate retention policies for metrics based on your monitoring needs.
- Dashboard Organization: Organize dashboards by component (JVM, Vert.x, Application) and criticality.
- Correlate Metrics: Correlate metrics with logs and traces for better troubleshooting.
- Regular Review: Regularly review metrics to identify trends and potential issues before they become critical.
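An alert on high memory usage or excessive GC time is a ratio compared against a threshold. The sketch below computes two such ratios in-process from the standard `java.lang.management` beans. In a real deployment the rule would normally live in the alerting system rather than in application code, and both the class name `JvmAlertCheck` and the 0.9 threshold in `main` are illustrative assumptions, not values taken from Varadhi.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

final class JvmAlertCheck {

    // Fraction of the maximum heap currently in use, or -1 if the JVM reports no maximum.
    static double heapUsageRatio() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        return max > 0 ? (double) heap.getUsed() / max : -1.0;
    }

    // Accumulated GC time reported by all collectors, divided by JVM uptime.
    static double gcTimeRatio() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis > 0 ? (double) gcMillis / uptimeMillis : 0.0;
    }

    // An alert fires when the observed value exceeds its threshold.
    static boolean shouldAlert(double value, double threshold) {
        return value > threshold;
    }

    public static void main(String[] args) {
        double heap = heapUsageRatio();
        System.out.println("heap usage ratio: " + heap + ", alert: " + shouldAlert(heap, 0.9));
        System.out.println("gc time ratio: " + gcTimeRatio());
    }
}
```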
Troubleshooting
Common Issues
- Metrics Not Appearing:
  - Verify metrics are enabled in configuration
  - Check connectivity between components (Varadhi → OpenTelemetry → Prometheus)
  - Verify the metrics endpoint is accessible
- High Memory Usage:
  - Review JVM memory settings
  - Check for memory leaks using heap dumps
  - Monitor GC metrics for inefficient garbage collection
- Event Loop Delays:
  - Check for blocking operations on event loop threads
  - Consider increasing event loop pool size
  - Optimize long-running operations
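The effect of a blocking operation on an event loop can be reproduced with a single-threaded executor, which like an event loop runs its queued tasks one at a time. The sketch below is a standard-library analogy, not Vert.x code: it queues a blocking task, then measures how long the next task waits before it starts. `LoopDelayProbe` is a hypothetical name.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class LoopDelayProbe {

    // Queues a task that blocks the single thread for blockMillis, then returns
    // how long (in ms) a follow-up probe task waited before it started running.
    static long queueDelayMillis(long blockMillis) {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        try {
            Runnable blocker = () -> {
                try {
                    Thread.sleep(blockMillis); // stands in for blocking I/O on the loop
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            };
            Callable<Long> probe = System::nanoTime; // records when it actually starts
            loop.execute(blocker);
            long enqueuedAt = System.nanoTime();
            Future<Long> startedAt = loop.submit(probe);
            return TimeUnit.NANOSECONDS.toMillis(startedAt.get() - enqueuedAt);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            loop.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("delay behind a 200 ms blocker: " + queueDelayMillis(200) + " ms");
        System.out.println("delay with no blocking:        " + queueDelayMillis(0) + " ms");
    }
}
```

The probe's wait tracks the blocker's duration, which is why moving blocking work off the loop thread (for example onto a worker pool) tends to help more than adding event loop threads alone.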