Varadhi Metrics Documentation - flipkart-incubator/varadhi GitHub Wiki

Overview

Varadhi exposes metrics that give visibility into application performance, resource utilization, and operational health. This document covers how to enable and use those metrics, with a focus on JVM and Vert.x metrics.

Varadhi leverages the following metrics technologies:

  • Micrometer: Primary metrics facade providing a vendor-neutral API
  • Prometheus: For metrics storage and querying
  • Grafana: For metrics visualization
  • OpenTelemetry: For metrics export and standards-compliant instrumentation

Metrics Categories

Varadhi exposes metrics in the following categories:

  1. JVM Metrics: Memory, garbage collection, threads, classloading
  2. Vert.x Metrics:
    • Event Loop metrics
    • Event Bus metrics
    • Worker Pool metrics
    • Vert.x Cluster metrics
  3. Application-Specific Metrics:
    • Producer metrics
    • HTTP API metrics

Enabling Metrics

Metrics collection is enabled by default in Varadhi. The configuration can be adjusted in the following ways:

Configuration File

In conf/configuration.yml:

# Producer metrics can be toggled
producerOptions:
  metricEnabled: true

Helm Configuration

When deploying with Helm, metrics can be configured in values.yaml:

varadhi:
  app:
    producerOptions:
      metricEnabled: true
    
# OpenTelemetry configuration
otlpConfig:
  otlp.url: "http://otel-collector:4318/v1/metrics"
  otlp.step: "20s"
  otlp.aggregationTemporality: "CUMULATIVE"
  otlp.resourceAttributes:
  otlp.headers:
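Before deploying, it can help to sanity-check the OTLP values shown above. The following is a minimal sketch, not part of Varadhi: the class OtlpSettingsCheck, its methods, and its rules (an absolute http(s) URL, a step written as a number plus ms/s/m/h, and a temporality of CUMULATIVE or DELTA) are assumptions made for illustration. The empty otlp.resourceAttributes and otlp.headers keys are deliberately left unchecked.

```java
import java.net.URI;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; Varadhi does not ship this class.
class OtlpSettingsCheck {
    // Assumed step format: digits followed by ms, s, m or h (for example "20s").
    private static final Pattern STEP = Pattern.compile("(\\d+)(ms|s|m|h)");

    /** Parses values such as "20s" or "500ms"; throws IllegalArgumentException otherwise. */
    static Duration parseStep(String raw) {
        Matcher m = STEP.matcher(raw.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("otlp.step is not a recognised duration: " + raw);
        }
        long amount = Long.parseLong(m.group(1));
        switch (m.group(2)) {
            case "ms": return Duration.ofMillis(amount);
            case "s":  return Duration.ofSeconds(amount);
            case "m":  return Duration.ofMinutes(amount);
            default:   return Duration.ofHours(amount);
        }
    }

    /** Returns a list of problems; an empty list means the settings pass these assumed rules. */
    static List<String> validate(Map<String, String> settings) {
        List<String> problems = new ArrayList<>();

        String url = settings.getOrDefault("otlp.url", "");
        try {
            URI uri = URI.create(url);
            boolean http = "http".equals(uri.getScheme()) || "https".equals(uri.getScheme());
            if (!http || uri.getHost() == null) {
                problems.add("otlp.url must be an absolute http(s) URL: " + url);
            }
        } catch (IllegalArgumentException e) {
            problems.add("otlp.url is not a valid URI: " + url);
        }

        try {
            if (parseStep(settings.getOrDefault("otlp.step", "")).isZero()) {
                problems.add("otlp.step must be greater than zero");
            }
        } catch (IllegalArgumentException e) {
            problems.add(e.getMessage());
        }

        String temporality = settings.getOrDefault("otlp.aggregationTemporality", "");
        if (!temporality.equals("CUMULATIVE") && !temporality.equals("DELTA")) {
            problems.add("otlp.aggregationTemporality must be CUMULATIVE or DELTA: " + temporality);
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, String> settings = Map.of(
                "otlp.url", "http://otel-collector:4318/v1/metrics",
                "otlp.step", "20s",
                "otlp.aggregationTemporality", "CUMULATIVE");
        System.out.println("Problems found: " + validate(settings));
    }
}
```

Returning a list of problems rather than failing on the first one makes it easier to fix a values file in a single pass.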

Metrics Backends

Varadhi supports multiple metrics backends:

Prometheus

Prometheus is the recommended metrics backend for production deployments. Varadhi exposes metrics in Prometheus format through an OpenTelemetry collector.

Configuration:

  1. The OpenTelemetry collector is configured in setup/docker/configs/otel-collector.yml
  2. Prometheus scrapes metrics from the OpenTelemetry collector as configured in setup/docker/configs/prometheus.yml

JMX

JMX metrics are useful for local development and debugging. They're enabled by default and accessible through tools like JConsole or VisualVM.
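The values that JConsole or VisualVM display come from the JVM's platform MBean server, and the same attributes can be read from code. The sketch below uses only the JDK; the class name JmxPeek is hypothetical, and it reads the standard java.lang MBeans rather than any Varadhi-specific ones.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Hypothetical helper: reads standard JVM MBeans the way a JMX client would.
class JmxPeek {
    /** Looks up one attribute by MBean object name on the local platform MBean server. */
    private static Object read(String objectName, String attribute) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            return server.getAttribute(new ObjectName(objectName), attribute);
        } catch (JMException e) {
            throw new IllegalStateException("Cannot read " + objectName + "/" + attribute, e);
        }
    }

    /** Heap bytes currently in use, taken from java.lang:type=Memory, attribute HeapMemoryUsage. */
    static long heapUsedBytes() {
        CompositeData usage = (CompositeData) read("java.lang:type=Memory", "HeapMemoryUsage");
        return (Long) usage.get("used");
    }

    /** Live thread count, taken from java.lang:type=Threading, attribute ThreadCount. */
    static int threadCount() {
        return (Integer) read("java.lang:type=Threading", "ThreadCount");
    }

    public static void main(String[] args) {
        System.out.println("Heap used (bytes): " + heapUsedBytes());
        System.out.println("Live threads: " + threadCount());
    }
}
```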

OTLP (OpenTelemetry Protocol)

OTLP is a standardized protocol for transmitting telemetry data (metrics, logs, and traces) between services and observability backends.

  1. Provides a unified transport layer between the application and monitoring systems.
  2. Enables integration with multiple observability backends.
  3. Reduces vendor lock-in by standardizing how telemetry data is collected.

JVM Metrics

JVM metrics provide insight into the Java Virtual Machine's performance and resource usage.

Available JVM Metrics

Metric Name                Type      Description
jvm.memory.used            Gauge     Used memory by memory pool
jvm.memory.committed       Gauge     Committed memory by memory pool
jvm.memory.max             Gauge     Max memory by memory pool
jvm.gc.memory.allocated    Counter   Allocated memory (bytes)
jvm.gc.memory.promoted     Counter   Promoted memory to old generation (bytes)
jvm.gc.max.data.size       Gauge     Max size of old generation (bytes)
jvm.gc.live.data.size      Gauge     Size of long-lived heap memory pool after GC (bytes)
jvm.gc.pause               Timer     GC pause duration
jvm.threads.peak           Gauge     Peak thread count
jvm.threads.daemon         Gauge     Current daemon thread count
jvm.threads.live           Gauge     Current thread count
jvm.threads.states         Gauge     Thread count by state
jvm.classes.loaded         Gauge     Number of loaded classes
jvm.classes.unloaded       Counter   Total number of unloaded classes
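To make the table more concrete, the sketch below reads comparable raw values directly from the JDK's java.lang.management beans, which is where JVM figures of this kind generally originate. It is an illustration under that assumption, not Varadhi's or Micrometer's code, and the class name JvmSnapshot is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: one-off readings of the raw data behind typical JVM metrics.
class JvmSnapshot {
    /** Used bytes per memory pool (compare jvm.memory.used, which is reported per pool). */
    static Map<String, Long> usedBytesByPool() {
        Map<String, Long> used = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                used.put(pool.getName(), usage.getUsed());
            }
        }
        return used;
    }

    /** Current live thread count (compare jvm.threads.live). */
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /** Classes currently loaded (compare jvm.classes.loaded). */
    static int loadedClasses() {
        return ManagementFactory.getClassLoadingMXBean().getLoadedClassCount();
    }

    /** Approximate total time spent in GC since JVM start, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        usedBytesByPool().forEach((pool, bytes) -> System.out.println(pool + ": " + bytes + " bytes"));
        System.out.println("Live threads: " + liveThreads());
        System.out.println("Loaded classes: " + loadedClasses());
        System.out.println("Total GC time (ms): " + totalGcMillis());
    }
}
```

A metrics library samples values like these on every export step; the snapshot here is a single reading.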

Enabling JVM Metrics

JVM metrics are enabled by default in Varadhi. The configuration is in VaradhiApplication.java:

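import io.vertx.micrometer.MetricsDomain;
import io.vertx.micrometer.MetricsNaming;
import io.vertx.micrometer.MicrometerMetricsOptions;

// Excerpt: `services` is defined elsewhere in VaradhiApplication.java.
var metricsOptions = new MicrometerMetricsOptions()
    .setMicrometerRegistry(services.getMeterRegistry())    // registry that receives the meters
    .setMetricsNaming(MetricsNaming.v4Names())             // Vert.x 4 metric names
    .setRegistryName("default")
    .addDisabledMetricsCategory(MetricsDomain.HTTP_SERVER) // skip Vert.x's built-in HTTP server metrics
    .setJvmMetricsEnabled(true)                            // bind JVM meters (memory, GC, threads, class loading)
    .setEnabled(true);                                     // turn Vert.x metrics collection on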

Vert.x Metrics

Vert.x metrics provide visibility into the internal operations of the Vert.x framework.

Event Loop Metrics

Metric Name                   Type    Description
vertx.eventloop.delay         Timer   Event loop delay
vertx.eventloop.usage         Gauge   Event loop usage
vertx.eventloop.queue.delay   Timer   Event loop queue delay
vertx.eventloop.queue.size    Gauge   Event loop queue size

Event Bus Metrics

Metric Name                              Type      Description
vertx.eventbus.handlers                  Gauge     Number of event bus handlers
vertx.eventbus.messages.received         Counter   Messages received on event bus
vertx.eventbus.messages.sent             Counter   Messages sent on event bus
vertx.eventbus.messages.published        Counter   Messages published on event bus
vertx.eventbus.messages.delivered        Counter   Messages successfully delivered
vertx.eventbus.messages.pending.local    Gauge     Pending local messages
vertx.eventbus.messages.pending.remote   Gauge     Pending remote messages
vertx.eventbus.messages.errors           Counter   Message errors

Worker Pool Metrics

Metric Name                  Type      Description
vertx.pool.tasks.submitted   Counter   Tasks submitted to the worker pool
vertx.pool.tasks.completed   Counter   Tasks completed by the worker pool
vertx.pool.tasks.active      Gauge     Active tasks in the worker pool
vertx.pool.tasks.pending     Gauge     Pending tasks in the worker pool
vertx.pool.tasks.time        Timer     Execution time for tasks
vertx.pool.usage             Gauge     Worker pool thread usage
vertx.pool.size              Gauge     Worker pool size

Cluster Metrics

Metric Name                  Type      Description
vertx.cluster.nodes          Gauge     Number of nodes in the cluster
vertx.cluster.nodes.joined   Counter   Nodes that joined the cluster
vertx.cluster.nodes.left     Counter   Nodes that left the cluster

Enabling Vert.x Metrics

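Vert.x metrics are configured in VaradhiApplication.java. The MicrometerMetricsOptions shown under Enabling JVM Metrics is built first and then passed to VertxOptions: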

var metricsOptions = new MicrometerMetricsOptions()
    .setMicrometerRegistry(services.getMeterRegistry())
    .setMetricsNaming(MetricsNaming.v4Names())
    .setRegistryName("default")
    .addDisabledMetricsCategory(MetricsDomain.HTTP_SERVER)
    .setJvmMetricsEnabled(true)
    .setEnabled(true);

VertxOptions vertxOptions = config.getVertxOptions()
    .setTracingOptions(new OpenTelemetryOptions(services.getOpenTelemetry()))
    .setMetricsOptions(metricsOptions)
    .setEventBusOptions(eventBusOptions);

Application-Specific Metrics

Producer Metrics

Producer metrics provide insights into message production performance.

Configuration:

producerOptions:
  metricEnabled: true

HTTP API Metrics

HTTP API metrics track API usage and performance.

Monitoring Setup

Local Development

For local development, Varadhi includes a Docker Compose setup with Prometheus and Grafana:

  1. Start the monitoring stack:

    cd setup/docker
    docker-compose -f prometheus-compose.yml up -d
    
  2. Access Prometheus UI: http://localhost:9090

  3. Access Grafana: http://localhost:3000 (default credentials: admin/admin)
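Once the stack is up, a scripted check can confirm that both UIs respond. The sketch below uses the JDK's java.net.http client (Java 11 or later); the class name EndpointCheck is hypothetical. The ports come from the steps above, while the health paths /-/healthy (Prometheus) and /api/health (Grafana) are the standard ones for those tools and are assumed to be unchanged in this setup.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper: reports whether an HTTP endpoint answers with a 2xx status.
class EndpointCheck {
    /** Returns true only if the endpoint responds with a 2xx status within the timeout. */
    static boolean isUp(URI uri, Duration timeout) {
        HttpClient client = HttpClient.newBuilder().connectTimeout(timeout).build();
        HttpRequest request = HttpRequest.newBuilder(uri).timeout(timeout).GET().build();
        try {
            HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() / 100 == 2;
        } catch (IOException e) {
            // Connection refused, timeout, DNS failure and similar all count as "down".
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        Duration timeout = Duration.ofSeconds(3);
        // Assumed health paths; adjust if the local setup exposes different ones.
        System.out.println("Prometheus up: " + isUp(URI.create("http://localhost:9090/-/healthy"), timeout));
        System.out.println("Grafana up:    " + isUp(URI.create("http://localhost:3000/api/health"), timeout));
    }
}
```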

Production Deployment

For production deployments, configure Prometheus and Grafana using the provided Helm charts:

cd setup/helm
helm install varadhi-metrics ./varadhi -f values/metrics.values.yaml

Best Practices

  1. Alert on Critical Metrics: Configure alerts for critical metrics such as high memory usage, excessive GC pauses, or event loop delays.

  2. Retention Policy: Configure appropriate retention policies for metrics based on your monitoring needs.

  3. Dashboard Organization: Organize dashboards by component (JVM, Vert.x, Application) and criticality.

  4. Correlate Metrics: Correlate metrics with logs and traces for better troubleshooting.

  5. Regular Review: Regularly review metrics to identify trends and potential issues before they become critical.
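Alert rules normally live in the monitoring stack rather than in application code, but the arithmetic behind two of the alerts mentioned above (high memory usage and excessive GC time) is easy to state. The sketch below spells it out; the class AlertRules, its method names, and the 0.85 threshold in the usage example are hypothetical.

```java
// Hypothetical helper: the arithmetic behind two common JVM alerts.
class AlertRules {
    /** Used/max ratio, or -1 when the maximum is undefined (the JVM reports max as -1 in that case). */
    static double usageRatio(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return -1.0;
        }
        return (double) usedBytes / maxBytes;
    }

    /** Fraction of an interval spent in GC, from two cumulative GC-time samples in milliseconds. */
    static double gcOverhead(long gcMillisBefore, long gcMillisAfter, long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("intervalMillis must be positive");
        }
        return (double) (gcMillisAfter - gcMillisBefore) / intervalMillis;
    }

    /** Fires only for defined values (>= 0) that exceed the threshold. */
    static boolean shouldAlert(double value, double threshold) {
        return value >= 0 && value > threshold;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        double heapRatio = usageRatio(rt.totalMemory() - rt.freeMemory(), rt.maxMemory());
        // 0.85 is an example threshold, not a Varadhi recommendation.
        System.out.println("Heap ratio " + heapRatio + ", alert: " + shouldAlert(heapRatio, 0.85));
    }
}
```

Treating an undefined maximum as "no alert" avoids false alarms for memory pools that have no configured limit.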

Troubleshooting

Common Issues

  1. Metrics Not Appearing:

    • Verify metrics are enabled in configuration
    • Check connectivity between components (Varadhi → OpenTelemetry collector → Prometheus)
    • Verify the metrics endpoint is accessible
  2. High Memory Usage:

    • Review JVM memory settings
    • Check for memory leaks using heap dumps
    • Monitor GC metrics for inefficient garbage collection
  3. Event Loop Delays:

    • Check for blocking operations on event loop threads
    • Consider increasing event loop pool size
    • Optimize long-running operations
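To see why a blocking call shows up as event loop delay, the sketch below uses a single-threaded executor as a stand-in for an event loop: one task blocks the thread, and a probe task submitted right after it measures how long it waited before running. This illustrates the effect only; it is not how Vert.x measures delay, and the class name LoopDelayProbe is hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo: a single-threaded executor stands in for an event loop thread.
class LoopDelayProbe {
    /** Submits a probe and returns how long it waited before starting, in milliseconds. */
    static long probeDelayMillis(ExecutorService loop) {
        Callable<Long> probe = System::nanoTime; // records the moment the probe starts running
        long submitted = System.nanoTime();
        Future<Long> started = loop.submit(probe);
        try {
            return TimeUnit.NANOSECONDS.toMillis(started.get(5, TimeUnit.SECONDS) - submitted);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Blocks the loop thread for blockMillis, then measures the delay seen by the next task. */
    static long delayBehindBlockingTask(long blockMillis) {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        try {
            Runnable blocker = () -> {
                try {
                    Thread.sleep(blockMillis); // simulates a blocking call on the loop thread
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            };
            loop.submit(blocker);
            return probeDelayMillis(loop);
        } finally {
            loop.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("Delay on idle loop (ms): " + delayBehindBlockingTask(0));
        System.out.println("Delay behind 200 ms block (ms): " + delayBehindBlockingTask(200));
    }
}
```

The probe's delay tracks the blocking time almost exactly, which is why moving blocking work off the event loop threads is usually the first thing to try.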
