Syslog load testing - opennms-forge/opennms-benchmark GitHub Wiki

This experiment covers the use case in which OpenNMS runs as part of a larger monitoring application stack. The Minion acts as a Syslog receiver. Received Syslog messages are persisted in the OpenNMS database and forwarded to Kafka in the events topic. OpenNMS Alarmd is not configured to create or manage alarm lifecycles.

🎯 Goals

  • Identify how many Syslog messages can be reliably processed without data loss
  • Use standard settings for Syslogd
  • Capture performance characteristics for the main parts:
    • Database: CPU, Memory, Load, Network
    • Core: CPU, Memory, Load, Network
    • Kafka: CPU, Memory, Load, Network
    • Minion: CPU, Memory, Load, Network
    • SNMP Simulator: CPU, Memory, Load, Network

🧟 Non-Goals

  • Tuning Syslogd variables such as queue size, batch size, and batch interval
  • Mixed workloads, because of the large number of possible combinations, e.g. Pollerd + Collectd, Syslog + SNMP Traps, Flows and Traps
  • Service outages and network latency
  • Alarm lifecycle processing based on Syslog
  • Kernel parameters for UDP socket tuning

Experiment 03

Tool used for load testing: https://github.com/indigo423/syslog-load-testing

Configuration

cd experiments/c1km1_4c16g_kfk_syslog
ansible-playbook -i ../../ansible-inventory.yml experiment.yml

Goal

  • Collect load behavior by increasing the workload incrementally
  • Identify resource constraints on Core, PostgreSQL, Minion, and Kafka as the workload increases
  • Identify the number of Syslog messages per second at which the system is saturated

Compute and Storage

This benchmark lab is intended to give you a tool that allows you to deploy 6 virtual machines.

| Component | CPU | RAM | Disk | Network | Description |
|---|---|---|---|---|---|
| OpenNMS Core | 4 | 16GB | 30GB | 10Gbps | OpenNMS Horizon 33.1.8 |
| PostgreSQL | 4 | 8GB | 30GB | 10Gbps | PostgreSQL 15 |
| Apache Kafka | 4 | 8GB | 30GB | 10Gbps | Kafka 3.9.1 |
| OpenNMS Minion | 4 | 8GB | 30GB | 10Gbps | Horizon Minion 33.1.8 |
| Net-SNMP Simulator | 4 | 8GB | 30GB | 10Gbps | Net-SNMP Agent |
| Monitoring | 4 | 8GB | 30GB | 10Gbps | Prometheus, Jaeger, Grafana |

| Component | Azure SKU | CPU Type | IOPS Cap |
|---|---|---|---|
| OpenNMS Core | Standard_B4s_v2 | Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz | 3570 |
| PostgreSQL | Standard_B4ls_v2 | Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz | 3570 |
| Apache Kafka | Standard_B4ls_v2 | Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz | 3570 |
| OpenNMS Minion | Standard_B4ls_v2 | Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz | 3570 |
| Net-SNMP Simulator | Standard_B4ls_v2 | Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz | 3570 |
| Monitoring | Standard_B4ls_v2 | Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz | 3570 |

Configuration

| Component | Description |
|---|---|
| Syslogd | Default configuration |
| Kafka Producer | Enabled with event forwarding |
| Network Latency | Low latency |

OpenNMS Core Services

For this experiment, the following daemons were enabled in service-configuration.xml:

  • Manager
  • TestLoadLibraries
  • Eventd
  • Alarmd
  • Provisiond
  • JettyServer
  • KarafStartupMonitor
  • Syslogd

Java Virtual Machine Settings

Content: opennms.conf

JAVA_HEAP_SIZE=8192
JAVA_INITIAL_HEAP_SIZE=8192
ADDITIONAL_MANAGER_OPTIONS="-XX:+UseG1GC -javaagent:/opt/prom-jmx-exporter/jmx_prometheus_javaagent.jar=9299:/opt/prom-jmx-exporter/config.yaml"

PostgreSQL 15

Content: postgresql.conf

# DB Version: 15
# OS Type: linux
# DB Type: mixed
# Total Memory (RAM): 7 GB
# CPUs num: 4
# Connections num: 100
# Data Storage: hdd

max_connections = 100
shared_buffers = 1792MB
effective_cache_size = 5376MB
maintenance_work_mem = 448MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 4
effective_io_concurrency = 2
work_mem = 8822kB
huge_pages = off
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
max_parallel_maintenance_workers = 2
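
The header comments suggest these values came from a tuning calculator given 7 GB of RAM. As a rough cross-check, the three memory settings below are consistent with common rule-of-thumb fractions of total RAM. The function name and the fractions are illustrative assumptions, not something the benchmark repository is known to ship, and `work_mem` is left out because its derivation depends on the calculator used.

```python
def memory_settings_mb(total_ram_mb: int) -> dict[str, int]:
    """Rule-of-thumb PostgreSQL memory settings, in MB, for a given RAM size.

    The fractions are assumptions chosen to match the values listed above.
    """
    return {
        "shared_buffers": total_ram_mb // 4,            # 1/4 of RAM
        "effective_cache_size": total_ram_mb * 3 // 4,  # 3/4 of RAM
        "maintenance_work_mem": total_ram_mb // 16,     # 1/16 of RAM
    }


# 7 GB as stated in the "Total Memory (RAM)" comment
for name, value in memory_settings_mb(7 * 1024).items():
    print(f"{name} = {value}MB")
```

With 7168 MB as input, this reproduces the `shared_buffers`, `effective_cache_size`, and `maintenance_work_mem` values shown in postgresql.conf.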

Configuration files

Requisition

A network inventory with 10229 Linux nodes is provisioned. The Syslog messages for a load test are associated with a single node.

Syslog load test

Load is increased by sending various Syslog messages in batches to the Minion.
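
To illustrate what sending Syslog messages at a fixed rate involves, here is a minimal sketch of a paced UDP sender. It is not the load-testing tool linked above. The function names, the BSD-style (RFC 3164) message layout, and the facility, severity, and tag values are all assumptions made for illustration.

```python
import socket
import time
from datetime import datetime


def format_rfc3164(facility, severity, hostname, tag, text, now=None):
    """Build a BSD-style (RFC 3164) Syslog message as bytes."""
    now = now or datetime.now()
    pri = facility * 8 + severity
    # RFC 3164 pads single-digit days with a space, not a zero.
    timestamp = f"{now.strftime('%b')} {now.day:2d} {now.strftime('%H:%M:%S')}"
    return f"<{pri}>{timestamp} {hostname} {tag}: {text}".encode("ascii")


def send_at_rate(send, payloads, rate_per_second):
    """Call send(payload) for each payload, paced to roughly rate_per_second."""
    interval = 1.0 / rate_per_second
    start = time.monotonic()
    sent = 0
    for payload in payloads:
        # Schedule against the start time so small delays do not accumulate.
        delay = start + sent * interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        send(payload)
        sent += 1
    return sent


def udp_sender(host, port):
    """Return a send(payload) callable that writes UDP datagrams to host:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    return lambda payload: sock.sendto(payload, (host, port))
```

A call such as `send_at_rate(udp_sender("minion.example.org", 1514), payloads, 100)` would then send roughly 100 messages per second. The host name and port are placeholders and must match the address on which the Minion's Syslog listener is actually configured.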

[!NOTE] The measurements are taken from a 15-second sampling window.

| Batch | Volume | Syslog messages total per batch | Load Duration | Processing Duration |
|---|---|---|---|---|
| 01 | ~100 Syslog messages per second | 90,000 | ~15m to Minion | ~15m |
| 02 | ~500 Syslog messages per second | 450,000 | ~15m to Minion | ~40m |
| 03 | ~1000 Syslog messages per second | 450,000 | ~15m to Minion | ~55m |
| 04 | ~9000 Syslog messages per second | 3,600,000 | ~15m to Core | terminated after 45m |
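
As a reading aid for the table, the sketch below shows two simple calculations: the nominal message count for a given send rate and load duration, and a rough average processing rate obtained by dividing a batch total by its processing duration. The helper names are hypothetical. The second figure is only an average derived from the reported numbers, not a measured throughput.

```python
def nominal_total(rate_per_second: int, load_minutes: int) -> int:
    """Messages sent when holding rate_per_second for load_minutes."""
    return rate_per_second * load_minutes * 60


def average_processing_rate(total_messages: int, processing_minutes: int) -> float:
    """Average messages per second over the whole processing duration."""
    return total_messages / (processing_minutes * 60)


print(nominal_total(100, 15))                # 90000
print(nominal_total(500, 15))                # 450000
print(average_processing_rate(450_000, 40))  # 187.5
```

For batches 01 and 02 the nominal total matches the table. For batch 02, the average processing rate of about 187.5 messages per second is well below the 500 messages per second at which the load was sent, which fits the longer processing duration.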

OpenNMS Internal Metrics

System Metrics

Database

Full Node Exporter Database

OpenNMS Core

Full Node Exporter Core

Kafka

Full Node Exporter Kafka

OpenNMS Minion

Full Node Exporter Minion

Syslog load generation

Full Node Exporter SNMPSIM