Syslog load testing - opennms-forge/opennms-benchmark GitHub Wiki

In this experiment, OpenNMS runs as part of a larger monitoring application stack. The Minion acts as the Syslog receiver. Received Syslog messages are persisted in the OpenNMS database and forwarded to the Kafka events topic. OpenNMS Alarmd is not configured to create or manage alarm lifecycles.

🎯 Goals

Identify how many Syslog messages can be reliably processed without data loss
Use standard settings for Syslogd
Capture performance characteristics for the main components:
- Database: CPU, Memory, Load, Network
- Core: CPU, Memory, Load, Network
- Kafka: CPU, Memory, Load, Network
- Minion: CPU, Memory, Load, Network
- SNMP Simulator: CPU, Memory, Load, Network

🧟 Non-Goals

Tuning Syslogd variables such as queue size, batch size, and batch interval
Mixed workloads, because of the large number of possible combinations, e.g. Pollerd + Collectd, Syslog + SNMP Traps, Flows and Traps
Service outages and network latency
Alarm lifecycle processing based on Syslog
Kernel parameters for UDP socket tuning

Experiment 03

Tool used for load testing: https://github.com/indigo423/syslog-load-testing

Configuration

cd experiments/c1km1_4c16g_kfk_syslog
ansible-playbook -i ../../ansible-inventory.yml experiment.yml

Goal

Collect load behavior by increasing the workload incrementally
Identify resource constraints in Core, PostgreSQL, Minion, and Kafka as the workload increases
Identify the number of Syslog messages per second at which the system saturates

Compute and Storage

The benchmark lab deploys six virtual machines:

Component	CPU	RAM	Disk	Network	Description
OpenNMS Core	4	16GB	30GB	10Gbps	OpenNMS Horizon 33.1.8
PostgreSQL	4	8GB	30GB	10Gbps	PostgreSQL 15
Apache Kafka	4	8GB	30GB	10Gbps	Kafka 3.9.1
OpenNMS Minion	4	8GB	30GB	10Gbps	Horizon Minion 33.1.8
Net-SNMP Simulator	4	8GB	30GB	10Gbps	Net-SNMP Agent
Monitoring	4	8GB	30GB	10Gbps	Prometheus, Jaeger, Grafana

Component	Azure SKU	CPU Type	IOPS Cap
OpenNMS Core	Standard_B4s_v2	Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz	3570
PostgreSQL	Standard_B4ls_v2	Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz	3570
Apache Kafka	Standard_B4ls_v2	Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz	3570
OpenNMS Minion	Standard_B4ls_v2	Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz	3570
Net-SNMP Simulator	Standard_B4ls_v2	Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz	3570
Monitoring	Standard_B4ls_v2	Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz	3570

Configuration

Component	Description
Syslogd	Default configuration
Kafka Producer	Enabled with event forwarding
Network Latency	Low latency

OpenNMS Core Services

For this experiment, the following daemons were enabled in service-configuration.xml:

Manager
TestLoadLibraries
Eventd
Alarmd
Provisiond
JettyServer
KarafStartupMonitor
Syslogd
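To check which daemons a Core instance will start, the enabled services can be read from service-configuration.xml. The Python sketch below assumes each daemon is a `<service>` element with a `<name>` of the form `OpenNMS:Name=<Daemon>` and an optional `enabled="false"` attribute, with no XML namespace. The sample XML is a hypothetical, trimmed excerpt, not the file used in this experiment.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down excerpt; a real service-configuration.xml
# lists more services and more child elements per service.
SAMPLE = """<service-configuration>
  <service><name>OpenNMS:Name=Manager</name></service>
  <service><name>OpenNMS:Name=Eventd</name></service>
  <service enabled="false"><name>OpenNMS:Name=Collectd</name></service>
  <service><name>OpenNMS:Name=Syslogd</name></service>
</service-configuration>"""


def enabled_daemons(xml_text: str) -> list[str]:
    """Return the daemon names whose <service> element is not disabled."""
    root = ET.fromstring(xml_text)
    names = []
    for service in root.iter("service"):
        # Assumption: a missing 'enabled' attribute means the service is enabled.
        if service.get("enabled", "true").lower() == "false":
            continue
        full_name = service.findtext("name", default="")
        # Strip the JMX-style prefix, e.g. "OpenNMS:Name=Eventd" -> "Eventd".
        names.append(full_name.split("Name=", 1)[-1])
    return names


print(enabled_daemons(SAMPLE))  # ['Manager', 'Eventd', 'Syslogd']
```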

Java Virtual Machine Settings

Content: opennms.conf

JAVA_HEAP_SIZE=8192
JAVA_INITIAL_HEAP_SIZE=8192
ADDITIONAL_MANAGER_OPTIONS="-XX:+UseG1GC -javaagent:/opt/prom-jmx-exporter/jmx_prometheus_javaagent.jar=9299:/opt/prom-jmx-exporter/config.yaml"
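The `-javaagent` option attaches the Prometheus JMX exporter, which exposes the Core JVM metrics over HTTP on port 9299 using the rules in `config.yaml`. The small Python sketch below pulls the port and config path out of this line, for example to build a Prometheus scrape target. It assumes the agent argument has the `port:config` form used here and does not handle a `host:port:config` variant.

```python
import re

# Line taken verbatim from the opennms.conf shown above.
CONF_LINE = 'ADDITIONAL_MANAGER_OPTIONS="-XX:+UseG1GC -javaagent:/opt/prom-jmx-exporter/jmx_prometheus_javaagent.jar=9299:/opt/prom-jmx-exporter/config.yaml"'


def jmx_exporter_settings(line: str) -> tuple[int, str]:
    """Return (port, config_path) from a '-javaagent:<jar>=<port>:<config>' option."""
    _, _, value = line.partition("=")       # drop the variable name
    for option in value.strip('"').split():  # remove quotes, split JVM options
        match = re.fullmatch(r"-javaagent:[^=]+=(\d+):(.+)", option)
        if match:
            return int(match.group(1)), match.group(2)
    raise ValueError("no -javaagent option with port:config found")


port, config_path = jmx_exporter_settings(CONF_LINE)
print(f"scrape target port {port}, exporter rules in {config_path}")
```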

PostgreSQL 15

Content: postgresql.conf

# DB Version: 15
# OS Type: linux
# DB Type: mixed
# Total Memory (RAM): 7 GB
# CPUs num: 4
# Connections num: 100
# Data Storage: hdd

max_connections = 100
shared_buffers = 1792MB
effective_cache_size = 5376MB
maintenance_work_mem = 448MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 4
effective_io_concurrency = 2
work_mem = 8822kB
huge_pages = off
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
max_parallel_maintenance_workers = 2
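The header comments and values resemble the output of a PGTune-style calculator (an assumption; the generating tool is not named here). The memory settings can be reproduced from the 7 GB RAM figure with these heuristics: shared_buffers at 25% of RAM, effective_cache_size at 75%, maintenance_work_mem at 1/16, and work_mem as the memory left after shared_buffers, split across three operations per connection or worker process and then halved. The formulas were reconstructed to match the numbers above and may differ from what the generator actually uses.

```python
# Inputs from the postgresql.conf header and settings above.
TOTAL_RAM_MB = 7 * 1024          # "Total Memory (RAM): 7 GB"
MAX_CONNECTIONS = 100
MAX_WORKER_PROCESSES = 4


def pg_memory_settings(ram_mb: int, connections: int, workers: int) -> dict[str, int]:
    """Assumed PGTune-like heuristics that reproduce the values in this config."""
    shared_buffers_mb = ram_mb // 4              # 25% of RAM
    effective_cache_size_mb = ram_mb * 3 // 4    # 75% of RAM
    maintenance_work_mem_mb = ram_mb // 16       # 1/16 of RAM
    # Remaining memory (in kB) over 3 sort/hash operations per connection or
    # worker, halved, truncated to whole kB.
    remaining_kb = (ram_mb - shared_buffers_mb) * 1024
    work_mem_kb = int(remaining_kb / ((connections + workers) * 3) / 2)
    return {
        "shared_buffers_mb": shared_buffers_mb,
        "effective_cache_size_mb": effective_cache_size_mb,
        "maintenance_work_mem_mb": maintenance_work_mem_mb,
        "work_mem_kb": work_mem_kb,
    }


print(pg_memory_settings(TOTAL_RAM_MB, MAX_CONNECTIONS, MAX_WORKER_PROCESSES))
```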

Configuration files

Requisition

A network inventory with 10229 Linux nodes is provisioned. The Syslog messages for a load test are associated with a single node.
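The requisition itself is not shown on this page. As an illustration only, a requisition of this size could be generated programmatically. This Python sketch assumes the OpenNMS model-import XML layout (a `model-import` root with a `foreign-source` attribute, `node` elements with `foreign-id` and `node-label`, and one `interface` per node). The foreign source name, labels, and 10.0.0.0/16 address range are hypothetical, and this is not the provisioning.sh script.

```python
import ipaddress
import xml.etree.ElementTree as ET

NS = "http://xmlns.opennms.org/xsd/config/model-import"


def build_requisition(foreign_source: str, node_count: int,
                      network: str = "10.0.0.0/16") -> str:
    """Build an illustrative requisition with one interface per node."""
    root = ET.Element("model-import", {"xmlns": NS, "foreign-source": foreign_source})
    hosts = ipaddress.ip_network(network).hosts()  # lazily yields usable addresses
    for i in range(node_count):
        node = ET.SubElement(root, "node", {
            "foreign-id": f"node-{i:05d}",   # hypothetical naming scheme
            "node-label": f"linux-{i:05d}",
        })
        # Mark the single interface as the primary SNMP interface.
        ET.SubElement(node, "interface", {"ip-addr": str(next(hosts)), "snmp-primary": "P"})
    return ET.tostring(root, encoding="unicode")
```

Calling `build_requisition("syslog-bench", 10229)` yields a document with the same node count as this experiment's inventory.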

Requisitions
Provisioning via REST using the provisioning.sh script

Syslog load test

Load is increased by sending various Syslog messages in batches to the Minion.
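The load itself is generated with the tool linked above. To show what a constant-rate Syslog sender does at the protocol level, here is a minimal Python sketch that formats BSD-style (RFC 3164) messages and paces them over UDP. The hostname, tag, and message body are placeholders; `host` and `port` would point at the Minion's Syslog listener, whose port is not stated on this page.

```python
import socket
import time
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def rfc3164_message(facility: int, severity: int, hostname: str, tag: str, text: str) -> bytes:
    """Format '<PRI>Mmm dd hh:mm:ss HOSTNAME TAG: MSG' with a space-padded day."""
    pri = facility * 8 + severity
    now = datetime.now()
    timestamp = f"{MONTHS[now.month - 1]} {now.day:2d} {now:%H:%M:%S}"
    return f"<{pri}>{timestamp} {hostname} {tag}: {text}".encode("utf-8")


def send_batch(host: str, port: int, rate: float, duration_s: float,
               hostname: str = "bench-node") -> int:
    """Send syslog messages over UDP at a fixed rate; return the number sent."""
    total = round(rate * duration_s)
    interval = 1.0 / rate
    start = time.monotonic()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(total):
            # Wait until this message's scheduled send time.
            delay = start + seq * interval - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            # facility 1 (user), severity 6 (informational) -> PRI 14
            sock.sendto(rfc3164_message(1, 6, hostname, "loadtest", f"seq={seq}"),
                        (host, port))
    return total
```

Scheduling each send relative to a fixed start time, instead of sleeping a fixed interval after every send, keeps the average rate from drifting as per-message overhead accumulates.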

[!NOTE] The measurements are taken from a 15-second sampling window.
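Assuming the measured values are derived from cumulative counters (as Prometheus counters are), a per-second figure for a 15-second window is the counter increase divided by 15. The minimal Python sketch below applies a simple counter-reset rule, similar in spirit to Prometheus `rate()` but without its extrapolation.

```python
def per_second_rate(first: float, last: float, window_s: float = 15.0) -> float:
    """Average per-second increase of a cumulative counter over a window."""
    if window_s <= 0:
        raise ValueError("window must be positive")
    delta = last - first
    if delta < 0:
        # Counter reset (e.g. a process restart): assume counting restarted at zero.
        delta = last
    return delta / window_s


# e.g. 15,000 messages counted within one 15-second window
print(per_second_rate(1_000, 16_000))
```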

Batch	Rate	Syslog messages per batch	Load duration	Processing duration
01	~100 messages/s	90,000	~15 min to Minion	~15 min
02	~500 messages/s	450,000	~15 min to Minion	~40 min
03	~1,000 messages/s	450,000	~15 min to Minion	~55 min
04	~9,000 messages/s	3,600,000	~15 min to Core	terminated after 45 min
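When processing takes longer than the load phase, messages queue up and the end-to-end throughput falls below the send rate. Assuming the processing duration is measured from the start of each batch, the average drain rate is the message count divided by the processing duration. This Python sketch computes it for batches 01 to 03 from the table; batch 04 was terminated, so it has no processing duration.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    name: str
    total_messages: int
    load_min: float
    processing_min: float

    @property
    def drain_rate(self) -> float:
        # Average messages per second processed end to end.
        return self.total_messages / (self.processing_min * 60)

    @property
    def backlogged(self) -> bool:
        # Processing outlasting the load phase means messages were queued.
        return self.processing_min > self.load_min


# Values copied from the table above (batches 01 to 03).
BATCHES = [
    Batch("01", 90_000, 15, 15),
    Batch("02", 450_000, 15, 40),
    Batch("03", 450_000, 15, 55),
]

for b in BATCHES:
    print(f"batch {b.name}: drained {b.drain_rate:.1f} msg/s, backlog={b.backlogged}")
```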

OpenNMS Internal Metrics

System Metrics

Database

OpenNMS Core

Kafka

OpenNMS Minion

Syslog load generation