Syslog load testing - opennms-forge/opennms-benchmark GitHub Wiki
This experiment covers the use case in which OpenNMS runs as part of a larger monitoring application stack. The Minion acts as the Syslog receiver. Received Syslog messages are persisted in the OpenNMS database and forwarded to the Kafka events topic. OpenNMS Alarmd is not configured to create or manage alarm lifecycles.
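The following minimal Python sketch sends one Syslog message over UDP, the way a device would send it to the Minion. This makes the data path concrete. It assumes RFC 5424 framing and a Minion Syslog listener on UDP port 1514. The host, port, hostname, and app name are placeholders, not values taken from this experiment.

```python
import socket
from datetime import datetime, timezone

# Placeholder target; port 1514 is an assumption, check your Minion's Syslog listener.
MINION_HOST = "127.0.0.1"
MINION_PORT = 1514


def build_rfc5424(message: str, hostname: str = "loadgen", app: str = "bench",
                  facility: int = 1, severity: int = 5) -> bytes:
    """Return a minimal RFC 5424 Syslog line as bytes."""
    pri = facility * 8 + severity  # PRI combines facility (1 = user) and severity (5 = notice)
    timestamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    # "-" is the RFC 5424 NILVALUE for PROCID, MSGID and STRUCTURED-DATA
    line = f"<{pri}>1 {timestamp} {hostname} {app} - - - {message}"
    return line.encode("utf-8")


def send_syslog(payload: bytes, host: str = MINION_HOST, port: int = MINION_PORT) -> None:
    """Send one datagram; UDP gives no delivery confirmation."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

Usage: `send_syslog(build_rfc5424("benchmark test message"))`. UDP is fire-and-forget, so `sendto` succeeds whether or not the Minion receives the datagram. Data loss therefore has to be checked downstream, in the database or the Kafka topic, rather than at the sender.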
🎯 Goals
- Identify how many Syslog messages per second can be processed reliably without data loss
- Use standard settings for Syslogd
- Capture performance characteristics for the main components:
  - Database: CPU, Memory, Load, Network
  - Core: CPU, Memory, Load, Network
  - Kafka: CPU, Memory, Load, Network
  - Minion: CPU, Memory, Load, Network
  - SNMP Simulator: CPU, Memory, Load, Network
🧟 Non-Goals
- Tuning Syslogd variables such as queue size, batch size, and batch interval
- Mixed workloads, because of the large number of possible combinations, e.g. Pollerd + Collectd, Syslog + SNMP Traps, Flows + Traps
- Service outages and network latency
- Alarm lifecycle processing based on Syslog
- Kernel parameters for UDP socket tuning
Experiment 03
Tool used for load testing: https://github.com/indigo423/syslog-load-testing
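The experiment used the tool linked above. The following Python sketch is a rough illustration of what such a load generator does. It does not describe that tool's implementation. It sends Syslog datagrams over UDP at a fixed target rate for a given duration. Each message carries a sequence number so that gaps can be detected on the receiving side. The message format and the function names are hypothetical.

```python
import socket
import time


def format_message(seq: int) -> bytes:
    """Hypothetical RFC 5424 message; "-" is the NILVALUE for timestamp, PROCID, MSGID and SD."""
    return f"<13>1 - loadgen bench - - - seq={seq}".encode("utf-8")


def send_paced(host: str, port: int, rate: float, duration_s: float) -> int:
    """Send round(rate * duration_s) messages, spaced evenly; return the count sent."""
    total = round(rate * duration_s)
    start = time.monotonic()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(total):
            # Message `seq` is due at start + seq / rate. Sleep only when ahead of
            # schedule, so a slow iteration is followed by catch-up sends.
            delay = start + seq / rate - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            sock.sendto(format_message(seq), (host, port))
    return total
```

For example, `send_paced("minion.example", 1514, rate=100, duration_s=15 * 60)` would match the nominal profile of batch 01; the host and port are placeholders. A single Python process may not sustain the higher rates on its own. For those rates, a dedicated tool or several sender processes are likely needed.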
Configuration
cd experiments/c1km1_4c16g_kfk_syslog
ansible-playbook -i ../../ansible-inventory.yml experiment.yml
Goal
- Observe Syslogd load behavior while increasing the workload incrementally
- Identify resource constraints on Core, PostgreSQL, Minion, and Kafka as the workload increases
- Identify the number of Syslog messages per second at which the system saturates
Compute and Storage
This benchmark lab provides the tooling to deploy six virtual machines:
Component | CPU | RAM | Disk | Network | Description |
---|---|---|---|---|---|
OpenNMS Core | 4 | 16GB | 30GB | 10Gbps | OpenNMS Horizon 33.1.8 |
PostgreSQL | 4 | 8GB | 30GB | 10Gbps | PostgreSQL 15 |
Apache Kafka | 4 | 8GB | 30GB | 10Gbps | Kafka 3.9.1 |
OpenNMS Minion | 4 | 8GB | 30GB | 10Gbps | Horizon Minion 33.1.8 |
Net-SNMP Simulator | 4 | 8GB | 30GB | 10Gbps | Net-SNMP Agent |
Monitoring | 4 | 8GB | 30GB | 10Gbps | Prometheus, Jaeger, Grafana |
Component | Azure SKU | CPU Type | IOPS Cap |
---|---|---|---|
OpenNMS Core | Standard_B4s_v2 | Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz | 3570 |
PostgreSQL | Standard_B4ls_v2 | Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz | 3570 |
Apache Kafka | Standard_B4ls_v2 | Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz | 3570 |
OpenNMS Minion | Standard_B4ls_v2 | Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz | 3570 |
Net-SNMP Simulator | Standard_B4ls_v2 | Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz | 3570 |
Monitoring | Standard_B4ls_v2 | Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz | 3570 |
Configuration
Component | Description |
---|---|
Syslogd | Default configuration |
Kafka Producer | Enabled with event forwarding |
Network Latency | Low latency |
OpenNMS Core Services
For this experiment, the following daemons were enabled in service-configuration.xml:
- Manager
- TestLoadLibraries
- Eventd
- Alarmd
- Provisiond
- JettyServer
- KarafStartupMonitor
- Syslogd
Java Virtual Machine Settings
Content: opennms.conf
JAVA_HEAP_SIZE=8192
JAVA_INITIAL_HEAP_SIZE=8192
ADDITIONAL_MANAGER_OPTIONS="-XX:+UseG1GC -javaagent:/opt/prom-jmx-exporter/jmx_prometheus_javaagent.jar=9299:/opt/prom-jmx-exporter/config.yaml"
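The `-javaagent` option above attaches the Prometheus JMX exporter to the Core JVM on port 9299. A sketch like the following could spot-check heap or CPU figures during a run without going through Prometheus: it fetches and parses the exporter's text output. It assumes the exporter serves the Prometheus text format at `/metrics`. The host is a placeholder. The metric names that appear depend on the exporter version and on `config.yaml`, which is not shown here.

```python
import urllib.request

# Placeholder host; port 9299 comes from the -javaagent option above.
CORE_METRICS_URL = "http://127.0.0.1:9299/metrics"


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text-format samples into {series: value}.

    Skips blank lines and comments (# HELP, # TYPE); the series key keeps the
    label block verbatim. Minimal parser: the format is not validated.
    """
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Label values may contain spaces (e.g. pool="G1 Eden Space"), so split after the last '}'
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        samples[series] = float(rest[0])  # an optional trailing timestamp is ignored
    return samples


def fetch_metrics(url: str = CORE_METRICS_URL, timeout: float = 5.0) -> dict[str, float]:
    """Fetch and parse one scrape from the exporter."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

`fetch_metrics()` returns a dictionary that can be filtered by a prefix such as `jvm_`. Confirm any specific metric name against the actual scrape output before relying on it.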
PostgreSQL 15
Content: postgresql.conf
# DB Version: 15
# OS Type: linux
# DB Type: mixed
# Total Memory (RAM): 7 GB
# CPUs num: 4
# Connections num: 100
# Data Storage: hdd
max_connections = 100
shared_buffers = 1792MB
effective_cache_size = 5376MB
maintenance_work_mem = 448MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 4
effective_io_concurrency = 2
work_mem = 8822kB
huge_pages = off
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
max_parallel_maintenance_workers = 2
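The header comments match the format of PGTune's output. The sketch below reproduces the memory-related values above from the header inputs: 7 GB RAM, 100 connections, 4 worker processes, and a mixed workload. The ratios are heuristics assumed here because they yield exactly the listed numbers. They are not taken from this page, and PGTune's own code may differ in detail.

```python
def mixed_memory_settings(ram_mb: int, max_connections: int,
                          max_worker_processes: int) -> dict[str, str]:
    """Derive memory settings for a "mixed" workload (assumed PGTune-style heuristics)."""
    shared_buffers = ram_mb // 4                       # 25 % of RAM
    effective_cache_size = ram_mb * 3 // 4             # 75 % of RAM
    maintenance_work_mem = ram_mb // 16                # RAM / 16
    wal_buffers = min(shared_buffers * 3 // 100, 16)   # 3 % of shared_buffers, capped at 16 MB
    # RAM not used by shared_buffers, spread over 3 sort/hash operations per
    # connection and worker process, then halved for the "mixed" profile (in kB)
    work_mem_kb = ((ram_mb - shared_buffers) * 1024
                   // ((max_connections + max_worker_processes) * 3) // 2)
    return {
        "shared_buffers": f"{shared_buffers}MB",
        "effective_cache_size": f"{effective_cache_size}MB",
        "maintenance_work_mem": f"{maintenance_work_mem}MB",
        "wal_buffers": f"{wal_buffers}MB",
        "work_mem": f"{work_mem_kb}kB",
    }


print(mixed_memory_settings(7 * 1024, 100, 4))
# → shared_buffers 1792MB, effective_cache_size 5376MB, maintenance_work_mem 448MB,
#   wal_buffers 16MB, work_mem 8822kB
```

The values are computed for the 7 GB stated in the header, although the PostgreSQL VM in the compute table has 8GB.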
Configuration files
Requisition
A network inventory with 10229 Linux nodes is provisioned. All Syslog messages in a load test are associated with a single node.
- Requisitions
- Provisioning via REST using the provisioning.sh script
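The contents of provisioning.sh are not shown here. As a hedged illustration, the following Python sketch builds a requisition in the OpenNMS model-import XML format with a configurable number of nodes, each with one primary interface. The foreign source name, node labels, foreign IDs, and the 10.0.0.0/16 address range are hypothetical. A /16 has enough host addresses for 10229 nodes.

```python
import ipaddress
import xml.etree.ElementTree as ET

# Namespace of the OpenNMS requisition (model-import) XML schema
NS = "http://xmlns.opennms.org/xsd/config/model-import"


def build_requisition(name: str, node_count: int, network: str = "10.0.0.0/16") -> bytes:
    """Build a requisition with node_count nodes, one SNMP-primary interface each.

    Labels, foreign IDs and the address range are hypothetical placeholders.
    """
    ET.register_namespace("", NS)  # serialize the model-import namespace as the default
    root = ET.Element(f"{{{NS}}}model-import", {"foreign-source": name})
    hosts = ipaddress.ip_network(network).hosts()  # iterator over usable host addresses
    for i in range(1, node_count + 1):
        node = ET.SubElement(root, f"{{{NS}}}node", {
            "foreign-id": f"node-{i:05d}",
            "node-label": f"linux-{i:05d}",
        })
        ET.SubElement(node, f"{{{NS}}}interface", {
            "ip-addr": str(next(hosts)),
            "snmp-primary": "P",
        })
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

The resulting XML could then be uploaded through the OpenNMS REST API. A POST to `/opennms/rest/requisitions` stores the requisition, and a PUT to `/opennms/rest/requisitions/{name}/import` triggers the import. Check the REST documentation for your Horizon version before relying on these paths.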
Syslog load test
Load is increased by sending batches of various Syslog messages at increasing rates. Batches 01 to 03 were sent to the Minion, batch 04 directly to Core.
> [!NOTE]
> The measurements are taken from a 15-second sampling window.
Batch | Rate | Syslog messages per batch | Load duration | Processing duration |
---|---|---|---|---|
01 | ~100 Syslog messages per second | 90.000 | ~15m to Minion | ~15m |
02 | ~500 Syslog messages per second | 450.000 | ~15m to Minion | ~40m |
03 | ~1000 Syslog messages per second | 450.000 | ~15m to Minion | ~55m |
04 | ~9000 Syslog messages per second | 3.600.000 | ~15m to Core | terminated after ~45m |
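The table can be checked for consistency with two calculations. The nominal number of messages per batch is the rate times the load duration. Assuming the processing duration covers the whole batch, the average processing rate is the listed total divided by the processing duration. The Python sketch below encodes the table rows and computes both.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Batch:
    name: str
    rate_per_s: int                  # nominal send rate, messages per second
    listed_total: int                # total messages listed for the batch
    load_min: float                  # load duration in minutes
    processing_min: Optional[float]  # processing duration in minutes; None if terminated


BATCHES = [
    Batch("01", 100, 90_000, 15, 15),
    Batch("02", 500, 450_000, 15, 40),
    Batch("03", 1000, 450_000, 15, 55),
    Batch("04", 9000, 3_600_000, 15, None),
]


def nominal_total(b: Batch) -> int:
    """Messages expected from rate x load duration."""
    return round(b.rate_per_s * b.load_min * 60)


def avg_processing_rate(b: Batch) -> Optional[float]:
    """Listed total divided by processing duration, in messages per second."""
    if b.processing_min is None:
        return None
    return b.listed_total / (b.processing_min * 60)


for b in BATCHES:
    rate = avg_processing_rate(b)
    shown = "n/a" if rate is None else f"{rate:.1f}/s"
    print(f"batch {b.name}: nominal {nominal_total(b):,} vs listed {b.listed_total:,}, "
          f"processed at {shown}")
```

With these inputs, batches 01 and 02 match rate × 15 minutes (90,000 and 450,000). Batches 03 and 04 would nominally be 900,000 and 8,100,000, so either their rates or their totals are approximate. The average processing rates come out at 100, 187.5, and about 136 messages per second for batches 01 to 03. These are well below the send rates of batches 02 and 03, which is consistent with their processing durations exceeding the 15-minute load window.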