Case 1: Benchmark Broadcasting of events to a file event sink - Nastel/gocypher-cybench-java GitHub Wiki

Benchmarking TNT4J API

We benchmarked TNT4J framework (which is available in https://github.com/Nastel/TNT4J). TNT4J is a logging framework for tracking and tracing applications, activities, transactions, behavior and performance via a Java API. This benchmark shows how later version of TNT4J library introduced a significant performance overhead and how a code benchmark helped us avoid shipping slow code to our end customers.

Benchmark Case

TNT4J is constantly updated by adding new functions and improving existing features. One such capability allows to broadcast events (messages) to various destinations using multiple event sinks. For this benchmark we will take function which broadcasts and stores events (messages) to multiple files on the file system. In the TNT4J version 2.13 event delivery to multiple destinations (or storing to file system) was serial, since TNT4J 2.15 this functionality was updated to support parallel event delivery. We are using CyBench to benchmark both TNT4J versions in order to see if code updates improve performance and not negatively affect JVM GC.

Benchmark Implementation

This benchmark implementation is based on JMH framework. Benchmark launch, results aggregation, storing and comparison is based on CyBench tools. Benchmark code is the same for both version of TNT4J. We created one object of class responsible for storing events (messages) and configure stream events to several files (in our case 5) in the file system and call storing methods in the benchmark method in order to measure writing throughput. So we open the stream to 5 files once during setup phase, continuously call log and write during benchmark execution, close all open resources during tear down phase.

Benchmark settings:

  • 1 JVM (Java 1.8) for benchmark execution
  • 1 Thread, 2 iterations
  • JVM heap settings -Xmx4gb and -Xms4gb
  • TNT4J 2.13 Event implementation stores message in a serial way (single threaded)
  • TNT4J 2.15 Event implementation stores message in a parallel way (multi threaded threaded)

See code snippet below (benchmark implementation):

import com.gocypher.cybench.core.annotation.BenchmarkMetaData;
import com.jkoolcloud.tnt4j.core.OpLevel;
import com.jkoolcloud.tnt4j.sink.EventSink;
import com.gocypher.cybench.core.annotation.BenchmarkTag;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@Fork(1)
@Threads(1)
@Measurement(iterations = 2, time = 5, timeUnit = TimeUnit.SECONDS)
@Warmup(iterations = 1, time = 1, timeUnit = TimeUnit.SECONDS)
public class BroadcastingEventSinkFactoryBenchmarks {

	private Map<String, Object> cfg;
	private String message = "Write text, Write text, Write text, Write text, Write text, Write text, Write text, Write text";
	private EventSink bces;

	@Setup(Level.Trial)
	public void setUp() {
		try {
			cfg = new HashMap<>();
			int sCount = 5;
			for (int i = 0; i < sCount; i++) {
				// Sink to file system
				cfg.put("EventSinkFactory." + i, "com.jkoolcloud.tnt4j.sink.impl.FileEventSinkFactory");
				cfg.put("EventSinkFactory." + i + ".FileName",
						"c:/development/tnt4j-benchmarks-2_13-ws/TNT4J/logs/activity_sink" + (i + 1) + ".log");

			}
			BroadcastingEventSinkFactory bf = new BroadcastingEventSinkFactory();
			bf.setConfiguration(cfg);
			bces = bf.getEventSink("Broadcasting test sink");
			bces.open();
		} catch (Exception e) {
			e.printStackTrace();
			System.err.println("Error on write " + e.getMessage());
		}

	}

	@Benchmark
	@BenchmarkTag(tag = "ccab7585-1836-4c99-a81a-9e074d2f824d")
	public void broadcastEvents(Blackhole bh) {
		try {
			bces.log(OpLevel.WARNING, "Log WARN text");
			bces.write(message);
		} catch (Exception e) {
			e.printStackTrace();
			System.err.println("Error on write " + e.getMessage());
		}
	}

	@TearDown(Level.Trial)
	public void tearDown() {
		try {
			bces.close();
		} catch (Exception e) {
			System.err.println("Error on write " + e.getMessage());
		}
	}
}

Benchmark Results

Now let’s compare benchmark results for both TNT4J versions, see table below:

TNT4J Version Throughput score Units Garbage Collector calls count Garbage Collector time Threads started count
2.13 12,025.25 ops/s 2 2 ms 13
2.15 1,183.30 ops/s 1184 459 ms 138,053

The benchmark results exposes 10x degradation in throughput and significant GC overhead between version 2.15 and 2.13. This performance slowdown seems to be caused by introduction of multiple threads in version 2.15. TNT4J 2.15 version is significantly slower than version 2.13 in throughput and GC.

Detailed benchmark reports (threads, GC, CPU and more) are available here: https://app.cybench.io/cybench/benchmark/view/ccab7585-1836-4c99-a81a-9e074d2f824d?tagType=benchmarkTag

Conclusion

In this case we saw how code benchmarks are able to expose critical performance bottlenecks very early in the development. We used results of this benchmark to revisit TNT4J implementation before shipping slow code and avoid significant impact on user experience.