Stats Collector - learn-tibco-cep/tutorials GitHub Wiki

TIBCO BusinessEvents® provides a few ways to collect application performance stats:

  • Performance Profiler, as described in the Developers Guide. The profiler can be turned on and off from a JMX client (e.g., VisualVM), TIBCO Hawk, or BE functions.
  • In-memory Performance Statistics, as described in the appendix of the Architects Guide. The statistics are collected in JMX MBeans and can therefore be viewed with a JMX client such as VisualVM.
  • OpenTelemetry Data, as described in the Architects Guide. Telemetry can be configured in a CDD file, and the data can be collected and viewed with tools such as Jaeger or Prometheus.

We'll discuss these features in separate tutorials.

The Async Service tutorial includes a utility for aggregating and printing out performance statistics of detailed process steps. This utility is implemented with standard BE catalog functions and is included in the project folder Stats.

Stats Collector Functions

The stats utility can be called by using the following interface functions:

Name             Description
initializeStats  Configure constants, and create an empty stats collection
addStat          Add a value, e.g., elapsed time, to a specified stat name
resetStats       Reset stats with a specified min data count
getStatCount     Return the count of values in a named stat
printStats       Print out all stats in the collection

The function resetStats can be used to collect stats that vary over time. It is optionally called by the function printStats, so that average stat values are recalculated from fresh data after the stats are printed out. Its behavior depends on the argument minCount. If minCount is greater than 0, it resets only the data values whose data count is greater than minCount, and it does not reset the overall event rate. This is useful when, for example, system performance is worse right after a reboot: by resetting the counts periodically, you can print out the time-dependent trend of the collected values. A call to resetStats with minCount = 0, on the other hand, resets all stat values and all data counts. This complete reset is used for the pre-test and idle-reset cases described below.
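The two reset modes can be illustrated with a minimal Java sketch. This is not the tutorial's implementation, which is written with BE catalog functions; the class StatsResetSketch, the helper getResetCount, and all signatures are assumptions made for illustration. The sketch assumes that a partial reset remembers how many values it discarded (which is what allows an overall rate to survive the reset), while a complete reset forgets everything.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class StatsResetSketch {
    /** One named stat; all access is synchronized on the instance. */
    static final class Stat {
        private long resetCount; // values discarded by earlier partial resets
        private long count;      // values collected since the last reset
        private double sum;

        synchronized void add(double value) {
            count++;
            sum += value;
        }

        /** Partial reset: drop the values but remember how many were seen. */
        synchronized void resetIfCountAbove(long minCount) {
            if (count > minCount) {
                resetCount += count;
                count = 0;
                sum = 0.0;
            }
        }

        synchronized long count() { return count; }
        synchronized long resetCount() { return resetCount; }
        synchronized double average() { return count == 0 ? 0.0 : sum / count; }
    }

    private final Map<String, Stat> stats = new ConcurrentHashMap<>();

    void addStat(String name, double value) {
        stats.computeIfAbsent(name, k -> new Stat()).add(value);
    }

    long getStatCount(String name) {
        Stat s = stats.get(name);
        return s == null ? 0 : s.count();
    }

    /** Hypothetical helper: number of values dropped by partial resets. */
    long getResetCount(String name) {
        Stat s = stats.get(name);
        return s == null ? 0 : s.resetCount();
    }

    void resetStats(long minCount) {
        if (minCount <= 0) {
            stats.clear(); // complete reset: values and all counts
        } else {
            stats.values().forEach(s -> s.resetIfCountAbove(minCount));
        }
    }
}
```

With this sketch, resetStats(2) clears a stat that holds 3 values and leaves a stat that holds 1 value untouched, while resetStats(0) discards both.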

Sometimes, performance tests must not count the data collected immediately after a system reboot, because initial performance is slow. The Stats utility can be used to ignore such pre-test data. Refer to the preprocessor function onClientRequest for an example of a complete reset of stats after a specified number of pre-test requests.
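The pre-test idea can be sketched in Java as a small gate that fires exactly once, on the request that completes the warm-up. The class PreTestGate and its method are hypothetical names; how onClientRequest actually counts pre-test requests is not shown here, so treat this as one possible approach.

```java
import java.util.concurrent.atomic.AtomicLong;

final class PreTestGate {
    private final long preTestRequests;           // warm-up size, an assumed setting
    private final AtomicLong seen = new AtomicLong();

    PreTestGate(long preTestRequests) {
        this.preTestRequests = preTestRequests;
    }

    /** Returns true only for the request that completes the warm-up phase. */
    boolean onRequest() {
        return seen.incrementAndGet() == preTestRequests;
    }
}
```

A caller would perform a complete reset (the equivalent of resetStats with minCount = 0) whenever onRequest() returns true, so that only post-warm-up data is reported.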

We often run multiple sets of performance tests consecutively while the system is running. To collect performance stats for each individual set of tests, the Stats utility supports automatic reset after a specified system idle time. For example, this tutorial is configured to reset stats after 1 minute of idle time. In other words, if you run 2 performance tests with more than 1 minute of idle time in between, the stats will be completely reset before the second test. Refer to the preprocessor function onClientRequest for an example of how such automatic idle-reset is implemented.
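Idle detection of this kind can be sketched in Java by remembering the timestamp of the previous request. The class IdleResetDetector is a hypothetical name, and passing the timestamp as a parameter is a choice made here so the logic can be exercised without waiting; the tutorial's own implementation may differ.

```java
import java.util.concurrent.atomic.AtomicLong;

final class IdleResetDetector {
    private final long idleMillis;
    private final AtomicLong lastSeen = new AtomicLong(-1); // -1: no request yet

    IdleResetDetector(long idleMillis) {
        this.idleMillis = idleMillis;
    }

    /**
     * Records a request at nowMillis and returns true when the gap since
     * the previous request exceeds the idle threshold.
     */
    boolean onRequest(long nowMillis) {
        long previous = lastSeen.getAndSet(nowMillis);
        return previous >= 0 && nowMillis - previous > idleMillis;
    }
}
```

With a 60000 ms threshold, a request that arrives 60001 ms after the previous one returns true, and the caller would then perform a complete reset before recording new values.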

Examples

In this tutorial, the Async Service handler uses this utility to collect stats on the elapsed time of client and service requests, as well as the number of service responses per client request.

First, initializeStats is called at engine startup as configured in the Demo.cdd.

Then, at the end of its life, each handler calls the updateStats function to perform the following tasks:

  • Use addStat to collect performance stats of the handler;
  • Use printStats to print out all stats periodically based on the value counts.

In addition, the preprocessors onClientRequest and onServiceResponse collect a few more stats for individual steps, such as acquiring locks or loading data from cache.

The following is a sample output of the stats collector:

[Complete-HandleElapsed] reset: 61029, elapsed: 134556 ms, rate: 457.274/s, count: 500, avg: 73.314, max: 236.000, min: 3.000
[Complete-ExpectedResponses] reset: 61027, elapsed: 134556 ms, rate: 457.259/s, count: 500, avg: 1.988, max: 3.000, min: 1.000
[SendServiceRequest] reset: 61145, elapsed: 134556 ms, rate: 458.159/s, count: 503, avg: 0.266, max: 43.000, min: 0.000
[ClientRequestDelay] reset: 61146, elapsed: 134556 ms, rate: 458.166/s, count: 503, avg: 2.054, max: 36.000, min: 0.000
[ServiceElapsed] reset: 122026, elapsed: 134556 ms, rate: 914.266/s, count: 994, avg: 35.925, max: 144.000, min: 0.000
[ServiceEnd2End] reset: 122026, elapsed: 134556 ms, rate: 914.266/s, count: 994, avg: 54.985, max: 236.000, min: 3.000
[AcquireHandlerLock] reset: 122211, elapsed: 134556 ms, rate: 915.619/s, count: 991, avg: 1.356, max: 84.000, min: 0.000
[RetrieveLockedHandler] reset: 122223, elapsed: 134556 ms, rate: 915.715/s, count: 992, avg: 2.288, max: 54.000, min: 0.000
[ServiceResponseDelay] reset: 122225, elapsed: 134556 ms, rate: 915.715/s, count: 990, avg: 2.890, max: 45.000, min: 0.000

This sample output includes 9 stats. For example, the stats Complete-HandleElapsed and Complete-ExpectedResponses present data from the last 500 of 61029 completed client requests: the handler lifetime was 73.314 ms on average, and each handler received an average of 1.988 service responses. The average request injection rate was 457.274/s.

The stat ServiceElapsed is the elapsed time of the Mock Service for each service response, which is 35.925 ms on average. In other words, the handler waited a total of about 72 ms for 2 responses randomly delayed by the Mock Service, which accounts for most of the handler's lifetime.
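As a quick consistency check, the numbers in the sample output can be reproduced with a few lines of Java. The relation rate = (reset + count) / elapsed is inferred here from the printed values, not taken from the utility's source, so treat it as an assumption; the class SampleOutputCheck and its methods are hypothetical.

```java
import java.util.Locale;

final class SampleOutputCheck {
    /** Events per second, assuming rate = (reset + count) / elapsed. */
    static double rate(long reset, long count, long elapsedMillis) {
        return (reset + count) * 1000.0 / elapsedMillis;
    }

    /** Formats with 3 decimals, like the sample output. */
    static String fmt(double value) {
        return String.format(Locale.ROOT, "%.3f", value);
    }

    public static void main(String[] args) {
        // Complete-HandleElapsed: reset 61029, count 500, elapsed 134556 ms
        System.out.println(fmt(rate(61029, 500, 134556)));   // 457.274
        // ServiceElapsed: reset 122026, count 994, elapsed 134556 ms
        System.out.println(fmt(rate(122026, 994, 134556)));  // 914.266
        // Two Mock Service responses at the average ServiceElapsed
        System.out.println(fmt(2 * 35.925));                 // 71.850
    }
}
```

The last value, about 72 ms, is close to the 73.314 ms average of Complete-HandleElapsed, which supports the reading that most of the handler lifetime is spent waiting for the Mock Service.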

Tips

The following practical tips may be helpful in BE applications.

Use Scorecard to hold constants

The BE Scorecard is an in-memory data structure that can hold a mutable but rarely changing set of data. The stats collector uses the Scorecard Config to hold constant literals used by multiple functions, which avoids the bad coding style of repeating the same literals in multiple places.

Use Concurrent Map to create thread-safe in-memory data store

BE supports concurrent hash maps through a set of standard catalog functions in Collections.Map.Concurrent. Volatile data structures, such as performance stats, must be manipulated with strict thread safety, which is why the stats collector is implemented on top of concurrent hash maps.
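The thread-safety point can be illustrated in plain Java with java.util.concurrent.ConcurrentHashMap, used here as a stand-in for the BE catalog functions. The class and method names are hypothetical. The sketch relies on merge, which ConcurrentHashMap performs atomically per key, so concurrent increments are not lost.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ConcurrentCounterSketch {
    /** Increments one named counter from several threads and returns the total. */
    static long countConcurrently(int threads, int addsPerThread) {
        Map<String, Long> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < addsPerThread; i++) {
                    // Atomic read-modify-write; a separate get() followed by
                    // put() could lose updates under contention.
                    counts.merge("ServiceElapsed", 1L, Long::sum);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts.getOrDefault("ServiceElapsed", 0L);
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(8, 10_000));
    }
}
```

Because every increment goes through merge, the final total equals threads × addsPerThread regardless of how the threads interleave.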

Use stats collector to print out application performance stats

The stats collector in this tutorial is a general-purpose utility that can be used by any BE application without change. It is often more convenient to print out stats collected by this utility when a process involves asynchronous steps, as in the use case of this tutorial, because other approaches to statistics collection usually require additional manual computation.