Stats Collector - learn-tibco-cep/tutorials GitHub Wiki
TIBCO BusinessEvents® provides several ways to collect application performance stats:
- Performance Profiler, as described in the Developers Guide. The profiler can be turned on or off by using a JMX client (e.g., VisualVM), TIBCO Hawk, or BE functions.
- In-memory Performance Statistics, as described in the appendix of the Architects Guide. The statistics are collected in JMX MBeans and thus can be viewed by using a JMX client such as VisualVM.
- OpenTelemetry Data, as described in the Architects Guide. Telemetry can be configured in a CDD file, and the data can be collected and viewed by using tools such as Jaeger or Prometheus.
We'll discuss these features in separate tutorials.
The Async Service tutorial includes a utility for aggregating and printing performance statistics of detailed process steps. This utility is implemented with standard BE catalog functions and is included in the project folder Stats.
Stats Collector Functions
The stats utility can be called by using the following interface functions:
Name | Description |
---|---|
initializeStats | Configure constants, and create empty stats collection |
addStat | Add a value, e.g., elapsed time, to a specified stat name |
resetStats | Reset stats with specified min data count |
getStatCount | Return the count of values in a named stat |
printStats | Print out all stats in the collection |
The function resetStats can be used to collect stats that vary over time. It is optionally called by the function printStats so that average stat values are recalculated after the stats are printed. The behavior of resetStats depends on its argument minCount. If minCount is greater than 0, it resets only the data values of stats whose data count is more than minCount, and it does not reset the overall event rate. This is useful when, for example, system performance is worse right after a reboot: by resetting the counts periodically, you can print out the time-dependent trend of the collected values. On the other hand, a call to resetStats with minCount = 0 resets not only all stat values but also all data counts. This complete reset is used for the pre-test and idle-reset features described below.
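The two reset modes described above can be illustrated with a minimal Java sketch. It is not the tutorial's BE rule-function code: the class StatSketch and all of its fields and methods are hypothetical, and the idea that a partial reset folds the window into a running total (so that an overall rate can still be computed) is an assumption based on the description.

```java
// Hypothetical sketch of one named stat; not the tutorial's BE implementation.
final class StatSketch {
    private long carriedCount = 0; // values folded away by partial resets (assumed to feed the overall rate)
    private long count = 0;        // values in the current window
    private double sum = 0.0;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    synchronized void add(double value) {
        count++;
        sum += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    // minCount > 0: clear the window only if it holds more than minCount values,
    //               and keep the running total.
    // minCount <= 0: clear everything, including the running total.
    synchronized void reset(long minCount) {
        if (minCount <= 0) {
            carriedCount = 0;
            clearWindow();
        } else if (count > minCount) {
            carriedCount += count;
            clearWindow();
        }
    }

    private void clearWindow() {
        count = 0;
        sum = 0.0;
        min = Double.POSITIVE_INFINITY;
        max = Double.NEGATIVE_INFINITY;
    }

    synchronized long count() { return count; }
    synchronized long totalCount() { return carriedCount + count; }
    synchronized double average() { return count == 0 ? 0.0 : sum / count; }

    public static void main(String[] args) {
        StatSketch s = new StatSketch();
        for (int i = 1; i <= 5; i++) s.add(i);
        s.reset(3); // window has 5 > 3 values, so it is cleared
        System.out.println(s.count() + " in window, " + s.totalCount() + " in total");
        s.reset(0); // complete reset
        System.out.println(s.count() + " in window, " + s.totalCount() + " in total");
    }
}
```

In this sketch a periodic call such as reset(3) restarts the averages while the total keeps growing, and reset(0) starts over from scratch.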
Sometimes, performance tests must not count the data collected immediately after a system reboot because initial performance is slow. The Stats utility can be used to ignore such pre-test data. Refer to the preprocessor function onClientRequest for an example of a complete reset of stats after a specified number of pre-test requests.
We often run multiple sets of performance tests consecutively while the system is running. To collect performance stats for each individual set of tests, the Stats utility supports automatic reset after a specified system idle time. For example, this tutorial is configured to reset stats after 1 minute of idle time. In other words, if you run 2 performance tests with more than 1 minute of idle time in between, the stats will be completely reset before the second test. Refer to the preprocessor function onClientRequest for an example of how such automatic idle-reset is implemented.
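The pre-test and idle-reset rules above can be sketched as a small decision helper. This is an illustration in Java, not the logic of onClientRequest itself: the class ResetPolicySketch, its parameters, and the choice to restart the pre-test counter after an idle reset are all assumptions.

```java
// Hypothetical sketch: decides when a complete reset (minCount = 0) is due.
final class ResetPolicySketch {
    private final long preTestRequests; // number of warm-up requests to discard
    private final long idleMillis;      // idle gap that triggers a reset
    private long seen = 0;              // requests seen in the current test run
    private long lastRequestAt = -1;    // timestamp of the previous request, -1 if none

    ResetPolicySketch(long preTestRequests, long idleMillis) {
        this.preTestRequests = preTestRequests;
        this.idleMillis = idleMillis;
    }

    // Call once per incoming request with the current time in milliseconds.
    // Returns true when the caller should completely reset the stats.
    synchronized boolean onRequest(long nowMillis) {
        boolean idle = lastRequestAt >= 0 && nowMillis - lastRequestAt > idleMillis;
        lastRequestAt = nowMillis;
        if (idle) {
            seen = 0; // assumption: a new test run gets its own warm-up phase
        }
        seen++;
        boolean warmUpDone = preTestRequests > 0 && seen == preTestRequests;
        return idle || warmUpDone;
    }

    public static void main(String[] args) {
        ResetPolicySketch policy = new ResetPolicySketch(3, 60_000);
        long[] arrivals = {0, 10, 20, 30, 70_000};
        for (long t : arrivals) {
            System.out.println("t=" + t + " ms, reset=" + policy.onRequest(t));
        }
    }
}
```

With 3 pre-test requests and a 1-minute idle limit, the sketch asks for a reset on the third request and again on the first request that arrives after more than a minute of silence.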
Examples
In this tutorial, the Async Service handler uses this utility to collect stats on the elapsed time of client and service requests, as well as the number of service responses per client request.
First, initializeStats is called at engine startup, as configured in Demo.cdd.
Then, at the end of its life, each handler calls the updateStats function to perform the following tasks:
- Use addStat to collect performance stats of the handler;
- Use printStats to print out all stats periodically based on the value counts.
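The record-then-print-periodically pattern of updateStats can be sketched as follows. This is a Java illustration under stated assumptions, not the tutorial's rule function: the class UpdateStatsSketch, the printEvery parameter, and the simplified output format are hypothetical.

```java
import java.util.Locale;
import java.util.function.Consumer;

// Hypothetical sketch: record one value per finished handler and print
// a summary each time the value count reaches a multiple of printEvery.
final class UpdateStatsSketch {
    private final int printEvery;
    private final Consumer<String> printer;
    private long count = 0;
    private double sum = 0.0;

    UpdateStatsSketch(int printEvery, Consumer<String> printer) {
        this.printEvery = printEvery;
        this.printer = printer;
    }

    synchronized void update(double elapsedMs) {
        count++;
        sum += elapsedMs;
        if (count % printEvery == 0) {
            printer.accept(String.format(Locale.ROOT, "count: %d, avg: %.3f", count, sum / count));
        }
    }

    public static void main(String[] args) {
        UpdateStatsSketch stats = new UpdateStatsSketch(2, System.out::println);
        for (double v : new double[] {1, 2, 3, 4}) {
            stats.update(v);
        }
    }
}
```

Driving the print from the value count, rather than from a timer, means no output is produced while the system is idle.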
Besides, a few more stats for steps in the preprocessor functions, such as acquiring locks or loading data from cache, are collected in the preprocessors onClientRequest and onServiceResponse.
The following is a sample output of the stats collector:
[Complete-HandleElapsed] reset: 61029, elapsed: 134556 ms, rate: 457.274/s, count: 500, avg: 73.314, max: 236.000, min: 3.000
[Complete-ExpectedResponses] reset: 61027, elapsed: 134556 ms, rate: 457.259/s, count: 500, avg: 1.988, max: 3.000, min: 1.000
[SendServiceRequest] reset: 61145, elapsed: 134556 ms, rate: 458.159/s, count: 503, avg: 0.266, max: 43.000, min: 0.000
[ClientRequestDelay] reset: 61146, elapsed: 134556 ms, rate: 458.166/s, count: 503, avg: 2.054, max: 36.000, min: 0.000
[ServiceElapsed] reset: 122026, elapsed: 134556 ms, rate: 914.266/s, count: 994, avg: 35.925, max: 144.000, min: 0.000
[ServiceEnd2End] reset: 122026, elapsed: 134556 ms, rate: 914.266/s, count: 994, avg: 54.985, max: 236.000, min: 3.000
[AcquireHandlerLock] reset: 122211, elapsed: 134556 ms, rate: 915.619/s, count: 991, avg: 1.356, max: 84.000, min: 0.000
[RetrieveLockedHandler] reset: 122223, elapsed: 134556 ms, rate: 915.715/s, count: 992, avg: 2.288, max: 54.000, min: 0.000
[ServiceResponseDelay] reset: 122225, elapsed: 134556 ms, rate: 915.715/s, count: 990, avg: 2.890, max: 45.000, min: 0.000
This sample output includes 9 stats. For example, the stats named Complete-HandleElapsed and Complete-ExpectedResponses present data from the last 500 of 61029 completed client requests: the handler life spanned 73.314 ms on average, and each handler received an average of 1.988 service responses. The average request injection rate was 457.274/s.
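The rate values in the sample can be cross-checked from the other fields on the same line. The numbers are consistent with rate = (reset + count) / elapsed, which would suggest that the reset field counts values already folded away by earlier resets. That reading is an inference from the sample output, not something the utility documents, and the class RateCheckSketch below is hypothetical.

```java
import java.util.Locale;

// Hypothetical helper that recomputes the rate field of a sample line.
final class RateCheckSketch {
    // Assumed formula: events per second over the whole run, counting both
    // the values already reset and the values in the current window.
    static String rate(long resetCount, long count, long elapsedMs) {
        double perSecond = (resetCount + count) * 1000.0 / elapsedMs;
        return String.format(Locale.ROOT, "%.3f/s", perSecond);
    }

    public static void main(String[] args) {
        System.out.println(rate(61029, 500, 134556));  // Complete-HandleElapsed, prints 457.274/s
        System.out.println(rate(122026, 994, 134556)); // ServiceElapsed, prints 914.266/s
    }
}
```

Both results match the rate printed in the corresponding sample lines.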
The stat named ServiceElapsed is the elapsed time of the Mock Service for each service response, which is 35.925 ms on average. In other words, the handler waited a total of about 72 ms for 2 responses randomly delayed by the Mock Service, which is the majority of the handler's lifetime.
Tips
The following practical tips may be helpful in BE applications.
Use Scorecard to hold constants
The BE Scorecard is an in-memory data structure that can contain a mutable but less volatile set of data. The stats collector uses a Scorecard, Config, to hold constant literals used by multiple functions, which avoids the bad coding style of replicating the same literals in multiple places.
Use Concurrent Map to create thread-safe in-memory data store
BE supports concurrent hash maps through a set of standard catalog functions in Collections.Map.Concurrent. Volatile data structures, such as performance stats, must be manipulated with strict thread-safety, which is why the stats collector is implemented on top of concurrent hash maps.
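The thread-safety point can be illustrated with the Java class that BE's concurrent maps resemble, java.util.concurrent.ConcurrentHashMap. The sketch below is not the tutorial's code and does not use the Collections.Map.Concurrent catalog functions. The class ConcurrentStatsSketch is hypothetical, and its addStat and getStatCount methods merely borrow the names of the utility's functions.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: stats keyed by name, updated atomically from many threads.
final class ConcurrentStatsSketch {
    // Immutable summary, so readers never see a half-updated value.
    static final class Summary {
        final long count;
        final double sum;
        final double min;
        final double max;

        Summary(long count, double sum, double min, double max) {
            this.count = count;
            this.sum = sum;
            this.min = min;
            this.max = max;
        }

        static Summary of(double v) { return new Summary(1, v, v, v); }

        Summary plus(Summary o) {
            return new Summary(count + o.count, sum + o.sum,
                    Math.min(min, o.min), Math.max(max, o.max));
        }
    }

    private final ConcurrentHashMap<String, Summary> stats = new ConcurrentHashMap<>();

    // merge() applies the combining function atomically per key.
    void addStat(String name, double value) {
        stats.merge(name, Summary.of(value), Summary::plus);
    }

    long getStatCount(String name) {
        Summary s = stats.get(name);
        return s == null ? 0 : s.count;
    }

    // Runs several writer threads against one stat and returns the final count.
    static long demo(int threads, int perThread) {
        ConcurrentStatsSketch collector = new ConcurrentStatsSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    collector.addStat("ServiceElapsed", i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return collector.getStatCount("ServiceElapsed");
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 1000)); // no updates are lost, so this prints 4000
    }
}
```

With a plain HashMap and a read-modify-write update, concurrent writers could overwrite each other's updates and the final count would be unreliable.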
Use stats collector to print out application performance stats
The stats collector in this tutorial is a general-purpose utility, and it can be used by any BE application without change. It is often more convenient to print out stats collected by this utility when a process involves asynchronous steps, as in the use case of this tutorial, because other approaches to statistics collection usually require additional manual computation.