Event Sampling
KernelStrider can collect and output a large amount of data, and at a high rate (10-100Mb per second or more in some cases). The output subsystem of KernelStrider may not always keep up with that and may lose events as a result, which makes the collected data unusable.
To alleviate this problem, KernelStrider supports event sampling, similar to what ThreadSanitizer uses.
The idea is that many of the events to be processed come from fragments of the analyzed kernel modules that are executed again and again. Sampling means skipping some of these repetitive events, in the hope that this will not result in too many missed races.
Note that only memory access events are affected by sampling. Synchronization events, function call events, etc., are always output.
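The idea can be illustrated with a small Python sketch. This is not KernelStrider's actual algorithm, which is not described on this page. The sketch assumes a hypothetical policy: each code location has a hit counter, the first few hits are always recorded (fewer at higher sampling rates), and after that only hits whose count is a power of two are kept. The event kinds "mem" and "sync" and the location string are made-up labels.

```python
from collections import defaultdict


class EventSampler:
    """Toy event sampler; NOT KernelStrider's actual algorithm."""

    def __init__(self, sampling_rate=0):
        if not 0 <= sampling_rate <= 31:
            raise ValueError("sampling rate must be in 0..31")
        self.sampling_rate = sampling_rate
        # Assumed policy: the first `warmup` hits of each code location
        # are always recorded; a higher rate means a shorter warm-up.
        self.warmup = 1 << (31 - sampling_rate)
        self.hits = defaultdict(int)  # per-location execution counters

    def should_record(self, kind, location):
        # Only memory access events are subject to sampling;
        # everything else (e.g. synchronization) is always recorded.
        if kind != "mem" or self.sampling_rate == 0:
            return True
        self.hits[location] += 1
        n = self.hits[location]
        if n <= self.warmup:
            return True
        # After warm-up, keep only hits whose count is a power of two,
        # so a hot location is recorded less and less often.
        return n & (n - 1) == 0


sampler = EventSampler(sampling_rate=29)  # warm-up of 4 hits
kept = sum(sampler.should_record("mem", "my_driver:0x10") for _ in range(16))
print(f"recorded {kept} of 16 memory events from one hot location")
```

With such a policy, rarely executed code is recorded in full, while a hot loop contributes progressively fewer events.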
You can set the sampling rate when starting KernelStrider by passing the --sampling_rate parameter to kedr.py, for example:
kedr.py start --tools=KernelStrider --targets=my_driver --sampling_rate=20
The sampling rate is an integer from 0 to 31. 0 (the default) means that sampling is disabled and all events are recorded and output, 1 is the minimal sampling, and 31 is the most aggressive sampling.
Note that if you load the kernel-mode components of KernelStrider manually rather than with kedr.py, you can set the sampling rate with the sampling_rate parameter of the kedr_mem_core.ko module.
Experimental results
Kernel module under analysis: e1000
OS: ROSA Fresh R2, 32-bit
Use case: http://speedof.me network speed test, run once for each sampling rate listed below.
Sampling rate | Trace size, Mb | Events passed to TSan | Races found |
---|---|---|---|
0 (disabled) | 81.7 | 19444327 | 39 |
1 | 87.7 | 20711761 | 39 |
5 | 73.6 | 17291371 | 42 |
10 | 162.5 | 38902371 | 39 |
15 | 65.1 | 15317664 | 44 |
20 | 39.6 | 8022815 | 46 |
25 | 11.8 | 3608802 | 27 |
30 | 14.4 | 6184400 | 25 |
31 | 7.5 | 3214116 | 30 |
The load on the driver apparently differed to some extent from run to run. This was probably the cause of anomalies such as the larger trace at sampling rate 10.
The trend is still visible, though. The reduction of the trace size (compared to the size with no sampling) becomes significant starting from sampling rate 15. On the other hand, races start to be missed at around sampling rate 25.
So, a sampling rate of 15-20 looks reasonable here: the trace is 20-50% smaller and ThreadSanitizer has 20-60% fewer events to process, without too many lost races.
Perhaps the reasonable sampling rates are indeed somewhere between 15 and 25, as the developers of ThreadSanitizer suggest.
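The percentages quoted above can be rechecked from the table. The short Python sketch below copies the measured values and computes, for each sampling rate, how much smaller the trace and the event count are relative to the run with sampling disabled. The names RESULTS, reduction_percent and summary are introduced only for this illustration.

```python
# (sampling rate, trace size, events passed to TSan, races found),
# copied from the table of experimental results.
RESULTS = [
    (0, 81.7, 19444327, 39),
    (1, 87.7, 20711761, 39),
    (5, 73.6, 17291371, 42),
    (10, 162.5, 38902371, 39),
    (15, 65.1, 15317664, 44),
    (20, 39.6, 8022815, 46),
    (25, 11.8, 3608802, 27),
    (30, 14.4, 6184400, 25),
    (31, 7.5, 3214116, 30),
]


def reduction_percent(baseline, value):
    """Reduction relative to the baseline, in percent (negative = growth)."""
    return 100.0 * (baseline - value) / baseline


_, base_size, base_events, _ = RESULTS[0]  # run with sampling disabled

# sampling rate -> (trace size reduction, event count reduction, races found)
summary = {
    rate: (
        reduction_percent(base_size, size),
        reduction_percent(base_events, events),
        races,
    )
    for rate, size, events, races in RESULTS[1:]
}

for rate, (size_red, events_red, races) in summary.items():
    print(f"rate {rate:2d}: trace {size_red:6.1f}% smaller, "
          f"events {events_red:6.1f}% fewer, races found: {races}")
```

Negative values, such as those for sampling rate 10, mean that the trace grew compared to the run without sampling, which is consistent with the load having been different in that run.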