Hit Rate

Below we compare the hit rates of LRU, MemoryCache, ConcurrentLru (TU-Q) and ConcurrentLfu (W-TinyLfu) using several well-known traces from the literature. The program used to conduct the tests is here.

This replicates the test suite used in Java's popular Caffeine library.
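
The test replays each trace against a cache of fixed capacity, counting a hit whenever a requested key is already resident. Below is a minimal sketch of that replay loop, illustrative only; the linked test program is authoritative, and the inline key array here stands in for a real trace file.

```csharp
using System;
using BitFaster.Caching.Lru;

class HitRateDemo
{
    static void Main()
    {
        const int capacity = 500;
        var lru = new ConcurrentLru<long, long>(capacity);

        // Stand-in for a real trace file: a short, fixed key sequence.
        long[] trace = { 1, 2, 3, 1, 1, 4, 2, 5 };

        long hits = 0, total = 0;

        foreach (long key in trace)
        {
            total++;

            if (lru.TryGet(key, out _))
            {
                hits++;
            }
            else
            {
                // Miss: insert the key so subsequent requests can hit.
                lru.GetOrAdd(key, k => k);
            }
        }

        Console.WriteLine($"Hit rate: {100.0 * hits / total:f2}%");
    }
}
```

The same loop works for ConcurrentLfu by swapping the cache construction; hit rate is simply hits divided by total requests.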

Wikibench

The result of replaying 45 million web requests from the September 2007 Wikibench dataset is shown below [4]. The trace files used are 1190153705, 1190157306, 1190160907, 1190164508, 1190168109.

[image: hit rate by cache size for the Wikibench trace]

Glimpse

Glimpse is a trace of the Glimpse text information retrieval utility; the total size of the text files used as input is roughly 50MB [1,3]. The trace is mirrored here: gli.trace.gz.

The results from the Glimpse trace are shown below. Glimpse uses a looping access pattern.

[image: hit rate by cache size for the Glimpse trace]

ARC Database

The ARC database trace is available from Dharmendra S. Modha's IBM page and is mirrored here: DS1.lis.gz.
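
The .lis files are plain text. Assuming the format used by Caffeine's ARC trace reader (each line begins with a starting block number followed by a count of sequentially accessed blocks), a reader might look like the sketch below; the class name and parsing details are illustrative, not part of BitFaster.Caching.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ArcTraceReader
{
    // Assumed .lis line format: whitespace-separated fields where field 0
    // is the starting block number and field 1 is the number of blocks
    // accessed sequentially. Any remaining fields are ignored here.
    public static IEnumerable<long> ReadKeys(string path)
    {
        foreach (string line in File.ReadLines(path))
        {
            string[] parts = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);

            if (parts.Length < 2)
            {
                continue; // skip malformed or empty lines
            }

            long start = long.Parse(parts[0]);
            int count = int.Parse(parts[1]);

            // One cache request per block in the range.
            for (int i = 0; i < count; i++)
            {
                yield return start + i;
            }
        }
    }
}
```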

Results for the DS1 trace are shown below. The DS1 trace is described as coming from a database server at a commercial site running an ERP application on top of a commercial database [2].

[image: hit rate by cache size for the ARC DS1 trace]

ARC Search (S3)

The ARC search traces are available from Dharmendra S. Modha's IBM page and are mirrored here: S1.lis.gz, S2.lis.gz, S3.lis.gz.

Results for the S3 trace are shown below. The search traces are described as disk read accesses initiated by a large commercial search engine in response to various web search requests [2].

[image: hit rate by cache size for the ARC S3 trace]

ARC OLTP

The ARC OLTP trace is available from Dharmendra S. Modha's IBM page and is mirrored here: OLTP.lis.gz.

Results for the OLTP trace are shown below. The OLTP trace is described as a one-hour page reference trace to a CODASYL database [2,5].

[image: hit rate by cache size for the ARC OLTP trace]

Scan resistance

A scan is a sequence of one-time access requests. Scans can cause cache pollution, evicting popular cache entries by replacing them with less popular items and thereby reducing hit rate. A cache is scan-resistant when it allows one-time sequential requests to pass through without polluting the cache.

The charts below are based on a sequential scan through all keys, interleaved with keys drawn from a Zipfian distribution with parameter s = 0.5 and s = 0.86 respectively. If there are N items, the probability of accessing an item numbered i or less is (i / N)^s; for example, with s = 0.5 the lowest-numbered 10% of keys receive (0.1)^0.5 ≈ 32% of all accesses. This replicates a similar test found in [5].

Here N = 50,000, and we take 1 million sample keys. The test was repeated with the cache configured to different sizes, expressed as a percentage of N (e.g. 10% is a cache with a capacity of 5,000). A sketch of the workload generator follows.
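
The distribution above can be sampled by inverting its CDF: if u is uniform on (0, 1), then key = ⌈N · u^(1/s)⌉ satisfies P(key ≤ i) = (i/N)^s. The sketch below uses that inversion; the 1:1 alternation between scan keys and Zipfian keys is an illustrative assumption, as are the class and method names, and the linked test program defines the exact interleaving.

```csharp
using System;
using System.Collections.Generic;

static class ScanResistanceWorkload
{
    // Inverse transform sampling for P(key <= i) = (i / N)^s:
    // u ~ Uniform(0,1)  =>  key = ceil(N * u^(1/s)).
    static long SampleZipfLike(Random rng, long n, double s)
    {
        double u = rng.NextDouble();
        return Math.Max(1L, (long)Math.Ceiling(n * Math.Pow(u, 1.0 / s)));
    }

    // Interleave a sequential scan over all n keys with skewed samples
    // (assumed 1:1 here). For the charts that follow, n = 50,000 and
    // sampleCount = 1,000,000 with s = 0.5 or s = 0.86.
    public static IEnumerable<long> Generate(long n, long sampleCount, double s, int seed = 42)
    {
        var rng = new Random(seed);
        long scanKey = 0;

        for (long i = 0; i < sampleCount; i++)
        {
            if (i % 2 == 0)
            {
                scanKey = scanKey % n + 1; // sequential scan, wraps after a full pass
                yield return scanKey;
            }
            else
            {
                yield return SampleZipfLike(rng, n, s);
            }
        }
    }
}
```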

[image: hit rate by cache size, scan interleaved with Zipfian keys, s = 0.5]

[image: hit rate by cache size, scan interleaved with Zipfian keys, s = 0.86]

References

[1] The LRU-K Page Replacement Algorithm for Database Disk Buffering.
[2] ARC: A Self-Tuning, Low Overhead Replacement Cache.
[3] LIRS: An Efficient Low Inter-reference Recency Set Replacement Policy to Improve Buffer Cache Performance.
[4] TinyLFU: A Highly Efficient Cache Admission Policy.
[5] 2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm.
