RocksDB Configuration
Introduction
This page is intended to help users configure RocksDB in Pravega.
In Pravega, RocksDB is used as the cache implementation in the Segment Store. The Segment Store cache keeps events that are being written immediately available to serve tail reads. As one can easily infer, allowing RocksDB to use more memory may provide higher operation performance, but at the cost of consuming more memory resources on Segment Store nodes. For this reason, depending on your platform and workload, it may be necessary to configure RocksDB appropriately to fit your needs.
We have tried to keep the number of RocksDB parameters exposed in the Pravega configuration small. In any case, we recommend getting familiar with the basic concepts of RocksDB memory management in order to make informed decisions about which parameters to change.
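For illustration, a minimal sketch of how these settings might look in the Segment Store configuration is shown below. The `rocksdb.` property prefix and the `config.properties` file name are assumptions here (they are not taken from this page), so please check the configuration reference of your Pravega version for the exact keys:

```properties
# Hypothetical Segment Store config.properties snippet (property prefix assumed).
# Size of the RocksDB write buffer (memtable), in MB.
rocksdb.writeBufferSizeMB=64
# Size of the RocksDB read (block) cache, in MB.
rocksdb.readCacheSizeMB=8
# Size of each block in the read cache, in KB.
rocksdb.cacheBlockSizeKB=32
```

The values above correspond to the default configuration used in the experiments below.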
Empirical Results
In what follows, we present a battery of empirical results on the observed memory usage of the Segment Store process (which runs RocksDB). The idea is to extend this section with experiments using different platforms and workloads.
Jarvis Platform
Platform: This battery of experiments was executed on a virtualized platform. The cluster contains 4 nodes (CentOS 7 VMs) that run the different Pravega services in containers (Mesos): 1 master node and 3 slaves (running 1 Pravega Controller instance, 1 Segment Store server and other services, such as Bookkeeper and Zookeeper). In particular, we set a memory limit of 6GB on the container running the Segment Store, which is enforced by cgroups (an Out Of Memory error occurs if the Segment Store exceeds the limit). Note that having a single Segment Store server makes it easier to induce stress and analyze its memory consumption.
Workload: We executed a continuous, moderate read/write workload against Pravega (`mediumScale`). In total, there were 8 readers and 4 writers exhibiting a total throughput of 6MBps (3MBps write / 3MBps read). The workload was longevity-oriented, so all experiment results were obtained after at least 48 hours of execution.
Results: We first provide results for the default configuration. The remaining experiments change only the specified parameters with respect to the default configuration.
- `writeBufferSizeMB=64`, `readCacheSizeMB=8`, `cacheBlockSizeKB=32` (default configuration): The Segment Store process reaches memory stability at 2GB (approx.)
- `writeBufferSizeMB=256`: The Segment Store process reaches memory stability at 3.5GB (approx.)
- `readCacheSizeMB=64`: The Segment Store process reaches memory stability at 3.2GB (approx.)
- `readCacheSizeMB=256`: Container out of memory error after 3 hours.
- `readCacheSizeMB=128`, `writeBufferSizeMB=128`: The Segment Store process reaches memory stability at 4.9GB (approx.)
- `cacheBlockSizeKB=4`: The Segment Store process reaches memory stability at 2GB (approx.)
- `cacheBlockSizeKB=16`: The Segment Store process reaches memory stability at 2GB (approx.)
Observations: On this platform, it seems that increasing `writeBufferSizeMB` and (especially) `readCacheSizeMB` results in a higher memory footprint than expected. Moreover, given the number of items stored in the cache, `cacheBlockSizeKB` seems to play a minor role in the memory footprint.
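To put the observation above in numbers: moving from `writeBufferSizeMB=64` to `writeBufferSizeMB=256` adds 192MB of configured write buffer, yet the observed footprint grows from about 2GB to about 3.5GB (roughly 1.5GB, i.e., around 8x the configured increase). Similarly, moving from `readCacheSizeMB=8` to `readCacheSizeMB=64` adds 56MB of configured read cache, yet the footprint grows by about 1.2GB (more than 20x the configured increase). In other words, the configured values should not be read as hard upper bounds on the memory that the Segment Store process will actually use.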