WiredTiger Performance Benchmarks

Introduction

This page reports results from running a scaled-down version of the RocksDB benchmarks described here: https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks

The benchmark is based on the original LevelDB benchmark, but the operation counts are extended significantly.

The runs here use 50 million operations (200 million for the larger runs), instead of the billion used in the original RocksDB version (I am not patient, and my hardware is more constrained). I'm also using snappy compression instead of zlib.

Running on an AWS c3.8xlarge instance:

  • 32 Xeon E5-2680 CPU cores
  • 60GB RAM
  • 2 x 320GB SSD drives

General settings:

The benchmarks were run with:

  • 50 and 200 million operations (unless otherwise stated)
  • 1 GB cache size (RocksDB has several other caches as well)
  • 800-byte values (quite large)
  • 16-byte keys
  • WiredTiger develop branch at Git commit: 6881a66651755ed0a46560fee9e49fd886d82edb
  • RocksDB master branch at Git commit: 930cb0b9ee12c18eb461ef78748ed5b9bcf80d98
  • WiredTiger RocksDB fork on wiredtiger branch at Git commit: 5a064e377f983e9b3335b4401fc848b0b23730b1 (a checkout sketch for these revisions follows this list)
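
For anyone reproducing the setup, this is a minimal sketch of fetching the exact revisions listed above. The repository URL for the WiredTiger fork of RocksDB is an assumption based on this wiki's location, and the usual db_bench build steps for each project are not shown.

#!/bin/bash
# Hedged sketch: check out the revisions used for these runs.
git clone https://github.com/wiredtiger/wiredtiger.git
(cd wiredtiger && git checkout 6881a66651755ed0a46560fee9e49fd886d82edb)

git clone https://github.com/facebook/rocksdb.git
(cd rocksdb && git checkout 930cb0b9ee12c18eb461ef78748ed5b9bcf80d98)

# WiredTiger's fork of RocksDB (repository URL assumed from the wiki location)
git clone https://github.com/wiredtiger/rocksdb.git wiredtiger-rocksdb
(cd wiredtiger-rocksdb && git checkout 5a064e377f983e9b3335b4401fc848b0b23730b1)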

Fill Sequential

50 million operations

Measurement            WiredTiger    RocksDB
Ops/sec                    386842     228285
micros/op                    2.73       4.38
Throughput (MB/s)             285        177
99% Latency (us)             3.99       6.82
99.99% Latency (us)         38.67      30.98
Max Latency (us)           162095     213925
DB size (GB)                   29         40

200 million operations

Measurement            WiredTiger    RocksDB
Ops/sec                    349930     226305
micros/op                    2.86       4.41
Throughput (MB/s)             272        176
99% Latency (us)             4.35       6.75
99.99% Latency (us)          39.4         34
Max Latency (us)           392138     214074
DB size (GB)                  115        159

Notes:

  • This test artificially loads all the items into the RocksDB database at the final level, thus skipping all of the "normal" LSM merge overhead. WiredTiger is loading into the LSM tree and doing background merges as it goes.
  • The test is also single threaded. WiredTiger should be able to increase throughput with multiple threads, provided there is spare I/O capacity; a sketch of such a run follows this list.
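
To illustrate the multi-threaded point above, the fillseq run could be repeated with more loader threads. This is a minimal sketch using the same flags as the WiredTiger Fill Sequential script in the Scripts section; the thread count of 4 is an arbitrary assumption, not a configuration that was benchmarked here.

#!/bin/bash
# Hedged sketch: same fillseq flags as the script below, but with t=4 threads.
r=50000000; t=4; vs=800; bs=4096; cs=1000000000; si=1000000
./db_bench_wiredtiger --benchmarks=fillseq --mmap_read=0 --statistics=1 \
    --histogram=1 --num=$r --threads=$t --value_size=$vs --block_size=$bs \
    --cache_size=$cs --bloom_bits=10 --db=results --sync=0 --disable_wal=1 \
    --compression_type=snappy --stats_interval=$si --stats_per_interval=1 \
    --use_existing_db=0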

Fill Random

50 million operations

Measurement            WiredTiger    RocksDB
Ops/sec                    296399     229644
micros/op                    3.37      4.355
Throughput (MB/s)             230        178
99% Latency (us)             10.6       6.77
99.99% Latency (us)            88      19.74
Max Latency (us)            34444     213805
DB size (GB)                   18         25

200 million operations

Measurement            WiredTiger    RocksDB
Ops/sec                    295439     227928
micros/op                    3.38       4.38
Throughput (MB/s)             229        177
99% Latency (us)             10.6        6.5
99.99% Latency (us)            96         22
Max Latency (us)           204659     214027
DB size (GB)                  115        100

Notes:

  • The database size reported for fill random is measured after allowing a compact operation to finish.
  • The database size when populated with fill random is smaller than with fill sequential, because not all keys in the range are inserted (and some keys are inserted multiple times). For this reason, benchmarks that follow a populate phase should be based on a database generated via fill sequential, followed by an overwrite (the overwrite avoids any tricks re: loading sequentially); see the sketch after this list.
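
A minimal sketch of that populate recipe, assuming the WiredTiger db_bench binary and the same flags used by the scripts later on this page:

#!/bin/bash
# Hedged sketch: load sequentially, then randomize placement with an
# overwrite pass before running any follow-on benchmarks.
r=50000000; vs=800; bs=4096; cs=1000000000; si=1000000
./db_bench_wiredtiger --benchmarks=fillseq --mmap_read=0 --num=$r \
    --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 \
    --db=results --sync=0 --disable_wal=1 --compression_type=snappy \
    --stats_interval=$si --use_existing_db=0
./db_bench_wiredtiger --benchmarks=overwrite --mmap_read=0 --num=$r \
    --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 \
    --db=results --sync=0 --disable_wal=1 --compression_type=snappy \
    --stats_interval=$si --use_existing_db=1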

Overwrite

50 million operations

Measurement            WiredTiger    RocksDB
Ops/sec                    301910      41720
micros/op                    3.31      23.97
Throughput (MB/s)             234         32
99% Latency (us)             10.6       1047
99.99% Latency (us)            88       1199
Max Latency (us)            32596    3208136
DB size (GB)                   47         51

200 million operations

Measurement            WiredTiger    RocksDB
Ops/sec                    292151      17954
micros/op                    3.42       55.7
Throughput (MB/s)             227         14
99% Latency (us)             10.7       1142
99.99% Latency (us)           108       1199
Max Latency (us)          1294245    6528779
DB size (GB)                  221        180

Notes:

  • This is a more realistic test than the fill random and fill sequential tests: it equates to loading into an LSM tree that already contains some data.

Read Random

50 million items in the database; 32 threads, 500,000 operations per thread

Measurement            WiredTiger    RocksDB
Ops/sec                    151604     450157
micros/op                    6.59       2.22
99% Latency (us)             1302        190
99.99% Latency (us)          2852       1962
Max Latency (us)            19560      19970
DB size (GB)                   44         40

200 million items in the database; 32 threads, 500,000 operations per thread

Measurement            WiredTiger    RocksDB
Ops/sec                     69864      21365
micros/op                   14.31       46.8
99% Latency (us)             3023       3241
99.99% Latency (us)         18404      19930
Max Latency (us)          4960972     251701
DB size (GB)                  196        162

Notes:

  • In the smaller test WiredTiger is slower than RocksDB because it is operating on a database with multiple levels, so reads often require more than one lookup to satisfy. The performance improves over time as the data set is read into the file system cache, so running the test for longer improves the WiredTiger numbers.
  • The RocksDB version of this test is based on an LSM tree generated with a sequential load. Thus all data has been loaded into a single level of the LSM tree, and each search should only ever need a single lookup (plus a bloom filter lookup). WiredTiger is searching in a tree that has been allowed to settle, but still contains several levels (chunks).
  • WiredTiger matches the RocksDB configuration of disabling mmap support. If mmap support is not disabled, WiredTiger is able to make much more effective use of the OS filesystem cache, especially when the working set is smaller than available memory (see the sketch after this list).
  • The readrandom test produces larger databases because compression is disabled.
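
To illustrate the mmap point above, this is a minimal sketch of the same readrandom step with memory-mapped reads enabled. It mirrors the Read Random script below with --mmap_read=1; it is not one of the configurations reported in the tables above.

#!/bin/bash
# Hedged sketch: the readrandom step from the script below, with mmap reads on.
r=50000000; reads=500000; t=32; vs=800; bs=4096; cs=1000000000; si=100000
./db_bench_wiredtiger --benchmarks=readrandom --mmap_read=1 --statistics=1 \
    --histogram=1 --num=$r --reads=$reads --threads=$t --value_size=$vs \
    --block_size=$bs --cache_size=$cs --bloom_bits=10 --db=results --sync=0 \
    --disable_wal=1 --compression_type=none --stats_interval=$si \
    --stats_per_interval=1 --use_existing_db=1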

Scripts

The scripts used to generate the results follow.

WiredTiger

Fill Sequential:

#!/bin/bash

# RocksDB uses block_size 65536, WiredTiger 4096
# RocksDB cache size 10485760, WiredTiger 1GB (RocksDB has other caches)
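# Variable-to-flag mapping (names copied from the RocksDB scripts below):
#   r = --num, t = --threads, vs = --value_size, bs = --block_size,
#   cs = --cache_size, si = --stats_interval, dds = --disable_data_sync,
#   mb = --target_file_size_base, sync = --sync.
# The remaining variables (bpl, overlap, mcz, del, levels, ctrig, delay,
# stop, wbn, mbc, wbs, of) are carried over from the RocksDB scripts but
# are not passed to db_bench_wiredtiger in this command.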

RESULT_DIR=results
echo "Load 1B keys sequentially into database....."
bpl=10485760;overlap=10;mcz=2;del=300000000;levels=6;ctrig=4; delay=8; stop=12; wbn=3; mbc=20; mb=67108864;wbs=134217728; dds=0; sync=0; r=50000000; t=1; vs=800; bs=4096; cs=1000000000; of=500000; si=1000000; ./db_bench_wiredtiger --benchmarks=fillseq --mmap_read=0 --statistics=1 --histogram=1 --num=$r --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --verify_checksum=1 --db=$RESULT_DIR --sync=$sync --disable_wal=1 --compression_type=snappy --stats_interval=$si --disable_data_sync=$dds --target_file_size_base=$mb --stats_per_interval=1 --use_existing_db=0

du -s -k $RESULT_DIR

Fill Random:

#!/bin/bash

# NOTES: RocksDB disables compactions during fillrandom.
# RocksDB uses target_file_size_base 1073741824, WiredTiger 67108864 
# RocksDB uses block_size 65536, WiredTiger 4096
# RocksDB cache size 10485760, WiredTiger 1GB (RocksDB has other caches)

RESULT_DIR=results
echo "Bulk load database ...."
bpl=10485760;overlap=10;mcz=2;del=300000000;levels=2;ctrig=10000000; delay=10000000; stop=10000000; wbn=30; mbc=20; mb=67108864;wbs=268435456; dds=1; sync=0; r=50000000; t=1; vs=800; bs=4096; cs=1000000000; of=500000; si=1000000; ./db_bench_wiredtiger --benchmarks=fillrandom --mmap_read=0 --statistics=1 --histogram=1 --num=$r --threads=$t --value_size=$vs --cache_size=$cs --bloom_bits=10 --verify_checksum=1 --db=$RESULT_DIR --sync=$sync --disable_wal=1 --compression_type=snappy --stats_interval=$si --disable_data_sync=$dds --target_file_size_base=$mb --stats_per_interval=1 --use_existing_db=0
echo "Running manual compaction...."
bpl=10485760;overlap=10;mcz=2;del=300000000;levels=2;ctrig=10000000; delay=10000000; stop=10000000; wbn=30; mbc=20; mb=67108864;wbs=268435456; dds=1; sync=0; r=50000000; t=1; vs=800; bs=4096; cs=1000000000; of=500000; si=1000000; ./db_bench_wiredtiger --benchmarks=compact --mmap_read=0 --statistics=1 --histogram=1 --num=$r --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --verify_checksum=1 --db=$RESULT_DIR --sync=$sync --disable_wal=1 --compression_type=snappy --stats_interval=$si --disable_data_sync=$dds --target_file_size_base=$mb --stats_per_interval=1 --use_existing_db=1
du -s -k $RESULT_DIR

Overwrite:

#!/bin/bash

# RocksDB uses block_size 65536, WiredTiger 8192
# RocksDB cache size 10485760, WiredTiger 1GB (RocksDB has other caches)

RESULT_DIR=results
echo "Overwriting the 1B keys in database in random order...."
bpl=10485760;overlap=10;mcz=2;del=300000000;levels=6;ctrig=4; delay=8; stop=12; wbn=3; mbc=20; mb=67108864;wbs=134217728; dds=0; sync=0; r=50000000; t=1; vs=800; bs=4096; cs=1000000000; of=500000; si=1000000; ./db_bench_wiredtiger --benchmarks=overwrite --mmap_read=0 --statistics=1 --histogram=1 --num=$r --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --verify_checksum=1 --db=$RESULT_DIR --sync=$sync --disable_wal=1 --compression_type=snappy --stats_interval=$si --disable_data_sync=$dds --target_file_size_base=$mb --stats_per_interval=1 --use_existing_db=1

du -s -k $RESULT_DIR

Read Random:

#!/bin/bash

# RocksDB uses block_size 65536, WiredTiger 4096
# RocksDB cache size 10485760, WiredTiger 1GB (RocksDB has other caches)

RESULT_DIR=results
echo "Load 1B keys sequentially into database....."
bpl=10485760;overlap=10;mcz=2;del=300000000;levels=6;ctrig=4; delay=8; stop=12; wbn=3; mbc=20; mb=67108864;wbs=134217728; dds=1; sync=0; r=50000000; t=1; vs=800; bs=4096; cs=1000000000; of=500000; si=1000000; ./db_bench_wiredtiger --benchmarks=fillseq --mmap_read=0 --statistics=1 --histogram=1 --num=$r --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --verify_checksum=1 --db=$RESULT_DIR --sync=$sync --disable_wal=1 --compression_type=none --stats_interval=$si --disable_data_sync=$dds --target_file_size_base=$mb --stats_per_interval=1 --use_existing_db=0
echo "Allowing populated database to settle."
bpl=10485760;overlap=10;mcz=2;del=300000000;levels=6;ctrig=4; delay=8; stop=12; wbn=3; mbc=20; mb=67108864;wbs=134217728; dds=0; sync=0; r=1000000; t=1; vs=800; bs=4096; cs=1000000000; of=500000; si=1000000; ./db_bench_wiredtiger --benchmarks=compact --mmap_read=0 --statistics=1 --histogram=1 --num=$r --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --db=$RESULT_DIR --sync=$sync --disable_wal=1 --compression_type=none --stats_interval=$si --disable_data_sync=$dds --target_file_size_base=$mb --stats_per_interval=1 --use_existing_db=1
echo "Reading 1B keys in database in random order...."
bpl=10485760;overlap=10;mcz=2;del=300000000;levels=6;ctrig=4; delay=8; stop=12; wbn=3; mbc=20; mb=67108864;wbs=134217728; dds=0; sync=0; r=50000000; reads=500000; t=32; vs=800; bs=4096; cs=1000000000; of=500000; si=100000; ./db_bench_wiredtiger --benchmarks=readrandom --mmap_read=0 --statistics=1 --histogram=1 --num=$r --reads=$reads --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --db=$RESULT_DIR --sync=$sync --disable_wal=1 --compression_type=none --stats_interval=$si --disable_data_sync=$dds --target_file_size_base=$mb --stats_per_interval=1 --use_existing_db=1

du -s -k $RESULT_DIR

RocksDB

These scripts are copied directly from the RocksDB benchmark page referenced above. The only changes are the operation count and the cache size (the originals configured a 100MB cache); a sketch of those edits follows.
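
Concretely, a minimal sketch of the two variable edits made in each script below, with the original values taken from the note above:

# In each RocksDB script below, relative to the originals:
r=50000000       # operation count; the originals loaded/read 1 billion keys
cs=1000000000    # ~1GB cache; the originals configured a 100MB cache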

Fill Sequential:

#!/bin/bash

RESULT_DIR=results
echo "Load 1B keys sequentially into database....."
bpl=10485760;overlap=10;mcz=2;del=300000000;levels=6;ctrig=4; delay=8; stop=12; wbn=3; mbc=20; mb=67108864;wbs=134217728; dds=0; sync=0; r=50000000; t=1; vs=800; bs=65536; cs=1000000000; of=500000; si=1000000; ./db_bench --benchmarks=fillseq --disable_seek_compaction=1 --mmap_read=0 --statistics=1 --histogram=1 --num=$r --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --cache_numshardbits=4 --open_files=$of --verify_checksum=1 --db=$RESULT_DIR --sync=$sync --disable_wal=1 --compression_type=snappy --stats_interval=$si --compression_ratio=50 --disable_data_sync=$dds --write_buffer_size=$wbs --target_file_size_base=$mb --max_write_buffer_number=$wbn --max_background_compactions=$mbc --level0_file_num_compaction_trigger=$ctrig --level0_slowdown_writes_trigger=$delay --level0_stop_writes_trigger=$stop --num_levels=$levels --delete_obsolete_files_period_micros=$del --min_level_to_compress=$mcz --max_grandparent_overlap_factor=$overlap --stats_per_interval=1 --max_bytes_for_level_base=$bpl --use_existing_db=0

du -s -k $RESULT_DIR

Fill Random:

#!/bin/bash

RESULT_DIR=results
echo "Bulk load database into L0...."
bpl=10485760;overlap=10;mcz=2;del=300000000;levels=2;ctrig=10000000; delay=10000000; stop=10000000; wbn=30; mbc=20; mb=1073741824;wbs=268435456; dds=1; sync=0; r=50000000; t=1; vs=800; bs=65536; cs=1000000000; of=500000; si=1000000; ./db_bench --benchmarks=fillrandom --disable_seek_compaction=1 --mmap_read=0 --statistics=1 --histogram=1 --num=$r --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --cache_numshardbits=4 --open_files=$of --verify_checksum=1 --db=$RESULT_DIR --sync=$sync --disable_wal=1 --compression_type=snappy --stats_interval=$si --compression_ratio=50 --disable_data_sync=$dds --write_buffer_size=$wbs --target_file_size_base=$mb --max_write_buffer_number=$wbn --max_background_compactions=$mbc --level0_file_num_compaction_trigger=$ctrig --level0_slowdown_writes_trigger=$delay --level0_stop_writes_trigger=$stop --num_levels=$levels --delete_obsolete_files_period_micros=$del --min_level_to_compress=$mcz --max_grandparent_overlap_factor=$overlap --stats_per_interval=1 --max_bytes_for_level_base=$bpl --memtablerep=vector --use_existing_db=0 --disable_auto_compactions=1 --source_compaction_factor=10000000
echo "Running manual compaction to do a global sort map-reduce style...."
bpl=10485760;overlap=10;mcz=2;del=300000000;levels=2;ctrig=10000000; delay=10000000; stop=10000000; wbn=30; mbc=20; mb=1073741824;wbs=268435456; dds=1; sync=0; r=50000000; t=1; vs=800; bs=65536; cs=1000000000; of=500000; si=1000000; ./db_bench --benchmarks=compact --disable_seek_compaction=1 --mmap_read=0 --statistics=1 --histogram=1 --num=$r --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --cache_numshardbits=4 --open_files=$of --verify_checksum=1 --db=$RESULT_DIR --sync=$sync --disable_wal=1 --compression_type=snappy --stats_interval=$si --compression_ratio=50 --disable_data_sync=$dds --write_buffer_size=$wbs --target_file_size_base=$mb --max_write_buffer_number=$wbn --max_background_compactions=$mbc --level0_file_num_compaction_trigger=$ctrig --level0_slowdown_writes_trigger=$delay --level0_stop_writes_trigger=$stop --num_levels=$levels --delete_obsolete_files_period_micros=$del --min_level_to_compress=$mcz --max_grandparent_overlap_factor=$overlap --stats_per_interval=1 --max_bytes_for_level_base=$bpl --memtablerep=vector --use_existing_db=1 --disable_auto_compactions=1 --source_compaction_factor=10000000
du -s -k $RESULT_DIR

Overwrite:

#!/bin/bash

RESULT_DIR=results
echo "Overwriting the 1B keys in database in random order...."
bpl=10485760;overlap=10;mcz=2;del=300000000;levels=6;ctrig=4; delay=8; stop=12; wbn=3; mbc=20; mb=67108864;wbs=134217728; dds=0; sync=0; r=50000000; t=1; vs=800; bs=65536; cs=1000000000; of=500000; si=1000000; ./db_bench --benchmarks=overwrite --disable_seek_compaction=1 --mmap_read=0 --statistics=1 --histogram=1 --num=$r --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --cache_numshardbits=4 --open_files=$of --verify_checksum=1 --db=$RESULT_DIR --sync=$sync --disable_wal=1 --compression_type=snappy --stats_interval=$si --compression_ratio=50 --disable_data_sync=$dds --write_buffer_size=$wbs --target_file_size_base=$mb --max_write_buffer_number=$wbn --max_background_compactions=$mbc --level0_file_num_compaction_trigger=$ctrig --level0_slowdown_writes_trigger=$delay --level0_stop_writes_trigger=$stop --num_levels=$levels --delete_obsolete_files_period_micros=$del --min_level_to_compress=$mcz --max_grandparent_overlap_factor=$overlap --stats_per_interval=1 --max_bytes_for_level_base=$bpl --use_existing_db=1

du -s -k $RESULT_DIR

Read Random:

#!/bin/bash

RESULT_DIR=results
echo "Load 1B keys sequentially into database....."
bpl=10485760;overlap=10;mcz=2;del=300000000;levels=6;ctrig=4; delay=8; stop=12; wbn=3; mbc=20; mb=67108864;wbs=134217728; dds=1; sync=0; r=50000000; t=1; vs=800; bs=4096; cs=1000000000; of=500000; si=1000000; ./db_bench --benchmarks=fillseq --disable_seek_compaction=1 --mmap_read=0 --statistics=1 --histogram=1 --num=$r --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --cache_numshardbits=6 --open_files=$of --verify_checksum=1 --db=$RESULT_DIR --sync=$sync --disable_wal=1 --compression_type=none --stats_interval=$si --compression_ratio=50 --disable_data_sync=$dds --write_buffer_size=$wbs --target_file_size_base=$mb --max_write_buffer_number=$wbn --max_background_compactions=$mbc --level0_file_num_compaction_trigger=$ctrig --level0_slowdown_writes_trigger=$delay --level0_stop_writes_trigger=$stop --num_levels=$levels --delete_obsolete_files_period_micros=$del --min_level_to_compress=$mcz --max_grandparent_overlap_factor=$overlap --stats_per_interval=1 --max_bytes_for_level_base=$bpl --use_existing_db=0
echo "Reading 1B keys in database in random order...."
bpl=10485760;overlap=10;mcz=2;del=300000000;levels=6;ctrig=4; delay=8; stop=12; wbn=3; mbc=20; mb=67108864;wbs=134217728; dds=0; sync=0; r=50000000; reads=500000; t=32; vs=800; bs=4096; cs=1000000000; of=500000; si=100000; ./db_bench --benchmarks=readrandom --disable_seek_compaction=1 --mmap_read=0 --statistics=1 --histogram=1 --num=$r --reads=$reads --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --cache_numshardbits=6 --open_files=$of --verify_checksum=1 --db=$RESULT_DIR --sync=$sync --disable_wal=1 --compression_type=none --stats_interval=$si --compression_ratio=50 --disable_data_sync=$dds --write_buffer_size=$wbs --target_file_size_base=$mb --max_write_buffer_number=$wbn --max_background_compactions=$mbc --level0_file_num_compaction_trigger=$ctrig --level0_slowdown_writes_trigger=$delay --level0_stop_writes_trigger=$stop --num_levels=$levels --delete_obsolete_files_period_micros=$del --min_level_to_compress=$mcz --max_grandparent_overlap_factor=$overlap --stats_per_interval=1 --max_bytes_for_level_base=$bpl --use_existing_db=1

du -s -k $RESULT_DIR