Concurrent ART flavor overhead in single-threaded workloads
How much overhead does concurrency management add to single-threaded workloads?
Optimistic lock coupling performs extra actions at every step. How expensive are they if the workload is single-threaded?
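To make those "extra actions" concrete, here is a minimal sketch of the optimistic lock coupling read and write protocol: a per-node version word where an odd value means write-locked, readers re-validate the version after every speculative read, and writers bump it twice. This is an illustrative standalone example, not unodb's actual `olc_db` API; all names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <optional>

struct olc_node {
  // Even version = unlocked, odd = write-locked.
  std::atomic<std::uint64_t> version{0};
  int payload = 0;
};

// Lock-free read: snapshot the version, read speculatively, then verify
// the version did not change and was not write-locked. On failure the
// caller must restart the traversal - this retry logic, plus the two
// version loads per node, is the per-step cost OLC pays even with a
// single thread.
std::optional<int> optimistic_read(const olc_node &n) {
  const auto v1 = n.version.load(std::memory_order_acquire);
  if (v1 & 1) return std::nullopt;  // writer active, restart
  const int result = n.payload;     // speculative read
  const auto v2 = n.version.load(std::memory_order_acquire);
  if (v1 != v2) return std::nullopt;  // concurrent write, restart
  return result;
}

// Write: acquire the node by CAS-ing the version to odd, publish the new
// even version on release.
void locked_write(olc_node &n, int value) {
  std::uint64_t v;
  for (;;) {
    v = n.version.load(std::memory_order_relaxed);
    if (v & 1) continue;  // write-locked by another thread, spin
    if (n.version.compare_exchange_weak(v, v + 1,
                                        std::memory_order_acquire))
      break;  // locked: version is now odd
  }
  n.payload = value;
  // Unlock and invalidate concurrent readers' snapshots in one store.
  n.version.store(v + 2, std::memory_order_release);
}
```

Even uncontended, every node visit costs two atomic loads and a validation branch on reads, and an atomic read-modify-write pair on writes, which is where the single-threaded overhead measured below comes from.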
2021-10-27, after an optimization round. OLC overhead relative to plain unodb::db, worst to best benchmark in each microbenchmark binary:

- micro_benchmark_key_prefix: 246% (unpredictable_prepend_key_prefix) to 155% (unpredictable_get_shared_length)
- micro_benchmark_n4: 226% (full_n4_sequential_delete/100) to 9% (n4_random_gets/65535)
- micro_benchmark_n16: 378% (full_n16_tree_full_scan/64) to 9% (minimal_n16_tree_random_gets/16383)
- micro_benchmark_n48: 260% (full_n48_tree_random_delete/512) to 11% (full_n48_tree_random_gets/131064)
- micro_benchmark_n256: 668% (full_n256_tree_full_scan/128) to 11% (full_n256_tree_random_gets/131064)
2021-03-08, after the initial OLC commit:
```
Comparing unodb::db to unodb::olc_db (from ./micro_benchmark_key_prefix)
Benchmark                                                              Time     CPU  Time Old  Time New  CPU Old  CPU New
-------------------------------------------------------------------------------------------------------------------------
unpredictable_get_shared_length<[unodb::db vs. unodb::olc_db]>      +1.5010  +1.4950      1         2        1        2
unpredictable_leaf_key_prefix_split<[unodb::db vs. unodb::olc_db]>  +1.8351  +1.8390     17        49       17       49
unpredictable_cut_key_prefix<[unodb::db vs. unodb::olc_db]>         +1.8779  +1.8815     18        53       18       53
unpredictable_prepend_key_prefix<[unodb::db vs. unodb::olc_db]>     +2.3361  +2.3415     19        63       19       63
```
What about the mutex version?
```
unpredictable_get_shared_length<[unodb::db vs. unodb::mutex_db]>_mean      +0.3133  +0.3153   1   1   1   1
unpredictable_leaf_key_prefix_split<[unodb::db vs. unodb::mutex_db]>_mean  +0.2164  +0.2167  18  22  18  22
unpredictable_cut_key_prefix<[unodb::db vs. unodb::mutex_db]>_mean         +0.2127  +0.2124  19  23  19  23
unpredictable_prepend_key_prefix<[unodb::db vs. unodb::mutex_db]>_mean     +0.1494  +0.1496  19  22  19  22
```
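The much lower mutex numbers fit its simple structure: the whole operation runs under one global lock, so the single-threaded cost is a single uncontended lock/unlock pair per call, with no per-node version checks or retries. A hypothetical sketch of that shape (std::map standing in for the ART; this is not unodb's actual mutex_db API):

```cpp
#include <map>
#include <mutex>

// Mutex-protected flavor sketch: one lock guards the whole tree, taken
// once per operation.
class mutex_db {
 public:
  void insert(int key, int value) {
    const std::lock_guard<std::mutex> guard{lock_};
    tree_[key] = value;
  }

  bool get(int key, int &out) const {
    const std::lock_guard<std::mutex> guard{lock_};
    const auto it = tree_.find(key);
    if (it == tree_.end()) return false;
    out = it->second;
    return true;
  }

 private:
  mutable std::mutex lock_;        // single coarse-grained lock
  std::map<int, int> tree_;        // stand-in for the underlying ART
};
```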
So, for this benchmark, the mutex version overhead is 15%-31% and the OLC overhead is 150%-230%. For the other binaries:

- Node4 ops: mutex 2%-35%; OLC 10% (large Node4 tree full scan) to 220% (small Node4 sequential delete).
- Node16 ops: mutex 2%-80%; OLC 13% (small node random gets) to 440% (small full tree full scan).
- Node48 ops: mutex ~0% (small full tree random gets) to 45% (small full tree full scan); OLC 10% (large full tree random gets) to 250% (small full tree full scan).
- Node256 ops: mutex 2% (small minimal tree random gets) to 145% (small full tree full scan); OLC 10% (large minimal tree random gets) to 700% (small full tree full scan).