Effect of removing std::uncaught_exceptions calls on OL obsoleting critical section enter and exit - laurynas-biveinis/unodb GitHub Wiki
The initial OLC ART implementation introduced unique_write_lock_obsoleting_guard
class that took a write-locked lock in constructor and in destructor decided whether to write unlock and obsolete on normal path or just write unlock on error path by comparing std::uncaught_exceptions
results in constructor and destructor. This did not work out due to guard lifetime extending beyond their variable scope due to copy elision (?) and I had to introduce explicit commit
and abort
calls, removing the need for decision in destructor, thus making std::uncaught_exceptions
calls redundant.
Removing them has beneficial performance effect. Node4 microbenchmarks: a ~2% slowdown (large Node4 tree random gets) to a ~7% (sequential insert to full Node4) speedup. Node16: a ~5% slowdown (shrinking a large Node48 tree to Node16 tree randomly) to a ~4% speed up (adding to a Node16 tree sequentially). Node48: a ~0% (growing a small Node16 tree to Node48 randomly) to a 5% speed up (shrinking a Node256 tree to Node48). Node256: 3% slowdown (growing a small Node48 tree to Node256 sequentially) to a 9% speed up (full Node256 tree random deletes).
perf stat baseline on Node256 deletes:
2021-04-22T05:37:48+02:00
Running ../../../unodb/build.release/benchmark/micro_benchmark_node256
Run on (8 X 3800 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x4)
L1 Instruction 32 KiB (x4)
L2 Unified 256 KiB (x4)
L3 Unified 8192 KiB (x1)
Load Average: 0.28, 0.60, 0.40
--------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------
full_node256_tree_sequential_delete<unodb::olc_db>/192 22.6 us 22.6 us 31015 items_per_second=6.37658M/s size=26.1484k
full_node256_tree_sequential_delete<unodb::olc_db>/512 66.3 us 66.2 us 10580 items_per_second=6.24942M/s size=68.2344k
full_node256_tree_sequential_delete<unodb::olc_db>/4096 522 us 522 us 1342 items_per_second=6.34678M/s size=545.156k
full_node256_tree_sequential_delete<unodb::olc_db>/32768 1602 us 1601 us 437 items_per_second=6.33357M/s size=4.25503M
full_node256_tree_sequential_delete<unodb::olc_db>/196608 5187 us 5185 us 135 items_per_second=5.86828M/s size=25.5237M
full_node256_tree_random_delete<unodb::olc_db>/192 21.6 us 21.6 us 32464 items_per_second=6.67622M/s size=26.1484k
full_node256_tree_random_delete<unodb::olc_db>/512 64.4 us 64.3 us 10876 items_per_second=6.43436M/s size=68.2344k
full_node256_tree_random_delete<unodb::olc_db>/4096 538 us 538 us 1300 items_per_second=6.1522M/s size=545.156k
full_node256_tree_random_delete<unodb::olc_db>/32768 1782 us 1782 us 393 items_per_second=5.69098M/s size=4.25503M
full_node256_tree_random_delete<unodb::olc_db>/196608 8338 us 8336 us 85 items_per_second=3.65034M/s size=25.5237M
Performance counter stats for '../../../unodb/build.release/benchmark/micro_benchmark_node256 --benchmark_filter=delete<unodb::olc_db>':
32,194.83 msec task-clock # 0.999 CPUs utilized
39 context-switches # 0.001 K/sec
0 cpu-migrations # 0.000 K/sec
1,601,915 page-faults # 0.050 M/sec
122,408,024,779 cycles # 3.802 GHz (83.33%)
43,148,309,518 stalled-cycles-frontend # 35.25% frontend cycles idle (83.33%)
20,235,238,905 stalled-cycles-backend # 16.53% backend cycles idle (66.68%)
213,171,307,574 instructions # 1.74 insn per cycle
# 0.20 stalled cycles per insn (83.34%)
41,002,130,533 branches # 1273.562 M/sec (83.34%)
68,024,882 branch-misses # 0.17% of all branches (83.33%)
32.213041437 seconds time elapsed
30.499034000 seconds user
1.696168000 seconds sys
With the patch:
2021-04-22T05:38:57+02:00
Running ./micro_benchmark_node256
Run on (8 X 3800 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x4)
L1 Instruction 32 KiB (x4)
L2 Unified 256 KiB (x4)
L3 Unified 8192 KiB (x1)
Load Average: 0.33, 0.57, 0.40
--------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------
full_node256_tree_sequential_delete<unodb::olc_db>/192 22.0 us 21.9 us 31895 items_per_second=6.56166M/s size=26.1484k
full_node256_tree_sequential_delete<unodb::olc_db>/512 64.9 us 64.9 us 10784 items_per_second=6.37562M/s size=68.2344k
full_node256_tree_sequential_delete<unodb::olc_db>/4096 513 us 513 us 1367 items_per_second=6.45874M/s size=545.156k
full_node256_tree_sequential_delete<unodb::olc_db>/32768 1567 us 1567 us 446 items_per_second=6.47147M/s size=4.25503M
full_node256_tree_sequential_delete<unodb::olc_db>/196608 5119 us 5117 us 137 items_per_second=5.9461M/s size=25.5237M
full_node256_tree_random_delete<unodb::olc_db>/192 20.9 us 20.8 us 33565 items_per_second=6.91298M/s size=26.1484k
full_node256_tree_random_delete<unodb::olc_db>/512 62.7 us 62.7 us 11173 items_per_second=6.60749M/s size=68.2344k
full_node256_tree_random_delete<unodb::olc_db>/4096 525 us 525 us 1332 items_per_second=6.30611M/s size=545.156k
full_node256_tree_random_delete<unodb::olc_db>/32768 1719 us 1719 us 406 items_per_second=5.89982M/s size=4.25503M
full_node256_tree_random_delete<unodb::olc_db>/196608 8189 us 8187 us 86 items_per_second=3.71681M/s size=25.5237M
Performance counter stats for './micro_benchmark_node256 --benchmark_filter=delete<unodb::olc_db>':
32,104.41 msec task-clock # 0.999 CPUs utilized
72 context-switches # 0.002 K/sec
0 cpu-migrations # 0.000 K/sec
1,612,127 page-faults # 0.050 M/sec
122,057,067,207 cycles # 3.802 GHz (83.33%)
42,951,725,654 stalled-cycles-frontend # 35.19% frontend cycles idle (83.33%)
20,899,238,822 stalled-cycles-backend # 17.12% backend cycles idle (66.67%)
217,487,587,366 instructions # 1.78 insn per cycle
# 0.20 stalled cycles per insn (83.34%)
41,557,226,535 branches # 1294.440 M/sec (83.34%)
189,621,989 branch-misses # 0.46% of all branches (83.33%)
32.123093484 seconds time elapsed
30.356644000 seconds user
1.748267000 seconds sys