Effect of removing std::uncaught_exceptions calls on OL obsoleting critical section enter and exit - laurynas-biveinis/unodb GitHub Wiki

The initial OLC ART implementation introduced unique_write_lock_obsoleting_guard class that took a write-locked lock in constructor and in destructor decided whether to write unlock and obsolete on normal path or just write unlock on error path by comparing std::uncaught_exceptions results in constructor and destructor. This did not work out due to guard lifetime extending beyond their variable scope due to copy elision (?) and I had to introduce explicit commit and abort calls, removing the need for decision in destructor, thus making std::uncaught_exceptions calls redundant.

Removing them has beneficial performance effect. Node4 microbenchmarks: a ~2% slowdown (large Node4 tree random gets) to a ~7% (sequential insert to full Node4) speedup. Node16: a ~5% slowdown (shrinking a large Node48 tree to Node16 tree randomly) to a ~4% speed up (adding to a Node16 tree sequentially). Node48: a ~0% (growing a small Node16 tree to Node48 randomly) to a 5% speed up (shrinking a Node256 tree to Node48). Node256: 3% slowdown (growing a small Node48 tree to Node256 sequentially) to a 9% speed up (full Node256 tree random deletes).

perf stat baseline on Node256 deletes:

2021-04-22T05:37:48+02:00
Running ../../../unodb/build.release/benchmark/micro_benchmark_node256
Run on (8 X 3800 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x1)
Load Average: 0.28, 0.60, 0.40
--------------------------------------------------------------------------------------------------------------------
Benchmark                                                          Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------
full_node256_tree_sequential_delete<unodb::olc_db>/192          22.6 us         22.6 us        31015 items_per_second=6.37658M/s size=26.1484k
full_node256_tree_sequential_delete<unodb::olc_db>/512          66.3 us         66.2 us        10580 items_per_second=6.24942M/s size=68.2344k
full_node256_tree_sequential_delete<unodb::olc_db>/4096          522 us          522 us         1342 items_per_second=6.34678M/s size=545.156k
full_node256_tree_sequential_delete<unodb::olc_db>/32768        1602 us         1601 us          437 items_per_second=6.33357M/s size=4.25503M
full_node256_tree_sequential_delete<unodb::olc_db>/196608       5187 us         5185 us          135 items_per_second=5.86828M/s size=25.5237M
full_node256_tree_random_delete<unodb::olc_db>/192              21.6 us         21.6 us        32464 items_per_second=6.67622M/s size=26.1484k
full_node256_tree_random_delete<unodb::olc_db>/512              64.4 us         64.3 us        10876 items_per_second=6.43436M/s size=68.2344k
full_node256_tree_random_delete<unodb::olc_db>/4096              538 us          538 us         1300 items_per_second=6.1522M/s size=545.156k
full_node256_tree_random_delete<unodb::olc_db>/32768            1782 us         1782 us          393 items_per_second=5.69098M/s size=4.25503M
full_node256_tree_random_delete<unodb::olc_db>/196608           8338 us         8336 us           85 items_per_second=3.65034M/s size=25.5237M

 Performance counter stats for '../../../unodb/build.release/benchmark/micro_benchmark_node256 --benchmark_filter=delete<unodb::olc_db>':

         32,194.83 msec task-clock                #    0.999 CPUs utilized          
                39      context-switches          #    0.001 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
         1,601,915      page-faults               #    0.050 M/sec                  
   122,408,024,779      cycles                    #    3.802 GHz                      (83.33%)
    43,148,309,518      stalled-cycles-frontend   #   35.25% frontend cycles idle     (83.33%)
    20,235,238,905      stalled-cycles-backend    #   16.53% backend cycles idle      (66.68%)
   213,171,307,574      instructions              #    1.74  insn per cycle         
                                                  #    0.20  stalled cycles per insn  (83.34%)
    41,002,130,533      branches                  # 1273.562 M/sec                    (83.34%)
        68,024,882      branch-misses             #    0.17% of all branches          (83.33%)

      32.213041437 seconds time elapsed

      30.499034000 seconds user
       1.696168000 seconds sys

With the patch:

2021-04-22T05:38:57+02:00
Running ./micro_benchmark_node256
Run on (8 X 3800 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x1)
Load Average: 0.33, 0.57, 0.40
--------------------------------------------------------------------------------------------------------------------
Benchmark                                                          Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------
full_node256_tree_sequential_delete<unodb::olc_db>/192          22.0 us         21.9 us        31895 items_per_second=6.56166M/s size=26.1484k
full_node256_tree_sequential_delete<unodb::olc_db>/512          64.9 us         64.9 us        10784 items_per_second=6.37562M/s size=68.2344k
full_node256_tree_sequential_delete<unodb::olc_db>/4096          513 us          513 us         1367 items_per_second=6.45874M/s size=545.156k
full_node256_tree_sequential_delete<unodb::olc_db>/32768        1567 us         1567 us          446 items_per_second=6.47147M/s size=4.25503M
full_node256_tree_sequential_delete<unodb::olc_db>/196608       5119 us         5117 us          137 items_per_second=5.9461M/s size=25.5237M
full_node256_tree_random_delete<unodb::olc_db>/192              20.9 us         20.8 us        33565 items_per_second=6.91298M/s size=26.1484k
full_node256_tree_random_delete<unodb::olc_db>/512              62.7 us         62.7 us        11173 items_per_second=6.60749M/s size=68.2344k
full_node256_tree_random_delete<unodb::olc_db>/4096              525 us          525 us         1332 items_per_second=6.30611M/s size=545.156k
full_node256_tree_random_delete<unodb::olc_db>/32768            1719 us         1719 us          406 items_per_second=5.89982M/s size=4.25503M
full_node256_tree_random_delete<unodb::olc_db>/196608           8189 us         8187 us           86 items_per_second=3.71681M/s size=25.5237M

 Performance counter stats for './micro_benchmark_node256 --benchmark_filter=delete<unodb::olc_db>':

         32,104.41 msec task-clock                #    0.999 CPUs utilized          
                72      context-switches          #    0.002 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
         1,612,127      page-faults               #    0.050 M/sec                  
   122,057,067,207      cycles                    #    3.802 GHz                      (83.33%)
    42,951,725,654      stalled-cycles-frontend   #   35.19% frontend cycles idle     (83.33%)
    20,899,238,822      stalled-cycles-backend    #   17.12% backend cycles idle      (66.67%)
   217,487,587,366      instructions              #    1.78  insn per cycle         
                                                  #    0.20  stalled cycles per insn  (83.34%)
    41,557,226,535      branches                  # 1294.440 M/sec                    (83.34%)
       189,621,989      branch-misses             #    0.46% of all branches          (83.33%)

      32.123093484 seconds time elapsed

      30.356644000 seconds user
       1.748267000 seconds sys