Effect of optimistic_lock::try_read_lock testing once in the straight path - laurynas-biveinis/unodb GitHub Wiki

baseline commit, patch

All tests filtered for unodb::olc_db.

  • micro_benchmark_key_prefix: 2% slowdown (unpredictable_get_shared_length) to 2% speedup (unpredictable_cut_key_prefix)
  • micro_benchmark_n4: 2% slowdown (full_n4_random_deletes/100) to 6% speedup (full_n4_sequential_insert/32768)
  • micro_benchmark_n16: 9% slowdown (grow_n4_to_n16_randomly/20) to 2% speedup (shrink_n48_to_n16_randomly/16383)
  • micro_benchmark_n48: 0% slowdown (minimal_n48_tree_random_gets/4) to 6% speedup (grow_n16_to_n48_randomly/64)
  • micro_benchmark_n256: 3% slowdown (minimal_n256_tree_random_gets/4) to 5% speedup (n256_random_add/64)

perf stat on --benchmark_filter=full_n4_sequential_insert<unodb::olc_db>/32768 shows a significant reduction in branch-misses, from 44.5M (0.53% of all branches) to 33.4M (0.39%), about 25% fewer:

baseline:

          7,742.15 msec task-clock                #    0.998 CPUs utilized          
                10      context-switches          #    0.001 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
             1,622      page-faults               #    0.210 K/sec                  
    29,455,317,521      cycles                    #    3.805 GHz                      (83.33%)
    11,735,501,411      stalled-cycles-frontend   #   39.84% frontend cycles idle     (83.31%)
     4,755,144,521      stalled-cycles-backend    #   16.14% backend cycles idle      (66.64%)
    51,664,855,712      instructions              #    1.75  insn per cycle         
                                                  #    0.23  stalled cycles per insn  (83.33%)
     8,439,074,584      branches                  # 1090.017 M/sec                    (83.36%)
        44,511,313      branch-misses             #    0.53% of all branches          (83.35%)

       7.759907566 seconds time elapsed

       7.738543000 seconds user
       0.004001000 seconds sys

patch:

          7,803.40 msec task-clock                #    0.998 CPUs utilized          
                21      context-switches          #    0.003 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
             1,719      page-faults               #    0.220 K/sec                  
    29,685,807,340      cycles                    #    3.804 GHz                      (83.31%)
    11,326,665,522      stalled-cycles-frontend   #   38.16% frontend cycles idle     (83.34%)
     4,449,148,459      stalled-cycles-backend    #   14.99% backend cycles idle      (66.68%)
    53,369,829,296      instructions              #    1.80  insn per cycle         
                                                  #    0.21  stalled cycles per insn  (83.34%)
     8,631,659,203      branches                  # 1106.141 M/sec                    (83.34%)
        33,357,159      branch-misses             #    0.39% of all branches          (83.33%)

       7.821357360 seconds time elapsed

       7.799831000 seconds user
       0.004001000 seconds sys
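
The relative changes between the two perf stat runs above can be worked out with a short script. This is only a sketch for reading the numbers: the counter values are copied from the output above, and the `relative_change` helper and the dictionary names are hypothetical, not part of unodb or perf.

```python
# Counter values copied from the two perf stat runs above.
baseline = {
    "cycles": 29_455_317_521,
    "instructions": 51_664_855_712,
    "branches": 8_439_074_584,
    "branch-misses": 44_511_313,
}
patch = {
    "cycles": 29_685_807_340,
    "instructions": 53_369_829_296,
    "branches": 8_631_659_203,
    "branch-misses": 33_357_159,
}


def relative_change(before, after):
    """Return the change from before to after as a percentage of before."""
    return (after - before) / before * 100.0


# Percentage change per counter, patch relative to baseline.
changes = {name: relative_change(baseline[name], patch[name]) for name in baseline}

for name, pct in changes.items():
    print(f"{name:>14}: {pct:+.2f}%")
```

On these numbers, branch-misses drop by about 25%, while cycles rise by under 1% and instructions and branches rise by roughly 2-3%. Both runs are single samples, so small differences may be within run-to-run noise.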