Effect of branching exactly once in remove code path - laurynas-biveinis/unodb GitHub Wiki

baseline commit, patch

Benchmarking unodb::db, filtered for (delete|shrink).*unodb::db, except for n256, filtered for delete.*unodb::db:

  • micro_benchmark_n4: 1% slowdown (shrink_node16_to_n4_randomly<unodb::db>/25) to 8% speedup (shrink_node16_to_n4_sequentially<unodb::db>/4096)
  • micro_benchmark_n16: 3% slowdown (shrink_n48_to_n16_randomly<unodb::db>/16383) to 16% speedup (full_n16_tree_sequential_delete<unodb::db>/4096)
  • micro_benchmark_n48: 3% slowdown (shrink_n256_to_n48_sequentially<unodb::db>/4) to 6% speedup (shrink_n256_to_n48_sequentially<unodb::db>/512)
  • micro_benchmark_n256: 2% speedup (full_n256_tree_random_delete<unodb::db>/4096) to 8% speedup (full_n256_tree_random_delete<unodb::db>/192)

Same for unodb::olc_db:

  • micro_benchmark_n4: 2% slowdown (shrink_node16_to_n4_randomly<unodb::olc_db>/25) to 7% speedup (full_n4_to_minimal_sequential_delete<unodb::olc_db>/32768)
  • micro_benchmark_n16: 1% speedup (shrink_n48_to_n16_randomly<unodb::olc_db>/8) to 4% speedup (full_n16_tree_sequential_delete<unodb::olc_db>/64)
  • micro_benchmark_n48: 0% (shrink_n256_to_n48_sequentially<unodb::olc_db>/2048) to 3% speedup (full_n48_tree_random_delete<unodb::olc_db>/192)
  • micro_benchmark_n256: 3% speedup (full_n256_tree_random_delete<unodb::olc_db>/4096) to 7% speedup (full_n256_tree_sequential_delete<unodb::olc_db>/512)

perf stat baseline:

full_n16_tree_sequential_delete<unodb::db>/4096              36.8 us         36.6 us        19170 items_per_second=7.51171M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096              36.8 us         36.6 us        19170 items_per_second=7.5066M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096              36.7 us         36.6 us        19170 items_per_second=7.51972M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096              36.8 us         36.6 us        19170 items_per_second=7.50578M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096              36.7 us         36.5 us        19170 items_per_second=7.52536M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096              36.7 us         36.6 us        19170 items_per_second=7.52283M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096              37.0 us         36.8 us        19170 items_per_second=7.4785M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096              36.7 us         36.6 us        19170 items_per_second=7.52239M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096              36.9 us         36.7 us        19170 items_per_second=7.48681M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096_mean         36.8 us         36.6 us            9 items_per_second=7.50886M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096_median       36.8 us         36.6 us            9 items_per_second=7.51171M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096_stddev      0.083 us        0.081 us            9 items_per_second=16.5805k/s size=0

 Performance counter stats for './micro_benchmark_n16 --benchmark_filter=full_n16_tree_sequential_delete<unodb::db>/4096 --benchmark_repetitions=9':

        102,823.50 msec task-clock                #    1.000 CPUs utilized          
               131      context-switches          #    0.001 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               322      page-faults               #    0.003 K/sec                  
   257,360,050,745      cycles                    #    2.503 GHz                      (83.33%)
    42,199,574,985      stalled-cycles-frontend   #   16.40% frontend cycles idle     (83.33%)
    16,915,470,705      stalled-cycles-backend    #    6.57% backend cycles idle      (66.67%)
   666,337,839,324      instructions              #    2.59  insn per cycle         
                                                  #    0.06  stalled cycles per insn  (83.33%)
   123,503,650,797      branches                  # 1201.123 M/sec                    (83.33%)
       248,129,951      branch-misses             #    0.20% of all branches          (83.33%)

     102.841191760 seconds time elapsed
    
     102.428370000 seconds user
       0.396016000 seconds sys

perf stat patch:

full_n16_tree_sequential_delete<unodb::db>/4096              30.9 us         30.7 us        22801 items_per_second=8.95834M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096              30.9 us         30.7 us        22801 items_per_second=8.95809M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096              30.9 us         30.7 us        22801 items_per_second=8.96869M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096              30.9 us         30.7 us        22801 items_per_second=8.95669M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096              30.9 us         30.7 us        22801 items_per_second=8.96984M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096              30.9 us         30.7 us        22801 items_per_second=8.95272M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096              30.8 us         30.6 us        22801 items_per_second=8.98166M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096              30.9 us         30.7 us        22801 items_per_second=8.95326M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096              30.9 us         30.7 us        22801 items_per_second=8.96822M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096_mean         30.9 us         30.7 us            9 items_per_second=8.96306M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096_median       30.9 us         30.7 us            9 items_per_second=8.95834M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096_stddev      0.033 us        0.033 us            9 items_per_second=9.62308k/s size=0

 Performance counter stats for './micro_benchmark_n16 --benchmark_filter=full_n16_tree_sequential_delete<unodb::db>/4096 --benchmark_repetitions=9':

        123,198.12 msec task-clock                #    1.000 CPUs utilized          
               136      context-switches          #    0.001 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               322      page-faults               #    0.003 K/sec                  
   308,356,675,493      cycles                    #    2.503 GHz                      (83.33%)
    53,873,212,738      stalled-cycles-frontend   #   17.47% frontend cycles idle     (83.33%)
    19,495,916,458      stalled-cycles-backend    #    6.32% backend cycles idle      (66.67%)
   786,030,393,859      instructions              #    2.55  insn per cycle         
                                                  #    0.07  stalled cycles per insn  (83.33%)
   145,659,967,219      branches                  # 1182.323 M/sec                    (83.33%)
       274,349,767      branch-misses             #    0.19% of all branches          (83.33%)

     123.215514943 seconds time elapsed

     122.822952000 seconds user
       0.376021000 seconds sys
⚠️ **GitHub.com Fallback** ⚠️