Effect of merging find_child and add node type dispatch - laurynas-biveinis/unodb GitHub Wiki

Comparison of the baseline commit and the patch commit.

Benchmarks are filtered by (grow|add).*unodb::db, except for micro_benchmark_n4, which is filtered by insert.*unodb::db:

  • micro_benchmark_n4: 1% speedup (full_n4_sequential_insert<unodb::db>/65535) to 2% speedup (full_n4_sequential_insert<unodb::db>/100)
  • micro_benchmark_n16: 0% to 18% speedup (n16_sequential_add<unodb::db>/4096)
  • micro_benchmark_n48: 3% slowdown (n48_random_add<unodb::db>/8) to 2% speedup (grow_n16_to_n48_sequentially<unodb::db>/64)
  • micro_benchmark_n256: 4% slowdown (n256_sequential_add<unodb::db>/512) to 5% speedup (n256_sequential_add<unodb::db>/8)

Same for olc_db:

  • micro_benchmark_n4: 7% slowdown (full_n4_random_insert<unodb::olc_db>/32768) to 2% slowdown (minimal_n4_sequential_insert<unodb::olc_db>/16)
  • micro_benchmark_n16: 0% to 12% slowdown (n16_random_add<unodb::olc_db>/512)
  • micro_benchmark_n48: 0% to 8% slowdown (n48_random_add<unodb::olc_db>/512)
  • micro_benchmark_n256: 1% slowdown (grow_n48_to_n256_sequentially<unodb::olc_db>/2) to 6% slowdown (grow_n48_to_n256_randomly<unodb::olc_db>/2)
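
To illustrate how the db filters above select benchmarks, here is a minimal Python sketch. It assumes the filter is applied as a regular expression that may match anywhere in the benchmark name, and it uses Python's `re` module as a stand-in for the benchmark binary's own regex engine. The `selected` helper is hypothetical; the names are ones quoted on this page.

```python
import re

# Filters quoted in the text above.
DB_FILTER = r"(grow|add).*unodb::db"
DB_N4_FILTER = r"insert.*unodb::db"

# Benchmark names quoted on this page.
NAMES = [
    "n16_sequential_add<unodb::db>/4096",
    "grow_n16_to_n48_sequentially<unodb::db>/64",
    "full_n4_sequential_insert<unodb::db>/65535",
    "n16_random_add<unodb::olc_db>/512",
]


def selected(pattern, names):
    """Return the names in which the pattern matches anywhere (search semantics)."""
    return [name for name in names if re.search(pattern, name)]


print(selected(DB_FILTER, NAMES))
print(selected(DB_N4_FILTER, NAMES))
```

Under these assumptions the n4 name contains neither "grow" nor "add", which is consistent with micro_benchmark_n4 needing its own filter, and the olc_db name is not selected by either db filter.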

perf stat for db at the baseline commit:

   Performance counter stats for './micro_benchmark_n16 --benchmark_filter=n16_sequential_add<unodb::db>/4096 --benchmark_repetitions=9':
  
           11,089.79 msec task-clock                #    0.998 CPUs utilized          
                  22      context-switches          #    0.002 K/sec                  
                   0      cpu-migrations            #    0.000 K/sec                  
           1,090,338      page-faults               #    0.098 M/sec                  
      27,742,823,612      cycles                    #    2.502 GHz                      (83.33%)
      10,027,962,414      stalled-cycles-frontend   #   36.15% frontend cycles idle     (83.34%)
       5,232,359,170      stalled-cycles-backend    #   18.86% backend cycles idle      (66.67%)
      52,454,714,923      instructions              #    1.89  insn per cycle         
                                                    #    0.19  stalled cycles per insn  (83.34%)
       8,964,685,154      branches                  #  808.373 M/sec                    (83.34%)
          15,288,158      branch-misses             #    0.17% of all branches          (83.32%)

With the patch:

   Performance counter stats for './micro_benchmark_n16 --benchmark_filter=n16_sequential_add<unodb::db>/4096 --benchmark_repetitions=9':

         11,307.40 msec task-clock                #    0.998 CPUs utilized          
                19      context-switches          #    0.002 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
         1,143,743      page-faults               #    0.101 M/sec                  
    28,284,344,173      cycles                    #    2.501 GHz                      (83.32%)
    10,436,359,901      stalled-cycles-frontend   #   36.90% frontend cycles idle     (83.34%)
     5,473,871,784      stalled-cycles-backend    #   19.35% backend cycles idle      (66.68%)
    54,545,255,294      instructions              #    1.93  insn per cycle         
                                                  #    0.19  stalled cycles per insn  (83.34%)
     9,238,736,513      branches                  #  817.052 M/sec                    (83.34%)
        15,955,426      branch-misses             #    0.17% of all branches          (83.32%)
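
The two reports are easier to compare as relative changes. The following is a minimal Python sketch that does this using counter totals copied from the reports above. The helper names `percent_change` and `ipc` are hypothetical and chosen only for illustration.

```python
# Counter totals copied from the two perf stat reports above.
BASELINE = {
    "task-clock (msec)": 11_089.79,
    "cycles": 27_742_823_612,
    "instructions": 52_454_714_923,
    "branches": 8_964_685_154,
    "branch-misses": 15_288_158,
}
PATCHED = {
    "task-clock (msec)": 11_307.40,
    "cycles": 28_284_344_173,
    "instructions": 54_545_255_294,
    "branches": 9_238_736_513,
    "branch-misses": 15_955_426,
}


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


def ipc(counters):
    """Instructions retired per cycle."""
    return counters["instructions"] / counters["cycles"]


for name in BASELINE:
    change = percent_change(BASELINE[name], PATCHED[name])
    print(f"{name:>18}: {change:+.2f}%")
print(f"IPC: {ipc(BASELINE):.2f} -> {ipc(PATCHED):.2f}")
```

These are whole-process totals over all nine repetitions. Because the benchmark framework may choose iteration counts on its own, they do not necessarily track the per-iteration times behind the speedup figures above.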