Effect of removing polymorphism on inode entry removal - laurynas-biveinis/unodb GitHub Wiki

Baseline commit, patch

  • micro_benchmark_node4: 4% slowdown (shrink_node16_to_node4_randomly<unodb::db>/25) to 2% speedup (full_node4_to_minimal_sequential_delete<unodb::db>/32768)
  • micro_benchmark_node16: 2% slowdown (full_node16_tree_sequential_delete<unodb::db>/512) to 2% speedup (shrink_node48_to_node16_randomly<unodb::db>/4)
  • micro_benchmark_node48: 3% slowdown (full_node48_tree_sequential_delete<unodb::db>/192_mean) to 2% speedup (full_node48_tree_random_delete<unodb::db>/192)

perf stat on full_node48_tree_random_delete<unodb::db>/192

Baseline:

$ perf stat ./micro_benchmark_node48 --benchmark_filter='full_node48_tree_random_delete<unodb::db>/192'
2021-05-05T05:09:27+02:00
Running ./micro_benchmark_node48
Run on (8 X 3800 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x1)
Load Average: 0.23, 0.05, 0.02
--------------------------------------------------------------------------------------------------------
Benchmark                                              Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------
full_node48_tree_random_delete<unodb::db>/192       6.18 us         6.16 us       113653 items_per_second=20.1299M/s size=23.7969k

 Performance counter stats for './micro_benchmark_node48 --benchmark_filter=full_node48_tree_random_delete<unodb::db>/192':

          3,362.81 msec task-clock                #    0.995 CPUs utilized          
                 7      context-switches          #    0.002 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               172      page-faults               #    0.051 K/sec                  
    12,791,617,561      cycles                    #    3.804 GHz                      (83.35%)
     3,362,737,072      stalled-cycles-frontend   #   26.29% frontend cycles idle     (83.35%)
     1,571,608,103      stalled-cycles-backend    #   12.29% backend cycles idle      (66.70%)
    26,394,524,255      instructions              #    2.06  insn per cycle         
                                                  #    0.13  stalled cycles per insn  (83.35%)
     4,368,111,063      branches                  # 1298.947 M/sec                    (83.35%)
        59,282,956      branch-misses             #    1.36% of all branches          (83.26%)

       3.380935961 seconds time elapsed

       3.263031000 seconds user
       0.100092000 seconds sys

Patch:

$ perf stat ./micro_benchmark_node48 --benchmark_filter='full_node48_tree_random_delete<unodb::db>/192'
2021-05-05T05:08:52+02:00
Running ./micro_benchmark_node48
Run on (8 X 3800 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x1)
Load Average: 0.00, 0.00, 0.00
--------------------------------------------------------------------------------------------------------
Benchmark                                              Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------
full_node48_tree_random_delete<unodb::db>/192       5.93 us         5.90 us       118402 items_per_second=21.0036M/s size=23.7969k

 Performance counter stats for './micro_benchmark_node48 --benchmark_filter=full_node48_tree_random_delete<unodb::db>/192':

          3,308.91 msec task-clock                #    0.995 CPUs utilized          
                23      context-switches          #    0.007 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               170      page-faults               #    0.051 K/sec                  
    12,588,995,933      cycles                    #    3.805 GHz                      (83.37%)
     3,444,284,998      stalled-cycles-frontend   #   27.36% frontend cycles idle     (83.32%)
     1,400,722,360      stalled-cycles-backend    #   11.13% backend cycles idle      (66.64%)
    26,802,716,169      instructions              #    2.13  insn per cycle         
                                                  #    0.13  stalled cycles per insn  (83.32%)
     4,396,427,876      branches                  # 1328.664 M/sec                    (83.32%)
        37,248,075      branch-misses             #    0.85% of all branches          (83.36%)

       3.326893647 seconds time elapsed

       3.169066000 seconds user
       0.140224000 seconds sys
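
The relative differences between the two node48 runs can be computed from the numbers above. Below is a minimal POSIX shell sketch of a helper, `pct_change`, written for this page only. It is hypothetical and not part of unodb, perf, or Google Benchmark. It assumes values are pasted exactly as printed, with commas as thousands separators.

```sh
# Hypothetical helper (not part of unodb/perf): relative change from OLD to
# NEW in percent. Commas (perf stat thousands separators) are stripped first.
# A negative result means the value is lower with the patch (for time: faster).
pct_change() {
  awk -v o="$1" -v n="$2" 'BEGIN {
    gsub(/,/, "", o); gsub(/,/, "", n)
    printf "%+.2f%%\n", (n - o) / o * 100
  }'
}

# full_node48_tree_random_delete<unodb::db>/192, baseline vs. patch
pct_change 6.18 5.93                        # time per iteration (us)
pct_change 12,791,617,561 12,588,995,933    # cycles
pct_change 26,394,524,255 26,802,716,169    # instructions
pct_change 59,282,956 37,248,075            # branch-misses
```

Branch-misses drop by about 37% in this pair of runs (from 1.36% to 0.85% of all branches), while executed instructions rise by about 1.5%. These are whole-process totals from a single run each. The two runs also completed different iteration counts (113653 vs. 118402), so treat the counter deltas as indicative only.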

perf stat on shrink_node16_to_node4_randomly<unodb::db>/25

Baseline:

$ perf stat ./micro_benchmark_node4 --benchmark_filter='shrink_node16_to_node4_randomly<unodb::db>/25'
2021-05-05T05:12:12+02:00
Running ./micro_benchmark_node4
Run on (8 X 3800 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x1)
Load Average: 0.02, 0.03, 0.01
--------------------------------------------------------------------------------------------------------
Benchmark                                              Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------
shrink_node16_to_node4_randomly<unodb::db>/25       3.70 us         3.68 us       189954 items_per_second=6.78905M/s size=18.1221k

 Performance counter stats for './micro_benchmark_node4 --benchmark_filter=shrink_node16_to_node4_randomly<unodb::db>/25':

          5,654.66 msec task-clock                #    0.997 CPUs utilized          
                11      context-switches          #    0.002 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               167      page-faults               #    0.030 K/sec                  
    21,517,414,946      cycles                    #    3.805 GHz                      (83.31%)
     5,577,500,826      stalled-cycles-frontend   #   25.92% frontend cycles idle     (83.31%)
     2,813,964,129      stalled-cycles-backend    #   13.08% backend cycles idle      (66.70%)
    46,581,612,214      instructions              #    2.16  insn per cycle         
                                                  #    0.12  stalled cycles per insn  (83.38%)
     8,556,898,093      branches                  # 1513.247 M/sec                    (83.38%)
        42,495,763      branch-misses             #    0.50% of all branches          (83.31%)

       5.672772693 seconds time elapsed

       5.302824000 seconds user
       0.352187000 seconds sys

Patch:

$ perf stat ./micro_benchmark_node4 --benchmark_filter='shrink_node16_to_node4_randomly<unodb::db>/25'
2021-05-05T05:12:29+02:00
Running ./micro_benchmark_node4
Run on (8 X 3800 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x1)
Load Average: 0.08, 0.05, 0.01
--------------------------------------------------------------------------------------------------------
Benchmark                                              Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------
shrink_node16_to_node4_randomly<unodb::db>/25       3.86 us         3.84 us       182458 items_per_second=6.51672M/s size=18.1221k

 Performance counter stats for './micro_benchmark_node4 --benchmark_filter=shrink_node16_to_node4_randomly<unodb::db>/25':

          5,471.37 msec task-clock                #    0.997 CPUs utilized          
                16      context-switches          #    0.003 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               165      page-faults               #    0.030 K/sec                  
    20,812,071,207      cycles                    #    3.804 GHz                      (83.33%)
     5,588,869,142      stalled-cycles-frontend   #   26.85% frontend cycles idle     (83.33%)
     2,985,649,458      stalled-cycles-backend    #   14.35% backend cycles idle      (66.66%)
    44,298,842,993      instructions              #    2.13  insn per cycle         
                                                  #    0.13  stalled cycles per insn  (83.33%)
     8,090,433,590      branches                  # 1478.685 M/sec                    (83.34%)
        49,220,872      branch-misses             #    0.61% of all branches          (83.33%)

       5.489533839 seconds time elapsed

       5.167541000 seconds user
       0.304208000 seconds sys
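
For the node4 benchmark, the raw totals point the wrong way. The patched run is about 4% slower per iteration (3.86 us vs. 3.70 us), yet its total cycles are lower, because it completed fewer iterations (182458 vs. 189954). A fairer comparison divides each counter total by the run's iteration count. Below is a minimal sketch of that normalization as a POSIX shell helper. `per_iter_change` is hypothetical and not provided by any of the tools used here. The result is only approximate: perf stat also counts work outside the reported iterations, such as process startup and Google Benchmark's iteration-count estimation runs.

```sh
# Hypothetical helper (not part of unodb/perf): compare a perf stat counter
# per benchmark iteration. Arguments: OLD_TOTAL OLD_ITERS NEW_TOTAL NEW_ITERS.
# Commas in the totals are stripped. Approximate, since the totals include
# work outside the reported iterations.
per_iter_change() {
  awk -v o_tot="$1" -v o_it="$2" -v n_tot="$3" -v n_it="$4" 'BEGIN {
    gsub(/,/, "", o_tot); gsub(/,/, "", n_tot)
    old = o_tot / o_it; new = n_tot / n_it
    printf "%.0f -> %.0f per iteration (%+.2f%%)\n", old, new, (new - old) / old * 100
  }'
}

# shrink_node16_to_node4_randomly<unodb::db>/25, baseline vs. patch
per_iter_change 21,517,414,946 189954 20,812,071,207 182458   # cycles
per_iter_change 46,581,612,214 189954 44,298,842,993 182458   # instructions
per_iter_change 42,495,763 189954 49,220,872 182458           # branch-misses
```

After normalization, cycles per iteration rise by about 0.7% and branch-misses per iteration by about 21%, while instructions per iteration fall by about 1%. The cycle increase now has the same sign as the reported slowdown.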