Effect of removing polymorphism on inode entry removal - laurynas-biveinis/unodb GitHub Wiki
- `micro_benchmark_node4`: 4% slowdown (`shrink_node16_to_node4_randomly<unodb::db>/25`) to 2% speedup (`full_node4_to_minimal_sequential_delete<unodb::db>/32768`)
- `micro_benchmark_node16`: 2% slowdown (`full_node16_tree_sequential_delete<unodb::db>/512`) to 2% speedup (`shrink_node48_to_node16_randomly<unodb::db>/4`)
- `micro_benchmark_node48`: 3% slowdown (`full_node48_tree_sequential_delete<unodb::db>/192_mean`) to 2% speedup (`full_node48_tree_random_delete<unodb::db>/192`)
Baseline:
```
$ perf stat ./micro_benchmark_node48 --benchmark_filter='full_node48_tree_random_delete<unodb::db>/192'
2021-05-05T05:09:27+02:00
Running ./micro_benchmark_node48
Run on (8 X 3800 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x1)
Load Average: 0.23, 0.05, 0.02
--------------------------------------------------------------------------------------------------------
Benchmark                                          Time          CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------
full_node48_tree_random_delete<unodb::db>/192   6.18 us      6.16 us       113653 items_per_second=20.1299M/s size=23.7969k

 Performance counter stats for './micro_benchmark_node48 --benchmark_filter=full_node48_tree_random_delete<unodb::db>/192':

          3,362.81 msec task-clock                #    0.995 CPUs utilized
                 7      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               172      page-faults               #    0.051 K/sec
    12,791,617,561      cycles                    #    3.804 GHz                      (83.35%)
     3,362,737,072      stalled-cycles-frontend   #   26.29% frontend cycles idle     (83.35%)
     1,571,608,103      stalled-cycles-backend    #   12.29% backend cycles idle      (66.70%)
    26,394,524,255      instructions              #    2.06  insn per cycle
                                                  #    0.13  stalled cycles per insn  (83.35%)
     4,368,111,063      branches                  # 1298.947 M/sec                    (83.35%)
        59,282,956      branch-misses             #    1.36% of all branches          (83.26%)

       3.380935961 seconds time elapsed

       3.263031000 seconds user
       0.100092000 seconds sys
```
Patch:
```
$ perf stat ./micro_benchmark_node48 --benchmark_filter='full_node48_tree_random_delete<unodb::db>/192'
2021-05-05T05:08:52+02:00
Running ./micro_benchmark_node48
Run on (8 X 3800 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x1)
Load Average: 0.00, 0.00, 0.00
--------------------------------------------------------------------------------------------------------
Benchmark                                          Time          CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------
full_node48_tree_random_delete<unodb::db>/192   5.93 us      5.90 us       118402 items_per_second=21.0036M/s size=23.7969k

 Performance counter stats for './micro_benchmark_node48 --benchmark_filter=full_node48_tree_random_delete<unodb::db>/192':

          3,308.91 msec task-clock                #    0.995 CPUs utilized
                23      context-switches          #    0.007 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               170      page-faults               #    0.051 K/sec
    12,588,995,933      cycles                    #    3.805 GHz                      (83.37%)
     3,444,284,998      stalled-cycles-frontend   #   27.36% frontend cycles idle     (83.32%)
     1,400,722,360      stalled-cycles-backend    #   11.13% backend cycles idle      (66.64%)
    26,802,716,169      instructions              #    2.13  insn per cycle
                                                  #    0.13  stalled cycles per insn  (83.32%)
     4,396,427,876      branches                  # 1328.664 M/sec                    (83.32%)
        37,248,075      branch-misses             #    0.85% of all branches          (83.36%)

       3.326893647 seconds time elapsed

       3.169066000 seconds user
       0.140224000 seconds sys
```
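To compare the two `full_node48_tree_random_delete<unodb::db>/192` runs above side by side, the relative change of each counter can be computed directly from the `perf stat` lines. The following is a minimal Python sketch, not part of unodb: the helper names `parse_counters` and `relative_change` are hypothetical, and the input is a hand-copied subset of the output above.

```python
import re

# Hand-copied subset of the two perf stat outputs above (assumption: only
# the plain integer counters are of interest here).
BASELINE = """
12,791,617,561 cycles
26,394,524,255 instructions
4,368,111,063 branches
59,282,956 branch-misses
"""

PATCH = """
12,588,995,933 cycles
26,802,716,169 instructions
4,396,427,876 branches
37,248,075 branch-misses
"""


def parse_counters(text):
    """Map event name -> integer count for lines like '59,282,956 branch-misses'."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*([\d,]+)\s+([\w-]+)", line)
        if match:
            counters[match.group(2)] = int(match.group(1).replace(",", ""))
    return counters


def relative_change(baseline, patch):
    """Fractional change from baseline to patch; negative means the patch value is lower."""
    return (patch - baseline) / baseline


if __name__ == "__main__":
    base, patch = parse_counters(BASELINE), parse_counters(PATCH)
    for event in base:
        print(f"{event:>15}: {relative_change(base[event], patch[event]):+.1%}")
```

On these numbers the patch executes about 1.5% more instructions but takes roughly 37% fewer branch misses. Each side is a single run, so differences of a few percent may well be noise.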
Baseline:
```
$ perf stat ./micro_benchmark_node4 --benchmark_filter='shrink_node16_to_node4_randomly<unodb::db>/25'
2021-05-05T05:12:12+02:00
Running ./micro_benchmark_node4
Run on (8 X 3800 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x1)
Load Average: 0.02, 0.03, 0.01
--------------------------------------------------------------------------------------------------------
Benchmark                                          Time          CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------
shrink_node16_to_node4_randomly<unodb::db>/25   3.70 us      3.68 us       189954 items_per_second=6.78905M/s size=18.1221k

 Performance counter stats for './micro_benchmark_node4 --benchmark_filter=shrink_node16_to_node4_randomly<unodb::db>/25':

          5,654.66 msec task-clock                #    0.997 CPUs utilized
                11      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               167      page-faults               #    0.030 K/sec
    21,517,414,946      cycles                    #    3.805 GHz                      (83.31%)
     5,577,500,826      stalled-cycles-frontend   #   25.92% frontend cycles idle     (83.31%)
     2,813,964,129      stalled-cycles-backend    #   13.08% backend cycles idle      (66.70%)
    46,581,612,214      instructions              #    2.16  insn per cycle
                                                  #    0.12  stalled cycles per insn  (83.38%)
     8,556,898,093      branches                  # 1513.247 M/sec                    (83.38%)
        42,495,763      branch-misses             #    0.50% of all branches          (83.31%)

       5.672772693 seconds time elapsed

       5.302824000 seconds user
       0.352187000 seconds sys
```
Patch:
```
$ perf stat ./micro_benchmark_node4 --benchmark_filter='shrink_node16_to_node4_randomly<unodb::db>/25'
2021-05-05T05:12:29+02:00
Running ./micro_benchmark_node4
Run on (8 X 3800 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x1)
Load Average: 0.08, 0.05, 0.01
--------------------------------------------------------------------------------------------------------
Benchmark                                          Time          CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------
shrink_node16_to_node4_randomly<unodb::db>/25   3.86 us      3.84 us       182458 items_per_second=6.51672M/s size=18.1221k

 Performance counter stats for './micro_benchmark_node4 --benchmark_filter=shrink_node16_to_node4_randomly<unodb::db>/25':

          5,471.37 msec task-clock                #    0.997 CPUs utilized
                16      context-switches          #    0.003 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               165      page-faults               #    0.030 K/sec
    20,812,071,207      cycles                    #    3.804 GHz                      (83.33%)
     5,588,869,142      stalled-cycles-frontend   #   26.85% frontend cycles idle     (83.33%)
     2,985,649,458      stalled-cycles-backend    #   14.35% backend cycles idle      (66.66%)
    44,298,842,993      instructions              #    2.13  insn per cycle
                                                  #    0.13  stalled cycles per insn  (83.33%)
     8,090,433,590      branches                  # 1478.685 M/sec                    (83.34%)
        49,220,872      branch-misses             #    0.61% of all branches          (83.33%)

       5.489533839 seconds time elapsed

       5.167541000 seconds user
       0.304208000 seconds sys
```
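The two `shrink_node16_to_node4_randomly<unodb::db>/25` runs above executed different numbers of iterations (189954 versus 182458), so their raw `perf stat` totals are not directly comparable. One rough way to account for that is to divide each total by the reported iteration count. The Python sketch below does this; `per_iteration` and `relative_change` are hypothetical helpers. The result is only indicative, because `perf stat` counts the whole process, including any untimed setup and the calibration runs Google Benchmark performs before settling on an iteration count.

```python
# Totals and iteration counts copied by hand from the two runs above.
RUNS = {
    "baseline": {
        "iterations": 189954,
        "instructions": 46581612214,
        "branch-misses": 42495763,
    },
    "patch": {
        "iterations": 182458,
        "instructions": 44298842993,
        "branch-misses": 49220872,
    },
}


def per_iteration(run, event):
    """Whole-process event count divided by the reported iteration count."""
    return run[event] / run["iterations"]


def relative_change(baseline, patch):
    """Fractional change from baseline to patch; positive means the patch value is higher."""
    return (patch - baseline) / baseline


if __name__ == "__main__":
    for event in ("instructions", "branch-misses"):
        base = per_iteration(RUNS["baseline"], event)
        new = per_iteration(RUNS["patch"], event)
        change = relative_change(base, new)
        print(f"{event}: {base:.1f} -> {new:.1f} per iteration ({change:+.1%})")
```

Normalized this way, the instruction count is about 1% lower with the patch, not the roughly 5% the raw totals suggest. Branch misses per iteration rise by about 20%, which points in the same direction as the 4% slowdown reported for this benchmark.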