Effect of merging find_child and add node type dispatch - laurynas-biveinis/unodb GitHub Wiki
Filtered for (grow|add).*unodb::db, except for micro_benchmark_n4, which is filtered for insert.*unodb::db:
- micro_benchmark_n4: 1% speedup (full_n4_sequential_insert<unodb::db>/65535) to 2% speedup (full_n4_sequential_insert<unodb::db>/100)
- micro_benchmark_n16: 0% to 18% speedup (n16_sequential_add<unodb::db>/4096)
- micro_benchmark_n48: 3% slowdown (n48_random_add<unodb::db>/8) to 2% speedup (grow_n16_to_n48_sequentially<unodb::db>/64)
- micro_benchmark_n256: 4% slowdown (n256_sequential_add<unodb::db>/512) to 5% speedup (n256_sequential_add<unodb::db>/8)
Same for olc_db:
- micro_benchmark_n4: 7% slowdown (full_n4_random_insert<unodb::olc_db>/32768) to 2% slowdown (minimal_n4_sequential_insert<unodb::olc_db>/16)
- micro_benchmark_n16: 0% to 12% slowdown (n16_random_add<unodb::olc_db>/512)
- micro_benchmark_n48: 0% to 8% slowdown (n48_random_add<unodb::olc_db>/512)
- micro_benchmark_n256: 1% slowdown (grow_n48_to_n256_sequentially<unodb::olc_db>/2) to 6% slowdown (grow_n48_to_n256_randomly<unodb::olc_db>/2)
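To illustrate why micro_benchmark_n4 needs its own filter, here is a minimal sketch that applies the two db filter expressions to a few benchmark names taken from the lists above. It uses Python's re.search as a stand-in for the matching done by --benchmark_filter; that the two behave the same for these simple patterns is an assumption, and the NAMES list and the selected helper are hypothetical, introduced only for this example.

```python
import re

# Filter expressions quoted from the text above.
DB_FILTER = re.compile(r"(grow|add).*unodb::db")
N4_FILTER = re.compile(r"insert.*unodb::db")

# A few benchmark names copied from the result lists (not the full set).
NAMES = [
    "n16_sequential_add<unodb::db>/4096",
    "grow_n16_to_n48_sequentially<unodb::db>/64",
    "full_n4_sequential_insert<unodb::db>/65535",
    "n16_random_add<unodb::olc_db>/512",
]


def selected(pattern, names):
    """Return the names in which the pattern matches anywhere (partial match)."""
    return [name for name in names if pattern.search(name)]


if __name__ == "__main__":
    print("db filter:", selected(DB_FILTER, NAMES))
    print("n4 filter:", selected(N4_FILTER, NAMES))
```

Under these assumptions the db filter picks up the add and grow benchmarks but not the N4 insert benchmark, whose name contains neither "add" nor "grow". It also skips the olc_db variant, because "unodb::olc_db" does not contain the substring "unodb::db".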
perf stat for the db baseline:
    Performance counter stats for './micro_benchmark_n16 --benchmark_filter=n16_sequential_add<unodb::db>/4096 --benchmark_repetitions=9':
         11,089.79 msec task-clock                # 0.998 CPUs utilized
                22      context-switches          # 0.002 K/sec
                 0      cpu-migrations            # 0.000 K/sec
         1,090,338      page-faults               # 0.098 M/sec
    27,742,823,612      cycles                    # 2.502 GHz  (83.33%)
    10,027,962,414      stalled-cycles-frontend   # 36.15% frontend cycles idle  (83.34%)
     5,232,359,170      stalled-cycles-backend    # 18.86% backend cycles idle  (66.67%)
    52,454,714,923      instructions              # 1.89 insn per cycle
                                                  # 0.19 stalled cycles per insn  (83.34%)
     8,964,685,154      branches                  # 808.373 M/sec  (83.34%)
        15,288,158      branch-misses             # 0.17% of all branches  (83.32%)
With the patch:
    Performance counter stats for './micro_benchmark_n16 --benchmark_filter=n16_sequential_add<unodb::db>/4096 --benchmark_repetitions=9':
         11,307.40 msec task-clock                # 0.998 CPUs utilized
                19      context-switches          # 0.002 K/sec
                 0      cpu-migrations            # 0.000 K/sec
         1,143,743      page-faults               # 0.101 M/sec
    28,284,344,173      cycles                    # 2.501 GHz  (83.32%)
    10,436,359,901      stalled-cycles-frontend   # 36.90% frontend cycles idle  (83.34%)
     5,473,871,784      stalled-cycles-backend    # 19.35% backend cycles idle  (66.68%)
    54,545,255,294      instructions              # 1.93 insn per cycle
                                                  # 0.19 stalled cycles per insn  (83.34%)
     9,238,736,513      branches                  # 817.052 M/sec  (83.34%)
        15,955,426      branch-misses             # 0.17% of all branches  (83.32%)
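The two perf stat runs above can be compared counter by counter. The following is a minimal sketch that parses the counter lines and prints the relative change from baseline to patched; the BASELINE and PATCHED strings are abbreviated copies of the output above, and parse_counters, relative_change and ipc are hypothetical helper names, not part of perf or of unodb.

```python
import re

# Counter lines copied from the two perf stat runs above (comments dropped).
BASELINE = """\
11,089.79 msec task-clock
1,090,338 page-faults
27,742,823,612 cycles
10,027,962,414 stalled-cycles-frontend
5,232,359,170 stalled-cycles-backend
52,454,714,923 instructions
8,964,685,154 branches
15,288,158 branch-misses
"""

PATCHED = """\
11,307.40 msec task-clock
1,143,743 page-faults
28,284,344,173 cycles
10,436,359,901 stalled-cycles-frontend
5,473,871,784 stalled-cycles-backend
54,545,255,294 instructions
9,238,736,513 branches
15,955,426 branch-misses
"""

# A value with thousands separators, an optional "msec" unit, then the event name.
LINE = re.compile(r"^\s*([\d,]+(?:\.\d+)?)\s+(?:msec\s+)?([a-z-]+)")


def parse_counters(text):
    """Map event name to numeric value for every line that looks like a counter."""
    counters = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            counters[match.group(2)] = float(match.group(1).replace(",", ""))
    return counters


def relative_change(baseline, patched):
    """Percentage change of each counter present in both runs."""
    return {
        name: 100.0 * (patched[name] - baseline[name]) / baseline[name]
        for name in baseline
        if name in patched
    }


def ipc(counters):
    """Instructions per cycle, as a cross-check against perf's own figure."""
    return counters["instructions"] / counters["cycles"]


base = parse_counters(BASELINE)
patch = parse_counters(PATCHED)
changes = relative_change(base, patch)

if __name__ == "__main__":
    for name, pct in changes.items():
        print(f"{name:24s} {pct:+.2f}%")
    print(f"IPC baseline {ipc(base):.2f}, patched {ipc(patch):.2f}")
```

On these numbers the patched run shows roughly 2% more task-clock and cycles and about 4% more instructions. These are whole-process totals, so they may include setup work and a different number of benchmark iterations in each run, and may not map directly onto the per-benchmark timings listed earlier.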