Effect of branching exactly once in remove code path - laurynas-biveinis/unodb GitHub Wiki
Benchmarking unodb::db
, filtered for (delete|shrink).*unodb::db
, except for n256, filtered for delete.*unodb::db
:
-
micro_benchmark_n4
: 1% slowdown (shrink_node16_to_n4_randomly<unodb::db>/25
) to 8% speedup (shrink_node16_to_n4_sequentially<unodb::db>/4096
) -
micro_benchmark_n16
: 3% slowdown (shrink_n48_to_n16_randomly<unodb::db>/16383
) to 16% speedup (full_n16_tree_sequential_delete<unodb::db>/4096
) -
micro_benchmark_n48
: 3% slowdown (shrink_n256_to_n48_sequentially<unodb::db>/4
) to 6% speedup (shrink_n256_to_n48_sequentially<unodb::db>/512
) -
micro_benchmark_n256
: 2% speedup (full_n256_tree_random_delete<unodb::db>/4096
) to 8% speedup (full_n256_tree_random_delete<unodb::db>/192
)
Same for unodb::olc_db
:
-
micro_benchmark_n4
: 2% slowdown (shrink_node16_to_n4_randomly<unodb::olc_db>/25
) to 7% speedup (full_n4_to_minimal_sequential_delete<unodb::olc_db>/32768
) -
micro_benchmark_n16
: 1% speedup (shrink_n48_to_n16_randomly<unodb::olc_db>/8
) to 4% speedup (full_n16_tree_sequential_delete<unodb::olc_db>/64
) -
micro_benchmark_n48
: 0% (shrink_n256_to_n48_sequentially<unodb::olc_db>/2048
) to 3% speedup (full_n48_tree_random_delete<unodb::olc_db>/192
) -
micro_benchmark_n256
: 3% speedup (full_n256_tree_random_delete<unodb::olc_db>/4096
) to 7% speedup (full_n256_tree_sequential_delete<unodb::olc_db>/512
)
perf stat baseline:
full_n16_tree_sequential_delete<unodb::db>/4096 36.8 us 36.6 us 19170 items_per_second=7.51171M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096 36.8 us 36.6 us 19170 items_per_second=7.5066M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096 36.7 us 36.6 us 19170 items_per_second=7.51972M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096 36.8 us 36.6 us 19170 items_per_second=7.50578M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096 36.7 us 36.5 us 19170 items_per_second=7.52536M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096 36.7 us 36.6 us 19170 items_per_second=7.52283M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096 37.0 us 36.8 us 19170 items_per_second=7.4785M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096 36.7 us 36.6 us 19170 items_per_second=7.52239M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096 36.9 us 36.7 us 19170 items_per_second=7.48681M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096_mean 36.8 us 36.6 us 9 items_per_second=7.50886M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096_median 36.8 us 36.6 us 9 items_per_second=7.51171M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096_stddev 0.083 us 0.081 us 9 items_per_second=16.5805k/s size=0
Performance counter stats for './micro_benchmark_n16 --benchmark_filter=full_n16_tree_sequential_delete<unodb::db>/4096 --benchmark_repetitions=9':
102,823.50 msec task-clock # 1.000 CPUs utilized
131 context-switches # 0.001 K/sec
0 cpu-migrations # 0.000 K/sec
322 page-faults # 0.003 K/sec
257,360,050,745 cycles # 2.503 GHz (83.33%)
42,199,574,985 stalled-cycles-frontend # 16.40% frontend cycles idle (83.33%)
16,915,470,705 stalled-cycles-backend # 6.57% backend cycles idle (66.67%)
666,337,839,324 instructions # 2.59 insn per cycle
# 0.06 stalled cycles per insn (83.33%)
123,503,650,797 branches # 1201.123 M/sec (83.33%)
248,129,951 branch-misses # 0.20% of all branches (83.33%)
102.841191760 seconds time elapsed
102.428370000 seconds user
0.396016000 seconds sys
perf stat patch:
full_n16_tree_sequential_delete<unodb::db>/4096 30.9 us 30.7 us 22801 items_per_second=8.95834M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096 30.9 us 30.7 us 22801 items_per_second=8.95809M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096 30.9 us 30.7 us 22801 items_per_second=8.96869M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096 30.9 us 30.7 us 22801 items_per_second=8.95669M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096 30.9 us 30.7 us 22801 items_per_second=8.96984M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096 30.9 us 30.7 us 22801 items_per_second=8.95272M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096 30.8 us 30.6 us 22801 items_per_second=8.98166M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096 30.9 us 30.7 us 22801 items_per_second=8.95326M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096 30.9 us 30.7 us 22801 items_per_second=8.96822M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096_mean 30.9 us 30.7 us 9 items_per_second=8.96306M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096_median 30.9 us 30.7 us 9 items_per_second=8.95834M/s size=494.813k
full_n16_tree_sequential_delete<unodb::db>/4096_stddev 0.033 us 0.033 us 9 items_per_second=9.62308k/s size=0
Performance counter stats for './micro_benchmark_n16 --benchmark_filter=full_n16_tree_sequential_delete<unodb::db>/4096 --benchmark_repetitions=9':
123,198.12 msec task-clock # 1.000 CPUs utilized
136 context-switches # 0.001 K/sec
0 cpu-migrations # 0.000 K/sec
322 page-faults # 0.003 K/sec
308,356,675,493 cycles # 2.503 GHz (83.33%)
53,873,212,738 stalled-cycles-frontend # 17.47% frontend cycles idle (83.33%)
19,495,916,458 stalled-cycles-backend # 6.32% backend cycles idle (66.67%)
786,030,393,859 instructions # 2.55 insn per cycle
# 0.07 stalled cycles per insn (83.33%)
145,659,967,219 branches # 1182.323 M/sec (83.33%)
274,349,767 branch-misses # 0.19% of all branches (83.33%)
123.215514943 seconds time elapsed
122.822952000 seconds user
0.376021000 seconds sys