Effect of removing branching on children_count in N4 insert_position_search - laurynas-biveinis/unodb GitHub Wiki
Benchmark micro_benchmark_n4 --benchmark_filter='(insert|delete).*unodb::db'
shows variance of 5% slowdown to 3% speedup. Disassembly confirms that comparison is removed, together with a lot of code reshuffling.
Perf stat for the same, baseline:
Performance counter stats for './micro_benchmark_n4 --benchmark_filter=(insert|delete).*unodb::db':
64,525.63 msec task-clock # 1.000 CPUs utilized
82 context-switches # 0.001 K/sec
0 cpu-migrations # 0.000 K/sec
760,108 page-faults # 0.012 M/sec
161,483,904,044 cycles # 2.503 GHz (83.33%)
55,131,401,371 stalled-cycles-frontend # 34.14% frontend cycles idle (83.33%)
27,545,377,239 stalled-cycles-backend # 17.06% backend cycles idle (66.67%)
320,310,195,225 instructions # 1.98 insn per cycle
# 0.17 stalled cycles per insn (83.34%)
58,357,417,306 branches # 904.407 M/sec (83.34%)
354,373,579 branch-misses # 0.61% of all branches (83.33%)
64.542931348 seconds time elapsed
61.789857000 seconds user
2.736259000 seconds sys
Patch:
Performance counter stats for './micro_benchmark_n4 --benchmark_filter=(insert|delete).*unodb::db':
64,861.68 msec task-clock # 1.000 CPUs utilized
102 context-switches # 0.002 K/sec
0 cpu-migrations # 0.000 K/sec
623,909 page-faults # 0.010 M/sec
162,326,540,442 cycles # 2.503 GHz (83.33%)
54,952,251,076 stalled-cycles-frontend # 33.85% frontend cycles idle (83.33%)
27,718,666,852 stalled-cycles-backend # 17.08% backend cycles idle (66.67%)
319,216,689,125 instructions # 1.97 insn per cycle
# 0.17 stalled cycles per insn (83.34%)
58,015,343,373 branches # 894.447 M/sec (83.34%)
362,515,162 branch-misses # 0.62% of all branches (83.33%)
64.879208397 seconds time elapsed
62.094087000 seconds user
2.768271000 seconds sys