Effect of SIMD'ing Node4 insert position search - laurynas-biveinis/unodb GitHub Wiki

Branch 1: baseline commit, patch.

  • micro_benchmark_n4 --benchmark_filter=insert.*unodb::db: 3% slowdown to 3% speedup
  • micro_benchmark_n16 --benchmark_filter=grow.*unodb::db: 3% to 0% slowdown

Branch 2 (_mm_cmple_epu8 with exact clamp): baseline commit, patch.

  • micro_benchmark_n4 --benchmark_filter=insert.*unodb::db: 1% to 5% speedup
  • micro_benchmark_n16 --benchmark_filter=grow.*unodb::db: 2% slowdown to 16% speedup
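
For readers without the patch at hand, the general idea behind this branch can be sketched as follows. This is a minimal illustration and not the code from the patch. The function name `node4_insert_pos`, the 4-byte key array layout, and the reading of "exact clamp" as masking off lanes past the current child count are all assumptions. SSE2 has no unsigned byte less-or-equal compare instruction, so the sketch emulates it with `_mm_min_epu8` and `_mm_cmpeq_epi8`. It also assumes GCC or Clang for `__builtin_popcount`, and falls back to a scalar loop when SSE2 is unavailable.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical helper, not taken from unodb: returns the index at which
// new_key would be inserted among the first `count` sorted key bytes.
// Assumes new_key is not already present, so "<=" and "<" agree.
inline unsigned node4_insert_pos(const std::uint8_t (&keys)[4], unsigned count,
                                 std::uint8_t new_key) {
  assert(count <= 4);
#if defined(__SSE2__)
  // Pack the four key bytes into the low 32 bits of an XMM register.
  std::uint32_t packed;
  std::memcpy(&packed, keys, sizeof packed);
  const __m128i k = _mm_cvtsi32_si128(static_cast<int>(packed));
  const __m128i n = _mm_set1_epi8(static_cast<char>(new_key));
  // Unsigned "k <= n" per byte lane, emulated as min(k, n) == k.
  const __m128i le = _mm_cmpeq_epi8(_mm_min_epu8(k, n), k);
  const unsigned mask = static_cast<unsigned>(_mm_movemask_epi8(le));
  // Clamp: keep only the bits of lanes that hold live keys.
  const unsigned live = mask & ((1u << count) - 1u);
  // The number of live keys <= new_key is the insert position.
  return static_cast<unsigned>(__builtin_popcount(live));
#else
  // Scalar fallback for targets without SSE2.
  unsigned pos = 0;
  for (unsigned i = 0; i < count; ++i) pos += (keys[i] <= new_key) ? 1u : 0u;
  return pos;
#endif
}
```

With keys `{10, 20, 30}` and a new key of 25, the function returns 2. Because the compare is unsigned, key bytes of 128 and above order correctly.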

Branch 3 (@justinasvd version with __builtin_ctzl): baseline commit, patch.

  • micro_benchmark_n4 --benchmark_filter=insert.*unodb::db: 2% slowdown to 4% speedup
  • micro_benchmark_n16 --benchmark_filter=grow.*unodb::db: 4% to 0% slowdown
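
One way a count-trailing-zeros builtin can replace the popcount in such a search is sketched below. It is an assumption about the general technique, not a reproduction of the @justinasvd patch, and the name `node4_insert_pos_ctz` is hypothetical. The sketch relies on the keys being sorted: the first lane whose key is greater than the new key is the insert position, and a sentinel bit at index `count` covers the case where no live key is greater. It uses `__builtin_ctz` on an `unsigned` mask rather than `__builtin_ctzl`, assumes GCC or Clang, and falls back to a scalar loop without SSE2.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical helper, not taken from unodb: insert position of new_key
// among the first `count` sorted key bytes, found via count-trailing-zeros.
inline unsigned node4_insert_pos_ctz(const std::uint8_t (&keys)[4],
                                     unsigned count, std::uint8_t new_key) {
  assert(count <= 4);
#if defined(__SSE2__)
  std::uint32_t packed;
  std::memcpy(&packed, keys, sizeof packed);
  const __m128i k = _mm_cvtsi32_si128(static_cast<int>(packed));
  const __m128i n = _mm_set1_epi8(static_cast<char>(new_key));
  // Unsigned "k <= n" per byte lane, emulated as min(k, n) == k.
  const __m128i le = _mm_cmpeq_epi8(_mm_min_epu8(k, n), k);
  // Invert to get "key > new_key" for the four key lanes.
  const unsigned gt = ~static_cast<unsigned>(_mm_movemask_epi8(le)) & 0xFu;
  // Sentinel bit at index `count`: stale lanes past it can never win, and
  // the argument to ctz is guaranteed to be nonzero.
  const unsigned with_sentinel = gt | (1u << count);
  return static_cast<unsigned>(__builtin_ctz(with_sentinel));
#else
  // Scalar fallback for targets without SSE2.
  unsigned pos = 0;
  while (pos < count && keys[pos] <= new_key) ++pos;
  return pos;
#endif
}
```

Compared with a popcount over a clamped mask, this variant swaps the mask-and-count step for an OR with a sentinel followed by a bit scan. The benchmark figures above are the only evidence here for which is faster.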

Branch 4 (jvd horizontal sum): baseline commit, patch.

  • micro_benchmark_n4 --benchmark_filter=insert.*unodb::db: 1% slowdown to 3% speedup
  • micro_benchmark_n16 --benchmark_filter=grow.*unodb::db: 4% slowdown to 16% speedup

Branch 4 vs Branch 2:

  • micro_benchmark_n4 --benchmark_filter=insert.*unodb::db: 2% slowdown to 7% speedup
  • micro_benchmark_n16 --benchmark_filter=grow.*unodb::db: 1% slowdown to 1% speedup

All of the above branches target the Sandy Bridge instruction set level. Once support for later CPUs is added, check the jvd bextr code (bextr is a BMI1 instruction, which Sandy Bridge does not have).