Effect of SIMD'ing Node4 insert position search - laurynas-biveinis/unodb GitHub Wiki
Branch 1: baseline commit, patch.
- `micro_benchmark_n4 --benchmark_filter=insert.*unodb::db`: 3% speedup to 3% slowdown
- `micro_benchmark_n16 --benchmark_filter=grow.*unodb::db`: 3% to 0% slowdown
Branch 2 (`_mm_cmple_epu8` with exact clamp): baseline commit, patch.
- `micro_benchmark_n4 --benchmark_filter=insert.*unodb::db`: 1% to 5% speedup
- `micro_benchmark_n16 --benchmark_filter=grow.*unodb::db`: 2% slowdown to 16% speedup
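For orientation, here is a minimal sketch of what a less-or-equal based Node4 insert position search could look like at the SSE2 level. It is not the code from the patch: the helper names `cmple_epu8_sketch` and `insert_pos_ctz`, the 4-byte key array layout, and the way the result is clamped to the key count are all assumptions made for illustration. SSE2 has no unsigned byte comparison, so the sketch builds one from `_mm_min_epu8` and `_mm_cmpeq_epi8`. `__builtin_ctz` assumes GCC or Clang.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: per-byte unsigned a <= b. min(a, b) == a holds
// exactly when a <= b, so the result lane is 0xFF there and 0x00 elsewhere.
inline __m128i cmple_epu8_sketch(__m128i a, __m128i b) {
  return _mm_cmpeq_epi8(_mm_min_epu8(a, b), a);
}

// Hypothetical function: index at which `key` would be inserted into the
// sorted keys[0..count), count <= 4. Slots at index >= count may hold
// anything.
inline unsigned insert_pos_ctz(const std::uint8_t (&keys)[4], unsigned count,
                               std::uint8_t key) {
  assert(count <= 4);
  std::uint32_t packed;
  std::memcpy(&packed, keys, sizeof packed);
  // Load the four keys into the low four lanes; the other lanes are zero.
  const __m128i node_keys = _mm_cvtsi32_si128(static_cast<int>(packed));
  const __m128i search = _mm_set1_epi8(static_cast<char>(key));
  // Lane i is 0xFF where key <= keys[i].
  const __m128i le = cmple_epu8_sketch(search, node_keys);
  // Bit i is set where key <= keys[i]. The extra bit at position `count`
  // clamps the result so that unused slots can never push it past `count`.
  const unsigned mask =
      static_cast<unsigned>(_mm_movemask_epi8(le)) | (1U << count);
  // The lowest set bit is the first key that is not less than `key`.
  return static_cast<unsigned>(__builtin_ctz(mask));
}
```

Because the clamp bit is always set, the mask is never zero and `__builtin_ctz` is always well defined, including for an empty node.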
Branch 3 (@justinasvd version with `__builtin_ctzl`): baseline commit, patch.
- `micro_benchmark_n4 --benchmark_filter=insert.*unodb::db`: 2% slowdown to 4% speedup
- `micro_benchmark_n16 --benchmark_filter=grow.*unodb::db`: 4% to 0% slowdown
Branch 4 (jvd horizontal sum): baseline commit, patch.
- `micro_benchmark_n4 --benchmark_filter=insert.*unodb::db`: 1% slowdown to 3% speedup
- `micro_benchmark_n16 --benchmark_filter=grow.*unodb::db`: 4% slowdown to 16% speedup
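One way to read "horizontal sum" is to count the keys that are smaller than the new key, since that count is the insert position in a sorted node. The sketch below does this with SSE2 only, using `_mm_sad_epu8` against zero to add up per-lane 0/1 flags. This is a guess at the general idea, not the code from the patch: the function name `insert_pos_hsum`, the 4-byte key array layout, and the lane masking are assumptions, and the byte order of the mask assumes a little-endian target such as x86.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical function: index at which `key` would be inserted into the
// sorted keys[0..count), count <= 4, computed as the number of valid keys
// that are strictly less than `key`.
inline unsigned insert_pos_hsum(const std::uint8_t (&keys)[4], unsigned count,
                                std::uint8_t key) {
  assert(count <= 4);
  std::uint32_t packed;
  std::memcpy(&packed, keys, sizeof packed);
  const __m128i node_keys = _mm_cvtsi32_si128(static_cast<int>(packed));
  const __m128i search = _mm_set1_epi8(static_cast<char>(key));
  // Lane i is 0xFF where keys[i] >= key (unsigned), via max(keys, key) == keys.
  const __m128i ge =
      _mm_cmpeq_epi8(_mm_max_epu8(node_keys, search), node_keys);
  // A byte value of 1 in each of the first `count` lanes, 0 everywhere else.
  const std::uint32_t ones = static_cast<std::uint32_t>(
      0x01010101ULL & ((1ULL << (8 * count)) - 1));
  const __m128i valid_ones = _mm_cvtsi32_si128(static_cast<int>(ones));
  // ~ge & valid_ones: 1 in each valid lane where keys[i] < key.
  const __m128i lt_ones = _mm_andnot_si128(ge, valid_ones);
  // Sum of absolute differences against zero adds the bytes of each
  // 8-byte half; the low half holds the count we want.
  const __m128i sum = _mm_sad_epu8(lt_ones, _mm_setzero_si128());
  return static_cast<unsigned>(_mm_cvtsi128_si32(sum));
}
```

Unlike a trailing-zero-count approach, this variant needs no bit scan, at the cost of an explicit mask for the unused lanes.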
Branch 4 vs Branch 2:
- `micro_benchmark_n4 --benchmark_filter=insert.*unodb::db`: 2% slowdown to 7% speedup
- `micro_benchmark_n16 --benchmark_filter=grow.*unodb::db`: 1% slowdown to 1% speedup
All of the above branches target the Sandy Bridge instruction set level. Once support for later CPUs is added, check jvd's bextr code (BEXTR is a BMI1 instruction, which Sandy Bridge lacks).