Effect of fully inlining try_read_unlock - laurynas-biveinis/unodb GitHub Wiki
In the cold OLC algorithm paths, GCC 11 emits a call to try_read_unlock instead of inlining it, even though the function should compile down to a single instruction, so inlining it would also save code size. Try forcing the inlining with __attribute__((always_inline, flatten)).
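As a rough illustration of the change, here is a minimal sketch of what forcing the inlining could look like. The class optimistic_lock_sketch, its plain version counter, and the helper write_bump are hypothetical stand-ins, not unodb's actual code; only the try_read_unlock name and the attribute come from the text above. always_inline asks GCC to inline the function into every caller, while flatten asks it to inline the calls made inside the function body.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical, simplified optimistic lock: a bare version counter.
// This is an illustration only, not unodb's real lock implementation.
class optimistic_lock_sketch {
 public:
  using version_type = std::uint64_t;

  // Reader entry: take a snapshot of the current version.
  // (A real lock would also wait while a writer holds the lock.)
  version_type read_lock() const noexcept {
    return version.load(std::memory_order_acquire);
  }

  // Reader exit: succeeds only if no writer changed the version since
  // read_lock(). The attribute forces inlining at every call site
  // (always_inline) and of every call inside this body (flatten).
  __attribute__((always_inline, flatten)) bool try_read_unlock(
      version_type locked_version) const noexcept {
    std::atomic_thread_fence(std::memory_order_acquire);
    return version.load(std::memory_order_relaxed) == locked_version;
  }

  // Hypothetical writer step: publish a new version, invalidating readers.
  void write_bump() noexcept {
    version.fetch_add(2, std::memory_order_release);
  }

 private:
  std::atomic<version_type> version{0};
};
```

One way to check whether the attribute took effect is to look for any remaining call instructions targeting try_read_unlock in the disassembly of the cold paths, for example with objdump -d.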
- micro_benchmark_key_prefix: 2% slowdown (unpredictable_cut_key_prefix) to 2% speedup (unpredictable_get_shared_length)
- micro_benchmark_n4: 2% slowdown (n4_full_scan/100) to 2% speedup (full_n4_to_minimal_sequential_delete/512)
- micro_benchmark_n16: 3% slowdown (shrink_n48_to_n16_randomly/16383) to 6% speedup (n16_random_add/4096)
- micro_benchmark_n48: 2% slowdown (full_n48_tree_sequential_delete/4096) to 8% speedup (n48_random_add/4096)
- micro_benchmark_n256: 4% slowdown (full_n256_tree_sequential_delete/4096) to 4% speedup (grow_n48_to_n256_sequentially/2048)