Effect of fully inlining try_read_unlock - laurynas-biveinis/unodb GitHub Wiki

In the cold OLC algorithm paths, GCC 11 emits a call to try_read_unlock instead of inlining it, even though the function should compile down to a single instruction, in which case inlining it would also save code size. This experiment forces inlining with __attribute__((always_inline, flatten)).
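As a rough illustration of where the attribute goes, here is a minimal sketch. The class optimistic_lock_sketch, its version_ field, and the write_bump helper are hypothetical stand-ins invented for this example, not unodb's actual lock implementation; only the name try_read_unlock and the attribute list come from the text above.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for an optimistic lock; NOT unodb's actual class.
class optimistic_lock_sketch {
 public:
  // Snapshot the version word so a reader can validate against it later.
  std::uint64_t read_lock() const noexcept {
    return version_.load(std::memory_order_acquire);
  }

  // always_inline asks GCC to inline this function at every call site,
  // including call sites it considers cold. flatten additionally asks GCC
  // to inline the calls made inside this function's body where possible.
  __attribute__((always_inline, flatten)) bool try_read_unlock(
      std::uint64_t locked_version) const noexcept {
    // Order the reader's preceding loads before the version re-check.
    std::atomic_thread_fence(std::memory_order_acquire);
    return version_.load(std::memory_order_relaxed) == locked_version;
  }

  // Hypothetical helper: simulate a writer that changed the protected data.
  void write_bump() noexcept {
    version_.fetch_add(2, std::memory_order_release);
  }

 private:
  std::atomic<std::uint64_t> version_{0};
};
```

In this sketch a reader calls read_lock(), reads the protected data, and then calls try_read_unlock() with the saved version; a false result means a writer intervened and the read must be retried. Whether the attribute pays off in practice is what the measurements below try to answer.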

baseline commit, patch

  • micro_benchmark_key_prefix: 2% slowdown (unpredictable_cut_key_prefix) to 2% speedup (unpredictable_get_shared_length)
  • micro_benchmark_n4: 2% slowdown (n4_full_scan/100) to 2% speedup (full_n4_to_minimal_sequential_delete/512)
  • micro_benchmark_n16: 3% slowdown (shrink_n48_to_n16_randomly/16383) to 6% speedup (n16_random_add/4096)
  • micro_benchmark_n48: 2% slowdown (full_n48_tree_sequential_delete/4096) to 8% speedup (n48_random_add/4096)
  • micro_benchmark_n256: 4% slowdown (full_n256_tree_sequential_delete/4096) to 4% speedup (grow_n48_to_n256_sequentially/2048)