Effect of tagged node pointers - laurynas-biveinis/unodb GitHub Wiki

baseline commit, patch

Take 1

micro_benchmark_key_prefix: 2% slowdown (unpredictable_get_shared_length) to 15% speedup (unpredictable_leaf_key_prefix_split)

unodb::db:

  • micro_benchmark_n4: 23% slowdown (shrink_node16_to_n4_randomly<unodb::db>/16383) to 30% speedup (full_n4_sequential_insert<unodb::db>/65535)
  • micro_benchmark_n16: 335% slowdown (full_n16_tree_random_delete<unodb::db>/246000) to 15% speedup (n16_random_add<unodb::db>/10)
  • micro_benchmark_n48: 115% slowdown (full_n48_tree_random_delete<unodb::db>/196608) to 24% speedup (full_n48_tree_random_delete<unodb::db>/512)
  • micro_benchmark_n256: 102% slowdown (full_n256_tree_sequential_delete<unodb::db>/196608) to 25% speedup (full_n256_tree_random_delete<unodb::db>/512)

unodb::olc_db:

  • micro_benchmark_n4: 23% slowdown (n4_full_scan<unodb::olc_db>/512) to 18% speedup (minimal_n4_random_insert<unodb::db>/64)
  • micro_benchmark_n16: 22% slowdown (full_n16_tree_full_scan<unodb::olc_db>/512) to 13% speedup (grow_n4_to_n16_randomly<unodb::olc_db>/64)
  • micro_benchmark_n48: 11% slowdown (grow_n16_to_n48_sequentially<unodb::olc_db>/4096) to 4% speedup (full_n48_tree_sequential_delete<unodb::olc_db>/196608)
  • micro_benchmark_n256: 4% slowdown (minimal_n256_tree_full_scan<unodb::olc_db>/4096) to 4% speedup (grow_n48_to_n256_randomly<unodb::olc_db>/512)

Take 2

perf report on the biggest regression (335% slowdown on full_n16_tree_random_delete<unodb::db>/246000) shows significant differences in glibc allocation code paths. As the branches are not orthogonal to the allocator due to tagged pointer nodes being slightly smaller, the allocator starts to matter. Let's try with LD_PRELOAD-ed JEMalloc, same code

micro_benchmark_key_prefix: 3% slowdown (unpredictable_get_shared_length) to 15% speedup (unpredictable_cut_key_prefix)

unodb::db:

  • micro_benchmark_n4: 4% slowdown (n4_full_scan<unodb::db>/4096) to 23% speedup (full_n4_to_minimal_sequential_delete<unodb::db>/100)
  • micro_benchmark_n16: 3% slowdown (full_n16_tree_full_scan<unodb::db>/512) to 14% speedup (n16_random_add<unodb::db>/16383)
  • micro_benchmark_n48: 4% slowdown (grow_n16_to_n48_sequentially<unodb::db>/512) to 27% speedup (full_n48_tree_random_delete<unodb::db>/192)
  • micro_benchmark_n256: 6% slowdown (minimal_n256_tree_full_scan<unodb::db>/512) to 32% speedup (full_n256_tree_random_delete<unodb::db>/192)

unodb::olc_db:

  • micro_benchmark_n4: 17% slowdown (minimal_n4_random_insert<unodb::olc_db>/16) to 9% speedup (shrink_node16_to_n4_randomly<unodb::olc_db>/64)
  • micro_benchmark_n16: 2% slowdown (n16_sequential_add<unodb::olc_db>/4096) to 9% speedup (full_n16_tree_sequential_delete<unodb::olc_db>/64)
  • micro_benchmark_n48: 3% slowdown (n48_random_add<unodb::olc_db>/2) to 8% speedup (shrink_n256_to_n48_randomly<unodb::olc_db>/64)
  • micro_benchmark_n256: 8% slowdown (grow_n48_to_n256_sequentially<unodb::olc_db>/8) to 3% speedup (grow_n48_to_n256_sequentially<unodb::olc_db>/64)

Tree sizes

  • N4 unodb::db: 8.06248M to 7.99998M
  • N4 unodb::olc_db: 9.33331M to 8.66665M
  • N16 unodb::db: 37.6469M to 37.347M
  • N16 unodb::olc_db: 42.5216M to 40.1217M
  • N48 unodb::db: 25.3126M to 25.1141M
  • N48 unodb::olc_db: 28.3604M to 26.7724M
  • N256 unodb::db: 30.9042M to 30.6493M
  • N256 unodb::olc_db: 34.7363M to 32.6969M
⚠️ **GitHub.com Fallback** ⚠️