# Weekly Devblog: 2025-04-20
This week’s update is entirely focused on the continued development and performance benchmarking of the `lustre_voxel` library. The priority was bringing block caching performance into sub-nanosecond territory while adhering strictly to Data-Oriented Design (DOD) and Data-Oriented Programming (DOP) principles.
A major focus was comparative benchmarking between sequential and parallel block registry access for cache efficiency.
Three test passes were conducted:

- **Initial Reference (Old Parallel):**
  - 85 blocks, 17.239 µs total average = 202.81 ns/block
- **New Sequential Test (4096 blocks):**
  - 3.0044 µs total average = ~0.734 ns/block
  - ✅ Sub-nanosecond goal achieved
- **New Parallel Test (4096 blocks):**
  - 53.787 µs total average = ~13.13 ns/block
The results confirmed that:
- Sequential access is significantly more efficient for tight inner loops, with ~17x better per-block performance than the parallel path.
- Parallel access is only beneficial for high-cost operations across multiple chunks, such as world generation, terrain meshing, or IO-bound streaming.
These findings will inform chunk processing strategies in `lustre_voxel`, where sequential access will be used for intra-chunk systems and parallelism will be reserved for larger spatial operations.
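The trade-off above can be reproduced with a minimal std-only micro-benchmark sketch. Everything here is hypothetical (`sequential_sum`, `parallel_sum`, the 4096-block flat chunk), not the actual `lustre_voxel` benchmark harness; it only illustrates why thread overhead swamps a tight per-block loop:

```rust
use std::thread;
use std::time::Instant;

// Hypothetical flat chunk: 4096 block IDs (e.g. a 16x16x16 chunk).
const CHUNK_VOLUME: usize = 4096;

fn sequential_sum(blocks: &[u16]) -> u64 {
    // Tight inner loop: the hardware prefetcher streams the Vec<u16> linearly.
    blocks.iter().map(|&b| b as u64).sum()
}

fn parallel_sum(blocks: &[u16], threads: usize) -> u64 {
    // Coarse-grained split across scoped threads; spawn/join overhead only
    // pays off when per-item work is heavy (assumes threads <= blocks.len()).
    let chunk = blocks.len() / threads;
    thread::scope(|s| {
        let handles: Vec<_> = blocks
            .chunks(chunk)
            .map(|slice| s.spawn(move || slice.iter().map(|&b| b as u64).sum::<u64>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let blocks: Vec<u16> = (0..CHUNK_VOLUME as u16).collect();

    let t = Instant::now();
    let seq = sequential_sum(&blocks);
    let seq_time = t.elapsed();

    let t = Instant::now();
    let par = parallel_sum(&blocks, 4);
    let par_time = t.elapsed();

    assert_eq!(seq, par);
    println!("sequential: {:?}, parallel: {:?}", seq_time, par_time);
}
```

For per-block work this cheap, the sequential path typically wins by a wide margin, matching the ~17x gap measured above.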
The `blocks`, `chunks`, and `data` submodules were repeatedly refactored this week to align with DOD principles:
- ✅ Block types are now stored as `Vec<u16>` for cache-friendly access
- ✅ Metadata is sparsely packed with bit-level precision, minimizing overhead for blocks like `SEA_LEVEL_AIR`
- ✅ Block registries use static lookup tables for property access, preventing dynamic allocation during runtime
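The flat-`Vec<u16>` storage plus static-table lookup pattern can be sketched as follows. The `BlockProps` record, the table contents, and the `index` layout are illustrative assumptions, not the real `lustre_voxel` types:

```rust
/// Hypothetical per-block property record; the real lustre_voxel layout may differ.
#[derive(Clone, Copy)]
struct BlockProps {
    solid: bool,
    emits_light: bool,
}

// Static lookup table: property access is a plain array index, with no heap
// allocation and no hashing at runtime. (Illustrative 3-entry table.)
static BLOCK_PROPS: [BlockProps; 3] = [
    BlockProps { solid: false, emits_light: false }, // 0: air
    BlockProps { solid: true,  emits_light: false }, // 1: stone
    BlockProps { solid: true,  emits_light: true  }, // 2: lava
];

// Chunk storage as a flat Vec<u16> of block IDs; (x, y, z) maps to a linear index.
const SIDE: usize = 16;

fn index(x: usize, y: usize, z: usize) -> usize {
    (y * SIDE + z) * SIDE + x
}

fn is_solid(blocks: &[u16], x: usize, y: usize, z: usize) -> bool {
    BLOCK_PROPS[blocks[index(x, y, z)] as usize].solid
}

fn main() {
    let mut blocks = vec![0u16; SIDE * SIDE * SIDE]; // all air
    blocks[index(1, 0, 2)] = 1; // place stone
    assert!(is_solid(&blocks, 1, 0, 2));
    assert!(!is_solid(&blocks, 0, 0, 0));
}
```

Keeping IDs in one contiguous `Vec<u16>` means a scan over block types touches two bytes per block and stays in cache, which is what makes the sub-nanosecond per-block figures plausible.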
The current prototype supports block IDs 0–85, which are exclusively used for world builder blocks:
- Terrain Blocks
- Atmosphere Blocks
- Resource Blocks
- Fluid Blocks
This block set is foundational for generating layered planetary environments (core → mantle → crust → ocean → atmosphere) with biome and sub-biome diversity.
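The layered structure (core → mantle → crust → ocean → atmosphere) can be expressed as a simple depth-to-layer rule. The thresholds below are made-up placeholders for illustration, not the actual generation rules:

```rust
/// Map a normalized radius r (0.0 = planet center, 1.0 = top of atmosphere)
/// to a planetary layer. Thresholds are hypothetical placeholders.
fn layer_for_depth(r: f32) -> &'static str {
    match r {
        r if r < 0.25 => "core",
        r if r < 0.55 => "mantle",
        r if r < 0.80 => "crust",
        r if r < 0.90 => "ocean",
        _ => "atmosphere",
    }
}

fn main() {
    assert_eq!(layer_for_depth(0.1), "core");
    assert_eq!(layer_for_depth(0.6), "crust");
    assert_eq!(layer_for_depth(0.95), "atmosphere");
}
```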
- All systems in `lustre_voxel` are being rewritten to align with DOD/DOP.
- Profiling data is being used to enforce a hard performance ceiling of <1 ns/block access in critical systems.
- A micro-benchmark was completed this week to evaluate sequential (SoA-like) vs parallel (AoS-like) access for block cache lookups.
- A full ECS-scale simulation at biome scale (394³ chunks, ~250B blocks) is still being planned to evaluate SoA vs AoS at the system level.
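For readers unfamiliar with the SoA vs AoS distinction being evaluated, a minimal sketch (hypothetical field names, not the planned `lustre_voxel` types):

```rust
// AoS: one struct per block, so fields interleave in memory.
#[derive(Clone, Copy)]
struct BlockAos {
    id: u16,
    light: u8,
    damage: u8,
}

// SoA: one contiguous array per field, so a scan over `ids` touches only IDs.
struct BlocksSoa {
    ids: Vec<u16>,
    light: Vec<u8>,
    damage: Vec<u8>,
}

fn count_air_aos(blocks: &[BlockAos]) -> usize {
    // Streams 4 bytes per block even though only `id` is needed.
    blocks.iter().filter(|b| b.id == 0).count()
}

fn count_air_soa(blocks: &BlocksSoa) -> usize {
    // Streams 2 bytes per block: better cache-line utilization for this query.
    blocks.ids.iter().filter(|&&id| id == 0).count()
}

fn main() {
    let aos = vec![
        BlockAos { id: 0, light: 0, damage: 0 },
        BlockAos { id: 1, light: 0, damage: 0 },
    ];
    let soa = BlocksSoa { ids: vec![0, 1], light: vec![0, 0], damage: vec![0, 0] };
    assert_eq!(count_air_aos(&aos), 1);
    assert_eq!(count_air_soa(&soa), 1);
}
```

The planned biome-scale simulation should show whether this per-query bandwidth difference holds up at the system level.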
- **Integrate DOD-Compliant Block Registry**
  - Finalize `BlockTypeRegistry` for static lookup tables
  - Implement fast-path cache hints for hot blocks (terrain, stone, air)
- **Begin Work on `biomes` Submodule**
  - Expand biome types and define sub-biomes (e.g., coastal, alpine, volcanic)
  - Integrate biome rulesets for terrain, fluid, resource, and atmosphere layering
  - Optimize data layout for deterministic access and cache locality
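One way biome rulesets could reuse the static-lookup pattern from the block registry is a flat table indexed by biome ID. Everything here (the `BiomeRules` fields, the biome and block IDs) is a hypothetical sketch, not the planned `biomes` API:

```rust
/// Hypothetical biome ruleset: which block ID fills each layer in a biome.
#[derive(Clone, Copy)]
struct BiomeRules {
    surface: u16,
    subsurface: u16,
    fluid: u16,
}

// Static table indexed by biome ID, matching the block registry's
// allocation-free lookup pattern. Block IDs are placeholders.
static BIOME_RULES: [BiomeRules; 2] = [
    BiomeRules { surface: 2, subsurface: 3, fluid: 9 }, // 0: coastal (sand, stone, water)
    BiomeRules { surface: 4, subsurface: 3, fluid: 0 }, // 1: alpine (snow, stone, none)
];

fn surface_block(biome: u8) -> u16 {
    BIOME_RULES[biome as usize].surface
}

fn main() {
    assert_eq!(surface_block(0), 2);
    assert_eq!(surface_block(1), 4);
}
```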
- **Begin Work on `generators` Submodule**
  - Refine procedural generation logic for planetary and lunar chunks
  - Implement biome-aware generation paths using precomputed noise fields
  - Benchmark sequential vs parallel generation to identify optimal workload strategy
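The precomputed-noise-field idea can be sketched as computing a per-column height field once per chunk, then reusing it for every y-level. The hash-based "noise" below is a deterministic toy stand-in; a real implementation would use a proper noise function, and none of these names come from `lustre_voxel`:

```rust
/// Toy deterministic hash "noise" in [0, 1]; a stand-in for real simplex/perlin noise.
fn hash_noise(x: i32, z: i32, seed: u32) -> f32 {
    let mut h = (x as u32).wrapping_mul(0x9E37_79B9)
        ^ (z as u32).wrapping_mul(0x85EB_CA6B)
        ^ seed;
    h ^= h >> 16;
    h = h.wrapping_mul(0x7FEB_352D);
    h ^= h >> 15;
    (h & 0xFFFF) as f32 / 65535.0
}

/// Precompute a 16x16 height field once per chunk column, then reuse it for
/// every y in the chunk instead of re-sampling noise per block.
fn height_field(cx: i32, cz: i32, seed: u32) -> [[f32; 16]; 16] {
    let mut field = [[0.0f32; 16]; 16];
    for (dz, row) in field.iter_mut().enumerate() {
        for (dx, h) in row.iter_mut().enumerate() {
            *h = hash_noise(cx * 16 + dx as i32, cz * 16 + dz as i32, seed);
        }
    }
    field
}

fn main() {
    // Deterministic: the same chunk coordinates and seed always reproduce the field.
    assert_eq!(height_field(0, 0, 42), height_field(0, 0, 42));
}
```

Precomputing per-column fields keeps the hot per-block generation loop sequential and branch-light, consistent with the benchmarking results above.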
- **Begin Work on `mesh` Submodule**
  - Establish baseline greedy mesher for surface mesh extraction
  - Prepare chunk meshing pipeline for integration with `lustre_render`
  - Validate meshing performance against block cache access benchmarks
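A greedy mesher merges adjacent same-type faces into larger quads. Its 1D core, merging runs of equal block IDs along a row before rows are merged into quads, can be sketched like this (hypothetical function and ID convention, not the planned `mesh` API):

```rust
/// Merge consecutive equal block IDs in a row into (id, start, len) runs.
/// This is the 1D core of greedy meshing; a full mesher then merges
/// compatible runs across rows into rectangular quads.
fn merge_runs(row: &[u16]) -> Vec<(u16, usize, usize)> {
    let mut runs = Vec::new();
    let mut i = 0;
    while i < row.len() {
        let id = row[i];
        let start = i;
        while i < row.len() && row[i] == id {
            i += 1;
        }
        if id != 0 {
            // Skip air (assumed ID 0): it produces no faces.
            runs.push((id, start, i - start));
        }
    }
    runs
}

fn main() {
    let row = [0u16, 1, 1, 1, 2, 2, 0, 1];
    assert_eq!(merge_runs(&row), vec![(1, 1, 3), (2, 4, 2), (1, 7, 1)]);
}
```

Note that this pass is itself a tight sequential scan over a flat slice, so it benefits directly from the `Vec<u16>` layout validated by the cache benchmarks.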
Stay tuned for next week as we continue focusing on performant, optimized code early in development. We'll be conducting a full ECS-scale simulation at biome scale (394³ chunks, ~250B blocks) to evaluate SoA vs AoS at the system level.