Meshlets - JuanDiegoMontoya/Frogfood GitHub Wiki

Frogfood renders all meshes using meshlets. A mesh is typically composed of several meshlets, each with a small, bounded number of vertices and primitives (Frogfood allows up to 64 of each). Although meshlets were originally conceived for mesh shaders, Frogfood uses them for finer-grained culling.

Scene Loading

The code for scene loading can be found in SceneLoader.cpp.

Frogfood only supports glTF scene files, which it loads using fastgltf. glTF loading has been parallelized where it's most beneficial: image decoding, vertex accessor loading and format conversion, and meshlet construction. To find these instances of parallelization, search for std::execution::par in SceneLoader.cpp.

glTF files with the following extensions are supported and will be rendered correctly by Frogfood:

Meshlet Generation

Meshlets are generated using meshoptimizer, which provides an API for building meshlets from a standard indexed mesh. The API is well explained in meshoptimizer's readme.
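To make the resulting data layout concrete, here is a minimal sketch of a meshlet mesh in memory. It uses a naive greedy builder written purely for illustration; it is not meshoptimizer's algorithm, which also optimizes for locality. The `Meshlet` and `MeshletMesh` structs and the `buildMeshletsNaive` function are hypothetical names, only loosely modeled on the kind of output a meshlet builder produces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout for illustration only.
struct Meshlet {
  uint32_t vertexOffset;    // first entry in MeshletMesh::vertices
  uint32_t primitiveOffset; // first entry in MeshletMesh::primitives
  uint32_t vertexCount;
  uint32_t primitiveCount;
};

struct MeshletMesh {
  std::vector<Meshlet> meshlets;
  std::vector<uint32_t> vertices;  // meshlet-local vertex -> index into the original vertex buffer
  std::vector<uint8_t> primitives; // three meshlet-local vertex indices per triangle
};

// Naive greedy builder: walks triangles in order and starts a new meshlet
// whenever the current one would exceed either limit.
inline MeshletMesh buildMeshletsNaive(const std::vector<uint32_t>& indices,
                                      uint32_t maxVertices = 64,
                                      uint32_t maxPrimitives = 64) {
  assert(indices.size() % 3 == 0);
  assert(maxVertices >= 3 && maxVertices <= 256 && maxPrimitives >= 1);
  MeshletMesh out;
  Meshlet current{0, 0, 0, 0};

  // Returns the meshlet-local index of a vertex, or vertexCount if absent.
  auto findLocal = [&](uint32_t globalIndex) -> uint32_t {
    for (uint32_t i = 0; i < current.vertexCount; ++i) {
      if (out.vertices[current.vertexOffset + i] == globalIndex) return i;
    }
    return current.vertexCount;
  };

  // Stores the current meshlet and opens an empty one after it.
  auto flush = [&] {
    if (current.primitiveCount == 0) return;
    out.meshlets.push_back(current);
    current = Meshlet{static_cast<uint32_t>(out.vertices.size()),
                      static_cast<uint32_t>(out.primitives.size()), 0, 0};
  };

  for (std::size_t t = 0; t < indices.size(); t += 3) {
    uint32_t newVertices = 0;
    for (std::size_t k = 0; k < 3; ++k) {
      if (findLocal(indices[t + k]) == current.vertexCount) ++newVertices;
    }
    if (current.vertexCount + newVertices > maxVertices ||
        current.primitiveCount + 1 > maxPrimitives) {
      flush();
    }
    for (std::size_t k = 0; k < 3; ++k) {
      const uint32_t local = findLocal(indices[t + k]);
      if (local == current.vertexCount) { // not in this meshlet yet
        out.vertices.push_back(indices[t + k]);
        ++current.vertexCount;
      }
      out.primitives.push_back(static_cast<uint8_t>(local));
    }
    ++current.primitiveCount;
  }
  flush();
  return out;
}
```

The point of the sketch is the double indirection: a standard mesh stores one global index per corner, whereas a meshlet mesh stores small local indices that go through a per-meshlet vertex list.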

TODO: Insert diagram or image illustrating the in-memory differences between meshlet meshes and standard meshes.

The original indexed mesh is retained for the debug forward renderer and for creating bottom-level acceleration structures for ray tracing.

Rendering

Because Frogfood originally used OpenGL for rendering, which had no cross-platform mesh shaders at the time, we decided to pursue an alternative approach that leverages the traditional vertex pipeline.

Culling

Culling is performed in two stages:

1. Meshlet Culling

CullMeshlets.comp.glsl

In this stage, each thread of a compute dispatch operates on a single meshlet in the contiguous meshlets buffer. The following tests are performed to determine if the meshlet is visible:

  1. Determine if the meshlet's AABB potentially intersects the frustum.
  2. Calculate the screen-space bounds of the meshlet and its depth, then:
    1. If the current view type is "main", test the bounds against the Hi-Z buffer, which is computed by repeatedly downsampling the previous frame's depth buffer with a reduction sampler.
    2. If the current view type is "virtual", test the bounds against the Hi-P buffer for the virtual shadow map currently being rendered. The Hi-P buffer contains a hierarchical representation of which pages are active.

If the meshlet is not culled by any of the above steps, its index gets appended to an array to be consumed in the next stage.
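As an illustration of the first test, the sketch below shows one common conservative AABB-versus-frustum check, written in C++ rather than GLSL. It assumes the frustum is given as six inward-facing planes in the same space as the box; the `Vec3`, `Plane`, and `Aabb` types and the `aabbMaybeInFrustum` function are hypothetical and may differ from what CullMeshlets.comp.glsl actually does.

```cpp
#include <array>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Plane { Vec3 normal; float d; };        // a point p is inside when dot(normal, p) + d >= 0
struct Aabb { Vec3 center; Vec3 halfExtent; }; // assumed center/half-extent representation

// Returns false only when the box lies entirely outside at least one plane.
// Boxes near frustum corners can be reported as visible even when they are
// not, which is why the result is only "potentially" visible.
inline bool aabbMaybeInFrustum(const Aabb& box, const std::array<Plane, 6>& planes) {
  for (const Plane& p : planes) {
    // Signed distance from the box center to the plane.
    const float dist = p.normal.x * box.center.x + p.normal.y * box.center.y +
                       p.normal.z * box.center.z + p.d;
    // Projection of the half-extent onto the plane normal.
    const float radius = std::fabs(p.normal.x) * box.halfExtent.x +
                         std::fabs(p.normal.y) * box.halfExtent.y +
                         std::fabs(p.normal.z) * box.halfExtent.z;
    if (dist < -radius) return false;
  }
  return true;
}
```

A box that straddles a plane is kept, so the test never culls visible geometry; it only fails to cull some invisible geometry.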

2. Triangle Culling

CullTriangles.comp.glsl

For each meshlet that passed culling, a workgroup is spawned wherein each thread culls a single primitive. Each triangle is subjected to the following tests:

  1. Back-facing primitives are culled by inspecting the sign of the determinant of the 3x3 matrix formed by the triangle's clip-space coordinates.
  2. Frustum culling is performed by checking whether the NDC-space bounding box of the triangle intersects the [-1, 1] box.
  3. Primitives that don't cover any sample points are culled.
  4. If the view type is "virtual", the triangle bounds are tested against the Hi-P buffer, as described in the meshlet culling section.
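The first three tests can be sketched on the CPU as follows. This is an illustrative C++ version under stated assumptions, not the contents of CullTriangles.comp.glsl: counter-clockwise triangles are treated as front-facing, all three vertices have w > 0, the NDC box test covers x and y only, and there is one sample per pixel at the pixel center. All type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>

struct Vec4 { float x, y, z, w; };                // clip-space position
struct Bounds2 { float minX, minY, maxX, maxY; }; // 2D bounding box

// Test 1: determinant of the matrix whose rows are (x, y, w) of each vertex.
inline float clipDeterminant(const Vec4& a, const Vec4& b, const Vec4& c) {
  return a.x * (b.y * c.w - b.w * c.y)
       - a.y * (b.x * c.w - b.w * c.x)
       + a.w * (b.x * c.y - b.y * c.x);
}

// Assumption: counter-clockwise is front-facing. Zero-area triangles are also rejected.
inline bool isBackFacing(const Vec4& a, const Vec4& b, const Vec4& c) {
  return clipDeterminant(a, b, c) <= 0.0f;
}

// NDC bounding box of the triangle. Assumes w > 0 for all three vertices.
inline Bounds2 ndcBounds(const Vec4& a, const Vec4& b, const Vec4& c) {
  const float ax = a.x / a.w, ay = a.y / a.w;
  const float bx = b.x / b.w, by = b.y / b.w;
  const float cx = c.x / c.w, cy = c.y / c.w;
  return {std::min({ax, bx, cx}), std::min({ay, by, cy}),
          std::max({ax, bx, cx}), std::max({ay, by, cy})};
}

// Test 2: the box misses [-1, 1] on at least one axis.
inline bool isOutsideNdcBox(const Bounds2& b) {
  return b.maxX < -1.0f || b.minX > 1.0f || b.maxY < -1.0f || b.minY > 1.0f;
}

// True when no pixel center (k + 0.5) lies in the closed interval [lo, hi].
inline bool missesPixelCenters(float lo, float hi) {
  return std::ceil(lo - 0.5f) > std::floor(hi - 0.5f);
}

// Test 3: the box slips between pixel centers on at least one axis.
inline bool coversNoSample(const Bounds2& ndc, float width, float height) {
  const float minX = (ndc.minX * 0.5f + 0.5f) * width;
  const float maxX = (ndc.maxX * 0.5f + 0.5f) * width;
  const float minY = (ndc.minY * 0.5f + 0.5f) * height;
  const float maxY = (ndc.maxY * 0.5f + 0.5f) * height;
  return missesPixelCenters(minX, maxX) || missesPixelCenters(minY, maxY);
}
```

With all w equal to 1, the determinant reduces to twice the signed area of the triangle, which is why its sign encodes the winding order.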

At the end of each workgroup, the indices of unculled triangles are written to a buffer and their count is stored in an indirect draw command.
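One common way to implement this kind of append is to reserve output space with a single atomic add per workgroup and then copy the surviving indices into the reserved range. The C++ sketch below emulates that on the CPU. The `appendSurvivors` and `makeDrawCommand` helpers are hypothetical, the struct mirrors the field layout of Vulkan's `VkDrawIndexedIndirectCommand`, and whether Frogfood's shader works exactly this way is an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same field order as VkDrawIndexedIndirectCommand.
struct DrawIndexedIndirectCommand {
  uint32_t indexCount;
  uint32_t instanceCount;
  uint32_t firstIndex;
  int32_t vertexOffset;
  uint32_t firstInstance;
};

// Emulates the end of one workgroup (one meshlet). `indexCount` stands in for
// the indexCount field of the indirect command living in GPU memory.
inline void appendSurvivors(const std::vector<uint32_t>& meshletTriangleIndices, // 3 per triangle
                            const std::vector<bool>& visible,                    // 1 per triangle
                            std::vector<uint32_t>& outIndices,                   // sized for the worst case
                            std::atomic<uint32_t>& indexCount) {
  assert(meshletTriangleIndices.size() == visible.size() * 3);
  uint32_t survivors = 0;
  for (bool v : visible) survivors += v ? 1u : 0u;

  // One atomic add reserves a contiguous range for the whole workgroup.
  const uint32_t base = indexCount.fetch_add(survivors * 3u);
  assert(base + survivors * 3u <= outIndices.size());

  uint32_t cursor = base;
  for (std::size_t t = 0; t < visible.size(); ++t) {
    if (!visible[t]) continue;
    for (std::size_t k = 0; k < 3; ++k) {
      outIndices[cursor++] = meshletTriangleIndices[3 * t + k];
    }
  }
}

// Builds the command that draws everything that survived culling.
inline DrawIndexedIndirectCommand makeDrawCommand(uint32_t indexCount) {
  return DrawIndexedIndirectCommand{indexCount, 1u, 0u, 0, 0u};
}
```

Because the draw reads its index count from the command itself, the CPU never needs to know how many triangles survived.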

The following resources were referenced for triangle culling:

Vertex Shader

As we only generate a stream of indices from the triangle culling pass, you may be wondering how we can identify individual meshlets in the vertex shader. The trick is to exploit what we know about meshlets. Because a meshlet contains at most 64 vertices, we need no more than log2(64) = 6 bits to identify a vertex within it. The other 26 bits of each 32-bit index identify the meshlet itself. That means the largest scene we can render contains 2^26 ≈ 67 million meshlets of up to 64 primitives each, for a total of about 4.3 billion triangles.
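The packing can be sketched in a few lines of C++. Placing the vertex ID in the low bits and the meshlet ID in the high bits is an assumption made for this example, and the constant and function names are hypothetical.

```cpp
#include <cstdint>

constexpr uint32_t kVertexBits = 6;                 // log2(64)
constexpr uint32_t kMeshletBits = 32 - kVertexBits; // remaining 26 bits
constexpr uint32_t kVertexMask = (1u << kVertexBits) - 1u;

// Assumed layout: [ meshlet ID : 26 bits | meshlet-local vertex : 6 bits ]
constexpr uint32_t packIndex(uint32_t meshletId, uint32_t localVertex) {
  return (meshletId << kVertexBits) | (localVertex & kVertexMask);
}

// What a vertex shader would do with the incoming index.
constexpr uint32_t meshletIdOf(uint32_t packed) { return packed >> kVertexBits; }
constexpr uint32_t localVertexOf(uint32_t packed) { return packed & kVertexMask; }

// The capacity figures from the text, checked at compile time.
static_assert((1u << kMeshletBits) == 67'108'864u, "2^26 meshlets");
static_assert((uint64_t{1} << kMeshletBits) * 64u == 4'294'967'296ull, "2^32 triangles");
```

Decoding costs one shift and one mask per vertex, so the scheme adds almost nothing to the vertex shader.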

A major advantage of hardware mesh shading is that it doesn't require intermediate storage and bandwidth for meshlet indices and vertex indices. Meshlet culling would be moved to a task shader, which spawns mesh shader workgroups for unculled meshlets. Then, each of those threads can use the transformed vertices for both culling and rasterization.

The render below features a subset of Unreal's City Sample (with no textures and less geometry). A 24-8 split, which supports up to 16 million meshlet instances, was used for vertex indices. Even with reduced geometry, there are over 34 million meshlet instances in the scene, overflowing the bits allocated in the vertex indices. Artifacts can be seen in the form of stray geometry.
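The overflow can be reproduced with plain integer arithmetic. The sketch below assumes the meshlet ID occupies the high 24 bits of a 32-bit index; the function name is hypothetical. An ID beyond 2^24 loses its top bits when shifted, so the vertex shader decodes a different, valid-looking meshlet ID and draws the wrong geometry.

```cpp
#include <cstdint>

// Packs a meshlet ID above `vertexBits` low bits, then decodes it again.
// Unsigned shifts discard bits that do not fit, so large IDs wrap around.
constexpr uint32_t meshletIdAfterRoundTrip(uint32_t meshletId, uint32_t vertexBits) {
  const uint32_t packed = meshletId << vertexBits; // high bits fall off here
  return packed >> vertexBits;
}

// IDs below 2^24 survive a 24-8 split unchanged.
static_assert(meshletIdAfterRoundTrip(16'777'215u, 8u) == 16'777'215u, "fits in 24 bits");

// ID 34,000,000 wraps to 34,000,000 mod 2^24.
static_assert(meshletIdAfterRoundTrip(34'000'000u, 8u) == 445'568u, "aliases another meshlet");
```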

[Image: render of a subset of Unreal's City Sample showing stray-geometry artifacts]