Software backend - servo/webrender GitHub Wiki
llvmpipe vs. SwiftShader: current performance investigation. SwiftShader runs one vertex at a time if the vertex shader has texture loads.
Current plan
- Try to get WebRender running on llvmpipe as well as possible to get a feel for how well software can work.
- Figure out bottlenecks when running on llvmpipe.
  - text-benchmark.yaml profile (num_threads=1, vector_width=128)
- Investigate a custom rasterizer.
Architecture
Rasterization
SwiftShader rasterizes into a 'struct Primitive' which contains a set of spans that describe the rasterized polygon/triangle. It does this one edge at a time.
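To make the span idea concrete, here is a minimal sketch of walking a triangle one edge at a time and recording a horizontal span per scanline. It is not SwiftShader's code: Span, SpanPrimitive and rasterize_triangle are hypothetical names, and the pixel-centre sampling with half-open spans is an assumed fill rule.

```rust
/// Half-open run of pixels [x_start, x_end) on one scanline.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Span {
    pub x_start: i32,
    pub x_end: i32,
}

/// Hypothetical stand-in for a span-based primitive: one span per
/// scanline, starting at scanline `y_min`.
pub struct SpanPrimitive {
    pub y_min: i32,
    pub spans: Vec<Span>,
}

/// Rasterize a triangle by visiting its three edges in turn. Each edge
/// widens the [left, right] extent of every scanline whose pixel centre
/// (y + 0.5) it crosses.
pub fn rasterize_triangle(v: [(f32, f32); 3]) -> SpanPrimitive {
    let y_top = v.iter().map(|p| p.1).fold(f32::INFINITY, f32::min);
    let y_bot = v.iter().map(|p| p.1).fold(f32::NEG_INFINITY, f32::max);
    let y_min = (y_top - 0.5).ceil() as i32;
    let y_max = (y_bot - 0.5).ceil() as i32; // exclusive
    let rows = (y_max - y_min).max(0) as usize;
    let mut left = vec![f32::INFINITY; rows];
    let mut right = vec![f32::NEG_INFINITY; rows];

    for i in 0..3 {
        let (x0, y0) = v[i];
        let (x1, y1) = v[(i + 1) % 3];
        if y0 == y1 {
            continue; // a horizontal edge crosses no scanline centre
        }
        // Orient the edge so that it runs from smaller y to larger y.
        let (xa, ya, xb, yb) = if y0 < y1 { (x0, y0, x1, y1) } else { (x1, y1, x0, y0) };
        let slope = (xb - xa) / (yb - ya);
        let first = (ya - 0.5).ceil() as i32;
        let last = (yb - 0.5).ceil() as i32; // exclusive
        for y in first..last {
            let x = xa + (y as f32 + 0.5 - ya) * slope;
            let row = (y - y_min) as usize;
            left[row] = left[row].min(x);
            right[row] = right[row].max(x);
        }
    }

    // Convert edge crossings to the pixels whose centres lie in [left, right).
    let spans = left
        .iter()
        .zip(&right)
        .map(|(&l, &r)| Span {
            x_start: (l - 0.5).ceil() as i32,
            x_end: (r - 0.5).ceil() as i32,
        })
        .collect();
    SpanPrimitive { y_min, spans }
}
```

A pixel shader stage can then iterate over the spans row by row without revisiting the edge equations.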
Shaders
llvmpipe and SwiftShader both run the pixel shader on 2x2 blocks of pixels. A block of pixels is represented as something like Pixel { a: [f32; 4], r: [f32; 4], g: [f32; 4], b: [f32; 4] }, or specifically Vector4f in SwiftShader. This way, all "a" values for a block are placed in one SIMD word and processed together; the same goes for the "r"s, "g"s, and "b"s.
The vertex shader is similarly run on 4 vertices at a time.
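The structure-of-arrays layout described above can be sketched as follows. This is an illustration only: PixelQuad, from_rgba, premultiply and mul4 are made-up names, and plain arrays stand in for the SIMD registers that llvmpipe and SwiftShader actually generate code for.

```rust
/// One 2x2 block of pixels in structure-of-arrays form: each field holds
/// the same channel for all four pixels, so it fits one 128-bit SIMD word.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct PixelQuad {
    pub a: [f32; 4],
    pub r: [f32; 4],
    pub g: [f32; 4],
    pub b: [f32; 4],
}

/// Lane-wise multiply. Written over plain arrays here; a compiler can
/// typically lower this to a single SIMD multiply.
fn mul4(x: [f32; 4], y: [f32; 4]) -> [f32; 4] {
    [x[0] * y[0], x[1] * y[1], x[2] * y[2], x[3] * y[3]]
}

impl PixelQuad {
    /// Transpose four [r, g, b, a] pixels (array-of-structures) into
    /// the per-channel layout.
    pub fn from_rgba(px: [[f32; 4]; 4]) -> Self {
        PixelQuad {
            r: [px[0][0], px[1][0], px[2][0], px[3][0]],
            g: [px[0][1], px[1][1], px[2][1], px[3][1]],
            b: [px[0][2], px[1][2], px[2][2], px[3][2]],
            a: [px[0][3], px[1][3], px[2][3], px[3][3]],
        }
    }

    /// Example per-pixel operation: premultiply color by alpha for all
    /// four pixels at once, one multiply per channel.
    pub fn premultiply(self) -> Self {
        PixelQuad {
            a: self.a,
            r: mul4(self.r, self.a),
            g: mul4(self.g, self.a),
            b: mul4(self.b, self.a),
        }
    }
}
```

The point of the layout is that a scalar shader expression such as r * a becomes one vector operation covering the whole block, with no shuffling between channels.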
Z buffer
Both llvmpipe and SwiftShader seem to use a linear z buffer, i.e. no hierarchical-Z. This probably isn't so bad for llvmpipe because it does pixel shading a tile at a time, so we're more likely to hit the cache for subsequent draws. It does seem like it would be more of a problem for SwiftShader.
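For reference, a rough sketch of what a single coarse hierarchical-Z level on top of a linear z buffer could look like. Neither llvmpipe nor SwiftShader is claimed to work this way. DepthBuffer, TILE and the method names are hypothetical, and a "less than" depth test with depths in [0, 1] is assumed.

```rust
/// Side length of a coarse tile in pixels (an arbitrary choice here).
pub const TILE: usize = 4;

pub struct DepthBuffer {
    width: usize,
    height: usize,
    tiles_x: usize,
    depth: Vec<f32>,    // the linear z buffer, one value per pixel
    tile_max: Vec<f32>, // coarse level: farthest depth stored in each tile
}

impl DepthBuffer {
    /// Create a buffer cleared to the far plane (1.0).
    pub fn new(width: usize, height: usize) -> Self {
        let tiles_x = (width + TILE - 1) / TILE;
        let tiles_y = (height + TILE - 1) / TILE;
        DepthBuffer {
            width,
            height,
            tiles_x,
            depth: vec![1.0; width * height],
            tile_max: vec![1.0; tiles_x * tiles_y],
        }
    }

    /// Coarse test: if the nearest depth of a primitive is not in front of
    /// the farthest depth in the tile, no pixel in the tile can pass.
    pub fn tile_occluded(&self, tx: usize, ty: usize, min_z: f32) -> bool {
        min_z >= self.tile_max[ty * self.tiles_x + tx]
    }

    /// Per-pixel "less than" test against the linear buffer.
    pub fn test_and_write(&mut self, x: usize, y: usize, z: f32) -> bool {
        let i = y * self.width + x;
        if z < self.depth[i] {
            self.depth[i] = z;
            true
        } else {
            false
        }
    }

    /// Recompute the coarse value for one tile after writing to it.
    pub fn refresh_tile(&mut self, tx: usize, ty: usize) {
        let mut farthest = f32::NEG_INFINITY;
        for y in ty * TILE..((ty + 1) * TILE).min(self.height) {
            for x in tx * TILE..((tx + 1) * TILE).min(self.width) {
                farthest = farthest.max(self.depth[y * self.width + x]);
            }
        }
        self.tile_max[ty * self.tiles_x + tx] = farthest;
    }
}
```

With only the linear buffer, every covered pixel has to be read to reject an occluded primitive; the coarse level lets a whole tile be skipped after a single comparison.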
LLVMpipe performance
llvmpipe can dump its shaders (build mesa with --buildtype=debug and then set GALLIVM_DEBUG=dumpbc) as well as note them for profiling with perf.
- LP_NATIVE_VECTOR_WIDTH=128 forces AVX off
- GALLIUM_DUMP_CPU=1 lets you see the detected CPU
- LP_NUM_THREADS=0 forces a single thread
- MESA_GLSL=dump will dump the GLSL IR of the shaders
Custom
I'm experimenting with compiling the WebRender shaders to C++. We could potentially do this at build time, which would avoid the need to ship a JIT. That is good for code size and for security, since we can opt out of being able to make pages executable in the GPU process.
One challenge with this approach is that we'd want to know the sampler state at compile time.
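One way to picture the sampler-state problem: if the filter mode is part of the type, the compiler emits a specialized shader per mode with no runtime branch, but every combination has to be known at build time. The sketch below shows this with Rust generics rather than the generated C++. Filter, Nearest, Linear, Texture and shade are all hypothetical names, and a single-channel, clamp-to-edge, non-empty texture is assumed.

```rust
/// Single-channel texture; assumed non-empty.
pub struct Texture {
    pub width: usize,
    pub height: usize,
    pub texels: Vec<f32>,
}

impl Texture {
    /// Fetch one texel with clamp-to-edge addressing.
    fn fetch(&self, x: i32, y: i32) -> f32 {
        let x = x.clamp(0, self.width as i32 - 1) as usize;
        let y = y.clamp(0, self.height as i32 - 1) as usize;
        self.texels[y * self.width + x]
    }
}

/// Sampler state expressed as a type, so it is fixed at compile time.
pub trait Filter {
    fn sample(tex: &Texture, u: f32, v: f32) -> f32;
}

pub struct Nearest;
pub struct Linear;

impl Filter for Nearest {
    fn sample(tex: &Texture, u: f32, v: f32) -> f32 {
        let x = (u * tex.width as f32).floor() as i32;
        let y = (v * tex.height as f32).floor() as i32;
        tex.fetch(x, y)
    }
}

impl Filter for Linear {
    fn sample(tex: &Texture, u: f32, v: f32) -> f32 {
        // Texel centres sit at half-integer coordinates.
        let x = u * tex.width as f32 - 0.5;
        let y = v * tex.height as f32 - 0.5;
        let (fx, fy) = (x - x.floor(), y - y.floor());
        let (x0, y0) = (x.floor() as i32, y.floor() as i32);
        let top = tex.fetch(x0, y0) * (1.0 - fx) + tex.fetch(x0 + 1, y0) * fx;
        let bot = tex.fetch(x0, y0 + 1) * (1.0 - fx) + tex.fetch(x0 + 1, y0 + 1) * fx;
        top * (1.0 - fy) + bot * fy
    }
}

/// A toy "shader" that is monomorphized once per filter mode.
pub fn shade<F: Filter>(tex: &Texture, u: f32, v: f32) -> f32 {
    F::sample(tex, u, v) * 0.5
}
```

A caller picks the variant statically, e.g. shade::<Linear>(&tex, u, v). The cost is that each shader must be compiled for every sampler state it can be used with, or the state has to be restricted to a known set.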
Profiling
Try out https://github.com/plasma-umass/coz