2026 03 01_webrender_wgpu_renderer_implementation_plan - mark-ik/graphshell GitHub Wiki
Date: 2026-03-01
Status: Active convergence / upstream-first (reframed 2026-03-14, current-state corrected 2026-04-02, refreshed 2026-04-09) โ Graphshell still ships on egui_glow, but WebRender wgpu work is now treated as upstream renderer development first, with thin Servo integration and Graphshell-local validation. The current WebRender branch contains a functional wgpu renderer with remaining parity-validation and architectural cleanup work, so active planning emphasizes convergence from the existing hybrid branch shape rather than an immediate clean-slate backend split. See PLANNING_REGISTER.md ยง0.12/ยง0.13.
Author: Arc
Source research: research/2026-03-01_webrender_wgpu_renderer_research.md
Active audit log: 2026-03-03_servo_wgpu_upgrade_audit_report.md
Feeds into:
-
2026-03-01_webrender_readiness_gate_feature_guardrails.md(readiness gates G1โG5) -
2026-03-01_backend_bridge_contract_c_plus_f_receipt.md(C+F closure policy) -
aspect_render/2026-02-27_egui_wgpu_custom_canvas_migration_strategy.md(renderer backend migration) -
research/2026-02-27_egui_wgpu_custom_canvas_migration_requirements.md(GPU ownership model)
Tracker: #183 (backend migration), #180 (runtime-viewer GLโwgpu bridge)
Related lanes: #88 (stabilization), #90 (embedder-debt), #92 (viewer-platform), #99 (spec-code-parity)
This plan converts the research document's specification into a phased, technically validatable execution sequence for introducing a wgpu rendering backend into WebRender as consumed by Graphshell through Servo. Each phase has explicit entry conditions, exit criteria, validation methods, and rollback posture. The plan is structured so that every phase produces independently useful evidence and no phase requires speculative assumptions from a later phase.
End-state: Servo's WebRender compositor emits a wgpu::Texture that Graphshell's
CompositorAdapter binds zero-copy into the egui_wgpu frame pass, eliminating GL state
save/restore chaos mode and closing #180.
Current-state constraint: Glow remains the active runtime composition policy for the current milestone. This plan runs as Track B (WebRender readiness) in parallel with Track A (milestone delivery on Glow), per the readiness gate contract.
Current audit note (2026-03-14): the Servo-side wgpu 26 -> 27 compatibility pass still
matters, but it is now treated as enabling integration work rather than as the main development
home. The primary implementation target is upstream WebRender device / renderer; Servo should
only carry the minimum dependency-redirection or integration delta required to exercise a local
editable WebRender checkout. See 2026-03-03_servo_wgpu_upgrade_audit_report.md.
Execution posture update (2026-03-14):
- Do renderer development and most validation in upstream WebRender first.
- Use Servo as the thinnest possible integration consumer.
- Use Graphshell as the final downstream validation environment.
- Avoid a long-lived behavioral Servo fork unless Cargo/source topology makes it unavoidable.
Current-state correction (2026-04-02):
- The local WebRender
wgpu-backend-0.68-minimalbranch is no longer a hypothetical starting point; it already executes real wgpu rendering work. - The current implementation is a hybrid backend shape centered on
Rendererowning bothdevice: Option<Device>andwgpu_device: Option<WgpuDevice>, with wgpu execution routed through dedicated renderer methods rather than a settled backend-neutral executor seam. - Several renderer subsystems still expose placeholder
Wgpu(...)variants or compatibility carriers, so the branch should be treated as a proof backend, not as the final architecture. - GL remains mandatory for parity testing, fallback behavior, compositor maturity, and richer diagnostics while the wgpu path converges.
Progress update (2026-04-03):
- Branch
wgpu-device-renderer(commitda8eb84+ uncommitted P7 work) now passes 225/366 reftests (61%) via the built-inwrench --wgpu reftestharness. - All
BatchKindvariants are now routed to correct shaders (Quad, Brush, TextRun, SplitComposite). - Offscreen rendering (
alpha_batch_containers) is now dispatched for texture cache targets, unblocking filters, gradients, and blend modes that render through intermediate surfaces. - SVG filter shaders (
cs_svg_filter,cs_svg_filter_node) are compiled and dispatched, with CPU-side repacking of u16 instance fields to i32 for wgpu vertex attribute compatibility. - Texture-to-texture blits implemented via
copy_texture_to_texture. - Shader compilation moved to a 16 MB stack thread to work around naga recursive-descent stack overflow on large transpiled WGSL (pre-existing bug).
- Primary remaining gaps:
resolve_ops,clip_masks(ClipMaskInstanceList), subpixel text (dual-source blending), and encoder-invalid validation errors on some targets. - Detailed breakdown in
2026-04-03_p6_progress_report.md.
Current-state refresh (2026-04-09):
- Current branch tip:
wgpu-backend-0.68-experimental@e403cc14b. - The specific gaps called out in the 2026-04-03 snapshot are now closed in-tree:
resolve_ops,clip_masks, subpixel text viaDUAL_SOURCE_BLENDING, YUV external-surface compositing, realGraphicsApi::Wgpumetadata, and gracefulCompositorConfig::Nativefallback inwgpu-only mode. - Focused validation on the current checkout is green:
cargo check -p webrender --features wgpu_backend,cargo test -p webrender --features wgpu_backend --test wgpu_shared_device, andcargo run -p webrender-examples --bin wgpu_headless --features wgpu_backend. - WGSL translation now reports
68/68variants succeeding in the local build. - The branch should no longer be described as a mere proof backend. It is better characterized as a functional renderer backend with remaining work concentrated in full-suite parity refresh, embedder integration validation, and cleanup of hybrid GL/wgpu compatibility carriers.
- Recent branch history reports substantially higher reftest completion than the 2026-04-03 note
(for example commit
366ed169ereports434/441 pass), but this document should treat that figure as provisional until a fresh fullwrench --wgpu reftestrun is captured against the current branch tip.
The original phase map below remains valuable as a dependency and validation inventory, but it is no longer the only reasonable ordering for active work. Current execution should be organized around four convergence tracks.
Replace stringly pipeline, shader, and layout metadata with typed descriptors where practical. Immediate targets include:
- pipeline identity and blend/depth variants,
- shader family / batch-kind mapping,
- resource binding declarations,
- compatibility metadata currently inferred from shader-name prefixes.
Move toward backend-specific execution seams subsystem-by-subsystem instead of attempting a single renderer-wide abstraction jump. Immediate seam candidates are:
- texture cache updates,
- GPU cache uploads,
- frame-data texture or buffer upload,
- pass-local draw submission.
The goal is to make backend ownership honest without freezing the branch behind a large trait-extraction rewrite.
Keep GL available as:
- the correctness oracle for pixel and geometry comparisons,
- the fallback backend when wgpu parity is incomplete,
- the richer diagnostics path for capture/profiler/compositor investigation.
All active wgpu work should improve, not weaken, the ability to compare behavior against GL.
Continue to keep:
- Servo integration thin and primarily dependency-topology oriented,
- Graphshell bridge contracts stable,
- Graphshell compositor assumptions anchored to the existing three-pass model,
-
GlowCallbackas the current production-safe bridge policy until readiness evidence closes.
Use the convergence tracks above as the default execution guide for current work.
Use the P0-P12 plan below as:
- a dependency checklist,
- a validation ledger,
- a risk inventory,
- a record of the fuller March target architecture.
Do not read P3 as meaning that a renderer-wide backend trait extraction must happen before
useful wgpu improvements can continue.
| Phase | Name | Gate | Produces | Risk |
|---|---|---|---|---|
| P0 | Dependency Audit + Version Alignment | G1 | Version compatibility matrix, patch path proof | Low |
| P1 | Upstream Reconnaissance | G1 | Coordination posture, duplicate-work avoidance | Low |
| P2 | Local Patch / Thin Integration Scaffold | G1 | Reproducible local WebRender checkout consumed by Servo/Graphshell with minimal integration delta | Medium |
| P3 | WebRender Backend Trait Extraction | G2 | Backend-agnostic Device/Renderer trait boundary inside WebRender |
High |
| P4 | Shader Translation Pipeline | G2 | GLSL โ WGSL build-time pipeline via naga; translated shader set | Medium |
| P5 | wgpu Device Implementation | G2, G3 |
WgpuDevice struct passing WebRender resource management tests |
High |
| P6 | wgpu Renderer Implementation | G2, G3 |
WgpuRenderer struct passing WebRender frame rendering tests |
High |
| P7 | Compositor Output Contract | G2, G3 |
wgpu::Texture handoff from WebRender to embedder |
High |
| P8 | Graphshell Integration Spike | G3, G4 | One composited Servo viewer in a wgpu-backed Graphshell frame | High |
| P9 | Pixel Parity Validation | G5 | Reference-set comparison between GL and wgpu paths | Medium |
| P10 | Performance Validation | G5 | Frame budget measurements meeting ยง5.3 targets | Medium |
| P11 | Platform Matrix Validation | G4 | Per-platform pass/fail evidence | Medium |
| P12 | Production Cutover Preparation | G1โG5 | Switch authorization evidence package | Low |
Interpretation note (2026-04-02): keep this phase map as the long-range inventory, but route
near-term work through Tracks C1-C4 above. In particular, P3 is now an optional convergence
destination, not the mandatory first major implementation step.
Objective: Determine whether Servo's wgpu version and Graphshell's target egui_wgpu
version are compatible, and document the exact dependency graph.
Entry conditions:
- Access to Servo main branch
Cargo.lock - Knowledge of target
egui_wgpuversion for Graphshell
Tasks:
| # | Task | Validation method |
|---|---|---|
| P0.1 | Run cargo tree -i wgpu --depth 4 against Servo main to extract wgpu version |
Version string captured |
| P0.2 | Run cargo tree -i wgpu --depth 4 against target egui_wgpu release |
Version string captured |
| P0.3 | Compare wgpu versions; document compatibility or skew | Written compatibility matrix |
| P0.4 | Identify naga version bundled with each wgpu release |
Version strings captured |
| P0.5 | If versions diverge: identify the minimum egui_wgpu release compatible with Servo's wgpu, or vice versa |
Candidate version pair documented |
| P0.6 | Document wgpu feature flags required by WebRender vs egui_wgpu | Feature flag comparison table |
| P0.7 | Produce dependency version alignment report | Report posted to #183
|
Exit criteria:
- Version compatibility matrix is complete and posted to tracker
- Either: versions align naturally, OR a specific version-unification patch is identified
- G1 evidence: dependency control validated for wgpu/naga/webrender_api
Rollback: No code changes; pure analysis. No rollback needed.
Answers research Q1, Q4 partially.
Objective: Determine the state of upstream wgpu renderer work in Servo/WebRender to avoid duplicating effort and identify coordination opportunities.
Entry conditions:
- None (can run in parallel with P0)
Tasks:
| # | Task | Validation method |
|---|---|---|
| P1.1 | Search Servo GitHub issues/PRs for "wgpu renderer", "wgpu backend", "webrender wgpu" | Search results documented |
| P1.2 | Search WebRender crate issues for wgpu tracking | Search results documented |
| P1.3 | Check Wu Yu-Wei's public repositories and contributions for implementation progress | Activity documented |
| P1.4 | Check Servo Zulip/Matrix channels for wgpu renderer discussion threads | Summary documented |
| P1.5 | Determine upstream contributor(s) and their current focus | Contact points identified |
| P1.6 | Assess whether upstream work exists that Graphshell can consume directly | Consume/fork/build decision documented |
| P1.7 | Produce upstream coordination posture document | Posted to #183
|
Exit criteria:
- Upstream state is known: active work exists (coordinate), or no active work (proceed independently)
- Coordination posture decision is documented
Rollback: No code changes.
Answers research Q2.
Objective: Establish a reproducible build path that lets Graphshell exercise a local editable WebRender checkout through Servo while keeping Servo changes as thin as possible.
Entry conditions:
- P0 complete (version alignment known)
- P1 complete (upstream posture known โ determines whether local patching is sufficient or a thin integration branch is required)
Tasks:
| # | Task | Validation method |
|---|---|---|
| P2.1 | Create or adopt a local editable WebRender checkout (../webrender or equivalent) |
Checkout exists and builds standalone |
| P2.2 | Point Servo at the local WebRender checkout through [patch.crates-io] or equivalent local override |
Cargo resolves webrender from the local checkout |
| P2.3 | If local patching is insufficient, create a thin Servo integration branch whose only job is dependency redirection and compatibility shims | Branch exists with narrowly scoped diff |
| P2.4 | Run cargo check -q with zero behavioral WebRender renderer changes |
Clean build |
| P2.5 | Run targeted Servo/Graphshell validation to prove the local checkout is in the dependency graph | Validation passes |
| P2.6 | Add a no-op wgpu_device module stub inside the local WebRender checkout to prove the patch slot works |
Module compiles, no behavior change |
| P2.7 | Document the patch path in a WEBRENDER_PATCH.md file |
Documentation exists |
| P2.8 | Document the rollback path: reverting the local override restores stock Servo consumption | Rollback tested and documented |
Exit criteria:
- Graphshell or targeted integration checks pass against the local WebRender checkout
- Rollback to stock Servo/WebRender consumption is proven
- G1 closed: dependency control and reproducibility demonstrated
Rollback: Revert the local override path (remove [patch.crates-io] or restore original
dependency source).
Objective: Inside the editable WebRender checkout, introduce a backend abstraction trait that the existing GL implementation satisfies, without changing any GL behavior. This creates the seam where the wgpu implementation plugs in.
Entry conditions:
- P2 complete (patch scaffold works)
Tasks:
| # | Task | Validation method |
|---|---|---|
| P3.1 | Audit WebRender Device struct โ enumerate all public methods and GL calls |
Method inventory spreadsheet/doc (count, categorized by concern) |
| P3.2 | Audit WebRender Renderer struct โ enumerate pass scheduling, render target, and draw call patterns |
Method inventory |
| P3.3 | Define trait WrDevice abstracting the Device interface: texture ops, buffer ops, framebuffer ops, shader program ops, draw calls |
Trait definition compiles |
| P3.4 | Define trait WrRenderer abstracting the Renderer interface: pass scheduling, target management, command dispatch |
Trait definition compiles |
| P3.5 | Implement WrDevice for GlDevice wrapping the existing Device struct |
All existing WebRender tests pass |
| P3.6 | Implement WrRenderer for GlRenderer wrapping the existing Renderer struct |
All existing WebRender tests pass |
| P3.7 | Make WebRender generic over <D: WrDevice, R: WrRenderer> at the top-level integration points |
Compiles; all existing behavior preserved |
| P3.8 | Run Graphshell full test suite against the trait-extracted WebRender | All tests green, no rendering differences |
| P3.9 | Measure build time delta from trait extraction | Documented; acceptable if โค15% increase |
Exit criteria:
- WebRender compiles and passes all tests with trait-abstracted backend
- Existing GL path behavior is byte-identical (no rendering changes)
- The trait surface is documented with method-level doc comments
- G2 evidence: backend contract boundary exists
Rollback: Revert the WebRender checkout or upstream branch to the pre-extraction commit.
Risk mitigation: This is the highest-risk refactoring phase. If the Device/Renderer
interface proves too entangled for clean trait extraction:
- Fallback A: Extract only
Device, leaveRenderermonomorphic withmatchdispatch - Fallback B: Use a
BackendKindenum dispatch instead of trait generics (avoids monomorphization cost) - Fallback C: Skip trait extraction entirely; implement
WgpuDeviceas a parallel struct with a top-levelmatchat theRendererentry point (less elegant, equally functional)
Estimated scope: The research document notes ~200k lines in WebRender, but GL surface
is bounded to Device + Renderer. Trait extraction touches only those two structs plus
their callsites within WebRender. Expected diff: 2,000โ5,000 lines.
Objective: Translate WebRender's GLSL shaders to WGSL via naga at build time, producing a validated shader set for the wgpu backend.
Entry conditions:
- P3 complete (trait boundary exists, so shader requirements are fully enumerated)
Tasks:
| # | Task | Validation method |
|---|---|---|
| P4.1 | Inventory all GLSL shaders in WebRender: count, vertex/fragment/compute classification, feature usage | Shader inventory document |
| P4.2 | Identify GLSL features used: extensions, precision qualifiers, built-in variables, texture sampling modes, UBO layouts, custom blend modes | Feature usage matrix |
| P4.3 | Set up a build.rs or standalone tool using naga to compile GLSL โ WGSL |
Tool runs, produces .wgsl files |
| P4.4 | Run naga translation on each shader; document any translation failures or warnings | Per-shader pass/fail log |
| P4.5 | For any shader that fails naga translation: identify the specific GLSL feature gap and produce a manual WGSL port | Manual port exists; compiles in naga |
| P4.6 | Validate translated shaders by loading them through wgpu::Device::create_shader_module() on a test device |
All shaders load without error |
| P4.7 | Cross-reference translated uniform/binding layouts against the WrDevice trait's expected bind group structure |
Layout compatibility confirmed |
| P4.8 | Store translated shaders in the fork under webrender/res/wgpu/ with a build-time inclusion mechanism |
Shaders available at compile time |
Exit criteria:
- All WebRender shaders have validated WGSL equivalents
- No runtime shader compilation is required
- Bind group layouts match the
WrDevicetrait expectations - G2 evidence: shader parity demonstrated
Rollback: Shader files are additive; removing the wgpu/ shader directory reverts.
Answers research Q5.
Objective: Implement WgpuDevice satisfying the WrDevice trait, providing wgpu
equivalents for all WebRender GPU resource management operations.
Entry conditions:
- P3 complete (trait boundary defined)
- P4 complete (shaders translated)
Tasks:
| # | Task | Validation method |
|---|---|---|
| P5.1 | Implement WgpuDevice struct holding Arc<wgpu::Device>, Arc<wgpu::Queue>
|
Struct compiles |
| P5.2 | Implement texture management: create_texture, update_texture, delete_texture using wgpu::Texture + write_texture
|
Unit test: create โ write โ read-back matches input data |
| P5.3 | Implement atlas texture support for glyph and image caches | Unit test: atlas sub-region writes produce correct pixel data |
| P5.4 | Implement buffer management: vertex buffers (COPY_DST | VERTEX), instance buffers, index buffers |
Unit test: buffer write โ draw โ output matches expected geometry |
| P5.5 | Implement bind group management: uniform bind groups replacing GL UBOs | Unit test: uniform data accessible in shader |
| P5.6 | Implement render target management: wgpu::Texture with RENDER_ATTACHMENT | TEXTURE_BINDING for intermediate targets |
Unit test: render to intermediate โ sample from intermediate โ correct pixels |
| P5.7 | Implement shader program management: load translated WGSL shaders, create wgpu::RenderPipeline instances |
All pipelines create without error |
| P5.8 | Implement draw call dispatch: wgpu::RenderPass::draw() / draw_indexed() with correct vertex/instance counts |
Unit test: triangle renders correctly |
| P5.9 | Implement external image import stub (platform-specific; initially returns error) | Stub compiles; returns Err with documented reason |
| P5.10 | Run WrDevice trait compliance test suite (shared between GL and wgpu implementations) |
All trait tests pass for WgpuDevice
|
Exit criteria:
-
WgpuDevicepasses allWrDevicetrait compliance tests - Resource creation/destruction lifecycle is clean (no GPU resource leaks on drop)
- Device accepts an externally-provided
wgpu::Devicehandle (shared ownership model) - G2, G3 evidence: backend contract parity for device layer
Rollback: Remove WgpuDevice module from fork; GlDevice remains unaffected.
Objective: Implement WgpuRenderer satisfying the WrRenderer trait, orchestrating
wgpu render passes for WebRender's multi-pass architecture.
Entry conditions:
- P5 complete (
WgpuDevicepasses trait tests)
Tasks:
| # | Task | Validation method |
|---|---|---|
| P6.1 | Implement shadow pass as wgpu::RenderPass with depth-only attachment |
Visual: shadow renders match GL reference |
| P6.2 | Implement opaque pass as wgpu::RenderPass with color + depth attachments |
Visual: opaque geometry renders correctly |
| P6.3 | Implement alpha pass as wgpu::RenderPass with blend state configuration |
Visual: transparency and blend modes match GL reference |
| P6.4 | Implement composite pass as final wgpu::RenderPass writing to the compositor output texture |
Visual: final composited frame renders |
| P6.5 | Implement per-pass load/store operations using wgpu::RenderPassDescriptor
|
Correct clearing and preservation behavior |
| P6.6 | Implement wgpu::CommandEncoder per-frame lifecycle: begin โ encode passes โ submit |
Frame renders end-to-end |
| P6.7 | Implement render target switching between passes (intermediate textures โ output) | Multi-pass rendering produces correct final output |
| P6.8 | Implement batch draw call encoding (WebRender batching โ wgpu draw calls) | Draw call counts match expectations |
| P6.9 | Implement texture sampling setup (glyph atlas, image cache, intermediate targets) | Text and images render correctly |
| P6.10 | Run WebRender integration test suite with WgpuRenderer + WgpuDevice
|
All integration tests pass |
Exit criteria:
-
WgpuRendererpasses allWrRenderertrait compliance tests - WebRender integration tests pass with wgpu backend selected
- Visual output for test scenes matches GL reference within ยง5.1 criteria (โค0.5% pixel diff)
- G2, G3 evidence: full backend contract parity
Rollback: Remove WgpuRenderer module; GlRenderer remains unaffected.
Status (2026-04-03): Active. The original task breakdown assumed a trait-based
WrRenderer which does not exist in the current branch shape. Instead, wgpu rendering
is implemented as parallel methods on Renderer (e.g. draw_passes_wgpu(),
draw_cache_target_tasks_wgpu()). Effective status of each sub-task:
| # | Status | Notes |
|---|---|---|
| P6.1 | N/A | WebRender doesn't have a separate "shadow pass" โ shadows are rendered via blur tasks in texture cache targets |
| P6.2 | Done | Opaque batches drawn front-to-back with depth write in both picture cache tiles and texture cache targets |
| P6.3 | Done | Alpha batches drawn back-to-front with per-batch blend mode via blend_mode_to_wgpu() (7 blend modes + fallback) |
| P6.4 | Done | Composite pass draws color tiles + textured tiles via CompositeFastPath shader to readback texture |
| P6.5 | Done | Per-target load/store ops: picture cache tiles clear to background color, texture cache targets use LoadOp::Load |
| P6.6 | Done | Encoder lifecycle: take_encoder() โ record passes โ return_encoder() โ flush_encoder() per pass group |
| P6.7 | Done | Render targets resolved via wgpu_texture_cache HashMap keyed by CacheTextureId
|
| P6.8 | Done | All BatchKind variants (Brush, Quad, TextRun, SplitComposite) routed to correct WgpuShaderVariant pipelines |
| P6.9 | Partial | Glyph atlas and image cache work via TextureBindings; some image variants (ANTIALIASING+REPETITION) not mapped |
| P6.10 | Active | The stale 225/366 snapshot is superseded: current branch work has closed resolve_ops, clip_masks, subpixel-text, YUV-composite, and graphics-metadata gaps. Focused local validation is green; next evidence needed is a fresh full wrench --wgpu reftest pass count on current HEAD |
Objective: Define and implement the embedder-facing interface for receiving a
wgpu::Texture from WebRender's compositor output, replacing the current GL FBO readback.
Entry conditions:
- P6 complete (wgpu renderer produces correct frames)
Tasks:
| # | Task | Validation method |
|---|---|---|
| P7.1 | Define the CompositorOutputTexture type: wraps wgpu::TextureView + metadata (dimensions, format, generation counter) |
Type compiles |
| P7.2 | Implement WgpuRenderer::compositor_output() -> CompositorOutputTexture
|
Returns valid texture after frame render |
| P7.3 | Define the shared device handoff API (embedder provides Arc<wgpu::Device> + Arc<wgpu::Queue> to Servo at init) |
API signature defined |
| P7.4 | Implement Servo-side device acceptance: initialize_with_wgpu_device() (or equivalent patch) |
Servo accepts provided device; WebRender uses it |
| P7.5 | Prove zero-copy path: the wgpu::Texture returned by WebRender is directly bindable by the embedder without copy |
Test: bind the texture in an egui_wgpu callback; pixels are visible |
| P7.6 | Implement texture pool for compositor output (pre-allocate N textures; rotate per frame to avoid GPU stalls) | Pool allocates on init; rotates without frame drops |
| P7.7 | Implement fallback path: if shared device handoff fails, fall back to copy-based texture transfer | Fallback produces correct output (with measured latency) |
| P7.8 | Emit diagnostics for texture handoff: CHANNEL_COMPOSITOR_WGPU_HANDOFF_US_SAMPLE, CHANNEL_COMPOSITOR_WGPU_TEXTURE_POOL_HIT, CHANNEL_COMPOSITOR_WGPU_TEXTURE_POOL_MISS
|
Channels emit during rendering |
Exit criteria:
- Embedder receives a
wgpu::TextureViewfrom WebRender after each frame - Zero-copy path works when device is shared
- Copy-based fallback works when device is not shared
- Diagnostics channels emitting
- G2, G3 evidence: compositor output contract complete
Rollback: GL compositor output path remains available (feature-flagged).
Answers research Q6.
Objective: Produce one running Graphshell instance where a Servo viewer tile renders web content through the wgpu WebRender path, composited into an egui_wgpu frame.
Entry conditions:
- P7 complete (compositor output contract works)
- Graphshell
render_backendmodule hasBackendContentBridgeMode::WgpuPreferredFallbackGlowCallbackwired
Tasks:
| # | Task | Validation method |
|---|---|---|
| P8.1 | Wire BackendContentBridgePolicy to allow ExperimentalEnvRequestedMode when env var is set |
Policy responds to env var |
| P8.2 | Implement wgpu bridge path in CompositorAdapter: receive CompositorOutputTexture, bind as wgpu::TextureView in egui_wgpu paint callback |
Frame renders with web content visible |
| P8.3 | Remove GL state save/restore from wgpu bridge path (inherently isolated by command encoder scoping) | No GL calls on wgpu path |
| P8.4 | Verify chaos mode diagnostics: wgpu path should report no state violations (structural isolation) | Chaos mode passes with zero violations |
| P8.5 | Verify overlay affordance pass (Pass 3) renders over web content (Pass 2) on wgpu path | Focus ring visible over page content |
| P8.6 | Test with multiple composited tiles (at least 3 concurrent Servo viewers) | All tiles render correct content |
| P8.7 | Test tile resize behavior: resize tiles every frame, verify no frame drops or texture corruption | No visual artifacts during resize |
| P8.8 | Capture screenshot evidence for #180
|
Screenshots posted to tracker |
Exit criteria:
- Web content renders correctly in Graphshell through wgpu WebRender path
- Overlay affordances render correctly (Pass 3 over Pass 2)
- No GL state isolation violations
-
#180evidence posted with screenshots and measurements - G3, G4 evidence: pass-contract safety and platform confidence on primary platform
Rollback: Set env var back to default; Glow path activates; no behavior change for non-spike users.
Objective: Systematically compare wgpu path output against GL path output for a defined reference set, proving visual correctness.
Entry conditions:
- P8 complete (integration spike works)
Tasks:
| # | Task | Validation method |
|---|---|---|
| P9.1 | Define reference URL set: 10 pages covering static HTML, CSS compositing, text-heavy, images, WebGPU canvas | URL list documented |
| P9.2 | Capture GL-path screenshots for each reference URL at 1920ร1080 | Reference images saved |
| P9.3 | Capture wgpu-path screenshots for each reference URL at same viewport | Comparison images saved |
| P9.4 | Run pixel diff analysis (per-pixel comparison with configurable tolerance) | Diff report generated |
| P9.5 | For each page: report pixel difference percentage | All pages โค0.5% difference |
| P9.6 | Investigate and document any differences >0.5%: identify root cause (float precision, shader translation, blend mode, etc.) | Root causes documented |
| P9.7 | Fix or accept documented differences | Acceptance decisions recorded |
| P9.8 | Run focus ring pass-order validation: verify Pass 3 affordances occlude Pass 2 content at expected positions | Pixel inspection confirms correct z-order |
Exit criteria:
- All reference pages within โค0.5% pixel difference or with documented acceptable deviations
- Focus ring z-order correct on wgpu path
- Pixel parity report posted to
#183 - G5 evidence: regression envelope validated
Rollback: N/A (measurement phase).
Objective: Measure wgpu path frame timings against the targets defined in research ยง5.3, proving the wgpu path meets frame budget requirements.
Entry conditions:
- P8 complete (integration spike works)
- Can run in parallel with P9
Tasks:
| # | Task | Validation method |
|---|---|---|
| P10.1 | Instrument wgpu CommandEncoder with timestamp queries for per-pass timing |
Timing data collected |
| P10.2 | Measure Servo wgpu render time per active tile at 1080p | Target: โค4 ms per tile |
| P10.3 | Measure texture handoff latency (WebRender queue.submit() to Graphshell texture bind) |
Target: โค0.5 ms |
| P10.4 | Measure resize handling: trigger tile rect changes every frame, measure texture recreation cost | Target: no frame drops |
| P10.5 | Measure device contention: graph canvas wgpu pass + Servo wgpu pass simultaneously | Target: no deadlock, no throughput regression |
| P10.6 | Compare total frame time (GL path vs wgpu path) for reference URL set | wgpu path โค110% of GL path total frame time |
| P10.7 | Profile GPU memory usage: texture pool overhead, buffer allocations | Documented; no memory leaks over 1000 frames |
| P10.8 | Produce performance validation report | Report posted to #183 and #180
|
Exit criteria:
- All frame budget targets met or exceeded
- No deadlocks under concurrent tile rendering
- No GPU memory leaks
- Performance report posted with raw data
- G5 evidence: performance within regression envelope
Rollback: N/A (measurement phase).
Objective: Validate the wgpu WebRender path on each target platform, documenting pass/fail and platform-specific limitations.
Entry conditions:
- P8 complete (integration spike works on primary platform)
Tasks:
| # | Task | Validation method |
|---|---|---|
| P11.1 | Windows (DX12): full spike + pixel parity + performance | Pass/fail with evidence |
| P11.2 | Windows (Vulkan fallback): full spike | Pass/fail with evidence |
| P11.3 | macOS (Metal): full spike + pixel parity | Pass/fail with evidence |
| P11.4 | Linux (Vulkan): full spike + pixel parity | Pass/fail with evidence |
| P11.5 | Linux (GL fallback): document expected behavior (GLโwgpu copy path or GL-only) | Documented limitation |
| P11.6 | For each platform: test external image import (DXGI shared handle / MTLTexture IOSurface / VkImage DMABuf) | Per-platform capability documented |
| P11.7 | For each platform: test GL fallback path when wgpu is unavailable | Fallback activates correctly |
| P11.8 | Produce platform matrix report | Report posted to #183
|
Exit criteria:
- Primary platform (Windows DX12) passes all validation
- Each platform has explicit pass/fail documentation
- Platform-specific limitations are tagged as switch blockers or non-blockers
- GL fallback works on all platforms
- G4 closed: platform confidence established
Rollback: N/A (validation phase).
Answers research Q3 partially.
Objective: Assemble all evidence required for switch authorization and prepare the
runtime policy change from GlowBaseline to wgpu-primary.
Entry conditions:
- P9, P10, P11 complete (all validation phases done)
- G1โG5 all closed with linked evidence
Tasks:
| # | Task | Validation method |
|---|---|---|
| P12.1 | Verify G1โG5 closure: each gate has linked tracker evidence | Gate checklist complete |
| P12.2 | Change active_backend_content_bridge_policy() to return WgpuPrimaryFallbackGlow (new policy variant) |
Policy change compiles |
| P12.3 | Add BackendContentBridgePolicy::WgpuPrimaryFallbackGlow variant: selects wgpu when capable, falls back to Glow when not |
Policy logic tested |
| P12.4 | Update BackendContentBridgeCapabilities probe to detect actual wgpu WebRender availability |
Probe returns correct capability state |
| P12.5 | Run full test suite with wgpu-primary policy | All tests green |
| P12.6 | Run headed smoke tests with wgpu-primary policy | Visual correctness confirmed |
| P12.7 | Document rollback procedure: revert policy function to return GlowBaseline
|
Rollback tested |
| P12.8 | Produce switch authorization receipt for #183
|
Receipt posted with all evidence links |
| P12.9 | Post final #180 closure comment with measured evidence |
#180 closure criteria met |
Exit criteria:
- Switch authorization receipt posted with G1โG5 evidence
-
#180closed with measured evidence - Glow retirement timeline documented (not immediate โ retained as fallback until one full release cycle confirms stability)
Rollback: Change active_backend_content_bridge_policy() back to GlowBaseline. One line change.
P0 (Dependency Audit)
P1 (Upstream Recon)
โ โ
โโโโโโฌโโโโโ
โ
P2 (Fork + Patch)
โ
P3 (Trait Extraction)
โ
P4 (Shader Translation)
โ
P5 (wgpu Device)
โ
P6 (wgpu Renderer)
โ
P7 (Compositor Output)
โ
P8 (Integration Spike)
/ โ \
/ โ \
P9 P10 P11
(Parity)(Perf)(Platform)
\ โ /
\ โ /
P12 (Cutover)
P0 and P1 run in parallel. P2 depends on both. P3โP7 are strictly sequential. P9, P10, P11 can run in parallel after P8. P12 depends on all three.
This section specifies the GPU device ownership model referenced throughout the phases.
Graphshell (main)
โโโ creates wgpu::Instance
โโโ selects wgpu::Adapter
โโโ creates wgpu::Device + wgpu::Queue
โ
โโโ passes Arc<wgpu::Device> + Arc<wgpu::Queue> to:
โ โโโ egui_wgpu::Renderer (Graphshell's UI rendering)
โ โโโ Servo/WebRender (compositor output rendering)
โ โโโ Future compute workloads (Burn, AI inference)
โ
โโโ owns device lifetime (drop on app exit)
| Decision | Choice | Rationale |
|---|---|---|
| Who creates the device? | Graphshell | Embedder must control device lifetime and feature requirements |
| How does Servo receive the device? |
Arc<wgpu::Device> passed at Servo initialization |
Zero-copy texture sharing requires same device |
| How does egui_wgpu get the device? | Use egui_wgpu::Renderer directly (not egui_wgpu::winit::Painter) |
Painter creates its own device; Renderer accepts an external one |
| What happens if shared device init fails? | Fall back to separate devices + copy-based texture transfer | Correctness preserved; performance degraded |
The egui_wgpu::Renderer constructor accepts:
pub fn new(
device: &wgpu::Device,
output_color_format: wgpu::TextureFormat,
output_depth_format: Option<wgpu::TextureFormat>,
msaa_samples: u32,
dithering: bool,
) -> SelfThis means Graphshell can create the wgpu::Device first, then pass a reference to both
egui_wgpu::Renderer and to Servo's WebRender initialization. The Servo side requires
a new API or patch to accept the device (see P7.3โP7.4).
Phases P3 and P5 require a detailed audit of WebRender's GL usage. This template defines what the audit must capture.
| GL Operation Category | Representative Calls | wgpu Equivalent | Notes |
|---|---|---|---|
| Texture creation |
glGenTextures, glTexImage2D, glTexSubImage2D
|
device.create_texture(), queue.write_texture()
|
Atlas textures need sub-region update |
| Texture binding |
glBindTexture, glActiveTexture
|
Bind groups; set_bind_group() in render pass |
No direct equivalent to glActiveTexture; bind groups are explicit |
| Buffer creation |
glGenBuffers, glBufferData, glBufferSubData
|
device.create_buffer(), queue.write_buffer()
|
Usage flags: VERTEX, INDEX, UNIFORM, COPY_DST
|
| VAO setup |
glVertexAttribPointer, glEnableVertexAttribArray
|
VertexBufferLayout in RenderPipelineDescriptor
|
Static layout; defined at pipeline creation |
| Framebuffer ops |
glGenFramebuffers, glBindFramebuffer, glFramebufferTexture2D
|
RenderPassDescriptor targeting TextureView
|
Per-pass, not persistent FBO objects |
| Shader compilation |
glCreateShader, glShaderSource, glCompileShader, glLinkProgram
|
device.create_shader_module(), device.create_render_pipeline()
|
Build-time via naga (P4) |
| State setting |
glEnable/glDisable, glBlendFunc, glDepthFunc, glScissor, glViewport
|
Pipeline state in RenderPipelineDescriptor; scissor rect in set_scissor_rect()
|
Static per-pipeline or dynamic per-render-pass |
| Draw calls |
glDrawArrays, glDrawElements, glDrawArraysInstanced
|
render_pass.draw(), render_pass.draw_indexed()
|
Direct mapping |
| Readback | glReadPixels |
buffer.map_async() + staging buffer copy |
Async; needs fence or callback |
| Clear |
glClear, glClearColor
|
RenderPassDescriptor::color_attachments[].ops.load = LoadOp::Clear |
Per-pass clear via load op |
| WebRender Pass | GL Operations | wgpu Pass Structure |
|---|---|---|
| Shadow pass | Depth-only FBO, depth writes |
RenderPass with depth attachment only; LoadOp::Clear depth |
| Opaque pass | Color + depth FBO, depth test, no blending |
RenderPass with color + depth; blend state disabled |
| Alpha pass | Color FBO, depth test, blending enabled, blend func per-batch |
RenderPass with color; blend state per-pipeline |
| Composite pass | Full-screen quad, texture sample from intermediate targets |
RenderPass targeting output texture; bind intermediate as input |
| Risk | Likelihood | Impact | Mitigation | Phase |
|---|---|---|---|---|
| wgpu version skew between Servo and egui_wgpu | Medium | High (blocks integration) | P0 audits this first; version unification patch if needed | P0 |
| Upstream Servo starts competing wgpu renderer work | Low | Medium (coordination overhead) | P1 monitors upstream; align rather than compete | P1 |
WebRender Device/Renderer too entangled for clean trait extraction |
Medium | High (P3 redesign) | P3 defines three fallback approaches (enum dispatch, partial extraction, parallel struct) | P3 |
| Shader translation via naga fails for custom WebRender blend modes | Medium | Medium (manual porting) | P4 allows manual WGSL ports for failing shaders | P4 |
| Windows DX12 external image import limitation | Medium | Low (copy fallback exists) | P7 implements copy-based fallback; P11 documents limitation | P7, P11 |
egui_wgpu::Renderer cannot accept externally-created device |
Low | High (ownership model breaks) | P8 validates this early; fallback: use egui_wgpu at lower level |
P8 |
| Frame budget regression on wgpu path | Medium | Medium (delays cutover) | P10 measures early; optimize before P12 | P10 |
| Upstream Servo API change breaks fork patch | Medium | Medium (rebase work) | Pin fork to specific Servo commit; rebase on schedule | P2 ongoing |
This matrix maps each research document QA criterion (ยง5) to the phase that validates it.
| QA Criterion | Research ยง | Validation Phase | Method |
|---|---|---|---|
| Static page pixel parity | ยง5.1 | P9 | Screenshot diff โค0.5% |
| Glyph rendering parity | ยง5.1 | P9 | Visual comparison |
| CSS compositing parity | ยง5.1 | P9 | Blend mode pixel diff |
| WebGPU canvas correctness | ยง5.1 | P9 | Canvas content pixel verification |
| Focus ring pass-order | ยง5.1 | P8, P9 | Pixel inspection: ring over content |
| GL state isolation (wgpu: structural) | ยง5.2 | P8 | Chaos mode: zero violations on wgpu |
| Servo wgpu render time | ยง5.3 | P10 | Timestamp queries โค4 ms/tile |
| Texture handoff latency | ยง5.3 | P10 | Measured โค0.5 ms |
| Resize handling | ยง5.3 | P10 | No frame drops |
| Device contention | ยง5.3 | P10 | No deadlock, no regression |
| GL fallback on capability miss | ยง5.4 | P11 | Probe returns false; Glow activates |
| Mixed-mode stability | ยง5.4 | P11 | No corruption with mixed paths |
| Capability probe accuracy | ยง5.4 | P12 | Probe matches actual availability |
| Replay diagnostics parity | ยง5.5 | P7, P8 | wgpu channels emit; inspector shows data |
Per research ยง6, upstreaming is staged after local evidence.
| Stage | Timing | Action | Evidence required |
|---|---|---|---|
| U0 Monitor | P1 onward | Watch Servo upstream for wgpu renderer activity | None |
| U1 Spike evidence | After P8 | Share integration spike results with Servo contributors | Screenshots, measurements, architecture description |
| U2 Trait extraction PR | After P3 proven stable | Propose backend trait extraction to Servo/WebRender | P3 exit criteria met; WebRender tests passing |
| U3 wgpu Device/Renderer PR | After P6 proven stable | Propose wgpu backend addition (feature-gated) | P6 exit criteria met; pixel parity initial evidence |
| U4 Embedding API PR | After P7 proven | Propose initialize_with_wgpu_device() embedding API |
P7 exit criteria met; shared device model validated |
| U5 Full upstream merge | After P12 | Retire local fork; consume upstream directly | All gates closed; upstream has equivalent capability |
Fallback: If upstream declines trait extraction (U2) or wgpu backend (U3), Graphshell
maintains the fork indefinitely. The [patch.crates-io] mechanism (P2) supports this
without architectural compromise. The C+F contract ensures the GL path remains valid
regardless of upstream decisions.
These are rough order-of-magnitude estimates. Actual effort depends on WebRender internals discovered during P3.
| Phase | Estimated effort | Confidence | Notes |
|---|---|---|---|
| P0 | 1โ2 days | High | Dependency analysis and documentation |
| P1 | 1โ2 days | High | Research and coordination |
| P2 | 2โ3 days | High | Fork setup and build verification |
| P3 | 2โ4 weeks | Medium | Largest refactoring risk; depends on WebRender coupling |
| P4 | 1โ2 weeks | Medium | Shader count and naga coverage determine effort |
| P5 | 2โ4 weeks | Low | Depends on P3 trait surface size |
| P6 | 2โ4 weeks | Low | Depends on pass complexity discovered in P3 |
| P7 | 1โ2 weeks | Medium | Servo embedding API change is the hard part |
| P8 | 1โ2 weeks | Medium | Integration plumbing |
| P9 | 3โ5 days | High | Mostly automated comparison |
| P10 | 3โ5 days | High | Instrumentation and measurement |
| P11 | 1โ2 weeks | Medium | Platform access and CI setup |
| P12 | 2โ3 days | High | Evidence assembly and policy change |
Total estimated range: 3โ5 months of focused work (one contributor).
Critical path: P3 (trait extraction) is the gating phase. If P3 proves tractable (โค2 weeks), the total estimate compresses. If P3 reveals deep entanglement requiring fallback approaches, the total estimate extends.
| Gate | Closing phase(s) | Evidence type |
|---|---|---|
| G1 (Dependency control) | P0, P2 | Version matrix, build proof, rollback proof |
| G2 (Backend contract parity) | P3, P4, P5, P6, P7 | Trait tests, shader parity, device/renderer compliance |
| G3 (Pass-contract safety) | P5, P6, P7, P8 | Pass-order tests, chaos mode, overlay affordance z-order |
| G4 (Platform confidence) | P8, P11 | Per-platform pass/fail evidence |
| G5 (Regression envelope) | P9, P10 | Pixel parity report, performance report |
This plan is a detailed execution of the C+F contract's F (Fallback-safe) leg:
- Every phase preserves the GL path as a working fallback
- The wgpu path is feature-gated and policy-controlled throughout
-
active_backend_content_bridge_policy()remainsGlowBaselineuntil P12 switch authorization
All feature work during plan execution must continue to follow the guardrails in
2026-03-01_webrender_readiness_gate_feature_guardrails.md:
- No new renderer-specific coupling in UI/workflow code
- Bridge metadata preservation for any render-path work
- Fallback-safe behavior for any new capability
- Receipt-linked evidence for migration-adjacent slices
- Primary tracker:
#183(backend migration) - Spike evidence target:
#180(runtime-viewer GLโwgpu bridge) - Related lanes:
#88(stabilization),#90(embedder-debt),#92(viewer-platform),#99(spec-code-parity) - Foundation issues:
#166(replay traces),#171(chaos mode),#167(differential composition),#168(GPU budget),#169(hot-swap),#170(telemetry schema)
| Date | Decision | Rationale |
|---|---|---|
| 2026-03-01 | Adopt parallel Device + Renderer wgpu implementation inside WebRender (research ยง2.3) |
Cleanest #180 closure; eliminates GL state chaos; aligns with upstream intent |
| 2026-03-01 | Graphshell owns wgpu::Device; Servo receives shared handle (research ยง3.5) |
Enables zero-copy texture handoff; aligns with egui_wgpu::Renderer constructor model |
| 2026-03-01 | Use naga for build-time GLSLโWGSL shader translation (research ยง3.4) | Zero runtime cost; naga is wgpu's native compiler; catches errors at build time |
| 2026-03-01 | Fork-and-patch for spike; upstream-first for stable changes (research ยง6.1) | Evidence-first approach; avoids premature upstream proposals |
| 2026-03-01 | GL path retained behind feature flag indefinitely during bring-up (C+F contract) | Fallback-safe; no forced migration until all gates close |
These remain open. Each is mapped to the phase that will resolve it.
| # | Question | Resolving phase |
|---|---|---|
| Q1 | Servo's wgpu version vs egui_wgpu's wgpu version compatibility | P0 |
| Q2 | Active upstream Servo/WebRender wgpu renderer work? | P1 |
| Q3 | Windows DX12 external image import support in wgpu? | P11 |
| Q4 | Can egui_wgpu::Renderer accept Graphshell-owned device? |
P0 (API check), P8 (integration proof) |
| Q5 | WebRender shader inventory: count, GLSL feature usage, naga translation coverage? | P4 |
| Q6 | Is Servo OffscreenRenderingContext extensible to return wgpu::Texture? |
P7 |