2026 04 20_graphshell_gpu_spec - mark-ik/graphshell GitHub Wiki
Date: 2026-04-20
Status: Skeleton spec / not yet actionable. Intended to be fleshed out
when trigger conditions fire (see §11). Until then this document captures the
intended shape so code written elsewhere doesn't foreclose design options.
Scope: Define the responsibilities, boundaries, core types, key design
decisions, and dependency topology for a future graphshell-gpu crate that
serves as the single GPU resource authority for all wgpu consumers in
Graphshell.
Related docs:
- `2026-04-16_middlenet_lane_architecture_spec.md` §4.6 – original sketch of `graphshell-gpu` responsibilities as "shared host plumbing"
- `../implementation_strategy/2026-04-20_middlenet_direct_lane_v1_5_plan.md` §2.6 – forward-compatible types (`FontHandle`, `ImageRef`, `OffscreenTarget`) reserved in `middlenet-render` pending this extraction
- `../research/2026-04-14_wasm_portable_renderer_feasibility.md` – WASM envelope GPU constraints (WebGPU vs WebGL2, async init, workers)
- `../research/2026-04-16_rendering_architecture_vision.md` – unifying render pipeline vision
- `design_docs/verso_docs/` – Verso as cross-engine dispatcher (consumer of `graphshell-gpu`, not a peer)
Graphshell has many wgpu consumers. Without a central resource authority, each independently creates devices, allocates glyph atlases, decodes and caches images, and makes frame-timing decisions. Consequences:
- Redundant GPU resources: duplicate glyph atlases for the same fonts, duplicate image decodes, duplicate pipeline state objects.
- No cross-renderer texture handoff: when Servo composites a webview into a texture and the graph canvas wants that texture as a node-face surface, there is no canonical contract; today it requires copying or bespoke plumbing.
- Uncoordinated frame pacing: each subsystem vsyncs on its own schedule; under load, subsystems can starve each other or cause visible tearing at handoff boundaries.
- Duplicated offscreen paint paths: previews, thumbnails, export surfaces each build their own offscreen infrastructure.
- Fragmented WASM story: each subsystem has its own "native vs browser" branching.
graphshell-gpu is the single GPU resource authority every wgpu consumer
borrows from. It owns device lifecycle, glyph/image caches, the offscreen
worker pool, frame scheduling, and cross-renderer texture handoff contracts.
It does not own scene-building logic, format decoding, or content semantics.
There SHALL be exactly one GpuHost per Graphshell process. All wgpu
consumers (WebRender fork, vello, iced-wgpu chrome, middlenet-render, Servo
compositor, burn/cubecl) obtain GPU resources from it. No consumer creates
its own wgpu::Instance or Adapter.
The rendering pipeline and ML compute pipeline MAY use separate wgpu::Device
instances so that long-running ML kernels do not block interactive paint
work. They SHOULD share the wgpu::Instance and wgpu::Adapter.
Glyph atlases, decoded images, and commonly-used pipeline objects SHALL live
in single canonical caches and be referenced by handle. FontHandle,
ImageRef, and similar handle types are the public currency of resource
sharing; raw wgpu::Texture and Buffer references SHOULD NOT leak across
consumer boundaries except through explicit handoff contracts (§6.3).
graphshell-gpu does not know about HTML, SemanticDocument, graph node
layout, ML tensors, or any content domain. It offers primitives; consumers
build scenes.
Native-only and browser-only differences SHALL be confined to initialization and worker-pool internals. The public surface consumers see is the same.
Owns:
- `wgpu::Instance`, `wgpu::Adapter`, `wgpu::Device`(s), `wgpu::Queue`(s) lifecycle
- Font registry (shaping engine + glyph atlas); `FontHandle` is opaque
- Image decode and cache; `ImageRef` is opaque
- Offscreen paint worker pool: scenes in, raster images out
- Frame scheduling / vsync coordination across render subsystems
- Cross-renderer texture handoff contracts: `TextureBridge`
- Surface lifecycle (window and headless): `SurfaceTarget`
- Capability reporting (features, limits, backend, WASM vs native)

Does not own:
- Scene-building logic: per-renderer (middlenet-render, WebRender, vello)
- Format decoding: `middlenet-formats`
- Content semantics: `middlenet-core`
- HTTP / transport: `middlenet-transport`, `graphshell-comms`
- Cross-engine dispatch: `verso`
- Shell/app state, workbench, navigator
- ML workload graphs: burn / cubecl own their own compute graphs; `graphshell-gpu` only gives them a device
All types below are sketches, not API commitments. Final shapes emerge during extraction.
```rust
pub struct GpuRenderContext {
    pub device: Arc<wgpu::Device>,
    pub queue: Arc<wgpu::Queue>,
}

pub struct GpuComputeContext {
    pub device: Arc<wgpu::Device>,
    pub queue: Arc<wgpu::Queue>,
}

pub struct GpuHost {
    pub instance: wgpu::Instance,
    pub adapter: wgpu::Adapter,
    pub render: GpuRenderContext,
    pub compute: Option<GpuComputeContext>,
    pub fonts: FontRegistry,
    pub images: ImageCache,
    pub offscreen: OffscreenPool,
    pub scheduler: FrameScheduler,
    pub bridge: TextureBridge,
    pub capabilities: HostCapabilities,
}
```

GpuHost is constructed once at app start (async on WASM, sync on native) and passed as `Arc<GpuHost>` everywhere.
Owns font loading (via parley / swash), shaping, and glyph atlas management.
```rust
pub struct FontHandle(u64);

impl FontRegistry {
    pub fn register(&self, spec: FontSpec) -> FontHandle;
    pub fn shape(&self, handle: FontHandle, text: &str, size: f32) -> ShapedRun;
    pub fn atlas(&self) -> &GlyphAtlas; // read-only for renderers
}
```

WebRender, vello, iced-wgpu, and middlenet-render all read from the same glyph atlas. The font collection is shared; re-shaping the same text at the same size in two renderers is a cache hit.
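The shared-atlas claim above implies a shaping cache keyed on font, text, and size. A minimal stdlib-only sketch of that cache-hit behavior (all names here are hypothetical; real shaping would go through parley):

```rust
use std::collections::HashMap;

// Placeholder for a run of positioned glyphs.
#[derive(Clone, PartialEq, Debug)]
pub struct ShapedRun(pub Vec<u32>);

pub struct ShapeCache {
    // Key: (font id, text, size quantized to 1/64 px) so float sizes hash stably.
    entries: HashMap<(u64, String, u32), ShapedRun>,
    pub hits: u64,
    pub misses: u64,
}

impl ShapeCache {
    pub fn new() -> Self {
        ShapeCache { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    pub fn shape(&mut self, font: u64, text: &str, size: f32) -> ShapedRun {
        let key = (font, text.to_owned(), (size * 64.0).round() as u32);
        if let Some(run) = self.entries.get(&key) {
            self.hits += 1;
            return run.clone(); // second renderer asking for the same run: cache hit
        }
        self.misses += 1;
        // Real shaping (parley) would happen here; stubbed with char codes.
        let run = ShapedRun(text.chars().map(|c| c as u32).collect());
        self.entries.insert(key, run.clone());
        run
    }
}
```

The 1/64-px quantization is an illustrative assumption, not a decided policy.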
```rust
pub struct ImageRef(u64);

impl ImageCache {
    pub fn decode_async(&self, source: ImageSource)
        -> impl Future<Output = Result<ImageRef, ImageError>>;
    pub fn texture(&self, handle: ImageRef) -> Option<Arc<wgpu::Texture>>;
    pub fn dimensions(&self, handle: ImageRef) -> Option<(u32, u32)>;
}
```

LRU eviction with pinning support. Images are decoded once and reused across renderers and across resolutions (mipmaps on demand).
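The decode-once guarantee can be sketched as content-addressed dedup; `ImageId` and `DecodeDedup` are hypothetical stand-ins for `ImageRef` and the cache internals:

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct ImageId(u64);

pub struct DecodeDedup {
    by_content: HashMap<u64, ImageId>, // content hash -> handle
    next_id: u64,
    pub decodes: u64,
}

impl DecodeDedup {
    pub fn new() -> Self {
        DecodeDedup { by_content: HashMap::new(), next_id: 0, decodes: 0 }
    }

    pub fn decode(&mut self, bytes: &[u8]) -> ImageId {
        let mut h = DefaultHasher::new();
        bytes.hash(&mut h);
        let key = h.finish();
        if let Some(&id) = self.by_content.get(&key) {
            return id; // identical source bytes: no second decode
        }
        self.decodes += 1; // real decode (image crate / createImageBitmap) goes here
        let id = ImageId(self.next_id);
        self.next_id += 1;
        self.by_content.insert(key, id);
        id
    }
}
```

A second consumer requesting the same bytes gets the same handle, which is what makes cross-renderer reuse free.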
```rust
pub trait OffscreenTarget {
    fn paint(&mut self, scene: &RenderScene) -> Result<(), OffscreenError>;
    fn into_image(self) -> Result<RasterImage, OffscreenError>;
}

impl OffscreenPool {
    pub fn submit(&self, job: OffscreenJob)
        -> impl Future<Output = Result<RasterImage, OffscreenError>>;
    pub fn cancel(&self, job_id: OffscreenJobId);
}
```

Worker pool sized to available cores on native; Web Workers + OffscreenCanvas on WASM. Prioritizable (§5.2): interactive previews beat background thumbnails.
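The "previews beat thumbnails" dispatch order might look like a small priority queue; a hedged stdlib sketch (tier names and FIFO-within-tier policy are assumptions, not decided API):

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

// Higher variant = dispatched first (derive uses declaration order).
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub enum JobPriority { Background, Interactive }

#[derive(PartialEq, Eq)]
struct Job { priority: JobPriority, seq: u64, id: u64 }

impl Ord for Job {
    fn cmp(&self, other: &Self) -> Ordering {
        self.priority.cmp(&other.priority)
            // Within a tier, lower seq (older submission) wins: FIFO.
            .then(other.seq.cmp(&self.seq))
    }
}
impl PartialOrd for Job {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> { Some(self.cmp(other)) }
}

pub struct JobQueue { heap: BinaryHeap<Job>, seq: u64 }

impl JobQueue {
    pub fn new() -> Self { JobQueue { heap: BinaryHeap::new(), seq: 0 } }
    pub fn submit(&mut self, id: u64, priority: JobPriority) {
        self.heap.push(Job { priority, seq: self.seq, id });
        self.seq += 1;
    }
    pub fn next(&mut self) -> Option<u64> { self.heap.pop().map(|j| j.id) }
}
```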
```rust
pub enum PaintPriority {
    UserInput,  // triggered by current user gesture
    Animation,  // continuous animation (scroll, physics)
    Background, // preview/thumbnail work
}

pub trait FramePainter {
    fn paint(&mut self, ctx: FrameContext);
    fn priority(&self) -> PaintPriority;
}

impl FrameScheduler {
    pub fn register(&self, painter: Arc<dyn FramePainter>);
    pub fn request_frame(&self, priority: PaintPriority);
    pub fn present(&self);
}
```

Priority-queue model (§5.2).
Cross-renderer texture handoff. Required so Servo's composited webview can become a node-face texture on the graph canvas without CPU round-trip.
```rust
pub trait TextureSource {
    fn texture(&self) -> Arc<wgpu::Texture>;
    fn dimensions(&self) -> (u32, u32);
    fn frame_counter(&self) -> u64; // for invalidation
}

impl TextureBridge {
    pub fn publish(&self, producer_id: ProducerId, source: Arc<dyn TextureSource>);
    pub fn subscribe(&self, producer_id: ProducerId) -> BridgeSubscription;
}
```

Zero-copy where the render device is shared; explicit cross-device copy when it isn't (§5.3).
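The `frame_counter` invalidation contract can be sketched with an atomic counter: the producer bumps it on present, and a subscriber re-samples only when the counter has advanced. Everything here is a hypothetical stand-in except that comparison:

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};

pub struct PublishedTexture {
    frame_counter: AtomicU64,
    // An Arc<wgpu::Texture> would live alongside this in the real bridge.
}

impl PublishedTexture {
    pub fn new() -> Arc<Self> {
        Arc::new(PublishedTexture { frame_counter: AtomicU64::new(0) })
    }
    /// Producer side: called after compositing a new frame.
    pub fn publish_frame(&self) {
        self.frame_counter.fetch_add(1, Ordering::Release);
    }
    pub fn frame_counter(&self) -> u64 {
        self.frame_counter.load(Ordering::Acquire)
    }
}

pub struct Subscription { source: Arc<PublishedTexture>, last_seen: u64 }

impl Subscription {
    pub fn new(source: Arc<PublishedTexture>) -> Self {
        Subscription { source, last_seen: 0 }
    }
    /// Returns true exactly when a new frame arrived since the last poll.
    pub fn poll_dirty(&mut self) -> bool {
        let now = self.source.frame_counter();
        let dirty = now != self.last_seen;
        self.last_seen = now;
        dirty
    }
}
```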
```rust
pub struct HostCapabilities {
    pub backend: wgpu::Backend,
    pub envelope: Envelope, // Native | Wasm
    pub features: wgpu::Features,
    pub limits: wgpu::Limits,
    pub has_compute_device: bool,
    pub max_texture_size: u32,
    pub supports_timestamp_queries: bool,
}
```

Surfaced so consumers can gate behavior (e.g., "this node wants compute; is a compute device available?").
Decision (leaning): share wgpu::Instance and wgpu::Adapter; provision
separate wgpu::Devices for rendering and compute so ML kernels don't
serialize behind interactive paints.
Why not one device? wgpu exposes one queue per device, so all rendering and all ML compute would serialize on a single queue; long burn kernels cause visible jank.
Why not many devices? Devices are not free. Cross-device texture sharing requires explicit export/import and extra synchronization. Keep the device count small: render (shared by WebRender/vello/iced/middlenet-render) + compute (shared by burn/cubecl) is the baseline. Additional isolated devices (e.g., per-Servo-webview) only if a specific reason emerges.
Open: Should per-webview Servo compositing get its own device for isolation? Tentatively no โ keep it on the render device, use the texture bridge for handoff.
Decision (leaning): subsystems register as FramePainters with a PaintPriority. Each vsync, the scheduler:
- drains any UserInput-priority paints first (must land this frame),
- runs Animation paints as time allows,
- yields remaining time to Background offscreen work.
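That drain order can be sketched as a pure function over a frame's time budget; the cost numbers and budget are illustrative, not spec'd:

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum PaintPriority { UserInput, Animation, Background }

pub struct PaintJob { pub priority: PaintPriority, pub cost_us: u64 }

/// Returns indices of jobs run this frame, given a budget in microseconds.
pub fn drain_frame(jobs: &[PaintJob], budget_us: u64) -> Vec<usize> {
    let mut spent = 0;
    let mut ran = Vec::new();
    for tier in [PaintPriority::UserInput, PaintPriority::Animation, PaintPriority::Background] {
        for (i, job) in jobs.iter().enumerate() {
            if job.priority != tier { continue; }
            // UserInput must land this frame even if it blows the budget.
            if tier != PaintPriority::UserInput && spent + job.cost_us > budget_us {
                continue;
            }
            spent += job.cost_us;
            ran.push(i);
        }
    }
    ran
}
```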
Why not free-running? Subsystems would starve each other under load.
Why not central beat? Adds latency to user-input paths when they should be first in line.
Open: Should background offscreen work live in OffscreenPool on workers
(off the vsync) entirely, with FrameScheduler only coordinating
on-screen paint? Probably yes โ cleaner separation.
Decision (leaning):
- When producer and consumer share the same `wgpu::Device`, hand off `Arc<wgpu::Texture>` directly. Zero-copy.
- When they don't (e.g., compute device produces, render device consumes), use explicit copy at the bridge; potentially via shared memory where the platform supports it.
- Never expose raw textures outside the bridge; always via `TextureSource` with a `frame_counter` so consumers can invalidate.
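The first two bullets reduce to a device-identity check at the bridge; a trivial hedged sketch, with `DeviceId` standing in for comparing `wgpu::Device` identities (e.g., by `Arc` pointer):

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct DeviceId(u64);

#[derive(PartialEq, Debug)]
pub enum Handoff {
    /// Same device: hand the Arc<wgpu::Texture> across directly.
    ZeroCopy,
    /// Different devices: explicit copy staged at the bridge.
    CrossDeviceCopy,
}

pub fn choose_handoff(producer: DeviceId, consumer: DeviceId) -> Handoff {
    if producer == consumer { Handoff::ZeroCopy } else { Handoff::CrossDeviceCopy }
}
```

The point of centralizing this check in the bridge is that consumers never need to know which case they are in.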
Why this matters: determines whether Graphshell can run 40 live webview nodes smoothly or chokes at 10. The first real use case will force specifics; design the contract now so it's not retrofitted.
Decision: public API identical across native and WASM. Divergence confined to:
- Initialization: `async fn GpuHost::new_wasm(canvas)` vs `fn GpuHost::new_native()`.
- `OffscreenPool` internals: Web Workers + OffscreenCanvas on WASM; OS threads on native.
- Image decode: browser-native `createImageBitmap` on WASM; `image` crate decoders on native.
Rationale: lets the same consumer code (middlenet-render, vello, etc.) run in both envelopes without cfg-heavy branching.
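The seam might look like cfg-gated constructors behind one type. A stubbed sketch, not the real init code (`backend_name` is a placeholder for the shared post-init surface):

```rust
pub struct GpuHost { backend: &'static str }

impl GpuHost {
    #[cfg(not(target_arch = "wasm32"))]
    pub fn new_native() -> GpuHost {
        // Native: block on wgpu's async adapter/device requests (e.g., via pollster).
        GpuHost { backend: "native" }
    }

    #[cfg(target_arch = "wasm32")]
    pub async fn new_wasm(/* canvas: web_sys::HtmlCanvasElement */) -> GpuHost {
        // WASM: must await adapter/device; blocking the main thread is not allowed.
        GpuHost { backend: "wasm" }
    }

    /// Shared surface: consumers never branch on envelope after init.
    pub fn backend_name(&self) -> &'static str { self.backend }
}
```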
Decision (recorded 2026-04-28): parley for shaping; swash for glyph rasterization on graphshell-gpu-owned paths. Atlas ownership is scoped per render path, not global:
- Direct Lane / vello and any graphshell-gpu-rendered content: graphshell-gpu owns the atlas, keyed by `(font_id, glyph_id, size_bucket, subpixel_pos)`.
- WebRender path (HTML Lane and any webrender-wgpu-rendered content): webrender-wgpu owns the atlas internally; graphshell-gpu does not duplicate it. See `../../../../webrender-wgpu/wr-wgpu-notes/2026-04-28_idiomatic_wgsl_pipeline_plan.md` §10 Q14.

WebRender does not shape (embedders submit pre-shaped glyph runs via its display-list API), so parley sits above webrender-wgpu in the HTML Lane stack (Stylo → Taffy → Parley → webrender-wgpu) without API conflict.
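One possible shape for that atlas key, with assumed (undecided) quantization policies for `size_bucket` and `subpixel_pos`:

```rust
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct AtlasKey {
    pub font_id: u32,
    pub glyph_id: u16,
    pub size_bucket: u16,
    pub subpixel_pos: u8, // subpixel x class, 0..4
}

/// Quantize px size into half-pixel buckets (illustrative policy only).
pub fn size_bucket(size_px: f32) -> u16 {
    (size_px * 2.0).round() as u16
}

/// Quantize fractional x position into quarter-pixel subpixel classes,
/// so glyphs at nearby offsets reuse the same rasterization.
pub fn subpixel_class(frac_x: f32) -> u8 {
    ((frac_x.rem_euclid(1.0)) * 4.0).floor() as u8 % 4
}
```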
Why not cosmic-text? Parley is where the Servo/vello ecosystem is converging. Cosmic-text is nice but diverges from Servo's direction.
Decision (leaning): size-bounded LRU with pinning.
- `ImageRef` is refcounted internally; hard references pin.
- Soft references (via a weaker handle type) are evictable.
- Pinning is how live webview thumbnails stay hot; LRU handles historical previews.
Open: Is a second tier of disk-backed cache warranted, or is reload from source fast enough? Probably yes for decoded images from network sources.
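The pin-aware eviction policy can be sketched as a small LRU where pinned entries are skipped at eviction time. A toy illustration, not the eventual `lru`/`moka` integration:

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

pub struct PinnedLru<K: Clone + Eq + Hash, V> {
    map: HashMap<K, (V, bool)>, // value, pinned flag
    order: VecDeque<K>,         // front = least recently used
    capacity: usize,
}

impl<K: Clone + Eq + Hash, V> PinnedLru<K, V> {
    pub fn new(capacity: usize) -> Self {
        PinnedLru { map: HashMap::new(), order: VecDeque::new(), capacity }
    }

    fn touch(&mut self, key: &K) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            let k = self.order.remove(pos).unwrap();
            self.order.push_back(k);
        }
    }

    pub fn insert(&mut self, key: K, value: V, pinned: bool) {
        if self.map.insert(key.clone(), (value, pinned)).is_none() {
            self.order.push_back(key);
        } else {
            self.touch(&key);
        }
        if self.map.len() > self.capacity {
            // Evict the least-recently-used UNPINNED entry, if any exists.
            if let Some(pos) = self.order.iter().position(|k| !self.map[k].1) {
                let victim = self.order.remove(pos).unwrap();
                self.map.remove(&victim);
            }
        }
    }

    pub fn get(&mut self, key: &K) -> Option<&V> {
        if self.map.contains_key(key) { self.touch(key); }
        self.map.get(key).map(|(v, _)| v)
    }
}
```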
- `wgpu`
- `parley` (text shaping)
- `swash` (glyph rasterization)
- `image` (native image decoding); feature-gated off in WASM
- `lru` or `moka` (caches)
- `futures` (async traits)
| Crate / component | Uses |
|---|---|
| WebRender fork | render Device/Queue, glyph atlas, image cache |
| vello integration | render Device/Queue, glyph atlas, image cache |
| iced-wgpu chrome | render Device/Queue, fonts |
| middlenet-render | FontHandle, ImageRef, OffscreenTarget, glyph atlas for direct scene painting |
| Servo compositor (in Verso) | render Device, TextureBridge (publishes webview textures) |
| verso | reads HostCapabilities for engine selection |
| Graph canvas backend | render Device, glyph atlas, image cache, TextureBridge (consumes webview textures for node faces) |
| burn / cubecl | compute Device only; not the full host |
- `graphshell-gpu` MUST NOT depend on any Graphshell content or app crate (no `middlenet-*`, no `verso-*`, no shell, no workbench).
- `graphshell-gpu` MUST NOT depend on format decoders or parsers.
- Consumers MUST NOT bypass `graphshell-gpu` to create their own devices.
- `TextureBridge` is the only sanctioned path for cross-renderer texture sharing.
WebRender currently assumes device ownership. Extracting
graphshell-gpu requires a fork-side refactor: WebRender takes an external
Arc<Device>, shares the glyph atlas with graphshell-gpu, and publishes
its composited output through TextureBridge where applicable.
This is the largest single piece of work in the extraction; it dominates the timeline.
Vello already accepts external devices. Integration is small: construct
vello's Renderer from host.render.device.clone() and wire its text path
through FontRegistry.
iced-wgpu has its own setup path but accepts external devices in some configurations. Integration is moderate โ mostly about sharing the font registry so iced's chrome and document content don't maintain parallel glyph atlases.
In v1.5 this crate owns FontHandle, ImageRef, OffscreenTarget as
opaque placeholders. On graphshell-gpu extraction these types move to
graphshell-gpu and middlenet-render re-exports or depends on them.
This is the cheapest migration of any consumer โ the whole point of
reserving those types in v1.5.
Servo publishes composited webview output via TextureBridge::publish.
Graph canvas subscribes to render those textures as node faces. This is the
primary use case that forces TextureBridge's design.
verso is a content-dispatch layer, not a GPU layer. It reads
HostCapabilities to decide engine routing (e.g., "Servo lane requires
compute device for WebGPU content" โ fall back to Wry if absent). It does
not own any GPU resources.
ML crates get the compute Device only. They do not participate in frame
scheduling. The compute device and render device may share the adapter but
not the queue; ML work happens in parallel with paint.
GPU errors (device loss, out-of-memory, unsupported feature) surface through
typed errors: DeviceLost, AllocationFailed, FeatureUnsupported. Device
loss triggers a recreation protocol: GpuHost rebuilds, all consumers
reinitialize via a HostRebuildEvent.
graphshell-gpu MUST expose a diagnostics channel emitting: device memory
pressure, cache hit rates, offscreen queue depth, frame scheduler decisions,
texture bridge subscription counts. Plumbs into Graphshell's existing
DiagnosticChannelDescriptor system (see shell guidelines).
- Unit tests for `FontRegistry` shaping cache, `ImageCache` LRU, frame scheduler priority math.
- Integration tests with a headless wgpu backend verifying device sharing and texture bridge contracts.
- WASM envelope tests deferred until the WASM envelope is building.
The crate SHOULD hold GPU work submission under:
- Frame submission latency: p99 < 2ms from `request_frame` to queue submit (native).
- Glyph shaping cache hit: p99 < 100μs.
- Image decode (cached): p99 < 50μs.
- Offscreen paint start latency: p99 < 10ms.
Concrete numbers land when real measurements exist; these are placeholders.
In browser WASM:
- `wgpu::Instance` maps to WebGPU (preferred) or WebGL2 (fallback).
- `GpuHost::new_wasm(canvas)` is async; it awaits the adapter and device.
- No threads on the default wasm32-unknown-unknown profile: `OffscreenPool` uses Web Workers with OffscreenCanvas, and image decode uses browser APIs via wasm-bindgen.
- `FrameScheduler` ties to `requestAnimationFrame` instead of OS vsync.
- `TextureBridge` zero-copy works on WebGPU (shared `GPUTexture`); on the WebGL2 fallback it degrades to explicit copy.
- All of this is confined to `graphshell-gpu`; consumers see the same API.
- Specific scene-building logic for any renderer โ stays in that renderer's crate.
- Format decoding: `middlenet-formats`.
- Content semantics: `middlenet-core`.
- Cross-engine content dispatch: `verso`.
- ML workload graphs: burn / cubecl own these.
- HTTP / transport: `graphshell-comms`, `middlenet-transport`.
- Shell / workbench / navigator state: stays in shell crates.
- Shader authoring IR: the rust-gpu → SPIR-V → naga pipeline is its own concern; `graphshell-gpu` consumes compiled shaders, it doesn't own the pipeline that produces them.
This skeleton becomes an actionable plan when any of the following fire:
- Cross-surface drift observed. In v1.5 Step 6, observation cards, hover previews, search snippets, and feed tiles each maintain independent font/image state AND measurable duplication emerges (VRAM, CPU decode cost).
- WebRender fork accepts external device. The fork stabilizes enough that taking an `Arc<wgpu::Device>` from outside is a small patch rather than a fight.
- First cross-renderer texture handoff use case lands. Most likely Servo webview → graph node face. This use case forces the `TextureBridge` design decisions to concrete specifics.
- ML workload contention. Burn kernels start visibly blocking paint, making the render/compute device split non-optional.
- WASM envelope work starts. Needing a single "native vs browser" seam for GPU resources rather than per-consumer branching.
Until then, v1.5's forward-compatible types (FontHandle, ImageRef,
OffscreenTarget in middlenet-render) carry enough design intent to keep
the option open without premature extraction.
- Per-webview Servo device? Tentatively no (use render device + texture bridge). Revisit if Servo's compositor contention with other renderers is observable.
- Disk-backed image cache tier? Probably yes for network-sourced images; deferred decision.
- Diagnostic surface depth. How much GPU introspection does the shell expose to users vs reserve for development-only diagnostics?
- Feature negotiation failure. What does `graphshell-gpu` do when an adapter lacks features a consumer requires? Degrade the consumer (middlenet-render drops backdrop-filter), refuse to init, or warn and continue?
- Threading model for offscreen pool on native. Rayon, tokio blocking pool, custom? Probably tokio for consistency with transport.
- Pipeline cache. Does `graphshell-gpu` own a cross-renderer pipeline cache, or does each renderer manage its own? Leaning toward per-renderer since pipelines are renderer-specific.
v1.5 explicitly does not extract graphshell-gpu. What v1.5 does:
- Reserves `FontHandle`, `ImageRef`, `OffscreenTarget` as opaque types in `middlenet-render`.
- Routes all middlenet-render surface reuse through these types.
- Keeps the integration surface small enough that extraction later is a move-and-re-export, not a rewrite.
This spec exists so that decisions made during v1.5 don't accidentally close off options this extraction will need, especially the device-topology question (§5.1) and the texture-handoff contract (§5.3).
- `2026-XX-XX_graphshell_gpu_extraction_plan.md`: ordered extraction milestones (concrete crate split, WebRender fork refactor, consumer migration sequence).
- `2026-XX-XX_texture_bridge_contract_spec.md`: detailed texture handoff semantics once the first real use case lands.
- `2026-XX-XX_gpu_diagnostics_spec.md`: diagnostics surface design.
- WASM envelope enablement plan (depends on WASM-target work in other subsystems).