Graphshell: Incomplete Validation Tests

Purpose: Collect headed-manual and extended automated validation items that have not yet been executed or verified. Extracted from archived plan docs and ongoing plans so completed plans can be closed cleanly.

Note: Automated unit tests belong in their respective plan or source files. This file collects validation requiring headed execution or an extended integration harness.

Diagnostic Tools (New)

Many validation items below should now be verified live using the Diagnostic Inspector.

Toggle: Ctrl+Shift+D or F12, or via Command Palette Open Diagnostic Pane.
Intents Tab: Verify intent emission, ordering, and LifecycleCause.
Compositor Tab: Verify tile hierarchy, active rects, and webview mapping status (crucial for layout/rendering bugs).
Engine Tab: Monitor channel latency and throughput during performance tests.

Automation Strategy: The diagnostic system exposes structured state (DiagnosticsState) that can be queried in integration tests.

State Snapshots: Use snapshot_json_value() to assert on the full system state (tile hierarchy, active intents) without screen scraping.
Event Stream: Subscribe to the global diagnostic channel to assert on event ordering (e.g., url_changed before history_changed).

Automated validation run (2026-02-22):

Accessibility Subsystem: Phase 1 WebView Bridge Validation (P10.d)

Source: implementation_strategy/SUBSYSTEM_ACCESSIBILITY.md (§6.1, §8 Phase 1)

Scope guard (Phase 1 only):

Validate the WebView accessibility bridge baseline (notify_accessibility_tree_update -> bridge injection).
Do not include Graph Reader/Map/Room mode validation in this slice.

Automated harness check

Run these as the P10.d baseline harness set:

cargo test --lib shell::desktop::ui::gui::accessibility_bridge_tests::inject_webview_a11y_updates_drains_pending_map -- --exact
- Result: pass (1/1); confirms pending WebView accessibility updates are consumed each frame and do not accumulate.
cargo test --lib shell::desktop::ui::gui::accessibility_bridge_tests::webview_a11y_graft_plan_includes_injectable_nodes_and_root -- --exact
- Result: pass (1/1); confirms bridge planning yields an injectable node set + stable root in the supported baseline scenario.
cargo test --lib shell::desktop::ui::gui::accessibility_bridge_tests::webview_a11y_graft_plan_tracks_role_conversion_fallbacks -- --exact
- Result: pass (1/1); confirms degraded conversion fallback remains explicit/observable when role compatibility falls back.
cargo check -q --lib
- Result: pass (warnings only); confirms bridge validation/harness code compiles in library path.

Manual handoff checklist (repeatable)

This checklist is the required cross-contributor handoff script for P10.c/P10.d split work.

Preconditions

Build/run Graphshell with normal desktop runtime (cargo run -p graphshell --bin graphshell -- -M https://example.com).
Ensure one screen reader is active on platform under test:
- Windows: NVDA
- Linux: Orca
- macOS: VoiceOver (if available for contributor)
Open at least one node pane with live embedded web content.

Checks

Bridge baseline readability
- Action: move accessibility focus into the active embedded web content region.
- Expected: screen reader announces an embedded web content/document region (derived from focused or labeled nodes when present).
Focused-label preference
- Action: navigate to a clearly labeled element (e.g., heading/link) in embedded content.
- Expected: announcement prefers focused element label over generic fallback wording.
Frame-to-frame continuity
- Action: trigger content updates (scroll/navigation within the same pane) and continue reading.
- Expected: reading remains stable across updates; no repeated loss of document context.
Degraded fallback remains announced
- Action: open sparse/minimal content where explicit labels may be absent.
- Expected: reader still announces a usable embedded web-content fallback label rather than silence.

Handoff evidence template

Include the following in the issue/PR handoff comment:

Platform + screen reader used.
URLs/content types tested.
Pass/fail per checklist item above.
Any observed degraded behavior with concise reproduction notes.
Automated harness command output summary (the four commands above).

Performance Baseline: Viewport Culling Validation + Benchmark (P10.b)

Source: implementation_strategy/2026-02-24_performance_tuning_plan.md (Phase 1 + Validation)

Scope guard (P10.b only):

Validate culling effectiveness and benchmark instrumentation.
Do not change culling policy semantics in this slice.

Automated harness checks

cargo test -q viewport_culling_metrics_reduce_visible_set_and_submission_units -- --nocapture
- Result: pass (1/1); confirms visible-set/submitted-set shrink relative to full graph and reduced submission work units.
cargo test -q viewport_culling_benchmark_reports_lower_prep_time_than_full_rebuild -- --nocapture
- Result: pass (1/1); confirms benchmark instrumentation reports lower rebuild-prep time for culled graph vs full graph on dense synthetic fixture.

Manual benchmark run notes (before/after comparability)

Run Graphshell against the same dense graph fixture and camera framing for each comparison run.
Keep diagnostics sampling bounded (default diagnostics cadence in the Diagnostic Inspector) to avoid observer-effect distortion.
Capture both runs using this pair:
- Baseline: viewport culling disabled for the test window.
- Candidate: viewport culling enabled with identical fixture/camera path.
Capture and compare:
- visible/submitted node counts from culling metrics path,
- prep/rebuild timing trend (full vs culled path),
- frame-time trend from diagnostics view over the same interaction script.

Before/after capture template (required for handoff)

Use this exact structure in handoff comments for cross-contributor comparability:

Fixture ID / node+edge count:
Camera framing / zoom / pan seed:
Interaction script duration:
Baseline (culling=off):
- visible/submitted nodes:
- full/culled submission units:
- prep timing summary:
- frame-time summary:
Candidate (culling=on):
- visible/submitted nodes:
- full/culled submission units:
- prep timing summary:
- frame-time summary:
Delta summary:
- visible-set reduction direction (expected: down)
- submission-unit reduction direction (expected: down)
- frame-time impact direction (expected: down or neutral)

Evidence to capture in handoff comment

Fixture size and viewport framing used.
Commands + pass/fail for automated checks.
Full vs culled timing summary and observed frame-time delta direction.

Frame Routing and Membership (Headed Manual)

Source: implementation_strategy/2026-02-22_workbench_workspace_manifest_persistence_plan.md + implementation_strategy/2026-02-22_workbench_tab_semantics_overlay_and_promotion_plan.md

Context: Items 7, 10, and 11 now have automated coverage. The remaining routing/membership checks require headed-manual validation.

Start command (PowerShell):

$env:RUST_LOG="graphshell=debug"; cargo run -p graphshell --bin graphshell -- -M https://example.com

Baseline setup (once):

Action: right-click node E -> Frame -> Add To Frame... and pick workspace-alpha.
Expected: workspace-alpha snapshot now includes E as a tab on next restore/load, and E frame badge count increments accordingly.
Run note (2026-02-19): Passed; wording/label clarity for frame-route action remains a UX follow-up.

Unified context-aware pin control

Action A: with graph/frame focus (no active pane focus), use Persistence Hub Pin Frame; mutate layout and verify highlight toggles off; restore using Load Pin... -> Frame Pin and verify previous layout restores.
Action B: with an active pane focus, use Persistence Hub Pin Pane; mutate layout and verify highlight toggles off; restore using Load Pin... -> Pane Pin and verify previous layout restores.
Expected: single pin control adapts to focus context (Frame vs Pane) and renders active state only when current layout matches saved pin snapshot.
Run note (2026-02-19): Workspace pin flow passed. Pane pin restore semantics remain unclear in graph-only context, and pane pin persistence/visibility after view switching needs follow-up.

Graph UX Polish: Phase 1.1 / 1.2 (Headed Manual)

Source: implementation_strategy/2026-02-23_graph_interaction_consistency_plan.md + implementation_strategy/2026-02-20_edge_traversal_impl_plan.md

Context: Automated coverage exists for input/intent/app semantics. Headed-manual validation is required for end-to-end viewport behavior and interaction feel.

Keyboard model update:

C keyboard fit shortcut is retired.
Z is the single smart-fit key:
- 2+ selected nodes: fit selected bounds.
- 0 or 1 selected node: fit full graph.

Keyboard zoom in/out/reset
- Action: in graph view (no text field focused), press + a few times, - a few times, then 0.
- Expected: zoom increases, decreases, then resets to 1.0x in overlay.
Keyboard zoom blocked during text entry
- Action: focus URL bar (Ctrl+L), then press +, -, 0, Z.
- Expected: graph zoom/fit does not trigger while text input owns keyboard focus.
Smart-fit: zero selection
- Action: clear selection and press Z.
- Expected: full-graph fit.
Smart-fit: single selection
- Action: select exactly one node and press Z.
- Expected: full-graph fit (not single-node framing).
Smart-fit: multi-selection
- Action: select two or more distant nodes and press Z.
- Expected: viewport fits selected-node AABB with padding.
Wheel/trackpad zoom feel
- Action: use Ctrl+wheel or Ctrl+trackpad pinch/scroll.
- Expected: zoom responsiveness feels slightly faster than previous 0.01 setting and remains controllable.
Zoom bounds
- Action: repeatedly zoom out, then repeatedly zoom in.
- Expected: zoom clamps at configured bounds; no runaway scale.
Interaction regression sweep
- Action: after zoom/fit operations, drag/select nodes and open nodes from graph.
- Expected: graph interactions remain stable (no stuck input state).

Graph UX Polish: Carry-Over Manual Checks (Current)

Source: Active UX polish + follow-up implementation sessions

Context: These checks were identified during rapid implementation and need explicit headed verification before closing the UX-polish tranche.

Wheel vs trackpad tuning target
- Action: validate Ctrl+wheel and precision trackpad pinch/scroll on at least one wheel mouse and one trackpad.
- Expected: zoom feels controllable on both devices with no over/under-shoot bias.
Tab-click warm -> active routing
- Action: from workspace tabs, click a warm node tab and then navigate.
- Expected: node reliably promotes to active and page loads in the expected pane context.
Lasso reliability under dense node overlap
- Action: generate a dense graph cluster and perform repeated right-drag lasso sweeps.
- Expected: lasso selection set is deterministic and matches visual rectangle containment.
Omnibar non-@ ordering policy
- Action: test non-@ query flow for: local tabs present, connected matches present, provider fallback, and global fallback.
- Expected: ordering follows configured settings (scope + non-@ order preset) and non-local caps are respected.
Omnibar provider failure-state visibility
- Action: disable network or force provider error responses while typing non-@ and @g/@b/@d queries.
- Expected: dropdown surfaces clear provider status (loading / error) without blocking input or crashing.

Warning/Stub Revival Watchlist

Source: cargo check -p graphshell --lib warning sweep + code scan (2026-02-20)

Context: Keep these visible so transitional code paths do not silently rot. Prioritize items that indicate dormant feature producers/consumers and integration seams that may need revival.

Test-only helpers leaking into non-test builds
- Files: ports/graphshell/desktop/gui.rs, ports/graphshell/desktop/persistence_ops.rs
- Signal: unused import/function warnings for test wrappers and helpers.
- Validation expectation: move wrappers/imports behind #[cfg(test)] and confirm no behavior change.
Dormant intent producers vs handled intent variants
- File: ports/graphshell/app.rs
- Variants: CreateUserGroupedEdgeFromPrimarySelection, WebViewScrollChanged, SetNodeFormDraft
- Signal: variants are handled but currently not constructed in lib build.
- Validation expectation: either wire active producers in runtime flows or explicitly mark as deferred bridge paths in plan/docs.
Potentially stale GUI bridge helpers
- File: ports/graphshell/desktop/gui.rs
- Symbols: active_tile_webview_id, semantic-event conversion helper wrappers.
- Signal: dead-code warnings suggest old integration seams.
- Validation expectation: remove if obsolete, or reattach to intended flow and verify usage under headed run.
Scaffold marker still present in tile path
- File: ports/graphshell/desktop/tile_kind.rs
- Signal: explicit scaffold/dead-code allowance comment.
- Validation expectation: confirm this is still intentional in current phase, or retire the allowance when tile flow is fully active.
Graph model dead fields/methods that may reflect deferred features
- Files: ports/graphshell/graph/mod.rs, ports/graphshell/graph/egui_adapter.rs
- Symbols: Node.velocity, get_node_by_id, has_edge_between, EguiGraphState::from_graph.
- Signal: dead-code warnings; may indicate partially implemented or superseded paths.
- Validation expectation: decide revive/remove/annotate; if revive, add explicit validation scenarios tied to the owning plan.
Recovery TODO/FIXME that can impact UX and correctness
- Files: ports/graphshell/desktop/dialog.rs, ports/graphshell/desktop/headed_window.rs, ports/graphshell/panic_hook.rs, ports/graphshell/lib.rs, ports/graphshell/prefs.rs
- Signal: active TODO/FIXME markers in runtime code.
- Validation expectation: each marker either receives a tracked owner/plan link or a scoped deferral note with revalidation trigger.

Navigation: Back/Forward Delegate Event Ordering

Method: Use Diagnostic Inspector > Intents Tab to observe WebView* intents in real-time. Automation Status: Ready. Strategy: Capture DiagnosticEvent stream during navigation. Assert url_changed vs history_changed ordering and payload content programmatically.

Finding: During back/forward navigation, Servo fires url_changed before history_changed reflects the new stack position — and the URL in url_changed is the source URL, not the destination. Trace evidence (all events at t_ms=131, same frame):

seq=9:  url_changed → ?step=2           (pushState arrives)
seq=10: history_changed len=3 current=2 (stack confirmed at step=2)
seq=11: url_changed → ?step=2           ← spurious? fires before back completes
seq=12: history_changed len=3 current=1 ← back navigation; now at step=1, but url above said step=2
seq=13: url_changed → ?step=2           ← fires before forward completes
seq=14: history_changed len=3 current=2 ← forward; back at step=2

Going back from ?step=2 to ?step=1: url_changed(?step=2) fires first, then history_changed(current=1). sync_webviews_to_graph() reads url_changed to detect new navigations and create nodes/edges — it will see ?step=2 as an apparent navigation event even though the back transition is to ?step=1.

Risk: spurious node creation or misclassified edge type (Hyperlink instead of History) for back/forward transitions that emit url_changed for a URL that already exists.

No spurious node on back navigation: browse to A → B → C, go back to B — confirm only nodes A, B, C exist (no duplicate C created during the back transition's url_changed).
Back/forward edge type: back/forward transitions are classified as History edges, not Hyperlink edges, even when url_changed fires with the source URL before history_changed.
Burst scenario — node count invariant: run scenario_back_forward_burst.html (pushState ×3, back ×1, forward ×1) — confirm final node count is 4 (base + step=0,1,2), not 5 or 6 from spurious url_changed events.
Burst scenario — no duplicate edges: same run — confirm no duplicate History or Hyperlink edges exist between the burst nodes after the sequence completes.

F1: Multi-Pane Grouping Behavior

Source: archive_docs/checkpoint_2026-02-19/2026-02-17_f1_multi_pane_validation_checklist.md

Context: F1 automated tests pass; headed split/grouping trigger validation not yet recorded.

Validation Aid: Use Diagnostic Inspector > Compositor Tab to verify tile hierarchy structure. Automation Status: Ready. Strategy: Inspect DiagnosticsState.compositor_state.frames after action. Assert hierarchy contains expected split/tab structure.

Split trigger creates edge
- Select node A, then Shift+Double-click node B.
- Expected: exactly one UserGrouped edge A→B is created.
Drag-into-same-tab-group trigger
- Open nodes A and B in separate detail panes.
- Drag one pane into the same Tabs container as the other.
- Expected: exactly one UserGrouped edge created (no duplicates).
Explicit group-with-focused action
- Focus node A (pane A active).
- Run the "Group with Focused" command on node B (command palette or G key).
- Expected: exactly one UserGrouped edge A→B created.
No-trigger paths
- Switch pane focus, switch tabs, navigate URLs normally.
- Expected: no new UserGrouped edges created from those actions alone.

Step 4d: @ Omnibar Scope Validation

Source: 2026-02-18_edge_operations_and_cmd_palette_plan.md (active plan)

Context: Scope implementation complete; headed validation not yet executed. Automation Status: Ready. Strategy: Inspect DiagnosticsState.intents to verify OpenNodeFrameRouted vs CreateNodeAtUrl intent emission with correct scope.

F6: EGL Explicit Targeting — Extended Tests

Source: archive_docs/checkpoint_2026-02-19/2026-02-18_f6_explicit_targeting_plan.md

Context: Exit criteria met (two focused unit tests pass). Optional extended harness tests not yet written.

End-to-end wrapper dispatch: call load_uri_for_webview, go_back_for_webview, and at least two input _for_webview methods from a simulated host-caller harness. Verify: correct webview receives each command, no fallback warning logged, other active webview unaffected.
Multi-webview isolation: with two active webviews, route input events to each via _for_webview. Confirm no cross-contamination between webviews.
Fallback warning rate-limit: configure preferred_input_webview_id to return None. Confirm deprecation warning emitted once and rate-limited on repeated calls within the cooldown window.

Undo/Redo Validation

Source: Spec in 2026-02-18_edge_operations_and_cmd_palette_plan.md §Global Undo/Redo Boundary

Context: Undo/redo is functional but has no dedicated plan doc, no feature entry in IMPLEMENTATION_ROADMAP, and no recorded validation against the grouping/exclusion rules in the spec. A feature entry and these tests should be added when the roadmap is next updated.

Single graph mutation: add a node → undo → node is removed; redo → node reappears.
Multi-intent command as one undo step: run ConnectBothDirections (creates two edges) → undo → both edges removed in one step (not two separate undos).
Workspace/graph restore is atomic: load a named workspace snapshot → undo → prior workspace layout is restored as one transaction.
Webview navigation is not undoable: navigate a webview to a new URL via link click → undo → URL change is NOT reverted (only explicit user commands are undoable).
Physics simulation is not undoable: run physics until nodes settle → undo → node positions are NOT reverted; only explicit layout commands (e.g. Fit) are undoable.
Undo survives mode switch: perform an action in graph mode → switch to detail mode → undo → action is reversed (undo stack persists across view modes).
Persistence parity after undo/redo cycle: create UserGrouped edge → undo removes it → redo re-adds it → restart app → edge state reflects the final post-redo state in the persisted log.

Navigation Control-Plane Regression Sweep

Source: archive_docs/checkpoint_2026-02-16/2026-02-15_navigation_control_plane_plan.md (§Required Validation Artifacts)

Context: Archived plan called for deterministic integration validation around targeted navigation. These checks remain relevant for current omnibar + tile-focused routing behavior.

Validation Aid: Use Diagnostic Inspector > Intents Tab to confirm OpenNodeFrameRouted vs CreateNodeAtUrl intent emission. Automation Status: Ready. Strategy: Assert DiagnosticsState.intents contains expected intent variants with correct target keys.

Omnibar URL submit in graph mode: submit a URL from graph mode and verify deterministic target behavior (new/opened node, correct selection/focus outcome).
Omnibar URL submit in detail mode: with two panes, focus pane A then pane B and submit distinct URLs; verify each submit targets the focused pane only.
@query no-match feedback: enter a query with no matches and verify UI feedback is explicit, stable, and does not trigger unintended navigation.
Back/Forward/Reload target isolation: with distinct histories in two panes, invoke controls and verify only the focused pane is affected.
Mode and focus regression pass: switch graph/detail modes, switch active tiles/tabs, and repeat submit/navigation actions; verify no stale target leakage.

Physics and Selection Behavior (Legacy Carry-Over)

Source: archive_docs/checkpoint_2026-02-16/2026-02-12_physics_selection_plan.md (§Validation Tests)

Context: These behaviors still exist and are user-visible; archived validation list remains useful as manual/integration guardrails.

Physics toggle responsiveness: T key reliably pauses/resumes layout updates with no stuck intermediate state.
Physics panel parameter effect: changing force parameters in panel produces observable layout response without instability/panic.
Pinned-node invariance under motion: pinned nodes remain fixed while surrounding unpinned nodes continue to move.
Selection durability through interactions: selection state remains coherent across drag, focus changes, and node delete operations.
Mid-size convergence sanity: graph with ~50 nodes reaches a visually stable layout in a reasonable interval on baseline hardware.

VALIDATION_TESTING - mark-ik/graphshell GitHub Wiki

Graphshell: Incomplete Validation Tests

Diagnostic Tools (New)

Accessibility Subsystem: Phase 1 WebView Bridge Validation (P10.d)

Automated harness check

Manual handoff checklist (repeatable)

Handoff evidence template

Performance Baseline: Viewport Culling Validation + Benchmark (P10.b)

Automated harness checks

Manual benchmark run notes (before/after comparability)

Before/after capture template (required for handoff)

Frame Routing and Membership (Headed Manual)

Graph UX Polish: Phase 1.1 / 1.2 (Headed Manual)

Graph UX Polish: Carry-Over Manual Checks (Current)

Warning/Stub Revival Watchlist

Navigation: Back/Forward Delegate Event Ordering

F1: Multi-Pane Grouping Behavior

Step 4d: @ Omnibar Scope Validation

F6: EGL Explicit Targeting — Extended Tests

Undo/Redo Validation

Navigation Control-Plane Regression Sweep

Physics and Selection Behavior (Legacy Carry-Over)

⚠️ GitHub.com Fallback ⚠️

VALIDATION_TESTING - mark-ik/graphshell GitHub Wiki

Graphshell: Incomplete Validation Tests

Diagnostic Tools (New)

Accessibility Subsystem: Phase 1 WebView Bridge Validation (P10.d)

Automated harness check

Manual handoff checklist (repeatable)

Handoff evidence template

Performance Baseline: Viewport Culling Validation + Benchmark (P10.b)

Automated harness checks

Manual benchmark run notes (before/after comparability)

Before/after capture template (required for handoff)

Frame Routing and Membership (Headed Manual)

Graph UX Polish: Phase 1.1 / 1.2 (Headed Manual)

Graph UX Polish: Carry-Over Manual Checks (Current)

Warning/Stub Revival Watchlist

Navigation: Back/Forward Delegate Event Ordering

F1: Multi-Pane Grouping Behavior

Step 4d: @ Omnibar Scope Validation

F6: EGL Explicit Targeting — Extended Tests

Undo/Redo Validation

Navigation Control-Plane Regression Sweep

Physics and Selection Behavior (Legacy Carry-Over)

⚠️ **GitHub.com Fallback** ⚠️

⚠️ GitHub.com Fallback ⚠️