2026 02 22_test_harness_consolidation_plan - mark-ik/graphshell GitHub Wiki
Status: In Progress
Goal: Consolidate integration tests into a unified harness driven by the Diagnostic System, enabling "Observability-Driven Testing" and high automation coverage.
Terminology note (2026-02-26): The struct this document calls TestHarness has been renamed to TestRegistry in code and canonical docs. The name TestHarness now refers to the planned in-pane runner (feature-gated, background execution, panic isolation). Read TestRegistry wherever this document refers to the cargo test fixture struct. See SUBSYSTEM_DIAGNOSTICS.md §4.
Completed in this checkpoint:
- Migrated persistence switching scenario to harness:
  - switch_persistence_dir_reloads_graph_state -> shell/desktop/tests/scenarios/persistence.rs
- Migrated one preference persistence scenario to harness:
  - set_toast_anchor_preference_persists_across_restart -> shell/desktop/tests/scenarios/persistence.rs
- Added first grouping-intent scenario coverage:
  - create_user_grouped_edge_from_primary_selection_creates_grouped_edge -> shell/desktop/tests/scenarios/grouping.rs
- Added semantic tagging scenario coverage:
  - set_node_pinned_intent_syncs_pin_tag -> shell/desktop/tests/scenarios/tags.rs
  - tag_node_pin_updates_pinned_state -> shell/desktop/tests/scenarios/tags.rs
- Removed migrated duplicate tests from app.rs in same slice.
Validation evidence for this checkpoint:
- cargo test shell::desktop::tests::scenarios::persistence:: -- --nocapture (pass, 12 tests)
- cargo test shell::desktop::tests::scenarios::grouping:: -- --nocapture (pass, 1 test)
- cargo test shell::desktop::tests::scenarios::tags:: -- --nocapture (pass, 2 tests)
- cargo test shell::desktop::tests::scenarios::registries -- --nocapture (pass)
- cargo check (pass)
- Full scenario matrix: 49 tests passing
Currently, tests are scattered across modules (gui_tests.rs, persistence_ops.rs, app.rs). Many rely on internal visibility or fragile state checks.
We have introduced a robust Diagnostic System (DiagnosticsState, DiagnosticEvent) that exposes the system's internal topology and state as structured data.
We will leverage this to build a unified TestHarness that treats the app as a black box (mostly) and asserts on diagnostic signals.
A new top-level module (or shell/desktop/tests/) to house the harness and scenarios.
- harness.rs: Wraps GraphBrowserApp + Gui (headless) + DiagnosticsState.
- scenarios/: Submodules for specific feature areas (e.g., routing.rs, layout.rs).
pub struct TestHarness {
    pub app: GraphBrowserApp,
    pub gui: Gui, // Headless/Test configuration
    pub events: Receiver<DiagnosticEvent>,
}

impl TestHarness {
    pub fn new() -> Self { ... }
    pub fn step(&mut self) { ... } // Runs one frame/tick
    pub fn click_node(&mut self, key: NodeKey) { ... }
    pub fn assert_intent(&self, predicate: impl Fn(&GraphIntent) -> bool) { ... }
    pub fn snapshot(&self) -> Value { ... } // Returns DiagnosticsState snapshot
}

Instead of checking app.selected_nodes.len(), we check:
- Intents: Did the action produce the expected GraphIntent?
- Compositor State: Does the CompositorFrameSample show the expected tiles and rects?
- Engine Topology: Did the channel message counts increment?
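The intent check above can be sketched as a predicate over a recorded intent log. This is a minimal illustration under stated assumptions: the GraphIntent enum here is a two-variant stand-in, and IntentLog is a hypothetical helper, not the harness's actual type.

```rust
/// Hypothetical stand-in for the app's `GraphIntent`; the real enum has more variants.
#[derive(Debug, Clone, PartialEq)]
pub enum GraphIntent {
    SelectNode { node: u32 },
    SetNodePinned { node: u32, pinned: bool },
}

/// Assumed helper that records every intent emitted during a test.
#[derive(Default)]
pub struct IntentLog {
    intents: Vec<GraphIntent>,
}

impl IntentLog {
    /// Appends an intent in emission order.
    pub fn record(&mut self, intent: GraphIntent) {
        self.intents.push(intent);
    }

    /// Returns true if any recorded intent satisfies the predicate.
    pub fn contains(&self, predicate: impl Fn(&GraphIntent) -> bool) -> bool {
        self.intents.iter().any(|intent| predicate(intent))
    }

    /// Panics with the full log when nothing matches, so failures are readable.
    pub fn assert_intent(&self, predicate: impl Fn(&GraphIntent) -> bool) {
        assert!(
            self.contains(predicate),
            "no matching intent; recorded: {:?}",
            self.intents
        );
    }
}
```

A scenario would then assert on what the action produced, for example `log.assert_intent(|i| matches!(i, GraphIntent::SetNodePinned { pinned: true, .. }))`, rather than reading a private field.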
The Bug: WebView tiles appear in the tree but render as black/empty.
The Cause: The tile exists in egui_tiles, but the GraphBrowserApp has not mapped a WebViewId to the NodeKey, or the OffscreenRenderingContext is missing.
We create a test scenario that opens a node and asserts that it is "visible" in the compositor.
#[test]
fn test_webview_tile_renders_correctly() {
    let mut harness = TestHarness::new();
    let node = harness.add_node("https://example.com");

    // Action: Open the node in a tile
    harness.app.request_open_node_tile_mode(node, PendingTileOpenMode::Tab);
    harness.step(); // Process intents
    harness.step(); // Run layout/compositor

    // Assertion: Check Diagnostic Snapshot
    let snapshot = harness.snapshot();
    let tile = find_tile_for_node(&snapshot, node);
    assert!(tile.is_some(), "Tile should exist in tree");
    let tile = tile.unwrap();

    // The Bug: These assertions fail if the tile is black
    assert!(tile.mapped_webview, "Node must map to a WebViewId");
    assert!(tile.has_context, "Tile must have a GL context");
    assert!(tile.rect.width() > 0.0, "Tile must have non-zero size");
}

Running this test reveals:
assertion failed: tile.mapped_webview -> The app logic created the tile but didn't fire MapWebviewToNode.
We modify desktop/tile_view_ops.rs or app.rs to ensure the mapping intent is fired.
Re-running the test passes.
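The lookup helper used in the scenario above is not defined in this document. One possible shape is sketched below, assuming a typed, flattened tile list instead of the JSON snapshot; TileSample and tile_health_problems are hypothetical names introduced only for illustration.

```rust
/// Hypothetical flattened view of one tile in a diagnostics snapshot.
#[derive(Debug, Clone, PartialEq)]
pub struct TileSample {
    pub node: u64,
    pub mapped_webview: bool,
    pub has_context: bool,
    pub width: f32,
    pub height: f32,
}

/// Finds the tile that hosts the given node, if one exists in the snapshot.
pub fn find_tile_for_node(tiles: &[TileSample], node: u64) -> Option<&TileSample> {
    tiles.iter().find(|tile| tile.node == node)
}

/// Lists every failed health check for a tile; an empty list means healthy.
pub fn tile_health_problems(tile: &TileSample) -> Vec<&'static str> {
    let mut problems = Vec::new();
    if !tile.mapped_webview {
        problems.push("node is not mapped to a WebViewId");
    }
    if !tile.has_context {
        problems.push("tile has no rendering context");
    }
    if tile.width <= 0.0 || tile.height <= 0.0 {
        problems.push("tile rect has zero size");
    }
    problems
}
```

Collecting all failed checks at once, instead of stopping at the first assertion, makes a black-tile failure report every missing precondition in a single run.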
- Create shell/desktop/tests/harness.rs.
- Implement TestHarness::new() with a headless Gui instance.
- Expose DiagnosticsState from Gui (already done).
Stage 1 progress (2026-02-22):
- Added shell/desktop/tests scaffold:
  - shell/desktop/tests/mod.rs
  - shell/desktop/tests/harness.rs
  - shell/desktop/tests/scenarios/mod.rs
  - shell/desktop/tests/scenarios/black_tile.rs
- Wired test module registration in shell/desktop/mod.rs under #[cfg(test)].
- Added minimal TestHarness API for observability-driven assertions:
  - app construction + node creation/open helpers
  - diagnostics-driven frame sampling and snapshot extraction
  - snapshot helpers for tile/channel assertions
- Added initial scenario coverage:
  - webview_tile_snapshot_reports_mapping_and_context_health
  - engine_snapshot_exposes_servo_runtime_channels
- Added test-only diagnostics accessors used by harness:
  - DiagnosticsState::force_drain_for_tests
  - DiagnosticsState::snapshot_json_for_tests
- Validation:
  - cargo test shell::desktop::tests::scenarios::black_tile:: -- --nocapture (pass)
  - cargo check --message-format short (pass)
- Move workspace_routing tests to the harness.
- Move persistence tests to the harness.
Stage 2 progress (2026-02-22):
- Migrated first workspace_routing case into harness scenarios:
  - Added shell/desktop/tests/scenarios/routing.rs
  - Added test: open_node_workspace_routed_falls_back_to_current_workspace_for_zero_membership
- Registered routing scenario in shell/desktop/tests/scenarios/mod.rs.
- Removed duplicated original test from app.rs to keep migration authoritative.
- Stabilized harness diagnostics assertions by switching from global diagnostics emission to harness-local test-only diagnostics event injection.
- Continued routing migration with additional cases:
  - open_node_workspace_routed_with_preferred_workspace_requests_restore
  - remove_selected_nodes_clears_workspace_membership_entry
  - resolve_workspace_open_prefers_recent_membership
  - resolve_workspace_open_honors_preferred_workspace
  - set_node_url_preserves_workspace_membership
- Removed duplicated originals from app.rs for migrated cases above.
- Started persistence-focused Stage 2 migration:
  - Added shell/desktop/tests/scenarios/persistence.rs
  - Added tests:
    - open_node_workspace_routed_preserves_unsaved_prompt_state_until_restore
    - workspace_has_unsaved_changes_for_graph_mutations
    - workspace_not_modified_for_non_graph_mutations
    - workspace_not_modified_for_set_node_position
    - workspace_has_unsaved_changes_for_set_node_pinned
  - Removed duplicated originals from app.rs for migrated persistence cases.
- Continued persistence migration with unsaved-prompt/save-state cases:
  - workspace_modified_for_graph_mutations_even_when_not_synthesized
  - unsaved_prompt_warning_resets_on_additional_graph_mutation
  - save_named_workspace_clears_unsaved_prompt_state
  - Removed duplicated originals from app.rs for migrated cases above.
- Validation:
  - cargo test shell::desktop::tests::scenarios::persistence:: -- --nocapture (pass, 8 tests)
  - cargo test shell::desktop::tests::scenarios::routing:: -- --nocapture (pass)
  - cargo test shell::desktop::tests::scenarios::black_tile:: -- --nocapture (pass)
  - cargo check (pass)
- Add CompositorFrameSample assertions for layout tests.
- Add Engine topology assertions for performance tests.
Stage 3 progress (2026-02-22):
- Added shell/desktop/tests/scenarios/layout.rs and registered it in shell/desktop/tests/scenarios/mod.rs.
- Added initial layout/compositor scenario coverage:
  - compositor_frames_capture_sequence_and_active_tile_count_transitions
  - compositor_tile_rects_are_non_zero_in_healthy_layout_path
  - healthy_layout_path_keeps_active_tile_violation_channel_zero
  - unhealthy_layout_signal_is_observable_via_active_tile_violation_channel
- Expanded Stage 3 layout coverage with topology assertions:
  - compositor_multi_tile_layout_samples_have_non_overlapping_rects
  - compositor_hierarchy_samples_include_split_container_and_child_tiles
- Migrated Session Autosave/Retention tests into harness scenarios (shell/desktop/tests/scenarios/persistence.rs):
  - session_workspace_blob_autosave_uses_runtime_layout_hash_and_caches_runtime_layout
  - session_workspace_blob_autosave_rotates_previous_latest_bundle_on_layout_change
- Removed migrated autosave/retention duplicates from app.rs.
- Validation:
  - cargo test shell::desktop::tests::scenarios::layout:: -- --nocapture (pass, 6 tests)
  - cargo test shell::desktop::tests::scenarios::persistence:: -- --nocapture (pass, 10 tests)
  - cargo test shell::desktop::tests::scenarios::routing:: -- --nocapture (pass, 6 tests)
  - cargo test shell::desktop::tests::scenarios::black_tile:: -- --nocapture (pass, 2 tests)
  - cargo check (pass)
Stage 3 scope (next):
- Add a layout.rs scenario module under shell/desktop/tests/scenarios/ for tile geometry + viewport invariants.
- Add scenario assertions for:
  - active tile count transitions during open/close flows
  - stable non-zero tile rects after routing and restore
  - active-tile invariant channels (tile_render_pass.active_tile_violation) remaining zero in healthy paths
- Add Engine topology checks for hot-path channels and percentile latency controls where deterministic in tests.
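The tile geometry invariants in this scope (non-zero rects, no overlap between sibling tiles) reduce to simple rectangle arithmetic. The sketch below assumes a plain Rect type with origin and size; it is not the compositor's actual sample type, and first_overlap is a hypothetical helper.

```rust
/// Assumed axis-aligned rectangle; the real compositor samples may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rect {
    pub x: f32,
    pub y: f32,
    pub w: f32,
    pub h: f32,
}

impl Rect {
    /// A tile with zero width or height cannot be rendering anything.
    pub fn is_non_zero(&self) -> bool {
        self.w > 0.0 && self.h > 0.0
    }

    /// Strict overlap test: rects that only share an edge do not overlap.
    pub fn overlaps(&self, other: &Rect) -> bool {
        self.x < other.x + other.w
            && other.x < self.x + self.w
            && self.y < other.y + other.h
            && other.y < self.y + self.h
    }
}

/// Returns the indices of the first overlapping pair, or None if all are disjoint.
pub fn first_overlap(rects: &[Rect]) -> Option<(usize, usize)> {
    for i in 0..rects.len() {
        for j in (i + 1)..rects.len() {
            if rects[i].overlaps(&rects[j]) {
                return Some((i, j));
            }
        }
    }
    None
}
```

Returning the offending pair rather than a bare boolean lets a failing scenario name the two tiles involved. The pairwise scan is quadratic, which is acceptable for the handful of tiles a test opens.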
Stage 3 acceptance criteria:
- Layout/compositor assertions execute only via harness snapshots (no private-field reach-through).
- At least one scenario fails when an expected mapping/context invariant is intentionally broken.
- Existing Stage 2 suites continue to pass unchanged.
- Finalize migration coverage and remove obsolete duplicates from legacy test locations.
- Document the harness as the default integration test entrypoint.
- Keep targeted command matrix stable for CI/local validation.
Stage 4 acceptance criteria:
- Migrated tests live in shell/desktop/tests/scenarios/* with no equivalent duplicates in app.rs.
- Consolidation doc includes final migrated test inventory and command matrix.
- cargo test shell::desktop::tests::scenarios:: -- --nocapture and cargo check are green.
This inventory maps all functional areas to migration stages.
- Harness Scaffold
- Workspace Routing (Basic)
- Persistence (Basic)
- Session Autosave & Retention
  - test_session_workspace_blob_autosave_uses_runtime_layout_hash_and_caches_runtime_layout
  - test_session_workspace_blob_autosave_rotates_previous_latest_bundle_on_layout_change
- [~] Persistence Switching & Preferences
  - test_switch_persistence_dir_reloads_graph_state
  - Preference persistence (toast anchor)
  - Preference persistence (shortcut bindings)
  - Preference persistence (lasso binding)
- Tile Geometry Invariants
  - Verify active tile count matches expected open nodes.
  - Verify tile rects are non-zero and non-overlapping (sanity check).
  - Verify tile_render_pass.active_tile_violation channel remains zero.
- [~] Multi-Pane Grouping
  - Verify split operations create correct container hierarchy.
  - Verify drag-to-group creates UserGrouped edges (via intent inspection).
- [~] Navigation & History
  - Verify Back/Forward intents update history index correctly.
  - Verify WebViewUrlChanged triggers correct node updates.
  - Validation: Replaces manual "Navigation: Back/Forward Delegate Event Ordering" checks.
  - Added harness scenario module: shell/desktop/tests/scenarios/navigation.rs.
  - Added scenarios:
    - webview_url_changed_updates_existing_mapping
    - webview_url_changed_appends_traversal_between_known_nodes_without_self_loop
    - webview_history_changed_clamps_index_to_entry_bounds
    - webview_history_changed_adds_back_then_forward_traversals_with_repeat_counts
    - history_callback_is_authoritative_when_url_callback_stays_on_latest_entry
- [~] Graph Interactions & Semantic Tagging
  - Verify SelectNode (single/multi) updates selection state.
  - Verify CreateUserGroupedEdge intents are emitted on grouping actions.
  - Verify SetNodePinned intent synchronizes with #pin semantic tag (tags scenario).
  - Verify TagNode/UntagNode for #pin updates node.is_pinned state (tags scenario).
  - Verify Undo/Redo restores previous snapshot state.
- Search & Filtering
  - Verify Omnibar @ scopes filter results correctly (using harness snapshot of matches).
  - Verify Graph Search (Ctrl+F) highlights/filters nodes.
- Focus-owner matrix
  - Add deterministic scenarios for focus owners: graph surface, omnibar field, node webview tile, and modal-active surface.
  - Verify each scenario emits expected dispatch diagnostics (ux:dispatch_*) and preserves two-authority routing boundaries.
- Global shortcut survivability
  - Verify F9 (camera fit lock) and F6 (focus cycle) survive host/webview focus routing and reach reducer/workbench authority paths.
  - Verify modal isolation does not consume ToggleCameraFitLock while still consuming explicitly blocked intents (e.g., Undo).
- Graph interaction liveness under focus churn
  - Verify pan/zoom/drag remain available when camera lock is disabled and graph surface regains focus after omnibar/modal activity.
  - Verify fit-lock toggles via both settings pane and keyboard shortcut produce observable state transitions.
- First scenario bundle (minimum 6)
  - camera_lock_toggle_survives_webview_focus_routing
  - camera_lock_toggle_survives_omnibar_focus_routing
  - focus_cycle_survives_webview_focus_routing
  - modal_isolation_preserves_camera_lock_toggle
  - graph_pan_zoom_liveness_after_omnibar_focus_release
  - settings_and_f9_toggle_paths_produce_identical_lock_state_transition
- Stack Mechanics
  - test_capture_undo_checkpoint_pushes_and_clears_redo
  - test_undo_stack_trimmed_at_max
  - test_new_action_clears_redo_stack
- State Restoration
  - test_perform_undo_reverts_to_previous_graph
  - test_perform_redo_reapplies_after_undo
  - test_undo_returns_false_when_stack_empty
- Resolver Trace Coverage
  - test_resolve_workspace_open_emits_trace_candidates_ranking_and_reason
  - test_resolve_workspace_open_explicit_target_trace_reason
- Membership Affordance Coverage
  - test_membership_badge_hides_for_local_only_membership
  - test_workspace_target_palette_actions_include_membership_hint
- Batch Operation Observability
  - test_prune_empty_workspaces_emits_intent_and_diagnostics
  - test_retention_sweep_emits_intent_and_diagnostics
- Engine Topology
  - Verify channel message counts increment during activity.
  - Verify latency percentiles stay within bounds (using simulated clock if possible).
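The Engine Topology checks (message counts and latency percentiles) can be made deterministic by feeding recorded latencies directly instead of reading a wall clock. The sketch below is an assumption-laden illustration: ChannelStats is a hypothetical type, latencies are pooled across channels for brevity, and the percentile uses the nearest-rank method, which may not match the diagnostics system's own calculation.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-test channel statistics; not the real diagnostics type.
#[derive(Default)]
pub struct ChannelStats {
    counts: BTreeMap<String, u64>,
    latencies_us: Vec<u64>,
}

impl ChannelStats {
    /// Records one message with a caller-supplied (simulated) latency.
    pub fn record(&mut self, channel: &str, latency_us: u64) {
        *self.counts.entry(channel.to_string()).or_insert(0) += 1;
        self.latencies_us.push(latency_us);
    }

    /// Number of messages seen on a channel; zero if it was never used.
    pub fn count(&self, channel: &str) -> u64 {
        self.counts.get(channel).copied().unwrap_or(0)
    }

    /// Nearest-rank percentile for p in 1..=100; None for empty data or invalid p.
    pub fn percentile_us(&self, p: u32) -> Option<u64> {
        if self.latencies_us.is_empty() || p == 0 || p > 100 {
            return None;
        }
        let mut sorted = self.latencies_us.clone();
        sorted.sort_unstable();
        // rank = ceil(p / 100 * n), computed in integers
        let rank = (p as usize * sorted.len() + 99) / 100;
        Some(sorted[rank - 1])
    }
}
```

Because the test supplies the latencies, a bound such as "p95 stays under a threshold" cannot flake on a slow CI machine. Only the aggregation logic is under test here, not real timing.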
- Identify: Locate existing tests in app.rs, gui_tests.rs, or test_guide.md.
- Port: Rewrite as a scenario in shell/desktop/tests/scenarios/<area>.rs using TestHarness.
- Verify: Run the new scenario.
- Delete: Remove the old test code or manual checklist item.
Policy note (2026-03-03): The canonical command source is now scripts/dev/test-contracts.json consumed via scripts/dev/test-select.ps1 run-policy. The explicit commands below remain as diagnostic examples and migration checkpoints.
Run after each migration increment:
cargo test shell::desktop::tests::scenarios::layout:: -- --nocapture
cargo test shell::desktop::tests::scenarios::persistence:: -- --nocapture
cargo test shell::desktop::tests::scenarios::routing:: -- --nocapture
cargo test shell::desktop::tests::scenarios::registries:: -- --nocapture
cargo test shell::desktop::tests::scenarios::tags:: -- --nocapture
cargo test shell::desktop::tests::scenarios::black_tile:: -- --nocapture
cargo test shell::desktop::tests::scenarios::input_routing:: -- --nocapture
cargo test shell::desktop::tests::scenarios::navigation:: -- --nocapture
cargo check
Run at stage boundaries:
cargo test shell::desktop::tests::scenarios:: -- --nocapture
Policy-driven selectors (CI/local):
pwsh -NoProfile -File scripts/dev/test-select.ps1 lint-policy --platform linux --base origin/main
pwsh -NoProfile -File scripts/dev/test-select.ps1 run-policy --tier pr-required --platform linux --affected --base origin/main --quiet
pwsh -NoProfile -File scripts/dev/test-select.ps1 run-policy --tier nightly --platform windows --quiet
- Risk: black-box drift back to private-field assertions
  - Mitigation: prefer public app API + diagnostics snapshot assertions; only add test-only accessors when unavoidable and scoped.
- Risk: duplicate tests diverge between app.rs and scenarios
  - Mitigation: remove migrated originals in the same PR/change-set.
- Risk: flaky diagnostics-based assertions
  - Mitigation: use harness-local deterministic event/sample injection paths rather than global channel timing.
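The last mitigation, harness-local injection, can be illustrated with a channel that the test fixture owns on both ends. This is a sketch only: LocalDiagnostics and the single-variant DiagnosticEvent below are hypothetical stand-ins, and the real harness may wire injection differently.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for the real `DiagnosticEvent`.
#[derive(Debug, Clone, PartialEq)]
pub enum DiagnosticEvent {
    MessageSent { channel: &'static str },
}

/// Owns both channel ends, so no other test or thread can interleave events.
pub struct LocalDiagnostics {
    tx: Sender<DiagnosticEvent>,
    rx: Receiver<DiagnosticEvent>,
}

impl LocalDiagnostics {
    pub fn new() -> Self {
        let (tx, rx) = channel();
        Self { tx, rx }
    }

    /// Injects an event synchronously; it is visible to the next `drain` call.
    pub fn inject(&self, event: DiagnosticEvent) {
        self.tx
            .send(event)
            .expect("receiver is owned by the same struct");
    }

    /// Returns all pending events in injection order without blocking.
    pub fn drain(&self) -> Vec<DiagnosticEvent> {
        self.rx.try_iter().collect()
    }
}

impl Default for LocalDiagnostics {
    fn default() -> Self {
        Self::new()
    }
}
```

Since each fixture instance has its own channel, scenarios running in parallel under cargo test do not observe each other's events, which is the source of flakiness a shared global emitter would reintroduce.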
Done
- Stage 1 harness scaffold complete and green.
- Stage 2 routing migration increment complete (6 routing scenarios).
- Stage 2 persistence migration increment complete for unsaved prompt + workspace mutation semantics (10 tests).
- Stage 3 expansion complete: non-overlap, split/container hierarchy, layout compositor assertions (6 layout tests).
- Phase A Session Autosave & Retention migration complete (scenarios authoritative; app.rs duplicates removed).
- Phase A Persistence Switching migration complete (switch_persistence_dir_reloads_graph_state).
- Phase A Preference Persistence: toast anchor migrated.
- Phase A Preference Persistence: shortcut bindings migrated (set_shortcut_bindings_persist_across_restart).
- Phase A Preference Persistence: lasso binding migrated (set_lasso_binding_preference_persists_across_restart).
- Semantic tagging scenarios added: #pin tag sync and TagNode/UntagNode state (2 tests in tags.rs).
- First grouping-intent scenario: create_user_grouped_edge_from_primary_selection_creates_grouped_edge (1 test in grouping.rs).
- Registries scenarios passing (cargo test shell::desktop::tests::scenarios::registries -- --nocapture).
- Full scenario matrix: 49 tests passing (2026-02-23 checkpoint).
Remaining
- Phase C.1: Undo/Redo stack mechanics and state restoration scenarios.
- Phase C.2: Workspace routing explainability (resolver trace, membership affordance, batch observability).
- Phase C: Navigation & History scenarios (Back/Forward intent ordering).
- Phase C: Search & Filtering harness assertions (Omnibar scopes, Ctrl+F).
- Stage 4 final migration cleanup and closure pass (remove remaining legacy duplicates, update command matrix evidence).
Immediate next batch:
- Add Phase C.1 Undo/Redo scenarios (stack_mechanics + state_restoration sub-modules or inline in a new undo.rs).
- Extend grouping coverage to split/container hierarchy semantics in harness-observable outputs.
- Run full stage-boundary matrix and append evidence: cargo test shell::desktop::tests::scenarios:: -- --nocapture.
To ensure future features respect the Observability-Driven Testing paradigm:
- The "No Invisible State" Rule: Any new state field in GraphBrowserApp or Gui that affects logic must be exposed in DiagnosticsState (via snapshot or event). If you can't see it in the Inspector, you can't test it reliably.
- The "Intent-First" Rule: All user-visible changes must flow through GraphIntent. Direct mutation of state from UI code is forbidden (except for transient visual-only state like hover highlights). This ensures the "Intents" tab is always the source of truth.
- Test Harness Requirement: New features must include a scenarios/*.rs integration test. #[test] functions in implementation files are reserved for pure unit logic (e.g. math, parsing) only.
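The first two rules can be illustrated together: state changes only through one intent entry point, and every field that affects logic is reachable from a snapshot. The sketch below uses a toy App with a two-variant GraphIntent; all names and fields are hypothetical and far smaller than GraphBrowserApp.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for the real `GraphIntent`.
#[derive(Debug, Clone, PartialEq)]
pub enum GraphIntent {
    SelectNode(u32),
    ClearSelection,
}

/// Read-only view exposed to tests; mirrors every logic-affecting field.
#[derive(Debug, PartialEq)]
pub struct Snapshot {
    pub selected: Vec<u32>,
    pub intents_applied: usize,
}

#[derive(Default)]
pub struct App {
    selected: BTreeSet<u32>,      // private: UI code cannot mutate it directly
    intent_log: Vec<GraphIntent>, // every applied intent is recorded
}

impl App {
    /// The only mutation entry point ("Intent-First").
    pub fn apply(&mut self, intent: GraphIntent) {
        match &intent {
            GraphIntent::SelectNode(node) => {
                self.selected.insert(*node);
            }
            GraphIntent::ClearSelection => self.selected.clear(),
        }
        self.intent_log.push(intent);
    }

    /// Exposes all state that affects logic ("No Invisible State").
    pub fn snapshot(&self) -> Snapshot {
        Snapshot {
            selected: self.selected.iter().copied().collect(),
            intents_applied: self.intent_log.len(),
        }
    }
}
```

With fields private and apply as the single writer, a test that compares snapshots before and after an intent sees the same state the Inspector would show.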
We define "Comprehensive" based on the component's ownership:
- Target: 100% State Visibility.
- Metric: Every GraphIntent variant is logged. Every TileKind is inspectable. Every persistence IO operation emits a start/end span.
- Target: 100% Boundary Visibility.
- Metric: We cannot instrument Servo internals (DOM/JS engine) without forking. Instead, we instrument the Bridge: every Delegate callback, every IPC message size/latency, and every resource request. We treat Servo as a black box with a highly instrumented surface.
- Target: 100% Host Boundary Visibility.
- Metric: When we add Wasm mods, we instrument the Host Runtime. We trace when a mod is invoked, how much CPU/Memory it uses, and exactly what GraphIntents it emits. We do not trace inside the mod's binary.
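The "start/end span" metric for persistence IO, and boundary tracing in general, can be sketched with a guard that emits on entry and on drop. This is an illustration under stated assumptions: SpanGuard, SpanEvent, the span name, and save_graph are hypothetical, and a real implementation would likely emit into DiagnosticsState rather than a shared vector.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical span events; a real system would attach timestamps and sizes.
#[derive(Debug, Clone, PartialEq)]
pub enum SpanEvent {
    Start(&'static str),
    End(&'static str),
}

/// Shared, single-threaded sink that tests can inspect afterwards.
pub type SpanSink = Rc<RefCell<Vec<SpanEvent>>>;

/// Emits `Start` when created and `End` when dropped, even on early return.
pub struct SpanGuard {
    name: &'static str,
    sink: SpanSink,
}

impl SpanGuard {
    pub fn enter(name: &'static str, sink: &SpanSink) -> Self {
        sink.borrow_mut().push(SpanEvent::Start(name));
        Self { name, sink: Rc::clone(sink) }
    }
}

impl Drop for SpanGuard {
    fn drop(&mut self) {
        self.sink.borrow_mut().push(SpanEvent::End(self.name));
    }
}

/// Placeholder for an instrumented persistence operation (no real IO here).
pub fn save_graph(sink: &SpanSink) -> Result<(), String> {
    let _span = SpanGuard::enter("persistence.save_graph", sink);
    // Real IO would happen here; the guard closes the span on every exit path.
    Ok(())
}
```

Tying the end event to Drop means an operation that returns early with an error still closes its span, so a test can assert that starts and ends are always paired.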