Semantic Clustering Follow-On Plan (2026-04-03)

Status: Active follow-on plan

Scope: Extracts the semantic-clustering lane from 2026-02-24_physics_engine_extensibility_plan.md into an execution plan that bridges semantic enrichment, out-of-band clustering computation, and graph layout consumption.

Related:

2026-02-24_physics_engine_extensibility_plan.md
2026-03-11_graph_enrichment_plan.md
force_layout_and_barnes_hut_spec.md
layout_behaviors_and_physics_spec.md
2026-04-03_layout_backend_state_ownership_plan.md
2026-04-03_layout_variant_follow_on_plan.md
2026-04-03_wasm_layout_runtime_plan.md

Context

Semantic clustering already exists in Graphshell, but only as a partial cross-lane capability:

the enrichment lane already owns semantic provenance, classification, and user-facing explanation requirements
the force-layout lane already treats semantic clustering as a Graphshell-owned extension force
the physics extensibility umbrella note sketches future algorithmic follow-ons such as k-means, DBSCAN, and embedding-driven grouping

What is still missing is a dedicated execution lane for the part in the middle:

what semantic inputs are allowed to drive clustering
how cluster assignments are computed, invalidated, and diagnosed
how those assignments affect layout without becoming a hidden source of graph truth

This plan exists so semantic clustering is no longer split between "enrichment someday" and "physics helper already exists" with no authority for the actual bridge.

Non-Goals

treating semantic clustering as a graph-canonical mutation
deepening hidden runtime-only semantic state without explanation or provenance
making ML embeddings a prerequisite for all semantic grouping behavior
replacing domain clustering, frame-affinity behavior, or other existing layout helpers
turning this lane into a general-purpose model-serving or vector-search plan

Feature Target 1: Define the Semantic Input Contract

Target 1 Context

The first missing decision is what data semantic clustering is allowed to consume. The umbrella note points at burn embeddings and UDC similarity; the enrichment lane already requires provenance, confidence, and user-facing explanation.

Target 1 Tasks

Define the allowed semantic inputs for clustering in priority order: embeddings when available, classification/tag similarity when not, and explicit fallback rules.
Require every clustering input source to remain attributable to enrichment metadata rather than hidden renderer or physics state.
Define whether clustering operates on per-node vectors, pairwise similarity tables, or both.
Specify how cluster inputs are invalidated when node content or semantic metadata changes.
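The priority order in the tasks above could be sketched roughly as follows. This is only an illustration of the contract's shape, not Graphshell's actual types: `Provenance`, `NodeSemantics`, `ClusteringInput`, and `resolve_input` are hypothetical names, and the choice to carry provenance on every variant is an assumption drawn from the attribution requirement.

```rust
/// Hypothetical enrichment provenance carried by every clustering input,
/// so a grouping decision can always be traced back to its source.
#[derive(Debug, Clone, PartialEq)]
pub struct Provenance {
    pub source: String,  // which enrichment pass produced the data (assumed field)
    pub confidence: f32, // 0.0..=1.0 (assumed scale)
}

/// Hypothetical per-node semantic metadata as the enrichment lane might expose it.
#[derive(Debug, Clone, Default)]
pub struct NodeSemantics {
    pub embedding: Option<(Vec<f32>, Provenance)>,
    pub tags: Option<(Vec<String>, Provenance)>,
}

/// The input that clustering is actually allowed to consume for one node.
#[derive(Debug, Clone, PartialEq)]
pub enum ClusteringInput {
    Embedding { vector: Vec<f32>, provenance: Provenance },
    TagSimilarity { tags: Vec<String>, provenance: Provenance },
    /// Explicit fallback: the node stays out of semantic clustering.
    Unclustered { reason: &'static str },
}

/// Resolve inputs in priority order: embedding first, then tags, then fallback.
pub fn resolve_input(node: &NodeSemantics) -> ClusteringInput {
    if let Some((vector, provenance)) = &node.embedding {
        if !vector.is_empty() {
            return ClusteringInput::Embedding {
                vector: vector.clone(),
                provenance: provenance.clone(),
            };
        }
    }
    if let Some((tags, provenance)) = &node.tags {
        if !tags.is_empty() {
            return ClusteringInput::TagSimilarity {
                tags: tags.clone(),
                provenance: provenance.clone(),
            };
        }
    }
    ClusteringInput::Unclustered { reason: "no attributable semantic input" }
}
```

Because every non-fallback variant carries its provenance, a later explanation surface can report which source produced a grouping without consulting renderer or physics state.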

Target 1 Validation Tests

Clustering can explain which semantic source produced a grouping decision.
Missing embeddings degrade to a documented fallback path rather than disabling the whole lane.
Input invalidation triggers only when the relevant semantic data changes.
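One way to make the invalidation test concrete is a fingerprint computed only over the semantic fields, so that layout-only changes never trigger a recompute. The sketch below assumes a hypothetical `NodeRecord` shape and uses the standard library's `DefaultHasher`, whose output is not guaranteed stable across Rust releases; a persisted fingerprint would need a hash with a fixed specification.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical node record mixing layout state with semantic state.
pub struct NodeRecord {
    pub position: (f32, f32),        // layout state: must NOT affect the fingerprint
    pub tags: Vec<String>,           // semantic state
    pub embedding: Option<Vec<f32>>, // semantic state
}

/// Hash only the fields that clustering consumes.
/// Suitable for in-process comparison only (see the note on DefaultHasher).
pub fn semantic_fingerprint(node: &NodeRecord) -> u64 {
    let mut hasher = DefaultHasher::new();
    node.tags.hash(&mut hasher);
    match &node.embedding {
        Some(vector) => {
            1u8.hash(&mut hasher);
            vector.len().hash(&mut hasher);
            for value in vector {
                // f32 does not implement Hash, so hash its bit pattern instead.
                value.to_bits().hash(&mut hasher);
            }
        }
        None => 0u8.hash(&mut hasher),
    }
    hasher.finish()
}

/// True when the node's semantic inputs differ from the fingerprint
/// recorded at the last clustering run.
pub fn needs_recluster(previous: u64, node: &NodeRecord) -> bool {
    semantic_fingerprint(node) != previous
}
```

Moving a node changes `position` but leaves the fingerprint alone, while adding a tag or replacing the embedding changes it.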

Feature Target 2: Land the Out-Of-Band Clustering Pipeline

Target 2 Context

The umbrella note explicitly places clustering computation out-of-band rather than inside the per-frame physics step. That boundary is important for both performance and inspectability.

Target 2 Tasks

Define a background or on-demand clustering pipeline that computes cluster assignments outside the interactive layout step.
Start with a single bounded algorithm for the first slice, and defer richer alternatives such as DBSCAN to later admissions rather than taking on that complexity on day one.
Produce a stable cluster-assignment artifact keyed by GraphViewId and node identity.
Keep clustering recomputation policy explicit: manual refresh, data-change invalidation, or bounded automatic recompute.
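A minimal sketch of the assignment artifact and a deterministic first-slice computation might look like this. The plan does not pick an algorithm; k-means is used here only because the umbrella note names it as a candidate. `NodeId`, `ClusterAssignments`, `kmeans_assignments`, and the `u64` representation of `GraphViewId` are all assumptions for illustration, and all vectors are assumed to share one dimension.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct GraphViewId(pub u64); // inner representation is assumed

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct NodeId(pub u64); // hypothetical node identity

/// Cluster-assignment artifact keyed by view and node identity.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ClusterAssignments {
    pub view: GraphViewId,
    pub input_fingerprint: u64, // which inputs this artifact was computed from
    pub clusters: BTreeMap<NodeId, usize>,
}

fn squared_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Deterministic k-means: seeds are the first `k` vectors in NodeId order
/// and the iteration count is fixed, so identical inputs give identical output.
pub fn kmeans_assignments(
    view: GraphViewId,
    input_fingerprint: u64,
    vectors: &BTreeMap<NodeId, Vec<f32>>,
    k: usize,
    iterations: usize,
) -> ClusterAssignments {
    let k = k.min(vectors.len());
    let mut clusters = BTreeMap::new();
    if k == 0 {
        return ClusterAssignments { view, input_fingerprint, clusters };
    }
    let mut centroids: Vec<Vec<f32>> = vectors.values().take(k).cloned().collect();
    for _ in 0..iterations {
        // Assignment step: nearest centroid, ties resolved by lowest index.
        for (id, vector) in vectors {
            let mut best = 0;
            let mut best_distance = f32::INFINITY;
            for (index, centroid) in centroids.iter().enumerate() {
                let distance = squared_distance(vector, centroid);
                if distance < best_distance {
                    best = index;
                    best_distance = distance;
                }
            }
            clusters.insert(*id, best);
        }
        // Update step: move each centroid to the mean of its members.
        for (index, centroid) in centroids.iter_mut().enumerate() {
            let members: Vec<&Vec<f32>> = vectors
                .iter()
                .filter(|(id, _)| clusters[*id] == index)
                .map(|(_, vector)| vector)
                .collect();
            if members.is_empty() {
                continue; // keep the previous centroid for an empty cluster
            }
            for (dim, slot) in centroid.iter_mut().enumerate() {
                *slot = members.iter().map(|m| m[dim]).sum::<f32>() / members.len() as f32;
            }
        }
    }
    ClusterAssignments { view, input_fingerprint, clusters }
}
```

Using `BTreeMap` rather than a hash map keeps iteration order, and therefore seeding and tie-breaking, independent of hashing state. The function returns a value and touches nothing else, which matches the requirement that background clustering cannot mutate graph truth.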

Target 2 Validation Tests

Cluster assignments are stable for identical inputs.
Recompute cadence is explicit and diagnosable.
Background clustering cannot directly mutate graph truth or bypass reducer-owned enrichment.
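To illustrate what an explicit and diagnosable recompute cadence could mean, the sketch below models the three policies named in the tasks as an enum and returns a decision that carries its own reason string. All names (`RecomputePolicy`, `RecomputeState`, `RecomputeDecision`, `decide`) are hypothetical, and the rule that a manual refresh always wins is an assumption.

```rust
use std::time::Duration;

/// The three recompute policies named in the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecomputePolicy {
    Manual,
    OnDataChange,
    BoundedAutomatic { min_interval: Duration },
}

/// Hypothetical snapshot of what the pipeline knows when deciding.
pub struct RecomputeState {
    pub manual_refresh_requested: bool,
    pub inputs_changed: bool,
    pub since_last_run: Duration,
}

/// The decision carries a reason so diagnostics can show why it was made.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecomputeDecision {
    Run(&'static str),
    Skip(&'static str),
}

pub fn decide(policy: RecomputePolicy, state: &RecomputeState) -> RecomputeDecision {
    use RecomputeDecision::{Run, Skip};
    // Assumption: an explicit user refresh overrides every policy.
    if state.manual_refresh_requested {
        return Run("manual refresh requested");
    }
    match policy {
        RecomputePolicy::Manual => Skip("manual policy: waiting for refresh"),
        RecomputePolicy::OnDataChange if state.inputs_changed => Run("semantic inputs changed"),
        RecomputePolicy::OnDataChange => Skip("semantic inputs unchanged"),
        RecomputePolicy::BoundedAutomatic { min_interval } => {
            if !state.inputs_changed {
                Skip("semantic inputs unchanged")
            } else if state.since_last_run < min_interval {
                Skip("minimum interval not yet elapsed")
            } else {
                Run("semantic inputs changed and interval elapsed")
            }
        }
    }
}
```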

Feature Target 3: Define Layout Consumption Rules

Target 3 Context

The force-layout spec already says semantic clustering is a post-step extension force. This plan needs to define how richer cluster assignments feed that force without becoming a second hidden layout engine.

Target 3 Tasks

Define how cluster assignments feed layout behavior: centroid targets, affinity groups, or other explicit extension-force inputs.
Keep semantic clustering independent from domain clustering and frame-affinity behavior, while allowing them to compose predictably.
Define profile and diagnostics surfaces for enabling, weighting, or disabling semantic clustering effects.
Ensure semantic clustering remains a toggleable behavioral consumer rather than an always-on replacement for baseline layout semantics.
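As one possible reading of "centroid targets", the sketch below applies a post-step pull toward each cluster's centroid, scaled by a profile weight, and does nothing when the feature is disabled. It is an assumption-laden illustration rather than the force-layout spec's actual extension API: `Vec2`, `SemanticClusterForce`, and `apply_semantic_force` are hypothetical, and nodes are keyed by a plain `u64` for brevity.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Vec2 {
    pub x: f32,
    pub y: f32,
}

/// Hypothetical profile settings for the semantic extension force.
pub struct SemanticClusterForce {
    pub enabled: bool,
    pub weight: f32, // fraction of the distance to the centroid covered per step
}

/// Post-step extension force: pull each clustered node toward the centroid
/// of its cluster. Reads assignments and writes positions only.
pub fn apply_semantic_force(
    force: &SemanticClusterForce,
    clusters: &BTreeMap<u64, usize>,
    positions: &mut BTreeMap<u64, Vec2>,
) {
    if !force.enabled || force.weight <= 0.0 {
        return; // disabled: no spatial effect at all
    }
    let weight = force.weight.min(1.0);

    // Accumulate (sum_x, sum_y, count) per cluster from current positions.
    let mut sums: BTreeMap<usize, (f32, f32, f32)> = BTreeMap::new();
    for (node, cluster) in clusters {
        if let Some(position) = positions.get(node) {
            let entry = sums.entry(*cluster).or_insert((0.0, 0.0, 0.0));
            entry.0 += position.x;
            entry.1 += position.y;
            entry.2 += 1.0;
        }
    }

    // Move each clustered node part of the way toward its centroid.
    for (node, cluster) in clusters {
        if let (Some(position), Some(&(sum_x, sum_y, count))) =
            (positions.get_mut(node), sums.get(cluster))
        {
            let (centroid_x, centroid_y) = (sum_x / count, sum_y / count);
            position.x += (centroid_x - position.x) * weight;
            position.y += (centroid_y - position.y) * weight;
        }
    }
}
```

Nodes without an assignment are never touched, which leaves room for domain clustering or frame-affinity helpers to run as separate passes over the same positions.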

Target 3 Validation Tests

Enabling semantic clustering measurably changes related-node positions.
Disabling the feature removes its spatial effect without altering graph truth.
Semantic clustering composes with domain clustering rather than silently overriding it.

Feature Target 4: Make The Results Explainable And User-Visible

Target 4 Context

The enrichment umbrella already sets the prototype rule: explain before automate. Semantic clustering should not become a hidden grouping engine that users cannot inspect, reject, or reason about.

Target 4 Tasks

Expose requested vs resolved semantic clustering state through diagnostics.
Provide user-facing explanation hooks for why nodes are being grouped semantically.
Keep clustering provenance aligned with the enrichment inspector/filter surfaces instead of a physics-only debug panel.
Define how suggested or low-confidence semantic inputs affect clustering policy.
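The requested-versus-resolved distinction and the explanation hook could be sketched as below. Everything here is hypothetical: the type names, the single `min_confidence` threshold as the low-confidence policy, and the wording of the explanation string are assumptions chosen to show the shape of the surface, not a decided design.

```rust
/// What semantic clustering actually resolved to, as opposed to what was requested.
#[derive(Debug, Clone, PartialEq)]
pub enum ResolvedState {
    Off,
    Active { clustered_nodes: usize },
    Degraded { reason: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct ClusteringDiagnostics {
    pub requested_enabled: bool,
    pub resolved: ResolvedState,
}

/// Hypothetical per-node evidence, aligned with enrichment provenance.
pub struct NodeEvidence {
    pub node_label: String,
    pub cluster: usize,
    pub source: String, // enrichment source that justified the grouping
    pub confidence: f32,
}

/// Assumed policy: inputs below `min_confidence` do not drive clustering.
pub fn resolve_state(
    requested_enabled: bool,
    evidence: &[NodeEvidence],
    min_confidence: f32,
) -> ClusteringDiagnostics {
    let resolved = if !requested_enabled {
        ResolvedState::Off
    } else {
        let usable = evidence
            .iter()
            .filter(|e| e.confidence >= min_confidence)
            .count();
        if usable == 0 {
            ResolvedState::Degraded {
                reason: "no semantic input met the confidence threshold".to_string(),
            }
        } else {
            ResolvedState::Active { clustered_nodes: usable }
        }
    };
    ClusteringDiagnostics { requested_enabled, resolved }
}

/// User-facing explanation for why one node participates in a cluster.
pub fn explain(evidence: &NodeEvidence) -> String {
    format!(
        "{} is in semantic cluster {} based on {} (confidence {:.2})",
        evidence.node_label, evidence.cluster, evidence.source, evidence.confidence
    )
}
```

A diagnostics panel could show `requested_enabled: true` next to `Degraded { .. }` to make it obvious that the feature is on but has nothing trustworthy to work with.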

Target 4 Validation Tests

A user can inspect why a node is participating in a semantic cluster.
Diagnostics distinguish semantic clustering from other organizer helpers.
Low-confidence or missing semantic inputs degrade according to documented policy.

Exit Condition

This plan is complete when Graphshell has a documented and testable semantic clustering pipeline that starts from attributable semantic inputs, computes cluster assignments out-of-band, feeds them into explicit layout behavior, and exposes the result through both diagnostics and enrichment-facing explanation surfaces.

2026 04 03_semantic_clustering_follow_on_plan - mark-ik/graphshell GitHub Wiki
