2026 02 11_clipping_dom_extraction_plan - mark-ik/graphshell GitHub Wiki
Status: Archived 2026-04-03 after landing the current-state viewer clipping slice
Phase: Viewer lane, capture-focused
Architecture: backend context-menu adapter -> Graphshell-owned inspector surface -> read-only DOM extraction -> explicit clip materialization into graph nodes
Archive note: This file is retained as the record of the landed clipping execution slice. Active remaining viewer-lane work now lives in graphshell_docs/implementation_strategy/viewer/2026-04-03_clipping_viewer_follow_on_plan.md.
Related:
- clipping_and_dom_extraction_spec.md
- VIEWER.md
- ../graph/2026-03-11_graph_enrichment_plan.md
- ../aspect_projection/ASPECT_PROJECTION.md
- ../aspect_distillery/ASPECT_DISTILLERY.md
- ../system/2026-03-12_architectural_inconsistency_register.md
This plan is the viewer-owned clipping plan, not a general document-analysis roadmap.
Its job is to keep one product lane clear:
- inspect live page structure inside a viewer,
- select the correct element or small set of elements,
- materialize clips explicitly,
- preserve enough local metadata for provenance, display, and re-open.
This document no longer treats clipping as the owner of broader site/document analysis features such as outbound-link harvesting, selector recipes, or typed extraction artifacts. Those remain valid follow-on ideas, but they must live under the correct downstream lanes.
Posterity note (2026-04-02): this plan was intentionally rewritten after the viewer/runtime and enrichment seams were clearer in code. The main change was not feature removal; it was boundary cleanup. Clipping was narrowed to inspect/select/materialize/hand-off, broader document analysis was split into follow-on lanes, and the recommended bridge resolution changed from "maybe explicit node type" toward an explicit clip content facet with #clip retained as derived compatibility state.
The current runtime shape is no longer greenfield. The following are already landed in code:
- clip inspector runtime state and non-modal inspector surface,
- clip materialization helpers for single and batch capture,
- pointer-stack inspection / stacked-element stepping in the inspector,
- inherited source-classification carryover onto clip nodes with explicit inherited provenance,
- clip route handling through Address::Clip(...) / AddressKind::GraphshellClip,
- bridge acceptance for both verso://clip/... and legacy graphshell://clip/....
This plan therefore focuses on:
- tightening the viewer-lane contract,
- recording what is landed versus deferred,
- preventing broader analysis ideas from silently becoming viewer-lane scope.
- context-menu or contextual-surface entry into page inspection,
- single-hit clip capture for simple cases,
- inspector-first candidate discovery for complex pages,
- pointer-stack and candidate inspection over live page content,
- explicit materialization of one or more selected clips,
- clip-local metadata capture needed for provenance, display, re-open, and enrichment handoff.
- site-wide link harvesting as graph mutation,
- automatic document-to-graph projection of full extracted structure,
- selector-driven batch extraction recipes that create graph artifacts without a separate analysis contract,
- distillation or intelligence workflows beyond clip-local metadata inheritance,
- treating the entire page element tree as durable graph truth by default.
The clipping lane is owned by the viewer stack. Browser-native context meaning is an adapter seam that feeds Graphshell-owned inspection and clip actions.
The canonical complex-page workflow is:
- user invokes clip/inspect from page context,
- Graphshell extracts one hit or a bounded candidate set,
- user inspects, filters, and selects candidates in a Graphshell-owned surface,
- Graphshell materializes clips explicitly.
Direct clipping outside inspector mode remains allowed for simple one-hit capture. It is a convenience path, not the canonical multi-element discovery path.
The exploded inspector direction remains valuable, but it must be read as temporary inspection state unless and until a later projection contract gives it stronger semantics.
For this plan:
- inspector state is viewer/runtime state,
- clip nodes are durable graph artifacts created only on explicit materialization,
- entering inspector mode must not materialize the page structure into the user graph by default.
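The separation between inspector state and durable clip nodes can be sketched as follows. This is a minimal illustration, not the landed runtime code: ClipCandidate, ClipInspectorState, ClipNode, Graph, and every method name are hypothetical stand-ins chosen for the example.

```rust
/// Hypothetical read-only description of one element extracted from the live page.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ClipCandidate {
    pub selector: String,
    pub html: String,
}

/// Hypothetical durable artifact created only on explicit materialization.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ClipNode {
    pub source_url: String,
    pub html: String,
}

/// Hypothetical stand-in for the user graph.
#[derive(Debug, Default)]
pub struct Graph {
    pub clip_nodes: Vec<ClipNode>,
}

/// Viewer/runtime state only; nothing stored here is part of the graph.
#[derive(Debug, Default)]
pub struct ClipInspectorState {
    pub source_url: String,
    pub candidates: Vec<ClipCandidate>,
    pub selected: Vec<usize>,
}

impl ClipInspectorState {
    /// Entering inspector mode records candidates but takes no graph handle,
    /// so it cannot mutate the graph.
    pub fn open(source_url: &str, candidates: Vec<ClipCandidate>) -> Self {
        ClipInspectorState {
            source_url: source_url.to_string(),
            candidates,
            selected: Vec::new(),
        }
    }

    /// Toggle selection of one candidate; out-of-range indices are ignored.
    pub fn toggle(&mut self, index: usize) {
        if index >= self.candidates.len() {
            return;
        }
        if let Some(pos) = self.selected.iter().position(|&i| i == index) {
            self.selected.remove(pos);
        } else {
            self.selected.push(index);
        }
    }

    /// The only step that writes to the graph. Returns the number of clips created.
    pub fn materialize_selected(&mut self, graph: &mut Graph) -> usize {
        let mut created = 0;
        for &i in &self.selected {
            let candidate = &self.candidates[i];
            graph.clip_nodes.push(ClipNode {
                source_url: self.source_url.clone(),
                html: candidate.html.clone(),
            });
            created += 1;
        }
        self.selected.clear();
        created
    }
}
```

In this sketch the graph is passed in only at materialization time, which makes the "no materialization on entry" rule a property of the function signatures rather than a convention.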
#clip remains the current bridge carrier for clip semantics. That is acceptable for the current slice, but it is still an active architectural inconsistency: the system is using a tag like a content/type facet.
This plan must not deepen that ambiguity. Treat #clip as a current bridge, not settled long-term authority. See ../system/2026-03-12_architectural_inconsistency_register.md.
Recommended resolution:
- keep nodes as the primary identity model,
- introduce an explicit clip content facet rather than a broad top-level node-type hierarchy,
- treat #clip, clip badge state, and is:clip-style query affordances as derived compatibility projections from that facet.
Recommended shape:
NodeContentFacet::Clip(ClipFacetData)
Where ClipFacetData owns clip-specific truth such as stable clip identity, source provenance, capture metadata, and future storage references.
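One way the recommended facet could look is sketched below. Only the names NodeContentFacet::Clip and ClipFacetData come from this plan; the field names, the Node struct, and the effective_tags helper are assumptions made for illustration. The sketch also assumes the end state in which a stored #clip tag carries no authority and the tag is derived purely from the facet.

```rust
/// Clip-specific truth owned by the facet. Field names are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ClipFacetData {
    pub clip_id: String,             // stable clip identity
    pub source_url: String,          // source provenance
    pub captured_at_unix_secs: u64,  // capture metadata
    pub storage_ref: Option<String>, // future storage reference
}

/// A content facet on a node, rather than a top-level node-type hierarchy.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum NodeContentFacet {
    Clip(ClipFacetData),
}

/// Hypothetical, heavily simplified node: identity stays with the node,
/// clip semantics live in the optional facet.
#[derive(Debug, Clone, Default)]
pub struct Node {
    pub title: String,
    pub content_facet: Option<NodeContentFacet>,
    pub user_tags: Vec<String>,
}

impl Node {
    /// Backs badge state and is:clip-style queries.
    pub fn is_clip(&self) -> bool {
        matches!(self.content_facet, Some(NodeContentFacet::Clip(_)))
    }

    /// Tags as seen by tag panels and queries: user tags plus a derived #clip.
    /// A stored "#clip" string is dropped so the facet is the only authority.
    pub fn effective_tags(&self) -> Vec<String> {
        let mut tags: Vec<String> = self
            .user_tags
            .iter()
            .filter(|tag| tag.as_str() != "#clip")
            .cloned()
            .collect();
        if self.is_clip() {
            tags.push("#clip".to_string());
        }
        tags
    }
}
```

During the bridge period a real implementation would likely still honor stored #clip tags; the stricter behavior here only shows the direction in which the projection is meant to flow.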
Clip route handling is no longer just a historical graphshell://clip/<uuid> idea.
Current effective contract:
- clip addresses are represented as Address::Clip(...),
- they surface as AddressKind::GraphshellClip,
- runtime bridge accepts both verso://clip/... and legacy graphshell://clip/....
This plan therefore stops presenting graphshell://clip/... as the likely runtime direction. The route family remains in a bridge period: verso://clip/... is accepted and the legacy graphshell://clip/... alias is retained.
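The bridge behavior can be illustrated with a small parser. This is a sketch under stated assumptions: Address::Clip and AddressKind::GraphshellClip are named in this plan, but the payload type (an opaque string here), the Other variants, and the parse_address and to_route functions are hypothetical. Emitting verso://clip/... when writing a route back out is also an assumption about the bridge direction, not a documented rule.

```rust
/// Simplified stand-in for the graph model's address type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Address {
    /// A clip address carrying the clip identifier as an opaque string.
    Clip(String),
    /// Any non-clip address, kept verbatim.
    Other(String),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AddressKind {
    GraphshellClip,
    Other,
}

impl Address {
    pub fn kind(&self) -> AddressKind {
        match self {
            Address::Clip(_) => AddressKind::GraphshellClip,
            Address::Other(_) => AddressKind::Other,
        }
    }
}

/// Route prefixes accepted during the bridge period: current form first, legacy alias second.
const CLIP_PREFIXES: [&str; 2] = ["verso://clip/", "graphshell://clip/"];

/// Parse a raw route string; both clip prefixes map to the same Address::Clip.
pub fn parse_address(raw: &str) -> Address {
    for prefix in CLIP_PREFIXES.iter() {
        if let Some(id) = raw.strip_prefix(*prefix) {
            if !id.is_empty() {
                return Address::Clip(id.to_string());
            }
        }
    }
    Address::Other(raw.to_string())
}

/// Write an address back out. Assumption: clip routes are emitted in the verso form.
pub fn to_route(address: &Address) -> String {
    match address {
        Address::Clip(id) => format!("verso://clip/{}", id),
        Address::Other(raw) => raw.clone(),
    }
}
```

Because both prefixes collapse into one typed address, downstream code never needs to know which spelling the user or a snapshot supplied.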
- Graphshell has a clip inspector state carrier and inspector panel.
- Graphshell supports single clip creation and multi-clip materialization from extracted captures.
- Pointer-stack stepping exists as the current in-situ inspection affordance.
- clip capture payload already includes the local metadata needed for clip rendering and provenance handoff,
- clip content now persists through an explicit clip content facet bridge stored in node-owned state,
- runtime viewer display for clip nodes is synthesized from stored clip HTML rather than treating the route URL itself as render authority,
- source classifications can be inherited onto clips with InheritedFromSource provenance and non-accepted status,
- clip creation already participates in the enrichment lane as a concrete Stage C producer.
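The classification carryover mentioned above might look roughly like this. Only the InheritedFromSource provenance name comes from this plan; the Classification struct, the other enum variants (including Suggested as the name for the non-accepted status), and inherit_for_clip are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClassificationProvenance {
    /// Hypothetical variant for classifications set directly by the user.
    UserAuthored,
    /// Copied from the source page onto a clip.
    InheritedFromSource,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClassificationStatus {
    Accepted,
    /// Hypothetical name for the non-accepted state.
    Suggested,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Classification {
    pub label: String,
    pub provenance: ClassificationProvenance,
    pub status: ClassificationStatus,
}

/// Copy the source page's classifications onto a new clip. Every copy is
/// marked as inherited and starts out non-accepted, whatever its status
/// was on the source.
pub fn inherit_for_clip(source: &[Classification]) -> Vec<Classification> {
    source
        .iter()
        .map(|c| Classification {
            label: c.label.clone(),
            provenance: ClassificationProvenance::InheritedFromSource,
            status: ClassificationStatus::Suggested,
        })
        .collect()
}
```

Resetting the status on inheritance keeps an accepted judgment about a whole page from silently becoming an accepted judgment about one fragment of it.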
- clip route typing exists in the graph model via Address::Clip(...) and AddressKind::GraphshellClip,
- clip route identity now survives snapshot restore rather than being rewritten back to the source page URL,
- both verso://clip/... and graphshell://clip/... are accepted during the bridge period.
- opening a clip route now resolves to the matching clip node pane rather than only pivoting to History Manager,
- live and historical omnibar/search surfaces now match and label clips by user-visible clip title and source URL rather than leaking internal verso://clip/... identity strings,
- user-facing workbench, navigator, accessibility, toolbar, and tag-panel labels now prefer clip-visible metadata over internal route identity.
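A minimal sketch of the labeling and matching rule described above, assuming a hypothetical ClipDisplayInfo view and helper names. The label formats and fallback strings are invented for the example; the point is only that the internal route never reaches the label or the search match.

```rust
/// Hypothetical view of the data a label or search surface has for a clip.
pub struct ClipDisplayInfo<'a> {
    pub title: Option<&'a str>,
    pub source_url: Option<&'a str>,
    /// Internal identity such as a verso://clip/... route; never shown to the user.
    pub route: &'a str,
}

/// Treat missing and whitespace-only values the same way.
fn non_blank(value: Option<&str>) -> Option<&str> {
    value.map(str::trim).filter(|s| !s.is_empty())
}

/// Build a user-facing label from clip-visible metadata only.
pub fn clip_label(info: &ClipDisplayInfo<'_>) -> String {
    match (non_blank(info.title), non_blank(info.source_url)) {
        (Some(title), Some(url)) => format!("{} ({})", title, url),
        (Some(title), None) => title.to_string(),
        (None, Some(url)) => format!("Clip from {}", url),
        (None, None) => "Untitled clip".to_string(),
    }
}

/// Case-insensitive match against title and source URL; the route is not searched.
pub fn clip_matches_query(info: &ClipDisplayInfo<'_>, query: &str) -> bool {
    let needle = query.trim().to_lowercase();
    if needle.is_empty() {
        return false;
    }
    [info.title, info.source_url]
        .iter()
        .filter_map(|value| non_blank(*value))
        .any(|value| value.to_lowercase().contains(&needle))
}
```

With this shape, a query for "verso" does not surface every clip merely because the internal route happens to contain that word.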
These remain valid viewer-lane follow-ons:
- stronger inspector ergonomics beyond the current panel and pointer-stack flow,
- richer clip fidelity choices (Clean, Contextual, Screenshot Note, Offline Slice),
- more robust extraction coverage for complex page structures,
- dedicated clip content storage/route cleanup after the route bridge settles,
- richer clip presentation and provenance chrome beyond the now-landed visible title/source cleanup.
These are deferred viewer improvements, not separate feature lanes.
If the exploded inspector grows into a richer temporary element-tree or element-graph view, that work belongs under the Projection aspect as a derived local world rather than durable graph truth.
Clip-derived classifications, content-kind hints, inherited metadata, and explanation/filter surfaces continue under the graph enrichment lane. This plan only hands off clip-local metadata; it does not own the enrichment system.
Outbound-link extraction, selector-driven extraction recipes, and broader document-analysis batches are intentionally split out from clipping. They require their own contract under graph/projection or graph/enrichment depending on whether the output is temporary derived representation or durable graph metadata.
Any future workflow that turns page or clip content into typed extracted artifacts must depend on the Distillery aspect and privacy-boundary rules. That work does not belong inside the viewer clipping plan.
Nostr publication is no longer part of the core clipping execution path in this plan. It may remain a downstream integration that consumes clip artifacts, but it should not shape the viewer-lane clipping architecture.
- keep the plan and spec aligned with the real runtime seam,
- document inspector-first as the canonical complex-page workflow,
- document direct one-hit clipping as a narrow convenience path.
- describe clips in terms of Address::Clip(...) / AddressKind::GraphshellClip,
- record verso://clip/... plus legacy graphshell://clip/... bridge behavior,
- keep route identity distinct from runtime render URL synthesis,
- avoid reasserting data: URLs as the long-term clip authority model.
- preserve broader ideas in roadmap language,
- attach them to projection, enrichment, analysis, or distillery follow-ons,
- do not let viewer-lane docs imply ownership of those future systems.
- continue using #clip as the current bridge carrier,
- link explicitly to the architectural inconsistency rather than treating the issue as settled,
- avoid adding new semantics that depend on #clip being the final node-type carrier,
- prepare the bridge to collapse into an explicit clip content facet rather than a broad node-type system.
- A reader can tell which parts of clipping are already landed versus future work.
- A reader can tell where viewer-owned clipping ends and broader document/site analysis begins.
- The plan no longer implies that entering inspector mode materializes graph truth by default.
- The route/address story reflects current code reality rather than historical graphshell://clip/... assumptions.
- User-facing clip labels and search surfaces no longer depend on internal route identity for display.
- Broader ideas like link extraction and selector-driven analysis remain on the roadmap, but are attached to explicit downstream lanes rather than hidden inside the viewer plan.
- Default workflow: inspector-first for complex pages, direct clipping allowed for simple cases.
- Default ownership: viewer lane owns inspection and explicit clip creation only.
- Default bridge stance: #clip remains the current bridge carrier, not the final content/type authority; the recommended destination is an explicit clip content facet with derived tag/badge/query projections.
- Default projection stance: exploded inspector remains temporary derived inspection state until a projection contract says otherwise.