SUBSYSTEM_STORAGE - mark-ik/graphshell GitHub Wiki
Status: Active / Project Goal
Subsystem label: storage
Long form: Persistence & Data Integrity Subsystem
Scope: WAL journal integrity, snapshot consistency, serialization round-trip correctness, single-write-path enforcement, at-rest encryption, and named-graph/workspace-layout data integrity, across all persistence paths
Subsystem type: Cross-Cutting Runtime Subsystem (see TERMINOLOGY.md)
Peer subsystems: diagnostics (Diagnostics), accessibility (Accessibility), security (Security & Access Control), history (Traversal & Temporal Integrity)
Doc role: Canonical subsystem implementation guide (summarizes guarantees/roadmap and links to detailed persistence specs/code references; avoid duplicate persistence design docs unless needed)
Sources consolidated:
- `2026-02-22_registry_layer_plan.md` Phase 6 (three-authority-domain boundary, single-write-path enforcement, `pub(crate)` boundary lock)
- `services/persistence/mod.rs` (`GraphStore`: fjall WAL + redb snapshots + rkyv serialization + zstd compression + AES-256-GCM encryption)
- `2026-03-08_unified_storage_architecture_plan.md` (storage track split and durability-boundary clarification)
- `2026-03-11_graphstore_vs_client_storage_manager_note.md` (`GraphStore` vs future WHATWG-style browser client storage boundary)
- `archive_docs/` (historical persistence plans, superseded by this document)

Related: `SUBSYSTEM_SECURITY.md` §3.4 (cryptographic correctness invariants overlap), `../aspect_control/settings_and_control_surfaces_spec.md` (settings/control surfaces consume persisted configuration state), `../canvas/2026-03-14_graph_relation_families.md` (durable vs session-only relation tiers)
Policy authority: This file is the single canonical policy authority for the Storage subsystem. Supporting storage docs may refine contracts, interfaces, and execution details, but must defer policy authority to this file. Policy in this file should be distilled from canonical specs and accepted research conclusions.
Adopted standards (see `2026-03-04_standards_alignment_report.md` for full rationale):
- RFC 3986 (URI syntax) for internal scheme parsing (`parser.rs`); all internal address tokens must be RFC 3986-valid.
- RFC 4122 UUID v4 for node identity (`NodeId`); stable across sessions, no ordering semantics.
- RFC 4122 UUID v7 for WAL journal entry tokens only; time-ordered sequencing. Must not be used for `NodeId`.
- XDG Base Directory Specification (via the `directories` crate) for canonical storage path semantics across platforms. Data → `XDG_DATA_HOME`, config → `XDG_CONFIG_HOME`, cache → `XDG_CACHE_HOME`.
- WHATWG Storage Standard, adopted for future browser-origin client storage coordination (`ClientStorageManager`), not for Graphshell app-state durability. Governs storage-key/scoped shed/shelf/bucket/bottle modeling when Graphshell grows a Servo-facing site-data authority.
- FIPS 197 / NIST SP 800-38D for AES-256-GCM at-rest encryption. 256-bit key, 12-byte nonce, 16-byte GCM tag. Nonces are never reused.
Referenced as prior art (no conformance obligation):
- OAIS (ISO 14721): SIP/AIP/DIP vocabulary informs export/archive design. Not adopted due to disproportionate fixity/format-migration obligations.
- RFC 6902 JSON Patch: not adopted for local undo/redo (conflicts with `NodeIndex` instability); see standards report §4.3.
- Single-write-path policy: Durable graph mutation must flow through canonical write paths; side-channel persistence writes are disallowed.
- WAL-first integrity policy: Journal and snapshot paths must remain consistent and recoverable under interruption/failure.
- Roundtrip-safety policy: Serialization/deserialization and schema evolution must preserve state or degrade explicitly.
- Encryption-completeness policy: Sensitive persistence keyspaces require mandated cryptographic handling with explicit failure behavior.
- Recovery-observability policy: Recovery/snapshot corruption, fallback, and repair paths must be diagnosable and test-backed.
- Servo-compatibility-first policy for browser storage: Graphshell's browser-storage layer must align with Servo's storage-spec direction and avoid inventing a rival browser-storage model.
- Reference-truth vs storage-truth policy: Graph nodes and panes are not the default owners of browser site data; deleting a node does not implicitly purge the associated storage context.
- Shared-consumer policy: Settings pages, Navigator projections, workbench chrome, and diagnostics may expose persisted state, but they must not become parallel storage authorities or write-path exceptions.
The persistence layer is the single point where graph state transitions become durable. Every cold start depends on `GraphStore.recover()`, and every durable graph mutation depends on the canonical journal/snapshot path. Silent corruption in either path is an unrecoverable data loss event.
The dominant failure mode is silent contract erosion: a new serialization type is added without a round-trip test, a snapshot path writes unencrypted data, a new keyspace bypasses the WAL journal, a durable mutation path escapes the approved reducer/persistence boundary, or recovery silently skips corrupted entries without surfacing degraded state. None of these produce immediate errors. All produce data loss or integrity failure on the next recovery.
Without subsystem-level treatment, every change to `Graph`, every new `LogEntry` variant, every new persistence keyspace, every new named-snapshot path, and every new persisted workspace/settings payload becomes an unaudited integrity-boundary crossing.
This subsystem authority is intentionally scoped to Graphshell-owned application
durability. A future browser-origin storage authority for IndexedDB,
localStorage, Cache API, OPFS, and related site data belongs to a separate
ClientStorageManager-style component and must not be conflated with
GraphStore.
Graphshell may still host a thin runtime orchestration layer above that browser storage authority to manage backend compatibility and user-facing compound actions, especially for Servo/Wry interoperability. That orchestration layer is not itself a second storage authority.
| Layer | Persistence Instantiation |
|---|---|
| Contracts | WAL integrity, snapshot consistency, serialization round-trip, single-write-path, encryption completeness, archive integrity (§3) |
| Runtime State | `GraphStore` (fjall WAL, redb snapshots, AES-256-GCM encryption, zstd compression); `GraphWorkspace` (single-write-path boundary via `pub(crate)`) |
| Diagnostics | `persistence.*` channel family (§5) |
| Validation | Round-trip tests, boundary contract tests, snapshot/recovery tests, encryption verification (§6) |
The subsystem is not just "graph WAL + snapshots." It currently spans four related storage tracks:
- GraphDurability: durable graph WAL, latest snapshot, named graph snapshots, graph recovery
- WorkspaceLayoutPersistence: workspace layouts, session autosave rotation, persisted settings payloads currently stored through layout keys
- ArchivePersistence: traversal archive, dissolved archive, export/clear/curation, replay support inputs
- PersistenceRecoveryAndHealth: startup open/recover supervision, timeout fallback, degradation/health observability
The unified storage plan is the canonical staging document for closing the gaps between those tracks and the current guide language.
The storage subsystem now distinguishes two different storage authorities:

- `GraphStore`: the landed persistence authority for Graphshell-owned app state, covering graph durability, layouts, archives, recovery, encryption, and health.
- Future `ClientStorageManager`: a prospective Servo-compatible authority for browser-origin site data, modeled on the WHATWG Storage Standard.

This distinction is architectural, not cosmetic.

- `GraphStore` is keyed by Graphshell app concepts and durability contracts.
- `ClientStorageManager` would be keyed by storage keys and site-data policy.
- `GraphStore` owns Graphshell recovery semantics.
- `ClientStorageManager` would own bucket lifecycle, quota, persistence mode, and site-data clearing for web-origin storage endpoints.

The two authorities may share low-level path, crypto, diagnostic, or quota helpers, but they are not a single merged conceptual model.
Likewise, user-facing surfaces may summarize or mutate persisted state through approved routes, but must not grow separate durable stores for the same truth. Examples:
- settings pages configure persisted preferences through canonical write paths;
- Navigator/workbench project durable arrangement and history-derived state but do not own it;
- diagnostics reports persistence health but is not a repair authority.
Graphshell may additionally host a StorageInteropCoordinator policy layer that sits above browser storage truth and handles backend routing, Servo/Wry transition policy, and explicit compound actions. That layer must not take over ownership of storage-key/bucket metadata.
- Complete journaling of durable graph mutations: every durable graph mutation that enters the approved reducer/app persistence path is journaled to fjall via `log_mutation()`. No durable mutation path bypasses the journal.
- Sequence monotonicity: `log_sequence` is monotonically increasing. No gaps, no reuse, no reset. A gap indicates corruption or truncation.
- Serialization fidelity: `rkyv::to_bytes(entry)` → fjall → `rkyv::from_bytes(stored)` produces a bitwise-identical `LogEntry`. Deserialization failure on any stored entry is a corruption/degradation event, not an invisible contract success.
- Keyspace isolation: the three fjall keyspaces (`mutations`, `traversal_archive`, `dissolved_archive`) are independent. Corruption in one does not affect the others.
- Archive append-only: `archive_append_traversal()` and `archive_dissolved_traversal()` are append-only. Entries are never modified after write.
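The sequence-monotonicity contract above can be checked mechanically on open. A minimal sketch, assuming recovered entries expose their `log_sequence` values as a sorted slice (the function name and shape are illustrative, not the real `GraphStore` API):

```rust
/// Returns every sequence number missing between consecutive recovered
/// journal entries. A non-empty result indicates corruption or truncation
/// and would justify emitting a `persistence.journal.sequence_gap` event.
fn find_sequence_gaps(sequences: &[u64]) -> Vec<u64> {
    let mut gaps = Vec::new();
    for pair in sequences.windows(2) {
        // Any jump larger than 1 means entries were lost between writes.
        for missing in (pair[0] + 1)..pair[1] {
            gaps.push(missing);
        }
    }
    gaps
}
```

Reporting the specific missing numbers, rather than a boolean, lets the diagnostic name exactly which entries are gone.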
- Snapshot-journal coherence: on recovery, the graph state is the latest snapshot plus all journal entries after it. The snapshot and journal together must produce a valid graph state identical to what was in memory before shutdown.
- Snapshot atomicity: `take_snapshot()` is an atomic redb write transaction. A crash during snapshot does not corrupt the snapshot DB; the previous snapshot remains valid.
- Periodic snapshot guarantee: `check_periodic_snapshot()` fires at `snapshot_interval` intervals. The interval is configurable but never zero.
- Named snapshot isolation: named graph snapshots (`save_named_graph_snapshot`) are independent of the automatic snapshot. Saving/loading a named snapshot does not affect the automatic snapshot or the WAL sequence.
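The snapshot-journal coherence rule can be stated as a tiny model: recovered state equals the snapshot value plus every journal entry sequenced after it. The types below are illustrative stand-ins, not the real `GraphStore` structures:

```rust
/// Toy snapshot: the sequence number it was taken at, plus its state.
#[derive(Clone, Debug, PartialEq)]
struct Snapshot {
    seq: u64,
    state: Vec<String>,
}

/// Toy journal entry: its sequence number and the node it adds.
struct Entry {
    seq: u64,
    node: String,
}

/// Recovery = snapshot + replay of entries strictly after the snapshot.
fn recover(snapshot: &Snapshot, journal: &[Entry]) -> Vec<String> {
    let mut state = snapshot.state.clone();
    for entry in journal.iter().filter(|e| e.seq > snapshot.seq) {
        // Replaying an entry at or before the snapshot sequence would
        // double-apply a mutation, so those entries are skipped.
        state.push(entry.node.clone());
    }
    state
}
```

The filter on `e.seq > snapshot.seq` is the coherence contract in one line: entries already folded into the snapshot must never be replayed twice.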
- Graph round-trip: for any `Graph` value `g`, `deserialize(serialize(g)) == g`. This must hold for all node types, edge types, metadata, and workspace membership.
- LogEntry round-trip: for any `LogEntry` value `e`, `rkyv::from_bytes(rkyv::to_bytes(e)) == e`. This must hold for all `GraphIntent` variants.
- Tile layout round-trip: `load_tile_layout_json(save_tile_layout_json(json)) == json`. JSON fidelity is preserved.
- Workspace layout round-trip: `load_workspace_layout_json(save_workspace_layout_json(name, json)) == json`. Named workspace layouts round-trip with name-key fidelity.
- Backward compatibility: legacy plaintext payloads (pre-encryption migration) are still readable. `decode_persisted_bytes()` detects the absence of the `GSEV0001` magic and falls back to plaintext decode.
- `pub(crate)` boundary: graph topology mutators in `graph/mod.rs` are `pub(crate)`. No external crate can call `graph.add_node()` directly.
- Durable-boundary exclusivity: all durable graph mutations flow through the approved reducer/app persistence boundary. The subsystem does not require every ephemeral or view-local in-memory graph-adjacent mutation to be journaled.
- Three authority domains, as defined in the registry layer plan Phase 6:
  - Semantic graph: owned by `GraphWorkspace`, mutated only via `apply_reducer_intents()`
  - Spatial layout: owned by `Tree<TileKind>` inside `GraphWorkspace`, driven by intents
  - Runtime instances: owned by `AppServices`, reconciled via `lifecycle_reconcile.rs`
- Compiler enforcement: the `pub(crate)` visibility restriction is a compile-time guarantee. Violation requires explicit `pub` escalation, which is reviewable.

Clarification:
- Durable graph truth and graph citizenship are inside the storage contract.
- Ephemeral graph-adjacent state, such as transient view/layout/form-runtime state, is not automatically part of WAL guarantees unless explicitly declared durable.
- Default encryption: all new data written to fjall or redb passes through `encode_persisted_bytes()`, which applies zstd compression then AES-256-GCM encryption. No path writes plaintext.
- Magic-byte detection: the `GSEV0001` magic prefix distinguishes encrypted from legacy plaintext payloads. Every encrypted payload starts with this prefix + 12-byte nonce + ciphertext.
- Key provenance: the AES-256-GCM key is loaded from the OS keychain (`keyring` crate) or generated and stored there on first use. The key never appears in logs or diagnostic output.
- Nonce freshness: each `encode_persisted_bytes()` call generates a fresh 12-byte random nonce via `OsRng`. Nonces are never reused (see Security subsystem §3.4).
- Legacy migration: `has_legacy_plaintext_data()` detects unencrypted data on open. `migrate_legacy_plaintext_data()` re-encodes it in place. After migration, no plaintext remains.
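The framing described above (magic prefix, then nonce, then ciphertext) determines how a payload is classified on read. A minimal sketch of that classification step, using only the layout facts stated in this section; the helper and enum names are hypothetical, and the real logic lives in `decode_persisted_bytes()`:

```rust
/// Encrypted payloads start with the `GSEV0001` magic, followed by a
/// 12-byte nonce, followed by ciphertext. Anything else is treated as a
/// legacy plaintext payload (pre-encryption migration).
const MAGIC: &[u8; 8] = b"GSEV0001";
const NONCE_LEN: usize = 12;

enum Payload<'a> {
    Encrypted { nonce: &'a [u8], ciphertext: &'a [u8] },
    LegacyPlaintext(&'a [u8]),
}

fn classify(bytes: &[u8]) -> Payload<'_> {
    if bytes.len() >= MAGIC.len() + NONCE_LEN && bytes.starts_with(MAGIC) {
        let body = &bytes[MAGIC.len()..];
        Payload::Encrypted {
            nonce: &body[..NONCE_LEN],
            ciphertext: &body[NONCE_LEN..],
        }
    } else {
        // No magic prefix: fall back to the legacy plaintext decode path.
        Payload::LegacyPlaintext(bytes)
    }
}
```

Note the length check before the prefix check: a payload shorter than magic plus nonce can never be a valid encrypted frame, regardless of its leading bytes.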
- Traversal archive completeness: every dissolved traversal has its state journaled to `traversal_archive_keyspace` before the dissolve mutation is applied.
- Dissolved archive completeness: every dissolve operation journals what was removed to `dissolved_archive_keyspace`.
- Export fidelity: `export_traversal_archive()` and `export_dissolved_archive()` produce valid String representations of all archive entries. No entry is silently skipped.
Persistence capability declarations are folded into the relevant registry entries:
Each viewer/surface declares:

```text
state_persistence: full | partial | none   // Can this surface's state be saved/restored?
undo_support:      full | partial | none   // Does this surface support undo/redo?
export_support:    full | partial | none   // Can content be exported?
notes:             String
```

GraphStore itself declares:

```text
journal_backend:  fjall (append-only log)
snapshot_backend: redb (ACID transactions)
serialization:    rkyv (zero-copy)
compression:      zstd (level 3)
encryption:       AES-256-GCM (OS keychain key)
```
These are not runtime-configurable but are documented for diagnostics and capability introspection.
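In Rust terms, the per-surface declaration above could take roughly the following shape. The type and field names are assumptions for illustration; the real declarations live in the registry entries this section describes:

```rust
/// Three-level capability grading used by the declarations above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Support {
    Full,
    Partial,
    None,
}

/// Per-viewer/surface persistence capability declaration (illustrative).
struct PersistenceCapability {
    state_persistence: Support, // can this surface's state be saved/restored?
    undo_support: Support,      // does this surface support undo/redo?
    export_support: Support,    // can content be exported?
    notes: &'static str,
}

/// Hypothetical example: a surface whose state fully round-trips but
/// whose undo and export coverage is only partial.
const EXAMPLE_VIEWER: PersistenceCapability = PersistenceCapability {
    state_persistence: Support::Full,
    undo_support: Support::Partial,
    export_support: Support::Partial,
    notes: "export omits transient selection state",
};
```

Keeping the declaration a plain data structure makes it cheap for diagnostics and capability introspection to read without touching the surface itself.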
fjall was selected as the WAL journal backend for the following reasons:
- Append-only log semantics: fjall is a log-structured storage engine with explicit keyspace isolation. It makes the three-domain separation (`mutations`, `traversal_archive`, `dissolved_archive`) a first-class storage concept, not an application-level convention.
- Failure guarantees: fjall uses a crash-safe log-structured merge tree (LSM); partial writes are recoverable. This matches the WAL requirement that a sequence gap is detectable corruption, not silent data loss.
- Pure Rust: no C FFI, no system library dependency, no WASM concern (fjall is host-only by design; it links against the OS filesystem, which is appropriate for a desktop WAL backend).
- Upgrade story: fjall exposes a versioned keyspace API. Schema migration is additive keyspace extension; old keyspaces remain readable during migration windows.
- WASM-clean boundary: fjall stays entirely in the host crate (`graphshell-desktop` or equivalent). `graphshell-core` never imports fjall; the WAL log entry types are WASM-clean structs that the host serializes into fjall. This matches the core/host split in `../../technical_architecture/2026-03-08_graphshell_core_extraction_plan.md` §2.5.
| Channel | Severity | Description |
|---|---|---|
| `persistence.store.opened` | Info | GraphStore successfully opened |
| `persistence.store.open_failed` | Error | GraphStore failed to open |
| `persistence.key.loaded` | Info | Persistence key loaded from keychain |
| `persistence.key.generated` | Info | New persistence key generated (first launch) |
| `persistence.key.unavailable` | Error | Keychain access failed |
| `persistence.journal.entry_written` | Info | Log entry successfully journaled |
| `persistence.journal.write_failed` | Error | Journal write failed |
| `persistence.journal.sequence_gap` | Error | Gap detected in log sequence numbers |
| `persistence.snapshot.taken` | Info | Periodic snapshot completed |
| `persistence.snapshot.failed` | Error | Snapshot write failed |
| `persistence.snapshot.named_saved` | Info | Named graph snapshot saved |
| `persistence.snapshot.named_loaded` | Info | Named graph snapshot loaded |
| `persistence.recovery.started` | Info | Recovery from snapshot+journal started |
| `persistence.recovery.succeeded` | Info | Recovery completed successfully |
| `persistence.recovery.failed` | Error | Recovery failed |
| `persistence.recovery.journal_replay_count` | Info | Number of journal entries replayed |
| `persistence.encryption.legacy_detected` | Warn | Legacy plaintext data detected |
| `persistence.encryption.migration_complete` | Info | Legacy data migrated to encrypted format |
| `persistence.encryption.decrypt_failed` | Error | AES-GCM decryption or tag verification failed |
| `persistence.serialization.roundtrip_failed` | Error | rkyv round-trip mismatch detected |
| `persistence.archive.traversal_appended` | Info | Entry added to traversal archive |
| `persistence.archive.dissolved_appended` | Info | Entry added to dissolved archive |
Current implementation note:
- Startup/open/recover coverage already exists in runtime diagnostics under `startup.persistence.*` and `persistence.recover.*`.
- The remaining work is steady-state integrity telemetry and a unified persistence health surface.
- This guide no longer treats the diagnostics story as entirely missing.
- Store status: `active` / `degraded (key unavailable)` / `failed`
- Journal: entry count, last write timestamp, sequence continuity check
- Snapshot: last snapshot timestamp, snapshot interval, named snapshot count
- Encryption: `encrypted` / `legacy-migration-pending` / `key-unavailable`
- Archive: traversal archive size, dissolved archive size
- Recovery: last recovery status, replay count
Shared projection/use rule:
- These health summaries may be reused by settings and nearby control surfaces as read-only status badges or links.
- UI surfaces should not recompute their own persistence-health model from raw storage internals when diagnostics/state snapshots already expose one.
Required watchdog invariants (start → terminal pairs):
- `persistence.store.open_started` → `opened | open_failed` (5000 ms)
- `persistence.recovery.started` → `recovery.succeeded | recovery.failed` (30000 ms)
- `persistence.snapshot.started` → `snapshot.taken | snapshot.failed` (10000 ms)
- `persistence.encryption.migration_started` → `migration_complete | migration_failed` (60000 ms)
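The start → terminal pairing can be sketched as a deadline table: each start event records a deadline, either terminal event clears it, and a periodic sweep reports anything overdue. This is a simplified illustration under assumed names, not the runtime's actual watchdog:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Tracks in-flight operations by channel name and flags those that saw
/// no terminal event before their deadline.
struct Watchdog {
    deadlines: HashMap<&'static str, Instant>,
}

impl Watchdog {
    fn new() -> Self {
        Watchdog { deadlines: HashMap::new() }
    }

    /// Record a start event with its timeout from the invariant table.
    fn started(&mut self, channel: &'static str, timeout_ms: u64) {
        self.deadlines
            .insert(channel, Instant::now() + Duration::from_millis(timeout_ms));
    }

    /// Called when either terminal event (success or failure) arrives.
    fn finished(&mut self, channel: &'static str) {
        self.deadlines.remove(channel);
    }

    /// Channels whose deadline passed without a terminal event; each of
    /// these would warrant an Error-severity diagnostic.
    fn overdue(&self, now: Instant) -> Vec<&'static str> {
        self.deadlines
            .iter()
            .filter(|(_, deadline)| now > **deadline)
            .map(|(channel, _)| *channel)
            .collect()
    }
}
```

Taking `now` as a parameter (rather than calling `Instant::now()` inside `overdue`) keeps the sweep deterministic and testable.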
- Round-trip tests (deterministic): for every serializable type (`Graph`, `LogEntry`, `GraphSnapshot`, `TileLayout`, `WorkspaceLayout`): serialize → deserialize → assert equality. For every new `GraphIntent` variant: serialize → deserialize → assert equality. These are the core contract tests.
- WAL integrity tests: open store, write N entries, close, reopen, verify sequence continuity and entry fidelity.
- Snapshot/recovery tests: populate graph, snapshot, add more mutations, recover, and assert the graph equals the expected state.
- Encryption tests: verify the `encode_persisted_bytes` → `decode_persisted_bytes` round-trip. Verify corrupted ciphertext produces an error (not silent truncation). Verify legacy plaintext fallback works. Verify nonce uniqueness across multiple calls.
- Boundary tests: attempt graph mutation from outside `apply_reducer_intents()`: verify compilation failure (`pub(crate)` boundary) or runtime rejection.
- Named snapshot tests: save, list, load, delete named snapshots. Verify named snapshot isolation (saving a named snapshot doesn't affect the automatic one).
- Archive tests: append to traversal/dissolved archives, verify export, verify clear, verify recent-entries query.
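The round-trip test shape is the same regardless of codec. A sketch with a deliberately toy codec standing in for rkyv (the types and functions here are invented for illustration; in the real suite the pattern repeats per serializable type and per `GraphIntent` variant):

```rust
use std::convert::TryInto;

/// Toy stand-in for a journal entry type.
#[derive(Clone, Debug, PartialEq)]
struct ToyEntry {
    seq: u64,
    label: String,
}

/// Toy codec: 8 little-endian bytes of `seq`, then the label bytes.
fn serialize(e: &ToyEntry) -> Vec<u8> {
    let mut out = e.seq.to_le_bytes().to_vec();
    out.extend_from_slice(e.label.as_bytes());
    out
}

/// Decoding failure is surfaced as `None`, never silently papered over.
fn deserialize(bytes: &[u8]) -> Option<ToyEntry> {
    let seq = u64::from_le_bytes(bytes.get(..8)?.try_into().ok()?);
    let label = String::from_utf8(bytes[8..].to_vec()).ok()?;
    Some(ToyEntry { seq, label })
}

/// The contract under test: deserialize(serialize(e)) == e.
fn roundtrip_holds(e: &ToyEntry) -> bool {
    deserialize(&serialize(e)).as_ref() == Some(e)
}
```

The key property is that the assertion compares full values, so any field a schema change drops or reorders fails the test immediately.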
Required checks for PRs touching:
- `services/persistence/`: full persistence test suite.
- `services/persistence/types.rs`: round-trip tests for any new/modified type.
- `graph/mod.rs`: boundary enforcement (no new `pub` escalation without justification).
- Any file adding new `GraphIntent` variants: must include a `LogEntry` round-trip test.
New serialization types or intent variants that lack round-trip tests are blocked by CI. This prevents the most common persistence regression: a type that serializes correctly today but deserializes incorrectly after a schema change.
- Full: Keychain available, encryption active, journal active, snapshots on schedule.
- Degraded (read-only): If journal write fails (disk full, permission denied), app enters read-only mode. Graph can be browsed but not mutated. Explicit diagnostics emitted.
- Degraded (no encryption): If keychain is unavailable but legacy data exists, data is accessible but new writes are blocked until key is available. No silent fallback to plaintext writes.
- Recovery mode: On startup, if snapshot is corrupted, attempt journal-only recovery. If journal is also corrupted, start with empty graph and emit critical diagnostic.
Current implementation note:
- Startup timeout fallback and open/recover failure handling exist today.
- The explicit degraded-state model and health summary remain partial and are tracked by the unified storage plan.
- Degradation states emit to `persistence.*` channels.
- The Diagnostic Inspector reflects persistence status prominently.
- User-visible indicators for: read-only mode, key unavailable, recovery failure.
- No silent data loss. Every unrecoverable corruption event produces an Error-severity diagnostic.
| Owner | Guarantees |
|---|---|
| `GraphStore` | Journal writes, snapshot atomicity, encryption, archive management. The single persistence authority. |
| `GraphWorkspace` | Single-write-path boundary. All mutations go through `apply_reducer_intents()`. `pub(crate)` enforcement. |
| `AppServices` | Holds the `GraphStore` handle. No other component has direct persistence access. |
| Serialization types (`types.rs`) | `GraphSnapshot`, `LogEntry`, `TileLayout` definitions. Round-trip correctness is their contract. |
| OS Keychain | Key storage. The persistence layer trusts but verifies (key format validation on load). |
Current priority order:
- close the storage taxonomy gap (`GraphDurability`, `WorkspaceLayoutPersistence`, `ArchivePersistence`, `PersistenceRecoveryAndHealth`)
- keep the `GraphStore` / future `ClientStorageManager` seam explicit so app durability is not forced into browser site-data terminology
- add missing integrity telemetry and a persistence health summary
- formalize degraded persistence states
- audit durable mutation boundaries against actual app logging paths and document intentional non-durable exceptions
- Wire diagnostic channels: add the `persistence.*` channel family to `DiagnosticsRegistry`. Emit from all `GraphStore` methods (open, journal, snapshot, recover, encrypt/decrypt).
- Add round-trip test coverage: verify every serializable type has an explicit round-trip test. Audit all `GraphIntent` variants for `LogEntry` coverage.
- Add recovery integrity test: populate graph, snapshot, add mutations, corrupt snapshot, verify journal-only recovery produces the correct state.
- Add encryption edge-case tests: corrupted ciphertext → error (not silent), corrupted nonce → error, empty payload → error, legacy plaintext → fallback.
- Wire health summary: expose journal count, snapshot schedule, encryption status, archive sizes in the Diagnostic Inspector.
- Add sequence continuity watchdog: on open, verify `log_sequence` has no gaps. Emit `persistence.journal.sequence_gap` if gaps are detected.
- Document degradation states: wire read-only mode on journal failure. Wire blocked-write on key unavailability.
Based on the existing `services/persistence/mod.rs` (2340 lines):

What exists:
- fjall WAL with three keyspaces (`mutations`, `traversal_archive`, `dissolved_archive`) ✅
- redb snapshot with ACID transactions ✅
- rkyv serialization ✅
- zstd compression ✅
- AES-256-GCM encryption with OS keychain key ✅
- Legacy plaintext migration ✅
- Named graph snapshots ✅
- Workspace layout persistence ✅
- `pub(crate)` boundary on graph mutators ✅
What's missing:
- No `persistence.*` diagnostic channels (errors are `log::warn` only)
- No explicit round-trip tests for all `GraphIntent` variants
- No sequence continuity validation on open
- No degradation-mode handling (read-only on failure)
- No persistence health summary in diagnostics pane
- No invariant watchdogs for long-running operations
- Some degradation-mode behavior (read-only transitions) requires app-level UX/state wiring beyond `GraphStore`.
- History subsystem replay/preview work depends on persistence archive and WAL guarantees being diagnosable first.
- Security subsystem overlaps on cryptographic correctness and keychain behavior; shared diagnostics naming/severity should stay aligned.
- `2026-02-22_registry_layer_plan.md` (Phase 6 single-write-path and authority boundary contracts)
- `services/persistence/mod.rs` (current implementation reference)
- `SUBSYSTEM_SECURITY.md` (crypto/keychain overlap)
- `SUBSYSTEM_HISTORY.md` (archive/replay temporal integrity dependencies)
- `SUBSYSTEM_DIAGNOSTICS.md` (diagnostics infrastructure for persistence health)
- `PLANNING_REGISTER.md` (cross-subsystem sequencing and priorities)
Persistence is a guaranteed system property when:
- Every graph mutation flows through `apply_reducer_intents()` → `log_mutation()` with no bypass paths.
- Every serializable type has an explicit round-trip test.
- Snapshot + journal recovery produces bit-identical graph state.
- All encryption paths are tested (including corruption detection and legacy fallback).
- `persistence.*` diagnostic channels cover all operations with appropriate severity.
- Sequence continuity is validated on open.
- Degradation modes (read-only, key-unavailable) are wired and tested.
- New intent variants without round-trip tests are blocked by CI.
- The single-write-path boundary (`pub(crate)`) is maintained and reviewed.