On Graph Vault Synchronization - robbiemu/aclarai GitHub Wiki
aclarai maintains a bidirectional sync between Obsidian Markdown files and a Neo4j knowledge graph. To do this reliably, each atomic unit of content—called a block—must be:
- Uniquely identifiable
- Versioned
- Traceable to a graph node
This document defines what a “block” is, how it is marked in Markdown, and how the sync process detects changes, prevents overwrites, and updates Neo4j accordingly.
A block is the smallest unit of content that is tracked individually in the graph.
Vault Tier | Block Type | Example |
---|---|---|
Tier 1 (Raw Conversations) | A user utterance or sentence | "Alice: Let’s release v1.2 next week." |
Tier 2 (Summaries) | A paragraph, bullet, or summary sentence | • Alice proposed releasing v1.2 next week. |
Tier 3 (Concept Pages) | A definition, claim, or evidence entry | • Release cadence is discussed in 3 conversations. |
Tier 3/Global | A full, agent-generated report/summary | Top Concepts.md, Trending Topics - 2024-05-22.md |
aclarai tracks content at different granularities, and each tracked unit (a "block" in aclarai's terminology) is marked with unique identifiers and versioning information using invisible HTML comments. These comments are placed either inline within a file for smaller content units, or at the file level for entire agent-generated documents.
For individual sentences, paragraphs, or claims within a Markdown file, markers are embedded directly after the content.
Some text of the claim or summary. <!-- aclarai:id=clm_abc123 ver=2 -->
^clm_abc123
Marker | Purpose |
---|---|
`<!-- aclarai:id=clm_abc123 ver=2 -->` | Hidden HTML comment for sync logic. Holds the unique ID and version number. |
`^clm_abc123` | Obsidian block anchor. Enables links like `[[file#^clm_abc123]]`. |
These markers travel with the text. If a user moves or reorders content, the ID and version stay intact.
For pages that are generated entirely by aclarai agents (e.g., `Top Concepts.md`, `Trending Topics - {date}.md`, `[[Subject:XYZ]].md`), the `aclarai:id` and `ver=` markers apply to the entire file content. These markers are placed as an invisible HTML comment at the very end of the file.
## Trending This Week
- [[Concept X]]
- [[Concept Y]]
<!-- aclarai:id=file_trending_20240522 ver=1 -->
- The `aclarai:id` (e.g., `file_trending_20240522`, `subject_gpu_errors`) provides a unique ID for the entire file.
- The `ver=` (e.g., `ver=1`) is the version number for the entire file content, which increments on any semantic change.
- The ID format (e.g., `file_`, `subject_`, `concept_`) will indicate the type of generated page.
This consistent HTML comment approach ensures robust tracking while maintaining compatibility with Obsidian and other Markdown parsers. When aclarai processes these files, it hashes the entire semantic content (excluding these markers) to detect changes for versioning.
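A minimal sketch of hashing content while ignoring the markers. The regular expressions are inferred from the marker examples in this document, and the function name `semantic_hash`, the choice of SHA-256, and the whitespace normalization are all assumptions for illustration, not aclarai's confirmed implementation.

```python
import hashlib
import re

# Assumed marker shapes, based on the examples shown in this document.
ID_COMMENT = re.compile(r"<!--\s*aclarai:id=\S+\s+ver=\d+\s*-->")
BLOCK_ANCHOR = re.compile(r"^\^\w+\s*$", re.MULTILINE)


def semantic_hash(markdown: str) -> str:
    """Hash the visible text, ignoring sync markers and whitespace differences."""
    text = ID_COMMENT.sub("", markdown)   # drop the hidden id/version comments
    text = BLOCK_ANCHOR.sub("", text)     # drop standalone Obsidian block anchors
    normalized = " ".join(text.split())   # assumption: whitespace is not semantic
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Because the marker is stripped before hashing, bumping `ver=` alone does not change the hash, while any edit to the visible text does.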
The file watcher:
- Runs in the background.
- Watches for `.md` changes in Tier 2 and Tier 3 folders.
- Batches events (e.g., using `watchdog` in Python or `chokidar` in JS).
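Event batching can be sketched independently of any watcher library. The class below is hypothetical (`EventBatcher`, `quiet_seconds`, and the two-second default are assumptions): a library such as `watchdog` would call `add()` from its event handler, and a timer would call `flush()` periodically.

```python
import time
from typing import Dict, List, Optional


class EventBatcher:
    """Collect changed paths and release them once they have been quiet for a while."""

    def __init__(self, quiet_seconds: float = 2.0):
        self.quiet_seconds = quiet_seconds
        self._last_seen: Dict[str, float] = {}

    def add(self, path: str, now: Optional[float] = None) -> None:
        """Record a change event; repeated events for a path reset its timer."""
        if not path.endswith(".md"):
            return  # only Markdown files are synced
        self._last_seen[path] = time.monotonic() if now is None else now

    def flush(self, now: Optional[float] = None) -> List[str]:
        """Return (and forget) paths with no events for at least quiet_seconds."""
        now = time.monotonic() if now is None else now
        ready = sorted(
            path for path, seen in self._last_seen.items()
            if now - seen >= self.quiet_seconds
        )
        for path in ready:
            del self._last_seen[path]
        return ready
```

Keying on the path means a burst of saves to one file collapses into a single sync pass.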
- Parse old and new versions of a file into a Markdown Abstract Syntax Tree (AST).
- Scan for `aclarai:id` comments.
- Build a mapping: `{id: (version, content)}`.
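The mapping step might look like the sketch below. It is a simplification: it scans lines with a regular expression rather than walking a Markdown AST, and it only handles inline markers that share a line with their content. The name `extract_blocks` is hypothetical.

```python
import re
from typing import Dict, Tuple

# Assumed marker shape, based on the examples in this document.
MARKER = re.compile(r"<!--\s*aclarai:id=(?P<id>\S+)\s+ver=(?P<ver>\d+)\s*-->")


def extract_blocks(markdown: str) -> Dict[str, Tuple[int, str]]:
    """Return {id: (version, content)} for each inline marker found."""
    blocks: Dict[str, Tuple[int, str]] = {}
    for line in markdown.splitlines():
        match = MARKER.search(line)
        if not match:
            continue
        content = MARKER.sub("", line).strip()
        if not content:
            continue  # a marker alone on its line (file-level) is not handled here
        blocks[match["id"]] = (int(match["ver"]), content)
    return blocks
```

Running this on the old and new versions of a file yields two mappings that can then be compared by ID.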
Change | Action |
---|---|
New block | Add new node in Neo4j: `CREATE (:Claim {id, ver=1, text, ...})` |
Edited block | If version matches graph: `SET ver = ver+1, text = $new` |
Missing block | Optionally `DETACH DELETE` in Neo4j, or flag as archived |
Version mismatch | Skip update. Flag conflict. Log: “Vault out of sync with graph.” |
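The table can be read as a small decision procedure. The sketch below is one interpretation, not aclarai's confirmed logic: the function name and the action labels (`"create"`, `"update"`, `"missing"`, `"conflict"`) are hypothetical, and the version check is applied only to blocks whose content changed.

```python
from typing import Dict, Tuple

Blocks = Dict[str, Tuple[int, str]]  # {id: (version, content)}


def classify_changes(old: Blocks, new: Blocks, graph_versions: Dict[str, int]) -> Dict[str, str]:
    """Map each affected block ID to the action the table prescribes."""
    actions: Dict[str, str] = {}
    for block_id, (ver, content) in new.items():
        if block_id not in old:
            actions[block_id] = "create"        # new block
        elif old[block_id][1] == content:
            continue                            # unchanged, nothing to do
        elif graph_versions.get(block_id) == ver:
            actions[block_id] = "update"        # edited, versions agree
        else:
            actions[block_id] = "conflict"      # edited, vault is out of sync
    for block_id in old.keys() - new.keys():
        actions[block_id] = "missing"           # delete or archive in the graph
    return actions
```

Whether `"missing"` leads to `DETACH DELETE` or an archive flag is left to the caller, as in the table.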
When aclarai modifies a `.md` file:
- Generate updated Markdown with inserted/updated `aclarai:id` comments.
- Write to `.filename.md.tmp`.
- `fsync()`.
- `rename(tmp, filename.md)`.
This ensures that Obsidian or git sees either the old file or the fully updated file—never a partially written file.
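A minimal Python sketch of this write, fsync, rename sequence. The helper name `atomic_write` is hypothetical; `os.replace` is used for the rename step because it overwrites an existing target, and the temporary file sits in the same directory so the rename stays on one filesystem.

```python
import os
from pathlib import Path


def atomic_write(path: Path, text: str) -> None:
    """Replace the file at `path` so readers never observe a partial write."""
    tmp = path.with_name(f".{path.name}.tmp")  # e.g. .filename.md.tmp
    with open(tmp, "w", encoding="utf-8") as handle:
        handle.write(text)
        handle.flush()               # push Python's buffer to the OS
        os.fsync(handle.fileno())    # ask the OS to commit the data to disk
    os.replace(tmp, path)            # swap the finished file into place
```

If the process dies before `os.replace`, the original file is untouched and only a stray temporary file remains.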
Each block’s `ver=N` (within its HTML comment) lets the system detect concurrent edits:
- Graph holds version `N`.
- Vault sends update for version `N`.
- Graph increments to `N+1`.
- If graph already had `N+1`, the update is rejected: the vault was stale.
Fallback: log it, surface a `<!-- aclarai:conflict ... -->` comment, or queue it for review.
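The version check is a compare-and-swap. The sketch below uses a plain dictionary as a stand-in for Neo4j, so `apply_update` and `VersionConflict` are hypothetical names; against a real graph, the comparison and the increment would presumably need to run in a single transaction.

```python
from typing import Dict


class VersionConflict(Exception):
    """Raised when the vault's version is behind the graph's."""


def apply_update(graph: Dict[str, dict], block_id: str, vault_ver: int, new_text: str) -> int:
    """Apply an edit only if the vault saw the graph's current version."""
    node = graph[block_id]
    if node["ver"] != vault_ver:
        # The graph has moved on since the vault read this block.
        raise VersionConflict(
            f"{block_id}: vault has ver={vault_ver}, graph has ver={node['ver']}"
        )
    node["text"] = new_text
    node["ver"] = vault_ver + 1
    return node["ver"]
```

A caller would catch `VersionConflict` and take one of the fallback paths: log, surface a conflict comment, or queue for review.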
Claimify never judges a sentence in total isolation. Each focal block is processed with a fixed, finite slice of neighbouring sentences:
Stage | p (sentences before) | f (sentences after) |
---|---|---|
Selection | 5 | 5 |
Disambiguation | 5 | 0 |
Decomposition | 5 | 0 |
WINDOW = `max(p, f)` across all stages → 5 in the default configuration.
- Dirty detection – Hash each block’s visible text; any change marks that block dirty.
- Window expansion (non-transitive) – For every dirty index i, add indices `i-WINDOW … i+WINDOW` to a `set<int>`.
- Deduplicate – The set contains every block that could be in-prompt for any dirty block.
- Single Claimify pass – Run Claimify once per block in the set, supplying the exact `p`, `f` required by each stage.
- Graph update – Increment `ver` on every block in the set (claims may change even if the text didn’t).
- The expansion is one hop only; neighbours are not further expanded.
- The size of the batch is bounded by `11` sentences (`2 × WINDOW + 1`) per dirty block with defaults (≤ file length overall).
- No block is queued twice in the same cycle.
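The window computation and one-hop expansion can be sketched as follows. The `STAGES` dictionary mirrors the default table above; the function name `expand_dirty` and clamping to the file bounds are assumptions for illustration.

```python
from typing import List, Set

# (p, f) per stage, copied from the default configuration table.
STAGES = {"selection": (5, 5), "disambiguation": (5, 0), "decomposition": (5, 0)}
WINDOW = max(max(p, f) for p, f in STAGES.values())


def expand_dirty(dirty: Set[int], n_blocks: int, window: int = WINDOW) -> List[int]:
    """Return the deduplicated, sorted block indices to reprocess."""
    batch: Set[int] = set()
    for i in dirty:
        lo = max(0, i - window)              # clamp at the start of the file
        hi = min(n_blocks - 1, i + window)   # clamp at the end of the file
        batch.update(range(lo, hi + 1))      # one hop only: added indices are not re-expanded
    return sorted(batch)
```

Using a set means overlapping windows from nearby dirty blocks merge, so no block is queued twice.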
An edit to `S[i]` therefore triggers exactly one batch, covers all sentences Claimify can see (`±5`), and halts.
If you later adjust `p` or `f`, run a single full-vault migration that reprocesses every block once with the new parameters. Routine edits then continue with the new window.
A daily job should:
- Walk the vault
- Parse all known `aclarai:id` blocks (including file-level IDs)
- Hash the visible text
- Compare to graph values
- Queue any drifted blocks for reprocessing
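The comparison step of such a job might be sketched as below. It assumes the graph stores a hash of each block's visible text, which this document does not confirm; `text_hash` and `find_drift` are hypothetical names.

```python
import hashlib
from typing import Dict, List


def text_hash(text: str) -> str:
    """Hash a block's visible text (assumes markers were already stripped)."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


def find_drift(vault_blocks: Dict[str, str], graph_hashes: Dict[str, str]) -> List[str]:
    """Return IDs whose vault text no longer matches the graph's stored hash."""
    return sorted(
        block_id
        for block_id, text in vault_blocks.items()
        if graph_hashes.get(block_id) != text_hash(text)
    )
```

Blocks that exist in the vault but not in the graph also count as drifted here, since `graph_hashes.get` returns `None` for them.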
- Sync is block-level, not file-level (though files can be treated as single blocks).
- Sync is optimistic—it never overwrites user edits without a version check.
- Sync is content-aware—it diffs ASTs, not lines.
- Sync is safe—file writes are atomic.
- Claimify block processing always respects local context windows.