On Refreshing Concept Embeddings - robbiemu/aclarai GitHub Wiki
This job ensures that each (:Concept) node and its corresponding entry in the vector index reflect the current meaning of its Tier 3 Markdown file in the Obsidian vault. It detects semantic drift caused by manual edits and updates the embedding and graph metadata accordingly.
Concept files in the vault may be edited by users or aclarai agents. These changes can alter the concept’s meaning, requiring a refreshed embedding to:
- Maintain accurate vector search results
- Detect similarity collisions
- Ensure claim-to-concept linking remains relevant
The job processes all files matching vault/concepts/*.md. Each file contains:
- Markdown text with semantic content
- An embedded metadata block: <!-- aclarai:id=concept_<slug> ver=3 -->
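The metadata block can be read with a small regular expression. The following is a minimal sketch that assumes every marker follows the `id=... ver=...` layout of the example above; `MARKER_RE` and `parse_marker` are hypothetical names, not part of aclarai.

```python
import re

# Assumed layout, taken from the example marker: <!-- aclarai:id=<id> ver=<int> -->
MARKER_RE = re.compile(r"<!--\s*aclarai:id=(?P<id>\S+)\s+ver=(?P<ver>\d+)\s*-->")


def parse_marker(markdown):
    """Return (id, version) from the first aclarai marker, or None if absent."""
    match = MARKER_RE.search(markdown)
    if match is None:
        return None
    return match.group("id"), int(match.group("ver"))


sample = (
    "# CUDA error\n"
    "Some text.\n"
    "<!-- aclarai:id=concept_cuda_error ver=4 -->\n"
    "^concept_cuda_error\n"
)
print(parse_marker(sample))  # ('concept_cuda_error', 4)
```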
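import os, hashlib

# read_file, embed_text, neo4j and vector_store are placeholders for the
# project's own file, embedding, graph and vector-store helpers.

def strip_metadata(md):
    # Drop aclarai marker lines so that a version bump alone does not change the hash
    return "\n".join([
        line for line in md.splitlines()
        if not line.startswith("<!-- aclarai:")
    ])

for file in os.listdir("vault/concepts"):
    if not file.endswith(".md"):
        continue
    concept_name = file[:-3]  # strip .md
    text = read_file(f"vault/concepts/{file}")

    # Hash only the semantic content of the file
    semantic_text = strip_metadata(text)
    embedding_hash = hashlib.sha256(semantic_text.encode()).hexdigest()

    # Compare with the hash stored on the node; the alias keeps the result key stable
    result = neo4j.run("""
        MATCH (c:Concept {name: $name}) RETURN c.embedding_hash AS embedding_hash
    """, {"name": concept_name})
    if result["embedding_hash"] == embedding_hash:
        continue  # No update needed

    # Re-embed the changed content and replace the vector index entry
    embedding = embed_text(semantic_text)
    vector_store.upsert(concept_name, embedding)

    # Record the new hash and update time on the graph node
    neo4j.run("""
        MATCH (c:Concept {name: $name})
        SET c.embedding_hash = $hash,
            c.last_updated = datetime()
    """, {
        "name": concept_name,
        "hash": embedding_hash
    })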
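To try the control flow without Neo4j or a vector database, the loop can be sketched against in-memory stand-ins. Everything here is an assumption made for illustration: plain dicts replace the graph and the vector store, `fake_embed` replaces a real embedding model, and `refresh` is a hypothetical name. Unlike the `MATCH` query above, this sketch also creates an entry for a concept it has not seen before.

```python
import hashlib
import tempfile
from pathlib import Path


def strip_metadata(md):
    # Drop aclarai marker lines so that they do not influence the hash
    return "\n".join(
        line for line in md.splitlines()
        if not line.startswith("<!-- aclarai:")
    )


def content_hash(md):
    return hashlib.sha256(strip_metadata(md).encode("utf-8")).hexdigest()


def fake_embed(text):
    # Hypothetical stand-in for an embedding model: a tiny deterministic vector
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:4]]


def refresh(concepts_dir, graph, vectors, embed=fake_embed):
    """Re-embed concepts whose content hash changed; return the refreshed names."""
    refreshed = []
    for path in sorted(Path(concepts_dir).glob("*.md")):
        name = path.stem
        text = path.read_text(encoding="utf-8")
        new_hash = content_hash(text)
        node = graph.setdefault(name, {})  # dict stands in for the (:Concept) node
        if node.get("embedding_hash") == new_hash:
            continue  # content unchanged, skip the embedding call
        vectors[name] = embed(strip_metadata(text))
        node["embedding_hash"] = new_hash
        refreshed.append(name)
    return refreshed


with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp) / "cuda_error.md"
    note.write_text(
        "# CUDA error\nOld text.\n<!-- aclarai:id=concept_cuda_error ver=3 -->\n",
        encoding="utf-8",
    )
    graph, vectors = {}, {}
    first = refresh(tmp, graph, vectors)   # new concept, gets embedded
    second = refresh(tmp, graph, vectors)  # nothing changed, nothing to do
    note.write_text(
        "# CUDA error\nNew text.\n<!-- aclarai:id=concept_cuda_error ver=4 -->\n",
        encoding="utf-8",
    )
    third = refresh(tmp, graph, vectors)   # edited text, re-embedded

print(first, second, third)  # ['cuda_error'] [] ['cuda_error']
```

The second run returns an empty list, so unchanged files never reach the embedding call.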
Suppose a user edits a concept file so that it reads:
# CUDA error
Common issue with PyTorch on Linux when using incompatible CUDA versions like 12.3 or 12.4. These may trigger “out of memory” or “invalid device function”.
<!-- aclarai:id=concept_cuda_error ver=4 -->
^concept_cuda_error
The next nightly job will:
- Detect a new hash for this content
- Recompute its embedding
- Update both:
  - The concepts vector store (e.g., PGVector or hnswlib)
  - The (:Concept) node in Neo4j with embedding_hash and last_updated
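The hash check in this example can be reproduced in a few lines. In the sketch below the earlier wording of the file (`BODY_OLD`) is invented for illustration, since only the edited text is given above, and `content_hash` is a hypothetical helper name. Because marker lines are stripped before hashing, a version bump alone leaves the hash unchanged, while an edit to the wording produces a new one.

```python
import hashlib


def strip_metadata(md):
    # Same filter as the job: drop lines that start with the aclarai marker
    return "\n".join(
        line for line in md.splitlines()
        if not line.startswith("<!-- aclarai:")
    )


def content_hash(md):
    return hashlib.sha256(strip_metadata(md).encode("utf-8")).hexdigest()


# Hypothetical earlier wording; the source only shows the edited file.
BODY_OLD = "# CUDA error\nCommon issue with PyTorch on Linux.\n"
BODY_NEW = (
    "# CUDA error\n"
    "Common issue with PyTorch on Linux when using incompatible "
    "CUDA versions like 12.3 or 12.4.\n"
)
ANCHOR = "^concept_cuda_error\n"

v3 = BODY_OLD + "<!-- aclarai:id=concept_cuda_error ver=3 -->\n" + ANCHOR
v4_same_text = BODY_OLD + "<!-- aclarai:id=concept_cuda_error ver=4 -->\n" + ANCHOR
v4_edited = BODY_NEW + "<!-- aclarai:id=concept_cuda_error ver=4 -->\n" + ANCHOR

print(content_hash(v3) == content_hash(v4_same_text))  # True: only the marker changed
print(content_hash(v3) == content_hash(v4_edited))     # False: the wording changed
```

The comparison is exact, so any change to a non-marker line, including whitespace, counts as a change and triggers a re-embed.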
As a result:
- All (:Concept) nodes stay aligned with their Markdown definitions
- Vector search and similarity checks remain meaningful
- Changes are visible in the graph and vector layers with minimal latency