🧠 Vault Rummager: How I was unsuccessful in building a chat bot to query my second brain
TL;DR
I tried to build a custom chatbot over my personal notes using ChromaDB, SentenceTransformers, and Ollama. The goal was to “ask my notes questions,” like an AI-powered second brain. Instead, I built a little assistant that either:
- confidently dodges my questions,
- hallucinates with flair,
- or politely tells me it has no idea what I’m talking about.
This is the story of Vault Rummager.
🔮 The Vision
I set out to build a local AI assistant to answer questions about my personal notes. I use Obsidian for my notes, so they're in Markdown format, and there are hundreds, if not thousands, of files. Since my notes sometimes contain sensitive information, I wanted a local solution; I don't want to upload that data anywhere. Think of it as ChatGPT++.
I called it Vault Rummager. Here was the vision:
- Parse my Markdown notes directory (a chaotic tangle of half-thoughts, acronyms, and Taylor Swift trivia).
- Split notes into chunks, embed them using `sentence-transformers`.
- Store the chunks in a vector database (ChromaDB).
- When I ask a question, embed the query and retrieve relevant chunks.
- Send a prompt to Ollama (running `gemma:2b`) with just those chunks.
- Profit? 🫠
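Concretely, the query side of that plan looks something like the sketch below. This is a minimal, hedged version rather than the real `chat.py`: the collection name, prompt wording, and chunk count are assumptions, though the endpoint is Ollama's default local API.

```python
# Minimal sketch of the query path: embed the question, pull similar chunks
# from Chroma, and hand them to Ollama as context. Collection name and prompt
# wording are illustrative, not the actual scripts.
import chromadb
import requests
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./chroma")
collection = client.get_or_create_collection("notes")

def ask(question: str) -> str:
    # Embed the question and retrieve the most similar note chunks.
    query_embedding = embedder.encode([question]).tolist()
    results = collection.query(query_embeddings=query_embedding, n_results=10)
    context = "\n\n".join(results["documents"][0])

    # Ollama's default local HTTP API; gemma:2b must already be pulled.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "gemma:2b", "prompt": prompt, "stream": False},
        timeout=120,
    )
    return response.json()["response"]
```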
I was hoping to be able to ask it things like:
- "Hey what was the command that I used to solve that problem I encountered that one time?"
- "Hey can you summarize what I worked on over this last quarter?" to help with professional development goals
🛠️ Toolchain
And to REALLY make this extra AI-y, I used ChatGPT to help me come up with an outline of my plan, and then I prompted it at each step to write most of the code for me, because I'm not a Python developer.
- ChromaDB: lightweight, local vector store with nice Python bindings.
- SentenceTransformers: specifically `all-MiniLM-L6-v2` for embedding.
- Ollama: serving local LLMs via an HTTP API (I used `gemma:2b`).
- Python scripts: `parse_notes.py`, `embed_notes.py`, and `chat.py` made up the brain.
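For the indexing side (roughly what `parse_notes.py` and `embed_notes.py` did between them), here's a hedged sketch. The vault path, the naive fixed-size chunking, and the chunk ID scheme are illustrative assumptions, not the project's exact logic.

```python
# Sketch of the indexing side: read Markdown notes, chunk them, embed the
# chunks, and store everything in Chroma. Paths, chunk size, and the ID
# scheme are illustrative assumptions.
from pathlib import Path

import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./chroma")
collection = client.get_or_create_collection("notes")

def chunk_text(text: str, size: int = 800) -> list[str]:
    # Naive fixed-size chunking; real chunking should respect headings/paragraphs.
    return [text[i:i + size] for i in range(0, len(text), size)]

for note in Path("vault").rglob("*.md"):
    chunks = chunk_text(note.read_text(encoding="utf-8"))
    if not chunks:
        continue
    collection.add(
        ids=[f"{note.stem}_md_{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embedder.encode(chunks).tolist(),
        metadatas=[{"source": str(note)} for _ in chunks],
    )
```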
🪵 The Reindexing Gauntlet
Initial excitement died quickly when I ran:
```bash
poetry run python scripts/reindex.py --reset
```
...and got:
```
⚠️ Duplicate chunk_id skipped: TSWIFT_md_0
⚠️ Duplicate chunk_id skipped: TSWIFT_md_1
...
chromadb.errors.DuplicateIDError: Expected IDs to be unique, found 55 duplicated IDs
```
Turns out my `reset` flag was a liar. It parsed notes and skipped duplicates but never truly reset Chroma.

Fix: Actually call `client.delete_collection(...)` when `reset=True` before re-adding anything. Revolutionary, I know.
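In sketch form, the fix looked roughly like this; the flag, path, and collection name are carried over from the examples above, not the actual script:

```python
# Sketch of the fix: when reset is requested, drop the collection entirely
# before re-adding chunks, instead of just skipping duplicate IDs.
import chromadb

def get_collection(reset: bool = False):
    client = chromadb.PersistentClient(path="./chroma")
    if reset:
        try:
            client.delete_collection("notes")  # actually wipe the old index
        except Exception:
            pass  # the collection may not exist yet on a fresh run
    return client.get_or_create_collection("notes")
```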
Before this exercise, I wasn't concerned that AI has gotten so smart that it can rewrite its own code and computers will take over the world. This exercise solidified my position on the matter. Remember, I had been asking ChatGPT to write all this Python code for me.
🔍 The Birthdate Chunk Debacle
At one point, I asked a very simple question:
> When was Taylor Swift born?
VaultBot answered confidently:
> The context does not provide information about the birth date of Taylor Swift, so I cannot answer this question from the provided context.
🙃
But I knew for a fact that one of my note chunks literally said:
> Taylor Swift was born on December 13, 1989
So what gives?
❗ Problem: The Birthdate Chunk Was Buried
Here’s what happened:
- Chroma returned 10 chunks most similar to the question embedding.
- The birthdate lived in an early chunk of a long markdown file.
- But the top-ranked chunks were about Abercrombie modeling, RCA showcases, and Maybelline CDs.
The sentence I cared about wasn’t semantically close enough to beat those other entries. Apparently this is a classic case of the vector DB doing its job... just not the one I wanted.
Fixes I explored:
- Log and inspect the top results and their similarity scores.
- Set a similarity threshold, and fall back to Ollama alone if no chunk exceeded it (see the sketch after this list).
- Print the “best” chunk used, to add transparency (and more disappointment).
- Consider pre-chunking notes more meaningfully: by headings, paragraphs, etc.
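Here's a hedged sketch combining the first two of those fixes: log what the retriever returned along with its distances, and only trust the context if the best chunk clears a cutoff. The cutoff value is a made-up number you'd have to tune, and Chroma's default metric is L2 distance, where lower means closer.

```python
# Sketch: inspect what Chroma actually returned, and only use the context if
# the best chunk is close enough. The 1.0 cutoff is arbitrary and needs tuning.
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
collection = chromadb.PersistentClient(path="./chroma").get_or_create_collection("notes")

def retrieve_context(question: str, max_distance: float = 1.0):
    query_embedding = embedder.encode([question]).tolist()
    results = collection.query(
        query_embeddings=query_embedding,
        n_results=10,
        include=["documents", "distances"],
    )
    docs = results["documents"][0]
    distances = results["distances"][0]

    # Log what the retriever thinks is relevant; this is where the
    # Abercrombie/RCA/Maybelline chunks showed up above the birthdate one.
    for doc, dist in zip(docs, distances):
        print(f"[{dist:.3f}] {doc[:80]}...")

    if not distances or distances[0] > max_distance:
        return None  # nothing close enough; fall back to the LLM alone
    return "\n\n".join(docs)
```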
🤥 Hallucination Mitigation
Another hiccup: if context was vague or irrelevant, Ollama filled in the blanks with creative nonsense. To stop the bot from playing Mad Libs with my notes, I updated the prompt to explicitly say:
"If the answer is not in the context, respond with: 'The answer is not in the provided notes.' Do not make up facts or assume information not present in the context."
This helped a bit — especially once I turned off chat history threading in the prompt (too noisy, often misleading).
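The resulting prompt builder ended up shaped roughly like this (a sketch; everything outside the quoted guard sentence is approximate wording):

```python
# Sketch of the stricter prompt scaffolding: the guard sentence tells the
# model to admit when the notes don't contain the answer, and no chat history
# is threaded in, just the retrieved chunks and the current question.
def build_prompt(context: str, question: str) -> str:
    return (
        "You are answering questions about my personal notes.\n"
        "Context:\n"
        f"{context}\n\n"
        "If the answer is not in the context, respond with: "
        "'The answer is not in the provided notes.' "
        "Do not make up facts or assume information not present in the context.\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```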
🧱 Lessons Learned
✅ What Worked
- Embedding and storing chunks in ChromaDB was easy and fast.
- Ollama worked great locally — fast and predictable with
gemma:2b
.
❌ What Didn’t
- Similarity ranking wasn't good enough to always retrieve relevant info.
- My chunk hashing logic, and the chunking itself: chunks were too granular in some cases, too long in others.
- LLM hallucinations needed stricter prompt scaffolding.
🧁 Closing Thoughts
I didn’t end up with a perfect second brain, but I did end up with a locally hosted, overly verbose assistant who talks like they read my notes once during a coffee break.
Ultimately, Vault Rummager will accurately tell me Taylor Swift's age, as long as there is only one note with just this as the file contents:

> **Taylor Alison Swift** born December 13, 1989

The fallback logic is clearly still broken: it can't tell you what 2+2 equals.
But I learned a ton about embeddings, vector search tradeoffs, prompt engineering, and Ollama integration. I also learned how much of an art it is to chunk data and determine appropriate threshold levels. Art has never really been my thing.