
🧠 Vault Rummager: How I was unsuccessful in building a chat bot to query my second brain

TL;DR

I tried to build a custom chatbot over my personal notes using ChromaDB, SentenceTransformers, and Ollama. The goal was to “ask my notes questions,” like an AI-powered second brain. Instead, I built a little assistant that either:

  • confidently dodges my questions,
  • hallucinates with flair,
  • or politely tells me it has no idea what I’m talking about.

This is the story of Vault Rummager.


🔮 The Vision

I set out to build a local AI assistant to answer questions about my personal notes. I use Obsidian for my notes, so they're all in Markdown format: hundreds, if not thousands, of files. Since my notes sometimes contain sensitive information, I wanted a fully local solution; I don't want to upload that data anywhere. Think of it as ChatGPT++.

I called it Vault Rummager. Here was the vision (with a rough code sketch after the list):

  • Parse my Markdown notes directory (a chaotic tangle of half-thoughts, acronyms, and Taylor Swift trivia).
  • Split notes into chunks, embed them using sentence-transformers.
  • Store the chunks in a vector database (ChromaDB).
  • When I ask a question, embed the query and retrieve relevant chunks.
  • Send a prompt to Ollama (running gemma:2b) with just those chunks.
  • Profit? 🫠
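Here's roughly what that flow looks like in Python. This is a minimal sketch, not the real chat.py; the collection name, retrieval count, and prompt wording are placeholders.

```python
# Minimal sketch of the query flow (not the real chat.py).
# Assumes a ChromaDB collection named "notes" already holds the embedded chunks.
import requests
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection(name="notes")

def ask(question: str) -> str:
    # Embed the question and pull the most similar note chunks.
    query_embedding = embedder.encode([question]).tolist()
    results = collection.query(query_embeddings=query_embedding, n_results=10)
    context = "\n\n".join(results["documents"][0])

    # Hand just those chunks to the local model through Ollama's HTTP API.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "gemma:2b", "prompt": prompt, "stream": False},
    )
    return response.json()["response"]

print(ask("When was Taylor Swift born?"))
```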

I was hoping to be able to ask it things like:

  • "Hey what was the command that I used to solve that problem I encountered that one time?"
  • "Hey can you summarize what I worked on over this last quarter?" to help with professional development goals

🛠️ Toolchain

And to REALLY make this extra AI-y, I used ChatGPT to help me come up with an outline of my plan, then prompted it at each step to write most of the code for me, because I'm not a Python developer. A condensed sketch of what the indexing side boiled down to follows the list.

  • ChromaDB: lightweight, local vector store with nice Python bindings.
  • SentenceTransformers: specifically all-MiniLM-L6-v2 for embedding.
  • Ollama: serving local LLMs via an HTTP API (I used gemma:2b).
  • Python scripts: parse_notes.py, embed_notes.py, and chat.py made up the brain.
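Something like the following, give or take. The vault path, chunk size, and collection name here are placeholders, not the actual config.

```python
# Condensed sketch of the indexing side (the real work is split across
# parse_notes.py and embed_notes.py). Vault path, chunk size, and collection
# name are placeholders.
from pathlib import Path

import chromadb
from sentence_transformers import SentenceTransformer

VAULT = Path("~/notes").expanduser()   # wherever the Obsidian vault lives
CHUNK_CHARS = 1000                     # naive fixed-size chunks

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection(name="notes")

for md_file in VAULT.rglob("*.md"):
    text = md_file.read_text(encoding="utf-8", errors="ignore")
    # Fixed-size chunks, each with a deterministic chunk_id.
    chunks = [text[i:i + CHUNK_CHARS] for i in range(0, len(text), CHUNK_CHARS)]
    if not chunks:
        continue
    collection.add(
        ids=[f"{md_file.stem}_md_{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embedder.encode(chunks).tolist(),
        metadatas=[{"source": str(md_file)} for _ in chunks],
    )
```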

🪵 The Reindexing Gauntlet

Initial excitement died quickly when I ran:

```
poetry run python scripts/reindex.py --reset
```

...and got:

```
⚠️ Duplicate chunk_id skipped: TSWIFT_md_0
⚠️ Duplicate chunk_id skipped: TSWIFT_md_1
...
chromadb.errors.DuplicateIDError: Expected IDs to be unique, found 55 duplicated IDs
```

Turns out my reset flag was a liar. It parsed notes and skipped duplicates but never truly reset Chroma.

Fix: Actually call client.delete_collection(...) when reset=True before re-adding anything. Revolutionary, I know.
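In code, the fix is only a few lines. The collection name is a placeholder, and the try/except is there because delete_collection complains if the collection doesn't exist yet.

```python
# Sketch of the fix: actually drop the collection when --reset is passed.
import chromadb

def get_collection(client, name: str, reset: bool):
    if reset:
        try:
            client.delete_collection(name=name)  # really wipe it this time
        except Exception:
            pass  # nothing to delete on a fresh database
    return client.get_or_create_collection(name=name)

client = chromadb.PersistentClient(path="chroma_db")
collection = get_collection(client, "notes", reset=True)
```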

Before this exercise, I wasn't worried that AI had gotten so smart it could rewrite its own code and take over the world. This exercise solidified my position on the matter. Remember, I had been asking ChatGPT to write all this Python code for me.


🔍 The Birthdate Chunk Debacle

At one point, I asked a very simple question:

When was Taylor Swift born?

VaultBot answered confidently:

The context does not provide information about the birth date of Taylor Swift, so I cannot answer this question from the provided context.

🙃

But I knew for a fact that one of my note chunks literally said:

Taylor Swift was born on December 13, 1989

So what gives?

❗ Problem: The Birthdate Chunk Was Buried

Here’s what happened:

  • Chroma returned 10 chunks most similar to the question embedding.
  • The birthdate lived in an early chunk of a long markdown file.
  • But the top-ranked chunks were about Abercrombie modeling, RCA showcases, and Maybelline CDs.

The sentence I cared about wasn’t semantically close enough to beat those other entries. Apparently this is a classic case of the vector DB doing its job... just not the one I wanted.

Fixes I explored (the threshold idea is sketched after this list):

  • Log and inspect top results + similarity scores.
  • Set a similarity threshold, and fall back to Ollama alone if no chunk exceeded it.
  • Print the “best” chunk used to add transparency (and more disappointment).
  • Consider pre-chunking notes more meaningfully: by headings, paragraphs, and so on.
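The threshold-plus-fallback idea, roughly (reusing the embedder and collection from the earlier sketch; the cutoff value is an arbitrary placeholder, and ask_ollama_without_context is a hypothetical helper, not real code):

```python
# Sketch of the threshold idea. Chroma returns distances alongside documents;
# with the default metric, lower distance = more similar. MAX_DISTANCE is an
# arbitrary placeholder I'd tune by eyeballing the logged scores.
MAX_DISTANCE = 1.0

def retrieve_or_fallback(question: str):
    query_embedding = embedder.encode([question]).tolist()
    results = collection.query(
        query_embeddings=query_embedding,
        n_results=10,
        include=["documents", "distances"],
    )
    docs = results["documents"][0]
    distances = results["distances"][0]

    # Log what actually came back, so the ranking failures are at least visible.
    for doc, dist in zip(docs, distances):
        print(f"{dist:.3f}  {doc[:60]!r}")

    close_enough = [doc for doc, dist in zip(docs, distances) if dist <= MAX_DISTANCE]
    if not close_enough:
        # Nothing relevant: skip retrieval and ask the model directly.
        # (ask_ollama_without_context is a hypothetical helper.)
        return ask_ollama_without_context(question)
    return close_enough
```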

🤥 Hallucination Mitigation

Another hiccup: if context was vague or irrelevant, Ollama filled in the blanks with creative nonsense. To stop the bot from playing Mad Libs with my notes, I updated the prompt to explicitly say:

"If the answer is not in the context, respond with: 'The answer is not in the provided notes.' Do not make up facts or assume information not present in the context."

This helped a bit — especially once I turned off chat history threading in the prompt (too noisy, often misleading).
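The stricter prompt ended up shaped roughly like this (a paraphrase of the template, not the exact string from chat.py):

```python
# Roughly the shape of the stricter prompt (paraphrased, not the exact template).
def build_prompt(context: str, question: str) -> str:
    return (
        "Answer the question using only the context below.\n"
        "If the answer is not in the context, respond with: "
        "'The answer is not in the provided notes.' "
        "Do not make up facts or assume information not present in the context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```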


🧱 Lessons Learned

✅ What Worked

  • Embedding and storing chunks in ChromaDB was easy and fast.
  • Ollama worked great locally — fast and predictable with gemma:2b.

❌ What Didn’t

  • Similarity ranking wasn't good enough to always retrieve relevant info.
  • My chunk hashing logic: chunks were too granular in some cases, too long in others (a heading-aware splitter is sketched after this list).
  • LLM hallucinations needed stricter prompt scaffolding.
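If I take another run at the chunking, splitting on Markdown headings before falling back to fixed-size slices seems like the obvious next step. A rough, untested sketch:

```python
# Rough sketch of heading-aware chunking: split on Markdown headings first,
# then cap chunk length so oversized sections still get broken up.
# The regex and the 1000-character cap are guesses, not tuned values.
import re

def chunk_markdown(text: str, max_chars: int = 1000) -> list[str]:
    # Split before every heading line (#, ##, ###, ...).
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        # Only fall back to fixed-size slices inside oversized sections.
        for i in range(0, len(section), max_chars):
            chunks.append(section[i:i + max_chars])
    return chunks
```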

🧁 Closing Thoughts

I didn’t end up with a perfect second brain, but I did end up with a locally hosted, overly verbose assistant who talks like they read my notes once during a coffee break.

Ultimately, Vault Rummager will accurately tell me Taylor Swift's age, as long as there is only one note with just this as the file contents:

**Taylor Alison Swift** born December 13, 1989

*(screenshot: correct answer)*

The fallback logic is clearly still broken - it can't tell you what 2+2 equals.

*(screenshot: fallback broken demo)*

But I learned a ton about embeddings, vector search tradeoffs, prompt engineering, and Ollama integration. I also learned how much of an art it is to chunk data and pick appropriate threshold levels. Art has never really been my thing.


🔗 Repo

https://github.com/lindseyburnett/vault-rummager