LLM stacks \ 4.5.1 External data RAG - terrytaylorbonn/auxdrone GitHub Wiki

25.0717 Lab notes (Gdrive), Git

  • Agent calls RAG DB API ✅. Why does the agent decide to call RAG?
    • Depends on implementation! In simple_rag.py, it always calls RAG first
    • More sophisticated agents might:
      • Check if question seems factual first
      • Try RAG, then fallback to general knowledge
      • Let the model decide: "Do I need to search documents?"
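The decision logic above can be sketched as a tiny dispatcher. This is a minimal illustration, not code from simple_rag.py; the helper names (`looks_factual`, `search_docs`, `ask_model`) and the wh-word heuristic are all assumptions:

```python
def looks_factual(question: str) -> bool:
    # Crude stand-in heuristic: factual questions often open with a wh-word.
    wh_words = ("what", "who", "when", "where", "which", "how many")
    return question.lower().strip().startswith(wh_words)

def answer(question, search_docs, ask_model):
    # Try RAG first for factual-looking questions; fall back to the
    # model's general knowledge when retrieval finds nothing.
    if looks_factual(question):
        docs = search_docs(question)
        if docs:  # retrieval succeeded -> ground the answer in a document
            return ask_model(question, context=docs[0])
    return ask_model(question, context=None)  # general-knowledge fallback
```

A production agent might instead ask the model itself ("Do I need to search documents?") and branch on its reply; the shape of the dispatch stays the same.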
  • 1 Agent vectorizes prompt ✅
    • Agent uses embedding model (e.g., "nomic-embed-text") to convert prompt to vectors
    • Same embedding model was used to create the RAG database vectors
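To show why the same embedding model must be used on both sides, here is a toy bag-of-words embedder. The vocabulary and function are illustrative stand-ins; a real setup would call an embedding model such as "nomic-embed-text" via its API instead:

```python
from collections import Counter

# Toy vocabulary (assumption, for illustration only). A real embedding
# model learns a dense vector space, but the principle is identical.
VOCAB = ["drone", "battery", "camera", "motor", "flight"]

def embed(text: str) -> list[float]:
    # Stand-in for the embedding call: count vocabulary words.
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in VOCAB]
```

The same `embed()` has to be applied both when building the RAG database and when vectorizing the incoming prompt; otherwise the query vector and the stored document vectors live in different spaces and are not comparable.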
  • 2 Agent finds best match and retrieves ORIGINAL TEXT ✅
    • "Original text" is the actual readable text that was vectorized
    • The vectors are just for finding the text
    • For example: `return results["documents"][0]` ← this returns readable text
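The matching step above can be sketched with plain cosine similarity. This is a minimal nearest-neighbor search, not the vector DB's actual internals; the function names are assumptions:

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, doc_vecs, documents):
    # The vectors only pick the winner; what gets handed back is the
    # ORIGINAL readable text, not the vector itself.
    best = max(range(len(documents)),
               key=lambda i: cosine(query_vec, doc_vecs[i]))
    return documents[best]
```

A vector DB like Chroma does the same thing at scale (with indexing tricks), which is why `results["documents"][0]` comes back as plain text.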
  • 3 Agent gives the retrieved text to the model, along with the original human prompt.
    • For example:
      • `context = relevant_docs[0][:200]  # Retrieved text`
      • `prompt = f"Based on this: '{context}' Answer: {question}"  # Both context + original question`
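Wrapping those two lines in a function makes the prompt-assembly step concrete (the function name `build_prompt` is an illustrative assumption, not from simple_rag.py):

```python
def build_prompt(question: str, relevant_docs: list[str]) -> str:
    # Truncate the retrieved text, then splice it together with the
    # original human question, exactly as in the snippet above.
    context = relevant_docs[0][:200]  # retrieved text
    return f"Based on this: '{context}' Answer: {question}"
```

The model then sees both the grounding context and the untouched question in a single prompt, which is what lets it answer from the retrieved document instead of from its general training data.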