Introduction to RAG
Understanding RAG: An AI Framework for Enhanced Information Retrieval
(Image credit: Google DeepMind, Unsplash.com)
What is RAG?
RAG, or "Retrieval-Augmented Generation", is a technique that combines the power of large language models (LLMs) with external knowledge sources. By retrieving relevant information from a knowledge base, RAG enables LLMs to generate more accurate, informative, and contextually aware responses. It functions like an intelligent librarian, swiftly locating relevant information. RAG assists AI models—akin to advanced computers capable of comprehending and generating text—in obtaining the most accurate and pertinent facts from a vast knowledge repository.
(Image credit: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv:2005.11401 [cs.CL])
What are Large Language Models (LLMs)?
Large Language Models, or LLMs, are AI systems that can read, write, and understand human language. They're trained on extensive text datasets from books, websites, and various sources. Picture LLMs as incredibly intelligent robots capable of human-like communication. However, they may occasionally lack access to the most current information.
Why do we need RAG?
RAG enhances LLMs by linking them to an external knowledge base—imagine a colossal library of facts and data that LLMs can access on demand. By employing RAG, LLMs can retrieve the most accurate and up-to-date information, ensuring their responses are both correct and relevant.
How does RAG work?
When an LLM requires information, RAG extracts facts from its external knowledge base—similar to consulting a librarian for the latest book on a subject. RAG locates the appropriate information and feeds it to the LLM, which then uses this data to formulate improved responses. This process allows users to understand the LLM's reasoning, enhancing the AI's transparency and trustworthiness.
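To make the flow concrete, here is a minimal Python sketch of the retrieve-then-generate loop. Everything in it is a toy stand-in: the tiny knowledge base, the word-overlap scoring, and the call_llm placeholder are illustrative assumptions that a real system would replace with a vector index and an actual LLM.
# Toy sketch of the RAG loop: retrieve supporting facts, then generate.
KNOWLEDGE_BASE = [
    "The capital of France is Paris.",
    "The Eiffel Tower is located in Paris.",
    "Polar bears depend on sea ice to hunt seals.",
]

def retrieve(query, k=2):
    # Rank documents by simple word overlap with the query
    # (a stand-in for a real dense or sparse retriever).
    query_words = set(query.lower().split())
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def call_llm(prompt):
    # Placeholder for a real LLM call (an API or a local model).
    return "[LLM answer conditioned on]\n" + prompt

def answer(query):
    # Step 1: look up relevant facts in the knowledge base.
    context = "\n".join(retrieve(query))
    # Step 2: ground the generation step in the retrieved context.
    prompt = "Context:\n" + context + "\n\nQuestion: " + query + "\nAnswer:"
    return call_llm(prompt)

print(answer("Where is the Eiffel Tower?"))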
What does "grounding" mean?
Grounding involves anchoring AI responses to real-world facts. It ensures that the AI's output is based on factual information rather than conjecture. This process is crucial for building user trust, as it guarantees the reliability of the information provided.
Why is this important for users?
RAG enables users to gain insight into the LLMs' answer generation process. When posing a question, users can trace the information's origin—similar to reviewing a research paper's sources. This transparency reinforces users' confidence in the responses they receive.
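A common way to support both grounding and this kind of traceability is to number the retrieved passages inside the prompt and ask the model to cite them. The snippet below is only a sketch of that prompt construction; the example sources, identifiers, and instruction wording are assumptions, not a prescribed format.
# Sketch of a grounded, citable prompt; source texts and IDs are illustrative.
retrieved_sources = [
    {"id": "S1", "text": "The capital of France is Paris."},
    {"id": "S2", "text": "The Eiffel Tower is located in Paris."},
]

def build_grounded_prompt(question, sources):
    # Number each retrieved passage so the model (and the user)
    # can trace claims back to it.
    numbered = "\n".join("[{}] {}".format(s["id"], s["text"]) for s in sources)
    return (
        "Answer the question using ONLY the sources below, and cite the "
        "source id for each claim.\n\n"
        "Sources:\n" + numbered + "\n\nQuestion: " + question + "\nAnswer:"
    )

print(build_grounded_prompt("Where is the Eiffel Tower?", retrieved_sources))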
RAG Variants
- Dense Retrieval: Uses compact vector representations for documents and queries, allowing for quick and accurate similarity searches.
- Sparse Retrieval: Utilizes keyword matching and conventional search methods to find relevant information.
- Hybrid Retrieval: Merges dense and sparse retrieval techniques to harness the advantages of both approaches (see the toy sketch just after this list).
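The toy ranker below illustrates the hybrid idea: it adds a keyword-overlap score (standing in for a sparse method such as BM25) to a cosine similarity over crude character-frequency vectors (standing in for real dense embeddings). Both scoring functions and the 50/50 weighting are purely illustrative; production systems use trained encoders, proper sparse indexes, and normalized, tuned score combinations.
import math
from collections import Counter

DOCS = [
    "The capital of France is Paris.",
    "The Eiffel Tower is located in Paris.",
]

def sparse_score(query, doc):
    # Keyword-overlap count, a stand-in for BM25-style sparse retrieval.
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

def embed(text):
    # Crude character-frequency "embedding"; real systems use a trained encoder.
    counts = Counter(text.lower())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {ch: c / norm for ch, c in counts.items()}

def dense_score(query, doc):
    # Cosine similarity between the toy embeddings.
    q, d = embed(query), embed(doc)
    return sum(weight * d.get(ch, 0.0) for ch, weight in q.items())

def hybrid_rank(query, docs, alpha=0.5):
    # Weighted sum of dense and sparse evidence; alpha balances the two.
    return sorted(
        docs,
        key=lambda doc: alpha * dense_score(query, doc)
        + (1 - alpha) * sparse_score(query, doc),
        reverse=True,
    )

print(hybrid_rank("Where is the Eiffel Tower?", DOCS))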
Advantages of RAG
- Improved Factual Accuracy: RAG utilizes a vast repository of information, analogous to a library, to retrieve facts. This enables it to provide more accurate and relevant answers to questions. For instance, when asked about the capital of France, RAG can swiftly locate and report that it's Paris, rather than resorting to guesswork or making errors.
- Enhanced Contextual Understanding: RAG enhances machines' ability to grasp the context or background of a query. Consequently, it can deliver responses that align precisely with the intent of specific questions. For example, if queried about "bark," RAG can discern whether the inquiry pertains to a dog's vocalization or a tree's outer layer, based on the context provided.
- Reduced Hallucination: In AI lingo, "hallucination" refers to the generation of false information. RAG mitigates this issue by relying on verified facts from its knowledge base. Rather than fabricating information, it adheres to established and verified data. For example, when asked about a historical event, RAG will furnish details grounded in actual records instead of creating a fictional narrative.
- Ability to Handle Complex Queries: RAG excels at managing intricate questions that require information from diverse sources. This process resembles assembling a puzzle, where pieces from various origins are needed to complete the picture. For instance, when asked about the impact of climate change on polar bears, RAG can synthesize information from environmental studies, animal behavior research, and climate data to provide a comprehensive answer.
Disadvantages of RAG
- Dependency on Knowledge Base Quality: The knowledge base acts as a vast information repository for the computer to reference. If this repository contains inaccurate or poor-quality data, the computer's responses will be similarly flawed or unhelpful. For instance, if the knowledge base erroneously states that London is the capital of France instead of Paris, the computer will propagate this misinformation. Thus, the quality of the knowledge base directly correlates with the accuracy of the answers provided.
- Computational Overhead: RAG can be computationally intensive, demanding significant processing power and time. As the computer searches for and processes information, it consumes resources such as memory and processing capacity. This process is analogous to searching for a specific book in an enormous library—time-consuming and labor-intensive. When the computer must sift through numerous documents, it may experience reduced efficiency and slower performance. Consequently, this can increase operational costs due to the need for more powerful computing systems to manage the workload effectively.
- Potential for Bias: Bias in this context refers to unfair prejudice or favoritism. If the knowledge base contains biased information, the computer's generated answers will inevitably reflect these biases. For example, if the repository primarily includes sources that present only one perspective on a topic, the computer will provide responses limited to that viewpoint. This can result in misleading or unbalanced answers. A case in point: if a knowledge base predominantly features information praising a specific political party, the computer's responses might disproportionately favor that party, neglecting alternative perspectives.
Learning Resources
- Haystack: An open-source framework for building RAG applications.
- Hugging Face Transformers: A library for state-of-the-art natural language processing.
- LangChain: A framework for building LLM applications.
- Hugging Face RAG Tutorial
- What is retrieval-augmented generation, and what does it do for generative AI?
Jupyter Notebook Example
> [!NOTE]
> 📔 Read and execute the next Jupyter Notebook in Google Colab for this session.
Code Example: A Simple RAG Pipeline Demo
# A minimal RAG pipeline sketch using the Haystack 1.x (farm-haystack) API and
# Hugging Face Transformers; model names and parameters are illustrative.
from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import EmbeddingRetriever
from transformers import pipeline
# Create a FAISS-backed document store (384 = embedding size of all-MiniLM-L6-v2)
document_store = FAISSDocumentStore(embedding_dim=384, faiss_index_factory_str="Flat")
# Add documents to the store (Haystack expects a "content" field)
documents = [
    {"content": "The capital of France is Paris."},
    {"content": "The Eiffel Tower is located in Paris."}
]
document_store.write_documents(documents)
# Create a dense retriever and index the documents' embeddings
retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
)
document_store.update_embeddings(retriever)
# Create a generative LLM pipeline
generator = pipeline("text-generation", model="gpt2")
def generate_text(query):
    # Retrieve relevant documents
    docs = retriever.retrieve(query=query, top_k=2)
    # Generate text conditioned on the retrieved documents
    prompt = query + "\n\n" + "\n".join(doc.content for doc in docs)
    return generator(prompt, max_new_tokens=50, num_beams=4)[0]["generated_text"]
# Use the pipeline
print(generate_text("What is the Eiffel Tower?"))
This example demonstrates how to use Haystack to build a simple RAG pipeline: the retriever fetches relevant documents from the document store, and the LLM generates text conditioned on the retrieved information.
Created: 10/29/2024 (C. Lizárraga); Last update: 10/31/2024 (C. Lizárraga)
UArizona DataLab, Data Science Institute, University of Arizona, 2024.