
Advanced RAG

✏️ Page Contributors: Khoi Tran Dang

🕛 Creation date: 26/06/2024

📥 Last Update: 01/07/2024

RAG is continuously evolving, with many new methods emerging from both the developer and research communities. Naive RAG refers to the initial and most basic RAG methodology, which we discussed in Introduction to RAG. As a reminder, Naive RAG follows the core processes of indexing, retrieval, and generation; in the literature, it is also called the Retrieve-Read framework.
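
To make the Retrieve-Read flow concrete, below is a minimal sketch of a Naive RAG pipeline. This is an illustrative sketch, not any specific library's API: `embed`, `search`, and `complete` are hypothetical stand-ins for an embedding model, a vector-index lookup, and an LLM call.

```python
from typing import Callable, Sequence

def naive_rag(
    query: str,
    embed: Callable[[str], Sequence[float]],   # embedding model
    search: Callable[..., list[str]],          # vector-index lookup
    complete: Callable[[str], str],            # LLM call
    k: int = 5,
) -> str:
    """Minimal Retrieve-Read sketch: retrieve the top-k chunks,
    assemble the prompt, then generate."""
    # Retrieval: embed the query and fetch the k nearest chunks
    # from the pre-built index.
    chunks = search(embed(query), top_k=k)

    # Augmentation: concatenate the retrieved chunks into the prompt.
    context = "\n\n".join(chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

    # Generation: the LLM produces the grounded answer.
    return complete(prompt)
```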

Naive RAG limitations

Challenges in Retrieval - the retrieval process may simply not be good enough:

  • Low precision: retrieving irrelevant chunks can lead to hallucinations or "mid-air drop" issues.
  • Low recall: failing to retrieve all relevant chunks leads to incomplete responses. (Both metrics are made concrete in the sketch below.)
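
To make these two metrics concrete, here is a minimal sketch that scores a retrieved list of chunk IDs against a hand-labeled relevant set; the IDs in the usage example are purely illustrative.

```python
def precision_recall_at_k(retrieved_ids: list[str],
                          relevant_ids: set[str]) -> tuple[float, float]:
    """Precision@k: fraction of the k retrieved chunks that are relevant.
    Recall@k: fraction of all relevant chunks that were retrieved."""
    hits = sum(1 for cid in retrieved_ids if cid in relevant_ids)
    precision = hits / len(retrieved_ids) if retrieved_ids else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall

# Example: 2 of 4 retrieved chunks are relevant, out of 3 relevant overall.
print(precision_recall_at_k(["c1", "c2", "c3", "c4"], {"c1", "c3", "c9"}))
# -> (0.5, 0.6666666666666666)
```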

Challenges in Augmentation & Generation - once chunks are retrieved, how should the query and context be combined for better generation? (A prompt-assembly sketch addressing several of these issues follows the list.)

  • Poor integration of context into the prompt can lead to incoherent output
  • Redundancy and repetition in the retrieved context
  • How much weight to give to each retrieved passage
  • The effect of differing writing styles/tones on output consistency
  • Overly mimicking the retrieved context instead of summarizing it
  • Answers not relevant to the retrieved context
  • Potential toxicity and bias
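
As a minimal illustration of combining query and context, the sketch below deduplicates retrieved passages, orders them by retrieval score, trims to a rough context budget, and instructs the model to synthesize rather than copy. The `passages` structure (a list of `{"text", "score"}` dicts) is an assumption for the example, not a standard interface.

```python
def build_prompt(query: str, passages: list[dict], max_chars: int = 4000) -> str:
    """Assemble query + context while addressing two issues above:
    duplicates are dropped, and passages are ordered by retrieval
    score so the most relevant text comes first."""
    seen: set[str] = set()
    context = ""
    for p in sorted(passages, key=lambda p: p["score"], reverse=True):
        key = p["text"].strip().lower()
        if key in seen:                      # skip exact duplicates
            continue
        seen.add(key)
        if len(context) + len(p["text"]) > max_chars:
            break                            # rough context-size budget
        context += p["text"] + "\n\n"

    # Instruct the model to synthesize rather than copy verbatim.
    return (
        "Using only the context below, answer in your own words; "
        "do not copy the context verbatim.\n\n"
        f"Context:\n{context}Question: {query}\nAnswer:"
    )

# Example with two duplicate passages; only one survives deduplication.
print(build_prompt(
    "What is RAG?",
    [{"text": "RAG combines retrieval with generation.", "score": 0.9},
     {"text": "RAG combines retrieval with generation.", "score": 0.8}],
))
```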

How to improve Naive RAG - Advanced RAG?

Advanced RAG addresses Naive RAG limitations by introducing pre-retrieval, retrieval, and post-retrieval optimizations, as well as advanced augmentation and generation methods.

For a comprehensive survey comparing techniques in Naive RAG and Advanced RAG, see Retrieval-Augmented Generation for Large Language Models: A Survey (Gao et al., 2023).

Overview of different advanced RAG techniques

There are plenty of excellent blog posts covering these techniques with great illustrations.

A summary of these methods and concepts is presented in the following tables.

| Stage | Method Family | Objective | Methods |
| --- | --- | --- | --- |
| Pre-Retrieval | Data Preparation | Enhancing data granularity | Data Parsing<br>Data Cleaning/Transformation<br>Data Loading<br>Knowledge Graph (NebulaGraph; Edge et al., 2024) |
| Pre-Retrieval | Chunking Optimization | Breaking data into optimal chunks for downstream tasks | Five Levels of Chunking (YouTube video)<br>Other Chunking Strategies<br>Adjusting Chunk/Overlap Size (Theja, 2024)<br>Metadata Attachment |
| Pre-Retrieval | Embedding Optimization | Better data representation and alignment | Choosing Embedding Models (MTEB Leaderboard)<br>Fine-tuning Embedding Models |
| Pre-Retrieval | Indexing Structure | Creating efficient structures for data indexing and retrieval | Choosing Type of Indexing (LlamaIndex documentation)<br>Choosing Vector Database (Chroma, Weaviate, Milvus, PgVector) |
| Pre-Retrieval | Query Transformation | Modifying queries for improved retrieval performance | Rephrasing<br>HyDE (Gao et al., 2022)<br>Query2doc (Wang et al., 2023)<br>Step-Back Prompting (Zheng et al., 2023)<br>RQ-RAG (Chan et al., 2024) |
| Pre-Retrieval | Query Expansion | Expanding queries for better query understanding | Subqueries (Zhou et al., 2022)<br>Multi-queries (Raudaschl, 2023)<br>CoVe (Dhuliawala et al., 2023) |
| Pre-Retrieval | Query Routing | Routing to different data sources/pipelines based on query intent | Logical routers<br>Natural language routers |
| Retrieval | Advanced Retrieval Strategies | Implementing sophisticated methods to improve retrieval | Hybrid Search, Sentence Window, Auto-Merging, Hierarchical Index (each shown in the Advanced RAG illustration)<br>Document Summary Lookup (Document Summary Index) |
| Retrieval | Metadata Filtering | Filtering data based on metadata to enhance relevance | Metadata Filtering |
| Retrieval | FT Retriever | Utilizing fine-tuned models for specific retrieval tasks | Fine-tuned Retriever |
| Post-Retrieval | Reranking | Reordering retrieved items to prioritize relevance | Rule-based Reranking<br>Cross-Encoders<br>Multi-vector (Santhanam et al., 2021)<br>LLMRerank (see Reranking Types) |
| Post-Retrieval | Reordering | Adjusting the order of retrieved items within the context for better generation | Reverse<br>Long-Context Reordering (Liu et al., 2024) |
| Post-Retrieval | Information Compression | Summarizing information for efficiency and eliminating redundancies | LLMLingua (Jiang et al., 2023)<br>Selective Context (Li, 2023)<br>RECOMP (Xu et al., 2023)<br>LLM Critique |

Advanced RAG methods: pre-retrieval, retrieval, post-retrieval
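
Several retrieval-stage methods in the table above (Hybrid Search, Multi-queries) ultimately merge multiple ranked result lists into one. A common, simple way to do this is Reciprocal Rank Fusion (RRF); the sketch below assumes each ranking is a list of chunk IDs.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk IDs, e.g. one from keyword
    (BM25) search and one from dense retrieval for Hybrid Search, or
    one list per rewritten query for Multi-queries. k=60 is the
    constant from the original RRF paper; larger k flattens the
    contribution of rank differences."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a keyword ranking with a vector-similarity ranking.
print(reciprocal_rank_fusion([["a", "b", "c"], ["c", "a", "d"]]))
# -> ['a', 'c', 'b', 'd']
```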

| Stage | Method Family | Objective | Methods |
| --- | --- | --- | --- |
| Augmentation | Direct Concatenation | Concatenating retrieved chunks as the context | Concatenation |
| Augmentation | Iterative Retrieval | Continuously retrieve-generate N times for refinement | ITER-RETGEN (Shao et al., 2023) |
| Augmentation | Adaptive Retrieval | Dynamically determine when to retrieve and when to return the response | FLARE (Jiang et al., 2023)<br>Self-RAG (Asai et al., 2023)<br>Corrective RAG (Yan et al., 2024) |
| Augmentation | Recursive Retrieval | Employing recursive techniques to deepen the retrieval process | ToC (Kim et al., 2023)<br>IRCoT (Trivedi et al., 2022) |
| Generation | Prompting | Guiding the language model to produce desired outputs | Standard Prompting<br>Few-Shot Prompting<br>XoT Prompting (Ding et al., 2023) |
| Generation | Response Synthesis | Combining retrieved information into coherent responses | Compact, Refine, Tree-Summarize (LlamaIndex documentation) |
| Generation | Fine-Tuning LLM | Improving model performance through targeted adjustments | Supervised FT<br>Reinforcement Learning<br>Dual FT |

Advanced RAG methods: Augmentation and Generation
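
As one example from the augmentation table, here is a simplified sketch of an ITER-RETGEN-style loop (after Shao et al., 2023): each round's draft answer is appended to the retrieval query so the next round can surface evidence the previous pass missed. `retrieve` and `complete` are hypothetical stand-ins for a retriever and an LLM call.

```python
from typing import Callable

def iter_retgen(
    query: str,
    retrieve: Callable[[str], list[str]],   # hypothetical retriever
    complete: Callable[[str], str],         # hypothetical LLM call
    n_iters: int = 3,
) -> str:
    """Simplified ITER-RETGEN-style loop: each round, the previous
    draft answer is appended to the retrieval query so retrieval can
    surface evidence the earlier pass missed."""
    answer = ""
    for _ in range(n_iters):
        # Retrieval query = original question + current draft answer.
        chunks = retrieve((query + " " + answer).strip())
        context = "\n\n".join(chunks)
        answer = complete(
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
        )
    return answer
```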

It is worth noting that some methods in the two tables above can span multiple stages, but each is placed in the category that matches its primary objective.

Additionally, there are techniques not included in the tables that provide overarching safety, performance, and reliability features. These techniques involve complex multi-stage processes or are broad frameworks rather than discrete methods:

  • Agentic RAG: Uses LLM-powered knowledge workers, or agents, to manage and refine the retrieval-augmented generation process. These agents perform tasks such as retrieving relevant information, optimizing retrieval strategies, and interacting with external APIs. They dynamically ingest, process, and modify data, using reasoning loops to decide the sequence and parameters for tool usage, ensuring better overall performance and adaptability.
  • Guardrails (e.g., NeMo-Guardrails NeMo Guardrails, Llama Guard Llama Guard): Implement safety measures to prevent harmful or unintended outputs, enhancing the reliability and safety of AI systems. These guardrails control the output of language models by avoiding sensitive topics, maintaining specific dialogue styles, categorizing and filtering content based on predefined criteria, and ensuring compliance with safety regulations.
  • Citations: Various citation methods (inline, per sentence, per response, etc.) to indicate the source of supported claims within the generated text.
  • Conversational memory: Various strategies to help the chatbot retain context from previous conversations, despite limitations in the context window or the LLM's capacity to handle large volumes of text. This can involve using short or deep memories, storing entire conversations or summarized versions, and extracting key entities (a minimal sketch follows this list).
  • Interactive Retrieval: Implementing user feedback loops where the system iteratively refines its search based on user interactions.
  • Monitoring: Continuously evaluating the system's performance, including the factual accuracy of answers, to ensure reliability and trustworthiness.
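
As an example of the conversational-memory strategies above, here is a minimal sketch of a summarizing memory buffer: the raw transcript is kept until it exceeds a character budget, then compressed by the LLM into a summary carried into the next turn. `complete` is a hypothetical LLM call.

```python
from typing import Callable

def update_memory(
    memory: str,
    user_msg: str,
    bot_msg: str,
    complete: Callable[[str], str],   # hypothetical LLM call
    max_chars: int = 2000,
) -> str:
    """Summarizing conversation memory: keep the raw transcript until
    it exceeds a character budget, then compress it into a summary
    that is carried into the next turn."""
    memory += f"\nUser: {user_msg}\nAssistant: {bot_msg}"
    if len(memory) > max_chars:
        # Compress, preserving key entities, facts, and decisions.
        memory = complete(
            "Summarize this conversation, preserving key entities, "
            f"facts, and decisions:\n{memory}"
        )
    return memory
```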

Further reading

← Previous: Introduction to RAG

Next: →
