Advanced RAG - trankhoidang/RAG-wiki GitHub Wiki
✏️ Page Contributors: Khoi Tran Dang
🕛 Creation date: 26/06/2024
📥 Last Update: 01/07/2024
RAG is continuously evolving, and many new methods have emerged from both the development and research communities. Naive RAG refers to the initial and most basic RAG methodology, which we discussed in Introduction to RAG. As a reminder, Naive RAG follows the core process of indexing, retrieval, and generation. In the literature, it is also called the Retrieve-Read framework.
Challenges in Retrieval (when the retrieval process is not good enough):
- Low precision: retrieving irrelevant chunks can lead to hallucinations or "mid-air drop" responses.
- Low recall: failing to retrieve all relevant chunks leads to incomplete responses.
Challenges in Augmentation & Generation (once chunks are retrieved, how should query and context be combined for better generation?):
- Poor integration of the context into the prompt can lead to incoherent output
- Redundancy and repetition in the retrieved context
- How much weight to give each retrieved passage
- The effect of differing writing styles/tones on output consistency
- Overly mimicking the retrieved context instead of summarizing it
- Answers not relevant to the retrieved context
- Potential toxicity and bias
Advanced RAG addresses Naive RAG limitations by introducing pre-retrieval, retrieval, and post-retrieval optimizations, as well as advanced augmentation and generation methods.
A comprehensive survey comparing techniques in Naive RAG and Advanced RAG: Retrieval-Augmented Generation for Large Language Models: A Survey | 2023 | Gao et al.
There are plenty of excellent blog posts covering these techniques, with great illustrations:
- How to improve RAG results in your LLM apps: from basics to advanced | by Guodong (Troy) Zhao | Bootcamp (uxdesign.cc)
- Advanced RAG Techniques: an Illustrated Overview | by IVAN ILIN | Towards AI
A summary of these methods and concepts is presented in the following tables.
Stage | Method Family | Objective | Methods |
---|---|---|---|
Pre-Retrieval | Data Preparation | Enhancing data granularity | Data Parsing; Data Cleaning/Transformation; Data Loading; Knowledge Graph (NebulaGraph; Edge et al., 2024) |
Pre-Retrieval | Chunking Optimization | Breaking data into optimal chunks for downstream tasks | Five Levels of Chunking (YouTube video); Other Chunking Strategies; Adjusting Chunk/Overlap Size (Theja, 2024); Metadata Attachment |
Pre-Retrieval | Embedding Optimization | Better data representation and alignment | Choosing Embedding Models (MTEB Leaderboard); Fine-tuning Embedding Models |
Pre-Retrieval | Indexing Structure | Creating efficient structures for data indexing and retrieval | Choosing Type of Indexing (LlamaIndex documentation); Choosing Vector Database (Chroma; Weaviate; Milvus; PgVector) |
Pre-Retrieval | Query Transformation | Modifying queries for improved retrieval performance | Rephrasing; HyDE (Gao et al., 2022); Query2doc (Wang et al., 2023); Step-Back Prompting (Zheng et al., 2023); RQ-RAG (Chan et al., 2024) |
Pre-Retrieval | Query Expansion | Expanding queries for better query understanding | Subqueries (Zhou et al., 2022); Multi-queries (Raudaschl, 2023); CoVe (Dhuliawala et al., 2023) |
Pre-Retrieval | Query Routing | Routing to different data sources/pipelines based on query intent | Logical routers; Natural language routers |
Retrieval | Advanced Retrieval Strategies | Implementing sophisticated methods to improve retrieval | Hybrid Search; Sentence Window; Auto Merging; Hierarchical Index; Document Summaries Lookup (Document Summary Index) |
Retrieval | Metadata Filtering | Filtering data based on metadata to enhance relevance | Metadata Filtering |
Retrieval | FT Retriever | Utilizing fine-tuned models for specific retrieval tasks | Fine-tuned Retriever |
Post-Retrieval | Reranking | Reordering retrieved items to prioritize relevance | Rule-based Reranking; Cross-Encoders; Multi-vector (Santhanam et al., 2021); LLMRerank |
Post-Retrieval | Reordering | Adjusting the order of retrieved items within the context for better generation | Reverse; LongContext Reordering (Liu et al., 2024) |
Post-Retrieval | Information Compression | Summarizing information for efficiency and eliminating redundancies | LLMLingua (Jiang et al., 2023); Selective Context (Li, 2023); RECOMP (Xu et al., 2023); LLM Critique |
Advanced RAG methods: pre-retrieval, retrieval, post-retrieval
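Several of the retrieval-stage entries above (hybrid search, reranking) ultimately come down to merging or reordering ranked lists. As a minimal, self-contained sketch of hybrid search, here is reciprocal rank fusion (RRF) combining the outputs of a hypothetical keyword retriever and a hypothetical vector retriever; the document IDs are illustrative, and `k=60` is the conventional damping constant (a real system would fuse BM25 and embedding-search results):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists in which it
    appears (rank is 1-based); k dampens the influence of top positions,
    so documents appearing in several lists outrank one-list outliers.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort document IDs by fused score, highest first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs of a keyword (BM25) and a vector retriever:
keyword_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc4", "doc3"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
# doc1 and doc3 rise to the top because both retrievers returned them.
```

The same score-then-sort shape also underlies simple rule-based reranking: only the scoring function changes.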
Stage | Method Family | Objective | Methods |
---|---|---|---|
Augmentation | Direct Concatenation | Concatenating retrieved chunks as the context | Concatenation |
Augmentation | Iterative Retrieval | Continuously retrieve-generate N times for refinement | ITER-RETGEN (Shao et al., 2023) |
Augmentation | Adaptive Retrieval | Dynamically determining when to retrieve and when to return the response | FLARE (Jiang et al., 2023); Self-RAG (Asai et al., 2023); Corrective RAG (Yan et al., 2024) |
Augmentation | Recursive Retrieval | Employing recursive techniques to deepen the retrieval process | ToC (Kim et al., 2023); IRCoT (Trivedi et al., 2022) |
Generation | Prompting | Guiding the language model to produce desired outputs | Standard Prompting; Few-Shot Prompting; XoT Prompting (Ding et al., 2023) |
Generation | Response Synthesis | Combining retrieved information into coherent responses | Compact; Refine; Tree-Summarize (LlamaIndex documentation) |
Generation | Fine-Tuning LLM | Improving model performance through targeted adjustments | Supervised FT; Reinforcement Learning; Dual FT |
Advanced RAG methods: Augmentation and Generation
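To make the iterative-retrieval idea concrete, here is a minimal sketch in the spirit of ITER-RETGEN: each draft answer is fed back into the retrieval query for the next round, so facts surfaced by the first draft can pull in context the original query missed. The `retrieve` and `generate` functions below are toy stand-ins (a keyword lookup over a two-entry corpus and a string join), not a real retriever or LLM:

```python
def iter_retgen(query, retrieve, generate, n_iters=2):
    """Iterative retrieve-generate: each draft answer is appended to
    the query so the next retrieval round can find related context."""
    answer = ""
    for _ in range(n_iters):
        context = retrieve(f"{query} {answer}".strip())
        answer = generate(query, context)
    return answer

# Toy stand-ins for a real retriever and LLM (illustrative only):
corpus = {
    "paris": "Paris is the capital of France.",
    "france": "France is in Western Europe.",
}

def retrieve(q):
    # Return every corpus entry whose key appears in the query text.
    return [text for key, text in corpus.items() if key in q.lower()]

def generate(query, context):
    # A real LLM would synthesize; here we just concatenate the context.
    return " ".join(context) if context else "I don't know."

result = iter_retgen("Where is Paris?", retrieve, generate)
```

On the first pass only the Paris entry matches; the draft answer mentions France, so the second retrieval round also pulls in the France entry, which is exactly the refinement behavior the method family describes.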
It is worth noting that some methods in the two tables above (Pre-Retrieval/Retrieval/Post-Retrieval, and Augmentation/Generation) can span multiple stages; they are placed in the category matching their primary objective.
Additionally, there are techniques not included in the tables that provide overarching safety, performance, and reliability features. These techniques involve complex multi-stage processes or are broad frameworks rather than discrete methods:
- Agentic RAG: Uses LLM-powered knowledge workers, or agents, to manage and refine the retrieval-augmented generation process. These agents perform tasks such as retrieving relevant information, optimizing retrieval strategies, and interacting with external APIs. They dynamically ingest, process, and modify data, using reasoning loops to decide the sequence and parameters for tool usage, ensuring better overall performance and adaptability.
- Guardrails (e.g., NeMo Guardrails, Llama Guard): Implement safety measures to prevent harmful or unintended outputs, enhancing the reliability and safety of AI systems. These guardrails control the output of language models by avoiding sensitive topics, maintaining specific dialogue styles, categorizing and filtering content based on predefined criteria, and ensuring compliance with safety regulations.
- Citations: Various citation methods (inline, per sentence, per response, etc.) to indicate the sources supporting claims within the generated text.
- Conversational memory: Various strategies to help the chatbot retain context from previous conversations, despite limitations in the context window or the LLM's capacity to handle large volumes of text. This can involve using short or deep memories, storing entire conversations or summarized versions, and extracting key entities.
- Interactive Retrieval: Implementing user feedback loops where the system iteratively refines its search based on user interactions.
- Monitoring: Continuously evaluating the system's performance, including the factual accuracy of answers, to ensure reliability and trustworthiness.
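As a minimal illustration of one conversational-memory strategy from the list above (a short verbatim window plus a summarized long-term store), here is a sketch in which naive truncation stands in for an LLM-written summary; the class name, window size, and truncation length are all illustrative:

```python
class ConversationMemory:
    """Keep the last `window` turns verbatim; older turns are collapsed
    into a running summary. A real system would ask an LLM to summarize
    evicted turns; here truncation stands in for that step."""

    def __init__(self, window=4):
        self.window = window
        self.turns = []      # recent (role, text) pairs, kept verbatim
        self.summary = ""    # condensed record of evicted turns

    def add(self, role, text):
        self.turns.append((role, text))
        # Evict oldest turns beyond the window into the summary.
        while len(self.turns) > self.window:
            old_role, old_text = self.turns.pop(0)
            self.summary += f"{old_role}: {old_text[:40]} / "

    def context(self):
        """Render the memory as prompt context for the next LLM call."""
        parts = []
        if self.summary:
            parts.append(f"[summary] {self.summary.strip(' /')}")
        parts += [f"{role}: {text}" for role, text in self.turns]
        return "\n".join(parts)

memory = ConversationMemory(window=4)
for i in range(6):
    memory.add("user", f"message {i}")
# The two oldest turns are now summarized; four remain verbatim.
```

Storing entire conversations, summarized versions, or extracted key entities are variations on the same eviction-and-condense pattern.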
← Previous: Introduction to RAG