Advanced RAG - trankhoidang/RAG-wiki GitHub Wiki
✏️ Page Contributors: Khoi Tran Dang
🕛 Creation date: 26/06/2024
📥 Last Update: 01/07/2024
RAG is continuously evolving, and many new methods have emerged from both the development and research communities. Naive RAG refers to the initial and most basic RAG methodology, which we discussed in Introduction to RAG. As a reminder, Naive RAG follows the core process of indexing, retrieval, and generation. In the literature, it is also called the Retrieve-Read framework.
Challenges in Retrieval (when the retrieval process is not good enough):
- Low precision: retrieving irrelevant chunks can lead to hallucinations or "mid-air drop" responses.
- Low recall: failing to retrieve all relevant chunks leads to incomplete responses.
Challenges in Augmentation & Generation (once chunks are retrieved, how should query and context be combined for better generation?):
- Poor integration of the context into the prompt can lead to incoherent output
- Redundancy and repetition in the retrieved context
- How much weight to give each retrieved passage
- The effect of differing writing styles/tones on output consistency
- Overly mimicking the retrieved context instead of summarizing it
- Answers not relevant to the retrieved context
- Potential toxicity and bias
Advanced RAG addresses Naive RAG limitations by introducing pre-retrieval, retrieval, and post-retrieval optimizations, as well as advanced augmentation and generation methods.
A comprehensive survey comparing techniques in Naive RAG and Advanced RAG: Retrieval-Augmented Generation for Large Language Models: A Survey | 2023 | Gao et al.
There are plenty of excellent blog posts covering these techniques, with great illustrations:
- How to improve RAG results in your LLM apps: from basics to advanced | by Guodong (Troy) Zhao | Bootcamp (uxdesign.cc)
- Advanced RAG Techniques: an Illustrated Overview | by IVAN ILIN | Towards AI
A summary of these methods and concepts is presented in the following tables.
Stage | Method Family | Objective | Methods |
---|---|---|---|
Pre-Retrieval | Data Preparation | Enhancing data granularity | Data Parsing; Data Cleaning/Transformation; Data Loading; Knowledge Graph (NebulaGraph; Edge et al., 2024) |
Pre-Retrieval | Chunking Optimization | Breaking data into optimal chunks for downstream tasks | Five Levels of Chunking (YouTube video); Other Chunking Strategies; Adjusting Chunk/Overlap Size (Theja, 2024); Metadata Attachment |
Pre-Retrieval | Embedding Optimization | Better data representation and alignment | Choosing Embedding Models (MTEB Leaderboard); Fine-tuning Embedding Models |
Pre-Retrieval | Indexing Structure | Creating efficient structures for data indexing and retrieval | Choosing Type of Indexing (LlamaIndex documentation); Choosing Vector Database (Chroma; Weaviate; Milvus; PgVector) |
Pre-Retrieval | Query Transformation | Modifying queries for improved retrieval performance | Rephrasing; HyDE (Gao et al., 2022); Query2doc (Wang et al., 2023); Step-Back Prompting (Zheng et al., 2023); RQ-RAG (Chan et al., 2024) |
Pre-Retrieval | Query Expansion | Expanding queries for better query understanding | Subqueries (Zhou et al., 2022); Multi-queries (Raudaschl, 2023); CoVe (Dhuliawala et al., 2023) |
Pre-Retrieval | Query Routing | Routing to different data sources/pipelines based on query intent | Logical routers; Natural language routers |
Retrieval | Advanced Retrieval Strategies | Implementing sophisticated methods to improve retrieval | Hybrid Search; Sentence Window; Auto Merging; Hierarchical Index; Document Summaries Lookup (Document Summary Index) |
Retrieval | Metadata Filtering | Filtering data based on metadata to enhance relevance | Metadata Filtering |
Retrieval | FT Retriever | Utilizing fine-tuned models for specific retrieval tasks | Fine-tuned Retriever |
Post-Retrieval | Reranking | Reordering retrieved items to prioritize relevance | Rule-based Reranking; Cross-Encoders; Multi-vector (Santhanam et al., 2021); LLMRerank |
Post-Retrieval | Reordering | Adjusting the order of retrieved items within the context for better generation | Reverse; LongContext Reordering (Liu et al., 2024) |
Post-Retrieval | Information Compression | Summarizing information for efficiency and eliminating redundancies | LLMLingua (Jiang et al., 2023); Selective Context (Li, 2023); RECOMP (Xu et al., 2023); LLM Critique |
Advanced RAG methods: pre-retrieval, retrieval, post-retrieval
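Several of the retrieval-stage entries above (hybrid search, reranking) ultimately come down to merging or reordering ranked lists. As a minimal, self-contained sketch of hybrid search, here is reciprocal rank fusion (RRF) combining the outputs of a hypothetical keyword retriever and a hypothetical vector retriever; the document IDs are illustrative, and `k=60` is the conventional damping constant (a real system would fuse BM25 and embedding-search results):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists in which it
    appears (rank is 1-based); k dampens the influence of top positions,
    so documents appearing in several lists outrank one-list outliers.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort document IDs by fused score, highest first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs of a keyword (BM25) and a vector retriever:
keyword_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc4", "doc3"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
# doc1 and doc3 rise to the top because both retrievers returned them.
```

The same score-then-sort shape also underlies simple rule-based reranking: only the scoring function changes.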
Stage | Method Family | Objective | Methods |
---|---|---|---|
Augmentation | Direct Concatenation | Concatenating retrieved chunks as the context | Concatenation |
Augmentation | Iterative Retrieval | Continuously retrieve-generate N times for refinement | ITER-RETGEN (Shao et al., 2023) |
Augmentation | Adaptive Retrieval | Dynamically determining when to retrieve and when to return the response | FLARE (Jiang et al., 2023); Self-RAG (Asai et al., 2023); Corrective RAG (Yan et al., 2024) |
Augmentation | Recursive Retrieval | Employing recursive techniques to deepen the retrieval process | ToC (Kim et al., 2023); IRCoT (Trivedi et al., 2022) |
Generation | Prompting | Guiding the language model to produce desired outputs | Standard Prompting; Few-Shot Prompting; XoT Prompting (Ding et al., 2023) |
Generation | Response Synthesis | Combining retrieved information into coherent responses | Compact; Refine; Tree-Summarize (LlamaIndex documentation) |
Generation | Fine-Tuning LLM | Improving model performance through targeted adjustments | Supervised FT; Reinforcement Learning; Dual FT |
Advanced RAG methods: Augmentation and Generation
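To make the iterative-retrieval idea concrete, here is a minimal sketch in the spirit of ITER-RETGEN: each draft answer is fed back into the retrieval query for the next round, so facts surfaced by the first draft can pull in context the original query missed. The `retrieve` and `generate` functions below are toy stand-ins (a keyword lookup over a two-entry corpus and a string join), not a real retriever or LLM:

```python
def iter_retgen(query, retrieve, generate, n_iters=2):
    """Iterative retrieve-generate: each draft answer is appended to
    the query so the next retrieval round can find related context."""
    answer = ""
    for _ in range(n_iters):
        context = retrieve(f"{query} {answer}".strip())
        answer = generate(query, context)
    return answer

# Toy stand-ins for a real retriever and LLM (illustrative only):
corpus = {
    "paris": "Paris is the capital of France.",
    "france": "France is in Western Europe.",
}

def retrieve(q):
    # Return every corpus entry whose key appears in the query text.
    return [text for key, text in corpus.items() if key in q.lower()]

def generate(query, context):
    # A real LLM would synthesize; here we just concatenate the context.
    return " ".join(context) if context else "I don't know."

result = iter_retgen("Where is Paris?", retrieve, generate)
```

On the first pass only the Paris entry matches; the draft answer mentions France, so the second retrieval round also pulls in the France entry, which is exactly the refinement behavior the method family describes.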
It is worth noting that some methods in the two tables above (Pre-Retrieval/Retrieval/Post-Retrieval, and Augmentation/Generation) can span multiple stages; they are placed in the category matching their primary objective.
Additionally, there are techniques not included in the tables that provide overarching safety, performance, and reliability features. These techniques involve complex multi-stage processes or are broad frameworks rather than discrete methods:
- Agentic RAG: Uses LLM-powered knowledge workers, or agents, to manage and refine the retrieval-augmented generation process. These agents perform tasks such as retrieving relevant information, optimizing retrieval strategies, and interacting with external APIs. They dynamically ingest, process, and modify data, using reasoning loops to decide the sequence and parameters for tool usage, ensuring better overall performance and adaptability.
- Guardrails (e.g., NeMo Guardrails, Llama Guard): Implement safety measures to prevent harmful or unintended outputs, enhancing the reliability and safety of AI systems. These guardrails control the output of language models by avoiding sensitive topics, maintaining specific dialogue styles, categorizing and filtering content based on predefined criteria, and ensuring compliance with safety regulations.
- Citations: Various citation methods (inline, per sentence, per response, etc.) to indicate the sources supporting claims within the generated text.
- Conversational memory: Various strategies to help the chatbot retain context from previous conversations, despite limitations in the context window or the LLM's capacity to handle large volumes of text. This can involve using short or deep memories, storing entire conversations or summarized versions, and extracting key entities.
- Interactive Retrieval: Implementing user feedback loops where the system iteratively refines its search based on user interactions.
- Monitoring: Continuously evaluating the system's performance, including the factual accuracy of answers, to ensure reliability and trustworthiness.
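As a minimal illustration of one conversational-memory strategy from the list above (a short verbatim window plus a summarized long-term store), here is a sketch in which naive truncation stands in for an LLM-written summary; the class name, window size, and truncation length are all illustrative:

```python
class ConversationMemory:
    """Keep the last `window` turns verbatim; older turns are collapsed
    into a running summary. A real system would ask an LLM to summarize
    evicted turns; here truncation stands in for that step."""

    def __init__(self, window=4):
        self.window = window
        self.turns = []      # recent (role, text) pairs, kept verbatim
        self.summary = ""    # condensed record of evicted turns

    def add(self, role, text):
        self.turns.append((role, text))
        # Evict oldest turns beyond the window into the summary.
        while len(self.turns) > self.window:
            old_role, old_text = self.turns.pop(0)
            self.summary += f"{old_role}: {old_text[:40]} / "

    def context(self):
        """Render the memory as prompt context for the next LLM call."""
        parts = []
        if self.summary:
            parts.append(f"[summary] {self.summary.strip(' /')}")
        parts += [f"{role}: {text}" for role, text in self.turns]
        return "\n".join(parts)

memory = ConversationMemory(window=4)
for i in range(6):
    memory.add("user", f"message {i}")
# The two oldest turns are now summarized; four remain verbatim.
```

Storing entire conversations, summarized versions, or extracted key entities are variations on the same eviction-and-condense pattern.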
← Previous: Introduction to RAG