Introducing RAG
✏️ Page Contributors: Khoi Tran Dang
🕛 Creation date: 19/06/2024
📥 Last Update: 05/07/2024
Retrieval-Augmented Generation (RAG) is an innovative method in Natural Language Processing that combines information retrieval with generative models. Fundamentally, RAG addresses the limitations of Large Language Models (LLMs) by leveraging knowledge from external databases. This approach improves accuracy, boosts credibility and reliability by providing reference sources, and allows information to be updated continuously.
The term RAG was first introduced in *Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks* (Lewis et al., 2020):
A concise survey of RAG for LLMs can be found here:
A blog on RAG applications in businesses:
With the rise of ChatGPT and similar technologies, Large Language Models (LLMs) have gained significant attention in recent years. However, LLMs face several major challenges:
- Hallucinations - LLMs may generate confident but false responses, leading to misinformation.
- Outdated information - The knowledge of an LLM is fixed at the date of its last training.
- Domain-specific information or private information - During training, LLMs prioritize general knowledge, have limited access to sparse data from niche domains, and cannot access private documents.
- Limited Traceability/Verification - LLMs are often considered "black-box" models due to their lack of transparency in the decision-making process.
- Long-Tail Distribution of Data - Rare or less common information in the training data is not well represented.
By leveraging fast and efficient access to external knowledge databases, RAG is a promising solution to:
- Adapt LLMs to frequently updated knowledge and proprietary databases.
- Facilitate verification and traceability of LLM outputs (by providing citations and decision provenance).
- Reduce LLM hallucinations and improve accuracy (by extracting and supplying the LLM with concise, correct information).
To enhance the performance of LLMs, key approaches include:
- Prompt engineering
- Fine-tuning
- Retrieval-Augmented Generation (RAG).
Prompt engineering is typically the first step to improve LLM outputs. By providing clear instructions, such as breaking complex tasks into simpler ones, and by adding few-shot examples, we give the LLM an initial guide on how it should act.
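For illustration, a few-shot prompt might look like the sketch below; the sentiment-classification task, the example reviews, and the `user_review` placeholder are all hypothetical:

```python
# A hypothetical few-shot prompt: two labelled examples guide the model
# before it sees the actual input to classify.
few_shot_prompt = """\
Classify the sentiment of the review as Positive or Negative.

Review: "The battery lasts all day." Sentiment: Positive
Review: "The screen cracked after a week." Sentiment: Negative
Review: "{user_review}" Sentiment:"""
```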
RAG, at its core, merges information retrieval with LLM prompting, providing a textbook (short-term memory) for the model to search for tailored answers. In contrast, fine-tuning resembles a student gradually absorbing knowledge to form long-term knowledge, making it more effective for mimicking specific structures, styles, or formats.
While fine-tuning can be a solution for adapting LLMs to domain-specific or proprietary data, it is costly, lacks interpretability, and is inflexible to frequent knowledge updates. In such cases, RAG can be a solution; however, it is not without its flaws...
RAG and fine-tuning are not always independent and can be complementary. A quick overview of when to use each approach, along with their pros and cons, can be found here:

Below is a simple application of RAG to question answering, leveraging up-to-date information, compared with plain LLM generation limited by its knowledge cut-off.
A simple workflow of LLM:

- User inputs a question.
- Generation: the LLM generates an answer based on the question.

Example of what the LLM sees (prompt):

> Given the question {question}, provide an answer. Answer:
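In code, this workflow is a single call to the model. The sketch below is only illustrative and assumes a placeholder `call_llm` function standing in for whatever chat/completion API is actually used:

```python
# Minimal sketch of the LLM-only workflow; `call_llm` is a placeholder
# for any LLM provider's chat/completion API.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("Plug in an LLM provider here.")


def answer_without_rag(question: str) -> str:
    # The model answers from its internal (possibly outdated) knowledge only.
    prompt = f"Given the question {question}, provide an answer. Answer:"
    return call_llm(prompt)
```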
A simple workflow of LLM+RAG:

- User inputs a question.
- Retrieve: chunks of documents relevant to the question are retrieved from the external database.
- Augment: these chunks serve as context to augment the initial question.
- Generation: the LLM generates the answer based on the question AND the context.

Example of what the LLM sees (prompt):

> Context information is below. {context}. Given the information and not prior knowledge, answer the question {question}. Answer:
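A corresponding sketch of the retrieve-augment-generate loop is shown below. The `retrieve_chunks` and `call_llm` functions are placeholders (assumptions, not a specific library): any retriever over the external database, such as a vector store, and any LLM API can stand in for them:

```python
# Minimal sketch of the LLM+RAG workflow. `retrieve_chunks` and `call_llm`
# are placeholders for any retriever (e.g. a vector database) and any LLM API.
def retrieve_chunks(question: str, top_k: int = 3) -> list[str]:
    raise NotImplementedError("Plug in a retriever over the external database here.")


def call_llm(prompt: str) -> str:
    raise NotImplementedError("Plug in an LLM provider here.")


def answer_with_rag(question: str) -> str:
    # 1. Retrieve: fetch chunks relevant to the question from the external database.
    chunks = retrieve_chunks(question)
    # 2. Augment: join the retrieved chunks into a context block.
    context = "\n\n".join(chunks)
    # 3. Generate: the LLM answers from the question AND the supplied context.
    prompt = (
        f"Context information is below. {context}. "
        f"Given the information and not prior knowledge, "
        f"answer the question {question}. Answer:"
    )
    return call_llm(prompt)
```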