Cache‐augmented generation - chunhualiao/public-docs GitHub Wiki

Cache-augmented generation (CAG) is a technique in natural language processing in which a model draws on a precomputed cache of knowledge or responses instead of producing every output from scratch. The cache stores frequently accessed information, such as key-value pairs that map queries to their answers or embeddings, so the model can look up relevant data quickly. This improves efficiency, reduces computational cost, and can improve response quality when the cached entries are high-quality and relevant to the query.
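The lookup-before-generate idea can be sketched as follows. This is a minimal illustration, assuming an exact-match cache keyed on a normalized query string; `AnswerCache`, `normalize`, and `fake_generate` are hypothetical names, and `fake_generate` merely stands in for a real model call.

```python
from typing import Callable, Dict


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so trivially different queries share a key."""
    return " ".join(query.lower().split())


class AnswerCache:
    """Hypothetical query -> answer cache placed in front of a generator."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def answer(self, query: str) -> str:
        key = normalize(query)
        if key in self._store:
            # Cache hit: reuse the stored answer, no generation needed.
            self.hits += 1
            return self._store[key]
        # Cache miss: generate once, then store for future queries.
        self.misses += 1
        result = self._generate(query)
        self._store[key] = result
        return result


def fake_generate(query: str) -> str:
    # Stand-in for an expensive model call (assumption for this sketch).
    return f"generated answer for: {query}"


cache = AnswerCache(fake_generate)
first = cache.answer("What is CAG?")
second = cache.answer("  what is   cag? ")  # same key after normalization
print(cache.hits, cache.misses)
```

Exact-match keys keep the sketch simple; a real system might instead match on embeddings so that paraphrased queries also hit the cache.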

Retrieval-augmented generation (RAG), on the other hand, has the model retrieve relevant documents or data from an external knowledge base (e.g., a database or corpus) at generation time so that its responses are grounded in that context. A RAG system typically pairs a retriever, which fetches information based on the input query, with a generator, which uses the fetched information to produce the final output.
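The retriever and generator roles described above can be sketched as follows. This is a toy illustration, not a production pipeline: the `CORPUS` contents, the word-overlap scoring in `retrieve`, and the `generate` stand-in (which only assembles the prompt a real model would receive) are all assumptions made for the example.

```python
from typing import List, Set

# Tiny in-memory "knowledge base" (hypothetical contents).
CORPUS = [
    "CAG reuses a precomputed cache of answers.",
    "RAG retrieves documents from an external knowledge base.",
    "Bananas are rich in potassium.",
]


def tokenize(text: str) -> Set[str]:
    """Split on whitespace, strip basic punctuation, and lowercase."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(query: str, corpus: List[str], k: int = 1) -> List[str]:
    """Rank documents by word overlap with the query and return the top k."""
    query_words = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def generate(query: str, context: List[str]) -> str:
    # Stand-in for a language model: builds the grounded prompt it would see.
    return "Context: " + " ".join(context) + "\nQuestion: " + query


question = "How does RAG use an external knowledge base?"
docs = retrieve(question, CORPUS)
prompt = generate(question, docs)
print(prompt)
```

Real retrievers usually rank by embedding similarity or a lexical scoring function rather than raw word overlap; the overlap score is used here only to keep the example dependency-free.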

Relationship Between CAG and RAG

Similarities: Both CAG and RAG aim to improve the performance of language models by incorporating external information. They reduce the need for the model to generate everything from its internal parameters, instead relying on stored or retrieved knowledge to enhance accuracy and relevance.
Differences:
- Source of Information: CAG uses a precomputed cache, which is typically smaller and faster to query and holds frequently used or preprocessed data. RAG retrieves information dynamically from a larger, often unstructured external knowledge base.
- Efficiency: CAG is generally more efficient when the needed information is already cached, because it skips real-time retrieval. RAG adds a retrieval step to each query, which can be slow when searching through large datasets.
- Use Case: CAG is ideal for scenarios with repetitive queries or where specific knowledge is frequently needed, like in chatbots with common FAQs. RAG is better suited for open-domain tasks where diverse, context-specific information is required, such as answering novel questions based on a large corpus.
Complementary Nature: CAG can be seen as an optimization of RAG in certain contexts. For example, a system could cache frequently retrieved results from a RAG process to speed up future queries, effectively combining both approaches.
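The combined approach, caching the results of a retrieval step, can be sketched with the standard library's `functools.lru_cache`. This is a minimal illustration under stated assumptions: `slow_retrieve` is a hypothetical stand-in for an expensive search over a large knowledge base, and the cache only matches identical query strings.

```python
from functools import lru_cache
from typing import Tuple

# Tiny in-memory "knowledge base" (hypothetical contents).
CORPUS = (
    "CAG reuses a precomputed cache of answers.",
    "RAG retrieves documents from an external knowledge base.",
)

retrieval_calls = 0  # counts how often the slow path actually runs


def slow_retrieve(query: str) -> Tuple[str, ...]:
    """Stand-in for an expensive search; returns documents sharing a word with the query."""
    global retrieval_calls
    retrieval_calls += 1
    words = set(query.lower().split())
    return tuple(doc for doc in CORPUS if words & set(doc.lower().split()))


@lru_cache(maxsize=128)
def cached_retrieve(query: str) -> Tuple[str, ...]:
    # Identical query strings are served from the cache after the first call.
    return slow_retrieve(query)


first_docs = cached_retrieve("what is rag")
second_docs = cached_retrieve("what is rag")  # served from the cache
info = cached_retrieve.cache_info()
print(retrieval_calls, info.hits, info.misses)
```

Results are returned as tuples so they are immutable and safe to share between callers. A cache like this also needs an invalidation policy if the knowledge base changes, since stale entries would otherwise keep being served.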

In summary, while RAG focuses on dynamically retrieving external knowledge, CAG emphasizes reusing precomputed, cached information for efficiency. They can work together, with caching potentially enhancing RAG systems by storing commonly retrieved results.