RAG - chunhualiao/public-docs GitHub Wiki
Local RAG-based semantic search systems
- https://github.com/paperless-ngx/paperless-ngx paperless-ngx has NO embedding database; it does not perform "semantic search" in the way you might expect from a modern AI search engine that understands natural-language intent directly.
- https://github.com/neuml/txtai An embeddings database for semantic search and language model workflows, which can be run locally.
- AIWhispr: A privacy-focused semantic search tool that operates entirely offline, suitable for sensitive documents.
- Open Semantic Search: A comprehensive search engine that supports full-text and semantic search capabilities.
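What "semantic search" means mechanically: documents and queries are mapped to vectors by an embedding model, and results are ranked by vector similarity rather than exact keyword match. Below is a minimal, self-contained sketch of that ranking step. The `embed` function here is a toy word-hashing stand-in (it only captures word overlap, not real semantics); tools like txtai replace it with a trained sentence-embedding model and a proper vector index.

```python
import hashlib
import math

def embed(text, dims=64):
    """Toy embedding: hash each word into a fixed-size vector.
    Real systems use trained models (e.g. sentence-transformers)."""
    vec = [0.0] * dims
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    """Cosine similarity of two unit-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))

def search(query, documents, k=2):
    """Rank documents by cosine similarity to the query embedding."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in documents]
    scored.sort(reverse=True)
    return [d for _, d in scored[:k]]
```

The interface is the important part: index once, then answer arbitrary queries by nearest-neighbor lookup. Swapping in a real embedding model changes only `embed`.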
Retrieval-Augmented Generation (RAG) for Knowledge Grounding
One of the most promising solutions for improving factual accuracy is Retrieval-Augmented Generation (RAG). In a RAG system, the language model is not left to rely solely on its internal memory; instead, it actively retrieves relevant information from external sources (documents, databases, the web) and uses that to formulate its output (What Is Retrieval-Augmented Generation aka RAG | NVIDIA Blogs). In practice, this might mean that when writing a related work section, the AI will pull in snippets from actual papers and weave them (with proper citation) into the text, rather than hallucinating a summary.
RAG helps address hallucinations by grounding the generation in real data. As an NVIDIA explainer succinctly puts it: “Retrieval-augmented generation gives models sources they can cite, like footnotes in a research paper, so users can check any claims. That builds trust.” (What Is Retrieval-Augmented Generation aka RAG | NVIDIA Blogs). Because the model presents verifiable sources, the proposal writer (and eventually the reviewer) can trace statements back to evidence, much as this report does with citations. It also reduces the chance of the model making something up, since it is constrained to use the retrieved passages. RAG is quite feasible with existing tech stacks: for instance, with a vector database of papers (say, all relevant publications in your field) and an embedding model, one can build a pipeline where the query (prompt) fetches the top-k relevant paragraphs, which are then given to the LLM to condition its answer. OpenAI’s and Cohere’s APIs, as well as open frameworks like LangChain, provide recipes to implement RAG in just a few lines of code (What Is Retrieval-Augmented Generation aka RAG | NVIDIA Blogs).
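The retrieve-then-generate pipeline described above can be sketched in a few lines. This is an illustrative skeleton, not any particular framework's API: `retrieve` is a keyword-overlap stand-in for a vector-database top-k search, and `llm` is a placeholder callable for whatever model API (OpenAI, Cohere, a local model) the pipeline would use.

```python
def retrieve(query, corpus, k=3):
    """Stand-in retriever: rank passages by word overlap with the query.
    A production system would use embeddings + a vector index instead."""
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda p: -len(q & set(p.lower().split())))
    return scored[:k]

def build_rag_prompt(query, passages):
    """Assemble a grounded prompt: retrieved passages become numbered
    context the model is instructed to rely on."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the sources below, citing them as [n].\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

def rag_answer(query, corpus, top_k=3, llm=None):
    """Retrieve top-k passages, then condition the LLM on them."""
    passages = retrieve(query, corpus, k=top_k)
    prompt = build_rag_prompt(query, passages)
    return llm(prompt) if llm else prompt
```

The key design point is that the model never answers from memory alone: every generation step sees the retrieved passages in its prompt, and the numbered sources give readers something concrete to check.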
In the context of proposal writing, we are seeing tools incorporate RAG. For example, an AI assistant might be connected to Semantic Scholar or an internal repository of past proposals. If asked to write about the state of the art in a certain area, it will fetch key points from actual papers. This ensures that citations in the proposal are real (no fabricated references) and that specific factual claims (like "technology X reduces energy consumption by 20%") are supported by sources. The AcaWiki or other academic summary datasets could also be used for retrieval. Another benefit is currency: models have a knowledge cutoff (late 2021 for GPT-4, for example), but with retrieval they can access up-to-date information such as very recent papers or current funding calls. This prevents the model from missing recent developments or, worse, proposing an idea that was already published the year before.
A concrete emerging example is the use of RAG in literature reviews. A 2023 paper by Aytar et al. introduced a RAG-based system for academic literature navigation that significantly improved the relevance of retrieved information for data science research (A Retrieval-Augmented Generation Framework for Academic Literature Navigation in Data Science). They integrated tools for parsing papers and fine-tuned embedding models to better fetch context. This kind of system could be directly applied when an AI is tasked with writing the background section of a proposal – it would retrieve the most relevant prior work and only then generate the background text, citing those works.
Challenges in RAG: While powerful, RAG is not without challenges. The retrieval component needs to be precise; irrelevant or low-quality sources can mislead the generation. There’s also the issue of integrating the retrieved text smoothly – models sometimes copy large chunks verbatim (raising plagiarism concerns) or misrepresent the source if they don’t adequately understand it. Ongoing research, however, is making strides: improved relevance scoring for retrieved passages, and using training or prompting strategies that encourage faithful summary of sources. Another interesting development is RAG with citation: training the model or designing the prompt such that it outputs not just text but also the reference keys or URLs for each fact (much as we are doing manually here). Some AI writing tools now automatically produce citations, which is a direct application of RAG. NVIDIA’s overview suggests that RAG may become a standard component of generative AI services because it’s often easier and cheaper than trying to train a truly all-knowing giant model (What Is Retrieval-Augmented Generation aka RAG | NVIDIA Blogs).
In summary, retrieval-augmented generation is a key technique to make AI-written proposals trustworthy and well-grounded. It addresses the factuality challenge by bridging LLMs with the vast repositories of scientific knowledge in real time. As these systems mature, we expect proposal drafting AIs to routinely come with a built-in literature retrieval module, effectively serving as an AI librarian + writer combined.