Semantic Search Retrieval with Google AQA - gunpal5/Google_GenerativeAI GitHub Wiki
This guide walks you through setting up semantic search retrieval using Google’s AQA model and the GenerativeAI and DocumentChunker libraries. It includes an introduction to the key concepts involved.
- Introduction
- Prerequisites
- Google Cloud Setup (One-Time)
- Project Setup (NuGet Packages)
- Authentication and Model Initialization
- Corpus Management
- Document and Chunking
- Question Answering
- Retrieval-Augmented Generation (RAG) combines the power of large language models (LLMs) with information retrieval.
- Instead of just using the LLM’s internal knowledge, RAG systems first retrieve relevant information from a knowledge base (a corpus of documents).
- Then, they augment the LLM’s response generation with this retrieved information.
- This leads to:
  - More accurate and factual answers.
  - Up-to-date information (since you can update the corpus).
  - Ability to answer questions about specific documents.
- Traditional search engines often rely on keyword matching.
- Semantic search goes deeper. It understands the meaning and context of both the search query and the documents.
- It uses techniques like embeddings (vector representations of text) to find documents that are semantically similar to the query, even if they don’t share the exact same keywords.
- Google’s AQA (Attributed Question Answering) model is specifically designed for semantic search and question answering.
- It’s good at:
  - Understanding the intent of a question.
  - Finding the most relevant passages within a corpus.
  - Providing an "Answerable Probability" score (the model’s confidence that the corpus can answer the question).
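To make the retrieve-then-augment idea concrete, here is a minimal sketch of semantic retrieval by cosine similarity. It is only an illustration under stated assumptions: the three-dimensional vectors are hand-written stand-ins for real embeddings, the passages and query are made up, and none of this reflects how AQA works internally.

```csharp
using System;
using System.Linq;

// Toy "embeddings": hand-written 3-dimensional vectors standing in for the
// high-dimensional vectors a real embedding model would produce (assumption).
var passages = new (string Text, double[] Vector)[]
{
    ("Dantes is locked up in the Chateau d'If.", new[] { 0.9, 0.1, 0.0 }),
    ("The ship Pharaon arrives in Marseille.",   new[] { 0.1, 0.9, 0.1 }),
    ("A recipe for bouillabaisse.",              new[] { 0.0, 0.2, 0.9 }),
};

// Hypothetical embedding of the query "Why was the sailor imprisoned?".
// The query shares no keywords with the first passage, yet points the same way.
string question = "Why was the sailor imprisoned?";
double[] queryVector = { 0.8, 0.2, 0.1 };

// Cosine similarity: dot(a, b) / (|a| * |b|).
static double CosineSimilarity(double[] a, double[] b)
{
    double dot = a.Zip(b, (x, y) => x * y).Sum();
    double normA = Math.Sqrt(a.Sum(x => x * x));
    double normB = Math.Sqrt(b.Sum(x => x * x));
    return dot / (normA * normB);
}

// Retrieval step: rank passages by similarity to the query and keep the best one.
var best = passages
    .OrderByDescending(p => CosineSimilarity(queryVector, p.Vector))
    .First();

// Augmentation step: place the retrieved passage in the prompt sent to the LLM.
string prompt = $"Answer using only this context:\n{best.Text}\n\nQuestion: {question}";
Console.WriteLine(prompt);
```

With AQA and a corpus, both steps happen on Google’s side; the sketch only shows why a passage can match a query without sharing its keywords.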
This tutorial shows you how to build a simple RAG system using Google AQA.
- C# Project: A new or existing .NET console application.
- NuGet Packages:
  - Google_GenerativeAI
  - DocumentChunker
- Open the Google Cloud Console.
- Go to “IAM & Admin” → “Service accounts”.
- Create a service account, or select an existing one.
- Create and download a JSON key for that service account.
- Environment Variable: Set Google_Service_Account_Json to the JSON key file’s full path.
In your C# project, install the necessary NuGet packages:
dotnet add package Google_GenerativeAI
dotnet add package DocumentChunker
Retrieve the path to your service account key.
// GetEnvironmentVariable returns null when the variable is not defined.
string? serviceAccountKeyPath = Environment.GetEnvironmentVariable("Google_Service_Account_Json");
if (string.IsNullOrEmpty(serviceAccountKeyPath))
{
    throw new InvalidOperationException("The Google_Service_Account_Json environment variable is not set.");
}
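A set environment variable can still point at a missing or wrong file, so a quick check before authenticating may save a confusing error later. The following is a sketch, not part of Google_GenerativeAI: `DescribeKeyFileProblem` is a hypothetical helper, and it relies only on the fact that downloaded service account keys are JSON objects with a `type` field set to `service_account`. It checks the file’s shape and does not contact Google or verify the key itself.

```csharp
using System;
using System.IO;
using System.Text.Json;

// Hypothetical helper: returns null when the key file looks usable,
// otherwise a short description of the problem.
static string? DescribeKeyFileProblem(string path)
{
    if (!File.Exists(path))
        return $"No file found at '{path}'.";

    try
    {
        using JsonDocument doc = JsonDocument.Parse(File.ReadAllText(path));
        JsonElement root = doc.RootElement;
        if (root.ValueKind != JsonValueKind.Object)
            return "The key file is not a JSON object.";

        // Service account keys carry a "type" field set to "service_account".
        if (!root.TryGetProperty("type", out JsonElement type) ||
            type.ValueKind != JsonValueKind.String ||
            type.GetString() != "service_account")
            return "The key file is not a service account key.";

        return null;
    }
    catch (JsonException)
    {
        return "The key file is not valid JSON.";
    }
}

// Usage with a throwaway file containing made-up values.
string samplePath = Path.Combine(Path.GetTempPath(), "sample-key.json");
File.WriteAllText(samplePath, "{\"type\": \"service_account\", \"client_email\": \"demo@example.iam.gserviceaccount.com\"}");
Console.WriteLine(DescribeKeyFileProblem(samplePath) ?? "Key file looks usable.");
File.Delete(samplePath);
```

In the tutorial’s flow, you would call the helper with `serviceAccountKeyPath` and throw if it returns a non-null message.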
Create a GoogleServiceAccountAuthenticator.
var authenticator = new GoogleServiceAccountAuthenticator(serviceAccountKeyPath);
Create a SemanticRetrieverModel for AQA.
var retrieverModel = new SemanticRetrieverModel(
    GoogleAIModels.Aqa,
    EnvironmentVariables.GOOGLE_API_KEY,
    authenticator: authenticator
);
Access the CorporaManager.
var corporaManager = retrieverModel.CorporaManager;
This function handles finding or creating your corpus.
private static async Task<Corpus> GetOrCreateCorpusAsync(CorporaManager corporaManager, string displayName)
{
    // Look for an existing corpus with the requested display name.
    var corporaList = await corporaManager.ListCorporaAsync();
    Corpus? existingCorpus = corporaList?.FirstOrDefault(c => c.DisplayName == displayName);
    if (existingCorpus != null)
    {
        return existingCorpus;
    }
    else
    {
        // None found, so create a new corpus.
        return await corporaManager.CreateCorpusAsync(displayName);
    }
}
Use the function to get your corpus.
string corpusDisplayName = "My Search Corpus";
Corpus corpus = await GetOrCreateCorpusAsync(corporaManager, corpusDisplayName);
This function adds a document and splits it into manageable chunks.
private static async Task AddDocumentAndChunksAsync(
    CorporaManager corporaManager,
    string corpusName,
    string contentUrl,
    string documentName,
    string author)
{
    // Use DocumentChunker to split the text.
    var chunker = new PlainTextDocumentChunker(new ChunkerConfig(500, ChunkType.Paragraph));

    // Create the document in the corpus, tagged with its author.
    var document = await corporaManager.AddDocumentAsync(
        corpusName,
        documentName,
        new List<CustomMetadata> { new CustomMetadata() { Key = "Author", StringValue = author } }
    );

    // Upload the chunks one part at a time.
    await foreach (var textParts in chunker.ExtractChunksInPartsFromUrlAsync(contentUrl, 100))
    {
        var chunks = textParts.Select(text => new Chunk() { Data = new ChunkData() { StringValue = text } }).ToList();
        await corporaManager.AddChunksAsync(document.Name, chunks);
    }
}
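For intuition about what the chunking step does, the sketch below packs paragraphs into size-limited chunks and then groups the chunks into batches for upload. It is not DocumentChunker’s implementation: `ChunkByParagraph` is a hypothetical helper, the limit is assumed to be a character count, and the batch size of 2 is chosen only to keep the example small. `Enumerable.Chunk` requires .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: packs whole paragraphs into chunks of at most maxChars
// characters. A single paragraph longer than maxChars becomes its own chunk.
static List<string> ChunkByParagraph(string text, int maxChars)
{
    var result = new List<string>();
    var current = new StringBuilder();

    // Paragraphs are assumed to be separated by blank lines.
    string[] paragraphs = text
        .Replace("\r\n", "\n")
        .Split("\n\n", StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);

    foreach (string paragraph in paragraphs)
    {
        // Start a new chunk when adding this paragraph would exceed the limit.
        if (current.Length > 0 && current.Length + 1 + paragraph.Length > maxChars)
        {
            result.Add(current.ToString());
            current.Clear();
        }
        if (current.Length > 0) current.Append('\n');
        current.Append(paragraph);
    }

    if (current.Length > 0) result.Add(current.ToString());
    return result;
}

string sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph.";
List<string> sampleChunks = ChunkByParagraph(sample, maxChars: 40);

// Group chunks into batches so that no single upload request grows too large.
foreach (string[] batch in sampleChunks.Chunk(2))
{
    Console.WriteLine($"Uploading a batch of {batch.Length} chunk(s)");
}
```

Keeping paragraphs whole tends to preserve context inside each chunk, which helps the retriever return passages that make sense on their own.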
Add your document (from a URL in this example).
string documentUrl = "https://www.gutenberg.org/cache/epub/1184/pg1184.txt";
string documentDisplayName = "The Count of Monte Cristo";
string authorName = "Alexandre Dumas";
await AddDocumentAndChunksAsync(
    corporaManager,
    corpus.Name,
    documentUrl,
    documentDisplayName,
    authorName
);
Create a ChatSession for interacting with the corpus.
var chatSession = retrieverModel.CreateChatSession(corpus.Name, AnswerStyle.VERBOSE);
Provide your question.
string userQuestion = "What is Edmond Dantes imprisoned for?";
Get the response from the model.
var answerResponse = await chatSession.GenerateAnswerAsync(userQuestion);
Show the answer and its confidence level.
Console.WriteLine($"Question: {userQuestion}");
Console.WriteLine($"Answer: {answerResponse.GetAnswer()}");
Console.WriteLine($"Answerable Probability: {answerResponse.AnswerableProbability}");
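The answerable probability is most useful as a gate: when it is low, the corpus probably does not contain the answer, and showing the generated text anyway can mislead. Here is a minimal sketch of that idea. `FormatAnswer` and the 0.5 threshold are hypothetical, the sample answers are made up, and the parameter types are assumptions rather than the library’s actual types for `GetAnswer()` and `AnswerableProbability`.

```csharp
using System;
using System.Globalization;

// Hypothetical cut-off; tune it against questions your corpus can and cannot answer.
const double MinAnswerableProbability = 0.5;

// Returns the text to show the user: the answer when the probability clears the
// threshold, otherwise a fallback message.
static string FormatAnswer(string? answer, double? answerableProbability, double threshold)
{
    if (string.IsNullOrWhiteSpace(answer) || answerableProbability is null)
        return "No answer was returned.";

    double probability = answerableProbability.Value;
    if (probability < threshold)
        return "The corpus does not appear to contain an answer to this question.";

    string formatted = probability.ToString("0.00", CultureInfo.InvariantCulture);
    return $"{answer} (answerable probability: {formatted})";
}

// Made-up sample values, for illustration only.
Console.WriteLine(FormatAnswer("He was falsely accused of treason.", 0.93, MinAnswerableProbability));
Console.WriteLine(FormatAnswer("Some unsupported guess.", 0.12, MinAnswerableProbability));
```

In the tutorial’s flow, you would pass `answerResponse.GetAnswer()` and `answerResponse.AnswerableProbability` instead of the sample values.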
The complete program, combining all of the steps in this guide:
using GenerativeAI;
using GenerativeAI.Authenticators;
using GenerativeAI.Clients;
using GenerativeAI.Types;
using DocumentChunker.Chunkers;
using DocumentChunker.Core;
using DocumentChunker.Enum;
public class Program
{
    public static async Task Main(string[] args)
    {
        // --- 1. Authentication and Model Initialization ---
        // GetEnvironmentVariable returns null when the variable is not defined.
        string? serviceAccountKeyPath = Environment.GetEnvironmentVariable("Google_Service_Account_Json");
        if (string.IsNullOrEmpty(serviceAccountKeyPath))
        {
            throw new InvalidOperationException("The Google_Service_Account_Json environment variable is not set.");
        }
        var authenticator = new GoogleServiceAccountAuthenticator(serviceAccountKeyPath);
        var retrieverModel = new SemanticRetrieverModel(GoogleAIModels.Aqa, EnvironmentVariables.GOOGLE_API_KEY, authenticator: authenticator);

        // --- 2. Corpus Management ---
        var corporaManager = retrieverModel.CorporaManager;
        string corpusDisplayName = "My Search Corpus";
        Corpus corpus = await GetOrCreateCorpusAsync(corporaManager, corpusDisplayName);

        // --- 3. Document and Chunking ---
        string documentUrl = "https://www.gutenberg.org/cache/epub/1184/pg1184.txt"; // Example: Count of Monte Cristo
        string documentDisplayName = "The Count of Monte Cristo";
        string authorName = "Alexandre Dumas";
        await AddDocumentAndChunksAsync(corporaManager, corpus.Name, documentUrl, documentDisplayName, authorName);

        // --- 4. Question Answering ---
        var chatSession = retrieverModel.CreateChatSession(corpus.Name, AnswerStyle.VERBOSE);
        string userQuestion = "What is Edmond Dantes imprisoned for?";
        var answerResponse = await chatSession.GenerateAnswerAsync(userQuestion);
        Console.WriteLine($"Question: {userQuestion}");
        Console.WriteLine($"Answer: {answerResponse.GetAnswer()}");
        Console.WriteLine($"Answerable Probability: {answerResponse.AnswerableProbability}");
    }

    // --- Helper Functions ---

    // Returns the corpus with the given display name, creating it if none exists.
    private static async Task<Corpus> GetOrCreateCorpusAsync(CorporaManager corporaManager, string displayName)
    {
        var corporaList = await corporaManager.ListCorporaAsync();
        Corpus? existingCorpus = corporaList?.FirstOrDefault(c => c.DisplayName == displayName);
        if (existingCorpus != null)
        {
            return existingCorpus;
        }
        else
        {
            return await corporaManager.CreateCorpusAsync(displayName);
        }
    }

    // Adds a document to the corpus, then uploads its text in chunks.
    private static async Task AddDocumentAndChunksAsync(CorporaManager corporaManager, string corpusName, string contentUrl, string documentName, string author)
    {
        var chunker = new PlainTextDocumentChunker(new ChunkerConfig(500, ChunkType.Paragraph));
        var document = await corporaManager.AddDocumentAsync(corpusName, documentName,
            new List<CustomMetadata> { new CustomMetadata() { Key = "Author", StringValue = author } });
        await foreach (var textParts in chunker.ExtractChunksInPartsFromUrlAsync(contentUrl, 100))
        {
            var chunks = textParts.Select(text => new Chunk() { Data = new ChunkData() { StringValue = text } }).ToList();
            await corporaManager.AddChunksAsync(document.Name, chunks);
        }
    }
}