Document Client - gunpal5/Google_GenerativeAI GitHub Wiki

Introduction

The DocumentsClient provides methods for interacting with the Gemini API's Documents endpoint. This allows you to create, manage, and query documents within a corpus. Documents are the individual units of content within a corpus, containing the text that will be used for semantic search.

Details

The DocumentsClient offers the following functionalities:

Creating a Document

The CreateDocumentAsync method creates a new document within a specified corpus.

using GenerativeAI.Clients;
using GenerativeAI.Types;

// ... other code ...

var documentsClient = new DocumentsClient(platform, httpClient, logger); // Initialize DocumentsClient

var parentCorpus = "corpora/my-corpus-id"; // Replace with the parent corpus name

var document = new Document
{
    DisplayName = "My Sample Document",
    CustomMetadata = new List<CustomMetadata> { new CustomMetadata { Key = "my key", StringValue = "This is a test document" } },
    // ... other document properties ...
};

var createdDocument = await documentsClient.CreateDocumentAsync(parentCorpus, document);

if (createdDocument != null)
{
    Console.WriteLine($"Document created: {createdDocument.Name}");
}
else
{
    Console.WriteLine("Failed to create document.");
}

Querying a Document

The QueryDocumentAsync method performs semantic search within a specific document.

using GenerativeAI.Clients;
using GenerativeAI.Types;

// ... other code ...

var documentsClient = new DocumentsClient(platform, httpClient, logger); // Initialize DocumentsClient

var documentName = "corpora/my-corpus-id/documents/my-document-id"; // Replace with the document name

var queryDocumentRequest = new QueryDocumentRequest
{
    Query = "What is mentioned about topic X in this document?",
    // ... other query parameters ...
};

var queryDocumentResponse = await documentsClient.QueryDocumentAsync(documentName, queryDocumentRequest);

if (queryDocumentResponse != null && queryDocumentResponse.RelevantChunks != null)
{
    foreach (var chunk in queryDocumentResponse.RelevantChunks)
    {
        Console.WriteLine($"Relevant Chunk: {chunk.ChunkData.Text}");
    }
}
else
{
    Console.WriteLine("No relevant chunks found.");
}

Listing Documents

The ListDocumentsAsync method retrieves a list of documents within a corpus.

using GenerativeAI.Clients;
using GenerativeAI.Types;

// ... other code ...

var documentsClient = new DocumentsClient(platform, httpClient, logger); // Initialize DocumentsClient

var parentCorpus = "corpora/my-corpus-id"; // Replace with the parent corpus name

var listDocumentsResponse = await documentsClient.ListDocumentsAsync(parentCorpus); // You can provide pageSize and pageToken

if (listDocumentsResponse != null && listDocumentsResponse.Documents != null)
{
    foreach (var document in listDocumentsResponse.Documents)
    {
        Console.WriteLine($"Document Name: {document.Name}");
    }
}
else
{
    Console.WriteLine("No documents found.");
}
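
The pageSize and pageToken parameters mentioned in the comment above suggest the usual token-based pagination. The following is a minimal sketch of that loop, written against a generic delegate rather than the real client so that it compiles on its own. `Page<T>`, `Pagination.ListAllAsync`, and the `NextPageToken` property are hypothetical names used for illustration, not confirmed parts of the library.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// One page of results plus the token for the next page (null or empty when done).
// Hypothetical shape; the library's list response may name these differently.
public sealed record Page<T>(IReadOnlyList<T> Items, string? NextPageToken);

public static class Pagination
{
    // Calls fetchPage repeatedly, passing the previous page's token,
    // until the service stops returning a token.
    public static async Task<List<T>> ListAllAsync<T>(
        Func<string?, Task<Page<T>>> fetchPage)
    {
        var all = new List<T>();
        string? pageToken = null;
        do
        {
            var page = await fetchPage(pageToken);
            all.AddRange(page.Items);
            pageToken = page.NextPageToken;
        } while (!string.IsNullOrEmpty(pageToken));
        return all;
    }
}
```

With the real client, the delegate would wrap a call to ListDocumentsAsync and map its response into a `Page<Document>`. Check the actual parameter order and property names in the library before relying on this.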

Getting a Document

The GetDocumentAsync method retrieves a specific document by name.

using GenerativeAI.Clients;
using GenerativeAI.Types;

// ... other code ...

var documentsClient = new DocumentsClient(platform, httpClient, logger); // Initialize DocumentsClient

var documentName = "corpora/my-corpus-id/documents/my-document-id"; // Replace with the document name

var document = await documentsClient.GetDocumentAsync(documentName);

if (document != null)
{
    Console.WriteLine($"Document Display Name: {document.DisplayName}");
    Console.WriteLine($"Document Content: {document.Content?.Text}");
}
else
{
    Console.WriteLine("Document not found.");
}

Updating a Document

The UpdateDocumentAsync method updates an existing document.

using GenerativeAI.Clients;
using GenerativeAI.Types;

// ... other code ...

var documentsClient = new DocumentsClient(platform, httpClient, logger); // Initialize DocumentsClient

var documentName = "corpora/my-corpus-id/documents/my-document-id"; // Replace with the document name

var updatedDocument = new Document
{
    Name = documentName, // Important: Include the name in the updated document object.
    DisplayName = "My Updated Document Name",
    CustomMetadata = new List<CustomMetadata> { new CustomMetadata { Key = "my key", StringValue = "This is a test document updated" } },
    // ... other updated properties ...
};

string updateMask = "displayName,customMetadata"; // Fields to update; these should match the properties set above

var resultDocument = await documentsClient.UpdateDocumentAsync(documentName, updatedDocument, updateMask);

if (resultDocument != null)
{
    Console.WriteLine($"Document updated: {resultDocument.DisplayName}");
}
else
{
    Console.WriteLine("Failed to update document.");
}
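
Because the mask and the populated properties have to agree, it can help to build the mask from an explicit list of field names instead of a hand-typed string. This is a minimal sketch, assuming the mask is a comma-separated list of camelCase field names as in the example above. `UpdateMaskBuilder` is a hypothetical helper, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UpdateMaskBuilder
{
    // Joins field names into a comma-separated mask, trimming whitespace
    // and dropping blanks and duplicates while preserving order.
    public static string Build(params string[] fields)
    {
        var cleaned = fields
            .Select(f => f?.Trim())
            .Where(f => !string.IsNullOrEmpty(f))
            .Distinct(StringComparer.Ordinal)
            .ToArray();

        if (cleaned.Length == 0)
            throw new ArgumentException("At least one field is required.", nameof(fields));

        return string.Join(",", cleaned);
    }
}
```

For example, `UpdateMaskBuilder.Build("displayName", "customMetadata")` returns `"displayName,customMetadata"`, which can then be passed as the updateMask argument.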

Deleting a Document

The DeleteDocumentAsync method deletes a document.

using GenerativeAI.Clients;

// ... other code ...

var documentsClient = new DocumentsClient(platform, httpClient, logger); // Initialize DocumentsClient

var documentName = "corpora/my-corpus-id/documents/my-document-id"; // Replace with the document name

await documentsClient.DeleteDocumentAsync(documentName); // Optionally set force to true to also delete the document's chunks

Console.WriteLine($"Document deleted: {documentName}");

Important Considerations

  • Ensure proper authorization is configured before using the DocumentsClient. See the Authentication page.
  • Replace placeholder document names, IDs, and corpus names with actual values.
  • Handle potential exceptions during API calls.
  • Be mindful of rate limits when making frequent requests. See the official documentation for details.
  • The updateMask parameter in UpdateDocumentAsync is a comma-separated list of the Document fields to update. Only the fields listed in the updateMask will be modified, so a property that is set on the object but missing from the mask is left unchanged.
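
The advice on exceptions and rate limits can be combined into a small retry wrapper. The sketch below is illustrative only: `RetryHelper` is a hypothetical name, and it assumes transient failures surface as `HttpRequestException`, which may not match how this library reports errors.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class RetryHelper
{
    // Runs the operation, retrying on HttpRequestException with exponential
    // backoff (baseDelay, then 2x, 4x, ...). The final failure is rethrown.
    public static async Task<T> WithBackoffAsync<T>(
        Func<Task<T>> operation,
        int maxAttempts = 4,
        TimeSpan? baseDelay = null)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        var delay = baseDelay ?? TimeSpan.FromSeconds(1);
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (HttpRequestException) when (attempt < maxAttempts)
            {
                // Assumed transient: wait, then double the delay for the next try.
                await Task.Delay(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

A call from the earlier examples could then be wrapped as `await RetryHelper.WithBackoffAsync(() => documentsClient.GetDocumentAsync(documentName))`. Retrying on every `HttpRequestException` is a simplification; a production version would likely inspect the status code and retry only on rate-limit or server errors.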

API Reference