nBedR - RAG Embedding Toolkit
A powerful tool for creating and managing embedding databases for Retrieval Augmented Generation (RAG) applications.
Table of Contents
- Overview
- Features
- Quick Start
- Understanding Chunking
- Configuration
- Using Your Embedding Database
- Advanced Configuration
- Development
- Build and Release
- Contributing
Overview
What Are Embeddings and Why Do They Matter?
Imagine you want to teach a computer to understand the meaning of words, sentences, and documents - not just the letters and spelling, but the actual meaning behind them. This is where embeddings come in.
Embeddings are like a universal translator for computers. They convert human language (text) into numbers that computers can understand and compare. Think of it like giving every piece of text a unique "fingerprint" made of numbers that captures its meaning.
Here's a simple analogy: If you were organizing books in a library, you wouldn't just sort them alphabetically. You'd group books by topic - science books near other science books, cooking books with other cooking books, etc. Embeddings do something similar but for any text - they help computers understand which pieces of text are similar in meaning, even if they use completely different words.
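To make this concrete, here is a tiny, illustrative Python sketch (not part of nBedR) showing how two embedding vectors can be compared with cosine similarity - the closer the score is to 1, the more similar the meanings:
import numpy as np

def cosine_similarity(a, b):
    # Measures how closely two embedding vectors point in the same direction
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "embeddings"; real models produce hundreds or thousands of dimensions
cat_food = np.array([0.9, 0.1, 0.2])
pet_nutrition = np.array([0.8, 0.2, 0.3])
car_engine = np.array([0.1, 0.9, 0.7])

print(cosine_similarity(cat_food, pet_nutrition))  # High score: related topics
print(cosine_similarity(cat_food, car_engine))     # Low score: unrelated topics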
How Embeddings Power Modern AI Applications
In Generative AI (GenAI) applications, embeddings are the secret sauce that makes AI systems truly intelligent. Here's how:
- Understanding Context: When you ask an AI assistant a question, embeddings help it understand what you're really asking about, not just match keywords.
- Finding Relevant Information: In Retrieval Augmented Generation (RAG) systems, embeddings help AI find the most relevant documents or passages to answer your questions, just like a very smart librarian who instantly knows which books contain information related to your question.
- Maintaining Consistency: Embeddings ensure that AI responses are grounded in actual facts from your documents, rather than hallucinating information.
The RAG Process Explained Simply
graph LR
A[Your Question] --> B[Find Similar Content]
B --> C[Retrieved Documents]
C --> D[AI Generates Answer]
D --> E[Factual Response]
F[Your Documents] --> G[Break into Chunks]
G --> H[Create Embeddings]
H --> I[Store in Database]
I --> B
- Preparation Phase: Your documents are broken into smaller pieces (chunks) and converted into embeddings
- Question Phase: When you ask a question, the system finds the most relevant chunks using embeddings
- Answer Phase: AI uses those relevant chunks to generate an accurate, factual answer
This application handles the preparation phase - the crucial foundation that makes everything else possible.
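As a rough sketch of that preparation phase (the helper names here are illustrative only, not the nBedR API), the work boils down to three steps:
# Hypothetical outline of the preparation phase - function names are placeholders
def prepare_documents(documents, chunk_size=512):
    chunks = []
    for doc in documents:
        # 1. Break each document into smaller pieces
        chunks.extend(split_into_chunks(doc, chunk_size))
    # 2. Convert every chunk into an embedding vector
    embeddings = embed(chunks)
    # 3. Store the vectors so questions can be matched against them later
    store_in_vector_database(chunks, embeddings)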
Why This Application Matters
This tool provides a streamlined solution for processing documents, generating embeddings, and storing them in vector databases. Built on the foundation of the excellent RAFT toolkit, it focuses specifically on embedding operations and vector database management - essentially preparing your documents so they can be used effectively in AI applications.
Architecture
graph TD
A[Document Sources] --> B[Document Service]
B --> C[Text Chunking]
C --> D[Embedding Generation]
D --> E[Vector Database]
A1[Local Files] --> A
A2[S3 Bucket] --> A
A3[SharePoint] --> A
E1[FAISS] --> E
E2[Pinecone] --> E
E3[ChromaDB] --> E
E4[Azure AI Search] --> E
E5[AWS Elasticsearch] --> E
E6[PGVector] --> E
F[Configuration] --> B
G[Rate Limiter] --> D
Features
🚀 Core Capabilities
- Multi-format Document Processing: PDF, TXT, JSON, PPTX, and more
- Advanced Chunking Strategies: Semantic, fixed-size, and sentence-aware chunking
- Multiple Vector Databases: FAISS, Pinecone, and ChromaDB support
- Parallel Processing: Efficient batch processing with configurable workers and multi-instance coordination
- Rate Limiting: Smart API rate limiting to prevent quota exhaustion
- Cloud Storage Integration: S3, Azure Blob, and SharePoint support
🔧 Document Sources
- Local Files: Process files from local directories
- Amazon S3: Direct integration with S3 buckets
- SharePoint: Microsoft SharePoint document libraries
- Batch Processing: Handle large document collections efficiently
🎯 Vector Database Support
- FAISS: High-performance similarity search and clustering
- Pinecone: Managed vector database service
- ChromaDB: Open-source embedding database
- Azure AI Search: Microsoft's enterprise search service with vector capabilities
- AWS Elasticsearch: Amazon's managed Elasticsearch with vector search support
- PGVector: PostgreSQL with pgvector extension for vector operations
Quick Start
System Requirements
- Python 3.11 or higher (Python 3.9 and 3.10 are no longer supported)
- 4GB+ RAM recommended for large document processing
- Sufficient disk space for vector databases (varies by collection size)
Installation
# Ensure you're using Python 3.11 or higher
python3.11 --version # Should show 3.11.x or higher
# Install from PyPI
pip install nbedr
# Or install from source
git clone https://github.com/your-org/nbedr.git
cd nbedr
pip install -e .
Basic Usage
- Set up your environment variables:
# Choose your embedding provider
export EMBEDDING_PROVIDER="openai"
export OPENAI_API_KEY="your-api-key-here"
# Choose your vector database
export VECTOR_DATABASE_TYPE="faiss"
- Process documents and create embeddings:
# Process local documents
nbedr create-embeddings \
--source local \
--local-path ./documents \
--output-path ./embeddings
# Process S3 documents
nbedr create-embeddings \
--source s3 \
--s3-bucket my-documents \
--output-path ./embeddings
- Search your embeddings:
nbedr search \
--query "What is machine learning?" \
--embeddings-path ./embeddings \
--top-k 5
Configuration
Basic configuration is handled through environment variables. Here are the essential settings:
Core Settings
# Embedding Provider
export EMBEDDING_PROVIDER="openai" # openai, azure_openai, aws_bedrock, etc.
export OPENAI_API_KEY="your-api-key-here"
export EMBEDDING_MODEL="text-embedding-3-large"
export EMBEDDING_DIMENSIONS=1536
# Vector Database
export VECTOR_DATABASE_TYPE="faiss" # faiss, pinecone, chromadb, etc.
export FAISS_INDEX_PATH="./embeddings_db"
# Document Processing
export CHUNK_SIZE=512
export CHUNKING_STRATEGY="semantic" # semantic, fixed_size, sentence_aware
export BATCH_SIZE=10
export MAX_WORKERS=4
Quick Provider Setup
OpenAI:
export OPENAI_API_KEY="your-api-key"
export OPENAI_ORGANIZATION="your-org-id" # Optional
Pinecone:
export PINECONE_API_KEY="your-api-key"
export PINECONE_ENVIRONMENT="your-environment"
export PINECONE_INDEX_NAME="your-index"
ChromaDB:
export CHROMA_HOST="localhost"
export CHROMA_PORT=8000
For detailed configuration options, advanced settings, and comprehensive setup guides, see the Advanced Configuration section below.
Advanced Configuration
This section covers advanced configuration options for production deployments, performance optimization, and specialized use cases.
Rate Limiting Configuration
Rate limiting is configured separately for embedding providers and vector databases to prevent API quota exhaustion and optimize performance:
Embedding Providers:
# Enable rate limiting for embedding providers
RATE_LIMIT_ENABLED=true
RATE_LIMIT_STRATEGY=sliding_window
RATE_LIMIT_REQUESTS_PER_MINUTE=500
RATE_LIMIT_TOKENS_PER_MINUTE=350000
RATE_LIMIT_MAX_BURST=100
Vector Databases:
# Enable rate limiting for vector store operations
VECTOR_STORE_RATE_LIMIT_ENABLED=true
VECTOR_STORE_RATE_LIMIT_STRATEGY=sliding_window
VECTOR_STORE_RATE_LIMIT_REQUESTS_PER_MINUTE=300
VECTOR_STORE_RATE_LIMIT_MAX_BURST=50
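For intuition, the sliding_window strategy can be pictured as keeping timestamps of recent requests and pausing once the window is full. A minimal, illustrative Python sketch (not nBedR's internal implementation):
import time
from collections import deque

class SlidingWindowLimiter:
    def __init__(self, requests_per_minute: int):
        self.limit = requests_per_minute
        self.window = deque()  # timestamps of recent requests

    def acquire(self):
        now = time.monotonic()
        # Drop timestamps older than 60 seconds
        while self.window and now - self.window[0] > 60:
            self.window.popleft()
        if len(self.window) >= self.limit:
            # Wait until the oldest request falls out of the window
            time.sleep(60 - (now - self.window[0]))
        self.window.append(time.monotonic())

limiter = SlidingWindowLimiter(requests_per_minute=500)
limiter.acquire()  # call before each embedding request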
Pre-configured Rate Limit Presets
nBedR includes optimized presets for popular services:
Service | Preset | RPM | TPM | Strategy | Description |
---|---|---|---|---|---|
OpenAI Tier 1 | openai_embeddings_tier1 | 500 | 350,000 | sliding_window | Standard OpenAI limits |
OpenAI Tier 2 | openai_embeddings_tier2 | 5,000 | 2,000,000 | sliding_window | Higher tier limits |
Azure OpenAI | azure_openai_standard | 120 | 240,000 | sliding_window | Standard deployment |
AWS Bedrock | aws_bedrock_titan | 2,000 | 400,000 | sliding_window | Titan embedding limits |
Google Vertex | google_vertex_gecko | 600 | 1,000,000 | sliding_window | Gecko model limits |
Local Providers | local_providers | 1,000 | N/A | sliding_window | Conservative local limits |
Best Practices for Rate Limiting Configuration
For Production Workloads:
- Use the sliding_window strategy for accuracy
- Set rates to 80% of your actual limits
- Enable burst handling for peak loads
- Monitor rate limit statistics regularly
For Development:
- Use the conservative preset for safety
- Enable detailed logging for debugging
- Test with small document sets first
For High-Volume Processing:
- Use the adaptive strategy for auto-tuning
- Configure multiple workers with shared rate limits
- Monitor response times and adjust accordingly
Cost Optimization:
- Set token limits to control embedding costs
- Use local providers for development
- Batch documents efficiently to minimize API calls
Advanced Configuration Topics
- Rate Limiting Configuration
- Parallel Processing and Multi-Instance Deployment
- Detailed Embedding Provider Configurations
- Advanced Vector Database Configurations
- Advanced Chunking Strategies
Rate Limiting Configuration Examples
Conservative Setup (Safe for Testing):
RATE_LIMIT_ENABLED=true
RATE_LIMIT_STRATEGY=sliding_window
RATE_LIMIT_REQUESTS_PER_MINUTE=60
RATE_LIMIT_TOKENS_PER_MINUTE=50000
RATE_LIMIT_MAX_BURST=10
Production Setup (OpenAI Tier 1):
RATE_LIMIT_ENABLED=true
RATE_LIMIT_STRATEGY=sliding_window
RATE_LIMIT_REQUESTS_PER_MINUTE=400 # 80% of 500 limit
RATE_LIMIT_TOKENS_PER_MINUTE=280000 # 80% of 350k limit
RATE_LIMIT_MAX_BURST=80
RATE_LIMIT_TARGET_RESPONSE_TIME=2.0
High-Volume Setup (Adaptive):
RATE_LIMIT_ENABLED=true
RATE_LIMIT_STRATEGY=adaptive
RATE_LIMIT_REQUESTS_PER_MINUTE=1000
RATE_LIMIT_TOKENS_PER_MINUTE=800000
RATE_LIMIT_TARGET_RESPONSE_TIME=1.5
RATE_LIMIT_MAX_RESPONSE_TIME=5.0
Monitoring Rate Limiting
nBedR provides detailed rate limiting statistics accessible through the application:
- Total Requests: Number of API calls made
- Rate Limit Hits: How many times rate limiting was applied
- Average Response Time: Performance monitoring
- Current Rate: Real-time rate limiting status
- Wait Time: Total time spent waiting due to rate limits
Use these metrics to optimize your rate limiting configuration for your specific workload and API tier.
Parallel Processing and Multi-Instance Deployment
nBedR supports running multiple instances in parallel to dramatically speed up document processing for large datasets. The application includes sophisticated coordination mechanisms to prevent conflicts and ensure safe concurrent operation.
Why Run Multiple Instances?
When processing thousands of documents, a single instance can become a bottleneck. Multiple instances provide:
- Faster Processing: Parallel document processing across multiple CPU cores
- Higher Throughput: Multiple embedding API calls running simultaneously
- Fault Tolerance: If one instance fails, others continue processing
- Resource Utilization: Better utilization of available CPU, memory, and network bandwidth
Instance Coordination System
nBedR automatically coordinates multiple instances to prevent conflicts:
Conflict Detection:
- Detects when multiple instances would write to the same output paths
- Prevents concurrent access to the same vector database files
- Validates configuration compatibility between instances
Automatic Path Separation:
- Generates instance-specific output directories
- Creates separate vector database paths for each instance
- Ensures no file conflicts between concurrent instances
Resource Coordination:
- Distributes rate limits fairly across all running instances
- Coordinates API quota usage to prevent rate limit violations
- Shares performance metrics for optimal load balancing
Running Multiple Instances
Basic Parallel Deployment:
# Terminal 1 - Instance 1
nbedr create-embeddings --datapath ./docs1 --output ./output1
# Terminal 2 - Instance 2
nbedr create-embeddings --datapath ./docs2 --output ./output2
# Terminal 3 - Instance 3
nbedr create-embeddings --datapath ./docs3 --output ./output3
Shared Dataset Processing:
# All instances process the same dataset with automatic coordination
# Instance paths are automatically separated
# Terminal 1
nbedr create-embeddings --datapath ./large_dataset
# Terminal 2
nbedr create-embeddings --datapath ./large_dataset
# Terminal 3
nbedr create-embeddings --datapath ./large_dataset
Custom Instance Configuration:
# Disable coordination for specific use cases
nbedr create-embeddings --disable-coordination --datapath ./docs
# List all active instances
nbedr create-embeddings --list-instances
# Use specific instance ID
nbedr create-embeddings --instance-id my-custom-instance --datapath ./docs
Instance Management
Monitor Active Instances:
# List all running instances
nbedr create-embeddings --list-instances
Environment Variables for Coordination:
# Disable coordination system
NBEDR_DISABLE_COORDINATION=true
# Custom coordination directory
NBEDR_COORDINATION_DIR=/tmp/nbedr_coordination
# Instance heartbeat interval (seconds)
NBEDR_HEARTBEAT_INTERVAL=60
Rate Limiting with Multiple Instances
When multiple instances run simultaneously, rate limits are automatically distributed:
Single Instance:
- 500 requests per minute → 500 RPM for the instance
Three Instances:
- 500 requests per minute → 166 RPM per instance (500/3)
- Prevents collective rate limit violations
- Ensures fair resource distribution
Manual Rate Limit Override:
# Set per-instance rate limits manually
RATE_LIMIT_REQUESTS_PER_MINUTE=100
RATE_LIMIT_TOKENS_PER_MINUTE=50000
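A quick sketch of the arithmetic behind the automatic distribution (illustrative only):
def per_instance_rate(total_rpm: int, num_instances: int) -> int:
    # Each instance gets an equal share of the shared quota, rounded down
    return total_rpm // num_instances

print(per_instance_rate(500, 1))  # 500 RPM for a single instance
print(per_instance_rate(500, 3))  # 166 RPM per instance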
Best Practices for Parallel Processing
Data Organization:
- Split large datasets into balanced chunks for each instance
- Use different source directories to avoid file locking conflicts
- Consider document types and sizes when distributing work
Resource Planning:
- Monitor CPU usage - optimal is typically 2-4 instances per CPU core
- Watch memory consumption - each instance loads its own models
- Consider network bandwidth for API-heavy operations
Error Handling:
- Each instance fails independently without affecting others
- Use consistent configuration across all instances
- Monitor logs from all instances for comprehensive debugging
Production Deployment:
# Use process managers like systemd or supervisor
systemctl start nbedr-instance-1
systemctl start nbedr-instance-2
systemctl start nbedr-instance-3
# Or container orchestration
docker run -d nbedr:latest --datapath /data/batch1
docker run -d nbedr:latest --datapath /data/batch2
docker run -d nbedr:latest --datapath /data/batch3
Troubleshooting Parallel Execution
Common Issues:
- Path Conflicts
  # Error: Multiple instances writing to same path
  # Solution: Use automatic coordination or specify different paths
- Rate Limit Violations
  # Error: Combined instances exceed API limits
  # Solution: Reduce per-instance rate limits or number of instances
- Vector Database Locks
  # Error: FAISS index file locked
  # Solution: Ensure each instance uses separate index paths
Debugging Commands:
# Check active instances
nbedr create-embeddings --list-instances
# View coordination logs
tail -f /tmp/nbedr_coordination/coordination.log
# Test configuration without running
nbedr create-embeddings --validate --datapath ./docs
Detailed Embedding Provider Configurations
For comprehensive configuration options for all 7 embedding providers, see the Embedding Providers section below.
Advanced Vector Database Configurations
For detailed vector database configuration options and selection guidance, see the Vector Databases section below.
Advanced Chunking Strategies
For detailed chunking configuration and optimization, see the Understanding Chunking section below.
Using Your Embedding Database
Once you've created your embedding database with NBEDR, you can integrate it into RAG applications and chatbots. Here's how to query and utilize your embeddings effectively.
🔍 RAG Query Flow
graph LR
A[User Question] --> B[Generate Query Embedding]
B --> C[Search Vector Database]
C --> D[Retrieve Similar Chunks]
D --> E[Add to LLM Context]
E --> F[Generate Answer]
F --> G[Return Response]
🚀 Direct Search with NBEDR CLI
# Search your embedding database
python nbedr.py search \
--query "How do I configure SSL certificates?" \
--vector-db faiss \
--index-path ./embeddings_db \
--top-k 5
# Advanced search with filters
python nbedr.py search \
--query "database optimization techniques" \
--vector-db pgvector \
--filters '{"source": "technical-docs"}' \
--top-k 10
💻 Programmatic Integration Examples
Simple RAG Pipeline
from core.vector_stores import FAISSVectorStore
from core.clients import create_provider_from_config
from core.config import get_config
# Load configuration and initialize components
config = get_config()
embedding_provider = create_provider_from_config(config)
vector_store = FAISSVectorStore({'faiss_index_path': './embeddings_db'})
async def answer_question(question: str) -> str:
    # 1. Generate embedding for the question
    result = await embedding_provider.generate_embeddings([question])
    query_embedding = result.embeddings[0]

    # 2. Search for similar documents
    search_results = await vector_store.search(
        query_embedding=query_embedding,
        top_k=5
    )

    # 3. Combine retrieved chunks into context for the LLM
    context = "\n\n".join([r.content for r in search_results])

    # 4. Generate answer with your LLM (add your LLM call here)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    answer = await call_your_llm(prompt)  # call_your_llm is a placeholder for your own LLM client
    return answer
Chatbot Integration
import asyncio
from typing import List, Dict
class RAGChatbot:
    def __init__(self, vector_store, embedding_provider):
        self.vector_store = vector_store
        self.embedding_provider = embedding_provider
        self.conversation_history = []

    async def chat(self, message: str) -> str:
        # Generate embedding for user message
        embedding_result = await self.embedding_provider.generate_embeddings([message])
        query_embedding = embedding_result.embeddings[0]

        # Search for relevant context
        search_results = await self.vector_store.search(
            query_embedding=query_embedding,
            top_k=3
        )

        # Build context with conversation history
        context_chunks = [f"Document: {r.content}" for r in search_results]
        recent_history = self.conversation_history[-4:]  # Last 2 exchanges

        # Combine for LLM prompt
        context = "\n".join(context_chunks)
        history = "\n".join([f"{h['role']}: {h['content']}" for h in recent_history])

        # Generate response (add your LLM integration)
        response = await self.generate_llm_response(context, history, message)

        # Update conversation history
        self.conversation_history.extend([
            {"role": "user", "content": message},
            {"role": "assistant", "content": response}
        ])
        return response
Batch Document Processing
async def process_user_queries(queries: List[str]) -> List[Dict]:
    """Process multiple queries efficiently."""
    # Generate embeddings for all queries at once
    embedding_result = await embedding_provider.generate_embeddings(queries)

    results = []
    for i, query_embedding in enumerate(embedding_result.embeddings):
        # Search for each query
        search_results = await vector_store.search(
            query_embedding=query_embedding,
            top_k=5
        )
        results.append({
            'query': queries[i],
            'matches': [
                {
                    'content': r.content,
                    'source': r.source,
                    'similarity': r.similarity_score
                }
                for r in search_results
            ]
        })
    return results
🎯 Database-Specific Usage
FAISS (Local)
# Load and search FAISS index
from core.vector_stores import FAISSVectorStore
store = FAISSVectorStore({'faiss_index_path': './my_embeddings'})
await store.initialize()
results = await store.search(query_embedding, top_k=10)
Pinecone (Cloud)
# Search Pinecone index
from core.vector_stores import PineconeVectorStore
store = PineconeVectorStore({
'pinecone_api_key': 'your-key',
'pinecone_environment': 'your-env',
'pinecone_index_name': 'rag-embeddings'
})
results = await store.search(
query_embedding=query_embedding,
top_k=5,
filters={'source': 'documentation'}
)
PGVector (SQL)
# Combine vector search with SQL queries
from core.vector_stores import PGVectorStore
store = PGVectorStore({
'pgvector_host': 'localhost',
'pgvector_database': 'vectordb',
'pgvector_user': 'postgres',
'pgvector_password': 'password'
})
# Search with metadata filters
results = await store.search(
query_embedding=query_embedding,
top_k=10,
filters={'metadata.document_type': 'manual'}
)
🔧 Advanced Usage Patterns
Hybrid Search (Keyword + Semantic)
async def hybrid_search(query: str, keywords: List[str]) -> List[Dict]:
    # Semantic search
    embedding_result = await embedding_provider.generate_embeddings([query])
    semantic_results = await vector_store.search(
        query_embedding=embedding_result.embeddings[0],
        top_k=20
    )

    # Keyword filtering
    keyword_filtered = [
        r for r in semantic_results
        if any(keyword.lower() in r.content.lower() for keyword in keywords)
    ]
    return keyword_filtered[:10]
Contextual Chunk Assembly
async def get_expanded_context(query: str, expand_chunks: int = 2) -> str:
    # Find relevant chunks
    embedding_result = await embedding_provider.generate_embeddings([query])
    results = await vector_store.search(
        query_embedding=embedding_result.embeddings[0],
        top_k=5
    )

    # Group by source and expand context
    context_blocks = []
    for result in results:
        # Get neighboring chunks for better context (implement this helper for your storage layout)
        expanded_context = await get_neighboring_chunks(
            result.source,
            result.id,
            expand_chunks
        )
        context_blocks.append(expanded_context)
    return "\n\n---\n\n".join(context_blocks)
📊 Performance Optimization
Embedding Caching
import hashlib
from typing import List

class CachedEmbeddingProvider:
    def __init__(self, provider):
        self.provider = provider
        self.cache = {}

    async def generate_embeddings(self, texts: List[str]):
        # Hash the input texts to build a cache key
        cache_key = hashlib.md5('|'.join(texts).encode()).hexdigest()
        if cache_key in self.cache:
            return self.cache[cache_key]
        result = await self.provider.generate_embeddings(texts)
        self.cache[cache_key] = result
        return result
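Wrapping an existing provider is then a one-liner; repeated calls with the same texts are served from the in-memory cache:
cached_provider = CachedEmbeddingProvider(embedding_provider)
result = await cached_provider.generate_embeddings(["What is machine learning?"])
repeat = await cached_provider.generate_embeddings(["What is machine learning?"])  # served from cache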
🎨 Integration with Popular Frameworks
LangChain Integration
import asyncio
from typing import List

from langchain.embeddings.base import Embeddings

class NBEDREmbeddings(Embeddings):
    def __init__(self, provider):
        self.provider = provider

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        result = asyncio.run(self.provider.generate_embeddings(texts))
        return result.embeddings

    def embed_query(self, text: str) -> List[float]:
        result = asyncio.run(self.provider.generate_embeddings([text]))
        return result.embeddings[0]
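A minimal usage sketch, assuming the classic langchain.vectorstores.FAISS interface (adjust the import path to your LangChain version):
from langchain.vectorstores import FAISS

texts = ["First document chunk", "Second document chunk"]
langchain_store = FAISS.from_texts(texts, NBEDREmbeddings(embedding_provider))
matches = langchain_store.similarity_search("example query", k=3)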
Quick Start
Installation
# Clone the repository
git clone https://github.com/your-org/nbedr.git
cd nbedr
# Install dependencies
pip install -r requirements.txt
# Or install with optional dependencies
pip install -e .[cloud,dev]
Basic Usage
- Create embeddings from local documents:
python nbedr.py create-embeddings \
--source local \
--source-path ./documents \
--vector-db faiss \
--output-path ./embeddings_db
- Process documents from S3:
python nbedr.py create-embeddings \
--source s3 \
--source-path s3://my-bucket/documents/ \
--vector-db pinecone \
--pinecone-index my-index
- Use Azure AI Search:
python nbedr.py create-embeddings \
--source local \
--source-path ./documents \
--vector-db azure_ai_search \
--azure-search-service your-service-name \
--azure-search-index rag-embeddings
- Use AWS Elasticsearch:
python nbedr.py create-embeddings \
--source local \
--source-path ./documents \
--vector-db aws_elasticsearch \
--aws-elasticsearch-endpoint https://your-domain.region.es.amazonaws.com
- Use PGVector:
python nbedr.py create-embeddings \
--source local \
--source-path ./documents \
--vector-db pgvector \
--pgvector-host localhost \
--pgvector-database vectordb
- Search for similar documents:
python nbedr.py search \
--query "machine learning algorithms" \
--vector-db faiss \
--index-path ./embeddings_db \
--top-k 5
Configuration
Create a .env file with your configuration:
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key
EMBEDDING_MODEL=text-embedding-ada-002
EMBEDDING_DIMENSIONS=1536
# Vector Database Configuration
VECTOR_DB_TYPE=faiss
FAISS_INDEX_PATH=./embeddings_db
# Pinecone Configuration (if using Pinecone)
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_ENVIRONMENT=your_environment
PINECONE_INDEX_NAME=your_index_name
# ChromaDB Configuration (if using ChromaDB)
CHROMA_HOST=localhost
CHROMA_PORT=8000
# Azure AI Search Configuration (if using Azure AI Search)
AZURE_SEARCH_SERVICE_NAME=your_search_service
AZURE_SEARCH_API_KEY=your_api_key
AZURE_SEARCH_INDEX_NAME=rag-embeddings
# AWS Elasticsearch Configuration (if using AWS Elasticsearch)
AWS_ELASTICSEARCH_ENDPOINT=https://your-domain.region.es.amazonaws.com
AWS_ELASTICSEARCH_REGION=us-east-1
AWS_ELASTICSEARCH_INDEX_NAME=rag-embeddings
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
# PGVector Configuration (if using PGVector)
PGVECTOR_HOST=localhost
PGVECTOR_PORT=5432
PGVECTOR_DATABASE=vectordb
PGVECTOR_USER=postgres
PGVECTOR_PASSWORD=your_postgres_password
PGVECTOR_TABLE_NAME=rag_embeddings
# Processing Configuration
CHUNK_SIZE=512
CHUNKING_STRATEGY=semantic
BATCH_SIZE=10
MAX_WORKERS=4
Advanced Usage
Custom Chunking Strategies
# Semantic chunking (uses embeddings to determine boundaries)
python nbedr.py create-embeddings \
--chunking-strategy semantic \
--chunk-size 512 \
--source local \
--source-path ./docs
# Fixed-size chunking
python nbedr.py create-embeddings \
--chunking-strategy fixed \
--chunk-size 1000 \
--chunk-overlap 100 \
--source local \
--source-path ./docs
Batch Processing
# Process large document collections
python nbedr.py create-embeddings \
--source s3 \
--source-path s3://large-corpus/ \
--batch-size 50 \
--max-workers 8 \
--rate-limit-requests 100 \
--rate-limit-period 60
Preview Mode
# Preview what will be processed without actually doing it
python nbedr.py create-embeddings \
--source local \
--source-path ./documents \
--preview
API Usage
from core.services.document_service import DocumentService
from core.config import EmbeddingConfig
# Initialize configuration
config = EmbeddingConfig()
# Create document service
service = DocumentService(config)
# Process documents
results = await service.process_documents(
source_path="./documents",
source_type="local"
)
print(f"Processed {len(results.chunks)} chunks")
print(f"Generated {len(results.embeddings)} embeddings")
Understanding Chunking: The Art of Breaking Down Documents
Why Chunking Matters
Think of chunking like cutting a pizza into slices - you need pieces that are the right size to be useful. If the slices are too big, they're hard to handle and contain too much mixed information. If they're too small, you lose important context and meaning.
When processing documents for AI, chunking determines how well your AI system can find and use relevant information. Good chunking means better, more accurate AI responses.
Chunking Strategies Explained
🎯 Semantic Chunking (Recommended for Most Use Cases)
What it does: Uses AI to understand the meaning and flow of your text, then creates natural breakpoints where topics change.
Think of it like: A smart editor who reads your document and says "this paragraph is about marketing, but this next section switches to finance" and makes a cut there.
Best for:
- Mixed content (reports, manuals, articles)
- Documents with varying topic sections
- When you want the highest quality results
Configuration:
CHUNKING_STRATEGY=semantic
CHUNK_SIZE=512
📏 Fixed-Size Chunking (Most Predictable)
What it does: Creates chunks of exactly the same size, like cutting a rope into equal lengths.
Think of it like: Using a ruler to mark off exact measurements - every piece is the same size.
Best for:
- Consistent document types (legal docs, technical manuals)
- When you need predictable processing times
- Large volumes of similar content
Configuration:
CHUNKING_STRATEGY=fixed
CHUNK_SIZE=1000
CHUNK_OVERLAP=100
📝 Sentence-Aware Chunking (Natural Boundaries)
What it does: Breaks text at sentence endings, keeping complete thoughts together.
Think of it like: A careful reader who never cuts off someone mid-sentence.
Best for:
- Narrative content (stories, case studies)
- Interview transcripts
- Conversational content
Configuration:
CHUNKING_STRATEGY=sentence
CHUNK_SIZE=500
Chunk Size and Overlap: Finding the Sweet Spot
Chunk Size Guidelines
Content Type | Recommended Size | Why |
---|---|---|
Technical Documentation | 800-1200 tokens | Complex concepts need more context |
Marketing Content | 400-600 tokens | Concise, focused messages |
Legal Documents | 1000-1500 tokens | Detailed context is crucial |
News Articles | 300-500 tokens | Quick, digestible information |
Academic Papers | 600-1000 tokens | Balance between detail and focus |
Token Rule of Thumb: 1 token ≈ 0.75 words in English, so 500 tokens ≈ 375 words
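A rough helper based on that rule of thumb (an approximation only; actual counts depend on the tokenizer your model uses):
def estimate_tokens(text: str) -> int:
    # 1 token ≈ 0.75 words, so tokens ≈ words / 0.75
    return round(len(text.split()) / 0.75)

print(estimate_tokens("the quick brown fox jumps over the lazy dog"))  # ~12 tokens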
Overlap: The Safety Net
What is overlap?: When chunks share some content at their boundaries, like overlapping roof tiles.
Why use overlap?:
- Prevents Context Loss: Important information spanning chunk boundaries isn't lost
- Improves Search: Better chance of finding relevant information
- Maintains Meaning: Keeps related concepts together
Overlap Guidelines:
- Standard: 10-20% of chunk size
- High Precision Needed: 20-30% overlap
- Performance Focused: 5-10% overlap
# Example: 1000 token chunks with 20% overlap
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
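To illustrate how size and overlap interact, here is a simplified fixed-size chunker that operates on words (nBedR's strategies are more sophisticated; this is just for intuition):
def fixed_size_chunks(text: str, chunk_size: int = 1000, overlap: int = 200):
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        chunks.append(" ".join(chunk))
        if start + chunk_size >= len(words):
            break
    return chunks

# 1000-word chunks that share 200 words with their neighbours ("document.txt" is a placeholder path)
chunks = fixed_size_chunks(open("document.txt").read(), chunk_size=1000, overlap=200)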
Configuration Impact Guide
Chunk Size Impact
Larger Chunks (1000+ tokens):
- ✅ Pros: More context, better for complex topics, fewer total chunks
- ❌ Cons: Less precise retrieval, higher costs, slower processing
Smaller Chunks (300-500 tokens):
- ✅ Pros: More precise retrieval, faster processing, lower costs
- ❌ Cons: May lose context, more chunks to manage
Overlap Impact
High Overlap (25%+):
- ✅ Pros: Better information preservation, improved search accuracy
- ❌ Cons: More storage needed, increased processing time
Low Overlap (5-10%):
- ✅ Pros: Efficient storage, faster processing
- ❌ Cons: Risk of losing information at boundaries
Recommended Configurations by Use Case
Customer Support Knowledge Base
CHUNKING_STRATEGY=semantic
CHUNK_SIZE=600
CHUNK_OVERLAP=120
Why: Balances quick answers with sufficient context
Legal Document Analysis
CHUNKING_STRATEGY=fixed
CHUNK_SIZE=1200
CHUNK_OVERLAP=300
Why: Maintains legal context integrity with high overlap
Product Documentation
CHUNKING_STRATEGY=semantic
CHUNK_SIZE=800
CHUNK_OVERLAP=160
Why: Keeps procedures and concepts together
News and Media Content
CHUNKING_STRATEGY=sentence
CHUNK_SIZE=400
CHUNK_OVERLAP=80
Why: Preserves story flow and readability
Performance Considerations
Cost Optimization:
- Smaller chunks = Lower embedding costs
- Less overlap = Lower storage costs
- Batch processing = Better rate limits
Quality Optimization:
- Semantic chunking = Best understanding
- Higher overlap = Better information retention
- Larger chunks = More context for complex topics
Speed Optimization:
- Fixed chunking = Fastest processing
- Smaller chunks = Faster search
- Lower overlap = Less processing time
Configuration Options
Embedding Providers: Choose Your AI Platform
NBEDR supports 7 different embedding providers, from major cloud platforms to local solutions. This gives you complete flexibility to choose the right solution for your needs, budget, and privacy requirements.
🌟 Provider Overview
Provider | Type | Best For | Cost | Privacy | Setup |
---|---|---|---|---|---|
OpenAI | Cloud | Production, quality | Pay-per-use | Shared | Easy |
Azure OpenAI | Cloud | Enterprise, compliance | Pay-per-use | Enterprise | Medium |
AWS Bedrock | Cloud | AWS ecosystem | Pay-per-use | Enterprise | Medium |
Google Vertex AI | Cloud | Google ecosystem | Pay-per-use | Enterprise | Medium |
LMStudio | Local | Development, testing | Free | Complete | Easy |
Ollama | Local | Privacy, offline use | Free | Complete | Easy |
Llama.cpp | Local | Custom models, research | Free | Complete | Hard |
🚀 Quick Start by Provider
OpenAI (Recommended for Most Users)
# Set your API key
export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=your_api_key_here
export EMBEDDING_MODEL=text-embedding-3-small
# Run embedding generation
python nbedr.py create-embeddings --source local --source-path ./documents
Azure OpenAI (Enterprise)
# Configure Azure OpenAI
export EMBEDDING_PROVIDER=azure_openai
export AZURE_OPENAI_API_KEY=your_api_key
export AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
export AZURE_OPENAI_DEPLOYMENT_NAME=your-embedding-deployment
python nbedr.py create-embeddings --source local --source-path ./documents
Ollama (Local & Free)
# Install and start Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve
# Pull an embedding model
ollama pull nomic-embed-text
# Configure NBEDR
export EMBEDDING_PROVIDER=ollama
export EMBEDDING_MODEL=nomic-embed-text
python nbedr.py create-embeddings --source local --source-path ./documents
📋 Complete Configuration Guide
OpenAI Configuration
# Provider selection
EMBEDDING_PROVIDER=openai
# Authentication
OPENAI_API_KEY=your_api_key_here
OPENAI_ORGANIZATION=your_org_id # Optional
# Model settings
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1536
# Performance
OPENAI_TIMEOUT=60
OPENAI_MAX_RETRIES=3
EMBEDDING_BATCH_SIZE=100
Available Models:
- text-embedding-3-large (3072 dims) - Highest quality, $0.00013/1K tokens
- text-embedding-3-small (1536 dims) - Best balance, $0.00002/1K tokens
- text-embedding-ada-002 (1536 dims) - Legacy, $0.0001/1K tokens
Azure OpenAI Configuration
# Provider selection
EMBEDDING_PROVIDER=azure_openai
# Authentication
AZURE_OPENAI_API_KEY=your_api_key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_VERSION=2024-02-01
# Deployment mapping
AZURE_OPENAI_DEPLOYMENT_NAME=your-embedding-deployment
# For multiple models (JSON format):
AZURE_OPENAI_DEPLOYMENT_MAPPING={"text-embedding-3-small": "embedding-small", "text-embedding-3-large": "embedding-large"}
# Model settings
EMBEDDING_MODEL=text-embedding-3-small
AWS Bedrock Configuration
# Provider selection
EMBEDDING_PROVIDER=aws_bedrock
# AWS credentials (or use IAM roles)
AWS_BEDROCK_REGION=us-east-1
AWS_BEDROCK_ACCESS_KEY_ID=your_access_key
AWS_BEDROCK_SECRET_ACCESS_KEY=your_secret_key
# Model settings
EMBEDDING_MODEL=amazon.titan-embed-text-v1
Available Models:
- amazon.titan-embed-text-v1 (1536 dims) - Amazon's embedding model
- amazon.titan-embed-text-v2:0 (1024 dims) - Latest Amazon model
- cohere.embed-english-v3 (1024 dims) - Cohere English embeddings
- cohere.embed-multilingual-v3 (1024 dims) - Cohere multilingual
Google Vertex AI Configuration
# Provider selection
EMBEDDING_PROVIDER=google_vertex
# Google Cloud settings
GOOGLE_VERTEX_PROJECT_ID=your-project-id
GOOGLE_VERTEX_LOCATION=us-central1
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# Model settings
EMBEDDING_MODEL=textembedding-gecko@003
Available Models:
- textembedding-gecko@003 (768 dims) - Latest Gecko model
- textembedding-gecko@002 (768 dims) - Previous version
- text-embedding-004 (768 dims) - Latest general model
- text-multilingual-embedding-002 (768 dims) - Multilingual support
LMStudio Configuration (Local)
# Provider selection
EMBEDDING_PROVIDER=lmstudio
# Server settings
LMSTUDIO_BASE_URL=http://localhost:1234
LMSTUDIO_API_KEY=optional_api_key # If you set one
# Model settings (use whatever model you loaded in LMStudio)
EMBEDDING_MODEL=your-loaded-model
Setup Steps:
- Download and install LMStudio
- Download an embedding model (like nomic-ai/nomic-embed-text-v1.5-GGUF)
- Load the model and start the local server
- Configure NBEDR with the settings above
Ollama Configuration (Local)
# Provider selection
EMBEDDING_PROVIDER=ollama
# Server settings
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_TIMEOUT=120
# Model settings
EMBEDDING_MODEL=nomic-embed-text
Setup Steps:
- Install Ollama: curl -fsSL https://ollama.ai/install.sh | sh
- Start Ollama: ollama serve
- Pull an embedding model: ollama pull nomic-embed-text
- Configure NBEDR with the settings above
Popular Embedding Models:
- nomic-embed-text (768 dims) - High-quality English embeddings
- mxbai-embed-large (1024 dims) - Large general-purpose model
- snowflake-arctic-embed (1024 dims) - Snowflake's model
- all-minilm (384 dims) - Lightweight multilingual
Llama.cpp Configuration (Local)
# Provider selection
EMBEDDING_PROVIDER=llamacpp
# Server settings
LLAMACPP_BASE_URL=http://localhost:8000
LLAMACPP_MODEL_NAME=your-model-name
LLAMACPP_DIMENSIONS=4096 # Set based on your model
# Authentication (if needed)
LLAMACPP_API_KEY=optional_api_key
Setup Steps:
- Install llama-cpp-python: pip install llama-cpp-python[server]
- Download a GGUF embedding model
- Start the server: python -m llama_cpp.server --model path/to/model.gguf --embedding
- Configure NBEDR with the settings above
🎯 Provider Selection Guide
Choose OpenAI when:
- You want the highest quality embeddings
- Cost is not the primary concern
- You need reliable, proven performance
- You're building a production application
Choose Azure OpenAI when:
- You're in an enterprise environment
- You need compliance guarantees (SOC 2, HIPAA)
- You're already using Azure services
- You need dedicated capacity and SLAs
Choose AWS Bedrock when:
- You're already using AWS services
- You want access to multiple model providers
- You need enterprise-grade security
- You prefer AWS pricing models
Choose Google Vertex AI when:
- You're using Google Cloud Platform
- You need integration with other Google AI services
- You want access to Google's latest models
- You're building multilingual applications
Choose LMStudio when:
- You're developing and testing locally
- You want an easy GUI for model management
- You need to experiment with different models
- You want local processing without complexity
Choose Ollama when:
- Privacy is paramount (data never leaves your machine)
- You want completely free operation
- You need offline capabilities
- You're comfortable with command-line tools
Choose Llama.cpp when:
- You need maximum control and customization
- You're doing research or advanced development
- You want to use custom or fine-tuned models
- Performance optimization is critical
💰 Cost Comparison
Provider | Cost Model | Example Cost (1M tokens) |
---|---|---|
OpenAI | Pay-per-token | $20-130 depending on model |
Azure OpenAI | Pay-per-token | Similar to OpenAI |
AWS Bedrock | Pay-per-token | $10-100 depending on model |
Google Vertex | Pay-per-token | $25-200 depending on model |
LMStudio | Free | $0 |
Ollama | Free | $0 |
Llama.cpp | Free | $0 |
🔒 Privacy & Security
Cloud Providers (OpenAI, Azure, AWS, Google)
- Data is sent to external servers
- Subject to provider's privacy policies
- Enterprise options available with enhanced security
- Data retention policies vary by provider
Local Providers (LMStudio, Ollama, Llama.cpp)
- Data never leaves your machine
- Complete privacy and control
- No internet required for processing
- Ideal for sensitive or proprietary content
🚀 Performance Characteristics
Provider | Latency | Throughput | Reliability |
---|---|---|---|
OpenAI | Low | High | Very High |
Azure OpenAI | Low | High | Very High |
AWS Bedrock | Medium | Medium | High |
Google Vertex | Low | High | High |
LMStudio | Very Low | Medium | Medium |
Ollama | Very Low | Medium | Medium |
Llama.cpp | Very Low | Variable | Medium |
📝 Customizing Embedding Prompts
NBEDR allows you to customize the prompts used for generating embeddings to improve quality and relevance for your specific domain and use case.
Quick Start
- Use Default Template: NBEDR includes a default embedding prompt template at templates/embedding_prompt_template.txt
- Set Custom Template Path: export EMBEDDING_PROMPT_TEMPLATE="templates/my_custom_template.txt"
- Or Configure in Environment: EMBEDDING_PROMPT_TEMPLATE=/path/to/your/custom_template.txt
Creating Custom Prompt Templates
Example Medical Domain Template (templates/medical_template.txt):
Generate embeddings for medical literature that capture clinical concepts effectively.
Focus on:
- Medical terminology and procedures: {content}
- Drug names, dosages, and interactions
- Symptoms, diagnoses, and treatment protocols
- Clinical outcomes and research findings
Document Type: {document_type}
Content: {content}
Metadata: {metadata}
Ensure embeddings enable accurate retrieval for medical information systems.
Example Legal Domain Template (templates/legal_template.txt):
Generate embeddings for legal documents optimized for legal research and analysis.
Focus on:
- Legal terminology and concepts
- Case citations and precedents: {content}
- Statutory references and regulations
- Contractual terms and legal obligations
Document Type: {document_type}
Chunk: {chunk_index} of document
Content: {content}
Prioritize legal concepts and relationships for accurate legal document retrieval.
Available Template Variables
Use these variables in your custom templates:
- {content}: The document content to be embedded
- {document_type}: File type (pdf, txt, json, pptx, etc.)
- {metadata}: Additional document metadata (file size, source, etc.)
- {chunk_index}: Index of the current chunk within the document
- {chunking_strategy}: The chunking method used (semantic, fixed, sentence)
Custom Variables
Add your own variables using the EMBEDDING_CUSTOM_PROMPT_VARIABLES environment variable:
export EMBEDDING_CUSTOM_PROMPT_VARIABLES='{"domain": "healthcare", "use_case": "clinical_research"}'
Then use them in your template:
Generate embeddings for {domain} content optimized for {use_case}.
Content: {content}
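Under the hood, template variables are simple placeholders that get substituted for each chunk. A minimal illustration using Python string formatting (not nBedR's internal code):
template = (
    "Generate embeddings for {domain} content optimized for {use_case}.\n"
    "Document Type: {document_type}\n"
    "Content: {content}"
)

prompt = template.format(
    domain="healthcare",
    use_case="clinical_research",
    document_type="pdf",
    content="Patients in the treatment group showed improved outcomes...",
)
print(prompt)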
Configuration Examples
Using Environment Variables:
# Set custom template
export EMBEDDING_PROMPT_TEMPLATE="templates/technical_docs_template.txt"
# Add custom variables
export EMBEDDING_CUSTOM_PROMPT_VARIABLES='{"company": "TechCorp", "product": "API"}'
# Run with custom prompts
python nbedr.py create-embeddings --datapath ./docs --doctype pdf
Using CLI Arguments:
python nbedr.py create-embeddings \
--datapath ./documents \
--doctype pdf \
--embedding-prompt-template templates/my_template.txt
Template Best Practices
- Be Domain-Specific: Include terminology and concepts specific to your field
- Provide Context: Explain the intended use case for the embeddings
- Keep It Focused: Avoid overly long prompts that might confuse the model
- Test and Iterate: Experiment with different prompts and measure embedding quality
- Use Variables: Leverage template variables for dynamic content insertion
Template Examples by Domain
See the templates/ directory for example templates:
- embedding_prompt_template.txt - Default general-purpose template
- templates/README.md - Complete template documentation with examples
Quick Domain Templates:
Technical Documentation:
export EMBEDDING_PROMPT_TEMPLATE="templates/tech_docs_template.txt"
Academic Research:
export EMBEDDING_PROMPT_TEMPLATE="templates/academic_template.txt"
Business Content:
export EMBEDDING_PROMPT_TEMPLATE="templates/business_template.txt"
Vector Databases
FAISS (Facebook AI Similarity Search)
- Best for: Local development, high-performance searches, full control
- Pros: Free, very fast, runs locally
- Cons: Requires technical setup, no cloud features
Pinecone
- Best for: Production applications, scaling, managed service
- Pros: Fully managed, excellent performance, built-in scaling
- Cons: Cost increases with usage
ChromaDB
- Best for: Open-source preference, flexibility, development
- Pros: Open source, good documentation, easy to extend
- Cons: Requires more setup than managed services
Azure AI Search (Enterprise Search Platform)
- Best for: Enterprise applications, Microsoft ecosystem, hybrid search
- Pros:
- Enterprise-grade: Built for large-scale enterprise applications
- Hybrid Search: Combines keyword search, semantic search, and vector search
- Rich Filtering: Advanced filtering, faceting, and aggregation capabilities
- Security & Compliance: Enterprise security, compliance certifications (SOC 2, HIPAA)
- Multi-modal: Supports text, images, and structured data
- Built-in AI: Integrated with Azure Cognitive Services for text analysis
- High Availability: 99.9% SLA with automatic failover
- Cons:
- Cost: Can be expensive for large-scale deployments
- Microsoft Lock-in: Best when already using Azure ecosystem
- Complexity: More complex setup compared to simple vector databases
- Learning Curve: Requires understanding of Azure services
AWS Elasticsearch (Amazon OpenSearch Service)
- Best for: AWS ecosystem, complex analytics, multi-purpose search
- Pros:
- AWS Integration: Seamless integration with other AWS services
- Mature Platform: Built on proven Elasticsearch technology
- Analytics Capabilities: Advanced analytics, visualizations with Kibana
- Flexible Deployment: Multiple instance types and configurations
- Cost-Effective Scaling: Pay-as-you-scale model
- Multi-tenancy: Support for multiple applications/indices
- Real-time Processing: Near real-time indexing and search
- Cons:
- AWS Lock-in: Vendor lock-in to AWS ecosystem
- Operational Overhead: Requires monitoring and maintenance
- Cost Complexity: Pricing can be complex with multiple factors
- Version Lag: May not have latest Elasticsearch features immediately
PGVector (PostgreSQL with pgvector extension)
- Best for: PostgreSQL shops, relational data integration, cost-conscious deployments
- Pros:
- Familiar Technology: Built on PostgreSQL, widely known and trusted
- ACID Compliance: Full transactional support and data consistency
- Cost-Effective: Use existing PostgreSQL infrastructure
- Rich Querying: Combine vector search with SQL joins and filters
- Self-Hosted: Complete control over data and infrastructure
- Active Development: Growing ecosystem and community support
- Backup & Recovery: Leverage PostgreSQL's robust backup solutions
- Cons:
- Performance Limitations: May not match specialized vector databases at scale
- Manual Setup: Requires PostgreSQL and pgvector extension installation
- Operational Overhead: Need to manage PostgreSQL maintenance and tuning
- Limited Tooling: Less specialized tooling compared to purpose-built vector DBs
Choosing the Right Vector Database
Decision Matrix
Factor | FAISS | Pinecone | ChromaDB | Azure AI Search | AWS Elasticsearch | PGVector |
---|---|---|---|---|---|---|
Setup Complexity | High | Low | Medium | High | Medium | Medium |
Cost | Free | Pay-per-use | Free | High | Variable | Low |
Performance | Excellent | Excellent | Good | Very Good | Good | Good |
Scalability | Manual | Automatic | Manual | Automatic | Semi-automatic | Manual |
Enterprise Features | None | Some | None | Extensive | Extensive | Some |
Multi-modal Support | No | Limited | No | Yes | Limited | No |
Analytics | No | Limited | No | Yes | Excellent | Limited |
Use Case Recommendations
Choose FAISS when:
- Building a prototype or research project
- Need maximum performance and control
- Have technical expertise for setup and maintenance
- Budget is limited
Choose Pinecone when:
- Want a simple, managed vector database
- Need to get to market quickly
- Prefer specialized vector search capabilities
- Have predictable usage patterns
Choose ChromaDB when:
- Prefer open-source solutions
- Need customization flexibility
- Building internal tools
- Want to avoid vendor lock-in
Choose Azure AI Search when:
- Already using Microsoft/Azure ecosystem
- Need enterprise-grade security and compliance
- Require hybrid search (keyword + semantic + vector)
- Building customer-facing applications
- Need rich filtering and faceting capabilities
- Have complex data types (text, images, structured data)
Choose AWS Elasticsearch when:
- Already using AWS ecosystem
- Need comprehensive analytics and dashboarding
- Have diverse data sources and types
- Require complex aggregations and reporting
- Want mature, battle-tested search technology
- Need multi-tenancy support
Choose PGVector when:
- Already using PostgreSQL as primary database
- Need to combine vector search with relational data
- Want cost-effective solution with existing infrastructure
- Require ACID compliance and transactional consistency
- Prefer self-hosted solutions
- Have existing PostgreSQL expertise in your team
Development
Prerequisites
nBedR requires Python 3.11 or higher. For detailed development setup instructions, see docs/DEVELOPMENT.md.
Note: The project includes a .python-version file that specifies Python 3.11 as the default. Tools like pyenv will automatically use this version.
Quick Setup
# Check Python version
python3.11 --version
# Automated setup
./scripts/setup_dev.sh
# Or manual setup
python3.11 -m venv venv
source venv/bin/activate
pip install -e .[dev,all]
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=core --cov-report=html
# Run specific test modules
pytest tests/unit/test_models.py
pytest tests/integration/test_document_service.py
Code Quality
# Format code
black .
isort .
# Type checking
mypy core/
# Linting
flake8 core/
bandit -r core/
Build and Release
For comprehensive build instructions, CI/CD pipeline details, release procedures, and deployment guidelines, see the Build Documentation.
Quick Build Commands
# Local development setup
pip install -e .[dev,all]
# Run tests and quality checks
pytest tests/ -v --cov=core --cov=cli
black . && isort . && flake8 .
# Build Python package
python -m build
# Build Docker container
docker build -f deployment/docker/Dockerfile -t nbedr:local .
Release Process
Releases are managed through GitHub Actions workflows:
- Automated CI/CD: Every push triggers comprehensive testing and building
- Manual Releases: Use GitHub Actions UI to trigger releases with automatic version management
- Multiple Artifacts: Releases include PyPI packages and Docker containers
- Changelog Integration: Release notes automatically include changelog content
For detailed release procedures and troubleshooting, see the Build Documentation.
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built on the foundation of the RAFT Toolkit
- Utilizes LangChain for text processing
- Powered by OpenAI embeddings
📚 Navigation: Home | Development | Deployment | Security
🔗 Links: Repository | Issues | Releases