nBedR - RAG Embedding Toolkit
A powerful tool for creating and managing embedding databases for Retrieval Augmented Generation (RAG) applications.
Table of Contents
- Overview
- Features
- Quick Start
- Understanding Chunking
- Configuration
- Using Your Embedding Database
- Advanced Configuration
- Development
- Build and Release
- Contributing
Overview
What Are Embeddings and Why Do They Matter?
Imagine you want to teach a computer to understand the meaning of words, sentences, and documents - not just the letters and spelling, but the actual meaning behind them. This is where embeddings come in.
Embeddings are like a universal translator for computers. They convert human language (text) into numbers that computers can understand and compare. Think of it like giving every piece of text a unique "fingerprint" made of numbers that captures its meaning.
Here's a simple analogy: If you were organizing books in a library, you wouldn't just sort them alphabetically. You'd group books by topic - science books near other science books, cooking books with other cooking books, etc. Embeddings do something similar but for any text - they help computers understand which pieces of text are similar in meaning, even if they use completely different words.
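To make this concrete, here is a tiny, illustrative Python sketch (not part of nBedR) showing how two embedding vectors can be compared with cosine similarity - the closer the score is to 1, the more similar the meanings:
import numpy as np

def cosine_similarity(a, b):
    # Measures how closely two embedding vectors point in the same direction
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "embeddings"; real models produce hundreds or thousands of dimensions
cat_food = np.array([0.9, 0.1, 0.2])
pet_nutrition = np.array([0.8, 0.2, 0.3])
car_engine = np.array([0.1, 0.9, 0.7])

print(cosine_similarity(cat_food, pet_nutrition))  # High score: related topics
print(cosine_similarity(cat_food, car_engine))     # Low score: unrelated topics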
How Embeddings Power Modern AI Applications
In Generative AI (GenAI) applications, embeddings are the secret sauce that makes AI systems truly intelligent. Here's how:
- Understanding Context: When you ask an AI assistant a question, embeddings help it understand what you're really asking about, not just match keywords.
- Finding Relevant Information: In Retrieval Augmented Generation (RAG) systems, embeddings help AI find the most relevant documents or passages to answer your questions, just like a very smart librarian who instantly knows which books contain information related to your question.
- Maintaining Consistency: Embeddings ensure that AI responses are grounded in actual facts from your documents, rather than hallucinating information.
The RAG Process Explained Simply
graph LR
A[Your Question] --> B[Find Similar Content]
B --> C[Retrieved Documents]
C --> D[AI Generates Answer]
D --> E[Factual Response]
F[Your Documents] --> G[Break into Chunks]
G --> H[Create Embeddings]
H --> I[Store in Database]
I --> B
- Preparation Phase: Your documents are broken into smaller pieces (chunks) and converted into embeddings
- Question Phase: When you ask a question, the system finds the most relevant chunks using embeddings
- Answer Phase: AI uses those relevant chunks to generate an accurate, factual answer
This application handles the preparation phase - the crucial foundation that makes everything else possible.
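As a rough sketch of that preparation phase (the helper names here are illustrative only, not the nBedR API), the work boils down to three steps:
# Hypothetical outline of the preparation phase - function names are placeholders
def prepare_documents(documents, chunk_size=512):
    chunks = []
    for doc in documents:
        # 1. Break each document into smaller pieces
        chunks.extend(split_into_chunks(doc, chunk_size))
    # 2. Convert every chunk into an embedding vector
    embeddings = embed(chunks)
    # 3. Store the vectors so questions can be matched against them later
    store_in_vector_database(chunks, embeddings)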
Why This Application Matters
This tool provides a streamlined solution for processing documents, generating embeddings, and storing them in vector databases. Built on the foundation of the excellent RAFT toolkit, it focuses specifically on embedding operations and vector database management - essentially preparing your documents so they can be used effectively in AI applications.
Architecture
graph TD
A[Document Sources] --> B[Document Service]
B --> C[Text Chunking]
C --> D[Embedding Generation]
D --> E[Vector Database]
A1[Local Files] --> A
A2[S3 Bucket] --> A
A3[SharePoint] --> A
E1[FAISS] --> E
E2[Pinecone] --> E
E3[ChromaDB] --> E
E4[Azure AI Search] --> E
E5[AWS Elasticsearch] --> E
E6[PGVector] --> E
F[Configuration] --> B
G[Rate Limiter] --> D
Features
🚀 Core Capabilities
- Multi-format Document Processing: PDF, TXT, JSON, PPTX, and more
- Advanced Chunking Strategies: Semantic, fixed-size, and sentence-aware chunking
- Multiple Vector Databases: FAISS, Pinecone, and ChromaDB support
- Parallel Processing: Efficient batch processing with configurable workers and multi-instance coordination
- Rate Limiting: Smart API rate limiting to prevent quota exhaustion
- Cloud Storage Integration: S3, Azure Blob, and SharePoint support
🔧 Document Sources
- Local Files: Process files from local directories
- Amazon S3: Direct integration with S3 buckets
- SharePoint: Microsoft SharePoint document libraries
- Batch Processing: Handle large document collections efficiently
🎯 Vector Database Support
- FAISS: High-performance similarity search and clustering
- Pinecone: Managed vector database service
- ChromaDB: Open-source embedding database
- Azure AI Search: Microsoft's enterprise search service with vector capabilities
- AWS Elasticsearch: Amazon's managed Elasticsearch with vector search support
- PGVector: PostgreSQL with pgvector extension for vector operations
Quick Start
System Requirements
- Python 3.11 or higher (Python 3.9 and 3.10 are no longer supported)
- 4GB+ RAM recommended for large document processing
- Sufficient disk space for vector databases (varies by collection size)
Installation
# Ensure you're using Python 3.11 or higher
python3.11 --version # Should show 3.11.x or higher
# Install from PyPI
pip install nbedr
# Or install from source
git clone https://github.com/your-org/nbedr.git
cd nbedr
pip install -e .
Basic Usage
- Set up your environment variables:
# Choose your embedding provider
export EMBEDDING_PROVIDER="openai"
export OPENAI_API_KEY="your-api-key-here"
# Choose your vector database
export VECTOR_DATABASE_TYPE="faiss"
- Process documents and create embeddings:
# Process local documents
nbedr create-embeddings \
--source local \
--local-path ./documents \
--output-path ./embeddings
# Process S3 documents
nbedr create-embeddings \
--source s3 \
--s3-bucket my-documents \
--output-path ./embeddings
- Search your embeddings:
nbedr search \
--query "What is machine learning?" \
--embeddings-path ./embeddings \
--top-k 5
Configuration
Basic configuration is handled through environment variables. Here are the essential settings:
Core Settings
# Embedding Provider
export EMBEDDING_PROVIDER="openai" # openai, azure_openai, aws_bedrock, etc.
export OPENAI_API_KEY="your-api-key-here"
export EMBEDDING_MODEL="text-embedding-3-large"
export EMBEDDING_DIMENSIONS=1536
# Vector Database
export VECTOR_DATABASE_TYPE="faiss" # faiss, pinecone, chromadb, etc.
export FAISS_INDEX_PATH="./embeddings_db"
# Document Processing
export CHUNK_SIZE=512
export CHUNKING_STRATEGY="semantic" # semantic, fixed_size, sentence_aware
export BATCH_SIZE=10
export MAX_WORKERS=4
Quick Provider Setup
OpenAI:
export OPENAI_API_KEY="your-api-key"
export OPENAI_ORGANIZATION="your-org-id" # Optional
Pinecone:
export PINECONE_API_KEY="your-api-key"
export PINECONE_ENVIRONMENT="your-environment"
export PINECONE_INDEX_NAME="your-index"
ChromaDB:
export CHROMA_HOST="localhost"
export CHROMA_PORT=8000
For detailed configuration options, advanced settings, and comprehensive setup guides, see the Advanced Configuration section below.
Advanced Configuration
This section covers advanced configuration options for production deployments, performance optimization, and specialized use cases.
Rate Limiting Configuration
Rate limiting is configured separately for embedding providers and vector databases to prevent API quota exhaustion and optimize performance:
Embedding Providers:
# Enable rate limiting for embedding providers
RATE_LIMIT_ENABLED=true
RATE_LIMIT_STRATEGY=sliding_window
RATE_LIMIT_REQUESTS_PER_MINUTE=500
RATE_LIMIT_TOKENS_PER_MINUTE=350000
RATE_LIMIT_MAX_BURST=100
Vector Databases:
# Enable rate limiting for vector store operations
VECTOR_STORE_RATE_LIMIT_ENABLED=true
VECTOR_STORE_RATE_LIMIT_STRATEGY=sliding_window
VECTOR_STORE_RATE_LIMIT_REQUESTS_PER_MINUTE=300
VECTOR_STORE_RATE_LIMIT_MAX_BURST=50
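For intuition, the sliding_window strategy can be pictured as keeping timestamps of recent requests and pausing once the window is full. A minimal, illustrative Python sketch (not nBedR's internal implementation):
import time
from collections import deque

class SlidingWindowLimiter:
    def __init__(self, requests_per_minute: int):
        self.limit = requests_per_minute
        self.window = deque()  # timestamps of recent requests

    def acquire(self):
        now = time.monotonic()
        # Drop timestamps older than 60 seconds
        while self.window and now - self.window[0] > 60:
            self.window.popleft()
        if len(self.window) >= self.limit:
            # Wait until the oldest request falls out of the window
            time.sleep(60 - (now - self.window[0]))
        self.window.append(time.monotonic())

limiter = SlidingWindowLimiter(requests_per_minute=500)
limiter.acquire()  # call before each embedding request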
Pre-configured Rate Limit Presets
nBedR includes optimized presets for popular services:
Service | Preset | RPM | TPM | Strategy | Description |
---|---|---|---|---|---|
OpenAI Tier 1 | openai_embeddings_tier1 | 500 | 350,000 | sliding_window | Standard OpenAI limits |
OpenAI Tier 2 | openai_embeddings_tier2 | 5,000 | 2,000,000 | sliding_window | Higher tier limits |
Azure OpenAI | azure_openai_standard | 120 | 240,000 | sliding_window | Standard deployment |
AWS Bedrock | aws_bedrock_titan | 2,000 | 400,000 | sliding_window | Titan embedding limits |
Google Vertex | google_vertex_gecko | 600 | 1,000,000 | sliding_window | Gecko model limits |
Local Providers | local_providers | 1,000 | N/A | sliding_window | Conservative local limits |
Best Practices for Rate Limiting Configuration
For Production Workloads:
- Use the sliding_window strategy for accuracy
- Set rates to 80% of your actual limits
- Enable burst handling for peak loads
- Monitor rate limit statistics regularly
For Development:
- Use the conservative preset for safety
- Enable detailed logging for debugging
- Test with small document sets first
For High-Volume Processing:
- Use the adaptive strategy for auto-tuning
- Configure multiple workers with shared rate limits
- Monitor response times and adjust accordingly
Cost Optimization:
- Set token limits to control embedding costs
- Use local providers for development
- Batch documents efficiently to minimize API calls
Advanced Configuration Topics
- Rate Limiting Configuration
- Parallel Processing and Multi-Instance Deployment
- Detailed Embedding Provider Configurations
- Advanced Vector Database Configurations
- Advanced Chunking Strategies
Rate Limiting Configuration Examples
Conservative Setup (Safe for Testing):
RATE_LIMIT_ENABLED=true
RATE_LIMIT_STRATEGY=sliding_window
RATE_LIMIT_REQUESTS_PER_MINUTE=60
RATE_LIMIT_TOKENS_PER_MINUTE=50000
RATE_LIMIT_MAX_BURST=10
Production Setup (OpenAI Tier 1):
RATE_LIMIT_ENABLED=true
RATE_LIMIT_STRATEGY=sliding_window
RATE_LIMIT_REQUESTS_PER_MINUTE=400 # 80% of 500 limit
RATE_LIMIT_TOKENS_PER_MINUTE=280000 # 80% of 350k limit
RATE_LIMIT_MAX_BURST=80
RATE_LIMIT_TARGET_RESPONSE_TIME=2.0
High-Volume Setup (Adaptive):
RATE_LIMIT_ENABLED=true
RATE_LIMIT_STRATEGY=adaptive
RATE_LIMIT_REQUESTS_PER_MINUTE=1000
RATE_LIMIT_TOKENS_PER_MINUTE=800000
RATE_LIMIT_TARGET_RESPONSE_TIME=1.5
RATE_LIMIT_MAX_RESPONSE_TIME=5.0
Monitoring Rate Limiting
nBedR provides detailed rate limiting statistics accessible through the application:
- Total Requests: Number of API calls made
- Rate Limit Hits: How many times rate limiting was applied
- Average Response Time: Performance monitoring
- Current Rate: Real-time rate limiting status
- Wait Time: Total time spent waiting due to rate limits
Use these metrics to optimize your rate limiting configuration for your specific workload and API tier.
Parallel Processing and Multi-Instance Deployment
nBedR supports running multiple instances in parallel to dramatically speed up document processing for large datasets. The application includes sophisticated coordination mechanisms to prevent conflicts and ensure safe concurrent operation.
Why Run Multiple Instances?
When processing thousands of documents, a single instance can become a bottleneck. Multiple instances provide:
- Faster Processing: Parallel document processing across multiple CPU cores
- Higher Throughput: Multiple embedding API calls running simultaneously
- Fault Tolerance: If one instance fails, others continue processing
- Resource Utilization: Better utilization of available CPU, memory, and network bandwidth
Instance Coordination System
nBedR automatically coordinates multiple instances to prevent conflicts:
Conflict Detection:
- Detects when multiple instances would write to the same output paths
- Prevents concurrent access to the same vector database files
- Validates configuration compatibility between instances
Automatic Path Separation:
- Generates instance-specific output directories
- Creates separate vector database paths for each instance
- Ensures no file conflicts between concurrent instances
Resource Coordination:
- Distributes rate limits fairly across all running instances
- Coordinates API quota usage to prevent rate limit violations
- Shares performance metrics for optimal load balancing
Running Multiple Instances
Basic Parallel Deployment:
# Terminal 1 - Instance 1
nbedr create-embeddings --datapath ./docs1 --output ./output1
# Terminal 2 - Instance 2
nbedr create-embeddings --datapath ./docs2 --output ./output2
# Terminal 3 - Instance 3
nbedr create-embeddings --datapath ./docs3 --output ./output3
Shared Dataset Processing:
# All instances process the same dataset with automatic coordination
# Instance paths are automatically separated
# Terminal 1
nbedr create-embeddings --datapath ./large_dataset
# Terminal 2
nbedr create-embeddings --datapath ./large_dataset
# Terminal 3
nbedr create-embeddings --datapath ./large_dataset
Custom Instance Configuration:
# Disable coordination for specific use cases
nbedr create-embeddings --disable-coordination --datapath ./docs
# List all active instances
nbedr create-embeddings --list-instances
# Use specific instance ID
nbedr create-embeddings --instance-id my-custom-instance --datapath ./docs
Instance Management
Monitor Active Instances:
# List all running instances
nbedr create-embeddings --list-instances
Environment Variables for Coordination:
# Disable coordination system
NBEDR_DISABLE_COORDINATION=true
# Custom coordination directory
NBEDR_COORDINATION_DIR=/tmp/nbedr_coordination
# Instance heartbeat interval (seconds)
NBEDR_HEARTBEAT_INTERVAL=60
Rate Limiting with Multiple Instances
When multiple instances run simultaneously, rate limits are automatically distributed:
Single Instance:
- 500 requests per minute → 500 RPM for the instance
Three Instances:
- 500 requests per minute → 166 RPM per instance (500/3)
- Prevents collective rate limit violations
- Ensures fair resource distribution
Manual Rate Limit Override:
# Set per-instance rate limits manually
RATE_LIMIT_REQUESTS_PER_MINUTE=100
RATE_LIMIT_TOKENS_PER_MINUTE=50000
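A quick sketch of the arithmetic behind the automatic distribution (illustrative only):
def per_instance_rate(total_rpm: int, num_instances: int) -> int:
    # Each instance gets an equal share of the shared quota, rounded down
    return total_rpm // num_instances

print(per_instance_rate(500, 1))  # 500 RPM for a single instance
print(per_instance_rate(500, 3))  # 166 RPM per instance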
Best Practices for Parallel Processing
Data Organization:
- Split large datasets into balanced chunks for each instance
- Use different source directories to avoid file locking conflicts
- Consider document types and sizes when distributing work
Resource Planning:
- Monitor CPU usage - optimal is typically 2-4 instances per CPU core
- Watch memory consumption - each instance loads its own models
- Consider network bandwidth for API-heavy operations
Error Handling:
- Each instance fails independently without affecting others
- Use consistent configuration across all instances
- Monitor logs from all instances for comprehensive debugging
Production Deployment:
# Use process managers like systemd or supervisor
systemctl start nbedr-instance-1
systemctl start nbedr-instance-2
systemctl start nbedr-instance-3
# Or container orchestration
docker run -d nbedr:latest --datapath /data/batch1
docker run -d nbedr:latest --datapath /data/batch2
docker run -d nbedr:latest --datapath /data/batch3
Troubleshooting Parallel Execution
Common Issues:
- Path Conflicts
  # Error: Multiple instances writing to same path
  # Solution: Use automatic coordination or specify different paths
- Rate Limit Violations
  # Error: Combined instances exceed API limits
  # Solution: Reduce per-instance rate limits or number of instances
- Vector Database Locks
  # Error: FAISS index file locked
  # Solution: Ensure each instance uses separate index paths
Debugging Commands:
# Check active instances
nbedr create-embeddings --list-instances
# View coordination logs
tail -f /tmp/nbedr_coordination/coordination.log
# Test configuration without running
nbedr create-embeddings --validate --datapath ./docs
Detailed Embedding Provider Configurations
For comprehensive configuration options for all 7 embedding providers, see the Embedding Providers section below.
Advanced Vector Database Configurations
For detailed vector database configuration options and selection guidance, see the Vector Databases section below.
Advanced Chunking Strategies
For detailed chunking configuration and optimization, see the Understanding Chunking section below.
Using Your Embedding Database
Once you've created your embedding database with NBEDR, you can integrate it into RAG applications and chatbots. Here's how to query and utilize your embeddings effectively.
🔍 RAG Query Flow
graph LR
A[User Question] --> B[Generate Query Embedding]
B --> C[Search Vector Database]
C --> D[Retrieve Similar Chunks]
D --> E[Add to LLM Context]
E --> F[Generate Answer]
F --> G[Return Response]
🚀 Direct Search with NBEDR CLI
# Search your embedding database
python nbedr.py search \
--query "How do I configure SSL certificates?" \
--vector-db faiss \
--index-path ./embeddings_db \
--top-k 5
# Advanced search with filters
python nbedr.py search \
--query "database optimization techniques" \
--vector-db pgvector \
--filters '{"source": "technical-docs"}' \
--top-k 10
💻 Programmatic Integration Examples
Simple RAG Pipeline
from core.vector_stores import FAISSVectorStore
from core.clients import create_provider_from_config
from core.config import get_config
# Load configuration and initialize components
config = get_config()
embedding_provider = create_provider_from_config(config)
vector_store = FAISSVectorStore({'faiss_index_path': './embeddings_db'})
async def answer_question(question: str) -> str:
    # 1. Generate embedding for the question
    result = await embedding_provider.generate_embeddings([question])
    query_embedding = result.embeddings[0]

    # 2. Search for similar documents
    search_results = await vector_store.search(
        query_embedding=query_embedding,
        top_k=5
    )

    # 3. Combine retrieved chunks into context for the LLM
    context = "\n\n".join([r.content for r in search_results])

    # 4. Generate answer with your LLM (add your LLM call here)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    answer = await call_your_llm(prompt)  # call_your_llm is a placeholder for your own LLM client
    return answer
Chatbot Integration
import asyncio
from typing import List, Dict
class RAGChatbot:
    def __init__(self, vector_store, embedding_provider):
        self.vector_store = vector_store
        self.embedding_provider = embedding_provider
        self.conversation_history = []

    async def chat(self, message: str) -> str:
        # Generate embedding for user message
        embedding_result = await self.embedding_provider.generate_embeddings([message])
        query_embedding = embedding_result.embeddings[0]

        # Search for relevant context
        search_results = await self.vector_store.search(
            query_embedding=query_embedding,
            top_k=3
        )

        # Build context with conversation history
        context_chunks = [f"Document: {r.content}" for r in search_results]
        recent_history = self.conversation_history[-4:]  # Last 2 exchanges

        # Combine for LLM prompt
        context = "\n".join(context_chunks)
        history = "\n".join([f"{h['role']}: {h['content']}" for h in recent_history])

        # Generate response (add your LLM integration)
        response = await self.generate_llm_response(context, history, message)

        # Update conversation history
        self.conversation_history.extend([
            {"role": "user", "content": message},
            {"role": "assistant", "content": response}
        ])
        return response
Batch Document Processing
async def process_user_queries(queries: List[str]) -> List[Dict]:
    """Process multiple queries efficiently."""
    # Generate embeddings for all queries at once
    embedding_result = await embedding_provider.generate_embeddings(queries)

    results = []
    for i, query_embedding in enumerate(embedding_result.embeddings):
        # Search for each query
        search_results = await vector_store.search(
            query_embedding=query_embedding,
            top_k=5
        )
        results.append({
            'query': queries[i],
            'matches': [
                {
                    'content': r.content,
                    'source': r.source,
                    'similarity': r.similarity_score
                }
                for r in search_results
            ]
        })
    return results
🎯 Database-Specific Usage
FAISS (Local)
# Load and search FAISS index
from core.vector_stores import FAISSVectorStore
store = FAISSVectorStore({'faiss_index_path': './my_embeddings'})
await store.initialize()
results = await store.search(query_embedding, top_k=10)
Pinecone (Cloud)
# Search Pinecone index
from core.vector_stores import PineconeVectorStore
store = PineconeVectorStore({
'pinecone_api_key': 'your-key',
'pinecone_environment': 'your-env',
'pinecone_index_name': 'rag-embeddings'
})
results = await store.search(
query_embedding=query_embedding,
top_k=5,
filters={'source': 'documentation'}
)
PGVector (SQL)
# Combine vector search with SQL queries
from core.vector_stores import PGVectorStore
store = PGVectorStore({
'pgvector_host': 'localhost',
'pgvector_database': 'vectordb',
'pgvector_user': 'postgres',
'pgvector_password': 'password'
})
# Search with metadata filters
results = await store.search(
query_embedding=query_embedding,
top_k=10,
filters={'metadata.document_type': 'manual'}
)
🔧 Advanced Usage Patterns
Hybrid Search (Keyword + Semantic)
async def hybrid_search(query: str, keywords: List[str]) -> List[Dict]:
    # Semantic search
    embedding_result = await embedding_provider.generate_embeddings([query])
    semantic_results = await vector_store.search(
        query_embedding=embedding_result.embeddings[0],
        top_k=20
    )

    # Keyword filtering
    keyword_filtered = [
        r for r in semantic_results
        if any(keyword.lower() in r.content.lower() for keyword in keywords)
    ]
    return keyword_filtered[:10]
Contextual Chunk Assembly
async def get_expanded_context(query: str, expand_chunks: int = 2) -> str:
    # Find relevant chunks
    embedding_result = await embedding_provider.generate_embeddings([query])
    results = await vector_store.search(
        query_embedding=embedding_result.embeddings[0],
        top_k=5
    )

    # Group by source and expand context
    context_blocks = []
    for result in results:
        # Get neighboring chunks for better context (implement this helper for your storage layout)
        expanded_context = await get_neighboring_chunks(
            result.source,
            result.id,
            expand_chunks
        )
        context_blocks.append(expanded_context)
    return "\n\n---\n\n".join(context_blocks)
📊 Performance Optimization
Embedding Caching
import hashlib
from typing import List

class CachedEmbeddingProvider:
    def __init__(self, provider):
        self.provider = provider
        self.cache = {}

    async def generate_embeddings(self, texts: List[str]):
        # Hash the input texts to build a cache key
        cache_key = hashlib.md5('|'.join(texts).encode()).hexdigest()
        if cache_key in self.cache:
            return self.cache[cache_key]
        result = await self.provider.generate_embeddings(texts)
        self.cache[cache_key] = result
        return result
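Wrapping an existing provider is then a one-liner; repeated calls with the same texts are served from the in-memory cache:
cached_provider = CachedEmbeddingProvider(embedding_provider)
result = await cached_provider.generate_embeddings(["What is machine learning?"])
repeat = await cached_provider.generate_embeddings(["What is machine learning?"])  # served from cache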
🎨 Integration with Popular Frameworks
LangChain Integration
import asyncio
from typing import List

from langchain.embeddings.base import Embeddings

class NBEDREmbeddings(Embeddings):
    def __init__(self, provider):
        self.provider = provider

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        result = asyncio.run(self.provider.generate_embeddings(texts))
        return result.embeddings

    def embed_query(self, text: str) -> List[float]:
        result = asyncio.run(self.provider.generate_embeddings([text]))
        return result.embeddings[0]
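A minimal usage sketch, assuming the classic langchain.vectorstores.FAISS interface (adjust the import path to your LangChain version):
from langchain.vectorstores import FAISS

texts = ["First document chunk", "Second document chunk"]
langchain_store = FAISS.from_texts(texts, NBEDREmbeddings(embedding_provider))
matches = langchain_store.similarity_search("example query", k=3)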
Quick Start
Installation
# Clone the repository
git clone https://github.com/your-org/nbedr.git
cd nbedr
# Install dependencies
pip install -r requirements.txt
# Or install with optional dependencies
pip install -e .[cloud,dev]
Basic Usage
- Create embeddings from local documents:
python nbedr.py create-embeddings \
--source local \
--source-path ./documents \
--vector-db faiss \
--output-path ./embeddings_db
- Process documents from S3:
python nbedr.py create-embeddings \
--source s3 \
--source-path s3://my-bucket/documents/ \
--vector-db pinecone \
--pinecone-index my-index
- Use Azure AI Search:
python nbedr.py create-embeddings \
--source local \
--source-path ./documents \
--vector-db azure_ai_search \
--azure-search-service your-service-name \
--azure-search-index rag-embeddings
- Use AWS Elasticsearch:
python nbedr.py create-embeddings \
--source local \
--source-path ./documents \
--vector-db aws_elasticsearch \
--aws-elasticsearch-endpoint https://your-domain.region.es.amazonaws.com
- Use PGVector:
python nbedr.py create-embeddings \
--source local \
--source-path ./documents \
--vector-db pgvector \
--pgvector-host localhost \
--pgvector-database vectordb
- Search for similar documents:
python nbedr.py search \
--query "machine learning algorithms" \
--vector-db faiss \
--index-path ./embeddings_db \
--top-k 5
Configuration
Create a .env file with your configuration:
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key
EMBEDDING_MODEL=text-embedding-ada-002
EMBEDDING_DIMENSIONS=1536
# Vector Database Configuration
VECTOR_DB_TYPE=faiss
FAISS_INDEX_PATH=./embeddings_db
# Pinecone Configuration (if using Pinecone)
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_ENVIRONMENT=your_environment
PINECONE_INDEX_NAME=your_index_name
# ChromaDB Configuration (if using ChromaDB)
CHROMA_HOST=localhost
CHROMA_PORT=8000
# Azure AI Search Configuration (if using Azure AI Search)
AZURE_SEARCH_SERVICE_NAME=your_search_service
AZURE_SEARCH_API_KEY=your_api_key
AZURE_SEARCH_INDEX_NAME=rag-embeddings
# AWS Elasticsearch Configuration (if using AWS Elasticsearch)
AWS_ELASTICSEARCH_ENDPOINT=https://your-domain.region.es.amazonaws.com
AWS_ELASTICSEARCH_REGION=us-east-1
AWS_ELASTICSEARCH_INDEX_NAME=rag-embeddings
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
# PGVector Configuration (if using PGVector)
PGVECTOR_HOST=localhost
PGVECTOR_PORT=5432
PGVECTOR_DATABASE=vectordb
PGVECTOR_USER=postgres
PGVECTOR_PASSWORD=your_postgres_password
PGVECTOR_TABLE_NAME=rag_embeddings
# Processing Configuration
CHUNK_SIZE=512
CHUNKING_STRATEGY=semantic
BATCH_SIZE=10
MAX_WORKERS=4
Advanced Usage
Custom Chunking Strategies
# Semantic chunking (uses embeddings to determine boundaries)
python nbedr.py create-embeddings \
--chunking-strategy semantic \
--chunk-size 512 \
--source local \
--source-path ./docs
# Fixed-size chunking
python nbedr.py create-embeddings \
--chunking-strategy fixed \
--chunk-size 1000 \
--chunk-overlap 100 \
--source local \
--source-path ./docs
Batch Processing
# Process large document collections
python nbedr.py create-embeddings \
--source s3 \
--source-path s3://large-corpus/ \
--batch-size 50 \
--max-workers 8 \
--rate-limit-requests 100 \
--rate-limit-period 60
Preview Mode
# Preview what will be processed without actually doing it
python nbedr.py create-embeddings \
--source local \
--source-path ./documents \
--preview
API Usage
from core.services.document_service import DocumentService
from core.config import EmbeddingConfig
# Initialize configuration
config = EmbeddingConfig()
# Create document service
service = DocumentService(config)
# Process documents
results = await service.process_documents(
source_path="./documents",
source_type="local"
)
print(f"Processed {len(results.chunks)} chunks")
print(f"Generated {len(results.embeddings)} embeddings")
Understanding Chunking: The Art of Breaking Down Documents
Why Chunking Matters
Think of chunking like cutting a pizza into slices - you need pieces that are the right size to be useful. If the slices are too big, they're hard to handle and contain too much mixed information. If they're too small, you lose important context and meaning.
When processing documents for AI, chunking determines how well your AI system can find and use relevant information. Good chunking means better, more accurate AI responses.
Chunking Strategies Explained
🎯 Semantic Chunking (Recommended for Most Use Cases)
What it does: Uses AI to understand the meaning and flow of your text, then creates natural breakpoints where topics change.
Think of it like: A smart editor who reads your document and says "this paragraph is about marketing, but this next section switches to finance" and makes a cut there.
Best for:
- Mixed content (reports, manuals, articles)
- Documents with varying topic sections
- When you want the highest quality results
Configuration:
CHUNKING_STRATEGY=semantic
CHUNK_SIZE=512
📏 Fixed-Size Chunking (Most Predictable)
What it does: Creates chunks of exactly the same size, like cutting a rope into equal lengths.
Think of it like: Using a ruler to mark off exact measurements - every piece is the same size.
Best for:
- Consistent document types (legal docs, technical manuals)
- When you need predictable processing times
- Large volumes of similar content
Configuration:
CHUNKING_STRATEGY=fixed
CHUNK_SIZE=1000
CHUNK_OVERLAP=100
📝 Sentence-Aware Chunking (Natural Boundaries)
What it does: Breaks text at sentence endings, keeping complete thoughts together.
Think of it like: A careful reader who never cuts off someone mid-sentence.
Best for:
- Narrative content (stories, case studies)
- Interview transcripts
- Conversational content
Configuration:
CHUNKING_STRATEGY=sentence
CHUNK_SIZE=500
Chunk Size and Overlap: Finding the Sweet Spot
Chunk Size Guidelines
Content Type | Recommended Size | Why |
---|---|---|
Technical Documentation | 800-1200 tokens | Complex concepts need more context |
Marketing Content | 400-600 tokens | Concise, focused messages |
Legal Documents | 1000-1500 tokens | Detailed context is crucial |
News Articles | 300-500 tokens | Quick, digestible information |
Academic Papers | 600-1000 tokens | Balance between detail and focus |
Token Rule of Thumb: 1 token ≈ 0.75 words in English, so 500 tokens ≈ 375 words
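A rough helper based on that rule of thumb (an approximation only; actual counts depend on the tokenizer your model uses):
def estimate_tokens(text: str) -> int:
    # 1 token ≈ 0.75 words, so tokens ≈ words / 0.75
    return round(len(text.split()) / 0.75)

print(estimate_tokens("the quick brown fox jumps over the lazy dog"))  # ~12 tokens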
Overlap: The Safety Net
What is overlap?: When chunks share some content at their boundaries, like overlapping roof tiles.
Why use overlap?:
- Prevents Context Loss: Important information spanning chunk boundaries isn't lost
- Improves Search: Better chance of finding relevant information
- Maintains Meaning: Keeps related concepts together
Overlap Guidelines:
- Standard: 10-20% of chunk size
- High Precision Needed: 20-30% overlap
- Performance Focused: 5-10% overlap
# Example: 1000 token chunks with 20% overlap
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
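To illustrate how size and overlap interact, here is a simplified fixed-size chunker that operates on words (nBedR's strategies are more sophisticated; this is just for intuition):
def fixed_size_chunks(text: str, chunk_size: int = 1000, overlap: int = 200):
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        chunks.append(" ".join(chunk))
        if start + chunk_size >= len(words):
            break
    return chunks

# 1000-word chunks that share 200 words with their neighbours ("document.txt" is a placeholder path)
chunks = fixed_size_chunks(open("document.txt").read(), chunk_size=1000, overlap=200)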
Configuration Impact Guide
Chunk Size Impact
Larger Chunks (1000+ tokens):
- ✅ Pros: More context, better for complex topics, fewer total chunks
- ❌ Cons: Less precise retrieval, higher costs, slower processing
Smaller Chunks (300-500 tokens):
- ✅ Pros: More precise retrieval, faster processing, lower costs
- ❌ Cons: May lose context, more chunks to manage
Overlap Impact
High Overlap (25%+):
- ✅ Pros: Better information preservation, improved search accuracy
- ❌ Cons: More storage needed, increased processing time
Low Overlap (5-10%):
- ✅ Pros: Efficient storage, faster processing
- ❌ Cons: Risk of losing information at boundaries
Recommended Configurations by Use Case
Customer Support Knowledge Base
CHUNKING_STRATEGY=semantic
CHUNK_SIZE=600
CHUNK_OVERLAP=120
Why: Balances quick answers with sufficient context
Legal Document Analysis
CHUNKING_STRATEGY=fixed
CHUNK_SIZE=1200
CHUNK_OVERLAP=300
Why: Maintains legal context integrity with high overlap
Product Documentation
CHUNKING_STRATEGY=semantic
CHUNK_SIZE=800
CHUNK_OVERLAP=160
Why: Keeps procedures and concepts together
News and Media Content
CHUNKING_STRATEGY=sentence
CHUNK_SIZE=400
CHUNK_OVERLAP=80
Why: Preserves story flow and readability
Performance Considerations
Cost Optimization:
- Smaller chunks = Lower embedding costs
- Less overlap = Lower storage costs
- Batch processing = Better rate limits
Quality Optimization:
- Semantic chunking = Best understanding
- Higher overlap = Better information retention
- Larger chunks = More context for complex topics
Speed Optimization:
- Fixed chunking = Fastest processing
- Smaller chunks = Faster search
- Lower overlap = Less processing time
Configuration Options
Embedding Providers: Choose Your AI Platform
NBEDR supports 7 different embedding providers, from major cloud platforms to local solutions. This gives you complete flexibility to choose the right solution for your needs, budget, and privacy requirements.
🌟 Provider Overview
Provider | Type | Best For | Cost | Privacy | Setup |
---|---|---|---|---|---|
OpenAI | Cloud | Production, quality | Pay-per-use | Shared | Easy |
Azure OpenAI | Cloud | Enterprise, compliance | Pay-per-use | Enterprise | Medium |
AWS Bedrock | Cloud | AWS ecosystem | Pay-per-use | Enterprise | Medium |
Google Vertex AI | Cloud | Google ecosystem | Pay-per-use | Enterprise | Medium |
LMStudio | Local | Development, testing | Free | Complete | Easy |
Ollama | Local | Privacy, offline use | Free | Complete | Easy |
Llama.cpp | Local | Custom models, research | Free | Complete | Hard |
🚀 Quick Start by Provider
OpenAI (Recommended for Most Users)
# Set your API key
export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=your_api_key_here
export EMBEDDING_MODEL=text-embedding-3-small
# Run embedding generation
python nbedr.py create-embeddings --source local --source-path ./documents
Azure OpenAI (Enterprise)
# Configure Azure OpenAI
export EMBEDDING_PROVIDER=azure_openai
export AZURE_OPENAI_API_KEY=your_api_key
export AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
export AZURE_OPENAI_DEPLOYMENT_NAME=your-embedding-deployment
python nbedr.py create-embeddings --source local --source-path ./documents
Ollama (Local & Free)
# Install and start Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve
# Pull an embedding model
ollama pull nomic-embed-text
# Configure NBEDR
export EMBEDDING_PROVIDER=ollama
export EMBEDDING_MODEL=nomic-embed-text
python nbedr.py create-embeddings --source local --source-path ./documents
📋 Complete Configuration Guide
OpenAI Configuration
# Provider selection
EMBEDDING_PROVIDER=openai
# Authentication
OPENAI_API_KEY=your_api_key_here
OPENAI_ORGANIZATION=your_org_id # Optional
# Model settings
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1536
# Performance
OPENAI_TIMEOUT=60
OPENAI_MAX_RETRIES=3
EMBEDDING_BATCH_SIZE=100
Available Models:
- text-embedding-3-large (3072 dims) - Highest quality, $0.00013/1K tokens
- text-embedding-3-small (1536 dims) - Best balance, $0.00002/1K tokens
- text-embedding-ada-002 (1536 dims) - Legacy, $0.0001/1K tokens
Azure OpenAI Configuration
# Provider selection
EMBEDDING_PROVIDER=azure_openai
# Authentication
AZURE_OPENAI_API_KEY=your_api_key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_VERSION=2024-02-01
# Deployment mapping
AZURE_OPENAI_DEPLOYMENT_NAME=your-embedding-deployment
# For multiple models (JSON format):
AZURE_OPENAI_DEPLOYMENT_MAPPING={"text-embedding-3-small": "embedding-small", "text-embedding-3-large": "embedding-large"}
# Model settings
EMBEDDING_MODEL=text-embedding-3-small
AWS Bedrock Configuration
# Provider selection
EMBEDDING_PROVIDER=aws_bedrock
# AWS credentials (or use IAM roles)
AWS_BEDROCK_REGION=us-east-1
AWS_BEDROCK_ACCESS_KEY_ID=your_access_key
AWS_BEDROCK_SECRET_ACCESS_KEY=your_secret_key
# Model settings
EMBEDDING_MODEL=amazon.titan-embed-text-v1
Available Models:
- amazon.titan-embed-text-v1 (1536 dims) - Amazon's embedding model
- amazon.titan-embed-text-v2:0 (1024 dims) - Latest Amazon model
- cohere.embed-english-v3 (1024 dims) - Cohere English embeddings
- cohere.embed-multilingual-v3 (1024 dims) - Cohere multilingual
Google Vertex AI Configuration
# Provider selection
EMBEDDING_PROVIDER=google_vertex
# Google Cloud settings
GOOGLE_VERTEX_PROJECT_ID=your-project-id
GOOGLE_VERTEX_LOCATION=us-central1
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# Model settings
EMBEDDING_MODEL=textembedding-gecko@003
Available Models:
- textembedding-gecko@003 (768 dims) - Latest Gecko model
- textembedding-gecko@002 (768 dims) - Previous version
- text-embedding-004 (768 dims) - Latest general model
- text-multilingual-embedding-002 (768 dims) - Multilingual support
LMStudio Configuration (Local)
# Provider selection
EMBEDDING_PROVIDER=lmstudio
# Server settings
LMSTUDIO_BASE_URL=http://localhost:1234
LMSTUDIO_API_KEY=optional_api_key # If you set one
# Model settings (use whatever model you loaded in LMStudio)
EMBEDDING_MODEL=your-loaded-model
Setup Steps:
- Download and install LMStudio
- Download an embedding model (like nomic-ai/nomic-embed-text-v1.5-GGUF)
- Load the model and start the local server
- Configure NBEDR with the settings above
Ollama Configuration (Local)
# Provider selection
EMBEDDING_PROVIDER=ollama
# Server settings
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_TIMEOUT=120
# Model settings
EMBEDDING_MODEL=nomic-embed-text
Setup Steps:
- Install Ollama: curl -fsSL https://ollama.ai/install.sh | sh
- Start Ollama: ollama serve
- Pull an embedding model: ollama pull nomic-embed-text
- Configure NBEDR with the settings above
Popular Embedding Models:
- nomic-embed-text (768 dims) - High-quality English embeddings
- mxbai-embed-large (1024 dims) - Large general-purpose model
- snowflake-arctic-embed (1024 dims) - Snowflake's model
- all-minilm (384 dims) - Lightweight multilingual
Llama.cpp Configuration (Local)
# Provider selection
EMBEDDING_PROVIDER=llamacpp
# Server settings
LLAMACPP_BASE_URL=http://localhost:8000
LLAMACPP_MODEL_NAME=your-model-name
LLAMACPP_DIMENSIONS=4096 # Set based on your model
# Authentication (if needed)
LLAMACPP_API_KEY=optional_api_key
Setup Steps:
- Install llama-cpp-python: pip install llama-cpp-python[server]
- Download a GGUF embedding model
- Start the server: python -m llama_cpp.server --model path/to/model.gguf --embedding
- Configure NBEDR with the settings above
🎯 Provider Selection Guide
Choose OpenAI when:
- You want the highest quality embeddings
- Cost is not the primary concern
- You need reliable, proven performance
- You're building a production application
Choose Azure OpenAI when:
- You're in an enterprise environment
- You need compliance guarantees (SOC 2, HIPAA)
- You're already using Azure services
- You need dedicated capacity and SLAs
Choose AWS Bedrock when:
- You're already using AWS services
- You want access to multiple model providers
- You need enterprise-grade security
- You prefer AWS pricing models
Choose Google Vertex AI when:
- You're using Google Cloud Platform
- You need integration with other Google AI services
- You want access to Google's latest models
- You're building multilingual applications
Choose LMStudio when:
- You're developing and testing locally
- You want an easy GUI for model management
- You need to experiment with different models
- You want local processing without complexity
Choose Ollama when:
- Privacy is paramount (data never leaves your machine)
- You want completely free operation
- You need offline capabilities
- You're comfortable with command-line tools
Choose Llama.cpp when:
- You need maximum control and customization
- You're doing research or advanced development
- You want to use custom or fine-tuned models
- Performance optimization is critical
💰 Cost Comparison
Provider | Cost Model | Example Cost (1M tokens) |
---|---|---|
OpenAI | Pay-per-token | $20-130 depending on model |
Azure OpenAI | Pay-per-token | Similar to OpenAI |
AWS Bedrock | Pay-per-token | $10-100 depending on model |
Google Vertex | Pay-per-token | $25-200 depending on model |
LMStudio | Free | $0 |
Ollama | Free | $0 |
Llama.cpp | Free | $0 |
🔒 Privacy & Security
Cloud Providers (OpenAI, Azure, AWS, Google)
- Data is sent to external servers
- Subject to provider's privacy policies
- Enterprise options available with enhanced security
- Data retention policies vary by provider
Local Providers (LMStudio, Ollama, Llama.cpp)
- Data never leaves your machine
- Complete privacy and control
- No internet required for processing
- Ideal for sensitive or proprietary content
🚀 Performance Characteristics
Provider | Latency | Throughput | Reliability |
---|---|---|---|
OpenAI | Low | High | Very High |
Azure OpenAI | Low | High | Very High |
AWS Bedrock | Medium | Medium | High |
Google Vertex | Low | High | High |
LMStudio | Very Low | Medium | Medium |
Ollama | Very Low | Medium | Medium |
Llama.cpp | Very Low | Variable | Medium |
📝 Customizing Embedding Prompts
NBEDR allows you to customize the prompts used for generating embeddings to improve quality and relevance for your specific domain and use case.
Quick Start
- Use Default Template: NBEDR includes a default embedding prompt template at templates/embedding_prompt_template.txt
- Set Custom Template Path: export EMBEDDING_PROMPT_TEMPLATE="templates/my_custom_template.txt"
- Or Configure in Environment: EMBEDDING_PROMPT_TEMPLATE=/path/to/your/custom_template.txt
Creating Custom Prompt Templates
Example Medical Domain Template (templates/medical_template.txt):
Generate embeddings for medical literature that capture clinical concepts effectively.
Focus on:
- Medical terminology and procedures: {content}
- Drug names, dosages, and interactions
- Symptoms, diagnoses, and treatment protocols
- Clinical outcomes and research findings
Document Type: {document_type}
Content: {content}
Metadata: {metadata}
Ensure embeddings enable accurate retrieval for medical information systems.
Example Legal Domain Template (templates/legal_template.txt):
Generate embeddings for legal documents optimized for legal research and analysis.
Focus on:
- Legal terminology and concepts
- Case citations and precedents: {content}
- Statutory references and regulations
- Contractual terms and legal obligations
Document Type: {document_type}
Chunk: {chunk_index} of document
Content: {content}
Prioritize legal concepts and relationships for accurate legal document retrieval.
Available Template Variables
Use these variables in your custom templates:
- {content}: The document content to be embedded
- {document_type}: File type (pdf, txt, json, pptx, etc.)
- {metadata}: Additional document metadata (file size, source, etc.)
- {chunk_index}: Index of the current chunk within the document
- {chunking_strategy}: The chunking method used (semantic, fixed, sentence)
Custom Variables
Add your own variables using the EMBEDDING_CUSTOM_PROMPT_VARIABLES environment variable:
export EMBEDDING_CUSTOM_PROMPT_VARIABLES='{"domain": "healthcare", "use_case": "clinical_research"}'
Then use them in your template:
Generate embeddings for {domain} content optimized for {use_case}.
Content: {content}
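Under the hood, template variables are simple placeholders that get substituted for each chunk. A minimal illustration using Python string formatting (not nBedR's internal code):
template = (
    "Generate embeddings for {domain} content optimized for {use_case}.\n"
    "Document Type: {document_type}\n"
    "Content: {content}"
)

prompt = template.format(
    domain="healthcare",
    use_case="clinical_research",
    document_type="pdf",
    content="Patients in the treatment group showed improved outcomes...",
)
print(prompt)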
Configuration Examples
Using Environment Variables:
# Set custom template
export EMBEDDING_PROMPT_TEMPLATE="templates/technical_docs_template.txt"
# Add custom variables
export EMBEDDING_CUSTOM_PROMPT_VARIABLES='{"company": "TechCorp", "product": "API"}'
# Run with custom prompts
python nbedr.py create-embeddings --datapath ./docs --doctype pdf
Using CLI Arguments:
python nbedr.py create-embeddings \
--datapath ./documents \
--doctype pdf \
--embedding-prompt-template templates/my_template.txt
Template Best Practices
- Be Domain-Specific: Include terminology and concepts specific to your field
- Provide Context: Explain the intended use case for the embeddings
- Keep It Focused: Avoid overly long prompts that might confuse the model
- Test and Iterate: Experiment with different prompts and measure embedding quality
- Use Variables: Leverage template variables for dynamic content insertion
Template Examples by Domain
See the templates/ directory for example templates:
- embedding_prompt_template.txt - Default general-purpose template
- templates/README.md - Complete template documentation with examples
Quick Domain Templates:
Technical Documentation:
export EMBEDDING_PROMPT_TEMPLATE="templates/tech_docs_template.txt"
Academic Research:
export EMBEDDING_PROMPT_TEMPLATE="templates/academic_template.txt"
Business Content:
export EMBEDDING_PROMPT_TEMPLATE="templates/business_template.txt"
Vector Databases
FAISS (Facebook AI Similarity Search)
- Best for: Local development, high-performance searches, full control
- Pros: Free, very fast, runs locally
- Cons: Requires technical setup, no cloud features
Pinecone
- Best for: Production applications, scaling, managed service
- Pros: Fully managed, excellent performance, built-in scaling
- Cons: Cost increases with usage
ChromaDB
- Best for: Open-source preference, flexibility, development
- Pros: Open source, good documentation, easy to extend
- Cons: Requires more setup than managed services
Azure AI Search (Enterprise Search Platform)
- Best for: Enterprise applications, Microsoft ecosystem, hybrid search
- Pros:
- Enterprise-grade: Built for large-scale enterprise applications
- Hybrid Search: Combines keyword search, semantic search, and vector search
- Rich Filtering: Advanced filtering, faceting, and aggregation capabilities
- Security & Compliance: Enterprise security, compliance certifications (SOC 2, HIPAA)
- Multi-modal: Supports text, images, and structured data
- Built-in AI: Integrated with Azure Cognitive Services for text analysis
- High Availability: 99.9% SLA with automatic failover
- Cons:
- Cost: Can be expensive for large-scale deployments
- Microsoft Lock-in: Best when already using Azure ecosystem
- Complexity: More complex setup compared to simple vector databases
- Learning Curve: Requires understanding of Azure services
AWS Elasticsearch (Amazon OpenSearch Service)
- Best for: AWS ecosystem, complex analytics, multi-purpose search
- Pros:
- AWS Integration: Seamless integration with other AWS services
- Mature Platform: Built on proven Elasticsearch technology
- Analytics Capabilities: Advanced analytics, visualizations with Kibana
- Flexible Deployment: Multiple instance types and configurations
- Cost-Effective Scaling: Pay-as-you-scale model
- Multi-tenancy: Support for multiple applications/indices
- Real-time Processing: Near real-time indexing and search
- Cons:
- AWS Lock-in: Vendor lock-in to AWS ecosystem
- Operational Overhead: Requires monitoring and maintenance
- Cost Complexity: Pricing can be complex with multiple factors
- Version Lag: May not have latest Elasticsearch features immediately
PGVector (PostgreSQL with pgvector extension)
- Best for: PostgreSQL shops, relational data integration, cost-conscious deployments
- Pros:
- Familiar Technology: Built on PostgreSQL, widely known and trusted
- ACID Compliance: Full transactional support and data consistency
- Cost-Effective: Use existing PostgreSQL infrastructure
- Rich Querying: Combine vector search with SQL joins and filters
- Self-Hosted: Complete control over data and infrastructure
- Active Development: Growing ecosystem and community support
- Backup & Recovery: Leverage PostgreSQL's robust backup solutions
- Cons:
- Performance Limitations: May not match specialized vector databases at scale
- Manual Setup: Requires PostgreSQL and pgvector extension installation
- Operational Overhead: Need to manage PostgreSQL maintenance and tuning
- Limited Tooling: Less specialized tooling compared to purpose-built vector DBs
Choosing the Right Vector Database
Decision Matrix
Factor | FAISS | Pinecone | ChromaDB | Azure AI Search | AWS Elasticsearch | PGVector |
---|---|---|---|---|---|---|
Setup Complexity | High | Low | Medium | High | Medium | Medium |
Cost | Free | Pay-per-use | Free | High | Variable | Low |
Performance | Excellent | Excellent | Good | Very Good | Good | Good |
Scalability | Manual | Automatic | Manual | Automatic | Semi-automatic | Manual |
Enterprise Features | None | Some | None | Extensive | Extensive | Some |
Multi-modal Support | No | Limited | No | Yes | Limited | No |
Analytics | No | Limited | No | Yes | Excellent | Limited |
Use Case Recommendations
Choose FAISS when:
- Building a prototype or research project
- Need maximum performance and control
- Have technical expertise for setup and maintenance
- Budget is limited
Choose Pinecone when:
- Want a simple, managed vector database
- Need to get to market quickly
- Prefer specialized vector search capabilities
- Have predictable usage patterns
Choose ChromaDB when:
- Prefer open-source solutions
- Need customization flexibility
- Building internal tools
- Want to avoid vendor lock-in
Choose Azure AI Search when:
- Already using Microsoft/Azure ecosystem
- Need enterprise-grade security and compliance
- Require hybrid search (keyword + semantic + vector)
- Building customer-facing applications
- Need rich filtering and faceting capabilities
- Have complex data types (text, images, structured data)
Choose AWS Elasticsearch when:
- Already using AWS ecosystem
- Need comprehensive analytics and dashboarding
- Have diverse data sources and types
- Require complex aggregations and reporting
- Want mature, battle-tested search technology
- Need multi-tenancy support
Choose PGVector when:
- Already using PostgreSQL as primary database
- Need to combine vector search with relational data
- Want cost-effective solution with existing infrastructure
- Require ACID compliance and transactional consistency
- Prefer self-hosted solutions
- Have existing PostgreSQL expertise in your team
Development
Prerequisites
nBedR requires Python 3.11 or higher. For detailed development setup instructions, see docs/DEVELOPMENT.md.
Note: The project includes a .python-version file that specifies Python 3.11 as the default. Tools like pyenv will automatically use this version.
Quick Setup
# Check Python version
python3.11 --version
# Automated setup
./scripts/setup_dev.sh
# Or manual setup
python3.11 -m venv venv
source venv/bin/activate
pip install -e .[dev,all]
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=core --cov-report=html
# Run specific test modules
pytest tests/unit/test_models.py
pytest tests/integration/test_document_service.py
Code Quality
# Format code
black .
isort .
# Type checking
mypy core/
# Linting
flake8 core/
bandit -r core/
Build and Release
For comprehensive build instructions, CI/CD pipeline details, release procedures, and deployment guidelines, see the Build Documentation.
Quick Build Commands
# Local development setup
pip install -e .[dev,all]
# Run tests and quality checks
pytest tests/ -v --cov=core --cov=cli
black . && isort . && flake8 .
# Build Python package
python -m build
# Build Docker container
docker build -f deployment/docker/Dockerfile -t nbedr:local .
Release Process
Releases are managed through GitHub Actions workflows:
- Automated CI/CD: Every push triggers comprehensive testing and building
- Manual Releases: Use GitHub Actions UI to trigger releases with automatic version management
- Multiple Artifacts: Releases include PyPI packages and Docker containers
- Changelog Integration: Release notes automatically include changelog content
For detailed release procedures and troubleshooting, see the Build Documentation.
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built on the foundation of the RAFT Toolkit
- Utilizes LangChain for text processing
- Powered by OpenAI embeddings
📚 Navigation: Home | Development | Deployment | Security
🔗 Links: Repository | Issues | Releases