# Vector Search Infrastructure - Setup Guide
## Overview

The vector search infrastructure enables semantic search across messages using OpenAI embeddings and PostgreSQL's pgvector extension. This guide covers setup, configuration, and operation.
## Architecture

### Components

- **Database Layer (pgvector)**
  - Vector storage in PostgreSQL
  - HNSW index for fast similarity search
  - Automatic embedding triggers
- **Embedding Service (OpenAI)**
  - Text-to-vector conversion
  - Batch processing support
  - Cost tracking and caching
- **Vector Store (TypeScript)**
  - High-level API for vector operations
  - Similarity search functions
  - Batch operations
- **Embedding Pipeline**
  - Automatic embedding generation
  - Retry logic for failures
  - Progress tracking
- **Background Workers**
  - Continuous queue processing
  - Periodic maintenance
  - Index optimization
- **Admin Dashboard**
  - Monitoring and statistics
  - Job management
  - Performance metrics
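To make the hand-offs concrete, the flow can be sketched in a few TypeScript shapes. The type and field names below are illustrative, not the actual schema (which lives in the migration):

```typescript
// Illustrative shapes only - the real schema is defined in the migration.
interface QueueItem {
  messageId: string
  claimedAt: Date | null // a worker claims an item before processing it
}

interface EmbeddedMessage {
  id: string
  content: string
  embedding: number[] | null // 1536 dimensions for text-embedding-3-small
  embeddingError: string | null // populated when generation fails
}

// Flow: message insert -> trigger enqueues a QueueItem -> a worker claims it,
// calls OpenAI, and writes `embedding` back -> the HNSW index serves searches.
```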
## Installation

### 1. Database Migration

Run the pgvector migration:

```bash
cd .backend
psql -U postgres -d nself_chat -f migrations/031_vector_search_infrastructure.sql
```

Or use your migration tool:

```bash
# Hasura CLI
hasura migrate apply --version 031

# Or nself CLI (if supported)
nself db migrate
```
### 2. Install the pgvector Extension

The migration installs pgvector automatically, but you can verify:

```sql
-- Check whether pgvector is installed
SELECT * FROM pg_extension WHERE extname = 'vector';

-- Install manually if needed
CREATE EXTENSION IF NOT EXISTS vector;
```
### 3. Environment Variables

Add to your `.env.local`:

```bash
# OpenAI API key (required)
OPENAI_API_KEY=sk-...

# Embedding model (optional, defaults to text-embedding-3-small)
EMBEDDING_MODEL=text-embedding-3-small

# Embedding version (optional)
EMBEDDING_VERSION=1.0.0
```
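A small startup check makes a missing key fail fast rather than on the first embedding request. This is a minimal sketch; the helper name and its location are illustrative, not part of the codebase:

```typescript
// Illustrative helper - adapt the name and location to your project.
interface EmbeddingConfig {
  apiKey: string
  model: string
  version: string
}

export function loadEmbeddingConfig(): EmbeddingConfig {
  const apiKey = process.env.OPENAI_API_KEY
  if (!apiKey) {
    // Fail at startup rather than on the first embedding request
    throw new Error('OPENAI_API_KEY is not set')
  }
  return {
    apiKey,
    model: process.env.EMBEDDING_MODEL ?? 'text-embedding-3-small',
    version: process.env.EMBEDDING_VERSION ?? '1.0.0',
  }
}
```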
### 4. Install Dependencies

The required dependencies are already listed in `package.json`:

```bash
pnpm install
```
## Configuration

### Embedding Models

Three OpenAI models are supported:

| Model | Dimensions | Cost (per 1M tokens) | Use Case |
|---|---|---|---|
| text-embedding-3-small | 1536 | $0.02 | Default, cost-effective |
| text-embedding-3-large | 3072 | $0.13 | Higher accuracy |
| text-embedding-ada-002 | 1536 | $0.10 | Legacy model |

**Recommendation:** Use `text-embedding-3-small` for most applications.
### Vector Index Parameters

The HNSW index is tuned for query performance:

- **Distance metric:** Cosine (the best fit for OpenAI embeddings)
- **Index type:** HNSW (fast approximate nearest-neighbor search)
- **Dimensions:** 1536 (for the small and ada models)

To change index settings, modify the migration file before running it.
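For intuition about the metric: cosine similarity compares vector directions and ignores magnitude. pgvector computes it in the database (via its `<=>` cosine-distance operator), but a plain TypeScript version shows what is being measured:

```typescript
// Cosine similarity between two equal-length vectors.
// pgvector does this in-database; this sketch just illustrates the metric.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Identical directions score 1.0; unrelated vectors score near 0.
console.log(cosineSimilarity([1, 0], [1, 0])) // 1
```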
## Usage

### Starting Background Workers

#### Embedding Worker

Processes the queue continuously:

```bash
# Start the worker
node -r ts-node/register src/workers/embedding-worker.ts

# Or with a pnpm script (add to package.json)
pnpm worker:embeddings
```

#### Maintenance Worker

Runs periodic maintenance:

```bash
# Start the maintenance worker
node -r ts-node/register src/workers/embedding-maintenance-worker.ts

# Or with a pnpm script
pnpm worker:maintenance
```
### Production Deployment

Use a process manager such as PM2:

```bash
# Install PM2
npm install -g pm2

# Start the workers
pm2 start src/workers/embedding-worker.ts --name embedding-worker
pm2 start src/workers/embedding-maintenance-worker.ts --name embedding-maintenance

# Save the process list
pm2 save

# Auto-start on boot
pm2 startup
```
## Admin Dashboard

Access the embeddings dashboard at `http://localhost:3000/admin/embeddings`.

Features:

- Coverage statistics
- Index health metrics
- Job management
- Performance monitoring
- Cost tracking
## API Endpoints

### Generate Embeddings

```bash
# Start bulk generation
curl -X POST http://localhost:3000/api/admin/embeddings/generate \
  -H "Content-Type: application/json" \
  -d '{"type": "initial", "userId": "user-id"}'

# Retry failed embeddings
curl -X POST http://localhost:3000/api/admin/embeddings/generate \
  -H "Content-Type: application/json" \
  -d '{"type": "repair"}'
```

### Get Job Status

```bash
# Get a specific job
curl http://localhost:3000/api/admin/embeddings/status?jobId=job-uuid

# Get all recent jobs
curl http://localhost:3000/api/admin/embeddings/status?limit=10
```

### Get Statistics

```bash
# Get comprehensive stats (last 30 days)
curl http://localhost:3000/api/admin/embeddings/stats

# Get stats for a specific period
curl http://localhost:3000/api/admin/embeddings/stats?days=7
```

### Cancel Job

```bash
curl -X POST http://localhost:3000/api/admin/embeddings/cancel \
  -H "Content-Type: application/json" \
  -d '{"jobId": "job-uuid"}'
```
## Programmatic Usage

### Generate Embedding

```typescript
import { embeddingService } from '@/lib/ai/embedding-service'

const result = await embeddingService.generateEmbedding('Hello, world!')
console.log(result.embedding) // [0.123, -0.456, ...]
```

### Batch Generate

```typescript
const results = await embeddingService.batchGenerateEmbeddings([
  { text: 'First message', messageId: 'msg-1' },
  { text: 'Second message', messageId: 'msg-2' },
])

console.log(results.totalTokens) // 150
console.log(results.estimatedCost) // 0.000003
```
### Search by Similarity

```typescript
import { vectorStore } from '@/lib/database/vector-store'

const results = await vectorStore.search(queryEmbedding, {
  threshold: 0.7,
  limit: 10,
  channelId: 'channel-uuid',
})

for (const result of results) {
  console.log(`${result.similarity}: ${result.content}`)
}
```
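Putting the two calls together, end-to-end semantic search is just embed-then-search. A sketch using the APIs shown above (the wrapper name is illustrative):

```typescript
import { embeddingService } from '@/lib/ai/embedding-service'
import { vectorStore } from '@/lib/database/vector-store'

// Wrapper name is illustrative; it combines the two documented calls.
async function searchMessages(query: string, channelId?: string) {
  // 1. Embed the query text
  const { embedding } = await embeddingService.generateEmbedding(query)

  // 2. Find the most similar stored messages
  return vectorStore.search(embedding, {
    threshold: 0.7,
    limit: 10,
    channelId,
  })
}

const hits = await searchMessages('deploy to production', 'channel-uuid')
```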
### Monitor Performance

```typescript
import { embeddingMonitor } from '@/lib/ai/embedding-monitor'

// Get a monitoring report for the last 24 hours
const report = await embeddingMonitor.getReport(24)

console.log(`Avg duration: ${report.performance.avgDuration}ms`)
console.log(`Success rate: ${report.performance.successRate}%`)
console.log(`Cache hit rate: ${report.cache.hitRate}%`)
console.log(`Total cost: $${report.cost.totalCost}`)
```
## Monitoring

### Key Metrics

- **Coverage:** Percentage of messages with embeddings
- **Cache hit rate:** Efficiency of the embedding cache
- **Success rate:** Percentage of embedding generations that succeed
- **Average duration:** Time taken to generate embeddings
- **Total cost:** Cumulative API costs

### Alerts

The monitor automatically detects:

- Low success rate (< 95%)
- Slow performance (> 5s average)
- High low-quality rate (> 10%)
- Index efficiency issues
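If you want these checks in your own code rather than relying on the built-in monitor, the thresholds translate directly. A sketch assuming the report shape from the Monitor Performance example; the `quality.lowQualityRate` field is an assumption:

```typescript
// Threshold checks mirroring the alerts above.
// The report shape follows the Monitor Performance example; `quality` is assumed.
function collectAlerts(report: {
  performance: { avgDuration: number; successRate: number }
  quality?: { lowQualityRate: number }
}): string[] {
  const alerts: string[] = []
  if (report.performance.successRate < 95) {
    alerts.push(`Low success rate: ${report.performance.successRate}%`)
  }
  if (report.performance.avgDuration > 5000) {
    alerts.push(`Slow generation: ${report.performance.avgDuration}ms avg`)
  }
  if ((report.quality?.lowQualityRate ?? 0) > 10) {
    alerts.push(`High low-quality rate: ${report.quality!.lowQualityRate}%`)
  }
  return alerts
}
```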
### Database Queries

```sql
-- Check coverage
SELECT * FROM nchat.get_embedding_coverage();

-- Check index health
SELECT * FROM nchat.get_embedding_index_health();

-- Recent failed embeddings
SELECT id, content, embedding_error, embedding_retry_count
FROM nchat.nchat_messages
WHERE embedding_error IS NOT NULL
ORDER BY updated_at DESC
LIMIT 10;

-- Queue status
SELECT COUNT(*) AS pending FROM nchat.embedding_queue WHERE claimed_at IS NULL;
SELECT COUNT(*) AS processing FROM nchat.embedding_queue WHERE claimed_at IS NOT NULL;
```
## Maintenance

### Periodic Tasks

The maintenance worker automatically:

- **Cleans the queue** - removes stale items (every hour)
- **Cleans the cache** - removes unused entries (every hour)
- **Optimizes the index** - rebuilds it for performance (daily)
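A scheduler along these lines is all the maintenance worker needs to be: a loop around the SQL functions listed under Manual Maintenance below. A minimal sketch, assuming a generic `db.query` client helper (import path assumed):

```typescript
// Sketch of a maintenance scheduler; `db.query` stands in for your DB client.
import { db } from '@/lib/database/client' // path assumed

const HOUR = 60 * 60 * 1000

// Hourly: clean up stale queue items and unused cache entries
setInterval(async () => {
  await db.query('SELECT nchat.cleanup_embedding_queue()')
  await db.query('SELECT nchat.cleanup_embedding_cache(90)')
}, HOUR)

// Daily: rebuild the index for performance
setInterval(async () => {
  await db.query('SELECT nchat.optimize_embedding_index()')
}, 24 * HOUR)
```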
### Manual Maintenance

```sql
-- Clean up the queue
SELECT nchat.cleanup_embedding_queue();

-- Clean up the cache (entries unused for 90+ days)
SELECT nchat.cleanup_embedding_cache(90);

-- Optimize the index (may take time on large datasets)
SELECT nchat.optimize_embedding_index();

-- Vacuum analyze for better planner statistics
VACUUM ANALYZE nchat.nchat_messages;
```
### Backup Considerations

Vector data is large, so consider:

- **Selective backups:** Exclude embeddings from frequent backups
- **Regeneration:** Embeddings can always be regenerated from message content
- **Cache backups:** The cache can be rebuilt, so it is low priority

```bash
# Back up without the embedding cache
pg_dump -U postgres -d nself_chat \
  --exclude-table-data=nchat.embedding_cache \
  -f backup_no_embeddings.sql
```
## Troubleshooting

### Embeddings Not Generating

1. Check the OpenAI API key:
   ```bash
   echo $OPENAI_API_KEY
   ```
2. Check the worker status:
   ```bash
   ps aux | grep embedding-worker
   ```
3. Check the queue:
   ```sql
   SELECT COUNT(*) FROM nchat.embedding_queue;
   ```
4. Check for errors:
   ```sql
   SELECT * FROM nchat.nchat_messages
   WHERE embedding_error IS NOT NULL
   ORDER BY updated_at DESC
   LIMIT 5;
   ```
### Slow Search Performance

1. Check index usage:
   ```sql
   EXPLAIN ANALYZE
   SELECT * FROM nchat.search_messages_by_embedding(
     '[0.1, 0.2, ...]'::vector(1536),
     0.7,
     10
   );
   ```
2. Rebuild the index:
   ```sql
   SELECT nchat.optimize_embedding_index();
   ```
3. Refresh table statistics:
   ```sql
   ANALYZE nchat.nchat_messages;
   ```
### High API Costs

- Check the cache hit rate (target: > 80%)
- Review duplicate messages
- Consider larger batch sizes (larger batches mean fewer API calls)
- Monitor failed retries (avoid unnecessary API calls)

### Low-Quality Embeddings

- Check for very short messages
- Review content preprocessing
- Validate embedding dimensions
- Check for system messages, which should not be embedded (see the filter sketch below)
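A simple pre-filter catches most of these cases before any tokens are spent. A sketch - the message shape and its `type` field are assumptions about your schema:

```typescript
// Illustrative pre-filter; the message shape is an assumption.
interface MessageLike {
  content: string
  type?: string // e.g. 'system', 'user'
}

const MIN_EMBEDDABLE_LENGTH = 10 // characters; tune for your data

function shouldEmbed(message: MessageLike): boolean {
  // Skip system messages - they add noise, not meaning
  if (message.type === 'system') return false
  // Skip very short messages - too little signal for a useful embedding
  if (message.content.trim().length < MIN_EMBEDDABLE_LENGTH) return false
  return true
}
```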
## Performance Optimization

### Indexing Strategy

```sql
-- Additional indexes for filtered searches
CREATE INDEX idx_messages_channel_embedding
  ON nchat.nchat_messages (channel_id)
  WHERE embedding IS NOT NULL;

CREATE INDEX idx_messages_user_embedding
  ON nchat.nchat_messages (user_id)
  WHERE embedding IS NOT NULL;
```
### Query Optimization

```typescript
// Use filters to reduce the search space
const results = await vectorStore.search(embedding, {
  threshold: 0.8, // Higher threshold = fewer results
  limit: 10,
  channelId: 'specific-channel', // Filter by channel
})
```
### Batch Processing

```typescript
// Import path assumed - adjust to wherever EmbeddingPipeline lives in your project
import { EmbeddingPipeline } from '@/lib/ai/embedding-pipeline'

// Process in larger batches for efficiency
const pipeline = new EmbeddingPipeline({
  batchSize: 500, // Larger batches = fewer API calls
  maxConcurrent: 10, // Parallel processing
})
```
## Security

### API Key Protection

- Store the key in environment variables
- Never commit it to version control
- Rotate it regularly
- Use read-only keys if available

### Access Control

- Restrict admin endpoints to authenticated admins
- Validate user permissions before operations
- Audit embedding access logs

### Data Privacy

- Embeddings carry semantic information about message content
- Consider the privacy implications of caching
- Implement data retention policies
## Cost Management

### Estimation

For 1 million messages at an average of 50 tokens each (50M tokens total):

- Small model: 50M tokens × $0.02/1M = **$1.00**
- Large model: 50M tokens × $0.13/1M = **$6.50**
- Ada model: 50M tokens × $0.10/1M = **$5.00**
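The arithmetic generalizes to a small helper if you want estimates for your own corpus (prices from the model table above):

```typescript
// Cost per 1M tokens, from the model table above
const PRICE_PER_1M: Record<string, number> = {
  'text-embedding-3-small': 0.02,
  'text-embedding-3-large': 0.13,
  'text-embedding-ada-002': 0.1,
}

function estimateCost(messageCount: number, avgTokens: number, model: string): number {
  const totalTokens = messageCount * avgTokens
  return (totalTokens / 1_000_000) * PRICE_PER_1M[model]
}

// 1M messages × 50 tokens on the small model = $1.00
console.log(estimateCost(1_000_000, 50, 'text-embedding-3-small')) // 1
```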
### Cost Reduction

- Enable caching (on by default)
- Use batch operations (up to 2048 inputs per request)
- Filter messages (skip system messages and very short messages)
- Choose an appropriate model (the small model is sufficient for most cases)
- Avoid unnecessary regeneration
## Next Steps

- **Set up monitoring** - configure alerts and dashboards
- **Tune parameters** - adjust thresholds and batch sizes
- **Integrate with search** - add semantic search to the UI
- **Scale workers** - add more workers for large datasets
- **Optimize costs** - monitor and adjust based on usage
## Support

For issues or questions:

- **Check logs:**
  ```bash
  pm2 logs embedding-worker
  ```
- **Review metrics:** Admin dashboard
- **Database health:**
  ```sql
  SELECT * FROM nchat.get_embedding_index_health();
  ```
- **Monitor API:** OpenAI usage dashboard