Vector Search Setup - nself-org/nchat GitHub Wiki

Vector Search Infrastructure - Setup Guide

Overview

The vector search infrastructure enables semantic search across messages using OpenAI embeddings and PostgreSQL's pgvector extension. This guide covers setup, configuration, and operation.

Architecture

Components

- Database Layer (pgvector)
  - Vector storage in PostgreSQL
  - HNSW index for fast similarity search
  - Automatic embedding triggers
- Embedding Service (OpenAI)
  - Text-to-vector conversion
  - Batch processing support
  - Cost tracking and caching
- Vector Store (TypeScript)
  - High-level API for vector operations
  - Similarity search functions
  - Batch operations
- Embedding Pipeline
  - Automatic embedding generation
  - Retry logic for failures
  - Progress tracking
- Background Workers
  - Continuous queue processing
  - Periodic maintenance
  - Index optimization
- Admin Dashboard
  - Monitoring and statistics
  - Job management
  - Performance metrics

Installation

1. Database Migration

Run the pgvector migration:

cd .backend
psql -U postgres -d nself_chat -f migrations/031_vector_search_infrastructure.sql

Or use your migration tool:

# Hasura CLI
hasura migrate apply --version 031

# Or nself CLI (if supported)
nself db migrate

2. Install pgvector Extension

The migration enables pgvector with CREATE EXTENSION, which only succeeds if the pgvector package is already installed on the PostgreSQL server. To verify:

-- Check if pgvector is installed
SELECT * FROM pg_extension WHERE extname = 'vector';

-- Manually install if needed
CREATE EXTENSION IF NOT EXISTS vector;

3. Environment Variables

Add to your .env.local:

# OpenAI API Key (required)
OPENAI_API_KEY=sk-...

# Embedding Model (optional, defaults to text-embedding-3-small)
EMBEDDING_MODEL=text-embedding-3-small

# Embedding Version (optional)
EMBEDDING_VERSION=1.0.0
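
Validating these variables once at startup gives workers a clear error instead of a failed API call later. The following is a minimal sketch, not code from the nchat repository: loadEmbeddingConfig, EmbeddingConfig, and the fallback version "1.0.0" are assumptions made for illustration.

```typescript
// Hypothetical startup check; none of these names come from the nchat codebase.
const SUPPORTED_MODELS = [
  'text-embedding-3-small',
  'text-embedding-3-large',
  'text-embedding-ada-002',
] as const

type EmbeddingModel = (typeof SUPPORTED_MODELS)[number]

interface EmbeddingConfig {
  apiKey: string
  model: EmbeddingModel
  version: string
}

function loadEmbeddingConfig(env: Record<string, string | undefined>): EmbeddingConfig {
  const apiKey = env.OPENAI_API_KEY?.trim()
  if (!apiKey) {
    throw new Error('OPENAI_API_KEY is required')
  }

  // Fall back to the documented default model when the variable is unset.
  const model = env.EMBEDDING_MODEL?.trim() || 'text-embedding-3-small'
  if (!(SUPPORTED_MODELS as readonly string[]).includes(model)) {
    throw new Error(`Unsupported EMBEDDING_MODEL: ${model}`)
  }

  // Assumption: reuse the example value above when no version is given.
  const version = env.EMBEDDING_VERSION?.trim() || '1.0.0'

  return { apiKey, model: model as EmbeddingModel, version }
}
```

In a Node.js worker this could be called as loadEmbeddingConfig(process.env) before the queue loop starts.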

4. Install Dependencies

The required dependencies are already listed in package.json, so a standard install is enough:

pnpm install

Configuration

Embedding Models

Three OpenAI models are supported:

Model	Dimensions	Cost (per 1M tokens)	Use Case
text-embedding-3-small	1536	$0.02	Default, cost-effective
text-embedding-3-large	3072	$0.13	Higher accuracy
text-embedding-ada-002	1536	$0.10	Legacy model

Recommendation: Use text-embedding-3-small for most applications.
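
The dimension column matters in practice: the search function elsewhere in this guide takes a vector(1536) argument, so a vector of any other length will not fit. A small guard before writing to the database can catch a model mismatch early. This is a sketch only; MODEL_DIMENSIONS and assertEmbeddingShape are hypothetical names, and the values are copied from the table above.

```typescript
// Dimensions per model, copied from the table above.
const MODEL_DIMENSIONS: Record<string, number> = {
  'text-embedding-3-small': 1536,
  'text-embedding-3-large': 3072,
  'text-embedding-ada-002': 1536,
}

// Throws if the embedding does not have the length the model should produce
// or contains values that cannot be stored (NaN, Infinity).
function assertEmbeddingShape(model: string, embedding: number[]): void {
  const expected = MODEL_DIMENSIONS[model]
  if (expected === undefined) {
    throw new Error(`Unknown embedding model: ${model}`)
  }
  if (embedding.length !== expected) {
    throw new Error(`Expected ${expected} dimensions for ${model}, got ${embedding.length}`)
  }
  if (!embedding.every((value) => Number.isFinite(value))) {
    throw new Error('Embedding contains non-finite values')
  }
}
```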

Vector Index Parameters

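The HNSW index is configured as follows:

- Distance Metric: Cosine (the metric OpenAI recommends for its embeddings)
- Index Type: HNSW (approximate nearest-neighbor search)
- Dimensions: 1536 (for the small and ada models)

To change index settings, modify the migration file before running it. Note that text-embedding-3-large produces 3072-dimension vectors, while pgvector can only index vector columns of up to 2,000 dimensions, so switching to it takes more than changing the column size (for example, requesting shorter embeddings through the API's dimensions parameter).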

Usage

Starting Background Workers

Embedding Worker

Processes the queue continuously:

# Start worker
node -r ts-node/register src/workers/embedding-worker.ts

# Or with pnpm script (add to package.json)
pnpm worker:embeddings

Maintenance Worker

Runs periodic maintenance:

# Start maintenance worker
node -r ts-node/register src/workers/embedding-maintenance-worker.ts

# Or with pnpm script
pnpm worker:maintenance

Production Deployment

Use a process manager like PM2:

# Install PM2
npm install -g pm2

# Start workers
pm2 start src/workers/embedding-worker.ts --name embedding-worker
pm2 start src/workers/embedding-maintenance-worker.ts --name embedding-maintenance

# Save configuration
pm2 save

# Auto-start on boot
pm2 startup

Admin Dashboard

Access the embeddings dashboard at:

http://localhost:3000/admin/embeddings

Features:

- Coverage statistics
- Index health metrics
- Job management
- Performance monitoring
- Cost tracking

API Endpoints

Generate Embeddings

# Start bulk generation
curl -X POST http://localhost:3000/api/admin/embeddings/generate \
  -H "Content-Type: application/json" \
  -d '{"type": "initial", "userId": "user-id"}'

# Retry failed embeddings
curl -X POST http://localhost:3000/api/admin/embeddings/generate \
  -H "Content-Type: application/json" \
  -d '{"type": "repair"}'

Get Job Status

# Get specific job (quote the URL so the shell does not interpret "?")
curl "http://localhost:3000/api/admin/embeddings/status?jobId=job-uuid"

# Get all recent jobs
curl "http://localhost:3000/api/admin/embeddings/status?limit=10"

Get Statistics

# Get comprehensive stats (last 30 days)
curl http://localhost:3000/api/admin/embeddings/stats

# Get stats for specific period
curl "http://localhost:3000/api/admin/embeddings/stats?days=7"

Cancel Job

curl -X POST http://localhost:3000/api/admin/embeddings/cancel \
  -H "Content-Type: application/json" \
  -d '{"jobId": "job-uuid"}'
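
The same endpoints can be called from TypeScript. The sketch below only builds the URL and request options, so it stays testable without a running server; the helper names (adminUrl, generateRequest, cancelRequest, statusUrl) are hypothetical, and only the paths and payload fields are taken from the curl examples above.

```typescript
// Hypothetical helpers: only the paths and payload fields come from the curl examples.
interface GenerateJob {
  type: 'initial' | 'repair'
  userId?: string
}

interface PreparedRequest {
  url: string
  init: { method: string; headers: Record<string, string>; body?: string }
}

// Builds an admin endpoint URL and encodes query parameters safely.
function adminUrl(
  baseUrl: string,
  path: string,
  query: Record<string, string | number | undefined> = {}
): string {
  const url = new URL(`/api/admin/embeddings/${path}`, baseUrl)
  for (const key of Object.keys(query)) {
    const value = query[key]
    if (value !== undefined) url.searchParams.set(key, String(value))
  }
  return url.toString()
}

// Prepares a JSON POST request without sending it.
function postJson(baseUrl: string, path: string, payload: object): PreparedRequest {
  return {
    url: adminUrl(baseUrl, path),
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    },
  }
}

const generateRequest = (baseUrl: string, job: GenerateJob) => postJson(baseUrl, 'generate', job)
const cancelRequest = (baseUrl: string, jobId: string) => postJson(baseUrl, 'cancel', { jobId })
const statusUrl = (baseUrl: string, query: { jobId?: string; limit?: number }) =>
  adminUrl(baseUrl, 'status', query)
```

A caller would pass the result to fetch, for example fetch(req.url, req.init), and add whatever admin authentication the deployment requires.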

Programmatic Usage

Generate Embedding

import { embeddingService } from '@/lib/ai/embedding-service'

const result = await embeddingService.generateEmbedding('Hello, world!')
console.log(result.embedding) // [0.123, -0.456, ...]

Batch Generate

const results = await embeddingService.batchGenerateEmbeddings([
  { text: 'First message', messageId: 'msg-1' },
  { text: 'Second message', messageId: 'msg-2' },
])

console.log(results.totalTokens) // 150
console.log(results.estimatedCost) // 0.000003

Search by Similarity

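import { embeddingService } from '@/lib/ai/embedding-service'
import { vectorStore } from '@/lib/database/vector-store'

// Embed the search text first; it must use the same model as the stored embeddings
const { embedding: queryEmbedding } = await embeddingService.generateEmbedding('deployment schedule')

const results = await vectorStore.search(queryEmbedding, {
  threshold: 0.7, // Minimum similarity score
  limit: 10, // Maximum number of results
  channelId: 'channel-uuid', // Restrict results to one channel
})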

for (const result of results) {
  console.log(`${result.similarity}: ${result.content}`)
}
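
To make the threshold and limit options concrete, here is an in-memory reference version of the same ranking. It is an illustration only: in nchat the search runs inside PostgreSQL, and searchInMemory, StoredMessage, and SearchHit are hypothetical names. It assumes the threshold is a minimum cosine similarity (pgvector's cosine distance operator returns 1 minus this value) and that a score equal to the threshold is kept.

```typescript
interface StoredMessage {
  id: string
  content: string
  embedding: number[]
}

interface SearchHit {
  id: string
  content: string
  similarity: number
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same number of dimensions')
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  // A zero vector has no direction, so treat it as matching nothing.
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Brute-force search: score every message, drop weak matches, keep the best `limit`.
function searchInMemory(
  query: number[],
  messages: StoredMessage[],
  options: { threshold: number; limit: number }
): SearchHit[] {
  return messages
    .map((m) => ({ id: m.id, content: m.content, similarity: cosineSimilarity(query, m.embedding) }))
    .filter((hit) => hit.similarity >= options.threshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, options.limit)
}
```

The HNSW index exists to avoid this full scan: it returns approximately the same top results without scoring every row.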

Monitor Performance

import { embeddingMonitor } from '@/lib/ai/embedding-monitor'

// Get monitoring report
const report = await embeddingMonitor.getReport(24) // Last 24 hours

console.log(`Avg duration: ${report.performance.avgDuration}ms`)
console.log(`Success rate: ${report.performance.successRate}%`)
console.log(`Cache hit rate: ${report.cache.hitRate}%`)
console.log(`Total cost: $${report.cost.totalCost}`)

Monitoring

Key Metrics

- Coverage: Percentage of messages with embeddings
- Cache Hit Rate: Efficiency of embedding cache
- Success Rate: Percentage of successful embeddings
- Average Duration: Time to generate embeddings
- Total Cost: Cumulative API costs

Alerts

The monitor automatically detects:

- Low success rate (< 95%)
- Slow performance (> 5s average)
- High rate of low-quality embeddings (> 10%)
- Index efficiency issues
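
The numeric thresholds above can be expressed as a small rule check. This is a sketch under assumptions, not the monitor's actual implementation: EmbeddingHealth and detectAlerts are hypothetical names, lowQualityRate is an assumed field, and the index efficiency check is left out because the guide gives no threshold for it.

```typescript
interface EmbeddingHealth {
  successRate: number // percent, e.g. report.performance.successRate
  avgDurationMs: number // milliseconds, e.g. report.performance.avgDuration
  lowQualityRate: number // percent; hypothetical field name
}

// Returns one message per violated threshold; an empty array means healthy.
function detectAlerts(health: EmbeddingHealth): string[] {
  const alerts: string[] = []
  if (health.successRate < 95) {
    alerts.push(`Low success rate: ${health.successRate}%`)
  }
  if (health.avgDurationMs > 5000) {
    alerts.push(`Slow performance: ${health.avgDurationMs}ms average`)
  }
  if (health.lowQualityRate > 10) {
    alerts.push(`High low-quality rate: ${health.lowQualityRate}%`)
  }
  return alerts
}
```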

Database Queries

-- Check coverage
SELECT * FROM nchat.get_embedding_coverage();

-- Check index health
SELECT * FROM nchat.get_embedding_index_health();

-- Recent failed embeddings
SELECT id, content, embedding_error, embedding_retry_count
FROM nchat.nchat_messages
WHERE embedding_error IS NOT NULL
ORDER BY updated_at DESC
LIMIT 10;

-- Queue status
SELECT COUNT(*) as pending FROM nchat.embedding_queue WHERE claimed_at IS NULL;
SELECT COUNT(*) as processing FROM nchat.embedding_queue WHERE claimed_at IS NOT NULL;

Maintenance

Periodic Tasks

The maintenance worker automatically:

- Cleans queue: removes stale items (every hour)
- Cleans cache: removes unused entries (every hour)
- Optimizes index: rebuilds for performance (daily)

Manual Maintenance

-- Clean up queue
SELECT nchat.cleanup_embedding_queue();

-- Clean up cache (entries unused for 90+ days)
SELECT nchat.cleanup_embedding_cache(90);

-- Optimize index (may take time on large datasets)
SELECT nchat.optimize_embedding_index();

-- Vacuum analyze for better performance
VACUUM ANALYZE nchat.nchat_messages;

Backup Considerations

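Vector data is large: at 4 bytes per dimension, each 1536-dimension embedding adds about 6 KB to its row. Consider:

- Selective backups: Exclude regenerable embedding data from frequent backups
- Regeneration: Embeddings can be regenerated from messages
- Cache backups: The cache can be rebuilt, so backing it up is low priority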

# Backup without the embedding cache (embeddings stored in nchat.nchat_messages are still dumped)
pg_dump -U postgres -d nself_chat \
  --exclude-table-data=nchat.embedding_cache \
  -f backup_no_embeddings.sql
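
To judge how much backup size the embeddings account for, a rough estimate helps. pgvector stores each vector in 4 bytes per dimension plus 8 bytes of overhead; the sketch below applies that formula. The function names are hypothetical, and the result covers only the raw column data, not the HNSW index or other row overhead.

```typescript
// pgvector stores a vector in 4 bytes per dimension plus 8 bytes of overhead.
function vectorBytes(dimensions: number): number {
  return 4 * dimensions + 8
}

// Raw embedding column size, excluding the HNSW index and row overhead.
function estimateEmbeddingStorageBytes(messageCount: number, dimensions: number): number {
  return messageCount * vectorBytes(dimensions)
}

// Formats a byte count in decimal gigabytes.
function formatGigabytes(bytes: number): string {
  return `${(bytes / 1e9).toFixed(2)} GB`
}

// 1 million messages with 1536-dimension embeddings
console.log(formatGigabytes(estimateEmbeddingStorageBytes(1000000, 1536))) // prints "6.15 GB"
```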

Troubleshooting

Embeddings Not Generating

Check OpenAI API key:
```
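# Confirms the variable is set without printing the secret
[ -n "$OPENAI_API_KEY" ] && echo "OPENAI_API_KEY is set" || echo "OPENAI_API_KEY is not set"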
```
Check worker status:
```
ps aux | grep embedding-worker
```

Check queue:

SELECT COUNT(*) FROM nchat.embedding_queue;

Check errors:

SELECT * FROM nchat.nchat_messages
WHERE embedding_error IS NOT NULL
ORDER BY updated_at DESC
LIMIT 5;

Slow Search Performance

Check index usage:

EXPLAIN ANALYZE
SELECT * FROM nchat.search_messages_by_embedding(
  '[0.1, 0.2, ...]'::vector(1536),
  0.7,
  10
);

Rebuild index:

SELECT nchat.optimize_embedding_index();

Check table statistics:
```
ANALYZE nchat.nchat_messages;
```

High API Costs

- Check cache hit rate (target: > 80%)
- Review duplicate messages
- Review batch sizes (larger batches mean fewer requests, but the per-token price is unchanged)
- Monitor failed retries (avoid unnecessary API calls)
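
Duplicate messages and cache misses are the two items here that translate directly into extra tokens. The sketch below shows one way to plan a batch so that each distinct text is sent at most once. It is an assumption-laden illustration: planEmbeddingBatch and cacheKey are hypothetical names, and the real cache may key on a hash or normalize text differently.

```typescript
// Assumption: texts that differ only in surrounding or repeated whitespace share a cache entry.
function cacheKey(text: string): string {
  return text.trim().replace(/\s+/g, ' ')
}

interface BatchPlan {
  toEmbed: string[] // distinct texts that still need an API call
  cacheHits: number // inputs already present in the cache
  duplicates: number // inputs repeated within this batch
}

// Splits the inputs into cached, duplicated, and new texts.
function planEmbeddingBatch(texts: string[], cache: Map<string, number[]>): BatchPlan {
  const seen = new Set<string>()
  const toEmbed: string[] = []
  let cacheHits = 0
  let duplicates = 0

  for (const text of texts) {
    const key = cacheKey(text)
    if (cache.has(key)) {
      cacheHits++
    } else if (seen.has(key)) {
      duplicates++
    } else {
      seen.add(key)
      toEmbed.push(key)
    }
  }

  return { toEmbed, cacheHits, duplicates }
}
```

Only the texts in toEmbed would be sent to the API; cacheHits divided by the number of inputs gives the hit rate to compare against the 80% target.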

Low Quality Embeddings

- Check for very short messages
- Review content preprocessing
- Validate embedding dimensions
- Check for system messages (should not be embedded)
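
Two of these checks can be applied before a message ever reaches the queue. The following filter is a sketch only: the ChatMessage shape, the 'system' type value, and the 10-character minimum are assumptions for illustration, not values from the nchat schema.

```typescript
// Hypothetical message shape; the real nchat_messages columns may differ.
interface ChatMessage {
  content: string
  type: 'user' | 'system'
}

// Assumption: messages shorter than this carry too little meaning to embed well.
const MIN_CONTENT_LENGTH = 10

// Returns true when a message is worth sending to the embedding service.
function shouldEmbed(message: ChatMessage, minLength: number = MIN_CONTENT_LENGTH): boolean {
  if (message.type === 'system') return false
  return message.content.trim().length >= minLength
}
```

Skipping these messages also lowers API cost, since they are never tokenized.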

Performance Optimization

Indexing Strategy

-- Additional indexes for filtered searches
CREATE INDEX idx_messages_channel_embedding
ON nchat.nchat_messages (channel_id)
WHERE embedding IS NOT NULL;

CREATE INDEX idx_messages_user_embedding
ON nchat.nchat_messages (user_id)
WHERE embedding IS NOT NULL;

Query Optimization

// Use filters to reduce search space
const results = await vectorStore.search(embedding, {
  threshold: 0.8, // Higher threshold = fewer results
  limit: 10,
  channelId: 'specific-channel', // Filter by channel
})

Batch Processing

// Process in larger batches for efficiency
const pipeline = new EmbeddingPipeline({
  batchSize: 500, // Larger batches = fewer API calls
  maxConcurrent: 10, // Parallel processing
})
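
What a batch size means in practice is a simple split of the pending texts into fixed-size groups, one API request per group. A minimal sketch follows; chunk is a hypothetical helper, not part of EmbeddingPipeline, and the 2048 cap reflects the maximum number of inputs OpenAI's embeddings endpoint accepts per request.

```typescript
// OpenAI's embeddings endpoint accepts at most 2048 inputs per request.
const MAX_INPUTS_PER_REQUEST = 2048

// Splits items into consecutive batches of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1 || size > MAX_INPUTS_PER_REQUEST) {
    throw new RangeError(`Batch size must be an integer between 1 and ${MAX_INPUTS_PER_REQUEST}`)
  }
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}
```

With batchSize: 500, a backlog of 1,200 messages becomes three requests (500, 500, and 200 inputs) instead of 1,200 single-input calls.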

Security

API Key Protection

- Store in environment variables
- Never commit to version control
- Rotate regularly
- Use restricted keys scoped to the embeddings endpoint if available

Access Control

- Restrict admin endpoints to authenticated admins
- Validate user permissions before operations
- Audit embedding access logs

Data Privacy

- Embeddings encode the meaning of message content, so treat them as sensitive as the messages themselves
- Consider privacy implications of caching
- Implement data retention policies

Cost Management

Estimation

For 1 million messages (avg 50 tokens each):

- Small model: 50M tokens × $0.02/1M = $1.00
- Large model: 50M tokens × $0.13/1M = $6.50
- Ada model: 50M tokens × $0.10/1M = $5.00
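
The arithmetic above generalizes to any message volume: total tokens divided by one million, multiplied by the model's rate. A small sketch, assuming the prices listed in this guide (they may change) and hypothetical names COST_PER_MILLION_TOKENS and estimateCostUsd:

```typescript
// Prices in USD per 1M tokens, as listed in this guide.
const COST_PER_MILLION_TOKENS: Record<string, number> = {
  'text-embedding-3-small': 0.02,
  'text-embedding-3-large': 0.13,
  'text-embedding-ada-002': 0.1,
}

// Estimated one-time cost of embedding `messageCount` messages.
function estimateCostUsd(model: string, messageCount: number, avgTokensPerMessage: number): number {
  const rate = COST_PER_MILLION_TOKENS[model]
  if (rate === undefined) {
    throw new Error(`Unknown embedding model: ${model}`)
  }
  const totalTokens = messageCount * avgTokensPerMessage
  return (totalTokens / 1000000) * rate
}

// 1 million messages at 50 tokens each with the default model
console.log(estimateCostUsd('text-embedding-3-small', 1000000, 50).toFixed(2)) // prints "1.00"
```

This covers the initial embedding run only; re-embedding after a model or version change costs the same again.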

Cost Reduction

- Keep caching enabled (it is on by default)
- Use batch operations (up to 2048 inputs per request)
- Filter messages (skip system messages, very short messages)
- Choose appropriate model (small model sufficient for most cases)
- Avoid unnecessary regeneration

Next Steps

- Set up monitoring: configure alerts and dashboards
- Tune parameters: adjust thresholds and batch sizes
- Integrate with search: add semantic search to the UI
- Scale workers: add more workers for large datasets
- Optimize costs: monitor and adjust based on usage

Support

For issues or questions:

- Check logs: pm2 logs embedding-worker
- Review metrics: Admin dashboard
- Database health: SELECT * FROM nchat.get_embedding_index_health();
- Monitor API: OpenAI usage dashboard