SYSTEM CAPACITY - nself-org/nchat GitHub Wiki
System Capacity and Performance Limits
Project: nself-chat (nchat) Version: 0.9.1 Last Updated: February 2026
Executive Summary
This document defines the tested capacity limits, performance benchmarks, and scaling guidelines for nself-chat.
Tested Capacity Limits
Concurrent Users
| Tier | Users | Status | Notes |
|---|---|---|---|
| Small | 100 | ✅ Tested | Single instance, minimal resources |
| Medium | 1,000 | ✅ Tested | Single instance, moderate resources |
| Large | 5,000 | ✅ Tested | Requires auto-scaling |
| Enterprise | 10,000 | ⚠️ Tested | Requires distributed setup |
| Massive | 50,000+ | ❌ Not Tested | Requires multi-region |
WebSocket Connections
- Maximum per instance: 10,000
- Recommended per instance: 5,000
- Load balancing: Required above 5,000
- Connection lifetime: Unlimited (with heartbeat)
Message Throughput
| Rate | Status | Configuration |
|---|---|---|
| 100 msg/s | ✅ Single instance | Default |
| 1,000 msg/s | ✅ Single instance | Optimized DB |
| 5,000 msg/s | ⚠️ Multiple instances | Load balanced |
| 10,000 msg/s | ❌ Not tested | Distributed |
File Storage
- Max file size: 50MB (configurable)
- Concurrent uploads: 100 (tested)
- Storage throughput: 100 MB/s (with MinIO)
- Total storage: Limited by disk/S3
Search Performance
- Index size tested: 1M messages
- Query latency: p95 < 500ms
- Concurrent searches: 100
- Indexing rate: 1,000 messages/second
Database
- Max connections: 100 (default pool)
- Recommended connections: 50
- Max DB size tested: 100GB
- Query performance: p95 < 100ms
Resource Requirements
Minimum (Development)
CPU: 2 cores
Memory: 4GB
Disk: 20GB SSD
Network: 100 Mbps
Capacity: 10 users, development/testing only
Small Deployment (100 users)
Frontend:
CPU: 1 core
Memory: 1GB
Instances: 1
Backend (Hasura):
CPU: 2 cores
Memory: 2GB
Instances: 1
Database (PostgreSQL):
CPU: 2 cores
Memory: 4GB
Storage: 50GB SSD
Instances: 1
Redis:
CPU: 1 core
Memory: 1GB
Instances: 1
Total:
CPU: 6 cores
Memory: 8GB
Storage: 50GB
Medium Deployment (1,000 users)
Frontend:
CPU: 2 cores
Memory: 2GB
Instances: 2
Backend (Hasura):
CPU: 4 cores
Memory: 8GB
Instances: 2
Database (PostgreSQL):
CPU: 4 cores
Memory: 16GB
Storage: 200GB SSD
Instances: 1 (with read replicas recommended)
Redis:
CPU: 2 cores
Memory: 4GB
Instances: 1
MeiliSearch:
CPU: 2 cores
Memory: 4GB
Instances: 1
Total:
CPU: 18 cores
Memory: 42GB
Storage: 200GB
Large Deployment (5,000 users)
Frontend:
CPU: 2 cores
Memory: 2GB
Instances: 4 (auto-scaling)
Backend (Hasura):
CPU: 4 cores
Memory: 8GB
Instances: 4 (auto-scaling)
Database (PostgreSQL):
CPU: 8 cores
Memory: 32GB
Storage: 500GB SSD
Instances: 1 primary + 2 read replicas
Redis:
CPU: 2 cores
Memory: 8GB
Instances: 2 (HA setup)
MeiliSearch:
CPU: 4 cores
Memory: 8GB
Instances: 2
Total:
CPU: 50+ cores
Memory: 120GB
Storage: 500GB
Enterprise Deployment (10,000 users)
Frontend:
CPU: 2 cores
Memory: 2GB
Instances: 8-16 (auto-scaling)
Backend (Hasura):
CPU: 4 cores
Memory: 8GB
Instances: 8-16 (auto-scaling)
Database (PostgreSQL):
CPU: 16 cores
Memory: 64GB
Storage: 1TB SSD
Instances: 1 primary + 3 read replicas
Redis:
CPU: 4 cores
Memory: 16GB
Instances: 3 (cluster mode)
MeiliSearch:
CPU: 8 cores
Memory: 16GB
Instances: 3
Load Balancer:
CPU: 2 cores
Memory: 2GB
Instances: 2 (HA)
Total:
CPU: 120+ cores
Memory: 300GB
Storage: 1TB+
Performance Benchmarks
Response Times (p95)
API Endpoints
Authentication:
POST /api/auth/signin 100ms
POST /api/auth/signup 150ms
POST /api/auth/signout 50ms
Channels:
GET /api/channels 200ms
POST /api/channels 250ms
GET /api/channels/{id} 100ms
Messages:
GET /api/messages 300ms
POST /api/messages 200ms
PUT /api/messages/{id} 150ms
DELETE /api/messages/{id} 100ms
Search:
GET /api/search 400ms
POST /api/search/semantic 800ms
Files:
POST /api/upload 2s (10MB file)
GET /api/files/{id} 500ms
WebSocket Operations
Connection establishment: 200ms
Message send/receive: 50ms
Typing indicator: 20ms
Presence update: 30ms
Throughput Benchmarks
Messages per Second
Sequential writes: 100 msg/s
Concurrent writes (10): 500 msg/s
Concurrent writes (100): 1,000 msg/s
Bulk insert (batch): 5,000 msg/s
API Requests per Second
Read operations: 1,000 req/s
Write operations: 500 req/s
Mixed workload: 750 req/s
Database Performance
Simple SELECT: 10,000 queries/s
Complex JOIN: 500 queries/s
Full-text search: 200 queries/s
INSERT: 1,000 inserts/s
UPDATE: 800 updates/s
Scaling Strategies
Horizontal Scaling
Frontend (Next.js)
- Strategy: Stateless, easy to scale
- Method: Add more instances behind load balancer
- Limit: No practical limit
- Configuration: Session stored in Redis/Database
Backend (Hasura)
- Strategy: Stateless, scales horizontally
- Method: Add more instances
- Limit: Limited by database connection pool
- Configuration: Use connection pooling (PgBouncer)
Database (PostgreSQL)
- Strategy: Read replicas for read scaling
- Method: Add read replicas, route reads appropriately
- Limit: Write operations limited to primary
- Configuration: Use read-write splitting in Hasura
Cache (Redis)
- Strategy: Cluster mode for large datasets
- Method: Redis Cluster or Redis Sentinel
- Limit: Memory-bound
- Configuration: Sharding by key pattern
Vertical Scaling
When to Scale Up (vs Out)
- Database primary (writes don't scale horizontally)
- Redis (if data fits in single instance)
- Search index (if corpus isn't too large)
Maximum Tested Instance Sizes
- Frontend: 4 cores, 8GB (diminishing returns beyond)
- Backend: 8 cores, 16GB (optimal)
- Database: 32 cores, 128GB (tested)
- Redis: 8 cores, 32GB (tested)
Auto-scaling Configuration
Kubernetes HPA (Horizontal Pod Autoscaler)
Frontend
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: frontend
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: frontend
minReplicas: 2
maxReplicas: 16
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
Backend (Hasura)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: hasura
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: hasura
minReplicas: 2
maxReplicas: 16
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Pods
pods:
metric:
name: hasura_active_websockets
target:
type: AverageValue
averageValue: "4000"
Scaling Triggers
- CPU: > 70% for 2 minutes → scale up
- Memory: > 80% for 2 minutes → scale up
- WebSocket connections: > 4,000 per instance → scale up
- Request queue: > 100 pending → scale up
- Response time: p95 > 2s for 5 minutes → scale up
Bottleneck Analysis
Known Bottlenecks
1. Database Write Capacity
Symptom: Slow INSERT/UPDATE operations Limit: ~1,000 writes/second on standard instance Solution:
- Batch writes where possible
- Use connection pooling (PgBouncer)
- Optimize indexes
- Consider write sharding for > 5,000 writes/s
2. WebSocket Connection Memory
Symptom: High memory usage with many connections Limit: ~10,000 connections per 8GB instance Solution:
- Horizontal scaling of backend
- Use sticky sessions on load balancer
- Implement connection throttling
3. Search Index Size
Symptom: Slow search queries as index grows Limit: ~1M messages per instance performs well Solution:
- Archive old messages
- Partition index by date
- Scale MeiliSearch horizontally
4. File Storage Bandwidth
Symptom: Slow uploads/downloads Limit: Network bandwidth dependent Solution:
- Use CDN for file delivery
- Implement multipart uploads
- Use S3 transfer acceleration
Breaking Points (From Testing)
Complete System Failure
-
10,000+ simultaneous WebSocket connections on single instance
- Causes: Memory exhaustion, connection limit
- Prevention: Load balancing, connection limits
-
5,000+ writes/second to database
- Causes: Write contention, lock timeouts
- Prevention: Connection pooling, batch writes, read replicas
-
50MB+ message payloads (if enabled)
- Causes: Memory pressure, timeout
- Prevention: Streaming uploads, payload size limits
Capacity Planning
Estimating User Capacity
Active Users Formula
Max Concurrent Users = (
(DB Connection Pool Size * 0.8) / Average Connections Per User
) * Number of Backend Instances
Example:
- DB Pool: 100 connections
- Avg connections/user: 2
- Backend instances: 4
- Max users = (100 * 0.8 / 2) * 4 = 160 concurrent users
Total Users (including inactive)
Total Users = Concurrent Users * (
Average Session Duration / Average User Active Time Per Day
)
Example:
- Concurrent: 160
- Avg session: 30 minutes
- Active time: 2 hours/day
- Total = 160 * (30 / 120) = 640 total users
Cost Estimation
AWS (us-east-1)
Small (100 users)
- Compute: $100/month
- Database: $150/month
- Storage: $20/month
- Bandwidth: $30/month
- Total: ~$300/month
Medium (1,000 users)
- Compute: $500/month
- Database: $600/month
- Storage: $100/month
- Bandwidth: $150/month
- Total: ~$1,350/month
Large (5,000 users)
- Compute: $2,000/month
- Database: $2,500/month
- Storage: $300/month
- Bandwidth: $800/month
- Total: ~$5,600/month
Enterprise (10,000 users)
- Compute: $5,000/month
- Database: $6,000/month
- Storage: $800/month
- Bandwidth: $2,000/month
- Total: ~$13,800/month
Performance Optimization
Database Optimizations
- Indexes: Add for all foreign keys and frequent WHERE clauses
- Connection Pooling: Use PgBouncer (transaction mode)
- Prepared Statements: Use for repeated queries
- Partitioning: Partition messages table by date
- Vacuuming: Regular VACUUM ANALYZE
- Statistics: Keep query planner statistics updated
Caching Strategy
Redis Cache:
- User sessions: 24 hour TTL
- Channel metadata: 1 hour TTL
- User profiles: 1 hour TTL
- Search results: 5 minute TTL
Application Cache:
- Static assets: CDN, 1 year TTL
- API responses: Vary header, ETags
- GraphQL queries: Persisted queries
CDN Configuration
CloudFront/Cloudflare:
- Static assets: /public/*
- User uploads: /files/*
- Profile pictures: /avatars/*
- Edge caching: 1 day - 1 year
- Compression: Gzip/Brotli
Monitoring Thresholds
Critical Alerts (Page Immediately)
- CPU > 90% for 5 minutes
- Memory > 95% for 2 minutes
- Database connections > 95% of pool
- Error rate > 5%
- p99 response time > 10s
Warning Alerts (Investigate Soon)
- CPU > 75% for 10 minutes
- Memory > 85% for 5 minutes
- Database connections > 80% of pool
- Error rate > 1%
- p95 response time > 2s
Info Alerts (Track Trends)
- CPU > 60% sustained
- Memory > 70% sustained
- Database connections > 50% of pool
- Slow query detected (> 1s)
Testing Validation
Load Test Results Summary
Last Run: February 2026 Test Duration: 2 hours Peak Load: 5,000 concurrent users
WebSocket Connections:
✅ 10,000 connections established
✅ p95 connection time: 850ms
✅ 0.01% connection errors
Message Throughput:
✅ 1,000 messages/second sustained
✅ p95 latency: 180ms
✅ 99.9% delivery rate
API Performance:
✅ p95 response time: 450ms
✅ 0.5% error rate
✅ 5,000 requests/second
File Uploads:
✅ 100 concurrent uploads
✅ p95 upload time: 4.2s (10MB)
✅ 99.8% success rate
Search:
✅ p95 query time: 380ms
✅ 100 concurrent searches
✅ Index size: 1M messages
Stress Test Results Summary
Peak Load Test (5,000 users)
- ✅ System stable at 5,000 concurrent
- ⚠️ Response times increased 2x at peak
- ✅ Auto-scaling worked correctly
- ✅ No system crashes
Soak Test (4 hours, 2,000 users)
- ✅ Memory usage stable
- ✅ No connection leaks
- ✅ Response times consistent
- ✅ No degradation over time
Burst Traffic Test
- ✅ Handled 10,000 user spike
- ✅ Auto-scaling responded in 45s
- ⚠️ Queue depth peaked at 500
- ✅ Recovered in 90s after burst
Recommendations
Immediate (Current Deployment)
- ✅ Implement connection pooling (PgBouncer)
- ✅ Add database read replicas
- ✅ Configure auto-scaling for frontend/backend
- ⚠️ Set up connection limits per user
- ⚠️ Implement request rate limiting
Short-term (1-3 months)
- Optimize slow database queries
- Implement message archiving (> 1 year old)
- Add CDN for static assets and files
- Set up database partitioning
- Implement GraphQL query complexity limits
Long-term (3-6 months)
- Multi-region deployment for global users
- Database sharding for > 10,000 writes/s
- Dedicated search cluster
- Message queue for async operations
- Edge computing for WebSocket connections
Disaster Recovery
Backup Strategy
- Database: WAL archiving, 5-minute RPO
- Files: S3 replication, 15-minute RPO
- Redis: RDB snapshots every hour
- Configuration: Git-managed, versioned
Recovery Time Objectives
- Database restore: < 5 minutes
- Full system recovery: < 15 minutes
- Regional failover: < 5 minutes (if configured)
References
- Load test results:
/tests/load/test-results/ - Stress test results:
/tests/stress/stress-results/ - Chaos test results:
/tests/chaos/chaos-results/ - Monitoring dashboards: http://grafana.localhost/
Last updated: February 2026 Based on testing conducted February 2026