Scaling Strategy - yuzvak/flashsale-service GitHub Wiki
Scaling Strategy
Current Architecture Limits
Single Instance Design
- Target: 10,000 RPS for flash sale events
- Database: PostgreSQL with 100 max connections
- Cache: Redis with 200 connection pool
- Concurrency: Optimized for high concurrent checkout/purchase operations
Performance Optimizations
1. Connection Pool Configuration
// PostgreSQL
MaxOpenConns: 100
MaxIdleConns: 50
ConnMaxLifetime: 1 hour
ConnMaxIdleTime: 30 minutes
// Redis
PoolSize: 100
MaxIdleConns: 50
Timeout: 30 seconds
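The PostgreSQL values above map directly onto the pool setters of Go's standard `database/sql` package. The sketch below assumes the service uses `database/sql`, which this page does not confirm. `PoolConfig`, `defaultPoolConfig`, and `applyPoolConfig` are hypothetical names introduced for illustration.

```go
package main

import (
	"database/sql"
	"fmt"
	"time"
)

// PoolConfig mirrors the PostgreSQL pool values listed above (hypothetical type).
type PoolConfig struct {
	MaxOpenConns    int
	MaxIdleConns    int
	ConnMaxLifetime time.Duration
	ConnMaxIdleTime time.Duration
}

// defaultPoolConfig returns the documented values.
func defaultPoolConfig() PoolConfig {
	return PoolConfig{
		MaxOpenConns:    100,
		MaxIdleConns:    50,
		ConnMaxLifetime: time.Hour,
		ConnMaxIdleTime: 30 * time.Minute,
	}
}

// applyPoolConfig copies the settings onto an already opened *sql.DB.
func applyPoolConfig(db *sql.DB, cfg PoolConfig) {
	db.SetMaxOpenConns(cfg.MaxOpenConns)
	db.SetMaxIdleConns(cfg.MaxIdleConns)
	db.SetConnMaxLifetime(cfg.ConnMaxLifetime)
	db.SetConnMaxIdleTime(cfg.ConnMaxIdleTime)
}

func main() {
	// No database is opened here; the sketch only prints the configuration.
	fmt.Printf("%+v\n", defaultPoolConfig())
}
```

In a real service, `applyPoolConfig` would be called once, right after `sql.Open` succeeds. Keeping `MaxOpenConns` at or below the server's connection limit (100 here) leaves no headroom for other clients, so a lower value may be safer when several processes share the database.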
2. Database Optimizations
Indexing Strategy
-- Hot path indexes for performance
CREATE INDEX idx_items_sale_sold ON items(sale_id) WHERE sold = FALSE;
-- PostgreSQL rejects non-immutable functions such as CURRENT_TIMESTAMP in a partial-index predicate, so index the column directly
CREATE INDEX idx_sales_active ON sales(ended_at);
CREATE INDEX idx_checkout_code ON checkout_attempts(checkout_code);
Transaction Isolation
- SERIALIZABLE isolation for purchase operations
- Atomic updates with conditional WHERE clauses
- Minimal transaction scope to reduce lock contention
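A conditional `WHERE` clause lets the database decide the winner of a race: the `UPDATE` matches a row only while `sold` is still `FALSE`, so exactly one caller sees one affected row. The sketch below is a minimal illustration, not the repository's actual code. The `id` column, `claimItem`, `execFn`, and `newStubExec` are assumptions; only `items` and `sold` appear in the indexes above.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"fmt"
)

// execFn matches the signature of (*sql.Tx).ExecContext and
// (*sql.DB).ExecContext, so tx.ExecContext can be passed directly.
type execFn func(ctx context.Context, query string, args ...any) (sql.Result, error)

// claimItem flips sold from FALSE to TRUE for one item and reports whether
// this caller won. The "id" column name is an assumption.
func claimItem(ctx context.Context, exec execFn, itemID string) (bool, error) {
	res, err := exec(ctx,
		`UPDATE items SET sold = TRUE WHERE id = $1 AND sold = FALSE`, itemID)
	if err != nil {
		return false, err
	}
	n, err := res.RowsAffected()
	if err != nil {
		return false, err
	}
	return n == 1, nil
}

// newStubExec is an in-memory stand-in for tx.ExecContext: the first call
// reports one affected row, every later call reports zero.
func newStubExec() execFn {
	sold := false
	return func(ctx context.Context, query string, args ...any) (sql.Result, error) {
		if sold {
			return driver.RowsAffected(0), nil
		}
		sold = true
		return driver.RowsAffected(1), nil
	}
}

func main() {
	exec := newStubExec()
	first, _ := claimItem(context.Background(), exec, "item-1")
	second, _ := claimItem(context.Background(), exec, "item-1")
	fmt.Println(first, second) // prints: true false
}
```

With a real connection, the call would run inside a transaction opened via `db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})`. Under SERIALIZABLE, PostgreSQL can abort a transaction with SQLSTATE 40001 (serialization failure), so callers usually need to retry the whole transaction.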
3. Bloom Filter Performance
// Parameters for 10,000 items with 0.1% false positive rate
BloomSize: 128KB (1 << 17)
BloomHashes: 7
UpdateInterval: 10 seconds
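The parameters above can be checked against the standard Bloom filter estimate p = (1 - e^(-kn/m))^k. The text does not say whether `1 << 17` counts bits or bytes, so the sketch evaluates both readings. Read as bits, the estimate is roughly 0.2%; read as bytes (128 KB), it is far below 0.1%. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// falsePositiveRate estimates p = (1 - e^(-k*n/m))^k for a Bloom filter with
// mBits bits, k hash functions, and n inserted items.
func falsePositiveRate(n, mBits, k int) float64 {
	exponent := -float64(k) * float64(n) / float64(mBits)
	return math.Pow(1-math.Exp(exponent), float64(k))
}

// optimalParams returns the bit count and hash count that minimise memory for
// n items at a target false positive rate p.
func optimalParams(n int, p float64) (mBits, k int) {
	m := -float64(n) * math.Log(p) / (math.Ln2 * math.Ln2)
	mBits = int(math.Ceil(m))
	k = int(math.Round(m / float64(n) * math.Ln2))
	return mBits, k
}

func main() {
	const items = 10_000
	fmt.Printf("1<<17 bits,  k=7: p=%.5f\n", falsePositiveRate(items, 1<<17, 7))
	fmt.Printf("1<<17 bytes, k=7: p=%.2e\n", falsePositiveRate(items, 8*(1<<17), 7))
	m, k := optimalParams(items, 0.001)
	fmt.Printf("optimal for p=0.001: m=%d bits, k=%d\n", m, k)
}
```

For 10,000 items at a 0.1% target, the formula gives about 144,000 bits (roughly 17.6 KB) and 10 hash functions, so a 128 KB filter with 7 hashes has ample headroom if the size is in bytes.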
Load Testing Results
Test Configuration
# Load testing commands available
make load-test-light # 200 users, 120s
make load-test # 400 users, 300s
make load-test-heavy # 800 users, 600s
make load-test-stress # 1500 users, 900s
Realistic Load Testing
- User Profiles: 10% aggressive, 60% normal, 30% browsers
- Behavior Simulation: Checkout probability, purchase delays
- Item Distribution: Popular vs normal item selection
- Database Integration: Real item availability tracking
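The 10/60/30 user mix can be reproduced by mapping a uniform random sample onto cumulative thresholds. This is a sketch of the idea only; the repository's load-test tooling and its language are not shown on this page, and `Profile` and `profileFor` are hypothetical names.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Profile names one of the simulated user types.
type Profile string

const (
	Aggressive Profile = "aggressive"
	Normal     Profile = "normal"
	Browser    Profile = "browser"
)

// profileFor maps a uniform sample r in [0, 1) onto the documented mix:
// 10% aggressive, 60% normal, 30% browsers.
func profileFor(r float64) Profile {
	switch {
	case r < 0.10:
		return Aggressive
	case r < 0.70:
		return Normal
	default:
		return Browser
	}
}

func main() {
	rng := rand.New(rand.NewSource(1)) // fixed seed for repeatable runs
	counts := map[Profile]int{}
	for i := 0; i < 10_000; i++ {
		counts[profileFor(rng.Float64())]++
	}
	fmt.Println(counts)
}
```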
Horizontal Scaling Considerations
1. Application Layer
Current Limitations for Multi-Instance:
- Bloom filter consistency across instances
- Distributed counter synchronization
- User checkout code sharing
Scaling Solutions:
- Redis as shared state store
- Database-driven item availability
- Distributed locking for purchases
2. Database Scaling
Current Approach:
- Single PostgreSQL instance with connection pooling
- Optimized queries with proper indexing
- Serializable transactions for data consistency
Future Options:
- Read replicas for sale/item queries
- Connection pooling with PgBouncer
- Partitioning by sale_id for historical data
3. Cache Layer Scaling
Current Redis Usage:
- Atomic operations via Lua scripts
- Distributed locks for purchase coordination
- Bloom filters for fast item rejection
Scaling Path:
- Redis Cluster for horizontal scaling
- Redis Sentinel for high availability
- Separate Redis instances by function
Monitoring and Observability
1. Performance Metrics
# Four Golden Signals implemented
http_request_duration_seconds # Latency
http_requests_total # Traffic
http_requests_total{status=~"5.."} # Errors
process_resident_memory_bytes # Saturation
2. Business Metrics
sale_items_sold_total # Items sold rate
checkout_attempts_total # Checkout funnel
purchase_success_total # Purchase conversion
redis_lock_duration_seconds # Lock contention
3. Database Performance
db_query_duration_seconds # Query performance
db_connections_active # Connection usage
db_connections_idle # Pool efficiency
Bottleneck Analysis
1. Database Write Operations
Current Mitigation:
- Batch item creation (1000 items per batch)
- Prepared statements for all queries
- Minimal transaction scope
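Batching at 1000 items per insert comes down to splitting the item slice into fixed-size chunks and issuing one statement per chunk. The helper below sketches only the splitting step. `chunk` is a hypothetical name, and the actual insert statement is not shown on this page.

```go
package main

import "fmt"

// chunk splits items into consecutive batches of at most size elements.
// The batches are sub-slices that share the backing array of items.
func chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil
	}
	batches := make([][]T, 0, (len(items)+size-1)/size)
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		batches = append(batches, items[start:end])
	}
	return batches
}

func main() {
	items := make([]int, 2500)
	for i, b := range chunk(items, 1000) {
		fmt.Printf("batch %d: %d items\n", i, len(b))
	}
}
```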
2. Redis Lock Contention
Current Strategy:
- 3-second lock timeout
- Single purchase attempt per checkout code
- Immediate lock release after operation
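The lock rules above follow a common Redis pattern: acquire with `SET key token NX PX 3000`, and release with a script that deletes the key only if it still holds the caller's token. The sketch below models those semantics in memory so it runs without Redis. It is an illustration under that assumption, not the repository's Lua scripts, and every type and function name in it is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const lockTTL = 3 * time.Second // the 3-second lock timeout

type lockEntry struct {
	token   string
	expires time.Time
}

// fakeClock is a manually advanced clock used to demonstrate expiry.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time          { return c.t }
func (c *fakeClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

// memLocks is an in-memory stand-in for the Redis lock store.
type memLocks struct {
	mu    sync.Mutex
	now   func() time.Time
	locks map[string]lockEntry
}

func newMemLocks(now func() time.Time) *memLocks {
	return &memLocks{now: now, locks: make(map[string]lockEntry)}
}

// acquire succeeds only if key is free or the previous holder's TTL expired.
func (l *memLocks) acquire(key, token string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if e, ok := l.locks[key]; ok && l.now().Before(e.expires) {
		return false
	}
	l.locks[key] = lockEntry{token: token, expires: l.now().Add(lockTTL)}
	return true
}

// release deletes the lock only when the caller still owns an unexpired lock,
// so a slow worker cannot remove a lock that has passed to someone else.
func (l *memLocks) release(key, token string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	e, ok := l.locks[key]
	if ok && e.token == token && l.now().Before(e.expires) {
		delete(l.locks, key)
		return true
	}
	return false
}

func main() {
	clock := &fakeClock{}
	locks := newMemLocks(clock.Now)
	fmt.Println(locks.acquire("checkout:abc", "worker-1")) // true
	fmt.Println(locks.acquire("checkout:abc", "worker-2")) // false
	clock.Advance(lockTTL)
	fmt.Println(locks.acquire("checkout:abc", "worker-2")) // true
	fmt.Println(locks.release("checkout:abc", "worker-1")) // false
}
```

The token check on release matters most when an operation outlives the timeout: without it, the original holder would delete a lock that a second worker has since acquired.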
3. Checkout Code Generation
Performance Approach:
- Cryptographic randomness for uniqueness
- 256-bit entropy in checkout codes
- In-memory code validation cache
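A 256-bit code is 32 bytes from a cryptographically secure source. The sketch below uses Go's `crypto/rand` and hex-encodes the result. The encoding and the name `newCheckoutCode` are assumptions, since the page does not describe the code format.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newCheckoutCode returns 32 random bytes (256 bits) from the operating
// system's CSPRNG, hex-encoded as a 64-character string.
func newCheckoutCode() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

func main() {
	code, err := newCheckoutCode()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(code), code)
}
```

At 256 bits, accidental collisions are negligible, so uniqueness does not need a database round trip before a code is issued.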
Crisis Response Procedures
1. High Load Scenarios
- Monitor the /health endpoint for service degradation
- Scale database connections if CPU allows
- Increase Redis connection pool size
- Enable request rate limiting if needed
2. Database Issues
- Graceful degradation to read-only operations
- Bloom filter fallback for sold item checks
- Cache-first approach for user limits
3. Redis Failures
- Disable bloom filter optimization
- Direct database queries for all operations
- Maintain core functionality without cache
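One way to keep core functionality when Redis is down is to treat any cache error as a miss and answer from the database instead. The sketch below shows that fallback with plain function values standing in for the two stores. `lookup`, `isSold`, and `errCacheDown` are hypothetical names, and the real service's interfaces are not shown on this page.

```go
package main

import (
	"errors"
	"fmt"
)

// errCacheDown simulates an unreachable cache in the demo below.
var errCacheDown = errors.New("cache unavailable")

// lookup reports whether an item is sold; implementations may fail.
type lookup func(itemID string) (sold bool, err error)

// isSold consults the cache first and falls back to the database whenever
// the cache returns an error, so a cache outage degrades speed, not results.
func isSold(itemID string, cache, db lookup) (bool, error) {
	if sold, err := cache(itemID); err == nil {
		return sold, nil
	}
	return db(itemID)
}

func main() {
	brokenCache := func(string) (bool, error) { return false, errCacheDown }
	database := func(id string) (bool, error) { return id == "item-1", nil }

	sold, err := isSold("item-1", brokenCache, database)
	fmt.Println(sold, err) // prints: true <nil>
}
```

Every request reaches PostgreSQL in this mode, so the 100-connection limit becomes the effective ceiling until the cache recovers.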
Future Scaling Paths
1. Microservice Decomposition
- Checkout Service: User cart management
- Purchase Service: Atomic transaction processing
- Inventory Service: Item availability tracking
- Analytics Service: Checkout attempt logging
2. Event-Driven Architecture
- Event sourcing for purchase history
- CQRS for read/write separation
- Message queues for async processing
3. Geographic Distribution
- Regional Redis clusters
- Database read replicas per region
- CDN for static content delivery