Deployment Infrastructure
PoC/Demo Deployment Architecture
Lightweight Demo Infrastructure
Platform: Replit Autoscale Deployment
- Runtime: Python 3.11 with Nix package management
- Resources: 2 vCPU, 4GB RAM, 20GB storage
- Port Configuration: 5000 (optimized for Replit's network)
- Auto-scaling: scales from zero to roughly 100 concurrent users without manual intervention
Database Strategy:
- Primary: Replit PostgreSQL (Neon-backed) for metadata and query history
- Vector Storage: Local ChromaDB with SQLite backend
- File Storage: Local filesystem with JSON cache files
- Backup: Automatic Replit snapshots
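To make the demo storage layout concrete, here is a minimal sketch of how the local vector store and JSON cache could be wired together; the paths and collection name are illustrative assumptions, not taken from the project.

# demo_storage.py - hypothetical sketch of the demo's local storage wiring
import json
import os
import chromadb

# Persistent ChromaDB client; data lives in a local SQLite-backed directory
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="website_pages")  # collection name is an assumption

# JSON cache file for crawled page content and metadata
CACHE_DIR = "cache"
os.makedirs(CACHE_DIR, exist_ok=True)

def save_cache(pages: list[dict]) -> None:
    with open(os.path.join(CACHE_DIR, "pages.json"), "w", encoding="utf-8") as f:
        json.dump(pages, f)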
Configuration:
# .streamlit/config.toml
[server]
headless = true
address = "0.0.0.0"
port = 5000
maxUploadSize = 50
[theme]
base = "light"
primaryColor = "#1f77b4"
Demo Limitations:
- Maximum 50 pages per crawl session
- 24-hour cache retention
- Single-user sessions only (no concurrent multi-user access)
- Basic rate limiting (1 request/second)
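The 1 request/second crawl limit can be enforced with a simple throttle between HTTP requests. The following is an illustrative sketch; the module and function names are assumptions, not the project's actual code.

# throttled_fetch.py - illustrative 1-request/second crawl throttle
import time
import requests

MIN_INTERVAL = 1.0  # seconds between requests, matching the demo's 1 req/s limit
_last_request = 0.0

def fetch(url: str) -> requests.Response:
    global _last_request
    wait = MIN_INTERVAL - (time.monotonic() - _last_request)
    if wait > 0:
        time.sleep(wait)
    _last_request = time.monotonic()
    return requests.get(url, timeout=10)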
Estimated Costs: up to about $25/month depending on usage
Production-Hardened Deployment Architecture
Multi-Tier Production Infrastructure
Primary Deployment Platform: AWS/GCP with Container Orchestration
Application Tier
Containerization Strategy:
# Production Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["streamlit", "run", "app.py", "--server.port=8080", "--server.address=0.0.0.0"]
Kubernetes Deployment:
- Pods: 3-5 replicas with horizontal auto-scaling
- Resources: 4 vCPU, 8GB RAM per pod
- Load Balancer: NGINX Ingress with SSL termination
- Health Checks: Liveness and readiness probes
Database Architecture
Primary Database:
PostgreSQL 14+ (AWS RDS/Google Cloud SQL)
- Configuration: db.r5.xlarge (4 vCPU, 32GB RAM)
- Storage: 500GB SSD with automated backups
- High Availability: Multi-AZ deployment with read replicas
- Connection Pooling: PgBouncer with 100 max connections
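On the application side, connecting through PgBouncer mostly means pointing the driver at the pooler and keeping the client-side pool small, since PgBouncer multiplexes server connections. A hedged sketch using SQLAlchemy; the hostname, credentials, and pool sizes are placeholder assumptions.

# db.py - sketch of connecting through PgBouncer rather than PostgreSQL directly
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql+psycopg2://app_user:app_password@pgbouncer.internal:6432/analyzer",  # placeholder DSN
    pool_size=10,          # small client-side pool; PgBouncer handles server-side pooling
    max_overflow=5,
    pool_pre_ping=True,    # detect connections dropped by the pooler
)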
Vector Database:
Managed ChromaDB or Pinecone
- ChromaDB: Self-hosted on dedicated instances
  - Specs: 8 vCPU, 32GB RAM, 1TB NVMe SSD
  - Clustering: 3-node cluster with replication
- Alternative: Pinecone (managed service)
  - Plan: Standard tier with 100M+ vectors
  - Performance: Sub-50ms query latency
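In either option the application talks to the vector store over the network instead of embedding it in-process. A minimal sketch for the self-hosted ChromaDB path, assuming a hypothetical internal hostname and collection name.

# vector_client.py - hedged sketch of a remote ChromaDB connection
import chromadb

# Connect to a self-hosted ChromaDB server instead of the local embedded store
client = chromadb.HttpClient(host="chroma.internal", port=8000)  # hostname is a placeholder
collection = client.get_or_create_collection(name="website_pages")

# Query by embedding; the embedding itself comes from the existing pipeline
def top_k(query_embedding: list[float], k: int = 5) -> dict:
    return collection.query(query_embeddings=[query_embedding], n_results=k)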
Caching Layer:
Redis Cluster
- Configuration: 3-node cluster with 16GB memory each
- Purpose: Session management, query caching, rate limiting
- Persistence: RDB + AOF for durability
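For query caching, one straightforward pattern is hashing the question into a Redis key with a short TTL. The sketch below assumes a hypothetical hostname, key scheme, and TTL.

# query_cache.py - illustrative Redis query cache
import hashlib
import redis

r = redis.Redis(host="redis.internal", port=6379, decode_responses=True)

def cached_answer(question: str, compute) -> str:
    key = "qa:" + hashlib.sha256(question.encode()).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return hit
    answer = compute(question)
    r.setex(key, 300, answer)  # 5-minute TTL as an example
    return answer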
Content Delivery & Storage
Object Storage:
AWS S3/Google Cloud Storage
- Cache Files: Compressed JSON with embeddings
- Static Assets: CSS, JS, images with CloudFront CDN
- Backup Strategy: Cross-region replication
- Lifecycle Policies: Automatic archival after 90 days
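Writing compressed JSON cache files to object storage can be as simple as gzipping the payload before upload. An illustrative boto3 sketch; the bucket name and key layout are assumptions.

# cache_upload.py - sketch of pushing compressed JSON cache files to S3
import gzip
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "website-analyzer-cache"  # hypothetical bucket name

def upload_cache(site_id: str, pages: list[dict]) -> None:
    body = gzip.compress(json.dumps(pages).encode("utf-8"))
    s3.put_object(
        Bucket=BUCKET,
        Key=f"caches/{site_id}.json.gz",
        Body=body,
        ContentType="application/json",
        ContentEncoding="gzip",
    )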
CDN Configuration:
- Primary: CloudFront/Cloud CDN
- Edge Locations: Global distribution
- Caching: Static assets (1 year), API responses (5 minutes)
- Compression: Gzip/Brotli for text content
Security & Monitoring
API Security:
# Rate limiting configuration
RATE_LIMITS = {
    'crawling': '5 per hour per user',
    'questions': '100 per hour per user',
    'search': '1000 per hour per user'
}

# Authentication
AUTH_PROVIDERS = ['OAuth2', 'API_Key', 'JWT']
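One way to enforce these limits is a fixed-window counter in Redis keyed by user and action. The sketch below is illustrative rather than the project's actual implementation; the key scheme and hostname are assumptions.

# rate_limit.py - illustrative fixed-window rate limiter backed by Redis
import redis

r = redis.Redis(host="redis.internal", port=6379)

LIMITS_PER_HOUR = {"crawling": 5, "questions": 100, "search": 1000}

def allow(user_id: str, action: str) -> bool:
    key = f"rl:{action}:{user_id}"
    count = r.incr(key)          # atomic counter per user and action
    if count == 1:
        r.expire(key, 3600)      # fixed one-hour window
    return count <= LIMITS_PER_HOUR[action]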
Infrastructure Security:
- WAF: AWS WAF/Google Cloud Armor with DDoS protection
- VPC: Private subnets for database and application tiers
- Secrets Management: AWS Secrets Manager/Google Secret Manager
- SSL/TLS: Let's Encrypt with automatic renewal
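With a secrets manager in place, the application fetches credentials at startup instead of reading them from environment files. A minimal AWS Secrets Manager sketch; the secret name is a placeholder.

# secrets.py - sketch of reading database credentials from AWS Secrets Manager
import json
import boto3

def get_database_credentials(secret_name: str = "website-analyzer/db") -> dict:
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])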
Monitoring Stack:
- Application Monitoring: DataDog/New Relic
- Infrastructure: Prometheus + Grafana
- Logging: ELK Stack (Elasticsearch, Logstash, Kibana)
- Alerting: PagerDuty integration for critical issues
Performance Optimization
Auto-Scaling Configuration:
# Kubernetes HPA Configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-analyzer        # assumes the app Deployment is named web-analyzer
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-analyzer
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
Database Optimization:
- Connection Pooling: PgBouncer with 500 max connections
- Query Optimization: Automated EXPLAIN analysis
- Indexing Strategy: Composite indexes on frequently queried columns
- Partitioning: Date-based partitioning for large tables
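The indexing and partitioning points above translate into ordinary DDL. A hedged sketch with hypothetical table and column names, executed from Python.

# schema_tuning.py - illustrative DDL for composite indexing and date-based partitioning
import psycopg2

DDL = [
    # Composite index on columns assumed to be queried together
    "CREATE INDEX IF NOT EXISTS idx_pages_site_crawled ON pages (site_id, crawled_at);",
    # Parent table partitioned by date, plus one monthly partition as an example
    """CREATE TABLE IF NOT EXISTS query_history (
           id BIGSERIAL,
           user_id TEXT,
           asked_at TIMESTAMPTZ NOT NULL,
           question TEXT
       ) PARTITION BY RANGE (asked_at);""",
    """CREATE TABLE IF NOT EXISTS query_history_2025_01
       PARTITION OF query_history
       FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');""",
]

with psycopg2.connect("postgresql://app_user:app_password@pgbouncer.internal:6432/analyzer") as conn:
    with conn.cursor() as cur:
        for statement in DDL:
            cur.execute(statement)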
Disaster Recovery & Backup
Backup Strategy:
- Database: Automated daily backups with 30-day retention
- Vector Database: Weekly full backups with incremental daily
- Application State: Container image versioning and rollback capability
- File Storage: Cross-region replication with versioning
Recovery Procedures:
- RTO (Recovery Time Objective): 15 minutes
- RPO (Recovery Point Objective): 1 hour
- Failover: Automated with health check triggers
- Testing: Monthly disaster recovery drills
Scaling Scenarios & Resource Planning
Small Scale (100-500 users)
Infrastructure:
- 3 application pods (2 vCPU, 4GB RAM each)
- PostgreSQL: db.t3.large (2 vCPU, 8GB RAM)
- ChromaDB: Single instance (4 vCPU, 16GB RAM)
- Redis: Single instance (2GB memory)
Estimated Costs: up to about $1,200/month
Medium Scale (500-5,000 users)
Infrastructure:
- 5-10 application pods with auto-scaling
- PostgreSQL: db.r5.xlarge with read replicas
- ChromaDB: 3-node cluster
- Redis: 3-node cluster
- CDN and advanced monitoring
Estimated Costs: $2,500-4,000/month
Large Scale (5,000+ users)
Infrastructure:
- 10-50 application pods across multiple zones
- PostgreSQL: Multi-AZ with multiple read replicas
- Managed vector database (Pinecone/Weaviate Cloud)
- Full observability and security stack
- Dedicated DevOps engineer
Estimated Costs: $8,000-15,000/month
DevOps & CI/CD Pipeline
Development Workflow
GitHub Actions Pipeline
name: Production Deployment
on:
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run Tests
        run: |
          python -m pytest tests/
          python -m flake8 src/
  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build Docker Image
        run: docker build -t web-analyzer:$GITHUB_SHA .
  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Deploy to Kubernetes
        run: kubectl apply -f k8s/
Environment Management
- Development: Local Docker Compose
- Staging: Kubernetes cluster with production data subset
- Production: Full production infrastructure with blue-green deployment
Performance Benchmarks
- Crawling: 50-100 pages/minute per worker
- Query Response: <2 seconds for most questions
- Concurrent Users: 1,000+ with proper scaling
- Embedding Processing: 10,000 chunks/minute
- Database Queries: <100ms average response time
This infrastructure design provides a clear evolution path from simple demo to enterprise-grade production deployment, with appropriate cost scaling and performance optimization at each tier.