Deployment Infrastructure
PoC/Demo Deployment Architecture
Lightweight Demo Infrastructure
Platform: Replit Autoscale Deployment
- Runtime: Python 3.11 with Nix package management
- Resources: 2 vCPU, 4GB RAM, 20GB storage
- Port Configuration: 5000 (optimized for Replit's network)
- Auto-scaling: scales from zero to roughly 100 concurrent users without manual intervention
Database Strategy:
- Primary: Replit PostgreSQL (Neon-backed) for metadata and query history
- Vector Storage: Local ChromaDB with SQLite backend
- File Storage: Local filesystem with JSON cache files
- Backup: Automatic Replit snapshots
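To make the demo storage layout concrete, here is a minimal sketch of how the local vector store and JSON cache could be wired together; the paths and collection name are illustrative assumptions, not taken from the project.

# demo_storage.py - hypothetical sketch of the demo's local storage wiring
import json
import os
import chromadb

# Persistent ChromaDB client; data lives in a local SQLite-backed directory
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="website_pages")  # collection name is an assumption

# JSON cache file for crawled page content and metadata
CACHE_DIR = "cache"
os.makedirs(CACHE_DIR, exist_ok=True)

def save_cache(pages: list[dict]) -> None:
    with open(os.path.join(CACHE_DIR, "pages.json"), "w", encoding="utf-8") as f:
        json.dump(pages, f)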
Configuration:
# .streamlit/config.toml
[server]
headless = true
address = "0.0.0.0"
port = 5000
maxUploadSize = 50
[theme]
base = "light"
primaryColor = "#1f77b4"
Demo Limitations:
- Maximum 50 pages per crawl session
- 24-hour cache retention
- Single-user sessions only (no concurrent multi-user access)
- Basic rate limiting (1 request/second)
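The 1 request/second crawl limit can be enforced with a simple throttle between HTTP requests. The following is an illustrative sketch; the module and function names are assumptions, not the project's actual code.

# throttled_fetch.py - illustrative 1-request/second crawl throttle
import time
import requests

MIN_INTERVAL = 1.0  # seconds between requests, matching the demo's 1 req/s limit
_last_request = 0.0

def fetch(url: str) -> requests.Response:
    global _last_request
    wait = MIN_INTERVAL - (time.monotonic() - _last_request)
    if wait > 0:
        time.sleep(wait)
    _last_request = time.monotonic()
    return requests.get(url, timeout=10)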
Estimated Costs: up to about $25/month depending on usage
Production-Hardened Deployment Architecture
Multi-Tier Production Infrastructure
Primary Deployment Platform: AWS/GCP with Container Orchestration
Application Tier
Containerization Strategy:
# Production Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["streamlit", "run", "app.py", "--server.port=8080", "--server.address=0.0.0.0"]
Kubernetes Deployment:
- Pods: 3-5 replicas with horizontal auto-scaling
- Resources: 4 vCPU, 8GB RAM per pod
- Load Balancer: NGINX Ingress with SSL termination
- Health Checks: Liveness and readiness probes
Database Architecture
Primary Database:
PostgreSQL 14+ (AWS RDS/Google Cloud SQL)
- Configuration: db.r5.xlarge (4 vCPU, 32GB RAM)
- Storage: 500GB SSD with automated backups
- High Availability: Multi-AZ deployment with read replicas
- Connection Pooling: PgBouncer with 100 max connections
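On the application side, connecting through PgBouncer mostly means pointing the driver at the pooler and keeping the client-side pool small, since PgBouncer multiplexes server connections. A hedged sketch using SQLAlchemy; the hostname, credentials, and pool sizes are placeholder assumptions.

# db.py - sketch of connecting through PgBouncer rather than PostgreSQL directly
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql+psycopg2://app_user:app_password@pgbouncer.internal:6432/analyzer",  # placeholder DSN
    pool_size=10,          # small client-side pool; PgBouncer handles server-side pooling
    max_overflow=5,
    pool_pre_ping=True,    # detect connections dropped by the pooler
)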
Vector Database:
Managed ChromaDB or Pinecone
- ChromaDB: Self-hosted on dedicated instances
  - Specs: 8 vCPU, 32GB RAM, 1TB NVMe SSD
  - Clustering: 3-node cluster with replication
- Alternative: Pinecone (managed service)
  - Plan: Standard tier with 100M+ vectors
  - Performance: Sub-50ms query latency
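In either option the application talks to the vector store over the network instead of embedding it in-process. A minimal sketch for the self-hosted ChromaDB path, assuming a hypothetical internal hostname and collection name.

# vector_client.py - hedged sketch of a remote ChromaDB connection
import chromadb

# Connect to a self-hosted ChromaDB server instead of the local embedded store
client = chromadb.HttpClient(host="chroma.internal", port=8000)  # hostname is a placeholder
collection = client.get_or_create_collection(name="website_pages")

# Query by embedding; the embedding itself comes from the existing pipeline
def top_k(query_embedding: list[float], k: int = 5) -> dict:
    return collection.query(query_embeddings=[query_embedding], n_results=k)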
Caching Layer:
Redis Cluster
- Configuration: 3-node cluster with 16GB memory each
- Purpose: Session management, query caching, rate limiting
- Persistence: RDB + AOF for durability
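For query caching, one straightforward pattern is hashing the question into a Redis key with a short TTL. The sketch below assumes a hypothetical hostname, key scheme, and TTL.

# query_cache.py - illustrative Redis query cache
import hashlib
import redis

r = redis.Redis(host="redis.internal", port=6379, decode_responses=True)

def cached_answer(question: str, compute) -> str:
    key = "qa:" + hashlib.sha256(question.encode()).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return hit
    answer = compute(question)
    r.setex(key, 300, answer)  # 5-minute TTL as an example
    return answer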
Content Delivery & Storage
Object Storage:
AWS S3/Google Cloud Storage
- Cache Files: Compressed JSON with embeddings
- Static Assets: CSS, JS, images with CloudFront CDN
- Backup Strategy: Cross-region replication
- Lifecycle Policies: Automatic archival after 90 days
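Writing compressed JSON cache files to object storage can be as simple as gzipping the payload before upload. An illustrative boto3 sketch; the bucket name and key layout are assumptions.

# cache_upload.py - sketch of pushing compressed JSON cache files to S3
import gzip
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "website-analyzer-cache"  # hypothetical bucket name

def upload_cache(site_id: str, pages: list[dict]) -> None:
    body = gzip.compress(json.dumps(pages).encode("utf-8"))
    s3.put_object(
        Bucket=BUCKET,
        Key=f"caches/{site_id}.json.gz",
        Body=body,
        ContentType="application/json",
        ContentEncoding="gzip",
    )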
CDN Configuration:
- Primary: CloudFront/Cloud CDN
- Edge Locations: Global distribution
- Caching: Static assets (1 year), API responses (5 minutes)
- Compression: Gzip/Brotli for text content
Security & Monitoring
API Security:
# Rate limiting configuration
RATE_LIMITS = {
    'crawling': '5 per hour per user',
    'questions': '100 per hour per user',
    'search': '1000 per hour per user'
}

# Authentication
AUTH_PROVIDERS = ['OAuth2', 'API_Key', 'JWT']
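One way to enforce these limits is a fixed-window counter in Redis keyed by user and action. The sketch below is illustrative rather than the project's actual implementation; the key scheme and hostname are assumptions.

# rate_limit.py - illustrative fixed-window rate limiter backed by Redis
import redis

r = redis.Redis(host="redis.internal", port=6379)

LIMITS_PER_HOUR = {"crawling": 5, "questions": 100, "search": 1000}

def allow(user_id: str, action: str) -> bool:
    key = f"rl:{action}:{user_id}"
    count = r.incr(key)          # atomic counter per user and action
    if count == 1:
        r.expire(key, 3600)      # fixed one-hour window
    return count <= LIMITS_PER_HOUR[action]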
Infrastructure Security:
- WAF: AWS WAF/Google Cloud Armor with DDoS protection
- VPC: Private subnets for database and application tiers
- Secrets Management: AWS Secrets Manager/Google Secret Manager
- SSL/TLS: Let's Encrypt with automatic renewal
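With a secrets manager in place, the application fetches credentials at startup instead of reading them from environment files. A minimal AWS Secrets Manager sketch; the secret name is a placeholder.

# secrets.py - sketch of reading database credentials from AWS Secrets Manager
import json
import boto3

def get_database_credentials(secret_name: str = "website-analyzer/db") -> dict:
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])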
Monitoring Stack:
- Application Monitoring: DataDog/New Relic
- Infrastructure: Prometheus + Grafana
- Logging: ELK Stack (Elasticsearch, Logstash, Kibana)
- Alerting: PagerDuty integration for critical issues
Performance Optimization
Auto-Scaling Configuration:
# Kubernetes HPA Configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-analyzer        # assumes the app Deployment is named web-analyzer
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-analyzer
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
Database Optimization:
- Connection Pooling: PgBouncer with 500 max connections
- Query Optimization: Automated EXPLAIN analysis
- Indexing Strategy: Composite indexes on frequently queried columns
- Partitioning: Date-based partitioning for large tables
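The indexing and partitioning points above translate into ordinary DDL. A hedged sketch with hypothetical table and column names, executed from Python.

# schema_tuning.py - illustrative DDL for composite indexing and date-based partitioning
import psycopg2

DDL = [
    # Composite index on columns assumed to be queried together
    "CREATE INDEX IF NOT EXISTS idx_pages_site_crawled ON pages (site_id, crawled_at);",
    # Parent table partitioned by date, plus one monthly partition as an example
    """CREATE TABLE IF NOT EXISTS query_history (
           id BIGSERIAL,
           user_id TEXT,
           asked_at TIMESTAMPTZ NOT NULL,
           question TEXT
       ) PARTITION BY RANGE (asked_at);""",
    """CREATE TABLE IF NOT EXISTS query_history_2025_01
       PARTITION OF query_history
       FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');""",
]

with psycopg2.connect("postgresql://app_user:app_password@pgbouncer.internal:6432/analyzer") as conn:
    with conn.cursor() as cur:
        for statement in DDL:
            cur.execute(statement)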
Disaster Recovery & Backup
Backup Strategy:
- Database: Automated daily backups with 30-day retention
- Vector Database: Weekly full backups with incremental daily
- Application State: Container image versioning and rollback capability
- File Storage: Cross-region replication with versioning
Recovery Procedures:
- RTO (Recovery Time Objective): 15 minutes
- RPO (Recovery Point Objective): 1 hour
- Failover: Automated with health check triggers
- Testing: Monthly disaster recovery drills
Scaling Scenarios & Resource Planning
Small Scale (100-500 users)
Infrastructure:
- 3 application pods (2 vCPU, 4GB RAM each)
- PostgreSQL: db.t3.large (2 vCPU, 8GB RAM)
- ChromaDB: Single instance (4 vCPU, 16GB RAM)
- Redis: Single instance (2GB memory)
Estimated Costs: up to about $1,200/month
Medium Scale (500-5,000 users)
Infrastructure:
- 5-10 application pods with auto-scaling
- PostgreSQL: db.r5.xlarge with read replicas
- ChromaDB: 3-node cluster
- Redis: 3-node cluster
- CDN and advanced monitoring
Estimated Costs: $2,500-4,000/month
Large Scale (5,000+ users)
Infrastructure:
- 10-50 application pods across multiple zones
- PostgreSQL: Multi-AZ with multiple read replicas
- Managed vector database (Pinecone/Weaviate Cloud)
- Full observability and security stack
- Dedicated DevOps engineer
Estimated Costs: $8,000-15,000/month
DevOps & CI/CD Pipeline
Development Workflow
GitHub Actions Pipeline
name: Production Deployment
on:
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run Tests
        run: |
          python -m pytest tests/
          python -m flake8 src/
  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build Docker Image
        run: docker build -t web-analyzer:$GITHUB_SHA .
  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Deploy to Kubernetes
        run: kubectl apply -f k8s/
Environment Management
- Development: Local Docker Compose
- Staging: Kubernetes cluster with production data subset
- Production: Full production infrastructure with blue-green deployment
Performance Benchmarks
- Crawling: 50-100 pages/minute per worker
- Query Response: <2 seconds for most questions
- Concurrent Users: 1,000+ with proper scaling
- Embedding Processing: 10,000 chunks/minute
- Database Queries: <100ms average response time
This infrastructure design provides a clear evolution path from simple demo to enterprise-grade production deployment, with appropriate cost scaling and performance optimization at each tier.