Infrastructure & DevOps - sgajbi/portfolio-analytics-system GitHub Wiki

📄 Updated Page: Infrastructure-&-DevOps.md


Overview

This phase focuses on productionizing the Portfolio Analytics System through containerization, continuous integration and deployment (CI/CD), observability and monitoring, and scalable deployments.

Recent improvements include standardized logging with correlation IDs and enhanced operational traceability across services.


Key Responsibilities

  • Dockerize all microservices with optimized, production-ready Dockerfiles.
  • Set up a CI/CD pipeline for automated builds, tests, and deployments.
  • Implement centralized logging with correlation IDs for distributed tracing.
  • Persist correlation IDs in the processed_events table for operational debugging.
  • Deploy services on Kubernetes with proper manifests and Helm charts.
  • Enable autoscaling of Kafka consumers based on event backlog (consumer lag).

Technology Stack & Tools

  • Docker: Containerize services using multi-stage builds.
  • CI/CD: GitHub Actions, GitLab CI, or Jenkins pipelines.
  • Logging: Splunk or the ELK stack (Elasticsearch, Logstash, Kibana), with every log entry enriched with a correlation_id field.
  • Monitoring: Prometheus and Grafana.
  • Tracing: correlation IDs propagated through Kafka message headers and logs.
  • Container Orchestration: Kubernetes (AKS, EKS, or GKE).
  • Autoscaling: KEDA for Kafka event-driven scaling.
  • Secrets Management: Vault or cloud provider secrets manager.
  • Infrastructure as Code: Helm charts and Kubernetes manifests.

Implementation Details

Dockerization

  • Each service has a Dockerfile optimized for minimal image size and faster builds.
  • Use environment variables and secrets injection for configuration.
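The configuration point above can be illustrated with a small settings helper. The sketch below makes several assumptions:

- the services are written in Python;
- secrets are mounted as files whose path is passed in a `<NAME>_FILE` variable, a common Docker secrets convention that this page does not specify;
- the helper `read_setting` and all setting names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def read_setting(name: str, default: Optional[str] = None) -> str:
    """Resolve a configuration value (hypothetical helper).

    Lookup order, an assumed convention rather than one this wiki defines:
      1. the environment variable `name`;
      2. the file whose path is in `<name>_FILE` (e.g. a mounted secret);
      3. `default`, if one was given.
    """
    value = os.environ.get(name)
    if value is not None:
        return value

    secret_path = os.environ.get(f"{name}_FILE")
    if secret_path:
        # Secret files often end with a trailing newline; strip surrounding whitespace.
        return Path(secret_path).read_text(encoding="utf-8").strip()

    if default is not None:
        return default
    raise KeyError(f"missing required setting: {name}")
```

Suppose a container sets `DB_PASSWORD_FILE=/run/secrets/db_password` (a hypothetical name). Then `read_setting("DB_PASSWORD")` returns that file's contents. By contrast, `read_setting("LOG_LEVEL", "INFO")` falls back to `"INFO"` when nothing is set. Reading secrets from mounted files keeps their values out of the image and out of the environment shown by `docker inspect`.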

CI/CD Pipeline

  • Automatically build Docker images on commit.
  • Run unit and integration tests.
  • Push images to container registry.
  • Deploy to Kubernetes clusters using Helm or kubectl.

Logging & Observability

  • Correlation ID Standardization:

    • Each service either generates a new correlation_id or propagates the one it received, using the format <svc-shortname>:<uuid>.

    • Every service writes it into its log lines using the pattern:

      [LEVEL] [corr_id=ING:uuid] ServiceName - Message
      
    • The correlation ID travels between services in Kafka message headers and is returned in API responses in the X-Correlation-ID header.

  • Centralized Logging:

    • All logs are shipped to Splunk/ELK with the correlation_id field indexed.
    • This makes it possible to follow a single event across multiple services by querying one correlation ID.

  • Debugging via Database:

    • The processed_events table stores the correlation_id of every processed event.

    • Operators can look up an event's processing status with the following query, then search Splunk/ELK for its logs by the same ID:

      SELECT * FROM processed_events WHERE correlation_id='ING:uuid';

Monitoring

  • Collect service metrics with Prometheus exporters.

  • Configure Grafana dashboards for:

    • Event processing latency.
    • Kafka topic lag.
    • Error rate by service.
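For a consumer group, Kafka topic lag is computed per partition as the log-end offset minus the group's committed offset. The Python sketch below shows that arithmetic. Its assumptions:

- offsets arrive as plain dictionaries, supplied by a Kafka admin client or a lag exporter (that wiring is not shown);
- a partition with no committed offset is treated as committed at 0, which is a simplification;
- the function names are hypothetical.

```python
from typing import Dict, Mapping


def consumer_lag(end_offsets: Mapping[int, int], committed: Mapping[int, int]) -> Dict[int, int]:
    """Per-partition lag: log-end offset minus the group's committed offset.

    Partitions with no committed offset are treated as committed at 0, a
    simplification: the real starting point depends on the consumer's
    auto.offset.reset setting and on log retention.
    """
    return {
        partition: max(end - committed.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


def total_lag(lag_by_partition: Mapping[int, int]) -> int:
    """Sum of per-partition lag for the whole topic."""
    return sum(lag_by_partition.values())
```

For example, `consumer_lag({0: 120, 1: 95, 2: 40}, {0: 100, 1: 95})` gives `{0: 20, 1: 0, 2: 40}`, a total lag of 60 events. A dashboard that shows only the total hides a single stuck partition; keeping the per-partition values makes it visible.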

Kubernetes Deployment

  • Define manifests for Deployments, Services, ConfigMaps, and Secrets.
  • Use Helm charts for templated deployments.
  • Configure Kafka consumer autoscaling with KEDA based on consumer group lag.
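The following Python sketch approximates the replica count a KEDA Kafka trigger converges to. The HPA that KEDA manages aims for roughly `ceil(total_lag / lagThreshold)` replicas. That target is bounded by the ScaledObject's minimum and maximum replica counts and, unless idle consumers are allowed, by the topic's partition count. Treat this as a capacity-planning approximation only:

- it ignores HPA tolerance, stabilization windows, and KEDA's activation threshold;
- the function name and the default values are illustrative assumptions.

```python
import math


def estimate_replicas(
    total_lag: int,
    lag_threshold: int,
    partitions: int,
    min_replicas: int = 0,
    max_replicas: int = 100,
    allow_idle_consumers: bool = False,
) -> int:
    """Rough steady-state replica count for a KEDA Kafka trigger.

    The HPA managed by KEDA targets an average lag of `lag_threshold` per
    replica, i.e. ceil(total_lag / lag_threshold). Without idle consumers
    allowed, replicas beyond the partition count would receive no
    partitions, so the estimate is capped at `partitions`. HPA tolerance,
    stabilization windows and activation thresholds are ignored.
    """
    if lag_threshold <= 0:
        raise ValueError("lag_threshold must be positive")
    desired = math.ceil(total_lag / lag_threshold)
    if not allow_idle_consumers:
        desired = min(desired, partitions)
    return max(min_replicas, min(desired, max_replicas))
```

Consider a total lag of 60 with `lag_threshold=10` on a 3-partition topic. The estimate is 3 replicas rather than 6, because a consumer group cannot usefully run more active consumers than there are partitions.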

Testing & Validation

  • Load testing of the entire analytics pipeline.
  • Failover and recovery drills.
  • Monitoring alert rules for SLA adherence.
  • Validation of correlation ID propagation and log visibility in Splunk/ELK.
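Part of the correlation ID validation can be automated. The approach is to parse exported log lines against the documented pattern, then check that every service expected to handle an event logged its ID. The Python sketch below assumes the following:

- service names contain no whitespace;
- the ID suffix is a canonical UUID;
- the function names and the service names in the example are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Documented pattern: [LEVEL] [corr_id=<svc-shortname>:<uuid>] ServiceName - Message
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z]+)\] "
    r"\[corr_id=(?P<corr_id>[A-Za-z0-9_-]+:"
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})\] "
    r"(?P<service>\S+) - (?P<message>.*)$"
)


def services_by_correlation_id(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each correlation ID to the set of services that logged it."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # lines not following the pattern are skipped
            seen[match.group("corr_id")].add(match.group("service"))
    return dict(seen)


def missing_services(lines: Iterable[str], corr_id: str, expected: Iterable[str]) -> Set[str]:
    """Expected services that never logged `corr_id` (empty set = fully propagated)."""
    return set(expected) - services_by_correlation_id(lines).get(corr_id, set())
```

Suppose log lines from `IngestionService` and `PersistenceService` both carry the same ID. Then `missing_services(lines, cid, {"IngestionService", "PositionCalculator"})` returns `{"PositionCalculator"}`, flagging the service that never logged that ID.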