Infrastructure & DevOps - sgajbi/portfolio-analytics-system GitHub Wiki


Overview

This phase focuses on productionizing the Portfolio Analytics System with containerization, continuous integration/deployment (CI/CD), observability, monitoring, and scalable deployments.

Recent improvements include standardized logging with correlation IDs, which lets operators trace a single event across services.

Key Responsibilities

- Dockerize all microservices with optimized, production-ready Dockerfiles.
- Set up a CI/CD pipeline for automated builds, tests, and deployments.
- Implement centralized logging with correlation IDs for distributed tracing.
- Implement correlation ID persistence in processed_events for operational debugging.
- Deploy services on Kubernetes with proper manifests and Helm charts.
- Enable autoscaling based on Kafka event load.

Technology Stack & Tools

- Docker: Containerize services using multi-stage builds.
- CI/CD: GitHub Actions, GitLab CI, or Jenkins pipelines.
- Logging: Splunk or the ELK stack (Elasticsearch, Logstash, Kibana); all logs are enriched with correlation_id.
- Monitoring: Prometheus and Grafana.
- Tracing: Correlation IDs propagated through Kafka headers and logs.
- Container Orchestration: Kubernetes (AKS, EKS, or GKE).
- Autoscaling: KEDA for Kafka event-driven scaling.
- Secrets Management: Vault or a cloud provider's secrets manager.
- Infrastructure as Code: Helm charts and Kubernetes manifests.

Implementation Details

Dockerization

- Each service has a Dockerfile optimized for minimal image size and faster builds.
- Use environment variables and secrets injection for configuration.
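This page does not name the configuration variables or say how secrets reach the container. The following is a minimal Python sketch. It assumes that each setting arrives either as a plain environment variable or, for secrets, as a mounted file whose path is given in a companion `<NAME>_FILE` variable. That is a common container convention, assumed here rather than taken from this project. `read_setting`, `ServiceConfig`, and the variable names are hypothetical.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


def read_setting(name: str, default: Optional[str] = None) -> str:
    """Return a setting from NAME, or from the file whose path is in NAME_FILE.

    The NAME_FILE convention is an assumption for this sketch.
    """
    file_path = os.environ.get(f"{name}_FILE")
    if file_path:
        # Secret mounted as a file (e.g. a Kubernetes Secret volume).
        return Path(file_path).read_text(encoding="utf-8").strip()
    value = os.environ.get(name, default)
    if value is None:
        raise RuntimeError(f"missing required setting: {name}")
    return value


@dataclass(frozen=True)
class ServiceConfig:
    kafka_bootstrap_servers: str
    database_url: str


def load_config() -> ServiceConfig:
    # Hypothetical variable names, chosen for illustration only.
    return ServiceConfig(
        kafka_bootstrap_servers=read_setting("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092"),
        database_url=read_setting("DATABASE_URL"),
    )
```

Reading secrets from files rather than plain environment variables keeps them out of process listings and `docker inspect` output. The non-secret settings can still be overridden per environment through ConfigMaps.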

CI/CD Pipeline

- Automatically build Docker images on commit.
- Run unit and integration tests.
- Push images to container registry.
- Deploy to Kubernetes clusters using Helm or kubectl.
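The pipeline stages above run in a fixed order and should stop at the first failure. The following Python sketch shows that ordering under several assumptions: the registry, the `services/<name>` build context, the `charts/<name>` chart path, the `pytest` test command, and the `image.tag` chart value are all hypothetical. Only tests that run inside the built image are covered. Integration tests against Kafka and the database would need their own stage.

```python
import shlex
import subprocess
from typing import List


def pipeline_commands(service: str, git_sha: str, registry: str) -> List[List[str]]:
    """Return the build -> test -> push -> deploy steps for one service.

    Paths, test command, and the image.tag chart value are hypothetical.
    """
    image = f"{registry}/{service}:{git_sha}"
    return [
        ["docker", "build", "-t", image, f"services/{service}"],
        # Run the unit tests inside the freshly built image.
        ["docker", "run", "--rm", image, "pytest", "-q"],
        ["docker", "push", image],
        ["helm", "upgrade", "--install", service, f"charts/{service}",
         "--set", f"image.tag={git_sha}"],
    ]


def run_pipeline(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print("+", shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failing step.
            subprocess.run(cmd, check=True)
```

Tagging images with the commit SHA makes every deployment traceable to a specific build. `helm upgrade --install` makes the deploy step idempotent: it creates the release on the first run and upgrades it afterwards. In GitHub Actions, GitLab CI, or Jenkins, the same steps would normally be written as jobs in the tool's own configuration format.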

Logging & Observability

Correlation ID Standardization:
- All services generate or propagate a correlation_id in the format `<svc-shortname>:<uuid>`.
- Every service includes it in each log line, using the pattern:
```
[LEVEL] [corr_id=ING:uuid] ServiceName - Message
```
- The correlation ID is carried between services in Kafka message headers and returned in API responses in the `X-Correlation-ID` header.
Centralized Logging:
- All logs are shipped to Splunk/ELK with the correlation_id field indexed.
- A single query on one correlation ID therefore returns an event's log lines from every service it passed through.
Debugging via Database:
- The processed_events table stores the correlation_id of every processed event.
- Ops can look up an event's processing status by correlation ID and then use the same ID to find its logs:
```
SELECT * FROM processed_events WHERE correlation_id='ING:uuid';
```
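The real processed_events columns are not documented here. The following sketch uses Python's built-in `sqlite3` module and a hypothetical, simplified schema to show two things: an index on `correlation_id`, and a parameterized version of the lookup query above. Treating `event_id` as a primary key, so that a redelivered event is recorded only once, is an additional assumption suggested by the table's name, not something this page states.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical, simplified schema; the real processed_events columns may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS processed_events (
    event_id       TEXT PRIMARY KEY,
    service_name   TEXT NOT NULL,
    correlation_id TEXT,
    processed_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS ix_processed_events_corr
    ON processed_events (correlation_id);
"""


def mark_processed(conn: sqlite3.Connection, event_id: str,
                   service_name: str, correlation_id: str) -> bool:
    """Record an event; returns False if this event_id was already recorded."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed_events (event_id, service_name, correlation_id) "
        "VALUES (?, ?, ?)",
        (event_id, service_name, correlation_id),
    )
    conn.commit()
    return cur.rowcount == 1


def trace(conn: sqlite3.Connection, correlation_id: str) -> List[Tuple[str, str]]:
    # Parameterized lookup: never interpolate the ID into the SQL string.
    rows = conn.execute(
        "SELECT event_id, service_name FROM processed_events "
        "WHERE correlation_id = ? ORDER BY processed_at, event_id",
        (correlation_id,),
    )
    return rows.fetchall()
```

Use `conn.executescript(SCHEMA)` to create the table. Without the index, each trace lookup would scan the whole table, which becomes slow as processed_events grows.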

Monitoring

- Collect service metrics with Prometheus exporters.
- Configure Grafana dashboards for:
  - Event processing latency.
  - Kafka consumer lag per topic.
  - Error rate by service.
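Consumer lag is the number of messages a consumer group has not yet consumed. For each partition, it equals the log-end offset minus the group's committed offset. In practice, a Prometheus client library or a Kafka exporter would publish this value. The following stdlib-only sketch spells out the arithmetic and renders the result as a gauge in Prometheus's text exposition format. The metric name and topic names are hypothetical.

```python
from typing import Dict, Tuple

Partition = Tuple[str, int]  # (topic, partition number)


def consumer_lag(end_offsets: Dict[Partition, int],
                 committed: Dict[Partition, int]) -> Dict[Partition, int]:
    """Lag = log-end offset minus committed offset, per partition.

    Partitions with no committed offset are counted from offset 0,
    a simplifying assumption for this sketch.
    """
    return {tp: max(end - committed.get(tp, 0), 0) for tp, end in end_offsets.items()}


def _escape(value: str) -> str:
    # Label values must escape backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_lag_metric(group: str, lag: Dict[Partition, int]) -> str:
    """Render lag as a gauge in the Prometheus text exposition format."""
    name = "portfolio_consumer_lag"  # hypothetical metric name
    lines = [
        f"# HELP {name} Messages not yet consumed, per partition.",
        f"# TYPE {name} gauge",
    ]
    for (topic, partition), value in sorted(lag.items()):
        labels = f'group="{_escape(group)}",topic="{_escape(topic)}",partition="{partition}"'
        lines.append(f"{name}{{{labels}}} {value}")
    return "\n".join(lines) + "\n"
```

Because lag is exposed per partition, a Grafana panel can sum it per topic or group with `sum by (topic)`. It can also show a single hot partition that a total would hide.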

Kubernetes Deployment

- Define manifests for Deployments, Services, ConfigMaps, and Secrets.
- Use Helm charts for templated deployments.
- Configure KEDA to autoscale Kafka consumers based on consumer-group lag.
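To get a feel for what a lag-based scaling rule will do, it helps to work through the replica arithmetic. The following is a simplified Python model, not KEDA's implementation. The parameter names loosely mirror KEDA's Kafka scaler options (`lagThreshold`, `allowIdleConsumers`) and the ScaledObject's `minReplicaCount`/`maxReplicaCount`. The default values here are illustrative. The model ignores HPA tolerance, stabilization windows, and activation thresholds.

```python
import math


def desired_replicas(total_lag: int, lag_threshold: int = 10, partitions: int = 1,
                     min_replicas: int = 0, max_replicas: int = 100,
                     allow_idle_consumers: bool = False) -> int:
    """Approximate consumer replica count for lag-based scaling.

    ceil(total_lag / lag_threshold), capped at the partition count (extra
    consumers in one group would receive no partitions), then clamped to
    [min_replicas, max_replicas].
    """
    if lag_threshold <= 0:
        raise ValueError("lag_threshold must be positive")
    wanted = math.ceil(total_lag / lag_threshold)
    if not allow_idle_consumers:
        wanted = min(wanted, partitions)
    return max(min_replicas, min(wanted, max_replicas))
```

For example, with 12 partitions and a threshold of 10, a total lag of 95 gives 10 replicas. A lag of 500 is capped at 12, because a consumer group cannot assign more partitions than exist. In practice, the topic's partition count sets the ceiling on useful parallelism.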

Testing & Validation

- Load testing of the entire analytics pipeline.
- Failover and recovery drills.
- Alert rules that monitor SLA adherence.
- Validation of correlation ID propagation and log visibility in Splunk/ELK.
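One way to check propagation is to send a test event through the pipeline, export the log lines that contain its correlation ID, and confirm that every expected service logged it. The following sketch parses the documented log pattern. It assumes service names contain no spaces and that logs can be exported as plain text lines. The service names in any real check would come from the deployment; the helper names are hypothetical.

```python
import re
from typing import Iterable, Set

# Parses: [LEVEL] [corr_id=ING:uuid] ServiceName - Message
LOG_LINE = re.compile(
    r"^\[(?P<level>\w+)\] \[corr_id=(?P<corr_id>[^\]]+)\] (?P<service>\S+) - (?P<message>.*)$"
)


def services_seen(lines: Iterable[str], corr_id: str) -> Set[str]:
    """Names of services that logged at least one line with this correlation ID."""
    seen = set()
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("corr_id") == corr_id:
            seen.add(match.group("service"))
    return seen


def missing_services(lines: Iterable[str], corr_id: str, expected: Iterable[str]) -> Set[str]:
    """Services in `expected` that never logged `corr_id` (empty set means pass)."""
    return set(expected) - services_seen(lines, corr_id)
```

A non-empty result points either to a hop that drops the Kafka header or to a service whose logs are not reaching Splunk/ELK. Cross-checking processed_events for the same ID tells the two cases apart.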