# Volnux Batch Pipeline Distributed Backend Architecture
- Document Version: 1.0
- Author: nshaibu
- Purpose: This document provides a comprehensive technical overview of the framework's distributed batch pipeline subsystem.
The core of the distributed system is built around three main components: the Orchestrator/Producer, the Message Broker, and the Worker/Consumer Pool.
## 1. The Volnux Orchestrator (Producer)
This is the main Python application containing the Volnux Adaptive Scaling Engine.
- Role: Acts as the Task Producer. It calculates the optimal Batch Size and pushes tasks onto the message broker queue.
- Key Action: Batching Control. Instead of submitting tasks to a local `Executor.submit()`, it serialises tasks and sends them as messages to the Message Broker. The Adaptive Scaling Engine's primary job here is to manage the rate and size of these message bursts to avoid overwhelming the workers or the broker itself.
- Libraries: Uses language-specific client libraries (e.g., `confluent-kafka-python`, `pika`) to communicate with the broker.
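As a concrete illustration, a minimal producer sketch using `pika` against RabbitMQ might look like the following. The queue name, the task payload shape, and the `submit_batch` helper are illustrative assumptions, not Volnux's actual API:

```python
import json

import pika

TASK_QUEUE = "volnux.tasks"  # illustrative queue name, not a framework constant

def submit_batch(channel, tasks):
    """Serialise a batch of tasks and publish them to the broker."""
    for task in tasks:
        channel.basic_publish(
            exchange="",
            routing_key=TASK_QUEUE,
            body=json.dumps(task).encode("utf-8"),
            # delivery_mode=2 marks the message persistent, pairing with
            # the durable queue declared below.
            properties=pika.BasicProperties(delivery_mode=2),
        )

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue=TASK_QUEUE, durable=True)

# In the real system the Adaptive Scaling Engine would choose the batch
# size and pace these calls; the fixed batch here is a placeholder.
submit_batch(channel, [{"func": "process_record", "args": [i]} for i in range(100)])
connection.close()
```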
## 2. The Message Broker (Queue/Bus)
This layer decouples the orchestration logic from the execution environment.
- Role: Provides a reliable, durable queue for tasks awaiting processing.
- Examples: Kafka (high throughput streaming), RabbitMQ (flexible routing and guaranteed delivery), or cloud services like AWS SQS/SNS or Google Cloud Pub/Sub.
- Key Function: Load Balancing. It automatically distributes the task messages across all available workers in the consumer pool.
- Volnux Interface: The Orchestrator pushes messages to a specific Task Topic/Queue, and the Workers pull from it.
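With Kafka, for example, this load balancing falls out of consumer groups: every worker subscribes with the same `group.id`, and the broker assigns each partition of the Task Topic to exactly one group member at a time. A minimal sketch with `confluent-kafka-python` (the broker address, topic, and group name are assumptions):

```python
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    # All workers share one group.id, so Kafka spreads the topic's
    # partitions across the pool automatically.
    "group.id": "volnux-workers",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["volnux.tasks"])

msg = consumer.poll(timeout=1.0)  # this worker's share of the stream
if msg is not None and msg.error() is None:
    print("received task:", msg.value())
consumer.close()
```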
## 3. The Worker/Consumer Pool
These are independent, external compute instances that perform the actual work.
- Role: Acts as the Task Consumer. Each worker runs a lightweight process that continuously listens to the Task Queue. When it receives a task message, it executes the corresponding function (a minimal consume loop is sketched after this list).
- Implementation: These workers are often stateless containerised applications (e.g., Docker containers running a Python script).
- Key Action: Executor Pool Sizing ($W_{max}$) Control. This is the critical difference from the local executor model: the Volnux Engine itself does not create or destroy these worker processes. Instead, it manages the external Autoscaling Mechanism that controls the worker pool size (see the table below).
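Before turning to those mechanisms, here is what a worker's consume loop might look like with `pika`; the queue name and the `execute` dispatcher are hypothetical stand-ins for the framework's real task runner:

```python
import json

import pika

TASK_QUEUE = "volnux.tasks"  # same illustrative name as the producer sketch

def execute(task):
    """Hypothetical dispatcher mapping a task message to a local function."""
    print("executing", task["func"], "with args", task["args"])

def on_message(channel, method, properties, body):
    execute(json.loads(body))
    # Acknowledge only after successful execution so the broker can
    # redeliver the task if this worker dies midway.
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue=TASK_QUEUE, durable=True)
channel.basic_qos(prefetch_count=1)  # fair dispatch across the pool
channel.basic_consume(queue=TASK_QUEUE, on_message_callback=on_message)
channel.start_consuming()  # blocks; the worker's entire life is this loop
```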
### External Autoscaling Mechanisms
| Component | Mechanism | Volnux Engine's Role |
|---|---|---|
| VMs/Cloud | Autoscaling Groups (e.g., AWS ASG, Azure VMSS) | Adjusts the Min/Max capacity settings based on Queue Length or CPU Load metrics. |
| Containers | Kubernetes Horizontal Pod Autoscaler (HPA) | Sets the HPA rules, often based on Custom Metrics like the number of messages in the queue. |
The Volnux Adaptive Scaling Engine calculates $W_{max}$ (the maximum worker count allowed by the resource quota) and sets this as the maximum size limit for the external autoscaler. This ensures the total resource consumption remains within the user's budget $Q_{max}$.
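A back-of-envelope version of that calculation, assuming $Q_{max}$ and $R_w$ are expressed in the same unit (e.g., CPU cores); the function name and the fixed overhead parameter are illustrative:

```python
import math

def compute_w_max(q_max: float, r_w: float, overhead: float = 0.0) -> int:
    """Maximum worker count that keeps total usage within the quota.

    q_max    -- user resource quota, e.g. total CPU cores allowed
    r_w      -- estimated resources consumed by one worker
    overhead -- capacity reserved for orchestrator/broker (an assumption)
    """
    if r_w <= 0:
        raise ValueError("per-worker resource estimate must be positive")
    # Floor, because a fractional worker would breach the quota.
    return max(0, math.floor((q_max - overhead) / r_w))

# Example: a 16-core quota with 0.5 cores per worker and 2 cores
# reserved leaves room for at most 28 workers.
print(compute_w_max(16.0, 0.5, overhead=2.0))  # -> 28
```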
## Adaptive Scaling in the Distributed Context
The Adaptive Scaling Engine uses two primary feedback loops to ensure efficiency and quota adherence:
### A. Quota Enforcement (Executor Pool Sizing)
- Engine Calculates $W_{max}$: Based on the user quota ($Q_{max}$) and estimated worker resources ($R_w$), the engine determines the safe maximum number of workers.
- Engine Updates Autoscaler: It configures the external autoscaler (e.g., the Kubernetes HPA's `maxReplicas` or the cloud ASG's `MaxSize`) to equal $W_{max}$.
- Real-time Scale Out/In: The autoscaler automatically adjusts the current worker count based on the load (queue depth) but is capped by the $W_{max}$ set by the Volnux Engine. This is how the quota is enforced.
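For the Kubernetes case, the "Engine Updates Autoscaler" step might reduce to a single patch through the official `kubernetes` Python client; the HPA name and namespace below are assumptions:

```python
from kubernetes import client, config

def cap_worker_pool(w_max: int, name: str = "volnux-workers",
                    namespace: str = "default") -> None:
    """Set the HPA ceiling to the quota-derived W_max."""
    config.load_kube_config()  # use load_incluster_config() inside a pod
    autoscaling = client.AutoscalingV1Api()
    # Patch only maxReplicas: the HPA keeps scaling on its own metrics
    # but can never exceed this ceiling, which enforces the quota.
    autoscaling.patch_namespaced_horizontal_pod_autoscaler(
        name=name,
        namespace=namespace,
        body={"spec": {"maxReplicas": w_max}},
    )

cap_worker_pool(28)
```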
### B. Flow Control (Batch Size)
- Engine Monitors Queue Depth: The Orchestrator periodically queries the message broker for the current Queue Length (how many tasks are pending).
- Adaptive Batching:
- If the Queue Length is low (workers are sitting idle), the Engine increases the Batch Size and submission rate to keep all $W_{max}$ workers busy (maximising throughput).
- If the Queue Length is high (workers are falling behind) or the overall cluster CPU usage approaches $Q_{max}$, the Engine pauses or sharply reduces the Batch Size and submission rate, applying backpressure so the system does not collapse under excessive load (see the sketch below).
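One way to realise this loop against RabbitMQ is a passive queue declare, which reports the current message count without touching the queue. The watermarks and the doubling/halving factors below are illustrative tuning knobs, not framework constants:

```python
import pika

TASK_QUEUE = "volnux.tasks"

LOW_WATERMARK = 100      # backlog below this risks idle workers
HIGH_WATERMARK = 10_000  # backlog above this means workers are falling behind
MIN_BATCH, MAX_BATCH = 10, 1_000

def queue_depth(channel) -> int:
    """Passive declare inspects the queue without creating or altering it."""
    declared = channel.queue_declare(queue=TASK_QUEUE, passive=True)
    return declared.method.message_count

def next_batch_size(channel, batch_size: int) -> int:
    depth = queue_depth(channel)
    if depth < LOW_WATERMARK:
        # Workers may be going idle: grow the batch to keep them busy.
        return min(MAX_BATCH, batch_size * 2)
    if depth > HIGH_WATERMARK:
        # Backpressure: shrink submissions so the backlog can drain.
        return max(MIN_BATCH, batch_size // 2)
    return batch_size

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
print("next batch size:", next_batch_size(channel, batch_size=100))
connection.close()
```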