# Volnux Adaptive Scaling Engine Algorithms
- Document Version: 1.0
- Author: nshaibu
- Purpose: This document provides a technical overview of the framework's autoscaling algorithms for batch-pipeline resource utilisation.
The goal of these algorithms is to maximise throughput (running as many tasks as possible) while strictly adhering to the user-defined resource quota.
## 1. Executor Pool Size Calculation (Resource-Driven)
This algorithm determines the number of active workers/processes, typically focusing on CPU core utilisation, as that's often the hardest limit.
### Algorithm: Resource-to-Worker Mapping
The engine needs to know the estimated or average resource demand of a single worker.
- Variables:
- $Q_{max}$: User-defined total CPU Quota (e.g., 8 cores).
- $R_w$: Estimated average CPU requirement per Worker/Process (e.g., 0.8 cores/worker).
- $W_{max}$: Maximum Executor Pool Size.
$$W_{max} = \lfloor \frac{Q_{max}}{R_w} \rfloor$$
- Example: If $Q_{max} = 8$ cores and $R_w = 0.8$ cores/worker, then $W_{max} = \lfloor 8 / 0.8 \rfloor = 10$. The engine can safely maintain 10 workers.
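As a concrete illustration, here is a minimal Python sketch of this mapping; the function name and the clamp to at least one worker are illustrative assumptions, not part of the volnux API.

```python
import math

def max_pool_size(cpu_quota: float, cpu_per_worker: float) -> int:
    """Map a user-defined CPU quota to a maximum executor pool size.

    W_max = floor(Q_max / R_w), clamped to at least one worker so the
    pipeline can always make progress.
    """
    if cpu_per_worker <= 0:
        raise ValueError("cpu_per_worker must be positive")
    return max(1, math.floor(cpu_quota / cpu_per_worker))

# Example from the text: 8-core quota, 0.8 cores per worker -> 10 workers.
print(max_pool_size(8, 0.8))  # 10
```

Flooring rather than rounding guarantees the pool never requests more CPU than $Q_{max}$.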
### Dynamic Adjustment (Elasticity)
The engine does not launch all $W_{max}$ workers at once. Instead, it continuously monitors the queue length and current worker activity (see the sketch after this list):
- Scale Up: If the Task Queue Length is high (many pending tasks) and the Active Worker Count is less than $W_{max}$, launch new workers until $W_{max}$ is reached or the queue is drained.
- Scale Down: If a worker has been idle for a defined timeout period (and the Task Queue Length is zero), gracefully terminate it to save resources.
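A hedged sketch of that control loop follows; the `pool` interface (`queue_length`, `active_workers`, `spawn_worker`, `terminate_workers_idle_longer_than`), the poll interval, and the idle timeout are all assumptions for illustration, not the framework's actual API.

```python
import time

IDLE_TIMEOUT = 60.0   # seconds a worker may sit idle before scale-down (assumed)
POLL_INTERVAL = 1.0   # how often the engine re-evaluates the pool (assumed)

def autoscale_loop(pool, w_max: int):
    """Keep the pool between 0 and w_max workers based on queue pressure."""
    while True:
        pending = pool.queue_length()
        active = pool.active_workers()

        # Scale up: pending work and spare quota -> add workers up to W_max.
        if pending > 0 and active < w_max:
            for _ in range(min(pending, w_max - active)):
                pool.spawn_worker()

        # Scale down: no pending work -> reap workers idle past the timeout.
        elif pending == 0:
            pool.terminate_workers_idle_longer_than(IDLE_TIMEOUT)

        time.sleep(POLL_INTERVAL)
```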
## 2. Optimal Batch Size Calculation (Throughput-Driven)
The Batch Size determines how many tasks are packaged together and submitted to the execution layer (or message broker) at one time. This is less about preventing quota overage and more about efficiency and reducing orchestration overhead.
### Algorithm: Batch Size based on Overhead
Too small a batch size increases the overhead of submission/distribution per task. Too large a batch size can lead to resource contention if the batch takes too long to complete.
- Variables:
- $T_{exec}$: Estimated average execution time per task (e.g., 30 seconds).
- $T_{overhead}$: Time cost of submitting one batch (e.g., 500 milliseconds/batch).
- $B_{size}$: The Optimal Batch Size.
An ideal batch size keeps the ratio of submission overhead to execution time small. A common heuristic is to aim for the per-batch submission overhead to be at most a small fraction $f$ (e.g., 1-5%) of the batch's total execution time, which yields a lower bound:
$$B_{size} \geq \frac{T_{overhead}}{f \times T_{exec}}$$
With the example values ($T_{overhead} = 0.5$ s, $T_{exec} = 30$ s, $f = 0.05$), this bound is below one task, so for long-running tasks overhead alone rarely constrains the batch size.
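A small sketch of that lower bound, using the example values above; the function name and the 5% default are illustrative assumptions.

```python
import math

def overhead_batch_floor(t_overhead: float, t_exec: float,
                         target_fraction: float = 0.05) -> int:
    """Smallest batch size for which per-batch submission overhead stays
    below `target_fraction` of the batch's total execution time."""
    return max(1, math.ceil(t_overhead / (target_fraction * t_exec)))

# Example from the text: 0.5 s overhead, 30 s tasks, 5% target -> 1 task,
# i.e. overhead alone does not force large batches here.
print(overhead_batch_floor(0.5, 30.0))  # 1
```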
A simpler approach for initial calculation is based on the available parallelism ($W_{max}$):
$$B_{size} = \text{Parallelism Multiplier} \times W_{max}$$
- Rationale: Submitting a batch that is 2 or 3 times the size of the maximum parallelism ($W_{max}$) ensures that workers always have tasks to pull, even if some tasks finish quickly.
- Example: If $W_{max} = 10$, a Parallelism Multiplier of 3 gives $B_{size} = 30$. This batch size ensures all 10 workers can immediately pick up a task, with 20 tasks queued up for them, minimising idle time.
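A minimal sketch combining the parallelism multiplier with the overhead floor from the previous heuristic; the names and defaults are illustrative assumptions, not volnux API.

```python
def optimal_batch_size(w_max: int, parallelism_multiplier: int = 3,
                       overhead_floor: int = 1) -> int:
    """Initial batch size: a multiple of the maximum parallelism, never
    smaller than the overhead-driven lower bound."""
    return max(overhead_floor, parallelism_multiplier * w_max)

# Example from the text: W_max = 10, multiplier = 3 -> batch of 30 tasks,
# so all 10 workers start immediately and 20 tasks remain queued.
print(optimal_batch_size(10))  # 30
```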
### Dynamic Adjustment (Queue Feedback)
In a truly adaptive system, the batch size is adjusted based on the current Task Queue Length ($L_q$); see the sketch after this list:
- If $L_q$ is very low (e.g., $< W_{max}$), the system may temporarily increase $B_{size}$ to load the queue faster.
- If $L_q$ is very high (indicating workers can't keep up), the system may temporarily decrease $B_{size}$ or simply pause batch submission until the queue stabilises, preventing resource exhaustion in the queue itself.
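The sketch below illustrates one way such feedback could work; the doubling factor and the high-watermark threshold are illustrative assumptions, not values defined by volnux.

```python
def adjust_batch_size(base_batch: int, queue_length: int, w_max: int,
                      high_watermark_factor: int = 10) -> int:
    """Scale the next batch based on current queue pressure.

    - Near-empty queue (fewer than w_max pending tasks): submit a larger
      batch to refill it quickly.
    - Very deep queue (pressure above the high watermark): pause submission
      (return 0) until workers catch up.
    """
    if queue_length < w_max:
        return base_batch * 2          # queue is starving, load it faster
    if queue_length > high_watermark_factor * w_max:
        return 0                       # back off; let the queue drain
    return base_batch                  # steady state

# Hypothetical usage: base batch of 30, W_max = 10.
print(adjust_batch_size(30, 5, 10))    # 60  (queue nearly empty)
print(adjust_batch_size(30, 50, 10))   # 30  (steady state)
print(adjust_batch_size(30, 500, 10))  # 0   (queue overloaded, pause)
```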
This combination of resource-driven worker sizing and throughput-driven batch sizing forms the core of Volnux's adaptive scaling engine.