Why another workflow orchestrator? - nshaibu/volnux GitHub Wiki

Volnux Framework: The Rationale for a New Orchestrator

This document outlines the core features of the Volnux framework, demonstrating how its design addresses the performance, usability, and collaboration demands of modern, event-driven data and ML pipelines. The central rationale for Volnux is to combine Python's productivity with cloud-native scale and architectural resilience.

I. Volnux Core Ambition

To build a High-Performance, Highly Resilient, and Collaboration-Fostering event-driven workflow orchestrator, specifically optimised for MLOps, Computer Vision, and modern data engineering workloads.

Volnux's architecture is built on three pillars to overcome the limitations of traditional orchestrators and schedulers:

  1. High Performance: Achieved through intelligent hybrid concurrency, bypassing the GIL.
  2. Resilience and Scalability: Delivered by adaptive resource allocation and robust distributed execution models.
  3. Collaboration and Simplicity: Enabled by the intuitive Pointy-lang DSL and decoupled task management.

II. Feature Breakdown and Ambition Fulfilment

A. Feature 1: Declarative Workflow DSL (Pointy-lang)

| Description | Ambition Fulfilled | Practical Use Cases |
| --- | --- | --- |
| Graph-Based Definition: Uses simple and intuitive syntax to define sequence, parallel execution, and conditional branching. | Collaboration & Ease of Use: Hides the complexity of distributed computing behind an intuitive, readable language. | Non-technical users (e.g., Business Analysts) can define complex business rules without writing Python code. |
| Input Schema Injection: The Pipeline class defines required workflow inputs, which are automatically injected into tasks (Dependency Injection). | Productivity & Resilience: Provides strong data typing and validation upfront, preventing runtime errors. Cleans up task signatures, focusing engineers purely on business logic. | API/Webhook Integration: Defines a strict, self-documenting contract for external services calling the workflow via a webhook. |
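Input-schema injection can be sketched in plain Python. The `CustomerRecord`, `ScoreCustomer`, and `inject_and_run` names below are illustrative stand-ins, not Volnux's actual API; the point is only how validated schema fields land directly in a task's signature.

```python
import inspect
from dataclasses import dataclass

# Hypothetical input schema a Pipeline class might declare.
@dataclass
class CustomerRecord:
    customer_id: int
    email: str

class ScoreCustomer:
    # The orchestrator injects validated schema fields straight into the
    # task signature, keeping the body pure business logic.
    def run(self, customer_id: int, email: str) -> dict:
        return {"customer_id": customer_id, "score": 0.87}

def inject_and_run(task, record) -> dict:
    # Simplified dependency injection: match task parameters to schema fields.
    params = inspect.signature(task.run).parameters
    kwargs = {name: getattr(record, name)
              for name in params if hasattr(record, name)}
    return task.run(**kwargs)

result = inject_and_run(ScoreCustomer(), CustomerRecord(42, "a@example.com"))
```

Because the contract is explicit on the schema class, a webhook caller can be validated against it before any task code runs.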

B. Feature 2: Execution Engine and Performance Backends

| Description | Ambition Fulfilled | Practical Use Cases |
| --- | --- | --- |
| Hybrid Execution Layer: Tasks are submitted to various backends: Python's ThreadPoolExecutor (I/O-bound) or ProcessPoolExecutor (CPU-bound), C/Rust bindings, or remote execution engines (Kafka, Kubernetes). | High Performance & Resilience: Bypasses the Python Global Interpreter Lock (GIL) for true parallelism in CPU-intensive tasks (e.g., model training). Utilises optimised bindings for extreme performance where needed. | MLOps Training: Use ProcessPoolExecutor for parallel model training across cores, while using ThreadPoolExecutor for concurrent metadata logging (I/O) to an external database. |
| GPU/CPU Fallback: Tasks can be flagged for GPU execution, but automatically and gracefully degrade to CPU execution if the GPU resource is unavailable or unhealthy. | Resilience & Portability: Ensures that workflows never fail due to hardware absence. The same workflow definition runs on a GPU-enabled production cluster for high performance and a CPU-only staging environment for testing. | Cost Optimisation: Non-critical inference jobs can run on cheaper CPU workers during off-peak hours, preserving expensive GPU resources for priority tasks. |
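A minimal sketch of the two ideas above, using only the standard library: routing a task to threads or processes by workload kind, and degrading from GPU to CPU when the device is absent. `pick_executor` and `resolve_device` are illustrative names, not Volnux's dispatcher.

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def pick_executor(task_kind: str, max_workers: int = 4):
    """Route I/O-bound tasks to threads, CPU-bound tasks to processes."""
    if task_kind == "io":
        # Threads suffice for I/O: the GIL is released while waiting.
        return ThreadPoolExecutor(max_workers=max_workers)
    # Separate processes sidestep the GIL for CPU-heavy work.
    return ProcessPoolExecutor(max_workers=max_workers)

def resolve_device(wants_gpu: bool, gpu_healthy: bool) -> str:
    """Gracefully degrade to CPU if the requested GPU is unavailable."""
    return "gpu" if wants_gpu and gpu_healthy else "cpu"

# Concurrent I/O-style work on the thread backend.
with pick_executor("io") as pool:
    results = list(pool.map(lambda n: n * n, range(4)))  # [0, 1, 4, 9]

# A GPU-flagged task lands on CPU when no healthy GPU is present.
device = resolve_device(wants_gpu=True, gpu_healthy=False)  # "cpu"
```

In a real engine the same routing decision would also cover the C/Rust and remote (Kafka, Kubernetes) backends; the executor choice is just the simplest case.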

C. Feature 3: Scaling and Operational Intelligence

| Description | Ambition Fulfilled | Practical Use Cases |
| --- | --- | --- |
| Adaptive Scaling: Runtime monitor tracks CPU and memory utilisation against a predefined quota, automatically adjusting worker and queue sizes. | High Performance & Resilience: Provides self-tuning capability. Eliminates resource over-provisioning (cost savings) and under-provisioning (performance loss). Guarantees stability during unpredictable traffic spikes. | Event Burst Handling: Handles a massive, sudden influx of events (e.g., marketing campaign launch) by rapidly scaling workers up, then scaling down gradually after the peak load subsides, optimising cloud costs. |
| Batch Pipeline Execution: A declarative field on the Pipeline class defines a batch/chunk size. Volnux automatically splits large input data and spawns multiple, parallel workflow instances (submitted via pools or Kafka). | Scalability & Specialised Fit (Data/ML): Turns a large single-workflow job into many smaller, fault-tolerant, concurrent jobs. This is essential for processing the massive datasets found in MLOps and Big Data ETL. | Parallel Inference: Processing 1 million customer records by splitting them into 1,000 batches, allowing the entire model inference workflow to run 1,000 times in parallel. |
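The batch-execution idea reduces to a simple pattern: split the input by a declared chunk size, then run one workflow instance per chunk in parallel. The sketch below shows that pattern with stand-in names (`chunk`, `run_workflow`), not Volnux's actual batching API.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(items, size):
    """Split a large input into fixed-size batches."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def run_workflow(batch):
    # Stand-in for one full workflow instance (e.g. model inference on a batch).
    return sum(batch)

records = list(range(10))  # pretend: 10 input records, batch size 3
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(run_workflow, chunk(records, 3)))

# Four independent, fault-tolerant instances; the combined result matches
# what one serial pass over all records would produce.
total = sum(partials)
```

Each batch failing or retrying independently is what makes the 1,000-batch inference example above fault-tolerant rather than all-or-nothing.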

D. Feature 4: Collaboration and Reusability

| Description | Ambition Fulfilled | Practical Use Cases |
| --- | --- | --- |
| External Task Hosting: Tasks do not have to be implemented locally; they can be pulled and instantiated from external, versioned repositories (PyPI, GitHub). | Collaboration & Ease of Use: Enforces a clean separation between Task Implementation (Engineers) and Workflow Design (Analysts). Promotes code reusability across the entire organisation. | Reusable Task Libraries: A Platform Team publishes a standardised pypi::db_connector task. Data Scientists consume this task in their Pointy-lang workflows, guaranteeing consistency in database access across all projects. |
| Triggers and Nested Workflows: Tasks can trigger the execution of other workflows (Child Workflows), with the option to wait synchronously or execute asynchronously/in parallel. | Resilience & Clarity: Enables the breakdown of massive processes into small, reusable modules. The use of triggers supports both synchronous orchestration (Saga pattern) and asynchronous event-driven choreography. | Payment Retry System: A parent order task triggers a child workflow to handle payment retries and waits for the child workflow's final success/failure, ensuring robust transaction management. |
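The trigger semantics can be sketched with a future: the parent submits a child workflow and either blocks on its result (synchronous orchestration) or keeps the future and moves on (asynchronous choreography). `trigger` and `child_payment_retry` are hypothetical names for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

def child_payment_retry(order_id: str) -> str:
    # Stand-in for a child workflow that retries a failed payment.
    return f"order {order_id}: paid"

def trigger(workflow, *args, wait: bool = True):
    """Launch a child workflow; block on it or return its future."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(workflow, *args)
    # Let the worker finish in the background; don't block on shutdown.
    pool.shutdown(wait=False)
    return future.result() if wait else future

# Synchronous orchestration: the parent blocks on the child's outcome.
outcome = trigger(child_payment_retry, "A42", wait=True)
```

With `wait=False` the parent would receive the future immediately, which is the event-driven choreography case described above.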

III. Conclusion: The Volnux Rationale

Volnux is not simply another Python wrapper for a scheduler; it is a purpose-built, hybrid orchestrator designed to meet the extreme demands of modern AI-driven enterprises. It addresses the fundamental flaws of existing tools:

  • Airflow's Rigidity: Volnux moves beyond scheduled-only batch processing to be truly event-driven.
  • General Python's GIL: Volnux's Hybrid Executor and Adaptive Scaling ensure high performance and resilience under heavy load.
  • Complexity: The Pointy-lang DSL and External Task Model make workflow creation accessible to the entire organisation, fostering cross-functional collaboration.

Volnux delivers the combination of simplicity, speed, and reliability required to build the next generation of scalable, automated MLOps and data systems.