Execution Model Documentation - nshaibu/volnux GitHub Wiki
Execution Model Documentation: Volnux Framework
- Document Version: 1.0
- Date: November 08, 2025
- Author: nshaibu
- Purpose: This document provides a comprehensive technical overview of the framework's execution model for contributors.
Introduction
The Volnux Orchestration Framework is a modular, graph-based system designed for executing complex, dependency-aware workflows in distributed environments. It supports both local and remote execution, emphasizing loose coupling, fault tolerance, and observability. The model draws inspiration from patterns like Strategy (for pluggable execution), Factory (for dynamic resolution), and event-driven resilience, while addressing limitations in existing tools like Prefect (e.g., rigid DAGs) by prioritizing dynamic adaptation and secure remotes.
At its core, the framework treats a workflow as a declarative control-flow graph (CFG) of Events—self-contained units of work. Execution proceeds top-down via topological traversal, with adaptive delegation to execution strategies based on runtime context. This enables efficient handling of linear sequences, parallel branches, and distributed scaling without tight coupling between orchestration and implementation. Graph traversal employs Breadth-First Search (BFS) to control task execution flow, allowing precise sequencing and early termination in conditional paths.
Workflow graphs are declaratively defined using Pointy-Lang, a domain-specific language (DSL) tailored for graph construction. Pointy-Lang provides a concise, expressive syntax for specifying nodes (Events), edges (dependencies), and metadata (configurations, policies), abstracting away boilerplate while ensuring type safety and validation at parse time. The DSL compiles to an in-memory Event graph, which the Engine consumes directly. It natively supports advanced constructs like conditional branches, parallel execution groups, and scoped groups for modular composition.
Key design principles:
- Modularity: Components (e.g., selectors, flows, executors) are pluggable interfaces, allowing runtime swaps without altering core logic.
- Decoupling: Logical concerns (e.g., dependency resolution, retry logic, remote submission) are isolated to specific layers.
- Resilience: Failures are contained at the unit level (Tasks), with built-in policies; no global state pollution.
- Security: Transport-level encryption (SSL/TLS), payload integrity (checksums), and optional sandboxing prevent tampering and unauthorized access.
- Observability: Integrated metrics collection (via Prometheus) for latencies, success rates, and retry counts, influencing adaptive decisions.
The model supports hybrid deployments: local for development/testing, remote (gRPC, TCP, XML-RPC) for production scaling.
Graph Definition via Pointy-Lang DSL
Pointy-Lang is the framework's embedded DSL for authoring workflows, implemented as a graph-language extension that generates Event objects and dependency graphs. It promotes readability and maintainability by treating graphs as first-class constructs, with compile-time checks for cycles (if a DAG is enforced), missing dependencies, and policy validity. Workflows are defined in .pty files or inline Python blocks, parsed into a standard Event graph for the Engine.
Core Syntax Elements
- Directional Operator (`->`): The `->` operator defines a sequential flow of events: the first event must complete before the second begins.
A -> B # Execute event A, then execute event B
- Parallel Operator (`||`): The `||` operator executes two or more events concurrently.
A || B # Execute event A and event B in parallel
- Pipe Result Operator (`|->`): The `|->` operator pipes the result of one event into another, passing the first event's output as the second's input. It can be combined with sequential or parallel operations.
A |-> B # Pipe the result of event A into event B
- Conditional Branching (`(0 -> X, 1 -> Y)`): Conditional branching defines different execution paths based on an event's outcome, checked after the event's execution: `0` represents failure, `1` represents success. Based on these outcomes, the next event(s) are chosen.
A -> B (0 -> C, 1 -> D) # If B fails (0), execute C; if B succeeds (1), execute D
Parsing and Integration
- Compilation: A lightweight pointy_parser transforms Pointy-Lang into an in-memory graph representing the workflow.
- Runtime Loading: The Engine loads and traverses the workflow based on its dependencies.
- Error Handling: Parse failures raise descriptive exceptions (e.g., "Cycle detected when in DAG mode: A → B → A"); runtime mismatches (e.g., unresolved Event name) defer to Manager errors.
Pointy-Lang ensures workflows are declarative and reproducible, decoupling definition from execution while supporting IDE autocompletion through schema annotations. BFS traversal aligns naturally with its constructs, enabling breadth-prioritized control (e.g., early evaluation of conditionals).
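As an illustration, the conditional expression `A -> B (0 -> C, 1 -> D)` might compile into an adjacency structure along these lines. The plain-dict layout is purely illustrative—`pointy_parser`'s actual output classes are not shown here:

```python
# Hypothetical in-memory form of the expression `A -> B (0 -> C, 1 -> D)`.
# Edges carry an optional condition label: None = unconditional,
# 0 = on-failure branch, 1 = on-success branch (mirroring the DSL).
graph = {
    "A": [("B", None)],
    "B": [("C", 0), ("D", 1)],
    "C": [],
    "D": [],
}

def next_events(node: str, succeeded: bool) -> list[str]:
    """Pick successors of `node`, honoring conditional (0/1) edge labels."""
    wanted = 1 if succeeded else 0
    return [dst for dst, cond in graph[node] if cond is None or cond == wanted]

print(next_events("B", succeeded=False))  # ['C']
```

This makes the early-termination property concrete: once B's outcome is known, only the matching branch is ever queued.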
Architecture Overview
The architecture is layered, with the Engine at the top coordinating the graph, descending to fine-grained execution via Flows and Executors. Data flows through an ExecutionContext—a mutable dataclass aggregating runtime state (e.g., ready nodes, partial results, metrics) for informed decisions. Pointy-Lang graphs feed directly into this as the initial Event set.
High-level layers:
- Orchestration Layer: Engine manages graph traversal and dependency resolution.
- Adaptation Layer: Flow Selector dispatches strategies based on context.
- Coordination Layer: Flows handle batch setup, delegation, and result aggregation.
- Execution Layer: Executors submit work (local or remote), returning futures for asynchronous resolution.
- Remote Layer: Managers on servers resolve, instantiate, and execute Events in isolated environments.
Communication between layers uses futures (inspired by concurrent.futures) for non-blocking waits, ensuring scalability in parallel modes.
| Layer | Responsibility | Key Interactions |
|---|---|---|
| Orchestration | Graph stepping, dep resolution | Converts Pointy-Lang Events to Tasks; loops on ready batches via BFS |
| Adaptation | Runtime strategy selection | Inspects Context (e.g., node count, success rates) |
| Coordination | Batch lifecycle management | Delegates to Tasks; aggregates metrics |
| Execution | Work submission | Local run or remote protocol dispatch |
| Remote | Server-side resolution/execution | Event instantiation, sandboxing, result serialization |
Detailed Components
Engine/Scheduler
The Engine is the central coordinator, responsible for maintaining the workflow's topological order and driving execution iterations. It performs a Breadth-First Search (BFS) traversal to control task execution flow, prioritizing linear paths for sequential/conditional control while respecting dependencies.
- Responsibilities:
- Parse Pointy-Lang definitions into Event structures.
- Convert initial Events into TaskNodes.
- Maintain a node registry (dict of TaskNodes by ID).
- Compute traversal paths using a BFS queue.
- Iterate until exhaustion: Build Context (set immutable info part), invoke Selector, process batch via Flow, update globals.
- State Management: Tracks overall results (dict of node IDs to outcomes) and Context evolution; a BFS queue for path tracking.
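The level-by-level readiness logic described above can be sketched as a Kahn-style BFS over a dependency map; `bfs_batches` is a hypothetical helper, not the Engine's real API:

```python
def bfs_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Yield successive 'ready' batches: nodes whose dependencies have
    all been resolved in earlier batches (Kahn-style BFS levels)."""
    remaining = {node: set(d) for node, d in deps.items()}
    done: set[str] = set()
    batches: list[list[str]] = []
    while remaining:
        ready = sorted(n for n, d in remaining.items() if d <= done)
        if not ready:  # no progress possible => cycle
            raise ValueError("Cycle detected when in DAG mode")
        batches.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return batches

# A feeds B and C (parallel batch), both feed D.
print(bfs_batches({"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}))
# [['A'], ['B', 'C'], ['D']]
```

Each inner list is what the Engine would hand to the Flow Selector as one batch.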
ExecutionContext
A lightweight, extensible dataclass serving as the inter-component communication vessel. It encapsulates batch-specific and cumulative state to enable context-aware adaptations. To support concurrency in parallel and group flows, the Context is partitioned into two logical parts: an immutable part (set once by the Engine before passing to the Selector and Flows) and a mutable part (managed by a dedicated State Manager). Each Context instance maintains its own independent State Manager, ensuring isolation: Tasks bound to non-shared Contexts are unaffected by updates from unrelated Contexts, preventing cross-contamination in multi-workflow or nested group scenarios.
- Info Part (Immutable):
  - Set by the Engine prior to delegation (e.g., `ready_nodes`, `available_workers`, `task_tags`, `graph_depth`).
  - Remains unchanged throughout the batch lifecycle, ensuring consistent decision-making in the Selector and read-only access in Flows.
  - Promotes thread-safety by design—no races on static batch descriptors.
- Update Part (Mutable):
  - Includes dynamic fields like `metrics` (e.g., `{'batch_latency': 1.2, 'success_rate': 0.95}`) and `partial_results` (resolved outputs).
  - Managed by the State Manager, which leverages Python's `multiprocessing.Manager` (or equivalent shared-memory primitives) to synchronize updates across parallel workers. This prevents data races in ParallelFlow or group executions, where multiple Tasks/Futures resolve concurrently.
  - Per-Context Isolation: Each Context spawns its own Manager instance, scoping shared state to that Context's lifecycle. This guarantees that Tasks in disjoint Contexts (e.g., separate Pointy-Lang groups or independent DAGs) operate on isolated update parts, avoiding interference even in multi-threaded or multi-process environments.
  - The State Manager acts as a proxy: Flows/Tasks acquire locks or use atomic operations for writes (e.g., appending to `partial_results`), with eventual consistency for metrics aggregation post-batch.
  - Optimization for SingleFlow: In non-concurrent modes, the State Manager is disabled, falling back to direct dict access for the update part. Since no parallelism exists, races are impossible, eliminating unnecessary proxy overhead and improving throughput.
- Usage: Passed immutably where possible; the update part is accessed via State Manager proxies. Influences Selector heuristics (e.g., fallback to sequential if `success_rate < 0.8`). In single-threaded modes, the Manager degenerates to a simple dict to minimize overhead.
This partitioning ensures scalability: Info for fast reads in decisions, updates for safe concurrency without global locks.
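A minimal sketch of the two-part Context, using a thread `Lock` as a stand-in for the `multiprocessing.Manager` proxy; the names `InfoPart` and `StateManager.record_result` are illustrative, not the framework's actual API:

```python
from dataclasses import dataclass
from threading import Lock
from typing import Any

@dataclass(frozen=True)
class InfoPart:
    """Immutable batch descriptors, set once by the Engine."""
    ready_nodes: tuple[str, ...]
    available_workers: int

class StateManager:
    """Per-Context mutable state. A Lock-based stand-in: the framework's
    version would use multiprocessing.Manager proxies for process pools,
    and is bypassed entirely in SingleFlow (direct dict access)."""
    def __init__(self, concurrent: bool = True):
        self._lock = Lock() if concurrent else None
        self.partial_results: dict[str, Any] = {}
        self.metrics: dict[str, float] = {}

    def record_result(self, node_id: str, result: Any) -> None:
        if self._lock:  # ParallelFlow: guard concurrent writes
            with self._lock:
                self.partial_results[node_id] = result
        else:           # SingleFlow: no proxy/lock overhead
            self.partial_results[node_id] = result

ctx_info = InfoPart(ready_nodes=("A", "B"), available_workers=4)
state = StateManager(concurrent=True)
state.record_result("A", 42)
```

The frozen dataclass enforces the info part's immutability at the language level: any attempt to reassign a field raises `FrozenInstanceError`.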
Event
The atomic unit of work, defined declaratively as a self-contained descriptor of executable logic. Events are graph nodes, with metadata for routing and resilience. In Pointy-Lang, they are the primary primitive.
- Attributes:
  - `name`: Unique identifier (string) for remote resolution (e.g., `"ProcessDataEvent"`).
  - `executor`: Callable representing the core logic (resolved on the server).
  - `executor_config`: Dict of parameters.
  - `stop_condition`: The condition under which the workflow may stop.
  - `options`: Configuration taken from the Pointy script.
  - `retry_policy`: Dict specifying resilience (e.g., `{'max_attempts': 3}`).
- Lifecycle: Defined client-side for local; name/args only for remote. Server-side Events are registered in a module/registry for dynamic import.
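A registry-based sketch of the Event primitive and server-side name resolution; the `register` decorator and class layout are hypothetical, not the framework's actual API:

```python
from dataclasses import dataclass, field

# Server-side registry mapping Event names to classes for dynamic lookup.
EVENT_REGISTRY: dict[str, type] = {}

def register(cls: type) -> type:
    """Register an Event class by name so remotes can resolve it."""
    EVENT_REGISTRY[cls.__name__] = cls
    return cls

@register
@dataclass
class ProcessDataEvent:
    executor_config: dict = field(default_factory=dict)

    def run(self) -> str:
        return f"processed with {self.executor_config}"

# Remote side: only the name and args cross the wire; the class is
# looked up and instantiated server-side.
cls = EVENT_REGISTRY["ProcessDataEvent"]
event = cls(executor_config={"batch": 10})
print(event.run())
```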
TaskNode
The runtime encapsulation of an Event, adding execution state and resilience logic. Tasks bridge declarative Events to imperative runs, ensuring atomicity.
- Responsibilities:
  - Maintain state (`Pending`, `Running`, `Completed`, `Failed`).
  - Inject resolved inputs (merging Event inputs with upstream partials).
  - Run with retry: Orchestrate attempts via policy—submit to the Executor, await the Future, apply backoff on transient failures, cap at `max_attempts`.
- Integration: Executor is injected (local or remote); retries are fully internal (no external signals), with exponential/linear backoff calculated per attempt. Updates to Context (e.g., results) route through the State Manager.
- Metrics: Emits per-attempt counters/histograms (e.g., `task_retry_attempts_total[task_id]`).
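The internal retry loop might look like the following sketch (exponential backoff capped per the policy; the injectable `sleep` parameter is an illustrative testing convenience, not a documented knob):

```python
import time

def run_with_retry(fn, max_attempts=3, base_delay=1.0, cap=60.0, sleep=time.sleep):
    """Internal retry loop: exponential backoff (1s, 2s, 4s, ...) capped
    at `cap` seconds; transient errors retry, the final failure propagates."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # permanent: propagate to the Flow
            sleep(min(base_delay * 2 ** (attempt - 1), cap))

# A task that fails twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(run_with_retry(flaky, sleep=lambda s: None))  # 'ok'
```

Retries being fully internal to the TaskNode means the Flow above only ever sees a resolved result or a terminal exception.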
Flow Selector
A decision engine (implemented as a function or interface) that dynamically routes batches to execution strategies. It embodies the Factory pattern for runtime polymorphism.
- Decision Logic: Heuristics on the Context's info part (e.g., `len(task_profiles) > 1` → ParallelFlow; fallback to SingleFlow for low throughput or flakiness).
- Extensibility: Configurable thresholds; pluggable rules (e.g., ML-based predictors on historical metrics).
- Output: Concrete Flow instance, preserving loose coupling.
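A sketch of the heuristic routing, reusing the 0.8 success-rate fallback threshold mentioned earlier; the function name and signature are illustrative:

```python
def select_flow(ready_count: int, success_rate: float, workers: int) -> str:
    """Route a batch to a Flow strategy from the Context's info part.
    Thresholds are illustrative defaults, assumed configurable."""
    if success_rate < 0.8:           # flaky history: fall back to sequential
        return "SingleFlow"
    if ready_count > 1 and workers > 1:
        return "ParallelFlow"        # multiple ready nodes and spare workers
    return "SingleFlow"

print(select_flow(ready_count=4, success_rate=0.95, workers=8))  # ParallelFlow
```

Because the Selector reads only the immutable info part, the same inputs always yield the same strategy, which keeps batch selection reproducible.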
Flow
Abstract coordinator for batch execution, focusing on setup, delegation, and aggregation. Flows implement the Strategy pattern, with concrete variants for modes.
- Abstract Responsibilities:
- Resource setup (e.g., worker pools, env vars).
- Iterate Tasks: Invoke `run_with_retry`, await Futures, handle terminal exceptions.
- Aggregate: Compute batch metrics (latency, success rate); update Context partials/states via the State Manager.
- Concrete Variants:
- SingleFlow: Sequential processing; ideal for I/O-bound or dep-heavy batches (uses InlineExecutor). State Manager disabled for direct updates, as no concurrency risks exist.
- ParallelFlow: Concurrent via thread/process pools; caps at `available_workers` (uses ThreadExecutor or remote equivalents). Relies on the State Manager for race-free updates in group/parallel constructs.
- Boundaries: No retry/remote involvement—processes only resolved results/Futures.
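The batch lifecycle above (setup, delegate, await, aggregate) could be sketched with `concurrent.futures`; this is an illustrative stand-in, not the framework's actual ParallelFlow class:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def parallel_flow(tasks: dict, max_workers: int) -> dict:
    """Submit a batch concurrently, await Futures, and aggregate metrics.
    Failed tasks are recorded, not retried here (retries are TaskNode-internal)."""
    results, failures = {}, 0
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn): node_id for node_id, fn in tasks.items()}
        for fut in as_completed(futures):
            node_id = futures[fut]
            try:
                results[node_id] = fut.result()
            except Exception as exc:  # terminal failure, already retried upstream
                failures += 1
                results[node_id] = exc
    return {
        "results": results,
        "batch_latency": time.perf_counter() - start,
        "success_rate": (len(tasks) - failures) / len(tasks),
    }

out = parallel_flow({"A": lambda: 1, "B": lambda: 2}, max_workers=2)
print(out["success_rate"])  # 1.0
```

The aggregated `success_rate` is exactly the kind of metric that feeds back into the Selector's next decision.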
Executor
Pluggable interface for work submission, abstracting local and distributed execution. Submits Event names/args (remote) or callables (local), returning a Future for async resolution.
- Abstract Methods:
  - `submit(event_name: str, args: Dict) → Future`: Dispatches work and yields a promise-like object.
  - `execute(...)`: Synchronous wrapper (awaits the Future).
- Local Variants:
- DefaultExecutor: Synchronous, in-process runs.
- ThreadExecutor: Concurrent via thread pools (I/O-bound).
- Remote Variants (inherit `RemoteExecutor`):
  - TcpExecutor: JSON-serialized payloads over SSL sockets.
- GRPCExecutor: Protobuf-structured RPCs with async stubs.
- XMLRPCExecutor: XML-RPC submissions to YARN for big data.
- Serialization: Protocol-specific (JSON for TCP/XML-RPC/Hadoop; protobuf for gRPC); metadata-only (name/args + checksum); no code transmission.
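The Executor contract can be sketched as an abstract base class whose `submit` returns a `concurrent.futures.Future`; `DefaultExecutor` here resolves synchronously as described above, and the registry layout is an assumption:

```python
from abc import ABC, abstractmethod
from concurrent.futures import Future

class Executor(ABC):
    @abstractmethod
    def submit(self, event_name: str, args: dict) -> Future: ...

    def execute(self, event_name: str, args: dict):
        """Synchronous wrapper: submit, then await the Future."""
        return self.submit(event_name, args).result()

class DefaultExecutor(Executor):
    """Synchronous, in-process: the Future is already resolved on return.
    A remote variant would instead serialize name/args to a Manager."""
    def __init__(self, registry: dict):
        self.registry = registry

    def submit(self, event_name: str, args: dict) -> Future:
        fut = Future()
        try:
            fut.set_result(self.registry[event_name](**args))
        except Exception as exc:
            fut.set_exception(exc)  # errors travel through the Future too
        return fut

reg = {"AddEvent": lambda x, y: x + y}
print(DefaultExecutor(reg).execute("AddEvent", {"x": 2, "y": 3}))  # 5
```

Returning a Future even in the synchronous case keeps the Flow code identical across local and remote executors.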
Manager (Remote Server)
The server-side counterpart to remote Executors, managing secure receipt, resolution, and execution. Operates as a long-lived service (e.g., daemon or Kubernetes pod).
- Responsibilities:
- Listen on encrypted channels (SSL/TLS mutual auth).
- Receive/validate payloads: Deserialize, verify SHA-256 checksums for integrity.
- Resolve Event: Dynamically import the class by name from the configured module/registry (e.g., `getattr(importlib.import_module(events_module), event_name)`).
- Instantiate: `EventClass(**args)`; wrap as a TaskNode.
- Execute: Delegate to a local Executor; optional sandboxing (restricted globals or containerization).
- Respond: Serialize the result/error plus checksum via the matching protocol.
- Configuration:
  - `sandbox_mode`: Boolean—restricts execution to whitelisted builtins/resources (e.g., via RestrictedPython) or spawns isolated containers.
  - `events_module`: String path to the server-side Event registry.
  - `allowed_events`: List of whitelisted names for security.
- Security: Prevents tampering (checksums), unauthorized imports (whitelists), and resource abuse (sandboxing).
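A sketch of the Manager's whitelist-guarded resolution, using the stdlib `json` module as a stand-in `events_module` (a real deployment would point it at a trusted Event registry):

```python
import importlib

def resolve_event(events_module: str, event_name: str, allowed_events: list):
    """Server-side resolution: reject non-whitelisted names before any
    dynamic import; imports stay scoped to the configured module."""
    if event_name not in allowed_events:
        raise PermissionError(f"Event {event_name!r} is not whitelisted")
    module = importlib.import_module(events_module)
    return getattr(module, event_name)

# Stand-in: `json.JSONDecoder` plays the role of a registered Event class.
cls = resolve_event("json", "JSONDecoder", allowed_events=["JSONDecoder"])
print(cls.__name__)  # JSONDecoder
```

Checking the whitelist before importing anything means a malicious name can never trigger a side-effecting import.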
Serializer
Protocol-agnostic adapters for payload packing/unpacking, ensuring portability and security.
- Variants:
- JsonSerializer: UTF-8 text for flexible channels; handles primitives/dicts with custom encoders.
- ProtobufSerializer: Binary schemas for gRPC; evolvable fields (e.g., `event_name`, `args` map).
- Common Features: Envelope with inner payload + checksum; metadata-focused (no serializable callables).
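The envelope-plus-checksum scheme could be sketched as follows for the JSON case; the field names (`payload`, `checksum`) are assumptions, not the framework's actual wire format:

```python
import hashlib
import json

def pack(event_name: str, args: dict) -> bytes:
    """Envelope: metadata-only inner payload plus its SHA-256 checksum."""
    inner = json.dumps({"event_name": event_name, "args": args},
                       sort_keys=True).encode("utf-8")
    return json.dumps({
        "payload": inner.decode("utf-8"),
        "checksum": hashlib.sha256(inner).hexdigest(),
    }).encode("utf-8")

def unpack(wire: bytes) -> dict:
    """Verify integrity before deserializing the inner payload."""
    envelope = json.loads(wire)
    inner = envelope["payload"].encode("utf-8")
    if hashlib.sha256(inner).hexdigest() != envelope["checksum"]:
        raise ValueError("Checksum mismatch: payload tampered in transit")
    return json.loads(inner)

msg = pack("ProcessDataEvent", {"batch": 10})
print(unpack(msg))  # {'args': {'batch': 10}, 'event_name': 'ProcessDataEvent'}
```

Note that only names and arguments cross the wire—never callables—so the checksum protects metadata, not code.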
Execution Flow
The lifecycle is iterative, with the Engine driving phases until graph completion. Traversal uses BFS to sweep the graph level by level, enabling control over flow (e.g., evaluating conditionals as soon as their parents resolve) while queuing siblings for parallelism.
- Graph Loading: Parse the Pointy-Lang workflow (`.pty` file or inline) into Event structures; validate topology.
- Encapsulation: Convert Events to TaskNodes (inject the base Executor, copy policies).
- Initialization: Topological validation; initiate BFS for initial path computation (ready set as leaf/conditional nodes).
- Iteration Loop:
  - Readiness Check: During BFS traversal, filter pending Tasks whose dependencies are all resolved in `partial_results`; push siblings onto the queue for parallel consideration.
  - Context Build: The Engine sets the immutable info part and initializes the update part via the State Manager (per-Context instance).
  - Selection: The Flow Selector evaluates heuristics and instantiates a Flow.
  - Batch Coordination: The Flow configures resources, then delegates to each Task's `run_with_retry()`.
  - Task Execution: Submit to the Executor; if remote, serialize name/args, the Manager resolves/instantiates/executes, and the resulting Future is deserialized.
  - Resolution: Await Futures; apply internal retries (backoff loops); update Task states/results via the State Manager.
  - Aggregation: The Flow computes batch metrics and updates Context partials via the State Manager.
  - Convergence: Re-check readiness as BFS resumes; exit on exhaustion.
- Finalization: Collate results; expose the Prometheus endpoint for scraping.
Parallelism Handling: In ParallelFlow, a coordinator pool submits concurrent Futures from BFS-queued siblings; deps resolved post-batch via partials. Groups treated as BFS subtrees. State Manager ensures thread-safe updates.
Distributed Flow: Client Executors fan out to multiple Managers (load-balanced via config); results merge asynchronously.
Security and Resilience
Security Measures
- Transport: SSL/TLS for all remotes (mutual cert auth; server verifies clients).
- Integrity: SHA-256 checksums on payloads/results—validated bidirectionally.
- Resolution: Whitelisted Event names; dynamic imports scoped to trusted modules.
- Isolation: Sandbox mode restricts execution (e.g., no file/network access); system-wide for trusted Events.
- Serialization: No code over wire—name/args only; JSON/protobuf prevent injection.
Resilience Mechanisms
- Dependencies: Strict topological enforcement via BFS; partial failures skip non-dependent branches (configurable).
- Retries: Per-Task, policy-driven (exponential backoff capped at 60s); transient errors (e.g., network) retry, permanent propagate.
- Futures: Timeouts/cancellation support; exceptions wrapped for tracing.
- Fault Isolation: Batch-level success rates influence future selections; no cascading re-queues. State Manager adds concurrency safety and per-Context isolation.
Observability
Metrics are collected hierarchically: Tasks emit granular (attempts, durations per ID), Flows aggregate batch-level (latency, rates), Engine exposes globals. Use Prometheus client for counters (retry_attempts_total), histograms (task_duration_seconds), and gauges (success_rate).
- Export: HTTP endpoint (e.g., `/metrics`) for scraping; labels for slicing (e.g., by `task_id`, `flow_type`).
- Influence: Cumulative metrics in the Context drive adaptive behaviors (e.g., throttle parallelism on high variance).
Logging: Structured (JSON) at DEBUG/INFO levels; trace IDs for distributed spans.
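For illustration, a tiny labeled-counter stand-in mirrors the shape of the metrics above; a real deployment would use the official `prometheus_client` library (Counter, Histogram) rather than this sketch:

```python
from collections import defaultdict

class LabeledCounter:
    """Minimal stand-in for a labeled Prometheus counter: one running
    float per unique label-value combination."""
    def __init__(self, name: str, label_names: tuple):
        self.name = name
        self.label_names = label_names
        self._values = defaultdict(float)

    def inc(self, amount: float = 1.0, **labels) -> None:
        key = tuple(labels[n] for n in self.label_names)
        self._values[key] += amount

    def value(self, **labels) -> float:
        return self._values[tuple(labels[n] for n in self.label_names)]

retries = LabeledCounter("task_retry_attempts_total", ("task_id",))
retries.inc(task_id="B")
retries.inc(task_id="B")
print(retries.value(task_id="B"))  # 2.0
```

Labels like `task_id` and `flow_type` are what make the exported series sliceable per Task or per Flow strategy.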
Extensibility and Deployment
- Pluggability:
- Add Flows/Executors: Implement abstract methods; register in config.
- Custom Selectors: Extend heuristics (e.g., integrate ML for prediction).
- Event Registry: Dynamic loading from DB/artifact store for hot-swaps.
- Pointy-Lang Extensions: Add keywords/macros via parser plugins; support custom traversals (e.g., hybrid DFS/BFS).
- Deployment:
- Local: Single-process Engine.
- Distributed: Engine as coordinator; Managers as microservices (Docker/K8s); queues (Redis) for ready nodes.
- Scaling: Horizontal Managers; Executor load-balancing via DNS round-robin.
- Testing: Unit (mock Futures/Contexts); integration (end-to-end DAGs in Pointy-Lang); chaos (inject remote fails).
Glossary
- DAG: Directed Acyclic Graph—workflow topology.
- CFG: Control-Flow Graph—the declarative workflow representation.
- DSL: Domain-Specific Language—Pointy-Lang for graph authoring.
- BFS: Breadth-First Search—traversal algorithm for flow control.
- Future: Asynchronous result proxy (awaitable).
- Heuristic: Rule-based decision (e.g., threshold on metrics).
- State Manager: Concurrency primitive for Context updates (e.g., multiprocessing.Manager).
- Topo Sort: Algorithm ensuring dependency order (integrated with BFS traversal).
- Backoff: Progressive delay in retries (e.g., exponential: 1s, 2s, 4s).
References
- Design Patterns: Gamma et al., Design Patterns (Strategy, Factory).
- Distributed Systems: Tanenbaum, Distributed Systems (RPC, fault tolerance).
- Graph Algorithms: Cormen et al., Introduction to Algorithms (graph traversal and topological sort).
- DSL Design: van Wijngaarden, Orthogonal Design and Description of PL/I (inspirational for Pointy-Lang).
- Prefect v3 Docs: Task Runners and Executors (inspirational baseline).
- Prometheus: Client Python library for metrics.
For questions or refinements, reference conversation logs or ping the design lead. Contributions welcome via PRs focusing on one layer at a time.