ADAPTIVE_BATCH - cyberofficial/Synthalingua GitHub Wiki
Adaptive Batch Processing
Overview
Adaptive Batch Processing is an intelligent job allocation system that dynamically distributes audio transcription tasks between GPU and CPU for optimal performance. The system automatically detects your hardware capabilities, learns from job performance, and makes smart decisions about where to process each audio segment.
Key Features
Automatic Hardware Detection
- GPU VRAM Detection: Automatically calculates how many concurrent GPU jobs your system can handle
- RAM-Based CPU Suggestions: Suggests optimal CPU batch slots based on available system memory
- Zero Configuration: Works out-of-the-box with sensible defaults
Performance Learning
- Historical Tracking: Records processing times for GPU and CPU jobs
- Predictive Allocation: Uses past performance to predict which device is best for each job
- Continuous Improvement: Gets smarter as it processes more segments
Smart Job Sorting
- Priority-Based Allocation:
- Longest jobs → GPU (maximum performance benefit)
- Shortest jobs → CPU (minimal speed loss)
- Dynamic Queue Management: Fills available slots optimally
- Max CPU Time Limits: Prevents CPU from being overwhelmed by long jobs
Endgame Strategy
- 80% Rule (configurable): Stops allocating to CPU near completion
- Predictable Finish Times: Ensures last jobs complete on faster GPU
- Prevents Bottlenecks: Avoids waiting for slow CPU jobs at the end
Optimization Suggestions
- Real-Time Analysis: Monitors system performance during processing
- Actionable Recommendations: Suggests configuration improvements
- Learn and Adapt: Helps you tune settings for your specific hardware
How It Works
Phase 1: Hardware Detection
System Analysis:
├─ GPU: 12GB VRAM → Max 2 concurrent jobs (auto-detected)
├─ RAM: 32GB → Suggests 3 CPU slots (user can override)
└─ Total Capacity: 5 concurrent jobs
Phase 2: Learning Phase
The first few jobs are used to learn your system's performance characteristics:
Job 1: 3.2s audio on GPU → took 8 seconds (ratio: 2.5x)
Job 2: 1.5s audio on CPU → took 36 seconds (ratio: 24x)
→ System learns: CPU is ~10x slower than GPU
Phase 3: Smart Allocation
Jobs are sorted by "GPU benefit" (how much faster they'd be on GPU):
Queue sorted by duration:
1. 26.8s segment → GPU (predict: 643s on CPU, 67s on GPU, benefit: 576s saved)
2. 15.3s segment → GPU (predict: 367s on CPU, 38s on GPU, benefit: 329s saved)
3. 0.7s segment → CPU (predict: 17s on CPU, 2s on GPU, benefit: 15s saved)
4. 1.8s segment → CPU (predict: 43s on CPU, 4s on GPU, benefit: 39s saved)
5. 2.1s segment → CPU (predict: 50s on CPU, 5s on GPU, benefit: 45s saved)
Allocation:
GPU Slot 1: Segment 1 (longest)
GPU Slot 2: Segment 2 (2nd longest)
CPU Slot 1: Segment 3 (shortest)
CPU Slot 2: Segment 4 (2nd shortest)
CPU Slot 3: Segment 5 (3rd shortest)
Phase 4: Endgame (80%+ Complete)
Progress: 82% complete
→ Stop allocating to CPU
→ Wait for GPU slots only
→ Ensures predictable completion time
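The endgame rule itself is a one-line threshold test. A minimal sketch in Python (the function name is illustrative, not Synthalingua's actual API):

```python
def allowed_devices(completed_jobs, total_jobs, stop_cpu_at=0.8):
    """Return which devices may receive new jobs at this progress point."""
    progress = completed_jobs / total_jobs
    if progress >= stop_cpu_at:
        # Endgame: only the faster GPU takes new work, so the run
        # cannot end up waiting on one slow CPU job.
        return ["gpu"]
    return ["gpu", "cpu"]
```

At 82% progress with the default 0.8 threshold, only GPU slots are refilled; earlier in the run both device queues stay open.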
Requirements
System Requirements
- Model Source: FasterWhisper (`--model_source fasterwhisper`)
- Device: GPU required (`--device cuda` or auto-detect, NOT `--device cpu`)
- Mode: Caption generation (`--makecaptions`)
Important: Adaptive batch processing is designed to intelligently distribute work between GPU and CPU. If you only have CPU available, all jobs will run on CPU anyway, making adaptive batch unnecessary. Use regular `--batchmode` instead.
Usage
Custom Configuration
Override CPU batch slots:
python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --cpu_batches 4 --file_input video.mp4
Set maximum CPU time per job (5 minutes):
python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --max_cpu_time 300 --file_input video.mp4
Adjust endgame threshold (stop CPU at 70% instead of 80%):
python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --stop_cpu_at 0.7 --file_input video.mp4
Full Example with All Options
python synthalingua.py \
--makecaptions \
--adaptive_batch \
--model_source fasterwhisper \
--cpu_batches 3 \
--max_cpu_time 300 \
--stop_cpu_at 0.8 \
--file_input video.mp4 \
--ram 11gb-v3 \
--silent_detect
Command-Line Arguments
--adaptive_batch
Type: Flag (no value needed)
Default: Disabled
Description: Enable intelligent adaptive batch processing
Requirements:
- Must be used with `--makecaptions`
- Requires `--model_source fasterwhisper`
- Requires GPU (cannot use `--device cpu`)
- Overrides `--batchmode` if both are specified
Example:
python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --file_input video.mp4
--cpu_batches N
Type: Integer
Default: Auto-detected based on RAM
- <16GB RAM → 1 CPU slot
- 16-32GB RAM → 2 CPU slots
- >32GB RAM → 3 CPU slots
Description: Number of concurrent CPU batch processing slots
Recommendations:
- Conservative (1-2): Safest, minimal system impact
- Balanced (3-4): Good throughput, recommended for most systems
- Aggressive (5+): Maximum speed but may cause system slowdown
Example:
# Use 4 CPU slots for high-RAM systems
python synthalingua.py --makecaptions --adaptive_batch --cpu_batches 4 --file_input video.mp4
--max_cpu_time SECONDS
Type: Integer
Default: 300 (5 minutes)
Range: 60-600 seconds recommended
Description: Maximum time a job can run on CPU before being forced to wait for GPU
Use Cases:
- Lower values (60-120s): Prioritize GPU for more jobs
- Default (300s): Balanced approach
- Higher values (400-600s): Allow more CPU usage
Example:
# Limit CPU jobs to 2 minutes max
python synthalingua.py --makecaptions --adaptive_batch --max_cpu_time 120 --file_input video.mp4
--batchjobsize SIZE
Type: Float
Default: 4.0 (GB)
Range: 0.1-12.0
Description: Model size in GB used for GPU capacity calculation
Purpose: Tells the system how much VRAM each concurrent job requires, allowing accurate calculation of how many jobs fit in available VRAM.
Model Size Guidelines:
- 0.1-0.9 GB: Tiny models in optimized modes (e.g., 3GB mode using ~800MB)
- 1-2 GB: Tiny, base models
- 3-4 GB: Small, medium models (default)
- 6-7 GB: Large models, turbo
- 10-11 GB: Large-v2, large-v3 models
Example:
# Using 3GB model in optimized mode (~800MB actual usage)
python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --ram 3gb --batchjobsize 0.8 --file_input video.mp4
# Using 11GB model (large-v3) with 12GB VRAM โ allows 1 GPU slot
python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --ram 11gb-v3 --batchjobsize 11 --file_input video.mp4
--stop_cpu_at RATIO
Type: Float
Default: 0.8 (80%)
Range: 0.6-0.95
Description: Progress threshold at which to stop allocating new jobs to CPU
The Endgame Strategy:
- Lower values (0.6-0.7): Finish faster, more GPU-focused
- Default (0.8): Balanced predictability
- Higher values (0.85-0.95): Maximize CPU utilization
Example:
# Stop CPU allocation at 70% progress
python synthalingua.py --makecaptions --adaptive_batch --stop_cpu_at 0.7 --file_input video.mp4
Configuration Examples
Conservative Setup (Low RAM System)
python synthalingua.py \
--makecaptions \
--adaptive_batch \
--cpu_batches 1 \
--max_cpu_time 180 \
--stop_cpu_at 0.75 \
--file_input video.mp4
Best for: Systems with <16GB RAM, want minimal system impact
Balanced Setup (Recommended)
python synthalingua.py \
--makecaptions \
--adaptive_batch \
--cpu_batches 3 \
--max_cpu_time 300 \
--stop_cpu_at 0.8 \
--file_input video.mp4
Best for: Most systems, good balance of speed and stability
Aggressive Setup (High-End System)
python synthalingua.py \
--makecaptions \
--adaptive_batch \
--cpu_batches 5 \
--max_cpu_time 400 \
--stop_cpu_at 0.85 \
--file_input video.mp4
Best for: Systems with >32GB RAM and powerful CPU
Performance Comparison
Traditional Batch Mode
Time: ~8 minutes
Approach: Fixed batch size, no device awareness
Bottleneck: May overflow to CPU unpredictably
Example with 15 segments:
├─ Batch size: 3
├─ Device: Whatever's available
└─ No optimization
Adaptive Batch Mode
Time: ~5 minutes (38% faster!)
Approach: Smart GPU/CPU allocation
Benefits:
✓ Auto-detected capacity (2 GPU + 3 CPU = 5 concurrent)
✓ Longest jobs to GPU, shortest to CPU
✓ Endgame strategy prevents slowdowns
✓ Continuous learning improves allocation
Example with 15 segments:
├─ Max parallel: 5 jobs (2 GPU + 3 CPU)
├─ Smart sorting by duration
├─ Performance prediction
└─ Optimization suggestions
Optimization Suggestions
The system analyzes performance and provides actionable recommendations:
Example Suggestion 1: Increase CPU Batches
OPTIMIZATION SUGGESTION
─────────────────────────────
CPU jobs finishing quickly, can handle more parallel work
Current: 3 CPU batches
Suggested: 4 CPU batches
Benefit: ~1-2 minutes faster completion
Example Suggestion 2: CPU Performance Analysis
OPTIMIZATION SUGGESTION
─────────────────────────────
CPU performance is good (only 8.2x slower than GPU)
Current: 300s max CPU time
Suggested: 360s max CPU time
Benefit: More efficient job distribution
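The first suggestion above follows a simple pattern: when CPU jobs finish well under the time limit, there is headroom for another parallel slot. A hedged sketch of that heuristic (thresholds and the function name are illustrative assumptions, not the actual OptimizationSuggester logic):

```python
def suggest_cpu_batches(cpu_times, max_cpu_time, current_batches):
    """Suggest one more CPU slot when CPU jobs finish well under the limit.

    cpu_times: processing times (seconds) of completed CPU jobs.
    The 50% threshold here is an assumption for illustration.
    """
    if not cpu_times:
        return current_batches  # nothing learned yet, keep current setting
    average = sum(cpu_times) / len(cpu_times)
    if average < 0.5 * max_cpu_time:
        # Jobs are finishing quickly; the system can handle more parallel work.
        return current_batches + 1
    return current_batches
```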
Technical Details
Hardware Detection
GPU Capacity Calculation:
Total VRAM: 12 GB
OS Reserved: ~0.2 GB (auto-detected or 0.5 GB fallback)
Available VRAM: 11.8 GB
Model Size per Job: 4 GB
Max GPU Batches: 11.8 ÷ 4 = 2 (rounded down)
CPU Capacity Suggestion:
if RAM < 16 GB:
suggest 1 CPU slot # Conservative
elif RAM < 32 GB:
suggest 2 CPU slots # Moderate
else:
suggest 3 CPU slots # Balanced
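The two capacity rules above can be written as a small runnable sketch. Function names are hypothetical, and the real code reads VRAM and RAM from PyTorch and psutil rather than taking them as arguments:

```python
def max_gpu_batches(total_vram_gb, model_size_gb, os_reserved_gb=0.5):
    """Jobs that fit in VRAM: floor((total - reserved) / model size)."""
    available = total_vram_gb - os_reserved_gb
    return max(1, int(available // model_size_gb))

def suggested_cpu_slots(ram_gb):
    """RAM-based CPU slot suggestion, mirroring the table above."""
    if ram_gb < 16:
        return 1  # conservative
    if ram_gb < 32:
        return 2  # moderate
    return 3      # balanced
```

With 12 GB of VRAM, a 4 GB model, and 0.2 GB reserved by the OS, this yields the 2 GPU slots shown in the worked example.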
Performance Tracking
The system maintains two performance logs:
- GPU Jobs: `[(audio_length, processing_time), ...]`
- CPU Jobs: `[(audio_length, processing_time), ...]`
Prediction Formula:
# After 3+ jobs, use average ratio
average_ratio = sum(processing_time / audio_length) / count
# Predict new job
predicted_time = audio_length * average_ratio
# Until 3 jobs, use estimates:
GPU: audio_length * 2.0 (rough estimate)
CPU: audio_length * 24.0 (12x slower than GPU)
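As a runnable sketch of this formula (the function name is illustrative, not Synthalingua's actual API):

```python
def predict_time(audio_length, history, fallback_ratio):
    """Predict processing time from (audio_length, processing_time) pairs.

    Uses the average observed ratio once 3+ jobs are recorded; before
    that, falls back to the rough per-device ratio (2.0 GPU, 24.0 CPU).
    """
    if len(history) >= 3:
        ratio = sum(t / a for a, t in history) / len(history)
    else:
        ratio = fallback_ratio
    return audio_length * ratio
```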
Job Sorting Algorithm
for each segment:
gpu_time = predict_time(segment, "gpu")
cpu_time = predict_time(segment, "cpu")
gpu_benefit = cpu_time - gpu_time
# Sort by gpu_benefit (descending)
# Highest benefit = longest segments = allocated to GPU first
# Lowest benefit = shortest segments = allocated to CPU
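The same sorting step as a self-contained sketch (names are illustrative; `predict` stands in for the tracker's prediction function):

```python
def sort_by_gpu_benefit(durations, predict):
    """Order segment durations so the biggest GPU time savings come first.

    predict(duration, device) returns a predicted processing time;
    benefit is the CPU prediction minus the GPU prediction.
    """
    return sorted(durations,
                  key=lambda d: predict(d, "cpu") - predict(d, "gpu"),
                  reverse=True)
```

Because predicted time scales with duration on both devices, sorting by benefit effectively sorts longest-first, which is why the longest segments land on the GPU.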
Troubleshooting
Issue: "Not enough VRAM for adaptive batch"
Solution: System has low GPU memory
- Try with CPU-only: `--cpu_batches 3` and no GPU
- Or reduce model size: `--ram 6gb` instead of `--ram 11gb-v3`
Issue: "System becomes unresponsive"
Solution: Too many CPU batches for your system
- Reduce CPU slots: `--cpu_batches 1` or `--cpu_batches 2`
- Lower max CPU time: `--max_cpu_time 120`
Issue: "GPU slots underutilized"
Solution: System is too conservative
- Increase CPU batches: `--cpu_batches 4`
- Raise stop threshold: `--stop_cpu_at 0.9`
Issue: "Jobs taking too long on CPU"
Solution: CPU time limit too high
- Lower max CPU time: `--max_cpu_time 180`
- Lower stop threshold: `--stop_cpu_at 0.7`
Best Practices
- Start with defaults - Let the system auto-detect optimal settings first
- Monitor first run - Watch the allocation patterns and suggestions
- Adjust gradually - Make small changes based on recommendations
- Consider your use case:
- Fast turnaround: Use aggressive settings
- System stability: Use conservative settings
- Batch processing: Use balanced settings
Module Architecture
The adaptive batch system consists of four main classes:
BatchConfig
- Manages configuration parameters
- Auto-detects GPU capacity
- Suggests CPU capacity based on RAM
- Displays formatted configuration
PerformanceTracker
- Records completed job metrics
- Predicts processing times
- Calculates CPU/GPU speed ratios
JobScheduler
- Manages job queues
- Allocates jobs to GPU/CPU slots
- Implements endgame strategy
- Tracks completion progress
OptimizationSuggester
- Analyzes performance data
- Generates actionable recommendations
- Displays suggestions to user
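A minimal sketch of how the scheduler's slot-filling and endgame rules could combine (the constructor and method signatures are hypothetical, not Synthalingua's actual API):

```python
class JobScheduler:
    """Toy version of the slot-filling logic for illustration only."""

    def __init__(self, gpu_slots, cpu_slots, stop_cpu_at=0.8):
        self.gpu_slots = gpu_slots
        self.cpu_slots = cpu_slots
        self.stop_cpu_at = stop_cpu_at

    def allocate(self, queue, progress):
        """Assign durations (sorted longest-first) to device slots.

        Past the endgame threshold, CPU slots are left empty so the
        last jobs finish on the faster GPU.
        """
        plan = []
        for i, duration in enumerate(queue):
            if i < self.gpu_slots:
                plan.append((duration, "gpu"))
            elif (progress < self.stop_cpu_at
                  and i < self.gpu_slots + self.cpu_slots):
                plan.append((duration, "cpu"))
        return plan
```

With 2 GPU and 3 CPU slots this reproduces the allocation in the worked example: the two longest segments go to the GPU, the rest to the CPU, and past 80% progress only GPU slots are filled.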
Requirements
- Python 3.7+
- PyTorch (for GPU detection)
- psutil (for system resource detection)
- CUDA-capable GPU (optional, falls back to CPU-only mode)
All dependencies are already included in Synthalingua.
Limitations
- Only works with `--makecaptions` mode
- Works best with `--silent_detect` (optional but recommended)
- GPU detection requires a CUDA-capable device
- First batch may not be optimally allocated (learning phase)
Future Enhancements
Potential improvements for future versions:
- Dynamic batch size adjustment during runtime
- Model size auto-detection based on `--ram` setting
- Support for multiple GPU devices
- Persistent performance history across sessions
- Web UI integration for real-time monitoring
Credits
Implemented as part of Synthalingua 1.2.5 by the Synthalingua development team.
Need Help? Check out the main documentation or open an issue on GitHub.