ADAPTIVE_BATCH - cyberofficial/Synthalingua GitHub Wiki
Adaptive Batch Processing
Overview
Adaptive Batch Processing is an intelligent job allocation system that dynamically distributes audio transcription tasks between GPU and CPU for optimal performance. The system automatically detects your hardware capabilities, learns from job performance, and makes smart decisions about where to process each audio segment.
Key Features
Automatic Hardware Detection
- GPU VRAM Detection: Automatically calculates how many concurrent GPU jobs your system can handle
- RAM-Based CPU Suggestions: Suggests optimal CPU batch slots based on available system memory
- Zero Configuration: Works out-of-the-box with sensible defaults
Performance Learning
- Historical Tracking: Records processing times for GPU and CPU jobs
- Predictive Allocation: Uses past performance to predict which device is best for each job
- Continuous Improvement: Gets smarter as it processes more segments
Smart Job Sorting
- Priority-Based Allocation:
- Longest jobs → GPU (maximum performance benefit)
- Shortest jobs → CPU (minimal speed loss)
- Dynamic Queue Management: Fills available slots optimally
- Max CPU Time Limits: Prevents CPU from being overwhelmed by long jobs
Endgame Strategy
- 80% Rule (configurable): Stops allocating to CPU near completion
- Predictable Finish Times: Ensures last jobs complete on faster GPU
- Prevents Bottlenecks: Avoids waiting for slow CPU jobs at the end
Optimization Suggestions
- Real-Time Analysis: Monitors system performance during processing
- Actionable Recommendations: Suggests configuration improvements
- Learn and Adapt: Helps you tune settings for your specific hardware
How It Works
Phase 1: Hardware Detection
System Analysis:
├─ GPU: 12GB VRAM → Max 2 concurrent jobs (auto-detected)
├─ RAM: 32GB → Suggests 3 CPU slots (user can override)
└─ Total Capacity: 5 concurrent jobs
Phase 2: Learning Phase
The first few jobs are used to learn your system's performance characteristics:
Job 1: 3.2s audio on GPU → took 8 seconds (ratio: 2.5x)
Job 2: 1.5s audio on CPU → took 36 seconds (ratio: 24x)
→ System learns: CPU is ~10x slower than GPU
Phase 3: Smart Allocation
Jobs are sorted by "GPU benefit" (how much faster they'd be on GPU):
Queue sorted by duration:
1. 26.8s segment → GPU (predict: 643s on CPU, 67s on GPU, benefit: 576s saved)
2. 15.3s segment → GPU (predict: 367s on CPU, 38s on GPU, benefit: 329s saved)
3. 0.7s segment → CPU (predict: 17s on CPU, 2s on GPU, benefit: 15s saved)
4. 1.8s segment → CPU (predict: 43s on CPU, 4s on GPU, benefit: 39s saved)
5. 2.1s segment → CPU (predict: 50s on CPU, 5s on GPU, benefit: 45s saved)
Allocation:
GPU Slot 1: Segment 1 (longest)
GPU Slot 2: Segment 2 (2nd longest)
CPU Slot 1: Segment 3 (shortest)
CPU Slot 2: Segment 4 (2nd shortest)
CPU Slot 3: Segment 5 (3rd shortest)
Phase 4: Endgame (80%+ Complete)
Progress: 82% complete
→ Stop allocating to CPU
→ Wait for GPU slots only
→ Ensures predictable completion time
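The endgame rule itself is a one-line threshold test. A minimal sketch in Python (the function name is illustrative, not Synthalingua's actual API):

```python
def allowed_devices(completed_jobs, total_jobs, stop_cpu_at=0.8):
    """Return which devices may receive new jobs at this progress point."""
    progress = completed_jobs / total_jobs
    if progress >= stop_cpu_at:
        # Endgame: only the faster GPU takes new work, so the run
        # cannot end up waiting on one slow CPU job.
        return ["gpu"]
    return ["gpu", "cpu"]
```

At 82% progress with the default 0.8 threshold, only GPU slots are refilled; earlier in the run both device queues stay open.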
Requirements
System Requirements
- Model Source: FasterWhisper (`--model_source fasterwhisper`)
- Device: GPU required (`--device cuda` or auto-detect, NOT `--device cpu`)
- Mode: Caption generation (`--makecaptions`)
Important: Adaptive batch processing is designed to intelligently distribute work between GPU and CPU. If you only have CPU available, all jobs will run on CPU anyway, making adaptive batch unnecessary. Use regular `--batchmode` instead.
Usage
Custom Configuration
Override CPU batch slots:
python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --cpu_batches 4 --file_input video.mp4
Set maximum CPU time per job (5 minutes):
python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --max_cpu_time 300 --file_input video.mp4
Adjust endgame threshold (stop CPU at 70% instead of 80%):
python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --stop_cpu_at 0.7 --file_input video.mp4
Full Example with All Options
python synthalingua.py \
--makecaptions \
--adaptive_batch \
--model_source fasterwhisper \
--cpu_batches 3 \
--max_cpu_time 300 \
--stop_cpu_at 0.8 \
--file_input video.mp4 \
--ram 11gb-v3 \
--silent_detect
Command-Line Arguments
--adaptive_batch
Type: Flag (no value needed)
Default: Disabled
Description: Enable intelligent adaptive batch processing
Requirements:
- Must be used with `--makecaptions`
- Requires `--model_source fasterwhisper`
- Requires GPU (cannot use `--device cpu`)
- Overrides `--batchmode` if both are specified
Example:
python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --file_input video.mp4
--cpu_batches N
Type: Integer
Default: Auto-detected based on RAM
- <16GB RAM → 1 CPU slot
- 16-32GB RAM → 2 CPU slots
- >32GB RAM → 3 CPU slots
Description: Number of concurrent CPU batch processing slots
Recommendations:
- Conservative (1-2): Safest, minimal system impact
- Balanced (3-4): Good throughput, recommended for most systems
- Aggressive (5+): Maximum speed but may cause system slowdown
Example:
# Use 4 CPU slots for high-RAM systems
python synthalingua.py --makecaptions --adaptive_batch --cpu_batches 4 --file_input video.mp4
--max_cpu_time SECONDS
Type: Integer
Default: 300 (5 minutes)
Range: 60-600 seconds recommended
Description: Maximum time a job can run on CPU before being forced to wait for GPU
Use Cases:
- Lower values (60-120s): Prioritize GPU for more jobs
- Default (300s): Balanced approach
- Higher values (400-600s): Allow more CPU usage
Example:
# Limit CPU jobs to 2 minutes max
python synthalingua.py --makecaptions --adaptive_batch --max_cpu_time 120 --file_input video.mp4
--batchjobsize SIZE
Type: Float
Default: 4.0 (GB)
Range: 0.1-12.0
Description: Model size in GB used for GPU capacity calculation
Purpose: Tells the system how much VRAM each concurrent job requires, allowing accurate calculation of how many jobs fit in available VRAM.
Model Size Guidelines:
- 0.1-0.9 GB: Tiny models in optimized modes (e.g., 3GB mode using ~800MB)
- 1-2 GB: Tiny, base models
- 3-4 GB: Small, medium models (default)
- 6-7 GB: Large models, turbo
- 10-11 GB: Large-v2, large-v3 models
Example:
# Using 3GB model in optimized mode (~800MB actual usage)
python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --ram 3gb --batchjobsize 0.8 --file_input video.mp4
# Using 11GB model (large-v3) with 12GB VRAM โ allows 1 GPU slot
python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --ram 11gb-v3 --batchjobsize 11 --file_input video.mp4
--stop_cpu_at RATIO
Type: Float
Default: 0.8 (80%)
Range: 0.6-0.95
Description: Progress threshold at which to stop allocating new jobs to CPU
The Endgame Strategy:
- Lower values (0.6-0.7): Finish faster, more GPU-focused
- Default (0.8): Balanced predictability
- Higher values (0.85-0.95): Maximize CPU utilization
Example:
# Stop CPU allocation at 70% progress
python synthalingua.py --makecaptions --adaptive_batch --stop_cpu_at 0.7 --file_input video.mp4
Configuration Examples
Conservative Setup (Low RAM System)
python synthalingua.py \
--makecaptions \
--adaptive_batch \
--cpu_batches 1 \
--max_cpu_time 180 \
--stop_cpu_at 0.75 \
--file_input video.mp4
Best for: Systems with <16GB RAM, want minimal system impact
Balanced Setup (Recommended)
python synthalingua.py \
--makecaptions \
--adaptive_batch \
--cpu_batches 3 \
--max_cpu_time 300 \
--stop_cpu_at 0.8 \
--file_input video.mp4
Best for: Most systems, good balance of speed and stability
Aggressive Setup (High-End System)
python synthalingua.py \
--makecaptions \
--adaptive_batch \
--cpu_batches 5 \
--max_cpu_time 400 \
--stop_cpu_at 0.85 \
--file_input video.mp4
Best for: Systems with >32GB RAM and powerful CPU
Performance Comparison
Traditional Batch Mode
Time: ~8 minutes
Approach: Fixed batch size, no device awareness
Bottleneck: May overflow to CPU unpredictably
Example with 15 segments:
├─ Batch size: 3
├─ Device: Whatever's available
└─ No optimization
Adaptive Batch Mode
Time: ~5 minutes (38% faster!)
Approach: Smart GPU/CPU allocation
Benefits:
✓ Auto-detected capacity (2 GPU + 3 CPU = 5 concurrent)
✓ Longest jobs to GPU, shortest to CPU
✓ Endgame strategy prevents slowdowns
✓ Continuous learning improves allocation
Example with 15 segments:
├─ Max parallel: 5 jobs (2 GPU + 3 CPU)
├─ Smart sorting by duration
├─ Performance prediction
└─ Optimization suggestions
Optimization Suggestions
The system analyzes performance and provides actionable recommendations:
Example Suggestion 1: Increase CPU Batches
OPTIMIZATION SUGGESTION
─────────────────────────────
CPU jobs finishing quickly, can handle more parallel work
Current: 3 CPU batches
Suggested: 4 CPU batches
Benefit: ~1-2 minutes faster completion
Example Suggestion 2: CPU Performance Analysis
OPTIMIZATION SUGGESTION
─────────────────────────────
CPU performance is good (only 8.2x slower than GPU)
Current: 300s max CPU time
Suggested: 360s max CPU time
Benefit: More efficient job distribution
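The first suggestion above follows a simple pattern: when CPU jobs finish well under the time limit, there is headroom for another parallel slot. A hedged sketch of that heuristic (thresholds and the function name are illustrative assumptions, not the actual OptimizationSuggester logic):

```python
def suggest_cpu_batches(cpu_times, max_cpu_time, current_batches):
    """Suggest one more CPU slot when CPU jobs finish well under the limit.

    cpu_times: processing times (seconds) of completed CPU jobs.
    The 50% threshold here is an assumption for illustration.
    """
    if not cpu_times:
        return current_batches  # nothing learned yet, keep current setting
    average = sum(cpu_times) / len(cpu_times)
    if average < 0.5 * max_cpu_time:
        # Jobs are finishing quickly; the system can handle more parallel work.
        return current_batches + 1
    return current_batches
```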
Technical Details
Hardware Detection
GPU Capacity Calculation:
Total VRAM: 12 GB
OS Reserved: ~0.2 GB (auto-detected or 0.5 GB fallback)
Available VRAM: 11.8 GB
Model Size per Job: 4 GB
Max GPU Batches: 11.8 ÷ 4 = 2 (rounded down)
CPU Capacity Suggestion:
if RAM < 16 GB:
suggest 1 CPU slot # Conservative
elif RAM < 32 GB:
suggest 2 CPU slots # Moderate
else:
suggest 3 CPU slots # Balanced
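The two capacity rules above can be written as a small runnable sketch. Function names are hypothetical, and the real code reads VRAM and RAM from PyTorch and psutil rather than taking them as arguments:

```python
def max_gpu_batches(total_vram_gb, model_size_gb, os_reserved_gb=0.5):
    """Jobs that fit in VRAM: floor((total - reserved) / model size)."""
    available = total_vram_gb - os_reserved_gb
    return max(1, int(available // model_size_gb))

def suggested_cpu_slots(ram_gb):
    """RAM-based CPU slot suggestion, mirroring the table above."""
    if ram_gb < 16:
        return 1  # conservative
    if ram_gb < 32:
        return 2  # moderate
    return 3      # balanced
```

With 12 GB of VRAM, a 4 GB model, and 0.2 GB reserved by the OS, this yields the 2 GPU slots shown in the worked example.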
Performance Tracking
The system maintains two performance logs:
- GPU Jobs: `[(audio_length, processing_time), ...]`
- CPU Jobs: `[(audio_length, processing_time), ...]`
Prediction Formula:
# After 3+ jobs, use average ratio
average_ratio = sum(processing_time / audio_length) / count
# Predict new job
predicted_time = audio_length * average_ratio
# Until 3 jobs, use estimates:
GPU: audio_length * 2.0 (rough estimate)
CPU: audio_length * 24.0 (12x slower than GPU)
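As a runnable sketch of this formula (the function name is illustrative, not Synthalingua's actual API):

```python
def predict_time(audio_length, history, fallback_ratio):
    """Predict processing time from (audio_length, processing_time) pairs.

    Uses the average observed ratio once 3+ jobs are recorded; before
    that, falls back to the rough per-device ratio (2.0 GPU, 24.0 CPU).
    """
    if len(history) >= 3:
        ratio = sum(t / a for a, t in history) / len(history)
    else:
        ratio = fallback_ratio
    return audio_length * ratio
```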
Job Sorting Algorithm
for each segment:
gpu_time = predict_time(segment, "gpu")
cpu_time = predict_time(segment, "cpu")
gpu_benefit = cpu_time - gpu_time
# Sort by gpu_benefit (descending)
# Highest benefit = longest segments = allocated to GPU first
# Lowest benefit = shortest segments = allocated to CPU
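The same sorting step as a self-contained sketch (names are illustrative; `predict` stands in for the tracker's prediction function):

```python
def sort_by_gpu_benefit(durations, predict):
    """Order segment durations so the biggest GPU time savings come first.

    predict(duration, device) returns a predicted processing time;
    benefit is the CPU prediction minus the GPU prediction.
    """
    return sorted(durations,
                  key=lambda d: predict(d, "cpu") - predict(d, "gpu"),
                  reverse=True)
```

Because predicted time scales with duration on both devices, sorting by benefit effectively sorts longest-first, which is why the longest segments land on the GPU.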
Troubleshooting
Issue: "Not enough VRAM for adaptive batch"
Solution: System has low GPU memory
- Try with CPU-only: `--cpu_batches 3` and no GPU
- Or reduce model size: `--ram 6gb` instead of `--ram 11gb-v3`
Issue: "System becomes unresponsive"
Solution: Too many CPU batches for your system
- Reduce CPU slots: `--cpu_batches 1` or `--cpu_batches 2`
- Lower max CPU time: `--max_cpu_time 120`
Issue: "GPU slots underutilized"
Solution: System is too conservative
- Increase CPU batches: `--cpu_batches 4`
- Raise stop threshold: `--stop_cpu_at 0.9`
Issue: "Jobs taking too long on CPU"
Solution: CPU time limit too high
- Lower max CPU time: `--max_cpu_time 180`
- Lower stop threshold: `--stop_cpu_at 0.7`
Best Practices
- Start with defaults - Let the system auto-detect optimal settings first
- Monitor first run - Watch the allocation patterns and suggestions
- Adjust gradually - Make small changes based on recommendations
- Consider your use case:
- Fast turnaround: Use aggressive settings
- System stability: Use conservative settings
- Batch processing: Use balanced settings
Module Architecture
The adaptive batch system consists of four main classes:
BatchConfig
- Manages configuration parameters
- Auto-detects GPU capacity
- Suggests CPU capacity based on RAM
- Displays formatted configuration
PerformanceTracker
- Records completed job metrics
- Predicts processing times
- Calculates CPU/GPU speed ratios
JobScheduler
- Manages job queues
- Allocates jobs to GPU/CPU slots
- Implements endgame strategy
- Tracks completion progress
OptimizationSuggester
- Analyzes performance data
- Generates actionable recommendations
- Displays suggestions to user
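A minimal sketch of how the scheduler's slot-filling and endgame rules could combine (the constructor and method signatures are hypothetical, not Synthalingua's actual API):

```python
class JobScheduler:
    """Toy version of the slot-filling logic for illustration only."""

    def __init__(self, gpu_slots, cpu_slots, stop_cpu_at=0.8):
        self.gpu_slots = gpu_slots
        self.cpu_slots = cpu_slots
        self.stop_cpu_at = stop_cpu_at

    def allocate(self, queue, progress):
        """Assign durations (sorted longest-first) to device slots.

        Past the endgame threshold, CPU slots are left empty so the
        last jobs finish on the faster GPU.
        """
        plan = []
        for i, duration in enumerate(queue):
            if i < self.gpu_slots:
                plan.append((duration, "gpu"))
            elif (progress < self.stop_cpu_at
                  and i < self.gpu_slots + self.cpu_slots):
                plan.append((duration, "cpu"))
        return plan
```

With 2 GPU and 3 CPU slots this reproduces the allocation in the worked example: the two longest segments go to the GPU, the rest to the CPU, and past 80% progress only GPU slots are filled.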
Requirements
- Python 3.7+
- PyTorch (for GPU detection)
- psutil (for system resource detection)
- CUDA-capable GPU (optional, falls back to CPU-only mode)
All dependencies are already included in Synthalingua.
Limitations
- Only works with `--makecaptions` mode
- Works best with `--silent_detect` (optional but recommended)
- GPU detection requires a CUDA-capable device
- First batch may not be optimally allocated (learning phase)
Future Enhancements
Potential improvements for future versions:
- Dynamic batch size adjustment during runtime
- Model size auto-detection based on `--ram` setting
- Support for multiple GPU devices
- Persistent performance history across sessions
- Web UI integration for real-time monitoring
Credits
Implemented as part of Synthalingua 1.2.5 by the Synthalingua development team.
Need Help? Check out the main documentation or open an issue on GitHub.