API Reference
Chatterbox TTS FastAPI
This API provides a FastAPI-based web service for the Chatterbox TTS text-to-speech system, designed to be compatible with OpenAI's TTS API format.
Features
- OpenAI-compatible API: Uses an endpoint structure similar to OpenAI's text-to-speech API
- FastAPI Performance: High-performance async API with automatic documentation
- Type Safety: Full Pydantic validation for requests and responses
- Interactive Documentation: Automatic Swagger UI and ReDoc generation
- Automatic text chunking: Automatically breaks long text into manageable chunks to handle character limits
- Voice cloning: Uses the pre-specified `voice-sample.mp3` file for voice conditioning
- Async support: Non-blocking request handling with better concurrency
- Error handling: Comprehensive error handling with appropriate HTTP status codes
- Health monitoring: Health check endpoint for monitoring service status
- Environment-based configuration: Fully configurable via environment variables
- Docker support: Ready for containerized deployment
Setup
Prerequisites
- Ensure you have the Chatterbox TTS package installed: `pip install chatterbox-tts`
- Install FastAPI and the other required dependencies: `pip install fastapi uvicorn[standard] torchaudio requests python-dotenv`
- Ensure you have a `voice-sample.mp3` file in the project root directory for voice conditioning
Configuration
Copy the example environment file and customize it:
cp .env.example .env
nano .env # Edit with your preferred settings
Key environment variables:
- `PORT=4123` - API server port
- `EXAGGERATION=0.5` - Default emotion intensity (0.25-2.0)
- `CFG_WEIGHT=0.5` - Default pace control (0.0-1.0)
- `TEMPERATURE=0.8` - Default sampling temperature (0.05-5.0)
- `VOICE_SAMPLE_PATH=./voice-sample.mp3` - Path to voice sample file
- `DEVICE=auto` - Device selection (auto/cuda/mps/cpu)
See `.env.example` for all available options.
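As a rough sketch of how these variables can map onto runtime settings, the snippet below reads them with python-dotenv and `os.getenv`. The names and defaults come from the list above; the project's actual configuration-loading code may differ.

```python
# Minimal sketch of environment-based configuration (illustrative, not the project's actual code).
# Variable names and defaults are taken from the list above.
import os
from dotenv import load_dotenv

load_dotenv()  # read values from .env into the process environment

PORT = int(os.getenv("PORT", "4123"))
EXAGGERATION = float(os.getenv("EXAGGERATION", "0.5"))
CFG_WEIGHT = float(os.getenv("CFG_WEIGHT", "0.5"))
TEMPERATURE = float(os.getenv("TEMPERATURE", "0.8"))
VOICE_SAMPLE_PATH = os.getenv("VOICE_SAMPLE_PATH", "./voice-sample.mp3")
DEVICE = os.getenv("DEVICE", "auto")

print(f"Serving on port {PORT} with device={DEVICE}")
```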
Running the API
Start the API server:
# Method 1: Direct uvicorn (recommended for development)
uvicorn app.main:app --host 0.0.0.0 --port 4123
# Method 2: Using the main script
python main.py
# Method 3: With auto-reload for development
uvicorn app.main:app --host 0.0.0.0 --port 4123 --reload
The server will:
- Automatically detect the best available device (CUDA, MPS, or CPU)
- Load the Chatterbox TTS model asynchronously
- Start the FastAPI server on http://localhost:4123 (or your configured port)
- Provide interactive documentation at `/docs` and `/redoc`
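Device auto-detection (`DEVICE=auto`) typically comes down to a standard PyTorch capability check. The helper below is an illustrative sketch, not the project's internal code; `pick_device` is a hypothetical name.

```python
# Illustrative device auto-detection for DEVICE=auto, using standard PyTorch checks.
# pick_device is a hypothetical helper, not part of the chatterbox-tts-api codebase.
import torch

def pick_device(preference: str = "auto") -> str:
    if preference != "auto":
        return preference  # honor an explicit cuda/mps/cpu setting
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"

print(pick_device())  # e.g. "cuda" on a machine with an NVIDIA GPU
```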
API Documentation
Once running, you can access:
- Interactive API Docs (Swagger UI): http://localhost:4123/docs
- Alternative Documentation (ReDoc): http://localhost:4123/redoc
- OpenAPI Schema: http://localhost:4123/openapi.json
API Endpoints
1. Text-to-Speech Generation
POST /v1/audio/speech
Generate speech from text using the Chatterbox TTS model.
Request Body (Pydantic Model):
{
"input": "Text to convert to speech",
"voice": "alloy", // OpenAI voice name or custom voice library name
"response_format": "wav", // Ignored - always returns WAV
"speed": 1.0, // Ignored - use model's built-in parameters
"exaggeration": 0.7, // Optional - override default (0.25-2.0)
"cfg_weight": 0.4, // Optional - override default (0.0-1.0)
"temperature": 0.9 // Optional - override default (0.05-5.0)
}
Validation:
- `input`: Required, 1-3000 characters, automatically trimmed
- `exaggeration`: Optional, validated to the 0.25-2.0 range
- `cfg_weight`: Optional, validated to the 0.0-1.0 range
- `temperature`: Optional, validated to the 0.05-5.0 range
Response:
- Content-Type: `audio/wav`
- Binary audio data in WAV format, returned via StreamingResponse
Example:
curl -X POST http://localhost:4123/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"input": "Hello, this is a test of the text to speech system."}' \
--output speech.wav
With custom parameters:
curl -X POST http://localhost:4123/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"input": "Dramatic speech!", "exaggeration": 1.2, "cfg_weight": 0.3}' \
--output dramatic.wav
Using a voice from the voice library:
curl -X POST http://localhost:4123/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"input": "Hello with custom voice!", "voice": "my-uploaded-voice"}' \
--output custom_voice.wav
Note: See Voice Library Management Documentation for complete voice management API details.
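Because this endpoint mirrors OpenAI's /v1/audio/speech route, you can usually point the official openai Python client at it by overriding `base_url`. The snippet below is a sketch under that assumption; the `api_key` and `model` values are placeholders that the server is expected to ignore.

```python
# Sketch: calling the API through the official OpenAI Python SDK (pip install openai).
# api_key and model are placeholders; the server is assumed to ignore them.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4123/v1", api_key="not-needed")

speech = client.audio.speech.create(
    model="tts-1",  # placeholder model name
    voice="alloy",
    input="Hello from the OpenAI-compatible endpoint!",
)

with open("openai_client_output.wav", "wb") as f:
    f.write(speech.content)  # raw WAV bytes returned by the server
```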
2. Health Check
GET /health
Check if the API is running and the model is loaded.
Response (HealthResponse model):
{
"status": "healthy",
"model_loaded": true,
"device": "cuda",
"config": {
"max_chunk_length": 280,
"max_total_length": 3000,
"voice_sample_path": "./voice-sample.mp3",
"default_exaggeration": 0.5,
"default_cfg_weight": 0.5,
"default_temperature": 0.8
}
}
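A common pattern is to wait for `model_loaded` to become true before sending traffic, for example right after starting a container. A minimal polling sketch, with arbitrary timeout values:

```python
# Wait until the API reports a loaded model before sending requests.
# Field names match the HealthResponse example above; timings are arbitrary.
import time
import requests

def wait_until_ready(base_url: str = "http://localhost:4123", timeout_s: int = 120) -> bool:
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            health = requests.get(f"{base_url}/health", timeout=5).json()
            if health.get("status") == "healthy" and health.get("model_loaded"):
                return True
        except requests.RequestException:
            pass  # server may still be starting up
        time.sleep(2)
    return False

print("ready" if wait_until_ready() else "timed out waiting for model load")
```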
3. List Models
GET /v1/models
List available models (OpenAI API compatibility).
Response (ModelsResponse model):
{
"object": "list",
"data": [
{
"id": "chatterbox-tts-1",
"object": "model",
"created": 1677649963,
"owned_by": "resemble-ai"
}
]
}
4. Configuration Info
GET /config
Get current configuration (useful for debugging).
Response (ConfigResponse model):
{
"server": {
"host": "0.0.0.0",
"port": 4123
},
"model": {
"device": "cuda",
"voice_sample_path": "./voice-sample.mp3",
"model_cache_dir": "./models"
},
"defaults": {
"exaggeration": 0.5,
"cfg_weight": 0.5,
"temperature": 0.8,
"max_chunk_length": 280,
"max_total_length": 3000
}
}
5. API Documentation Endpoints
GET /docs
- Interactive Swagger UI documentation
GET /redoc
- Alternative ReDoc documentation
GET /openapi.json
- OpenAPI schema specification
Text Processing
Automatic Chunking
The API automatically handles long text inputs by:
- Character limit: Splits text longer than the configured chunk size (default: 280 characters)
- Sentence preservation: Attempts to split at sentence boundaries (`.`, `!`, `?`)
- Fallback splitting: If sentences are too long, splits at commas, semicolons, or other natural breaks
- Audio concatenation: Seamlessly combines audio from multiple chunks
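As a rough approximation of the splitting strategy described above (a simplified sketch, not the library's actual chunking code):

```python
# Simplified sketch of sentence-aware chunking (default soft limit: 280 characters).
# Approximates the behavior described above; the real implementation may differ.
import re

def chunk_text(text: str, max_chunk: int = 280) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Fall back to comma/semicolon splits for sentences longer than the limit.
        parts = re.split(r"(?<=[,;])\s+", sentence) if len(sentence) > max_chunk else [sentence]
        for part in parts:
            if current and len(current) + len(part) + 1 > max_chunk:
                chunks.append(current)
                current = part
            else:
                current = f"{current} {part}".strip()
    if current:
        chunks.append(current)
    return chunks

print(chunk_text("First sentence. Second sentence! A very, very long third sentence?"))
```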
Maximum Limits
- Soft limit: Configurable characters per chunk (default: 280)
- Hard limit: Configurable total characters (default: 3000)
- Automatic processing: No manual intervention required
Error Handling
FastAPI provides enhanced error handling with automatic validation:
- 422 Unprocessable Entity: Invalid input validation (Pydantic errors)
- 400 Bad Request: Business logic errors (text too long, etc.)
- 500 Internal Server Error: Model or processing errors
Error Response Format:
{
"error": {
"message": "Missing required field: 'input'",
"type": "invalid_request_error"
}
}
Validation Error Example:
{
"detail": [
{
"type": "greater_equal",
"loc": ["body", "exaggeration"],
"msg": "Input should be greater than or equal to 0.25",
"input": 0.1
}
]
}
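Client code should be prepared for both shapes: FastAPI's 422 `detail` list and the OpenAI-style `error` object shown above. A small sketch using requests:

```python
# Sketch: handling both error formats returned by the API.
# 422 -> FastAPI/Pydantic "detail" list; 400/500 -> OpenAI-style "error" object.
import requests

response = requests.post(
    "http://localhost:4123/v1/audio/speech",
    json={"input": "Test", "exaggeration": 0.1},  # 0.1 is below the allowed 0.25 minimum
)

if response.ok:
    with open("output.wav", "wb") as f:
        f.write(response.content)
elif response.status_code == 422:
    for issue in response.json().get("detail", []):
        print("Validation error:", issue.get("loc"), issue.get("msg"))
else:
    error = response.json().get("error", {})
    print(f"Request failed ({response.status_code}):", error.get("message"))
```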
Testing
Use the enhanced test script to verify the API functionality:
python tests/test_api.py
The test script will:
- Test health check endpoint
- Test models endpoint
- Test API documentation endpoints (new!)
- Generate speech for various text lengths
- Test custom parameter validation
- Test error handling with validation
- Save generated audio files as `test_output_*.wav`
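If you want a quick manual check without the bundled script, a minimal smoke test covering the same areas could look like the sketch below (this is not the repository's tests/test_api.py):

```python
# Minimal smoke test covering the same areas as the bundled test script.
# This is an illustrative stand-in, not the repository's tests/test_api.py.
import requests

BASE = "http://localhost:4123"

assert requests.get(f"{BASE}/health").json().get("model_loaded") is True
assert requests.get(f"{BASE}/v1/models").json().get("object") == "list"
assert requests.get(f"{BASE}/docs").status_code == 200

audio = requests.post(f"{BASE}/v1/audio/speech", json={"input": "Smoke test."})
assert audio.headers.get("content-type", "").startswith("audio/wav")

with open("test_output_smoke.wav", "wb") as f:
    f.write(audio.content)
print("All smoke checks passed")
```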
Configuration
You can configure the API through environment variables, or by copying `.env.example` to `.env` and editing it:
# Server Configuration
PORT=4123
HOST=0.0.0.0
# TTS Model Settings
EXAGGERATION=0.5 # Emotion intensity (0.25-2.0)
CFG_WEIGHT=0.5 # Pace control (0.0-1.0)
TEMPERATURE=0.8 # Sampling temperature (0.05-5.0)
# Text Processing
MAX_CHUNK_LENGTH=280 # Characters per chunk
MAX_TOTAL_LENGTH=3000 # Total character limit
# Voice and Model Settings
VOICE_SAMPLE_PATH=./voice-sample.mp3
VOICE_LIBRARY_DIR=./voices
DEVICE=auto # auto/cuda/mps/cpu
MODEL_CACHE_DIR=./models
Parameter Effects
Exaggeration (0.25-2.0):
- `0.3-0.4`: Very neutral, professional
- `0.5`: Neutral (default)
- `0.7-0.8`: More expressive
- `1.0+`: Very dramatic (may be unstable)
CFG Weight (0.0-1.0):
- `0.2-0.3`: Faster speech
- `0.5`: Balanced (default)
- `0.7-0.8`: Slower, more deliberate
Temperature (0.05-5.0):
- `0.4-0.6`: More consistent
- `0.8`: Balanced (default)
- `1.0+`: More creative/random
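To hear these effects side by side, you can render the same sentence with a few presets; the values below are illustrative picks from the documented ranges.

```python
# Render one sentence with several parameter presets to compare their effect.
# Preset values are illustrative examples drawn from the ranges documented above.
import requests

presets = {
    "neutral": {"exaggeration": 0.4, "cfg_weight": 0.5, "temperature": 0.6},
    "default": {"exaggeration": 0.5, "cfg_weight": 0.5, "temperature": 0.8},
    "dramatic": {"exaggeration": 1.0, "cfg_weight": 0.3, "temperature": 1.0},
}

for name, params in presets.items():
    response = requests.post(
        "http://localhost:4123/v1/audio/speech",
        json={"input": "The quick brown fox jumps over the lazy dog.", **params},
    )
    response.raise_for_status()
    with open(f"preset_{name}.wav", "wb") as f:
        f.write(response.content)
    print(f"wrote preset_{name}.wav")
```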
Docker Deployment
For Docker deployment, see DOCKER_README.md for complete instructions.
Quick start with Docker Compose:
cp .env.example .env # Customize as needed
docker compose up -d
Quick start with Docker:
docker build -t chatterbox-tts .
docker run -d -p 4123:4123 \
-v ./voice-sample.mp3:/app/voice-sample.mp3:ro \
-e EXAGGERATION=0.7 \
chatterbox-tts
Performance Notes
FastAPI Benefits:
- Async performance: Better handling of concurrent requests
- Faster JSON serialization: ~25% faster than Flask
- Type validation: Prevents invalid requests at the API level
- Auto documentation: No manual API doc maintenance
Hardware Recommendations:
- Model loading: The model is loaded once at startup (can take 30-60 seconds)
- First request: May be slower due to initial model warm-up
- Subsequent requests: Should be faster due to model caching
- Memory usage: Varies by device (GPU recommended for best performance)
- Concurrent requests: FastAPI async support allows better multi-request handling
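To exercise the concurrent request handling, you can fire a handful of requests in parallel from the client side; the sketch below uses a thread pool, and the concurrency level is an arbitrary choice.

```python
# Send several requests concurrently to exercise the async request handling.
# The concurrency level and sentences are arbitrary choices for illustration.
from concurrent.futures import ThreadPoolExecutor
import requests

sentences = [f"This is concurrent request number {i}." for i in range(4)]

def synthesize(idx_text):
    idx, text = idx_text
    response = requests.post(
        "http://localhost:4123/v1/audio/speech",
        json={"input": text},
        timeout=300,
    )
    response.raise_for_status()
    with open(f"concurrent_{idx}.wav", "wb") as f:
        f.write(response.content)
    return idx

with ThreadPoolExecutor(max_workers=4) as pool:
    for idx in pool.map(synthesize, enumerate(sentences)):
        print(f"finished request {idx}")
```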
Integration Examples
Python with requests
import requests
# Basic request
response = requests.post(
"http://localhost:4123/v1/audio/speech",
json={"input": "Hello world!"}
)
with open("output.wav", "wb") as f:
f.write(response.content)
# With custom parameters and validation
response = requests.post(
"http://localhost:4123/v1/audio/speech",
json={
"input": "Exciting news!",
"exaggeration": 0.8,
"cfg_weight": 0.4,
"temperature": 1.0
}
)
# Handle validation errors
if response.status_code == 422:
print("Validation error:", response.json())
JavaScript/Node.js
const response = await fetch('http://localhost:4123/v1/audio/speech', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
input: 'Hello world!',
exaggeration: 0.7,
}),
});
if (response.status === 422) {
const error = await response.json();
console.log('Validation error:', error);
} else {
const audioBuffer = await response.arrayBuffer();
// Save or play the audio buffer
}
cURL
# Basic usage
curl -X POST http://localhost:4123/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"input": "Your text here"}' \
--output output.wav
# With custom parameters
curl -X POST http://localhost:4123/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"input": "Dramatic text!", "exaggeration": 1.0, "cfg_weight": 0.3}' \
--output dramatic.wav
# Test the interactive documentation
curl http://localhost:4123/docs
Development Features
FastAPI Development Tools
- Auto-reload: Use the `--reload` flag for development
- Interactive testing: Use `/docs` for live API testing
- Type hints: Full IDE support with Pydantic models
- Validation: Automatic request/response validation
- OpenAPI: Machine-readable API specification
Development Mode
# Start with auto-reload
uvicorn app.main:app --host 0.0.0.0 --port 4123 --reload
# Or with verbose logging
uvicorn app.main:app --host 0.0.0.0 --port 4123 --log-level debug
Troubleshooting
Common Issues
- Model not loading: Ensure Chatterbox TTS is properly installed
- Voice sample missing: Verify `voice-sample.mp3` exists at the configured path
- CUDA out of memory: Try using the CPU device (`DEVICE=cpu`)
- Slow performance: GPU recommended; ensure CUDA/MPS is available
- Port conflicts: Change the `PORT` environment variable to an available port
- Uvicorn not found: Install with `pip install uvicorn[standard]`
FastAPI Specific Issues
Startup Issues:
# Check if uvicorn is installed
uvicorn --version
# Run with verbose logging
uvicorn app.main:app --host 0.0.0.0 --port 4123 --log-level debug
# Alternative startup method
python main.py
Validation Errors:
Visit `/docs` to see the interactive API documentation and test your requests.
Checking Configuration
# Check if API is running
curl http://localhost:4123/health
# View current configuration
curl http://localhost:4123/config
# Check API documentation
curl http://localhost:4123/openapi.json
# Test with simple text
curl -X POST http://localhost:4123/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"input": "Test"}' \
--output test.wav
Migration from Flask
If you're migrating from the previous Flask version:
- Dependencies: Update to `fastapi` and `uvicorn` instead of `flask`
- Startup: Use `uvicorn app.main:app` instead of `python api.py`
- Documentation: Visit `/docs` for interactive API testing
- Validation: Error responses now use HTTP 422 for validation errors
- Performance: Expect 25-40% better performance for JSON responses
All existing API endpoints and request/response formats remain compatible.