API Reference - travisvn/chatterbox-tts-api

Chatterbox TTS FastAPI

This API provides a FastAPI-based web service for the Chatterbox TTS text-to-speech system, designed to be compatible with OpenAI's TTS API format.

Features

  • OpenAI-compatible API: Uses an endpoint structure similar to OpenAI's text-to-speech API
  • FastAPI Performance: High-performance async API with automatic documentation
  • Type Safety: Full Pydantic validation for requests and responses
  • Interactive Documentation: Automatic Swagger UI and ReDoc generation
  • Automatic text chunking: Automatically breaks long text into manageable chunks to handle character limits
  • Voice cloning: Uses the pre-specified voice-sample.mp3 file for voice conditioning
  • Async Support: Non-blocking request handling with better concurrency
  • Error handling: Comprehensive error handling with appropriate HTTP status codes
  • Health monitoring: Health check endpoint for monitoring service status
  • Environment-based configuration: Fully configurable via environment variables
  • Docker support: Ready for containerized deployment

Setup

Prerequisites

  1. Ensure you have the Chatterbox TTS package installed:

    pip install chatterbox-tts
    
  2. Install FastAPI and other required dependencies:

    pip install fastapi "uvicorn[standard]" torchaudio requests python-dotenv
    
  3. Ensure you have a voice-sample.mp3 file in the project root directory for voice conditioning

Configuration

Copy the example environment file and customize it:

cp .env.example .env
nano .env  # Edit with your preferred settings

Key environment variables:

  • PORT=4123 - API server port
  • EXAGGERATION=0.5 - Default emotion intensity (0.25-2.0)
  • CFG_WEIGHT=0.5 - Default pace control (0.0-1.0)
  • TEMPERATURE=0.8 - Default sampling temperature (0.05-5.0)
  • VOICE_SAMPLE_PATH=./voice-sample.mp3 - Path to voice sample file
  • DEVICE=auto - Device selection (auto/cuda/mps/cpu)

See .env.example for all available options.

Running the API

Start the API server:

# Method 1: Direct uvicorn (recommended for development)
uvicorn app.main:app --host 0.0.0.0 --port 4123

# Method 2: Using the main script
python main.py

# Method 3: With auto-reload for development
uvicorn app.main:app --host 0.0.0.0 --port 4123 --reload

The server will:

  • Automatically detect the best available device (CUDA, MPS, or CPU)
  • Load the Chatterbox TTS model asynchronously
  • Start the FastAPI server on http://localhost:4123 (or your configured port)
  • Provide interactive documentation at /docs and /redoc

API Documentation

Once running, you can access:

  • Interactive API documentation (Swagger UI): http://localhost:4123/docs
  • Alternative documentation (ReDoc): http://localhost:4123/redoc
  • OpenAPI schema: http://localhost:4123/openapi.json

API Endpoints

1. Text-to-Speech Generation

POST /v1/audio/speech

Generate speech from text using the Chatterbox TTS model.

Request Body (Pydantic Model):

{
  "input": "Text to convert to speech",
  "voice": "alloy", // OpenAI voice name or custom voice library name
  "response_format": "wav", // Ignored - always returns WAV
  "speed": 1.0, // Ignored - use model's built-in parameters
  "exaggeration": 0.7, // Optional - override default (0.25-2.0)
  "cfg_weight": 0.4, // Optional - override default (0.0-1.0)
  "temperature": 0.9 // Optional - override default (0.05-5.0)
}

Validation:

  • input: Required, 1-3000 characters, automatically trimmed
  • exaggeration: Optional, 0.25-2.0 range validation
  • cfg_weight: Optional, 0.0-1.0 range validation
  • temperature: Optional, 0.05-5.0 range validation

Response:

  • Content-Type: audio/wav
  • Binary audio data in WAV format via StreamingResponse

Example:

curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello, this is a test of the text to speech system."}' \
  --output speech.wav

With custom parameters:

curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Dramatic speech!", "exaggeration": 1.2, "cfg_weight": 0.3}' \
  --output dramatic.wav

Using a voice from the voice library:

curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello with custom voice!", "voice": "my-uploaded-voice"}' \
  --output custom_voice.wav

Note: See Voice Library Management Documentation for complete voice management API details.
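
Because the endpoint mirrors OpenAI's /v1/audio/speech route, OpenAI-style clients can often be pointed at this server by overriding their base URL. The snippet below is a minimal sketch using the official openai Python package, assuming the server runs on localhost:4123 and does not validate API keys (the key value is a placeholder):

from openai import OpenAI

# Point an OpenAI client at the local Chatterbox TTS API
client = OpenAI(
    base_url="http://localhost:4123/v1",
    api_key="not-needed",  # placeholder; assumed to be ignored by this server
)

response = client.audio.speech.create(
    model="chatterbox-tts-1",  # model id reported by /v1/models
    voice="alloy",
    input="Hello from an OpenAI-compatible client!",
)

with open("speech.wav", "wb") as f:
    f.write(response.content)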

2. Health Check

GET /health

Check if the API is running and the model is loaded.

Response (HealthResponse model):

{
  "status": "healthy",
  "model_loaded": true,
  "device": "cuda",
  "config": {
    "max_chunk_length": 280,
    "max_total_length": 3000,
    "voice_sample_path": "./voice-sample.mp3",
    "default_exaggeration": 0.5,
    "default_cfg_weight": 0.5,
    "default_temperature": 0.8
  }
}
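
For deployment scripts or container health checks, polling /health until the model reports loaded avoids sending traffic during startup. A minimal sketch with the requests library (the retry count and delay are arbitrary choices):

import time
import requests

def wait_until_healthy(base_url="http://localhost:4123", retries=30, delay=2.0):
    """Poll /health until the model reports loaded, or give up."""
    for _ in range(retries):
        try:
            data = requests.get(f"{base_url}/health", timeout=5).json()
            if data.get("status") == "healthy" and data.get("model_loaded"):
                return data
        except requests.RequestException:
            pass  # server not up yet
        time.sleep(delay)
    raise RuntimeError("API did not become healthy in time")

print(wait_until_healthy())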

3. List Models

GET /v1/models

List available models (OpenAI API compatibility).

Response (ModelsResponse model):

{
  "object": "list",
  "data": [
    {
      "id": "chatterbox-tts-1",
      "object": "model",
      "created": 1677649963,
      "owned_by": "resemble-ai"
    }
  ]
}
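
If a client needs to discover the model id programmatically, the endpoint can be queried directly, for example with requests:

import requests

models = requests.get("http://localhost:4123/v1/models").json()
for model in models["data"]:
    print(model["id"], "owned by", model["owned_by"])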

4. Configuration Info

GET /config

Get current configuration (useful for debugging).

Response (ConfigResponse model):

{
  "server": {
    "host": "0.0.0.0",
    "port": 4123
  },
  "model": {
    "device": "cuda",
    "voice_sample_path": "./voice-sample.mp3",
    "model_cache_dir": "./models"
  },
  "defaults": {
    "exaggeration": 0.5,
    "cfg_weight": 0.5,
    "temperature": 0.8,
    "max_chunk_length": 280,
    "max_total_length": 3000
  }
}

5. API Documentation Endpoints

GET /docs - Interactive Swagger UI documentation
GET /redoc - Alternative ReDoc documentation
GET /openapi.json - OpenAPI schema specification

Text Processing

Automatic Chunking

The API automatically handles long text inputs as follows (an illustrative sketch of the splitting strategy appears after the list):

  1. Character limit: Splits text longer than the configured chunk size (default: 280 characters)
  2. Sentence preservation: Attempts to split at sentence boundaries (., !, ?)
  3. Fallback splitting: If sentences are too long, splits at commas, semicolons, or other natural breaks
  4. Audio concatenation: Seamlessly combines audio from multiple chunks
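
The helper below is an illustrative sketch of that splitting strategy, not the service's actual implementation; the function name and regular expressions are assumptions for the example:

import re

def chunk_text(text, max_len=280):
    """Split text at sentence boundaries, falling back to commas/semicolons."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # If a single sentence is too long, split it at softer breaks
        parts = [sentence] if len(sentence) <= max_len else re.split(r"(?<=[,;:])\s+", sentence)
        for part in parts:
            if len(current) + len(part) + 1 <= max_len:
                current = f"{current} {part}".strip()
            else:
                if current:
                    chunks.append(current)
                current = part
    if current:
        chunks.append(current)
    return chunks

print(chunk_text("First sentence. Second, slightly longer sentence! Third?", max_len=40))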

Maximum Limits

  • Soft limit: Configurable characters per chunk (default: 280)
  • Hard limit: Configurable total characters (default: 3000)
  • Automatic processing: No manual intervention required

Error Handling

FastAPI provides enhanced error handling with automatic validation:

  • 422 Unprocessable Entity: Invalid input validation (Pydantic errors)
  • 400 Bad Request: Business logic errors (text too long, etc.)
  • 500 Internal Server Error: Model or processing errors

Error Response Format:

{
  "error": {
    "message": "Missing required field: 'input'",
    "type": "invalid_request_error"
  }
}

Validation Error Example:

{
  "detail": [
    {
      "type": "greater_equal",
      "loc": ["body", "exaggeration"],
      "msg": "Input should be greater than or equal to 0.25",
      "input": 0.1
    }
  ]
}
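
Clients can turn that detail list into readable messages. A short example with requests, deliberately sending an out-of-range exaggeration value to trigger the 422 response:

import requests

response = requests.post(
    "http://localhost:4123/v1/audio/speech",
    json={"input": "Hello", "exaggeration": 0.1},  # below the 0.25 minimum
)

if response.status_code == 422:
    for err in response.json()["detail"]:
        field = ".".join(str(part) for part in err["loc"][1:])  # drop the leading "body"
        print(f"{field}: {err['msg']} (got {err['input']})")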

Testing

Use the enhanced test script to verify the API functionality:

python tests/test_api.py

The test script will:

  • Test health check endpoint
  • Test models endpoint
  • Test API documentation endpoints (new!)
  • Generate speech for various text lengths
  • Test custom parameter validation
  • Test error handling with validation
  • Save generated audio files as test_output_*.wav

Configuration

You can configure the API through environment variables or by editing your .env file (see .env.example for the full list of options):

# Server Configuration
PORT=4123
HOST=0.0.0.0

# TTS Model Settings
EXAGGERATION=0.5          # Emotion intensity (0.25-2.0)
CFG_WEIGHT=0.5            # Pace control (0.0-1.0)
TEMPERATURE=0.8           # Sampling temperature (0.05-5.0)

# Text Processing
MAX_CHUNK_LENGTH=280      # Characters per chunk
MAX_TOTAL_LENGTH=3000     # Total character limit

# Voice and Model Settings
VOICE_SAMPLE_PATH=./voice-sample.mp3
VOICE_LIBRARY_DIR=./voices
DEVICE=auto               # auto/cuda/mps/cpu
MODEL_CACHE_DIR=./models

Parameter Effects

Exaggeration (0.25-2.0):

  • 0.3-0.4: Very neutral, professional
  • 0.5: Neutral (default)
  • 0.7-0.8: More expressive
  • 1.0+: Very dramatic (may be unstable)

CFG Weight (0.0-1.0):

  • 0.2-0.3: Faster speech
  • 0.5: Balanced (default)
  • 0.7-0.8: Slower, more deliberate

Temperature (0.05-5.0):

  • 0.4-0.6: More consistent
  • 0.8: Balanced (default)
  • 1.0+: More creative/random
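
One practical way to tune these settings is to render the same sentence at a few presets and compare the output. A short sketch with requests (the preset values are only examples):

import requests

text = "The quick brown fox jumps over the lazy dog."
presets = {
    "neutral": {"exaggeration": 0.4, "cfg_weight": 0.5, "temperature": 0.6},
    "default": {"exaggeration": 0.5, "cfg_weight": 0.5, "temperature": 0.8},
    "dramatic": {"exaggeration": 1.0, "cfg_weight": 0.3, "temperature": 1.0},
}

for name, params in presets.items():
    response = requests.post(
        "http://localhost:4123/v1/audio/speech",
        json={"input": text, **params},
    )
    response.raise_for_status()
    with open(f"compare_{name}.wav", "wb") as f:
        f.write(response.content)
    print(f"wrote compare_{name}.wav with {params}")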

Docker Deployment

For Docker deployment, see DOCKER_README.md for complete instructions.

Quick start with Docker Compose:

cp .env.example .env  # Customize as needed
docker compose up -d

Quick start with Docker:

docker build -t chatterbox-tts .
docker run -d -p 4123:4123 \
  -v "$(pwd)/voice-sample.mp3:/app/voice-sample.mp3:ro" \
  -e EXAGGERATION=0.7 \
  chatterbox-tts

Performance Notes

FastAPI Benefits:

  • Async performance: Better handling of concurrent requests
  • Faster JSON serialization: ~25% faster than Flask
  • Type validation: Prevents invalid requests at the API level
  • Auto documentation: No manual API doc maintenance

Hardware Recommendations:

  • Model loading: The model is loaded once at startup (can take 30-60 seconds)
  • First request: May be slower due to initial model warm-up
  • Subsequent requests: Should be faster due to model caching
  • Memory usage: Varies by device (GPU recommended for best performance)
  • Concurrent requests: FastAPI async support allows better multi-request handling
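
Because the server handles requests asynchronously, a client can submit several generations in parallel. A minimal sketch using requests with a thread pool (the texts, worker count, and timeout are arbitrary):

import requests
from concurrent.futures import ThreadPoolExecutor

def synthesize(index, text):
    """Request one clip and save it to a numbered WAV file."""
    response = requests.post(
        "http://localhost:4123/v1/audio/speech",
        json={"input": text},
        timeout=300,
    )
    response.raise_for_status()
    path = f"clip_{index}.wav"
    with open(path, "wb") as f:
        f.write(response.content)
    return path

texts = ["First clip.", "Second clip.", "Third clip."]
with ThreadPoolExecutor(max_workers=3) as pool:
    for path in pool.map(synthesize, range(len(texts)), texts):
        print("saved", path)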

Integration Examples

Python with requests

import requests

# Basic request
response = requests.post(
    "http://localhost:4123/v1/audio/speech",
    json={"input": "Hello world!"}
)

with open("output.wav", "wb") as f:
    f.write(response.content)

# With custom parameters and validation
response = requests.post(
    "http://localhost:4123/v1/audio/speech",
    json={
        "input": "Exciting news!",
        "exaggeration": 0.8,
        "cfg_weight": 0.4,
        "temperature": 1.0
    }
)

# Handle validation errors
if response.status_code == 422:
    print("Validation error:", response.json())

JavaScript/Node.js

const response = await fetch('http://localhost:4123/v1/audio/speech', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    input: 'Hello world!',
    exaggeration: 0.7,
  }),
});

if (response.status === 422) {
  const error = await response.json();
  console.log('Validation error:', error);
} else {
  const audioBuffer = await response.arrayBuffer();
  // Save or play the audio buffer
}

cURL

# Basic usage
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Your text here"}' \
  --output output.wav

# With custom parameters
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Dramatic text!", "exaggeration": 1.0, "cfg_weight": 0.3}' \
  --output dramatic.wav

# Test the interactive documentation
curl http://localhost:4123/docs

Development Features

FastAPI Development Tools

  • Auto-reload: Use --reload flag for development
  • Interactive testing: Use /docs for live API testing
  • Type hints: Full IDE support with Pydantic models
  • Validation: Automatic request/response validation
  • OpenAPI: Machine-readable API specification

Development Mode

# Start with auto-reload
uvicorn app.main:app --host 0.0.0.0 --port 4123 --reload

# Or with verbose logging
uvicorn app.main:app --host 0.0.0.0 --port 4123 --log-level debug

Troubleshooting

Common Issues

  1. Model not loading: Ensure Chatterbox TTS is properly installed
  2. Voice sample missing: Verify voice-sample.mp3 exists at the configured path
  3. CUDA out of memory: Try using CPU device (DEVICE=cpu)
  4. Slow performance: GPU recommended; ensure CUDA/MPS is available
  5. Port conflicts: Change PORT environment variable to an available port
  6. Uvicorn not found: Install with pip install "uvicorn[standard]"

FastAPI Specific Issues

Startup Issues:

# Check if uvicorn is installed
uvicorn --version

# Run with verbose logging
uvicorn app.main:app --host 0.0.0.0 --port 4123 --log-level debug

# Alternative startup method
python main.py

Validation Errors:

Visit /docs to see the interactive API documentation and test your requests.

Checking Configuration

# Check if API is running
curl http://localhost:4123/health

# View current configuration
curl http://localhost:4123/config

# Check API documentation
curl http://localhost:4123/openapi.json

# Test with simple text
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Test"}' \
  --output test.wav

Migration from Flask

If you're migrating from the previous Flask version:

  1. Dependencies: Update to fastapi and uvicorn instead of flask
  2. Startup: Use uvicorn app.main:app instead of python api.py
  3. Documentation: Visit /docs for interactive API testing
  4. Validation: Error responses now use HTTP 422 for validation errors
  5. Performance: Expect 25-40% better performance for JSON responses

All existing API endpoints and request/response formats remain compatible.