Kokoro TTS API Integration - bigsk1/voice-chat-ai GitHub Wiki

What is Kokoro TTS?

Kokoro TTS is a local Text-to-Speech (TTS) service that allows you to run high-quality voice generation on your own hardware. Unlike cloud-based services like ElevenLabs or OpenAI TTS, Kokoro runs entirely on your local machine, providing:

  • Privacy: All text and generated audio stay on your device
  • Cost efficiency: No usage-based billing or API key requirements
  • Lower latency: No network round trip to a remote service, though generation time depends on your hardware
  • Offline capability: Works without an internet connection

Kokoro TTS supports multiple languages and offers a variety of voices with different accents and characteristics.

Setup and Installation

Kokoro TTS is a separate project from voice-chat-ai. To use Kokoro with voice-chat-ai, you need to:

  1. Install and run Kokoro TTS on your local machine first
  2. Configure voice-chat-ai to use your Kokoro instance

Installing Kokoro TTS

The Kokoro TTS server can be installed from its GitHub repository. Please refer to the official Kokoro TTS documentation for detailed installation instructions:

Kokoro TTS GitHub Repository

Note: For any issues related to Kokoro TTS installation or functionality, please refer to the official Kokoro GitHub repository or support channels.

Configuring voice-chat-ai to use Kokoro

  1. Make sure Kokoro TTS is running on your machine

  2. In your voice-chat-ai .env file, set the following variables:

    # Set TTS provider to kokoro
    TTS_PROVIDER=kokoro
    
    # Kokoro API base URL - default is localhost, change if running on another machine
    KOKORO_BASE_URL=http://localhost:8880/v1
    
    # Select your preferred Kokoro voice
    KOKORO_TTS_VOICE=af_bella
    
    # Speed setting (0.7 to 1.2 range)
    VOICE_SPEED=1.0
    
  3. Restart voice-chat-ai if it's already running
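As a rough illustration of how these settings fit together, the sketch below reads the four variables from a mapping (the process environment by default) and clamps the speed to the documented 0.7 to 1.2 range. `KokoroSettings` and `load_kokoro_settings` are hypothetical names for this example, not part of voice-chat-ai. The fallback values simply mirror the example above, and clamping is an assumption; the application may treat out-of-range values differently.

```python
import os
from dataclasses import dataclass

# Speed range documented for voice-chat-ai.
MIN_SPEED, MAX_SPEED = 0.7, 1.2


@dataclass(frozen=True)
class KokoroSettings:
    """Hypothetical container for the Kokoro-related .env values."""
    provider: str
    base_url: str
    voice: str
    speed: float


def load_kokoro_settings(env=None) -> KokoroSettings:
    """Read the Kokoro variables from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    speed = float(env.get("VOICE_SPEED", "1.0"))
    # Assumption: out-of-range speeds are clamped rather than rejected.
    speed = max(MIN_SPEED, min(MAX_SPEED, speed))
    return KokoroSettings(
        provider=env.get("TTS_PROVIDER", "kokoro"),
        # Strip a trailing slash so endpoint paths can be appended safely.
        base_url=env.get("KOKORO_BASE_URL", "http://localhost:8880/v1").rstrip("/"),
        voice=env.get("KOKORO_TTS_VOICE", "af_bella"),
        speed=speed,
    )
```

Calling `load_kokoro_settings()` with no argument reads the real environment; passing a plain dict makes the function easy to exercise without touching your .env file.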

Kokoro API Endpoints

Kokoro TTS exposes several API endpoints that voice-chat-ai uses:

Speech Generation

POST /v1/audio/speech

Parameters:

  • model: The TTS model to use (usually "kokoro")
  • voice: The voice ID to use (e.g., "af_bella")
  • input: The text to convert to speech
  • response_format: Audio format (usually "wav")
  • speed: Speech speed (voice-chat-ai uses the 0.7-1.2 range)

Example request:

{
  "model": "kokoro",
  "voice": "af_bella",
  "input": "Hello, this is a test of Kokoro Text to Speech.",
  "response_format": "wav",
  "speed": 1.0
}
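The request above could be sent from Python along the following lines. This is a minimal sketch using only the standard library. `build_speech_request` and `synthesize_to_file` are hypothetical helper names, and the code assumes the response body is the raw audio in the requested format.

```python
import json
import urllib.request


def build_speech_request(base_url, text, voice="af_bella", speed=1.0):
    """Build (but do not send) a POST request for the speech endpoint."""
    payload = {
        "model": "kokoro",
        "voice": voice,
        "input": text,
        "response_format": "wav",
        "speed": speed,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(base_url, text, out_path, **kwargs):
    """Send the request and write the returned bytes to out_path."""
    request = build_speech_request(base_url, text, **kwargs)
    # Assumption: the body of a successful response is the audio itself.
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

With a Kokoro server running, `synthesize_to_file("http://localhost:8880/v1", "Hello from Kokoro.", "hello.wav")` would write the generated audio to `hello.wav` and return the number of bytes written.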

Available Voices

GET /v1/audio/voices

Returns a list of available voices:

{
  "voices": [
    "af_alloy",
    "af_bella",
    "af_nova",
    "am_adam",
    "am_echo",
    "bf_emma",
    "bm_lewis",
    ...
  ]
}
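To read this list programmatically, something like the sketch below may be enough. It assumes the response has exactly the shape shown above (a `voices` key holding a list of strings); other server versions may differ. `parse_voices` and `fetch_voices` are hypothetical names for this example.

```python
import json
import urllib.request


def parse_voices(body: bytes) -> list:
    """Extract the voice IDs from a JSON response shaped like the example."""
    data = json.loads(body.decode("utf-8"))
    voices = data.get("voices", [])
    # Keep only string entries, sorted for stable display.
    return sorted(v for v in voices if isinstance(v, str))


def fetch_voices(base_url: str = "http://localhost:8880/v1") -> list:
    """Call GET /audio/voices on a running Kokoro server."""
    url = base_url.rstrip("/") + "/audio/voices"
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_voices(response.read())
```

Keeping the parsing separate from the network call means the parsing step can be checked against a saved response without a running server.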

Voice-Chat-AI Integration

The voice-chat-ai application integrates with Kokoro TTS in several ways:

  1. Voice Selection: The UI provides a dropdown to select from available Kokoro voices
  2. Speed Control: You can adjust the speech speed using the global speed slider
  3. TTS Provider Selection: Kokoro can be selected as the TTS provider alongside other options

Voice Naming Convention

Kokoro voices follow a naming pattern that helps identify their characteristics:

  • First letter: Language/accent

    • a: American English
    • b: British English
    • e: European Spanish
    • j: Japanese
    • z: Chinese
    • And others...
  • Second letter: Gender

    • f: Female
    • m: Male
  • Remaining part: Voice name

    • Example: af_bella = American Female "Bella"

The voice-chat-ai application organizes these voices by language group in the dropdown menu for easier selection.
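The naming pattern and the grouping by language can be sketched as follows. This is an illustration only, not the code voice-chat-ai uses. `describe_voice` and `group_by_language` are hypothetical names, and the lookup table covers just the prefixes listed above, so other languages fall back to "Unknown".

```python
from collections import defaultdict

# Prefix letters listed above; Kokoro defines others that are not shown here.
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "e": "European Spanish",
    "j": "Japanese",
    "z": "Chinese",
}
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(voice_id: str) -> dict:
    """Split an ID such as 'af_bella' into language, gender, and name."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice ID format: {voice_id!r}")
    return {
        "language": LANGUAGES.get(prefix[0], "Unknown"),
        "gender": GENDERS.get(prefix[1], "Unknown"),
        "name": name.capitalize(),
    }


def group_by_language(voice_ids) -> dict:
    """Group voice IDs under their language label, skipping malformed IDs."""
    groups = defaultdict(list)
    for voice_id in voice_ids:
        try:
            info = describe_voice(voice_id)
        except ValueError:
            continue
        groups[info["language"]].append(voice_id)
    return dict(groups)
```

For example, `describe_voice("af_bella")` yields American English, Female, "Bella", matching the example in the list above.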

Testing and Web UI

Kokoro comes with its own web interface for testing and exploring voices:

http://localhost:8880/web/

Note: Some users have reported browser compatibility issues with the Kokoro web UI. If you encounter problems, try a different browser; Brave has reportedly worked better.

Best Practices

  1. Hardware Requirements: Kokoro TTS works best on systems with a dedicated GPU. CPU-only operation is possible but will be slower.

  2. Voice Selection: Different voices may have varying quality levels. If one voice doesn't sound good, try others.

  3. Speed Settings: voice-chat-ai allows speeds from 0.7 to 1.2, but some voices sound better at particular speeds within that range.

  4. Text Length: For best results, keep text segments under 200-300 characters. Very long passages may reduce quality.
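One way to follow the text-length advice is to split long input at sentence boundaries before sending it to the speech endpoint. The sketch below is one possible approach, not something voice-chat-ai or Kokoro is known to do. `split_text` is a hypothetical helper, and the 300-character default is taken from the upper end of the range above.

```python
import re


def split_text(text: str, max_chars: int = 300) -> list:
    """Split text into segments of at most max_chars, preferring sentence ends."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence longer than the limit at a word boundary.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars  # no space available: cut mid-word
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:cut].strip())
            sentence = sentence[cut:].strip()
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the current segment
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments
```

Each returned segment can then be sent as a separate `input` value, with the resulting audio played back in order.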

Troubleshooting

If you encounter issues with Kokoro TTS in voice-chat-ai:

  1. Verify Kokoro is Running: Make sure the Kokoro service is running locally. The following request should return the list of voices:

    curl http://localhost:8880/v1/audio/voices
    
  2. Check Logs: Examine both voice-chat-ai and Kokoro logs for errors

  3. Restart Services: Sometimes restarting both services can resolve connection issues

  4. Port Conflicts: Ensure nothing else is using port 8880

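  5. Docker Users: If voice-chat-ai runs in a Docker container and Kokoro runs on the host, set KOKORO_BASE_URL to http://host.docker.internal:8880/v1, because localhost inside a container refers to the container itself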
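For the connectivity and port checks above, a quick script can tell you whether anything is accepting connections on the Kokoro port. This is a minimal sketch; `is_port_open` is a hypothetical helper, and the default host and port simply mirror the values used in this guide.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8880, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

An open port only shows that some process is listening there. To confirm that the process is Kokoro, request the voices endpoint as in step 1.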

Limitations

  1. Kokoro TTS is a third-party service and not developed or maintained by the voice-chat-ai team

  2. Audio quality may differ from that of cloud services like ElevenLabs

  3. Resource usage can be high on less powerful systems

  4. Limited multilingual support compared to some commercial services

Support

For issues with the Kokoro TTS service itself, please refer to:

  • The official Kokoro GitHub repository
  • Kokoro documentation and community forums