Kokoro TTS API Integration - bigsk1/voice-chat-ai GitHub Wiki
What is Kokoro TTS?
Kokoro TTS is a local Text-to-Speech (TTS) service that allows you to run high-quality voice generation on your own hardware. Unlike cloud-based services like ElevenLabs or OpenAI TTS, Kokoro runs entirely on your local machine, providing:
- Privacy: All text and generated audio stay on your device
- Cost efficiency: No usage-based billing or API key requirements
- Lower latency: No network round trip; generation time depends only on your hardware
- Offline capability: Works without an internet connection
Kokoro TTS supports multiple languages and offers a variety of voices with different accents and characteristics.
Setup and Installation
Kokoro TTS is a separate project from voice-chat-ai. To use Kokoro with voice-chat-ai, you need to:
- Install and run Kokoro TTS on your local machine first
- Configure voice-chat-ai to use your Kokoro instance
Installing Kokoro TTS
The Kokoro TTS server can be installed from its GitHub repository. Please refer to the official Kokoro TTS documentation for detailed installation instructions.
Note: For any issues related to Kokoro TTS installation or functionality, please refer to the official Kokoro GitHub repository or support channels.
Configuring voice-chat-ai to use Kokoro
- Make sure Kokoro TTS is running on your machine
- In your voice-chat-ai .env file, set the following variables:

    # Set TTS provider to kokoro
    TTS_PROVIDER=kokoro
    # Kokoro API base URL - default is localhost, change if running on another machine
    KOKORO_BASE_URL=http://localhost:8880/v1
    # Select your preferred Kokoro voice
    KOKORO_TTS_VOICE=af_bella
    # Speed setting (0.7 to 1.2 range)
    VOICE_SPEED=1.0

- Restart voice-chat-ai if it's already running
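To make the role of these variables concrete, here is a minimal Python sketch of how an application could read them. It is not taken from the voice-chat-ai source: `KokoroSettings` and `load_kokoro_settings` are hypothetical names, the fallback values simply mirror the example above, and clamping the speed to the documented range is an assumption about sensible behavior rather than something voice-chat-ai is known to do.

```python
import os
from dataclasses import dataclass

# Speed range quoted on this page for VOICE_SPEED.
MIN_SPEED, MAX_SPEED = 0.7, 1.2


@dataclass(frozen=True)
class KokoroSettings:  # hypothetical helper, not part of voice-chat-ai
    base_url: str
    voice: str
    speed: float


def load_kokoro_settings(env=None):
    """Read the Kokoro variables from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    speed = float(env.get("VOICE_SPEED", "1.0"))
    # Assumption: clamp out-of-range values instead of failing.
    speed = max(MIN_SPEED, min(MAX_SPEED, speed))
    return KokoroSettings(
        # Drop a trailing slash so endpoint paths can be appended safely.
        base_url=env.get("KOKORO_BASE_URL", "http://localhost:8880/v1").rstrip("/"),
        voice=env.get("KOKORO_TTS_VOICE", "af_bella"),
        speed=speed,
    )
```

Passing a plain dictionary instead of relying on `os.environ` makes it easy to try different values without editing the .env file.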
Kokoro API Endpoints
Kokoro TTS exposes several API endpoints that voice-chat-ai uses:
Speech Generation
POST /v1/audio/speech
Parameters:
- model: The TTS model to use (usually "kokoro")
- voice: The voice ID to use (e.g., "af_bella")
- input: The text to convert to speech
- response_format: Audio format (usually "wav")
- speed: Speech speed (0.7-1.2 range)
Example request:
{
  "model": "kokoro",
  "voice": "af_bella",
  "input": "Hello, this is a test of Kokoro Text to Speech.",
  "response_format": "wav",
  "speed": 1.0
}
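The request above can be sent from Python using only the standard library. The following is a sketch, not the client code voice-chat-ai actually uses: the function names are made up for illustration, the base URL is the default from the configuration section, and it assumes the server returns the audio bytes directly in the response body.

```python
import json
import urllib.request


def build_speech_request(text, voice="af_bella", speed=1.0,
                         base_url="http://localhost:8880/v1"):
    """Build (but do not send) a POST request for the speech endpoint."""
    payload = {
        "model": "kokoro",
        "voice": voice,
        "input": text,
        "response_format": "wav",
        "speed": speed,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, out_path="kokoro_test.wav", **kwargs):
    """Send the request and write the returned bytes to out_path.

    Assumption: the response body is the raw audio in the requested format.
    """
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return out_path
```

With Kokoro running, `synthesize_to_file("Hello, this is a test of Kokoro Text to Speech.")` should leave a playable file in the current directory. Keeping request construction separate from sending makes the payload easy to inspect when debugging.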
Available Voices
GET /v1/audio/voices
Returns a list of available voices:
{
  "voices": [
    "af_alloy",
    "af_bella",
    "af_nova",
    "am_adam",
    "am_echo",
    "bf_emma",
    "bm_lewis",
    ...
  ]
}
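A client can turn this response into a plain list of voice IDs, for example to populate a dropdown. The sketch below assumes the response shape shown above (a JSON object with a "voices" array of strings); `parse_voices` and `fetch_voices` are illustrative names, not functions from voice-chat-ai or Kokoro.

```python
import json
import urllib.request


def parse_voices(body):
    """Extract a sorted list of voice IDs from a voices response body."""
    data = json.loads(body)
    voices = data.get("voices", [])
    # Keep only strings so an unexpected entry cannot break a voice menu.
    return sorted(v for v in voices if isinstance(v, str))


def fetch_voices(base_url="http://localhost:8880/v1", timeout=5):
    """Query a running Kokoro server for its voice list."""
    url = f"{base_url.rstrip('/')}/audio/voices"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_voices(response.read())
```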
Voice-Chat-AI Integration
The voice-chat-ai application integrates with Kokoro TTS in several ways:
- Voice Selection: The UI provides a dropdown to select from available Kokoro voices
- Speed Control: You can adjust the speech speed using the global speed slider
- TTS Provider Selection: Kokoro can be selected as the TTS provider alongside other options
Voice Naming Convention
Kokoro voices follow a naming pattern that helps identify their characteristics:
- First letter: Language/accent
  - a: American English
  - b: British English
  - e: European Spanish
  - j: Japanese
  - z: Chinese
  - And others...
- Second letter: Gender
  - f: Female
  - m: Male
- Remaining part: Voice name
  - Example: af_bella = American Female "Bella"
The voice-chat-ai application organizes these voices by language group in the dropdown menu for easier selection.
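The naming pattern is simple enough to decode in a few lines. This sketch covers only the prefixes listed above and labels anything else "Other"; it illustrates the convention and is not the grouping code voice-chat-ai uses. The function names are hypothetical.

```python
# Prefix tables limited to the letters documented on this page.
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "e": "European Spanish",
    "j": "Japanese",
    "z": "Chinese",
}
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(voice_id):
    """Split an ID such as 'af_bella' into (language, gender, name)."""
    prefix, _, name = voice_id.partition("_")
    if len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice ID format: {voice_id!r}")
    language = LANGUAGES.get(prefix[0], "Other")
    gender = GENDERS.get(prefix[1], "Unknown")
    return language, gender, name.capitalize()


def group_by_language(voice_ids):
    """Group voice IDs by language label, preserving input order."""
    groups = {}
    for voice_id in voice_ids:
        language, _, _ = describe_voice(voice_id)
        groups.setdefault(language, []).append(voice_id)
    return groups
```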
Testing and Web UI
Kokoro comes with its own web interface for testing and exploring voices:
http://localhost:8880/web/
Note: Some users have reported browser compatibility issues with the Kokoro web UI. If you encounter problems, try the Brave browser, which has shown better compatibility.
Best Practices
- Hardware Requirements: Kokoro TTS works best on systems with a dedicated GPU. CPU-only operation is possible but will be slower.
- Voice Selection: Different voices may have varying quality levels. If one voice doesn't sound good, try others.
- Speed Settings: voice-chat-ai allows speed adjustments between 0.7 and 1.2, but some voices may sound better at certain speeds.
- Text Length: For best results, keep text segments under 200-300 characters. Very long passages may affect quality.
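One way to follow the text-length advice is to split long passages at sentence boundaries before sending them. The helper below is a sketch under that assumption: `split_for_tts` is a hypothetical name, the 300-character default is the upper end of the range above, and whether voice-chat-ai already segments text internally is not covered by this page.

```python
import re


def split_for_tts(text, max_chars=300):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence over the limit is wrapped on word boundaries,
        # or cut hard if it contains no space within the limit.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            cut = cut if cut > 0 else max_chars
            chunks.append(sentence[:cut].strip())
            sentence = sentence[cut:].strip()
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own speech request and the resulting audio played back in order.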
Troubleshooting
If you encounter issues with Kokoro TTS in voice-chat-ai:
- Verify Kokoro is Running: Make sure the Kokoro service is running locally:
    curl http://localhost:8880/v1/audio/voices
- Check Logs: Examine both voice-chat-ai and Kokoro logs for errors
- Restart Services: Sometimes restarting both services can resolve connection issues
- Port Conflicts: Ensure nothing else is using port 8880
- Docker Users: If running in Docker, update your URLs to http://host.docker.internal:8880/v1
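The first and last checks can also be scripted. The sketch below mirrors the curl command with Python's standard library and adds a small helper for the Docker URL swap; both function names are hypothetical, and treating any non-200 answer as "not reachable" is a simplifying assumption.

```python
import urllib.error
import urllib.request


def kokoro_is_reachable(base_url="http://localhost:8880/v1", timeout=3):
    """Return True if the voices endpoint under base_url answers HTTP 200."""
    url = f"{base_url.rstrip('/')}/audio/voices"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, DNS failures, timeouts and HTTP errors.
        return False


def docker_base_url(base_url):
    """Swap localhost for host.docker.internal, as suggested for Docker users."""
    return base_url.replace("://localhost", "://host.docker.internal")
```

A result of `False` only says the endpoint did not answer as expected; the logs are still the place to find out why.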
Limitations
- Kokoro TTS is a third-party service and not developed or maintained by the voice-chat-ai team
- Audio quality may vary compared to cloud services like ElevenLabs
- Resource usage can be high on less powerful systems
- Limited multilingual support compared to some commercial services
Support
For issues with the Kokoro TTS service itself, please refer to:
- The official Kokoro GitHub repository
- Kokoro documentation and community forums