Audio Formats Reference - jxoesneon/gemini-audio-mcp GitHub Wiki
🎛️ Audio Formats Reference
Gemini Audio MCP supports a wide variety of formats by leveraging FFmpeg's codec library. Below is a guide on when to use each.
📊 Comparison Table
| Format | Extension | Recommended Use Case | Quality |
|---|---|---|---|
| MP3 | .mp3 | General use, web compatibility | High (Compressed) |
| OGG | .ogg | Game development (Godot/UE5) | Excellent |
| FLAC | .flac | Archiving, high-fidelity production | Lossless |
| OPUS | .opus | Low-latency streaming, VoIP | Best Compression |
| WAV | .wav | Raw editing, zero compression | Uncompressed |
| AAC | .m4a | Apple ecosystem, mobile apps | High |
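To make the table concrete, here is a minimal Python sketch that maps each format name to its extension and to an FFmpeg encoder commonly used for it. The encoder names are real FFmpeg encoders, but the `FORMATS` table and the `output_args` helper are hypothetical; this page does not document which encoders the server actually selects.

```python
# Hypothetical lookup table: the server's real codec choices are not
# documented on this page. Encoder names are standard FFmpeg encoders.
FORMATS = {
    "mp3":  {"ext": ".mp3",  "encoder": "libmp3lame"},
    "ogg":  {"ext": ".ogg",  "encoder": "libvorbis"},
    "flac": {"ext": ".flac", "encoder": "flac"},
    "opus": {"ext": ".opus", "encoder": "libopus"},
    "wav":  {"ext": ".wav",  "encoder": "pcm_s16le"},
    "aac":  {"ext": ".m4a",  "encoder": "aac"},
}


def output_args(fmt: str, basename: str) -> list[str]:
    """Return FFmpeg output arguments for a format name from the table."""
    try:
        entry = FORMATS[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    return ["-c:a", entry["encoder"], basename + entry["ext"]]


print(output_args("opus", "rain"))  # ['-c:a', 'libopus', 'rain.opus']
```

Keeping the extension and encoder in one table means an unsupported format fails early with a clear error instead of producing a mismatched file.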
⚙️ Quality Settings
You can override the defaults by passing `audio_options` in a tool call:
1. Bitrate
Higher bitrates result in better quality but larger files.
- 128k: Standard quality (good for voice).
- 192k: High quality (recommended for soundscapes).
- 320k: Premium quality (recommended for music).
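The size cost of each bitrate can be estimated with simple arithmetic: bytes ≈ bitrate (bits per second) × duration ÷ 8. The sketch below assumes a constant bitrate and ignores container overhead; `estimated_size_bytes` is a hypothetical helper, not part of the server.

```python
def estimated_size_bytes(bitrate: str, duration_s: float) -> int:
    """Approximate audio payload size for a constant bitrate such as '192k'.

    FFmpeg treats the 'k' suffix as 1000, so '128k' means 128,000 bits/s.
    """
    if not bitrate.endswith("k"):
        raise ValueError("expected a bitrate like '128k'")
    bits_per_second = int(bitrate[:-1]) * 1000
    return int(bits_per_second * duration_s / 8)


# One minute of audio at each documented bitrate.
for rate in ("128k", "192k", "320k"):
    print(rate, estimated_size_bytes(rate, 60))
# 128k 960000
# 192k 1440000
# 320k 2400000
```

So a one-minute clip grows from roughly 0.96 MB at 128k to 2.4 MB at 320k.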
2. Sample Rate
The default is 24,000 Hz (native to Gemini 2.0).
- If you request `44100` or `48000`, the server uses FFmpeg's high-quality resampling filters to upsample the audio.
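As a rough illustration of what upsampling from the 24,000 Hz default involves, the sketch below computes the resampling ratio and the resulting sample counts, and shows the standard FFmpeg `-ar` option that sets the output sample rate. The `resample_plan` helper is hypothetical, and the exact filter settings the server passes to FFmpeg are not documented here.

```python
from fractions import Fraction

NATIVE_RATE = 24_000  # default sample rate stated above


def resample_plan(target_rate: int, duration_s: float) -> dict:
    """Describe a resample from the native rate to target_rate."""
    return {
        "ratio": Fraction(target_rate, NATIVE_RATE),
        "input_samples": int(NATIVE_RATE * duration_s),
        "output_samples": int(target_rate * duration_s),
        # -ar is FFmpeg's option for the audio sample rate.
        "ffmpeg_args": ["-ar", str(target_rate)],
    }


plan = resample_plan(44_100, 2)
print(plan["ratio"])           # 147/80
print(plan["output_samples"])  # 88200
```

Note that upsampling changes the sample count, not the information content: a 24,000 Hz source carries no frequencies above 12,000 Hz, so the higher rates mainly help compatibility with tools that expect 44.1 or 48 kHz input.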
3. Channels
- Mono (1): Native Gemini output.
- Stereo (2): The server will duplicate the signal or apply a slight "pseudo-stereo" width if supported by the chosen codec.
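The "duplicate the signal" case can be sketched in plain Python: each mono sample is written twice, once for the left channel and once for the right, producing interleaved stereo PCM. This is only an illustration of the idea, assuming raw 16-bit PCM input; `mono_to_stereo` is a hypothetical helper, and the server itself relies on FFmpeg rather than code like this. The pseudo-stereo widening is not shown.

```python
def mono_to_stereo(mono_pcm: bytes, sample_width: int = 2) -> bytes:
    """Duplicate each mono sample into left and right channels.

    mono_pcm is raw PCM data; sample_width is bytes per sample
    (2 for 16-bit audio). The result is interleaved L/R frames.
    """
    if len(mono_pcm) % sample_width:
        raise ValueError("PCM data is not a whole number of samples")
    stereo = bytearray()
    for i in range(0, len(mono_pcm), sample_width):
        sample = mono_pcm[i:i + sample_width]
        stereo += sample + sample  # left, then right
    return bytes(stereo)


two_samples = b"\x01\x02\x03\x04"
print(mono_to_stereo(two_samples))
# b'\x01\x02\x01\x02\x03\x04\x03\x04'
```

The output is exactly twice the size of the input, which is worth keeping in mind when choosing stereo for uncompressed formats such as WAV.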