Prompt Engineering Guide - jxoesneon/gemini-audio-mcp GitHub Wiki
🎙️ Prompt Engineering for Audio
Getting high-quality results from Gemini and Lyria requires a slightly different approach than text generation. Audio models "listen" for descriptive cues about texture, space, and emotion.
🎲 Soundscape Generation (generate_soundscape)
Best for: Environmental ambience, background textures.
The Gemini 2.0 Live API excels at complex, layered sounds. Use "sensory" adjectives that describe texture, distance, and the acoustic space, not just the sound source.
- ❌ Bad: "rain in a city"
- ✅ Good: "Heavy rhythmic rain hitting a glass window in a futuristic cyberpunk city, distant low-frequency hovercar hums, neon signs buzzing faintly, binaural spatial audio."
Tip: The "Seamless Loop" trick. When generating loops, explicitly tell the model: "Constant background texture with no sudden peaks, suitable for a seamless loop."
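The good prompt above has a repeatable structure: a concrete source, a few sensory layers, a spatial cue, and optionally the loop instruction. The minimal Python sketch below assembles a prompt from those parts. `build_soundscape_prompt` and its parameters are hypothetical and are not part of gemini-audio-mcp. The function only builds the text you would pass to generate_soundscape and makes no assumptions about that tool's actual arguments.

```python
from typing import Sequence

# The "Seamless Loop" instruction from the tip above.
LOOP_SUFFIX = (
    "Constant background texture with no sudden peaks, "
    "suitable for a seamless loop."
)


def build_soundscape_prompt(subject: str, layers: Sequence[str],
                            spatial: str = "", loop: bool = False) -> str:
    """Hypothetical helper: join a subject, sensory layers, and a spatial cue."""
    parts = [subject.strip()]
    # Each layer is one sensory detail (texture, distance, rhythm); blanks are skipped.
    parts.extend(layer.strip() for layer in layers if layer.strip())
    if spatial.strip():
        parts.append(spatial.strip())
    prompt = ", ".join(parts) + "."
    if loop:
        # Append the loop instruction as its own sentence.
        prompt += " " + LOOP_SUFFIX
    return prompt


if __name__ == "__main__":
    print(build_soundscape_prompt(
        "Heavy rhythmic rain hitting a glass window in a futuristic cyberpunk city",
        ["distant low-frequency hovercar hums", "neon signs buzzing faintly"],
        spatial="binaural spatial audio",
        loop=True,
    ))
```

If you call it with the pieces of the ✅ example and `loop=True`, it returns that prompt followed by the seamless-loop sentence. Because each layer is a separate list item, you can change one detail at a time and compare the results.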
🗣️ Voice Generation (generate_voice)
Best for: Narration, character dialogue, expressive reading.
The Native Audio model can follow "voice direction" within the prompt.
- Direction: "Read the following in a raspy, old wizard's voice, pausing for dramatic effect after every sentence."
- Emotion: "A nervous, high-pitched voice that cracks slightly when speaking about the monster."
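One way to separate the voice direction from the words to be spoken is to state the direction first and then quote the script. The helper below is a hypothetical Python sketch and is not part of gemini-audio-mcp. The quoting convention is an assumption, not a documented requirement of the Native Audio model, and the sample script line is invented for illustration.

```python
def build_voice_prompt(direction: str, script: str) -> str:
    """Hypothetical helper: voice direction first, then the quoted text to read."""
    # Drop a trailing period so the direction can end in a colon.
    direction = direction.strip().rstrip(".")
    # Quoting the script (an assumed convention) marks which words are to be spoken.
    return f'{direction}: "{script.strip()}"'


if __name__ == "__main__":
    print(build_voice_prompt(
        "Read the following in a raspy, old wizard's voice, "
        "pausing for dramatic effect after every sentence.",
        "The gate is sealed. None shall pass.",  # hypothetical sample script
    ))
```

The same helper works for emotion cues. Pass a description such as the nervous, high-pitched voice above as `direction`.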
🎵 Music & SFX (generate_music / generate_sfx)
Powered by Lyria 3 (Pro/Clip).
Lyria responds well to genre, instrumentation, and BPM.
- Genre-based: "Lo-fi hip hop with a dusty vinyl crackle, mellow electric piano, 85 BPM, relaxing study vibes."
- Instrumentation: "Solo cello playing a melancholic minor-key melody in a large cathedral with long reverb."
- SFX: "A crisp, metallic 'ching' sound effect for a level-up notification, high-pitched and rewarding."
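The examples above follow a loose order: style or genre, then instrumentation, then tempo, then mood. The hypothetical Python helper below (not part of gemini-audio-mcp) builds prompts in that order and rejects non-positive BPM values. The parameter names and the ordering are taken from the examples above and are not a documented Lyria prompt format.

```python
from typing import Optional, Sequence


def build_lyria_prompt(style: str, instruments: Sequence[str] = (),
                       bpm: Optional[int] = None, mood: str = "") -> str:
    """Hypothetical helper: style first, then instrumentation, tempo, and mood."""
    parts = [style.strip()]
    # Instrument or texture descriptors, one per item; blanks are skipped.
    parts.extend(i.strip() for i in instruments if i.strip())
    if bpm is not None:
        if bpm <= 0:
            raise ValueError(f"BPM must be positive, got {bpm}")
        parts.append(f"{bpm} BPM")
    if mood.strip():
        parts.append(mood.strip())
    return ", ".join(parts) + "."


if __name__ == "__main__":
    # Music: rebuilds the lo-fi example from its parts.
    print(build_lyria_prompt(
        "Lo-fi hip hop with a dusty vinyl crackle",
        ["mellow electric piano"],
        bpm=85,
        mood="relaxing study vibes",
    ))
    # SFX: no instruments and no tempo, only the sound and its character.
    print(build_lyria_prompt(
        "A crisp, metallic 'ching' sound effect for a level-up notification",
        mood="high-pitched and rewarding",
    ))
```

Both calls reproduce the corresponding examples above word for word. For sound effects, omit `bpm` and describe the sound itself in `style`.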