TextToSpeech: Android vs Google Cloud - liweiyap/narradir-android GitHub Wiki

TextToSpeech

Android

Simple example of how to use Android's built-in TextToSpeech

private TextToSpeech tts;

/* ... */

tts = new TextToSpeech(this, new TextToSpeech.OnInitListener() {
    @Override
    public void onInit(int status) {
        if (status == TextToSpeech.SUCCESS) {
            String text = "Hello world";
            tts.setLanguage(Locale.US);
            tts.speak(text, TextToSpeech.QUEUE_ADD, null, null);
        }
    }
});

Customising the voice

Switching the gender

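Set<String> features = new HashSet<>();
features.add("male");
// The voice name below belongs to the Google TTS engine; other engines may not recognise it.
// Voice.QUALITY_HIGH == 400 and Voice.LATENCY_LOW == 200.
Voice voice = new Voice("en-us-x-sfg#male_2-local", new Locale("en", "US"),
        Voice.QUALITY_HIGH, Voice.LATENCY_LOW, true /* requiresNetworkConnection */, features);
tts.setVoice(voice);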

Downloading more TextToSpeech data from Google

if (!isGoogleTTSInstalled()) {
    installGoogleTTS();
} else {
    openTTSSettingsToInstallUnsupportedLanguage();
}

createGoogleTTS();

with the function definitions below:

private boolean isGoogleTTSInstalled() {
    Intent ttsIntent = new Intent();
    ttsIntent.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
    PackageManager pm = this.getPackageManager();
    List<ResolveInfo> listOfInstalledTTSInfo = pm.queryIntentActivities(ttsIntent, PackageManager.GET_META_DATA);
    for (ResolveInfo r : listOfInstalledTTSInfo) {
        String engineName = r.activityInfo.applicationInfo.packageName;
        if (engineName.equals("com.google.android.tts")) {
            return true;
        }
    }
    return false;
}

private void installGoogleTTS() {
    Intent goToMarket = new Intent(Intent.ACTION_VIEW).setData(Uri.parse("market://details?id=com.google.android.tts"));
    startActivity(goToMarket);
}

// Use this if speaking in a foreign locale causes onError() to be called on your UtteranceProgressListener.
private void openTTSSettingsToInstallUnsupportedLanguage() {
    Intent intent = new Intent();
    intent.setAction("com.android.settings.TTS_SETTINGS");
    intent.setFlags(Intent.FLAG_ACTIVITY_NEW_TASK);
    startActivity(intent);
}

private void createGoogleTTS() {
    tts = new TextToSpeech(this, new TextToSpeech.OnInitListener() {
        @Override
        public void onInit(int status) {
            if (status == TextToSpeech.SUCCESS) {
                /* ... */
            }
        }
    }, "com.google.android.tts");
}

Difficulties

Prior assumption: the user's Android device is already using Google Text-to-Speech

For example, my own Samsung phone is set to use Samsung Text-to-Speech as its preferred engine (Settings > Language & Input > Text-to-Speech output). Switching the gender as shown above would not work there, because the voice name is specific to the Google engine.

Although the app could prompt the phone to install Google Text-to-Speech data, this poses the following difficulties:

  • The app will not be completely self-contained.
  • The user relies on Internet access to install Text-to-Speech data.

Checking if a voice is actually available on the Android device

Set<Voice> availableVoices = tts.getVoices();
List<Locale> availableLocales = Arrays.asList(Locale.getAvailableLocales());

for (Voice voice : availableVoices) {
    // Query the engine once per voice instead of once per comparison.
    int availability = tts.isLanguageAvailable(voice.getLocale());
    if ( (voice.getLocale().getLanguage().equals(Locale.getDefault().getLanguage())) &&
         (availableLocales.contains(voice.getLocale())) &&
         (!voice.isNetworkConnectionRequired()) &&
         (availability != TextToSpeech.LANG_MISSING_DATA) &&
         (availability != TextToSpeech.LANG_NOT_SUPPORTED) &&
         (!voice.getFeatures().contains(TextToSpeech.Engine.KEY_FEATURE_NOT_INSTALLED)) ) {
        /* ... */
    }
}

Google Cloud

What if we use Google Cloud's speech synthesizers to pre-record audio files of the text we want to play back? I believe this could improve the speed and performance of the app, since nothing would have to be synthesized on the device at run time.

Resources for Google Cloud

Sample code using Android's built-in MediaPlayer

Inserting pauses
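
As a rough sketch of how pauses can be expressed, the helper below wraps narration text in an SSML <speak> element, inserts <break time="..."/> tags, and escapes the result so that it can sit inside a JSON request body. That escaping is why the break tags in the tables below appear with \" around the durations. The class name SsmlPauses and its method names are hypothetical, and the JSON field names are my understanding of the Google Cloud Text-to-Speech text:synthesize REST request, so please check them against the current API reference before relying on them.

```java
public class SsmlPauses {

    /** Returns an SSML break tag for the given duration, e.g. "0.2s". */
    public static String pause(String duration) {
        return "<break time=\"" + duration + "\"/>";
    }

    /** Wraps already-assembled narration (text plus break tags) in a <speak> root element. */
    public static String speak(String body) {
        return "<speak>" + body + "</speak>";
    }

    /**
     * Escapes backslashes and double quotes so the SSML can be placed inside a JSON string.
     * Control characters are not handled; the narration texts here do not contain any.
     */
    public static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds a JSON request body (field names are an assumption, see the note above). */
    public static String requestBody(String ssml, String languageCode) {
        return "{"
                + "\"input\":{\"ssml\":\"" + jsonEscape(ssml) + "\"},"
                + "\"voice\":{\"languageCode\":\"" + languageCode + "\"},"
                + "\"audioConfig\":{\"audioEncoding\":\"MP3\"}"
                + "}";
    }

    public static void main(String[] args) {
        String ssml = speak("Merlin. " + pause("0.2s") + "Wake up. " + pause("0.6s")
                + "Agents of Evil. Stick out your thumb, so that Merlin can see who you are.");
        System.out.println(ssml);
        System.out.println(requestBody(ssml, "en-US"));
    }
}
```

Building the break tags through one helper keeps the pause durations consistent across segments and avoids hand-escaping quotes in every text.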

Final texts used for Avalon

  • Note the punctuation, which we manipulate to create distinct pauses. Where punctuation alone cannot produce the pause we want, we use SSML <break> tags.
| Title of audio file | Text | Image |
| --- | --- | --- |
| introsegment0nomerlin.mp3 | Everyone. Close your eyes. | Nil |
| introsegment0withmerlin.mp3 | Everyone. Close your eyes, and extend your hand into a fist in front of you. | Nil |
| introsegment1nooberon.mp3 | Agents of Evil. Wake up, and look for other agents of Evil. | Evil |
| introsegment1withoberon.mp3 | Agents of Evil. Except Oberon. Wake up, and look for other agents of Evil. | Evil |
| introsegment2.mp3 | Agents of Evil. Close your eyes. | Evil |
| introsegment3nomerlin.mp3 | Everyone. Wake up. | Nil |
| introsegment3nomordred.mp3 | Merlin. <break time=\"0.2s\"/>Wake up. <break time=\"0.6s\"/>Agents of Evil. Stick out your thumb, so that Merlin can see who you are. | Merlin |
| introsegment3withmordred.mp3 | Merlin. <break time=\"0.2s\"/>Wake up. <break time=\"0.6s\"/>Agents of Evil. Except Mordred. Stick out your thumb, so that Merlin can see who you are. | Merlin |
| introsegment4.mp3 | Agents of Evil. <break time=\"0.2s\"/>Put your thumbs away. <break time=\"0.6s\"/>Merlin. Close your eyes. | Merlin |
| introsegment5nopercival.mp3 | Everyone. Wake up. | Nil |
| introsegment5withpercivalnomorgana.mp3 | Percival. <break time=\"0.2s\"/>Wake up. <break time=\"0.6s\"/>Merlin. Stick out your thumb, so that Percival can see who you are. | Percival |
| introsegment5withpercivalwithmorgana.mp3 | Percival. <break time=\"0.2s\"/>Wake up. <break time=\"0.6s\"/>Merlin and Morgana. Stick out your thumb, so that Percival can see who you are. | Percival |
| introsegment6withpercivalnomorgana.mp3 | Merlin. <break time=\"0.2s\"/>Put your thumb away. <break time=\"0.6s\"/>Percival. Close your eyes. | Percival |
| introsegment6withpercivalwithmorgana.mp3 | Merlin and Morgana. <break time=\"0.2s\"/>Put your thumbs away. <break time=\"0.6s\"/>Percival. Close your eyes. | Percival |
| introsegment7.mp3 | Everyone. Wake up. | Nil |
  • Change the image at the start of each segment with an odd-numbered index (0-indexed).
  • Pause the narration at the end of each segment with an odd-numbered index (0-indexed) for a user-defined duration (cannot go below 1000 ms).
  • Pause the narration at the end of each segment with an even-numbered index (0-indexed) for a fixed duration (1000 ms).
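
The three rules above can be sketched in plain Java as follows. The class and method names (NarrationTiming, changesImageAtStart, pauseAfterSegmentMs) are hypothetical, and clamping a too-small user value up to 1000 ms is my assumption about how the lower bound is enforced; the app may instead prevent such values from being entered at all.

```java
public class NarrationTiming {

    /** Fixed pause after even-indexed segments, and lower bound for the user-defined pause. */
    public static final long MIN_PAUSE_MS = 1000;

    /** Segments with an odd index (0-indexed, non-negative) start with a new image. */
    public static boolean changesImageAtStart(int segmentIndex) {
        return segmentIndex % 2 == 1;
    }

    /**
     * Pause to insert after a segment has finished playing.
     * Odd-indexed segments use the user-defined duration, clamped to at least MIN_PAUSE_MS
     * (the clamping is an assumption); even-indexed segments always use MIN_PAUSE_MS.
     */
    public static long pauseAfterSegmentMs(int segmentIndex, long userPauseMs) {
        if (segmentIndex % 2 == 1) {
            return Math.max(userPauseMs, MIN_PAUSE_MS);
        }
        return MIN_PAUSE_MS;
    }

    public static void main(String[] args) {
        long userPauseMs = 3000; // example value chosen by the user
        for (int i = 0; i <= 7; i++) {
            System.out.println("segment " + i
                    + ": changeImage=" + changesImageAtStart(i)
                    + ", pauseMs=" + pauseAfterSegmentMs(i, userPauseMs));
        }
    }
}
```

Keeping the rules in one small class means the playback code only has to ask two questions per segment, whichever player ends up being used for the audio files.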

Final texts used for Secret Hitler

| Title of audio file | Text | Image |
| --- | --- | --- |
| secrethitlerintrosegment0small.mp3 | Everyone. Close your eyes. | Nil |
| secrethitlerintrosegment1small.mp3 | Fascist and Hitler. <break time=\"0.2s\"/> Wake up, and look for each other. | Evil |
| secrethitlerintrosegment2small.mp3 | Fascist and Hitler. <break time=\"0.2s\"/> Close your eyes. | Evil |
| secrethitlerintrosegment3small.mp3 | Everyone. Wake up. | Nil |
| secrethitlerintrosegment0large.mp3 | Everyone. Close your eyes, and extend your hand into a fist in front of you. | Nil |
| secrethitlerintrosegment1large.mp3 | Fascists. Except Hitler. <break time=\"0.2s\"/> Wake up, and look for other Fascists. | Fascists |
| secrethitlerintrosegment2large.mp3 | Hitler. <break time=\"0.2s\"/>Stick out your thumb, so that the Fascists can see who you are. | Fascists |
| secrethitlerintrosegment3large.mp3 | Hitler. <break time=\"0.2s\"/>Put your thumb away. <break time=\"0.6s\"/>Fascists. Close your eyes. | Fascists |
| secrethitlerintrosegment4large.mp3 | Everyone. Wake up. | Nil |
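
The file names above follow a regular pattern (a segment index followed by "small" or "large"), so the playlist for a game can be generated rather than listed by hand. The sketch below assumes only that pattern and the segment counts shown in the table (four for the small variant, five for the large one). The class name SecretHitlerIntro, the method audioFiles, and the largeGame flag are hypothetical; how the app decides whether a game counts as large is not covered here.

```java
import java.util.ArrayList;
import java.util.List;

public class SecretHitlerIntro {

    /**
     * Returns the audio file names for the intro narration, in playback order.
     * The small variant has segments 0 to 3; the large variant has segments 0 to 4.
     */
    public static List<String> audioFiles(boolean largeGame) {
        String suffix = largeGame ? "large" : "small";
        int segmentCount = largeGame ? 5 : 4;
        List<String> files = new ArrayList<>();
        for (int i = 0; i < segmentCount; i++) {
            files.add("secrethitlerintrosegment" + i + suffix + ".mp3");
        }
        return files;
    }

    public static void main(String[] args) {
        System.out.println(audioFiles(false));
        System.out.println(audioFiles(true));
    }
}
```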