Demo: Speech Recognition in Chrome - touretzkyds/ai4k12 Wiki
- Name: SpeechRecognition
- Subject Area: speech recognition
- Type: in-browser demo software (Google Chrome only)
- Grade(s): all ages
- URL: https://www.cs.cmu.edu/~dst/SpeechDemo
- Creators: Dave Touretzky and Sarah Pam
- License: public domain
Description: this demo uses the Google Speech API to record audio from the computer's microphone and return strings indicating the best hypotheses for what the speaker said. These hypotheses are displayed in rank order, highest score first. There will usually be more than one hypothesis, and sometimes none of them are correct. The demo illustrates the current state of the art in speech recognition, including its limitations.
Note: this demo only works in Google Chrome because other browsers do not yet implement in-browser speech recognition. The machine must also have a microphone and a working network connection, since recognition is performed by Google's speech recognition service.
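For readers curious how a page like this might obtain and rank hypotheses, here is a minimal sketch. It assumes the demo is built on the browser's Web Speech API (exposed in Chrome as `webkitSpeechRecognition`), which this page does not state. The names `Hypothesis`, `rankHypotheses`, and `listenOnce` are hypothetical and are not taken from the demo's source.

```typescript
// One recognition hypothesis: the transcript plus the engine's confidence score.
// (Hypothetical type; it mirrors the fields of a Web Speech API alternative.)
interface Hypothesis {
  transcript: string;
  confidence: number;
}

// Return a new array of hypotheses sorted by confidence, highest first.
// The input is copied, so the original list is not modified.
function rankHypotheses(alternatives: ArrayLike<Hypothesis>): Hypothesis[] {
  return Array.from(alternatives, (a) => ({
    transcript: a.transcript,
    confidence: a.confidence,
  })).sort((a, b) => b.confidence - a.confidence);
}

// Start a single recognition pass and hand the ranked hypotheses to a callback.
// Returns false when the browser does not provide the Web Speech API.
function listenOnce(
  onRanked: (ranked: Hypothesis[]) => void,
  lang: string = "en-US",
): boolean {
  const g = globalThis as any;
  const Recognizer = g.SpeechRecognition || g.webkitSpeechRecognition;
  if (typeof Recognizer !== "function") {
    return false; // no in-browser speech recognition available
  }
  const recognizer = new Recognizer();
  recognizer.lang = lang;
  recognizer.maxAlternatives = 5; // request several hypotheses, not only the best
  recognizer.onresult = (event: any) => {
    // event.results[0] is an array-like list of alternatives for the utterance.
    onRanked(rankHypotheses(event.results[0]));
  };
  recognizer.onerror = (event: any) => {
    console.error("recognition error:", event.error);
  };
  recognizer.start(); // the browser asks for microphone permission here
  return true;
}
```

In a page, `listenOnce((ranked) => console.log(ranked))` would log the hypotheses for one utterance. The `false` return value is one way to detect the browser limitation described in the note above.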
Some things to demonstrate:
- Longer utterances usually work better than one or two word utterances.
- Grammatical utterances are recognized much more easily than random strings of words.
- Careful enunciation improves the recognition rate.
- Homophones such as "which"/"witch" can often be disambiguated by context. Try these examples at yourdictionary.com.
- Non-words can sometimes be corrected based on context. Try these examples, where we replace 'grapes' or 'drapes' with the non-word 'brapes':
  - "Start your fruit salad by cutting up the brapes"
  - "Brighten the room by drawing back the brapes"
- Common sayings are recognized as a whole, not word by word. Compare these examples:
  - "Able was I ere I saw Elba" (a well known palindrome; Google recognizes it)
  - "Able were you ere you saw Elba" (not a palindrome and not well known; Google has trouble with this one)
  - Why does Google have problems with "Able were you ere you saw Elba"? Two reasons:
    - The syntax is unusual and slightly awkward.
    - The word "ere", which means "before", is archaic and not used in modern conversational English, so Google does its best to interpret this sound as some more common word, such as "ear", unless the context strongly points to "ere".
- Try this quote from The Hobbit. Does Google get it? Do you think Google's training corpus might include famous literature?
  - "We must away, ere break of day"
- Google tries really hard to hear famous quotes correctly. Compare these examples:
  - "No man is an island" (John Donne, Meditation XVII)
  - "No man is an eyelid" (unlikely to be in Google's training data)
- Try the English word "Kalamazoo". Then switch the language model from English to Spanish and try "Kalamazoo" again.
- Switch the language model from English to Spanish and try speaking English to the Spanish model. It actually works pretty well. Now try switching the language model to Mandarin and try speaking English to that model.
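Switching the language model most likely comes down to changing the recognizer's language tag. The sketch below assumes the demo uses the Web Speech API, where the `lang` property takes a BCP 47 tag. The specific tags and the names `LANGUAGE_TAGS`, `languageTag`, and `setLanguageModel` are illustrative assumptions, not taken from the demo's source.

```typescript
// Hypothetical mapping from the language-model names used on this page
// to BCP 47 language tags (the tag choices are assumptions).
const LANGUAGE_TAGS = new Map<string, string>([
  ["English", "en-US"],
  ["Spanish", "es-ES"],
  ["Mandarin", "cmn-Hans-CN"],
]);

// Look up the tag for a language-model name, failing loudly on unknown names.
function languageTag(model: string): string {
  const tag = LANGUAGE_TAGS.get(model);
  if (tag === undefined) {
    throw new Error(`unknown language model: ${model}`);
  }
  return tag;
}

// Set the language on any recognizer-like object that has a `lang` property,
// such as a Web Speech API recognizer instance, and return the same object.
function setLanguageModel<T extends { lang: string }>(recognizer: T, model: string): T {
  recognizer.lang = languageTag(model);
  return recognizer;
}
```

For example, calling `setLanguageModel(recognizer, "Spanish")` before starting recognition would make the service interpret the audio as Spanish, which is the setup for the "Kalamazoo" experiment above.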
Also see this Google demo page: https://www.google.com/intl/en/chrome/demos/speech.html