Option Analysis: Current Language Translators & Video Conferencing Software - kristiperreault/ece601-placement-exam GitHub Wiki

Skype Real-Time Language Translator with Microsoft Azure Cognitive Services

Google Assistant & Pixel Buds with Google Cloud Translator

IBM Watson Language Translator

Amazon Translate, Comprehend, & Polly

  • Amazon Translate works with unstructured text and supports roughly 20 languages (https://docs.aws.amazon.com/translate/latest/dg/how-it-works.html)
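  • Translate also uses neural networks: an encoder reads the source text one word at a time and builds a semantic representation of its meaning, and a decoder uses that representation to generate the target-language text, also one word at a time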
  • Amazon Comprehend performs automatic language detection using neural networks. It can also identify key phrases, entities, sentiment, and syntax. It uses deep learning, supports both synchronous and asynchronous processing, integrates with other AWS services, and supports customization and clustering (https://docs.aws.amazon.com/comprehend/latest/dg/what-is.html)
  • Amazon Polly converts text to "lifelike" speech. Pros: multiple voice options, low latency, pay-per-use pricing (per character converted to speech), and logging. Cons: only available in 3 regions, and requests are subject to throttle limits (https://docs.aws.amazon.com/polly/latest/dg/what-is.html)
  • The pricing model is simple pay-as-you-go ("pay for what you need"), but costs add up quickly when a pipeline chains several services
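The Comprehend -> Translate -> Polly chain described above can be sketched in Python. This is a minimal sketch, assuming boto3-style clients are passed in as arguments (so the functions can also be exercised with stand-ins). The helper names (`detect_language`, `translate_to`, `synthesize`, `speak_translation`) and the MP3 output format are illustrative choices, not part of the original notes, and credentials, text length limits, and error handling are omitted.

```python
from typing import Any, Tuple


def detect_language(comprehend: Any, text: str) -> str:
    """Return the language code Amazon Comprehend scores highest for `text`."""
    response = comprehend.detect_dominant_language(Text=text)
    best = max(response["Languages"], key=lambda lang: lang["Score"])
    return best["LanguageCode"]


def translate_to(translate: Any, text: str, source: str, target: str) -> str:
    """Translate `text` from `source` to `target` with Amazon Translate."""
    response = translate.translate_text(
        Text=text, SourceLanguageCode=source, TargetLanguageCode=target
    )
    return response["TranslatedText"]


def synthesize(polly: Any, text: str, voice_id: str) -> bytes:
    """Convert `text` to MP3 audio with Amazon Polly and return the raw bytes."""
    response = polly.synthesize_speech(Text=text, OutputFormat="mp3", VoiceId=voice_id)
    return response["AudioStream"].read()


def speak_translation(
    comprehend: Any, translate: Any, polly: Any, text: str, target: str, voice_id: str
) -> Tuple[str, bytes]:
    """Detect the source language, translate if needed, and return (text, audio)."""
    source = detect_language(comprehend, text)
    # Skip the Translate call (and its per-character charge) when no translation is needed.
    translated = text if source == target else translate_to(translate, text, source, target)
    return translated, synthesize(polly, translated, voice_id)
```

With real clients this would be called as `speak_translation(boto3.client("comprehend"), boto3.client("translate"), boto3.client("polly"), "Hello, everyone", "es", "Lucia")`, where the Polly `VoiceId` must be a voice that speaks the target language. Amazon Translate also accepts `SourceLanguageCode="auto"` and performs detection itself; the explicit Comprehend call is kept here to mirror the three-service breakdown above.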

Google Hangouts, Amazon Chime, Zoom, Slack, GoToMeeting, join.me, Facebook Video, FaceTime

General Notes

  • The basic flow of real-time bilingual video conferencing: input conditioning, language identification, automatic speech recognition (speech to text), text "cleanup", natural language processing (including translation), and text to speech
  • A basic architecture for a bilingual video conferencing application: frontend display -> video API/Lambda -> speech to text (ASR) -> translation API (neural network) -> text to speech -> output to service -> output to user
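The speech-to-text, text cleanup, translation, and text-to-speech steps listed above can be sketched as a chain of stages. This is a minimal sketch under stated assumptions: the ASR, translation, and TTS stages are injected callables (hypothetical stand-ins for whichever services a real system would call), only the cleanup stage is implemented, and its filler-word list is an illustrative assumption rather than anything from these notes. The frontend display, video API/Lambda, and output stages are left out.

```python
from typing import Callable, List

# Hypothetical stage signature: each stage takes the previous stage's output.
Stage = Callable[[object], object]

# Assumed filler-word list, for illustration only.
FILLER_WORDS = {"um", "uh", "er", "hmm"}


def clean_transcript(text: str) -> str:
    """Text "cleanup": drop filler words and collapse runs of whitespace."""
    # Compare each word without trailing punctuation, but keep the original word.
    words = [w for w in text.split() if w.lower().strip(",.") not in FILLER_WORDS]
    return " ".join(words)


def build_stages(speech_to_text: Stage, translate: Stage, text_to_speech: Stage) -> List[Stage]:
    """Order the stages as ASR -> cleanup -> translation -> TTS."""
    return [speech_to_text, clean_transcript, translate, text_to_speech]


def run_pipeline(audio_chunk: bytes, stages: List[Stage]) -> object:
    """Pass one chunk of caller audio through each stage in order."""
    data: object = audio_chunk
    for stage in stages:
        data = stage(data)
    return data
```

Keeping each stage a plain callable makes it easy to swap one provider for another (for example, one translation API for another) without touching the rest of the chain.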