GCP Pretrained ML models

  • The Cloud Speech-to-Text API
  • The Cloud Text-to-Speech API
  • The Cloud Translation API
  • The Cloud Natural Language API
  • The Cloud Vision API
  • The Cloud Video Intelligence API

The Cloud Speech-to-Text API

The Cloud Speech-to-Text API lets developers convert speech into text. It accepts audio and returns a text transcription. It can be used synchronously, asynchronously, or in streaming mode. Many languages and dialects are supported.
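
To make the request shape concrete, here is a minimal sketch that only builds the JSON body for a synchronous v1 `speech:recognize` call; it does not send anything. The helper name `build_recognize_request`, the LINEAR16 encoding, and the 16 kHz sample rate are assumptions chosen for illustration, and authentication is left out.

```python
import base64
import json

# REST endpoint for synchronous recognition (Speech-to-Text v1).
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(audio_bytes, language_code="en-US",
                            sample_rate_hertz=16000):
    """Build the JSON body for a synchronous recognize call.

    Hypothetical helper: assumes raw 16-bit PCM (LINEAR16) audio.
    """
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        # Inline audio must be base64-encoded in the JSON body.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for real audio data.
    body = build_recognize_request(b"\x00\x01" * 8)
    print("POST", RECOGNIZE_URL)
    print(json.dumps(body, indent=2))
```

For the asynchronous mode, a body of the same shape is posted to the `speech:longrunningrecognize` method instead, which returns an operation to poll rather than the transcript itself.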

The Cloud Text-to-Speech API

The Cloud Text-to-Speech API lets developers convert plain text or Speech Synthesis Markup Language (SSML) input into audio data of natural human speech. Many languages are supported, with multiple voices available per language. There are two types of voice to choose from, Standard and WaveNet; the latter is a more advanced model that narrows the gap to human speech.
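
The difference between text and SSML input, and the choice of voice, shows up directly in the request body. The sketch below builds the JSON for the v1 `text:synthesize` method without sending it. The helper name `build_synthesize_request` is hypothetical, and the voice name `en-US-Wavenet-D` is used only as an example of a WaveNet voice; list the available voices in your project before relying on a specific name.

```python
import json

# REST endpoint for speech synthesis (Text-to-Speech v1).
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(content, *, ssml=False, language_code="en-US",
                             voice_name=None, audio_encoding="MP3"):
    """Build the JSON body for a synthesize call (hypothetical helper)."""
    # The input is either plain text or SSML markup, never both.
    input_field = {"ssml": content} if ssml else {"text": content}
    voice = {"languageCode": language_code}
    if voice_name is not None:
        # The voice name selects a specific Standard or WaveNet voice.
        voice["name"] = voice_name
    return {
        "input": input_field,
        "voice": voice,
        "audioConfig": {"audioEncoding": audio_encoding},
    }


if __name__ == "__main__":
    body = build_synthesize_request(
        "<speak>Hello <break time='300ms'/> world</speak>",
        ssml=True,
        voice_name="en-US-Wavenet-D",  # example voice name, an assumption
    )
    print("POST", SYNTHESIZE_URL)
    print(json.dumps(body, indent=2))
```

The response carries the synthesized audio as a base64 string in its `audioContent` field, which has to be decoded before it is written to a file.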

The Cloud Translation API

The Cloud Translation API translates text between more than a hundred languages. If the source language is unknown, the service can auto-detect it. Cloud Translation comes with client libraries for the most popular programming languages, so you can call it from your code without working with the REST API directly.
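
Auto-detection amounts to leaving the source language out of the request. As a rough illustration, the sketch below builds a body for the v2 (Basic) REST `translate` method; the helper name `build_translate_request` is hypothetical, and no request is actually sent.

```python
import json

# REST endpoint for Cloud Translation Basic (v2).
TRANSLATE_URL = "https://translation.googleapis.com/language/translate/v2"


def build_translate_request(texts, target, source=None):
    """Build the JSON body for a v2 translate call (hypothetical helper)."""
    if isinstance(texts, str):
        texts = [texts]  # accept a single string as a convenience
    body = {"q": list(texts), "target": target, "format": "text"}
    if source is not None:
        # Omitting "source" asks the service to detect the language.
        body["source"] = source
    return body


if __name__ == "__main__":
    print("POST", TRANSLATE_URL)
    print(json.dumps(build_translate_request("Dzień dobry", "en"), indent=2))
```

When the source is omitted, each translation in the response includes a `detectedSourceLanguage` field next to `translatedText`.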

The Cloud Natural Language API

The Cloud Natural Language API lets you analyze text with the same deep learning models that Google uses for its search engine and for Google Assistant. It can perform the following operations:

  • Extract information regarding entities, including places, people, and events
  • Categorize the entities
  • Perform sentiment analysis
  • Perform syntax analysis

These capabilities make the API suitable for analyzing documents, news, social media, or blog posts. In combination with the Speech-to-Text API, it can gauge customer satisfaction from a call center recording. Be aware that only a limited number of languages are supported. If your language is not supported, you can first use the Translation API to convert the text into a supported language.

The API can be accessed through both the REST API and the gcloud ml language command. The text can be passed inline as a parameter or read from a file in Cloud Storage.
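
The two ways of supplying text, inline or from Cloud Storage, correspond to two alternative fields of the document object in the REST request. The sketch below pairs each operation from the list above with its v1 method and builds the matching body. The helper name `build_language_request`, the short operation keys, and the bucket path in the example are all assumptions for illustration.

```python
import json

BASE_URL = "https://language.googleapis.com/v1/documents"

# Short keys (hypothetical) mapped to Natural Language v1 method names.
METHODS = {
    "entities": "analyzeEntities",
    "sentiment": "analyzeSentiment",
    "syntax": "analyzeSyntax",
    "categories": "classifyText",
}


def build_language_request(operation, *, text=None, gcs_uri=None):
    """Return (url, body) for one analysis call (hypothetical helper)."""
    if (text is None) == (gcs_uri is None):
        raise ValueError("provide exactly one of text or gcs_uri")
    document = {"type": "PLAIN_TEXT"}
    if text is not None:
        document["content"] = text          # text passed inline
    else:
        document["gcsContentUri"] = gcs_uri  # text read from Cloud Storage
    body = {"document": document}
    if operation != "categories":
        # The analyze* methods accept an encoding for character offsets.
        body["encodingType"] = "UTF8"
    return f"{BASE_URL}:{METHODS[operation]}", body


if __name__ == "__main__":
    url, body = build_language_request("sentiment", text="The support was great.")
    print("POST", url)
    print(json.dumps(body, indent=2))
    # Hypothetical bucket and object name.
    url, body = build_language_request("entities", gcs_uri="gs://my-bucket/review.txt")
    print("POST", url)
    print(json.dumps(body, indent=2))
```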

The Cloud Vision API

The Cloud Vision API provides vision detection features, including the following:

  • Image labeling
  • Face and landmark detection
  • Optical Character Recognition (OCR)
  • Tagging explicit content
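
Each feature in the list above corresponds to a feature type in the v1 `images:annotate` request, and several can be requested for one image in a single call. The following sketch builds such a body without sending it. The helper name `build_annotate_request`, the short feature keys, and the example image URI are assumptions for illustration.

```python
import json

# REST endpoint for image annotation (Vision v1).
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"

# Short keys (hypothetical) mapped to Vision API feature types.
FEATURE_TYPES = {
    "labels": "LABEL_DETECTION",
    "faces": "FACE_DETECTION",
    "landmarks": "LANDMARK_DETECTION",
    "ocr": "TEXT_DETECTION",
    "explicit": "SAFE_SEARCH_DETECTION",
}


def build_annotate_request(image_uri, features, max_results=10):
    """Build the JSON body for one image and several features."""
    return {
        "requests": [
            {
                # The image is referenced by URI instead of inline bytes.
                "image": {"source": {"imageUri": image_uri}},
                "features": [
                    {"type": FEATURE_TYPES[name], "maxResults": max_results}
                    for name in features
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical bucket and object name.
    body = build_annotate_request("gs://my-bucket/photo.jpg", ["labels", "ocr"])
    print("POST", ANNOTATE_URL)
    print(json.dumps(body, indent=2))
```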

The Cloud Video Intelligence API

Google Cloud Video Intelligence allows you to analyze video that's been uploaded to Cloud Storage. Currently, the following features are available:

  • Labels: Detects and labels entities in the video, such as animals, plants, and people.
  • Shots: Detects scene changes within the video and labels them.
  • Explicit content: Annotates the video for explicit (pornographic) content.

Video metadata can be created with labels that describe its content to allow improved searching in media libraries. In addition, videos with inappropriate content can be identified and removed from general access.
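
The three features described in this section correspond to feature names in the v1 `videos:annotate` request, which points at a video in Cloud Storage. As a minimal sketch, the code below builds that body without sending it. The helper name `build_video_annotate_request`, the short feature keys, and the example URI are assumptions for illustration.

```python
import json

# REST endpoint for video annotation (Video Intelligence v1).
VIDEO_ANNOTATE_URL = "https://videointelligence.googleapis.com/v1/videos:annotate"

# Short keys (hypothetical) mapped to Video Intelligence feature names.
VIDEO_FEATURES = {
    "labels": "LABEL_DETECTION",
    "shots": "SHOT_CHANGE_DETECTION",
    "explicit": "EXPLICIT_CONTENT_DETECTION",
}


def build_video_annotate_request(input_uri, features):
    """Build the JSON body for annotating a video stored in Cloud Storage."""
    if not input_uri.startswith("gs://"):
        raise ValueError("input_uri must be a Cloud Storage URI (gs://...)")
    return {
        "inputUri": input_uri,
        "features": [VIDEO_FEATURES[name] for name in features],
    }


if __name__ == "__main__":
    # Hypothetical bucket and object name.
    body = build_video_annotate_request("gs://my-bucket/clip.mp4",
                                        ["labels", "shots", "explicit"])
    print("POST", VIDEO_ANNOTATE_URL)
    print(json.dumps(body, indent=2))
```

Video annotation runs as a long-running operation, so the call returns an operation name to poll rather than the annotations themselves.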