Alexa Integration with IPTV Video Services

Overview of Functionality

Integrating Alexa Voice Service with IPTV Video Services enables voice commands sent through remote devices to be recognized by Set Top Boxes (STBs).

Testing Strategy

The IPTV Video Services ecosystem is composed of several microservices that enable streaming video content. For testing, the Operations team simulates the customer experience by interacting with an Alexa device, a remote device, and an STB.

Video Skills API

The Video Skills API is a schema of voice commands, which are converted to JSON messages and sent to Video Services. In Amazon Alexa terminology, a skill is a unique capability configured through a customized API. A skill built on the Video Skills API communicates with Video Services. AWS Lambda, a serverless compute service offered by AWS, hosts the microservices within Video Services that support the Video Skills API.

Voice Commands

Customers use Alexa voice commands in three scenarios:

  1. Controlling real-time (live) programming.
  2. Fast forwarding and rewinding during Video On Demand (VOD) and Digital Video Recording (DVR) playback.
  3. Searching across assets, including customer favorites and recommendations.

The following voice commands are supported:

| Command | Usage |
|---|---|
| “Tune to channel x” | real-time |
| “pause” | real-time, VOD, DVR |
| “play” | real-time, VOD, DVR |
| “rewind x seconds or minutes” | VOD, DVR |
| “fast forward x seconds or minutes” | VOD, DVR |
| “mute audio” | real-time, VOD, DVR |
| “lower volume” | real-time, VOD, DVR |
| “increase volume” | real-time, VOD, DVR |
| “search for...” | content search, VOD, DVR |

Video searches use asset metadata, such as title, actors, and plot synopsis, to retrieve results. If a command is not available in a given scenario, Alexa responds with "Functionality is not available for this type of programming." For example, fast forward is not supported during live programming.
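The availability rules in the command table can be sketched as a simple lookup. This is a minimal illustration only: the names `COMMAND_USAGE` and `check_command` are hypothetical and not part of Video Services or the Video Skills API, and the command strings are shortened forms of the phrases listed above.

```python
from typing import Optional

# Playback contexts named in the command table.
REAL_TIME, VOD, DVR, SEARCH = "real-time", "VOD", "DVR", "content search"

# Command -> contexts in which it is available (copied from the table).
COMMAND_USAGE = {
    "tune to channel": {REAL_TIME},
    "pause": {REAL_TIME, VOD, DVR},
    "play": {REAL_TIME, VOD, DVR},
    "rewind": {VOD, DVR},
    "fast forward": {VOD, DVR},
    "mute audio": {REAL_TIME, VOD, DVR},
    "lower volume": {REAL_TIME, VOD, DVR},
    "increase volume": {REAL_TIME, VOD, DVR},
    "search for": {SEARCH, VOD, DVR},
}

UNAVAILABLE = "Functionality is not available for this type of programming."


def check_command(command: str, context: str) -> Optional[str]:
    """Return None if the command is allowed in this context,
    otherwise the response Alexa is expected to speak."""
    if command not in COMMAND_USAGE:
        raise KeyError(f"unknown command: {command}")
    return None if context in COMMAND_USAGE[command] else UNAVAILABLE
```

With this table, `check_command("fast forward", REAL_TIME)` returns the unavailable message, while `check_command("pause", REAL_TIME)` returns `None`.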

Alexa Companion App

The Alexa Companion App is a prerequisite for linking customers' Amazon Alexa devices to STBs. The app is free for iOS, Android, and Fire OS.

From alexa.amazon.com, use the setup wizard to pair new Alexa devices with an STB for operations testing.

Alexa Integration

Use the configuration page to create links between Alexa and Video Services users.


| Field | Description |
|---|---|
| Service Endpoint Type | A web service that accepts requests from and sends responses to the Alexa service in the cloud. |
| Authorization URL | Used by the Alexa Companion App to request the authorization code from the OAuthServer. |
| Client Id | A custom value to designate linking to a skill in the Video Skills API. |
| Domain List | Enter any domains other than the Authorization URL where content is retrieved. |
| Scope | The amount of tracking for content attributes and events. |
| Redirect URLs | Used by the OAuthServer to redirect the user and linked authorization code after a successful SSO. |
| Authorization Grant Type | Select Implicit Grant. |
| Access Token URI | The authorization code is used by Alexa Cloud to retrieve the access token and refresh token. |
| Client Secret | A secret string that allows Alexa to authenticate with the Access Token URI. |
| Client Authorization Scheme | Select HTTP basic. |
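As an illustration of what the HTTP basic scheme means for the Client Id and Client Secret fields, the sketch below builds a standard HTTP Basic `Authorization` header value. The function name and the sample credentials in the usage note are hypothetical; how the OAuthServer validates the header is not described here.

```python
import base64


def basic_auth_header(client_id: str, client_secret: str) -> str:
    """Build an HTTP Basic Authorization header value:
    'Basic ' + base64('client_id:client_secret')."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")
```

For example, `basic_auth_header("example-client", "example-secret")` produces the value that would be sent in the `Authorization` header of a request to the Access Token URI.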

Data Flow

  • For each linked Alexa account, Alexa Cloud sends a Discover Directive to the Video Skills API in a JSON request payload to discover which skills are being used.

  • Alexa sends the same Discover Directive when a user initiates device pairing by selecting the Manage Devices option in the Alexa Companion App.

  • The Account-Services microservice exposes an API, /v1/account/devices, which returns details of all the devices under a given account.

  • Alexa Cloud expects a response containing an array of Endpoint objects, one for each Alexa device within the user account.
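The discovery flow above can be sketched as a function that turns device records into a discovery response. The envelope (the `Alexa.Discovery` namespace, the `Discover.Response` name, and the `endpoints` array) follows the general shape of Alexa discovery responses. Everything else is an assumption for illustration: the device record fields `deviceId` and `name`, the manufacturer string, and the two capabilities shown are hypothetical and not taken from the /v1/account/devices API.

```python
import uuid
from typing import Any, Dict, List


def build_discover_response(devices: List[Dict[str, str]]) -> Dict[str, Any]:
    """Map account devices to an array of Endpoint objects."""
    endpoints = []
    for device in devices:
        endpoints.append({
            # Assumed field names on the device record.
            "endpointId": device["deviceId"],
            "friendlyName": device["name"],
            "manufacturerName": "Example Video Services",  # placeholder
            "description": "Set Top Box",
            "displayCategories": ["TV"],
            # Illustrative capabilities only; the real list depends on
            # which interfaces the skill supports.
            "capabilities": [
                {"type": "AlexaInterface",
                 "interface": "Alexa.PlaybackController",
                 "version": "3"},
                {"type": "AlexaInterface",
                 "interface": "Alexa.ChannelController",
                 "version": "3"},
            ],
        })
    return {
        "event": {
            "header": {
                "namespace": "Alexa.Discovery",
                "name": "Discover.Response",
                "payloadVersion": "3",
                "messageId": str(uuid.uuid4()),
            },
            "payload": {"endpoints": endpoints},
        }
    }
```

A call such as `build_discover_response([{"deviceId": "stb-1", "name": "Living Room"}])` yields a payload with one Endpoint object per device.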

OAuth Server

The OAuthServer provides authentication services (accessToken) that allow the Alexa Companion App and the Alexa Skill API to communicate with Video Services, which in turn communicates with STBs. It also keeps track of the pairing between an Alexa device and an STB for each account.

Cloud Details

Eventing

Alexa Cloud expects device state information to be sent in the form of Events for the following reasons:

  • Events verify that the customer is using Video Services, which allows ambiguous customer requests to resolve toward video (and away from music). For example, ‘play Star Wars’ could refer to the soundtrack or the movie. With Events, the request resolves to the movie.
  • Events inform the Alexa platform to route transport commands through to Video Services rather than acting on the Alexa device itself. For example, the ‘Alexa pause’ command sends the request to Video Services instead of the Echo device.
  • Events inform Video Services which of its microservices should receive the request by identifying which one is actively in use. This is important for a predictable customer experience.
  • Events allow customers to select on-screen items by voice. For example, “show me popular comedies” could display a list of results, which are sent to Alexa via a LIST_ACTIVE event. A subsequent voice selection would then choose from the on-screen items instead of the broad metadata catalog.

SNS Event Messages

Amazon Simple Notification Service (Amazon SNS) is a publish/subscribe messaging and mobile notification web service that coordinates the delivery of messages to subscribed endpoints and clients.

Alexa expects events to be sent via the AWS SNS framework. Video Services is required to have an SNS topic and to publish Events from all Alexa-capable STBs to the designated topic.

| Event | Description | Timing |
|---|---|---|
| POWER_ON | Identifies the device waking from an off or low power state, or an application launching. This event signals a customer's intent to use the device or service. | One time at event occurrence. |
| IN_USE | Identifies customer activity that can be considered active use. For example, remote control usage, channel navigation, an application launch, or video playback starting. These events should not report passive states such as ‘video is playing’. | On state change, and every 5 minutes if activity continues. |
| IS_PLAYING (playing, paused) | Indicates that video playback is occurring. This event should include the IS_PLAYING flag and, optionally, the state of the player (playing, paused). | Once when the state changes, and every 30 minutes while the player is active. |
| POWER_OFF | Identifies the device powering off, entering low power mode, or an application exiting. This event signals a customer's intent to stop using the device or service. It is critical for releasing platform focus on video devices and returning to standard 'music' mode. | One time at event occurrence. |
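The Timing column can be read as a small scheduling rule: send an event when the state changes, and resend IN_USE and IS_PLAYING at their repeat intervals while the state persists. The sketch below encodes that reading. The names `REPEAT_SECONDS` and `should_send` are hypothetical, and the assumption that POWER_ON and POWER_OFF are never repeated comes from their "one time" timing.

```python
from typing import Dict, Optional

# Repeat interval per event in seconds; None means "send once only".
REPEAT_SECONDS: Dict[str, Optional[int]] = {
    "POWER_ON": None,
    "IN_USE": 5 * 60,
    "IS_PLAYING": 30 * 60,
    "POWER_OFF": None,
}


def should_send(event: str, state_changed: bool,
                seconds_since_last: float) -> bool:
    """Decide whether an STB should publish the event now."""
    if event not in REPEAT_SECONDS:
        raise ValueError(f"unknown event: {event}")
    if state_changed:
        return True  # every event is sent when it first occurs
    interval = REPEAT_SECONDS[event]
    # One-time events are never repeated; others repeat on their interval.
    return interval is not None and seconds_since_last >= interval
```

An STB client could call `should_send` on a timer and publish to the SNS topic only when it returns `True`.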

SNS Event API

The endpointId in SNS messages needs to be in the following format:

eyJza2lsbElkdummystringNraWxsLjY4NDZiYjNhdummystringNWUzNiIsInN0dummystring2ZWxvcdummystring==_endpointId

The value is a base64-encoded version of {"skillId":"amzn1.ask.skill.6846bb3a-108c-4947-a3c2-02c3ff605e36","stage":"development"}, followed by an underscore and the actual endpointId of the device assigned during Discovery.
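The endpointId format can be sketched as an encode/decode pair. This is a minimal illustration, assuming compact JSON with no extra whitespace (as in the sample above) and standard base64, whose alphabet contains no underscore, so the first underscore can safely be used as the separator. The function names are hypothetical.

```python
import base64
import json
from typing import Tuple


def encode_endpoint_id(skill_id: str, stage: str, device_endpoint_id: str) -> str:
    """Build '<base64(JSON prefix)>_<device endpointId>'."""
    prefix = json.dumps({"skillId": skill_id, "stage": stage},
                        separators=(",", ":"))  # compact, no spaces
    encoded = base64.b64encode(prefix.encode("utf-8")).decode("ascii")
    return f"{encoded}_{device_endpoint_id}"


def decode_endpoint_id(value: str) -> Tuple[str, str, str]:
    """Split on the first underscore and decode the JSON prefix."""
    encoded, separator, device_endpoint_id = value.partition("_")
    if not separator:
        raise ValueError("missing '_' separator")
    prefix = json.loads(base64.b64decode(encoded))
    return prefix["skillId"], prefix["stage"], device_endpoint_id
```

Because only the first underscore is treated as the separator, a device endpointId that itself contains underscores still survives a round trip through `encode_endpoint_id` and `decode_endpoint_id`.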