by Terren Peterson
How to create a video channel for the Amazon Alexa Show without YouTube
I’m a software engineer who has published more than twenty custom skills on the Alexa platform. I’ve been recognized as an Alexa Champion, and have won multiple hackathons using the technology. The following highlights how I built a custom video skill for the Echo Show using native Amazon technology.
The History of Video Consumption
Video broadcasting has been a successful medium for more than sixty years. Television sets dominated the entertainment industry for decades, with broadcast signals beaming directly into living rooms. By 1990, cable television had reached 57% of US households, which rapidly expanded the variety of content available to viewers.
Live streaming officially began in 1995, but it wasn’t until 2007 that internet-based streaming began to use the standard HTTP protocol. Smartphones like the iPhone followed soon after, using similar communication methods. With the push to mobile, smaller screens started taking up more of our time. YouTube has been a huge success during this phase, and now plays 5 billion videos a day.
In 2015, Amazon launched the first device controlled solely by a user’s voice. Voice platforms like Alexa rapidly added another device to the home, with an estimated 35 million already in use in the US. In 2017, Amazon officially launched its first version of Alexa with a screen, called the Echo Show. The new Amazon Fire TV devices also have Alexa built in, so video streamed to a flat-screen display can be controlled by your voice.
The Echo Show initially included the capability to play videos from YouTube, but lately, Google and Amazon have been fighting about its availability. This hurt the popularity of the product, given how prominent a feature video playback was.
At the 2018 CES, multiple consumer electronics companies announced that they were launching a Google Home compatible device with a screen. This makes the voice market even more competitive. One solution to the YouTube compatibility challenge is to host video content directly on Amazon. The following describes how to create a custom skill for the Echo Show that does just that.
How to Create a Custom Video Skill on Alexa
Building your own video channel is surprisingly easy using Alexa and a handful of AWS services. Here is the architecture featuring a custom Alexa skill that packages content to play on an Echo Show. The AWS storage service (S3) stores the media, and streams it to the device based on instructions given by the custom skill.
Producing content for this is comparable to producing content for a YouTube channel. The current version of the Echo Show (as well as the Echo Spot) has specifications for the media type that you need to follow. For example, the video should use an mp4 extension and the standard H.264 compression format. The video resolution should be no greater than 1280x720 pixels. These constraints provide a high-quality video stream on the Show’s seven-inch display, comparable to that of an HD video player.
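As a rough illustration, the extension and resolution limits above could be checked before uploading a file. This is only a sketch: the helper name `checkVideoSpec` is hypothetical, the width and height are assumed to be known already (for example from your video editor), and the H.264 codec requirement is not checked at all.

```javascript
// Limits taken from the paragraph above; the helper name is hypothetical.
const MAX_WIDTH = 1280;
const MAX_HEIGHT = 720;

// Returns a list of problems; an empty list means the file passes these checks.
// Note: this does not inspect the codec, so H.264 still has to be verified separately.
function checkVideoSpec(fileName, width, height) {
  const problems = [];
  if (!fileName.toLowerCase().endsWith('.mp4')) {
    problems.push('file should use the mp4 extension');
  }
  if (width > MAX_WIDTH || height > MAX_HEIGHT) {
    problems.push('resolution should be no greater than 1280x720');
  }
  return problems;
}

console.log(checkVideoSpec('SilentNight.mp4', 1280, 720));
console.log(checkVideoSpec('SilentNight.mov', 1920, 1080));
```

Returning a list of problems rather than a single boolean makes it easier to report everything that needs fixing in one pass.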
Building the Custom Video Skill
Creating a video channel requires authoring a custom skill for Alexa and publishing it to the public skill store. There are a number of features needed for navigating the content in the channel and playing the videos. This section will cover these steps in detail. For a working example, try out the Piano Teacher skill that is already in the Alexa store. It is a skill that contains short videos with beginner lessons on how to play the piano, as well as note-by-note video instructions on how to play simple songs. Here is the repo that contains all the source code needed, as well as detailed instructions to configure and deploy.
There are three features required to make a video channel.
1 — Render a background image when the skill is initially launched. This establishes the channel’s brand.
2 — Build navigation controls to browse and select which video to play. This includes handling touch gestures on the Echo Show screen.
3 — Delegate control to the device to play a video once content is selected.
1 — Brand with a Background Image
To facilitate the building of custom video skills, Alexa provides a series of templates. I use the “BodyTemplate1” template to render the background image when the skill is first invoked. When generating the metadata within the Alexa Developer Console, check the second and third boxes on the global fields screen (Video App & Render Template).
Setting these attributes enables additional APIs within the custom skill. Additional standard intents are required to use the APIs. These are created when building the intent model in the Alexa Skills Kit. They are as follows:
- AMAZON.NavigateSettingsIntent
- AMAZON.NextIntent
- AMAZON.PageDownIntent
- AMAZON.PageUpIntent
- AMAZON.PreviousIntent
- AMAZON.ScrollDownIntent
- AMAZON.ScrollLeftIntent
- AMAZON.ScrollRightIntent
- AMAZON.ScrollUpIntent
No coding in the Lambda function is required for these events, as the device will handle them natively. They just need to be included in your custom skill’s intent model.
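If you maintain the intent model as JSON rather than through the console screens, the entries for these built-in intents could be generated along these lines. This is a sketch only: it assumes each intent is an object with a `name` and an empty `samples` array, which is how I understand the model's JSON layout, so check it against the model the console actually generates for your skill.

```javascript
// The built-in intents listed above.
const builtInIntents = [
  'AMAZON.NavigateSettingsIntent',
  'AMAZON.NextIntent',
  'AMAZON.PageDownIntent',
  'AMAZON.PageUpIntent',
  'AMAZON.PreviousIntent',
  'AMAZON.ScrollDownIntent',
  'AMAZON.ScrollLeftIntent',
  'AMAZON.ScrollRightIntent',
  'AMAZON.ScrollUpIntent',
];

// Assumed entry shape: built-in intents need no sample utterances of their own.
const intents = builtInIntents.map((name) => ({ name: name, samples: [] }));

console.log(JSON.stringify({ intents: intents }, null, 2));
```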
Rendering a background image requires two utility methods that are already distributed in the standard Alexa SDK. The skill creates two identifiers to illustrate how they are used.
// utility methods for creating Image and TextField objects
const makePlainText = Alexa.utils.TextUtils.makePlainText;
const makeImage = Alexa.utils.ImageUtils.makeImage;
Next, I add an identifier for the location of your jpg/png file that serves as the background image. This object needs to be publicly available. The pixel size is 1024x600 based on the dimensions of the Echo Show. You don’t need to provide a separate image for the smaller Echo Spot. Alexa creates the smaller image based on the original file sized for the Show.
// This is a public endpoint - the easiest way is to host in S3
// It needs to be SSL enabled (which S3 does for you)
const backgroundImage = 'https://s3.amazonaws.com/.../image.jpg';
Next, add the following code to render the background image when the skill gets launched, along with any other audio messages.
'LaunchRequest': function () {
    const builder = new Alexa.templateBuilders.BodyTemplate1Builder();
    const template = builder.setTitle('Your Personal Instructor')
        .setBackgroundImage(makeImage(backgroundImage))
        .setTextContent(makePlainText('Piano Teacher'))
        .build();
    // check if the device has a video screen
    if (this.event.context.System.device.supportedInterfaces.Display) {
        this.response.speak(welcomeMessage)
            .listen(repeatWelcomeMessage).renderTemplate(template);
        this.emit(':responseReady');
    } else {
        // handle error of not having a video screen to play
        this.emit(':tell', nonVideoMessage);
    }
},
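The screen check in the handler above reads a deeply nested property, which throws if any level of the request is missing (for instance with a hand-written test event). A more defensive version might look like the sketch below. The helper name `supportsInterface` and both sample events are hypothetical and only mimic the part of the request that the check reads.

```javascript
// Hypothetical helper: returns true only if the request advertises the named interface.
// Each level is guarded so a partial event yields false instead of throwing.
function supportsInterface(event, interfaceName) {
  const interfaces =
    event &&
    event.context &&
    event.context.System &&
    event.context.System.device &&
    event.context.System.device.supportedInterfaces;
  return Boolean(interfaces && interfaces[interfaceName]);
}

// Minimal sample events, trimmed to the fields the helper reads.
const showEvent = {
  context: { System: { device: { supportedInterfaces: { Display: {}, VideoApp: {} } } } },
};
const audioOnlyEvent = {
  context: { System: { device: { supportedInterfaces: {} } } },
};

console.log(supportsInterface(showEvent, 'Display'));
console.log(supportsInterface(audioOnlyEvent, 'Display'));
console.log(supportsInterface({}, 'VideoApp'));
```

Inside a handler, this could stand in for the inline check as `supportsInterface(this.event, 'Display')`.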
2 — Content Navigation
The navigation for the content takes advantage of the flexibility of the Alexa platform. The viewer can use either their voice or fingers to navigate the content catalog, selecting exactly what they would like to view. This requires using the list template within the Alexa SDK, as well as handling events triggered from the user touching the screen.
Here are the different options that the user may request, either by using their voice or by touching the screen.
Rendering a list of what content is available is central to the user experience. I use ‘ListTemplate1’ from the Alexa SDK to render the list of videos. Scrolling up and down can be done through either voice or touch, and is handled by the device with no coding required.
The response object that is sent to list the content contains an array listing what is available on the channel. In my skill, this list is read by the Alexa voice, and is rendered visually on the screen. Here is an example of what it looks like.
Within the code, the content is externalized into an array (songs.json) that contains the list of videos, as well as metadata about the location of each media file. Each item in the list is assigned a unique token. Here is a sample of the layout, written in JavaScript Object Notation (JSON):
[
    { "requestName": "Silent Night",
      "listSong": true,
      "token": "song001",
      "difficulty": "Moderate",
      "videoObject": "SilentNight.mp4",
      "audioObject": "SilentNight.mp3"
    },
    { "requestName": "Mary Had a Little Lamb",
      "listSong": true,
      "token": "song002",
      "difficulty": "Easy",
      "videoObject": "MaryHadLittleLamb.mp4",
      "audioObject": "MaryHadLittleLamb.mp3"
    },
    ...
]
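Because the skill relies on every token being unique, it can be worth checking the catalog before deploying. The sketch below is one way to do that. The function name `validateCatalog` and the choice of required fields are my own assumptions, not part of the skill's source.

```javascript
// Hypothetical check: every entry needs a name, a token, and a video file,
// and no two entries may share a token.
function validateCatalog(songs) {
  const errors = [];
  const seen = new Set();
  songs.forEach((song, index) => {
    ['requestName', 'token', 'videoObject'].forEach((field) => {
      if (typeof song[field] !== 'string' || song[field].length === 0) {
        errors.push('entry ' + index + ': missing ' + field);
      }
    });
    if (typeof song.token === 'string') {
      if (seen.has(song.token)) {
        errors.push('entry ' + index + ': duplicate token ' + song.token);
      }
      seen.add(song.token);
    }
  });
  return errors;
}

// Sample data in the same layout as the catalog above.
const sampleCatalog = [
  { requestName: 'Silent Night', listSong: true, token: 'song001', videoObject: 'SilentNight.mp4' },
  { requestName: 'Mary Had a Little Lamb', listSong: true, token: 'song002', videoObject: 'MaryHadLittleLamb.mp4' },
];

console.log(validateCatalog(sampleCatalog));
```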
Here is the code that converts the array into the response needed by Alexa, including embedding the token for each item in the array.
// these are the songs that recordings have been made for
var songs = require("data/songs.json");
// create List
const itemImage = null;
const listItemBuilder = new Alexa.templateBuilders.ListItemBuilder();
const listTemplateBuilder = new Alexa.templateBuilders.ListTemplate1Builder();
// build an array of all available songs
// (message is expected to hold the opening part of the spoken response already)
for (let i = 0; i < songs.length; i++) {
    if (songs[i].listSong) {
        // pull attributes from song array and apply to the list
        listItemBuilder.addItem(itemImage, songs[i].token,
            makePlainText(songs[i].requestName),
            makePlainText(songs[i].difficulty));
        message = message + songs[i].requestName + ", ";
    }
}
message = message + "Just select a song on the screen, or request one by saying something " +
    "like, Teach me how to play " + songs[0].requestName + ".";
// now create the response object using the SDK
const listItems = listItemBuilder.build();
// pianoStrings is the background image location, defined elsewhere in the skill
const imageLoc = pianoStrings;
const listTemplate = listTemplateBuilder.setToken('listToken')
    .setTitle('Available Song List')
    .setListItems(listItems)
    .setBackgroundImage(makeImage(imageLoc))
    .build();
this.response.speak(message).listen(noSongRepeatMessage).renderTemplate(listTemplate);
this.emit(':responseReady');
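The spoken part of that response can also be built separately from the SDK calls, which makes it easy to test on its own. The sketch below is one possible shape for that. The function name `buildSongListMessage` and the fallback wording for an empty list are hypothetical. It also differs slightly from the code above by taking its example from the first listed song rather than the first entry in the array.

```javascript
// Hypothetical helper: builds the sentence Alexa reads out for the visible songs.
function buildSongListMessage(songs) {
  const listed = songs.filter((song) => song.listSong);
  if (listed.length === 0) {
    return 'No songs are available yet.'; // placeholder wording
  }
  const names = listed.map((song) => song.requestName).join(', ');
  return names + '. Just select a song on the screen, or request one by saying ' +
    'something like, Teach me how to play ' + listed[0].requestName + '.';
}

// Sample data; the middle entry is hidden from the list on purpose.
const demoSongs = [
  { requestName: 'Silent Night', listSong: true },
  { requestName: 'Hidden Track', listSong: false },
  { requestName: 'Mary Had a Little Lamb', listSong: true },
];

console.log(buildSongListMessage(demoSongs));
```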
The Lambda function handles the ‘ElementSelected’ event invoked by the Echo Show. The request object sent by the device to the custom skill contains the token used to translate what was selected by the user.
// this function is invoked from the 'ElementSelected' event
'ScreenSongSelected': function () {
    console.log("Element Selected:" + this.event.request.token);
    var videoName = "";
    // match token to song name and find the video object to play
    for (let i = 0; i < songs.length; i++) {
        if (songs[i].token === this.event.request.token) {
            console.log("Play " + songs[i].requestName);
            videoName = songs[i].videoObject;
        }
    }
    const videoClip = videoLoc + videoName;
    this.response.playVideo(videoClip);
    this.emit(':responseReady');
},
The Lambda function uses the token it receives and finds the media file corresponding to the unique identifier. Control is then transferred to the device with the appropriate video.
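The token lookup can be sketched on its own, outside the SDK. In the example below, the function name `findSongByToken` and the trimmed-down sample event are hypothetical, and the event only carries the fields the lookup needs. Unlike the loop above, this version returns null for an unknown token, so the caller can respond with an error instead of trying to play an empty file name.

```javascript
// Sample catalog in the same layout as songs.json.
const catalog = [
  { requestName: 'Silent Night', token: 'song001', videoObject: 'SilentNight.mp4' },
  { requestName: 'Mary Had a Little Lamb', token: 'song002', videoObject: 'MaryHadLittleLamb.mp4' },
];

// Hypothetical, trimmed-down version of the request sent when a list item is touched.
const sampleEvent = {
  request: { type: 'Display.ElementSelected', token: 'song002' },
};

// Returns the matching catalog entry, or null when the token is unknown.
function findSongByToken(songs, token) {
  return songs.find((song) => song.token === token) || null;
}

const match = findSongByToken(catalog, sampleEvent.request.token);
console.log(match ? match.videoObject : 'no match');

const missing = findSongByToken(catalog, 'song999');
console.log(missing);
```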
3 — Delegate Control to the Video Player
Once the video that needs to be played is found, the endpoint of the media is added to the response. This requires a few lines of code within the Lambda function.
First, identify a folder in the S3 bucket where the video files will be stored.
// These are the folders where the mp4 files are located
const videoLoc = 'https://s3.amazonaws.com/…/media/';
Then specify the exact file based on what the user requested, and add metadata that contains more information about the video.
if (this.event.context.System.device.supportedInterfaces.VideoApp) {
    const videoClip = videoLoc + videoObject; // endpoint of the file
    // this will be rendered when the user selects video controls
    const metadata = {
        'title': slots.SongName.value
    };
    this.response.playVideo(videoClip, metadata);
    this.emit(':responseReady');
} else {
    // handle error - and close the session
    this.emit(':tell', nonVideoMessage);
}
After this code executes, the Alexa device takes over the playback controls for the video. The user can either use their voice or touch the screen to pause, rewind, fast-forward, etc. Here is what the screen looks like when playing a video, including the metadata title at the top.
When the video playing on the device is complete, the skill can be used again to select more content.
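For readers curious about what ends up in the response, my understanding is that the SDK's playVideo call adds a directive roughly like the one built below. Treat the shape as an assumption to verify against the VideoApp interface documentation. The function name `buildVideoLaunchDirective` and the example.com base URL are hypothetical. The sketch also URL-encodes the file name, which matters if an object key contains spaces.

```javascript
// Hypothetical helper: builds a video launch directive by hand.
// The directive layout is assumed, not taken from the skill's source.
function buildVideoLaunchDirective(baseUrl, fileName, title) {
  return {
    type: 'VideoApp.Launch',
    videoItem: {
      // encode the file name so spaces and other special characters are URL-safe
      source: baseUrl + encodeURIComponent(fileName),
      metadata: { title: title },
    },
  };
}

const directive = buildVideoLaunchDirective(
  'https://example.com/media/', // placeholder for the S3 folder
  'Silent Night.mp4',
  'Silent Night'
);

console.log(JSON.stringify(directive, null, 2));
```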
Summary
Using the example above as a template, the custom skill can be built in just a few hours. The Alexa certification process takes just a day or two, and then the skill (and the content within it) will be available to anyone with an Echo Show. As a fan of YouTube, I hope to be able to use it on my Alexa device soon, but in the meantime there is a way for content publishers to work around it.