Notes on AVS Development

AVS Basics

  • Concept

The Alexa Voice Service (AVS) gives users access to cloud-based Alexa capabilities, i.e., it lets you integrate Alexa into a product. Because speech recognition and natural-language understanding run in the cloud, building voice-enabled products becomes much simpler.

  • Interaction model

AVS consists of multiple interfaces, each corresponding to a client capability (interfaces are explained in detail below), such as speech recognition, audio playback, and volume control. Each interface contains logically grouped messages, called directives and events.

Directives are messages sent from the cloud that instruct the client to take action.

Events are messages sent from the client to the cloud that notify Alexa that something has happened.

  • Steps

    Authorization

To access the AVS API, the product needs a Login with Amazon (LWA) access token, which authorizes the product to call the API on the user's behalf. The following authorization methods are available:

(1) Remote authorization (Remote Authorization) is used for devices such as smart speakers.

Companion app authorization; see:

https://blog.csdn.net/qq_27061049/article/details/80167376

Key point: obtain the correct Api_key. After authorization succeeds you receive an access token; the token must be included in the authorization header of every event sent to AVS.

Companion site authorization

(2) Local authorization (Local Authorization) is used for Android and iOS applications.

(3) Code-based linking (Code Based Linking, CBL) is typically used for input-limited products such as TVs or smartwatches.
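These flows generally end the same way: a login step yields an authorization code, which is exchanged for an access token at the LWA token endpoint. A minimal OkHttp sketch of that exchange, using the endpoint and parameter names from the public Login with Amazon docs; CLIENT_ID, CLIENT_SECRET, and REDIRECT_URI are placeholders for your product's values:

// Hedged sketch: exchange an LWA authorization code for an access token.
RequestBody form = new FormBody.Builder()
        .add("grant_type", "authorization_code")
        .add("code", authorizationCode)      // code returned by the login flow
        .add("client_id", CLIENT_ID)
        .add("client_secret", CLIENT_SECRET)
        .add("redirect_uri", REDIRECT_URI)
        .build();

Request tokenRequest = new Request.Builder()
        .url("https://api.amazon.com/auth/o2/token")
        .post(form)
        .build();

try (Response rsp = new OkHttpClient().newCall(tokenRequest).execute()) {
    // On success the JSON body contains access_token, refresh_token,
    // token_type, and expires_in; persist the refresh_token and renew
    // the access_token before it expires.
    String tokenJson = rsp.body().string();
}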

   Establishing an HTTP/2 connection

(1) Basic concepts:

Frame: the basic protocol unit in HTTP/2. Each frame serves a different purpose; for example, HEADERS frames and DATA frames together make up a basic request or response.

Stream: an independent, bidirectional sequence of frames exchanged between client and server within an HTTP/2 connection.

Downchannel: a stream created within the HTTP/2 connection that delivers directives from the cloud to the client, used primarily for cloud-initiated directives and accompanying audio.

Cloud-initiated directives: directives that originate in the cloud. For example, when the user adjusts the device volume through the companion app, a directive is sent to the product without any associated speech request.

(2) Maintaining the HTTP/2 connection requires: a) establishing a downchannel stream; b) synchronizing the product's component states with AVS.

(3) Establishing the downchannel stream. Within 10 s of opening the connection to AVS, the client sends a GET request. The request looks like this:

:method = GET 

:scheme = https 

:path = /{{API version}}/directives

authorization = Bearer {{YOUR_ACCESS_TOKEN}}  

On Android this is done with OkHttp; the code is as follows:

// Per the read-timeout caveat in (5) below, the client used for the
// downchannel needs a read timeout of at least 60 minutes.
mClient = new OkHttpClient.Builder()
        .connectTimeout(5, TimeUnit.SECONDS)
        .writeTimeout(10, TimeUnit.SECONDS)
        .readTimeout(60, TimeUnit.MINUTES)
        .build();

// Build the downchannel request
Request request = new Request.Builder()
        .addHeader(AUTHORIZATION, BEARER + mAuthToken)
        .addHeader(CONTENT_TYPE, MULTI_DATA)
        .url(AVS_DIRECTIVES_URL)
        .build();
Log.i(TAG, "[http call] downchannel request");
mDownChannelCall = mClient.newCall(request);
Response downChannelRsp = mDownChannelCall.execute();
boolean isDownChannelOpen = downChannelRsp.isSuccessful();
Log.i(TAG, "[down channel]: " + downChannelRsp.code());

The related member variables are:

private final String AVS_DIRECTIVES_URL = "https://avs-alexa-na.amazon.com/v20160207/directives";
private final String AVS_EVENTS_URL = "https://avs-alexa-na.amazon.com/v20160207/events";
private final String AVS_EVENTS_PING = "https://avs-alexa-na.amazon.com/ping";
private final String BOUNDARY = "this-is-a-boundary";
private final String AUTHORIZATION = "authorization";
private final String BEARER = "Bearer ";
private final String CONTENT_TYPE = "content-type";
private final String MULTI_DATA = "multipart/form-data; boundary=" + BOUNDARY;
private final String DISPOSITION = "Content-Disposition";
private final String META_DATA = "form-data; name=\"metadata\"";
private final String AUDIO_DATA = "form-data; name=\"audio\"";

(4) Synchronizing component state

On the existing connection, open a new event stream with a POST request. This event stream can be closed once the client receives the response. Below is an example of a SynchronizeState event:

:method = POST  
:scheme = https  
:path = /{{API version}}/events
authorization = Bearer {{YOUR_ACCESS_TOKEN}}
content-type = multipart/form-data; boundary={{BOUNDARY_TERM_HERE}}  

--{{BOUNDARY_TERM_HERE}}
Content-Disposition: form-data; name="metadata"  
Content-Type: application/json; charset=UTF-8  
{  
    "context": [   
       // This is an array of context objects that are used to communicate the
       // state of all client components to Alexa. See Context for details.
    ],  
    "event": {  
        "header": {  
            "namespace": "System",  
            "name": "SynchronizeState",  
            "messageId": "{{STRING}}"  
        },  
        "payload": {  
        }  
    }  
}  
--{{BOUNDARY_TERM_HERE}}--
Android code:
RequestBody requestBody = RequestBody.create(JSON_TYPE,
        JsonHelper.getSyncStateJson(mPlayToken, mMsgId));

RequestBody multiBody = new MultipartBody.Builder()
        .setType(MultipartBody.FORM)
        .addPart(Headers.of(DISPOSITION, META_DATA), requestBody)
        .build();

Request request2 = new Request.Builder()
        .url(AVS_EVENTS_URL)
        .post(multiBody)
        .addHeader(AUTHORIZATION, BEARER + mAuthToken)
        .addHeader(CONTENT_TYPE, MULTI_DATA)
        .build();

Log.i(TAG, "[http call] Synchronized state request");
mStateCall = mClient.newCall(request2);
Response response2 = mStateCall.execute();
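The JsonHelper.getSyncStateJson() helper above is not shown in the original post. A minimal sketch of what it might build, using org.json; the context array is left empty here, since real code reports each component's state there (presumably where mPlayToken is used):

// Hedged sketch of the SynchronizeState JSON builder.
public static String getSyncStateJson(String playToken, String msgId) throws JSONException {
    JSONObject header = new JSONObject()
            .put("namespace", "System")
            .put("name", "SynchronizeState")
            .put("messageId", msgId);
    JSONObject event = new JSONObject()
            .put("header", header)
            .put("payload", new JSONObject());
    return new JSONObject()
            .put("context", new JSONArray())   // fill with real component states
            .put("event", event)
            .toString();
}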

Once synchronization succeeds, the client can send events and receive directives over this connection.

(5) Caveats:

        a) Captured audio should be encoded as:

           • 16-bit linear PCM (LPCM16)
           • 16 kHz sample rate
           • Single channel
           • Little-endian byte order

        b) The HTTP/2 connection supports at most 10 concurrent streams, including event streams, the downchannel, and ping. Therefore, make sure each event stream is closed once its response has been received.

        c) Many libraries have a read timeout that fires when the client has not received data for some time. Because the downchannel stream must stay open between AVS and the client, and may go long periods without sending any data, set its read timeout to at least 60 minutes.

  (6) Ping and timeout

      When the connection is idle, a ping should be sent every 5 minutes.

       Sample Request

:method = GET  
:scheme = https  
:path = /ping  
authorization = Bearer {{YOUR_ACCESS_TOKEN}}
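A hedged sketch of scheduling this ping on Android, reusing the client and constants above:

// Send a ping every 5 minutes while the connection is idle.
ScheduledExecutorService pinger = Executors.newSingleThreadScheduledExecutor();
pinger.scheduleAtFixedRate(() -> {
    Request ping = new Request.Builder()
            .url(AVS_EVENTS_PING)                        // ".../ping"
            .addHeader(AUTHORIZATION, BEARER + mAuthToken)
            .build();
    try (Response rsp = mClient.newCall(ping).execute()) {
        Log.i(TAG, "[ping] " + rsp.code());              // 204 on success
    } catch (IOException e) {
        Log.e(TAG, "[ping] failed; re-establish the HTTP/2 connection", e);
    }
}, 5, 5, TimeUnit.MINUTES);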
  

Speech recognition (SpeechRecognizer Interface)

https://developer.amazon.com/zh/docs/alexa-voice-service/speechrecognizer.html#stopcapture

1. Audio capture

On Android, audio is captured with the AudioRecord class. The general flow is as follows:

private final int AUDIO_SAMPLE_RATE = 16000;                          // 16000 Hz sample rate
private final int AUDIO_CHANNEL_CONFIG = AudioFormat.CHANNEL_IN_MONO; // single channel
private final int AUDIO_FORMAT = AudioFormat.ENCODING_PCM_16BIT;      // sample resolution; 16-bit uses more
                                                                      // space and CPU than 8-bit but is
                                                                      // closer to the real signal
private final int AUDIO_BUFFER_SIZE = AudioRecord.getMinBufferSize(
        AUDIO_SAMPLE_RATE, AUDIO_CHANNEL_CONFIG, AUDIO_FORMAT);       // 1280 here

// 1. Construct the AudioRecord:
// (audio source (microphone), sample rate, channel config, sample format, capture buffer size)
mAudioRecorder = new AudioRecord(MediaRecorder.AudioSource.MIC, AUDIO_SAMPLE_RATE,
        AUDIO_CHANNEL_CONFIG, AUDIO_FORMAT, AUDIO_BUFFER_SIZE);

// 2. Start capturing
mAudioRecorder.startRecording();
mRecording = true;

// 3. Write the captured data out for AVS to recognize
mAudioRecordingThread = new Thread(new Runnable() {
    @Override
    public void run() {
        writeAudioDataToFile();
    }
});
mAudioRecordingThread.start();

// 4. Stop capturing after at most MAX_TIME
mHandler.postDelayed(mStopAudioRunnable, MAX_TIME);

Once an interaction with Alexa has been initiated, the microphone stays open until one of the following happens (see the stopRecording() sketch after this list):

  1. A StopCapture directive is received
  2. The stream is closed by the Alexa service
  3. The user manually closes the microphone
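In each case the client ends up stopping the recorder. The stopRecording() helper referenced by the capture code below is not shown in the original post; a minimal sketch, assuming the member names used above:

// Hedged sketch: stop capture and release the recorder.
private void stopRecording() {
    mRecording = false;                           // ends the writeTo() loop below
    mHandler.removeCallbacks(mStopAudioRunnable); // cancel the MAX_TIME fallback
    if (mAudioRecorder != null) {
        mAudioRecorder.stop();
        mAudioRecorder.release();                 // free the native recorder
        mAudioRecorder = null;
    }
}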

2. Sending speech to AVS: the Recognize event

The Recognize event is used to send the user's speech to AVS. The event is a two-part message: the first part is a JSON object, and the second is the stream of audio bytes captured by the product's microphone.

All captured audio sent to AVS should be encoded as:

  • 16-bit linear PCM
  • 16 kHz sample rate
  • Single channel
  • Little-endian byte order

Sample Message

{
    "context": [
        // This is an array of context objects that are used to communicate the
        // state of all client components to Alexa. See Context for details.      
    ],   
    "event": {
        "header": {
            "namespace": "SpeechRecognizer",
            "name": "Recognize",
            "messageId": "{{STRING}}",
            "dialogRequestId": "{{STRING}}"
        },
        "payload": {
            "profile": "{{STRING}}",
            "format": "{{STRING}}",
            "initiator": {
                "type": "{{STRING}}",
                "payload": {
                    "wakeWordIndices": {
                        "startIndexInSamples": {{LONG}},
                        "endIndexInSamples": {{LONG}}
                    },
                    "token": "{{STRING}}"   
                }
            }
        }
    }
}

Binary Audio Attachment

Every Recognize event requires an audio byte stream; the headers for each audio part are shown below:

Content-Disposition: form-data; name="audio"
Content-Type: application/octet-stream

{{BINARY AUDIO ATTACHMENT}}
The corresponding Android request:
RequestBody audioBody = new MultipartBody.Builder()
        .setType(MultipartBody.FORM)
        .addPart(Headers.of(DISPOSITION, META_DATA),
                RequestBody.create(JSON_TYPE,
                        JsonHelper.getRecognizeJson(mPlayToken, mMsgId, mDialogReqId)))
        .addPart(Headers.of(DISPOSITION, AUDIO_DATA), streamBody)
        .build();

Request request3 = new Request.Builder()
        .url(AVS_EVENTS_URL)
        .post(audioBody)
        .addHeader(AUTHORIZATION, BEARER + mAuthToken)
        .addHeader(CONTENT_TYPE, MULTI_DATA)
        .build();

Log.i(TAG, "[http call] recognize audio request");
mAudioCall = mClient.newCall(request3);
Here streamBody is the audio data stream captured by the microphone, i.e., mRequestBody below:
mRequestBody = new RequestBody() {
    @Override
    public MediaType contentType() {
        return AVSMgr.OCTET_TYPE;
    }

    // Write the audio data from the AUDIO_BUFFER_SIZE capture buffer into the BufferedSink
    @Override
    public void writeTo(BufferedSink bufferedSink) throws IOException {
        int readSize;
        byte[] buffer = new byte[AUDIO_BUFFER_SIZE];

        while (mRecording) {
            readSize = mAudioRecorder.read(buffer, 0, AUDIO_BUFFER_SIZE);

            if (AudioRecord.ERROR_INVALID_OPERATION != readSize) {
                try {
                    // Forward the captured bytes in AMZN_AUDIO_BYTE_SIZE chunks
                    int sizeCount = readSize / AMZN_AUDIO_BYTE_SIZE;
                    if (sizeCount > 0) {
                        byte[][] bytes = new byte[sizeCount][AMZN_AUDIO_BYTE_SIZE];
                        for (int i = 0, start = 0, end = AMZN_AUDIO_BYTE_SIZE; i < sizeCount;
                                i++, start += AMZN_AUDIO_BYTE_SIZE, end += AMZN_AUDIO_BYTE_SIZE) {
                            bytes[i] = Arrays.copyOfRange(buffer, start, end);
                            bufferedSink.write(bytes[i]);
                        }
                    }

                    // Rough silence detection based on the mean absolute amplitude
                    mTimeCounter++;
                    mCounter++;
                    if (mTimeCounter > 4 && mCounter > FIRST_RECORD_COUNTER) {
                        long sum = 0;
                        for (int i = 0; i < buffer.length; i++) {
                            sum += Math.abs(buffer[i]);
                        }
                        double rawAmplitude = sum / (double) readSize;
                        mAmplitude = Math.max(rawAmplitude, mAmplitude);
                        Log.d(TAG, "rawAmp: " + mAmplitude);

                        if (rawAmplitude < 33.3 && rawAmplitude > 25) {
                            Log.d(TAG, "no sound detected");
                            mNoSoundCounter++;
                        } else if (rawAmplitude > 33.3) {
                            Log.d(TAG, "sound detected");
                            mNoSoundCounter = 0;
                        }

                        mTimeCounter = 0;
                        mAmplitude = 0;
                    }

                    // Stop after several consecutive silent windows
                    if (mNoSoundCounter > 3) {
                        stopRecording();
                    }
                } catch (IOException e) {
                    Log.e(TAG, "Error writing audio data", e);
                }
            }
        }

        Log.i(TAG, "Finish speaking");
        if (null != mListener) {
            mListener.onFinishRecording();
        } else {
            Log.e(TAG, "RecordMgr: mListener == null.");
        }
    }
};
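The JsonHelper.getRecognizeJson() helper used in the request above is likewise not shown in the original post. A minimal sketch of the Recognize event JSON it might produce, assuming a tap-to-talk initiator (profile CLOSE_TALK, no wake-word indices):

// Hedged sketch of the Recognize event JSON builder.
public static String getRecognizeJson(String playToken, String msgId, String dialogReqId)
        throws JSONException {
    JSONObject header = new JSONObject()
            .put("namespace", "SpeechRecognizer")
            .put("name", "Recognize")
            .put("messageId", msgId)
            .put("dialogRequestId", dialogReqId);
    JSONObject payload = new JSONObject()
            .put("profile", "CLOSE_TALK")
            .put("format", "AUDIO_L16_RATE_16000_CHANNELS_1")
            .put("initiator", new JSONObject().put("type", "TAP"));
    JSONObject event = new JSONObject().put("header", header).put("payload", payload);
    return new JSONObject()
            .put("context", new JSONArray())   // real code reports component states here
            .put("event", event)
            .toString();
}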

3. Directives returned by AVS

       a) StopCapture directive

When AVS has identified the user's intent or detected the end of the user's speech, this directive instructs the client to stop capturing audio. On receiving it, the client must close the microphone and stop listening for user speech.

Sample Message

{
    "directive": {
        "header": {
            "namespace": "SpeechRecognizer",
            "name": "StopCapture",
            "messageId": "{{STRING}}",
            "dialogRequestId": "{{STRING}}"
        },
        "payload": {
        }
    }
}
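A hedged sketch of dispatching such a directive once its JSON part has been extracted from the multipart message; parseDirective() is a hypothetical helper:

// Dispatch on the directive's namespace and name.
JSONObject directive = parseDirective(jsonPart).getJSONObject("directive");   // hypothetical helper
JSONObject header = directive.getJSONObject("header");
String namespace = header.getString("namespace");
String name = header.getString("name");

if ("SpeechRecognizer".equals(namespace) && "StopCapture".equals(name)) {
    stopRecording();   // close the microphone, stop streaming audio
}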

      b) ExpectSpeech directive

When Alexa needs additional information to complete the user's request, AVS sends an ExpectSpeech directive. It instructs the client to open the microphone and capture a speech stream. If the microphone is not opened within the specified time, the client must send an ExpectSpeechTimedOut event to AVS.

In a multi-turn interaction with Alexa, the device will receive at least one ExpectSpeech directive instructing it to listen for user speech. If an initiator object is included in the ExpectSpeech directive's payload, that initiator object must also be included in the Recognize event returned to Alexa. If the payload contains no initiator, the Recognize event must not contain one either.

Sample Message

{
    "directive": {
        "header": {
            "namespace": "SpeechRecognizer",
            "name": "ExpectSpeech",
            "messageId": "{{STRING}}",
            "dialogRequestId": "{{STRING}}"
        },
        "payload": {
            "timeoutInMilliseconds": {{LONG}},
            "initiator": {{STRING}}
        }
    }
}
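A hedged sketch of honoring timeoutInMilliseconds, assuming payload is the directive's payload object, startRecording() is the counterpart of stopRecording() above, and sendExpectSpeechTimedOutEvent() is a hypothetical helper that posts the event shown next:

// Arm the timeout, then try to reopen the microphone.
long timeout = payload.getLong("timeoutInMilliseconds");
mHandler.postDelayed(() -> {
    if (!mRecording) {
        sendExpectSpeechTimedOutEvent();   // hypothetical helper; event shown below
    }
}, timeout);
startRecording();                          // reopen the microphone, stream a new Recognize event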

 

   c) ExpectSpeechTimedOut event

If the microphone is not opened within the specified time, the client must send an ExpectSpeechTimedOut event to AVS.

Sample Message

{
    "event": {
        "header": {
            "namespace": "SpeechRecognizer",
            "name": "ExpectSpeechTimedOut",
            "messageId": "{{STRING}}",
        },
        "payload": {
        }
    }
}

 

4. Speech synthesis (SpeechSynthesizer Interface)

The SpeechSynthesizer interface is used to deliver Alexa's spoken replies when the user asks a question or makes a request. For example, when the user asks "What's the weather in Seattle?", AVS returns a Speak directive with binary audio to the client.

Speak Directive

When the client needs a spoken response from Alexa, AVS sends a Speak directive. The directive arrives as a multipart message: one part is the JSON directive, the other is the binary audio data.

Sample Message

{
    "directive": {
        "header": {
            "namespace": "SpeechSynthesizer",
            "name": "Speak",
            "messageId": "{{STRING}}",
            "dialogRequestId": "{{STRING}}"
        },
        "payload": {
            "url": "{{STRING}}",
            "format": "{{STRING}}",
            "token": "{{STRING}}"
        }
    }
}

Binary Audio Attachment. The headers for the binary audio part are:

Content-Type: application/octet-stream
Content-ID: {{Audio Item CID}}

{{BINARY AUDIO ATTACHMENT}}
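The format field is typically "AUDIO_MPEG" (MP3), so one hedged way to play the attachment is to write the bytes to a temp file and hand it to MediaPlayer. Here audioBytes is assumed to hold the binary part extracted from the multipart response, and the send*Event() calls are hypothetical helpers for the events described below:

// Hedged sketch: play the Speak attachment with MediaPlayer.
// setDataSource() and the file write may throw IOException.
File speech = new File(context.getCacheDir(), "alexa_speech.mp3");
try (FileOutputStream out = new FileOutputStream(speech)) {
    out.write(audioBytes);
}

MediaPlayer player = new MediaPlayer();
player.setDataSource(speech.getAbsolutePath());
player.setOnPreparedListener(mp -> {
    sendSpeechStartedEvent(token);    // hypothetical helper; event shown below
    mp.start();
});
player.setOnCompletionListener(mp -> {
    sendSpeechFinishedEvent(token);   // hypothetical helper; event shown below
    mp.release();
});
player.prepareAsync();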

SpeechStarted Event

When the client has processed the Speak directive and begins playing the synthesized speech, a SpeechStarted event should be sent to AVS.

Sample Message

{
    "event": {
        "header": {
            "namespace": "SpeechSynthesizer",
            "name": "SpeechStarted",
            "messageId": "{{STRING}}"
        },
        "payload": {
            "token": "{{STRING}}"
        }
    }
}

 

SpeechFinished Event

When the client has processed the Speak directive and Alexa's TTS has been fully rendered to the user (playback has finished), a SpeechFinished event should be sent to AVS. If playback does not finish, e.g., the user interrupts it with "Alexa, stop", SpeechFinished is not sent.

Sample Message

{
    "event": {
        "header": {
            "namespace": "SpeechSynthesizer",
            "name": "SpeechFinished",
            "messageId": "{{STRING}}"
        },
        "payload": {
            "token": "{{STRING}}"
        }
    }
}
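Both lifecycle events can be posted with the same multipart POST pattern used for SynchronizeState. A hedged sketch; buildSpeechEventJson() is a hypothetical helper that produces the JSON above for the given event name and token:

// Post a SpeechSynthesizer lifecycle event ("SpeechStarted" or "SpeechFinished").
private void sendSpeechEvent(String name, String token) throws IOException {
    RequestBody json = RequestBody.create(JSON_TYPE,
            JsonHelper.buildSpeechEventJson(name, token, UUID.randomUUID().toString()));
    RequestBody body = new MultipartBody.Builder().setType(MultipartBody.FORM)
            .addPart(Headers.of(DISPOSITION, META_DATA), json)
            .build();
    Request request = new Request.Builder().url(AVS_EVENTS_URL).post(body)
            .addHeader(AUTHORIZATION, BEARER + mAuthToken)
            .addHeader(CONTENT_TYPE, MULTI_DATA)
            .build();
    mClient.newCall(request).execute().close();   // event streams must be closed promptly
}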