[洪流学堂] HoloLens Advanced Development 3: Voice

大智_Unity玩家

Article link: https://blog.csdn.net/zhenghongzhi6/article/details/78750346

This tutorial is based on Unity 2017.2 and Visual Studio 2017.
This tutorial was written on December 8, 2017.

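In this article

Design voice commands and optimize them for the HoloLens speech engine
Let users know which voice commands are available
Give feedback once the user's voice command has been heard
Use the Dictation Recognizer to understand what the user is saying
Use the Grammar Recognizer to listen for commands based on an SRGS (Speech Recognition Grammar Specification) file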

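Prerequisites

Resource download

This article uses the assets from the official tutorial.
Original address
If you have trouble downloading, use the Baidu Cloud address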

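0 Create the project

Extract the downloaded assets.
Open Unity, choose Open, then select and open the HolographicAcademy-Holograms-212-Voice\Starting\ModelExplorer folder. (Unity will warn about a version mismatch; click Continue.)
In the Project panel, open the Scenes folder and double-click the ModelExplorer scene.
Open File > Build Settings, select Universal Windows Platform, and click the Switch Platform button.
Set Target device to HoloLens and check Unity C# Projects.
In the Build Settings window, click Build, create a new folder named App, and select it.
When the build finishes, open ModelExplorer.sln in the App folder, change Debug to Release and ARM to x86, and select HoloLens Emulator as the target.
Choose Debug > Start Without Debugging, or press Ctrl+F5. (Note: a slow emulator startup can cause the deployment to time out. If that happens, leave the emulator open and press Ctrl+F5 again.)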

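1 What you should know about voice commands

What to do

Create concise commands. Do not use "Play the currently selected video": it is not concise and users will easily forget it. Use "Play Video" instead, which is concise and has multiple syllables.
Use a simple vocabulary. Always try to use common words and phrases that users can easily discover and remember. For example, if your application has a note object that can be shown or hidden, do not use the command "Show Placard", because "placard" is a rarely used term. Use the command "Show Note" to display the note instead.
Be consistent. Voice commands should stay consistent across your application. Imagine that your application has two scenes, each containing a button that closes the application. If the first scene triggers the button with the command "Exit" while the second scene uses "Close App", users will be very confused. If the same functionality persists across multiple scenes, the same voice command should trigger it.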

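What to avoid

Using single-syllable commands. For example, if you are creating a voice command to play a video, avoid the simple command "Play": it is only a single syllable and the system can easily miss it. Use "Play Video" instead, which is concise and has multiple syllables.
Using system commands. The "Select" command is reserved by the system to trigger a Tap event on the object that currently has focus. Do not use "Select" in a keyword or phrase, because it may not work as expected. For example, if the voice command for selecting a cube in your application is "Select cube", but the user is looking at a sphere when saying it, the sphere is selected instead. Similarly, the app bar commands are always enabled. Do not use the following voice commands in a CoreWindow view:
1. Go Back
2. Scroll Tool
3. Zoom Tool
4. Drag Tool
5. Adjust
6. Remove
Using similar-sounding commands. Try to avoid voice commands that rhyme. If a shopping application has both "Show Store" and "Show More" as voice commands, one of them should be disabled while the other is in use. For example, you could use the "Show Store" button to open the store, then disable that command while the store is displayed so that the "Show More" command can be used for browsing.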
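
The "Show Store" / "Show More" advice can be sketched with two separate recognizers, only one of which is running at any time. This is a minimal illustration and not code from the tutorial project: the class name StoreVoiceCommands and the OpenStore, ShowMore and CloseStore methods are hypothetical, and the sketch assumes that stopping a recognizer from inside its own callback is acceptable.

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

// Hypothetical example component; not part of the ModelExplorer project.
public class StoreVoiceCommands : MonoBehaviour
{
    private KeywordRecognizer showStoreRecognizer;
    private KeywordRecognizer showMoreRecognizer;

    void Start()
    {
        // Each rhyming command gets its own recognizer so they can be toggled independently.
        showStoreRecognizer = new KeywordRecognizer(new[] { "Show Store" });
        showStoreRecognizer.OnPhraseRecognized += args => OpenStore();

        showMoreRecognizer = new KeywordRecognizer(new[] { "Show More" });
        showMoreRecognizer.OnPhraseRecognized += args => ShowMore();

        // Only "Show Store" is listened for until the store is open.
        showStoreRecognizer.Start();
    }

    private void OpenStore()
    {
        // While the store is visible, "Show Store" is disabled and "Show More" takes over.
        showStoreRecognizer.Stop();
        showMoreRecognizer.Start();
        Debug.Log("Store opened");
    }

    private void ShowMore()
    {
        Debug.Log("Showing more items");
    }

    // Call this when the store is dismissed to swap the commands back.
    public void CloseStore()
    {
        showMoreRecognizer.Stop();
        showStoreRecognizer.Start();
    }

    void OnDestroy()
    {
        showStoreRecognizer.Dispose();
        showMoreRecognizer.Dispose();
    }
}
```

Keeping the two phrases in separate recognizers means the speech engine never has to choose between them, which is the point of the guideline.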

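Instructions

In the Hierarchy panel, use the search box at the top to find the holoComm_screen_mesh object.
Double-click the holoComm_screen_mesh object to center it in the Scene panel. This is the astronaut's watch, which will respond to our voice commands.
Expand the Keyword Manager component, then expand the Keywords and Responses section to see the supported voice command: Open Communicator.
Double-click to open the KeywordManager.cs script and look at how it uses KeywordRecognizer to add voice commands and delegates to handle them.
Build and test!
- Gaze at the astronaut's watch. The cursor changes to a microphone icon, indicating that the application is listening for voice commands.
- A tooltip appears above the watch to tell the user which commands are available.
- Say "Open Communicator" to open the communicator panel.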
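
For readers who want the core idea without opening KeywordManager.cs, here is a minimal sketch of the same pattern: a KeywordRecognizer built from a set of phrases, with a delegate looked up for each recognized phrase. The class name SimpleVoiceCommands and the logging handler are assumptions made for illustration; the tutorial's actual KeywordManager is configured through the Inspector and is more elaborate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using UnityEngine;
using UnityEngine.Windows.Speech;

// Hypothetical component; the tutorial's real implementation is KeywordManager.cs.
public class SimpleVoiceCommands : MonoBehaviour
{
    private KeywordRecognizer keywordRecognizer;

    // Maps each spoken phrase to the delegate that handles it.
    private readonly Dictionary<string, Action> commands = new Dictionary<string, Action>();

    void Start()
    {
        // Placeholder handler: a real application would open the communicator here.
        commands.Add("Open Communicator", () => Debug.Log("Heard: Open Communicator"));

        // The keyword list is fixed when the recognizer is constructed.
        keywordRecognizer = new KeywordRecognizer(commands.Keys.ToArray());
        keywordRecognizer.OnPhraseRecognized += OnPhraseRecognized;
        keywordRecognizer.Start();
    }

    private void OnPhraseRecognized(PhraseRecognizedEventArgs args)
    {
        // args.text holds the phrase that was recognized.
        Action handler;
        if (commands.TryGetValue(args.text, out handler))
        {
            handler();
        }
    }

    void OnDestroy()
    {
        if (keywordRecognizer != null)
        {
            keywordRecognizer.OnPhraseRecognized -= OnPhraseRecognized;
            keywordRecognizer.Dispose();
        }
    }
}
```

Attaching this component to any active GameObject is enough to try it. The Microphone capability must be enabled for the recognizer to receive audio.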

2 Acknowledgement

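Note

The Microphone capability must be enabled. It is already enabled in this project, but remember to enable it yourself when you create your own project:
1. Open Edit > Project Settings > Player.
2. Switch to the Universal Windows Platform tab.
3. In the Publishing Settings > Capabilities section, check the Microphone capability.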

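Instructions

In Unity's Hierarchy panel, select the holoComm_screen_mesh object.
In the Inspector panel, find the Astronaut Watch (Script) component.
Click the small blue cube (Communicator) that is set as the value of the Communicator Prefab property.
In the Project panel, the Communicator prefab should now have focus.
Click the Communicator prefab in the Project panel to view its components in the Inspector panel.
Look at the Microphone Manager (Script) component, which allows us to record the user's voice.
Note that the Communicator prefab has a Keyword Manager (Script) component for responding to the "Send Message" command.
Look at the Communicator (Script) component, then double-click the script to open it in Visual Studio.
Communicator.cs is responsible for setting the correct button states on the communicator device. This lets our users record a message, play it back, and send it to the astronaut. It also starts and stops an animated waveform to confirm to users that their voice is being heard.
In Communicator.cs, delete the following lines from the Start method. This enables the Record button on the communicator.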

// TODO: 2.a Delete the following two lines:
RecordButton.SetActive(false);
MessageUIRenderer.gameObject.SetActive(false);

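Build and test!
- Gaze at the astronaut's watch and say "Open Communicator" to show the communicator panel.
- Tap the Record button (microphone icon) to start recording a voice message for the astronaut.
- Start speaking and check that the waveform on the communicator panel animates. This is the feedback that the user's voice is being heard.
- Tap the Stop button (square icon on the left) and check that the waveform stops animating.
- Tap the Play button (triangle icon on the right) to play back the recorded message.
- Tap the Stop button (square icon on the right) to stop playback.
- Say "Send Message" to close the communicator panel. The screen should show 'Message Received'.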

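3 Understanding

Objectives

Use the Dictation Recognizer to convert the user's speech to text.
Show the Dictation Recognizer's hypothesized and final results in the communicator panel.

In this chapter, we will use the Dictation Recognizer to create a message for the astronaut. When using the Dictation Recognizer, keep the following in mind:

You must be connected to WiFi for the Dictation Recognizer to work.
Timeouts occur after a set period of time. There are two timeouts to be aware of:
- If the recognizer starts and hears no audio during the first five seconds, it times out.
- If the recognizer has produced a result but then hears silence for twenty seconds, it times out.
Only one type of recognizer (keyword or dictation) can run at a time.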
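
The two timeouts and the one-recognizer-at-a-time rule can be sketched as follows. This is a hedged illustration separate from the tutorial's MicrophoneManager.cs: the class name DictationTimeoutSketch and the BeginDictation method are hypothetical. The two timeout properties are set explicitly to the five-second and twenty-second values described above only to make them visible.

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

// Hypothetical component written for illustration only.
public class DictationTimeoutSketch : MonoBehaviour
{
    private DictationRecognizer dictationRecognizer;

    void Start()
    {
        dictationRecognizer = new DictationRecognizer();

        // Times out if no audio is heard in the first 5 seconds after starting.
        dictationRecognizer.InitialSilenceTimeoutSeconds = 5f;

        // Times out if 20 seconds of silence follow a recognition.
        dictationRecognizer.AutoSilenceTimeoutSeconds = 20f;

        dictationRecognizer.DictationComplete += OnDictationComplete;
    }

    // Call this from a Record button, for example.
    public void BeginDictation()
    {
        // Keyword (phrase) recognition and dictation cannot run together,
        // so the phrase recognition system is shut down first.
        if (PhraseRecognitionSystem.Status == SpeechSystemStatus.Running)
        {
            PhraseRecognitionSystem.Shutdown();
        }

        dictationRecognizer.Start();
    }

    private void OnDictationComplete(DictationCompletionCause cause)
    {
        if (cause == DictationCompletionCause.TimeoutExceeded)
        {
            Debug.Log("Dictation timed out after a period of silence.");
        }

        // Bring phrase recognition back once dictation has finished.
        // Individual keyword recognizers may still need Start() called again.
        PhraseRecognitionSystem.Restart();
    }

    void OnDestroy()
    {
        if (dictationRecognizer != null)
        {
            dictationRecognizer.DictationComplete -= OnDictationComplete;
            dictationRecognizer.Dispose();
        }
    }
}
```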

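Note

Instructions

We will edit MicrophoneManager.cs to use the Dictation Recognizer. Here is what we will add:
1. When the Record button is pressed, start the DictationRecognizer.
2. Display the hypothesis of what the DictationRecognizer understood.
3. Lock in the result of what the DictationRecognizer understood.
4. Check for timeouts from the DictationRecognizer.
5. When the Stop button is pressed or the microphone session times out, stop the DictationRecognizer.
6. Restart the KeywordRecognizer, which will listen for the Send Message command.

Now complete all the coding exercises marked 3.a in MicrophoneManager.cs. The final, complete code is shown below: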

using Academy.HoloToolkit.Unity;
using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.UI;
using UnityEngine.Windows.Speech;

public class MicrophoneManager : MonoBehaviour
{
    [Tooltip("A text area for the recognizer to display the recognized strings.")]
    public Text DictationDisplay;

    private DictationRecognizer dictationRecognizer;

    // Use this string to cache the text currently displayed in the text box.
    private StringBuilder textSoFar;

    // Using an empty string specifies the default microphone. 
    private static string deviceName = string.Empty;
    private int samplingRate;
    private const int messageLength = 10;

    // Use this to reset the UI once the Microphone is done recording after it was started.
    private bool hasRecordingStarted;

    void Awake()
    {
        /* TODO: DEVELOPER CODING EXERCISE 3.a */

        // 3.a: Create a new DictationRecognizer and assign it to dictationRecognizer variable.
        dictationRecognizer = new DictationRecognizer();

        // 3.a: Register for dictationRecognizer.DictationHypothesis and implement DictationHypothesis below
        // This event is fired while the user is talking. As the recognizer listens, it provides text of what it's heard so far.
        dictationRecognizer.DictationHypothesis += DictationRecognizer_DictationHypothesis;

        // 3.a: Register for dictationRecognizer.DictationResult and implement DictationResult below
        // This event is fired after the user pauses, typically at the end of a sentence. The full recognized string is returned here.
        dictationRecognizer.DictationResult += DictationRecognizer_DictationResult;

        // 3.a: Register for dictationRecognizer.DictationComplete and implement DictationComplete below
        // This event is fired when the recognizer stops, whether from Stop() being called, a timeout occurring, or some other error.
        dictationRecognizer.DictationComplete += DictationRecognizer_DictationComplete;

        // 3.a: Register for dictationRecognizer.DictationError and implement DictationError below
        // This event is fired when an error occurs.
        dictationRecognizer.DictationError += DictationRecognizer_DictationError;

        // Query the maximum frequency of the default microphone. Use 'unused' to ignore the minimum frequency.
        int unused;
        Microphone.GetDeviceCaps(deviceName, out unused, out samplingRate);

        // Use this string to cache the text currently displayed in the text box.
        textSoFar = new StringBuilder();

        // Use this to reset the UI once the Microphone is done recording after it was started.
        hasRecordingStarted = false;
    }

    void Update()
    {
        // 3.a: Add condition to check if dictationRecognizer.Status is Running
        if (hasRecordingStarted && !Microphone.IsRecording(deviceName) && dictationRecognizer.Status == SpeechSystemStatus.Running)
        {
            // Reset the flag now that we're cleaning up the UI.
            hasRecordingStarted = false;

            // This acts like pressing the Stop button and sends the message to the Communicator.
            // If the microphone stops as a result of timing out, make sure to manually stop the dictation recognizer.
            // Look at the StopRecording function.
            SendMessage("RecordStop");
        }
    }

    /// <summary>
    /// Turns on the dictation recognizer and begins recording audio from the default microphone.
    /// </summary>
    /// <returns>The audio clip recorded from the microphone.</returns>
    public AudioClip StartRecording()
    {
        // 3.a Shutdown the PhraseRecognitionSystem. This controls the KeywordRecognizers
        PhraseRecognitionSystem.Shutdown();

        // 3.a: Start dictationRecognizer
        dictationRecognizer.Start();

        // 3.a Uncomment this line
        DictationDisplay.text = "Dictation is starting. It may take time to display your text the first time, but begin speaking now...";

        // Set the flag that we've started recording.
        hasRecordingStarted = true;

        // Start recording from the microphone for 10 seconds.
        return Microphone.Start(deviceName, false, messageLength, samplingRate);
    }

    /// <summary>
    /// Ends the recording session.
    /// </summary>
    public void StopRecording()
    {
        // 3.a: Check if dictationRecognizer.Status is Running and stop it if so
        if (dictationRecognizer.Status == SpeechSystemStatus.Running)
        {
            dictationRecognizer.Stop();
        }

        Microphone.End(deviceName);
    }

    /// <summary>
    /// This event is fired while the user is talking. As the recognizer listens, it provides text of what it's heard so far.
    /// </summary>
    /// <param name="text">The currently hypothesized recognition.</param>
    private void DictationRecognizer_DictationHypothesis(string text)
    {
        // 3.a: Set DictationDisplay text to be textSoFar and new hypothesized text
        // We don't want to append to textSoFar yet, because the hypothesis may have changed on the next event
        DictationDisplay.text = textSoFar.ToString() + " " + text + "...";
    }

    /// <summary>
    /// This event is fired after the user pauses, typically at the end of a sentence. The full recognized string is returned here.
    /// </summary>
    /// <param name="text">The text that was heard by the recognizer.</param>
    /// <param name="confidence">A representation of how confident (rejected, low, medium, high) the recognizer is of this recognition.</param>
    private void DictationRecognizer_DictationResult(string text, ConfidenceLevel confidence)
    {
        // 3.a: Append textSoFar with latest text
        textSoFar.Append(text + ". ");

        // 3.a: Set DictationDisplay text to be textSoFar
        DictationDisplay.text = textSoFar.ToString();
    }

    /// <summary>
    /// This event is fired when the recognizer stops, whether from Stop() being called, a timeout occurring, or some other error.
    /// Typically, this will simply return "Complete". In this case, we check to see if the recognizer timed out.
    /// </summary>
    /// <param name="cause">An enumerated reason for the session completing.</param>
    private void DictationRecognizer_DictationComplete(DictationCompletionCause cause)
    {
        // If Timeout occurs, the user has been silent for too long.
        // With dictation, the default timeout after a recognition is 20 seconds.
        // The default timeout with initial silence is 5 seconds.
        if (cause == DictationCompletionCause.TimeoutExceeded)
        {
            Microphone.End(deviceName);

            DictationDisplay.text = "Dictation has timed out. Please press the record button again.";
            SendMessage("ResetAfterTimeout");
        }
    }

    /// <summary>
    /// This event is fired when an error occurs.
    /// </summary>
    /// <param name="error">The string representation of the error reason.</param>
    /// <param name="hresult">The int representation of the hresult.</param>
    private void DictationRecognizer_DictationError(string error, int hresult)
    {
        // 3.a: Set DictationDisplay text to be the error string
        DictationDisplay.text = error + "\nHRESULT: " + hresult;
    }

    private IEnumerator RestartSpeechSystem(KeywordManager keywordToStart)
    {
        while (dictationRecognizer != null && dictationRecognizer.Status == SpeechSystemStatus.Running)
        {
            yield return null;
        }

        keywordToStart.StartKeywordRecognizer();
    }
}

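Build and deploy

Press Ctrl+F5 in Visual Studio to run.
Dismiss the welcome screen with an air-tap gesture.
Gaze at the astronaut's watch and say "Open Communicator".
Tap the Record button (microphone icon) to record your message.
Start speaking. The Dictation Recognizer converts your speech to text and shows it on the communicator panel.
Try saying "Send Message" while recording a message. Notice that the Keyword Recognizer does not respond, because the Dictation Recognizer is still active and only one of the two can be active at a time.
Pause for a few seconds. Watch the Dictation Recognizer finalize its hypothesis and display the final result.
Start speaking, then pause for 20 seconds. This causes the Dictation Recognizer to time out.
Notice that the Keyword Recognizer is re-enabled after the Dictation Recognizer times out. The communicator now listens for voice commands.
Say "Send Message" to send the message to the astronaut.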

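4 Grammar Recognizer

Objectives

Use the Grammar Recognizer to recognize the user's speech according to an SRGS (Speech Recognition Grammar Specification) file.

Note

Instructions

In the Hierarchy panel, search for Jetpack_Center and select it.
In the Inspector panel, find the Interactible Action script.
Click the small circle to the right of the Object To Tag Along field.
In the window that pops up, search for SRGSToolbox and select it from the list.
Take a look at the SRGSColor.xml file in the StreamingAssets folder.

The SRGS design specification can be found on the W3C website.

Our SRGS file has three types of rules:
- A rule that lets you say one color from a list of twelve colors.
- Three rules that listen for a combination of the color rule and one of three shapes.
- The root rule, colorChooser, which listens for any combination of the three "color + shape" rules. The shapes can be said in any order and in any number from one to three. This is the only rule that is listened for, because it is specified as the root rule in the initial <grammar> tag at the top of the file.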
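
To show how such a grammar file is consumed from a script, here is a minimal sketch that loads SRGSColor.xml with Unity's GrammarRecognizer and logs the semantic meanings returned for each recognized phrase. The class name GrammarSketch is hypothetical, and the sketch assumes the file sits directly in the StreamingAssets folder. The keys and values that appear in the log depend on the tags defined in the grammar file, which is not reproduced here.

```csharp
using System.IO;
using System.Text;
using UnityEngine;
using UnityEngine.Windows.Speech;

// Hypothetical component written for illustration only.
public class GrammarSketch : MonoBehaviour
{
    // File name from the tutorial; its location directly under StreamingAssets is assumed.
    public string SRGSFileName = "SRGSColor.xml";

    private GrammarRecognizer grammarRecognizer;

    void Start()
    {
        string grammarPath = Path.Combine(Application.streamingAssetsPath, SRGSFileName);

        // The recognizer listens only for the root rule declared in the grammar file.
        grammarRecognizer = new GrammarRecognizer(grammarPath);
        grammarRecognizer.OnPhraseRecognized += OnPhraseRecognized;
        grammarRecognizer.Start();
    }

    private void OnPhraseRecognized(PhraseRecognizedEventArgs args)
    {
        // Each SemanticMeaning pairs a key from the grammar with the values that were matched.
        SemanticMeaning[] meanings = args.semanticMeanings;
        if (meanings == null)
        {
            return;
        }

        var summary = new StringBuilder();
        foreach (SemanticMeaning meaning in meanings)
        {
            summary.AppendLine(meaning.key + ": " + string.Join(", ", meaning.values));
        }

        Debug.Log("Recognized \"" + args.text + "\"\n" + summary);
    }

    void OnDestroy()
    {
        if (grammarRecognizer != null)
        {
            grammarRecognizer.OnPhraseRecognized -= OnPhraseRecognized;
            grammarRecognizer.Dispose();
        }
    }
}
```

Like the keyword recognizer, a grammar recognizer is part of the phrase recognition system, so it cannot run while dictation is active.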