Summary of Issues with Calling the iFlytek Speech SDK on HoloLens - CSDN Blog

Article link: https://blog.csdn.net/VR_Utopia/article/details/60468167


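HoloLens only supports English speech recognition, but my project needs Chinese speech recognition. iFlytek is one of the best-known speech-recognition vendors in China, so I tried to graft its Windows SDK into Unity3D.

Unfortunately, the program runs when debugging in Unity but does not run properly on the HoloLens. This is probably because the iFlytek SDK does not support Windows 10 (HoloLens apps are built as UWP apps). For now I can only wait for iFlytek to update the SDK.

While developing with iFlytek, my progress stalled for a while because the audio format Unity provides differs from the one iFlytek requires. After some effort I worked out how to convert between them, and I am sharing it here.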

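To use iFlytek speech recognition, you have to upload audio data to the iFlytek server. iFlytek requires this audio to be PCM with a 16 kHz sample rate, 16-bit samples and a single channel (mono). The audio stream Unity obtains from the microphone, however, is an array of 32-bit floating-point, two-channel (stereo) samples. The data therefore has to be converted before it can be used. The workflow is as follows: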
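The following small sketch makes the size difference concrete by computing the raw byte rate of a PCM format. `PcmMath`, `BytesPerSecond` and `BytesForMilliseconds` are hypothetical names used only for illustration; they are not part of the iFlytek SDK or HoloToolkit. The comparison also assumes that both streams run at 16 kHz, which the text does not state for the Unity side.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the iFlytek SDK or HoloToolkit.
public static class PcmMath
{
    // Raw PCM bytes produced per second: sample rate * bytes per sample * channels.
    public static int BytesPerSecond(int sampleRate, int bitsPerSample, int channels)
    {
        if (bitsPerSample % 8 != 0)
            throw new ArgumentException("bitsPerSample must be a multiple of 8");
        return sampleRate * (bitsPerSample / 8) * channels;
    }

    // Bytes in one processing period of the given length in milliseconds.
    // Uses long arithmetic so long periods do not overflow int.
    public static int BytesForMilliseconds(int sampleRate, int bitsPerSample, int channels, int ms)
    {
        long perSecond = BytesPerSecond(sampleRate, bitsPerSample, channels);
        return (int)(perSecond * ms / 1000);
    }
}
```

For the iFlytek format, `BytesPerSecond(16000, 16, 1)` is 32000. For 32-bit float stereo at the same rate, `BytesPerSecond(16000, 32, 2)` is 128000, four times as much. A 40 ms processing period in the target format, `BytesForMilliseconds(16000, 16, 1, 40)`, is 1280 bytes. The 40 ms figure is only an example, not an iFlytek requirement.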

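1. Get the microphone audio data in Unity

2. Set the sampling period, the number of samples and the processing period

3. Convert the 32-bit float stereo array into a 16-bit mono audio array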
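Step 3 could look roughly like the sketch below. It is one straightforward way to do the conversion, not necessarily the method used in this post. The sketch does the following:

- Averages the left and right samples of interleaved stereo floats.
- Clamps the result to [-1, 1].
- Scales it to a signed 16-bit integer and writes little-endian bytes.

`PcmConverter.StereoFloatToMono16` is a hypothetical name. The sketch does not resample, so it assumes the input is already at 16 kHz.

```csharp
using System;

// Hypothetical helper: downmixes interleaved float stereo (L, R, L, R, ...)
// to 16-bit little-endian mono PCM bytes. It does NOT resample; the input is
// assumed to already be at the target sample rate (e.g. 16 kHz).
public static class PcmConverter
{
    public static byte[] StereoFloatToMono16(float[] interleaved)
    {
        if (interleaved == null) throw new ArgumentNullException("interleaved");

        // One frame = one left + one right sample; a trailing odd sample is ignored.
        int frames = interleaved.Length / 2;
        byte[] pcm = new byte[frames * 2]; // 2 bytes per 16-bit mono sample

        for (int i = 0; i < frames; i++)
        {
            // Average the two channels to get one mono sample.
            float mono = (interleaved[2 * i] + interleaved[2 * i + 1]) * 0.5f;

            // Clamp to the valid float audio range before scaling.
            if (mono > 1f) mono = 1f;
            else if (mono < -1f) mono = -1f;

            short s = (short)(mono * short.MaxValue);

            // Low byte first (little-endian), as used by raw PCM on Windows.
            pcm[2 * i] = (byte)(s & 0xFF);
            pcm[2 * i + 1] = (byte)((s >> 8) & 0xFF);
        }
        return pcm;
    }
}
```

Averaging the channels keeps the level of a centered voice roughly unchanged. The explicit byte writes avoid depending on the machine's endianness, as `BitConverter.GetBytes` would.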

1. Getting the microphone audio stream in Unity

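The feature my project implements is real-time speech transcription. Instead of pressing a button to record a clip and uploading it for recognition, the app has to read data from the microphone and upload it continuously, so it needs to read the audio stream (AudioStream).

The approach suggested online is to read the microphone buffer with the third-party library NAudio. In my tests, however, NAudio would not compile in Unity (strictly speaking, NAudio does not support UWP builds), so I abandoned that approach. I then looked through the Input module of HoloToolkit, which provides a MicStream script for reading the microphone buffer. Here is the code: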

// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in the project root for license information.


using System.Runtime.InteropServices;
using UnityEngine;
using System.Text;


namespace HoloToolkit.Unity.InputModule
{
    public class MicStream
    {
        // This class replaces Unity's Microphone object
        // This class is made for HoloLens mic stream selection, but should work well on all windows 10 devices
        // chooses from one of three possible microphone modes on HoloLens
        // There is an example of how to use this script in HoloToolkit\Input\Tests\Scripts\MicStreamDemo.cs


        // Streams: LOW_QUALITY_VOICE is optimized for speech analysis, HIGH_QUALITY_VOICE is higher quality voice and is probably preferred
        //          ROOM_CAPTURE tries to get the sounds of the room more than the voice of the user
        // can only be set on initialization
        public enum StreamCategory { LOW_QUALITY_VOICE, HIGH_QUALITY_VOICE, ROOM_CAPTURE }


        public enum ErrorCodes { ALREADY_RUNNING = -10, NO_AUDIO_DEVICE, NO_INPUT_DEVICE, ALREADY_RECORDING, GRAPH_NOT_EXIST, CHANNEL_COUNT_MISMATCH, FILE_CREATION_PERMISSION_ERROR, NOT_ENOUGH_DATA, NEED_ENABLED_MIC_CAPABILITY };


        const int MAX_PATH = 260; // 260 is maximum path length in windows, to be returned when we MicStopRecording


        [UnmanagedFunctionPointer(CallingConvention.StdCall)] // If included in MicStartStream, this callback will be triggered when audio data is ready. This is not the preferred method for Game Engines and can probably be ignored.
        public delegate void LiveMicCallback();


        /// <summary>
        /// Called before calling MicStartStream or MicStartRecording to initialize microphone
        /// </summary>
        /// <param name="category">One of the entries in the StreamCategory enumeration</param>
        /// <returns>error code or 0</returns>
        [DllImport("MicStreamSelector", ExactSpelling = true)]
        public static extern int MicInitializeDefault(int category);


        /// 
        /// Called