Android speech recognition service: Speech recognition using the Speech Service API - Xamarin | Microsoft Docs

weixin_39795419

Published 2021-05-26 08:58:45

Speech recognition using Azure Speech Service

01/14/2020

Azure Speech Service is a cloud-based API that offers the following functionality:

Speech-to-text transcribes audio files or streams to text.

Text-to-speech converts input text into human-like synthesized speech.

Speech translation enables real-time, multi-language translation for both speech-to-text and speech-to-speech.

Voice assistants can create human-like conversation interfaces for applications.

This article explains how speech-to-text is implemented in the sample Xamarin.Forms application using the Azure Speech Service.

[Screenshots: the sample application on iOS and Android]

Create an Azure Speech Service resource

Azure Speech Service is part of Azure Cognitive Services, which provides cloud-based APIs for tasks such as image recognition, speech recognition and translation, and Bing search.

The sample project requires an Azure Cognitive Services resource to be created in your Azure portal. A Cognitive Services resource can be created for a single service, such as Speech Service, or as a multi-service resource. The steps to create a Speech Service resource are as follows:

Create a multi-service or single-service resource.

Obtain the API key and region information for your resource.

Update the sample Constants.cs file.

For a step-by-step guide to creating a resource, see Create a Cognitive Services resource.

Note

If you don't have an Azure subscription, create a free account before you begin. Once you have an account, a single-service resource can be created at the free tier to try out the service.

Configure your app with the Speech Service

After creating a Cognitive Services resource, update the Constants.cs file with the region and API key from your Azure resource:

public static class Constants
{
    public static string CognitiveServicesApiKey = "YOUR_KEY_GOES_HERE";
    public static string CognitiveServicesRegion = "westus";
}

Install NuGet Speech Service package

The sample application uses the Microsoft.CognitiveServices.Speech NuGet package to connect to the Azure Speech Service. Install this NuGet package in the shared project and each platform project.

Create an IMicrophoneService interface

Each platform requires permission to access the microphone. The sample project provides an IMicrophoneService interface in the shared project, and uses the Xamarin.Forms DependencyService to obtain platform implementations of the interface.

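public interface IMicrophoneService
{
    // Completes with true once microphone access has been granted, false otherwise.
    Task<bool> GetPermissionAsync();

    // Called by the platform project when the user responds to the permission prompt.
    void OnRequestPermissionResult(bool isGranted);
}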
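The interface pairs an awaitable request with a callback because a platform permission prompt reports its result later, through a separate platform callback. A minimal sketch of that hand-off in plain .NET follows. It assumes a hypothetical FakeMicrophoneService class (not part of the sample) standing in for a platform implementation, and it shows only how a TaskCompletionSource<bool> connects the two methods:

```csharp
using System.Threading.Tasks;

public interface IMicrophoneService
{
    Task<bool> GetPermissionAsync();
    void OnRequestPermissionResult(bool isGranted);
}

// Hypothetical stand-in for a platform service; not part of the sample.
public class FakeMicrophoneService : IMicrophoneService
{
    TaskCompletionSource<bool> tcsPermissions;

    public Task<bool> GetPermissionAsync()
    {
        // A real platform service would trigger the OS permission prompt here.
        tcsPermissions = new TaskCompletionSource<bool>();
        return tcsPermissions.Task;
    }

    public void OnRequestPermissionResult(bool isGranted)
    {
        // Completes the task that the caller of GetPermissionAsync is awaiting.
        tcsPermissions?.TrySetResult(isGranted);
    }
}
```

With this shape, shared code can simply await GetPermissionAsync, and the task stays pending until something calls OnRequestPermissionResult.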

Create the page layout

The sample project defines a basic page layout in the MainPage.xaml file. The key layout elements are a Button that starts the transcription process, a Label to contain the transcribed text, and an ActivityIndicator to show when transcription is in progress:

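<ContentPage ...>
    ...
    <Label x:Name="transcribedText"
           ... />
    <ActivityIndicator x:Name="transcribingIndicator"
                       IsRunning="False" />
    <Button x:Name="transcribeButton"
            ...
            Clicked="TranscribeClicked"/>
    ...
</ContentPage>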

Implement the Speech Service

The MainPage.xaml.cs code-behind file contains all of the logic to send audio to the Azure Speech Service and receive transcribed text from it.

The MainPage constructor gets an instance of the IMicrophoneService interface from the DependencyService:

public partial class MainPage : ContentPage
{
    SpeechRecognizer recognizer;
    IMicrophoneService micService;
    bool isTranscribing = false;

    public MainPage()
    {
        InitializeComponent();

        // Resolve the platform-specific implementation registered with the DependencyService
        micService = DependencyService.Resolve<IMicrophoneService>();
    }

    // ...
}

The TranscribeClicked method is called when the transcribeButton instance is tapped:

async void TranscribeClicked(object sender, EventArgs e)
{
    bool isMicEnabled = await micService.GetPermissionAsync();

    // EARLY OUT: make sure mic is accessible
    if (!isMicEnabled)
    {
        UpdateTranscription("Please grant access to the microphone!");
        return;
    }

    // initialize speech recognizer
    if (recognizer == null)
    {
        var config = SpeechConfig.FromSubscription(Constants.CognitiveServicesApiKey, Constants.CognitiveServicesRegion);
        recognizer = new SpeechRecognizer(config);
        recognizer.Recognized += (obj, args) =>
        {
            UpdateTranscription(args.Result.Text);
        };
    }

    // if already transcribing, stop speech recognizer
    if (isTranscribing)
    {
        try
        {
            await recognizer.StopContinuousRecognitionAsync();
        }
        catch(Exception ex)
        {
            UpdateTranscription(ex.Message);
        }
        isTranscribing = false;
    }

    // if not transcribing, start speech recognizer
    else
    {
        Device.BeginInvokeOnMainThread(() =>
        {
            InsertDateTimeRecord();
        });
        try
        {
            await recognizer.StartContinuousRecognitionAsync();
        }
        catch(Exception ex)
        {
            UpdateTranscription(ex.Message);
        }
        isTranscribing = true;
    }
    UpdateDisplayState();
}

The TranscribeClicked method does the following:

Checks if the application has access to the microphone and exits early if it does not.

Creates an instance of the SpeechRecognizer class if one doesn't already exist.

Stops continuous transcription if it is in progress.

Inserts a timestamp and starts continuous transcription if it is not in progress.

Notifies the application to update its appearance based on the new application state.
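The start/stop toggle in the steps above can be isolated from the UI and from the Speech SDK to make the state transitions easier to follow. The sketch below is an illustration only: TranscriptionToggle and its members are hypothetical names, and the start and stop delegates stand in for calls such as recognizer.StartContinuousRecognitionAsync() and recognizer.StopContinuousRecognitionAsync().

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of the sample: models only the toggle state.
public class TranscriptionToggle
{
    readonly Func<Task> start;
    readonly Func<Task> stop;

    public bool IsTranscribing { get; private set; }

    public TranscriptionToggle(Func<Task> start, Func<Task> stop)
    {
        this.start = start;
        this.stop = stop;
    }

    // Returns null on success, or the exception message if start/stop threw.
    public async Task<string> ToggleAsync()
    {
        try
        {
            if (IsTranscribing)
            {
                await stop();
                IsTranscribing = false;
            }
            else
            {
                await start();
                // Flip the flag only once the start call has succeeded.
                IsTranscribing = true;
            }
            return null;
        }
        catch (Exception ex)
        {
            return ex.Message;
        }
    }
}
```

Unlike TranscribeClicked, which updates isTranscribing after the try/catch whether or not the call threw, this variant leaves the state unchanged on failure. Which behavior is preferable depends on how the UI should react to a failed start or stop.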

The remaining MainPage class methods are helpers for displaying the application state:

void UpdateTranscription(string newText)
{
    Device.BeginInvokeOnMainThread(() =>
    {
        if (!string.IsNullOrWhiteSpace(newText))
        {
            transcribedText.Text += $"{newText}\n";
        }
    });
}

void InsertDateTimeRecord()
{
    var msg = $"=================\n{DateTime.Now.ToString()}\n=================";
    UpdateTranscription(msg);
}

void UpdateDisplayState()
{
    Device.BeginInvokeOnMainThread(() =>
    {
        if (isTranscribing)
        {
            transcribeButton.Text = "Stop";
            transcribeButton.BackgroundColor = Color.Red;
            transcribingIndicator.IsRunning = true;
        }
        else
        {
            transcribeButton.Text = "Transcribe";
            transcribeButton.BackgroundColor = Color.Green;
            transcribingIndicator.IsRunning = false;
        }
    });
}

The UpdateTranscription method appends the provided newText string to the Label element named transcribedText, skipping empty or whitespace-only strings. It forces this update to happen on the UI thread so it can be called from any context without causing exceptions. The InsertDateTimeRecord method writes the current date and time to the transcribedText instance to mark the start of a new transcription. Finally, the UpdateDisplayState method updates the Button and ActivityIndicator elements to reflect whether or not transcription is in progress.

Create platform microphone services

The application must have microphone access to collect speech data. The IMicrophoneService interface must be implemented and registered with the DependencyService on each platform for the application to function.

Android

The sample project defines an IMicrophoneService implementation for Android called AndroidMicrophoneService:

[assembly: Dependency(typeof(AndroidMicrophoneService))]
namespace CognitiveSpeechService.Droid.Services
{
    public class AndroidMicrophoneService : IMicrophoneService
    {
        public const int RecordAudioPermissionCode = 1;
        private TaskCompletionSource<bool> tcsPermissions;
        string[] permissions = new string[] { Manifest.Permission.RecordAudio };

        public Task<bool> GetPermissionAsync()
        {
            tcsPermissions = new TaskCompletionSource<bool>();

            // Runtime permission requests only apply from API level 23 (Android 6.0) onward
            if ((int)Build.VERSION.SdkInt < 23)
            {
                tcsPermissions.TrySetResult(true);
            }
            else
            {
                var currentActivity = MainActivity.Instance;
                if (ActivityCompat.CheckSelfPermission(currentActivity, Manifest.Permission.RecordAudio) != (int)Permission.Granted)
                {
                    RequestMicPermissions();
                }
                else
                {
                    tcsPermissions.TrySetResult(true);
                }
            }

            return tcsPermissions.Task;
        }

        public void OnRequestPermissionResult(bool isGranted)
        {
            tcsPermissions.TrySetResult(isGranted);
        }

        void RequestMicPermissions()
        {
            // Show an explanation first if Android indicates the user should see a rationale
            if (ActivityCompat.ShouldShowRequestPermissionRationale(MainActivity.Instance, Manifest.Permission.RecordAudio))
            {
                Snackbar.Make(MainActivity.Instance.FindViewById(Android.Resource.Id.Content),
                        "Microphone permissions are required for speech transcription!",
                        Snackbar.LengthIndefinite)
                        .SetAction("Ok", v =>
                        {
                            ((Activity)MainActivity.Instance).RequestPermissions(permissions, RecordAudioPermissionCode);
                        })
                        .Show();
            }
            else
            {
                ActivityCompat.RequestPermissions((Activity)MainActivity.Instance, permissions, RecordAudioPermissionCode);
            }
        }
    }
}

The AndroidMicrophoneService has the following features:

The Dependency attribute registers the class with the DependencyService.

The GetPermissionAsync method checks if permissions are required based on the Android SDK version, and calls RequestMicPermissions if permission has not already been granted.

The RequestMicPermissions method uses the Snackbar class to request permissions from the user if a rationale is required; otherwise it directly requests audio recording permissions.

The OnRequestPermissionResult method is called with a bool result once the user has responded to the permissions request.
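The branching described above can be summarized as a small decision function. The sketch below mirrors only the control flow and calls no Android APIs. MicPermissionAction and MicPermissionFlow are hypothetical names, and the three inputs stand in for Build.VERSION.SdkInt, the CheckSelfPermission result, and the ShouldShowRequestPermissionRationale result:

```csharp
// Hypothetical types for illustration; not part of the sample.
public enum MicPermissionAction
{
    AlreadyAllowed,            // complete the task with true, no prompt
    ShowRationaleThenRequest,  // explain first (the sample uses a Snackbar), then request
    RequestDirectly            // request the permission right away
}

public static class MicPermissionFlow
{
    // Runtime permissions were introduced in Android 6.0 (API level 23).
    const int RuntimePermissionsSdk = 23;

    public static MicPermissionAction Decide(int sdkInt, bool isGranted, bool shouldShowRationale)
    {
        // Below API 23 the manifest declaration alone is sufficient.
        if (sdkInt < RuntimePermissionsSdk || isGranted)
        {
            return MicPermissionAction.AlreadyAllowed;
        }

        return shouldShowRationale
            ? MicPermissionAction.ShowRationaleThenRequest
            : MicPermissionAction.RequestDirectly;
    }
}
```

For example, Decide(30, false, true) yields ShowRationaleThenRequest, which corresponds to the Snackbar path in RequestMicPermissions.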

The MainActivity class is customized to update the AndroidMicrophoneService instance when permissions requests are complete:

public class MainActivity : global::Xamarin.Forms.Platform.Android.FormsAppCompatActivity
{
    IMicrophoneService micService;
    internal static MainActivity Instance { get; private set; }

    protected override void OnCreate(Bundle savedInstanceState)
    {
        Instance = this;
        // ...
        micService = DependencyService.Resolve<IMicrophoneService>();
    }

    public override void OnRequestPermissionsResult(int requestCode, string[] permissions, [GeneratedEnum] Android.Content.PM.Permission[] grantResults)
    {
        // ...
        switch(requestCode)
        {
            case AndroidMicrophoneService.RecordAudioPermissionCode:
                // grantResults is empty if the request was cancelled, so check the length first
                if (grantResults.Length > 0 && grantResults[0] == Permission.Granted)
                {
                    micService.OnRequestPermissionResult(true);
                }
                else
                {
                    micService.OnRequestPermissionResult(false);
                }
                break;
        }
    }
}

The MainActivity class defines a static reference called Instance, which is required by the AndroidMicrophoneService object when requesting permissions. It overrides the OnRequestPermissionsResult method to update the AndroidMicrophoneService object when the permissions request is approved or denied by the user.

Finally, the Android application must include the permission to record audio in the AndroidManifest.xml file:

<manifest ...>
    ...
    <uses-permission android:name="android.permission.RECORD_AUDIO" />
</manifest>

iOS

The sample project defines an IMicrophoneService implementation for iOS called iOSMicrophoneService:

[assembly: Dependency(typeof(iOSMicrophoneService))]
namespace CognitiveSpeechService.iOS.Services
{
    public class iOSMicrophoneService : IMicrophoneService
    {
        TaskCompletionSource<bool> tcsPermissions;

        public Task<bool> GetPermissionAsync()
        {
            tcsPermissions = new TaskCompletionSource<bool>();
            RequestMicPermission();
            return tcsPermissions.Task;
        }

        public void OnRequestPermissionResult(bool isGranted)
        {
            tcsPermissions.TrySetResult(isGranted);
        }

        void RequestMicPermission()
        {
            var session = AVAudioSession.SharedInstance();
            // The callback completes the pending task with the user's answer
            session.RequestRecordPermission((granted) =>
            {
                tcsPermissions.TrySetResult(granted);
            });
        }
    }
}

The iOSMicrophoneService has the following features:

The Dependency attribute registers the class with the DependencyService.

The GetPermissionAsync method calls RequestMicPermission to request permissions from the device user.

The RequestMicPermission method uses the shared AVAudioSession instance to request recording permissions.

The OnRequestPermissionResult method updates the TaskCompletionSource instance with the provided bool value.

Finally, the iOS app Info.plist must include a message that tells the user why the app is requesting access to the microphone. Edit the Info.plist file to include the following tags within the <dict> element:

<dict>
    ...
    <key>NSMicrophoneUsageDescription</key>
    <string>Voice transcription requires microphone access</string>
</dict>

UWP

The sample project defines an IMicrophoneService implementation for UWP called UWPMicrophoneService:

[assembly: Dependency(typeof(UWPMicrophoneService))]
namespace CognitiveSpeechService.UWP.Services
{
    public class UWPMicrophoneService : IMicrophoneService
    {
        public async Task<bool> GetPermissionAsync()
        {
            bool isMicAvailable = true;
            try
            {
                // Initializing an audio capture fails if microphone access is denied
                var mediaCapture = new MediaCapture();
                var settings = new MediaCaptureInitializationSettings();
                settings.StreamingCaptureMode = StreamingCaptureMode.Audio;
                await mediaCapture.InitializeAsync(settings);
            }
            catch(Exception)
            {
                isMicAvailable = false;
            }

            if(!isMicAvailable)
            {
                // Open the system privacy settings so the user can enable the microphone
                await Windows.System.Launcher.LaunchUriAsync(new Uri("ms-settings:privacy-microphone"));
            }

            return isMicAvailable;
        }

        public void OnRequestPermissionResult(bool isGranted)
        {
            // intentionally does nothing
        }
    }
}

The UWPMicrophoneService has the following features:

The Dependency attribute registers the class with the DependencyService.

The GetPermissionAsync method attempts to initialize a MediaCapture instance. If that fails, it opens the system microphone privacy settings so the user can enable the microphone.

The OnRequestPermissionResult method exists to satisfy the interface but is not required for the UWP implementation.

Finally, the UWP Package.appxmanifest must specify that the application uses the microphone. Double-click the Package.appxmanifest file and select the Microphone option on the Capabilities tab in Visual Studio 2019.

Test the application

Run the app and click the Transcribe button. The app should request microphone access and begin the transcription process. The ActivityIndicator will animate, showing that transcription is active. As you speak, the app will stream audio data to the Azure Speech Service resource, which will respond with transcribed text. The transcribed text will appear in the Label element as it is received.

Note

Android emulators fail to load and initialize the Speech Service libraries. Testing on a physical device is recommended for the Android platform.
