How to Build a Text-to-Speech App with the Web Speech API

Latest recommended article published on 2024-07-24 20:44:50

cukw6666

Original article: https://www.digitalocean.com/community/tutorials/how-to-build-a-text-to-speech-app-with-web-speech-api

Introduction

If you have used a range of apps over the years, you have very likely interacted with some that provide a voice experience. It could be an app with text-to-speech functionality, such as one that reads your text messages or notifications aloud. It could also be an app with voice recognition functionality, like Siri or Google Assistant.

With the advent of HTML5, the number of APIs available on the web platform has grown very quickly. Over the years, we have come across APIs such as WebSocket, File, Geolocation, Notification, Battery, Vibration, DeviceOrientation, WebRTC, and others. Some of these APIs have gained very high support across various browsers.

A pair of APIs, known collectively as the Web Speech API, has been developed to make it easy to build varying kinds of voice applications and experiences for the web. These APIs are still fairly experimental, although support for most of them is increasing across modern browsers.

Step 1 - Using the Web Speech API

The Web Speech API is divided into two major interfaces:

SpeechSynthesis - For text-to-speech applications. This allows apps to read out their text content using the device's speech synthesizer. The available voice types are represented by SpeechSynthesisVoice objects, while the text to be uttered is represented by a SpeechSynthesisUtterance object. See the support table for the SpeechSynthesis interface to learn more about browser support.
SpeechRecognition - For applications that require asynchronous voice recognition. This allows apps to recognize voice context from an audio input. A SpeechRecognition object can be created using its constructor. The SpeechGrammar interface exists for representing the set of grammar that the app should recognize. See the support table for the SpeechRecognition interface to learn more about browser support.
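
As a rough sketch of how a page might check which of the two interfaces is present, the snippet below reports support for each. The helper name detectSpeechSupport is hypothetical and not part of the original tutorial. It also assumes that some browsers expose recognition only under the prefixed name webkitSpeechRecognition, so both names are checked.

```javascript
// Hypothetical helper (not part of the original tutorial): reports which of
// the two Web Speech API interfaces the given global object exposes.
function detectSpeechSupport(scope) {
  var target = scope || {};
  return {
    synthesis: 'speechSynthesis' in target,
    // Assumption: some browsers expose recognition only under a
    // `webkit` prefix, so both names are checked
    recognition: 'SpeechRecognition' in target || 'webkitSpeechRecognition' in target
  };
}

// In a browser, pass `window`; outside one, fall back to an empty object
var support = detectSpeechSupport(typeof window !== 'undefined' ? window : {});
console.log(support);
```

Passing the global object in as a parameter, rather than reading window directly, keeps the check usable in environments where window is not defined.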

In this tutorial, you will use the SpeechSynthesis interface to build a text-to-speech app. Here is a demo screenshot of what the app will look like (without the sound):

Getting a Reference

You can get a reference to a SpeechSynthesis object with a single line of code:

var synthesis = window.speechSynthesis;

It is good practice to check whether the browser supports SpeechSynthesis before using the functionality it provides. The following code snippet shows how to check for browser support:

if ('speechSynthesis' in window) {
  var synthesis = window.speechSynthesis;

} else {
  console.log('Text-to-speech not supported.');
}

Getting Available Voices

In this step, you will build on your existing code to get the available speech voices. The getVoices() method returns a list of SpeechSynthesisVoice objects representing all the available voices on the device.

Take a look at the following code snippet:

if ('speechSynthesis' in window) {

  var synthesis = window.speechSynthesis;

  // Regex to match all English language tags, e.g. en, en-US, en-GB
  var langRegex = /^en(-[a-z]{2})?$/i;

  // Get the available voices and filter the list to only have English speakers
  var voices = synthesis.getVoices().filter(voice => langRegex.test(voice.lang));

  // Log the properties of the voices in the list
  voices.forEach(function(voice) {
    console.log({
      name: voice.name,
      lang: voice.lang,
      uri: voice.voiceURI,
      local: voice.localService,
      default: voice.default
    })
  });

} else {
  console.log('Text-to-speech not supported.');
}

In the above snippet, you get the list of available voices on the device and filter it with the langRegex regular expression so that only voices for English speakers remain. Finally, you loop through the voices in the list and log the properties of each to the console.
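
One practical caveat: getVoices() can return an empty list when it is called before the browser has finished loading its voices, and the voiceschanged event signals that the list has changed. The following is a minimal sketch of waiting for that event before filtering. The helper name loadVoices is hypothetical and not part of the original tutorial, and the sketch assumes the SpeechSynthesis object supports addEventListener.

```javascript
// Hypothetical helper (not part of the original tutorial): resolves with the
// voices whose language tag matches `langRegex`, waiting for the
// `voiceschanged` event when the list is not populated yet.
function loadVoices(synthesis, langRegex) {
  return new Promise(function(resolve) {
    function matching() {
      return synthesis.getVoices().filter(function(voice) {
        return langRegex.test(voice.lang);
      });
    }

    // Some browsers already have the list available on the first call
    if (synthesis.getVoices().length > 0) {
      resolve(matching());
      return;
    }

    // Otherwise wait until the browser reports that the list has changed
    synthesis.addEventListener('voiceschanged', function onChange() {
      synthesis.removeEventListener('voiceschanged', onChange);
      resolve(matching());
    });
  });
}

// Usage in a browser (guarded so the sketch also loads outside one)
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  loadVoices(window.speechSynthesis, /^en(-[a-z]{2})?$/i).then(function(voices) {
    voices.forEach(function(voice) {
      console.log(voice.name, voice.lang);
    });
  });
}
```

If a device has no voices installed at all, the event may never fire and the promise stays pending, so a real app would probably want a timeout on top of this.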

Constructing Speech Utterances

In this step, you will construct speech utterances by using the SpeechSynthesisUtterance constructor and setting values for the available properties. The following code snippet creates a speech utterance for reading the text "Hello World".

if ('speechSynthesis' in window) {

  var synthesis = window.speechSynthesis;

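  // Get the first English voice in the list (e.g. en, en-US, en-GB);
  // an exact match on 'en' would miss region-tagged voices such as en-US
  var voice = synthesis.getVoices().filter(function(voice) {
    return /^en(-|$)/i.test(voice.lang);
  })[0];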

  // Create an utterance object
  var utterance = new SpeechSynthesisUtterance('Hello World');

  // Set utterance properties
  utterance.voice = voice;
  utterance.pitch = 1.5;
  utterance.rate = 1.25;
  utterance.volume = 0.8;

  // Speak the utterance
  synthesis.speak(utterance);

} else {
  console.log('Text-to-speech not supported.');
}

Here, you get the first en language voice from the list of available voices. Next, you create a new utterance using the SpeechSynthesisUtteran