An HTML5 API: Using SpeechSynthesis for Text-to-Speech

baroncoder

Last modified 2023-09-07 14:51:03

License: CC 4.0 BY-SA

Tags: html5, speech recognition, front end

First published 2023-09-07 11:19:16

Original link: https://mp.weixin.qq.com/s?__biz=MjM5MDA2MTI1MA==&mid=2649118413&idx=3&sn=3385dee75bcffa307baa79c3cde4095b&chksm=be587160892ff87605cf347eddad2ad7a55a957836a30a665416f8f976f6b2a3a75b8a4ad4df&scene=27

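SpeechSynthesis converts the given text into speech. It also offers configuration options that control how the text is read, such as language, volume, and pitch.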

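SpeechSynthesisUtterance instance properties

lang gets and sets the language of the utterance
pitch gets and sets the pitch of the utterance (higher values sound sharper, lower values sound deeper)
rate gets and sets the speaking rate (higher values are faster, lower values are slower)
text gets and sets the text to be spoken
voice gets and sets the voice used to speak
volume gets and sets the speaking volume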
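
The properties above can be set in one place with a small helper. The following is only a sketch, not code from the original article: applyUtteranceOptions and clamp are hypothetical names, and the clamping ranges (rate 0.1 to 10, pitch 0 to 2, volume 0 to 1) are the ranges documented for the Web Speech API.

```javascript
// Hypothetical helper (not part of the Web Speech API): limits a number to [min, max].
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: copies the given options onto an utterance-like object,
// clamping numeric values to the ranges documented for the Web Speech API.
function applyUtteranceOptions(utterance, options = {}) {
  if (options.text !== undefined) utterance.text = String(options.text);
  if (options.lang !== undefined) utterance.lang = options.lang;
  if (options.voice !== undefined) utterance.voice = options.voice;
  if (options.rate !== undefined) utterance.rate = clamp(Number(options.rate), 0.1, 10);
  if (options.pitch !== undefined) utterance.pitch = clamp(Number(options.pitch), 0, 2);
  if (options.volume !== undefined) utterance.volume = clamp(Number(options.volume), 0, 1);
  return utterance;
}

// Browser usage (assumes speech synthesis is available):
// const utter = applyUtteranceOptions(new SpeechSynthesisUtterance(), {
//   text: '你好，世界！', lang: 'zh-CN', rate: 0.7,
// });
// speechSynthesis.speak(utter);
```

Options that are left out are not touched, so the browser defaults stay in effect for them.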

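SpeechSynthesis methods

speak() adds the given utterance instance to the speech queue
cancel() removes all utterances from the queue; if one is being spoken, it stops immediately
pause() pauses speech
resume() resumes paused speech
getVoices() returns an array of the available voices. Note: in some browsers (Chrome, for example) the list loads asynchronously, so it may be empty until the voiceschanged event fires; call it again inside a voiceschanged handler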
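
Because the voice list may not be ready on the first call, one way to handle both cases is sketched below. This is an illustration under assumptions: onVoicesReady is a hypothetical name, and synth is expected to behave like window.speechSynthesis and to support addEventListener and removeEventListener.

```javascript
// Hypothetical helper: invokes `callback` with the voice list as soon as it
// is available. `synth` is expected to behave like window.speechSynthesis.
function onVoicesReady(synth, callback) {
  const voices = synth.getVoices();
  if (voices.length > 0) {
    callback(voices); // list already loaded, no need to wait
    return;
  }
  // Empty list: wait until the browser announces that voices have loaded.
  const handler = () => {
    const loaded = synth.getVoices();
    if (loaded.length === 0) return; // still nothing, keep waiting
    synth.removeEventListener('voiceschanged', handler);
    callback(loaded);
  };
  synth.addEventListener('voiceschanged', handler);
}

// Browser usage:
// onVoicesReady(window.speechSynthesis, (voices) => {
//   console.log(voices.map((v) => `${v.name} (${v.lang})`));
// });
```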

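Instance event handlers

onstart – called when speech synthesis starts.
onpause – called when speech synthesis is paused.
onresume – called when speech synthesis resumes.
onend – called when speech synthesis ends.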
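
To see when each of these callbacks fires, they can all be wired to a single logger. The sketch below is only an illustration: attachLifecycleLogging is a hypothetical name, and the log parameter is assumed to be any function that accepts a string.

```javascript
// Hypothetical helper: wires the four callbacks listed above to one logger.
function attachLifecycleLogging(utterance, log = console.log) {
  utterance.onstart = () => log('start');
  utterance.onpause = () => log('pause');
  utterance.onresume = () => log('resume');
  utterance.onend = () => log('end');
  return utterance;
}

// Browser usage:
// const utter = attachLifecycleLogging(new SpeechSynthesisUtterance('你好'));
// speechSynthesis.speak(utter);
```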

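A simple implementation

Let's start with the simplest example. To have the browser read "你好，世界！" ("Hello, world!") aloud, the following JS code is enough: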

let utterThis = new SpeechSynthesisUtterance('你好，世界！');
speechSynthesis.speak(utterThis);

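That is all the code it takes. You can run the two statements above in your browser console and check whether the text is read aloud.

Besides passing the text to the constructor, you can also set it through the instance's text property, so the code above can also be written as: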

let utterThis = new SpeechSynthesisUtterance();
utterThis.text = '你好，世界！';
utterThis.lang = 'zh'; // Chinese
utterThis.rate = 0.7; // speaking rate
speechSynthesis.speak(utterThis);

A hands-on project

HTML:


<div class="voiceinator">
  <h1>Listen &amp; Speak 5000</h1>
  
  <select name="voice" id="voices">
    <option value="">Select A Voice</option>
  </select>
  
  <label for="rate">Rate:</label>
  <input id="rate" name="rate" type="range" min="0.1" max="3" value="1" step="0.1">
  
  <label for="pitch">Pitch:</label>
  <input id="pitch" name="pitch" type="range" min="0" max="2" value="1" step="0.1">
  
  <textarea name="text">你好 给你点?</textarea>
  
  <button id="stop">Stop!</button>
  <button id="speak">Speak</button>
</div>

JavaScript:

const synth = window.speechSynthesis
const msg = new SpeechSynthesisUtterance()
let voices = []
const voicesDropdown = document.querySelector('[name="voice"]')
const options = document.querySelectorAll('[type="range"], [name="text"]')
const speakButton = document.querySelector('#speak')
const stopButton = document.querySelector('#stop')

msg.text = '你好 给你点?'
msg.lang = 'zh-CN'

synth.addEventListener('voiceschanged',getSupportVoices)
speakButton.addEventListener('click',throttle(handleSpeak,1000))
stopButton.addEventListener('click',handleStop)
options.forEach(e => e.addEventListener('change',handleChange))
// Some browsers already have the voice list before voiceschanged fires
getSupportVoices()

function getSupportVoices() {
  voices = synth.getVoices()
  // Keep only the placeholder option so repeated calls do not add duplicates
  voicesDropdown.length = 1
  voices.forEach(e => {
      const option = document.createElement('option')
      option.value = e.lang
      option.text = e.name
      voicesDropdown.appendChild(option)
   })
}
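function handleSpeak(e) {
  // Keep the default language when the placeholder option is selected
  const lang = voicesDropdown.selectedOptions[0].value
  if (lang) msg.lang = lang
  synth.speak(msg)
}
function handleStop(e) {
  // cancel() takes no arguments; it clears the whole queue
  synth.cancel()
}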
function handleChange(e) {
  msg[this.name] = this.value
}
function throttle(fn,delay) {
  let last = 0
  return function() {
      const now = Date.now() // millisecond timestamp
      if(now - last > delay) {
        fn.apply(this,arguments)
        last = now
      }
  }
}

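Code walkthrough

HTML part:

For the layout, the select dropdown picks which voice, and therefore which language, to speak in; the list of available options is loaded dynamically by JS.

Next, two range inputs act as sliders for the speaking rate and pitch.

The textarea sets the text to be read aloud.

Finally, the buttons start and stop playback.

JS part:

First, const synth = window.speechSynthesis obtains the speech synthesis controller, and const msg = new SpeechSynthesisUtterance() creates the utterance instance; the default text and language are then set through msg.text and msg.lang.

The voiceschanged event is used to fetch the supported voices dynamically and to generate the option elements added to the HTML. The key call is synth.getVoices(); you can print the returned array to inspect the properties of each entry.

Click handlers are registered on the buttons; they call synth.speak(msg) and synth.cancel() (which takes no arguments) to start and cancel playback.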

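Before playback, voicesDropdown.selectedOptions[0].value is used to set the language of the utterance (if the language of the text does not match the selected language, the text will be read out incorrectly).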
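
One way to reduce such mismatches is to choose a voice that matches the language of the text. The sketch below is an illustration, not the article's method: pickVoiceForLang is a hypothetical name, and treating an underscore tag such as zh_CN the same as zh-CN is an assumption about how some platforms report languages.

```javascript
// Hypothetical helper: returns the first voice whose language matches `lang`,
// trying an exact match first and then the primary subtag (e.g. "zh").
function pickVoiceForLang(voices, lang) {
  // Assumption: tags may differ in case or use "_" instead of "-".
  const normalize = (tag) => String(tag).replace(/_/g, '-').toLowerCase();
  const wanted = normalize(lang);
  const exact = voices.find((v) => normalize(v.lang) === wanted);
  if (exact) return exact;
  const primary = wanted.split('-')[0];
  return voices.find((v) => normalize(v.lang).split('-')[0] === primary) || null;
}

// Browser usage (after the voice list has loaded):
// const voice = pickVoiceForLang(speechSynthesis.getVoices(), 'zh-CN');
// if (voice) { msg.voice = voice; msg.lang = voice.lang; }
```

When no voice matches, the helper returns null, and the caller can fall back to the browser default or warn the user.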

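Finally, a throttle function keeps repeated clicks from queueing playback over and over. Ideally you would get the playback duration or listen for the end-of-speech event; here it simply accepts one click per second (the 1000 ms passed to throttle). Interested readers can implement the better version themselves.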
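
The end-of-speech approach mentioned above could look roughly like the following. This is a sketch under assumptions, not the article's code: createSpeakGuard is a hypothetical name, and it relies on the utterance's onend and onerror callbacks being invoked when playback finishes or fails.

```javascript
// Hypothetical helper: returns a click handler that ignores requests while
// the previous utterance is still being spoken.
function createSpeakGuard(synth, utterance) {
  let busy = false;
  const release = () => { busy = false; };
  utterance.onend = release;   // playback finished
  utterance.onerror = release; // also release if synthesis fails or is interrupted
  return function speakIfIdle() {
    if (busy) return false; // still speaking: drop this click
    busy = true;
    synth.speak(utterance);
    return true;
  };
}

// Browser usage:
// speakButton.addEventListener('click', createSpeakGuard(window.speechSynthesis, msg));
```

Unlike a fixed delay, this guard adapts to the length of the text, since it unlocks only when the browser reports that the utterance has ended.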

Problems encountered

1. In Google Chrome, speech playback can get stuck, so no sound is produced.

Solution: call cancel() once before starting playback:

window.speechSynthesis.cancel()
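
Wrapped into a helper, the workaround might look like this. It is only a sketch: speakFresh is a hypothetical name, and synth is assumed to behave like window.speechSynthesis. Note that cancel() also discards anything else waiting in the queue.

```javascript
// Hypothetical wrapper: clears the queue, then speaks the given utterance.
function speakFresh(synth, utterance) {
  synth.cancel(); // drops anything queued or stuck
  synth.speak(utterance);
}

// Browser usage:
// speakFresh(window.speechSynthesis, new SpeechSynthesisUtterance('你好'));
```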

2. A warning appears: speechSynthesis.speak() without user activation is no longer allowed since M71, around December 2018.

Solution: speech must be triggered by a user action such as a click event; clicking anywhere on the page first also allows playback.
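
A simple way to satisfy this requirement is to defer speech until the first click. The sketch below is an illustration: speakOnFirstClick is a hypothetical name, and target is assumed to be any element or object that supports addEventListener and removeEventListener.

```javascript
// Hypothetical helper: defers speech until the first click on `target`.
function speakOnFirstClick(target, synth, utterance) {
  const handler = () => {
    target.removeEventListener('click', handler); // run only once
    synth.speak(utterance);
  };
  target.addEventListener('click', handler);
}

// Browser usage:
// speakOnFirstClick(document, window.speechSynthesis,
//   new SpeechSynthesisUtterance('你好，世界！'));
```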

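3. SpeechSynthesisUtterance has browser compatibility issues (IE, for example, does not support it); current mainstream browsers such as Chrome, Edge, and Safari all support it.

Solution: prompt the user to switch to another browser. Code: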


if (!window.SpeechSynthesisUtterance) {
   console.log("Please use another browser!")
}
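
A slightly fuller check can test for both the controller and the utterance constructor. This is a sketch, not a definitive test: isSpeechSynthesisSupported is a hypothetical name, and the global object is passed in as a parameter so the function can also be called outside a browser.

```javascript
// Hypothetical helper: reports whether the given global object exposes both
// speechSynthesis and the SpeechSynthesisUtterance constructor.
function isSpeechSynthesisSupported(globalObject) {
  return Boolean(globalObject) &&
    'speechSynthesis' in globalObject &&
    typeof globalObject.SpeechSynthesisUtterance === 'function';
}

// Browser usage:
// if (!isSpeechSynthesisSupported(window)) {
//   console.log('Please use another browser!');
// }
```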

Using it in a React project

import { Button, Input, Select } from 'antd';
import { useEffect, useState } from 'react';
import { useLocation } from 'react-router-dom';
// For debugging on mobile devices
import VConsole from 'vconsole';
const { Option } = Select;
const SpeechText = () => {
  const route = useLocation();
  const [text, setText] = useState('你好');
  const [voices, setVoices] = useState<SpeechSynthesisVoice[]>([]);
  const [selectedVoice, setSelectedVoice] =
    useState<SpeechSynthesisVoice | null>(null);
  function populateVoiceList() {
    if (typeof speechSynthesis === 'undefined') {
      return;
    }
    const voices = speechSynthesis.getVoices().sort(function (a, b) {
      const aname = a.name.toUpperCase(),
        bname = b.name.toUpperCase();
      if (aname < bname) return -1;
      else if (aname === bname) return 0;
      else return +1;
    });
    const formatter = voices.map((voice) => {
      return {
        default: voice.default,
        lang: voice.lang,
        localService: voice.localService,
        name: voice.name,
        voiceURI: voice.voiceURI,
      };
    });
    console.log('available voices', formatter);
    setVoices(voices);
  }
  useEffect(() => {
    populateVoiceList();
    if (speechSynthesis.onvoiceschanged !== undefined) {
      speechSynthesis.onvoiceschanged = populateVoiceList;
    }
    if (route.search.includes('debug=1')) {
      new VConsole();
    }
  }, [route.search]);
  const handleUpdate = (event: any) => {
    setText(event.target.value);
  };
  const handlePlay = () => {
    const msg = new SpeechSynthesisUtterance(text);
    msg.voice = selectedVoice;
    speechSynthesis.speak(msg);
  };
  const handleSelectVoice = (value: any) => {
    console.log(value);
    const selected = voices.find((voice) => voice.name === value);
    selected && setSelectedVoice(selected);
  };
  return (
    <div style={{ padding: 24 }}>
      <div style={{ paddingBottom: 16 }}>Speech playback test</div>
      <div>
        Text to speak:
        <div>
          <Input
            value={text}
            onChange={handleUpdate}
            style={{ width: 240, marginBottom: 12 }}
          ></Input>
        </div>
      </div>
      <div style={{ marginBottom: 12 }}>
        Select a voice:
        <div>
          <Select style={{ minWidth: 240 }} onChange={handleSelectVoice}>
            {voices.map((voice) => (
              <Option key={voice.name} value={voice.name}>
                {voice.name + ` (${voice.lang}) `}
              </Option>
            ))}
          </Select>
        </div>
      </div>
      <Button type="primary" onClick={handlePlay}>
        Play
      </Button>
    </div>
  );
};

export default SpeechText;