Real-time speech recognition with DeepSpeech

zhqh100

Last modified 2022-12-08 20:31:57



First published 2019-12-17 17:24:17

Article link: https://blog.csdn.net/zhqh100/article/details/103584057


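Reference: DeepSpeech-examples/README.rst at r0.6 · mozilla/DeepSpeech-examples · GitHub (https://github.com/mozilla/DeepSpeech-examples/blob/r0.6/mic_vad_streaming/README.rst)

Clone the project: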

git clone https://github.com/mozilla/DeepSpeech-examples.git

Install the dependencies:

conda install numpy
sudo apt install portaudio19-dev 
pip install pyaudio
pip install deepspeech
pip install webrtcvad
pip install halo
conda install scipy

One thing to note: do not install pyaudio with conda. Doing so leads to the following error:

OSError: [Errno -9996] Invalid input device (no default output device)
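As a quick sanity check after the installs above, the sketch below reports which of the listed packages cannot be located in the current Python environment. The import names are assumed to match the package names used in the install commands, and `missing_modules` is a hypothetical helper, not part of any of these libraries. It only checks that a module can be found; it does not open an audio device, so it will not detect the PyAudio error shown above.

```python
import importlib.util

# Import names of the packages installed above
# (assumed to be the same as the pip/conda package names).
REQUIRED_MODULES = ["numpy", "scipy", "pyaudio", "deepspeech", "webrtcvad", "halo"]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies can be located.")
```

Running this inside the same conda environment that will run the example helps confirm that pip and conda installed into the same interpreter.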

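Download the model

from the mozilla/DeepSpeech Releases page on GitHub (https://github.com/mozilla/DeepSpeech/releases),

or fetch and unpack it from the command line: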

cd mic_vad_streaming/

wget -c https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/deepspeech-0.6.0-models.tar.gz

tar xvf deepspeech-0.6.0-models.tar.gz
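Before running the recognizer, it can be worth confirming that the archive unpacked into the expected layout. The following is a minimal sketch that checks for the three files this post passes on the command line; `missing_model_files` is a hypothetical helper name, and the directory name is assumed to be the one produced by the `tar` command above.

```python
from pathlib import Path

# File names taken from the command lines used in this post.
EXPECTED_FILES = ("output_graph.pbmm", "lm.binary", "trie")


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumed directory name, as created by extracting the 0.6.0 archive.
    missing = missing_model_files("deepspeech-0.6.0-models")
    print("Missing files:", missing if missing else "none")
```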

Run the following command to start speech recognition:

python3 mic_vad_streaming.py \
  --model deepspeech-0.6.0-models/output_graph.pbmm \
  --lm deepspeech-0.6.0-models/lm.binary \
  --trie deepspeech-0.6.0-models/trie
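To give a rough idea of what a VAD-driven streaming script has to do, the sketch below cuts raw PCM into short frames and groups them into utterances with a sliding-window vote. It is an illustration under stated assumptions, not the code of mic_vad_streaming.py: the function names, the window size, and the 0.75 ratio are all chosen here for the example, and `nonzero_frame` is a toy stand-in for a real voice activity detector (in practice something like `webrtcvad.Vad(...).is_speech(frame, sample_rate)` would play that role).

```python
import collections


def split_frames(pcm_bytes, sample_rate=16000, frame_ms=20, sample_width=2):
    """Cut raw mono PCM into fixed-length frames, dropping a trailing partial frame."""
    frame_bytes = sample_rate * frame_ms // 1000 * sample_width
    for start in range(0, len(pcm_bytes) - frame_bytes + 1, frame_bytes):
        yield pcm_bytes[start:start + frame_bytes]


def nonzero_frame(frame):
    """Toy stand-in for a real VAD: any non-zero byte counts as speech."""
    return any(frame)


def collect_utterances(frames, is_speech, padding_frames=15, ratio=0.75):
    """Group frames into utterances using a sliding-window vote.

    An utterance starts once more than `ratio` of the last `padding_frames`
    frames are speech, and ends once more than `ratio` of them are non-speech.
    """
    window = collections.deque(maxlen=padding_frames)
    triggered = False
    current = []
    for frame in frames:
        window.append((frame, is_speech(frame)))
        if not triggered:
            voiced = sum(1 for _, speech in window if speech)
            if voiced > ratio * window.maxlen:
                triggered = True
                # Keep the buffered frames so the start of the word is not cut off.
                current = [f for f, _ in window]
                window.clear()
        else:
            current.append(frame)
            unvoiced = sum(1 for _, speech in window if not speech)
            if unvoiced > ratio * window.maxlen:
                triggered = False
                yield b"".join(current)
                current = []
                window.clear()
    if current:
        # Flush an utterance that was cut off by the end of the input.
        yield b"".join(current)
```

Each yielded byte string is one utterance, which a recognizer could then transcribe. The window acts as hysteresis, so a single noisy frame neither starts nor ends an utterance.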

For GPU support, install the GPU build of deepspeech:

pip install deepspeech-gpu

conda install cudatoolkit==10.0.130

_____________________________________________________________________

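When it comes to recognizing a specific audio file, the script above did not seem to work for me. In principle, the following command should do it: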

python mic_vad_streaming.py --model deepspeech-0.6.0-models/output_graph.pbmm --lm deepspeech-0.6.0-models/lm.binary --trie deepspeech-0.6.0-models/trie --file audio/2830-3980-0043.wav

This should recognize the speech, because the example command in the project's test.sh is written the same way, but I never got it to succeed.

_____________________________________________________________________

A specific audio file can be recognized as follows.

Clone https://github.com/mozilla/DeepSpeech.git, change into the project's native_client/python directory, and run the following command:

python client.py --model deepspeech-0.6.0-models/output_graph.pbmm --lm deepspeech-0.6.0-models/lm.binary --trie deepspeech-0.6.0-models/trie --audio audio/2830-3980-0043.wav

Stripped down to the bare minimum (with no error handling at all), the core code for recognizing the sample audio directly is:

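import wave

import numpy as np
from deepspeech import Model

# Load the acoustic model with a beam width of 500.
ds = Model('deepspeech-0.6.0-models/output_graph.pbmm', 500)
# Enable the language-model decoder (lm_alpha=0.75, lm_beta=1.85).
ds.enableDecoderWithLM('deepspeech-0.6.0-models/lm.binary', 'deepspeech-0.6.0-models/trie', 0.75, 1.85)

# Read the whole WAV file as 16-bit samples.
fin = wave.open('audio/2830-3980-0043.wav', 'rb')
audio = np.frombuffer(fin.readframes(fin.getnframes()), np.int16)
fin.close()

# Run speech-to-text and print the transcript.
print(ds.stt(audio))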
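The minimal code above reads the samples as int16 without checking the file, so it silently assumes 16-bit mono PCM at the sample rate the model was trained on. The sketch below checks those properties with the standard `wave` module before the audio is handed to the model. The 16 kHz default is an assumption about the released 0.6.0 English model, and `wav_format_problems` is a hypothetical helper, not part of deepspeech.

```python
import wave


def wav_format_problems(path, expected_rate=16000):
    """Return reasons the WAV file may not suit the model (empty list if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append("expected mono, got %d channels" % wav.getnchannels())
        if wav.getsampwidth() != 2:
            problems.append("expected 16-bit samples, got %d-bit" % (8 * wav.getsampwidth()))
        if wav.getframerate() != expected_rate:
            problems.append("expected %d Hz, got %d Hz" % (expected_rate, wav.getframerate()))
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_wav.py path/to/file.wav
    for problem in wav_format_problems(sys.argv[1]):
        print("Problem:", problem)
```

A file that fails any of these checks would need to be converted (for example, resampled or downmixed) before its samples are passed to `stt`.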
