Speech-to-text demo: trying out DeepSpeech installed with pip

qq_27158179


Article link: https://blog.csdn.net/qq_27158179/article/details/90137864

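This article describes how to install and test the DeepSpeech speech recognition software on Ubuntu 18.04.2 LTS. DeepSpeech is Mozilla's open-source speech recognition project, based on Baidu's Deep Speech paper. The article walks through installing DeepSpeech with pip, downloading the official pre-trained English model and the test audio files, and shows the recognition result.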

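0 Environment

Any computer running Ubuntu 18.04.2 LTS will do. Mine has an i3-6100 CPU, no discrete GPU, 8 GB of RAM, and a 64-bit system.

Python 3.6.7 (already installed on this machine)

TensorFlow 1.12.0 (already installed on this machine)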

DeepSpeech 0.4.1

1 Requirements

The computer must run Linux or macOS, with Python 3.6.
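The two requirements above can be checked before installing anything. The following is a minimal sketch, not part of DeepSpeech: the function name check_requirements is made up for illustration, and treating any Python other than 3.6 as a problem is a strict reading of the requirement stated here.

```python
import platform
import sys


def check_requirements(system, version):
    """Return a list of problems; an empty list means the machine qualifies."""
    problems = []
    # The post requires Linux or Mac; platform.system() reports macOS as "Darwin".
    if system not in ("Linux", "Darwin"):
        problems.append("unsupported OS: " + system)
    # The post names Python 3.6; other versions are untested here (assumption:
    # anything else is flagged rather than silently accepted).
    if tuple(version[:2]) != (3, 6):
        problems.append("Python 3.6 expected, found %d.%d" % tuple(version[:2]))
    return problems


if __name__ == "__main__":
    issues = check_requirements(platform.system(), sys.version_info)
    print("OK" if not issues else "\n".join(issues))
```

Passing the system name and version in as arguments keeps the check easy to try against values other than those of the current machine.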

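2 Introduction

DeepSpeech is open-source software from Mozilla, hosted at github.com/mozilla/deepspeech. Its algorithm is based on Baidu Research's Deep Speech paper. According to the README in the DeepSpeech GitHub repository, it handles WAV recordings of around 5 seconds and outputs plain text directly. To run recognition you need the DeepSpeech package itself plus a pre-trained model.

2 Installation

pip installs the deepspeech package itself quickly. You also need to wget a pre-trained English recognition model of about 2 GB. Chinese is not supported yet; recognizing Chinese requires training a separate model.

2.1 Install DeepSpeech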

pip3 install deepspeech==0.4.1

2.2 Download the pre-trained model (the official model currently supports English only)

wget https://github.com/mozilla/DeepSpeech/releases/download/v0.4.1/deepspeech-0.4.1-models.tar.gz
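The model archive is reported in this post as about 2 GB compressed and about 2.6 GB once extracted, so roughly 4.6 GB should be free if the archive is kept next to its extracted contents. A rough sketch of that arithmetic follows; the helper name has_room is hypothetical, and reading the post's "G" as GiB is an assumption.

```python
import shutil

GIB = 1024 ** 3  # assumption: the sizes quoted in the post are treated as GiB


def has_room(free_bytes, archive_gb=2.0, extracted_gb=2.6):
    """True if free space covers the archive plus its extracted contents."""
    needed = (archive_gb + extracted_gb) * GIB
    return free_bytes >= needed


if __name__ == "__main__":
    # Check the file system that holds the current working directory.
    free = shutil.disk_usage(".").free
    print("free: %.1f GiB, enough: %s" % (free / GIB, has_room(free)))
```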

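2.3 Download English test recordings in WAV format

A note on the model archive deepspeech-0.4.1-models.tar.gz: it is about 2 GB and takes about 2.6 GB once extracted. I am in Guangzhou, where the download speed for GitHub release files is acceptable, although git clone from here is painfully slow.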

wget https://github.com/mozilla/DeepSpeech/releases/download/v0.4.1/audio-0.4.1.tar.gz

2.4 Extract the pre-trained model

tar -xvzf deepspeech-0.4.1-models.tar.gz

2.5 Extract the audio samples

tar -xvzf audio-0.4.1.tar.gz
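Once the samples are extracted, it can be useful to confirm that a recording is in a format the model is likely to accept. The sketch below uses only Python's standard wave module; describe_wav is a name invented here. As far as I know, the released English model expects 16 kHz, mono, 16-bit audio, but treat that as an assumption and check the project README.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, sample_width_bytes, duration_seconds)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return rate, wav.getnchannels(), wav.getsampwidth(), frames / rate


if __name__ == "__main__":
    import sys

    # Usage: python3 describe_wav.py audio/*.wav
    for wav_path in sys.argv[1:]:
        rate, channels, width, seconds = describe_wav(wav_path)
        print("%s: %d Hz, %d ch, %d-bit, %.3f s"
              % (wav_path, rate, channels, width * 8, seconds))
```

Recordings that differ from the expected format would need to be converted with an external tool before recognition.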

3 Test

The directory layout here is a DeepSpeech directory containing two folders, audio and models. See the figure below.

From this DeepSpeech root directory, run the following command:

deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio audio/4507-16021-0012.wav
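The same invocation can be scripted so that a missing model file is reported before the client starts. This is a sketch only: build_command and missing_files are hypothetical helpers, the flags are copied from the command above, and the claim that the transcript goes to stdout is an assumption about the 0.4.1 client.

```python
import os
import subprocess


def build_command(models_dir, audio_path):
    """Assemble the argument list for the deepspeech 0.4.1 command-line client."""
    return [
        "deepspeech",
        "--model", os.path.join(models_dir, "output_graph.pbmm"),
        "--alphabet", os.path.join(models_dir, "alphabet.txt"),
        "--lm", os.path.join(models_dir, "lm.binary"),
        "--trie", os.path.join(models_dir, "trie"),
        "--audio", audio_path,
    ]


def missing_files(command):
    """Return the file arguments (the value after each --flag) that do not exist."""
    paths = command[2::2]
    return [p for p in paths if not os.path.isfile(p)]


if __name__ == "__main__":
    cmd = build_command("models", "audio/4507-16021-0012.wav")
    missing = missing_files(cmd)
    if missing:
        print("missing:", ", ".join(missing))
    else:
        # Assumption: the transcript is printed on stdout, so only stdout is captured.
        result = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True)
        print(result.stdout.strip())
```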

The demo now runs. The recognition result is: why should one hall to on the way

Loading model from file models/output_graph.pbmm
TensorFlow: v1.12.0-10-ge232881
DeepSpeech: v0.4.1-0-g0e40db6
2019-05-12 10:31:25.040413: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.00448s.
Loading language model from files models/lm.binary models/trie
Loaded language model in 0.147s.
Running inference.
why should one hall to on the way
Inference took 1.634s for 2.735s audio file.
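The last log line allows a quick speed estimate: 1.634 s of inference for 2.735 s of audio is a real-time factor of about 0.60, meaning that on this CPU recognition ran faster than the recording plays. A small sketch that pulls the ratio out of that line follows; real_time_factor is a name made up here, and the regular expression only matches the log format shown above.

```python
import re

LOG_LINE = "Inference took 1.634s for 2.735s audio file."


def real_time_factor(line):
    """Return inference time divided by audio duration, or None if no match."""
    match = re.search(r"Inference took ([\d.]+)s for ([\d.]+)s audio file", line)
    if match is None:
        return None
    inference_s, audio_s = float(match.group(1)), float(match.group(2))
    return inference_s / audio_s


if __name__ == "__main__":
    print("real-time factor: %.3f" % real_time_factor(LOG_LINE))
```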

References:

1. Speech recognition open-source software: DeepSpeech (1) installation and usage

2. The DeepSpeech GitHub repository