Speech Recognition with DeepSpeech on Ubuntu (Including Cross-Compilation)

山河君

Category: C++ open-source frameworks, speech recognition. Tags: ubuntu, speech recognition, linux

Article link: https://blog.csdn.net/qq_42956179/article/details/143569716

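Contents

Preface
I. Building DeepSpeech
II. DeepSpeech usage example
III. Core code analysis
- 1. Core code for creating the model
- 2. Core code for the recognition process
IV. Cross-compilation
- 1. Cross-compiling
- 2. Usage
Summary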

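Preface

My work requires speech recognition, and the target environment is Linux on ARM, so I wanted to get something running on Ubuntu first. I looked through the open-source speech recognition frameworks and shortlisted several of them (see my earlier post on building vosk); I am now trying them out one by one.

This post builds and uses DeepSpeech on Ubuntu, and then cross-compiles it.

Copyright notice: 山河君. Reproduction without the author's permission is prohibited.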

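I. Building DeepSpeech

If you want to try building it yourself, start here. If you would rather use prebuilt libraries directly, you can skip this section; later on I point out the prebuilt binaries that are officially provided for the supported platforms.

That said, I still recommend building it yourself once, because the source tree contains the official example file, which is well worth reading.

Install the dependencies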

sudo apt-get update
sudo apt-get install -y \
    build-essential \
    libatlas-base-dev \
    libfftw3-dev \
    libgfortran5 \
    sox \
    libsox-dev
sudo apt-get install -y libmagic-dev

Download the DeepSpeech source code

git clone https://github.com/mozilla/DeepSpeech.git
cd DeepSpeech
git submodule sync tensorflow/
git submodule update --init tensorflow/

DeepSpeech is built with bazel, so install bazel

sudo apt install curl
curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
echo "deb [arch=amd64] https://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
sudo apt update && sudo apt install bazel

Configure TensorFlow

cd tensorflow
./configure
ln -s ../native_client

If the native_client symlink does not yet exist inside the tensorflow directory, the ln -s command above creates it.

Build

Library only

bazel build --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" --config=monolithic -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-fvisibility=hidden //native_client:libdeepspeech.so

Library and executables

bazel build --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" --config=monolithic -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-fvisibility=hidden //native_client:libdeepspeech.so //native_client:generate_scorer_package

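The deepspeech executable ends up in native_client (in the official build instructions, as far as I know, it is produced by running make deepspeech in that directory after the bazel build). Note that the header file is deepspeech.h, and client.cc is the C++ example file.
The corresponding library files are under tensorflow/bazel-bin/native_client.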

II. DeepSpeech usage example

Model download

Model file: deepspeech-0.9.3-models-zh-CN.pbmm
Scorer file: deepspeech-0.9.3-models-zh-CN.scorer

./deepspeech --model /home/aaron/workplace/audioread/deepspeech-0.9.3-models-zh-CN.pbmm --scorer /home/aaron/workplace/audioread/deepspeech-0.9.3-models-zh-CN.scorer --audio /home/aaron/workplace/audioread/test.wav
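
Before running the command above, it can help to confirm that the input file is in a format the model can use. To my knowledge the released 0.9.3 models work with 16 kHz, mono, 16-bit PCM audio, and the actual rate of a loaded model can be queried with DS_GetModelSampleRate. The following is a minimal sketch that uses only the C++ standard library; WavFormat, parse_wav_header, is_deepspeech_ready and read_wav_format are hypothetical helper names invented for this illustration and are not part of DeepSpeech.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper type for this sketch; not part of the DeepSpeech API.
struct WavFormat {
    bool valid = false;            // true once an uncompressed PCM "fmt " chunk is found
    uint16_t channels = 0;
    uint32_t sample_rate = 0;
    uint16_t bits_per_sample = 0;
};

// Read an n-byte little-endian unsigned integer (n <= 4) starting at pos.
static uint32_t read_le(const std::vector<unsigned char>& b, std::size_t pos, std::size_t n) {
    assert(n <= 4 && pos + n <= b.size());
    uint32_t v = 0;
    for (std::size_t i = 0; i < n; ++i) {
        v |= static_cast<uint32_t>(b[pos + i]) << (8 * i);
    }
    return v;
}

// Walk the RIFF chunks in b and extract the fields of the "fmt " chunk.
WavFormat parse_wav_header(const std::vector<unsigned char>& b) {
    WavFormat f;
    if (b.size() < 12 || std::memcmp(b.data(), "RIFF", 4) != 0 ||
        std::memcmp(b.data() + 8, "WAVE", 4) != 0) {
        return f;  // not a RIFF/WAVE file
    }
    std::size_t pos = 12;
    while (pos + 8 <= b.size()) {
        const uint32_t chunk_size = read_le(b, pos + 4, 4);
        if (std::memcmp(b.data() + pos, "fmt ", 4) == 0) {
            if (chunk_size < 16 || pos + 24 > b.size()) return f;  // truncated chunk
            const uint32_t audio_format = read_le(b, pos + 8, 2);
            f.channels = static_cast<uint16_t>(read_le(b, pos + 10, 2));
            f.sample_rate = read_le(b, pos + 12, 4);
            f.bits_per_sample = static_cast<uint16_t>(read_le(b, pos + 22, 2));
            f.valid = (audio_format == 1);  // 1 means uncompressed PCM
            return f;
        }
        pos += 8 + chunk_size + (chunk_size & 1);  // chunks are padded to an even size
    }
    return f;
}

// Assumption: 16 kHz mono 16-bit PCM is what the model expects by default.
bool is_deepspeech_ready(const WavFormat& f, uint32_t expected_rate = 16000) {
    return f.valid && f.channels == 1 && f.bits_per_sample == 16 &&
           f.sample_rate == expected_rate;
}

// Convenience wrapper: load a file from disk and parse its header.
WavFormat read_wav_format(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<unsigned char> bytes((std::istreambuf_iterator<char>(in)),
                                     std::istreambuf_iterator<char>());
    return parse_wav_header(bytes);
}
```

Calling is_deepspeech_ready(read_wav_format("test.wav")) returns false for a stereo or 44.1 kHz file; in that case the file can be converted first, for example with the sox tool installed earlier. The sketch reads the whole file into memory for simplicity, which is fine for short test clips.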


III. Core code analysis

The core code is the example code in the client.cc file mentioned above.

1. Core code for creating the model

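// Initialise DeepSpeech
ModelState* ctx;
// sphinx-doc: c_ref_model_start
int status = DS_CreateModel(model, &ctx);
if (status != 0) {
    // Convert the numeric status code into a human-readable message
    char* error = DS_ErrorCodeToErrorMessage(status);
    // Assumed continuation (the excerpt is cut off here): report the error and exit
    fprintf(stderr, "Could not create model: %s\n", error);
    free(error);
    return 1;
}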
