Trying Out DeepSpeech on Linux

nqct1

Category: Hands-on Practice. Tags: python, speech recognition

Article link: https://blog.csdn.net/qq_45978862/article/details/127178451

Environment: Ubuntu 18.04, Python 3.6

Install DeepSpeech. Without a version specifier, pip installs the latest available release:

pip install deepspeech

Alternatively, pin a specific version:

pip install deepspeech~=0.9.3
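
For reference, `~=0.9.3` is pip's "compatible release" specifier: it accepts 0.9.3 and any later 0.9.x release, but not 0.10.0. The sketch below mimics that rule for plain dotted numeric versions only. The function name `is_compatible` is made up for illustration, and version strings with pre-release or post-release suffixes are not handled.

```python
def is_compatible(candidate, spec):
    """Approximate pip's `~=spec` check for plain dotted numeric versions."""
    cand = [int(p) for p in candidate.split(".")]
    base = [int(p) for p in spec.split(".")]
    if len(base) < 2:
        raise ValueError("~= needs at least two version components")
    # Pad the candidate so that "0.9" compares like "0.9.0".
    cand += [0] * (len(base) - len(cand))
    # Every component except the last one in the spec must match exactly...
    same_prefix = cand[: len(base) - 1] == base[:-1]
    # ...and the candidate must not be older than the spec.
    return same_prefix and cand >= base


for version in ["0.9.2", "0.9.3", "0.9.10", "0.10.0"]:
    print(version, is_compatible(version, "0.9.3"))
```

Only 0.9.3 and 0.9.10 pass the check in this loop.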

● First, use wget to download the DeepSpeech model. The latest release, 0.9.3, is used here:

wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm

● Next, use wget to download the sample audio data. The 0.4.1 release is used here:

wget https://github.com/mozilla/DeepSpeech/releases/download/v0.4.1/audio-0.4.1.tar.gz

● Then extract the audio archive:

tar -xvf audio-0.4.1.tar.gz
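
Before running inference, it can help to check what the extracted files contain, since the released English models are meant for 16 kHz, mono, 16-bit PCM audio. The sketch below reads each WAV header using only Python's standard `wave` module. The helper name `describe_wav` is illustrative, and the `audio/` directory is assumed to be the one produced by the tar command above.

```python
import glob
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, bits_per_sample, duration_s) for a WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        bits = wav.getsampwidth() * 8
        duration = wav.getnframes() / rate
    return rate, channels, bits, duration


# Assumes the archive was extracted into ./audio; prints nothing if it was not.
for path in sorted(glob.glob("audio/*.wav")):
    rate, channels, bits, duration = describe_wav(path)
    print(f"{path}: {rate} Hz, {channels} ch, {bits}-bit, {duration:.3f} s")
```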

● Then run the following command:

deepspeech --model deepspeech-0.9.3-models.pbmm  --audio audio/4507-16021-0012.wav

The output is as follows:

Loading model from file deepspeech-0.9.3-models.pbmm
TensorFlow: v2.3.0-6-g23ad988
DeepSpeech: v0.9.3-0-gf2e9c85
2022-10-05 23:10:21.707689: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loaded model in 0.0969s.
Running inference.
why should one hald on the way
Inference took 4.557s for 2.735s audio file.

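The recognized English text is: why should one hald on the way

("hald" is most likely a misrecognition of "halt". The command above was run without --scorer, so no external scorer was applied during decoding.)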
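
The final line of the log above also gives a rough speed measure: dividing inference time by audio duration yields the real-time factor, where values above 1 mean slower than real time. The sketch below parses that line with a regular expression written for the exact wording shown in the log. The function name is illustrative.

```python
import re

LOG_LINE = "Inference took 4.557s for 2.735s audio file."


def real_time_factor(line):
    """Return inference_time / audio_duration parsed from the timing log line."""
    match = re.search(r"Inference took ([\d.]+)s for ([\d.]+)s audio file", line)
    if match is None:
        raise ValueError("line does not look like the inference timing message")
    inference_s, audio_s = (float(g) for g in match.groups())
    return inference_s / audio_s


print(f"real-time factor: {real_time_factor(LOG_LINE):.2f}")  # about 1.67 for this run
```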

The other audio files can be transcribed the same way.
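
Transcription can also be driven from Python, which is convenient for looping over every file in `audio/`. The sketch below assumes the `deepspeech` 0.9.x Python package, whose `Model` class provides `stt()` and `sampleRate()`, and assumes `numpy` is available (it is normally installed alongside the package). The helper names are illustrative, and the audio is assumed to already match the model's sample rate; no resampling is attempted.

```python
import glob
import wave


def read_pcm16(path):
    """Return (sample_rate_hz, raw_bytes) for a 16-bit mono PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError(f"{path}: expected 16-bit mono audio")
        return wav.getframerate(), wav.readframes(wav.getnframes())


def transcribe_all(model_path, pattern):
    """Transcribe every WAV matching `pattern`; returns {path: text}."""
    # Imported here so the WAV helper above works without these packages.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    results = {}
    for path in sorted(glob.glob(pattern)):
        rate, raw = read_pcm16(path)
        if rate != model.sampleRate():
            raise ValueError(f"{path}: {rate} Hz does not match the model")
        # stt() takes the samples as a 16-bit integer array.
        results[path] = model.stt(np.frombuffer(raw, dtype=np.int16))
    return results
```

With the files from this walkthrough, `transcribe_all("deepspeech-0.9.3-models.pbmm", "audio/*.wav")` would return one transcript per extracted WAV file.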

Aside:
The options accepted by the deepspeech command can be listed with --help:

deepspeech --help
usage: deepspeech [-h] --model MODEL [--scorer SCORER] --audio AUDIO [--beam_width BEAM_WIDTH] [--lm_alpha LM_ALPHA]
                  [--lm_beta LM_BETA] [--version] [--extended] [--json]
                  [--candidate_transcripts CANDIDATE_TRANSCRIPTS] [--hot_words HOT_WORDS]

Running DeepSpeech inference.

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL         Path to the model (protocol buffer binary file)
  --scorer SCORER       Path to the external scorer file
  --audio AUDIO         Path to the audio file to run (WAV format)
  --beam_width BEAM_WIDTH
                        Beam width for the CTC decoder
  --lm_alpha LM_ALPHA   Language model weight (lm_alpha). If not specified, use default from the scorer package.
  --lm_beta LM_BETA     Word insertion bonus (lm_beta). If not specified, use default from the scorer package.
  --version             Print version and exits
  --extended            Output string from extended metadata
  --json                Output json from metadata with timestamp of each word
  --candidate_transcripts CANDIDATE_TRANSCRIPTS
                        Number of candidate transcripts to include in JSON output
  --hot_words HOT_WORDS
                        Hot-words and their boosts.
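
These options can also be assembled programmatically. The sketch below builds an argument list for the `deepspeech` command using only flags that appear in the help text, and wraps `subprocess.run` to execute it. The function names and keyword arguments are illustrative, not part of DeepSpeech itself.

```python
import subprocess


def build_command(model, audio, scorer=None, beam_width=None, as_json=False):
    """Build an argv list for the deepspeech CLI from flags in its help text."""
    cmd = ["deepspeech", "--model", model, "--audio", audio]
    if scorer is not None:
        cmd += ["--scorer", scorer]
    if beam_width is not None:
        cmd += ["--beam_width", str(beam_width)]
    if as_json:
        cmd.append("--json")
    return cmd


def run(cmd):
    """Run the command and return whatever it writes to standard output."""
    done = subprocess.run(cmd, stdout=subprocess.PIPE,
                          universal_newlines=True, check=True)
    return done.stdout


cmd = build_command("deepspeech-0.9.3-models.pbmm",
                    "audio/4507-16021-0012.wav", as_json=True)
print(" ".join(cmd))
```

Passing the list to `run(cmd)` executes the same command as typing it in the shell, with `--json` requesting per-word timestamps as described in the help text.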