Liu Yonghao's Notes: Understanding Kaldi and Deploying the CVTE Model

衣晨曦的小娇妻

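Basics

The five text files Kaldi needs when training a model

# utterance id ---> path, e.g. BAC001 /data/train/BAC001.wav
1 wav.scp

# utterance id --> transcript, e.g. BAC001 wo1 shi1 ni3 ba4 ba4 (the transcript may also be written in Chinese characters)
2 text

# utterance id --> speaker (each utterance maps to one speaker)
3 utt2spk

# speaker --> utterance ids (one speaker maps to many utterances)
4 spk2utt

# word ---> phonemes (not phonetic-alphabet symbols). A word with several pronunciations needs one entry per pronunciation, e.g. 市盈率 sh ix4 ing2 l v4
5 lexicon.txt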
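As a rough sketch of how the first, third and fourth files relate, the function below builds wav.scp, utt2spk and spk2utt from a folder of recordings. The layout `<wav_root>/<speaker>/<name>.wav`, the `<speaker>_<name>` utterance ID scheme and the name `make_data_dir` are all my own assumptions for illustration. The text file and lexicon.txt still have to be written by hand.

```shell
#!/bin/sh
# Sketch: build wav.scp, utt2spk and spk2utt for a layout of
# <wav_root>/<speaker>/<name>.wav (layout and ID scheme are assumptions).
make_data_dir() {
  wav_root=$1
  out_dir=$2
  mkdir -p "$out_dir"
  : > "$out_dir/wav.scp"
  : > "$out_dir/utt2spk"
  find "$wav_root" -mindepth 2 -maxdepth 2 -type f -name '*.wav' |
  while IFS= read -r wav; do
    spk=$(basename "$(dirname "$wav")")
    utt="${spk}_$(basename "$wav" .wav)"
    printf '%s %s\n' "$utt" "$wav" >> "$out_dir/wav.scp"
    printf '%s %s\n' "$utt" "$spk" >> "$out_dir/utt2spk"
  done
  # Kaldi expects these files sorted in the C locale.
  LC_ALL=C sort -o "$out_dir/wav.scp" "$out_dir/wav.scp"
  LC_ALL=C sort -o "$out_dir/utt2spk" "$out_dir/utt2spk"
  # Invert utt2spk: one line per speaker followed by all of its utterances.
  awk '{ utts[$2] = utts[$2] " " $1 }
       END { for (s in utts) print s utts[s] }' "$out_dir/utt2spk" |
    LC_ALL=C sort > "$out_dir/spk2utt"
}
```

A call such as make_data_dir data/wav data/test would then fill data/test. Prefixing each utterance ID with its speaker ID keeps utt2spk and spk2utt in a consistent sort order.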

Basic speech recognition pipeline

features-->[  acoustic model (H.fst)   ]-->phonemes-->[   language model (G.fst)   ]--->words

wav.scp     --|
text          |
utt2spk       |--------->kaldi ----->L.fst
spk2utt       |
lexicon.txt --|

Installing Kaldi

Install git

sudo apt install git

Clone Kaldi

git clone https://github.com/kaldi-asr/kaldi.git

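What the directories under kaldi are for

# The ones used most often are src, tools and egs

src (source code): contains Kaldi's core source code, which implements the building blocks of speech recognition such as acoustic modeling, feature extraction and the decoders.

tools: contains the tools and third-party dependencies Kaldi relies on, used for data preprocessing, feature extraction, model training and so on. For example, sph2pipe handles the SPH audio format. sox, used for audio format conversion and processing, is installed through the system package manager rather than built here.

egs (examples): contains example recipes and tutorials that show how to build a speech recognition system with Kaldi. A recipe usually covers data preparation, feature extraction, model training and decoding, which helps you understand and use Kaldi.


cmake: contains the CMake scripts and related files for building Kaldi. CMake is a cross-platform build tool that automates the build process.

scripts: contains script files shared across recipes. The data preparation, feature extraction, training and decoding scripts that the recipes call live in egs/wsj/s5/steps and egs/wsj/s5/utils.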

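Preparation before running Kaldi (this step sets up the wsj recipe and can be skipped)

Pick any recipe under egs, go into it and then into its s5 folder, which contains path.sh. Open path.sh with vim, change the path to your local Kaldi path, then source it with . ./path.sh to set the environment variables.

cd kaldi      # switch to the kaldi directory
pwd           # show the path: /root/kaldi

cd egs/wsj/s5  # switch to the s5 folder
ls        #cmd.sh  conf  local  path.sh  RESULTS  rnnlm  run.sh  steps  utils
vim path.sh

# then change the path inside path.sh
export KALDI_ROOT=/root/kaldi

# then source path.sh to load the environment variables
. ./path.sh

# Note: sh path.sh would run the script in a child shell, and the variables it exports would be gone as soon as that shell exits. Sourcing it with . runs it in the current shell, so the variables stay set.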
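One detail worth checking here is whether the variables set by path.sh actually reach the shell you are working in. The small demo below uses a throwaway script and a made-up variable name (DEMO_KALDI_ROOT, not something Kaldi defines) to compare running a script with sh against sourcing it with a leading dot.

```shell
#!/bin/sh
# Demo with a throwaway script; DEMO_KALDI_ROOT is a made-up variable name.
unset DEMO_KALDI_ROOT
demo_dir=$(mktemp -d)
echo 'export DEMO_KALDI_ROOT=/root/kaldi' > "$demo_dir/path.sh"

sh "$demo_dir/path.sh"                 # runs in a child process
after_sh=${DEMO_KALDI_ROOT:-unset}

. "$demo_dir/path.sh"                  # runs in the current shell
after_source=${DEMO_KALDI_ROOT:-unset}

echo "after sh path.sh:  $after_sh"       # prints: unset
echo "after . path.sh:   $after_source"   # prints: /root/kaldi
rm -rf "$demo_dir"
```

Only the sourced run leaves the variable set, which is why Kaldi's recipe scripts themselves load path.sh with a leading dot.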

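1 Install the missing packages

# Go into the tools folder under the kaldi directory
# Check which dependencies still need to be installed
extras/check_dependencies.sh

# Output (everything listed here has to be installed):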
extras/check_dependencies.sh: automake is not installed.
extras/check_dependencies.sh: autoconf is not installed.
extras/check_dependencies.sh: sox is not installed.
extras/check_dependencies.sh: gfortran is not installed
extras/check_dependencies.sh: neither libtoolize nor glibtoolize is installed
extras/check_dependencies.sh: subversion is not installed
extras/check_dependencies.sh: python2.7 is not installed
extras/check_dependencies.sh: Intel MKL does not seem to be installed.
 ... Run extras/install_mkl.sh to install it. Some distros (e.g., Ubuntu 20.04) provide
 ... a version of MKL via the package manager, but verify that it is up-to-date.
 ... You can also use other matrix algebra libraries. For information, see:
 ...   http://kaldi-asr.org/doc/matrixwrap.html
extras/check_dependencies.sh: Some prerequisites are missing; install them using the command:
  sudo apt-get install automake autoconf sox gfortran libtool subversion python2.7
  
  # Then install them one at a time
  with sudo apt-get install <package-name>, or all at once with the command printed above

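Problem encountered while installing MKL

# extras/check_dependencies.sh recommends this script for installing the missing MKL library
sh extras/install_mkl.sh
# Error: extras/install_mkl.sh: 13: set: Illegal option -o pipefail

# Fix
On Ubuntu, sh is dash, which does not support set -o pipefail, so the script cannot be run with sh. Sourcing it with ". file.sh", i.e. . extras/install_mkl.sh, worked for me because my login shell is bash. Running it as bash extras/install_mkl.sh is the safer choice, because a sourced script that calls exit also closes your current shell.

# Run the check again
extras/check_dependencies.sh

# Output
extras/check_dependencies.sh: all OK.

This means the dependencies are set up correctly.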

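2 Still in the tools folder, build with make

make

The build ended with an error

g++: fatal error: Killed signal terminated program cc1plus
# Cause: not enough memory. The cloud server only gets 2 GB in no-GPU mode, so boot it in GPU mode and run the build again

# It takes a while; the messages below at the end mean the build succeeded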
Warning: IRSTLM is not installed by default anymore. If you need IRSTLM
Warning: use the script extras/install_irstlm.sh
All done OK.

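3. Go into kaldi/src, check that the external libraries are installed, then compile

./configure --shared

# The long output includes the lines below. They mean Kaldi has been configured successfully (it is not compiled yet); you still have to run the build commands they list
# <NCPU> is the number of parallel build jobs you can afford. If unsure, use the smaller of the number of CPUs and the amount of RAM in GB divided by 2 to stay within safe limits. make -j without a number may not limit the number of parallel jobs at all and can overwhelm even a powerful workstation, because the Kaldi build is highly parallelized.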
Kaldi has been successfully configured. To compile:

  make -j clean depend; make -j <NCPU>
  
  # My rented server has a 12-core CPU, so I used 6 for <NCPU>, i.e. I ran the two commands below in turn
  make -j clean depend
  make -j 6
  
  # After a successful build you will see:
  make[1]: Leaving directory '/root/kaldi/src/latbin'
Done
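The rule of thumb printed by configure (the smaller of the CPU count and half the RAM in GB) can be sketched as a small helper. The function name pick_ncpu is my own, and reading the machine's values through nproc and /proc/meminfo assumes a Linux host.

```shell
#!/bin/sh
# pick_ncpu <cpu_count> <ram_gb>: print min(cpu_count, ram_gb / 2), at least 1.
pick_ncpu() {
  cpus=$1
  half_ram=$(( $2 / 2 ))
  if [ "$half_ram" -lt "$cpus" ]; then n=$half_ram; else n=$cpus; fi
  [ "$n" -ge 1 ] || n=1
  echo "$n"
}

# On Linux, feed it the real values (nproc and /proc/meminfo assumed to exist).
if command -v nproc >/dev/null 2>&1 && [ -r /proc/meminfo ]; then
  ram_gb=$(awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo)
  echo "suggested: make -j $(pick_ncpu "$(nproc)" "$ram_gb")"
fi
```

For example, pick_ncpu 12 16 prints 8, while a 2 GB machine gets 1 regardless of its core count, which matches the out-of-memory failure seen earlier on the 2 GB instance.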

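4. Test whether Kaldi is set up correctly

Go into kaldi/egs/yesno/s5 and run the following command

# Downloads the small yesno dataset, trains a monophone model on it and decodes the test set
./run.sh

# The following lines at the very end mean the run succeeded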
local/score.sh --cmd utils/run.pl data/test_yesno exp/mono0a/graph_tgpr exp/mono0a/decode_test_yesno
local/score.sh: scoring with word insertion penalty=0.0,0.5,1.0
%WER 0.00 [ 0 / 232, 0 ins, 0 del, 0 sub ] exp/mono0a/decode_test_yesno/wer_10_0.0
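The scoring step writes one WER line per combination of language-model weight and word insertion penalty. As a minimal sketch, the helper below picks the line with the lowest WER from such lines on standard input. The name best_wer is my own choice here; Kaldi also ships utils/best_wer.sh for the same purpose.

```shell
#!/bin/sh
# best_wer: read "%WER ..." lines on stdin and print the one with the lowest WER.
# Lines that do not start with %WER are ignored.
best_wer() {
  awk '$1 == "%WER" && (best == "" || $2 + 0 < min) { min = $2 + 0; best = $0 }
       END { if (best != "") print best }'
}

# Possible use inside egs/yesno/s5 after run.sh has finished:
#   cat exp/mono0a/decode_test_yesno/wer_* | best_wer
```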

# Blog post on deploying CVTE

https://blog.csdn.net/Ryan0828/article/details/121185749?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522169598808016800215030410%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fall.%2522%257D&request_id=169598808016800215030410&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~all~first_rank_ecpm_v1~rank_v31_ecpm-1-121185749-null-null.142^v94^insert_down1&utm_term=%E5%9C%A8Kaldi%E4%B8%8A%E9%83%A8%E7%BD%B2CVTE%E6%A8%A1%E5%9E%8B&spm=1018.2226.3001.4187

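Deploying the CVTE model

1. Download the model and place it in Kaldi

# Download the V2 model from the link below, then unpack it under egs in the kaldi directory, i.e. egs/cvte, and make sure kaldi/egs/cvte/s5 exists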

http://kaldi-asr.org/models/m2

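2. Prepare the audio files to test

# 1 Required format: 16-bit depth, 16000 Hz sample rate, mono, WAV (it can be recorded with Adobe Audition). Detailed properties of one file are shown below (these files were probably converted with sox)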


sox --info ~/autodl-tmp/data/wav/001.wav

Input File     : '/root/autodl-tmp/data/wav/001.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:10.38 = 166000 samples ~ 778.125 CDDA sectors
File Size      : 332k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM

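# 2 Put the test .wav files under `/egs/cvte/s5/data/wav/00030/`. Since I do not have an audio format conversion tool yet, I use wav files from the thchs30 dataset as temporary stand-ins for testing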
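For recordings that are not already in the required format, sox itself can do the conversion. The sketch below wraps it in a hypothetical convert_dir helper (my own name) that writes 16 kHz, mono, 16-bit signed PCM WAV files. Setting DRY_RUN=1 only prints the sox commands, which is handy for checking paths before anything is written.

```shell
#!/bin/sh
# convert_dir <src_dir> <dst_dir>: convert every file in src_dir to
# 16 kHz, mono, 16-bit signed PCM WAV in dst_dir using sox.
# Set DRY_RUN=1 to print the sox commands instead of running them.
convert_dir() {
  src=$1
  dst=$2
  mkdir -p "$dst"
  for f in "$src"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    out="$dst/${name%.*}.wav"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "sox $f -r 16000 -c 1 -b 16 -e signed-integer $out"
    else
      sox "$f" -r 16000 -c 1 -b 16 -e signed-integer "$out"
    fi
  done
}
```

The output of sox --info on a converted file should then match the listing above: 1 channel, 16000 Hz, 16-bit signed integer PCM.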

3. Configure CVTE

# 1 Copy steps and utils from egs/wsj/s5 into egs/cvte/s5

# 2 Open utils/lang/check_phones_compatible.sh and comment out the exit 1 inside its if statement


# check if the files exist or not
if [ ! -f $table_first ]; then
  if [ ! -f $table_second ]; then
    echo "$0: Error! Both of the two phones-symbol tables are absent."
    echo "Please check your command"
    # exit 1;

4. Run run.sh

# In a terminal, cd into egs/cvte/s5 and run:
./run.sh

5. Errors when running ./run.sh and how to fix them

# Running ./run.sh produces the following error
bash: ./run.sh: Permission denied

# Check the file permissions
root@autodl-container-cbb611a852-a643331c:~/kaldi/egs/cvte/s5# ls -l
total 24
-rw-r--r--  1 root root  954 Jun 20  2017 cmd.sh
drwxr-xr-x  2 root root   24 Sep 29 19:50 conf
drwxr-xr-x  5 root root   64 Sep 29 19:50 data
drwxr-xr-x  3 root root   19 Sep 29 19:50 exp
drwxr-xr-x  3 root root   18 Sep 29 20:50 fbank
drwxr-xr-x  2 root root    6 Sep 29 20:50 local
-rw-r--r--  1 root root  374 Jun 20  2017 path.sh
-rw-r--r--  1 root root  937 Jun 20  2017 run.sh
drwxr-xr-x 19 root root 4096 Sep 30 12:42 steps
drwxr-xr-x 11 root root 4096 Sep 30 12:43 utils

 # So run.sh lacks the execute permission x. Change the file mode with:
 chmod 755 run.sh
 
 # Check again: run.sh now has the execute permission x
 root@autodl-container-cbb611a852-a643331c:~/kaldi/egs/cvte/s5# chmod 755 run.sh
root@autodl-container-cbb611a852-a643331c:~/kaldi/egs/cvte/s5# ls -l
total 24
-rw-r--r--  1 root root  954 Jun 20  2017 cmd.sh
drwxr-xr-x  2 root root   24 Sep 29 19:50 conf
drwxr-xr-x  5 root root   64 Sep 29 19:50 data
drwxr-xr-x  3 root root   19 Sep 29 19:50 exp
drwxr-xr-x  3 root root   18 Sep 29 20:50 fbank
drwxr-xr-x  2 root root    6 Sep 29 20:50 local
-rw-r--r--  1 root root  374 Jun 20  2017 path.sh
-rwxr-xr-x  1 root root  937 Jun 20  2017 run.sh
drwxr-xr-x 19 root root 4096 Sep 30 12:42 steps
drwxr-xr-x 11 root root 4096 Sep 30 12:43 utils

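# Run ./run.sh again; it now fails with:
./run.sh: line 14: steps/make_fbank.sh: Permission denied
# make_fbank.sh in the steps folder has no execute permission, so go into steps and change its mode
cd steps
chmod 755 make_fbank.sh
# Back in egs/cvte/s5, run ./run.sh again: many more scripts lack the execute permission. Fix them one by one; once all permissions are in place, ./run.sh prints the following error:
Error!Both of the two phones-symbol tables are absent.
Please check your command
# This error appears because the CVTE authors did not ship phones.txt. It does not affect the result and can be ignored

# The output below shows that it is running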
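Fixing the permissions one script at a time gets tedious. As a sketch, the helper below (the name make_scripts_executable is my own) adds the execute bit to every .sh, .pl and .py file under the directories it is given in a single pass.

```shell
#!/bin/sh
# make_scripts_executable <dir>...: add the execute bit to every
# .sh, .pl and .py file below the given directories.
make_scripts_executable() {
  find "$@" -type f \( -name '*.sh' -o -name '*.pl' -o -name '*.py' \) \
    -exec chmod +x {} +
}

# Possible use from egs/cvte/s5:
#   make_scripts_executable steps utils local
```

Other files, such as configuration or text files, keep their original mode because only the three script extensions are matched.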
root@autodl-container-cbb611a852-a643331c:~/kaldi/egs/cvte/s5# ./run.sh
steps/make_fbank.sh --nj 1 --cmd run.pl data/fbank/test exp/make_fbank/test fbank/test
steps/make_fbank.sh: moving data/fbank/test/feats.scp to data/fbank/test/.backup
utils/validate_data_dir.sh: Successfully validated data-directory data/fbank/test
steps/make_fbank.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_fbank.sh: Succeeded creating filterbank features for test
steps/compute_cmvn_stats.sh data/fbank/test exp/fbank_cmvn/test fbank/test
Succeeded creating CMVN stats for test
steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 --nj 1 --num-threads 1 --cmd run.pl --mem 64G --iter final --frames-per-chunk 50 exp/chain/tdnn/graph data/fbank/test exp/chain/tdnn/decode_test
utils/lang/check_phones_compatible.sh: Error! Both of the two phones-symbol tables are absent.
Please check your command
grep: exp/chain/tdnn/phones.txt: No such file or directory
grep: exp/chain/tdnn/graph/phones.txt: No such file or directory
steps/nnet3/decode.sh: feature type is raw

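# Another message follows: if you want decode.sh to score the output, you have to provide local/score.sh yourself
Not scoring because local/score.sh does not exist or not executable.

# Fix: take the local folder from the hkust recipe, copy it into cvte, and give score.sh the execute permission

# Finally ./run.sh succeeds!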

+ steps/score_kaldi.sh --cmd 'run.pl --mem 64G' data/fbank/test exp/chain/tdnn/graph exp/chain/tdnn/decode_test
steps/score_kaldi.sh --cmd run.pl --mem 64G data/fbank/test exp/chain/tdnn/graph exp/chain/tdnn/decode_test
steps/score_kaldi.sh: scoring with word insertion penalty=0.0,0.5,1.0
score confidence and timing with sclite
Decoding done.
root@autodl-cont

The results can be read from any of the log files under ~/kaldi/egs/cvte/s5/exp/chain/tdnn/decode_test/scoring_kaldi/penalty_1.0/log
# The results look like this:



~/kaldi/egs/cvte/s5/exp/chain/tdnn/decode_test/scoring_kaldi/penalty_1.0/log# lattice-scale --inv-acoustic-scale=10 "ark:gunzip -c exp/chain/tdnn/decode_test/lat.*.gz|" ark:- | lattice-add-penalty --word-ins-penalty=1.0 ark:- ark:- | lattice-best-path --word-symbol-table=exp/chain/tdnn/graph/words.txt ark:- ark,t:- | utils/int2sym.pl -f 2- exp/chain/tdnn/graph/words.txt | cat > exp/chain/tdnn/decode_test/scoring_kaldi/penalty_1.0/10.txt
# Started at Sat Sep 30 15:03:26 CST 2023
#
lattice-add-penalty --word-ins-penalty=1.0 ark:- ark:-
lattice-scale --inv-acoustic-scale=10 'ark:gunzip -c exp/chain/tdnn/decode_test/lat.*.gz|' ark:-
lattice-best-path --word-symbol-table=exp/chain/tdnn/graph/words.txt ark:- ark,t:-
LOG (lattice-scale[5.5.1074~1-71f3]:main():lattice-scale.cc:107) Done 10 lattices.
LOG (lattice-best-path[5.5.1074~1-71f3]:main():lattice-best-path.cc:99) For utterance CVTE201703_00030_165722_1175, best cost 221.58 + -1062.05 = -840.466 over 452 frames.
CVTE201703_00030_165722_1175 据 楼主 老婆 说 楼主 昨天 家族 聚会 喝 多 了 回家 路上 大脑 和面 跟 电线杆 表白 了 一个 多 小时
LOG (lattice-best-path[5.5.1074~1-71f3]:main():lattice-best-path.cc:99) For utterance CVTE201703_00030_165740_2562, best cost 174.463 + -909.272 = -734.809 over 379 frames.
CVTE201703_00030_165740_2562 因为 没 捞 了 不少 我家 里 经常 来往 的 人 也 都是 搞 煤矿 的 基本上 现在 都 转行 了
LOG (lattice-best-path[5.5.1074~1-71f3]:main():lattice-best-path.cc:99) For utterance CVTE201703_00030_165754_5069, best cost 163.572 + -744.388 = -580.816 over 298 frames.
CVTE201703_00030_165754_5069 为啥 叫 皇上 呢 因为 那时候 凡是 公司 聚餐 行政 都 要 问 我 想 吃 什么
LOG (lattice-best-path[5.5.1074~1-71f3]:main():lattice-best-path.cc:99) For utterance CVTE201703_00030_165809_2685, best cost 166.568 + -850.329 = -683.761 over 303 frames.
CVTE201703_00030_165809_2685 一旦 有 什么 问题 手机 马上 就会 报警 然后 系统 自动 停机 等 解决 故障 之后 再开 机
LOG (lattice-add-penalty[5.5.1074~1-71f3]:main():lattice-add-penalty.cc:62) Done adding word insertion penalty to 10 lattices.
LOG (lattice-best-path[5.5.1074~1-71f3]:main():lattice-best-path.cc:99) For utterance CVTE201703_00030_165830_5107, best cost 175.114 + -658.104 = -482.99 over 260 frames.
CVTE201703_00030_165830_5107 首先 你 说 沈 大人 是 这个 就 不符合 按 答 组 的 情况 只能 去 做 天 猫
LOG (lattice-best-path[5.5.1074~1-71f3]:main():lattice-best-path.cc:99) For utterance CVTE201703_00030_165847_5561, best cost 146.185 + -661.788 = -515.602 over 247 frames.
CVTE201703_00030_165847_5561 还有 就是 几年 同学 不 联系 微信 问 在 不在 就让 你 帮忙 刷 好评
LOG (lattice-best-path[5.5.1074~1-71f3]:main():lattice-best-path.cc:99) For utterance CVTE201703_00030_165907_3088, best cost 169.244 + -791.803 = -622.559 over 307 frames.
CVTE201703_00030_165907_3088 读 硕 一般 只要 有 学校 录取通知书 签证 肯定 下来 申请 学校 还是 得 靠 你 自己 啊
LOG (lattice-best-path[5.5.1074~1-71f3]:main():lattice-best-path.cc:99) For utterance CVTE201703_00030_165916_7980, best cost 99.3545 + -444.942 = -345.587 over 183 frames.
CVTE201703_00030_165916_7980 我 认识 一个 叔叔 辈 从前 都是 老实巴交 的 好好先生
LOG (lattice-best-path[5.5.1074~1-71f3]:main():lattice-best-path.cc:99) For utterance CVTE201703_00030_165929_3456, best cost 150.684 + -765.872 = -615.188 over 290 frames.
CVTE201703_00030_165929_3456 这样 即使 有事 故 发生 冷却 系统 停止 工作 断电 这里 仍然 会 保持 在 零下
LOG (lattice-best-path[5.5.1074~1-71f3]:main():lattice-best-path.cc:99) For utterance CVTE201703_00030_165942_5013, best cost 144.939 + -624.515 = -479.577 over 226 frames.
CVTE201703_00030_165942_5013 关于 还款 汇率 希望 大家 不要 被 误导 当然 这个 火鸡 的 答案 并不 对
LOG (lattice-best-path[5.5.1074~1-71f3]:main():lattice-best-path.cc:124) Overall cost per frame is -2.00386 = 0.547268 [graph] + -2.55112 [acoustic] over 2945 frames.
LOG (lattice-best-path[5.5.1074~1-71f3]:main():lattice-best-path.cc:128) Done 10 lattices, failed for 0
# Accounting: time=0 threads=1
# Ended (code 0) at Sat Sep 30 15:03:26 CST 2023, elapsed time 0 seconds
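The command at the top of this log also writes the recognized text to exp/chain/tdnn/decode_test/scoring_kaldi/penalty_1.0/10.txt, one utterance per line with the words separated by spaces. As a small sketch, the helper below (join_words is my own name) glues the words back together so each hypothesis reads as a normal Chinese sentence.

```shell
#!/bin/sh
# join_words: read "<utt-id> w1 w2 ..." lines on stdin and
# print "<utt-id><TAB>w1w2..." with the spaces between words removed.
join_words() {
  awk '{ id = $1; $1 = ""; gsub(/ /, ""); print id "\t" $0 }'
}

# Possible use from egs/cvte/s5:
#   join_words < exp/chain/tdnn/decode_test/scoring_kaldi/penalty_1.0/10.txt
```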