Kaldi Environment Setup and AISHELL Practice

Code source: kaldi-asr/kaldi: kaldi-asr/kaldi is the official location of the Kaldi project. (github.com)

Official documentation: Kaldi: The build process (how Kaldi is compiled) (kaldi-asr.org)

I. Environment Setup

Kaldi is not end-user software and has no installer package. Installing Kaldi means compiling its code and preparing the necessary tools and runtime environment. Because Kaldi's example recipes rely on shell scripts and pipes, the best environment to run them is a UNIX-like system; here we use Ubuntu as the example.

1. Install the required system development libraries

Before compiling Kaldi, check for and install the system development libraries it depends on, including g++, LLVM, Clang, zlib, python, gawk, perl, wget, git, libtool, and so on. You can check the exact dependencies by running the extras/check_dependencies.sh script from Kaldi's tools directory. If some libraries are missing, the script prints hints; simply install whatever it reports as missing. Note in particular that Kaldi uses Intel MKL as its default linear algebra library; if MKL is not installed, the script warns about it. If you plan to use a different linear algebra library (such as ATLAS or OpenBLAS), you can ignore that warning.

cd kaldi/tools/
mkdir -p python/ # check_dependencies.sh creates symlinks under the python/ directory
extras/check_dependencies.sh

extras/check_dependencies.sh: python2.7 present
extras/check_dependencies.sh: python3 present
extras/check_dependencies.sh: Configuring python
extras/check_dependencies.sh: ... If you really want to avoid this, add an empty file /home/lnj524/module/kaldi/tools/python/.use_default_python and run this script again.
extras/check_dependencies.sh: ... python2.7 found, making it default (python, python2, python2.7)
extras/check_dependencies.sh: ... python3 found, making symlink (python3)
extras/check_dependencies.sh: all OK.
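
If the script reports missing packages, it also prints the exact command to install them. On Ubuntu the fix typically looks like the following (package names are illustrative; follow the script's own output):

sudo apt-get install g++ automake autoconf sox gfortran libtool subversion zlib1g-dev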

2. Install the third-party tool dependencies

If you can reach GitHub, ATLAS headers, OpenFst, SCTK, sph2pipe, and CUB are downloaded, compiled, and installed automatically when you run make. If you cannot reach GitHub, download the corresponding tarballs into the tools directory, adjust the version numbers of the affected libraries in tools/Makefile to match, and then run make. Tarball link: https://pan.baidu.com/s/1la0nmku2zBxa1lRKiUTWcw?pwd=flho

The relevant version variables in tools/Makefile are shown below:

# SHELL += -x

CXX ?= g++
CC ?= gcc        # used for sph2pipe
# CXX = clang++  # Uncomment these lines...
# CC = clang     # ...to build with Clang.

WGET ?= wget

OPENFST_VERSION ?= 1.7.2
CUB_VERSION ?= 1.8.0
# No '?=', since there exists only one version of sph2pipe.
SPH2PIPE_VERSION = 2.5
# SCTK official repo does not have version tags. Here's the mapping:
# 2.4.9 = 659bc36; 2.4.10 = d914e1b; 2.4.11 = 20159b5.
SCTK_GITHASH = 2.4.12

cd kaldi/tools/
make
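
Because these variables are assigned with '?=', you can also override a version on the make command line instead of editing the file; a sketch, assuming the matching tarball is already in tools/:

make OPENFST_VERSION=1.7.2 -j 4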

3. Install language model tools

IRSTLM / SRILM / Kaldi_lm are three different language model toolkits; different recipes use different ones.

cd kaldi/tools/
extras/install_irstlm.sh 
extras/install_srilm.sh
extras/install_kaldi_lm.sh

Pitfalls encountered

1.install_irstlm.sh

# extras/install_irstlm.sh
# The server cannot reach github
# In install_irstlm.sh, change https://github.com/irstlm-team/irstlm.git to https://gitee.com/chengsili/irstlm.git
# After installation completes, load the environment variables
source ../tools/env.sh
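
The same edit can be scripted with sed; a sketch, assuming the GitHub URL appears verbatim in the script:

sed -i 's|https://github.com/irstlm-team/irstlm.git|https://gitee.com/chengsili/irstlm.git|' extras/install_irstlm.sh
extras/install_irstlm.sh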

2.extras/install_srilm.sh

# extras/install_srilm.sh
# Install the liblbfgs dependency before installing SRILM
extras/install_liblbfgs.sh
# SRILM is commercial software, not free; you must register on the SRILM website and accept the license agreement before downloading.
# Download the tarball https://pan.baidu.com/s/1la0nmku2zBxa1lRKiUTWcw?pwd=flho into the tools/ directory
# Extract and rename
tar -xvzf srilm-1.7.3.tar.gz
mv srilm-1.7.3/ srilm/
# Copy the shell script below into install_srilm.sh, then run it
extras/install_srilm.sh
# After installation completes, load the environment variables
source ../tools/env.sh

#!/usr/bin/env bash

current_path=`pwd`
current_dir=`basename "$current_path"`

if [ "tools" != "$current_dir" ]; then
    echo "You should run this script in tools/ directory!!"
    exit 1
fi

if [ ! -d liblbfgs-1.10 ]; then
    echo Installing libLBFGS library to support MaxEnt LMs
    bash extras/install_liblbfgs.sh || exit 1
fi

! command -v gawk > /dev/null && \
   echo "GNU awk is not installed so SRILM will probably not work correctly: refusing to install" && exit 1;

# Skip SRILM download and extraction
# if [ ! -f srilm.tgz ] && [ ! -f srilm.tar.gz ] && [ ! -d srilm ]; then
#   if [ $# -ne 4 ]; then
#       echo "SRILM download requires some information about you"
#       echo
#       echo "Usage: $0 <name> <organization> <email> <address>"
#       exit 1
#   fi

#   srilm_url="http://www.speech.sri.com/projects/srilm/srilm_download2.php"
#   post_data="file=1.7.3&name=$1&org=$2&email=$3&address=$4&license=on"

#   if ! wget --post-data "$post_data" -O ./srilm.tar.gz "$srilm_url"; then
#       echo 'There was a problem downloading the file.'
#       echo 'Check your internet connection and try again.'
#       exit 1
#   fi

#   if [ ! -s srilm.tar.gz ]; then
#       echo 'The file is empty. There was a problem downloading the file.'
#       exit 1
#   fi
# fi

mkdir -p srilm
cd srilm

# Skip extraction of SRILM package
# if [ -f ../srilm.tgz ]; then
#     tar -xvzf ../srilm.tgz || exit 1 # Old SRILM format
# elif [ -f ../srilm.tar.gz ]; then
#     tar -xvzf ../srilm.tar.gz || exit 1 # Changed format type from tgz to tar.gz
# fi

# Ensure SRILM directory exists
# if [ ! -d srilm ]; then
#     echo 'The SRILM directory does not exist. There was a problem with the extraction.'
#     exit 1
# fi

# if [ ! -f RELEASE ]; then
#     echo 'The file RELEASE does not exist. There was a problem extracting.'
#     exit 1
# fi

major=`gawk -F. '{ print $1 }' RELEASE`
minor=`gawk -F. '{ print $2 }' RELEASE`
micro=`gawk -F. '{ print $3 }' RELEASE`

if [ $major -le 1 ] && [ $minor -le 7 ] && [ $micro -le 1 ]; then
  echo "Detected version 1.7.1 or earlier. Applying patch."
  patch -p0 < ../extras/srilm.patch
fi

# set the SRILM variable in the top-level Makefile to this directory.
cp Makefile tmpf

cat tmpf | gawk -v pwd=`pwd` '/SRILM =/{printf("SRILM = %s\n", pwd); next;} {print;}' \
  > Makefile || exit 1
rm tmpf

mtype=`sbin/machine-type`

echo HAVE_LIBLBFGS=1 >> common/Makefile.machine.$mtype
grep ADDITIONAL_INCLUDES common/Makefile.machine.$mtype | \
    sed 's|$| -I$(SRILM)/../liblbfgs-1.10/include|' \
    >> common/Makefile.machine.$mtype

grep ADDITIONAL_LDFLAGS common/Makefile.machine.$mtype | \
    sed 's|$| -L$(SRILM)/../liblbfgs-1.10/lib/ -Wl,-rpath -Wl,$(SRILM)/../liblbfgs-1.10/lib/|' \
    >> common/Makefile.machine.$mtype

make || exit

cd ..
(
  [ ! -z "${SRILM}" ] && \
    echo >&2 "SRILM variable is aleady defined. Undefining..." && \
    unset SRILM

  [ -f ./env.sh ] && . ./env.sh

  [ ! -z "${SRILM}" ] && \
    echo >&2 "SRILM config is already in env.sh" && exit

  wd=`pwd`
  wd=`readlink -f $wd || pwd`

  echo "export SRILM=$wd/srilm"
  dirs="\${PATH}"
  for directory in $(cd srilm && find bin -type d ) ; do
    dirs="$dirs:\${SRILM}/$directory"
  done
  echo "export PATH=$dirs"
) >> env.sh

echo >&2 "Installation of SRILM finished successfully"
echo >&2 "Please source the tools/env.sh in your path.sh to enable it"

3.install_kaldi_lm.sh

# extras/install_kaldi_lm.sh
https://github.com/danpovey/kaldi_lm.git

# The server cannot reach github
# In install_kaldi_lm.sh, change https://github.com/danpovey/kaldi_lm.git to https://gitee.com/chengsili/kaldi_lm.git
# After installation completes, load the environment variables
source ../tools/env.sh

4. Install a linear algebra library

Pick any one of OpenBLAS / MKL / ATLAS / CLAPACK; I installed OpenBLAS.

# If you can access github
cd kaldi/tools/
extras/install_openblas.sh
# If you cannot access github:
# per install_openblas.sh, the version to download is OPENBLAS_VERSION=0.3.13; compile it manually
# Tarball link: https://pan.baidu.com/s/1la0nmku2zBxa1lRKiUTWcw?pwd=flho
# Download the tarball into the tools directory
cd kaldi/tools/
tar xzf OpenBLAS-0.3.13.tar.gz
cd OpenBLAS-0.3.13
make
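
One caveat with the manual build: the configure step in section 6 points --openblas-root at an install/ subdirectory, so after make you should also install into that prefix (mirroring what extras/install_openblas.sh does; the PREFIX path here is chosen to match the configure line used later):

make PREFIX=$(pwd)/install install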

5. Install CUDA

For the detailed procedure, see: ESPnet environment setup and practice (CSDN blog)

The CUDA version used here is the same as the one used for ESPnet.

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0
# Create a conda virtual environment
conda create -n kaldi python=3.8
conda activate kaldi
# Install the CUDA toolkit (from the pytorch channel)
conda install cudatoolkit=11.6 -c pytorch -c conda-forge
# Install cuDNN
conda install cudnn

6. Compile Kaldi

cd kaldi/src
./configure --help

-------------------------------------------------------------------------------------
Configuration options:
  --help                Display this help message and exit
  --version             Display the version of 'configure' and exit
  --static              Build and link against static libraries [default=no]
  --shared              Build and link against shared libraries [default=no]
  --use-cuda            Build with CUDA [default=yes]
  --with-cudadecoder    Build with CUDA decoder [default=yes]
  --cudatk-dir=DIR      CUDA toolkit directory
  --cuda-arch=FLAGS     Override the default CUDA_ARCH flags. See:
         https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#nvcc-examples.
  --use-rocm            Build with ROCm
  --rocm-dir=DIR        ROCM directory
  --rocm-targets=TGTS   Comma separated list of GPU targets to target through ROCm
  --debug-level=N       Use assertion level 0 (disabled), 1, or 2 [default=1]
  --double-precision    Build with BaseFloat set to double if yes [default=no],
                        mostly useful for testing purposes.
  --static-fst          Build with static OpenFst libraries [default=no]
  --fst-root=DIR        OpenFst root directory [default=../tools/openfst/]
  --fst-version=STR     OpenFst version string
  --mathlib=LIB         Math library [default=MKL|OPENBLAS, based on platform]
                        Supported libraries: ATLAS, MKL, CLAPACK, OPENBLAS.
  --static-math         Build with static math libraries [default=no]
  --atlas-root=DIR      ATLAS root directory [default=../tools/ATLAS/]
  --openblas-root=DIR   OpenBLAS root directory
  --clapack-root=DIR    CLAPACK root directory
  --mkl-root=DIR        MKL root directory
  --mkl-libdir=DIR      MKL library directory
  --omp-libdir=DIR      OpenMP directory
  --speex-root=DIR      SPEEX root directory
  --speex-libdir=DIR    SPEEX library directory
  --speex-incdir=DIR    SPEEX include directory
  --static-speex        Build with static speex libraries [default=no]
  --host=HOST           Host triple in the format 'cpu-vendor-os'
                        If provided, it is prepended to all toolchain programs.
  --android-incdir=DIR  Android include directory
  --enable-kenlm        enable kenlm runtime library installed in "../tools/kenlm"
                        via extras/install_kenlm_query_only.sh, [default=no]
  --enable-tcmalloc     enable tcmalloc library installed in "../tools/gperftools"
                        via extras/install_tcmalloc.sh, [default=no]

Full configure invocation:

# Note: change --cudatk-dir and --openblas-root to your own paths
# --cuda-arch=-arch=sm_86 must match your GPU architecture
./configure --shared --use-cuda=yes --cudatk-dir=/home/lnj524/module/cuda/cuda-11.6/ --cuda-arch=-arch=sm_86 --mathlib=OPENBLAS --openblas-root=/home/lnj524/module/kaldi/tools/OpenBLAS-0.3.13/install/

Compile:

make depend -j 8 # work out all of Kaldi's inter-module dependencies; the -j 8 flag lets make run 8 parallel jobs to speed this up
make -j 8        # compile Kaldi
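
Optionally, you can run Kaldi's test suite afterwards as a smoke test of the build (this takes a while):

make test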

II. The AISHELL Dataset

cd ../kaldi/egs/aishell
cat README.txt 

Aishell is an open Chinese Mandarin speech database published by Beijing Shell Shell Technology Co.,Ltd.

400 people from different accent areas in China are invited to participate in the recording, which is conducted in a quiet indoor environment using high fidelity microphone and downsampled to 16kHz. The manual transcription accuracy is above 95%, through professional speech annotation and strict quality inspection. The data is free for academic use. The corpus contains 170 hours of speech, and is divided into training (85%), development (10%) and testing (5%) sets. The development set is used to tune the hyperparameters in training.

The database can be downloaded from openslr:
http://www.openslr.org/33/

This folder contains two subfolders:
s5: a speech recognition recipe
v1: a speaker recognition recipe

For more details, please visit:
http://www.aishelltech.com/kysjcp

Below we use s5, the speech recognition recipe.

1. Data preparation

cd kaldi/egs/aishell/s5

# 1. Download and unpack the data; downloading directly from www.openslr.org/resources/33 can be slow
local/download_and_untar.sh <aishell-data-dir> www.openslr.org/resources/33 data_aishell
local/download_and_untar.sh <aishell-data-dir> www.openslr.org/resources/33 resource_aishell
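
For example, with the data directory used throughout this walkthrough:

local/download_and_untar.sh /home/lnj524/module/data/opensource_data/aishell www.openslr.org/resources/33 data_aishell
local/download_and_untar.sh /home/lnj524/module/data/opensource_data/aishell www.openslr.org/resources/33 resource_aishell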

# 2. Prepare the dictionary
local/aishell_prepare_dict.sh /home/lnj524/module/data/opensource_data/aishell/resource_aishell

local/aishell_prepare_dict.sh: AISHELL dict preparation succeeded

# 3. Prepare the data
local/aishell_data_prep.sh /home/lnj524/module/data/opensource_data/aishell/data_aishell/wav /home/lnj524/module/data/opensource_data/aishell/data_aishell/transcript

Preparing data/local/train transcriptions
Preparing data/local/dev transcriptions
Preparing data/local/test transcriptions
local/aishell_data_prep.sh: AISHELL data preparation succeeded

# 4. Create a language directory containing the lexicon (and, later, the language model)
# --position-dependent-phones false: whether to use position-dependent phones.
# data/local/dict: directory containing the dictionary (word-to-phone mappings).
# "<SPOKEN_NOISE>": token used for noise and other items not defined in the dictionary.
# data/local/lang: intermediate output directory.
# data/lang: final language directory output.
utils/prepare_lang.sh --position-dependent-phones false data/local/dict \
    "<SPOKEN_NOISE>" data/local/lang data/lang
    
data/lang/
│
├── L.fst                # lexicon FST (maps phone sequences to words)
├── L_disambig.fst       # lexicon FST including disambiguation symbols
├── oov.int              # integer mapping of the out-of-vocabulary (OOV) token
├── oov.txt              # text form of the OOV token
├── phones/              # directory of phone mapping files
├── phones.txt           # phone list, mapping each phone to its index
├── topo                 # acoustic model topology file
└── words.txt            # vocabulary file, mapping each word to its index
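
To spot-check these outputs, you can print the first few entries of the symbol tables, e.g.:

head -n 5 data/lang/phones.txt data/lang/words.txt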

utils/prepare_lang.sh --position-dependent-phones false data/local/dict <SPOKEN_NOISE> data/local/lang data/lang
Checking data/local/dict/silence_phones.txt ...
--> reading data/local/dict/silence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict/silence_phones.txt is OK

Checking data/local/dict/optional_silence.txt ...
--> reading data/local/dict/optional_silence.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict/optional_silence.txt is OK

Checking data/local/dict/nonsilence_phones.txt ...
--> reading data/local/dict/nonsilence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict/nonsilence_phones.txt is OK

Checking disjoint: silence_phones.txt, nonsilence_phones.txt
--> disjoint property is OK.

Checking data/local/dict/lexicon.txt
--> reading data/local/dict/lexicon.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict/lexicon.txt is OK

Checking data/local/dict/extra_questions.txt ...
--> reading data/local/dict/extra_questions.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict/extra_questions.txt is OK
--> SUCCESS [validating dictionary directory data/local/dict]

**Creating data/local/dict/lexiconp.txt from data/local/dict/lexicon.txt
fstaddselfloops data/lang/phones/wdisambig_phones.int data/lang/phones/wdisambig_words.int 
prepare_lang.sh: validating output directory
utils/validate_lang.pl data/lang
Checking existence of separator file
separator file data/lang/subword_separator.txt is empty or does not exist, deal in word case.
Checking data/lang/phones.txt ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/lang/phones.txt is OK

Checking words.txt: #0 ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/lang/words.txt is OK

Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
--> silence.txt and nonsilence.txt are disjoint
--> silence.txt and disambig.txt are disjoint
--> disambig.txt and nonsilence.txt are disjoint
--> disjoint property is OK

Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
--> found no unexplainable phones in phones.txt

Checking data/lang/phones/context_indep.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in data/lang/phones/context_indep.txt
--> data/lang/phones/context_indep.int corresponds to data/lang/phones/context_indep.txt
--> data/lang/phones/context_indep.csl corresponds to data/lang/phones/context_indep.txt
--> data/lang/phones/context_indep.{txt, int, csl} are OK

Checking data/lang/phones/nonsilence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 216 entry/entries in data/lang/phones/nonsilence.txt
--> data/lang/phones/nonsilence.int corresponds to data/lang/phones/nonsilence.txt
--> data/lang/phones/nonsilence.csl corresponds to data/lang/phones/nonsilence.txt
--> data/lang/phones/nonsilence.{txt, int, csl} are OK

Checking data/lang/phones/silence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in data/lang/phones/silence.txt
--> data/lang/phones/silence.int corresponds to data/lang/phones/silence.txt
--> data/lang/phones/silence.csl corresponds to data/lang/phones/silence.txt
--> data/lang/phones/silence.{txt, int, csl} are OK

Checking data/lang/phones/optional_silence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.int corresponds to data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.csl corresponds to data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.{txt, int, csl} are OK

Checking data/lang/phones/disambig.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 105 entry/entries in data/lang/phones/disambig.txt
--> data/lang/phones/disambig.int corresponds to data/lang/phones/disambig.txt
--> data/lang/phones/disambig.csl corresponds to data/lang/phones/disambig.txt
--> data/lang/phones/disambig.{txt, int, csl} are OK

Checking data/lang/phones/roots.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 67 entry/entries in data/lang/phones/roots.txt
--> data/lang/phones/roots.int corresponds to data/lang/phones/roots.txt
--> data/lang/phones/roots.{txt, int} are OK

Checking data/lang/phones/sets.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 67 entry/entries in data/lang/phones/sets.txt
--> data/lang/phones/sets.int corresponds to data/lang/phones/sets.txt
--> data/lang/phones/sets.{txt, int} are OK

Checking data/lang/phones/extra_questions.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 7 entry/entries in data/lang/phones/extra_questions.txt
--> data/lang/phones/extra_questions.int corresponds to data/lang/phones/extra_questions.txt
--> data/lang/phones/extra_questions.{txt, int} are OK

Checking optional_silence.txt ...
--> reading data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.txt is OK

Checking disambiguation symbols: #0 and #1
--> data/lang/phones/disambig.txt has "#0" and "#1"
--> data/lang/phones/disambig.txt is OK

Checking topo ...

Checking word-level disambiguation symbols...
--> data/lang/phones/wdisambig.txt exists (newer prepare_lang.sh)
Checking data/lang/oov.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in data/lang/oov.txt
--> data/lang/oov.int corresponds to data/lang/oov.txt
--> data/lang/oov.{txt, int} are OK

--> data/lang/L.fst is olabel sorted
--> data/lang/L_disambig.fst is olabel sorted
--> SUCCESS [validating lang directory data/lang]

2. Language model (LM) training

local/aishell_train_lms.sh

Getting raw N-gram counts
discount_ngrams: for n-gram order 1, D=0.000000, tau=0.000000 phi=1.000000
discount_ngrams: for n-gram order 2, D=0.000000, tau=0.000000 phi=1.000000
discount_ngrams: for n-gram order 3, D=1.000000, tau=0.000000 phi=1.000000
Iteration 1/6 of optimizing discounting parameters
discount_ngrams: for n-gram order 1, D=0.600000, tau=0.675000 phi=2.000000
discount_ngrams: for n-gram order 2, D=0.800000, tau=0.675000 phi=2.000000
discount_ngrams: for n-gram order 3, D=0.000000, tau=0.825000 phi=2.000000
discount_ngrams: for n-gram order 1, D=0.600000, tau=0.900000 phi=2.000000
discount_ngrams: for n-gram order 2, D=0.800000, tau=0.900000 phi=2.000000
discount_ngrams: for n-gram order 3, D=0.000000, tau=1.100000 phi=2.000000
discount_ngrams: for n-gram order 1, D=0.600000, tau=1.215000 phi=2.000000
discount_ngrams: for n-gram order 2, D=0.800000, tau=1.215000 phi=2.000000
discount_ngrams: for n-gram order 3, D=0.000000, tau=1.485000 phi=2.000000
interpolate_ngrams: 137074 words in wordslist
interpolate_ngrams: 137074 words in wordslist
interpolate_ngrams: 137074 words in wordslist
Perplexity over 99496.000000 words is 573.088187
Perplexity over 99496.000000 words (excluding 0.000000 OOVs) is 573.088187

real    0m2.418s
user    0m3.290s
sys     0m0.050s
Perplexity over 99496.000000 words is 571.860357
Perplexity over 99496.000000 words (excluding 0.000000 OOVs) is 571.860357

real    0m2.576s
user    0m3.422s
sys     0m0.113s
Perplexity over 99496.000000 words is 571.430399
Perplexity over 99496.000000 words (excluding 0.000000 OOVs) is 571.430399

real    0m2.640s
user    0m3.478s
sys     0m0.162s
Projected perplexity change from setting alpha=-0.413521475380432 is 571.860357->571.350704659834, reduction of 0.509652340166213
Alpha value on iter 1 is -0.413521475380432
Iteration 2/6 of optimizing discounting parameters
discount_ngrams: for n-gram order 1, D=0.600000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 2, D=0.800000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 3, D=0.000000, tau=0.483845 phi=2.000000
discount_ngrams: for n-gram order 1, D=0.600000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 2, D=0.800000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 3, D=0.000000, tau=0.645126 phi=2.000000
discount_ngrams: for n-gram order 1, D=0.600000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 2, D=0.800000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 3, D=0.000000, tau=0.870921 phi=2.000000
interpolate_ngrams: 137074 words in wordslist
interpolate_ngrams: 137074 words in wordslist
interpolate_ngrams: 137074 words in wordslist
Perplexity over 99496.000000 words is 570.909914
Perplexity over 99496.000000 words (excluding 0.000000 OOVs) is 570.909914

real    0m2.463s
user    0m3.285s
sys     0m0.124s
Perplexity over 99496.000000 words is 570.548231
Perplexity over 99496.000000 words (excluding 0.000000 OOVs) is 570.548231

real    0m2.475s
user    0m3.261s
sys     0m0.149s
Perplexity over 99496.000000 words is 570.209333
Perplexity over 99496.000000 words (excluding 0.000000 OOVs) is 570.209333

real    0m2.482s
user    0m3.262s
sys     0m0.149s
optimize_alpha.pl: alpha=0.782133003937562 is too positive, limiting it to 0.7
Projected perplexity change from setting alpha=0.7 is 570.548231->570.0658029, reduction of 0.482428099999765
Alpha value on iter 2 is 0.7
Iteration 3/6 of optimizing discounting parameters
discount_ngrams: for n-gram order 1, D=0.600000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 2, D=0.800000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 3, D=0.000000, tau=1.096715 phi=1.750000
discount_ngrams: for n-gram order 1, D=0.600000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 2, D=0.800000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 3, D=0.000000, tau=1.096715 phi=2.000000
discount_ngrams: for n-gram order 1, D=0.600000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 2, D=0.800000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 3, D=0.000000, tau=1.096715 phi=2.350000
interpolate_ngrams: 137074 words in wordslist
interpolate_ngrams: 137074 words in wordslist
interpolate_ngrams: 137074 words in wordslist
Perplexity over 99496.000000 words is 570.074175
Perplexity over 99496.000000 words (excluding 0.000000 OOVs) is 570.074175
Perplexity over 99496.000000 words is 570.135232
Perplexity over 99496.000000 words (excluding 0.000000 OOVs) is 570.135232

real    0m2.451s
user    0m3.219s
sys     0m0.155s

real    0m2.452s
user    0m3.264s
sys     0m0.118s
Perplexity over 99496.000000 words is 570.070852
Perplexity over 99496.000000 words (excluding 0.000000 OOVs) is 570.070852

real    0m2.474s
user    0m3.294s
sys     0m0.107s
Projected perplexity change from setting alpha=-0.149743638839048 is 570.074175->570.068152268062, reduction of 0.00602273193794645
Alpha value on iter 3 is -0.149743638839048
Iteration 4/6 of optimizing discounting parameters
discount_ngrams: for n-gram order 1, D=0.600000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 2, D=0.600000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 3, D=0.000000, tau=1.096715 phi=1.850256
discount_ngrams: for n-gram order 1, D=0.600000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 2, D=0.800000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 3, D=0.000000, tau=1.096715 phi=1.850256
discount_ngrams: for n-gram order 1, D=0.600000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 2, D=1.080000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 3, D=0.000000, tau=1.096715 phi=1.850256
interpolate_ngrams: 137074 words in wordslist
interpolate_ngrams: 137074 words in wordslist
interpolate_ngrams: 137074 words in wordslist
Perplexity over 99496.000000 words is 651.559076
Perplexity over 99496.000000 words (excluding 0.000000 OOVs) is 651.559076

real    0m1.772s
user    0m2.258s
sys     0m0.090s
Perplexity over 99496.000000 words is 571.811721
Perplexity over 99496.000000 words (excluding 0.000000 OOVs) is 571.811721

real    0m2.486s
user    0m3.296s
sys     0m0.127s
Perplexity over 99496.000000 words is 570.079098
Perplexity over 99496.000000 words (excluding 0.000000 OOVs) is 570.079098

real    0m2.490s
user    0m3.365s
sys     0m0.081s
Projected perplexity change from setting alpha=-0.116327143544381 is 570.079098->564.672375993263, reduction of 5.40672200673657
Alpha value on iter 4 is -0.116327143544381
Iteration 5/6 of optimizing discounting parameters
discount_ngrams: for n-gram order 1, D=0.600000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 2, D=0.706938, tau=0.395873 phi=2.000000
discount_ngrams: for n-gram order 3, D=0.000000, tau=1.096715 phi=1.850256
discount_ngrams: for n-gram order 1, D=0.600000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 2, D=0.706938, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 3, D=0.000000, tau=1.096715 phi=1.850256
discount_ngrams: for n-gram order 1, D=0.600000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 2, D=0.706938, tau=0.712571 phi=2.000000
discount_ngrams: for n-gram order 3, D=0.000000, tau=1.096715 phi=1.850256
interpolate_ngrams: 137074 words in wordslist
interpolate_ngrams: 137074 words in wordslist
interpolate_ngrams: 137074 words in wordslist
Perplexity over 99496.000000 words is 567.231151
Perplexity over 99496.000000 words (excluding 0.000000 OOVs) is 567.231151

real    0m2.433s
user    0m3.244s
sys     0m0.126s
Perplexity over 99496.000000 words is 567.980179
Perplexity over 99496.000000 words (excluding 0.000000 OOVs) is 567.980179

real    0m2.451s
user    0m3.254s
sys     0m0.121s
Perplexity over 99496.000000 words is 567.407206
Perplexity over 99496.000000 words (excluding 0.000000 OOVs) is 567.407206

real    0m2.467s
user    0m3.270s
sys     0m0.126s
Projected perplexity change from setting alpha=0.259356959958262 is 567.407206->567.206654822021, reduction of 0.20055117797915
Alpha value on iter 5 is 0.259356959958262
Iteration 6/6 of optimizing discounting parameters
discount_ngrams: for n-gram order 1, D=0.600000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 2, D=0.706938, tau=0.664727 phi=1.750000
discount_ngrams: for n-gram order 3, D=0.000000, tau=1.096715 phi=1.850256
discount_ngrams: for n-gram order 1, D=0.600000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 2, D=0.706938, tau=0.664727 phi=2.000000
discount_ngrams: for n-gram order 3, D=0.000000, tau=1.096715 phi=1.850256
discount_ngrams: for n-gram order 1, D=0.600000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 2, D=0.706938, tau=0.664727 phi=2.350000
discount_ngrams: for n-gram order 3, D=0.000000, tau=1.096715 phi=1.850256
interpolate_ngrams: 137074 words in wordslist
interpolate_ngrams: 137074 words in wordslist
interpolate_ngrams: 137074 words in wordslist
Perplexity over 99496.000000 words is 567.478625
Perplexity over 99496.000000 words (excluding 0.000000 OOVs) is 567.478625

real    0m2.444s
user    0m3.213s
sys     0m0.144s
Perplexity over 99496.000000 words is 567.346876
Perplexity over 99496.000000 words (excluding 0.000000 OOVs) is 567.346876

real    0m2.467s
user    0m3.284s
sys     0m0.126s
Perplexity over 99496.000000 words is 567.181130
Perplexity over 99496.000000 words (excluding 0.000000 OOVs) is 567.181130

real    0m2.478s
user    0m3.335s
sys     0m0.089s
optimize_alpha.pl: alpha=2.83365708509299 is too positive, limiting it to 0.7
Projected perplexity change from setting alpha=0.7 is 567.346876->567.0372037, reduction of 0.309672299999761
Alpha value on iter 6 is 0.7
Final config is:
D=0.6 tau=0.527830672157611 phi=2
D=0.706938285164495 tau=0.664727230661135 phi=2.7
D=0 tau=1.09671484103859 phi=1.85025636116095
Discounting N-grams.
discount_ngrams: for n-gram order 1, D=0.600000, tau=0.527831 phi=2.000000
discount_ngrams: for n-gram order 2, D=0.706938, tau=0.664727 phi=2.700000
discount_ngrams: for n-gram order 3, D=0.000000, tau=1.096715 phi=1.850256
Computing final perplexity
Building ARPA LM (perplexity computation is in background)
interpolate_ngrams: 137074 words in wordslist
interpolate_ngrams: 137074 words in wordslist
Perplexity over 99496.000000 words is 567.320537
Perplexity over 99496.000000 words (excluding 0.000000 OOVs) is 567.320537
567.320537
Done training LM of type 3gram-mincount

3. Formatting the language model

Use utils/format_lm.sh to convert the language model (LM) into the required FST format and produce data/lang_test:

utils/format_lm.sh data/lang data/local/lm/3gram-mincount/lm_unpruned.gz \
    data/local/dict/lexicon.txt data/lang_test

Converting 'data/local/lm/3gram-mincount/lm_unpruned.gz' to FST
arpa2fst --disambig-symbol=#0 --read-symbol-table=data/lang_test/words.txt - data/lang_test/G.fst 
LOG (arpa2fst[5.5.1148~2-122a3]:Read():arpa-file-parser.cc:94) Reading \data\ section.
LOG (arpa2fst[5.5.1148~2-122a3]:Read():arpa-file-parser.cc:149) Reading \1-grams: section.
LOG (arpa2fst[5.5.1148~2-122a3]:Read():arpa-file-parser.cc:149) Reading \2-grams: section.
LOG (arpa2fst[5.5.1148~2-122a3]:Read():arpa-file-parser.cc:149) Reading \3-grams: section.
LOG (arpa2fst[5.5.1148~2-122a3]:RemoveRedundantStates():arpa-lm-compiler.cc:359) Reduced num-states from 561655 to 102646
fstisstochastic data/lang_test/G.fst 
8.84583e-06 -0.56498
Succeeded in formatting LM: 'data/local/lm/3gram-mincount/lm_unpruned.gz'
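
You can inspect the resulting grammar FST with OpenFst's fstinfo tool (on the PATH once path.sh is sourced); a quick check:

. ./path.sh
fstinfo data/lang_test/G.fst | head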

4. Generate MFCC and pitch features

# Run only the MFCC part of run.sh
mfccdir=mfcc
for x in train dev test; do
  steps/make_mfcc_pitch.sh --cmd "$train_cmd" --nj 10 data/$x exp/make_mfcc/$x $mfccdir || exit 1;
  steps/compute_cmvn_stats.sh data/$x exp/make_mfcc/$x $mfccdir || exit 1;
  utils/fix_data_dir.sh data/$x || exit 1;
done

steps/make_mfcc_pitch.sh --cmd run.pl --mem 16G --nj 10 data/train exp/make_mfcc/train mfcc
utils/validate_data_dir.sh: Successfully validated data-directory data/train
steps/make_mfcc_pitch.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc_pitch.sh: Succeeded creating MFCC and pitch features for train
steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train mfcc
Succeeded creating CMVN stats for train
fix_data_dir.sh: kept all 120098 utterances.
fix_data_dir.sh: old files are kept in data/train/.backup
steps/make_mfcc_pitch.sh --cmd run.pl --mem 16G --nj 10 data/dev exp/make_mfcc/dev mfcc
utils/validate_data_dir.sh: Successfully validated data-directory data/dev
steps/make_mfcc_pitch.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc_pitch.sh: Succeeded creating MFCC and pitch features for dev
steps/compute_cmvn_stats.sh data/dev exp/make_mfcc/dev mfcc
Succeeded creating CMVN stats for dev
fix_data_dir.sh: kept all 14326 utterances.
fix_data_dir.sh: old files are kept in data/dev/.backup
steps/make_mfcc_pitch.sh --cmd run.pl --mem 16G --nj 10 data/test exp/make_mfcc/test mfcc
utils/validate_data_dir.sh: Successfully validated data-directory data/test
steps/make_mfcc_pitch.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc_pitch.sh: Succeeded creating MFCC and pitch features for test
steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test mfcc
Succeeded creating CMVN stats for test
fix_data_dir.sh: kept all 7176 utterances.
fix_data_dir.sh: old files are kept in data/test/.backup
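
A quick way to confirm the feature dimension is Kaldi's feat-to-dim tool; with this recipe's default config, 13 MFCC plus 3 pitch features should give 16:

. ./path.sh
feat-to-dim scp:data/train/feats.scp -  # expect 16 for MFCC+pitch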

5. Monophone model

# Comment out the other stages in run.sh, then run ./run.sh
# Train a monophone model on delta features.
steps/train_mono.sh --cmd "$train_cmd" --nj 10 \
  data/train data/lang exp/mono || exit 1;

# Decode with the monophone model.
utils/mkgraph.sh data/lang_test exp/mono exp/mono/graph || exit 1;
steps/decode.sh --cmd "$decode_cmd" --config conf/decode.config --nj 10 \
  exp/mono/graph data/dev exp/mono/decode_dev
steps/decode.sh --cmd "$decode_cmd" --config conf/decode.config --nj 10 \
  exp/mono/graph data/test exp/mono/decode_test

# Get alignments from monophone system.
steps/align_si.sh --cmd "$train_cmd" --nj 10 \
  data/train data/lang exp/mono exp/mono_ali || exit 1;
  
exp/mono/log: logs from the training stage.
exp/mono/decode_dev/log and exp/mono/decode_test/log: logs from decoding the dev and test sets.
exp/mono_ali/log: logs from the alignment stage.
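
The per-model CER summaries quoted at the end of the later sections can be reproduced by picking the best scoring result from each decode directory, e.g.:

for x in exp/*/decode_test; do
  [ -d $x ] && grep WER $x/cer_* 2>/dev/null | utils/best_wer.sh
done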

6. Triphone model (Tri1)

# Comment out the other stages in run.sh, then run ./run.sh
# Train the first triphone pass model tri1 on delta + delta-delta features.
steps/train_deltas.sh --cmd "$train_cmd" \
 2500 20000 data/train data/lang exp/mono_ali exp/tri1 || exit 1;

# decode tri1
utils/mkgraph.sh data/lang_test exp/tri1 exp/tri1/graph || exit 1;
steps/decode.sh --cmd "$decode_cmd" --config conf/decode.config --nj 10 \
  exp/tri1/graph data/dev exp/tri1/decode_dev
steps/decode.sh --cmd "$decode_cmd" --config conf/decode.config --nj 10 \
  exp/tri1/graph data/test exp/tri1/decode_test

# align tri1
steps/align_si.sh --cmd "$train_cmd" --nj 10 \
  data/train data/lang exp/tri1 exp/tri1_ali || exit 1;
  
steps/train_deltas.sh --cmd run.pl --mem 16G 2500 20000 data/train data/lang exp/mono_ali exp/tri1
steps/train_deltas.sh: accumulating tree stats
steps/train_deltas.sh: getting questions for tree-building, via clustering
steps/train_deltas.sh: building the tree
steps/train_deltas.sh: converting alignments from exp/mono_ali to use current tree
steps/train_deltas.sh: compiling graphs of transcripts
steps/train_deltas.sh: training pass 1
steps/train_deltas.sh: training pass 2
steps/train_deltas.sh: training pass 3
steps/train_deltas.sh: training pass 4
steps/train_deltas.sh: training pass 5
steps/train_deltas.sh: training pass 6
steps/train_deltas.sh: training pass 7
steps/train_deltas.sh: training pass 8
steps/train_deltas.sh: training pass 9
steps/train_deltas.sh: training pass 10
steps/train_deltas.sh: aligning data
steps/train_deltas.sh: training pass 11
steps/train_deltas.sh: training pass 12
steps/train_deltas.sh: training pass 13
steps/train_deltas.sh: training pass 14
steps/train_deltas.sh: training pass 15
steps/train_deltas.sh: training pass 16
steps/train_deltas.sh: training pass 17
steps/train_deltas.sh: training pass 18
steps/train_deltas.sh: training pass 19
steps/train_deltas.sh: training pass 20
steps/train_deltas.sh: aligning data
steps/train_deltas.sh: training pass 21
steps/train_deltas.sh: training pass 22
steps/train_deltas.sh: training pass 23
steps/train_deltas.sh: training pass 24
steps/train_deltas.sh: training pass 25
steps/train_deltas.sh: training pass 26
steps/train_deltas.sh: training pass 27
steps/train_deltas.sh: training pass 28
steps/train_deltas.sh: training pass 29
steps/train_deltas.sh: training pass 30
steps/train_deltas.sh: aligning data
steps/train_deltas.sh: training pass 31
steps/train_deltas.sh: training pass 32
steps/train_deltas.sh: training pass 33
steps/train_deltas.sh: training pass 34
steps/diagnostic/analyze_alignments.sh --cmd run.pl --mem 16G data/lang exp/tri1
steps/diagnostic/analyze_alignments.sh: see stats in exp/tri1/log/analyze_alignments.log
1 warnings in exp/tri1/log/build_tree.log
1 warnings in exp/tri1/log/compile_questions.log
2850 warnings in exp/tri1/log/align.*.*.log
862 warnings in exp/tri1/log/acc.*.*.log
exp/tri1: nj=10 align prob=-79.45 over 150.17h [retry=0.5%, fail=0.0%] states=2064 gauss=20052 tree-impr=4.48
steps/train_deltas.sh: Done training system with delta+delta-delta features in exp/tri1
tree-info exp/tri1/tree 
tree-info exp/tri1/tree 
fstcomposecontext --context-size=3 --central-position=1 --read-disambig-syms=data/lang_test/phones/disambig.int --write-disambig-syms=data/lang_test/tmp/disambig_ilabels_3_1.int data/lang_test/tmp/ilabels_3_1.3377139 data/lang_test/tmp/LG.fst 
fstisstochastic data/lang_test/tmp/CLG_3_1.fst 
0 -0.0666824
[info]: CLG not stochastic.
make-h-transducer --disambig-syms-out=exp/tri1/graph/disambig_tid.int --transition-scale=1.0 data/lang_test/tmp/ilabels_3_1 exp/tri1/tree exp/tri1/final.mdl 
fsttablecompose exp/tri1/graph/Ha.fst data/lang_test/tmp/CLG_3_1.fst 
fstminimizeencoded 
fstrmsymbols exp/tri1/graph/disambig_tid.int 
fstrmepslocal 
fstdeterminizestar --use-log=true 
fstisstochastic exp/tri1/graph/HCLGa.fst 
0.000487613 -0.178947
HCLGa is not stochastic
add-self-loops --self-loop-scale=0.1 --reorder=true exp/tri1/final.mdl exp/tri1/graph/HCLGa.fst 
steps/decode.sh --cmd run.pl --mem 16G --config conf/decode.config --nj 10 exp/tri1/graph data/dev exp/tri1/decode_dev
decode.sh: feature type is delta
steps/diagnostic/analyze_lats.sh --cmd run.pl --mem 16G exp/tri1/graph exp/tri1/decode_dev
steps/diagnostic/analyze_lats.sh: see stats in exp/tri1/decode_dev/log/analyze_alignments.log
Overall, lattice depth (10,50,90-percentile)=(1,4,30) and mean=11.9
steps/diagnostic/analyze_lats.sh: see stats in exp/tri1/decode_dev/log/analyze_lattice_depth_stats.log
+ steps/score_kaldi.sh --cmd 'run.pl --mem 16G' data/dev exp/tri1/graph exp/tri1/decode_dev
steps/score_kaldi.sh --cmd run.pl --mem 16G data/dev exp/tri1/graph exp/tri1/decode_dev
steps/score_kaldi.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ steps/scoring/score_kaldi_cer.sh --stage 2 --cmd 'run.pl --mem 16G' data/dev exp/tri1/graph exp/tri1/decode_dev
steps/scoring/score_kaldi_cer.sh --stage 2 --cmd run.pl --mem 16G data/dev exp/tri1/graph exp/tri1/decode_dev
steps/scoring/score_kaldi_cer.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ echo 'local/score.sh: Done'
local/score.sh: Done
steps/decode.sh --cmd run.pl --mem 16G --config conf/decode.config --nj 10 exp/tri1/graph data/test exp/tri1/decode_test
decode.sh: feature type is delta
steps/diagnostic/analyze_lats.sh --cmd run.pl --mem 16G exp/tri1/graph exp/tri1/decode_test
steps/diagnostic/analyze_lats.sh: see stats in exp/tri1/decode_test/log/analyze_alignments.log
Overall, lattice depth (10,50,90-percentile)=(1,4,40) and mean=15.3
steps/diagnostic/analyze_lats.sh: see stats in exp/tri1/decode_test/log/analyze_lattice_depth_stats.log
+ steps/score_kaldi.sh --cmd 'run.pl --mem 16G' data/test exp/tri1/graph exp/tri1/decode_test
steps/score_kaldi.sh --cmd run.pl --mem 16G data/test exp/tri1/graph exp/tri1/decode_test
steps/score_kaldi.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ steps/scoring/score_kaldi_cer.sh --stage 2 --cmd 'run.pl --mem 16G' data/test exp/tri1/graph exp/tri1/decode_test
steps/scoring/score_kaldi_cer.sh --stage 2 --cmd run.pl --mem 16G data/test exp/tri1/graph exp/tri1/decode_test
steps/scoring/score_kaldi_cer.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ echo 'local/score.sh: Done'
local/score.sh: Done
steps/align_si.sh --cmd run.pl --mem 16G --nj 10 data/train data/lang exp/tri1 exp/tri1_ali
steps/align_si.sh: feature type is delta
steps/align_si.sh: aligning data in data/train using model from exp/tri1, putting alignments in exp/tri1_ali
steps/diagnostic/analyze_alignments.sh --cmd run.pl --mem 16G data/lang exp/tri1_ali
steps/diagnostic/analyze_alignments.sh: see stats in exp/tri1_ali/log/analyze_alignments.log
steps/align_si.sh: done aligning data.
%WER 36.54 [ 38281 / 104765, 828 ins, 3174 del, 34279 sub ] exp/mono/decode_test/cer_10_0.0
%WER 18.80 [ 19701 / 104765, 881 ins, 1277 del, 17543 sub ] exp/tri1/decode_test/cer_14_0.5

7. A more refined triphone model (Tri2)

# train tri2 [delta+delta-deltas]
steps/train_deltas.sh --cmd "$train_cmd" \
 2500 20000 data/train data/lang exp/tri1_ali exp/tri2 || exit 1;

# decode tri2
utils/mkgraph.sh data/lang_test exp/tri2 exp/tri2/graph
steps/decode.sh --cmd "$decode_cmd" --config conf/decode.config --nj 10 \
  exp/tri2/graph data/dev exp/tri2/decode_dev
steps/decode.sh --cmd "$decode_cmd" --config conf/decode.config --nj 10 \
  exp/tri2/graph data/test exp/tri2/decode_test

# Align training data with the tri2 model.
steps/align_si.sh --cmd "$train_cmd" --nj 10 \
  data/train data/lang exp/tri2 exp/tri2_ali || exit 1;

steps/train_deltas.sh --cmd run.pl --mem 16G 2500 20000 data/train data/lang exp/tri1_ali exp/tri2
steps/train_deltas.sh: accumulating tree stats
steps/train_deltas.sh: getting questions for tree-building, via clustering
steps/train_deltas.sh: building the tree
steps/train_deltas.sh: converting alignments from exp/tri1_ali to use current tree
steps/train_deltas.sh: compiling graphs of transcripts
steps/train_deltas.sh: training pass 1
steps/train_deltas.sh: training pass 2
steps/train_deltas.sh: training pass 3
steps/train_deltas.sh: training pass 4
steps/train_deltas.sh: training pass 5
steps/train_deltas.sh: training pass 6
steps/train_deltas.sh: training pass 7
steps/train_deltas.sh: training pass 8
steps/train_deltas.sh: training pass 9
steps/train_deltas.sh: training pass 10
steps/train_deltas.sh: aligning data
steps/train_deltas.sh: training pass 11
steps/train_deltas.sh: training pass 12
steps/train_deltas.sh: training pass 13
steps/train_deltas.sh: training pass 14
steps/train_deltas.sh: training pass 15
steps/train_deltas.sh: training pass 16
steps/train_deltas.sh: training pass 17
steps/train_deltas.sh: training pass 18
steps/train_deltas.sh: training pass 19
steps/train_deltas.sh: training pass 20
steps/train_deltas.sh: aligning data
steps/train_deltas.sh: training pass 21
steps/train_deltas.sh: training pass 22
steps/train_deltas.sh: training pass 23
steps/train_deltas.sh: training pass 24
steps/train_deltas.sh: training pass 25
steps/train_deltas.sh: training pass 26
steps/train_deltas.sh: training pass 27
steps/train_deltas.sh: training pass 28
steps/train_deltas.sh: training pass 29
steps/train_deltas.sh: training pass 30
steps/train_deltas.sh: aligning data
steps/train_deltas.sh: training pass 31
steps/train_deltas.sh: training pass 32
steps/train_deltas.sh: training pass 33
steps/train_deltas.sh: training pass 34
steps/diagnostic/analyze_alignments.sh --cmd run.pl --mem 16G data/lang exp/tri2
steps/diagnostic/analyze_alignments.sh: see stats in exp/tri2/log/analyze_alignments.log
2155 warnings in exp/tri2/log/align.*.*.log
515 warnings in exp/tri2/log/acc.*.*.log
1 warnings in exp/tri2/log/build_tree.log
1 warnings in exp/tri2/log/compile_questions.log
exp/tri2: nj=10 align prob=-79.42 over 150.18h [retry=0.4%, fail=0.0%] states=2128 gauss=20034 tree-impr=4.82
steps/train_deltas.sh: Done training system with delta+delta-delta features in exp/tri2
tree-info exp/tri2/tree 
tree-info exp/tri2/tree 
make-h-transducer --disambig-syms-out=exp/tri2/graph/disambig_tid.int --transition-scale=1.0 data/lang_test/tmp/ilabels_3_1 exp/tri2/tree exp/tri2/final.mdl 
fstrmsymbols exp/tri2/graph/disambig_tid.int 
fsttablecompose exp/tri2/graph/Ha.fst data/lang_test/tmp/CLG_3_1.fst 
fstminimizeencoded 
fstdeterminizestar --use-log=true 
fstrmepslocal 
fstisstochastic exp/tri2/graph/HCLGa.fst 
0.000486833 -0.178947
HCLGa is not stochastic
add-self-loops --self-loop-scale=0.1 --reorder=true exp/tri2/final.mdl exp/tri2/graph/HCLGa.fst 
steps/decode.sh --cmd run.pl --mem 16G --config conf/decode.config --nj 10 exp/tri2/graph data/dev exp/tri2/decode_dev
decode.sh: feature type is delta
steps/diagnostic/analyze_lats.sh --cmd run.pl --mem 16G exp/tri2/graph exp/tri2/decode_dev
steps/diagnostic/analyze_lats.sh: see stats in exp/tri2/decode_dev/log/analyze_alignments.log
Overall, lattice depth (10,50,90-percentile)=(1,4,30) and mean=11.7
steps/diagnostic/analyze_lats.sh: see stats in exp/tri2/decode_dev/log/analyze_lattice_depth_stats.log
+ steps/score_kaldi.sh --cmd 'run.pl --mem 16G' data/dev exp/tri2/graph exp/tri2/decode_dev
steps/score_kaldi.sh --cmd run.pl --mem 16G data/dev exp/tri2/graph exp/tri2/decode_dev
steps/score_kaldi.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ steps/scoring/score_kaldi_cer.sh --stage 2 --cmd 'run.pl --mem 16G' data/dev exp/tri2/graph exp/tri2/decode_dev
steps/scoring/score_kaldi_cer.sh --stage 2 --cmd run.pl --mem 16G data/dev exp/tri2/graph exp/tri2/decode_dev
steps/scoring/score_kaldi_cer.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ echo 'local/score.sh: Done'
local/score.sh: Done
steps/decode.sh --cmd run.pl --mem 16G --config conf/decode.config --nj 10 exp/tri2/graph data/test exp/tri2/decode_test
decode.sh: feature type is delta
steps/diagnostic/analyze_lats.sh --cmd run.pl --mem 16G exp/tri2/graph exp/tri2/decode_test
steps/diagnostic/analyze_lats.sh: see stats in exp/tri2/decode_test/log/analyze_alignments.log
Overall, lattice depth (10,50,90-percentile)=(1,4,39) and mean=15.3
steps/diagnostic/analyze_lats.sh: see stats in exp/tri2/decode_test/log/analyze_lattice_depth_stats.log
+ steps/score_kaldi.sh --cmd 'run.pl --mem 16G' data/test exp/tri2/graph exp/tri2/decode_test
steps/score_kaldi.sh --cmd run.pl --mem 16G data/test exp/tri2/graph exp/tri2/decode_test
steps/score_kaldi.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ steps/scoring/score_kaldi_cer.sh --stage 2 --cmd 'run.pl --mem 16G' data/test exp/tri2/graph exp/tri2/decode_test
steps/scoring/score_kaldi_cer.sh --stage 2 --cmd run.pl --mem 16G data/test exp/tri2/graph exp/tri2/decode_test
steps/scoring/score_kaldi_cer.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ echo 'local/score.sh: Done'
local/score.sh: Done
steps/align_si.sh --cmd run.pl --mem 16G --nj 10 data/train data/lang exp/tri2 exp/tri2_ali
steps/align_si.sh: feature type is delta
steps/align_si.sh: aligning data in data/train using model from exp/tri2, putting alignments in exp/tri2_ali
steps/diagnostic/analyze_alignments.sh --cmd run.pl --mem 16G data/lang exp/tri2_ali
steps/diagnostic/analyze_alignments.sh: see stats in exp/tri2_ali/log/analyze_alignments.log
steps/align_si.sh: done aligning data.
%WER 36.54 [ 38281 / 104765, 828 ins, 3174 del, 34279 sub ] exp/mono/decode_test/cer_10_0.0
%WER 18.80 [ 19701 / 104765, 881 ins, 1277 del, 17543 sub ] exp/tri1/decode_test/cer_14_0.5
%WER 18.64 [ 19532 / 104765, 949 ins, 1159 del, 17424 sub ] exp/tri2/decode_test/cer_14_0.5

8. The Tri3a model

# Train the second triphone pass model tri3a on LDA+MLLT features.
steps/train_lda_mllt.sh --cmd "$train_cmd" \
 2500 20000 data/train data/lang exp/tri2_ali exp/tri3a || exit 1;
 
# Run a test decode with the tri3a model.
utils/mkgraph.sh data/lang_test exp/tri3a exp/tri3a/graph || exit 1;
steps/decode.sh --cmd "$decode_cmd" --nj 10 --config conf/decode.config \
  exp/tri3a/graph data/dev exp/tri3a/decode_dev
steps/decode.sh --cmd "$decode_cmd" --nj 10 --config conf/decode.config \
  exp/tri3a/graph data/test exp/tri3a/decode_test

# align tri3a with fMLLR

steps/align_fmllr.sh --cmd "$train_cmd" --nj 10 \
  data/train data/lang exp/tri3a exp/tri3a_ali || exit 1;
  
steps/train_lda_mllt.sh --cmd run.pl --mem 16G 2500 20000 data/train data/lang exp/tri2_ali exp/tri3a
steps/train_lda_mllt.sh: Accumulating LDA statistics.
steps/train_lda_mllt.sh: Accumulating tree stats
steps/train_lda_mllt.sh: Getting questions for tree clustering.
steps/train_lda_mllt.sh: Building the tree
steps/train_lda_mllt.sh: Initializing the model
steps/train_lda_mllt.sh: Converting alignments from exp/tri2_ali to use current tree
steps/train_lda_mllt.sh: Compiling graphs of transcripts
Training pass 1
Training pass 2
steps/train_lda_mllt.sh: Estimating MLLT
Training pass 3
Training pass 4
steps/train_lda_mllt.sh: Estimating MLLT
Training pass 5
Training pass 6
steps/train_lda_mllt.sh: Estimating MLLT
Training pass 7
Training pass 8
Training pass 9
Training pass 10
Aligning data
Training pass 11
Training pass 12
steps/train_lda_mllt.sh: Estimating MLLT
Training pass 13
Training pass 14
Training pass 15
Training pass 16
Training pass 17
Training pass 18
Training pass 19
Training pass 20
Aligning data
Training pass 21
Training pass 22
Training pass 23
Training pass 24
Training pass 25
Training pass 26
Training pass 27
Training pass 28
Training pass 29
Training pass 30
Aligning data
Training pass 31
Training pass 32
Training pass 33
Training pass 34
steps/diagnostic/analyze_alignments.sh --cmd run.pl --mem 16G data/lang exp/tri3a
steps/diagnostic/analyze_alignments.sh: see stats in exp/tri3a/log/analyze_alignments.log
312 warnings in exp/tri3a/log/acc.*.*.log
1343 warnings in exp/tri3a/log/align.*.*.log
8 warnings in exp/tri3a/log/lda_acc.*.log
1 warnings in exp/tri3a/log/compile_questions.log
1 warnings in exp/tri3a/log/build_tree.log
exp/tri3a: nj=10 align prob=-48.77 over 150.18h [retry=0.3%, fail=0.0%] states=2136 gauss=20033 tree-impr=5.06 lda-sum=24.60 mllt:impr,logdet=0.96,1.43
steps/train_lda_mllt.sh: Done training system with LDA+MLLT features in exp/tri3a
tree-info exp/tri3a/tree 
tree-info exp/tri3a/tree 
make-h-transducer --disambig-syms-out=exp/tri3a/graph/disambig_tid.int --transition-scale=1.0 data/lang_test/tmp/ilabels_3_1 exp/tri3a/tree exp/tri3a/final.mdl 
fstminimizeencoded 
fstdeterminizestar --use-log=true 
fstrmsymbols exp/tri3a/graph/disambig_tid.int 
fstrmepslocal 
fsttablecompose exp/tri3a/graph/Ha.fst data/lang_test/tmp/CLG_3_1.fst 
fstisstochastic exp/tri3a/graph/HCLGa.fst 
0.000487099 -0.178947
HCLGa is not stochastic
add-self-loops --self-loop-scale=0.1 --reorder=true exp/tri3a/final.mdl exp/tri3a/graph/HCLGa.fst 
steps/decode.sh --cmd run.pl --mem 16G --nj 10 --config conf/decode.config exp/tri3a/graph data/dev exp/tri3a/decode_dev
decode.sh: feature type is lda
steps/diagnostic/analyze_lats.sh --cmd run.pl --mem 16G exp/tri3a/graph exp/tri3a/decode_dev
steps/diagnostic/analyze_lats.sh: see stats in exp/tri3a/decode_dev/log/analyze_alignments.log
Overall, lattice depth (10,50,90-percentile)=(1,3,24) and mean=9.6
steps/diagnostic/analyze_lats.sh: see stats in exp/tri3a/decode_dev/log/analyze_lattice_depth_stats.log
+ steps/score_kaldi.sh --cmd 'run.pl --mem 16G' data/dev exp/tri3a/graph exp/tri3a/decode_dev
steps/score_kaldi.sh --cmd run.pl --mem 16G data/dev exp/tri3a/graph exp/tri3a/decode_dev
steps/score_kaldi.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ steps/scoring/score_kaldi_cer.sh --stage 2 --cmd 'run.pl --mem 16G' data/dev exp/tri3a/graph exp/tri3a/decode_dev
steps/scoring/score_kaldi_cer.sh --stage 2 --cmd run.pl --mem 16G data/dev exp/tri3a/graph exp/tri3a/decode_dev
steps/scoring/score_kaldi_cer.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ echo 'local/score.sh: Done'
local/score.sh: Done
steps/decode.sh --cmd run.pl --mem 16G --nj 10 --config conf/decode.config exp/tri3a/graph data/test exp/tri3a/decode_test
decode.sh: feature type is lda
steps/diagnostic/analyze_lats.sh --cmd run.pl --mem 16G exp/tri3a/graph exp/tri3a/decode_test
steps/diagnostic/analyze_lats.sh: see stats in exp/tri3a/decode_test/log/analyze_alignments.log
Overall, lattice depth (10,50,90-percentile)=(1,4,30) and mean=12.0
steps/diagnostic/analyze_lats.sh: see stats in exp/tri3a/decode_test/log/analyze_lattice_depth_stats.log
+ steps/score_kaldi.sh --cmd 'run.pl --mem 16G' data/test exp/tri3a/graph exp/tri3a/decode_test
steps/score_kaldi.sh --cmd run.pl --mem 16G data/test exp/tri3a/graph exp/tri3a/decode_test
steps/score_kaldi.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ steps/scoring/score_kaldi_cer.sh --stage 2 --cmd 'run.pl --mem 16G' data/test exp/tri3a/graph exp/tri3a/decode_test
steps/scoring/score_kaldi_cer.sh --stage 2 --cmd run.pl --mem 16G data/test exp/tri3a/graph exp/tri3a/decode_test
steps/scoring/score_kaldi_cer.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ echo 'local/score.sh: Done'
local/score.sh: Done
steps/align_fmllr.sh --cmd run.pl --mem 16G --nj 10 data/train data/lang exp/tri3a exp/tri3a_ali
steps/align_fmllr.sh: feature type is lda
steps/align_fmllr.sh: compiling training graphs
steps/align_fmllr.sh: aligning data in data/train using exp/tri3a/final.mdl and speaker-independent features.
steps/align_fmllr.sh: computing fMLLR transforms
steps/align_fmllr.sh: doing final alignment.
steps/align_fmllr.sh: done aligning data.
steps/diagnostic/analyze_alignments.sh --cmd run.pl --mem 16G data/lang exp/tri3a_ali
steps/diagnostic/analyze_alignments.sh: see stats in exp/tri3a_ali/log/analyze_alignments.log
259 warnings in exp/tri3a_ali/log/align_pass1.*.log
275 warnings in exp/tri3a_ali/log/align_pass2.*.log
3 warnings in exp/tri3a_ali/log/fmllr.*.log
%WER 36.54 [ 38281 / 104765, 828 ins, 3174 del, 34279 sub ] exp/mono/decode_test/cer_10_0.0
%WER 18.80 [ 19701 / 104765, 881 ins, 1277 del, 17543 sub ] exp/tri1/decode_test/cer_14_0.5
%WER 18.64 [ 19532 / 104765, 949 ins, 1159 del, 17424 sub ] exp/tri2/decode_test/cer_14_0.5
%WER 16.99 [ 17804 / 104765, 834 ins, 917 del, 16053 sub ] exp/tri3a/decode_test/cer_14_0.0

9. The Tri4a model

# Train the third triphone pass model tri4a on LDA+MLLT+SAT features.
# From now on, we start building a more serious system with Speaker
# Adaptive Training (SAT).
steps/train_sat.sh --cmd "$train_cmd" \
  2500 20000 data/train data/lang exp/tri3a_ali exp/tri4a || exit 1;

# decode tri4a
utils/mkgraph.sh data/lang_test exp/tri4a exp/tri4a/graph
steps/decode_fmllr.sh --cmd "$decode_cmd" --nj 10 --config conf/decode.config \
  exp/tri4a/graph data/dev exp/tri4a/decode_dev
steps/decode_fmllr.sh --cmd "$decode_cmd" --nj 10 --config conf/decode.config \
  exp/tri4a/graph data/test exp/tri4a/decode_test
  
# align tri4a with fMLLR
steps/align_fmllr.sh  --cmd "$train_cmd" --nj 10 \
  data/train data/lang exp/tri4a exp/tri4a_ali
  
steps/train_sat.sh --cmd run.pl --mem 16G 2500 20000 data/train data/lang exp/tri3a_ali exp/tri4a
steps/train_sat.sh: feature type is lda
steps/train_sat.sh: Using transforms from exp/tri3a_ali
steps/train_sat.sh: Accumulating tree stats
steps/train_sat.sh: Getting questions for tree clustering.
steps/train_sat.sh: Building the tree
steps/train_sat.sh: Initializing the model
steps/train_sat.sh: Converting alignments from exp/tri3a_ali to use current tree
steps/train_sat.sh: Compiling graphs of transcripts
Pass 1
Pass 2
Estimating fMLLR transforms
Pass 3
Pass 4
Estimating fMLLR transforms
Pass 5
Pass 6
Estimating fMLLR transforms
Pass 7
Pass 8
Pass 9
Pass 10
Aligning data
Pass 11
Pass 12
Estimating fMLLR transforms
Pass 13
Pass 14
Pass 15
Pass 16
Pass 17
Pass 18
Pass 19
Pass 20
Aligning data
...
...
steps/scoring/score_kaldi_cer.sh --stage 2 --cmd run.pl --mem 16G data/test exp/tri4a/graph exp/tri4a/decode_test.si
steps/scoring/score_kaldi_cer.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ echo 'local/score.sh: Done'
local/score.sh: Done
steps/decode_fmllr.sh: feature type is lda
steps/decode_fmllr.sh: getting first-pass fMLLR transforms.
steps/decode_fmllr.sh: doing main lattice generation phase
steps/decode_fmllr.sh: estimating fMLLR transforms a second time.
steps/decode_fmllr.sh: doing a final pass of acoustic rescoring.
steps/diagnostic/analyze_lats.sh --cmd run.pl --mem 16G exp/tri4a/graph exp/tri4a/decode_test
steps/diagnostic/analyze_lats.sh: see stats in exp/tri4a/decode_test/log/analyze_alignments.log
Overall, lattice depth (10,50,90-percentile)=(1,3,20) and mean=8.2
steps/diagnostic/analyze_lats.sh: see stats in exp/tri4a/decode_test/log/analyze_lattice_depth_stats.log
+ steps/score_kaldi.sh --cmd 'run.pl --mem 16G' data/test exp/tri4a/graph exp/tri4a/decode_test
steps/score_kaldi.sh --cmd run.pl --mem 16G data/test exp/tri4a/graph exp/tri4a/decode_test
steps/score_kaldi.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ steps/scoring/score_kaldi_cer.sh --stage 2 --cmd 'run.pl --mem 16G' data/test exp/tri4a/graph exp/tri4a/decode_test
steps/scoring/score_kaldi_cer.sh --stage 2 --cmd run.pl --mem 16G data/test exp/tri4a/graph exp/tri4a/decode_test
steps/scoring/score_kaldi_cer.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ echo 'local/score.sh: Done'
local/score.sh: Done
steps/align_fmllr.sh --cmd run.pl --mem 16G --nj 10 data/train data/lang exp/tri4a exp/tri4a_ali
steps/align_fmllr.sh: feature type is lda
steps/align_fmllr.sh: compiling training graphs
steps/align_fmllr.sh: aligning data in data/train using exp/tri4a/final.alimdl and speaker-independent features.
steps/align_fmllr.sh: computing fMLLR transforms
steps/align_fmllr.sh: doing final alignment.
steps/align_fmllr.sh: done aligning data.
steps/diagnostic/analyze_alignments.sh --cmd run.pl --mem 16G data/lang exp/tri4a_ali
steps/diagnostic/analyze_alignments.sh: see stats in exp/tri4a_ali/log/analyze_alignments.log
332 warnings in exp/tri4a_ali/log/align_pass2.*.log
1 warnings in exp/tri4a_ali/log/fmllr.*.log
195 warnings in exp/tri4a_ali/log/align_pass1.*.log
%WER 36.54 [ 38281 / 104765, 828 ins, 3174 del, 34279 sub ] exp/mono/decode_test/cer_10_0.0
%WER 18.80 [ 19701 / 104765, 881 ins, 1277 del, 17543 sub ] exp/tri1/decode_test/cer_14_0.5
%WER 18.64 [ 19532 / 104765, 949 ins, 1159 del, 17424 sub ] exp/tri2/decode_test/cer_14_0.5
%WER 16.99 [ 17804 / 104765, 834 ins, 917 del, 16053 sub ] exp/tri3a/decode_test/cer_14_0.0
%WER 13.73 [ 14385 / 104765, 753 ins, 642 del, 12990 sub ] exp/tri4a/decode_test/cer_13_0.5

10. Tri5a Model

# Train tri5a, which is LDA+MLLT+SAT
# Building a larger SAT system. You can see the num-leaves is 3500 and tot-gauss is 100000

steps/train_sat.sh --cmd "$train_cmd" \
  3500 100000 data/train data/lang exp/tri4a_ali exp/tri5a || exit 1;
  
# decode tri5a
utils/mkgraph.sh data/lang_test exp/tri5a exp/tri5a/graph || exit 1;
steps/decode_fmllr.sh --cmd "$decode_cmd" --nj 10 --config conf/decode.config \
   exp/tri5a/graph data/dev exp/tri5a/decode_dev || exit 1;
steps/decode_fmllr.sh --cmd "$decode_cmd" --nj 10 --config conf/decode.config \
   exp/tri5a/graph data/test exp/tri5a/decode_test || exit 1;
   
# align tri5a with fMLLR
steps/align_fmllr.sh --cmd "$train_cmd" --nj 10 \
  data/train data/lang exp/tri5a exp/tri5a_ali || exit 1;
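
Note that steps/decode_fmllr.sh decodes twice: a first pass with the speaker-independent model (the decode_*.si directories in the logs below), then a rescoring pass using per-speaker fMLLR transforms. Comparing the two shows what SAT adaptation contributes (a sketch using utils/best_wer.sh, which ships with Kaldi):

# Best CER of the unadapted first pass...
grep WER exp/tri5a/decode_test.si/cer_* | utils/best_wer.sh
# ...versus the fMLLR-adapted final pass.
grep WER exp/tri5a/decode_test/cer_* | utils/best_wer.sh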
 
steps/train_sat.sh --cmd run.pl --mem 16G 3500 100000 data/train data/lang exp/tri4a_ali exp/tri5a
steps/train_sat.sh: feature type is lda
steps/train_sat.sh: Using transforms from exp/tri4a_ali
steps/train_sat.sh: Accumulating tree stats
steps/train_sat.sh: Getting questions for tree clustering.
steps/train_sat.sh: Building the tree
steps/train_sat.sh: Initializing the model
steps/train_sat.sh: Converting alignments from exp/tri4a_ali to use current tree
steps/train_sat.sh: Compiling graphs of transcripts
Pass 1
Pass 2
Estimating fMLLR transforms
Pass 3
Pass 4
Estimating fMLLR transforms
Pass 5
Pass 6
Estimating fMLLR transforms
Pass 7
Pass 8
Pass 9
Pass 10
Aligning data
Pass 11
Pass 12
Estimating fMLLR transforms
Pass 13
Pass 14
Pass 15
Pass 16
Pass 17
Pass 18
Pass 19
Pass 20
Aligning data
Pass 21
Pass 22
Pass 23
Pass 24
Pass 25
Pass 26
Pass 27
Pass 28
Pass 29
Pass 30
Aligning data
Pass 31
Pass 32
Pass 33
Pass 34
steps/diagnostic/analyze_alignments.sh --cmd run.pl --mem 16G data/lang exp/tri5a
steps/diagnostic/analyze_alignments.sh: see stats in exp/tri5a/log/analyze_alignments.log
39 warnings in exp/tri5a/log/fmllr.*.*.log
225 warnings in exp/tri5a/log/acc.*.*.log
1 warnings in exp/tri5a/log/compile_questions.log
679 warnings in exp/tri5a/log/align.*.*.log
1 warnings in exp/tri5a/log/build_tree.log
steps/train_sat.sh: Likelihood evolution:
-48.683 -48.7174 -48.6706 -48.4869 -47.9013 -47.191 -46.6464 -46.2649 -45.9935 -45.6753 -45.5116 -45.2495 -45.1217 -45.0272 -44.9426 -44.8655 -44.7949 -44.7309 -44.673 -44.5538 -44.4844 -44.438 -44.3968 -44.3583 -44.3219 -44.2874 -44.2544 -44.2226 -44.1918 -44.1271 -44.0841 -44.0594 -44.042 -44.0297 
exp/tri5a: nj=10 align prob=-47.15 over 150.19h [retry=0.1%, fail=0.0%] states=3016 gauss=100103 fmllr-impr=0.25 over 116.48h tree-impr=7.66
steps/train_sat.sh: done training SAT system in exp/tri5a
tree-info exp/tri5a/tree 
tree-info exp/tri5a/tree 
make-h-transducer --disambig-syms-out=exp/tri5a/graph/disambig_tid.int --transition-scale=1.0 data/lang_test/tmp/ilabels_3_1 exp/tri5a/tree exp/tri5a/final.mdl 
fstrmsymbols exp/tri5a/graph/disambig_tid.int 
fstrmepslocal 
fstdeterminizestar --use-log=true 
fstminimizeencoded 
fsttablecompose exp/tri5a/graph/Ha.fst data/lang_test/tmp/CLG_3_1.fst 
fstisstochastic exp/tri5a/graph/HCLGa.fst 
0.00048784 -0.178947
HCLGa is not stochastic
add-self-loops --self-loop-scale=0.1 --reorder=true exp/tri5a/final.mdl exp/tri5a/graph/HCLGa.fst 
steps/decode_fmllr.sh --cmd run.pl --mem 16G --nj 10 --config conf/decode.config exp/tri5a/graph data/dev exp/tri5a/decode_dev
steps/decode.sh --scoring-opts  --num-threads 1 --skip-scoring false --acwt 0.083333 --nj 10 --cmd run.pl --mem 16G --beam 8.0 --model exp/tri5a/final.alimdl --max-active 2000 exp/tri5a/graph data/dev exp/tri5a/decode_dev.si
decode.sh: feature type is lda
steps/diagnostic/analyze_lats.sh --cmd run.pl --mem 16G exp/tri5a/graph exp/tri5a/decode_test.si
steps/diagnostic/analyze_lats.sh: see stats in exp/tri5a/decode_test.si/log/analyze_alignments.log
Overall, lattice depth (10,50,90-percentile)=(1,3,16) and mean=6.8
steps/diagnostic/analyze_lats.sh: see stats in exp/tri5a/decode_test.si/log/analyze_lattice_depth_stats.log
+ steps/score_kaldi.sh --cmd 'run.pl --mem 16G' data/test exp/tri5a/graph exp/tri5a/decode_test.si
steps/score_kaldi.sh --cmd run.pl --mem 16G data/test exp/tri5a/graph exp/tri5a/decode_test.si
steps/score_kaldi.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ steps/scoring/score_kaldi_cer.sh --stage 2 --cmd 'run.pl --mem 16G' data/test exp/tri5a/graph exp/tri5a/decode_test.si
steps/scoring/score_kaldi_cer.sh --stage 2 --cmd run.pl --mem 16G data/test exp/tri5a/graph exp/tri5a/decode_test.si
steps/scoring/score_kaldi_cer.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ echo 'local/score.sh: Done'
local/score.sh: Done
steps/decode_fmllr.sh: feature type is lda
steps/decode_fmllr.sh: getting first-pass fMLLR transforms.
steps/decode_fmllr.sh: doing main lattice generation phase
steps/decode_fmllr.sh: estimating fMLLR transforms a second time.
steps/decode_fmllr.sh: doing a final pass of acoustic rescoring.
steps/diagnostic/analyze_lats.sh --cmd run.pl --mem 16G exp/tri5a/graph exp/tri5a/decode_test
steps/diagnostic/analyze_lats.sh: see stats in exp/tri5a/decode_test/log/analyze_alignments.log
Overall, lattice depth (10,50,90-percentile)=(1,3,18) and mean=7.4
steps/diagnostic/analyze_lats.sh: see stats in exp/tri5a/decode_test/log/analyze_lattice_depth_stats.log
+ steps/score_kaldi.sh --cmd 'run.pl --mem 16G' data/test exp/tri5a/graph exp/tri5a/decode_test
steps/score_kaldi.sh --cmd run.pl --mem 16G data/test exp/tri5a/graph exp/tri5a/decode_test
steps/score_kaldi.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ steps/scoring/score_kaldi_cer.sh --stage 2 --cmd 'run.pl --mem 16G' data/test exp/tri5a/graph exp/tri5a/decode_test
steps/scoring/score_kaldi_cer.sh --stage 2 --cmd run.pl --mem 16G data/test exp/tri5a/graph exp/tri5a/decode_test
steps/scoring/score_kaldi_cer.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ echo 'local/score.sh: Done'
local/score.sh: Done
steps/align_fmllr.sh --cmd run.pl --mem 16G --nj 10 data/train data/lang exp/tri5a exp/tri5a_ali
steps/align_fmllr.sh: feature type is lda
steps/align_fmllr.sh: compiling training graphs
steps/align_fmllr.sh: aligning data in data/train using exp/tri5a/final.alimdl and speaker-independent features.
steps/align_fmllr.sh: computing fMLLR transforms
steps/align_fmllr.sh: doing final alignment.
steps/align_fmllr.sh: done aligning data.
steps/diagnostic/analyze_alignments.sh --cmd run.pl --mem 16G data/lang exp/tri5a_ali
steps/diagnostic/analyze_alignments.sh: see stats in exp/tri5a_ali/log/analyze_alignments.log
72 warnings in exp/tri5a_ali/log/align_pass1.*.log
102 warnings in exp/tri5a_ali/log/align_pass2.*.log
%WER 36.54 [ 38281 / 104765, 828 ins, 3174 del, 34279 sub ] exp/mono/decode_test/cer_10_0.0
%WER 18.80 [ 19701 / 104765, 881 ins, 1277 del, 17543 sub ] exp/tri1/decode_test/cer_14_0.5
%WER 18.64 [ 19532 / 104765, 949 ins, 1159 del, 17424 sub ] exp/tri2/decode_test/cer_14_0.5
%WER 16.99 [ 17804 / 104765, 834 ins, 917 del, 16053 sub ] exp/tri3a/decode_test/cer_14_0.0
%WER 13.73 [ 14385 / 104765, 753 ins, 642 del, 12990 sub ] exp/tri4a/decode_test/cer_13_0.5
%WER 12.01 [ 12585 / 104765, 688 ins, 537 del, 11360 sub ] exp/tri5a/decode_test/cer_14_0.5
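
Summary lines like those above are typically collected by scanning every decode directory for its best-scoring operating point (a sketch; the cer_* files are the character-level score files produced by the aishell recipe, and utils/best_wer.sh picks the lowest):

# Report the best CER for every model decoded on the test set.
for x in exp/*/decode_test; do
  [ -d $x ] && grep WER $x/cer_* 2>/dev/null | utils/best_wer.sh
done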

11. nnet3 Model

# If you see this message:
This script is intended to be used with GPUs but you have not compiled Kaldi with CUDA
If you want to use GPUs (and have them), go to src/, and configure and make on a machine
where "nvcc" is installed.
# Fix:
cd kaldi/src/
# Edit the configure file: add your CUDA root (here /usr/local/cuda or
# /home/lnj524/module/cuda/cuda-11.6) to the search list in configure_cuda:
function configure_cuda {
  # Check for CUDA toolkit in the system
  if [ ! -d "$CUDATKDIR" ]; then
    for base in /usr/local/share/cuda /usr/local/cuda /home/lnj524/module/cuda/cuda-11.6 /usr/; do
      if [ -f $base/bin/nvcc ]; then
        CUDATKDIR=$base
      fi
    done
  fi
# Reconfigure and rebuild Kaldi
./configure --shared --use-cuda=yes --cudatk-dir=/home/lnj524/module/cuda/cuda-11.6/ --cuda-arch=-arch=sm_86 --mathlib=OPENBLAS --openblas-root=/home/lnj524/module/kaldi/tools/OpenBLAS-0.3.13/install/

make -j clean depend
make -j 16
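# Before launching GPU training, confirm that the driver sees the card and that
# the rebuilt binaries really include CUDA support. cuda-compiled is the small
# Kaldi utility whose failure prints the error quoted above; run these from the
# s5 directory after sourcing path.sh.
nvidia-smi
cuda-compiled && echo "CUDA build OK" || echo "CUDA support missing"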
# Before starting, change --use-gpu true to --use-gpu wait in s5/local/nnet3/run_tdnn.sh
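# A one-liner for that edit (a sketch; run from the s5 directory). --use-gpu wait
# makes training wait for a free GPU instead of aborting when all GPUs are busy.
sed -i 's/--use-gpu true/--use-gpu wait/' local/nnet3/run_tdnn.sh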
# nnet3
local/nnet3/run_tdnn.sh

local/nnet3/run_ivector_common.sh: preparing directory for low-resolution speed-perturbed data (for alignment)
fix_data_dir.sh: kept all 120098 utterances.
fix_data_dir.sh: old files are kept in data/train/.backup
utils/data/perturb_data_dir_speed_3way.sh: making sure the utt2dur and the reco2dur files are present
... in data/train, because obtaining it after speed-perturbing
... would be very slow, and you might need them.
utils/data/get_utt2dur.sh: data/train/utt2dur already exists with the expected length.  We won't recompute it.
utils/data/get_reco2dur.sh: data/train/wav.scp indexed by utt-id; copying utt2dur to reco2dur
utils/data/perturb_data_dir_speed.sh: generated speed-perturbed version of data in data/train, in data/train_sp_speed0.9
fix_data_dir.sh: kept all 120098 utterances.
fix_data_dir.sh: old files are kept in data/train_sp_speed0.9/.backup
utils/validate_data_dir.sh: Successfully validated data-directory data/train_sp_speed0.9
utils/data/perturb_data_dir_speed.sh: generated speed-perturbed version of data in data/train, in data/train_sp_speed1.1
fix_data_dir.sh: kept all 120098 utterances.
fix_data_dir.sh: old files are kept in data/train_sp_speed1.1/.backup
utils/validate_data_dir.sh: Successfully validated data-directory data/train_sp_speed1.1
utils/data/combine_data.sh data/train_sp data/train data/train_sp_speed0.9 data/train_sp_speed1.1
utils/data/combine_data.sh: combined utt2uniq
utils/data/combine_data.sh [info]: not combining segments as it does not exist
utils/data/combine_data.sh: combined utt2spk
utils/data/combine_data.sh [info]: not combining utt2lang as it does not exist
utils/data/combine_data.sh: combined utt2dur
utils/data/combine_data.sh [info]: not combining utt2num_frames as it does not exist everywhere
utils/data/combine_data.sh: combined reco2dur
utils/data/combine_data.sh [info]: not combining feats.scp as it does not exist everywhere
utils/data/combine_data.sh: combined text
utils/data/combine_data.sh [info]: not combining cmvn.scp as it does not exist everywhere
utils/data/combine_data.sh [info]: not combining vad.scp as it does not exist
utils/data/combine_data.sh [info]: not combining reco2file_and_channel as it does not exist
utils/data/combine_data.sh: combined wav.scp
utils/data/combine_data.sh [info]: not combining spk2gender as it does not exist
fix_data_dir.sh: kept all 360294 utterances.
fix_data_dir.sh: old files are kept in data/train_sp/.backup
utils/data/perturb_data_dir_speed_3way.sh: generated 3-way speed-perturbed version of data in data/train, in data/train_sp
utils/validate_data_dir.sh: Successfully validated data-directory data/train_sp
local/nnet3/run_ivector_common.sh: making MFCC features for low-resolution speed-perturbed data
steps/make_mfcc_pitch.sh --cmd run.pl --mem 16G --nj 70 data/train_sp exp/make_mfcc/train_sp mfcc_perturbed
utils/validate_data_dir.sh: Successfully validated data-directory data/train_sp
steps/make_mfcc_pitch.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.

# Killed the job manually at this point: it was monopolizing the CPU.
# Even with speed perturbation disabled, this stage still needs substantial resources.
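
If the machine cannot sustain this many parallel jobs, a gentler option than killing the run is to lower the job count before starting (a sketch; it assumes the --nj 70 seen in the log above is hard-coded in local/nnet3/run_ivector_common.sh, which the grep will confirm):

# Locate where the job count is set (the log above shows --nj 70).
grep -n -- '--nj' local/nnet3/run_tdnn.sh local/nnet3/run_ivector_common.sh
# Hypothetical edit: reduce 70 to something the CPU can sustain, e.g. 16.
sed -i 's/--nj 70/--nj 16/' local/nnet3/run_ivector_common.sh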