CMUSphinx Learn - Adapting the default acoustic model


This page describes how to do simple acoustic model adaptation to improve speech recognition in your configuration. Please note that adaptation doesn't necessarily adapt to a particular speaker; it just improves the fit between the adaptation data and the model. For example, you can adapt to your own voice to make dictation work well, but you can also adapt to your particular recording environment, your audio transmission channel, or your accent or the accent of your users. You can take a model trained on clean broadcast data and adapt it with telephone data to produce a telephone acoustic model. Cross-language adaptation also makes sense: for example, you can adapt an English model to the sounds of another language by creating a phoneset map and building a dictionary for the other language with the English phoneset.
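As a sketch of the cross-language idea, a phoneset map is just a table pairing each phone of the target language with the closest English phone. The entries below are hypothetical examples for a Spanish-like phone set, not part of any distributed model:

```
; hypothetical map: target-language phone -> closest English (CMUdict) phone
a   AA
e   EH
i   IY
o   OW
u   UW
ny  N Y
rr  R
```

With such a map you can rewrite the other language's dictionary so that every pronunciation uses only English phones, and the English model can then score it.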

 

The adaptation process takes transcribed data and improves the model you already have. It's more robust than full training and can lead to good results even if your adaptation data is small. For example, as little as 5 minutes of speech is enough to significantly improve dictation accuracy by adapting to a particular speaker.

 

The methods of adaptation differ a bit between PocketSphinx and Sphinx4 due to the different types of acoustic models used. For more technical information on that, see AcousticModelTypes.

 

Creating an adaptation corpus


The first thing you need to do is create a corpus of adaptation data. This consists of a list of sentences, a dictionary describing the pronunciation of all the words in that list, and a recording of you speaking each of those sentences.

Required files


The actual set of sentences you use is somewhat arbitrary, but ideally it should have good coverage of the words or phonemes most frequently used in the type of text you want to recognize. We have had good results simply using sentences from the CMU ARCTIC text-to-speech databases. To that effect, here are the first 20 sentences from ARCTIC, a control file, a transcription file, and a dictionary for them:

The sections below will refer to these files, so it would be a good idea to download them now. You should also make sure that you have downloaded and compiled SphinxBase and SphinxTrain.

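If you want to see the file formats before downloading, here is a sketch that builds a three-sentence version of the two control files. The arctic3.* names are made up for illustration; the format is one utterance id per line in the fileids file, and the sentence text followed by the matching id in parentheses in the transcription file:

```shell
# Build a tiny fileids file: one utterance id per line, no extension.
for i in 1 2 3; do
  printf 'arctic_%04d\n' "$i"
done > arctic3.fileids

# Build the matching transcription: sentence text, then the id in parentheses.
# The line order must match the fileids file.
i=0
while IFS= read -r sent; do
  i=$((i+1))
  printf '%s (arctic_%04d)\n' "$sent" "$i"
done > arctic3.transcription <<'EOF'
Author of the danger trail, Philip Steels, etc.
Not at this particular case, Tom, apologized Whittemore.
For the twentieth time that evening the two men shook hands.
EOF
```

The downloaded arctic20.fileids and arctic20.transcription follow exactly this layout, just with all 20 sentences.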

Recording your adaptation data


In case you are adapting to a single speaker, you can record the adaptation data yourself. This is unfortunately a bit more complicated than it ought to be. Basically, you need to record a single audio file for each sentence in the adaptation corpus, naming the files according to the names listed in arctic20.transcription and arctic20.fileids. In addition, you will NEED TO MAKE SURE THAT YOU RECORD AT A SAMPLING RATE OF 16 KHZ (or 8 kHz if you adapt a telephone model) IN MONO (SINGLE CHANNEL).

 

If you are at a Linux command line, you can accomplish this in very nerdy style with the following bash one-liner, run from the directory in which you downloaded arctic20.txt. Since we are redirecting the output to /dev/null in the one-liner, you should first verify that you have the sox package, and if not, install it using this command:

 

sudo apt-get install sox
 

Now, the one-liner is as follows:

for i in `seq 1 20`; do 
       fn=`printf arctic_%04d $i`; 
       read sent; echo $sent; 
       rec -r 16000 -e signed-integer -b 16 -c 1 $fn.wav 2>/dev/null; 
done < arctic20.txt
 

This will echo each sentence to the screen and start recording immediately. Hit Control-C to move on to the next sentence. You should see the following files in the current directory afterwards:


arctic_0001.wav  
arctic_0002.wav
.....
arctic_0020.wav
arctic20.dic
arctic20.fileids
arctic20.transcription
arctic20.txt
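A quick sanity check (a sketch, assuming the naming scheme above) counts how many of the 20 expected recordings actually exist and are non-empty:

```shell
# count_recordings: print how many of arctic_0001.wav .. arctic_0020.wav
# exist and are non-empty in the current directory.
count_recordings() {
  n=0
  for i in $(seq 1 20); do
    f=$(printf 'arctic_%04d.wav' "$i")
    [ -s "$f" ] && n=$((n+1))
  done
  echo "$n of 20 recordings present"
}
count_recordings
```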
 

If you've hit Control-C immediately after you finished speaking a sentence, chances are that your recording has truncated the last word. You should verify that these recordings sound okay. To do this, you can play them back with:

 

for i in *.wav; do play $i; done
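Hearing the playback is not enough to catch a wrong sample rate, so here is a hedged sketch that reads the sample rate and channel count straight out of the canonical RIFF/WAVE header (channel count at byte offset 22, sample rate at offset 24, both little-endian). It assumes plain PCM files with the standard 44-byte header, which is what rec produces:

```shell
# check_wav FILE: succeed only if FILE is a 16 kHz mono RIFF/WAVE file.
# Reads the fmt chunk of a canonical PCM header with od.
check_wav() {
  channels=$(od -An -t u2 -j 22 -N 2 "$1" | tr -d ' ')
  rate=$(od -An -t u4 -j 24 -N 4 "$1" | tr -d ' ')
  if [ "$rate" = "16000" ] && [ "$channels" = "1" ]; then
    echo "OK: $1"
  else
    echo "BAD: $1 (rate=$rate channels=$channels)"
    return 1
  fi
}
```

For example: for f in arctic_*.wav; do check_wav "$f"; done. If you are recording for an 8 kHz telephone model, change 16000 to 8000 in the test.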
 

If you are adapting to a channel, an accent, or some other generic property of the audio, then you need to collect rather more recordings manually. For example, in a call center you can record and transcribe a hundred calls and use them to improve the recognizer accuracy by means of adaptation.

 

Adapting the acoustic model


First we will copy the default acoustic model from PocketSphinx into the current directory in order to work on it. Assuming that you installed PocketSphinx under /usr/local, the acoustic model directory is /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k. Copy this directory to your working directory:

 

cp -a /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k .

Generating acoustic feature files


In order to run the adaptation tools, you must generate a set of acoustic feature files from these WAV audio recordings. This can be done with the sphinx_fe tool from SphinxBase. It is imperative that you use the same acoustic parameters to extract these features as were used to train the standard acoustic model. Since PocketSphinx 0.4, these are stored in a file called feat.params in the acoustic model directory. You can simply add it to the command line for sphinx_fe, like this:

 

sphinx_fe -argfile hub4wsj_sc_8k/feat.params \
        -samprate 16000 -c arctic20.fileids \
       -di . -do . -ei wav -eo mfc -mswav yes
 

If you are using Sphinx4 and the model doesn't have a feat.params file, just omit the -argfile parameter to use the default settings.

 

You should now have the following files in your working directory:


arctic_0001.mfc
arctic_0001.wav
arctic_0002.mfc
arctic_0002.wav
arctic_0003.mfc
arctic_0003.wav
.....
arctic_0020.wav
arctic20.dic
arctic20.fileids
arctic20.transcription
arctic20.txt
 

Converting the sendump and mdef files


Some models don't have enough data for adaptation. There is an extra file you need which was left out of the PocketSphinx distribution in order to save space. You can download it from the code repository, in the package called pocketsphinx-extra, from the folder pocketsphinx-extra/model/hmm/en_US/hub4_wsj_sc_3s_8k.cd_semi_5000, or check it out from Subversion. Copy the mixture_weights file to your acoustic model folder.

 

Sometimes the sendump file can be converted back to a mixture_weights file. This is only possible for older sendump files. If you have installed the SphinxTrain Python modules, you can use SphinxTrain/python/cmusphinx/sendump.py to convert the sendump file from the acoustic model to a mixture_weights file. For the hub4_wsj acoustic model this will not work.

 

You will also need to convert the mdef file from the acoustic model to the plain text format used by the SphinxTrain tools. To do this, use the pocketsphinx_mdef_convert program:

 

pocketsphinx_mdef_convert -text hub4wsj_sc_8k/mdef hub4wsj_sc_8k/mdef.txt
 

Accumulating observation counts


The next step in adaptation is to collect statistics from the adaptation data. This is done using the bw program from SphinxTrain. You should be able to find the bw tool in a sphinxtrain installation in the folder /usr/local/libexec/sphinxtrain (or under another prefix on Linux) or in bin\Release (in the sphinxtrain directory on Windows). Copy it to the working directory along with the map_adapt and mk_s2sendump programs.

 

Now, to collect statistics, run:


./bw \
 -hmmdir hub4wsj_sc_8k \
 -moddeffn hub4wsj_sc_8k/mdef.txt \
 -ts2cbfn .semi. \
 -feat 1s_c_d_dd \
 -svspec 0-12/13-25/26-38 \
 -cmn current \
 -agc none \
 -dictfn arctic20.dic \
 -ctlfn arctic20.fileids \
 -lsnfn arctic20.transcription \
 -accumdir .

Make sure the arguments of the bw command match the parameters in the feat.params file inside the acoustic model folder. Please note that not all the parameters from feat.params are supported by bw, only a few of them. For example, bw doesn't support -upperf or other feature extraction parameters. You only need to use the parameters which are accepted; the other parameters from feat.params should be skipped.
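For reference, a feat.params file is just a list of option/value pairs. As an illustration only (these values are typical for semi-continuous PocketSphinx models and are not guaranteed to match your copy of hub4wsj_sc_8k), such a file looks like:

```
-lowerf 130
-upperf 6800
-nfilt 25
-transform dct
-lifter 22
-feat 1s_c_d_dd
-svspec 0-12/13-25/26-38
-agc none
-cmn current
-varnorm no
```

Of these, bw understands entries like -feat, -svspec, -cmn and -agc, while purely front-end parameters such as -lowerf, -upperf and -nfilt belong to sphinx_fe and should be skipped.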

 

For example, for most continuous models (like the ones used by Sphinx4) you don't need to include the -svspec option. Instead, you just need to use -ts2cbfn .cont. For PTM models, use -ts2cbfn .ptm.

 

Sometimes, if the noisedict file is missing, you also need an extra step: copy the fillerdict file into the directory that you chose in the -hmmdir parameter, renaming it to noisedict.

 

Creating a transformation with MLLR

MLLR transforms are supported by both PocketSphinx and Sphinx4. MLLR is a cheap adaptation method that is suitable when the amount of data is limited, and it's a good choice for online adaptation. MLLR works best for continuous models; its effect on semi-continuous models is very limited, since semi-continuous models mostly rely on mixture weights. If you want the best accuracy you can combine MLLR adaptation with the MAP adaptation described below.

 

Next we will generate an MLLR transformation, which we will pass to the decoder to adapt the acoustic model at run time. This is done with the mllr_solve program:

./mllr_solve \
    -meanfn hub4wsj_sc_8k/means \
    -varfn hub4wsj_sc_8k/variances \
    -outmllrfn mllr_matrix -accumdir .
 

This command will create an adaptation data file called mllr_matrix. Now, if you wish to decode with the adapted model, simply add -mllr mllr_matrix (or whatever the path to the mllr_matrix file you created is) to your pocketsphinx command line.

 

Updating the acoustic model files with MAP


 

MAP is a different adaptation method. In this case, unlike MLLR, we don't create a generic transform but instead update each parameter in the model. We will now copy the acoustic model directory and overwrite the newly created directory with the adapted model files:

cp -a hub4wsj_sc_8k hub4wsj_sc_8kadapt
 

To do adaptation, use the map_adapt program:


./map_adapt \
    -meanfn hub4wsj_sc_8k/means \
    -varfn hub4wsj_sc_8k/variances \
    -mixwfn hub4wsj_sc_8k/mixture_weights \
    -tmatfn hub4wsj_sc_8k/transition_matrices \
    -accumdir . \
    -mapmeanfn hub4wsj_sc_8kadapt/means \
    -mapvarfn hub4wsj_sc_8kadapt/variances \
    -mapmixwfn hub4wsj_sc_8kadapt/mixture_weights \
    -maptmatfn hub4wsj_sc_8kadapt/transition_matrices
 

Recreating the adapted sendump file


If you want to save space for the model, you can use the sendump file supported by PocketSphinx. For Sphinx4 you don't need it. To recreate the sendump file from the updated mixture_weights file:

./mk_s2sendump \
    -pocketsphinx yes \
    -moddeffn hub4wsj_sc_8kadapt/mdef.txt \
    -mixwfn hub4wsj_sc_8kadapt/mixture_weights \
    -sendumpfn hub4wsj_sc_8kadapt/sendump
 

Congratulations! You now have an adapted acoustic model! If you like, you can delete the files hub4wsj_sc_8kadapt/mixture_weights and hub4wsj_sc_8kadapt/mdef.txt to save space, because they are not used by the decoder.

 

Other acoustic models


For Sphinx4, the adaptation is the same as for PocketSphinx, except that the model is continuous: you need to take a continuous model from Sphinx4 for adaptation. Continuous models often have a text mdef file, so you don't need to unpack the binary mdef and pack it back. Also, when running bw, the model type is .cont. rather than .semi., and the feature type is usually 1s_c_d_dd. The rest is the same: bw + map_adapt will do the work.

Testing the adaptation


After you have done the adaptation, it's critical to test the adaptation quality. To do that you need to set up a test database similar to the one used for adaptation. To test the adaptation you need to configure the decoder with the required parameters; in particular, you need to have <your.lm>. For more details see Building Language Model.

 

Create fileids file adaptation-test.fileids:


test1
test2
 

Create transcription file adaptation-test.transcription:


some text (test1)
some text (test2)
 

Put the audio files in wav folder. Make sure those files have proper format and sample rate.


wav/test1.wav
wav/test2.wav
 

You can also use adaptation data for testing, but it's recommended to create a separate test set. Now, let's run the decoder:


pocketsphinx_batch \
 -adcin yes \
 -cepdir wav \
 -cepext .wav \
 -ctl adaptation-test.fileids \
 -lm <your.lm> \
 -dict <your.dic, for example arctic.dic> \
 -hmm <your_new_adapted_model, for example hub4wsj_sc_8kadapt> \
 -hyp adaptation-test.hyp

word_align.pl adaptation-test.transcription adaptation-test.hyp

Make sure to add

 -samprate 8000 

to the above command if you are decoding 8kHz files!


 

The script word_align.pl from SphinxTrain will report the exact error rate, which you can use to decide whether adaptation worked for you. It will look something like:

TOTAL Words: 773 Correct: 669 Errors: 121
TOTAL Percent correct = 86.55% Error = 15.65% Accuracy = 84.35%
TOTAL Insertions: 17 Deletions: 11 Substitutions: 93
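The numbers in this summary are related by a simple formula: word error rate is (insertions + deletions + substitutions) divided by the number of reference words, and accuracy is 100% minus the WER. Percent correct ignores insertions (773 − 11 − 93 = 669 correct), which is why 86.55% differs from the 84.35% accuracy. A small sketch reproducing the figures above:

```shell
# Recompute WER and accuracy from the raw counts in the report above.
words=773 ins=17 del=11 sub=93
errors=$((ins + del + sub))
awk -v e="$errors" -v w="$words" \
  'BEGIN { printf "Errors: %d  WER: %.2f%%  Accuracy: %.2f%%\n", e, 100*e/w, 100*(1-e/w) }'
```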
 

You can run the decoder on both the original acoustic model and the new acoustic model to estimate the improvement.

 

Using the model


 

After adaptation, the acoustic model is located in the folder


hub4wsj_sc_8kadapt
 

You need only that folder. The model should have the following files:


mdef
feat.params
mixture_weights
means
noisedict
transition_matrices
variances
 

The exact set of files depends on the type of model you trained.

 

To use the model in PocketSphinx, simply put the model files into the resources of your application, then point to it with the -hmm option:

 

pocketsphinx_continuous -hmm <your_new_model_folder> -lm <your_lm> -dict <your_dict>

Or set the -hmm engine configuration option through the cmd_ln_init function. Alternatively, you can replace the old model files with the new ones.

 

To use the trained model in sphinx4, you need to update the model location in the config file. Read the documentation on Using SphinxTrain models in sphinx4.


 

Troubleshooting when adaptation didn't improve results

First, measure your accuracy so you know whether it is actually good.

I have no idea where to start looking for the problem


  1. Test whether accuracy improves on the adaptation set itself.
  2. Accuracy improves on the adaptation set → check whether your adaptation set matches your test set.
  3. Accuracy didn't improve on the adaptation set → you made a mistake during adaptation.

How much improvement might I expect through adaptation?

From a few sentences you should get about a 10% relative WER improvement.
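Note that the figure is relative, not absolute: a 10% relative improvement on, say, a 20% WER baseline means the WER drops to about 18%, not to 10%. A quick check of the arithmetic (the baseline value here is just an example):

```shell
# Relative WER improvement: new_wer = baseline * (1 - relative_improvement).
baseline=20   # example baseline WER, in percent
awk -v b="$baseline" 'BEGIN { printf "new WER: %.1f%%\n", b * 0.9 }'
```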

I'm lost

I can't tell whether I need more or better training data, whether I'm doing the adaptation incorrectly, whether my language model is the problem, or whether something is intrinsically wrong with my configuration.

Most likely you just ignored the error messages that were printed. You obviously need to provide more information and give access to your experiment files in order to get more definite advice.

What next?

We hope the adapted model gives you acceptable results. If not, try to improve your adaptation process:

  1. Add more adaptation data.
  2. Adapt your language model / use a better language model.
