Kaldi Study Notes (1): About the Kaldi project

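I have been reading the material on the official Kaldi website lately, adding annotations as I go to help my own understanding.

These study notes come almost entirely from the official Kaldi website, http://kaldi.sourceforge.net, with my own annotations added; I state this here for the record.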

About the Kaldi project

What is Kaldi?

Kaldi is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers. For more detailed history and list of contributors see History of the Kaldi project.

The name Kaldi

According to legend, Kaldi was the Ethiopian goatherder who discovered the coffee plant.

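Note: According to the legend, a man named Kaldi was the first to discover coffee. While herding his goats, he noticed that they became unusually lively after eating from a certain tree, and that is how coffee was discovered.

Kaldi versus other toolkits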

Kaldi is similar in aims and scope to HTK. The goal is to have modern and flexible code, written in C++, that is easy to modify and extend. Important features include:

  Code-level integration with Finite State Transducers (FSTs)

    Note: FSTs are a kind of finite-state machine.

    We compile against the OpenFst toolkit (using it as a library).

  Extensive linear algebra support

    Note: support for linear algebra.

    We include a matrix library that wraps standard BLAS and LAPACK routines.

    Note: BLAS is the Basic Linear Algebra Subprograms; LAPACK is the Linear Algebra PACKage.

  Extensible design
    As far as possible, we provide our algorithms in the most generic form possible. For instance, our decoders are templated on an object that provides a score indexed by a (frame, fst-input-symbol) tuple. This means the decoder could work from any suitable source of scores, such as a neural net.
  Open license
    The code is licensed under Apache 2.0, which is one of the least restrictive licenses available.
  Complete recipes
    Our goal is to make available complete recipes for building speech recognition systems, that work from widely available databases such as those provided by the Linguistic Data Consortium (LDC).

    Note: the Linguistic Data Consortium (LDC) is a consortium that distributes language resources.
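The "Extensible design" point above can be illustrated with a small sketch. This is not Kaldi's decoder; it is a toy example, and every name in it (TableScorer, GreedyDecode, LogLikelihood, NumFrames, NumSymbols) is an assumption made for illustration. It only shows the idea of templating decoding code on an object that returns a score for a (frame, symbol) pair, so that the same code can run on top of any score source.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical score source: a precomputed table of log-likelihoods.
// Any type with the same three member functions could be used instead,
// for example a wrapper around a GMM or a neural net.
class TableScorer {
 public:
  explicit TableScorer(std::vector<std::vector<float>> scores)
      : scores_(std::move(scores)) {}

  // Score for one (frame, symbol) pair.
  float LogLikelihood(std::size_t frame, std::size_t symbol) const {
    assert(frame < scores_.size() && symbol < scores_[frame].size());
    return scores_[frame][symbol];
  }
  std::size_t NumFrames() const { return scores_.size(); }
  std::size_t NumSymbols() const {
    return scores_.empty() ? 0 : scores_[0].size();
  }

 private:
  std::vector<std::vector<float>> scores_;
};

// Toy "decoder", templated on the score source. It picks the best-scoring
// symbol for each frame independently; a real decoder would search an FST.
template <typename Scorer>
std::vector<std::size_t> GreedyDecode(const Scorer &scorer) {
  std::vector<std::size_t> best;
  for (std::size_t f = 0; f < scorer.NumFrames(); ++f) {
    std::size_t arg = 0;
    for (std::size_t s = 1; s < scorer.NumSymbols(); ++s) {
      if (scorer.LogLikelihood(f, s) > scorer.LogLikelihood(f, arg)) arg = s;
    }
    best.push_back(arg);
  }
  return best;
}
```

Because GreedyDecode only relies on the three member functions, swapping the table for another score source would not require touching the decoding loop.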

The goal of releasing complete recipes is an important aspect of Kaldi. Since the code is publicly available under a license that permits modifications and re-release, we would like to encourage people to release their code, along with their script directories, in a similar format to Kaldi's own example script.

We have tried to make Kaldi's documentation as complete as possible given time constraints, but in the short term we cannot hope to generate documentation that is as thorough as HTK's. In particular there is a lot of introductory material in the HTKBook, explaining statistical speech recognition for the uninitiated, that will probably never appear in Kaldi's documentation. Much of Kaldi's documentation is written in such a way that it will only be accessible to an expert. In the future we hope to make it somewhat more accessible, bearing in mind that our intended audience is speech recognition researchers or researchers-in-training. In general, Kaldi is not a speech recognition toolkit "for dummies." It will allow you to do many kinds of operations that don't make sense.


The flavor of Kaldi

In this section we attempt to summarize some of the more generic qualities of the Kaldi toolkit. To some extent this describes the goals of the current developers, as much as it describes the current status of the project. It is not meant to exclude contributions from researchers whose work has a different flavor.


  We emphasize generic algorithms and universal recipes

    By "generic algorithms" we mean things like linear transforms, rather than those that are specific to speech in some way. But we don't intend to be too dogmatic about this, if more specific algorithms are useful.

    We would like recipes that can be run on any data-set, rather than those that have to be customized.


  We prefer provably correct algorithms

    The recipes have been designed in such a way that in principle they should never fail in a catastrophic way. There has been an effort to avoid recipes and algorithms that could possibly fail, even if they don't fail in the "normal case" (one example: FST weight-pushing, which normally helps but can crash or make things much worse in certain cases).


  Kaldi code is thoroughly tested.

    The goal is for all or nearly all the code to have corresponding test routines.


  We try to keep the simple cases simple.

    There is a danger when building a large speech toolkit that the code can become a forest of rarely used alternatives. We are trying to avoid this by structuring the toolkit in the following way. Each command-line program generally works for a limited set of cases (e.g. a decoder might just work for GMMs). Thus, when you add a new type of model, you create a new command-line decoder (that calls the same underlying templated code).
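The structure described above, one narrow command-line program per model type on top of shared templated code, might look roughly like the following sketch. All names here (TotalScore, ToyGmm, ToyNet, GmmDecodeMain, NetDecodeMain) are hypothetical and are not taken from Kaldi; the two "Main" functions stand in for the bodies of two separate command-line programs.

```cpp
#include <cassert>
#include <vector>

// Shared templated code: sums per-frame scores from any model type that
// provides a Score(float) member function.
template <typename Model>
float TotalScore(const Model &model, const std::vector<float> &frames) {
  assert(!frames.empty());  // this toy version expects at least one frame
  float total = 0.0f;
  for (float x : frames) total += model.Score(x);
  return total;
}

// Hypothetical model type 1: a one-dimensional, unit-variance Gaussian
// score (the normalizing constant is omitted).
struct ToyGmm {
  float mean;
  float Score(float x) const {
    float d = x - mean;
    return -0.5f * d * d;
  }
};

// Hypothetical model type 2: a single linear unit standing in for a net.
struct ToyNet {
  float weight;
  float Score(float x) const { return weight * x; }
};

// Each function below plays the role of one small command-line program.
// It handles exactly one model type and delegates to the shared code.
float GmmDecodeMain(const std::vector<float> &frames) {
  ToyGmm model{0.0f};
  return TotalScore(model, frames);
}

float NetDecodeMain(const std::vector<float> &frames) {
  ToyNet model{2.0f};
  return TotalScore(model, frames);
}
```

Adding a third model type under this scheme would mean adding one more small wrapper, leaving TotalScore and the existing wrappers unchanged.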


  Kaldi code is easy to understand.

    Even though the Kaldi toolkit as a whole may get very large, we aim for each individual part of it to be understandable without too much effort. We will accept some code duplication if it improves the understandability of individual pieces.


  Kaldi code is easy to reuse and refactor.

    We aim for the toolkit to be as loosely coupled as possible. In general this means that any given header should need to #include as few other header files as possible. The matrix library, in particular, only depends on code in one other subdirectory so it can be used independently of almost all the rest of Kaldi.
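One common C++ way to keep the number of #include lines in a header small is a forward declaration. The sketch below is a general illustration of that technique, not a description of how Kaldi's headers are organized; the file names in the comments and the names Matrix and CountElements are hypothetical. The three parts are shown in one block so that it compiles as a single file.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ---- would live in feature-stats.h (hypothetical header) ----
// A forward declaration is enough here, because this header only refers to
// Matrix by reference. It does not need to #include the matrix header.
class Matrix;
std::size_t CountElements(const Matrix &m);

// ---- would live in matrix.h (hypothetical header) ----
class Matrix {
 public:
  Matrix(std::size_t rows, std::size_t cols)
      : rows_(rows), cols_(cols), data_(rows * cols, 0.0f) {}
  std::size_t NumRows() const { return rows_; }
  std::size_t NumCols() const { return cols_; }
  // Read-only element access with a bounds check.
  float At(std::size_t r, std::size_t c) const {
    assert(r < rows_ && c < cols_);
    return data_[r * cols_ + c];
  }

 private:
  std::size_t rows_, cols_;
  std::vector<float> data_;
};

// ---- would live in feature-stats.cc, which does include matrix.h ----
std::size_t CountElements(const Matrix &m) {
  return m.NumRows() * m.NumCols();
}
```

With this layout, code that includes only the first header is not recompiled when the internals of Matrix change, which is one practical benefit of loose coupling.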


Status of the project

Currently, we have code and scripts for most standard techniques, including all standard linear transforms, MMI, boosted MMI and MCE discriminative training, and also feature-space discriminative training (like fMPE, but based on boosted MMI). We have working recipes for Wall Street Journal and Resource Management, and also for Switchboard. The Switchboard recipe is not yet giving state-of-the-art results, due to vocabulary and language model issues; we don't use any external data sources for this.


Note: after an early phase in which we intended to use version numbers for major releases of Kaldi ("v1" and so on), we realized that this type of release does not mesh well with the natural style of development, which is very continuous. Currently we maintain two major versions of Kaldi: the "trunk" version and the "stable" version. The "trunk" version is the one most people commit to, and contains the most up-to-date features but may also contain partially finished features. The "stable" version is mostly a subset of "trunk" that slightly lags in time, and has more thorough testing. We also maintain several "sandbox" versions that are for projects that are in earlier stages of development. All these versions are available from our subversion repository on Sourceforge; see Downloading and installing Kaldi for more details.


Referencing Kaldi in papers

Povey D, Ghoshal A, Boulianne G, et al. The Kaldi speech recognition toolkit[C]//Proc. ASRU. 2011: 1-4.




