Comparing a High and Low-Level Deep Neural Network Implementation for Automatic Speech Recognition

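  The paper opens by attributing the progress of deep learning to advances in GPUs and to the availability of many well-suited algorithms, and then outlines the structure of the paper.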

  In my view, the most valuable part of the paper is its explanation of decoding and alignment:

  (1) On decoding: "Once the acoustic and language models are trained, a decoding process can be applied to unknown speech to generate transcripts. Formally, we want to find the most likely sequence of words w* for a set of observations o: w* = argmax_w p(o|w) p(w). Decoding is not limited to words; phone decoding can be performed in the same manner."
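  To make the argmax concrete, here is a minimal sketch that scores an explicit, hand-written list of word-sequence hypotheses in the log domain, where the product p(o|w) p(w) becomes a sum. The function name `decode` and all scores are hypothetical toy values; real decoders (such as Kaldi's) search a vast hypothesis space with pruning rather than enumerating candidates.

```python
import math
from typing import Dict, Tuple

Hypothesis = Tuple[str, ...]

def decode(hypotheses: Dict[Hypothesis, Tuple[float, float]]) -> Hypothesis:
    """Return argmax_w [log p(o|w) + log p(w)] over an explicit hypothesis list.

    Each value is (acoustic log-likelihood log p(o|w), LM log-probability log p(w)).
    Summing logs instead of multiplying probabilities avoids numerical underflow.
    """
    return max(hypotheses, key=lambda w: hypotheses[w][0] + hypotheses[w][1])

# Toy scores, made up for illustration: the acoustics slightly favour the
# second hypothesis, but the language model strongly favours the first.
hyps = {
    ("recognize", "speech"): (-120.0, math.log(0.02)),
    ("wreck", "a", "nice", "beach"): (-118.0, math.log(0.0001)),
}
print(decode(hyps))  # → ('recognize', 'speech')
```

The example shows why both factors matter: the acoustic model alone would pick the wrong transcript, and the language-model prior p(w) corrects it.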

  (2) On alignment (from the section "Applying DNNs to ASR"): "Human transcriptions consist of words, as opposed to the targets (such as phone-states) needed to train a DNN. Furthermore, training requires a target for each frame (once every 10 ms in our system), while transcriptions typically only delineate word or segment boundaries. For these reasons, an alignment is performed prior to training a DNN. During alignment, words are replaced with sequences of phones, which are further divided into phone-states. The phone-states are then time-aligned with the audio frames. An external dictionary is used to provide possible word-to-phone mappings, and an existing GMM-HMM system is used for selecting the best phone sequence and determining the time boundaries of the phone-states." (This passage is long and takes some background to follow. The purpose of alignment is to assign each speech frame to a phone-state. In Kaldi, training usually goes through two rounds of alignment: a model is trained, the speech frames are realigned against the newly trained model, and this yields better frame labels.)
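  The alignment steps described above can be sketched as follows, assuming a single pronunciation per word, 3-state left-to-right phone HMMs, and per-frame state log-likelihoods that in practice would come from the existing GMM-HMM. `LEXICON`, `expand_to_states`, and `forced_align` are hypothetical names, and the lexicon entry is a toy; the Viterbi search ignores transition probabilities for brevity, so this is an illustration of forced alignment rather than the paper's or Kaldi's implementation.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical pronunciation dictionary (word -> phones); a real system reads
# an external lexicon and may offer several pronunciations per word.
LEXICON: Dict[str, List[str]] = {"hi": ["hh", "ay"]}
STATES_PER_PHONE = 3  # assumption: 3-state left-to-right phone HMMs

def expand_to_states(words: Sequence[str]) -> List[str]:
    """Replace words with phones, then split each phone into phone-states."""
    states: List[str] = []
    for word in words:
        for phone in LEXICON[word]:
            states.extend(f"{phone}_{k}" for k in range(STATES_PER_PHONE))
    return states

def forced_align(loglik: List[List[float]]) -> List[int]:
    """Viterbi-align T frames to S states visited in a fixed left-to-right order.

    loglik[t][s] is log p(frame t | state s). Each frame either stays in the
    current state or advances by one; the path starts in state 0 and ends in
    state S-1. Returns the state index assigned to every frame.
    """
    T, S = len(loglik), len(loglik[0])
    if T < S:
        raise ValueError("need at least one frame per state")
    score = [[-math.inf] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = loglik[0][0]
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else -math.inf
            if stay >= move:
                score[t][s], back[t][s] = stay, s
            else:
                score[t][s], back[t][s] = move, s - 1
            score[t][s] += loglik[t][s]
    # Trace back from the last state at the last frame.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

states = expand_to_states(["hi"])   # 6 phone-states
target = [0, 0, 1, 2, 3, 3, 4, 5]   # toy "true" alignment for 8 frames
# Toy log-likelihoods: each frame strongly prefers its target state.
loglik = [[0.0 if s == target[t] else -10.0 for s in range(len(states))]
          for t in range(len(target))]
path = forced_align(loglik)
print([states[s] for s in path])
# → ['hh_0', 'hh_0', 'hh_1', 'hh_2', 'ay_0', 'ay_0', 'ay_1', 'ay_2']
```

The output is exactly the per-frame target list a DNN needs: one phone-state label per 10 ms frame, with the state boundaries chosen by the model scores rather than by the transcription.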

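  The paper also explains the DNN input: "DNNs were trained using 52-component feature vectors consisting of 12 Perceptual Linear Prediction coefficients, along with the zeroth and first, second, and third order differentials. Feature extraction was performed using HTK. Feature vectors from 13 successive frames were combined into a single DNN input vector of length 676." (My reading is that the network is trained with mini-batch SGD (mSGD). Note that 52 is the per-frame feature dimension, not the batch size: 12 PLP coefficients plus the zeroth coefficient give 13 values, and with their first, second, and third order differentials this makes 13 × 4 = 52. The PLP feature vectors of 13 consecutive frames are then concatenated to form the DNN input, 13 × 52 = 676.)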
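  The frame concatenation (often called splicing) can be sketched as below. It assumes the 13 frames form a symmetric window of 6 frames on each side of the current frame, and that frames beyond the utterance edges are filled by repeating the first or last frame; neither detail is stated in the quote, so both are assumptions. The names `splice`, `CONTEXT`, and `FEAT_DIM` are hypothetical.

```python
from typing import List

BASE_COEFFS = 12 + 1             # 12 PLP coefficients plus the zeroth coefficient
ORDERS = 4                       # static, first, second and third differentials
FEAT_DIM = BASE_COEFFS * ORDERS  # 52 values per frame
CONTEXT = 6                      # assumption: 6 frames each side -> 13 frames total
assert FEAT_DIM == 52

def splice(frames: List[List[float]], context: int = CONTEXT) -> List[List[float]]:
    """Concatenate each frame with its `context` left and right neighbours.

    Out-of-range neighbours are replaced by the nearest edge frame
    (edge replication; an assumption, the paper may pad differently).
    """
    T = len(frames)
    spliced = []
    for t in range(T):
        window: List[float] = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), T - 1)  # clamp to [0, T-1]
            window.extend(frames[idx])
        spliced.append(window)
    return spliced

frames = [[float(t)] * FEAT_DIM for t in range(20)]  # 20 dummy 52-dim frames
x = splice(frames)
print(len(x), len(x[0]))  # → 20 676
```

Each output row is one DNN input vector, and each still corresponds to a single frame, so the per-frame phone-state labels from the alignment step can be used unchanged as training targets.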

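  To sum up: in my view, what is most impressive about this paper is not the novelty of its ideas but the authors' advanced hardware and their algorithmic and programming skill. They cut nearly 17,000 lines of Kaldi code down to 800 while achieving better performance.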
