CTC Learning Papers

History

ICML-2006. Graves et al. [1] introduced the connectionist temporal classification (CTC) objective function for phone recognition. 
ICML-2014. Graves and Jaitly [2] demonstrated that character-level speech transcription can be performed by a recurrent neural network with minimal preprocessing. 
Baidu. DeepSpeech in 2014 [3] and DeepSpeech2 in 2015 [4]. 
ASRU-2015. Yajie Miao et al. [5] presented the Eesen framework. 
ASRU-2015. Google [6] extended the application of Context-Dependent (CD) LSTMs trained with CTC and sMBR loss. 
ICASSP-2016. Google [7] presented a compact large-vocabulary speech recognition system that runs efficiently on mobile devices, accurately and with low latency. 
NIPS-2016. Google [8] used whole words as acoustic units. 
2017. IBM [9] employed direct acoustics-to-word models.
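The CTC objective of [1] scores a label sequence by summing, over all frame-level alignments that collapse to it (after removing repeats and blanks), the product of per-frame symbol probabilities; this sum is computed efficiently with a forward (alpha) dynamic-programming recursion. A minimal sketch in plain Python, assuming the network's per-frame outputs are already normalized probabilities and blank has index 0 (in practice one works in log space and backpropagates through the recursion):

```python
def ctc_forward(probs, labels, blank=0):
    """CTC forward (alpha) recursion: total probability of all frame-level
    alignments of `labels` under per-frame distributions `probs`.

    probs:  T x V rows of per-frame symbol probabilities (already softmaxed).
    labels: target label indices without blanks, e.g. [1, 2].
    """
    # Extend the labels with blanks: [b, l1, b, l2, ..., b]
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    S, T = len(ext), len(probs)

    # alpha[s]: probability of all alignment prefixes ending at ext[s] at frame t
    alpha = [0.0] * S
    alpha[0] = probs[0][blank]          # start with the leading blank
    if S > 1:
        alpha[1] = probs[0][ext[1]]     # or directly with the first label

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]                # stay on the same symbol
            if s >= 1:
                a += alpha[s - 1]       # advance from the previous symbol
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]       # skip the intervening blank
            new[s] = a * probs[t][ext[s]]
        alpha = new

    # Valid alignments end in the last label or the trailing blank
    return alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)
```

For example, with two frames, a vocabulary of {blank, 'a'} and uniform 0.5/0.5 outputs, the three alignments collapsing to "a" (aa, a-, -a) each have probability 0.25, so `ctc_forward([[0.5, 0.5], [0.5, 0.5]], [1])` returns 0.75.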

Reference

[1]. A. Graves, S. Fernandez, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks," in ICML, 2006. 
[2]. A. Graves and N. Jaitly, "Towards end-to-end speech recognition with recurrent neural networks," in Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 1764–1772, 2014. 
[3]. A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates, et al., "Deep Speech: Scaling up end-to-end speech recognition," arXiv preprint arXiv:1412.5567, 2014. 
[4]. D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. Chen, M. Chrzanowski, A. Coates, G. Diamos, et al., "Deep Speech 2: End-to-end speech recognition in English and Mandarin," arXiv preprint arXiv:1512.02595, 2015. 
[5]. Y. Miao, M. Gowayyed, and F. Metze, "EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding," in ASRU, 2015. 
[6]. A. Senior, H. Sak, F. de Chaumont Quitry, T. N. Sainath, and K. Rao, "Acoustic modelling with CD-CTC-SMBR LSTM RNNs," in ASRU, 2015. 
[7]. I. McGraw, R. Prabhavalkar, R. Alvarez, M. Gonzalez Arenas, K. Rao, D. Rybach, O. Alsharif, H. Sak, A. Gruenstein, F. Beaufays, and C. Parada, "Personalized speech recognition on mobile devices," in ICASSP, 2016. 
[8]. H. Soltau, H. Liao, and H. Sak, "Neural speech recognizer: Acoustic-to-word LSTM model for large vocabulary speech recognition," arXiv preprint arXiv:1610.09975, 2016. 
[9]. K. Audhkhasi, B. Ramabhadran, G. Saon, M. Picheny, and D. Nahamoo, "Direct acoustics-to-word models for English conversational speech recognition," arXiv preprint arXiv:1703.07754, 2017.


Source: https://blog.csdn.net/xmdxcsj/article/details/70300591
