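Data Preparation for Speech Recognition
Count the entries in a directory: ls /home/…/ | wc -l
List the first ten .txt file names: ls /home/…/*.txt | head -n 10
Use the file command to check whether the encoding is UTF-8: file /home/…/*.txt
Count the lines in each txt file: wc -l /home/…/*.txt
wav.scp: [wav-id] [wav-path]
Generating absolute paths: use find to list the absolute path of every wav file: find /home/…/ios/wav -iname '*.wav' | head -n 1...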
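The wav.scp layout described above (one "wav-id wav-path" pair per line, using absolute paths) can be sketched in Python roughly as follows. This is only a minimal illustration, not the post's own script: the function name build_wav_scp is hypothetical, the wav-id is assumed to be the file name without its extension, and the case-insensitive ".wav" match is meant to mirror find -iname. Entries are sorted by wav-id, which Kaldi-style tools generally expect.

```python
from pathlib import Path


def build_wav_scp(wav_root):
    """Return wav.scp lines of the form '<wav-id> <absolute-path>'.

    Assumption (not from the original post): the wav-id is the file
    name without its extension, and ids must be unique.
    """
    root = Path(wav_root).resolve()  # make every path absolute
    entries = {}
    for path in root.rglob("*"):
        # Case-insensitive extension check, similar to `find -iname '*.wav'`
        if path.is_file() and path.suffix.lower() == ".wav":
            wav_id = path.stem
            if wav_id in entries:
                raise ValueError(f"duplicate wav-id: {wav_id}")
            entries[wav_id] = path
    # Sort by wav-id so the output order is deterministic
    return [f"{wav_id} {entries[wav_id]}" for wav_id in sorted(entries)]


def write_wav_scp(wav_root, out_path):
    """Write the wav.scp lines to out_path as UTF-8, one entry per line."""
    lines = build_wav_scp(wav_root)
    Path(out_path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)
```

Calling write_wav_scp on the wav directory and a target file name would produce the list; the returned count can be compared with the output of the find command piped into wc -l as a sanity check.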
Summary of Improvements to Speech Enhancement Methods. 1. More complex models. Mel frequency power spectrum (MFP) was used for speech enhancement in INTERSPEECH 2013: https://bio-asplab.citi.sinica.edu.tw/paper/conference/lu2013speech.pdf Convolutional maxout neural networks for speech separation: https://ieeexplore.ieee.org/
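Reading Notes: Speech Enhancement Based on Deep Neural Networks (Part 1). Paper: Xu, Yong (徐勇). (2015). Research on speech enhancement methods based on deep neural networks (doctoral dissertation). Contributions: Traditional speech enhancement estimates the noise from preceding frames, so its noise tracking and estimation strategies fail under non-stationary noise. In addition, the Gaussian assumption on the data distribution, adopted to simplify the derivation, is unreasonable and caps the performance of traditional algorithms. Supervised speech enhancement algorithms, such as those based on shallow artificial neural networks, are limited by model size and data volume and generalize poorly to mismatched noise. HMM and non-negative matrix factorization methods assume that noise and speech are independent, which limits enhancement performance. In recent years, deep-learning-based speech enhancement algorithms,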
MOCKINGJAY: UNSUPERVISED SPEECH REPRESENTATION LEARNING WITH DEEP BIDIRECTIONAL TRANSFORMER ENCODERS. Paper: MOCKINGJAY: UNSUPERVISED SPEECH REPRESENTATION LEARNING WITH DEEP BIDIRECTIONAL TRANSFORMER ENCODERS. Authors: Andy T. Liu, Shu-wen Yang, Po-Han Chi, Po-chun Hsu, Hung-yi Lee (National Taiwan University). GitHub: https://github.com/andi611/Self-Supervised-Speech-Pretr
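(IS 15) Convolutional Neural Networks for Small-footprint Keyword Spotting. Conference: INTERSPEECH 2015. Paper: Convolutional Neural Networks for Small-footprint Keyword Spotting. Authors: Tara N. Sainath, Carolina Parada. Abstract: We explore the use of convolutional neural networks (CNNs) for a small-footprint keyword spotting (KWS) task. CNNs are attractive for KWS because, in terms of parameters, they far outperform DN...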
(IS 19) On Learning Interpretable CNNs with Parametric Modulated Kernel-based Filters. Conference: INTERSPEECH 2019. Paper: On Learning Interpretable CNNs with Parametric Modulated Kernel-based Filters. Authors: Erfan Loweimi, Peter Bell, Steve Renals. Abstract: We investigate the use, within the convolutional neural network (CNN) framework, of...
(IS 19) Feature exploration for almost zero-resource ASR-free keyword spotting using a multilingual b. Conference: INTERSPEECH 2019. Paper: Feature exploration for almost zero-resource ASR-free keyword spotting using a multilingual bottleneck extractor and correspondence autoencoders. Authors: Raghav Menon, Herman Kamper, ...
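(IS 19) Automatic Detection of Prosodic Focus in American English. Conference: INTERSPEECH 2019. Paper: Automatic Detection of Prosodic Focus in American English. Authors: Sunghye Cho, Mark Liberman, Yong-cheol Lee. Abstract: Focus is usually conveyed through prosodic prominence, which highlights particular elements of a sentence for emphasis or contrast. Despite its importance in communication, it has received little attention in the field of speech recognition. This paper...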
(IS 19) wav2vec: Unsupervised Pre-training for Speech Recognition. Conference: INTERSPEECH 2019. Paper: wav2vec: Unsupervised Pre-training for Speech Recognition. Authors: Steffen Schneider, Alexei Baevski, Ronan Collobert, Michael Auli. Abstract: We explore unsupervised pre-training for speech recognition by learning representations of raw audio. On large amounts of unlabeled audio data, ...
(IS 19) Binary Speech Features for Keyword Spotting Tasks (key paper). Conference: INTERSPEECH 2019. Paper: Binary Speech Features for Keyword Spotting Tasks. Authors: Alexandre Riviello, Jean-Pierre David. Abstract: Keyword spotting is a classification task that aims to detect a specific set of spoken words. Typically, such tasks are carried out where power consumption is...
(IS 19) Low-Dimensional Bottleneck Features for On-Device Continuous Speech Recognition. Conference: INTERSPEECH 2019. Paper: Low-Dimensional Bottleneck Features for On-Device Continuous Speech Recognition. Authors: David B. Ramsay, Kevin Kilgour, Dominik Roblek, Matthew Sharifi. Abstract: Low-power digital signal processors (DSPs) typically have very...
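(IS 19) Unsupervised Raw Waveform Representation Learning for ASR. Conference: INTERSPEECH 2019. Paper: Unsupervised Raw Waveform Representation Learning for ASR. Authors: Purvi Agrawal, Sriram Ganapathy. Abstract: In this paper, we propose a deep representation learning approach that uses raw speech waveforms in an unsupervised learning paradigm. The first layer of the proposed deep model performs acoustic filtering, while the subsequent layer performs modulation filtering. Using the learned...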
Every Step You Take Counts. New York is 3 hours ahead of California, but it does not make California slow. Someone graduated at the age of 22 but waited 5 years before securing a good job! Someone became a CEO at 25 and died at 50...
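(IS 19) Prosody Usage Optimization for Children Speech Recognition with Zero Resource Children Speech. Conference: INTERSPEECH 2019. Paper: Prosody Usage Optimization for Children Speech Recognition with Zero Resource Children Speech. Authors: Chenda Li, Yanmin Qian. Abstract: Children's speech recognition remains a major challenge for automatic speech recognition. Because processing is more difficult and data collection is costly, most current...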
(IS 19) Modulation Vectors as Robust Feature Representation for ASR in Domain Mismatched Conditions. Conference: INTERSPEECH 2019. Paper: Modulation Vectors as Robust Feature Representation for ASR in Domain Mismatched Conditions. Authors: Samik Sadhu, Hynek Hermansky. Abstract: In this work, under domain mismatch between the training and test conditions of an automatic speech recognition (ASR) system, we demonstrate...
(2015) Deep Residual Learning for Image Recognition. Conference: CVPR, 2016, pp. 770–778. Paper: Deep Residual Learning for Image Recognition. Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
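(2017) Honk: A PyTorch Reimplementation of Convolutional Neural Networks for Keyword Spotting. Paper: Honk: A PyTorch Reimplementation of Convolutional Neural Networks for Keyword Spotting. Authors: Raphael Tang, Jimmy Lin. Abstract: We describe Honk, an open-source PyTorch reimplementation of the convolutional neural networks for keyword spotting that are included as examples in TensorFlow. These models are useful for recognition in speech-based interfaces (...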
(ICASSP 18) DEEP RESIDUAL LEARNING FOR SMALL-FOOTPRINT KEYWORD SPOTTING (key paper). Conference: ICASSP 2018. Paper: DEEP RESIDUAL LEARNING FOR SMALL-FOOTPRINT KEYWORD SPOTTING, link 2, GitHub. Authors: Raphael Tang, Jimmy Lin
(ICASSP 19) Streaming End-to-end Speech Recognition for Mobile Devices. Conference: ICASSP 2019. Paper: Streaming End-to-end Speech Recognition for Mobile Devices. Authors: Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yo...