[Information Technology] Pitch-Based Robust Speech Recognition Techniques

This post introduces the thesis of Juan Andres Morales Cordovilla (University of Granada, Spain), which runs to 53 pages.

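The thesis proposes and studies several techniques that use the pitch (understood here as the fundamental frequency of speech) to achieve robust automatic speech recognition (ASR) in noisy environments. Its subject is not pitch extraction itself, but the best way to use pitch for robust speech recognition. The author reviews the related literature and the state of the art, then proposes three pitch-based techniques and compares them with similar existing ones. The three proposals are: applying asymmetric windows to the autocorrelation of the noisy signal (in an attempt to obtain a spectrum that is less sensitive to noise); two estimators of the autocorrelation of the clean quasi-periodic signal (called the averaging and sifting estimators); and a noise estimation technique that can handle non-stationary noise, using pitch information to estimate the reliability masks required by a marginalization MD (Missing Data) recognizer. The thesis also discusses the performance limits of pitch-based robust ASR techniques that make minimal assumptions about the noise. To do so, it identifies the basic robustness mechanisms these techniques use to recognize voiced frames, identifies the optimum mechanisms (by means of some equivalences), and obtains the corresponding limit results experimentally by applying MD oracle masks and ideal pitch. One conclusion is that the noise estimation technique for MD recognition comes close to the limits of pitch-based robust ASR, although it would need additional information to reach the performance of MD oracle masks. Finally, the thesis comments on some possibilities for future research (some of them related to speech without pitch) that follow from the ideas it develops.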

This thesis proposes and studies different techniques that use the pitch (understood here as the fundamental frequency of speech) to carry out robust ASR (Automatic Speech Recognition) under noisy conditions. The thesis is not concerned with pitch extraction itself, but with the best way of using pitch for robust speech recognition. We also review the related literature and the state of the art regarding pitch-based techniques for robust ASR. We then propose three pitch-based techniques, which are compared with other similar ones. Our three proposals are: the application of asymmetric windows to the autocorrelation of the noisy signal, which tries to provide a spectrum less sensitive to noise; two estimators, named the averaging and sifting estimators, of the autocorrelation of the clean quasi-periodic signal; and a noise estimation technique that can deal with non-stationary noise by employing pitch information and that is used to estimate the reliability masks required by a marginalization MD (Missing Data) recognizer. Additionally, we discuss the performance limits of pitch-based techniques for robust ASR that employ minimal assumptions about the noise. To do so, we identify the basic robust mechanisms these techniques employ for recognizing voiced frames, identify the optimum mechanisms (by means of some equivalences), and obtain the corresponding limit results experimentally by applying MD oracle masks and ideal pitch. One of our conclusions is that our noise estimation technique for MD recognition is close to the limits of pitch-based techniques for robust ASR, although it would require additional information to achieve the performance obtained with MD oracle masks. Finally, we comment on some possibilities for future research (some of them related to speech without pitch) arising from the ideas developed in this thesis.
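
To make the idea of an MD oracle mask more concrete, here is a minimal Python sketch. It is not taken from the thesis: the function name `oracle_mask`, the 0 dB default threshold, and the use of plain power values per time-frequency cell are all assumptions made for illustration. It relies only on the common definition of an oracle mask, in which the clean speech and the noise are known separately and a cell is marked reliable when its local SNR exceeds a threshold.

```python
import math


def oracle_mask(clean_power, noise_power, snr_threshold_db=0.0, eps=1e-12):
    """Return a reliability mask (list of lists of bool).

    clean_power, noise_power: lists of frames, each frame a list of
    per-channel power values (hypothetical inputs for illustration).
    A cell is marked reliable (True) when its local SNR in dB is
    strictly above snr_threshold_db. The threshold is an assumption.
    """
    mask = []
    for clean_frame, noise_frame in zip(clean_power, noise_power):
        row = []
        for s, n in zip(clean_frame, noise_frame):
            # eps avoids log(0) and division by zero for silent cells.
            local_snr_db = 10.0 * math.log10((s + eps) / (n + eps))
            row.append(local_snr_db > snr_threshold_db)
        mask.append(row)
    return mask


# Toy example: 2 frames x 3 frequency channels, unit-power noise.
clean = [[4.0, 1.0, 0.1], [0.5, 2.0, 8.0]]
noise = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
mask = oracle_mask(clean, noise)
```

A marginalization recognizer would then treat the cells marked `True` as reliable evidence and marginalize over the others. In practice the oracle mask is unavailable because the noise is unknown, which is why the abstract describes estimating the mask from a pitch-based noise estimate and using the oracle mask only to obtain limit results.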

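  1. Introduction
  2. Principles of Automatic Speech Recognition
  3. Conventional and Pitch-Based Robust Techniques
  4. Proposed Techniques
  5. Equivalences and Limits of Pitch-Based Techniques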
