[Computer Science] [2018] Deep Learning-Based Single-Microphone Speech Enhancement and Separation

This post presents a PhD thesis from Aalborg University, Denmark (author: Morten Kolbæk), 255 pages in total.

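The cocktail party problem comprises the challenging task of listening to and understanding a speech signal in a complex acoustic environment, where multiple talkers and background noise signals simultaneously interfere with the speech signal of interest. Signal processing algorithms that can effectively improve the intelligibility and quality of speech signals in such complex acoustic environments are highly desirable. Especially in applications involving mobile communication devices and hearing assistive devices, improving the intelligibility and quality of noisy speech signals has been a goal of scientists and engineers for more than half a century. Thanks to the re-emergence of machine learning techniques, known today as deep learning, the challenges involved in such algorithms may now be overcome.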

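In this PhD thesis, we study deep learning-based techniques for two major sub-disciplines of the cocktail party problem: single-microphone speech enhancement and single-microphone multi-talker speech separation. Specifically, we conduct an in-depth empirical analysis of the generalization ability of deep learning-based single-microphone speech enhancement algorithms. The results show that the performance of such algorithms is closely tied to the training data, and that good generalization can be achieved with carefully designed training data. Furthermore, we propose a deep learning-based single-microphone speech separation algorithm, utterance-level Permutation Invariant Training (uPIT), and report state-of-the-art results on a speaker-independent multi-talker speech separation task. In addition, we find that uPIT performs well for joint speech separation and enhancement without explicit prior knowledge of the noise type or the number of speakers. Finally, we show that deep learning-based speech enhancement algorithms designed to minimize the classical short-time spectral amplitude mean squared error produce enhanced speech signals that are essentially optimal in terms of Short-Time Objective Intelligibility (STOI), a state-of-the-art speech intelligibility estimator. This is important because it suggests that a deep learning-based speech enhancement algorithm designed to maximize STOI cannot achieve further improvements in STOI.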
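The central idea of uPIT can be illustrated as a loss function. The sketch below is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes the network produces one magnitude-spectrogram estimate per speaker (for example, a mask multiplied by the mixture magnitude) and that the training error is the mean squared error against the clean magnitudes. The function name `upit_loss` and the `(S, T, F)` array layout are hypothetical choices for this example. What makes it "utterance-level" is that the speaker-to-output assignment (the permutation) is chosen once for the whole utterance, by picking the permutation with the lowest error, instead of independently for each frame.

```python
import itertools

import numpy as np


def upit_loss(estimates, targets):
    """Utterance-level permutation invariant MSE loss (illustrative sketch).

    estimates, targets: arrays of shape (S, T, F) holding S source
    magnitude spectrograms with T frames and F frequency bins.
    Returns (loss, perm): output s is paired with target perm[s], and
    the same pairing is used for every frame of the utterance.
    """
    num_sources = estimates.shape[0]
    best_loss, best_perm = np.inf, None
    # Try every speaker-to-output assignment (S! of them; cheap for S = 2 or 3).
    for perm in itertools.permutations(range(num_sources)):
        # Reorder the targets so that targets[perm[s]] lines up with estimates[s],
        # then average the squared error over sources, frames and bins.
        loss = np.mean((estimates - targets[list(perm)]) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Usage: the outputs come out in swapped speaker order, and uPIT recovers the swap.
rng = np.random.default_rng(0)
targets = rng.random((2, 100, 129))  # 2 speakers, 100 frames, 129 bins
estimates = targets[[1, 0]] + 0.01 * rng.standard_normal((2, 100, 129))
loss, perm = upit_loss(estimates, targets)
print(perm)  # (1, 0)
```

In an actual training loop the same computation would be written in an automatic-differentiation framework so that the minimum loss can be backpropagated; NumPy is used here only to keep the sketch self-contained.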

The cocktail party problem comprises the challenging task of listening to and understanding a speech signal in a complex acoustic environment, where multiple speakers and background noise signals simultaneously interfere with the speech signal of interest. A signal processing algorithm that can effectively increase the speech intelligibility and quality of speech signals in such complicated acoustic situations is highly desirable. Especially for applications involving mobile communication devices and hearing assistive devices, increasing the speech intelligibility and quality of noisy speech signals has been a goal for scientists and engineers for more than half a century. Due to the re-emergence of machine learning techniques, today known as deep learning, the challenges involved with such algorithms might be overcome.

In this PhD thesis, we study and develop deep learning-based techniques for two major sub-disciplines of the cocktail party problem: single-microphone speech enhancement and single-microphone multi-talker speech separation. Specifically, we conduct an in-depth empirical analysis of the generalization capability of modern deep learning-based single-microphone speech enhancement algorithms. We show that the performance of such algorithms is closely linked to the training data, and that good generalizability can be achieved with carefully designed training data. Furthermore, we propose utterance-level Permutation Invariant Training (uPIT), a deep learning-based algorithm for single-microphone speech separation, and we report state-of-the-art results on a speaker-independent multi-talker speech separation task. Additionally, we show that uPIT works well for joint speech separation and enhancement without explicit prior knowledge about the noise type or number of speakers, which, at the time of writing, is a capability only shown by uPIT.
Finally, we show that deep learning-based speech enhancement algorithms designed to minimize the classical short-time spectral amplitude mean squared error lead to enhanced speech signals which are essentially optimal in terms of Short-Time Objective Intelligibility (STOI), a state-of-the-art speech intelligibility estimator. This is important as it suggests that no additional improvements in STOI can be achieved by a deep learning-based speech enhancement algorithm that is designed to maximize STOI.
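One way to see why an MSE-trained enhancer and STOI are closely linked: STOI is built on the correlation coefficient between short-time temporal envelopes of clean and processed speech, computed per one-third octave band over segments of 30 frames (about 384 ms). For two vectors that have been mean-subtracted and scaled to unit norm, the squared Euclidean distance equals 2 − 2ρ, where ρ is their correlation coefficient, so minimizing one maximizes the other. The sketch below checks this identity numerically. It is only an illustration under simplifying assumptions (no band decomposition, no clipping step, and an MSE on normalized envelopes rather than on raw spectral amplitudes), not the analysis carried out in the thesis; the helper names are hypothetical.

```python
import numpy as np


def normalize(v):
    """Subtract the mean and scale to unit Euclidean norm."""
    v = v - v.mean()
    return v / np.linalg.norm(v)


def envelope_correlation(x, y):
    """Pearson correlation coefficient between two envelope segments."""
    return float(np.dot(normalize(x), normalize(y)))


def normalized_sq_error(x, y):
    """Squared Euclidean distance between the normalized segments."""
    d = normalize(x) - normalize(y)
    return float(np.dot(d, d))


# Usage: one hypothetical 30-frame envelope segment in a single band.
rng = np.random.default_rng(1)
clean = rng.random(30)
enhanced = clean + 0.1 * rng.standard_normal(30)
rho = envelope_correlation(clean, enhanced)
err = normalized_sq_error(clean, enhanced)
print(np.isclose(err, 2 - 2 * rho))  # True
```

Because the identity holds exactly for normalized vectors, a lower normalized squared error always means a higher envelope correlation; the thesis's result concerns the less direct case of unnormalized short-time spectral amplitudes.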

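  1. Speech Enhancement and Separation
  2. Deep Learning
  3. Deep Learning for Enhancement and Separation
  4. Scientific Contributions
  5. Directions for Future Research
    Appendix A: Speech Intelligibility Potential of General and Specialized Deep Neural Network Based Speech Enhancement Systems
    Appendix B: Speech Enhancement Using Long Short-Term Memory Based Recurrent Neural Networks for Noise Robust Speaker Verification
    Appendix C: Permutation Invariant Training of Deep Models for Speaker-Independent Multi-Talker Speech Separation
    Appendix D: Multi-Talker Speech Separation Based on Deep Recurrent Neural Networks
    Appendix E: Joint Separation and Denoising of Noisy Multi-Talker Speech Using Recurrent Neural Networks and Permutation Invariant Training
    Appendix F: Monaural Speech Enhancement Using Deep Neural Networks Based on a Short-Time Objective Intelligibility Measure
    Appendix G: The Relationship Between Short-Time Objective Intelligibility and Short-Time Spectral Amplitude Mean Squared Error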
