Study Notes on Classic Echo Cancellation Algorithms

西岸行者

Last modified 2024-04-19 14:29:07

Tags: algorithm study

First published 2023-03-26 19:51:31

Article link: https://blog.csdn.net/golfbears/article/details/124076677

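This article is a set of study notes on echo cancellation. It covers the concept of ERLE (Echo Return Loss Enhancement), the difference between linear and nonlinear echo, the effects of the enclosure and of echo delay, time-domain and frequency-domain adaptive filtering, the dual-coupled filter, and double-talk detection. It also looks at the implementations in open-source projects such as WebRTC_AEC, AECM and AEC3, and mentions the use of the Kalman filter in echo cancellation.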

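Preface

Why do I keep coming back to echo cancellation, time and again? The classic line "Because it's there." describes it best.

The first people to reach the summit of Mount Everest (8848.43 m) may not have been the New Zealand mountaineer Edmund Hillary and the Nepalese Sherpa Tenzing Norgay, whose ascent on 29 May 1953 is the one on record, but the British mountaineer George Mallory, who died on 8 June 1924 during his second attempt on the summit. Moreover, when a reporter asked him why he wanted to climb Mount Everest, Mallory answered "Because it's there!", a reply that has earned lasting admiration. (From Baidu Baike)

Echo cancellation, or echo suppression, is the Mount Everest of classic speech processing. Every attempt at the climb feels different and yields something new. These notes record some scattered background knowledge I have collected, in an attempt to build a body of knowledge, so that you are not at a loss when facing all kinds of echo and can work in a targeted way when you need to look up the related terms or methods. These are survey-style notes, so the concepts are fairly scattered.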

ERLE

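ERL is a term that can be translated as echo return loss. So what does the added "enhancement" mean? With the help of a search engine I found the article "10.ERLE,PESQ 回声消除评价指标" (ERLE and PESQ, evaluation metrics for echo cancellation), which gives a formula. At first I thought this formula would be more appropriately called ERL. Later I read the English article echo-cancellation-part-1-the-basics-and-acoustic-echo-cancellation and finally understood the point: the same formula applies to both quantities, and the difference lies in which node of the echo path it is evaluated at.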
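For reference, the two quantities are commonly written as below. The symbols are labels chosen for this note, not taken from the linked articles: x(n) is the far-end (loudspeaker) signal, d(n) is the echo picked up by the microphone, and e(n) is the residual after the canceller. Both are the same power-ratio formula, evaluated across the echo path in one case and across the canceller in the other.

```latex
\mathrm{ERL} = 10\log_{10}\frac{E\left[x^{2}(n)\right]}{E\left[d^{2}(n)\right]}\ \text{dB},
\qquad
\mathrm{ERLE} = 10\log_{10}\frac{E\left[d^{2}(n)\right]}{E\left[e^{2}(n)\right]}\ \text{dB}
```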

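As that article says, the ITU's specifications require that linear echo with an ERL greater than 6 dB can always be suppressed. Another passage, on estimating ERL and ERLE, is pasted here directly.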

Estimating Echo Return Loss and Echo Return Loss Enhancement

Acoustic echo cancellation (AEC) is a signal processing technique that is used to achieve echo-free full-duplex communication in a telecommunications system that has acoustic coupling between the loudspeaker and microphone. The difficulty with AEC over line echo cancellation (LEC) is the variability not only in the echo path, but also in the implementation. For LEC systems, the coupling resulting from the hybrid is relatively steady between implementations. Whereas for AEC systems, the coupling between the loudspeaker and microphone can vary significantly depending on the design of the loudspeaker and microphone enclosure as well as the acoustics of the room in which the device is deployed. Therefore, in order to achieve a ubiquitous solution for an acoustic system, intelligent control of the adaptive filter is required for the echo canceller as well as the post-filter.

In "Variable Stepsize and Regularization Parameters for NLMS", it was shown that the performance of the echo canceller can be improved with variable step-size control. Optimum control of the step-size parameter is based on the convergence state of the canceller. In "Post Filtering for Residual Echo Control" it was concluded that a post-filter can be designed to reduce the residual echo from the linear adaptive filter. Optimum control of the post-filter is based on the estimate of this residual, which in turn is based on the convergence state of the canceller. In addition, systems which employ the two-path method require an estimate of the convergence of the foreground and background filters to decide which filter set is in the most beneficial state. From the three examples above, it is clear that the ability to obtain a quick and accurate estimate of convergence of the acoustic echo canceller is crucial to the performance of the entire system.

To obtain an estimate of convergence or the Echo Return Loss Enhancement (ERLE), one must first estimate the coupling factor or the Echo Return Loss (ERL) of the loudspeaker-microphone enclosure. An estimate of the ERL is required to determine how much attenuation can be attributed to the echo path and how much can be attributed to the echo canceller. The coupling factor determines the attenuation or possible gain in the path.

There are two main approaches to estimating the coupling factor of an echo canceller. The first method is amplitude based while the second is cross-spectrum based. The amplitude based method to estimate ERL is the average spectral energy of the near-end signal over the average spectral energy of the far-end signal. This approach should only be updated during periods of known far-end signal energy and should not be updated during periods of double-talk. In the cross-spectrum based method, the far-end and near-end spectrum signals are multiplied and summed over a long period of frames. Then it is normalized by the far-end signal energy. This method is unaffected by double-talk of the near-end speaker and far-end speaker as long as they are uncorrelated. The downside to this method is the echo path changes are not followed accurately due to the long averaging period. However using a combination of the two methods will allow for quick and accurate estimation of the ERL, and hence, proper control of the entire echo cancellation system.
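The two coupling-factor estimators described in the quoted passage can be sketched roughly as follows. This is a minimal illustration, not the quoted article's implementation: the function names, the Hann-windowed non-overlapping framing and the frame length are all assumptions made here, and a real system would update the estimates recursively rather than over a whole recording.

```python
import numpy as np

def frame_spectra(x, frame_len=256):
    """Split x into non-overlapping Hann-windowed frames and return their rFFT spectra."""
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.fft.rfft(frames * np.hanning(frame_len), axis=1)

def coupling_amplitude(far, near, frame_len=256):
    """Amplitude-based estimate: near-end spectral energy over far-end spectral energy.
    Only meaningful on far-end single talk (no near-end speech)."""
    X, D = frame_spectra(far, frame_len), frame_spectra(near, frame_len)
    return np.sum(np.abs(D) ** 2) / (np.sum(np.abs(X) ** 2) + 1e-12)

def coupling_cross_spectrum(far, near, frame_len=256):
    """Cross-spectrum estimate: far/near spectra multiplied and summed over frames,
    then normalised by the far-end energy."""
    X, D = frame_spectra(far, frame_len), frame_spectra(near, frame_len)
    cross = np.sum(np.conj(X) * D, axis=0)      # per-bin cross-spectrum, summed over frames
    power = np.sum(np.abs(X) ** 2, axis=0)      # per-bin far-end energy
    h_est = cross / (power + 1e-12)             # per-bin echo-path estimate
    # Collapse to one power coupling factor, weighting each bin by far-end energy.
    return np.sum(np.abs(h_est) ** 2 * power) / (np.sum(power) + 1e-12)

def to_erl_db(coupling):
    """Convert a power coupling factor to ERL in dB (positive means attenuation)."""
    return -10.0 * np.log10(coupling)
```

With uncorrelated near-end speech mixed into the microphone signal, the amplitude-based estimate reports a much smaller ERL than the true one, while the cross-spectrum estimate stays close to it, which matches the trade-off the passage describes.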

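Linear Echo and Nonlinear Echo

Linear echo and nonlinear echo are two concepts that come up constantly in the echo cancellation field. The article "非线性声学回声消除技术" (Nonlinear acoustic echo cancellation techniques) gives a very clear overview of how the two relate and at which stage each is introduced. Beyond that, you can also reason from effect to cause: whatever the adaptive filter can cancel can be regarded as linear echo, and the part that remains is generally called nonlinear echo. A lazy definition, admittedly...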
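The "effect to cause" view can be tried numerically. The sketch below fits the best linear FIR filter (by least squares) to an echo and reports the resulting ERLE. The function name and the use of `tanh` as a stand-in for loudspeaker saturation are assumptions made for illustration only.

```python
import numpy as np

def best_linear_erle(far, mic, taps=16):
    """ERLE (dB) achieved by the least-squares optimal FIR filter of the given length."""
    # Column i holds the far-end signal delayed by i samples (zero-padded at the start).
    X = np.column_stack(
        [np.concatenate([np.zeros(i), far[: len(far) - i]]) for i in range(taps)]
    )
    w, *_ = np.linalg.lstsq(X, mic, rcond=None)
    residual = mic - X @ w
    # The small constant only keeps the ratio finite when the fit is exact.
    return 10.0 * np.log10(np.sum(mic ** 2) / (np.sum(residual ** 2) + 1e-20))
```

For an echo produced by a pure FIR path, the residual is essentially zero. If the far-end signal is first passed through a saturating nonlinearity, even the optimal linear filter leaves a clearly audible residual, and that residual is what the lazy definition calls nonlinear echo.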

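Enclosure Effects and Echo Delay

This topic is more about practice, but good experience and good design definitely help echo suppression. There is still a lot for me to accumulate here.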

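Time-Domain Adaptive Filtering

I took some notes on this in "ANC 与 adaptive filter" (ANC and adaptive filter). I have not yet studied the more complex algorithms.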
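As a baseline for the time-domain approach, here is a minimal NLMS echo canceller sketch. It is a textbook-style illustration rather than code from the linked notes; the function name, tap count and step size are assumptions.

```python
import numpy as np

def nlms_echo_canceller(far, mic, taps=64, mu=0.5, eps=1e-6):
    """Sample-by-sample NLMS: returns the error (echo-cancelled) signal and the final weights."""
    w = np.zeros(taps)            # estimated echo path
    buf = np.zeros(taps)          # most recent far-end samples, newest first
    err = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far[n]
        e = mic[n] - w @ buf                      # a priori error
        w += mu * e * buf / (buf @ buf + eps)     # update normalised by input energy
        err[n] = e
    return err, w
```

The normalisation by the input energy is what makes the step size insensitive to the far-end level. The cost is one length-`taps` dot product and update per sample, which is the motivation for moving to the frequency domain when the echo path is long.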

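Frequency-Domain Adaptive Filtering

The algorithm in WebRTC's AEC, described in "WebRtc AEC核心算法之一：频域自适应滤波" (One of the core algorithms of WebRtc AEC: frequency-domain adaptive filtering), is said to be the most computationally efficient scheme. Perhaps my knowledge is limited, but I really have not come across a better one.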
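To make the idea concrete, below is a sketch of a single-partition, constrained overlap-save frequency-domain LMS filter. It is not WebRTC's code: WebRTC partitions the filter into several blocks, whereas this sketch assumes the filter length equals the block length, and the names, step size and smoothing constant are assumptions.

```python
import numpy as np

def fdaf_echo_canceller(far, mic, block=64, mu=0.5, lam=0.9, eps=1e-8):
    """Constrained overlap-save frequency-domain LMS with filter length == block length."""
    N = block
    W = np.zeros(2 * N, dtype=complex)     # frequency-domain weights (FFT of [w, zeros])
    power = None                           # smoothed per-bin far-end power
    x_old = np.zeros(N)
    n_blocks = len(mic) // N
    err = np.zeros(n_blocks * N)
    for b in range(n_blocks):
        x_new = far[b * N:(b + 1) * N]
        X = np.fft.fft(np.concatenate([x_old, x_new]))
        y = np.fft.ifft(X * W).real[N:]    # overlap-save: only the last N samples are valid
        e = mic[b * N:(b + 1) * N] - y
        err[b * N:(b + 1) * N] = e
        E = np.fft.fft(np.concatenate([np.zeros(N), e]))
        inst = np.abs(X) ** 2
        # Start from a flat power estimate, then smooth recursively per bin.
        power = np.full(2 * N, inst.mean()) if power is None else lam * power + (1 - lam) * inst
        grad = np.fft.ifft(np.conj(X) * E / (power + eps)).real
        grad[N:] = 0.0                     # gradient constraint: keep only the first N taps
        W += mu * np.fft.fft(grad)
        x_old = x_new
    return err, np.fft.ifft(W).real[:N]
```

Each block costs a handful of FFTs of length 2N instead of N dot products of length N, and the per-bin normalisation gives every frequency its own effective step size.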

Dual-Coupled Filter

The article "非线性声学回声消除技术" mentions this method. speex also seems to use a two-filter scheme, but not in the dual-coupled form.
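For orientation, the generic two-path (foreground/background) idea mentioned in the quoted ERLE passage can be sketched as below. This is neither the dual-coupled method of the article above nor speex's actual code; the function name, the block-wise energy comparison and the 3 dB copy margin are assumptions chosen for illustration.

```python
import numpy as np

def two_path_canceller(far, mic, taps=32, mu=0.5, block=64, margin=0.5, eps=1e-6):
    """Background NLMS filter adapts all the time; the foreground filter produces the
    output and is only overwritten when the background clearly performs better."""
    w_bg = np.zeros(taps)
    w_fg = np.zeros(taps)
    buf = np.zeros(taps)
    out = np.zeros(len(mic))
    fg_energy = bg_energy = 0.0
    copies = 0
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far[n]
        e_fg = mic[n] - w_fg @ buf
        e_bg = mic[n] - w_bg @ buf
        out[n] = e_fg
        w_bg += mu * e_bg * buf / (buf @ buf + eps)
        fg_energy += e_fg ** 2
        bg_energy += e_bg ** 2
        if (n + 1) % block == 0:
            # Copy background -> foreground only if its block error energy is lower by the margin.
            if bg_energy < margin * fg_energy:
                w_fg = w_bg.copy()
                copies += 1
            fg_energy = bg_energy = 0.0
    return out, w_fg, w_bg, copies
```

When near-end speech perturbs the background filter, its error no longer beats the foreground's, so the foreground keeps its converged coefficients. The same comparison can also serve as a crude divergence indicator.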

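Double-Talk Detection

Double-talk scenarios are the hard part of echo cancellation: handled well they are a highlight, handled badly they are a disaster. Double Talk Detection (DTD) is the first hurdle in handling them. Using double talk as the criterion, the echo cancellation scenarios can be divided into the following four quadrants: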

Scenario	Near noise	Near talk
Far noise	Double noise: no filtering	FN-NT: no filtering
Far talk	FT-NN: adaptive filter updated	Double talk: filter applied with adaptive coefficients frozen

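The article "Double-Talk Detection in Echo Cancellation" summarizes the two classic DTD approaches: energy-comparison methods, with the Geigel algorithm as the example, and vector-comparison methods based on cross-correlation. There is also a third approach that uses the dual-coupled filter: it watches for divergence of the tracking filter and infers from that that double talk has occurred.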
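The energy-comparison idea behind the Geigel algorithm can be sketched in a few lines: double talk is declared when the microphone sample is larger than a fixed fraction of the recent far-end peak. The threshold of 0.5 corresponds to assuming at least 6 dB of ERL. The function name, window length and hangover handling are assumptions for this illustration.

```python
import numpy as np

def geigel_dtd(far, mic, window=128, threshold=0.5, hangover=0):
    """Return a boolean array that is True where double talk is declared."""
    flags = np.zeros(len(mic), dtype=bool)
    hold = 0
    for n in range(len(mic)):
        start = max(0, n - window + 1)
        far_peak = np.max(np.abs(far[start:n + 1]))   # peak of the last `window` far-end samples
        if abs(mic[n]) > threshold * far_peak:
            hold = hangover + 1                       # (re)start the hold period
        flags[n] = hold > 0
        hold = max(hold - 1, 0)
    return flags
```

In terms of the quadrant table, a True flag is the cue to freeze the adaptive coefficients. Note that this sketch also raises the flag when the far end is silent and only the near end is active, so it does not by itself tell the two right-hand quadrants apart.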

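Echo Cancellation in the Open-Source World

The famous webrtc and speex projects make the core algorithms easily accessible to many practitioners, but they are still not very easy to understand. Below I draw some block diagrams of the flows used in aecm, to help with quick understanding when reading the code.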

AECM

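[Figure: call graph of the main functions]

[Figure: buffer management diagram]

[Figure: block diagram of the aecm energy calculation]

Many articles online dissect this algorithm. I recommend two excellent ones, "WEBRTC-AECM算法浅析" (A brief analysis of the WEBRTC-AECM algorithm) and "LearningWebRTC: AECM". However, after comparative experiments and analysis, this algorithm may have some problems, and there is room for improvement.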

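WebRTC_AEC and speex

These two algorithms are probably the most compared and most ported tools of their kind on the internet. I discuss them together because they share many technical characteristics. WebRTC_AEC comes from "On the implementation of a partitioned block frequency domain adaptive filter (PBFDAF) for long acoustic echo cancellation" by Jose M. Paez Borrallo et al., PBFDAF for short. speex comes from "Multidelay block frequency domain adaptive filter" by J.-S. Soo and K.K. Pang. Both clearly belong to the block frequency domain adaptive filter family. By my own non-authoritative survey, their roots go back to two papers: in 1978 Dentino et al. proposed the frequency-domain filter in "Adaptive filtering in the frequency domain", and shortly afterwards, in 1980, Ferrara proposed the frequency-domain LMS method in "Fast implementations of LMS adaptive filters", an FFT-based fast implementation of LMS adaptation toward the Wiener solution.

A few years ago I forced my way through webrtc's aec and took notes, but I did not have the courage to read through speex. The article "【论文笔记之 MDF】Multidelay Block Frequency Domain Adaptive Filter" (paper notes on MDF) goes into real depth, and studying it together with the paper and the code is bound to be rewarding. "Speex 一个双声道回声消除的小demo" (a small Speex demo of two-channel echo cancellation) is a hands-on tutorial as well. Read alongside "[简话语音识别] 语音前端信号处理——回声消除算法" (speech front-end signal processing: echo cancellation algorithms), it should offer a faster route for those who want to get into practice.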