Demystifying Echo Cancellation: Part 2
Building an echo canceller is only half the challenge. Mishaps can also occur during the implementation stage. Here's a look at some of the implementation challenges designers will face and some solutions to the problems.
By Alexey Frunze, Spirit Corp.
A 09, 2003
In Part 1 of this article, we examined the basic elements that cause echo in a networking system design. We also described some of the basic echo cancellation architectures available to designers. Now, in Part 2, we'll describe the most typical design and integration mistakes that lead to failures in achieving echo cancellation. In other words, the question is often not how to achieve good echo cancellation, but whether any echo cancellation is achieved at all. We'll also look at some testing issues designers should consider when implementing an echo canceller.
Nonlinear Distortions in Hardware
The first thing that can make an echo canceller perform very poorly is nonlinear distortion in the echo path of your device's hardware. Echo cancellers perform poorly, or don't work at all, in systems where the net nonlinear distortion in the echo path is higher than -16 dB (a typical value). Thus, the smaller the distortion, the better the system will perform.
Nonlinear distortions exist everywhere. Some nonlinearity is inherent in hybrids, microphones, speakers, amplifiers, and codecs. Avoid designs built from highly nonlinear parts, where the net nonlinear distortion of the device in the echo path becomes prohibitively high. If a preliminary design already exists in the form of a working device, it is good practice to measure the level of nonlinear distortion in it. The sooner such measurements are done, the better.
Usually, in systems that are not strictly digital (i.e. those that use analog circuitry and carry analog signals anywhere inside), there is an analog-to-digital and a digital-to-analog converter (ADC and DAC) so that the echo canceller implemented on a DSP can work with samples. The path between the DAC output and the ADC input is made up of analog circuitry, which is subject to nonlinearities.
If we're talking about using an AEC in a hands-free system, the analog circuitry in question includes the microphone, the microphone amplifier, the ADC/DAC itself, the loudspeaker amplifier, and the loudspeaker. This entire echo path must be tested. An easy test is to feed a test signal as samples to the DAC so that the speaker reproduces it, while recording samples from the ADC, i.e. recording what the microphone picks up. The recording should then be analyzed.
To test the recording for nonlinear distortions, designers can use a test signal consisting of two sinusoidal signals (for example, tones of 300 and 1800 Hz). The recording must contain these two frequencies, but there will always be other frequencies in the spectrum of the recorded signal because the tones undergo nonlinear distortion in the aforementioned hardware. If you run the recording through spectrum analysis software, you will see all of these frequencies. Due to the distortions, there will be harmonics of each tone, i.e. 2*300=600 Hz and 2*1800=3600 Hz, and there will also be combinations of the two frequencies, i.e. their sum and difference: 300+1800=2100 Hz and 1800-300=1500 Hz.
These are the products of second-order nonlinearity. If the nonlinearity is of a higher order, which is typically the case in real-world designs, there will be many more frequencies in the recording. The main point is that the amplitudes of the original frequencies (i.e. 300 Hz and 1800 Hz) must exceed the amplitudes of all other frequencies by at least 16 dB (or, equivalently, the absolute ratio of the amplitudes must be greater than 10^(16/20) ≈ 6.31).
The test presented above is very simple. While it can reveal certain nonlinearities, a more thorough method should be used to measure the nonlinear distortions. See ITU-T recommendation O.42 for more information.
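As a rough illustration of how such a recording could be checked, the sketch below builds the 300/1800 Hz test signal, passes it through a made-up second-order nonlinearity standing in for the hardware, and reports by how many dB the weaker tone exceeds the strongest other spectral component. The 8 kHz sampling rate, the 0.1-second length, the function names, and the simulated echo path are all assumptions made for the example; in a real test the samples would come from the ADC.

```python
import math

FS = 8000                # sampling rate in Hz (assumed for this example)
N = 800                  # 0.1 s of audio, so DFT bins are exactly 10 Hz apart
TONES_HZ = (300, 1800)   # the two test tones suggested in the text

def two_tone(amp=0.4, n=N, fs=FS):
    """Test signal: sum of two equal-amplitude sinusoids."""
    return [amp * sum(math.sin(2 * math.pi * f * i / fs) for f in TONES_HZ)
            for i in range(n)]

def bin_amplitude(samples, k):
    """Amplitude of DFT bin k (a single-bin DFT, no FFT library needed)."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n

def distortion_margin_db(recording, fs=FS):
    """dB by which the weaker test tone exceeds the strongest other component.

    The DC bin is skipped; only bins 1 .. n/2 - 1 are scanned.
    """
    n = len(recording)
    tone_bins = {round(f * n / fs) for f in TONES_HZ}
    weakest_tone = min(bin_amplitude(recording, k) for k in tone_bins)
    strongest_other = max(bin_amplitude(recording, k)
                          for k in range(1, n // 2) if k not in tone_bins)
    return 20.0 * math.log10(weakest_tone / max(strongest_other, 1e-12))

def simulated_echo_path(samples, k2):
    """Hypothetical hardware model: linear term plus a second-order term."""
    return [s + k2 * s * s for s in samples]

if __name__ == "__main__":
    for k2 in (0.05, 0.5):
        margin = distortion_margin_db(simulated_echo_path(two_tone(), k2))
        verdict = "OK" if margin >= 16.0 else "too much distortion"
        print(f"k2={k2}: margin {margin:.1f} dB -> {verdict}")
```

Because the block length is chosen so that every tone and distortion product falls exactly on a DFT bin, no window function is needed here. A recording of arbitrary length would need windowing and a proper spectrum analyzer.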
You should perform a similar test to make sure your hardware is OK in terms of nonlinear distortions. If the testing shows that nonlinear distortions are a problem, designers must find and fix the problem before thinking of any AEC integration. To find whether or not the problem is in the ADC/DAC, designers should use the converter's analog loop-back mode when doing this test. Beware, the problem may have to do with incorrect ADC/DAC programming.
Nonlinear distortions can also result from limiting (clipping) of the signal in the ADC/DAC or elsewhere, for example in the amplifiers. If, on inspecting the waveform recorded from the ADC, designers see that many of the sample values reach their minimum and maximum values, they must attenuate the signal somewhere so that the clipping doesn't occur.
Note that there can be interference between the digital and analog parts of the device. The interference may take the form of additive noise superimposed on Vcc if the power supply is overloaded or the power supply decoupling is poor. The decoupling capacitors must be placed as close to the power supply pins of the chips as possible.
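Inspecting a waveform for clipping can be automated with a few lines. The sketch below assumes signed 16-bit linear PCM samples and simply reports what fraction of them sit at full scale. The function name and the optional guard margin are illustrative choices, not part of any particular codec driver.

```python
def clipped_fraction(samples, bits=16, guard=0):
    """Fraction of samples at (or within `guard` counts of) full scale.

    Assumes signed linear PCM with the given word length.
    """
    if not samples:
        return 0.0
    top = (1 << (bits - 1)) - 1 - guard       # 32767 for 16-bit samples
    bottom = -(1 << (bits - 1)) + guard       # -32768 for 16-bit samples
    hits = sum(1 for s in samples if s >= top or s <= bottom)
    return hits / len(samples)

if __name__ == "__main__":
    recording = [0, 1200, 32767, 32767, -32768, -500, 32767, 100]
    print(f"{100 * clipped_fraction(recording):.0f}% of samples are at full scale")
```

A handful of full-scale samples in a long recording may be harmless, but runs of them on loud passages suggest the gain should be reduced somewhere in the chain.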
Microphone, Speaker Placement
As already mentioned, in hands-free phones acoustic echo exists between the loudspeaker and the microphone inside the case. The echo can be transmitted both by the air inside the case and by the case itself, in the form of mechanical waves (vibrations) in the case parts. To reduce this form of echo, there should be good acoustic decoupling between the loudspeaker and the microphone.
To solve this problem, the microphone should be acoustically and mechanically insulated with a soft material that absorbs the case vibrations and the sound coming out of the speaker. The microphone should not point at the speaker. A directional microphone can be useful because it can be pointed away from the speaker.
Another important consideration is the external echo path (i.e. the path outside the phone's case). The external echo path is actually a number of different echo paths, because objects in the room reflect the speaker's sound back to the microphone. These echo paths also vary with time as objects or people move in the room. Changes in the echo path impulse response increase the residual echo error signal. This forces the AEC to start adapting to the new impulse response, and it can even diverge if the changes are fast or abrupt. In the installed phone, the speaker and microphone should not point along a path that is subject to fast changes. It is usually better to direct the speaker and microphone towards the ceiling, since this echo path rarely changes.
I/O Signal Requirements
Besides the main hardware questions such as nonlinear distortion, there are also certain input/output (I/O) requirements on the signals that are fed as samples to the echo cancellers.
The first requirement is that signal delays in software be as short as possible. In general, there should be no signal processing between the codec and the echo canceller, and samples must not be accumulated without good reason. Excessive buffering increases the effective signal delay in the echo path, so the filter coefficients are used inefficiently: some of them have to cover the additional delay, yet they will be zeroes. The software delay of the reference signal, y(i), must also be smaller than or equal to the software delay of the signal with echo, x(i)+r(i). If it is not, the echo canceller cannot converge and cancel the echo, because it does not have the reference signal that is to be subtracted.
Signal delays in software need attention in another respect as well. All of the delays must stay constant throughout the entire session in which echo cancellation is wanted. Changing the delays during a phone call will cause the echo canceller to diverge and stop canceling the echo until it converges again.
Other problems with signals in software are also possible. Echo cancellers usually process linear pulse code modulation (PCM) samples, while the signals in memory or received from the codecs may be compressed to A-law or μ-law samples. Make sure the echo canceller receives samples in the format it was designed for. And don't artificially clip the samples on the way between the echo canceller and the codec. Either mistake only contributes to the undesired nonlinear distortions.
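The cost of extra buffering is easy to put into numbers. The sketch below assumes an 8 kHz sampling rate and a hypothetical 512-tap adaptive filter, and shows how many coefficients a given software delay consumes and how much echo tail the filter can still cover. The function names and the figures are illustrative only.

```python
def taps_consumed_by_delay(delay_ms, fs_hz=8000):
    """Filter coefficients that only cover pure delay (they stay near zero)."""
    return round(delay_ms * fs_hz / 1000)

def effective_tail_ms(filter_taps, software_delay_ms, fs_hz=8000):
    """Echo tail length still covered once the delay has used up its share."""
    remaining = max(filter_taps - taps_consumed_by_delay(software_delay_ms, fs_hz), 0)
    return remaining * 1000 / fs_hz

if __name__ == "__main__":
    taps = 512                       # hypothetical filter length: 64 ms at 8 kHz
    for delay_ms in (0, 10, 20, 40):
        print(f"{delay_ms:2d} ms of buffering -> "
              f"{taps_consumed_by_delay(delay_ms):3d} taps wasted, "
              f"{effective_tail_ms(taps, delay_ms):.0f} ms of tail left")
```

With these assumed numbers, 20 ms of avoidable buffering uses up 160 of the 512 coefficients, leaving 44 ms of the nominal 64 ms tail coverage.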
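For illustration, here is a minimal sketch of expanding G.711 μ-law bytes to linear PCM before they reach an echo canceller that expects linear samples. It follows the widely used reference-style expansion (bias of 0x84, output spanning roughly ±32124). The function names are made up for the example, and the A-law case, which uses a different table, is not shown.

```python
BIAS = 0x84  # bias used by the usual G.711 mu-law expansion

def ulaw_to_linear(code):
    """Expand one 8-bit mu-law code to a signed linear PCM value."""
    code = ~code & 0xFF                       # mu-law bytes are stored inverted
    magnitude = ((code & 0x0F) << 3) + BIAS   # mantissa plus bias
    magnitude <<= (code & 0x70) >> 4          # apply the 3-bit exponent
    return BIAS - magnitude if code & 0x80 else magnitude - BIAS

def decode_ulaw_block(codes):
    """Expand a block of mu-law bytes before handing it to the echo canceller."""
    return [ulaw_to_linear(c) for c in codes]

if __name__ == "__main__":
    print(decode_ulaw_block(bytes([0xFF, 0x80, 0x00])))
```

Feeding the compressed bytes directly to a canceller designed for linear samples would present it with a strongly nonlinear echo path, which is exactly the situation the hardware tests above are meant to rule out.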
Incorrect Codec Synchronization
The last, but not least, problem with echo canceller integration can again be due to hardware or software design mistakes. What is the problem of incorrect codec synchronization? It is easy to understand and relatively easy to solve, provided we know the right solution.
Suppose we have a device with several different signal sources, each clocked at a different rate, and the signals from one must pass through the device to the other. Where can this happen? It can happen in hands-free phones that have a pair of codecs. One of the codecs interfaces to the phone line and the other interfaces to the loudspeaker and microphone.
The problem is that if the two codecs are clocked at different rates (say both sample at about 8 kHz, but the rates are not exactly equal because they're clocked from different quartz oscillators), we can't just take each sample from one codec, process it, and pass it to the other codec. Eventually, the sampling rate difference will lead either to sample accumulation somewhere in the sample buffers or to sample depletion, i.e. there will be nothing to take out of a buffer when a sample is needed.
The first solution is to choose the codecs so that they're clocked from the same clock source, the same quartz oscillator. This is the best solution, and with a little provision at the hardware design stage the problem can be eliminated completely. Even if the specifics of the application do not allow the same codecs to be used in both places, it is still better to have the same clock source for both, because this makes it possible to use sample rate conversion with a constant upsampling and downsampling ratio, and there will be no synchronization issues.
But if codec synchronization via the same clock source cannot be achieved (as is the case with ISDN phones, where the data rate is not related to the codec clock in any way), a different solution is needed. Engineers are often tempted to solve this problem in one of the following ways:
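To get a feel for how quickly the buffers run into trouble, the sketch below estimates the drift between two nominally equal sample clocks. The 100 ppm mismatch and the 80-sample buffer headroom are assumed figures chosen only for illustration; real oscillator tolerances should be taken from the parts' datasheets.

```python
import math

def drift_samples_per_second(nominal_hz, ppm_mismatch):
    """Samples gained or lost per second between two nominally equal clocks."""
    return nominal_hz * ppm_mismatch * 1e-6

def seconds_until_buffer_fails(nominal_hz, ppm_mismatch, headroom_samples):
    """Time until a buffer with the given headroom overflows or runs empty."""
    drift = abs(drift_samples_per_second(nominal_hz, ppm_mismatch))
    return math.inf if drift == 0 else headroom_samples / drift

if __name__ == "__main__":
    # Assumed figures: 8 kHz codecs, 100 ppm apart, 10 ms (80 samples) of slack.
    print(f"drift: {drift_samples_per_second(8000, 100):.2f} samples/s")
    print(f"buffer trouble after about "
          f"{seconds_until_buffer_fails(8000, 100, 80):.0f} s")
```

Under these assumptions a sample is gained or lost a little less than once per second, so the buffer slack is used up within a couple of minutes of a call.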
- Continuously tuning the codec's sampling rate
- Dropping samples received from the codec and repeating samples to be sent when there's nothing to send
But experience and logical reasoning prove these solutions wrong, as they fail to solve the problem they're supposed to solve. The first is not viable because it introduces additional nonlinear distortions in the echo path and also effectively changes the echo path delay. The second is not viable because it abruptly changes the echo path delay. Changing the echo path reduces the quality of echo cancellation and can even force the echo canceller to diverge if the residual echo error becomes too big.
The worst case is the double-talk situation, i.e. when both the near-end and far-end talker signals are present. In such situations the echo canceller usually doesn't adapt the filter coefficients, or adapts them very slowly. If the echo path remains constant during double-talk, the echo canceller performs well, but if the echo path changes, the echo canceller will not be able to adapt to the change and it will diverge. So, if we want the echo cancellers to operate, we can't use either of these non-solutions.
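The effect of an abrupt delay change can be reproduced with a toy simulation. The sketch below uses a textbook NLMS adaptive filter as a stand-in for an echo canceller (it is not the algorithm of any particular product), a noise-free echo path that is just a delayed, attenuated copy of the reference, and a single dropped sample modeled as the delay growing by one. All names and parameter values are assumptions for the example.

```python
import random

def nlms_residual(reference, mic, taps=16, mu=0.5, eps=1e-9):
    """Run a basic NLMS filter and return the residual echo for every sample."""
    w = [0.0] * taps        # adaptive filter coefficients
    hist = [0.0] * taps     # hist[k] holds reference[n - k]
    residual = []
    for y, d in zip(reference, mic):
        hist = [y] + hist[:-1]
        estimate = sum(wk * hk for wk, hk in zip(w, hist))
        e = d - estimate                              # residual echo
        norm = sum(h * h for h in hist) + eps
        w = [wk + mu * e * hk / norm for wk, hk in zip(w, hist)]
        residual.append(e)
    return residual

def simulate_delay_jump(n=3000, jump_at=2000, delay=4, gain=0.5, seed=1):
    """Echo = gain * delayed reference; the delay grows by one at `jump_at`."""
    rng = random.Random(seed)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    mic = []
    for i in range(n):
        d = delay if i < jump_at else delay + 1   # one dropped sample
        mic.append(gain * ref[i - d] if i >= d else 0.0)
    return nlms_residual(ref, mic), jump_at

def mean_square(xs):
    return sum(x * x for x in xs) / len(xs)

if __name__ == "__main__":
    residual, jump = simulate_delay_jump()
    print("residual power before the jump:", mean_square(residual[jump - 100:jump]))
    print("residual power right after it: ", mean_square(residual[jump:jump + 10]))
```

In this idealized setting the residual is essentially zero once the filter has converged and jumps back to the level of uncancelled echo the moment the delay changes. A real canceller would additionally have to cope with noise and double-talk while re-converging.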
A solution to this problem is an adaptive sample rate converter, or simply an adaptive interpolator. This converter must be placed between the codecs (or the codec and the ISDN interface) [Figure 6].
Actually, as Figure 6 points out, two converters are needed, one for each signal direction. The interpolator should initially be tuned to upsample or downsample from one frequency to the other if they're known to be different (for example, 8 kHz and 9.6 kHz, so the interpolator knows what interpolation to perform). As time goes on, it becomes possible to observe the actual rate at which each codec transfers samples. The difference between the rates can be used as feedback to adapt the interpolator to the actual ratio of the sampling rates.
The interpolation solution is shown schematically in Figure 6. As this figure shows, it is important that the echo cancellers be connected to the codecs directly and run at the rate of the codec they're associated with. Placing an interpolator between a codec and its associated echo canceller would turn this solution into one of the non-solutions described earlier.
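The idea of an adaptive interpolator can be sketched as follows. This is a deliberately simplified illustration: it uses plain linear interpolation (a production converter would normally use a proper interpolation filter), and the class name, the update rule, and the smoothing constant are assumptions made for the example. The step is the number of input samples consumed per output sample, and it is nudged gradually toward the measured ratio of the two sample counts so that the conversion never changes abruptly.

```python
class LinearResampler:
    """Fractional-step linear interpolator (step = input samples per output)."""

    def __init__(self, step=1.0):
        self.step = step
        self.pos = 0.0    # read position, measured from the carried-over sample
        self.prev = 0.0   # last input sample of the previous block

    def process(self, block):
        """Consume one block of input samples and return the converted samples."""
        data = [self.prev] + list(block)   # keeps interpolation continuous
        out = []
        while self.pos < len(block):
            i = int(self.pos)
            frac = self.pos - i
            out.append(data[i] + frac * (data[i + 1] - data[i]))
            self.pos += self.step
        self.pos -= len(block)
        self.prev = data[-1]
        return out

def update_step(resampler, in_count, out_count, smoothing=0.05):
    """Move the step a little toward the measured input/output sample ratio."""
    if out_count > 0:
        measured = in_count / out_count
        resampler.step += smoothing * (measured - resampler.step)

if __name__ == "__main__":
    # Assumed scenario: nominally equal rates, but over some interval the
    # source side delivered 8008 samples while the sink side consumed 8000.
    rs = LinearResampler(step=1.0)
    for _ in range(100):
        update_step(rs, in_count=8008, out_count=8000)
    print(f"adapted step: {rs.step:.5f}")
```

One such converter would be needed per signal direction, and, as the text stresses, it belongs between the two codec domains rather than between a codec and its echo canceller.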
Testing Echo Cancellers
It is good practice for the customer to ask the echo canceller algorithm supplier how well their echo canceller conforms to the appropriate ITU-T recommendations (which are de facto standards) and to provide these figures along with the resource requirements, so that the right decision can be made when choosing an echo canceller. The related ITU-T recommendations are G.168 for LECs and G.167 for AECs.
It is beneficial for the customer to understand the basics of echo cancellation and perhaps even to be familiar with the listed recommendations. Even so, it always makes sense to run a few tests of the echo canceller of interest. If a live test is possible, which is very desirable for AECs, it is worth doing.
Echo canceller suppliers should provide a test or demo suite and a few test waveforms (the reference signal y(i) and the signal with the echo, x(i)+r(i), as per Figure 5 in Part 1) on which the echo canceller can be tested. Such a test can be carried out on either a PC or the customer's target hardware, whichever is arranged. This verifies that the echo canceller operates, and the suite can also be used to test the echo canceller's performance on specific waveforms if the customer has concerns about particular cases. It's also a good idea to test the double-talk performance of the echo canceller to make sure that quality is delivered to the end users.
By the time echo canceller integration is about to start, the hardware of the target device must have a sufficiently low level of nonlinear distortion. Echo canceller integration should begin only after all of the hardware problems have been fixed. As soon as the integration is finished, the echo canceller test can be repeated in full real time with true I/O instead of file processing. Should there be any quality problems, the hardware and software must be checked for violations of the requirements imposed by the echo canceller, which were stated earlier.
As we have seen in Parts 1 and 2 of this article, many problems can arise when designing and implementing an echo canceller in a networking design. But there is no black art or other magic behind the failures. The reasons for them are well known and perfectly consistent with the echo canceller's internal organization and requirements. To prevent development delays and reduce costs, design the system to meet the requirements from the very beginning. Redesigning the whole system at the middle or final stage because the requirements were not met will be expensive.
A solid understanding of the basics of echo cancellation, together with meeting the general requirements imposed by echo cancellers, will help avoid the echo canceller problems described here and therefore shorten development time and reduce product costs, which is always desirable.
About the Author
Alexey Frunze is a senior engineer at Spirit Corp. He has an MS in Physics from Moscow State Pedagogical University and completed graduate studies at the University of Rochester. Alexey can be reached at firstname.lastname@example.org.