Spectral Subtraction

FxingZh

Category: Essays. Tags: signal processing, audio encoding and decoding, algorithms

Article link: https://blog.csdn.net/weixin_42762173/article/details/125594997

Spectral subtraction is a method for reducing noise in audio. The spectral information required to describe the noise spectrum is obtained from the signal measured during non-speech activity, so we need some non-speech frames to estimate the noise spectrum. The basic rule is:

$D(w) = P_s(w) - P_n(w) \\ P_s'(w) = \begin{cases}D(w), & \text{if } D(w)>0\\ 0, & \text{otherwise}\end{cases}$

Here $P_s(w)$ is the spectrum of the noisy speech, $P_n(w)$ is the noise spectrum estimated from the signal measured during non-speech activity, and $P_s'(w)$ is the modified signal spectrum. This basic method has a weakness: when the environmental noise changes, $P_n(w)$ no longer describes the noise spectrum of the new environment.
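As a minimal sketch of the basic rule, assume the per-bin power spectra have already been computed (for example from an FFT of each frame). The noise estimate is then the average over a few non-speech frames, and the subtraction keeps only positive differences. The function names and the toy numbers are illustrative and do not come from the original article.

```python
def estimate_noise_spectrum(noise_frames):
    """Average per-bin power over frames known to contain no speech."""
    n_frames = len(noise_frames)
    n_bins = len(noise_frames[0])
    return [sum(frame[k] for frame in noise_frames) / n_frames
            for k in range(n_bins)]


def basic_spectral_subtraction(p_s, p_n):
    """D(w) = P_s(w) - P_n(w); keep D(w) where it is positive, else 0."""
    return [max(s - n, 0.0) for s, n in zip(p_s, p_n)]


# Toy data: two non-speech frames and one noisy speech frame (4 bins each).
noise_frames = [[1.0, 2.0, 1.0, 0.5],
                [3.0, 2.0, 1.0, 1.5]]
p_n = estimate_noise_spectrum(noise_frames)      # [2.0, 2.0, 1.0, 1.0]
p_s = [10.0, 1.5, 4.0, 1.0]
cleaned = basic_spectral_subtraction(p_s, p_n)   # [8.0, 0.0, 3.0, 0.0]
```

Bins where the noise estimate exceeds the observed power are set to zero, which is what leaves isolated peaks in the neighbouring bins.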

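A major problem with the above implementation of spectral noise subtraction is that a "new" noise appears in the processed speech signal. This residual noise, often called musical noise, arises from the narrow spectral peaks that remain after subtraction.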

Our modification to the noise subtraction method consists in minimizing the perception of the narrow spectral peaks by decreasing the spectral excursions. This is done by changing the algorithm as follows:

$D(w) = P_s(w)-\alpha P_n(w) \\ P_s'(w) =\begin{cases}D(w), & \text{if } D(w)>\beta P_n(w) \\ \beta P_n(w), & \text{otherwise} \end{cases} \\ \text{with} \quad \alpha \geq 1 \quad \text{and} \quad 0<\beta \ll 1$

Here $\alpha$ is the subtraction factor and $\beta$ is the spectral floor parameter. The modified method is shown in the following figure.

(Figure: the modified spectral subtraction method.)
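The modified rule with over-subtraction and a spectral floor can be sketched in the same way. This is an illustration under the assumption that `p_s` and `p_n` are per-bin power values. The function name and the default values of `alpha` and `beta` are chosen for the example only.

```python
def modified_spectral_subtraction(p_s, p_n, alpha=4.0, beta=0.01):
    """Over-subtract the noise estimate and floor the result at beta * P_n(w).

    D(w)   = P_s(w) - alpha * P_n(w)
    P_s'(w) = D(w) if D(w) > beta * P_n(w), else beta * P_n(w)
    """
    if alpha < 1.0 or not (0.0 < beta < 1.0):
        raise ValueError("expected alpha >= 1 and 0 < beta < 1")
    out = []
    for s, n in zip(p_s, p_n):
        d = s - alpha * n
        floor = beta * n
        out.append(d if d > floor else floor)
    return out


# Toy per-bin power values (hypothetical numbers).
p_n = [2.0, 2.0, 1.0, 1.0]
p_s = [10.0, 1.5, 4.0, 1.0]
cleaned = modified_spectral_subtraction(p_s, p_n, alpha=4.0, beta=0.01)
```

Only the first bin survives the over-subtraction here. The other three bins are replaced by a small fraction of the noise estimate instead of zero, which fills the valleys between residual peaks.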

In practice, we have found that at $\mathrm{SNR} = 0\ \mathrm{dB}$, a value of $\alpha$ in the range 3 to 6 is adequate, with $\beta$ in the range 0.005 to 0.1. A large value of $\alpha$, such as 5, should not be alarming. This is equivalent to assuming that the noise power to be subtracted is about 7 dB higher than the smoothed estimate. This "inflation" factor represents the fact that, at each frame, the variance of the spectral components of the noise is equal to the noise power itself. Hence, one must subtract more than the expected value of the noise spectrum (the smoothed estimate) in order to make sure that most of the noise peaks have been removed.
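The 7 dB figure follows from expressing the factor $\alpha = 5$ in decibels, since $\alpha$ multiplies a power spectrum:

$10 \log_{10} \alpha = 10 \log_{10} 5 \approx 6.99\ \mathrm{dB}$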

In order to reduce the speech distortion caused by large values of $\alpha$, we decided to let $\alpha$ vary from frame to frame within the same sentence. To understand the rationale behind doing so, consider the graph in the following figure.

(Figure: the subtraction factor $\alpha$ as a function of SNR.)

The SNR is estimated at each frame from knowledge of the noise spectral estimate and the energy of the input speech. At each frame, the actual value of $\alpha$ used is given by:

$\alpha = \alpha_0 - \mathrm{SNR}/s \quad \text{for} \quad -5 \leq \mathrm{SNR} \leq 20$

Here $\alpha_0$ is the desired value of $\alpha$ at $\mathrm{SNR} = 0\ \mathrm{dB}$, SNR is the estimated segmental signal-to-noise ratio in dB, and $1/s$ is the slope of the line in the figure above (for example, for $\alpha_0 = 4$, $s = 20/3$, so that $\alpha$ decreases to 1 at $\mathrm{SNR} = 20\ \mathrm{dB}$). We found that using a variable subtraction factor reduces the speech distortion somewhat. If the slope ($1/s$) is too large, however, the temporal dynamic range of the speech becomes too large.
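The frame-dependent subtraction factor can be sketched as follows. Two points here are assumptions made for the example and are not stated in the text: the frame SNR is taken as the ratio of total noisy-frame power to total estimated noise power, and outside the range of -5 to 20 dB the SNR is held at the nearest edge of the range. The function names are illustrative.

```python
import math


def segmental_snr_db(p_s, p_n, eps=1e-12):
    """Frame SNR estimate in dB.

    Assumption: ratio of total noisy-frame power to total noise-estimate
    power. eps guards against division by zero and log of zero.
    """
    signal_power = max(sum(p_s), eps)
    noise_power = max(sum(p_n), eps)
    return 10.0 * math.log10(signal_power / noise_power)


def subtraction_factor(snr_db, alpha_0=4.0, s=20.0 / 3.0):
    """alpha = alpha_0 - SNR / s for -5 <= SNR <= 20 (SNR in dB).

    Assumption: SNR values outside the range are clamped to its edges.
    """
    snr_clamped = min(max(snr_db, -5.0), 20.0)
    return alpha_0 - snr_clamped / s


# Hypothetical frame: total power is ten times the noise estimate (10 dB).
snr = segmental_snr_db([10.0, 30.0], [1.0, 3.0])
alpha = subtraction_factor(snr)   # about 2.5 with alpha_0 = 4, s = 20/3
```

With $\alpha_0 = 4$ and $s = 20/3$, the factor runs from 4.75 at -5 dB down to 1 at 20 dB, so the constraint $\alpha \geq 1$ holds over the whole range.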