Spectral Subtraction

Spectral subtraction is a method for reducing noise in audio. The spectral information needed to describe the noise is obtained from the signal measured during non-speech activity, so we first need some non-speech frames from which to estimate the noise spectrum. The basic rule is:

$$
D(w) = P_s(w) - P_n(w) \\
P_s'(w) = \begin{cases} D(w) & \text{if } D(w) > 0 \\ 0 & \text{otherwise} \end{cases}
$$

Here $P_s(w)$ is the spectrum of the noisy speech, $P_n(w)$ is the noise spectrum estimated from the signal measured during non-speech activity, and $P_s'(w)$ is the modified signal spectrum. This basic scheme has a weakness: when the environmental noise changes, $P_n(w)$ no longer matches the noise spectrum of the new environment.
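The basic rule can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names, the frame length of 256 samples, and the choice to estimate $P_n(w)$ as the mean power spectrum over noise-only frames are all assumptions made for the example.

```python
import numpy as np

def power_spectrum(frames):
    """Power spectrum |FFT|^2 along the last axis (one frame per row)."""
    return np.abs(np.fft.rfft(frames, axis=-1)) ** 2

def estimate_noise_spectrum(noise_frames):
    """P_n(w): mean power spectrum over frames assumed to contain no speech."""
    return power_spectrum(noise_frames).mean(axis=0)

def basic_spectral_subtraction(p_s, p_n):
    """P_s'(w) = D(w) if D(w) > 0 else 0, with D(w) = P_s(w) - P_n(w)."""
    d = p_s - p_n
    return np.where(d > 0, d, 0.0)

# Toy data (hypothetical): 20 noise-only frames, then one noisy sinusoid frame.
rng = np.random.default_rng(0)
noise_frames = 0.1 * rng.standard_normal((20, 256))
p_n = estimate_noise_spectrum(noise_frames)

t = np.arange(256)
noisy_frame = np.sin(2 * np.pi * 8 * t / 256) + 0.1 * rng.standard_normal(256)
p_s = power_spectrum(noisy_frame)

p_clean = basic_spectral_subtraction(p_s, p_n)
```

To get back to a waveform, a common choice is to take the square root of `p_clean` as the new magnitude and reuse the phase of the noisy frame; that step is left out here to keep the sketch focused on the subtraction itself.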

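A major problem with the above implementation of spectral noise subtraction is that a "new" noise appears in the processed speech signal. After subtraction, narrow spectral peaks remain at random frequencies, and they are heard as short tones (often called musical noise).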

Our modification to the noise subtraction method consists of minimizing the perception of the narrow spectral peaks by decreasing the spectral excursions. This is done by changing the algorithm as follows:

$$
D(w) = P_s(w) - \alpha P_n(w) \\
P_s'(w) = \begin{cases} D(w), & \text{if } D(w) > \beta P_n(w) \\ \beta P_n(w), & \text{otherwise} \end{cases} \\
\text{with} \quad \alpha \geq 1 \quad \text{and} \quad 0 < \beta \ll 1
$$

where $\alpha$ is the subtraction factor and $\beta$ is the spectral floor parameter. The modified method is shown in the following figure.

[Figure: the modified spectral subtraction method]
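The modified rule with the subtraction factor and the spectral floor can be sketched the same way. The function name `oversubtract` and the default values `alpha=4.0` and `beta=0.01` are assumptions chosen for illustration; the inputs are per-bin power spectra as NumPy arrays.

```python
import numpy as np

def oversubtract(p_s, p_n, alpha=4.0, beta=0.01):
    """Spectral subtraction with subtraction factor alpha and spectral floor beta.

    D(w)    = P_s(w) - alpha * P_n(w)
    P_s'(w) = D(w) if D(w) > beta * P_n(w), otherwise beta * P_n(w)
    """
    if alpha < 1:
        raise ValueError("alpha must be >= 1")
    if not 0 < beta < 1:
        raise ValueError("beta must satisfy 0 < beta < 1 (ideally beta << 1)")
    d = p_s - alpha * p_n
    floor = beta * p_n
    return np.where(d > floor, d, floor)

# Two bins: the first keeps D(w) = 10 - 4*1 = 6, the second has
# D(w) = 2 - 4*1 = -2 and is replaced by the floor 0.01 * 1.
p_s = np.array([10.0, 2.0])
p_n = np.array([1.0, 1.0])
p_mod = oversubtract(p_s, p_n, alpha=4.0, beta=0.01)
```

Compared with the basic rule, bins that would have been set to zero are filled with a small fraction of the noise estimate, so the remaining peaks stand out less against their neighbours.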

In practice, we have found that at $SNR = 0$ dB a value of $\alpha$ in the range 3 to 6 is adequate, with $\beta$ in the range 0.005 to 0.1. A large value of $\alpha$, such as 5, should not be alarming. It is equivalent to assuming that the noise power to be subtracted is about 7 dB higher than the smoothed estimate ($10\log_{10} 5 \approx 7$ dB). This "inflation" factor reflects the fact that, at each frame, the variance of the spectral components of the noise is equal to the noise power itself. Hence, one must subtract more than the expected value of the noise spectrum (the smoothed estimate) to make sure that most of the noise peaks are removed.

To reduce the speech distortion caused by large values of $\alpha$, we decided to let $\alpha$ vary from frame to frame within the same sentence. To understand the rationale behind doing so, consider the graph in the following figure.

[Figure: subtraction factor $\alpha$ as a function of SNR]

The SNR is estimated at each frame from knowledge of the noise spectral estimate and the energy of the input speech. At each frame, the actual value of $\alpha$ used is given by:

$$
\alpha = \alpha_0 - (SNR)/s \\
\text{for} \quad -5 \leq SNR \leq 20
$$

where $\alpha_0$ is the desired value of $\alpha$ at $SNR = 0$ dB, SNR is the estimated segmental signal-to-noise ratio in dB, and $1/s$ is the slope of the line in the figure above (for example, for $\alpha_0 = 4$, $s = 20/3$, which gives $\alpha = 1$ at $SNR = 20$ dB). We found that using a variable subtraction factor reduces the speech distortion somewhat. If the slope $1/s$ is too large, however, the temporal dynamic range of the speech becomes too large.
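The frame-dependent subtraction factor can be sketched as follows. Two points here are assumptions rather than part of the description above: the per-frame SNR is taken as the ratio of total noisy-frame power to total estimated noise power, and SNR values outside the range $-5$ to $20$ dB are clamped to the nearest end of that range, since the formula is only stated inside it. The function names are hypothetical.

```python
import numpy as np

def frame_snr_db(p_s, p_n):
    """Per-frame SNR estimate in dB (assumed form: total power ratio).

    p_s: power spectrum of the noisy frame, p_n: noise spectrum estimate.
    Assumes p_n is not all zeros.
    """
    return 10.0 * np.log10(np.sum(p_s) / np.sum(p_n))

def subtraction_factor(snr_db, alpha0=4.0, s=20.0 / 3.0):
    """alpha = alpha0 - SNR / s for -5 <= SNR <= 20 dB.

    Outside that range the SNR is clamped (an assumption), so alpha stays
    at its value for -5 dB or 20 dB respectively.
    """
    snr = np.clip(snr_db, -5.0, 20.0)
    return alpha0 - snr / s

# With alpha0 = 4 and s = 20/3, alpha goes from 4.75 at -5 dB
# through 4 at 0 dB down to 1 at 20 dB.
alphas = subtraction_factor(np.array([-5.0, 0.0, 10.0, 20.0]))
```

In a full pipeline, `frame_snr_db` would be evaluated on each frame and the resulting `alpha` passed to the subtraction step, so that frames with strong speech are subtracted less aggressively than frames dominated by noise.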
