语音增强-------------维纳滤波
原理介绍
时域维纳滤波
若输入信号
y
(
n
)
y\left( n \right)
y(n)和期望信号
d
(
n
)
d\left( n \right)
d(n)是联合广义平稳随机过程,那么系统输出对应的误差可以表示为
e
(
n
)
=
d
(
n
)
−
d
^
(
n
)
=
d
(
n
)
−
h
T
y
\begin{aligned} & e\left( n \right)=d\left( n \right)-\hat{d}\left( n \right) \\ & =d\left( n \right)-{{\mathbf{h}}^{T}}\mathbf{y} \end{aligned}
e(n)=d(n)−d^(n)=d(n)−hTy
其中
h
=
[
h
0
h
1
⋯
h
M
−
1
]
T
\mathbf{h}={{\left[ \begin{matrix} {{h}_{0}} & {{h}_{1}} & \cdots & {{h}_{M-1}} \\ \end{matrix} \right]}^{T}}
h=[h0h1⋯hM−1]T为滤波器的系数,
y
=
[
y
(
n
)
y
(
n
−
1
)
⋯
y
(
n
−
M
+
1
)
]
T
\mathbf{y}={{\left[ \begin{matrix} y\left( n \right) & y\left( n-1 \right) & \cdots & y\left( n-M+1 \right) \\ \end{matrix} \right]}^{T}}
y=[y(n)y(n−1)⋯y(n−M+1)]T为输入的向量。
为了找到最优的滤波器系数,以最小均方误差的准则进行求解
J
=
E
[
e
2
(
n
)
]
=
E
[
(
d
(
n
)
−
h
T
y
)
2
]
=
E
[
d
2
(
n
)
]
−
2
h
T
E
[
y
d
(
n
)
]
+
h
T
E
[
y
y
T
]
h
=
E
[
d
2
(
n
)
]
−
2
h
T
r
y
d
+
h
T
R
y
y
h
\begin{aligned} & J=E\left[ {{e}^{2}}\left( n \right) \right]=E\left[ {{\left( d\left( n \right)-{{\mathbf{h}}^{T}}\mathbf{y} \right)}^{2}} \right] \\ & =E\left[ {{d}^{2}}\left( n \right) \right]-2{{\mathbf{h}}^{T}}E\left[ \mathbf{y}d\left( n \right) \right]+{{\mathbf{h}}^{T}}E\left[ \mathbf{y}{{\mathbf{y}}^{T}} \right]\mathbf{h} \\ & =E\left[ {{d}^{2}}\left( n \right) \right]-2{{\mathbf{h}}^{T}}{{\mathbf{r}}_{yd}}+{{\mathbf{h}}^{T}}{{\mathbf{R}}_{yy}}\mathbf{h} \end{aligned}
J=E[e2(n)]=E[(d(n)−hTy)2]=E[d2(n)]−2hTE[yd(n)]+hTE[yyT]h=E[d2(n)]−2hTryd+hTRyyh
r
y
d
{{\mathbf{r}}_{yd}}
ryd为输入信号与期望信号的互相关,
R
y
y
{{\mathbf{R}}_{yy}}
Ryy为输入信号的自相关矩阵,为了使代价函数
J
J
J最小,对其进行求导可以得到
∂
J
∂
h
k
=
2
E
[
e
(
n
)
∂
e
(
n
)
∂
h
k
]
=
0
,
k
=
0
,
1
,
⋯
,
M
−
1
\frac{\partial J}{\partial {{h}_{k}}}=2E\left[ e\left( n \right)\frac{\partial e\left( n \right)}{\partial {{h}_{k}}} \right]=0,\ \ \ \ k=0,1,\cdots ,M-1
∂hk∂J=2E[e(n)∂hk∂e(n)]=0, k=0,1,⋯,M−1
由于
∂
e
(
n
)
∂
h
k
=
−
y
(
n
−
k
)
\frac{\partial e\left( n \right)}{\partial {{h}_{k}}}\text{=}-y\left( n-k \right)
∂hk∂e(n)=−y(n−k),上式可以化简为
∂
J
∂
h
k
=
−
2
E
[
e
(
n
)
y
(
n
−
k
)
]
=
0
,
k
=
0
,
1
,
⋯
,
M
−
1
\frac{\partial J}{\partial {{h}_{k}}}=-2E\left[ e\left( n \right)y\left( n-k \right) \right]=0,\ \ \ \ k=0,1,\cdots ,M-1
∂hk∂J=−2E[e(n)y(n−k)]=0, k=0,1,⋯,M−1
利用矩阵求导可以得到
∂
J
∂
h
=
−
2
r
y
d
+
2
h
T
R
y
y
=
0
\frac{\partial J}{\partial \mathbf{h}}\text{=}-2{{\mathbf{r}}_{yd}}+2{{\mathbf{h}}^{T}}{{\mathbf{R}}_{yy}}=0
∂h∂J=−2ryd+2hTRyy=0
那么滤波器的最优系数为
h
opt
=
R
y
y
−
1
r
y
d
{{\mathbf{h}}_{\text{opt}}}=\mathbf{R}_{yy}^{-1}{{\mathbf{r}}_{yd}}
hopt=Ryy−1ryd
上式称为Wiener-Hopf方程的解。
频域维纳滤波器
如果考虑系统的输出为
d
^
(
n
)
=
h
(
n
)
∗
y
(
n
)
=
∑
k
=
−
∞
+
∞
h
k
y
(
n
−
k
)
,
−
∞
<
n
<
+
∞
\hat{d}\left( n \right)=h\left( n \right)*y\left( n \right)=\sum\limits_{k=-\infty }^{+\infty }{{{h}_{k}}y\left( n-k \right)},\ \ \ \ -\infty <n<+\infty
d^(n)=h(n)∗y(n)=k=−∞∑+∞hky(n−k), −∞<n<+∞
那么相应的Wiener-Hopf方程为
∑
k
=
−
∞
+
∞
h
k
r
y
y
(
m
−
k
)
=
r
y
d
(
−
m
)
,
−
∞
<
m
<
+
∞
\sum\limits_{k=-\infty }^{+\infty }{{{h}_{k}}{{r}_{yy}}\left( m-k \right)={{r}_{yd}}\left( -m \right)},\ \ \ \ -\infty <m<+\infty
k=−∞∑+∞hkryy(m−k)=ryd(−m), −∞<m<+∞
在频域上面,系统的输出可以写为
D
^
(
ω
)
=
H
(
ω
)
Y
(
ω
)
\hat{D}\left( \omega \right)=H\left( \omega \right)Y\left( \omega \right)
D^(ω)=H(ω)Y(ω)
定义估计误差
E
(
ω
)
=
D
(
ω
)
−
D
^
(
ω
)
=
D
(
ω
)
−
H
(
ω
)
Y
(
ω
)
\begin{aligned} & E\left( \omega \right)=D\left( \omega \right)-\hat{D}\left( \omega \right) \\ & =D\left( \omega \right)-H\left( \omega \right)Y\left( \omega \right) \end{aligned}
E(ω)=D(ω)−D^(ω)=D(ω)−H(ω)Y(ω)
那么均方误差可以写作
J
=
E
{
[
D
(
ω
)
−
H
(
ω
)
Y
(
ω
)
]
∗
[
D
(
ω
)
−
H
(
ω
)
Y
(
ω
)
]
}
=
E
[
∣
D
(
ω
)
∣
2
]
−
H
(
ω
)
E
[
D
∗
(
ω
)
Y
(
ω
)
]
−
H
∗
(
ω
)
E
[
Y
∗
(
ω
)
D
(
ω
)
]
+
∣
H
(
ω
)
∣
2
E
[
∣
Y
(
ω
)
∣
2
]
=
E
[
∣
D
(
ω
)
∣
2
]
−
H
(
ω
)
P
y
d
(
ω
)
−
H
∗
(
ω
)
P
d
y
(
ω
)
+
∣
H
(
ω
)
∣
2
P
y
y
(
ω
)
\begin{aligned} & J=E\left\{ {{\left[ D\left( \omega \right)-H\left( \omega \right)Y\left( \omega \right) \right]}^{*}}\left[ D\left( \omega \right)-H\left( \omega \right)Y\left( \omega \right) \right] \right\} \\ & =E\left[ {{\left| D\left( \omega \right) \right|}^{2}} \right]-H\left( \omega \right)E\left[ {{D}^{*}}\left( \omega \right)Y\left( \omega \right) \right]-{{H}^{*}}\left( \omega \right)E\left[ {{Y}^{*}}\left( \omega \right)D\left( \omega \right) \right] \\ & +{{\left| H\left( \omega \right) \right|}^{2}}E\left[ {{\left| Y\left( \omega \right) \right|}^{2}} \right] \\ & \text{=}E\left[ {{\left| D\left( \omega \right) \right|}^{2}} \right]-H\left( \omega \right){{P}_{yd}}\left( \omega \right)-{{H}^{*}}\left( \omega \right){{P}_{dy}}\left( \omega \right)+{{\left| H\left( \omega \right) \right|}^{2}}{{P}_{yy}}\left( \omega \right) \end{aligned}
J=E{[D(ω)−H(ω)Y(ω)]∗[D(ω)−H(ω)Y(ω)]}=E[∣D(ω)∣2]−H(ω)E[D∗(ω)Y(ω)]−H∗(ω)E[Y∗(ω)D(ω)]+∣H(ω)∣2E[∣Y(ω)∣2]=E[∣D(ω)∣2]−H(ω)Pyd(ω)−H∗(ω)Pdy(ω)+∣H(ω)∣2Pyy(ω)
其中,
P
y
d
(
ω
)
{{P}_{yd}}\left( \omega \right)
Pyd(ω)为
y
(
n
)
y\left( n \right)
y(n)与
d
(
n
)
d\left( n \right)
d(n)的互功率谱,
P
y
y
(
ω
)
{{P}_{yy}}\left( \omega \right)
Pyy(ω)为
y
(
n
)
y\left( n \right)
y(n)的功率谱。对误差进行求导,可得
∂
J
∂
H
(
ω
)
=
H
∗
(
ω
)
P
y
y
(
ω
)
−
P
y
d
(
ω
)
=
[
H
(
ω
)
P
y
y
(
ω
)
−
P
d
y
(
ω
)
]
∗
=
0
\begin{aligned} & \frac{\partial J}{\partial H\left( \omega \right)}={{H}^{*}}\left( \omega \right){{P}_{yy}}\left( \omega \right)-{{P}_{yd}}\left( \omega \right) \\ & ={{\left[ H\left( \omega \right){{P}_{yy}}\left( \omega \right)-{{P}_{dy}}\left( \omega \right) \right]}^{*}} \\ & =0 \end{aligned}
∂H(ω)∂J=H∗(ω)Pyy(ω)−Pyd(ω)=[H(ω)Pyy(ω)−Pdy(ω)]∗=0
而
P
y
d
(
ω
)
=
P
d
y
∗
(
ω
)
{{P}_{yd}}\left( \omega \right)=P_{dy}^{*}\left( \omega \right)
Pyd(ω)=Pdy∗(ω),因此频域维纳滤波器的解为
H
(
ω
)
=
P
d
y
(
ω
)
P
y
y
(
ω
)
H\left( \omega \right)=\frac{{{P}_{dy}}\left( \omega \right)}{{{P}_{yy}}\left( \omega \right)}
H(ω)=Pyy(ω)Pdy(ω)
从上面的推导过程可以看出,频域维纳滤波器的解是时域维纳滤波器的解经过傅里叶变换得到的。
对于语音信号来讲,其接收的模型为
x
(
n
)
=
s
(
n
)
+
d
(
n
)
x\left( n \right)=s\left( n \right)+d\left( n \right)
x(n)=s(n)+d(n)
那么维纳滤波器就是要产生对
s
(
n
)
s\left( n \right)
s(n)的一个估计
假设信号与噪声互不相关且具有零均值,那么可以得到时域维纳滤波器为
h
∗
=
(
R
s
s
+
R
d
d
)
−
1
r
s
s
{{\mathbf{h}}^{*}}={{\left( {{\mathbf{R}}_{ss}}+{{\mathbf{R}}_{dd}} \right)}^{-1}}{{\mathbf{r}}_{ss}}
h∗=(Rss+Rdd)−1rss
可以将其重新写为
h
∗
=
(
I
S
N
R
+
R
^
d
d
−
1
R
^
s
s
)
−
1
R
^
d
d
−
1
R
^
s
s
u
{{\mathbf{h}}^{*}}={{\left( \frac{\mathbf{I}}{SNR}+\mathbf{\hat{R}}_{dd}^{-1}{{{\mathbf{\hat{R}}}}_{ss}} \right)}^{-1}}\mathbf{\hat{R}}_{dd}^{-1}{{\mathbf{\hat{R}}}_{ss}}\mathbf{u}
h∗=(SNRI+R^dd−1R^ss)−1R^dd−1R^ssu
其中,
S
N
R
=
E
{
s
2
(
n
)
}
E
{
d
2
(
n
)
}
=
σ
s
2
σ
d
2
SNR=\frac{E\left\{ {{s}^{2}}\left( n \right) \right\}}{E\left\{ {{d}^{2}}\left( n \right) \right\}}=\frac{\sigma _{s}^{2}}{\sigma _{d}^{2}}
SNR=E{d2(n)}E{s2(n)}=σd2σs2为信噪比,
I
\mathbf{I}
I为单位矩阵,
u
=
[
1
0
⋯
0
]
T
\mathbf{u}={{\left[ \begin{matrix} 1 & 0 & \cdots & 0 \\ \end{matrix} \right]}^{T}}
u=[10⋯0]T,
R
^
s
s
=
R
s
s
/
σ
s
2
{{\mathbf{\hat{R}}}_{ss}}\text{=}{{\mathbf{R}}_{ss}}/\sigma _{s}^{2}
R^ss=Rss/σs2 ,
R
^
d
d
=
R
d
d
/
σ
d
2
{{\mathbf{\hat{R}}}_{dd}}\text{=}{{\mathbf{R}}_{dd}}/\sigma _{d}^{2}
R^dd=Rdd/σd2 ,那么可以得到以下关系式
lim
S
N
R
→
+
∞
h
∗
=
u
\underset{SNR\to +\infty }{\mathop{\lim }}\,{{\mathbf{h}}^{*}}=\mathbf{u}
SNR→+∞limh∗=u
lim
S
N
R
→
+
0
h
∗
=
0
\underset{SNR\to +0}{\mathop{\lim }}\,{{\mathbf{h}}^{*}}=\mathbf{0}
SNR→+0limh∗=0
当
S
N
R
→
+
∞
SNR\to +\infty
SNR→+∞时,由于
u
T
x
=
x
(
n
)
{{\mathbf{u}}^{T}}\mathbf{x}=x\left( n \right)
uTx=x(n),此时维纳滤波器并不会滤除噪声,也就是说此时的维纳滤波器不会产生语音失真;当
S
N
R
→
0
SNR\to 0
SNR→0,维纳滤波器的输出将会被严重衰减,此时的维纳滤波器会对语音信号造成失真。
与时域维纳滤波器对应的频域维纳滤波器为
H
(
ω
)
=
P
s
s
(
ω
)
P
s
s
(
ω
)
+
P
d
d
(
ω
)
H\left( \omega \right)=\frac{{{P}_{ss}}\left( \omega \right)}{{{P}_{ss}}\left( \omega \right)+{{P}_{dd}}\left( \omega \right)}
H(ω)=Pss(ω)+Pdd(ω)Pss(ω)
显然,这种滤波器是非因果的。
定义
ξ
=
P
s
s
(
ω
)
P
d
d
(
ω
)
\xi \text{=}\frac{{{P}_{ss}}\left( \omega \right)}{{{P}_{dd}}\left( \omega \right)}
ξ=Pdd(ω)Pss(ω)作为在频率
ω
\omega
ω 处的先验信噪比,那么频域的维纳滤波器可以表示为
H
=
ξ
ξ
+
1
H\text{=}\frac{\xi }{\xi \text{+}1}
H=ξ+1ξ
当
ξ
→
∞
\xi \to \infty
ξ→∞信噪比极大,
H
≈
1
H\approx 1
H≈1,此时维纳滤波器并不会滤除噪声,也就是说此时的维纳滤波器不会产生语音失真;当
ξ
→
0
\xi \to 0
ξ→0(信噪比极小),
H
≈
0
H\approx 0
H≈0,维纳滤波器的输出将会被严重衰减,此时的维纳滤波器会对语音信号造成失真,与之前的时域维纳滤波器相对应。
为了更好地理解维纳滤波器,下面对其进行简单仿真,仿真参数如下
参数名称 | 参数值 |
---|---|
信噪比 | 5dB |
采样率 | 16KHz |
这张图给出的是维纳滤波前后语音信号以及语谱图的对照。从图中可以看出,经过维纳滤波后,被噪声污染的语音信号有了一定的改善,但是从图中也可以看出,维纳滤波后依然存在一些噪声,这也与维纳滤波器的滤波特性相关。
代码如下:
clear;
close all;
clc;
%% 读入数据
[signal,~]=audioread('clean.wav'); %读入干净语音
[noise,fs]=audioread('noise.wav'); %读入噪声
N=3*fs; %选取3秒的语音
signal=signal(1:N);
noise=noise(1:N);
N=length(signal);
t=(0:N-1)/fs;
SNR=5; %信噪比大小
noise=noise/norm(noise,2).*10^(-SNR/20)*norm(signal);
x=signal+noise; %产生固定信噪比的带噪语音
audiowrite('mixed.wav',x,fs); %将混合的语音写入
enhanced_speech=wiener_as('mixed.wav','enhanced_speech.wav');
figure(1)
subplot(321);
plot(t,signal);ylim([-1.5,1.5]);title('干净语音');xlabel('时间/s');ylabel('幅度');
subplot(323);
plot(t,x);ylim([-1.5,1.5]);title('带噪语音');xlabel('时间/s');ylabel('幅度');
subplot(325);
plot(t,real(enhanced_speech));ylim([-1.5,1.5]);title('维纳滤波增强后的语音');xlabel('时间/s');ylabel('幅度');
subplot(322);
spectrogram(signal,256,128,256,16000,'yaxis');
subplot(324);
spectrogram(x,256,128,256,16000,'yaxis');
subplot(326);
spectrogram(enhanced_speech,256,128,256,16000,'yaxis');
wiener_as.m如下
function [enhanced_speech, fs, nbits] = wiener_as(filename,outfile)
%
% Implements the Wiener filtering algorithm based on a priori SNR estimation [1].
%
% Usage: wiener_as(noisyFile, outputFile)
%
% infile - noisy speech file in .wav format
% outputFile - enhanced output file in .wav format
%
% Example call: wiener_as('sp04_babble_sn10.wav','out_wien_as.wav');
%
% References:
% [1] Scalart, P. and Filho, J. (1996). Speech enhancement based on a priori
% signal to noise estimation. Proc. IEEE Int. Conf. Acoust. , Speech, Signal
% Processing, 629-632.
%
% Authors: Yi Hu and Philipos C. Loizou
%
% Copyright (c) 2006 by Philipos C. Loizou
% $Revision: 0.0 $ $Date: 10/09/2006 $
%-------------------------------------------------------------------------
if nargin<2
fprintf('Usage: wiener_as(noisyfile.wav,outFile.wav) \n\n');
return;
end
[noisy_speech, fs]= audioread(filename);
aInfo = audioinfo(filename);
nbits = aInfo.BitsPerSample; % 16 bits resolution
% column vector noisy_speech
% set parameter values
mu= 0.98; % smoothing factor in noise spectrum update
a_dd= 0.98; % smoothing factor in priori update
eta= 0.15; % VAD threshold
frame_dur= 20; % frame duration (20ms Hamming Window)
L= frame_dur* fs/ 1000; % L is frame length (160 for 8k sampling rate)
hamming_win= hamming( L); % hamming window
U= ( hamming_win'* hamming_win)/ L; % normalization factor
% first 120 ms is noise only
len_120ms= fs/ 1000* 120;
% first_120ms= noisy_speech( 1: len_120ms).* ...
% (hann( len_120ms, 'periodic'))';
first_120ms= noisy_speech( 1: len_120ms);
% =============now use Welch's method to estimate power spectrum with
% Hamming window and 50% overlap
nsubframes= floor( len_120ms/ (L/ 2))- 1; % 50% overlap
noise_ps= zeros( L, 1); %noise power spectral
n_start= 1;
for j= 1: nsubframes
noise= first_120ms( n_start: n_start+ L- 1);
noise= noise.* hamming_win;
noise_fft= fft( noise, L); %noise FFT
noise_ps= noise_ps+ ( abs( noise_fft).^ 2)/ (L* U);
n_start= n_start+ L/ 2;
end
noise_ps= noise_ps/ nsubframes;
%==============
% number of noisy speech frames
len1= L/ 2; % with 50% overlap
nframes= floor( length( noisy_speech)/ len1)- 1;
n_start= 1;
for j= 1: nframes
noisy= noisy_speech( n_start: n_start+ L- 1);
noisy= noisy.* hamming_win;
noisy_fft= fft( noisy, L);
noisy_ps= ( abs( noisy_fft).^ 2)/ (L* U);
% ============ voice activity detection
if (j== 1) % initialize posteri
posteri= noisy_ps./ noise_ps; %postriori SNR
posteri_prime= posteri- 1;
posteri_prime( find( posteri_prime< 0))= 0;
priori= a_dd+ (1-a_dd)* posteri_prime; %a priori SNR
else
posteri= noisy_ps./ noise_ps; %postriori SNR
posteri_prime= posteri- 1;
posteri_prime( find( posteri_prime< 0))= 0;
priori= a_dd* (G_prev.^ 2).* posteri_prev+ ...
(1-a_dd)* posteri_prime; %a priori SNR
end
log_sigma_k= posteri.* priori./ (1+ priori)- log(1+ priori);
vad_decision(j)= sum( log_sigma_k)/ L;
if (vad_decision(j)< eta)
% noise only frame found
noise_ps= mu* noise_ps+ (1- mu)* noisy_ps;
vad( n_start: n_start+ L- 1)= 0;
else
vad( n_start: n_start+ L- 1)= 1;
end
% ===end of vad===
G= sqrt( priori./ (1+ priori)); % gain function
enhanced= ifft( noisy_fft.* G, L);
if (j== 1)
enhanced_speech( n_start: n_start+ L/2- 1)= ...
enhanced( 1: L/2);
else
enhanced_speech( n_start: n_start+ L/2- 1)= ...
overlap+ enhanced( 1: L/2);
end
overlap= enhanced( L/ 2+ 1: L);
n_start= n_start+ L/ 2;
G_prev= G;
posteri_prev= posteri;
end
enhanced_speech( n_start: n_start+ L/2- 1)= overlap;
audiowrite( outfile, enhanced_speech, fs, 'BitsPerSample', nbits);
end
关于语音及噪声文件,具体请参考:语音信号处理常用语料库下载地址
参考文献
[1]Loizou P C . Speech Enhancement: Theory and Practice[M]. CRC Press, Inc. 2007.
[2]PhiliposC.Loizou. 语音增强:理论与实践:theory and practice[M]// 语音增强:理论与实践:theory and practice. 电子科技大学出版社, 2012.