Optimum Filters


The discrete form of the Wiener filtering problem is to design a filter to recover a signal $d(n)$ from noisy observations
$$x(n)=d(n)+v(n)$$
Assuming that both $d(n)$ and $v(n)$ are wide-sense stationary random processes, Wiener considered the problem of designing the filter that would produce the minimum mean-square error estimate of $d(n)$. Thus, with
$$\xi=E\left\{|e(n)|^2\right\}=E\left\{|d(n)-\hat d(n)|^2\right\}$$
the problem is to find the filter that minimizes $\xi$.

Depending upon how the signals $x(n)$ and $d(n)$ are related to each other, a number of different and important problems may be cast into a Wiener filtering framework. Some of the problems that will be considered in this chapter include:

  • Filtering: estimate $d(n)$ from $x(n)=d(n)+v(n)$.
  • Prediction: estimate $x(n+\alpha)$ from $x(n), x(n-1), x(n-2), \ldots$
  • Deconvolution: estimate $d(n)$ from $x(n)=g(n)*d(n)+v(n)$.
  • Noise cancellation: estimate $v_{1}(n)$ from $v_{2}(n)$ and subtract it from $x(n)=d(n)+v_{1}(n)$.

The FIR Wiener Filter

In this section we consider the design of an FIR Wiener filter that produces the minimum mean-square estimate of a given process $d(n)$ by filtering a set of observations of a statistically related process $x(n)$.

Note that no structural assumption is made about $x(n)$; it can be any process that is statistically related to $d(n)$.

It is assumed that $x(n)$ and $d(n)$ are jointly wide-sense stationary with known autocorrelations, $r_x(k)$ and $r_d(k)$, and known cross-correlation $r_{dx}(k)$. Denoting the unit sample response of the Wiener filter by $w(n)$, and assuming a $(p-1)$st-order filter, the system function is
$$W(z)=\sum_{n=0}^{p-1} w(n)z^{-n}$$
With $x(n)$ the input to the filter, the output, which we denote by $\hat d(n)$, is the convolution of $w(n)$ with $x(n)$,
$$\hat d(n)=\sum_{l=0}^{p-1}w(l)x(n-l) \tag{FWF.1}$$
The Wiener filter design problem requires that we find the filter coefficients, $w(k)$, that minimize the mean-square error
$$\xi=E\left\{|e(n)|^2\right\}=E\left\{|d(n)-\hat d(n)|^2\right\} \tag{FWF.2}$$
In order for a set of filter coefficients to minimize $\xi$ it is necessary and sufficient that the derivative of $\xi$ with respect to $w^*(k)$ be equal to zero for $k=0,1,\cdots,p-1$,
$$\frac{\partial \xi}{\partial w^*(k)}=\frac{\partial}{\partial w^*(k)}E\{e(n)e^*(n)\}=E\left\{e(n)\frac{\partial e^*(n)}{\partial w^*(k)}\right\}=0 \tag{FWF.3}$$
With
$$e(n)=d(n)-\sum_{l=0}^{p-1} w(l)x(n-l)\tag{FWF.4}$$
it follows that
$$\frac{\partial e^*(n)}{\partial w^*(k)}=-x^*(n-k)$$
and (FWF.3) becomes
$$E\{e(n)x^*(n-k)\}=0;\quad k=0,1,\cdots,p-1\tag{FWF.5}$$
which is known as the orthogonality principle or the projection theorem.

Substituting (FWF.4) into (FWF.5) we have
$$E\{d(n)x^*(n-k)\}-\sum_{l=0}^{p-1}w(l)E\{x(n-l)x^*(n-k)\}=0\tag{FWF.6}$$
Finally, since $x(n)$ and $d(n)$ are jointly WSS,
$$\sum_{l=0}^{p-1} w(l)r_x(k-l)=r_{dx}(k);\quad k=0,1,\cdots,p-1 \tag{FWF.7}$$
In matrix form, using the fact that the autocorrelation sequence is conjugate symmetric, $r_x(k)=r_x^*(-k)$, (FWF.7) becomes
$$\left[\begin{array}{cccc} r_{x}(0) & r_{x}^{*}(1) & \cdots & r_{x}^{*}(p-1) \\ r_{x}(1) & r_{x}(0) & \cdots & r_{x}^{*}(p-2) \\ r_{x}(2) & r_{x}(1) & \cdots & r_{x}^{*}(p-3) \\ \vdots & \vdots & & \vdots \\ r_{x}(p-1) & r_{x}(p-2) & \cdots & r_{x}(0) \end{array}\right]\left[\begin{array}{c} w(0) \\ w(1) \\ w(2) \\ \vdots \\ w(p-1) \end{array}\right]=\left[\begin{array}{c} r_{dx}(0) \\ r_{dx}(1) \\ r_{dx}(2) \\ \vdots \\ r_{dx}(p-1) \end{array}\right]\tag{FWF.8}$$
which is the matrix form of the Wiener-Hopf equations. It may be written more concisely as
$$\mathbf R_x \mathbf w=\mathbf r_{dx}\tag{FWF.9}$$
where $\mathbf R_x=E\{\mathbf x^* \mathbf x^T\}$, $\mathbf r_{dx}=E\{d(n)\mathbf x^*\}$, and $\mathbf x=[x(n),\cdots,x(n-p+1)]^T$.

The minimum mean-square error in the estimate of $d(n)$ may be evaluated from (FWF.2) as follows:
$$\begin{aligned} \xi_{\mathrm{min}}&=E\left\{|e(n)|^2\right\}=E\left\{e(n)\left[d(n)-\sum_{l=0}^{p-1} w(l)x(n-l) \right]^*\right\}\\ &\stackrel{(a)}{=}E\{e(n)d^*(n)\}=E\left\{\left[d(n)-\sum_{l=0}^{p-1} w(l)x(n-l) \right]d^*(n)\right\}\\ &=r_d(0)-\sum_{l=0}^{p-1}w(l)r^*_{dx}(l) \end{aligned} \tag{FWF.10}$$
where $\stackrel{(a)}{=}$ follows from (FWF.5), i.e., the orthogonality principle. In vector notation,
$$\xi_{\mathrm{min}}=r_d(0)-\mathbf r_{dx}^H \mathbf w=r_d(0)-\mathbf r_{dx}^H \mathbf R_x^{-1} \mathbf r_{dx}\tag{FWF.11}$$
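As a concrete illustration, the sketch below (a minimal NumPy example; the second-order statistics are made-up values, not from the text) builds the Toeplitz matrix of (FWF.8), solves the Wiener-Hopf equations (FWF.9), and evaluates the minimum error (FWF.11).

```python
import numpy as np
from scipy.linalg import toeplitz

# Assumed (hypothetical) second-order statistics for a filter of order p - 1 = 2
r_x  = np.array([1.0, 0.5, 0.25])   # r_x(0), r_x(1), r_x(2)
r_dx = np.array([0.9, 0.45, 0.2])   # r_dx(0), r_dx(1), r_dx(2)
r_d0 = 1.0                          # r_d(0) = E{|d(n)|^2}

# Wiener-Hopf equations (FWF.9): R_x w = r_dx, with R_x Hermitian Toeplitz
R_x = toeplitz(r_x)
w = np.linalg.solve(R_x, r_dx)

# Minimum mean-square error (FWF.11): xi_min = r_d(0) - r_dx^H w
xi_min = r_d0 - np.vdot(r_dx, w).real

print("Wiener filter coefficients:", w)
print("Minimum MSE:", xi_min)
```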



Filtering

In the filtering problem, a signal $d(n)$ is to be estimated from a noise-corrupted observation
$$x(n)=d(n)+v(n) \quad\text{or}\quad \mathbf x=\mathbf d+\mathbf v$$
where $\mathbf x=[x(n),\cdots,x(n-p+1)]^T$, $\mathbf d=[d(n),\cdots,d(n-p+1)]^T$, and $\mathbf v=[v(n),\cdots,v(n-p+1)]^T$.

It will be assumed that the noise $v(n)$ has zero mean and that it is uncorrelated with $d(n)$. Therefore, $E\{d(n)v^*(n-k)\} = 0$, and $\mathbf R_x$ and $\mathbf r_{dx}$ become
$$\begin{aligned} \mathbf R_x&=E\{\mathbf x^* \mathbf x^T\}=E\{(\mathbf d+\mathbf v)^* (\mathbf d+\mathbf v)^T\}=E\{\mathbf d^* \mathbf d^T\}+E\{\mathbf v^* \mathbf v^T\}=\mathbf R_d+\mathbf R_v\\ \mathbf r_{dx}&=E\{d(n)\mathbf x^*\}=E\{d(n)\mathbf d^*\}=\mathbf r_d \end{aligned}$$
The Wiener-Hopf equations then become
$$(\mathbf R_d+\mathbf R_v)\mathbf w=\mathbf r_d \tag{FWF.12}$$
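A minimal sketch of this filtering case, assuming a hypothetical AR(1) signal in white noise with known statistics (the numbers are illustrative, not from the text): it forms $\mathbf R_d+\mathbf R_v$ and solves (FWF.12).

```python
import numpy as np
from scipy.linalg import toeplitz

p = 4
a, sigma_v2 = 0.8, 0.5                 # assumed AR(1) parameter and white-noise variance
k = np.arange(p)
r_d = a**k / (1.0 - a**2)              # assumed autocorrelation of the desired signal d(n)

R_d = toeplitz(r_d)                    # signal autocorrelation matrix
R_v = sigma_v2 * np.eye(p)             # white observation noise => diagonal R_v

# (FWF.12): (R_d + R_v) w = r_d   (here r_dx = r_d since v(n) is uncorrelated with d(n))
w = np.linalg.solve(R_d + R_v, r_d)
xi_min = r_d[0] - r_d @ w              # (FWF.11) specialized to real-valued data
print(w, xi_min)
```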


Prediction

Noise-free observations

We consider the following data model ($\alpha$-step prediction):
$$d(n)=x(n+\alpha),\qquad \hat x(n+\alpha)=\sum_{k=0}^{p-1}w(k)x(n-k)$$
This results in the following expression for $\mathbf r_{dx}$:
$$\mathbf r_{dx}=E\{d(n)\mathbf x^*\}=E\{x(n+\alpha)\mathbf x^*\}\triangleq \mathbf r_\alpha$$
The Wiener-Hopf equations then become
$$\underbrace{\left[\begin{array}{cccc} r_{x}(0) & r_{x}^{*}(1) & \cdots & r_{x}^{*}(p-1) \\ r_{x}(1) & r_{x}(0) & \cdots & r_{x}^{*}(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ r_{x}(p-1) & r_{x}(p-2) & \cdots & r_{x}(0) \end{array}\right]}_{\mathbf{R}_{x}} \underbrace{\left[\begin{array}{c} w(0) \\ w(1) \\ \vdots \\ w(p-1) \end{array}\right]}_{\mathbf{w}}= \underbrace{\left[\begin{array}{c} r_{x}(\alpha) \\ r_{x}(\alpha+1) \\ \vdots \\ r_{x}(\alpha+p-1) \end{array}\right]}_{\mathbf{r}_{\alpha}}\tag{FWF.13}$$
For $\alpha=1$, this is similar to all-pole modeling using Prony's method, the autocorrelation method, or the Yule-Walker method (the minus sign simply changes the sign of the coefficients).
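For instance, a one-step predictor ($\alpha=1$) under an assumed AR(1) autocorrelation model follows directly from (FWF.13); a minimal sketch:

```python
import numpy as np
from scipy.linalg import toeplitz

p, alpha = 3, 1
a = 0.9                                               # assumed AR(1) parameter, r_x(k) = a^|k|
r = lambda k: a**abs(k)

R_x = toeplitz([r(k) for k in range(p)])              # left-hand side of (FWF.13)
r_alpha = np.array([r(alpha + k) for k in range(p)])  # right-hand side of (FWF.13)

w = np.linalg.solve(R_x, r_alpha)
print(w)   # for an AR(1) process the optimal one-step predictor is w = [a, 0, 0]
```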

Observations with noise

We consider the following data model ($\alpha$-step prediction):
$$y(n)=x(n)+v(n) \quad\text{or}\quad \mathbf{y}=\mathbf{x}+\mathbf{v}, \qquad d(n)=x(n+\alpha)$$
where $\mathbf x=[x(n),\cdots,x(n-p+1)]^T$, $\mathbf{y}=[y(n), \ldots, y(n-p+1)]^{T}$ and $\mathbf{v}=[v(n), \ldots, v(n-p+1)]^{T}$.
This results in the following expressions for $\mathbf{R}_{y}$ and $\mathbf{r}_{dy}$ (with the noise again zero mean and uncorrelated with $x(n)$):
$$\begin{aligned} \mathbf{R}_{y}&=E\left\{\mathbf{y}^* \mathbf{y}^{T}\right\}=E\left\{(\mathbf{x}+\mathbf{v})^* (\mathbf{x}+\mathbf{v})^{T}\right\}=E\left\{\mathbf{x}^* \mathbf{x}^{T}\right\}+E\left\{\mathbf{v}^*\mathbf{v}^{T}\right\}=\mathbf{R}_{x}+\mathbf{R}_{v} \\ \mathbf{r}_{dy}&=E\left\{d(n) \mathbf{y}^{*}\right\}=E\left\{x(n+\alpha)\left(\mathbf{x}^{*}+\mathbf{v}^{*}\right)\right\}=E\left\{x(n+\alpha) \mathbf{x}^{*}\right\}=\mathbf{r}_{\alpha} \end{aligned}$$
The Wiener-Hopf equations then become
$$\left(\mathbf{R}_{x}+\mathbf{R}_{v}\right) \mathbf{w}=\mathbf{r}_{\alpha}\tag{FWF.14}$$


Deconvolution

We consider a noisy convolutive model, with an FIR filter $g(n)$ of order $L$:
$$x(n)=g(n) * d(n)+v(n) \quad\text{or}\quad \mathbf{x}=\mathbf{G} \mathbf{d}+\mathbf{v}$$
where $\mathbf{d}_{(p+L)\times 1}=[d(n), \ldots, d(n-p+1), \ldots, d(n-L-p+1)]^{T}$, $\mathbf{v}_{p\times 1}=[v(n), \ldots, v(n-p+1)]^{T}$ and
$$\mathbf{G}_{p\times (p+L)}=\left[\begin{array}{ccccc} g(0) & \cdots & g(L) & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & g(0) & \cdots & g(L) \end{array}\right].$$
This results in the following expressions for $\mathbf{R}_{x}$ and $\mathbf{r}_{dx}$:
$$\begin{aligned} \mathbf{R}_{x}&=E\left\{\mathbf{x}^{*} \mathbf{x}^{T}\right\}=\mathbf{G}^{*} E\left\{\mathbf{d}^{*} \mathbf{d}^{T}\right\} \mathbf{G}^{T}+E\left\{\mathbf{v}^{*} \mathbf{v}^{T}\right\}=\mathbf{G}^{*} \mathbf{R}_{d} \mathbf{G}^{T}+\mathbf{R}_{v} \\ \mathbf{r}_{dx}&=E\left\{d(n) \mathbf{x}^{*}\right\}=\mathbf{G}^{*} E\left\{d(n) \mathbf{d}^{*}\right\}=\mathbf{G}^{*} \mathbf{r}_{d} \end{aligned}$$
The Wiener-Hopf equations then become
$$\left(\mathbf{G}^{*} \mathbf{R}_{d} \mathbf{G}^{T}+\mathbf{R}_{v}\right) \mathbf{w}=\mathbf{G}^{*} \mathbf{r}_{d}\tag{FWF.15}$$
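A minimal sketch of (FWF.15), assuming a made-up blur $g(n)$ and white signal and noise statistics (all values hypothetical); it builds the $p\times(p+L)$ convolution matrix $\mathbf G$ explicitly:

```python
import numpy as np

p, L = 4, 2
g = np.array([1.0, 0.6, 0.3])              # assumed FIR blur of order L = 2
sigma_d2, sigma_v2 = 1.0, 0.1              # assumed white d(n) and white v(n)

# Convolution matrix G of size p x (p + L): row k holds g shifted right by k samples
G = np.zeros((p, p + L))
for k in range(p):
    G[k, k:k + L + 1] = g

R_d = sigma_d2 * np.eye(p + L)             # autocorrelation matrix of d (white)
R_v = sigma_v2 * np.eye(p)                 # autocorrelation matrix of v (white)
r_d = np.zeros(p + L); r_d[0] = sigma_d2   # r_d = E{d(n) d*} for white d(n)

# (FWF.15): (G R_d G^T + R_v) w = G r_d   (real data, so the conjugates drop out)
w = np.linalg.solve(G @ R_d @ G.T + R_v, G @ r_d)
print(w)
```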


Noise cancellation


We consider the same data model as for filtering:
$$x(n)=d(n)+v_{1}(n) \quad\text{or}\quad \mathbf{x}=\mathbf{d}+\mathbf{v}_{1}$$
where $\mathbf{d}=[d(n), \ldots, d(n-p+1)]^{T}$ and $\mathbf{v}_{1}=\left[v_{1}(n), \ldots, v_{1}(n-p+1)\right]^{T}$.
This time we estimate $v_{1}(n)$ from a correlated noise source $v_{2}(n)$, and estimate $d(n)$ as
$$\hat{d}(n)=x(n)-\hat{v}_{1}(n) \quad\text{with}\quad \hat{v}_{1}(n)=\mathbf{w}^{T} \mathbf{v}_{2}$$
where $\mathbf{v}_{2}=\left[v_{2}(n), \ldots, v_{2}(n-p+1)\right]^{T}$.

To estimate $v_{1}(n)$ from $v_{2}(n)$, we start from the Wiener-Hopf equations
$$\mathbf{R}_{v_{2}} \mathbf{w}=\mathbf{r}_{v_{1} v_{2}}$$
Since $\mathbf{r}_{v_{1} v_{2}}$ is generally not known, we use the fact that $d(n)$ and $v_{2}(n)$ are uncorrelated to rewrite it as
$$\mathbf{r}_{v_{1} v_{2}}=E\left\{v_{1}(n) \mathbf{v}_{2}^{*}\right\}=E\left\{\left(d(n)+v_{1}(n)\right) \mathbf{v}_{2}^{*}\right\}=E\left\{x(n) \mathbf{v}_{2}^{*}\right\}=\mathbf{r}_{x v_{2}}$$
and thus the Wiener-Hopf equations can be written as
$$\mathbf{R}_{v_{2}} \mathbf{w}=\mathbf{r}_{x v_{2}}\tag{FWF.16}$$
As already mentioned, $d(n)$ is then estimated as
$$\hat{d}(n)=x(n)-\hat{v}_{1}(n) \quad\text{with}\quad \hat{v}_{1}(n)=\mathbf{w}^{T} \mathbf{v}_{2}$$
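A minimal sketch of this noise canceller on synthetic data, in which $v_1(n)$ is (by assumption) a filtered version of the reference $v_2(n)$; the correlations in (FWF.16) are estimated by sample averages, which presumes ergodicity and is not part of the derivation above.

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(0)
N, p = 5000, 4

# Synthetic data: desired signal d, reference noise v2, and v1 = h * v2 leaking into x(n)
d  = np.sin(2 * np.pi * 0.05 * np.arange(N))
v2 = rng.standard_normal(N)
h  = np.array([0.8, -0.4, 0.2])                 # hypothetical coupling filter
v1 = np.convolve(v2, h, mode="full")[:N]
x  = d + v1

# Estimate R_v2 and r_xv2 by sample averages
r_v2  = np.array([np.mean(v2[k:] * v2[:N - k]) for k in range(p)])
r_xv2 = np.array([np.mean(x[k:]  * v2[:N - k]) for k in range(p)])
R_v2  = toeplitz(r_v2)

# (FWF.16): R_v2 w = r_xv2, then d_hat(n) = x(n) - w^T v2
w = np.linalg.solve(R_v2, r_xv2)
v1_hat = np.convolve(v2, w, mode="full")[:N]
d_hat = x - v1_hat
print("residual noise power:", np.mean((d_hat - d) ** 2))
```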

Discrete Kalman Filter

In the section [The FIR Wiener Filter](# The FIR Wiener Filter) we considered the problem of designing a causal Wiener filter to estimate a process $d(n)$ from a set of noisy observations $x(n)=d(n)+v(n)$. The primary limitation of the solution that was derived is that it requires $d(n)$ and $x(n)$ to be jointly wide-sense stationary processes. Since most processes encountered in practice are nonstationary, this constraint limits the usefulness of the Wiener filter. Therefore, in this section we re-examine the estimation problem within the context of nonstationary processes and derive what is known as the discrete Kalman filter.

Consider the following nonstationary state space model:
$$\begin{array}{l} \mathbf{x}(n)=\mathbf{A}(n-1) \mathbf{x}(n-1)+\mathbf{w}(n) \\ \mathbf{y}(n)=\mathbf{C}(n) \mathbf{x}(n)+\mathbf{v}(n) \end{array}\tag{DKF.1}$$
where

$\mathbf{x}(n)$: the $p \times 1$ state vector

$\mathbf{A}(n-1)$: the $p \times p$ state transition matrix

$\mathbf{w}(n)$: the state noise, with $E\left\{\mathbf{w}(n) \mathbf{w}^{H}(k)\right\}=\mathbf{Q}_{w}(n) \delta(n-k)$

$\mathbf{y}(n)$: the $q \times 1$ observation vector

$\mathbf{C}(n)$: the $q \times p$ observation matrix

$\mathbf{v}(n)$: the observation noise, with $E\left\{\mathbf{v}(n) \mathbf{v}^{H}(k)\right\}=\mathbf{Q}_{v}(n) \delta(n-k)$, independent of the state noise

It is assumed that $\mathbf A(n)$, $\mathbf C(n)$, $\mathbf Q_w(n)$, and $\mathbf Q_v(n)$ are known.

We are going to show that the optimum linear estimate of $\mathbf x(n)$ can be expressed in the form
$$\hat {\mathbf x}(n)=\mathbf A(n-1)\hat{\mathbf x}(n-1)+\mathbf K(n)\big[\mathbf y(n)-\mathbf C(n)\mathbf A(n-1)\hat {\mathbf x}(n-1)\big]\tag{DKF.2}$$
With the appropriate Kalman gain matrix $\mathbf K(n)$, this recursion corresponds to the discrete Kalman filter.

Let us define $\hat{\mathbf{x}}(n | n-1)$ and $\hat{\mathbf{x}}(n | n)$ as the best linear estimates of $\mathbf{x}(n)$ given the observations $\mathbf{y}(n)$ up to time $n-1$ and $n$, respectively,
and let us denote the corresponding errors as
$$\begin{aligned} \mathbf{e}(n | n-1) &=\mathbf{x}(n)-\hat{\mathbf{x}}(n | n-1) \\ \mathbf{e}(n | n) &=\mathbf{x}(n)-\hat{\mathbf{x}}(n | n) \end{aligned}\tag{DKF.3}$$
with covariance matrices
$$\begin{aligned} \mathbf{P}(n| n-1)&=E\left\{\mathbf{e}(n| n-1) \mathbf{e}^{H}(n | n-1)\right\}\\ \mathbf{P}(n | n)&=E\left\{\mathbf{e}(n | n) \mathbf{e}^{H}(n | n)\right\} \end{aligned}\tag{DKF.4}$$
For each $n > 0$, given $\hat {\mathbf x}(n-1|n-1)$ and $\mathbf P(n-1|n-1)$, when a new observation $\mathbf y(n)$ becomes available the problem is to find the minimum mean-square estimate $\hat {\mathbf x}(n|n)$ of the state vector $\mathbf x(n)$, i.e., to solve
$$\min_{\hat {\mathbf x}(n|n)} \mathrm{tr}(\mathbf P(n|n))\tag{DKF.5}$$
The solution to this problem will be derived in two steps.

  • Given $\hat{\mathbf x}(n-1|n-1)$ we will find $\hat {\mathbf x}(n|n-1)$, which is the best estimate of $\mathbf x(n)$ without using the observation $\mathbf y(n)$.
  • Given $\mathbf y(n)$ and $\hat{\mathbf x}(n|n-1)$ we will form the updated estimate $\hat {\mathbf x}(n|n)$.

In the first step, since no new measurements are used to estimate $\mathbf x(n)$, all that is known is that $\mathbf x(n)$ evolves according to the state equation
$$\mathbf{x}(n)=\mathbf{A}(n-1) \mathbf{x}(n-1)+\mathbf{w}(n)$$
Since $\mathbf{w}(n)$ is a zero-mean white noise process (and the values of $\mathbf{w}(n)$ are unknown), we may predict $\mathbf{x}(n)$ as follows,
$$\hat{\mathbf{x}}(n|n-1)=\mathbf{A}(n-1) \hat{\mathbf{x}}(n-1| n-1)\tag{DKF.6}$$
which has an estimation error given by
$$\begin{aligned} \mathbf{e}(n |n-1) &=\mathbf{x}(n)-\hat{\mathbf{x}}(n|n-1) \\ &=\mathbf{A}(n-1) \mathbf{x}(n-1)+\mathbf{w}(n)-\mathbf{A}(n-1) \hat{\mathbf{x}}(n-1 | n-1) \\ &=\mathbf{A}(n-1) \mathbf{e}(n-1 | n-1)+\mathbf{w}(n) \end{aligned}\tag{DKF.7}$$
Since the estimation error $\mathbf{e}(n-1 |n-1)$ is uncorrelated with $\mathbf{w}(n)$ (a consequence of the fact that $\mathbf w(n)$ is a white noise sequence), the error covariance is
$$\mathbf{P}(n |n-1)=\mathbf{A}(n-1) \mathbf{P}(n-1 | n-1) \mathbf{A}^{H}(n-1)+\mathbf{Q}_{w}(n)\tag{DKF.8}$$
In the second step, we incorporate the new measurement $\mathbf y(n)$ into the estimate $\hat {\mathbf x}(n|n-1)$.

A linear estimate of $\mathbf x(n)$ that is based on $\hat {\mathbf x}(n|n-1)$ and $\mathbf y(n)$ is of the form
$$\hat {\mathbf x}(n|n)=\mathbf K'(n)\hat {\mathbf x}(n|n-1)+\mathbf K(n)\mathbf y(n)\tag{DKF.9}$$
where $\mathbf K(n)$ and $\mathbf K'(n)$ are matrices to be specified.

  • Requirement 1: $\hat {\mathbf x}(n|n)$ and $\hat {\mathbf x}(n|n-1)$ are unbiased.
  • Requirement 2: $\hat {\mathbf x}(n|n)$ minimizes the mean-square error $E\|\mathbf e(n|n)\|^2=\mathrm{tr}(\mathbf P(n|n))$.

From Requirement 1, taking the expectation of (DKF.9) and using $E\{\hat{\mathbf x}(n|n)\}=E\{\hat{\mathbf x}(n|n-1)\}=E\{\mathbf x(n)\}$,
$$E\{\mathbf x (n)\}=\mathbf K'(n)E\{\mathbf x (n)\}+\mathbf K(n)\big(\mathbf C(n)E\{\mathbf x (n)\}+E\{\mathbf v (n)\}\big)$$
Since $E\{\mathbf v (n)\}=\mathbf 0$, we must have
$$\mathbf K'(n)=\mathbf I -\mathbf K(n) \mathbf C(n) \tag{DKF.10}$$
Substituting (DKF.10) into (DKF.9), we have
$$\hat {\mathbf x}(n|n)=\hat {\mathbf x}(n|n-1)+\mathbf K(n)\big[\mathbf y(n)-\mathbf C(n)\hat {\mathbf x}(n|n-1) \big]\tag{DKF.11}$$
and the corresponding error is
$$\begin{aligned} \mathbf{e}(n | n) &=\mathbf{K}^{\prime}(n) \mathbf{e}(n| n-1)-\mathbf{K}(n) \mathbf{v}(n) \\ &=[\mathbf{I}-\mathbf{K}(n) \mathbf{C}(n)] \mathbf{e}(n | n-1)-\mathbf{K}(n) \mathbf{v}(n) \end{aligned}\tag{DKF.12}$$
Thus, the error covariance matrix for $\mathbf{e}(n | n)$ is
$$\begin{aligned} \mathbf{P}(n | n) &=E\left\{\mathbf{e}(n | n) \mathbf{e}^{H}(n | n)\right\} \\ &=[\mathbf{I}-\mathbf{K}(n) \mathbf{C}(n)] \mathbf{P}(n | n-1)[\mathbf{I}-\mathbf{K}(n) \mathbf{C}(n)]^{H}+\mathbf{K}(n) \mathbf{Q}_{v}(n) \mathbf{K}^{H}(n) \end{aligned}\tag{DKF.13}$$
Next, we must find the value of the Kalman gain $\mathbf{K}(n)$ that minimizes the mean-square error
$$\xi(n)=\operatorname{tr}\{\mathbf{P}(n | n)\}$$
Differentiating $\xi(n)$ with respect to $\mathbf{K}(n)$ and setting the derivative to zero gives
$$\frac{d}{d \mathbf{K}} \operatorname{tr}\{\mathbf{P}(n |n)\}=-2[\mathbf{I}-\mathbf{K}(n) \mathbf{C}(n)] \mathbf{P}(n | n-1) \mathbf{C}^{H}(n)+2 \mathbf{K}(n) \mathbf{Q}_{v}(n)=0 \tag{DKF.14}$$
Solving for $\mathbf{K}(n)$ gives the desired expression for the Kalman gain,
$$\mathbf{K}(n)=\mathbf{P}(n | n-1) \mathbf{C}^{H}(n)\left[\mathbf{C}(n) \mathbf{P}(n | n-1) \mathbf{C}^{H}(n)+\mathbf{Q}_{v}(n)\right]^{-1}\tag{DKF.15}$$
Having found the Kalman gain matrix, we may simplify the expression given in (DKF.13) for the error covariance. First, we rewrite the expression for $\mathbf{P}(n | n)$ as follows,
$$\begin{aligned} \mathbf{P}(n | n)=&[\mathbf{I}-\mathbf{K}(n) \mathbf{C}(n)] \mathbf{P}(n | n-1) \\ &-\left\{[\mathbf{I}-\mathbf{K}(n) \mathbf{C}(n)] \mathbf{P}(n | n-1) \mathbf{C}^{H}(n)-\mathbf{K}(n) \mathbf{Q}_{v}(n)\right\} \mathbf{K}^{H}(n) \end{aligned}$$
From (DKF.14), however, it follows that the term in braces is equal to zero, which leads to the desired expression for the error covariance matrix
$$\mathbf{P}(n |n)=[\mathbf{I}-\mathbf{K}(n) \mathbf{C}(n)] \mathbf{P}(n | n-1) \tag{DKF.16}$$
Note: another derivation method can be found in the Slides 7c_Kalman
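To make the recursion concrete, here is a minimal sketch of one Kalman filter time step implementing (DKF.6), (DKF.8), (DKF.15), (DKF.11), and (DKF.16); the constant-velocity model used to exercise it is a hypothetical example, not part of the derivation above.

```python
import numpy as np

def kalman_step(x_hat, P, y, A, C, Q_w, Q_v):
    """One discrete Kalman filter recursion: prediction followed by update."""
    # Prediction (DKF.6) and predicted error covariance (DKF.8)
    x_pred = A @ x_hat
    P_pred = A @ P @ A.conj().T + Q_w
    # Kalman gain (DKF.15)
    K = P_pred @ C.conj().T @ np.linalg.inv(C @ P_pred @ C.conj().T + Q_v)
    # Update (DKF.11) and updated error covariance (DKF.16)
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(len(x_hat)) - K @ C) @ P_pred
    return x_new, P_new

# Example use with an assumed constant-velocity model (time-invariant A and C for simplicity)
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # state: [position, velocity]
C = np.array([[1.0, 0.0]])               # only the position is observed
Q_w = 0.01 * np.eye(2)
Q_v = np.array([[0.5]])

rng = np.random.default_rng(0)
x_true = np.array([0.0, 0.1])
x_hat, P = np.zeros(2), np.eye(2)
for n in range(50):
    x_true = A @ x_true + rng.multivariate_normal(np.zeros(2), Q_w)
    y = C @ x_true + rng.normal(0.0, np.sqrt(Q_v[0, 0]), size=1)
    x_hat, P = kalman_step(x_hat, P, y, A, C, Q_w, Q_v)
print("final state estimate:", x_hat)
```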
