Before we begin, let us review the connections and differences among orthogonality, uncorrelatedness, and independence.

- Orthogonality
  - Random variables: $\mathcal R(x, y) = \mathbb E[xy]$ is the correlation function; if $\mathcal R(x,y)=0$, then $x$ and $y$ are orthogonal. (This is analogous to an inner product. Note: a zero correlation function means orthogonal, not uncorrelated.)
  - Random processes: $\mathcal R(X(t), Y(t)) = \mathbb E[X(t)Y(t)]$; if $\mathcal R(X(t), Y(t)) = 0$, then $X(t)$ and $Y(t)$ are orthogonal.
- Uncorrelatedness
  - Random variables: if $\mathbb E[xy] = \mathbb E[x]\,\mathbb E[y]$, then $x$ and $y$ are uncorrelated.
  - Random processes: if $\mathbb E[X(t)Y(t)] = \mathbb E[X(t)]\,\mathbb E[Y(t)]$, then $X(t)$ and $Y(t)$ are uncorrelated.
  - Note: for Gaussian random variables (or Gaussian random processes), uncorrelatedness is equivalent to independence.
- Independence
  - If the joint distribution factors as $p(x,y)=p(x)\,p(y)$, then $x$ and $y$ are independent.
- Covariance, correlation, and independence
  - The covariance is $\text{Cov}(x,y) = \mathbb E\left[(x - \mathbb E[x])(y - \mathbb E[y])\right]$; if $\text{Cov}(x,y) = 0$, then $x$ and $y$ are uncorrelated. (Uncorrelatedness only says the two have no linear relationship; it does not rule out any other relationship.)

The relationships among orthogonality, uncorrelatedness, and independence:

- Independence $\Rightarrow$ uncorrelatedness.
- For Gaussian random variables, independence $\Leftrightarrow$ uncorrelatedness.
- When at least one of the variables has zero mean, uncorrelatedness $\Leftrightarrow$ orthogonality; otherwise the two notions are unrelated.
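A quick numerical illustration of "uncorrelated but not independent" (a minimal NumPy sketch, not from the original text): take $x \sim \mathcal N(0,1)$ and $y = x^2$. The pair is clearly dependent (y is a function of x), yet uncorrelated, since $\mathbb E[xy] - \mathbb E[x]\mathbb E[y] = \mathbb E[x^3] = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)
y = x**2  # y is a deterministic function of x, so clearly dependent on x

# Sample covariance: E[xy] - E[x]E[y] = E[x^3] = 0 for a standard normal,
# so x and y are uncorrelated despite being dependent.
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
print(f"Cov(x, y) ~ {cov_xy:.4f}")  # close to 0
```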
Kalman filter: scalar form

Consider the scalar state equation and the scalar observation equation:

$$s[n] = a\, s[n-1] + u[n] \tag{1}$$

$$x[n] = s[n] + w[n] \tag{2}$$
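To make the model in (1)–(2) concrete, here is a minimal simulation sketch (the parameter values $a=0.9$, $\sigma_u^2=1$, $\sigma_n^2=2$ are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
a, var_u, var_w = 0.9, 1.0, 2.0  # assumed illustrative values
N = 500

s = np.zeros(N)                   # state: s[n] = a s[n-1] + u[n]
s_prev = rng.normal(0.0, 1.0)     # s[-1] ~ N(0, sigma_s^2), with mu_s = 0
for n in range(N):
    s[n] = a * s_prev + rng.normal(0.0, np.sqrt(var_u))
    s_prev = s[n]

x = s + rng.normal(0.0, np.sqrt(var_w), size=N)  # observation: x[n] = s[n] + w[n]
print(x[:5])
```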
where we assume $s[-1] \sim \mathcal{N}(\mu_s,\sigma_s^2)$; $u[n]$ is zero-mean Gaussian noise with $\mathbb{E}[u^2[n]]=\sigma_u^2$, and the samples $\{u[n]\}$ are mutually independent; $w[n]$ is zero-mean Gaussian noise with $\mathbb{E}[w^2[n]]=\sigma_n^2$, and the samples $\{w[n]\}$ are mutually independent. To simplify the derivation, we assume $\mu_s=0$. We want to estimate $s[n]$ from the observations $\{x[0],x[1],\cdots,x[n]\}$. We denote the estimator of $s[n]$ based on $\{x[0],x[1],\cdots,x[m]\}$ by $\hat{s}[n|m]$. Our criterion of optimality is to minimize the Bayesian MSE,

$$\mathbb{E}\left[(s[n] - \hat{s}[n|n])^2\right]$$

where the expectation is taken with respect to the joint probability density function $p(x[0],x[1],\cdots,x[n],s[n])$. (This is the point that distinguishes it from the classical MSE. The two differ in how $s[n]$ is viewed: the classical MSE treats $s[n]$ as an unknown deterministic parameter, so its expectation is taken with respect to $p(x[0],x[1],\cdots,x[n];s[n])$, whereas the Bayesian MSE treats $s[n]$ as a random variable.)
The MMSE estimator is the posterior mean:

$$\hat{s}[n|n] = \mathbb{E}\left[s[n] \mid x[0],x[1],\cdots,x[n]\right] \tag{3}$$
Let $\theta=s[n]$ and $\boldsymbol{x} = [x[0],x[1],\cdots,x[n]]^T$. Since $\theta$ and $\boldsymbol{x}$ are jointly Gaussian, we have

$$\hat{s}[n|n] = \boldsymbol{C}_{\theta x} \boldsymbol{C}^{-1}_{xx} \boldsymbol{x} \tag{4}$$

Because all of our statistical assumptions are Gaussian, the MMSE estimator is linear, and hence coincides with the LMMSE estimator.
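As a hedged numerical check of equation (4), consider an illustrative toy model (not from the text): $\theta \sim \mathcal N(0,1)$ and three observations $x_i = \theta + v_i$ with independent $v_i \sim \mathcal N(0, r)$. The linear estimator built from the model covariances should attain the theoretical minimum Bayesian MSE $C_{\theta\theta} - \boldsymbol C_{\theta x}\boldsymbol C_{xx}^{-1}\boldsymbol C_{x\theta}$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, r = 200_000, 1.0                         # assumed: 3 observations, Var(v_i) = r
theta = rng.standard_normal(N)              # theta ~ N(0, 1)
x = theta[:, None] + rng.normal(0.0, np.sqrt(r), size=(N, 3))

# Covariances implied by the model (computed analytically here)
C_theta_x = np.ones(3)                      # Cov(theta, x_i) = 1
C_xx = np.ones((3, 3)) + r * np.eye(3)      # Cov(x_i, x_j) = 1 + r * delta_ij

# Equation (4): theta_hat = C_theta_x C_xx^{-1} x  (zero means assumed)
w = np.linalg.solve(C_xx, C_theta_x)        # weights C_xx^{-1} C_x_theta
theta_hat = x @ w

mse = np.mean((theta - theta_hat) ** 2)
bmse = 1.0 - C_theta_x @ w                  # theoretical minimum Bayesian MSE
print(f"empirical MSE = {mse:.4f}, theoretical Bmse = {bmse:.4f}")
```

For $r = 1$ the theoretical minimum Bayesian MSE works out to $1/4$, and the empirical MSE of the linear estimator matches it closely.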
Regarding the MMSE estimator of $\theta$, we state two properties:

- Property 1: given two uncorrelated, zero-mean data vectors $\boldsymbol{x}_1,\boldsymbol{x}_2$ that are jointly Gaussian with $\theta$,
$$\begin{aligned} \hat{\theta} &= \mathbb{E}\left[\theta \mid \boldsymbol{x}_1,\boldsymbol{x}_2\right] \\ &= \mathbb{E}\left[\theta \mid \boldsymbol{x}_1\right] + \mathbb{E}\left[\theta \mid \boldsymbol{x}_2\right] \end{aligned}$$
We give two justifications of this property, as follows.

Justification 1: since $\boldsymbol{x} = [\boldsymbol{x}_1^T, \boldsymbol{x}_2^T]^T$ is Gaussian,
$$\begin{aligned} \hat{\theta} = \mathbb{E}[\theta|\boldsymbol x] &= \mathbb{E}[\theta] + \boldsymbol{C}_{\theta x} \boldsymbol{C}^{-1}_{xx} (\boldsymbol x - \mathbb{E}[\boldsymbol x]) \\ &= \boldsymbol{C}_{\theta x} \boldsymbol{C}^{-1}_{xx} \boldsymbol x \end{aligned}$$
where we assume $\mathbb{E}[\theta]=0$ and $\mathbb{E}[\boldsymbol x]=\boldsymbol 0$. This assumption is reasonable, because we can always subtract the means before processing.

Since $\boldsymbol{x}_1,\boldsymbol{x}_2$ are uncorrelated and $\mathbb{E}[\boldsymbol x_1]=\mathbb{E}[\boldsymbol x_2]=\boldsymbol{0}$, we have $\mathbb{E}[\boldsymbol x_1 \boldsymbol{x}^T_2] = \mathbb{E}[\boldsymbol x_1]\,\mathbb{E}[\boldsymbol{x}^T_2]=\boldsymbol{0}$, and therefore

$$\boldsymbol{C}_{xx}^{-1}=\begin{bmatrix} \boldsymbol{C}_{x_1 x_1} & \boldsymbol{C}_{x_1 x_2}\\ \boldsymbol{C}_{x_2 x_1} & \boldsymbol{C}_{x_2 x_2} \end{bmatrix}^{-1} = \begin{bmatrix} \boldsymbol{C}_{x_1 x_1} & \boldsymbol{0}\\ \boldsymbol{0} & \boldsymbol{C}_{x_2 x_2} \end{bmatrix}^{-1} = \begin{bmatrix} \boldsymbol{C}_{x_1 x_1}^{-1} & \boldsymbol{0}\\ \boldsymbol{0} & \boldsymbol{C}_{x_2 x_2}^{-1} \end{bmatrix}$$

Moreover,

$$\boldsymbol{C}_{\theta x} = \mathbb{E}\left[\theta \begin{bmatrix} \boldsymbol{x}_1\\ \boldsymbol{x}_2 \end{bmatrix}^T\right] = \begin{bmatrix} \boldsymbol{C}_{\theta x_1} & \boldsymbol{C}_{\theta x_2} \end{bmatrix}$$

Therefore,

$$\begin{aligned} \hat{\theta} &= \begin{bmatrix} \boldsymbol{C}_{\theta x_1} & \boldsymbol{C}_{\theta x_2} \end{bmatrix} \begin{bmatrix} \boldsymbol{C}_{x_1 x_1}^{-1} & \boldsymbol{0}\\ \boldsymbol{0} & \boldsymbol{C}_{x_2 x_2}^{-1} \end{bmatrix} \begin{bmatrix} \boldsymbol{x}_1\\ \boldsymbol{x}_2 \end{bmatrix} \\ &= \boldsymbol{C}_{\theta x_1} \boldsymbol{C}_{x_1 x_1}^{-1} \boldsymbol x_1 + \boldsymbol{C}_{\theta x_2} \boldsymbol{C}_{x_2 x_2}^{-1} \boldsymbol x_2 \\ &= \mathbb{E}\left[\theta \mid \boldsymbol{x}_1\right] + \mathbb{E}\left[\theta \mid \boldsymbol{x}_2\right] \end{aligned}$$

Justification 2: the linear-space view is more intuitive. Since $\mathbb{E}[\boldsymbol x_1 \boldsymbol{x}^T_2] = \mathbb{E}[\boldsymbol x_1]\,\mathbb{E}[\boldsymbol{x}^T_2] = \boldsymbol{0}$, we know $\boldsymbol{x}_1$ and $\boldsymbol{x}_2$ are mutually orthogonal, so the estimate decomposes into the sum of the estimates from each vector separately.
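Property 1 can be checked deterministically with the block-matrix algebra above. In this sketch the covariance values are illustrative assumptions; the point is only that, when $\boldsymbol C_{xx}$ is block diagonal, the joint estimator equals the sum of the two separate estimators.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed illustrative dimensions: x1 has 2 entries, x2 has 3 entries.
C11 = np.array([[2.0, 0.5], [0.5, 1.0]])    # C_{x1 x1}
C22 = np.diag([1.0, 2.0, 3.0])              # C_{x2 x2}
C_t1 = np.array([0.3, -0.2])                # C_{theta x1}
C_t2 = np.array([0.1, 0.4, -0.5])           # C_{theta x2}

# Uncorrelated blocks: C_xx is block diagonal
C_xx = np.block([[C11, np.zeros((2, 3))], [np.zeros((3, 2)), C22]])
C_tx = np.concatenate([C_t1, C_t2])

x = rng.standard_normal(5)                  # any realization [x1; x2]
joint = C_tx @ np.linalg.solve(C_xx, x)     # E[theta | x1, x2]
split = C_t1 @ np.linalg.solve(C11, x[:2]) + C_t2 @ np.linalg.solve(C22, x[2:])
print(joint, split)                         # the two agree
```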
- Property 2: the MMSE estimator is additive: if $\theta = \theta_1 + \theta_2$, then
$$\hat{\theta} = \mathbb{E}[\theta|\boldsymbol x] = \mathbb{E}[\theta_1+\theta_2|\boldsymbol x] = \mathbb{E}[\theta_1|\boldsymbol x] + \mathbb{E}[\theta_2|\boldsymbol x]$$
在描述完两个性质后,我们令
X
[
n
]
=
[
x
[
0
]
,
x
[
1
]
,
⋯
,
x
[
n
]
]
T
\boldsymbol{ X}[n] = [x[0],x[1],\cdots,x[n]]^T
X[n]=[x[0],x[1],⋯,x[n]]T,令
x
~
[
n
]
\tilde{x}[n]
x~[n]为innovation(The innovation is the part of
x
[
n
]
x[n]
x[n] that is uncorrelated with the previous samples
{
x
[
0
]
,
⋯
,
x
[
n
−
1
]
}
\{x[0],\cdots,x[n-1]\}
{x[0],⋯,x[n−1]}):
x
~
[
n
]
=
x
[
n
]
−
x
^
[
n
∣
n
−
1
]
(5)
\tilde {x}[n] = x[n] - \hat{x}[n|n-1] \tag{5}
x~[n]=x[n]−x^[n∣n−1](5)
Let me emphasize why $\tilde{x}[n]$ is uncorrelated with $\{x[0],\cdots,x[n-1]\}$: $\hat{x}[n|n-1]$ is the MMSE estimate of $x[n]$ based on the observations $\{x[0],\cdots,x[n-1]\}$, and by the orthogonality principle the estimation error $\tilde{x}[n]$ is orthogonal to any linear combination of the observations (here, the observations themselves). Hence $\tilde{x}[n]$ is uncorrelated with $\{x[0],\cdots,x[n-1]\}$. In fact, the pair $\boldsymbol{X}[n-1]$ and $\tilde{x}[n]$ is equivalent to the set $\{x[0],\cdots,x[n-1],x[n]\}$, i.e. $\boldsymbol{X}[n]$, because $x[n]$ can be recovered as:

$$\begin{aligned} x[n] &= \tilde{x}[n] + \hat{x}[n|n-1] \\ &= \tilde{x}[n] + \sum_{k=0}^{n-1} a_k x[k] \end{aligned}$$
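A hedged numerical sketch of the innovation idea (the $3\times 3$ covariance below is an illustrative assumption): for a zero-mean jointly Gaussian vector with covariance $C = LL^T$ (Cholesky factor $L$), the innovations are $\tilde x[n] = L_{nn} e[n]$ with $e = L^{-1}\boldsymbol x$, i.e. each sample minus its LMMSE prediction from the past, and they are empirically uncorrelated with the earlier samples.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed illustrative covariance of X = [x[0], x[1], x[2]]^T (zero mean)
C = np.array([[2.0, 1.0, 0.5],
              [1.0, 2.0, 1.0],
              [0.5, 1.0, 2.0]])

L = np.linalg.cholesky(C)                 # C = L L^T
X = rng.multivariate_normal(np.zeros(3), C, size=200_000)

# e = L^{-1} X is white; the innovation of x[n] is L[n, n] * e[n],
# i.e. x[n] minus its LMMSE prediction from x[0..n-1].
e = np.linalg.solve(L, X.T).T
innov = e * np.diag(L)

# Check: the last innovation is (empirically) uncorrelated with past samples
c0 = np.mean(innov[:, 2] * X[:, 0])
c1 = np.mean(innov[:, 2] * X[:, 1])
print(f"E[x~[2] x[0]] ~ {c0:.4f}, E[x~[2] x[1]] ~ {c1:.4f}")
```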
where the $a_k$ are the coefficients of the corresponding MMSE estimator. We can therefore rewrite equation (3) as:

$$\hat{s}[n|n] = \mathbb{E}\left[s[n] \mid \boldsymbol{X}[n-1], \tilde{x}[n]\right]$$
Since $\boldsymbol{X}[n-1]$ and $\tilde{x}[n]$ are uncorrelated, Property 1 gives:

$$\hat{s}[n|n] = \mathbb{E}\left[s[n] \mid \boldsymbol{X}[n-1]\right] + \mathbb{E}\left[s[n] \mid \tilde{x}[n]\right]$$
Here, $\mathbb{E}[s[n]|\boldsymbol{X}[n-1]]$ is the prediction of $s[n]$ from the previous observations; denote it by $\hat{s}[n|n-1]$. Using equation (1) and Property 2, we further obtain:
$$\begin{aligned} \hat{s}[n|n-1] &= \mathbb{E}\left[s[n] \mid \boldsymbol{X}[n-1]\right] \\ &= \mathbb{E}\left[a s[n-1] + u[n] \mid \boldsymbol{X}[n-1]\right] \\ &= a\,\mathbb{E}\left[s[n-1] \mid \boldsymbol{X}[n-1]\right] \\ &= a\,\hat{s}[n-1|n-1] \end{aligned}$$
Here we used $\mathbb{E}\left[u[n] \mid \boldsymbol{X}[n-1]\right] = \mathbb{E}[u[n]] = 0$, which holds because $u[n]$ is independent of $\{x[0],\cdots,x[n-1]\}$. (This independence follows from two facts: first, $u[n]$ is independent of all of the $w[n]$; second, $s[0],s[1],\cdots,s[n-1]$ are linear combinations of the random variables $\{u[0],u[1],\cdots,u[n-1],s[-1]\}$, all of which are independent of $u[n]$.) We now have
$$\hat{s}[n|n] = \hat{s}[n|n-1] + \mathbb{E}\left[s[n] \mid \tilde{x}[n]\right] \tag{6}$$
where $\hat{s}[n|n-1] = a\,\hat{s}[n-1|n-1]$.
Note that $\mathbb{E}\left[s[n] \mid \tilde{x}[n]\right]$ is the MMSE estimate of $s[n]$ based on $\tilde{x}[n]$; by the Gaussian assumptions this estimator is linear, so $\mathbb{E}\left[s[n] \mid \tilde{x}[n]\right]$ can be expressed as:
$$\begin{aligned} \mathbb{E}\left[s[n] \mid \tilde{x}[n]\right] &= K[n]\,\tilde{x}[n] \\ &= K[n]\left(x[n] - \hat{x}[n|n-1]\right) \end{aligned}$$
(Since $s[n]$ has zero mean, there is no "intercept" term here.) The gain is

$$K[n] = \frac{\mathbb{E}\left[s[n]\,\tilde{x}[n]\right]}{\mathbb{E}[\tilde{x}^2[n]]} \tag{7}$$
This is just the MMSE estimator for jointly Gaussian $\theta, x$ specialized to a scalar observation,

$$\hat{\theta} = C_{\theta x} C^{-1}_{xx}\, x = \frac{\mathbb{E}[\theta x]}{\mathbb{E}[x^2]}\, x$$

applied with $\theta = s[n]$ and $x = \tilde{x}[n]$.
Also, from the scalar observation equation $x[n] = s[n] + w[n]$ and Property 2 we obtain

$$\begin{aligned} \hat{x}[n|n-1] &= \hat{s}[n|n-1] + \hat{w}[n|n-1] \\ &= \hat{s}[n|n-1] \end{aligned}$$

since $w[n]$ is zero-mean and uncorrelated with the past observations, so $\hat{w}[n|n-1] = 0$.
Substituting into equation (6), we obtain

$$\hat{s}[n|n] = \hat{s}[n|n-1] + K[n]\left(x[n] - \hat{s}[n|n-1]\right) \tag{8}$$
where

$$\hat{s}[n|n-1] = a\,\hat{s}[n-1|n-1] \tag{9}$$
Only the gain factor $K[n]$ remains to be determined. From equation (7), we know

$$K[n] = \frac{\mathbb{E}\left[s[n]\left(x[n] - \hat{s}[n|n-1]\right)\right]}{\mathbb{E}\left[\left(x[n] - \hat{s}[n|n-1]\right)^2\right]} \tag{10}$$
To simplify $K[n]$ further, we first state two facts:

- $\mathbb{E}\left[s[n]\left(x[n] - \hat{s}[n|n-1]\right)\right] = \mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1]\right)\left(x[n] - \hat{s}[n|n-1]\right)\right]$
- $\mathbb{E}\left[w[n]\left(s[n] - \hat{s}[n|n-1]\right)\right] = 0$
The first fact holds because

$$\begin{aligned} \tilde{x}[n] &= x[n] - \hat{x}[n|n-1] \\ &= x[n] - \hat{s}[n|n-1] \end{aligned} \tag{11}$$
is uncorrelated with the previous observations $\{x[0],\cdots,x[n-1]\}$, and hence also with $\hat{s}[n|n-1]$ (a linear combination of $\{x[0],\cdots,x[n-1]\}$). Therefore $\mathbb{E}\left[\hat{s}[n|n-1]\left(x[n] - \hat{s}[n|n-1]\right)\right]=0$, which yields the first fact. The second fact is immediate, since $w[n]$ has zero mean and is independent of both $s[n]$ and $\hat{s}[n|n-1]$. Substituting these two facts into equation (10), the gain factor becomes:
$$\begin{aligned} K[n] &= \frac{\mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1]\right)\left(x[n] - \hat{s}[n|n-1]\right)\right]}{\mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1] + w[n]\right)^2\right]} \\ &= \frac{\mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1]\right)^2\right]}{\sigma^2_n + \mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1]\right)^2\right]} \end{aligned} \tag{12}$$
The numerator reduces to a squared term because $x[n] = s[n]+w[n]$, while $w[n]$ is independent of both $s[n]$ and $\hat{s}[n|n-1]$. Also, note that the numerator $\mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1]\right)^2\right]$ is exactly the minimum MSE of the prediction based on the previous observations; denote it by $M[n|n-1]$. Then
$$K[n] = \frac{M[n|n-1]}{\sigma^2_n + M[n|n-1]} \tag{13}$$
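A quick hedged sketch of how the gain in (13) behaves (the numbers are purely illustrative): as the prediction MSE $M[n|n-1]$ grows relative to the measurement noise $\sigma_n^2$, the gain approaches 1 and the filter trusts the new observation; as $\sigma_n^2$ dominates, the gain approaches 0 and the filter trusts the prediction.

```python
def kalman_gain(M_pred: float, var_w: float) -> float:
    """Equation (13): K[n] = M[n|n-1] / (sigma_n^2 + M[n|n-1])."""
    return M_pred / (var_w + M_pred)

# Illustrative values: the gain interpolates between trusting the
# prediction (K -> 0) and trusting the observation (K -> 1).
print(kalman_gain(0.01, 1.0))   # noisy sensor, confident prediction -> small K
print(kalman_gain(100.0, 1.0))  # uncertain prediction -> K near 1
```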
Since $s[n]=as[n-1]+u[n]$ and $\hat{s}[n|n-1] = a\,\hat{s}[n-1|n-1]$, we have
$$\begin{aligned} M[n|n-1] &= \mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1]\right)^2\right] \\ &= \mathbb{E}\left[\left(as[n-1] + u[n] - \hat{s}[n|n-1]\right)^2\right] \\ &= \mathbb{E}\left[\left(a\left(s[n-1] - \hat{s}[n-1|n-1]\right) + u[n]\right)^2\right] \end{aligned}$$
It is easy to see that

$$\mathbb{E}\left[\left(s[n-1] - \hat{s}[n-1|n-1]\right) u[n]\right] = 0$$
Therefore we obtain

$$M[n|n-1] = a^2 M[n-1|n-1] + \sigma^2_u$$
Finally, we need a recursion for $M[n|n]$. Using equation (8), $\hat{s}[n|n] = \hat{s}[n|n-1] + K[n]\left(x[n] - \hat{s}[n|n-1]\right)$, we have
$$\begin{aligned} M[n|n] &= \mathbb{E}\left[\left(s[n] - \hat{s}[n|n]\right)^2\right] \\ &= \mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1] - K[n]\left(x[n] - \hat{s}[n|n-1]\right)\right)^2\right] \\ &= \mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1]\right)^2\right] - 2K[n]\,\mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1]\right)\left(x[n] - \hat{s}[n|n-1]\right)\right] \\ &\quad + K^2[n]\,\mathbb{E}\left[\left(x[n] - \hat{s}[n|n-1]\right)^2\right] \end{aligned}$$
Note that the expectation in the second term is the numerator of $K[n]$ in equation (12), and the expectation in the last term is its denominator, so

$$\mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1]\right)\left(x[n] - \hat{s}[n|n-1]\right)\right] = K[n]\left(M[n|n-1] + \sigma_n^2\right)$$

$$\mathbb{E}\left[\left(x[n] - \hat{s}[n|n-1]\right)^2\right] = \frac{M[n|n-1]}{K[n]}$$
Therefore,

$$\begin{aligned} M[n|n] &= M[n|n-1] - 2K^2[n]\left(M[n|n-1] + \sigma^2_n\right) + K[n]\,M[n|n-1] \\ &= M[n|n-1] - 2K[n]\,M[n|n-1] + K[n]\,M[n|n-1] \\ &= \left(1-K[n]\right) M[n|n-1] \end{aligned}$$

where the second line uses $K[n]\left(M[n|n-1] + \sigma^2_n\right) = M[n|n-1]$, i.e. equation (13) rearranged.
This completes the derivation of the scalar-form Kalman filter, summarized as follows: for all $n \geq 0$,

Prediction:
$$\hat{s}[n|n-1] = a\,\hat{s}[n-1|n-1] \tag{14}$$
Minimum Prediction MSE:
$$M[n|n-1] = a^2 M[n-1|n-1] + \sigma^2_u \tag{15}$$
Kalman Gain:
$$K[n] = \frac{M[n|n-1]}{\sigma^2_n + M[n|n-1]} \tag{16}$$
Correction:
$$\hat{s}[n|n] = \hat{s}[n|n-1] + K[n]\left(x[n] - \hat{s}[n|n-1]\right) \tag{17}$$
Minimum MSE:
$$M[n|n] = \left(1-K[n]\right) M[n|n-1] \tag{18}$$
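The recursion (14)–(18) can be sketched directly in code. This is a minimal illustration (the parameter values $a=0.95$, $\sigma_u^2=0.1$, $\sigma_n^2=1$ and the prior are assumed, not from the text): it simulates the model (1)–(2) and runs the five update equations, and the filtered estimate should have a noticeably smaller MSE than the raw observations.

```python
import numpy as np

def scalar_kalman(x, a, var_u, var_w, mu_s=0.0, var_s=1.0):
    """Scalar Kalman filter, equations (14)-(18).

    x: observations x[0..N-1]; a: state coefficient;
    var_u, var_w: process / measurement noise variances;
    mu_s, var_s: prior mean and variance of s[-1].
    """
    s_hat, M = mu_s, var_s                       # s^[-1|-1], M[-1|-1]
    out = np.empty(len(x))
    for n, xn in enumerate(x):
        s_pred = a * s_hat                       # (14) prediction
        M_pred = a * a * M + var_u               # (15) prediction MSE
        K = M_pred / (var_w + M_pred)            # (16) Kalman gain
        s_hat = s_pred + K * (xn - s_pred)       # (17) correction
        M = (1.0 - K) * M_pred                   # (18) minimum MSE
        out[n] = s_hat
    return out

# Illustrative run with assumed parameters
rng = np.random.default_rng(5)
a, var_u, var_w, N = 0.95, 0.1, 1.0, 2000
s = np.zeros(N)
prev = 0.0
for n in range(N):
    s[n] = a * prev + rng.normal(0.0, np.sqrt(var_u))
    prev = s[n]
x = s + rng.normal(0.0, np.sqrt(var_w), size=N)

s_hat = scalar_kalman(x, a, var_u, var_w)
print(f"raw MSE  = {np.mean((x - s) ** 2):.3f}")
print(f"filt MSE = {np.mean((s_hat - s) ** 2):.3f}")  # noticeably smaller
```

The initialization `mu_s=0.0, var_s=1.0` mirrors the initialization discussed below ($\hat s[-1|-1]=\mu_s$, $M[-1|-1]=\sigma_s^2$); any other prior could be passed in.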
Looking back at the derivation, the zero-mean assumptions (both $\mu_s=0$ and $\mathbb{E}[s[n]]=0$) were made in order to apply the orthogonality principle; but in fact, even when $\mu_s \neq 0$, the resulting equations are identical to (14)–(18). For initialization we use $\hat{s}[-1|-1] = \mathbb{E}[s[-1]] = \mu_s$ and $M[-1|-1] = \sigma^2_s$, since this is all we know before any observation is available. In addition, we can view the gain term as an estimate $\hat{u}[n]$ of $u[n]$, writing

$$\hat{s}[n|n] = a\,\hat{s}[n-1|n-1] + \hat{u}[n]$$

where $\hat{u}[n] = K[n]\left(x[n] - \hat{s}[n|n-1]\right)$. In this sense the correction term can be regarded as an estimate of $u[n]$, so it is reasonable to expect $\hat{s}[n|n] \approx s[n]$.