Bayesian General Linear Model
The Bayesian linear model can be written as

$$
\boldsymbol y = \boldsymbol{Hx} + \boldsymbol w \tag{1}
$$
where $\boldsymbol y \in \mathbb{R}^{N}$, $\boldsymbol H \in \mathbb{R}^{N \times p}$ is known, $\boldsymbol x \in \mathbb{R}^{p}$ with $\boldsymbol x \sim \mathcal{N}(\boldsymbol\mu_{\boldsymbol x}, \boldsymbol C_{\boldsymbol x})$, and $\boldsymbol w \in \mathbb{R}^{N}$ is a noise vector with $\boldsymbol w \sim \mathcal{N}(\boldsymbol 0, \boldsymbol C_{\boldsymbol w})$, independent of $\boldsymbol x$. Compared with the classical linear model, the Bayesian linear model treats $\boldsymbol x$ as a random vector.
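For concreteness, the model is easy to simulate. The NumPy sketch below uses arbitrary illustrative dimensions and covariances (none of these numbers come from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 5, 3                       # y in R^N, x in R^p (illustrative sizes)
H = rng.standard_normal((N, p))   # known model matrix

mu_x = np.ones(p)                 # prior mean of x
C_x = 0.5 * np.eye(p)             # prior covariance of x
C_w = 0.1 * np.eye(N)             # noise covariance

# draw x ~ N(mu_x, C_x) and w ~ N(0, C_w) independently, then form y = Hx + w
x = rng.multivariate_normal(mu_x, C_x)
w = rng.multivariate_normal(np.zeros(N), C_w)
y = H @ x + w
```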
Consider the joint distribution of $\boldsymbol x$ and $\boldsymbol y$. Let $\boldsymbol z = [\boldsymbol y^T, \boldsymbol x^T]^T$; then

$$
\begin{aligned}
\boldsymbol z &= \begin{bmatrix} \boldsymbol{Hx} + \boldsymbol w \\ \boldsymbol x \end{bmatrix} \\
&= \begin{bmatrix} \boldsymbol H & \boldsymbol I_N \\ \boldsymbol I_p & \boldsymbol 0 \end{bmatrix}
\begin{bmatrix} \boldsymbol x \\ \boldsymbol w \end{bmatrix} \\
&= \boldsymbol A \begin{bmatrix} \boldsymbol x \\ \boldsymbol w \end{bmatrix}
\end{aligned} \tag{2}
$$
Since $\boldsymbol x$ and $\boldsymbol w$ are independent Gaussians, the stacked vector $[\boldsymbol x^T, \boldsymbol w^T]^T$ is jointly Gaussian; and since $\boldsymbol z$ is obtained from $[\boldsymbol x^T, \boldsymbol w^T]^T$ by the linear map $\boldsymbol A$, $\boldsymbol z$ is also Gaussian. We now explain and prove this linear-transformation property of Gaussians using the characteristic function.
A Linear Transformation of a Gaussian Random Vector Is Still Gaussian: Explanation and Proof
Given a random variable $X \sim f_X(x)$, its characteristic function is

$$
\begin{aligned}
\phi_X(w) &= \int f_X(x)\, e^{jwx}\, \mathrm{d}x \\
&= \mathbb{E}\left[ e^{jwX} \right]
\end{aligned}
$$
Assume $\boldsymbol X \in \mathbb{R}^{n}$, $\boldsymbol X \sim \mathcal{N}(\boldsymbol\mu, \boldsymbol\Sigma)$, and $\boldsymbol Y = \boldsymbol{AX} \in \mathbb{R}^{m}$ with $\boldsymbol A \in \mathbb{R}^{m \times n}$.
Since $\boldsymbol A$ need not be square, we cannot analyze the distribution of $\boldsymbol Y$ through a change of variables in the density; instead we use the characteristic function:
$$
\begin{aligned}
\phi_{\boldsymbol Y}(\boldsymbol w) &= \mathbb{E}\left[ \exp(j \boldsymbol w^T \boldsymbol y) \right] \\
&= \mathbb{E}\left[ \exp(j \boldsymbol w^T \boldsymbol{Ax}) \right] \\
&= \mathbb{E}\left[ \exp\!\big(j (\boldsymbol A^T \boldsymbol w)^T \boldsymbol x\big) \right] \\
&= \phi_{\boldsymbol X}(\boldsymbol A^T \boldsymbol w) \\
&= \exp\!\left( j \boldsymbol w^T \boldsymbol A \boldsymbol\mu - \frac{1}{2} \boldsymbol w^T \boldsymbol{A \Sigma A}^T \boldsymbol w \right) \\
\Rightarrow \boldsymbol Y &\sim \mathcal{N}(\boldsymbol{A\mu}, \boldsymbol{A \Sigma A}^T)
\end{aligned} \tag{3}
$$
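The conclusion of (3) can be checked numerically by Monte Carlo: sample $\boldsymbol X$, apply a non-square $\boldsymbol A$, and compare the sample moments of $\boldsymbol Y = \boldsymbol{AX}$ with $\boldsymbol{A\mu}$ and $\boldsymbol{A\Sigma A}^T$. All sizes and the seed below are arbitrary choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 2
mu = np.array([1.0, -1.0, 0.5, 2.0])
L = rng.standard_normal((n, n))
Sigma = L @ L.T + n * np.eye(n)          # a valid (positive definite) covariance
A = rng.standard_normal((m, n))          # non-square linear transform

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A.T                              # samples of Y = A X

emp_mean = Y.mean(axis=0)                # should approach A @ mu
emp_cov = np.cov(Y, rowvar=False)        # should approach A @ Sigma @ A.T
```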
Hence a linear transformation of a Gaussian is still Gaussian. Conversely, we can obtain marginal distributions by constructing $\boldsymbol A$ appropriately; for example, to get the marginal of $x_1$, take $\boldsymbol A = [1, 0, \dots, 0]$, the row that selects the first component. In other words: if the joint distribution is Gaussian, every marginal is Gaussian. The converse does not hold: Gaussian marginals do not imply a Gaussian joint. As a counterexample, construct
$$
f_{X_1,X_2}(x_1,x_2) = \frac{1}{2\pi} \exp\!\left( -\frac{x_1^2 + x_2^2}{2} \right) + K(x_1,x_2)
$$
where

$$
\int K(x_1,x_2)\, \mathrm{d}x_1 = \int K(x_1,x_2)\, \mathrm{d}x_2 = 0
$$
Then the marginals of $x_1$ and $x_2$ are both Gaussian, while the joint is not. One concrete choice is:
$$
f_{X_1,X_2}(x_1,x_2) = \frac{1}{2\pi} \exp\!\left( -\frac{x_1^2 + x_2^2}{2} \right)\left( 1 + \sin x_1 \sin x_2 \right)
$$
This construction has Gaussian marginals, but the joint distribution is clearly not Gaussian.
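The counterexample can be verified numerically: integrating the joint density over one variable (a simple Riemann sum on a fine grid) should recover the standard normal density, while the joint itself differs from the product of its marginals. The grid bounds below are arbitrary but wide enough for the Gaussian tails to be negligible:

```python
import numpy as np

t = np.linspace(-7.0, 7.0, 1401)
dt = t[1] - t[0]
x1, x2 = np.meshgrid(t, t, indexing="ij")

# the non-Gaussian joint density with Gaussian marginals
f = (1.0 / (2 * np.pi)) * np.exp(-(x1**2 + x2**2) / 2) * (1 + np.sin(x1) * np.sin(x2))

# integrate out x2: the sin term is odd in x2, so the marginal of x1 is standard normal
marg_x1 = f.sum(axis=1) * dt
std_normal = np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
```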
Joint and Marginal Gaussian Distributions under the Bayesian Linear Model
Continuing from (1) and (2), independence immediately gives (writing $\boldsymbol\mu$ for $\boldsymbol\mu_{\boldsymbol x}$)

$$
\begin{bmatrix} \boldsymbol x \\ \boldsymbol w \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \boldsymbol\mu \\ \boldsymbol 0 \end{bmatrix},\;
\begin{bmatrix} \boldsymbol C_{\boldsymbol x} & \boldsymbol 0 \\ \boldsymbol 0 & \boldsymbol C_{\boldsymbol w} \end{bmatrix} \right) \tag{5}
$$
Now apply the linear transformation and use the property established in (3):

$$
\boldsymbol z = \begin{bmatrix} \boldsymbol H & \boldsymbol I_N \\ \boldsymbol I_p & \boldsymbol 0 \end{bmatrix}
\begin{bmatrix} \boldsymbol x \\ \boldsymbol w \end{bmatrix}
\sim \mathcal{N}\left(
\begin{bmatrix} \boldsymbol H & \boldsymbol I_N \\ \boldsymbol I_p & \boldsymbol 0 \end{bmatrix}
\begin{bmatrix} \boldsymbol\mu \\ \boldsymbol 0 \end{bmatrix},\;
\begin{bmatrix} \boldsymbol H & \boldsymbol I_N \\ \boldsymbol I_p & \boldsymbol 0 \end{bmatrix}
\begin{bmatrix} \boldsymbol C_{\boldsymbol x} & \boldsymbol 0 \\ \boldsymbol 0 & \boldsymbol C_{\boldsymbol w} \end{bmatrix}
\begin{bmatrix} \boldsymbol H^T & \boldsymbol I_p \\ \boldsymbol I_N & \boldsymbol 0 \end{bmatrix}
\right)
$$
That is,

$$
\begin{bmatrix} \boldsymbol y \\ \boldsymbol x \end{bmatrix} \sim \mathcal{N}\left(
\begin{bmatrix} \boldsymbol H & \boldsymbol I_N \\ \boldsymbol I_p & \boldsymbol 0 \end{bmatrix}
\begin{bmatrix} \boldsymbol\mu \\ \boldsymbol 0 \end{bmatrix},\;
\begin{bmatrix} \boldsymbol H & \boldsymbol I_N \\ \boldsymbol I_p & \boldsymbol 0 \end{bmatrix}
\begin{bmatrix} \boldsymbol C_{\boldsymbol x} & \boldsymbol 0 \\ \boldsymbol 0 & \boldsymbol C_{\boldsymbol w} \end{bmatrix}
\begin{bmatrix} \boldsymbol H^T & \boldsymbol I_p \\ \boldsymbol I_N & \boldsymbol 0 \end{bmatrix}
\right) \tag{6}
$$
Equation (6) is exactly the joint distribution of $\boldsymbol x$ and $\boldsymbol y$; multiplying out the blocks and reordering simplifies it to

$$
\begin{bmatrix} \boldsymbol x \\ \boldsymbol y \end{bmatrix} \sim \mathcal{N}\left(
\begin{bmatrix} \boldsymbol\mu \\ \boldsymbol{H\mu} \end{bmatrix},\;
\begin{bmatrix} \boldsymbol C_{\boldsymbol x} & \boldsymbol C_{\boldsymbol x}\boldsymbol H^T \\ \boldsymbol H\boldsymbol C_{\boldsymbol x} & \boldsymbol H\boldsymbol C_{\boldsymbol x}\boldsymbol H^T + \boldsymbol C_{\boldsymbol w} \end{bmatrix}
\right) \tag{7}
$$
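The block structure of (7) can be confirmed by simulation: draw many $(\boldsymbol x, \boldsymbol w)$ pairs, form $\boldsymbol y = \boldsymbol{Hx} + \boldsymbol w$, and compare the sample moments of the stacked vector with the predicted mean and covariance. The dimensions and covariances below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 4, 2
H = rng.standard_normal((N, p))
mu = np.array([1.0, -2.0])
C_x = np.array([[1.0, 0.3], [0.3, 0.5]])
C_w = 0.2 * np.eye(N)

K = 300_000
x = rng.multivariate_normal(mu, C_x, size=K)
w = rng.multivariate_normal(np.zeros(N), C_w, size=K)
y = x @ H.T + w

# stack [x; y] and compare its sample moments with equation (7)
z = np.hstack([x, y])
emp_mean = z.mean(axis=0)
emp_cov = np.cov(z, rowvar=False)

mean_th = np.concatenate([mu, H @ mu])
cov_th = np.block([[C_x,     C_x @ H.T],
                   [H @ C_x, H @ C_x @ H.T + C_w]])
```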
Based on (7), applying the selection transform

$$
\begin{bmatrix} \boldsymbol 0 & \boldsymbol I_N \end{bmatrix}
\begin{bmatrix} \boldsymbol x \\ \boldsymbol y \end{bmatrix} = \boldsymbol y
$$
yields the marginal distribution of $\boldsymbol y$:

$$
\boldsymbol y \sim \mathcal{N}\left( \boldsymbol{H\mu},\; \boldsymbol H\boldsymbol C_{\boldsymbol x}\boldsymbol H^T + \boldsymbol C_{\boldsymbol w} \right) \tag{8}
$$
Conditional Gaussian Distributions under the Bayesian Linear Model
For convenience, write

$$
\begin{bmatrix} \boldsymbol x \\ \boldsymbol y \end{bmatrix} \sim \mathcal{N}\left(
\begin{bmatrix} \mathbb{E}[\boldsymbol x] \\ \mathbb{E}[\boldsymbol y] \end{bmatrix},\; \boldsymbol C
\right)
$$
Then the conditional density $p(\boldsymbol y \mid \boldsymbol x)$ can be written as:

$$
\begin{aligned}
p(\boldsymbol y \mid \boldsymbol x) &= \frac{p(\boldsymbol x, \boldsymbol y)}{p(\boldsymbol x)} \\
&= \frac{\dfrac{1}{(2\pi)^{\frac{N+p}{2}} \det^{\frac{1}{2}}(\boldsymbol C)}
\exp\!\left[ -\dfrac{1}{2}
\begin{bmatrix} \boldsymbol x - \mathbb{E}[\boldsymbol x] \\ \boldsymbol y - \mathbb{E}[\boldsymbol y] \end{bmatrix}^T
\boldsymbol C^{-1}
\begin{bmatrix} \boldsymbol x - \mathbb{E}[\boldsymbol x] \\ \boldsymbol y - \mathbb{E}[\boldsymbol y] \end{bmatrix}
\right]}
{\dfrac{1}{(2\pi)^{\frac{p}{2}} \det^{\frac{1}{2}}(\boldsymbol C_{\boldsymbol x})}
\exp\!\left[ -\dfrac{1}{2} (\boldsymbol x - \mathbb{E}[\boldsymbol x])^T \boldsymbol C_{\boldsymbol x}^{-1} (\boldsymbol x - \mathbb{E}[\boldsymbol x]) \right]}
\end{aligned}
$$
Partition the covariance matrix into blocks (matching (7)):

$$
\boldsymbol C = \begin{bmatrix} \boldsymbol C_{\boldsymbol x} & \boldsymbol C_{\boldsymbol{xy}} \\ \boldsymbol C_{\boldsymbol{yx}} & \boldsymbol C_{\boldsymbol y} \end{bmatrix}
$$
By the block-determinant factorization:

$$
\det\left( \begin{bmatrix} \boldsymbol A_{11} & \boldsymbol A_{12} \\ \boldsymbol A_{21} & \boldsymbol A_{22} \end{bmatrix} \right)
= \det(\boldsymbol A_{11}) \det\!\left( \boldsymbol A_{22} - \boldsymbol A_{21}\boldsymbol A_{11}^{-1}\boldsymbol A_{12} \right)
$$
Therefore

$$
\begin{aligned}
\det(\boldsymbol C) &= \det(\boldsymbol C_{\boldsymbol x}) \det\!\left( \boldsymbol C_{\boldsymbol y} - \boldsymbol C_{\boldsymbol{yx}}\boldsymbol C_{\boldsymbol x}^{-1}\boldsymbol C_{\boldsymbol{xy}} \right) \\
\Rightarrow \frac{\det(\boldsymbol C)}{\det(\boldsymbol C_{\boldsymbol x})} &= \det\!\left( \boldsymbol C_{\boldsymbol y} - \boldsymbol C_{\boldsymbol{yx}}\boldsymbol C_{\boldsymbol x}^{-1}\boldsymbol C_{\boldsymbol{xy}} \right)
\end{aligned}
$$
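This determinant identity is easy to verify numerically on a random symmetric positive-definite matrix (block sizes below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
p, N = 3, 4
M = rng.standard_normal((p + N, p + N))
C = M @ M.T + (p + N) * np.eye(p + N)   # symmetric positive definite

C_x  = C[:p, :p]
C_xy = C[:p, p:]
C_yx = C[p:, :p]
C_y  = C[p:, p:]

lhs = np.linalg.det(C)
# det(C) = det(C_x) * det(Schur complement of C_x)
rhs = np.linalg.det(C_x) * np.linalg.det(C_y - C_yx @ np.linalg.inv(C_x) @ C_xy)
```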
We can therefore rewrite $p(\boldsymbol y \mid \boldsymbol x)$ as:

$$
p(\boldsymbol y \mid \boldsymbol x) = \frac{1}{(2\pi)^{\frac{N}{2}} \det^{\frac{1}{2}}\!\left( \boldsymbol C_{\boldsymbol y} - \boldsymbol C_{\boldsymbol{yx}}\boldsymbol C_{\boldsymbol x}^{-1}\boldsymbol C_{\boldsymbol{xy}} \right)} \exp\!\left( -\frac{1}{2} Q \right)
$$
where

$$
Q = \begin{bmatrix} \boldsymbol x - \mathbb{E}[\boldsymbol x] \\ \boldsymbol y - \mathbb{E}[\boldsymbol y] \end{bmatrix}^T
\boldsymbol C^{-1}
\begin{bmatrix} \boldsymbol x - \mathbb{E}[\boldsymbol x] \\ \boldsymbol y - \mathbb{E}[\boldsymbol y] \end{bmatrix}
- (\boldsymbol x - \mathbb{E}[\boldsymbol x])^T \boldsymbol C_{\boldsymbol x}^{-1} (\boldsymbol x - \mathbb{E}[\boldsymbol x])
$$
For the symmetric block matrix $\boldsymbol C$, the block inverse formula is:

$$
\begin{bmatrix} \boldsymbol A_{11} & \boldsymbol A_{12} \\ \boldsymbol A_{21} & \boldsymbol A_{22} \end{bmatrix}^{-1}
= \begin{bmatrix}
\left( \boldsymbol A_{11} - \boldsymbol A_{12}\boldsymbol A_{22}^{-1}\boldsymbol A_{21} \right)^{-1} &
-\boldsymbol A_{11}^{-1}\boldsymbol A_{12}\left( \boldsymbol A_{22} - \boldsymbol A_{21}\boldsymbol A_{11}^{-1}\boldsymbol A_{12} \right)^{-1} \\
-\left( \boldsymbol A_{22} - \boldsymbol A_{21}\boldsymbol A_{11}^{-1}\boldsymbol A_{12} \right)^{-1}\boldsymbol A_{21}\boldsymbol A_{11}^{-1} &
\left( \boldsymbol A_{22} - \boldsymbol A_{21}\boldsymbol A_{11}^{-1}\boldsymbol A_{12} \right)^{-1}
\end{bmatrix}
$$
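The block inverse formula can be checked against `np.linalg.inv` on a random positive-definite matrix (block sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
p, N = 2, 3
M = rng.standard_normal((p + N, p + N))
A = M @ M.T + (p + N) * np.eye(p + N)   # symmetric positive definite block matrix

A11, A12 = A[:p, :p], A[:p, p:]
A21, A22 = A[p:, :p], A[p:, p:]
inv = np.linalg.inv

S11 = inv(A11 - A12 @ inv(A22) @ A21)   # inverse Schur complement of A22
S22 = inv(A22 - A21 @ inv(A11) @ A12)   # inverse Schur complement of A11

# assemble the block inverse exactly as in the formula above
block_inv = np.block([[S11,                  -inv(A11) @ A12 @ S22],
                      [-S22 @ A21 @ inv(A11), S22]])
```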
By the matrix inversion lemma,

$$
\left( \boldsymbol A_{11} - \boldsymbol A_{12}\boldsymbol A_{22}^{-1}\boldsymbol A_{21} \right)^{-1}
= \boldsymbol A_{11}^{-1} + \boldsymbol A_{11}^{-1}\boldsymbol A_{12}\left( \boldsymbol A_{22} - \boldsymbol A_{21}\boldsymbol A_{11}^{-1}\boldsymbol A_{12} \right)^{-1}\boldsymbol A_{21}\boldsymbol A_{11}^{-1}
$$
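This form of the inversion lemma admits the same kind of numerical check (random positive-definite blocks, arbitrary sizes):

```python
import numpy as np

rng = np.random.default_rng(5)
p, N = 3, 3
M = rng.standard_normal((p + N, p + N))
A = M @ M.T + (p + N) * np.eye(p + N)   # symmetric positive definite

A11, A12 = A[:p, :p], A[:p, p:]
A21, A22 = A[p:, :p], A[p:, p:]
inv = np.linalg.inv

# left side: inverse of the Schur complement of A22
lhs = inv(A11 - A12 @ inv(A22) @ A21)
# right side: the matrix inversion lemma expansion
rhs = inv(A11) + inv(A11) @ A12 @ inv(A22 - A21 @ inv(A11) @ A12) @ A21 @ inv(A11)
```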
Substituting, we obtain

$$
\boldsymbol C^{-1} = \begin{bmatrix}
\boldsymbol C_{\boldsymbol x}^{-1} + \boldsymbol C_{\boldsymbol x}^{-1}\boldsymbol C_{\boldsymbol{xy}}\boldsymbol B^{-1}\boldsymbol C_{\boldsymbol{yx}}\boldsymbol C_{\boldsymbol x}^{-1} &
-\boldsymbol C_{\boldsymbol x}^{-1}\boldsymbol C_{\boldsymbol{xy}}\boldsymbol B^{-1} \\
-\boldsymbol B^{-1}\boldsymbol C_{\boldsymbol{yx}}\boldsymbol C_{\boldsymbol x}^{-1} &
\boldsymbol B^{-1}
\end{bmatrix}
$$
where

$$
\boldsymbol B = \boldsymbol C_{\boldsymbol y} - \boldsymbol C_{\boldsymbol{yx}}\boldsymbol C_{\boldsymbol x}^{-1}\boldsymbol C_{\boldsymbol{xy}}
$$
Further, $\boldsymbol C^{-1}$ can be factored as:

$$
\boldsymbol C^{-1} =
\begin{bmatrix} \boldsymbol I & -\boldsymbol C_{\boldsymbol x}^{-1}\boldsymbol C_{\boldsymbol{xy}} \\ \boldsymbol 0 & \boldsymbol I \end{bmatrix}
\begin{bmatrix} \boldsymbol C_{\boldsymbol x}^{-1} & \boldsymbol 0 \\ \boldsymbol 0 & \boldsymbol B^{-1} \end{bmatrix}
\begin{bmatrix} \boldsymbol I & \boldsymbol 0 \\ -\boldsymbol C_{\boldsymbol{yx}}\boldsymbol C_{\boldsymbol x}^{-1} & \boldsymbol I \end{bmatrix}
$$
Let $\tilde{\boldsymbol x} = \boldsymbol x - \mathbb{E}[\boldsymbol x]$ and $\tilde{\boldsymbol y} = \boldsymbol y - \mathbb{E}[\boldsymbol y]$. Then

$$
\begin{aligned}
Q &= \begin{bmatrix} \tilde{\boldsymbol x} \\ \tilde{\boldsymbol y} \end{bmatrix}^T
\begin{bmatrix} \boldsymbol I & -\boldsymbol C_{\boldsymbol x}^{-1}\boldsymbol C_{\boldsymbol{xy}} \\ \boldsymbol 0 & \boldsymbol I \end{bmatrix}
\begin{bmatrix} \boldsymbol C_{\boldsymbol x}^{-1} & \boldsymbol 0 \\ \boldsymbol 0 & \boldsymbol B^{-1} \end{bmatrix}
\begin{bmatrix} \boldsymbol I & \boldsymbol 0 \\ -\boldsymbol C_{\boldsymbol{yx}}\boldsymbol C_{\boldsymbol x}^{-1} & \boldsymbol I \end{bmatrix}
\begin{bmatrix} \tilde{\boldsymbol x} \\ \tilde{\boldsymbol y} \end{bmatrix}
- \tilde{\boldsymbol x}^T \boldsymbol C_{\boldsymbol x}^{-1} \tilde{\boldsymbol x} \\
&= \begin{bmatrix} \tilde{\boldsymbol x} \\ \tilde{\boldsymbol y} - \boldsymbol C_{\boldsymbol{yx}}\boldsymbol C_{\boldsymbol x}^{-1}\tilde{\boldsymbol x} \end{bmatrix}^T
\begin{bmatrix} \boldsymbol C_{\boldsymbol x}^{-1} & \boldsymbol 0 \\ \boldsymbol 0 & \boldsymbol B^{-1} \end{bmatrix}
\begin{bmatrix} \tilde{\boldsymbol x} \\ \tilde{\boldsymbol y} - \boldsymbol C_{\boldsymbol{yx}}\boldsymbol C_{\boldsymbol x}^{-1}\tilde{\boldsymbol x} \end{bmatrix}
- \tilde{\boldsymbol x}^T \boldsymbol C_{\boldsymbol x}^{-1} \tilde{\boldsymbol x} \\
&= \left( \tilde{\boldsymbol y} - \boldsymbol C_{\boldsymbol{yx}}\boldsymbol C_{\boldsymbol x}^{-1}\tilde{\boldsymbol x} \right)^T \boldsymbol B^{-1} \left( \tilde{\boldsymbol y} - \boldsymbol C_{\boldsymbol{yx}}\boldsymbol C_{\boldsymbol x}^{-1}\tilde{\boldsymbol x} \right)
\end{aligned}
$$
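The collapse of $Q$ to a single quadratic form in $\boldsymbol B^{-1}$ can be spot-checked numerically at a random point (random positive-definite $\boldsymbol C$, arbitrary block sizes):

```python
import numpy as np

rng = np.random.default_rng(6)
p, N = 2, 3
M = rng.standard_normal((p + N, p + N))
C = M @ M.T + (p + N) * np.eye(p + N)   # symmetric positive definite

C_x, C_xy = C[:p, :p], C[:p, p:]
C_yx, C_y = C[p:, :p], C[p:, p:]
inv = np.linalg.inv

B = C_y - C_yx @ inv(C_x) @ C_xy        # Schur complement of C_x

xt = rng.standard_normal(p)             # a centered x-vector (x - E[x])
yt = rng.standard_normal(N)             # a centered y-vector (y - E[y])
zt = np.concatenate([xt, yt])

# Q as defined: full quadratic form minus the x-only quadratic form
Q_full = zt @ inv(C) @ zt - xt @ inv(C_x) @ xt

# Q after the factorization: a single quadratic form in the residual
r = yt - C_yx @ inv(C_x) @ xt
Q_reduced = r @ inv(B) @ r
```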
Therefore the conditional density $p(\boldsymbol y \mid \boldsymbol x)$ is:

$$
p(\boldsymbol y \mid \boldsymbol x) = \frac{1}{(2\pi)^{\frac{N}{2}} \det^{\frac{1}{2}}\!\left( \boldsymbol C_{\boldsymbol y} - \boldsymbol C_{\boldsymbol{yx}}\boldsymbol C_{\boldsymbol x}^{-1}\boldsymbol C_{\boldsymbol{xy}} \right)}
\exp\!\left( -\frac{1}{2} \left\Vert \left( \boldsymbol C_{\boldsymbol y} - \boldsymbol C_{\boldsymbol{yx}}\boldsymbol C_{\boldsymbol x}^{-1}\boldsymbol C_{\boldsymbol{xy}} \right)^{-\frac{1}{2}} \left( \boldsymbol y - \left( \mathbb{E}[\boldsymbol y] + \boldsymbol C_{\boldsymbol{yx}}\boldsymbol C_{\boldsymbol x}^{-1}(\boldsymbol x - \mathbb{E}[\boldsymbol x]) \right) \right) \right\Vert_2^2 \right)
$$
That is,

$$
\boldsymbol y \mid \boldsymbol x \sim \mathcal{N}\!\left( \mathbb{E}[\boldsymbol y] + \boldsymbol C_{\boldsymbol{yx}}\boldsymbol C_{\boldsymbol x}^{-1}(\boldsymbol x - \mathbb{E}[\boldsymbol x]),\; \boldsymbol C_{\boldsymbol y} - \boldsymbol C_{\boldsymbol{yx}}\boldsymbol C_{\boldsymbol x}^{-1}\boldsymbol C_{\boldsymbol{xy}} \right) \tag{9}
$$
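Equation (9) can be verified directly: at a random point, the ratio $p(\boldsymbol x, \boldsymbol y)/p(\boldsymbol x)$ should equal the Gaussian density with the conditional mean and covariance of (9). The helper `gauss_pdf` below just evaluates the multivariate normal density from its formula (all sizes and the seed are arbitrary):

```python
import numpy as np

def gauss_pdf(v, mean, cov):
    """Multivariate normal density, evaluated directly from the formula."""
    d = v - mean
    k = len(v)
    return np.exp(-0.5 * d @ np.linalg.inv(cov) @ d) / np.sqrt(
        (2 * np.pi) ** k * np.linalg.det(cov))

rng = np.random.default_rng(7)
p, N = 2, 3
M = rng.standard_normal((p + N, p + N))
C = M @ M.T + (p + N) * np.eye(p + N)    # joint covariance, positive definite
mean = rng.standard_normal(p + N)        # stacked [E[x]; E[y]]

C_x, C_xy = C[:p, :p], C[:p, p:]
C_yx, C_y = C[p:, :p], C[p:, p:]
mx, my = mean[:p], mean[p:]

x = rng.standard_normal(p)
y = rng.standard_normal(N)

# p(y | x) computed as joint / marginal ...
ratio = gauss_pdf(np.concatenate([x, y]), mean, C) / gauss_pdf(x, mx, C_x)

# ... versus equation (9)
cond_mean = my + C_yx @ np.linalg.inv(C_x) @ (x - mx)
cond_cov = C_y - C_yx @ np.linalg.inv(C_x) @ C_xy
direct = gauss_pdf(y, cond_mean, cond_cov)
```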
Similarly,

$$
\boldsymbol x \mid \boldsymbol y \sim \mathcal{N}\!\left( \mathbb{E}[\boldsymbol x] + \boldsymbol C_{\boldsymbol{xy}}\boldsymbol C_{\boldsymbol y}^{-1}(\boldsymbol y - \mathbb{E}[\boldsymbol y]),\; \boldsymbol C_{\boldsymbol x} - \boldsymbol C_{\boldsymbol{xy}}\boldsymbol C_{\boldsymbol y}^{-1}\boldsymbol C_{\boldsymbol{yx}} \right) \tag{10}
$$
The Linear Model Combined with the Conditional Gaussian Results
Under the Bayesian linear model, the covariance blocks correspond as:

$$
\boldsymbol C = \begin{bmatrix} \boldsymbol C_{\boldsymbol x} & \boldsymbol C_{\boldsymbol{xy}} \\ \boldsymbol C_{\boldsymbol{yx}} & \boldsymbol C_{\boldsymbol y} \end{bmatrix}
= \begin{bmatrix} \boldsymbol C_{\boldsymbol x} & \boldsymbol C_{\boldsymbol x}\boldsymbol H^T \\ \boldsymbol H\boldsymbol C_{\boldsymbol x} & \boldsymbol H\boldsymbol C_{\boldsymbol x}\boldsymbol H^T + \boldsymbol C_{\boldsymbol w} \end{bmatrix} \tag{11}
$$
(1) Likelihood

The likelihood corresponds to (9). Substituting (11) into (9), we find:

$$
\boldsymbol y \mid \boldsymbol x \sim \mathcal{N}\!\left( \boldsymbol y;\, \boldsymbol{Hx},\, \boldsymbol C_{\boldsymbol w} \right) \tag{12}
$$
Note that this likelihood was derived from the full joint Gaussian distribution; conveniently, it coincides with the likelihood one would intuitively read off the Bayesian linear model $\boldsymbol y = \boldsymbol{Hx} + \boldsymbol w$. If $\boldsymbol x$ and $\boldsymbol w$ are both Gaussian (this is the key premise; for a general distribution of $\boldsymbol x$ I am not yet sure this shortcut is valid — probably not; one would likely have to write out the linear-transformation model (2) and reason through the characteristic function and its inverse transform), we can directly write the joint density

$$
p(\boldsymbol y, \boldsymbol x) = p(\boldsymbol y \mid \boldsymbol x)\, p(\boldsymbol x)
= \mathcal{N}\!\left( \boldsymbol y;\, \boldsymbol{Hx},\, \boldsymbol C_{\boldsymbol w} \right) \cdot \mathcal{N}\!\left( \boldsymbol x;\, \boldsymbol\mu_{\boldsymbol x},\, \boldsymbol C_{\boldsymbol x} \right) \tag{13}
$$
(2) Posterior

The posterior corresponds to (10). Substituting (11) into (10), we find:

$$
\begin{aligned}
\mathbb{E}[\boldsymbol x \mid \boldsymbol y] &= \mathbb{E}[\boldsymbol x] + \boldsymbol C_{\boldsymbol x}\boldsymbol H^T \left( \boldsymbol H\boldsymbol C_{\boldsymbol x}\boldsymbol H^T + \boldsymbol C_{\boldsymbol w} \right)^{-1} (\boldsymbol y - \mathbb{E}[\boldsymbol y]) \\
&= \boldsymbol\mu_{\boldsymbol x} + \boldsymbol C_{\boldsymbol x}\boldsymbol H^T \left( \boldsymbol H\boldsymbol C_{\boldsymbol x}\boldsymbol H^T + \boldsymbol C_{\boldsymbol w} \right)^{-1} (\boldsymbol y - \boldsymbol H\boldsymbol\mu_{\boldsymbol x})
\end{aligned} \tag{14}
$$
with the corresponding covariance

$$
\boldsymbol C_{\boldsymbol x \mid \boldsymbol y} = \boldsymbol C_{\boldsymbol x} - \boldsymbol C_{\boldsymbol x}\boldsymbol H^T \left( \boldsymbol H\boldsymbol C_{\boldsymbol x}\boldsymbol H^T + \boldsymbol C_{\boldsymbol w} \right)^{-1} \boldsymbol H\boldsymbol C_{\boldsymbol x} \tag{15}
$$
Using the matrix inversion lemma

$$
(\boldsymbol E + \boldsymbol B\boldsymbol C\boldsymbol D)^{-1} = \boldsymbol E^{-1} - \boldsymbol E^{-1}\boldsymbol B\left( \boldsymbol C^{-1} + \boldsymbol D\boldsymbol E^{-1}\boldsymbol B \right)^{-1} \boldsymbol D\boldsymbol E^{-1}
$$
and some algebra, (14) and (15) can also be written as:

$$
\begin{aligned}
\mathbb{E}[\boldsymbol x \mid \boldsymbol y] &= \boldsymbol\mu_{\boldsymbol x} + \left( \boldsymbol C_{\boldsymbol x}^{-1} + \boldsymbol H^T\boldsymbol C_{\boldsymbol w}^{-1}\boldsymbol H \right)^{-1} \boldsymbol H^T\boldsymbol C_{\boldsymbol w}^{-1} (\boldsymbol y - \boldsymbol H\boldsymbol\mu_{\boldsymbol x}) \\
\boldsymbol C_{\boldsymbol x \mid \boldsymbol y} &= \left( \boldsymbol C_{\boldsymbol x}^{-1} + \boldsymbol H^T\boldsymbol C_{\boldsymbol w}^{-1}\boldsymbol H \right)^{-1}
\end{aligned} \tag{16}
$$
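The equivalence of the two posterior parameterizations, (14)/(15) and (16), can be confirmed numerically with random positive-definite covariances (all values below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(8)
N, p = 4, 3
H = rng.standard_normal((N, p))
mu_x = rng.standard_normal(p)
Mx = rng.standard_normal((p, p)); C_x = Mx @ Mx.T + p * np.eye(p)
Mw = rng.standard_normal((N, N)); C_w = Mw @ Mw.T + N * np.eye(N)
y = rng.standard_normal(N)
inv = np.linalg.inv

# form (14)/(15): gain through the N x N innovation covariance
G = C_x @ H.T @ inv(H @ C_x @ H.T + C_w)
mean_a = mu_x + G @ (y - H @ mu_x)
cov_a = C_x - G @ H @ C_x

# form (16): information form, via the matrix inversion lemma
P = inv(inv(C_x) + H.T @ inv(C_w) @ H)
mean_b = mu_x + P @ H.T @ inv(C_w) @ (y - H @ mu_x)
cov_b = P
```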
(3) Likelihood → joint distribution → posterior distribution

By Bayes' rule:

$$
\begin{aligned}
p(\boldsymbol x \mid \boldsymbol y) &= \frac{p(\boldsymbol y \mid \boldsymbol x)\, p(\boldsymbol x)}{p(\boldsymbol y)} \\
&= \frac{p(\boldsymbol y \mid \boldsymbol x)\, p(\boldsymbol x)}{\int p(\boldsymbol y \mid \boldsymbol x)\, p(\boldsymbol x)\, \mathrm{d}\boldsymbol x}
\end{aligned}
$$
Since the denominator is a normalization constant (equivalently, $\boldsymbol y$ has been observed, so $p(\boldsymbol y)$ is treated as known), we have:

$$
\begin{aligned}
p(\boldsymbol x \mid \boldsymbol y) &\propto p(\boldsymbol y \mid \boldsymbol x)\, p(\boldsymbol x) \\
&= \mathcal{N}\!\left( \boldsymbol y;\, \boldsymbol{Hx},\, \boldsymbol C_{\boldsymbol w} \right) \cdot \mathcal{N}\!\left( \boldsymbol x;\, \boldsymbol\mu_{\boldsymbol x},\, \boldsymbol C_{\boldsymbol x} \right)
\end{aligned}
$$
From my earlier blog post on the product of two complex Gaussian densities, the product $\mathcal{N}(\boldsymbol y;\, \boldsymbol{Hx},\, \boldsymbol C_{\boldsymbol w}) \cdot \mathcal{N}(\boldsymbol x;\, \boldsymbol\mu_{\boldsymbol x},\, \boldsymbol C_{\boldsymbol x})$, viewed as a density in $\boldsymbol x$, has mean and covariance

$$
\begin{aligned}
\mathbb{E}[\boldsymbol x \mid \boldsymbol y] &= \left( \boldsymbol C_{\boldsymbol x}^{-1} + \boldsymbol H^T\boldsymbol C_{\boldsymbol w}^{-1}\boldsymbol H \right)^{-1} \left( \boldsymbol C_{\boldsymbol x}^{-1}\boldsymbol\mu_{\boldsymbol x} + \boldsymbol H^T\boldsymbol C_{\boldsymbol w}^{-1}\boldsymbol y \right) \\
\boldsymbol C_{\boldsymbol x \mid \boldsymbol y} &= \left( \boldsymbol C_{\boldsymbol x}^{-1} + \boldsymbol H^T\boldsymbol C_{\boldsymbol w}^{-1}\boldsymbol H \right)^{-1}
\end{aligned} \tag{17}
$$
Summary

The Bayesian linear model:

$$
\boldsymbol y = \boldsymbol{Hx} + \boldsymbol w
$$
where $\boldsymbol y \in \mathbb{R}^{N}$, $\boldsymbol H \in \mathbb{R}^{N \times p}$ is known, $\boldsymbol x \in \mathbb{R}^{p}$ with $\boldsymbol x \sim \mathcal{N}(\boldsymbol\mu_{\boldsymbol x}, \boldsymbol C_{\boldsymbol x})$, $\boldsymbol w \in \mathbb{R}^{N}$ is a noise vector with $\boldsymbol w \sim \mathcal{N}(\boldsymbol 0, \boldsymbol C_{\boldsymbol w})$, and $\boldsymbol x$ and $\boldsymbol w$ are independent.
(1) Joint distribution of $\boldsymbol x, \boldsymbol y$

$$
\begin{bmatrix} \boldsymbol x \\ \boldsymbol y \end{bmatrix} \sim \mathcal{N}\left(
\begin{bmatrix} \boldsymbol\mu \\ \boldsymbol{H\mu} \end{bmatrix},\;
\begin{bmatrix} \boldsymbol C_{\boldsymbol x} & \boldsymbol C_{\boldsymbol x}\boldsymbol H^T \\ \boldsymbol H\boldsymbol C_{\boldsymbol x} & \boldsymbol H\boldsymbol C_{\boldsymbol x}\boldsymbol H^T + \boldsymbol C_{\boldsymbol w} \end{bmatrix}
\right)
$$
(2) Marginal distribution of $\boldsymbol y$

$$
\boldsymbol y \sim \mathcal{N}\left( \boldsymbol{H\mu},\; \boldsymbol H\boldsymbol C_{\boldsymbol x}\boldsymbol H^T + \boldsymbol C_{\boldsymbol w} \right)
$$
(3) Likelihood $\boldsymbol y \mid \boldsymbol x$

$$
\boldsymbol y \mid \boldsymbol x \sim \mathcal{N}\!\left( \boldsymbol y;\, \boldsymbol{Hx},\, \boldsymbol C_{\boldsymbol w} \right)
$$
This likelihood is derived from the joint Gaussian distribution and, conveniently, coincides with the likelihood one would intuitively read off the model $\boldsymbol y = \boldsymbol{Hx} + \boldsymbol w$ (with $\boldsymbol x$ and $\boldsymbol w$ both Gaussian as the key premise; for a general distribution of $\boldsymbol x$ I am not yet sure this shortcut is valid — probably not; one would likely have to write out the linear-transformation model (2) and reason through the characteristic function and its inverse transform).
(4) Posterior $\boldsymbol x \mid \boldsymbol y$

$$
\begin{aligned}
\mathbb{E}[\boldsymbol x \mid \boldsymbol y] &\overset{a}{=} \boldsymbol\mu_{\boldsymbol x} + \boldsymbol C_{\boldsymbol x}\boldsymbol H^T \left( \boldsymbol H\boldsymbol C_{\boldsymbol x}\boldsymbol H^T + \boldsymbol C_{\boldsymbol w} \right)^{-1} (\boldsymbol y - \boldsymbol H\boldsymbol\mu_{\boldsymbol x}) \\
&\overset{b}{=} \boldsymbol\mu_{\boldsymbol x} + \left( \boldsymbol C_{\boldsymbol x}^{-1} + \boldsymbol H^T\boldsymbol C_{\boldsymbol w}^{-1}\boldsymbol H \right)^{-1} \boldsymbol H^T\boldsymbol C_{\boldsymbol w}^{-1} (\boldsymbol y - \boldsymbol H\boldsymbol\mu_{\boldsymbol x}) \\
&\overset{c}{=} \left( \boldsymbol C_{\boldsymbol x}^{-1} + \boldsymbol H^T\boldsymbol C_{\boldsymbol w}^{-1}\boldsymbol H \right)^{-1} \left( \boldsymbol C_{\boldsymbol x}^{-1}\boldsymbol\mu_{\boldsymbol x} + \boldsymbol H^T\boldsymbol C_{\boldsymbol w}^{-1}\boldsymbol y \right)
\end{aligned}
$$
Expressions (a), (b), and (c) are equivalent; (a) is probably the form seen most often. They also coincide formally with the LMMSE estimator (an equivalence in form only, unrelated to the LMMSE derivation); and because the posterior is Gaussian, the LMMSE estimator coincides with the MMSE estimator.
The corresponding covariance:

$$
\begin{aligned}
\boldsymbol C_{\boldsymbol x \mid \boldsymbol y} &\overset{a}{=} \boldsymbol C_{\boldsymbol x} - \boldsymbol C_{\boldsymbol x}\boldsymbol H^T \left( \boldsymbol H\boldsymbol C_{\boldsymbol x}\boldsymbol H^T + \boldsymbol C_{\boldsymbol w} \right)^{-1} \boldsymbol H\boldsymbol C_{\boldsymbol x} \\
&\overset{b,c}{=} \left( \boldsymbol C_{\boldsymbol x}^{-1} + \boldsymbol H^T\boldsymbol C_{\boldsymbol w}^{-1}\boldsymbol H \right)^{-1}
\end{aligned}
$$
The labels on the covariance expressions correspond to the labels on $\mathbb{E}[\boldsymbol x \mid \boldsymbol y]$ above.