Suppose $p(\boldsymbol x) \propto \mathcal{CN}(\boldsymbol \mu_1, \boldsymbol \Sigma_1)\,\mathcal{CN}(\boldsymbol \mu_2, \boldsymbol \Sigma_2)$, that is, the density of $\boldsymbol x$ is proportional to the product of two complex Gaussian densities. Then
$$
\begin{aligned}
p(\boldsymbol x) &\propto \exp\left\{ -(\boldsymbol x - \boldsymbol \mu_1)^H \boldsymbol \Sigma_1^{-1} (\boldsymbol x - \boldsymbol \mu_1) - (\boldsymbol x - \boldsymbol \mu_2)^H \boldsymbol \Sigma_2^{-1} (\boldsymbol x - \boldsymbol \mu_2) \right\} \\
&\propto \exp\left\{ -\boldsymbol x^H \boldsymbol \Sigma_1^{-1} \boldsymbol x + 2\mathcal{R}\left\{ \boldsymbol \mu_1^H \boldsymbol \Sigma_1^{-1} \boldsymbol x \right\} - \boldsymbol x^H \boldsymbol \Sigma_2^{-1} \boldsymbol x + 2\mathcal{R}\left\{ \boldsymbol \mu_2^H \boldsymbol \Sigma_2^{-1} \boldsymbol x \right\} \right\} \\
&= \exp\left\{ -\boldsymbol x^H \left( \boldsymbol \Sigma_1^{-1} + \boldsymbol \Sigma_2^{-1} \right) \boldsymbol x + 2\mathcal{R}\left\{ \left( \boldsymbol \mu_1^H \boldsymbol \Sigma_1^{-1} + \boldsymbol \mu_2^H \boldsymbol \Sigma_2^{-1} \right) \boldsymbol x \right\} \right\} \\
&= \exp\left\{ -\left[ \boldsymbol x^H \underbrace{\left( \boldsymbol \Sigma_1^{-1} + \boldsymbol \Sigma_2^{-1} \right)}_{=\boldsymbol \Sigma_x^{-1}} \boldsymbol x - 2\mathcal{R}\left\{ \Big[ \underbrace{\boldsymbol \Sigma_x \left( \boldsymbol \Sigma_1^{-1} \boldsymbol \mu_1 + \boldsymbol \Sigma_2^{-1} \boldsymbol \mu_2 \right)}_{=\boldsymbol \mu_x} \Big]^H \boldsymbol \Sigma_x^{-1} \boldsymbol x \right\} \right] \right\} \\
&= \exp\left\{ -\left[ \boldsymbol x^H \boldsymbol \Sigma_x^{-1} \boldsymbol x - 2\mathcal{R}\left\{ \boldsymbol \mu_x^H \boldsymbol \Sigma_x^{-1} \boldsymbol x \right\} \right] \right\} \\
&\propto \exp\left\{ -(\boldsymbol x - \boldsymbol \mu_x)^H \boldsymbol \Sigma_x^{-1} (\boldsymbol x - \boldsymbol \mu_x) \right\}
\end{aligned}
$$
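Two small facts carry the last lines of this derivation, and it may help to spell them out. The sketch below assumes that $\boldsymbol \Sigma_1$ and $\boldsymbol \Sigma_2$ are Hermitian positive definite (as covariance matrices are), so that $\boldsymbol \Sigma_x = (\boldsymbol \Sigma_1^{-1} + \boldsymbol \Sigma_2^{-1})^{-1}$ is Hermitian as well:

$$
\begin{aligned}
\boldsymbol \mu_x^H \boldsymbol \Sigma_x^{-1} &= \left( \boldsymbol \Sigma_1^{-1} \boldsymbol \mu_1 + \boldsymbol \Sigma_2^{-1} \boldsymbol \mu_2 \right)^H \boldsymbol \Sigma_x^H \boldsymbol \Sigma_x^{-1} = \boldsymbol \mu_1^H \boldsymbol \Sigma_1^{-1} + \boldsymbol \mu_2^H \boldsymbol \Sigma_2^{-1}, \\
(\boldsymbol x - \boldsymbol \mu_x)^H \boldsymbol \Sigma_x^{-1} (\boldsymbol x - \boldsymbol \mu_x) &= \boldsymbol x^H \boldsymbol \Sigma_x^{-1} \boldsymbol x - 2\mathcal{R}\left\{ \boldsymbol \mu_x^H \boldsymbol \Sigma_x^{-1} \boldsymbol x \right\} + \boldsymbol \mu_x^H \boldsymbol \Sigma_x^{-1} \boldsymbol \mu_x .
\end{aligned}
$$

The term $\boldsymbol \mu_x^H \boldsymbol \Sigma_x^{-1} \boldsymbol \mu_x$ does not depend on $\boldsymbol x$, so it is absorbed into the proportionality constant when the square is completed.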
Hence $\boldsymbol x \sim \mathcal{CN}(\boldsymbol \mu_x, \boldsymbol \Sigma_x)$, where
$$
\begin{aligned}
\boldsymbol \mu_x &= \boldsymbol \Sigma_x \left( \boldsymbol \Sigma_1^{-1} \boldsymbol \mu_1 + \boldsymbol \Sigma_2^{-1} \boldsymbol \mu_2 \right) \\
\boldsymbol \Sigma_x &= \left( \boldsymbol \Sigma_1^{-1} + \boldsymbol \Sigma_2^{-1} \right)^{-1}
\end{aligned}
$$
Note that when $\boldsymbol x$ is a scalar $x$, we have $x \sim \mathcal{CN}(\mu_x, \sigma_x^2)$ with
$$
\begin{aligned}
\sigma_x^2 &= \left( \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} \right)^{-1} = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2} \\
\mu_x &= \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2} \left( \frac{\mu_1}{\sigma_1^2} + \frac{\mu_2}{\sigma_2^2} \right)
\end{aligned}
$$
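The scalar formulas can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation: the helper names `cn_pdf` and `fuse_direct` and the test values are illustrative assumptions. It evaluates the product of the two densities at a few points and confirms that its ratio to $\mathcal{CN}(\mu_x, \sigma_x^2)$ does not depend on $x$, which is what proportionality means.

```python
import math

def cn_pdf(x, mu, var):
    """Density of a scalar circularly symmetric complex Gaussian CN(mu, var)."""
    return math.exp(-abs(x - mu) ** 2 / var) / (math.pi * var)

def fuse_direct(mu1, var1, mu2, var2):
    """Mean and variance of the normalized product CN(mu1, var1) * CN(mu2, var2)."""
    var_x = 1.0 / (1.0 / var1 + 1.0 / var2)
    mu_x = var_x * (mu1 / var1 + mu2 / var2)
    return mu_x, var_x

# Illustrative values (assumptions, not taken from the text).
mu1, var1 = 1.0 + 2.0j, 0.5
mu2, var2 = -0.5 + 1.0j, 2.0
mu_x, var_x = fuse_direct(mu1, var1, mu2, var2)

# If the product is proportional to CN(mu_x, var_x), this ratio is constant in x.
points = [0.0 + 0.0j, 1.0 - 1.0j, -2.0 + 0.3j]
ratios = [
    cn_pdf(x, mu1, var1) * cn_pdf(x, mu2, var2) / cn_pdf(x, mu_x, var_x)
    for x in points
]
print("mu_x =", mu_x, "var_x =", var_x)
print("ratios:", ratios)
```

The three printed ratios should agree up to floating-point rounding.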
In practice, however, numerical stability matters when writing code, so the computation is usually carried out in the following order:
$$
\begin{aligned}
g &= \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} \\
\mu_x &= g \cdot (\mu_2 - \mu_1) + \mu_1 \\
\sigma_x^2 &= g \cdot \sigma_2^2 = (1 - g) \cdot \sigma_1^2
\end{aligned}
$$
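The gain-form ordering can be sketched as follows. This is an illustration under stated assumptions: the function names `fuse_gain` and `fuse_direct` are hypothetical, and the variance is computed as `g * var2`, which equals $\sigma_1^2 \sigma_2^2 / (\sigma_1^2 + \sigma_2^2)$. One plausible reason this ordering behaves better is that it never forms the reciprocals $1/\sigma_1^2$ and $1/\sigma_2^2$, so a vanishing variance does not cause a division by zero.

```python
def fuse_gain(mu1, var1, mu2, var2):
    """Gain-form fusion of CN(mu1, var1) and CN(mu2, var2); variances are real, >= 0."""
    g = var1 / (var1 + var2)        # weight between 0 and 1
    mu_x = mu1 + g * (mu2 - mu1)    # move mu1 toward mu2 by the fraction g
    var_x = g * var2                # equals var1 * var2 / (var1 + var2)
    return mu_x, var_x

def fuse_direct(mu1, var1, mu2, var2):
    """Direct form, written with explicit reciprocals of the variances."""
    var_x = 1.0 / (1.0 / var1 + 1.0 / var2)
    return var_x * (mu1 / var1 + mu2 / var2), var_x

# Both forms agree for well-conditioned inputs (illustrative values).
print(fuse_gain(1.0 + 2.0j, 0.5, -0.5 + 1.0j, 2.0))
print(fuse_direct(1.0 + 2.0j, 0.5, -0.5 + 1.0j, 2.0))

# Degenerate case: the first density has zero variance.
print(fuse_gain(1.0 + 2.0j, 0.0, -0.5 + 1.0j, 2.0))
try:
    fuse_direct(1.0 + 2.0j, 0.0, -0.5 + 1.0j, 2.0)
except ZeroDivisionError as exc:
    print("direct form failed:", exc)
```

In the degenerate case the gain form simply returns the first mean with zero variance, while the direct form raises an exception.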
In addition, consider the \textbf{common linear model}:
$$
\boldsymbol y = \boldsymbol{Ax} + \boldsymbol w
$$
where the prior distribution of $\boldsymbol x$ is $\boldsymbol x \sim \mathcal{CN}(\boldsymbol x; \boldsymbol r, \boldsymbol \Sigma_1)$ and the likelihood is $\boldsymbol y \mid \boldsymbol x \sim \mathcal{CN}(\boldsymbol y; \boldsymbol{Ax}, \boldsymbol \Sigma_2)$. The posterior distribution of $\boldsymbol x$ is then
$$
\begin{aligned}
p(\boldsymbol x \mid \boldsymbol y) &\propto \mathcal{CN}(\boldsymbol x; \boldsymbol r, \boldsymbol \Sigma_1) \cdot \mathcal{CN}(\boldsymbol y; \boldsymbol{Ax}, \boldsymbol \Sigma_2) \\
&\propto \exp\left\{ -(\boldsymbol x - \boldsymbol r)^H \boldsymbol \Sigma_1^{-1} (\boldsymbol x - \boldsymbol r) - (\boldsymbol{Ax} - \boldsymbol y)^H \boldsymbol \Sigma_2^{-1} (\boldsymbol{Ax} - \boldsymbol y) \right\} \\
&\propto \exp\left\{ -\boldsymbol x^H \boldsymbol \Sigma_1^{-1} \boldsymbol x + 2\mathcal{R}\left\{ \boldsymbol r^H \boldsymbol \Sigma_1^{-1} \boldsymbol x \right\} - \boldsymbol x^H \boldsymbol A^H \boldsymbol \Sigma_2^{-1} \boldsymbol A \boldsymbol x + 2\mathcal{R}\left\{ \boldsymbol y^H \boldsymbol \Sigma_2^{-1} \boldsymbol A \boldsymbol x \right\} \right\} \\
&= \exp\left\{ -\boldsymbol x^H \left( \boldsymbol \Sigma_1^{-1} + \boldsymbol A^H \boldsymbol \Sigma_2^{-1} \boldsymbol A \right) \boldsymbol x + 2\mathcal{R}\left\{ \left( \boldsymbol r^H \boldsymbol \Sigma_1^{-1} + \boldsymbol y^H \boldsymbol \Sigma_2^{-1} \boldsymbol A \right) \boldsymbol x \right\} \right\} \\
&= \exp\left\{ -\boldsymbol x^H \underbrace{\left( \boldsymbol \Sigma_1^{-1} + \boldsymbol A^H \boldsymbol \Sigma_2^{-1} \boldsymbol A \right)}_{=\boldsymbol \Sigma_x^{-1}} \boldsymbol x + 2\mathcal{R}\left\{ \Big[ \underbrace{\boldsymbol \Sigma_x \left( \boldsymbol \Sigma_1^{-1} \boldsymbol r + \boldsymbol A^H \boldsymbol \Sigma_2^{-1} \boldsymbol y \right)}_{=\boldsymbol \mu_x} \Big]^H \boldsymbol \Sigma_x^{-1} \boldsymbol x \right\} \right\} \\
&\propto \exp\left\{ -(\boldsymbol x - \boldsymbol \mu_x)^H \boldsymbol \Sigma_x^{-1} (\boldsymbol x - \boldsymbol \mu_x) \right\}
\end{aligned}
$$
Therefore the posterior is $\mathcal{CN}(\boldsymbol \mu_x, \boldsymbol \Sigma_x)$ with
$$
\begin{aligned}
\boldsymbol \Sigma_x &= \left( \boldsymbol \Sigma_1^{-1} + \boldsymbol A^H \boldsymbol \Sigma_2^{-1} \boldsymbol A \right)^{-1} \\
\boldsymbol \mu_x &= \left( \boldsymbol \Sigma_1^{-1} + \boldsymbol A^H \boldsymbol \Sigma_2^{-1} \boldsymbol A \right)^{-1} \left( \boldsymbol \Sigma_1^{-1} \boldsymbol r + \boldsymbol A^H \boldsymbol \Sigma_2^{-1} \boldsymbol y \right)
\end{aligned}
$$
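In fact, $\boldsymbol \mu_x = \mathbb E[\boldsymbol x \mid \boldsymbol y]$, so this posterior mean is also the MMSE estimate. Because the estimate is a linear (affine) function of the observation $\boldsymbol y$, it is also called the "LMMSE" estimate.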
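As a sanity check on the posterior formulas, here is a minimal sketch for the scalar special case $y = a x + w$ with $x \sim \mathcal{CN}(r, \sigma_1^2)$ and $w \sim \mathcal{CN}(0, \sigma_2^2)$. The scalar reduction, the function names `cn_pdf` and `lmmse_scalar`, and all numeric values are assumptions made for illustration. The code verifies that prior times likelihood, divided by $\mathcal{CN}(\mu_x, \sigma_x^2)$, is constant in $x$.

```python
import math

def cn_pdf(x, mu, var):
    """Density of a scalar circularly symmetric complex Gaussian CN(mu, var)."""
    return math.exp(-abs(x - mu) ** 2 / var) / (math.pi * var)

def lmmse_scalar(y, a, r, var1, var2):
    """Posterior mean and variance of x for y = a*x + w (scalar case)."""
    # Scalar version of Sigma_x = (Sigma_1^{-1} + A^H Sigma_2^{-1} A)^{-1}
    var_x = 1.0 / (1.0 / var1 + abs(a) ** 2 / var2)
    # Scalar version of mu_x = Sigma_x (Sigma_1^{-1} r + A^H Sigma_2^{-1} y)
    mu_x = var_x * (r / var1 + a.conjugate() * y / var2)
    return mu_x, var_x

# Illustrative values (assumptions, not taken from the text).
a, r = 1.0 + 1.0j, 0.5j
var1, var2 = 1.0, 0.5
y = 2.0 - 1.0j
mu_x, var_x = lmmse_scalar(y, a, r, var1, var2)

# prior(x) * likelihood(y | x) / posterior(x) should not depend on x.
points = [0.0 + 0.0j, 1.0 - 1.0j, 0.4 - 1.1j]
ratios = [
    cn_pdf(x, r, var1) * cn_pdf(y, a * x, var2) / cn_pdf(x, mu_x, var_x)
    for x in points
]
print("mu_x =", mu_x, "var_x =", var_x)
print("ratios:", ratios)
```

For vector-valued $\boldsymbol x$ the same check applies with the matrix expressions; the scalar case is used here only to keep the example free of external dependencies.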