One-dimensional normal distribution
If the random variable $X$ has the probability density function
$$
f(x)=\dfrac{1}{\sigma\sqrt{2\pi}}\exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right)
$$
then we write $X\sim N(\mu,\sigma^2)$.
The density integrates to one:
$$
\begin{aligned}
\int_{-\infty}^{+\infty}f(x)\,{\rm d}x &= \int_{-\infty}^{+\infty}\dfrac{1}{\sigma\sqrt{2\pi}}\exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right){\rm d}x\\
&\xlongequal{t=\frac{x-\mu}{\sigma}} \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}\exp\left(-\dfrac{t^2}{2}\right){\rm d}t=1
\end{aligned}
$$
The last step uses $\int_{-\infty}^{+\infty}\exp\left(-t^2/2\right){\rm d}t=\sqrt{2\pi}$, which follows from squaring the integral and switching to polar coordinates:
$$
\begin{aligned}
\left(\int_{-\infty}^{+\infty}\exp\left(-\dfrac{t^2}{2}\right){\rm d}t\right)^2 &= \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\exp\left(-\dfrac{t^2+u^2}{2}\right){\rm d}t\,{\rm d}u\\
&\xlongequal{t=\rho\cos\theta,\ u=\rho\sin\theta}\int_{0}^{2\pi}\left(\int_{0}^{+\infty}\exp\left(-\dfrac{\rho^2}{2}\right)\rho\,{\rm d}\rho\right){\rm d}\theta=\int_{0}^{2\pi}{\rm d}\theta=2\pi
\end{aligned}
$$
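The normalization can also be checked numerically. The following is a minimal sketch, assuming a plain trapezoidal rule over a finite window of ten standard deviations on each side of the mean; the function names and the example parameters are illustrative and not part of the original text.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate_pdf(mu, sigma, half_width=10.0, n=20000):
    """Trapezoidal rule on [mu - half_width*sigma, mu + half_width*sigma]."""
    a = mu - half_width * sigma
    b = mu + half_width * sigma
    h = (b - a) / n
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, n):
        total += normal_pdf(a + i * h, mu, sigma)
    return total * h

# Example parameters chosen arbitrarily for illustration.
area = integrate_pdf(mu=1.5, sigma=2.0)
print(area)
```

The mass outside the window is negligible, so the printed value should agree with 1 up to floating-point error.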
Standard normal distribution
For $\mu=0$ and $\sigma=1$, the density $\varphi$ and the cumulative distribution function $\Phi$ are
$$
\begin{aligned}
\varphi(x)&=\dfrac{1}{\sqrt{2\pi}}\exp\left(-\dfrac{x^2}{2}\right)\\
\Phi(x)&=\dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}\exp\left(-\dfrac{t^2}{2}\right){\rm d}t
\end{aligned}
$$
Properties:
- $\varphi(-x)=\varphi(x)$
- $\Phi(-x)=1-\Phi(x)$
Proof:
$$
\Phi(-x)=\int_{-\infty}^{-x}\varphi(t)\,{\rm d}t\xlongequal{t=-u}\int_{x}^{+\infty}\varphi(u)\,{\rm d}u=\int_{-\infty}^{+\infty}\varphi(u)\,{\rm d}u-\int_{-\infty}^{x}\varphi(u)\,{\rm d}u=1-\Phi(x)
$$
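The symmetry property can be sketched numerically as well. This example assumes the standard relation $\Phi(x)=\tfrac{1}{2}\left(1+\operatorname{erf}(x/\sqrt{2})\right)$, which is not stated in the text above; `std_normal_cdf` is an illustrative name.

```python
import math

def std_normal_cdf(x):
    """Phi(x) expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Compare Phi(-x) with 1 - Phi(x) at a few sample points.
checks = [(x, std_normal_cdf(-x), 1.0 - std_normal_cdf(x))
          for x in (0.0, 0.5, 1.0, 1.96, 3.0)]
for x, left, right in checks:
    print(x, left, right)
```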
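Multivariate normal distribution
We start with a multivariate normal distribution whose dimensions are mutually uncorrelated.
Let the $d$-dimensional data be $\boldsymbol{x}=\begin{bmatrix}x_1&x_2&\cdots &x_d\end{bmatrix}^T$, with per-dimension means $\mu_1,\mu_2,\cdots,\mu_d$ and standard deviations $\sigma_1,\sigma_2,\cdots,\sigma_d$.
The Gaussian probability density function can then be written as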
$$
\begin{aligned}
p(\boldsymbol{x})&=p(x_1)p(x_2)\cdots p(x_d)=\dfrac{1}{(\sqrt{2\pi})^d\sigma_1\sigma_2\cdots\sigma_d}\exp\left(-\dfrac{1}{2}\left[\left(\dfrac{x_1-\mu_1}{\sigma_1}\right)^2+\left(\dfrac{x_2-\mu_2}{\sigma_2}\right)^2+\cdots+\left(\dfrac{x_d-\mu_d}{\sigma_d}\right)^2\right]\right)\\
d^2(\boldsymbol{x},\boldsymbol{\mu})&=\left(\dfrac{x_1-\mu_1}{\sigma_1}\right)^2+\left(\dfrac{x_2-\mu_2}{\sigma_2}\right)^2+\cdots+\left(\dfrac{x_d-\mu_d}{\sigma_d}\right)^2=(\boldsymbol{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\\
&=\begin{bmatrix}x_1-\mu_1&x_2-\mu_2&\cdots &x_d-\mu_d\end{bmatrix}\begin{bmatrix}\dfrac{1}{\sigma_1^2}&0&\cdots &0\\0&\dfrac{1}{\sigma_2^2}&\cdots &0\\\vdots&\vdots&\ddots &\vdots\\0&0&\cdots &\dfrac{1}{\sigma_d^2}\end{bmatrix}\begin{bmatrix}x_1-\mu_1\\x_2-\mu_2\\\vdots \\x_d-\mu_d\end{bmatrix}
\end{aligned}
$$
Here $\boldsymbol{\Sigma}=\operatorname{diag}(\sigma_1^2,\sigma_2^2,\cdots,\sigma_d^2)$, so $|\boldsymbol{\Sigma}|^{\frac{1}{2}}=\sigma_1\sigma_2\cdots\sigma_d$, and we obtain the multivariate normal density:
$$
p(\boldsymbol{x})=\dfrac{1}{(\sqrt{2\pi})^d\sigma_1\sigma_2\cdots\sigma_d}\exp\left(-\dfrac{1}{2}d^2(\boldsymbol{x},\boldsymbol{\mu})\right)=\dfrac{1}{(2\pi)^{\frac{d}{2}}|\boldsymbol{\Sigma}|^{\frac{1}{2}}}\exp\left(-\dfrac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right)
$$
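As a sanity check on the matrix form, the sketch below evaluates the general density with a diagonal covariance and compares it with the product of one-dimensional densities. It assumes NumPy is available; the helper names and the sample numbers are illustrative only.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) at x, using the general matrix formula."""
    d = mu.shape[0]
    diff = x - mu
    maha2 = diff @ np.linalg.solve(Sigma, diff)  # squared Mahalanobis distance
    norm = (2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * maha2) / norm

def normal_pdf(x, mu, sigma):
    """Element-wise one-dimensional normal densities."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Arbitrary example values for a 3-dimensional uncorrelated Gaussian.
mu = np.array([1.0, -2.0, 0.5])
sigma = np.array([0.5, 2.0, 1.5])
x = np.array([0.8, -1.0, 2.0])

p_joint = mvn_pdf(x, mu, np.diag(sigma ** 2))
p_product = np.prod(normal_pdf(x, mu, sigma))
print(p_joint, p_product)
```

`np.linalg.solve` is used instead of forming the inverse explicitly; for a diagonal covariance either would do, but the same function then also works for a full covariance matrix.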
Normalized product of Gaussian distributions
Let $p_1(x)=\mathcal{N}(x|\mu_1,\sigma_1^2)$ and $p_2(x)=\mathcal{N}(x|\mu_2,\sigma_2^2)$ both be distributions over the same variable $x$. Then
$$
\begin{aligned}
p_1(x)p_2(x)&\propto \exp\left(-\dfrac{1}{2\sigma_1^2}(x-\mu_1)^2\right)\exp\left(-\dfrac{1}{2\sigma_2^2}(x-\mu_2)^2\right)\\
&=\exp\left(-\dfrac{1}{2}\dfrac{(\sigma_1^2+\sigma_2^2)x^2-2(\mu_1\sigma_2^2+\mu_2\sigma_1^2)x+{\rm constant}}{\sigma_1^2\sigma_2^2}\right)\\
&\propto\exp\left(-\dfrac{1}{2}\dfrac{\sigma_1^2+\sigma_2^2}{\sigma_1^2\sigma_2^2}\left(x-\dfrac{\mu_1\sigma_2^2+\mu_2\sigma_1^2}{\sigma_1^2+\sigma_2^2}\right)^2\right)
\end{aligned}
$$
So the product of two Gaussians is, after normalization, again a Gaussian, with
$$
\begin{array}{l}
\mu=\dfrac{\mu_1\sigma_2^2+\mu_2\sigma_1^2}{\sigma_1^2+\sigma_2^2}\\\\
\sigma=\sqrt{\dfrac{\sigma_1^2\sigma_2^2}{\sigma_1^2+\sigma_2^2}}
\end{array}
\implies
\begin{array}{l}
\mu=\left(\dfrac{\mu_1}{\sigma_1^2}+\dfrac{\mu_2}{\sigma_2^2}\right)\sigma^2\\\\
\dfrac{1}{\sigma^2}=\dfrac{1}{\sigma_1^2}+\dfrac{1}{\sigma_2^2}
\end{array}
$$
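The one-dimensional fusion rule can be sketched in a few lines. The example below uses hypothetical names (`fuse_1d`, `normal_pdf`) and arbitrary inputs; it also checks that the pointwise product $p_1(x)p_2(x)$ divided by the fused density is the same constant at every sample point, which is what "Gaussian up to normalization" means.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fuse_1d(mu1, sigma1, mu2, sigma2):
    """Mean and standard deviation of the normalized product of two 1-D Gaussians."""
    precision = 1.0 / sigma1**2 + 1.0 / sigma2**2   # 1/sigma^2
    var = 1.0 / precision
    mu = var * (mu1 / sigma1**2 + mu2 / sigma2**2)
    return mu, math.sqrt(var)

# Arbitrary example inputs.
mu1, sigma1 = 0.0, 1.0
mu2, sigma2 = 3.0, 2.0
mu, sigma = fuse_1d(mu1, sigma1, mu2, sigma2)

# The ratio product / fused density should not depend on x.
ratios = [normal_pdf(x, mu1, sigma1) * normal_pdf(x, mu2, sigma2) / normal_pdf(x, mu, sigma)
          for x in (-1.0, 0.0, 0.5, 2.0)]
print(mu, sigma, ratios)
```

Note that the fused mean lies between the two input means and is pulled toward the input with the smaller variance.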
Multivariate Gaussians
$$
\begin{aligned}
p_1(\boldsymbol{x})p_2(\boldsymbol{x})&\propto \exp\left(-\dfrac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}_1^{-1}(\boldsymbol{x}-\boldsymbol{\mu}_1)\right)\exp\left(-\dfrac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu}_2)^T\boldsymbol{\Sigma}_2^{-1}(\boldsymbol{x}-\boldsymbol{\mu}_2)\right) \\
&=\exp\left(-\dfrac{1}{2}\left[\boldsymbol{x}^T\left(\boldsymbol{\Sigma}_1^{-1}+\boldsymbol{\Sigma}_2^{-1}\right)\boldsymbol{x}-2\left(\boldsymbol{\mu}_1^T\boldsymbol{\Sigma}_1^{-1}+\boldsymbol{\mu}_2^T\boldsymbol{\Sigma}_2^{-1}\right)\boldsymbol{x}+\boldsymbol{\mu}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1+\boldsymbol{\mu}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2\right]\right) \\
&=\exp\left(-\dfrac{1}{2}\left[\boldsymbol{x}-\left(\boldsymbol{\Sigma}_1^{-1}+\boldsymbol{\Sigma}_2^{-1}\right)^{-1}\left(\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1+\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2\right)\right]^T\left(\boldsymbol{\Sigma}_1^{-1}+\boldsymbol{\Sigma}_2^{-1}\right)\left[\boldsymbol{x}-\left(\boldsymbol{\Sigma}_1^{-1}+\boldsymbol{\Sigma}_2^{-1}\right)^{-1}\left(\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1+\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2\right)\right]+{\rm constant}\right)
\end{aligned}
$$
So the product of two multivariate Gaussians is again a Gaussian, with
$$
\begin{aligned}
\boldsymbol{\mu}&=\left(\boldsymbol{\Sigma}_1^{-1}+\boldsymbol{\Sigma}_2^{-1}\right)^{-1}\left(\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1+\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2\right) \\
\boldsymbol{\Sigma}&=\left(\boldsymbol{\Sigma}_1^{-1}+\boldsymbol{\Sigma}_2^{-1}\right)^{-1}
\end{aligned}
$$
That is,
$$
\begin{aligned}
\boldsymbol{\Sigma}^{-1}&=\boldsymbol{\Sigma}_1^{-1}+\boldsymbol{\Sigma}_2^{-1}\\
\left(\boldsymbol{\Sigma}_1^{-1}+\boldsymbol{\Sigma}_2^{-1}\right)\boldsymbol{\mu}&=\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1+\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2
\end{aligned}
$$
This extends to the normalized product of $K$ Gaussians:
$$
\begin{aligned}
\boldsymbol{\Sigma}^{-1} &= \sum_{k=1}^K \boldsymbol{\Sigma}_k^{-1} \\
\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} &= \sum_{k=1}^K \boldsymbol{\Sigma}_k^{-1}\boldsymbol{\mu}_k
\end{aligned}
$$
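The $K$-Gaussian rule translates directly into code: accumulate the inverse covariances and the inverse-covariance-weighted means, then solve for $\boldsymbol{\mu}$. The following is a sketch assuming NumPy; `fuse_gaussians` and the example inputs are hypothetical.

```python
import numpy as np

def fuse_gaussians(mus, Sigmas):
    """Normalized product of K Gaussians N(mu_k, Sigma_k) over the same variable.

    Implements Sigma^{-1} = sum_k Sigma_k^{-1} and
    Sigma^{-1} mu = sum_k Sigma_k^{-1} mu_k.
    """
    info_matrix = sum(np.linalg.inv(S) for S in Sigmas)
    info_vector = sum(np.linalg.solve(S, m) for m, S in zip(mus, Sigmas))
    Sigma = np.linalg.inv(info_matrix)
    mu = np.linalg.solve(info_matrix, info_vector)
    return mu, Sigma

# Two 2-D Gaussians, each confident along a different axis (illustrative values).
mus = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
Sigmas = [np.diag([1.0, 4.0]), np.diag([4.0, 1.0])]

mu, Sigma = fuse_gaussians(mus, Sigmas)
print(mu)
print(Sigma)
```

In each coordinate the fused mean sits closer to the input that has the smaller variance along that axis, and the fused covariance is smaller than either input covariance.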
Normalized product of Gaussians in linearly transformed variables
Here each factor is Gaussian in a linear function $\boldsymbol{G}_k\boldsymbol{x}$ of the variable $\boldsymbol{x}$:
$$
\begin{aligned}
\exp\left(-\dfrac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right)&\propto \exp\left(-\dfrac{1}{2}(\boldsymbol{G}_1\boldsymbol{x}-\boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}_1^{-1}(\boldsymbol{G}_1\boldsymbol{x}-\boldsymbol{\mu}_1)\right)\exp\left(-\dfrac{1}{2}(\boldsymbol{G}_2\boldsymbol{x}-\boldsymbol{\mu}_2)^T\boldsymbol{\Sigma}_2^{-1}(\boldsymbol{G}_2\boldsymbol{x}-\boldsymbol{\mu}_2)\right) \\
&=\exp\left(-\dfrac{1}{2}\left[\boldsymbol{x}^T\left(\boldsymbol{G}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{G}_1+\boldsymbol{G}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{G}_2\right)\boldsymbol{x}-2\left(\boldsymbol{\mu}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{G}_1+\boldsymbol{\mu}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{G}_2\right)\boldsymbol{x}+\boldsymbol{\mu}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1+\boldsymbol{\mu}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2\right]\right) \\
&=\exp\left(-\dfrac{1}{2}\left[\boldsymbol{x}-\boldsymbol{M}^{-1}\boldsymbol{b}\right]^T\boldsymbol{M}\left[\boldsymbol{x}-\boldsymbol{M}^{-1}\boldsymbol{b}\right]+{\rm constant}\right)
\end{aligned}
$$
where, to keep the last line readable, we abbreviate
$$
\boldsymbol{M}=\boldsymbol{G}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{G}_1+\boldsymbol{G}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{G}_2,\qquad \boldsymbol{b}=\boldsymbol{G}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1+\boldsymbol{G}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2
$$
This gives
$$
\begin{aligned}
\boldsymbol{\Sigma}^{-1}&=\boldsymbol{G}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{G}_1+\boldsymbol{G}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{G}_2\\
\left(\boldsymbol{G}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{G}_1+\boldsymbol{G}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{G}_2\right)\boldsymbol{\mu}=\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}&=\boldsymbol{G}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1+\boldsymbol{G}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2
\end{aligned}
$$
This extends to the normalized product of $K$ Gaussians:
$$
\begin{aligned}
\boldsymbol{\Sigma}^{-1} &= \sum_{k=1}^K \boldsymbol{G}_k^T\boldsymbol{\Sigma}_k^{-1}\boldsymbol{G}_k \\
\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} &= \sum_{k=1}^K \boldsymbol{G}_k^T\boldsymbol{\Sigma}_k^{-1}\boldsymbol{\mu}_k
\end{aligned}
$$
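The version with linear maps $\boldsymbol{G}_k$ can be sketched the same way. In the hypothetical example below, the first factor is a prior on the full 2-D state ($\boldsymbol{G}_1=\boldsymbol{I}$) and the second factor constrains only the first component ($\boldsymbol{G}_2=\begin{bmatrix}1&0\end{bmatrix}$); NumPy and all names are assumptions of this sketch.

```python
import numpy as np

def fuse_linear(Gs, mus, Sigmas):
    """Normalized product of Gaussians in G_k x.

    Implements Sigma^{-1} = sum_k G_k^T Sigma_k^{-1} G_k and
    Sigma^{-1} mu = sum_k G_k^T Sigma_k^{-1} mu_k.
    """
    info_matrix = sum(G.T @ np.linalg.solve(S, G) for G, S in zip(Gs, Sigmas))
    info_vector = sum(G.T @ np.linalg.solve(S, m) for G, m, S in zip(Gs, mus, Sigmas))
    Sigma = np.linalg.inv(info_matrix)
    mu = np.linalg.solve(info_matrix, info_vector)
    return mu, Sigma

# Factor 1: prior on the full state. Factor 2: a constraint on component 0 only.
Gs = [np.eye(2), np.array([[1.0, 0.0]])]
mus = [np.array([0.0, 0.0]), np.array([1.0])]
Sigmas = [np.eye(2), np.array([[0.5]])]

mu, Sigma = fuse_linear(Gs, mus, Sigmas)
print(mu)
print(Sigma)
```

Only the first component is updated here, because the second factor carries no information about the second component and the prior has no cross-correlation.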
Joint Gaussian probability density functions: factoring and inference
Let $(\boldsymbol{x},\boldsymbol{y})$ be a pair of jointly Gaussian variables with joint density
$$
p\left( \boldsymbol{x},\boldsymbol{y}\right) =\mathcal{N}\left( \begin{bmatrix} \boldsymbol{\mu}_{x} \\ \boldsymbol{\mu}_{y} \end{bmatrix},\begin{bmatrix} \boldsymbol{\Sigma}_{xx} & \boldsymbol{\Sigma}_{xy} \\ \boldsymbol{\Sigma}_{yx} & \boldsymbol{\Sigma}_{yy} \end{bmatrix}\right)
$$
Note that $\boldsymbol{\Sigma}_{yx}=\boldsymbol{\Sigma}_{xy}^T$.
We can always factor the joint density into the product of two factors, a conditional density times a marginal density:
$$
p\left( \boldsymbol{x}, \boldsymbol{y}\right) =p\left( \boldsymbol{x}\vert \boldsymbol{y}\right) p\left( \boldsymbol{y}\right)
$$
Block-diagonalizing the covariance matrix using the Schur complement gives:
$$
\begin{aligned}
\begin{bmatrix} \boldsymbol{I} & -\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1} \\ \boldsymbol{0} & \boldsymbol{I} \end{bmatrix}\begin{bmatrix} \boldsymbol{\Sigma}_{xx} & \boldsymbol{\Sigma}_{xy} \\ \boldsymbol{\Sigma}_{yx} & \boldsymbol{\Sigma}_{yy} \end{bmatrix} &=\begin{bmatrix} \boldsymbol{\Sigma}_{xx}-\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx} & \boldsymbol{0} \\ \boldsymbol{\Sigma}_{yx} & \boldsymbol{\Sigma}_{yy} \end{bmatrix}\\ \\
\begin{bmatrix} \boldsymbol{I} & -\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1} \\ \boldsymbol{0} & \boldsymbol{I} \end{bmatrix} \begin{bmatrix} \boldsymbol{\Sigma}_{xx} & \boldsymbol{\Sigma}_{xy} \\ \boldsymbol{\Sigma}_{yx} & \boldsymbol{\Sigma}_{yy} \end{bmatrix} \begin{bmatrix} \boldsymbol{I} & \boldsymbol{0} \\ -\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx} & \boldsymbol{I} \end{bmatrix} &=\begin{bmatrix} \boldsymbol{\Sigma}_{xx}-\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{\Sigma}_{yy} \end{bmatrix} \\ \\
\begin{bmatrix} \boldsymbol{\Sigma}_{xx} & \boldsymbol{\Sigma}_{xy} \\ \boldsymbol{\Sigma}_{yx} & \boldsymbol{\Sigma}_{yy} \end{bmatrix} &=\begin{bmatrix} \boldsymbol{I} & \boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1} \\ \boldsymbol{0} & \boldsymbol{I} \end{bmatrix} \begin{bmatrix} \boldsymbol{\Sigma}_{xx}-\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{\Sigma}_{yy} \end{bmatrix} \begin{bmatrix} \boldsymbol{I} & \boldsymbol{0} \\ \boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx} & \boldsymbol{I} \end{bmatrix}
\end{aligned}
$$
The quadratic term in the exponent of the joint density $p(\boldsymbol{x},\boldsymbol{y})$ is then
$$
\begin{aligned}
&\left( \begin{bmatrix} \boldsymbol{x} \\ \boldsymbol{y} \end{bmatrix}-\begin{bmatrix} \boldsymbol{\mu}_{x} \\ \boldsymbol{\mu}_{y} \end{bmatrix}\right)^{T}\begin{bmatrix} \boldsymbol{\Sigma}_{xx} & \boldsymbol{\Sigma}_{xy} \\ \boldsymbol{\Sigma}_{yx} & \boldsymbol{\Sigma}_{yy} \end{bmatrix}^{-1}\left( \begin{bmatrix} \boldsymbol{x} \\ \boldsymbol{y} \end{bmatrix}-\begin{bmatrix} \boldsymbol{\mu}_{x} \\ \boldsymbol{\mu}_{y} \end{bmatrix}\right) \\
&=\left( \begin{bmatrix} \boldsymbol{x} \\ \boldsymbol{y} \end{bmatrix}-\begin{bmatrix} \boldsymbol{\mu}_{x} \\ \boldsymbol{\mu}_{y} \end{bmatrix}\right)^{T}\begin{bmatrix} \boldsymbol{I} & \boldsymbol{0} \\ -\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx} & \boldsymbol{I} \end{bmatrix} \begin{bmatrix} \left( \boldsymbol{\Sigma}_{xx}-\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx}\right)^{-1} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{\Sigma}_{yy}^{-1} \end{bmatrix} \begin{bmatrix} \boldsymbol{I} & -\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1} \\ \boldsymbol{0} & \boldsymbol{I} \end{bmatrix}\left( \begin{bmatrix} \boldsymbol{x} \\ \boldsymbol{y} \end{bmatrix}-\begin{bmatrix} \boldsymbol{\mu}_{x} \\ \boldsymbol{\mu}_{y} \end{bmatrix}\right) \\
&=\begin{bmatrix} \left( \boldsymbol{x}-\boldsymbol{\mu}_x\right) -\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right) \\ \boldsymbol{y}-\boldsymbol{\mu}_y \end{bmatrix}^{T} \begin{bmatrix} \left( \boldsymbol{\Sigma}_{xx}-\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx}\right)^{-1} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{\Sigma}_{yy}^{-1} \end{bmatrix} \begin{bmatrix} \left( \boldsymbol{x}-\boldsymbol{\mu}_x\right) -\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right) \\ \boldsymbol{y}-\boldsymbol{\mu}_y \end{bmatrix}\\
&=\left[ \boldsymbol{x}-\boldsymbol{\mu}_{x}-\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right) \right]^{T} \left( \boldsymbol{\Sigma}_{xx}-\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx}\right)^{-1} \left[ \boldsymbol{x}-\boldsymbol{\mu}_{x}-\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\left( \boldsymbol{y}-\boldsymbol{\mu}_{y}\right) \right] +\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^{T}\boldsymbol{\Sigma}_{yy}^{-1}\left( \boldsymbol{y}-\boldsymbol{\mu}_{y}\right)
\end{aligned}
$$
This is a sum of two quadratic terms. Because the exponential of a sum is the product of the exponentials, we obtain:
$$
\begin{array}{l}
p\left( \boldsymbol{x},\boldsymbol{y}\right) =p\left( \boldsymbol{x}\vert \boldsymbol{y}\right) p\left( \boldsymbol{y}\right) \\
p\left( \boldsymbol{x}\vert \boldsymbol{y}\right) = \mathcal{N}\left( \boldsymbol{\mu}_{x}+\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right) ,\ \boldsymbol{\Sigma}_{xx}-\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx}\right) \\
p\left( \boldsymbol{y}\right) =\mathcal{N}\left( \boldsymbol{\mu}_y,\boldsymbol{\Sigma}_{yy}\right)
\end{array}
$$
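The conditional mean and covariance can be computed in a few lines. This is a sketch under the assumption that NumPy is available; `condition_gaussian` and the scalar example blocks are illustrative. It also checks the Schur-complement fact used in the derivation: the $(x,x)$ block of the inverse joint covariance equals the inverse of the conditional covariance.

```python
import numpy as np

def condition_gaussian(mu_x, mu_y, S_xx, S_xy, S_yy, y):
    """Mean and covariance of p(x | y) for a joint Gaussian over (x, y)."""
    K = S_xy @ np.linalg.inv(S_yy)      # Sigma_xy Sigma_yy^{-1}
    mu_cond = mu_x + K @ (y - mu_y)
    S_cond = S_xx - K @ S_xy.T          # uses Sigma_yx = Sigma_xy^T
    return mu_cond, S_cond

# Illustrative 1-D blocks.
mu_x, mu_y = np.array([1.0]), np.array([0.0])
S_xx = np.array([[2.0]])
S_xy = np.array([[0.8]])
S_yy = np.array([[1.0]])
y_obs = np.array([0.5])

mu_cond, S_cond = condition_gaussian(mu_x, mu_y, S_xx, S_xy, S_yy, y_obs)

# Cross-check against the top-left block of the inverse joint covariance.
joint = np.block([[S_xx, S_xy], [S_xy.T, S_yy]])
top_left = np.linalg.inv(joint)[:1, :1]
print(mu_cond, S_cond, top_left)
```

Observing $\boldsymbol{y}$ shifts the mean of $\boldsymbol{x}$ in proportion to the cross-covariance and never increases its covariance.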
Nonlinear transformation of a Gaussian random variable
We study what happens when a Gaussian passes through a stochastic nonlinear transformation, i.e. we compute:
$$
p(\boldsymbol{y})=\int_{-\infty}^{\infty}p(\boldsymbol{y}|\boldsymbol{x})p(\boldsymbol{x})\,{\rm d}\boldsymbol{x}
$$
where
$$
\begin{aligned}
p(\boldsymbol{y}|\boldsymbol{x}) &=\mathcal{N}(\boldsymbol{g}(\boldsymbol{x}),\boldsymbol{R})\\
p(\boldsymbol{x}) &=\mathcal{N}(\boldsymbol{\mu}_x,\boldsymbol{\Sigma}_{xx})
\end{aligned}
$$
Here $\boldsymbol{g}(\cdot)$ denotes a nonlinear map $\boldsymbol{g}:\boldsymbol{x}\mapsto\boldsymbol{y}$ that is corrupted by zero-mean Gaussian noise with covariance $\boldsymbol{R}$. Stochastic nonlinear maps of this kind are needed later to model sensors.
Linearizing the nonlinear map about the mean gives
$$
\begin{aligned}
\boldsymbol{g}\left( \boldsymbol{x}\right) &\approx\boldsymbol{\mu}_{y}+\boldsymbol{G}\left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right) \\
\boldsymbol{G}&=\left.\dfrac{\partial \boldsymbol{g}\left( \boldsymbol{x}\right) }{\partial \boldsymbol{x}}\right\rvert_{\boldsymbol{x}=\boldsymbol{\mu}_{x}}\\
\boldsymbol{\mu}_{y}&=\boldsymbol{g}\left( \boldsymbol{\mu}_{x}\right)
\end{aligned}
$$
Substituting the linearized map into the integral:
$$
\begin{aligned}
p\left( \boldsymbol{y}\right) &=\int_{-\infty }^{+\infty }p\left( \boldsymbol{y}| \boldsymbol{x}\right) p\left( \boldsymbol{x}\right) {\rm d}\boldsymbol{x}\\
&=\eta \int_{-\infty }^{+\infty }\exp \left\{ -\dfrac{1}{2}\left[ \boldsymbol{y}-\left( \boldsymbol{\mu}_y+\boldsymbol{G}\left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right) \right) \right]^{T}\boldsymbol{R}^{-1}\left[ \boldsymbol{y}-\left( \boldsymbol{\mu}_y+\boldsymbol{G}\left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right) \right) \right] \right\} \\
&\qquad\times \exp \left[ -\dfrac{1}{2}\left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right)^{T}\boldsymbol{\Sigma}_{xx}^{-1}\left( \boldsymbol{x}-\boldsymbol{\mu}_x\right) \right] {\rm d}\boldsymbol{x} \\
&=\eta \int_{-\infty }^{+\infty }\exp \left\{ -\dfrac{1}{2}\left[ \left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^{T}\boldsymbol{R}^{-1}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right) +\left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right)^T \boldsymbol{G}^{T}\boldsymbol{R}^{-1}\boldsymbol{G}\left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right) \right.\right.\\
&\qquad\left.\left. -2\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^T \boldsymbol{R}^{-1}\boldsymbol{G}\left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right) +\left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right)^{T}\boldsymbol{\Sigma}_{xx}^{-1}\left( \boldsymbol{x}-\boldsymbol{\mu}_x\right) \right] \right\} {\rm d}\boldsymbol{x} \\
&=\eta \exp \left[ -\dfrac{1}{2}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^{T}\boldsymbol{R}^{-1}\left( \boldsymbol{y}-\boldsymbol{\mu}_{y}\right) \right] \\
&\qquad\times\int_{-\infty }^{+\infty }\exp \left\{ -\dfrac{1}{2}\left[ \left( \boldsymbol{x}-\boldsymbol{\mu}_x\right)^T \left( \boldsymbol{G}^{T}\boldsymbol{R}^{-1}\boldsymbol{G}+\boldsymbol{\Sigma}_{xx}^{-1}\right) \left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right) -2\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^T \boldsymbol{R}^{-1}\boldsymbol{G}\left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right) \right] \right\} {\rm d}\boldsymbol{x}
\end{aligned}
$$
where $\eta$ is a normalization constant. Completing the square in the exponent inside the integral gives the quadratic expression
$$
\left[ \boldsymbol{x}-\boldsymbol{\mu}_x-\boldsymbol{F}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right) \right]^{T}\left( \boldsymbol{G}^{T}\boldsymbol{R}^{-1}\boldsymbol{G}+\boldsymbol{\Sigma}_{xx}^{-1}\right) \left[ \boldsymbol{x}-\boldsymbol{\mu}_x-\boldsymbol{F}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right) \right] -\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^{T}\boldsymbol{F}^{T}\left( \boldsymbol{G}^{T}\boldsymbol{R}^{-1}\boldsymbol{G}+\boldsymbol{\Sigma}_{xx}^{-1}\right) \boldsymbol{F}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)
$$
The second term does not depend on $\boldsymbol{x}$, so its exponential can be pulled outside the integral. What remains inside the integral (the first term) is a Gaussian in $\boldsymbol{x}$, so integrating over $\boldsymbol{x}$ yields a constant, which is absorbed into $\eta$.
The matrix $\boldsymbol{F}$ must satisfy
$$
\begin{aligned}
\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^{T}\boldsymbol{F}^{T}\left( \boldsymbol{G}^{T}\boldsymbol{R}^{-1}\boldsymbol{G}+\boldsymbol{\Sigma}_{xx}^{-1}\right) \left( \boldsymbol{x}-\boldsymbol{\mu}_x\right) &=\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^{T}\boldsymbol{R}^{-1}\boldsymbol{G}\left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right)\\
\boldsymbol{F}^{T}\left( \boldsymbol{G}^{T}\boldsymbol{R}^{-1}\boldsymbol{G}+\boldsymbol{\Sigma}_{xx}^{-1}\right) &=\boldsymbol{R}^{-1}\boldsymbol{G}
\end{aligned}
$$
that is, $\boldsymbol{F}=\left( \boldsymbol{G}^{T}\boldsymbol{R}^{-1}\boldsymbol{G}+\boldsymbol{\Sigma}_{xx}^{-1}\right)^{-1}\boldsymbol{G}^{T}\boldsymbol{R}^{-1}$, since $\boldsymbol{R}$ and $\boldsymbol{\Sigma}_{xx}$ are symmetric.
Therefore:
$$
\begin{aligned}
p(\boldsymbol{y})&=\rho\exp \left\{ -\dfrac{1}{2}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^{T}\left[ \boldsymbol{R}^{-1}-\boldsymbol{F}^{T}\left( \boldsymbol{G}^{T}\boldsymbol{R}^{-1}\boldsymbol{G}+\boldsymbol{\Sigma}_{xx}^{-1}\right) \boldsymbol{F}\right] \left( \boldsymbol{y}-\boldsymbol{\mu}_y\right) \right\} \\
&=\rho\exp\left\{ -\dfrac{1}{2}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^{T}\underbrace{\left[\boldsymbol{R}^{-1}-\boldsymbol{R}^{-1}\boldsymbol{G}\left( \boldsymbol{G}^{T}\boldsymbol{R}^{-1}\boldsymbol{G}+\boldsymbol{\Sigma}_{xx}^{-1}\right)^{-1}\boldsymbol{G}^{T}\boldsymbol{R}^{-1}\right]}_{=\ \left( \boldsymbol{R}+\boldsymbol{G}\boldsymbol{\Sigma}_{xx}\boldsymbol{G}^{T}\right)^{-1}\ \text{by the matrix inversion lemma, identity (2)}}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right) \right\}\\
&=\rho\exp \left[ -\dfrac{1}{2}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^{T}\left( \boldsymbol{R}+\boldsymbol{G}\boldsymbol{\Sigma}_{xx}\boldsymbol{G}^{T}\right)^{-1}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right) \right]
\end{aligned}
$$
where $\rho$ is a new normalization constant. This is a Gaussian distribution in $\boldsymbol{y}$:
$$
\boldsymbol{y}\sim\mathcal{N}(\boldsymbol{\mu}_y,\boldsymbol{\Sigma}_{yy})=\mathcal{N}\left(\underbrace{\boldsymbol{\mu}_y}_{\boldsymbol{g}(\boldsymbol{\mu}_x)},\ \boldsymbol{R}+\boldsymbol{G}\boldsymbol{\Sigma}_{xx}\boldsymbol{G}^T\right)
$$
The intermediate step uses the matrix inversion lemma, described below.
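The result $\boldsymbol{y}\sim\mathcal{N}(\boldsymbol{g}(\boldsymbol{\mu}_x),\boldsymbol{R}+\boldsymbol{G}\boldsymbol{\Sigma}_{xx}\boldsymbol{G}^T)$ can be sketched as a small propagation routine. The map used here, a polar-to-Cartesian conversion, is a hypothetical stand-in for a sensor model and is not taken from the text; the Jacobian is written out by hand, and NumPy is assumed.

```python
import numpy as np

def g(x):
    """Hypothetical nonlinear map: polar (r, theta) to Cartesian."""
    r, theta = x
    return np.array([r * np.cos(theta), r * np.sin(theta)])

def jacobian_g(x):
    """Analytic Jacobian of g, evaluated at x."""
    r, theta = x
    return np.array([[np.cos(theta), -r * np.sin(theta)],
                     [np.sin(theta),  r * np.cos(theta)]])

def propagate_linearized(g, jacobian, mu_x, Sigma_xx, R):
    """First-order propagation: mu_y = g(mu_x), Sigma_yy = R + G Sigma_xx G^T."""
    mu_y = g(mu_x)
    G = jacobian(mu_x)
    Sigma_yy = R + G @ Sigma_xx @ G.T
    return mu_y, Sigma_yy

# Illustrative input statistics.
mu_x = np.array([2.0, 0.0])
Sigma_xx = np.diag([0.01, 0.04])
R = 0.001 * np.eye(2)

mu_y, Sigma_yy = propagate_linearized(g, jacobian_g, mu_x, Sigma_xx, R)
print(mu_y)
print(Sigma_yy)
```

Because the result rests on a first-order linearization about $\boldsymbol{\mu}_x$, it is only an approximation when $\boldsymbol{g}$ is strongly curved over the spread of $\boldsymbol{x}$.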
Matrix inversion lemma
An invertible block matrix can be factored into lower-diagonal-upper (LDU) form or upper-diagonal-lower (UDL) form.
LDU
$$
\begin{bmatrix} \boldsymbol{A}^{-1} & -\boldsymbol{B} \\ \boldsymbol{C} & \boldsymbol{D} \end{bmatrix}=\begin{bmatrix} \boldsymbol{I} & \boldsymbol{0}\\ \boldsymbol{C}\boldsymbol{A} & \boldsymbol{I} \end{bmatrix}\begin{bmatrix} \boldsymbol{A}^{-1} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{D}+\boldsymbol{C}\boldsymbol{A}\boldsymbol{B} \end{bmatrix}\begin{bmatrix} \boldsymbol{I} & -\boldsymbol{A}\boldsymbol{B} \\ \boldsymbol{0} & \boldsymbol{I} \end{bmatrix}
$$
Inverting both sides gives:
$$
\begin{aligned}
\begin{bmatrix} \boldsymbol{A}^{-1} & -\boldsymbol{B} \\ \boldsymbol{C} & \boldsymbol{D} \end{bmatrix}^{-1} &=\begin{bmatrix} \boldsymbol{I} & \boldsymbol{A}\boldsymbol{B} \\ \boldsymbol{0}& \boldsymbol{I} \end{bmatrix}\begin{bmatrix} \boldsymbol{A} & \boldsymbol{0} \\ \boldsymbol{0} & \left( \boldsymbol{D}+\boldsymbol{C}\boldsymbol{A}\boldsymbol{B}\right)^{-1} \end{bmatrix}\begin{bmatrix} \boldsymbol{I} & \boldsymbol{0} \\ -\boldsymbol{C}\boldsymbol{A} & \boldsymbol{I} \end{bmatrix}\\
&=\begin{bmatrix} \boldsymbol{A}-\boldsymbol{A}\boldsymbol{B}\left( \boldsymbol{D}+\boldsymbol{C}\boldsymbol{A}\boldsymbol{B}\right)^{-1}\boldsymbol{C}\boldsymbol{A} & \boldsymbol{A}\boldsymbol{B}\left( \boldsymbol{D}+\boldsymbol{C}\boldsymbol{A}\boldsymbol{B}\right)^{-1} \\ -\left( \boldsymbol{D}+\boldsymbol{C}\boldsymbol{A}\boldsymbol{B}\right)^{-1}\boldsymbol{C}\boldsymbol{A} & \left( \boldsymbol{D}+\boldsymbol{C}\boldsymbol{A}\boldsymbol{B}\right)^{-1} \end{bmatrix}
\end{aligned}
$$
UDL
$$
\begin{bmatrix} \boldsymbol{A}^{-1} & -\boldsymbol{B} \\ \boldsymbol{C} & \boldsymbol{D} \end{bmatrix}=\begin{bmatrix} \boldsymbol{I} & -\boldsymbol{B}\boldsymbol{D}^{-1} \\ \boldsymbol{0}& \boldsymbol{I} \end{bmatrix}\begin{bmatrix} \boldsymbol{A}^{-1}+\boldsymbol{B}\boldsymbol{D}^{-1}\boldsymbol{C} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{D} \end{bmatrix}\begin{bmatrix} \boldsymbol{I} & \boldsymbol{0} \\ \boldsymbol{D}^{-1}\boldsymbol{C} & \boldsymbol{I} \end{bmatrix}
$$
Inverting both sides gives:
$$
\begin{aligned}
\begin{bmatrix} \boldsymbol{A}^{-1} & -\boldsymbol{B} \\ \boldsymbol{C} & \boldsymbol{D} \end{bmatrix}^{-1} &=\begin{bmatrix} \boldsymbol{I} & \boldsymbol{0} \\ -\boldsymbol{D}^{-1}\boldsymbol{C} & \boldsymbol{I} \end{bmatrix}\begin{bmatrix} \left( \boldsymbol{A}^{-1}+\boldsymbol{B}\boldsymbol{D}^{-1}\boldsymbol{C}\right)^{-1} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{D}^{-1} \end{bmatrix}\begin{bmatrix} \boldsymbol{I} & \boldsymbol{B}\boldsymbol{D}^{-1} \\ \boldsymbol{0} & \boldsymbol{I} \end{bmatrix} \\
&= \begin{bmatrix} \left( \boldsymbol{A}^{-1}+\boldsymbol{B}\boldsymbol{D}^{-1}\boldsymbol{C}\right)^{-1} & \left( \boldsymbol{A}^{-1}+\boldsymbol{B}\boldsymbol{D}^{-1}\boldsymbol{C}\right)^{-1}\boldsymbol{B}\boldsymbol{D}^{-1} \\ -\boldsymbol{D}^{-1}\boldsymbol{C}\left( \boldsymbol{A}^{-1}+\boldsymbol{B}\boldsymbol{D}^{-1}\boldsymbol{C}\right)^{-1} & \boldsymbol{D}^{-1}-\boldsymbol{D}^{-1}\boldsymbol{C}\left( \boldsymbol{A}^{-1}+\boldsymbol{B}\boldsymbol{D}^{-1}\boldsymbol{C}\right)^{-1}\boldsymbol{B}\boldsymbol{D}^{-1} \end{bmatrix}
\end{aligned}
$$
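Both block-inverse expressions can be checked against a direct numerical inverse. The sketch below assumes NumPy, draws random test matrices with a fixed seed, and picks $\boldsymbol{C}=\boldsymbol{B}^T$ with positive-definite $\boldsymbol{A}$ and $\boldsymbol{D}$ purely so that every inverse involved is guaranteed to exist; that choice is a convenience of the example, not a requirement of the factorizations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2

def random_spd(k):
    """Random symmetric positive-definite k x k matrix."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

A = random_spd(n)
D = random_spd(m)
B = rng.standard_normal((n, m))
C = B.T  # assumption made only to guarantee invertibility

M = np.block([[np.linalg.inv(A), -B], [C, D]])

# Inverse assembled from the LDU factorization.
S1 = np.linalg.inv(D + C @ A @ B)
ldu_inverse = np.block([[A - A @ B @ S1 @ C @ A, A @ B @ S1],
                        [-S1 @ C @ A, S1]])

# Inverse assembled from the UDL factorization.
D_inv = np.linalg.inv(D)
S2 = np.linalg.inv(np.linalg.inv(A) + B @ D_inv @ C)
udl_inverse = np.block([[S2, S2 @ B @ D_inv],
                        [-D_inv @ C @ S2, D_inv - D_inv @ C @ S2 @ B @ D_inv]])

direct_inverse = np.linalg.inv(M)
print(np.allclose(ldu_inverse, direct_inverse), np.allclose(udl_inverse, direct_inverse))
```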
SMW (Sherman-Morrison-Woodbury) identities
Comparing the two expressions for the inverse block by block gives the following identities:
$$
\begin{aligned}
\left( \boldsymbol{A}^{-1}+\boldsymbol{B}\boldsymbol{D}^{-1}\boldsymbol{C}\right)^{-1} &\equiv \boldsymbol{A}-\boldsymbol{A}\boldsymbol{B}\left( \boldsymbol{D}+\boldsymbol{C}\boldsymbol{A}\boldsymbol{B}\right)^{-1}\boldsymbol{C}\boldsymbol{A} &(1) \\
\left( \boldsymbol{D}+\boldsymbol{C}\boldsymbol{A}\boldsymbol{B}\right)^{-1} &\equiv \boldsymbol{D}^{-1}-\boldsymbol{D}^{-1}\boldsymbol{C}\left( \boldsymbol{A}^{-1}+\boldsymbol{B}\boldsymbol{D}^{-1}\boldsymbol{C}\right)^{-1}\boldsymbol{B}\boldsymbol{D}^{-1} &(2) \\
\boldsymbol{A}\boldsymbol{B}\left( \boldsymbol{D}+\boldsymbol{C}\boldsymbol{A}\boldsymbol{B}\right)^{-1} &\equiv \left( \boldsymbol{A}^{-1}+\boldsymbol{B}\boldsymbol{D}^{-1}\boldsymbol{C}\right)^{-1}\boldsymbol{B}\boldsymbol{D}^{-1} &(3) \\
\left( \boldsymbol{D}+\boldsymbol{C}\boldsymbol{A}\boldsymbol{B}\right)^{-1}\boldsymbol{C}\boldsymbol{A} &\equiv \boldsymbol{D}^{-1}\boldsymbol{C}\left( \boldsymbol{A}^{-1}+\boldsymbol{B}\boldsymbol{D}^{-1}\boldsymbol{C}\right)^{-1} &(4)
\end{aligned}
$$
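The four identities can be spot-checked numerically under the substitution used in the nonlinear-transformation section, namely $\boldsymbol{A}=\boldsymbol{\Sigma}_{xx}$, $\boldsymbol{B}=\boldsymbol{G}^T$, $\boldsymbol{C}=\boldsymbol{G}$, $\boldsymbol{D}=\boldsymbol{R}$. This is a sketch with randomly generated test matrices and a fixed seed, assuming NumPy; a numerical agreement is a sanity check, not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)
inv = np.linalg.inv

def random_spd(k):
    """Random symmetric positive-definite k x k matrix."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

Sigma = random_spd(3)             # plays the role of A
R = random_spd(2)                 # plays the role of D
G = rng.standard_normal((2, 3))   # C = G, B = G^T

info = inv(Sigma) + G.T @ inv(R) @ G   # A^{-1} + B D^{-1} C
innov = R + G @ Sigma @ G.T            # D + C A B

identity_1 = np.allclose(inv(info), Sigma - Sigma @ G.T @ inv(innov) @ G @ Sigma)
identity_2 = np.allclose(inv(innov), inv(R) - inv(R) @ G @ inv(info) @ G.T @ inv(R))
identity_3 = np.allclose(Sigma @ G.T @ inv(innov), inv(info) @ G.T @ inv(R))
identity_4 = np.allclose(inv(innov) @ G @ Sigma, inv(R) @ G @ inv(info))
print(identity_1, identity_2, identity_3, identity_4)
```

Identity (2) in this form is exactly the step that turns $\boldsymbol{R}^{-1}-\boldsymbol{R}^{-1}\boldsymbol{G}\left(\boldsymbol{G}^T\boldsymbol{R}^{-1}\boldsymbol{G}+\boldsymbol{\Sigma}_{xx}^{-1}\right)^{-1}\boldsymbol{G}^T\boldsymbol{R}^{-1}$ into $\left(\boldsymbol{R}+\boldsymbol{G}\boldsymbol{\Sigma}_{xx}\boldsymbol{G}^T\right)^{-1}$.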