Machine Learning: Whiteboard Derivations, P2_4
Given a high-dimensional Gaussian distribution, derive the marginal and conditional probability distributions.
$$x= \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{p} \end{bmatrix} \qquad \mu= \begin{bmatrix} \mu_{1} \\ \mu_{2} \\ \vdots \\ \mu_{p} \end{bmatrix} \qquad \Sigma= \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots &\sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots&\sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots &\sigma_{pp} \end{bmatrix}_{p \times p}$$
Given:
Partition the $p$-dimensional vector into two blocks, $x_a \in \mathbb{R}^m$, $x_b \in \mathbb{R}^n$, $m+n=p$, and view $x$ as the joint variable $(x_a, x_b)$:
$$x= \begin{bmatrix} x_{a} \\ x_{b} \end{bmatrix} \qquad \mu= \begin{bmatrix} \mu_{a} \\ \mu_{b} \end{bmatrix} \qquad \Sigma= \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix}$$
Find:
$p(x_a)$, $p(x_b \mid x_a)$, $p(x_b)$, $p(x_a \mid x_b)$
(by symmetry, $p(x_b)$ and $p(x_a \mid x_b)$ follow from the derivations of the first two)
The completing-the-square approach used in PRML is not needed here.
Theorem:
If $x \sim N(\mu, \Sigma)$ and $y = Ax + B$, then
$$y \sim N(A\mu+B,\ A\Sigma A^T)$$
$$E[y]=E[Ax+B]=A\,E[x]+B=A\mu+B$$
$$Var[y]=Var[Ax+B]=Var[Ax]+Var[B]=A\,Var[x]\,A^T=A\Sigma A^T$$
($B$ is a constant, so $Var[B]=0$)
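The theorem can be sanity-checked by Monte Carlo. This is a minimal sketch; the numbers and variable names below are illustrative, not from the notes:

```python
import numpy as np

# Monte Carlo check of the theorem: if x ~ N(mu, Sigma) and y = A x + B,
# then y ~ N(A mu + B, A Sigma A^T).
rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
B = np.array([0.5, -1.0])

x = rng.multivariate_normal(mu, Sigma, size=200_000)  # draws of x, shape (N, 2)
y = x @ A.T + B                                       # y = A x + B applied row-wise

# Sample mean and covariance should match the theorem's parameters.
print(np.allclose(y.mean(axis=0), A @ mu + B, atol=0.05))
print(np.allclose(np.cov(y.T), A @ Sigma @ A.T, atol=0.3))
```

With 200,000 samples the sample mean and covariance land well inside the tolerances above.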
Derivation of $p(x_a)$:
$$\begin{aligned} & x_a= \begin{bmatrix} I_m & 0 \end{bmatrix} \begin{bmatrix} x_a \\ x_b \end{bmatrix} \\ &E[x_a]= \begin{bmatrix} I_m & 0 \end{bmatrix} \begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix}=\mu_a \\ & Var[x_a]= \begin{bmatrix} I_m & 0 \end{bmatrix} \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix} \begin{bmatrix} I_m \\ 0 \end{bmatrix} = \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \end{bmatrix} \begin{bmatrix} I_m \\ 0 \end{bmatrix} = \Sigma_{aa} \\ &x_a \sim N(\mu_a,\ \Sigma_{aa}) \end{aligned}$$
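The marginal result can be checked by sampling. A minimal sketch with illustrative numbers (the block sizes and matrices are not from the notes): keeping the first $m$ coordinates of draws from the joint Gaussian should give a sample whose mean is close to $\mu_a$ and whose covariance is close to $\Sigma_{aa}$.

```python
import numpy as np

# Sampling check of p(x_a): the first m coordinates of joint draws
# are distributed as N(mu_a, Sigma_aa).
rng = np.random.default_rng(1)

mu = np.array([0.0, 1.0, -1.0])           # mu = [mu_a; mu_b] with m = 2, n = 1
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.5, 0.2],
                  [0.3, 0.2, 1.0]])        # blocks [[Sigma_aa, Sigma_ab], [Sigma_ba, Sigma_bb]]
m = 2

x = rng.multivariate_normal(mu, Sigma, size=200_000)
x_a = x[:, :m]                             # x_a = [I_m 0] x: drop the x_b coordinates

print(np.allclose(x_a.mean(axis=0), mu[:m], atol=0.05))
print(np.allclose(np.cov(x_a.T), Sigma[:m, :m], atol=0.3))
```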
Derivation of $p(x_b \mid x_a)$:
Define:
$$x_{b.a}=x_b-\Sigma_{ba}\Sigma_{aa}^{-1}x_a$$
(don't ask why for now; this construction pays off below)
$$\mu_{b.a}=\mu_b-\Sigma_{ba}\Sigma_{aa}^{-1}\mu_a$$
$$\Sigma_{bb.a}=\Sigma_{bb}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab}$$
$\Sigma_{bb.a}$ is the Schur complement of $\Sigma_{aa}$ in $\Sigma$.
Let:
$$\begin{aligned} &x_{b.a}=\begin{bmatrix} -\Sigma_{ba}\Sigma_{aa}^{-1} & I \end{bmatrix} \begin{bmatrix} x_a \\ x_b \end{bmatrix}\\ &E[x_{b.a}] = \begin{bmatrix} -\Sigma_{ba}\Sigma_{aa}^{-1} & I \end{bmatrix} \begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix} = \mu_b-\Sigma_{ba}\Sigma_{aa}^{-1}\mu_a = \mu_{b.a} \\ &Var[x_{b.a}]=\begin{bmatrix} -\Sigma_{ba}\Sigma_{aa}^{-1} & I \end{bmatrix} \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix}\begin{bmatrix} -\Sigma_{aa}^{-1}\Sigma_{ab} \\ I \end{bmatrix} \\ &= \begin{bmatrix} 0 & \Sigma_{bb}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab} \end{bmatrix} \begin{bmatrix} -\Sigma_{aa}^{-1}\Sigma_{ab} \\ I \end{bmatrix} \\ &=\Sigma_{bb}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab} \\ &=\Sigma_{bb.a} \\ &x_{b.a} \sim N(\mu_{b.a},\ \Sigma_{bb.a}) \end{aligned}$$
(the rightmost factor in the variance is $A^T$: since $\Sigma_{aa}$ is symmetric, $(\Sigma_{ba}\Sigma_{aa}^{-1})^T=\Sigma_{aa}^{-1}\Sigma_{ab}$)
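The variance computation above is pure linear algebra, so it can be checked deterministically. A minimal sketch with an illustrative $\Sigma$ (not from the notes): with $A = \begin{bmatrix} -\Sigma_{ba}\Sigma_{aa}^{-1} & I \end{bmatrix}$, the product $A\Sigma A^T$ should equal the Schur complement $\Sigma_{bb.a}$.

```python
import numpy as np

# Deterministic check: A Sigma A^T == Sigma_bb - Sigma_ba Sigma_aa^{-1} Sigma_ab
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.5, 0.2],
                  [0.3, 0.2, 1.0]])       # illustrative positive-definite covariance
m = 2
Saa, Sab = Sigma[:m, :m], Sigma[:m, m:]
Sba, Sbb = Sigma[m:, :m], Sigma[m:, m:]

Saa_inv = np.linalg.inv(Saa)
A = np.hstack([-Sba @ Saa_inv, np.eye(Sigma.shape[0] - m)])  # A = [-Sigma_ba Sigma_aa^{-1}  I]
schur = Sbb - Sba @ Saa_inv @ Sab                            # Sigma_bb.a

print(np.allclose(A @ Sigma @ A.T, schur))
```

The equality holds exactly in algebra, so `allclose` passes up to floating point.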
Given $x_{b.a} \sim N(\mu_{b.a},\ \Sigma_{bb.a})$ and
$$x_{b.a}=x_b-\Sigma_{ba}\Sigma_{aa}^{-1}x_a$$
rearranging gives
$$x_{b}=x_{b.a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_a$$
Apply the theorem $y=Ax+B$ with
$$y=x_{b},\quad A=I,\quad x=x_{b.a},\quad B=\Sigma_{ba}\Sigma_{aa}^{-1}x_a$$
Conditioned on $x_a$, $B$ is a constant. Moreover $Cov(x_{b.a}, x_a)=\Sigma_{ba}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{aa}=0$, and for jointly Gaussian variables zero covariance implies independence, so conditioning on $x_a$ does not change the distribution of $x_{b.a}$.
Conclusion:
$$E[x_b \mid x_a]=\mu_{b.a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_a$$
$$Var[x_b \mid x_a]=Var[x_{b.a}]=\Sigma_{bb.a}$$
$$x_b \mid x_a \sim N(\mu_{b.a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_a,\ \Sigma_{bb.a})$$
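The final result can be checked numerically via the definition of a conditional density, $p(x_b \mid x_a) = p(x_a, x_b)/p(x_a)$. A minimal sketch with illustrative numbers (the evaluation point and $\Sigma$ are made up for the check):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Check: the density N(mu_{b.a} + Sigma_ba Sigma_aa^{-1} x_a, Sigma_bb.a)
# equals p(x_a, x_b) / p(x_a) at an arbitrary point.  Note the conditional
# mean mu_{b.a} + Sigma_ba Sigma_aa^{-1} x_a
#   = mu_b + Sigma_ba Sigma_aa^{-1} (x_a - mu_a).
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.5, 0.2],
                  [0.3, 0.2, 1.0]])
m = 2
mu_a, mu_b = mu[:m], mu[m:]
Saa, Sab = Sigma[:m, :m], Sigma[:m, m:]
Sba, Sbb = Sigma[m:, :m], Sigma[m:, m:]

x_a = np.array([0.4, -0.2])   # arbitrary conditioning value
x_b = np.array([0.7])         # arbitrary evaluation point

Saa_inv = np.linalg.inv(Saa)
cond_mean = mu_b + Sba @ Saa_inv @ (x_a - mu_a)  # conditional mean
cond_cov = Sbb - Sba @ Saa_inv @ Sab             # Sigma_bb.a

lhs = multivariate_normal(cond_mean, cond_cov).pdf(x_b)
rhs = (multivariate_normal(mu, Sigma).pdf(np.concatenate([x_a, x_b]))
       / multivariate_normal(mu_a, Saa).pdf(x_a))
print(np.isclose(lhs, rhs))
```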