Machine Learning: Whiteboard Derivations P5_6
P-PCA (Probabilistic PCA)
$x \in R^p, \quad z \in R^q, \quad q < p$
$x$: observed data
$z$: latent variable
The goal of dimensionality reduction is to go from $p$ dimensions down to $q$ dimensions.
Assume a prior on $z$:

$$z \sim N(0_q, I_q)$$
$$x = wz + \mu + \epsilon$$
$$\epsilon \sim N(0, \sigma^2 I_p)$$

$$\sigma^2 I_p = \begin{bmatrix} \sigma^2 & 0 & \dots & 0 \\ 0 & \sigma^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma^2 \end{bmatrix} \quad \text{(isotropic)}$$
This is a Linear Gaussian Model.
P-PCA splits into two problems:

$$\text{P-PCA} = \begin{cases} \text{Inference:} & p(z|x) \\ \text{Learning:} & w, \mu, \sigma^2 \rightarrow \text{EM} \end{cases}$$
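As a quick illustration of the generative story (and of the conditional $x|z$ derived next), here is a minimal NumPy sketch; the dimensions, the loading matrix `W`, `mu`, and `sigma2` are all made-up values for the example, not anything fixed by the model:

```python
import numpy as np

rng = np.random.default_rng(0)

p, q, n = 5, 2, 10_000            # observed dim, latent dim, sample count (made-up)
W = rng.normal(size=(p, q))       # loading matrix w (hypothetical values)
mu = rng.normal(size=p)           # mean vector mu
sigma2 = 0.1                      # isotropic noise variance sigma^2

z = rng.normal(size=(n, q))                           # z ~ N(0, I_q)
eps = rng.normal(scale=np.sqrt(sigma2), size=(n, p))  # eps ~ N(0, sigma^2 I_p), independent of z
x = z @ W.T + mu + eps                                # x = w z + mu + eps
```

Each row of `x` is one sample; conditioned on its row of `z`, it is exactly a draw from $N(wz+\mu, \sigma^2 I)$.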
Model assumptions and the conditional distribution of $x$ given $z$:

$$\begin{cases} z \sim N(0, I) \\ x = wz + \mu + \epsilon \\ \epsilon \sim N(0, \sigma^2 I) \\ \epsilon \perp z \end{cases}$$

$$E[x|z] = E[wz + \mu + \epsilon] = wz + \mu$$

$$Var[x|z] = Var[wz + \mu + \epsilon] = \sigma^2 I$$

$$x|z \sim N(wz + \mu, \sigma^2 I)$$
$$E[x] = E[wz + \mu + \epsilon] = E[wz + \mu] + E[\epsilon] = \mu$$
$$Var[x] = Var[wz + \mu + \epsilon] = Var[wz] + Var[\epsilon] = wIw^T + \sigma^2 I = ww^T + \sigma^2 I$$
$$x \sim N(\mu, ww^T + \sigma^2 I)$$
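As a sanity check on these two results, one can compare Monte Carlo estimates of the moments of sampled $x$ against $\mu$ and $ww^T + \sigma^2 I$. This sketch reuses the same made-up parameters as before; the tolerance is loose because the empirical moments are noisy:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n = 5, 2, 200_000
W = rng.normal(size=(p, q))
mu = rng.normal(size=p)
sigma2 = 0.1

z = rng.normal(size=(n, q))
eps = rng.normal(scale=np.sqrt(sigma2), size=(n, p))
x = z @ W.T + mu + eps

# Empirical moments should match E[x] = mu and Var[x] = W W^T + sigma^2 I
print(np.allclose(x.mean(axis=0), mu, atol=0.1))                                     # True
print(np.allclose(np.cov(x, rowvar=False), W @ W.T + sigma2 * np.eye(p), atol=0.1))  # True
```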
Recall the formulas from earlier (conditioning a partitioned joint Gaussian):
$$x = \begin{bmatrix} x_a \\ x_b \end{bmatrix} \qquad \mu = \begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix} \qquad \Sigma = \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix}$$
Given $x \sim N(\mu, \Sigma)$, define:
$$x_{b.a} = x_b - \Sigma_{ba}\Sigma_{aa}^{-1}x_a$$

$$\mu_{b.a} = \mu_b - \Sigma_{ba}\Sigma_{aa}^{-1}\mu_a$$

$$\Sigma_{bb.a} = \Sigma_{bb} - \Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab} \qquad \text{(Schur complement)}$$
$$x_b = x_{b.a} + \Sigma_{ba}\Sigma_{aa}^{-1}x_a$$
$$E[x_b|x_a] = \mu_{b.a} + \Sigma_{ba}\Sigma_{aa}^{-1}x_a$$
$$Var[x_b|x_a] = Var[x_{b.a}] = \Sigma_{bb.a}$$
$$x_b|x_a \sim N(\mu_{b.a} + \Sigma_{ba}\Sigma_{aa}^{-1}x_a, \; \Sigma_{bb.a})$$
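These formulas translate directly into a small generic helper. The sketch below is my own illustration (the function name `gaussian_conditional` and its signature are not from the lecture); it implements exactly $E[x_b|x_a]$ and $Var[x_b|x_a]$ above:

```python
import numpy as np

def gaussian_conditional(mu, Sigma, a_idx, b_idx, x_a):
    """Return the mean and covariance of x_b | x_a for x ~ N(mu, Sigma).

    E[x_b|x_a]   = mu_b + Sigma_ba Sigma_aa^{-1} (x_a - mu_a)
    Var[x_b|x_a] = Sigma_bb - Sigma_ba Sigma_aa^{-1} Sigma_ab  (Schur complement)
    """
    mu_a, mu_b = mu[a_idx], mu[b_idx]
    S_aa = Sigma[np.ix_(a_idx, a_idx)]
    S_ab = Sigma[np.ix_(a_idx, b_idx)]
    S_ba = Sigma[np.ix_(b_idx, a_idx)]
    S_bb = Sigma[np.ix_(b_idx, b_idx)]
    gain = S_ba @ np.linalg.inv(S_aa)   # Sigma_ba Sigma_aa^{-1}
    cond_mean = mu_b + gain @ (x_a - mu_a)
    cond_cov = S_bb - gain @ S_ab       # Sigma_bb.a
    return cond_mean, cond_cov
```

For P-PCA, $x$ will play the role of $x_a$ and $z$ the role of $x_b$.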
Derivation: form the joint distribution of $x$ and $z$; the only unknown block is the cross-covariance $\Delta = Cov(x, z)$.
$$\begin{bmatrix} x \\ z \end{bmatrix} \sim N\left( \begin{bmatrix} \mu \\ 0 \end{bmatrix}, \begin{bmatrix} ww^T + \sigma^2 I & \Delta \\ \Delta^T & I \end{bmatrix} \right)$$

The diagonal blocks $Var[x] = ww^T + \sigma^2 I$ and $Var[z] = I$ are already known, so only $\Delta$ needs to be computed:
$$\begin{aligned} \Delta &= Cov(x,z) \\ &= E[(x-\mu)(z-0)^T] \\ &= E[(x-\mu)z^T] \\ &= E[(wz+\epsilon)z^T] \\ &= E[wzz^T + \epsilon z^T] \\ &= wE[zz^T] + E[\epsilon] \cdot E[z^T] \\ &= w \cdot I + 0 \\ &= w \end{aligned}$$

(The step $E[\epsilon z^T] = E[\epsilon] \cdot E[z^T]$ uses $\epsilon \perp z$, and $E[zz^T] = Var[z] = I$.)
Substituting $\Delta = w$:

$$\begin{bmatrix} x \\ z \end{bmatrix} \sim N\left( \begin{bmatrix} \mu \\ 0 \end{bmatrix}, \begin{bmatrix} ww^T + \sigma^2 I & w \\ w^T & I \end{bmatrix} \right)$$
$$\begin{bmatrix} x \\ z \end{bmatrix} \sim N(\hat{\mu}, \hat{\Sigma})$$
Applying the earlier conditioning formulas (with $x$ in the role of $x_a$ and $z$ in the role of $x_b$, so that $\Sigma_{ba} = w^T$ and $\Sigma_{aa} = ww^T + \sigma^2 I$) immediately yields $z|x$:

$$z|x \sim N\left( w^T(ww^T+\sigma^2 I)^{-1}(x-\mu), \; I - w^T(ww^T+\sigma^2 I)^{-1}w \right)$$
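To close the loop, here is a minimal sketch of that computation (the function name `ppca_posterior` and the toy parameters are made-up); it reads $E[z|x]$ and $Var[z|x]$ directly off the joint Gaussian above:

```python
import numpy as np

def ppca_posterior(x, W, mu, sigma2):
    """Posterior p(z|x) for P-PCA, from the joint Gaussian of (x, z).

    Treating x as the conditioned-on block (x_a) and z as x_b:
      E[z|x]   = W^T (W W^T + sigma^2 I)^{-1} (x - mu)
      Var[z|x] = I - W^T (W W^T + sigma^2 I)^{-1} W
    """
    p, q = W.shape
    C = W @ W.T + sigma2 * np.eye(p)   # Var[x] = W W^T + sigma^2 I
    gain = W.T @ np.linalg.inv(C)      # Sigma_ba Sigma_aa^{-1}, with Sigma_ba = W^T
    post_mean = gain @ (x - mu)
    post_cov = np.eye(q) - gain @ W    # Schur complement
    return post_mean, post_cov

# Toy check with made-up parameters
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 2))
mu = rng.normal(size=5)
m, S = ppca_posterior(rng.normal(size=5), W, mu, 0.1)
print(m.shape, S.shape)  # (2,) (2, 2)
```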