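【一】 Latent Variable Model
For example: suppose the observed values for a person are [charity work, exercise, strong execution], while the corresponding unobserved values are [kindness, perseverance, erudition]. This is a cause-and-effect relationship, [kindness, perseverance, erudition] ⇒ [charity work, exercise, strong execution]: the "cause" is the latent variable and the "effect" is the observation.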
- Complete Case ($X$, $Z$ known; $\theta$ unknown) -- MLE
$$\ell(\theta; D) = \log P(X, Z \mid \theta) = \log P(Z \mid \theta_{z}) + \log P(X \mid Z, \theta_{x})$$
- Incomplete Case ($X$ known; $Z$, $\theta$ unknown) -- EM
$$\ell(\theta; D) = \log \sum_{Z} P(X, Z \mid \theta) = \log \sum_{Z} P(Z \mid \theta_{z}) P(X \mid Z, \theta_{x})$$
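To make the two cases concrete, here is a minimal sketch that evaluates both log-likelihoods for a two-component 1D Gaussian mixture. The mixture model, the parameter values, and the helper names (`log_normal_pdf`, `complete_log_likelihood`, `incomplete_log_likelihood`) are assumptions chosen for illustration; they do not come from the text above.

```python
import numpy as np

def log_normal_pdf(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    return -0.5 * np.log(2 * np.pi * std**2) - (x - mean) ** 2 / (2 * std**2)

def complete_log_likelihood(x, z, weights, means, stds):
    """log P(X, Z | theta) = log P(Z | theta_z) + log P(X | Z, theta_x)."""
    log_p_z = np.log(weights[z])                       # each sample's own label is known
    log_p_x_given_z = log_normal_pdf(x, means[z], stds[z])
    return float(np.sum(log_p_z + log_p_x_given_z))

def incomplete_log_likelihood(x, weights, means, stds):
    """log sum_Z P(Z | theta_z) P(X | Z, theta_x), summed over samples."""
    # Shape (n_samples, n_components): log of every joint term P(x_j, Z = k).
    log_joint = np.log(weights) + log_normal_pdf(x[:, None], means, stds)
    # Log-sum-exp over the components for numerical stability.
    m = log_joint.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))))

# Hypothetical parameters and synthetic data.
rng = np.random.default_rng(0)
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 3.0])
stds = np.array([1.0, 1.5])
z = rng.choice(2, size=200, p=weights)   # latent labels
x = rng.normal(means[z], stds[z])        # observations

ll_complete = complete_log_likelihood(x, z, weights, means, stds)
ll_incomplete = incomplete_log_likelihood(x, weights, means, stds)
print(ll_complete, ll_incomplete)
```

In the complete case the log splits into a sum of simple terms, so each parameter block can be fitted separately. In the incomplete case the sum over $Z$ sits inside the log, which is what makes direct maximization awkward and motivates EM.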
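【二】 Expectation Maximization (EM Algorithm)
[Unsupervised] An iterative algorithm for maximum likelihood estimation of the parameters of a probabilistic model that contains latent variables (E: compute an expectation + M: maximize it).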
- Observed data & unobserved data
$$Y = (Y_{1}, Y_{2}, \ldots, Y_{n})^{T} \qquad Z = (Z_{1}, Z_{2}, \ldots, Z_{n})^{T}$$
- Likelihood function of the observed data (MLE)
$$L(\theta) = P(Y \mid \theta) = \sum_{Z} P(Z \mid \theta) P(Y \mid Z, \theta)$$
- Maximum log-likelihood estimate of the model parameter $\theta$
$$\hat{\theta} = \arg\max_{\theta} \, \log P(Y \mid \theta)$$
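As an illustration of the estimate $\hat{\theta} = \arg\max_{\theta} \log P(Y \mid \theta)$, the sketch below evaluates the observed-data log-likelihood on a parameter grid and picks the best point. The two-coin setup (each trial silently picks coin A or B with probability 0.5, then tosses it 10 times), the grid resolution, and the function names are assumptions made for this example only.

```python
import math
import numpy as np

M_TOSSES = 10  # tosses per trial (assumed)

def binom_pmf(y, p, m=M_TOSSES):
    """P(y heads in m tosses | head probability p), vectorized over y."""
    coef = np.array([math.comb(m, int(k)) for k in y], dtype=float)
    return coef * p**y * (1.0 - p) ** (m - y)

def log_likelihood(y, p_a, p_b, pi=0.5):
    """log P(Y | theta) = sum_j log sum_Z P(Z | theta) P(y_j | Z, theta)."""
    mix = pi * binom_pmf(y, p_a) + (1.0 - pi) * binom_pmf(y, p_b)
    return float(np.sum(np.log(mix)))

# Synthetic data: z is the hidden coin choice and is never shown to the estimator.
rng = np.random.default_rng(1)
z = rng.integers(0, 2, size=300)
y = rng.binomial(M_TOSSES, np.where(z == 0, 0.3, 0.8))

# Brute-force arg max over a coarse grid of (p_a, p_b).
grid = np.linspace(0.05, 0.95, 19)
scores = np.array([[log_likelihood(y, a, b) for b in grid] for a in grid])
i, j = np.unravel_index(np.argmax(scores), scores.shape)
theta_hat = (grid[i], grid[j])
best_ll = scores[i, j]
print(theta_hat, best_ll)
```

A grid search is only workable for a handful of parameters, and this likelihood is symmetric under swapping the two coins, so two grid points tie for the maximum. EM replaces the search with iterative updates.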
【三】 Deriving the EM Algorithm by Hand (must master)
- Input: observed data $Y$, latent data $Z$, joint distribution $P(Y, Z \mid \theta)$, conditional distribution $P(Z \mid Y, \theta)$
Output: model parameter $\theta$
- Derivation (here $L(\theta) = \log P(Y \mid \theta)$ denotes the log-likelihood, and the superscript in $\theta^{n}$ is the iteration index, not a power):

The next iterate $\theta^{n+1}$ should make the gain $L(\theta) - L(\theta^{n})$ over the current estimate as large as possible. Expanding the gain:

$$
\begin{aligned}
L(\theta) - L(\theta^{n})
&= \log P(Y \mid \theta) - \log P(Y \mid \theta^{n}) \\
&= \log \sum_{Z} P(Y \mid Z, \theta) P(Z \mid \theta) - \log P(Y \mid \theta^{n}) \\
&= \log \sum_{Z} P(Y \mid Z, \theta) P(Z \mid \theta) \cdot \frac{P(Z \mid Y, \theta^{n})}{P(Z \mid Y, \theta^{n})} - \log P(Y \mid \theta^{n}) \\
&= \log \sum_{Z} P(Z \mid Y, \theta^{n}) \cdot \frac{P(Y \mid Z, \theta) P(Z \mid \theta)}{P(Z \mid Y, \theta^{n})} - \log P(Y \mid \theta^{n}) \\
&\geq \sum_{Z} P(Z \mid Y, \theta^{n}) \cdot \log \frac{P(Y \mid Z, \theta) P(Z \mid \theta)}{P(Z \mid Y, \theta^{n})} - \log P(Y \mid \theta^{n}) \\
&= \triangle(\theta \mid \theta^{n})
\end{aligned}
\tag{1}
$$

The $\geq$ step is Jensen's inequality: $\log$ is concave and the weights $P(Z \mid Y, \theta^{n})$ sum to 1 over $Z$, so the log of the weighted average is at least the weighted average of the logs. The inequality bounds the gain itself; it is not a relation between two $\arg\max$ values.

$$
\therefore \;\; L(\theta) - L(\theta^{n}) \geq \triangle(\theta \mid \theta^{n}) \;\;\Rightarrow\;\; \text{[maximize the lower bound]} \;\; L(\theta) \geq L(\theta^{n}) + \triangle(\theta \mid \theta^{n}) \tag{2}
$$

$$
\begin{aligned}
\therefore \;\; \theta^{n+1}
&= \arg\max_{\theta} \, \big[\, L(\theta^{n}) + \triangle(\theta \mid \theta^{n}) \,\big] \\
&= \arg\max_{\theta} \, \Big[\, L(\theta^{n}) + \sum_{Z} P(Z \mid Y, \theta^{n}) \cdot \log \frac{P(Y \mid Z, \theta) P(Z \mid \theta)}{P(Z \mid Y, \theta^{n})} - \log P(Y \mid \theta^{n}) \,\Big] \\
&= \arg\max_{\theta} \, \Big[\, L(\theta^{n}) + \sum_{Z} P(Z \mid Y, \theta^{n}) \cdot \log \frac{P(Y \mid Z, \theta) P(Z \mid \theta)}{P(Z \mid Y, \theta^{n}) \, P(Y \mid \theta^{n})} \,\Big] \\
&= \arg\max_{\theta} \, \Big[\, \sum_{Z} P(Z \mid Y, \theta^{n}) \cdot \log P(Y \mid Z, \theta) P(Z \mid \theta) \,\Big] \\
&= \arg\max_{\theta} \, \Big[\, \sum_{Z} P(Z \mid Y, \theta^{n}) \cdot \log P(Y, Z \mid \theta) \,\Big] \\
&= \arg\max_{\theta} \, \Big[\, E_{Z \mid Y,\, \theta^{n}} \big[\, \log P(Y, Z \mid \theta) \,\big] \,\Big]
\end{aligned}
\tag{3}
$$

Moving $\log P(Y \mid \theta^{n})$ inside the sum is valid because $\sum_{Z} P(Z \mid Y, \theta^{n}) = 1$. The fourth line drops every term that does not depend on $\theta$ (namely $L(\theta^{n})$ and the denominator), since constants do not change the $\arg\max$.
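The bound in (2) can be checked numerically. The sketch below does so for a hypothetical two-coin mixture (a hidden coin is chosen with probability $\pi$, then tossed 10 times); the model, the parameter triple `(pi, p_a, p_b)`, and the function names are assumptions for illustration. The binomial coefficient is omitted because it is constant in $\theta$ and cancels in both $L(\theta) - L(\theta^{n})$ and $\triangle(\theta \mid \theta^{n})$.

```python
import numpy as np

M_TOSSES = 10  # tosses per trial (assumed setup, not from the text)

def log_joint(y, theta):
    """log P(y_j, Z = k | theta) up to the binomial coefficient; shape (n, 2)."""
    pi, p_a, p_b = theta
    prior = np.array([pi, 1.0 - pi])
    heads = np.array([p_a, p_b])
    yy = y[:, None]
    return np.log(prior) + yy * np.log(heads) + (M_TOSSES - yy) * np.log(1.0 - heads)

def log_lik(y, theta):
    """L(theta) = sum_j log sum_Z P(y_j, Z | theta), up to an additive constant."""
    return float(np.log(np.exp(log_joint(y, theta)).sum(axis=1)).sum())

def delta(y, theta, theta_n):
    """Delta(theta | theta_n) as defined in equation (1)."""
    joint_n = np.exp(log_joint(y, theta_n))
    post = joint_n / joint_n.sum(axis=1, keepdims=True)   # P(Z | Y, theta_n)
    bound = np.sum(post * (log_joint(y, theta) - np.log(post)))
    return float(bound - log_lik(y, theta_n))

rng = np.random.default_rng(2)
y = rng.binomial(M_TOSSES, np.where(rng.random(100) < 0.5, 0.3, 0.8))
theta_n = (0.5, 0.4, 0.6)   # current estimate

gaps = []
for _ in range(50):
    theta = tuple(rng.uniform(0.05, 0.95, size=3))        # random candidate
    gain = log_lik(y, theta) - log_lik(y, theta_n)
    gaps.append(gain - delta(y, theta, theta_n))          # non-negative by (2)

print(min(gaps), delta(y, theta_n, theta_n))
```

Every gap should be non-negative up to rounding, and $\triangle(\theta^{n} \mid \theta^{n})$ should be zero: the bound touches the log-likelihood at the current estimate, so any $\theta$ that raises the bound also raises $L$.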
- E-Step (first handle $Z$): with $\theta^{n}$ fixed, compute the posterior $P(Z \mid Y, \theta^{n})$ and with it the expectation (the bold part)
$$\arg\max_{\theta} \, \Big[\, \bm{E_{Z \mid Y,\, \theta^{n}}} \big[\, \log P(Y, Z \mid \theta) \,\big] \,\Big] \tag{E}$$
- M-Step (then solve for $\theta$): maximize the expected complete-data log-likelihood (the bold part) over $\theta$ to obtain $\theta^{n+1}$
$$\arg\max_{\theta} \, \Big[\, E_{Z \mid Y,\, \theta^{n}} \bm{\big[\, \log P(Y, Z \mid \theta) \,\big]} \,\Big] \tag{M}$$
- Repeat the E-Step and M-Step until convergence; the log-likelihood never decreases along the way ($L(\theta^{1}) \leq L(\theta^{2}) \leq L(\theta^{3}) \leq \ldots \leq L(\theta^{n})$)
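The loop above can be sketched end to end for a small model. This is a minimal illustration, assuming a two-coin mixture: each trial secretly picks coin A with probability `pi` (otherwise coin B), and only the number of heads in 10 tosses is observed. The model, the starting values, the stopping tolerance, and the name `em_two_coins` are all assumptions. The M-step uses the closed-form maximizers for this particular model, which come from setting the derivative of the expected complete-data log-likelihood to zero.

```python
import numpy as np

M_TOSSES = 10  # tosses per trial (assumed setup)

def log_joint(y, pi, heads):
    """log P(y_j, Z = k | theta) up to the binomial coefficient; shape (n, 2)."""
    prior = np.array([pi, 1.0 - pi])
    yy = y[:, None]
    return np.log(prior) + yy * np.log(heads) + (M_TOSSES - yy) * np.log(1.0 - heads)

def em_two_coins(y, pi=0.5, heads=(0.4, 0.6), max_iter=200, tol=1e-8):
    """Run EM and return (pi, heads, log-likelihood history)."""
    heads = np.asarray(heads, dtype=float)
    history = []
    for _ in range(max_iter):
        # E-step: posterior P(Z | Y, theta_n) for every trial.
        joint = np.exp(log_joint(y, pi, heads))
        marginal = joint.sum(axis=1, keepdims=True)
        post = joint / marginal
        # Log-likelihood at the current parameters (constant term omitted).
        history.append(float(np.log(marginal).sum()))
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
        # M-step: maximize E_{Z|Y,theta_n}[log P(Y, Z | theta)] in closed form.
        weight = post.sum(axis=0)                       # expected trials per coin
        pi = weight[0] / len(y)
        heads = (post * y[:, None]).sum(axis=0) / (M_TOSSES * weight)
    return pi, heads, history

# Synthetic data from hypothetical true parameters (0.5, 0.3, 0.8).
rng = np.random.default_rng(3)
z = rng.random(300) < 0.5
y = rng.binomial(M_TOSSES, np.where(z, 0.3, 0.8))

pi_hat, heads_hat, history = em_two_coins(y)
print(pi_hat, heads_hat, len(history))
```

The `history` list records $L(\theta^{n})$ at each iteration, so plotting or printing it is a quick way to confirm that the sequence is non-decreasing. The starting values for the two coins are deliberately unequal: starting both at the same value leaves the posterior uniform and EM stays stuck at that symmetric point.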