Variational inference aims to approximate $P(Z \mid X)$, the posterior distribution of the latent variables $Z$ given the observations $X$.
$$
\begin{aligned}
\log P(X) &= \log P(X, Z) - \log P(Z \mid X) \\
&= \log \frac{P(X, Z)}{q(Z)} - \log \frac{P(Z \mid X)}{q(Z)}
\end{aligned}
$$
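Take the expectation of both sides with respect to $q(Z)$. The left-hand side is unchanged, because $\log P(X)$ does not depend on $Z$: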
$$
\begin{aligned}
E_{q(Z)}[\log P(X)] &= \int q(Z) \log P(X) \, dZ \\
&= \log P(X) \int q(Z) \, dZ \\
&= \log P(X)
\end{aligned}
$$
Therefore,

$$
\begin{aligned}
\log P(X) &= \int q(Z) \log \frac{P(X, Z)}{q(Z)} \, dZ - \int q(Z) \log \frac{P(Z \mid X)}{q(Z)} \, dZ \\
&= \mathcal{L}(q) + KL(q \| p)
\end{aligned}
$$

Here $\mathcal{L}(q)$ is the first integral (the evidence lower bound), and $KL(q \| p) = -\int q(Z) \log \frac{P(Z \mid X)}{q(Z)} \, dZ \ge 0$ is the KL divergence between $q(Z)$ and the posterior $P(Z \mid X)$.
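The decomposition can be checked numerically on a toy problem. The sketch below assumes a single discrete latent variable with three states; the joint values and the choice of $q$ are made up purely for illustration.

```python
import math

# Hypothetical values of the joint P(X = x_obs, Z = z) for z = 0, 1, 2.
joint = [0.10, 0.25, 0.05]
evidence = sum(joint)                        # P(x_obs)
posterior = [p / evidence for p in joint]    # P(z | x_obs)

# An arbitrary variational distribution q(Z) (illustrative choice).
q = [0.5, 0.3, 0.2]

# L(q) = sum_z q(z) * log( P(x_obs, z) / q(z) )
elbo = sum(qz * math.log(pz / qz) for qz, pz in zip(q, joint))

# KL(q || p) = sum_z q(z) * log( q(z) / P(z | x_obs) )
kl = sum(qz * math.log(qz / pz) for qz, pz in zip(q, posterior))

log_evidence = math.log(evidence)
gap = abs(log_evidence - (elbo + kl))
print(log_evidence, elbo + kl, gap)
```

Because the KL term is non-negative, `elbo` never exceeds `log_evidence`, which is why $\mathcal{L}(q)$ is called a lower bound.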
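Readers unfamiliar with KL divergence can refer to the article 如何理解K-L散度(相对熵) ("How to Understand K-L Divergence (Relative Entropy)").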
For $q(Z)$ to approximate the posterior $P(Z \mid X)$ as closely as possible, we should minimize $KL(q \| p)$. Since $X$ is observed, $\log P(X)$ is a constant that does not depend on $q$. Minimizing $KL(q \| p)$ is therefore equivalent to maximizing $\mathcal{L}(q)$.
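To illustrate this equivalence, the following sketch scans a grid of candidate distributions for a binary latent variable and compares the maximizer of $\mathcal{L}(q)$ with the minimizer of $KL(q \| p)$. The joint values and the grid resolution are assumptions chosen for the example.

```python
import math

# Hypothetical joint P(x_obs, z) for a binary latent z; the posterior is [0.3, 0.7].
joint = [0.12, 0.28]
evidence = sum(joint)
posterior = [p / evidence for p in joint]

def elbo(q):
    """L(q) = sum_z q(z) * log( P(x_obs, z) / q(z) )."""
    return sum(qz * math.log(pz / qz) for qz, pz in zip(q, joint))

def kl(q):
    """KL(q || p) = sum_z q(z) * log( q(z) / P(z | x_obs) )."""
    return sum(qz * math.log(qz / pz) for qz, pz in zip(q, posterior))

# Candidate distributions q = [a, 1 - a] for a = 0.01, 0.02, ..., 0.99.
grid = [[a / 100, 1 - a / 100] for a in range(1, 100)]

best_by_elbo = max(grid, key=elbo)
best_by_kl = min(grid, key=kl)
print(best_by_elbo, best_by_kl)
```

Both searches select the same grid point, the one that coincides with the true posterior.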
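Assume that $q$ factorizes over the latent variables (the mean-field assumption):

$$
q(Z) = \prod_i q_i(Z_i)
$$

Substituting this into $\mathcal{L}(q)$ and singling out one factor $q_j$ gives: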
$$
\begin{aligned}
\mathcal{L}(q) &= \int q(Z) \left[ \log P(X, Z) - \log q(Z) \right] dZ \\
&= \int \prod_i q_i(Z_i) \log P(X, Z) \, dZ - \int \prod_i q_i(Z_i) \sum_k \log q_k(Z_k) \, dZ \\
&= \int q_j(Z_j) \left[ \int \prod_{i \ne j} q_i(Z_i) \log P(X, Z) \prod_{i \ne j} dZ_i \right] dZ_j - \sum_k \int q_k(Z_k) \log q_k(Z_k) \left[ \int \prod_{i \ne k} q_i(Z_i) \prod_{i \ne k} dZ_i \right] dZ_k \\
&= \int q_j(Z_j) \left[ \int \prod_{i \ne j} q_i(Z_i) \log P(X, Z) \prod_{i \ne j} dZ_i \right] dZ_j - \sum_i \int q_i(Z_i) \log q_i(Z_i) \, dZ_i \\
&= \int q_j(Z_j) E_{q_i(Z_i), i \ne j}[\log P(X, Z)] \, dZ_j - \int q_j(Z_j) \log q_j(Z_j) \, dZ_j - \sum_{i \ne j} \int q_i(Z_i) \log q_i(Z_i) \, dZ_i \\
&= \int q_j(Z_j) \log \widetilde{P}(X, Z_j) \, dZ_j - \int q_j(Z_j) \log q_j(Z_j) \, dZ_j - \sum_{i \ne j} \int q_i(Z_i) \log q_i(Z_i) \, dZ_i \\
&= -KL(q_j(Z_j) \| \widetilde{P}(X, Z_j)) - \sum_{i \ne j} \int q_i(Z_i) \log q_i(Z_i) \, dZ_i
\end{aligned}
$$
where we define

$$
\log \widetilde{P}(X, Z_j) = E_{q_i(Z_i), i \ne j}[\log P(X, Z)]
$$
The idea of CAVI (coordinate ascent variational inference) is to update one factor $q_j(Z_j)$ at a time while holding all other factors $q_i(Z_i), i \ne j$ fixed. The terms that involve only the other factors are then constant, so

$$
\mathcal{L}(q) = -KL(q_j(Z_j) \| \widetilde{P}(X, Z_j)) + \text{const}
$$
Maximizing $\mathcal{L}(q)$ with respect to $q_j$ therefore means driving this KL term to its minimum, which happens when $q_j(Z_j)$ matches $\widetilde{P}(X, Z_j)$ up to a normalizing constant, that is,

$$
q_j^*(Z_j) \propto \exp\left( E_{q_i(Z_i), i \ne j}[\log P(X, Z)] \right)
$$
To make $q_j^*$ a proper distribution, i.e. $\int q_j^*(Z_j) \, dZ_j = 1$ (a sum when $Z_j$ is discrete), we normalize the expression above and obtain

$$
q_j^*(Z_j) = \frac{\exp\left( E_{q_i(Z_i), i \ne j}[\log P(X, Z)] \right)}{\int \exp\left( E_{q_i(Z_i), i \ne j}[\log P(X, Z)] \right) dZ_j}
$$
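As a concrete instance of one such update, the sketch below assumes two binary latent variables $Z_1, Z_2$ and a small hypothetical table for $P(x_{obs}, Z_1, Z_2)$. It updates $q_1$ while $q_2$ is held fixed; in the discrete case the integral in the denominator becomes a sum.

```python
import math

# Hypothetical joint table: joint[a][b] = P(x_obs, Z1 = a, Z2 = b).
joint = [[0.30, 0.05],
         [0.05, 0.20]]

# Current factor q_2(Z_2), held fixed during the update of q_1.
q2 = [0.5, 0.5]

# log P~(x_obs, Z1 = a) = E_{q_2}[ log P(x_obs, Z1 = a, Z2) ]
log_p_tilde = [
    sum(q2[b] * math.log(joint[a][b]) for b in range(2))
    for a in range(2)
]

# Exponentiate and normalise to obtain the updated factor q_1^*(Z_1).
unnormalised = [math.exp(v) for v in log_p_tilde]
total = sum(unnormalised)
q1_star = [u / total for u in unnormalised]
print(q1_star)
```

Note that the expectation is taken over the log of the joint, so the result is a geometric rather than an arithmetic average of the joint over $Z_2$.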
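Cycling through these updates for every factor until convergence yields the approximation to the latent posterior that we set out to find.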
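The full iteration can be sketched as follows for a toy model with two binary latent variables. The joint table, the initial factors, and the stopping tolerance are all illustrative assumptions; this is a minimal sketch of the coordinate updates, not a general-purpose implementation.

```python
import math

# Hypothetical joint table: joint[a][b] = P(x_obs, Z1 = a, Z2 = b).
joint = [[0.30, 0.05],
         [0.05, 0.20]]
log_evidence = math.log(sum(sum(row) for row in joint))

def normalise(log_weights):
    """Exponentiate log-weights and normalise them to sum to one."""
    m = max(log_weights)                      # subtract the max for stability
    w = [math.exp(v - m) for v in log_weights]
    s = sum(w)
    return [x / s for x in w]

def elbo(q1, q2):
    """L(q) for the factorised q(Z1, Z2) = q1(Z1) * q2(Z2)."""
    return sum(
        q1[a] * q2[b] * (math.log(joint[a][b]) - math.log(q1[a] * q2[b]))
        for a in range(2) for b in range(2)
    )

q1, q2 = [0.6, 0.4], [0.5, 0.5]               # arbitrary initial factors
history = [elbo(q1, q2)]

for _ in range(100):
    # Update q1 with q2 fixed, then q2 with the new q1 fixed.
    q1 = normalise([sum(q2[b] * math.log(joint[a][b]) for b in range(2))
                    for a in range(2)])
    q2 = normalise([sum(q1[a] * math.log(joint[a][b]) for a in range(2))
                    for b in range(2)])
    history.append(elbo(q1, q2))
    if history[-1] - history[-2] < 1e-12:     # ELBO has stopped improving
        break

print(q1, q2, history[-1], log_evidence)
```

Each coordinate update cannot decrease $\mathcal{L}(q)$, so the recorded values are non-decreasing and stay below $\log P(X)$. The remaining gap is the KL divergence that the factorized family cannot remove, since the true posterior in this table is correlated.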
Reference
[1] Bishop, C. (2006). Pattern Recognition and Machine Learning. Springer, New York.