IMPORTANCE WEIGHTED AUTOENCODERS
The main contribution of this paper is a tighter ELBO:
$$
\mathcal{L}_{k}(\mathbf{x})=\mathbb{E}_{\mathbf{h}_{1}, \ldots, \mathbf{h}_{k} \sim q(\mathbf{h} | \mathbf{x})}\left[\log \frac{1}{k} \sum_{i=1}^{k} \frac{p\left(\mathbf{x}, \mathbf{h}_{i}\right)}{q\left(\mathbf{h}_{i} | \mathbf{x}\right)}\right]
$$

where
$w_{i}=p\left(\mathbf{x}, \mathbf{h}_{i}\right) / q\left(\mathbf{h}_{i} | \mathbf{x}\right)$ is the $i$-th importance weight. Since $\log$ is concave, Jensen's inequality shows that
$$
\mathcal{L}_{k}=\mathbb{E}\left[\log \frac{1}{k} \sum_{i=1}^{k} w_{i}\right] \leq \log \mathbb{E}\left[\frac{1}{k} \sum_{i=1}^{k} w_{i}\right]=\log p(\mathbf{x})
$$

When $k=1$ this reduces to the standard VAE ELBO, and the bound becomes tighter as $k$ increases:
$$
\log p(\mathbf{x}) \geq \mathcal{L}_{k+1} \geq \mathcal{L}_{k}
$$

Note also that the inner average $\frac{1}{k} \sum_{i=1}^{k} w_{i}$ is itself an unbiased estimator of $p(\mathbf{x})$.
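To make the bound concrete, here is a minimal PyTorch sketch of the $\mathcal{L}_{k}$ estimator, assuming a diagonal-Gaussian $q(\mathbf{h}|\mathbf{x})$, a standard-normal prior, and a Bernoulli likelihood; `encoder` (returning `mu, log_sigma`) and `decoder` (returning Bernoulli logits) are hypothetical stand-ins, not the paper's exact architecture.

```python
import math
import torch
import torch.nn.functional as F

def iwae_bound(x, encoder, decoder, k):
    """Monte Carlo estimate of L_k(x), averaged over the batch."""
    mu, log_sigma = encoder(x)                       # parameters of q(h | x)
    eps = torch.randn(k, *mu.shape)                  # eps_1, ..., eps_k ~ N(0, I)
    h = mu + log_sigma.exp() * eps                   # reparameterized samples h_i
    # log w_i = log p(h_i) + log p(x | h_i) - log q(h_i | x), shape (k, batch)
    log_q = torch.distributions.Normal(mu, log_sigma.exp()).log_prob(h).sum(-1)
    log_prior = torch.distributions.Normal(0.0, 1.0).log_prob(h).sum(-1)
    log_lik = -F.binary_cross_entropy_with_logits(
        decoder(h), x.expand(k, *x.shape), reduction="none").sum(-1)
    log_w = log_prior + log_lik - log_q
    # log (1/k) sum_i w_i, evaluated stably in log space
    return (torch.logsumexp(log_w, dim=0) - math.log(k)).mean()
```

With `k=1` this is the usual single-sample ELBO estimate; increasing `k` only changes the number of noise samples drawn per input.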
The reparameterization trick can be applied just as in the standard VAE; the gradient estimator follows from

$$
\begin{aligned} \nabla_{\boldsymbol{\theta}} \mathcal{L}_{k}(\mathbf{x})=\nabla_{\boldsymbol{\theta}} \mathbb{E}_{\mathbf{h}_{1}, \ldots, \mathbf{h}_{k}}\left[\log \frac{1}{k} \sum_{i=1}^{k} w_{i}\right] &=\nabla_{\boldsymbol{\theta}} \mathbb{E}_{\boldsymbol{\epsilon}_{1}, \ldots, \boldsymbol{\epsilon}_{k}}\left[\log \frac{1}{k} \sum_{i=1}^{k} w\left(\mathbf{x}, \mathbf{h}\left(\mathbf{x}, \boldsymbol{\epsilon}_{i}, \boldsymbol{\theta}\right), \boldsymbol{\theta}\right)\right] \\ &=\mathbb{E}_{\boldsymbol{\epsilon}_{1}, \ldots, \boldsymbol{\epsilon}_{k}}\left[\nabla_{\boldsymbol{\theta}} \log \frac{1}{k} \sum_{i=1}^{k} w\left(\mathbf{x}, \mathbf{h}\left(\mathbf{x}, \boldsymbol{\epsilon}_{i}, \boldsymbol{\theta}\right), \boldsymbol{\theta}\right)\right] \\ &=\mathbb{E}_{\boldsymbol{\epsilon}_{1}, \ldots, \boldsymbol{\epsilon}_{k}}\left[\sum_{i=1}^{k} \widetilde{w}_{i} \nabla_{\boldsymbol{\theta}} \log w\left(\mathbf{x}, \mathbf{h}\left(\mathbf{x}, \boldsymbol{\epsilon}_{i}, \boldsymbol{\theta}\right), \boldsymbol{\theta}\right)\right] \end{aligned}
$$

where
$\widetilde{w}_{i}=w_{i} / \sum_{j=1}^{k} w_{j}$ are the normalized importance weights. The Monte Carlo estimate of the gradient is therefore
$$
\sum_{i=1}^{k} \widetilde{w}_{i} \nabla_{\boldsymbol{\theta}} \log w\left(\mathbf{x}, \mathbf{h}\left(\boldsymbol{\epsilon}_{i}, \mathbf{x}, \boldsymbol{\theta}\right), \boldsymbol{\theta}\right)
$$

This estimator has a REINFORCE-like form, so unlike the VAE the KL term can no longer be handled analytically.
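In practice the weighted form need not be implemented by hand: backpropagating through $\log \frac{1}{k} \sum_{i} w_{i}$ produces exactly the $\widetilde{w}_{i}$-weighted gradient, since the gradient of logsumexp is a softmax. A small numerical check of this identity, using a made-up $\log w$ as a function of $\boldsymbol{\theta}$:

```python
import torch

theta = torch.randn(3, requires_grad=True)
eps = torch.randn(5, 3)                        # k = 5 fixed noise draws
log_w = -((theta + eps) ** 2).sum(-1)          # toy stand-in for log w(x, h(eps_i, theta), theta)

# Autograd through the bound: grad of logsumexp(log_w) - log k
bound = torch.logsumexp(log_w, 0) - torch.log(torch.tensor(5.0))
g_auto, = torch.autograd.grad(bound, theta, retain_graph=True)

# Manual form: sum_i w~_i * grad log w_i, with w~ = softmax(log_w) held fixed
w_tilde = torch.softmax(log_w, 0).detach()
g_manual, = torch.autograd.grad((w_tilde * log_w).sum(), theta)

print(torch.allclose(g_auto, g_manual))        # True
```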
The paper also introduces multiple stochastic layers of latent variables:
$$
\begin{array}{c}{p(\mathbf{x} | \boldsymbol{\theta})=\sum_{\mathbf{z}^{1}, \ldots, \mathbf{z}^{L}} p\left(\mathbf{z}^{L} | \boldsymbol{\theta}\right) p\left(\mathbf{z}^{L-1} | \mathbf{z}^{L}, \boldsymbol{\theta}\right) \cdots p\left(\mathbf{x} | \mathbf{z}^{1}, \boldsymbol{\theta}\right)} \\ {q(\mathbf{z} | \mathbf{x})=q\left(\mathbf{z}^{1} | \mathbf{x}\right) q\left(\mathbf{z}^{2} | \mathbf{z}^{1}\right) \cdots q\left(\mathbf{z}^{L} | \mathbf{z}^{L-1}\right)}\end{array}
$$
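For intuition, here is a minimal sketch of the factorized posterior for $L=2$; the `GaussianLayer` module and the layer sizes are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GaussianLayer(nn.Module):
    """One stochastic layer: sample z ~ N(mu(inp), sigma(inp)^2) with its log-density."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.mu = nn.Linear(d_in, d_out)
        self.log_sigma = nn.Linear(d_in, d_out)

    def forward(self, inp):
        mu, sigma = self.mu(inp), self.log_sigma(inp).exp()
        z = mu + sigma * torch.randn_like(mu)      # reparameterized sample
        log_q = torch.distributions.Normal(mu, sigma).log_prob(z).sum(-1)
        return z, log_q

q1, q2 = GaussianLayer(784, 100), GaussianLayer(100, 50)
x = torch.rand(8, 784)
z1, log_q1 = q1(x)         # z1 ~ q(z1 | x)
z2, log_q2 = q2(z1)        # z2 ~ q(z2 | z1)
log_q = log_q1 + log_q2    # log q(z1, z2 | x), the denominator of the importance weight
```

The generative side chains the same way in reverse: $p(\mathbf{z}^{2})\, p(\mathbf{z}^{1} | \mathbf{z}^{2})\, p(\mathbf{x} | \mathbf{z}^{1})$.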
This hierarchical model comes up again later in the Ladder-VAE.