3. Method
Let a face image be $x\in X$. Given target facial structural information $c$, the goal is to learn a mapping $\mathcal{G}$ that transforms $x$ into an output image $\tilde{x}$.
The generated image must preserve the appearance of $x$ while reflecting the structural changes expressed by $c$, so $\mathcal{G}$ is decomposed into $\phi_{app}$ and $\mu_{str}$.
Specifically, a pose-invariant appearance representation $z=\phi_{app}(x, c)$ and a structural representation $y=\mu_{str}(c)$ are extracted.
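The data flow of this decomposition can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration only: the image `x` and the structure input `c` are flat lists of the same length, the bodies of `phi_app` and `mu_str` are trivial stand-ins, and the `decode` step that recombines $z$ and $y$ is a hypothetical placeholder that the notes above do not specify.

```python
from statistics import fmean


def phi_app(x, c):
    """Toy appearance encoder z = phi_app(x, c).

    Stand-in: the mean intensity of x. The argument c is accepted to match
    the signature z = phi_app(x, c), but this toy version does not use it.
    """
    return fmean(x)


def mu_str(c):
    """Toy structure encoder y = mu_str(c): center c around zero."""
    m = fmean(c)
    return [v - m for v in c]


def decode(z, y):
    """Hypothetical decoder: add the appearance level z to the structure map y."""
    return [z + v for v in y]


def G(x, c):
    """Toy mapping G(x, c): encode appearance and structure, then recombine."""
    z = phi_app(x, c)
    y = mu_str(c)
    return decode(z, y)


x = [0.2, 0.4, 0.6, 0.8]    # toy "image"
c = [0.0, 1.0, 0.0, -1.0]   # toy target structure map
x_tilde = G(x, c)
print(x_tilde)
```

In this toy setup the output keeps the mean intensity of `x` (the "appearance") while its spatial variation comes entirely from `c` (the "structure"), which mirrors the intended split of roles between $\phi_{app}$ and $\mu_{str}$.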
A conditional variational auto-encoder (C-VAE) serves as the baseline, and we consider the conditional data log-likelihood under $p(x\mid y)$.
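To make the role of this conditional likelihood concrete, the sketch below checks numerically that a variational lower bound of the standard C-VAE form (expected reconstruction log-likelihood minus a KL term) never exceeds $\log p(x\mid y)$ and becomes tight at the true posterior. The model is a toy of my own choosing, not the one used in the paper: $z\sim\mathcal{N}(0,1)$ and $x\mid z,y\sim\mathcal{N}(z+y,1)$, so that $x\mid y\sim\mathcal{N}(y,2)$ and everything has a closed form. The names `log_marginal` and `elbo` are hypothetical.

```python
import math


def log_marginal(x, y):
    """Exact log p(x|y) for the toy model, where x | y ~ N(y, 2)."""
    d = x - y
    return -0.5 * math.log(2.0 * math.pi * 2.0) - d * d / 4.0


def elbo(x, y, m, s2):
    """Closed-form lower bound for a Gaussian q(z) = N(m, s2).

    Bound = E_q[log p(x | z, y)] - KL(q(z) || p(z)), with p(z) = N(0, 1).
    """
    d = x - y
    # E_q[(x - y - z)^2] = (d - m)^2 + s2 when z ~ N(m, s2)
    recon = -0.5 * math.log(2.0 * math.pi) - 0.5 * ((d - m) ** 2 + s2)
    # KL divergence between N(m, s2) and N(0, 1)
    kl = 0.5 * (s2 + m * m - 1.0 - math.log(s2))
    return recon - kl


x, y = 1.3, 0.4
exact = log_marginal(x, y)
loose = elbo(x, y, m=0.0, s2=1.0)              # q equal to the prior
tight = elbo(x, y, m=(x - y) / 2.0, s2=0.5)    # q equal to the true posterior
print(exact, loose, tight)
```

With `q` set to the prior the bound is strictly below the exact value, and with `q` set to the true posterior $\mathcal{N}((x-y)/2,\,1/2)$ the two coincide, which is the behaviour one expects from any bound of this kind.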
$$\log p(x\mid y)\geqslant \mathbb{E}_q$$