Main contributions
we propose 3D-FM GAN, a novel conditional GAN framework designed specifically for 3D-controllable Face Manipulation, and does not require any tuning after the end-to-end learning phase.
Face manipulation built on a conditional GAN.
A StyleGAN conditional generator then takes in both the original image and the manipulated face rendering to synthesize the edited face.
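Uses a StyleGAN generator that takes both the real photo and the rendered face model as input.
Introduces two training strategies that preserve facial identity while keeping the face editable.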
Moreover, we develop two essential training strategies, reconstruction and disentangled training, to help our model gain abilities of identity preservation and 3D editability.
Also introduces a multiplicative co-modulation architecture to balance identity and editability.
As we find an interesting trade-off between identity and editability in the network structure and the simple encoding strategy is sub-optimal, we propose a novel multiplicative co-modulation architecture for our framework.
Method
Overall pipeline
the generator G, the face reconstruction network FR, and the renderer Rd.
Datasets
FFHQ. FFHQ [23] is a human face photo dataset, where most identities only have one corresponding image. For each of the training image P, we extract its render counterpart by R = Rd(FR(P)) to form the (P, R) pair.
Synthetic Dataset. We also require a dataset where each identity has multiple images with various attributes of expression, pose, and illumination. Such a dataset is crucial for model to perform learning for editing. While this kind of high-quality dataset is not publicly available, we leverage DiscoFaceGAN [10], Gd, to synthesize one as follows.
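The FFHQ pairing step R = Rd(FR(P)) can be sketched roughly as below. This is only a toy illustration: `build_pairs`, `toy_face_recon`, and `toy_render` are hypothetical names, and the stand-in functions replace the real face reconstruction network FR and renderer Rd, which are not specified in these notes.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]   # toy stand-in for an image (2D grid of floats)
Params = Dict[str, object]  # toy stand-in for estimated 3D face parameters


def build_pairs(
    photos: List[Image],
    face_recon: Callable[[Image], Params],
    render: Callable[[Params], Image],
) -> List[Tuple[Image, Image]]:
    """Form (P, R) pairs with R = Rd(FR(P)) for every training photo P."""
    pairs = []
    for photo in photos:
        params = face_recon(photo)  # FR(P): estimate face parameters
        rendering = render(params)  # Rd(.): render them back to an image
        pairs.append((photo, rendering))
    return pairs


# Hypothetical stand-ins for FR and Rd, for illustration only.
def toy_face_recon(photo: Image) -> Params:
    flat = [v for row in photo for v in row]
    return {"mean": sum(flat) / len(flat), "shape": (len(photo), len(photo[0]))}


def toy_render(params: Params) -> Image:
    height, width = params["shape"]
    return [[params["mean"]] * width for _ in range(height)]


photos = [[[0.0, 1.0], [1.0, 0.0]], [[0.2, 0.2], [0.4, 0.4]]]
pairs = build_pairs(photos, toy_face_recon, toy_render)
```

Passing FR and Rd in as callables keeps the pairing logic independent of whichever reconstruction network and renderer are actually used.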
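Training strategies
Disentangled training uses a content loss, which enforces consistency between the generated output and the input condition.
Ablation experiments show that using both strategies together preserves facial identity better while retaining editability.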
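Architecture
The encoders cover three latent spaces.
Separate modulation: the photo and the rendering are fed into different encoders.
Mixed modulation: the photo and the rendering are fed into the W and W+ encoders, and the results are fused by element-wise multiplication.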
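The element-wise fusion mentioned for mixed modulation could look roughly like the sketch below. It is a minimal illustration under assumptions: the function names are hypothetical, the codes are plain Python lists rather than tensors, and broadcasting a single W code across every layer of a W+ code is an assumption made here, not something these notes confirm about the paper's architecture.

```python
from typing import List

Vector = List[float]


def fuse_multiplicative(code_a: Vector, code_b: Vector) -> Vector:
    """Fuse two latent codes of equal length by element-wise multiplication."""
    if len(code_a) != len(code_b):
        raise ValueError("latent codes must have the same dimension")
    return [a * b for a, b in zip(code_a, code_b)]


def fuse_per_layer(w_plus: List[Vector], w: Vector) -> List[Vector]:
    """Multiply each per-layer code in w_plus by one shared code w.

    Assumption: the single W code is reused for every layer.
    """
    return [fuse_multiplicative(layer_code, w) for layer_code in w_plus]


# Toy example: two layers, latent dimension 2.
w_plus = [[1.0, 2.0], [3.0, 4.0]]
w = [0.5, 2.0]
fused = fuse_per_layer(w_plus, w)
```

With multiplication, one code rescales the other channel by channel, so a value near 1 leaves that channel mostly unchanged while values far from 1 shift it strongly.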
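Experiments
Evaluation metrics
Architecture comparison
Co-modulation with three encoders performs best.
Controllable face synthesis results
Face editing with disentangled attributes
Attribute transfer from other images
Face reenactment
Editing of portrait paintings
Comparison with other methods
Metric comparison
Visual comparison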
DiscoFaceGAN
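Other GAN models
Face frontalization & relighting
Limitations
Hair and wrinkles cannot be controlled.
Discrepancies caused by inaccurate 3DMM parameter estimation.
Some bias introduced by the synthetic dataset.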