PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer (CVPR 2020)

3. PSGAN

3.1. Formulation

source image domain $X$, reference image domain $Y$

Domain $X$ contains $N$ samples $\{x^n\}_{n=1,\cdots,N}$, $x^n\in X$; domain $Y$ contains $M$ samples $\{y^m\}_{m=1,\cdots,M}$, $y^m\in Y$.

The data distribution of domain $X$ is $\mathcal{P}_X$, and that of domain $Y$ is $\mathcal{P}_Y$.

The learning target is a transfer function $G:\{x, y\}\rightarrow\tilde{x}$, such that $\tilde{x}$ carries the makeup style of $y$ while preserving the identity of $x$.

3.2. Framework

[Figure: overall PSGAN framework (Fig. 2 of the paper)]
Overall

The framework of PSGAN is shown in Fig. 2:

  1. Makeup distill network (MDNet): extracts the makeup style from the reference image $y$ as two components $\gamma, \beta$, called makeup matrices.
  2. Attentive makeup morphing module (AMM module): because the source image $x$ and the reference image $y$ may differ greatly in expression and pose, the AMM module morphs the two makeup matrices $\gamma, \beta$ into two new matrices $\gamma', \beta'$ that are adaptive to the source image, by considering the similarities between pixels of the source and the reference.
  3. Makeup apply network (MANet): applies $\gamma', \beta'$ to the bottleneck feature map of MANet, as sketched below.
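To make step 3 concrete, the morphed matrices act as a pixel-wise scale and shift on the source feature map. A minimal sketch (the tensor shapes and channel count are assumptions, not the authors' exact code):

```python
import torch

# Hypothetical shapes: bottleneck feature map of the source image and the
# morphed makeup matrices produced by the AMM module.
v_x = torch.randn(1, 256, 64, 64)      # V_x: (B, C, H, W); C=256 is an assumption
gamma_p = torch.randn(1, 1, 64, 64)    # gamma': (B, 1, H, W)
beta_p = torch.randn(1, 1, 64, 64)     # beta':  (B, 1, H, W)

# Pixel-wise affine modulation: gamma' scales and beta' shifts every channel
# at each spatial location (broadcast over the channel dimension).
v_x_new = gamma_p * v_x + beta_p
print(v_x_new.shape)                   # torch.Size([1, 256, 64, 64])
```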

Makeup distill network (MDNet)

MDNet adopts the encoder-bottleneck part of StarGAN (the bottleneck refers to the residual blocks). It is responsible for extracting the makeup-related features (e.g., lip gloss, eye shadow), which are represented as two makeup matrices $\gamma, \beta$.

As shown in Fig. 2(B), the output of MDNet is a feature map $\mathbf{V}_\mathbf{y}\in\mathbb{R}^{C\times H\times W}$, followed by two parallel 1x1 conv layers that produce $\gamma\in\mathbb{R}^{1\times H\times W}$ and $\beta\in\mathbb{R}^{1\times H\times W}$.
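A minimal sketch of this head, assuming C=256 and a 64x64 bottleneck resolution; only the structure (one shared feature map feeding two parallel 1x1 convolutions) comes from the description above:

```python
import torch
import torch.nn as nn

class MakeupHead(nn.Module):
    """Two parallel 1x1 convs mapping V_y (C x H x W) to gamma and beta (1 x H x W)."""
    def __init__(self, in_channels: int = 256):
        super().__init__()
        self.to_gamma = nn.Conv2d(in_channels, 1, kernel_size=1)
        self.to_beta = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, v_y: torch.Tensor):
        return self.to_gamma(v_y), self.to_beta(v_y)

v_y = torch.randn(1, 256, 64, 64)          # assumed C=256, H=W=64
gamma, beta = MakeupHead(256)(v_y)
print(gamma.shape, beta.shape)             # torch.Size([1, 1, 64, 64]) each
```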

Attentive makeup morphing module (AMM module)

Because the expression and pose of the source image $x$ and the reference image $y$ can differ greatly, $\gamma, \beta$ cannot be applied to the source image $x$ directly.
Q: can we assume that $\gamma, \beta$ still contain information such as the expression and pose of the reference image $y$?

The AMM module computes an attentive matrix $A\in\mathbb{R}^{HW\times HW}$ to specify how a pixel in the source image $x$ is morphed from the pixels in the reference image $y$, where $A_{i,j}$ indicates the attentive value between the $i$-th pixel $x_i$ in image $x$ and the $j$-th pixel $y_j$ in image $y$.
Intuition: suppose position $i$ in $x$ is at the corner of an eye and position $j$ in $y$ is also at the corner of an eye; then $A_{i,j}$ should be relatively large, meaning that the pixel at position $i$ of $\tilde{x}$ should draw on the pixel at position $j$ of $y$, which is exactly what good eye-shadow transfer requires.
(One drawback: since $H$ and $W$ are flattened into a single $HW$ dimension, some spatial information is inevitably lost.)

68 facial landmarks are introduced as anchor points.
Take the landmark at the tip of the nose as an example: for every position $i$ of $x$, compute the (signed) horizontal and vertical offsets from position $i$ to the nose tip, giving a 2-dimensional vector; doing this for all 68 landmarks yields a 136-dimensional vector $\mathbf{p}_i\in\mathbb{R}^{136}$, $i=1,\cdots,H\times W$, called the relative position features:
$$
\begin{aligned}
\mathbf{p} = [\,&f(x_i)-f(l_1), f(x_i)-f(l_2), \cdots, f(x_i)-f(l_{68}), \\
&g(x_i)-g(l_1), g(x_i)-g(l_2), \cdots, g(x_i)-g(l_{68})\,] \qquad (1)
\end{aligned}
$$
where $f(\cdot)$ and $g(\cdot)$ indicate the coordinates on the $x$ and $y$ axes, and $l_i$ indicates the $i$-th facial landmark.
Note: the dimension of $\mathbf{p}$ as a whole should be $H\times W\times 136$ (Eq. (1) gives the 136-dim vector for a single pixel $x_i$).

Since these are landmarks, differences in face size are unavoidable, so $\mathbf{p}$ is normalized to unit length, i.e. $\frac{\mathbf{p}}{\left\|\mathbf{p}\right\|}$ is used (why not rescale the coordinates into $[0, 1]$ instead?).
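A small sketch of Eq. (1) plus the normalization in code; the landmark array layout and the flattening order are assumptions:

```python
import numpy as np

H, W = 64, 64
landmarks = np.random.rand(68, 2) * [W, H]   # hypothetical (x, y) coordinates of 68 landmarks

# Pixel-grid coordinates; f(.) is the x coordinate, g(.) is the y coordinate.
ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float64)   # (H*W, 2)

# Relative position features of Eq. (1): signed offsets to every landmark,
# x-offsets first, then y-offsets -> one 136-dim vector per pixel.
dx = coords[:, 0:1] - landmarks[:, 0]        # (H*W, 68)
dy = coords[:, 1:2] - landmarks[:, 1]        # (H*W, 68)
p = np.concatenate([dx, dy], axis=1)         # (H*W, 136)

# Normalize each pixel's vector to compensate for face-size differences.
p = p / (np.linalg.norm(p, axis=1, keepdims=True) + 1e-8)
print(p.shape)                               # (4096, 136)
```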

Moreover, to avoid unreasonably sampling pixels that have similar relative positions but different semantics, the visual similarities between pixels are also taken into account.

Fig. 2(C) gives an example of this.
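Putting the pieces together, a rough sketch of how the attentive matrix could be formed from visual plus relative-position features and then used to morph $\gamma, \beta$ into $\gamma', \beta'$. The concatenation, the plain softmax, and all shapes are assumptions based on the description above, not the released code; the paper further restricts attention to pixels within the facial regions, which this sketch omits:

```python
import torch
import torch.nn.functional as F

HW, C, D = 64 * 64, 256, 136

# Flattened visual features of source/reference and their relative position features.
v_x, v_y = torch.randn(HW, C), torch.randn(HW, C)
p_x, p_y = torch.randn(HW, D), torch.randn(HW, D)

# Concatenate visual and relative-position features, then compute pairwise similarity.
f_x = torch.cat([v_x, p_x], dim=1)       # (HW, C + 136)
f_y = torch.cat([v_y, p_y], dim=1)
A = F.softmax(f_x @ f_y.t(), dim=1)      # (HW, HW); each row sums to 1 over reference pixels

# Morph the makeup matrices: every source pixel gathers gamma/beta from reference pixels.
gamma = torch.randn(HW, 1)               # gamma flattened from (1, H, W) to (HW, 1)
beta = torch.randn(HW, 1)
gamma_p, beta_p = A @ gamma, A @ beta    # gamma', beta' for the source image
print(gamma_p.shape, beta_p.shape)       # torch.Size([4096, 1]) each
```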

【Source code】
Labels provided by the face parser tool:
0: background, 1: face, 2: left-eyebrow, 3: right-eyebrow,
4: left-eye, 5: right-eye, 6: nose, 7: upper-lip, 8: teeth,
9: under-lip, 10: hair, 11: left-ear, 12: right-ear, 13: neck
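Given these label IDs, the LIP_CLASS: [7, 9] and FACE_CLASS: [1, 6] entries in the config dump further below presumably select the lip and face/nose regions from the parsing map. A minimal sketch of building such binary masks (the region_mask helper is hypothetical, not from the repo):

```python
import numpy as np

# Hypothetical face-parsing output: one integer label per pixel, using the IDs above.
parse_map = np.random.randint(0, 14, size=(256, 256))

def region_mask(parse_map: np.ndarray, class_ids) -> np.ndarray:
    """Binary mask that is 1 wherever the parsing label belongs to class_ids."""
    return np.isin(parse_map, class_ids).astype(np.float32)

lip_mask = region_mask(parse_map, [7, 9])    # upper-lip + under-lip (cf. LIP_CLASS below)
face_mask = region_mask(parse_map, [1, 6])   # face + nose (cf. FACE_CLASS below)
print(lip_mask.sum(), face_mask.sum())
```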

Walking through the source code

Run the demo inference:

python demo.py
python demo.py --device cuda --speed	# use the GPU and measure inference time
【demo.py】
args = parser.parse_args(); after execution, args contains the following:
args.config_file = 'configs/base.yaml'
args.device = 'cpu'
args.model_path = 'assets/models/G.pth'
args.opts = []
args.reference_dir = 'assets/images/makeup'
args.source_path = './assets/images/non-makeup/xfsy_0106.png'
args.speed = False

config = setup_config(args); after execution, config is of type fvcore.common.config.CfgNode and prints as follows:
DATA:
  BATCH_SIZE: 1
  IMG_SIZE: 256
  NUM_WORKERS: 4
  PATH: ./data
LOG:
  LOG_PATH: log/
  LOG_STEP: 8
  SNAPSHOT_PATH: snapshot/
  SNAPSHOT_STEP: 1024
  VIS_PATH: visulization/
  VIS_STEP: 2048
LOSS:
  LAMBDA_A: 10.0
  LAMBDA_B: 10.0
  LAMBDA_CLS: 1
  LAMBDA_EYE: 1
  LAMBDA_HIS: 1
  LAMBDA_HIS_EYE: 1
  LAMBDA_HIS_LIP: 1
  LAMBDA_HIS_SKIN: 0.1
  LAMBDA_IDT: 0.5
  LAMBDA_REC: 10
  LAMBDA_SKIN: 0.1
  LAMBDA_VGG: 0.005
MODEL:
  D_CONV_DIM: 64
  D_REPEAT_NUM: 3
  G_CONV_DIM: 64
  G_REPEAT_NUM: 6
  NORM: SN
  WEIGHTS: assets/models
POSTPROCESS:
  WILL_DENOISE: False
PREPROCESS:
  DOWN_RATIO: 0.23529411764705885
  FACE_CLASS: [1, 6]
  LANDMARK_POINTS: 68
  LIP_CLASS: [7, 9]
  UP_RATIO: 0.7058823529411765
  WIDTH_RATIO: 0.23529411764705885
TRAINING:
  BETA1: 0.5
  BETA2: 0.999
  C_DIM: 2
  D_LR: 0.0002
  G_LR: 0.0002
  G_STEP: 1
  NUM_EPOCHS: 50
  NUM_EPOCHS_DECAY: 0
The parameters above come from psgan/config.py, configs/base.yaml, and args.
【psgan/inference.py】
source_input, face, crop_face = self.preprocess(source)
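The walkthrough stops at this preprocessing call. For context, here is a heavily hedged sketch of how demo.py presumably drives the Inference class end to end; the constructor and transfer method signatures and the reference file name are assumptions, so check them against the actual repo:

```python
from PIL import Image
from psgan import Inference            # import path assumed from demo.py

# config and args are the objects shown above (setup_config(args) and parse_args()).
inference = Inference(config, args.device, args.model_path)   # constructor signature assumed

source = Image.open(args.source_path).convert("RGB")
reference = Image.open("assets/images/makeup/XMY-014.png").convert("RGB")  # hypothetical file

# preprocess() (the line quoted above) crops and aligns the face and prepares the
# parsing map / landmarks; the generator then transfers the reference makeup.
result = inference.transfer(source, reference)   # method name assumed
result.save("transferred.png")
```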