Landmark Assisted CycleGAN for Cartoon Face Generation

3. Our Method

3.1. Review of CycleGAN

Given unpaired training samples $x\in X$, $y\in Y$ from two domains, the adversarial loss for the mapping $G_{X\rightarrow Y}$ from $X$ to $Y$ and its discriminator $D_Y$ is defined as follows:
$$\mathcal{L}_{GAN}\left(G_{X\rightarrow Y}, D_Y\right) = \mathbb{E}_y\left[\log D_Y(y)\right] + \mathbb{E}_x\left[\log\left(1-D_Y\left(G_{X\rightarrow Y}(x)\right)\right)\right] \qquad (1)$$
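A minimal sketch of how Eq. (1) can be estimated on a minibatch, assuming the discriminator already outputs probabilities in (0, 1). The function name `adversarial_loss` and the `eps` guard against `log(0)` are illustrative assumptions, not part of the paper.

```python
import math

def adversarial_loss(d_real, d_fake, eps=1e-12):
    """Minibatch estimate of Eq. (1).

    d_real: discriminator outputs D_Y(y) for real cartoon samples y
    d_fake: discriminator outputs D_Y(G_{X->Y}(x)) for translated real faces x
    The discriminator maximizes this value; the generator minimizes it.
    """
    real_term = sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term
```

When $D_Y$ outputs 0.5 everywhere, the estimate equals $2\log 0.5 = -\log 4$; a perfect discriminator (1 on real, 0 on fake) pushes it toward its maximum of 0.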

CycleGAN learns both the forward and the backward mapping, and the cycle consistency loss is defined as:
$$\mathcal{L}_{cyc} = \left\| G_{Y\rightarrow X}\left(G_{X\rightarrow Y}(x)\right) - x \right\|_1 + \left\| G_{X\rightarrow Y}\left(G_{Y\rightarrow X}(y)\right) - y \right\|_1 \qquad (2)$$
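The sketch below evaluates Eq. (2) on flattened vectors, with plain Python callables standing in for the two generators. The toy mappings `g_xy`, `g_yx` and `g_yx_biased` are hypothetical and only show that exact inverses give zero loss while an imperfect inverse is penalized in both directions.

```python
def l1_norm(a, b):
    """||a - b||_1 for two equal-length vectors."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def cycle_consistency_loss(g_xy, g_yx, x, y):
    """Eq. (2): reconstruction error after a round trip in each direction."""
    forward = l1_norm(g_yx(g_xy(x)), x)   # X -> Y -> X
    backward = l1_norm(g_xy(g_yx(y)), y)  # Y -> X -> Y
    return forward + backward

# Hypothetical toy generators on vectors.
def g_xy(v):
    return [2.0 * t for t in v]

def g_yx(v):
    return [t / 2.0 for t in v]         # exact inverse of g_xy

def g_yx_biased(v):
    return [t / 2.0 + 0.1 for t in v]   # inverse with a small bias
```

With `x = [1.0, -2.0]` and `y = [0.5, 4.0]`, the exact pair gives 0, while the biased inverse costs 0.2 on the forward cycle and 0.4 on the backward cycle (the bias is doubled by `g_xy`).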

The total objective function of CycleGAN is defined as:
$$\begin{aligned}
\mathcal{L}\left(G_{X\rightarrow Y}, G_{Y\rightarrow X}, D_X, D_Y\right) = &\ \mathcal{L}_{GAN}\left(G_{X\rightarrow Y}, D_Y\right) \\
&+ \mathcal{L}_{GAN}\left(G_{Y\rightarrow X}, D_X\right) + \mathcal{L}_{cyc} \qquad (3)
\end{aligned}$$
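For reference, CycleGAN optimizes an objective of this form as a min-max game: the two generators minimize it while the two discriminators maximize it. (The original CycleGAN paper additionally weights $\mathcal{L}_{cyc}$ by a hyperparameter $\lambda$, which Eq. (3) as written here omits.)

```latex
G_{X\rightarrow Y}^{*},\, G_{Y\rightarrow X}^{*}
  = \arg\min_{G_{X\rightarrow Y},\, G_{Y\rightarrow X}}\ \max_{D_X,\, D_Y}\
    \mathcal{L}\left(G_{X\rightarrow Y}, G_{Y\rightarrow X}, D_X, D_Y\right)
```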

In this paper, $X$ denotes the real face domain and $Y$ denotes the cartoon face domain.

3.2. Cartoon Face Landmark Assisted CycleGAN

3.2.1 Landmark Consistency Loss
$$\mathcal{L}_c\left(G_{(X,L)\rightarrow Y}\right) = \left\| R_Y\left(G_{(X,L)\rightarrow Y}(x,l)\right) - l \right\|_2 \qquad (4)$$
where $l\in L$ is the input landmark heatmap, $R$ is a pre-trained U-Net that predicts landmark heatmaps, and $R_Y$ denotes the landmark regressor for domain $Y$.
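The text describes $l$ only as a landmark heatmap. A common way to build such an input, assumed here rather than taken from the paper, is to render one Gaussian blob per landmark, one channel per point. `landmark_heatmaps` and its `sigma` parameter are illustrative names.

```python
import math

def landmark_heatmaps(landmarks, height, width, sigma=2.0):
    """Render one Gaussian heatmap channel per landmark (assumed encoding).

    landmarks: list of (x, y) pixel coordinates
    Returns a list of channels, each a height x width nested list whose
    peak value 1.0 sits at the landmark location.
    """
    channels = []
    for cx, cy in landmarks:
        channel = [
            [math.exp(-((c - cx) ** 2 + (r - cy) ** 2) / (2.0 * sigma ** 2))
             for c in range(width)]
            for r in range(height)
        ]
        channels.append(channel)
    return channels
```

Stacking the channels lets a network see every landmark at once, and the smooth falloff gives a regressor such as $R_Y$ a denser training signal than a single hot pixel would.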

Equation (4) says that when a real face image $x$ and its landmark heatmap $l$ are fed to the generator $G_{(X,L)\rightarrow Y}$, the landmarks that $R_Y$ predicts on the generated image should be as close as possible to $l$.
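A minimal sketch of Eq. (4), with heatmaps represented as lists of 2-D channels and the generator and regressor passed in as callables. The toy stand-ins `toy_generator`, `perfect_regressor` and `offset_regressor` are hypothetical; in the paper $R_Y$ is a pre-trained U-Net, presumably kept fixed while it supervises the generator.

```python
import math

def flatten(heatmaps):
    """Flatten a list of 2-D heatmap channels into one vector."""
    return [v for channel in heatmaps for row in channel for v in row]

def landmark_consistency_loss(regressor, generator, x, l):
    """Eq. (4): L2 distance between R_Y(G(x, l)) and the input heatmaps l."""
    predicted = flatten(regressor(generator(x, l)))
    target = flatten(l)
    if len(predicted) != len(target):
        raise ValueError("predicted and input heatmaps must have the same shape")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)))

# Hypothetical stand-ins: the toy "generator" passes the heatmaps through,
# so the loss only measures how well the regressor recovers them.
def toy_generator(x, l):
    return l

def perfect_regressor(image):
    return image

def offset_regressor(image):
    return [[[v + 0.5 for v in row] for row in channel] for channel in image]
```

For a single 2x2 channel, the offset regressor misses every pixel by 0.5, so the loss is $\sqrt{4 \cdot 0.25} = 1$.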
3.2.2 Landmark Matched Global Discriminator
As shown in Figure 2, for the translation $X\rightarrow Y$, the unconditional global discriminator $D_Y$ pushes the generator toward more realistic cartoon faces, while the conditional global discriminator $D_Y^{g_c}$, which also receives the landmark heatmap $l\in L$ as part of its input, pushes it toward cartoon faces whose landmarks match $l$:
$$\begin{aligned}
\mathcal{L}_{GAN}\left(G_{(X,L)\rightarrow Y}, D_Y^{g_c}\right) = &\ \mathbb{E}_y\left[\log D_Y^{g_c}(y,l)\right] \\
&+ \mathbb{E}_x\left[\log\left(1-D_Y^{g_c}\left(G_{(X,L)\rightarrow Y}(x,l),\, l\right)\right)\right] \qquad (5)
\end{aligned}$$
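One way to feed the heatmap "as part of the input" is to stack it with the image along the channel axis, as many conditional GANs do; this is an assumption, since the text does not say how $l$ is combined with the image. The sketch uses nested lists for channels, and `conditional_input` and `conditional_gan_loss` are illustrative names.

```python
import math

def conditional_input(image, heatmaps):
    """Stack image channels and landmark heatmap channels (assumed conditioning).

    image, heatmaps: lists of 2-D channels sharing the same height and width.
    """
    if len(image[0]) != len(heatmaps[0]) or len(image[0][0]) != len(heatmaps[0][0]):
        raise ValueError("image and heatmaps must share height and width")
    return image + heatmaps

def conditional_gan_loss(d, real_pairs, fake_pairs, eps=1e-12):
    """Minibatch estimate of Eq. (5).

    d: conditional discriminator mapping a stacked input to a probability
    real_pairs: (y, l) pairs of real cartoon images and their heatmaps
    fake_pairs: (g, l) pairs where g = G(x, l) was generated from heatmaps l
    """
    real = sum(math.log(d(conditional_input(y, l)) + eps) for y, l in real_pairs)
    fake = sum(math.log(1.0 - d(conditional_input(g, l)) + eps) for g, l in fake_pairs)
    return real / len(real_pairs) + fake / len(fake_pairs)
```

Pairing each generated image with the same $l$ it was generated from is what lets $D_Y^{g_c}$ penalize faces that look like cartoons but place their features in the wrong spots.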
3.2.3 Landmark Guided Local Discriminator
Three local discriminators are introduced for the eye, nose, and mouth regions. Their adversarial loss is defined as:
$$\begin{aligned}
\mathcal{L}_{GAN_{local}^{X\rightarrow Y}} &= \sum_{i=1}^{3}\lambda_{l_i}\cdot\mathcal{L}_{GAN_{patch}}\left(G_{(X,L)\rightarrow Y}, D_Y^{l_i}\right) \\
&= \sum_{i=1}^{3}\lambda_{l_i}\Big\{\mathbb{E}_y\left[\log D_Y^{l_i}(y_p)\right] + \mathbb{E}_x\left[\log\left(1-D_Y^{l_i}\left(\left[G_{(X,L)\rightarrow Y}(x,l)\right]_p\right)\right)\right]\Big\} \qquad (6)
\end{aligned}$$
where $y_p$ and $\left[G_{(X,L)\rightarrow Y}(x,l)\right]_p$ denote the local patches of a real cartoon image and of a generated cartoon image, respectively, and $\lambda_{l_i}$ is the weight of the $i$-th local discriminator $D_Y^{l_i}$.
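A single-sample sketch of Eq. (6), assuming single-channel images stored as 2-D lists, square patches of a fixed `size`, and patch centres already derived from landmarks. The helper names, the clamping at the image border, and the fixed patch size are illustrative assumptions, not details from the paper.

```python
import math

def crop_patch(channel, center, size):
    """Crop a size x size window centred on (cx, cy), shifted to stay inside the image.

    channel: 2-D list (height x width); assumes size <= height and size <= width.
    """
    height, width = len(channel), len(channel[0])
    cx, cy = center
    half = size // 2
    top = min(max(cy - half, 0), height - size)
    left = min(max(cx - half, 0), width - size)
    return [row[left:left + size] for row in channel[top:top + size]]

def local_gan_loss(discriminators, weights, real_image, real_centers,
                   fake_image, fake_centers, size, eps=1e-12):
    """Single-sample estimate of Eq. (6).

    discriminators: one callable per region (eyes, nose, mouth) -> probability
    weights: the lambda_{l_i} of each region
    real_centers / fake_centers: patch centres from the landmarks of the real
    cartoon y and of the generated image, respectively
    """
    total = 0.0
    for d, w, rc, fc in zip(discriminators, weights, real_centers, fake_centers):
        real_patch = crop_patch(real_image, rc, size)   # y_p
        fake_patch = crop_patch(fake_image, fc, size)   # [G(x, l)]_p
        total += w * (math.log(d(real_patch) + eps) + math.log(1.0 - d(fake_patch) + eps))
    return total
```

Real and generated patches use separate centres because a real cartoon $y$ and a generated image generally have their facial features in different places.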

3.3. Network Training

3.3.1 Two Stage Training

Stage I: The local discriminators are first removed from the framework, and the model is trained for 100K iterations to obtain coarse results.

Stage II: The pre-trained landmark prediction network predicts landmarks on the coarse images; these landmarks are used to crop local patches, which are fed to the local discriminators to obtain more refined results.
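The two-stage schedule can be sketched as a switch on the iteration counter plus a step that turns predicted landmarks into patch centres. Only the 100K-iteration Stage I length comes from the text; averaging each region's landmarks to find its centre, and the landmark grouping passed in, are hypothetical choices for illustration.

```python
STAGE1_ITERS = 100_000  # Stage I length stated in the text

def use_local_discriminators(iteration, stage1_iters=STAGE1_ITERS):
    """Stage I (iterations 0 .. stage1_iters - 1) trains without local discriminators."""
    return iteration >= stage1_iters

def region_centers(landmarks, groups):
    """Average the landmarks of each facial region to get a patch centre.

    landmarks: (x, y) points predicted on a coarse output by the landmark network
    groups: region name -> landmark indices (hypothetical layout)
    """
    centers = {}
    for name, indices in groups.items():
        xs = [landmarks[i][0] for i in indices]
        ys = [landmarks[i][1] for i in indices]
        centers[name] = (round(sum(xs) / len(xs)), round(sum(ys) / len(ys)))
    return centers
```

Deferring the local discriminators makes sense because their patches are located from landmarks predicted on the generator's own output, which are only reliable once Stage I has produced recognizable faces.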
