Method
Problem Formulation
The young face image is denoted $\mathbf{I}_y$, with corresponding age $\bm{\alpha}_y$, a one-hot vector.
Given a target age $\bm{\alpha}_o$ (with $\bm{\alpha}_o > \bm{\alpha}_y$), we wish to learn an age progressor $G_p$ that generates the older face image $\mathbf{I}_o$, i.e. $\mathbf{I}_o = G_p\left ( \mathbf{I}_y, \bm{\alpha}_o \right )$.
Note: the subscripts $y$ and $o$ stand for young and old, respectively.
Note that the training set consists of unpaired aging data: young and older face images of the same person are not required.
For the reverse direction, age regression, an age regressor $G_r$ is introduced to reconstruct the young face image from the generated $\mathbf{I}_o$, i.e. $\mathbf{I}_y' = G_r\left ( \mathbf{I}_o, \bm{\alpha}_y \right )$.
We integrate $G_p$ and $G_r$ into a single framework, yielding a unified solution for both age progression and regression.
As shown in Figure 3, the framework contains two data-flow cycles, the age progression cycle and the age regression cycle, and involves four networks: $G_p$, $D_p$, $G_r$, and $D_r$.
Network Architecture
This section describes the network architecture. Since age progression and age regression are symmetric, we refer to $G_p$ and $G_r$ collectively as $G$, and to $D_p$ and $D_r$ collectively as $D$.
Spatial Attention based Generator
$G_p$ and $G_r$ share the same architecture; taking $G_p$ as an example, its structure is shown in Figure 2.
Existing face aging methods use generators with a single pathway, which cannot guarantee that the generator focuses only on aging-relevant regions; the results therefore contain age-irrelevant changes and ghosting artifacts.
To address this, a spatial attention mechanism is introduced and the generator adopts a multi-pathway (multi-branch) structure.
Specifically, as shown in Figure 2(a), one FCN $G_p^A$ produces an attention mask, while another FCN $G_p^I$ acts as a conventional generator; the attention mask then blends the input image with the generated image to obtain the final output $\mathbf{I}_o$:
$$\mathbf{I}_o=G_p^A\left ( \mathbf{I}_y, \bm{\alpha}_o \right )\cdot\mathbf{I}_y+\left ( 1 - G_p^A\left ( \mathbf{I}_y, \bm{\alpha}_o \right ) \right )\cdot G_p^I\left ( \mathbf{I}_y, \bm{\alpha}_o \right ) \qquad(1)$$
where $G_p^A\left ( \mathbf{I}_y, \bm{\alpha}_o \right )\in[0,1]^{H\times W}$.
Note: this spatial attention mechanism is essentially the one used in GANimation.
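The fusion in Eq. (1) can be sketched in plain Python for a tiny grayscale image; the hand-written `mask`, `young`, and `generated` arrays below are hypothetical stand-ins for the outputs of $G_p^A$ and $G_p^I$:

```python
# Minimal sketch of the attention-based fusion in Eq. (1), using nested
# Python lists for a tiny 2x2 grayscale image. In the paper the mask
# comes from the FCN G_p^A and the coarse output from G_p^I; here both
# are hand-written placeholders.

def attention_fuse(mask, young, generated):
    """Per-pixel blend: I_o = A * I_y + (1 - A) * generated."""
    H, W = len(mask), len(mask[0])
    return [[mask[i][j] * young[i][j] + (1.0 - mask[i][j]) * generated[i][j]
             for j in range(W)] for i in range(H)]

# Where the mask is 1 the input pixel is kept (age-irrelevant region);
# where it is 0 the generated pixel is used (aging region).
mask      = [[1.0, 0.0],
             [0.5, 1.0]]
young     = [[0.2, 0.2],
             [0.2, 0.2]]
generated = [[0.8, 0.8],
             [0.8, 0.8]]
fused = attention_fuse(mask, young, generated)
```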
Discriminator
The discriminator $D$ distinguishes real from fake images and additionally regresses the age of the image.
Note: this essentially adds an auxiliary predictor, similar to the approach commonly used in face swapping.
$D$ is a PatchGAN consisting of six 4×4 convolutional layers, each followed by a LeakyReLU.
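The text specifies only the kernel size and depth. Assuming the common PatchGAN choice of stride 2, padding 1, and a 128×128 input (assumptions, not stated here), the patch-grid size at each layer can be traced as:

```python
# Spatial size after one 4x4 convolution; stride 2 and padding 1 are
# assumed hyperparameters (the text only gives the kernel size and the
# number of layers), and 128 is a hypothetical input resolution.
def conv_out(size, kernel=4, stride=2, pad=1):
    return (size + 2 * pad - kernel) // stride + 1

size = 128
sizes = [size]
for _ in range(6):          # six Conv 4x4 layers
    size = conv_out(size)
    sizes.append(size)
print(sizes)  # each cell of the final grid scores one image patch
```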
Loss Function
Adversarial Loss
The least-squares adversarial loss is adopted:
$$\begin{aligned} \mathcal{L}_{GAN}&=\mathbb{E}_{\mathbf{I}_y}\left [ \left ( D_p^I\left ( G_p\left ( \mathbf{I}_y, \bm{\alpha}_o \right ) \right )-1 \right )^2 \right ] \\ &+\mathbb{E}_{\mathbf{I}_o}\left [ \left ( D_p^I\left ( \mathbf{I}_o \right )-1 \right )^2 \right ]+\mathbb{E}_{\mathbf{I}_y}\left [ D_p^I\left ( G_p\left ( \mathbf{I}_y, \bm{\alpha}_o \right ) \right )^2 \right ] \\ &+\mathbb{E}_{\mathbf{I}_o}\left [ \left ( D_r^I\left ( G_r\left ( \mathbf{I}_o, \bm{\alpha}_y \right ) \right )-1 \right )^2 \right ] \\ &+\mathbb{E}_{\mathbf{I}_y}\left [ \left ( D_r^I\left ( \mathbf{I}_y \right )-1 \right )^2 \right ]+\mathbb{E}_{\mathbf{I}_o}\left [ D_r^I\left ( G_r\left ( \mathbf{I}_o, \bm{\alpha}_y \right ) \right )^2 \right ] \qquad(2) \end{aligned}$$
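Eq. (2) packs both players' targets into one $\mathcal{L}_{GAN}$; as a sketch, the conventional least-squares split into discriminator and generator views looks like this, with scalar scores standing in for the mean output of the discriminator's image branch $D^I$ (an illustrative simplification, not the paper's exact formulation):

```python
# Least-squares GAN terms, written for scalar "scores" that stand in
# for the mean output of the discriminator's image branch D^I.
# Real scores are pushed toward 1 and fake scores toward 0 by D;
# the generator instead pushes its fake score toward 1.

def d_lsgan_loss(real_score, fake_score):
    """Discriminator view: (D(real) - 1)^2 + D(fake)^2."""
    return (real_score - 1.0) ** 2 + fake_score ** 2

def g_lsgan_loss(fake_score):
    """Generator view: (D(G(...)) - 1)^2."""
    return (fake_score - 1.0) ** 2

# A perfect discriminator (real=1, fake=0) incurs zero loss, and a
# generator that fully fools D (fake score 1) also incurs zero loss.
print(d_lsgan_loss(1.0, 0.0), g_lsgan_loss(1.0))
```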
Note: $G$ minimizes $\mathcal{L}_{GAN}$, while $D$ maximizes it.
Reconstruction Loss
An L1 norm is used to avoid blurry results:
$$\begin{aligned} \mathcal{L}_{recon}&=\mathbb{E}_{\mathbf{I}_y}\left \| G_r\left ( G_p\left ( \mathbf{I}_y, \bm{\alpha}_o \right ), \bm{\alpha}_y \right )-\mathbf{I}_y \right \|_1 \\ &+\mathbb{E}_{\mathbf{I}_o}\left \| G_p\left ( G_r\left ( \mathbf{I}_o, \bm{\alpha}_y \right ), \bm{\alpha}_o \right )-\mathbf{I}_o \right \|_1 \qquad(3) \end{aligned}$$
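A minimal sketch of the L1 cycle term, with flat pixel lists standing in for images and the cycle output supplied by hand (no actual $G_p$/$G_r$ here):

```python
# L1 cycle-reconstruction error as in Eq. (3), sketched for flat pixel
# lists. `cycled` is a hand-written placeholder for G_r(G_p(I_y, a_o), a_y);
# a faithful cycle should return the original image.

def l1_cycle_loss(image, cycled):
    """Mean absolute difference between an image and its cycle output."""
    return sum(abs(a - b) for a, b in zip(image, cycled)) / len(image)

young  = [0.1, 0.5, 0.9]
cycled = [0.1, 0.4, 0.9]   # imperfect reconstruction of `young`
loss = l1_cycle_loss(young, cycled)
```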
Attention Activation Loss
In the generator, the attention mask easily saturates to 1; to prevent this, an attention activation loss constrains the mask:
$$\mathcal{L}_{actv}=\mathbb{E}_{\mathbf{I}_y}\left \| G_p^A\left ( \mathbf{I}_y, \bm{\alpha}_o \right ) \right \|_2+\mathbb{E}_{\mathbf{I}_o}\left \| G_r^A\left ( \mathbf{I}_o, \bm{\alpha}_y \right ) \right \|_2 \qquad(4)$$
Note: this is simply an L2 regularization on the attention mask, identical to the one in GANimation.
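As a sketch, the penalty amounts to the L2 norm of the mask values: a saturated mask has a larger norm than a sparse one, so minimizing Eq. (4) discourages the trivial copy-the-input solution. The masks below are hand-written placeholders:

```python
import math

# L2 norm of a 2-D attention mask given as nested lists, as penalized
# by Eq. (4). The norm grows as the mask saturates toward 1.
def mask_l2(mask):
    return math.sqrt(sum(v * v for row in mask for v in row))

saturated = [[1.0, 1.0], [1.0, 1.0]]   # copies the input everywhere
sparse    = [[1.0, 0.0], [0.0, 0.0]]   # attends only where needed
print(mask_l2(saturated), mask_l2(sparse))
```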
Age Regression Loss
Denote by $D^\alpha$ the age-predictor branch added to the discriminator. For the generator, the age of a generated image should be as close as possible to the target age:
$$\begin{aligned} \mathcal{L}_{reg}&=\mathbb{E}_{\mathbf{I}_y}\left \| D_p^\alpha\left ( G_p\left ( \mathbf{I}_y, \bm{\alpha}_o \right ) \right )-\bm{\alpha}_o \right \|_2 \\ &+\mathbb{E}_{\mathbf{I}_y}\left \| D_p^\alpha\left ( \mathbf{I}_y \right )-\bm{\alpha}_y \right \|_2 \\ &+\mathbb{E}_{\mathbf{I}_o}\left \| D_r^\alpha\left ( G_r\left ( \mathbf{I}_o, \bm{\alpha}_y \right ) \right )-\bm{\alpha}_y \right \|_2 \\ &+\mathbb{E}_{\mathbf{I}_o}\left \| D_r^\alpha\left ( \mathbf{I}_o \right )-\bm{\alpha}_o \right \|_2 \qquad(5) \end{aligned}$$
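A minimal sketch of one term of Eq. (5), using one-hot age vectors as defined in Problem Formulation; `pred` is a hypothetical stand-in for the output of the age branch $D^\alpha$:

```python
import math

# L2 distance between a predicted age vector and a one-hot target, the
# per-sample quantity penalized in Eq. (5). `pred` is a hand-written
# placeholder for D^alpha's output on a generated image.
def age_l2(pred, target):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)))

alpha_o = [0.0, 0.0, 1.0]   # one-hot target age group
pred    = [0.1, 0.1, 0.8]   # hypothetical D^alpha prediction
print(age_l2(pred, alpha_o))
```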
Overall Loss
$$\mathcal{L}=\mathcal{L}_{GAN}+\lambda_{recon}\mathcal{L}_{recon}+\lambda_{actv}\mathcal{L}_{actv}+\lambda_{reg}\mathcal{L}_{reg} \qquad(6)$$
$$\underset{G_p,G_r}{\min}\ \underset{D_p,D_r}{\max}\ \mathcal{L} \qquad(7)$$
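The weighting in Eq. (6) can be sketched directly; the $\lambda$ values below are hypothetical placeholders, since the text does not specify them:

```python
# Weighted sum of the four loss terms, as in Eq. (6). The lambda
# defaults here are hypothetical placeholders, not values from the text.
def total_loss(l_gan, l_recon, l_actv, l_reg,
               lam_recon=10.0, lam_actv=0.1, lam_reg=1.0):
    return l_gan + lam_recon * l_recon + lam_actv * l_actv + lam_reg * l_reg

loss = total_loss(l_gan=0.5, l_recon=0.2, l_actv=1.0, l_reg=0.3)
```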
Question: it seems the loss for training $D_p$ and $D_r$ should not be written directly as $\mathcal{L}$.
Possible improvements:
- Following STGAN, define the condition as the age difference rather than the absolute target age.