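1 Overview
This post covers the 2017 paper "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks". Many blog posts already introduce CycleGAN in detail, so this one only summarizes the paper's highlights:
Main problem solved: translating between a source domain and a target domain without requiring a one-to-one pairing of training samples across the two domains (Domain Adaptation).
In the paper's words: "We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples."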
Paper:
"Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks"
Code 1: junyanz/pytorch-CycleGAN-and-pix2pix
Code 2: aitorzip/PyTorch-CycleGAN (more readable; the source code walkthrough below is based on it)
Example results of the network:
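2 Network Architecture
During training, the discriminator and the generator are updated alternately rather than jointly. The process resembles the co-evolution of predator and prey:
When the generator's parameters are fixed and the discriminator is trained, the discriminator learns to tell real from fake more reliably. When the discriminator's parameters are fixed and the generator is trained, the generator has to produce better images in order to fool the now stronger discriminator. The two improve step by step through this iteration and eventually reach a dynamic equilibrium.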
2.1 Unpaired image data
The paper explains the difference between paired and unpaired training data. The pix2pix model requires paired data, whereas CycleGAN can also be trained on unpaired data.
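2.2 Cycle Consistency Loss
This is the paper's main contribution: it introduces a cycle mapping and the Cycle Consistency Loss.
Adversarial training can learn a mapping G whose outputs follow the same distribution as the target domain Y. An adversarial loss alone, however, does not constrain the mapping enough. With sufficiently large capacity, a network can map the same set of input images to any random permutation of images in the target domain, and every such mapping induces an output distribution that matches the target distribution. In the extreme case, the mapping G could send every x to the same image in Y, which makes the loss useless as a constraint on individual translations.
We therefore want:
x -> G(x) -> F(G(x)) ≈ x, called forward cycle consistency;
and likewise y -> F(y) -> G(F(y)) ≈ y, called backward cycle consistency.
In other words, after an image from X is translated into Y, it should be possible to translate it back. This prevents the model from mapping every image in X to the same image in Y. The paper explains this as follows:
The Cycle Consistency Loss is defined as follows: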
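For reference, the loss can be written in the notation used above (G: X -> Y, F: Y -> X). This is a reconstruction of the paper's definition, which penalizes the L1 distance between each image and its round-trip reconstruction:

```latex
\mathcal{L}_{\mathrm{cyc}}(G, F) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]
```

The first term corresponds to forward cycle consistency and the second to backward cycle consistency.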
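2.3 Identity Loss
The paper introduces this loss in its applications section: when a generator is fed a real image that already belongs to its target domain, the output should stay close to the input, i.e. G(y) ≈ y and F(x) ≈ x.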
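As a sketch in the same notation (G: X -> Y, F: Y -> X), the identity loss can be written as below. This is reconstructed from the paper's description and mirrors the `same_A` / `same_B` terms in the training code later in this post:

```latex
\mathcal{L}_{\mathrm{identity}}(G, F) =
  \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\lVert G(y) - y \rVert_1\big]
  + \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\lVert F(x) - x \rVert_1\big]
```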
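2.4 Adversarial Loss
This is the loss common to all GANs, defined as follows:
In the actual implementation, however, the adversarial loss is modified following LSGAN: a least-squares loss replaces the negative log-likelihood objective.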
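For the mapping G: X -> Y with its discriminator D_Y, the two objectives can be sketched as follows. The first equation is the standard adversarial loss from the paper; the other two are the least-squares replacements, written here as separate minimization targets for G and D_Y. The terms for F and D_X are symmetric.

```latex
% Standard adversarial loss (G minimizes, D_Y maximizes)
\mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y) =
  \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\log D_Y(y)\big]
  + \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log\big(1 - D_Y(G(x))\big)\big]

% Least-squares variant: objective minimized by the generator G
\mathcal{L}_{\mathrm{LSGAN}}^{G} =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[(D_Y(G(x)) - 1)^2\big]

% Least-squares variant: objective minimized by the discriminator D_Y
\mathcal{L}_{\mathrm{LSGAN}}^{D_Y} =
  \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[(D_Y(y) - 1)^2\big]
  + \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[D_Y(G(x))^2\big]
```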
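2.5 Network and Training Details
(1) The generator adopts the architecture from "Perceptual Losses for Real-Time Style Transfer and Super-Resolution": a stack of residual blocks, with strided convolutions for downsampling and transposed convolutions for upsampling;
(2) The discriminator still uses the 70x70 PatchGAN structure from pix2pix;
(3) The networks use Instance Normalization rather than the Batch Normalization used in the classic DCGAN;
(4) Reflection padding is used instead of ordinary zero padding;
(5) The discriminators are trained not only on the latest generator outputs but also on a history of previously generated images;
(6) The learning rate is 0.0002. It is kept constant at 0.0002 for the first 100 epochs and then decays linearly to 0 over the next 100 epochs;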
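The schedule in item (6) can be sketched as a multiplier on the base learning rate. This is a minimal sketch, assuming 200 epochs in total with decay starting at epoch 100; the names `linear_decay_factor`, `learning_rate`, and `BASE_LR` are illustrative and not taken from either repository.

```python
BASE_LR = 0.0002  # learning rate quoted in item (6)


def linear_decay_factor(epoch, n_epochs=200, decay_start_epoch=100):
    """Multiplier for the base learning rate at a given 0-indexed epoch.

    Returns 1.0 until decay_start_epoch, then falls linearly to 0.0 at n_epochs.
    """
    if n_epochs <= decay_start_epoch:
        raise ValueError("decay must start before the final epoch")
    decay_progress = max(0, epoch - decay_start_epoch) / (n_epochs - decay_start_epoch)
    return 1.0 - decay_progress


def learning_rate(epoch):
    """Effective learning rate under the assumed 100 + 100 epoch schedule."""
    return BASE_LR * linear_decay_factor(epoch)
```

A function like `linear_decay_factor` can be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's initial learning rate by the returned factor.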
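3 Result Comparison
3.1 Effect of the Cycle Consistency Loss
3.2 Comparison with other GANs
The paper reports results for several different GAN models:
4 Source Code Walkthrough
This section shows the code for the CycleGAN networks; it is easier to follow when read alongside the architecture described above.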
4.1 Generator and Discriminator implementation
import torch.nn as nn
import torch.nn.functional as F
import torch


class ResidualBlock(nn.Module):
    def __init__(self, in_features):
        super(ResidualBlock, self).__init__()

        # Two 3x3 convolutions; reflection padding keeps the spatial size unchanged
        conv_block = [ nn.ReflectionPad2d(1),
                       nn.Conv2d(in_features, in_features, 3),
                       nn.InstanceNorm2d(in_features),
                       nn.ReLU(inplace=True),
                       nn.ReflectionPad2d(1),
                       nn.Conv2d(in_features, in_features, 3),
                       nn.InstanceNorm2d(in_features) ]

        self.conv_block = nn.Sequential(*conv_block)

    def forward(self, x):
        # Skip connection: add the block's output to its input
        return x + self.conv_block(x)
class Generator(nn.Module):
    def __init__(self, input_nc, output_nc, n_residual_blocks=9):
        super(Generator, self).__init__()

        # Initial convolution block
        model = [ nn.ReflectionPad2d(3),
                  nn.Conv2d(input_nc, 64, 7),
                  nn.InstanceNorm2d(64),
                  nn.ReLU(inplace=True) ]

        # Downsampling: two stride-2 convolutions, 64 -> 128 -> 256 channels
        in_features = 64
        out_features = in_features*2
        for _ in range(2):
            model += [ nn.Conv2d(in_features, out_features, 3, stride=2, padding=1),
                       nn.InstanceNorm2d(out_features),
                       nn.ReLU(inplace=True) ]
            in_features = out_features
            out_features = in_features*2

        # Residual blocks
        for _ in range(n_residual_blocks):
            model += [ResidualBlock(in_features)]

        # Upsampling: two stride-2 transposed convolutions, 256 -> 128 -> 64 channels
        out_features = in_features//2
        for _ in range(2):
            model += [ nn.ConvTranspose2d(in_features, out_features, 3, stride=2, padding=1, output_padding=1),
                       nn.InstanceNorm2d(out_features),
                       nn.ReLU(inplace=True) ]
            in_features = out_features
            out_features = in_features//2

        # Output layer; Tanh maps the output to [-1, 1]
        model += [ nn.ReflectionPad2d(3),
                   nn.Conv2d(64, output_nc, 7),
                   nn.Tanh() ]

        self.model = nn.Sequential(*model)

    def forward(self, x):
        return self.model(x)
class Discriminator(nn.Module):
    def __init__(self, input_nc):
        super(Discriminator, self).__init__()

        # A bunch of convolutions one after another
        model = [ nn.Conv2d(input_nc, 64, 4, stride=2, padding=1),
                  nn.LeakyReLU(0.2, inplace=True) ]

        model += [ nn.Conv2d(64, 128, 4, stride=2, padding=1),
                   nn.InstanceNorm2d(128),
                   nn.LeakyReLU(0.2, inplace=True) ]

        model += [ nn.Conv2d(128, 256, 4, stride=2, padding=1),
                   nn.InstanceNorm2d(256),
                   nn.LeakyReLU(0.2, inplace=True) ]

        model += [ nn.Conv2d(256, 512, 4, padding=1),
                   nn.InstanceNorm2d(512),
                   nn.LeakyReLU(0.2, inplace=True) ]

        # FCN classification layer: one score per image patch
        model += [nn.Conv2d(512, 1, 4, padding=1)]

        self.model = nn.Sequential(*model)

    def forward(self, x):
        x = self.model(x)
        # Average the patch scores over the spatial dimensions and flatten to
        # shape (N,), which also stays 1-D when the batch size is 1
        return F.avg_pool2d(x, x.size()[2:]).view(-1)
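To connect the Discriminator above with the "70x70 PatchGAN" mentioned in section 2.5, the receptive field of one output score can be computed from the kernel sizes and strides of its five convolutions. This is a small sketch in plain Python; the helper name `receptive_field` and the constant `DISCRIMINATOR_CONVS` are illustrative.

```python
def receptive_field(layers):
    """Receptive field, in input pixels, of one output unit of a conv stack.

    `layers` is a list of (kernel_size, stride) pairs in forward order.
    """
    field = 1
    # Walk backwards from the output: each layer widens the field it sees
    for kernel_size, stride in reversed(layers):
        field = (field - 1) * stride + kernel_size
    return field


# (kernel_size, stride) of the five Conv2d layers in the Discriminator above
DISCRIMINATOR_CONVS = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

patch_size = receptive_field(DISCRIMINATOR_CONVS)
print(patch_size)  # prints 70
```

Each score in the final feature map therefore judges a 70x70 patch of the input image, and the forward pass averages those patch scores into one value per image.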
4.2 Loss implementation
###### Training ######
# Excerpt from the training script: opt, dataloader, the networks, criteria,
# optimizers, input_A / input_B, target_real / target_fake and the
# fake_*_buffer objects are created earlier in that script.
from torch.autograd import Variable

iter_num = 0
for epoch in range(opt.epoch, opt.n_epochs):
    for i, batch in enumerate(dataloader):
        # Set model input
        real_A = Variable(input_A.copy_(batch['A']))
        real_B = Variable(input_B.copy_(batch['B']))

        ###### Generators A2B and B2A ######
        optimizer_G.zero_grad()

        # Identity loss
        # G_A2B(B) should equal B if real B is fed
        same_B = netG_A2B(real_B)
        loss_identity_B = criterion_identity(same_B, real_B) * 5.0
        # G_B2A(A) should equal A if real A is fed
        same_A = netG_B2A(real_A)
        loss_identity_A = criterion_identity(same_A, real_A) * 5.0

        # GAN loss: each generator tries to make its discriminator output "real"
        fake_B = netG_A2B(real_A)
        pred_fake = netD_B(fake_B)
        loss_GAN_A2B = criterion_GAN(pred_fake, target_real)

        fake_A = netG_B2A(real_B)
        pred_fake = netD_A(fake_A)
        loss_GAN_B2A = criterion_GAN(pred_fake, target_real)

        # Cycle loss: A -> B -> A and B -> A -> B should recover the input
        recovered_A = netG_B2A(fake_B)
        loss_cycle_ABA = criterion_cycle(recovered_A, real_A) * 10.0

        recovered_B = netG_A2B(fake_A)
        loss_cycle_BAB = criterion_cycle(recovered_B, real_B) * 10.0

        # Total loss
        loss_G = loss_identity_A + loss_identity_B + loss_GAN_A2B + loss_GAN_B2A + loss_cycle_ABA + loss_cycle_BAB
        loss_G.backward()

        optimizer_G.step()
        ###################################

        ###### Discriminator A ######
        optimizer_D_A.zero_grad()

        # Real loss
        pred_real = netD_A(real_A)
        loss_D_real = criterion_GAN(pred_real, target_real)

        # Fake loss (detach so that no gradient flows back into the generator)
        fake_A = fake_A_buffer.push_and_pop(fake_A)
        pred_fake = netD_A(fake_A.detach())
        loss_D_fake = criterion_GAN(pred_fake, target_fake)

        # Total loss
        loss_D_A = (loss_D_real + loss_D_fake) * 0.5
        loss_D_A.backward()

        optimizer_D_A.step()
        ###################################

        ###### Discriminator B ######
        optimizer_D_B.zero_grad()

        # Real loss
        pred_real = netD_B(real_B)
        loss_D_real = criterion_GAN(pred_real, target_real)

        # Fake loss (detach so that no gradient flows back into the generator)
        fake_B = fake_B_buffer.push_and_pop(fake_B)
        pred_fake = netD_B(fake_B.detach())
        loss_D_fake = criterion_GAN(pred_fake, target_fake)

        # Total loss
        loss_D_B = (loss_D_real + loss_D_fake) * 0.5
        loss_D_B.backward()

        optimizer_D_B.step()
        ###################################
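The `push_and_pop` calls above rely on a buffer of previously generated images that this excerpt does not define. The sketch below illustrates one way such a buffer can behave, using plain Python items in place of image tensors. The class name `HistoryBuffer`, the pool size of 50, and the 0.5 swap probability are assumptions for illustration and may differ from the repository's own helper.

```python
import random


class HistoryBuffer:
    """Pool of previously generated samples (hypothetical illustration)."""

    def __init__(self, max_size=50, rng=None):
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        self.max_size = max_size
        self.items = []
        self.rng = rng or random.Random()

    def push_and_pop(self, new_items):
        """Take new samples and return a same-length list mixing new and old ones."""
        returned = []
        for item in new_items:
            if len(self.items) < self.max_size:
                # Buffer not full yet: store the new item and return it unchanged
                self.items.append(item)
                returned.append(item)
            elif self.rng.random() < 0.5:
                # Swap: return a random stored item and store the new one in its place
                index = self.rng.randrange(self.max_size)
                returned.append(self.items[index])
                self.items[index] = item
            else:
                # Return the new item without storing it
                returned.append(item)
        return returned
```

Under these assumptions, roughly half of the fakes shown to a discriminator come from older generator states once the pool is full, which is what item (5) in section 2.5 refers to.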