Image-to-Image Translation with Conditional Adversarial Networks
Paper: https://arxiv.org/pdf/1611.07004.pdf
Code: https://github.com/affinelayer/Pix2Pix-tensorflow
Tips: a CVPR 2017 paper.
(Reading notes)
1.Main idea
- Uses a conditional GAN to solve image-to-image translation: "a general-purpose solution to image-to-image translation problems."
- Learns the loss function that trains this mapping, instead of hand-designing it: "learn a loss function to train this mapping."
2.Intro
- By analogy with language translation, the paper defines the task: "we define automatic image-to-image translation as the task of translating one possible representation of a scene into another."
- Although CNNs have already achieved excellent results, they still require a hand-specified objective: "In other words, we still have to tell the CNN what we wish it to minimize." Thanks to GANs, a high-level loss function can instead be learned directly.
- Most prior related work learned structured losses between images; the introduction then reviews the development of conditional GANs.
3.Details
- The objective is close to the original GAN objective, with an added L1 loss:

$$\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\left[\|y - G(x,z)\|_1\right]$$

$$G^* = \arg\min_G \max_D \mathcal{L}_{cGAN}(G,D) + \lambda\,\mathcal{L}_{L1}(G)$$
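As a concrete reading of the two formulas, here is a minimal numpy sketch of the generator-side loss. The function and argument names are illustrative (not from the paper's code); `d_fake` stands for the discriminator's probabilities on generated images, and λ = 100 is the weight used in the paper:

```python
import numpy as np

def generator_loss(d_fake, fake, target, lam=100.0):
    """Generator objective: cGAN term plus lambda times the L1 term."""
    eps = 1e-8  # numerical stability for the log
    # cGAN term: the generator wants D(x, G(x, z)) to be close to 1
    cgan = -np.mean(np.log(d_fake + eps))
    # L1 term: pixel-wise distance between the output and the ground truth
    l1 = np.mean(np.abs(target - fake))
    return cgan + lam * l1
```

When the discriminator is fully fooled and the L1 distance is zero, this loss goes to (approximately) zero; any pixel-wise error is penalized 100 times as strongly as the adversarial term.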
Note that without the noise $z$, the generator would only learn a deterministic function (producing a single fixed output for each input $x$), which is not good enough.
- The generator is U-Net-like: an encoder-decoder with skip connections.
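The skip connections in the U-Net-shaped generator can be illustrated at the shape level. The stand-in functions below only track tensor shapes (NCHW) rather than computing real convolutions, and are not the paper's exact architecture:

```python
import numpy as np

def encoder_step(x):
    # stride-2 conv stand-in: halve spatial dims, double channels
    n, c, h, w = x.shape
    return np.zeros((n, c * 2, h // 2, w // 2))

def decoder_step(x, skip):
    # upsample stand-in, then concatenate the mirrored encoder feature
    # along the channel axis -- this concatenation is the skip connection
    n, c, h, w = x.shape
    up = np.zeros((n, c // 2, h * 2, w * 2))
    return np.concatenate([up, skip], axis=1)

x = np.zeros((1, 64, 128, 128))   # (N, C, H, W)
skips = []
for _ in range(3):                # encoder path: remember each resolution
    skips.append(x)
    x = encoder_step(x)
for skip in reversed(skips):      # decoder path: reuse them in reverse order
    x = decoder_step(x, skip)
print(x.shape)  # spatial size restored to the input's 128x128
```

The skips let low-level structure (edges, layout) shared between input and output bypass the bottleneck, which is why an autoencoder alone is not enough for this task.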
- The discriminator is Markovian (a PatchGAN): rather than judging the whole image at once, it classifies it patch by patch and averages the patch scores into the final result. "This discriminator tries to classify if each $N \times N$ patch in an image is real or fake." This makes it faster with fewer parameters while still working well: it can "produce high quality results; has fewer parameters, runs faster, and can be applied to arbitrarily large images."
- However, the code is implemented just like other GANs, and no explicit patch setting can be found, hence:
"The difference between a PatchGAN and regular GAN discriminator is that the regular GAN maps from a 256x256 image to a single scalar output, which signifies 'real' or 'fake', whereas the PatchGAN maps from 256x256 to an $N \times N$ array of outputs $X$, where each $X_{ij}$ signifies whether the patch $ij$ in the image is real or fake."
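The $N \times N$ output arises because the discriminator is fully convolutional: each output unit only sees a limited receptive field of the input, and that receptive field is the "patch". The sketch below computes it; the layer list assumes the default pix2pix discriminator (three stride-2 4x4 convs, then two stride-1 4x4 convs), which gives the well-known 70x70 PatchGAN:

```python
def receptive_field(layers):
    """Receptive field of one output unit for a stack of conv layers.

    layers: list of (kernel_size, stride) tuples, applied in order.
    """
    rf, jump = 1, 1  # jump = distance between adjacent units in input pixels
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# Assumed default pix2pix discriminator: C64-C128-C256 with stride 2,
# then C512 and the final 1-channel conv with stride 1, all 4x4 kernels.
patchgan = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(patchgan))  # -> 70: each output unit scores a 70x70 patch
```

So the patch size is never set explicitly in the code; it falls out of the convolutional architecture itself.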
Reference: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/issues/39
Maybe it would have been better if we called it a “Fully Convolutional GAN” like in FCNs, it is the same idea.