Abstract
- Image-to-image translation is typically learned from a training set of aligned image pairs
- However, paired training data will often not be available
- Our goal: learn a mapping G : X --> Y using an adversarial loss
- Because the mapping is highly under-constrained, couple it with an inverse mapping F : Y --> X
and introduce a cycle consistency loss to enforce F(G(X)) ≈ X (and vice versa)
1. Introduction
- “translating” an image from one set into the other is described as image-to-image translation
- Years of research have produced powerful translation systems in the supervised setting,
where example image pairs are available
- However, obtaining paired training data can be difficult and expensive
- Seek an algorithm that can learn to translate without paired input-output examples
- Assume there is some underlying relationship between the domains, and seek to learn that relationship
- Training a single mapping G : X --> Y with only an adversarial loss is not enough: in practice, all input images can map to the same output image
- Translation should be “cycle consistent” ==> a translator G : X --> Y and another translator F : Y --> X
- Train both mappings G and F simultaneously, using an adversarial loss
and adding a cycle consistency loss that encourages F(G(x)) ≈ x and G(F(y)) ≈ y
2. Related work
- GANs:
Recent methods adopt GANs for conditional image generation applications.
The key to GANs’ success is the adversarial loss: it forces the generated images to be indistinguishable from real photos.
We adopt an adversarial loss so that translated images cannot be distinguished from images in the target domain.
- Image-to-Image Translation:
Began with a non-parametric texture model on a single input-output training image pair.
More recent approaches: use a dataset to learn a parametric translation function with CNNs.
Our approach: builds on the “pix2pix” framework, which uses a conditional GAN, but without paired training examples.
- Unpaired Image-to-Image Translation
(1) Prior work: use adversarial networks with additional terms
to enforce the output to be close to the input in a predefined metric space,
such as class label space, image pixel space, and image feature space
(2) Ours: does not rely on any task-specific, predefined similarity function between the input and output,
nor do we assume that the input and output have to lie in the same low-dimensional embedding space.
- Cycle Consistency
using transitivity as a way to regularize structured data
similar to our work: use a cycle consistency loss as a way of using transitivity to supervise CNN training.
In this work: we introduce a similar loss to push G and F to be consistent with each other.
Concurrent with our work: others independently use a similar objective for unpaired image-to-image translation, inspired by dual learning in machine translation.
- Neural Style Transfer
(1) synthesizes a novel image by combining the content of one image with the style of another image
based on matching the Gram matrix statistics of pre-trained deep features.
(2) our primary focus: learning the mapping between two image collections, rather than between two specific images,
by trying to capture correspondences between higher-level appearance structures
3. Formulation
- two mappings:
G : X --> Y and F : Y --> X
- two adversarial discriminators:
DX aims to distinguish between images { x } and translated images { F(y) };
DY aims to discriminate between { y } and { G(x) }
- objective contains two types of terms:
adversarial losses: match the distribution of generated images to the data distribution in the target domain;
cycle consistency losses: prevent the learned mappings G and F from contradicting each other
- our model can be viewed as training two “autoencoders”:
F∘G : X --> X jointly with another G∘F : Y --> Y
these autoencoders each have special internal structures:
map image to itself via an intermediate representation that is a translation of the image into another domain
“adversarial autoencoders”:use adversarial loss to train the bottleneck layer of an autoencoder to match an arbitrary target distribution.
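The two loss terms above can be combined into a minimal PyTorch sketch. The single conv layers standing in for G and F are placeholders, not the paper's generators; the weight lambda_cyc = 10 follows the paper's setting:

```python
import torch
import torch.nn as nn

# Stand-ins for the two mapping networks G : X --> Y and F : Y --> X.
# Any image-to-image network with matching input/output shapes fits here;
# these single conv layers are placeholders, not the paper's architecture.
G = nn.Conv2d(3, 3, kernel_size=3, padding=1)
F = nn.Conv2d(3, 3, kernel_size=3, padding=1)

l1 = nn.L1Loss()

def cycle_consistency_loss(x, y, lambda_cyc=10.0):
    """Forward cycle F(G(x)) ≈ x plus backward cycle G(F(y)) ≈ y, L1-penalized."""
    forward = l1(F(G(x)), x)
    backward = l1(G(F(y)), y)
    return lambda_cyc * (forward + backward)

x = torch.randn(2, 3, 64, 64)  # a batch from domain X
y = torch.randn(2, 3, 64, 64)  # a batch from domain Y
loss = cycle_consistency_loss(x, y)
```

During training this term is added to the two adversarial losses (one per discriminator) to form the full objective.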
4. Implementation
(1) Network Architecture
- generator networks:
two stride-2 convolutions,
several residual blocks,
two fractionally-strided convolutions with stride 1/2.
We use 6 residual blocks for 128x128 images and 9 blocks for 256x256 and higher-resolution training images.
We use instance normalization.
- discriminator networks: use 70x70 PatchGANs,
which aim to classify whether 70x70 overlapping image patches are real or fake.
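The generator and discriminator described above can be sketched in PyTorch. Layer counts follow the notes (stride-2 downsampling, residual blocks, fractionally-strided upsampling, instance norm, 70x70 PatchGAN); details such as reflection padding and the exact filter widths are assumptions, not taken from these notes:

```python
import torch
import torch.nn as nn

class ResnetBlock(nn.Module):
    """Residual block used in the generator body (reflection padding assumed)."""
    def __init__(self, dim):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1), nn.Conv2d(dim, dim, kernel_size=3),
            nn.InstanceNorm2d(dim), nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1), nn.Conv2d(dim, dim, kernel_size=3),
            nn.InstanceNorm2d(dim),
        )

    def forward(self, x):
        return x + self.block(x)

class ResnetGenerator(nn.Module):
    """Two stride-2 convs -> n residual blocks -> two fractionally-strided convs."""
    def __init__(self, n_blocks=9):
        super().__init__()
        layers = [nn.ReflectionPad2d(3), nn.Conv2d(3, 64, kernel_size=7),
                  nn.InstanceNorm2d(64), nn.ReLU(inplace=True)]
        dim = 64
        for _ in range(2):  # two stride-2 downsampling convolutions
            layers += [nn.Conv2d(dim, dim * 2, 3, stride=2, padding=1),
                       nn.InstanceNorm2d(dim * 2), nn.ReLU(inplace=True)]
            dim *= 2
        layers += [ResnetBlock(dim) for _ in range(n_blocks)]
        for _ in range(2):  # two fractionally-strided (stride 1/2) convolutions
            layers += [nn.ConvTranspose2d(dim, dim // 2, 3, stride=2,
                                          padding=1, output_padding=1),
                       nn.InstanceNorm2d(dim // 2), nn.ReLU(inplace=True)]
            dim //= 2
        layers += [nn.ReflectionPad2d(3), nn.Conv2d(dim, 3, kernel_size=7), nn.Tanh()]
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)

class PatchGANDiscriminator(nn.Module):
    """70x70 PatchGAN: outputs a grid of real/fake scores, one per image patch."""
    def __init__(self):
        super().__init__()
        def block(cin, cout, stride, norm=True):
            layers = [nn.Conv2d(cin, cout, 4, stride=stride, padding=1)]
            if norm:
                layers.append(nn.InstanceNorm2d(cout))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers
        self.model = nn.Sequential(
            *block(3, 64, 2, norm=False),
            *block(64, 128, 2),
            *block(128, 256, 2),
            *block(256, 512, 1),
            nn.Conv2d(512, 1, 4, stride=1, padding=1),  # per-patch score map
        )

    def forward(self, x):
        return self.model(x)

gen = ResnetGenerator(n_blocks=6)   # 6 blocks for 128x128 inputs, per the notes
disc = PatchGANDiscriminator()
x = torch.randn(1, 3, 128, 128)
fake = gen(x)          # same spatial size as the input
scores = disc(fake)    # one real/fake score per overlapping patch
```

Averaging the score map gives the discriminator's overall decision; training it patch-wise keeps the discriminator small and applicable to arbitrarily sized images.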
(2) Training details
- replace the negative log likelihood objective by a least-squares loss
- update the discriminators using a history of generated images rather than the ones produced by the latest generators
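Both training tricks can be sketched in PyTorch. The pool size of 50 and the 50% swap probability follow the original implementation, though this ImagePool is a simplified, per-tensor version:

```python
import random
import torch
import torch.nn as nn

mse = nn.MSELoss()

def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push real scores to 1, fake scores to 0."""
    return mse(d_real, torch.ones_like(d_real)) + mse(d_fake, torch.zeros_like(d_fake))

def lsgan_g_loss(d_fake):
    """Least-squares generator loss: push fake scores to 1."""
    return mse(d_fake, torch.ones_like(d_fake))

class ImagePool:
    """Buffer of previously generated images.
    With probability 0.5, swap the incoming image for one from the history,
    so the discriminator also sees outputs of earlier generators."""
    def __init__(self, pool_size=50):
        self.pool_size = pool_size
        self.images = []

    def query(self, image):
        if len(self.images) < self.pool_size:
            self.images.append(image)
            return image
        if random.random() < 0.5:
            idx = random.randrange(self.pool_size)
            old = self.images[idx]
            self.images[idx] = image
            return old
        return image

d_real = torch.randn(1, 1, 14, 14)  # placeholder PatchGAN score maps
d_fake = torch.randn(1, 1, 14, 14)
pool = ImagePool(pool_size=50)
buffered = pool.query(torch.randn(1, 3, 64, 64))
```

The least-squares loss keeps gradients alive even when the discriminator is confident, which stabilizes training; the image history reduces oscillation between the generator and discriminator.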