- paper
- This is (probably?) the first published paper from Apple in the ML/CV field
Contribution
- Proposes Simulated + Unsupervised (S+U) learning: use unlabeled real-world images to refine synthetic images, with a GAN-style refiner network
- Trains the refiner with an adversarial loss plus a self-regularization loss
- Key modifications to stabilize GAN training and prevent artifacts
Framework
- Refiner (Generator): x̃ :=Rθ(x)
- Refiner loss (general form):
  L_R(θ) = Σ_i ℓ_real(θ; x̃_i, Y) + λ ℓ_reg(θ; x̃_i, x_i)
- ℓ_real pushes refined images toward the real-image set Y; ℓ_reg minimizes the difference between the synthetic image x_i and its refined version x̃_i
- Discriminator loss:
  L_D(φ) = −Σ_i log(D_φ(x̃_i)) − Σ_j log(1 − D_φ(y_j))
  where x̃_i and y_j are randomly sampled from the refined and real image sets, respectively
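A minimal NumPy sketch of this discriminator loss (the function name and the small eps clamp for numerical stability are my own; D_φ is assumed to output the probability that its input is a refined image):

```python
import numpy as np

def discriminator_loss(d_refined, d_real):
    """L_D(phi) = -sum_i log(D_phi(x_tilde_i)) - sum_j log(1 - D_phi(y_j)).

    d_refined: D_phi applied to refined images, shape (N,); trained toward 1
    d_real:    D_phi applied to real images, shape (M,); trained toward 0
    """
    eps = 1e-8  # avoid log(0)
    return -np.sum(np.log(d_refined + eps)) - np.sum(np.log(1.0 - d_real + eps))
```

A perfect discriminator (refined → 1, real → 0) drives this loss to zero.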
- Algorithm: alternately update the refiner and the discriminator; the L_R loss as implemented in this paper:
  L_R(θ) = −Σ_i log(1 − D_φ(R_θ(x_i))) + λ ‖R_θ(x_i) − x_i‖₁
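The implemented refiner loss can be sketched the same way (the default λ value and eps clamp are placeholders, not from the paper):

```python
import numpy as np

def refiner_loss(d_refined, refined, synthetic, lam=0.01):
    """L_R(theta) = -sum_i log(1 - D_phi(R_theta(x_i)))
                    + lam * ||R_theta(x_i) - x_i||_1

    d_refined: discriminator outputs on the refined images, shape (N,)
    refined:   refined images R_theta(x), any shape
    synthetic: original synthetic images x, same shape as `refined`
    """
    eps = 1e-8
    adv = -np.sum(np.log(1.0 - d_refined + eps))   # fool D: want D -> 0
    reg = lam * np.sum(np.abs(refined - synthetic))  # self-regularization (L1)
    return adv + reg
```

The L1 term keeps the refiner from drifting away from the synthetic annotation (e.g. gaze direction), which is what makes the refined labels still usable for supervised training.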
Stabilizing GAN training
- Local adversarial loss: divide the refined and real images into w × h local regions and judge each region with its own discriminator output (a fully convolutional discriminator producing a w × h probability map)
- Final loss is the sum of the per-region losses
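A sketch of the local adversarial loss for the discriminator side, assuming the discriminator already outputs a w × h map of per-region "refined" probabilities (function name and eps are my own):

```python
import numpy as np

def local_adversarial_loss(prob_map_refined, prob_map_real):
    """Sum of per-region cross-entropy losses over a (w, h) probability map.

    prob_map_refined: D's (w, h) map on a refined image; each entry -> 1
    prob_map_real:    D's (w, h) map on a real image; each entry -> 0
    Each map entry acts as a separate local discriminator; losses are summed.
    """
    eps = 1e-8
    loss_refined = -np.sum(np.log(prob_map_refined + eps))
    loss_real = -np.sum(np.log(1.0 - prob_map_real + eps))
    return loss_refined + loss_real
```

Judging local patches instead of the whole image limits how strongly the refiner can over-emphasize any single image region, which suppresses drifting and artifacts.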
- Using a history of refined images
- Two issues arise when only the latest refined images are used:
  - Divergence of adversarial training
  - The refiner re-introduces artifacts that the discriminator has forgotten about
- Keep a buffer of previously refined images; in each iteration, build the discriminator minibatch from b/2 history images and b/2 newly refined images, and replace b/2 of the buffered images with newly refined ones
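A minimal sketch of such a history buffer (class and method names are my own; the paper's buffer capacity is typically much larger than the minibatch size):

```python
import random

class ImageHistoryBuffer:
    """Buffer of previously refined images for discriminator training.

    Each iteration, the discriminator minibatch mixes b/2 images drawn
    from the buffer with b/2 newly refined images, and b/2 random buffer
    entries are overwritten with new refined images.
    """
    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []

    def sample_and_update(self, new_refined):
        # Warm-up: until the buffer is full, just store and pass through.
        if len(self.buffer) < self.capacity:
            self.buffer.extend(new_refined)
            return list(new_refined)
        half = len(new_refined) // 2
        # Minibatch: b/2 from history + b/2 freshly refined images.
        batch = random.sample(self.buffer, half) + list(new_refined[:half])
        # Overwrite b/2 random buffer entries with new refined images.
        for idx, img in zip(random.sample(range(len(self.buffer)), half),
                            new_refined[:half]):
            self.buffer[idx] = img
        return batch
```

Showing the discriminator old refiner outputs keeps it from forgetting previously seen artifacts, so the refiner cannot cycle back to them.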
Experiments
- Gaze estimation
- Dataset: MPIIGaze dataset
- Synthesizer: UnityEyes
- Visual Turing test: human cannot tell the difference between refined and real images
- Quantitative result: 22.3% relative improvement over the state of the art
- Hand pose estimation
- Dataset: NYU hand pose dataset
- Training CNN: Stacked Hourglass network