The GAN Landscape: Losses, Architectures, Regularization, and Normalization
Karol Kurach, Mario Lucic, Xiaohua Zhai, Marcin Michalski, Sylvain Gelly
Abstract
GANs: very successful, yet notoriously challenging to train; require a significant amount of hyperparameter tuning, neural architecture engineering, and a non-trivial amount of “tricks”
lack of a measure to quantify the failure modes ⇒ a plethora of proposed losses, regularization and normalization schemes, and neural architectures
Hence this paper runs experiments over these four variables to see in which settings they actually improve training.
Introduction
GAN: learning a target distribution, generator + discriminator
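Concretely, the generator $G$ and the discriminator $D$ play the standard minimax game over the value function (stated here for reference; $Q$ is the distribution induced by $G$):

$$\min_G \max_D \; \mathbb{E}_{x\sim P}[\log D(x)] + \mathbb{E}_{\hat{x}\sim Q}[\log(1 - D(\hat{x}))]$$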
contributions:
provide a thorough empirical analysis of these loss functions, regularization and normalization schemes, coupled with neural architecture choices, to help researchers and practitioners navigate this space
1. GAN landscape – the set of loss functions, normalization and regularization schemes, and the most commonly used architectures (in essence, the four controllable variables) ⇒ the non-saturating loss is sufficiently stable across data sets, architectures, and hyperparameters
2. decompose the effect of various normalization and regularization schemes, as well as varying architectures ⇒ both gradient penalty and spectral normalization are useful in the context of high-capacity architectures ⇒ applying regularization and normalization simultaneously is beneficial (see the sketch after this list)
3. a discussion of common pitfalls, reproducibility issues, and practical considerations
code and pretrained models provided
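As a rough illustration of the two schemes highlighted in contribution 2, here is a minimal PyTorch sketch (PyTorch is my assumption; the paper's reference code is in TensorFlow) of a WGAN-GP-style gradient penalty and of spectral normalization applied to a discriminator layer. `D`, `real`, `fake`, and `lam` are hypothetical placeholders:

```python
import torch
import torch.nn as nn

# Spectral normalization: re-parametrize a layer so its weight matrix is
# divided by an estimate of its largest singular value on every forward pass.
sn_layer = nn.utils.spectral_norm(nn.Linear(128, 1))

def gradient_penalty(D, real, fake, lam=10.0):
    """WGAN-GP-style penalty: push the norm of the discriminator's gradient
    towards 1 at points interpolated between real and generated samples."""
    # one mixing coefficient per sample, broadcast over the remaining dims
    alpha = torch.rand([real.size(0)] + [1] * (real.dim() - 1), device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    d_out = D(interp)
    grads = torch.autograd.grad(
        outputs=d_out, inputs=interp,
        grad_outputs=torch.ones_like(d_out),
        create_graph=True,  # keep the graph so the penalty itself is trainable
    )[0]
    grad_norm = grads.reshape(grads.size(0), -1).norm(2, dim=1)
    return lam * ((grad_norm - 1.0) ** 2).mean()
```

The penalty is simply added to the discriminator loss at each step; the paper's observation is that combining such regularization with normalization (e.g. spectral norm on the discriminator's layers) pays off for high-capacity architectures.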
The GAN Landscape
Loss Functions
$P$, $Q$: the target (true) distribution and the model distribution
type | principle | discriminator | generator | form |
---|---|---|---|---|
original GAN | two variants: the minimax (MM) GAN and the non-saturating (NS) GAN | minimizes the negative log-likelihood of the binary classification task (i.e. is the sample real or fake?); under an optimal discriminator this corresponds to minimizing the Jensen-Shannon (JS) divergence between $P$ and $Q$ | NS variant: maximizes the probability of generated samples being real | $L_D = \mathbb{E}_{x\sim P}[-\log D(x)] + \mathbb{E}_{\hat{x}\sim Q}[-\log(1-D(\hat{x}))]$, $L_G = \mathbb{E}_{\hat{x}\sim Q}[-\log D(\hat{x})]$ (NS) |
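To make the `form` column concrete, a minimal sketch of these losses (my own illustration, not the authors' code), assuming a discriminator that outputs raw logits; `d_real` and `d_fake` are hypothetical logit tensors:

```python
import torch
import torch.nn.functional as F

def d_loss(d_real, d_fake):
    """L_D = E_{x~P}[-log D(x)] + E_{x̂~Q}[-log(1 - D(x̂))],
    computed from logits for numerical stability."""
    real_term = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
    fake_term = F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    return real_term + fake_term

def g_loss_minimax(d_fake):
    """Minimax generator: minimize E[log(1 - D(x̂))]. Gradients vanish
    ("saturate") once D confidently rejects fakes, hence the NS variant."""
    # BCE against a zero target equals -log(1 - D(x̂)), so negate it
    return -F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))

def g_loss_ns(d_fake):
    """Non-saturating generator: minimize E[-log D(x̂)], i.e. maximize the
    probability that generated samples are classified as real."""
    return F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
```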