The GAN Landscape: Losses, Architectures, Regularization, and Normalization
Karol Kurach, Mario Lucic, Xiaohua Zhai, Marcin Michalski, Sylvain Gelly
Abstract
GANs: very successful, yet notoriously challenging to train; require a significant amount of hyperparameter tuning, neural architecture engineering, and a non-trivial amount of “tricks”
lack of a measure to quantify the failure modes ⇒ a plethora of proposed losses, regularization and normalization schemes, and neural architectures
Hence this paper runs experiments over these four variables to see in which settings they actually improve training.
Introduction
GAN: learning a target distribution, generator + discriminator
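Concretely, the generator $G$ and the discriminator $D$ play the standard minimax game over the value function (stated here for reference; $Q$ is the distribution induced by $G$):

$$\min_G \max_D \; \mathbb{E}_{x\sim P}[\log D(x)] + \mathbb{E}_{\hat{x}\sim Q}[\log(1 - D(\hat{x}))]$$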
contributions:
provide a thorough empirical analysis of these loss functions, regularization and normalization schemes, coupled with neural architecture choices, to help researchers and practitioners navigate this space
1. GAN landscape – the set of loss functions, normalization and regularization schemes, and the most commonly used architectures (in essence, the four controllable variables) ⇒ the non-saturating loss is sufficiently stable across data sets, architectures, and hyperparameters
2. decompose the effect of various normalization and regularization schemes, as well as varying architectures ⇒ both gradient penalty and spectral normalization are useful in the context of high-capacity architectures ⇒ applying regularization and normalization simultaneously is beneficial (see the sketch after this list)
3. a discussion of common pitfalls, reproducibility issues, and practical considerations
code and pretrained models provided
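As a rough illustration of the two schemes highlighted in contribution 2, here is a minimal PyTorch sketch (PyTorch is my assumption; the paper's reference code is in TensorFlow) of a WGAN-GP-style gradient penalty and of spectral normalization applied to a discriminator layer. `D`, `real`, `fake`, and `lam` are hypothetical placeholders:

```python
import torch
import torch.nn as nn

# Spectral normalization: re-parametrize a layer so its weight matrix is
# divided by an estimate of its largest singular value on every forward pass.
sn_layer = nn.utils.spectral_norm(nn.Linear(128, 1))

def gradient_penalty(D, real, fake, lam=10.0):
    """WGAN-GP-style penalty: push the norm of the discriminator's gradient
    towards 1 at points interpolated between real and generated samples."""
    # one mixing coefficient per sample, broadcast over the remaining dims
    alpha = torch.rand([real.size(0)] + [1] * (real.dim() - 1), device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    d_out = D(interp)
    grads = torch.autograd.grad(
        outputs=d_out, inputs=interp,
        grad_outputs=torch.ones_like(d_out),
        create_graph=True,  # keep the graph so the penalty itself is trainable
    )[0]
    grad_norm = grads.reshape(grads.size(0), -1).norm(2, dim=1)
    return lam * ((grad_norm - 1.0) ** 2).mean()
```

The penalty is simply added to the discriminator loss at each step; the paper's observation is that combining such regularization with normalization (e.g. spectral norm on the discriminator's layers) pays off for high-capacity architectures.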
The GAN Landscape
Loss Functions
$P$, $Q$: the target (true) distribution and the model distribution
type | principle | discriminator | generator | form |
---|---|---|---|---|
original GAN | two variants: the minimax (MM) GAN and the non-saturating (NS) GAN | minimizes the negative log-likelihood of the binary classification task (i.e. is the sample real or fake?); under an optimal discriminator this corresponds to minimizing the Jensen-Shannon (JS) divergence between $P$ and $Q$ | NS variant: maximizes the probability of generated samples being real | $L_D = \mathbb{E}_{x\sim P}[-\log D(x)] + \mathbb{E}_{\hat{x}\sim Q}[-\log(1-D(\hat{x}))]$, $L_G = \mathbb{E}_{\hat{x}\sim Q}[-\log D(\hat{x})]$ (NS) |
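To make the `form` column concrete, a minimal sketch of these losses (my own illustration, not the authors' code), assuming a discriminator that outputs raw logits; `d_real` and `d_fake` are hypothetical logit tensors:

```python
import torch
import torch.nn.functional as F

def d_loss(d_real, d_fake):
    """L_D = E_{x~P}[-log D(x)] + E_{x̂~Q}[-log(1 - D(x̂))],
    computed from logits for numerical stability."""
    real_term = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
    fake_term = F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    return real_term + fake_term

def g_loss_minimax(d_fake):
    """Minimax generator: minimize E[log(1 - D(x̂))]. Gradients vanish
    ("saturate") once D confidently rejects fakes, hence the NS variant."""
    # BCE against a zero target equals -log(1 - D(x̂)), so negate it
    return -F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))

def g_loss_ns(d_fake):
    """Non-saturating generator: minimize E[-log D(x̂)], i.e. maximize the
    probability that generated samples are classified as real."""
    return F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
```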