The GAN Landscape: Losses, Architectures, Regularization, and Normalization

This post takes a close look at GAN training, covering the impact of different loss functions, regularization and normalization strategies, and architectures. The study finds that the non-saturating loss is stable across multiple datasets, and that gradient penalty and spectral normalization are beneficial for high-capacity architectures. It also shows that batch normalization can hurt performance, while spectral normalization offers better quality and efficiency. Through controlled ablations of these components, the paper distills practical guidelines.

The GAN Landscape: Losses, Architectures, Regularization, and Normalization

Karol Kurach, Mario Lucic, Xiaohua Zhai, Marcin Michalski, Sylvain Gelly

Abstract

GANs: successful, but notoriously challenging to train; they require a significant amount of hyperparameter tuning, neural architecture engineering, and a non-trivial amount of "tricks"
the lack of measures to quantify failure modes has led to a plethora of proposed losses, regularization and normalization schemes, and neural architectures
This paper therefore tests these four variables (losses, regularization, normalization, architectures) to see under which conditions they improve training.

Introduction

GAN: learning a target distribution, generator + discriminator
contribution:
provide a thorough empirical analysis of these loss functions, regularization and normalization schemes, coupled with neural architecture choices, to help researchers and practitioners navigate this space
1. GAN landscape – the set of loss functions, normalization and regularization schemes, and the most commonly used architectures (in effect, the four controllable variables). Finding: the non-saturating loss is sufficiently stable across data sets, architectures, and hyperparameters.
2. Decompose the effect of various normalization and regularization schemes, as well as varying architectures. Finding: both gradient penalty and spectral normalization are useful in the context of high-capacity architectures, and applying regularization and normalization simultaneously is beneficial (see the sketch after this list).
3. a discussion of common pitfalls, reproducibility issues, and practical considerations
4. release of code and pretrained models
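As a rough illustration of the two schemes highlighted in contribution 2, here is a minimal PyTorch sketch of a WGAN-GP style gradient penalty and a spectrally normalized discriminator. This is not the paper's reference implementation; the tiny discriminator architecture and all hyperparameters here are made up for illustration.

```python
import torch
import torch.nn as nn

def gradient_penalty(disc, real, fake):
    """Gradient penalty: push the discriminator's gradient norm towards 1
    on points interpolated between real and fake samples."""
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = disc(interp)
    grads = torch.autograd.grad(scores.sum(), interp, create_graph=True)[0]
    return ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

# Spectral normalization: wrap each layer so its weight's spectral norm is
# constrained to 1 (hypothetical toy discriminator, for illustration only).
disc = nn.Sequential(
    nn.utils.spectral_norm(nn.Conv2d(3, 64, 4, stride=2, padding=1)),
    nn.LeakyReLU(0.2),
    nn.utils.spectral_norm(nn.Conv2d(64, 1, 4)),
)
```

In training, the penalty term is simply added to the discriminator loss with a weighting coefficient (commonly 10 in WGAN-GP style setups).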

The GAN Landscape

Loss Functions

$P$, $Q$: the target (true) distribution and the model distribution

Type: original GAN – the minimax (MM) GAN and the non-saturating (NS) GAN
Discriminator: minimizes the negative log-likelihood for the binary classification task (i.e. is the sample real or fake?), which is equivalent to minimizing the Jensen-Shannon (JS) divergence between $P$ and $Q$
Generator (NS): maximizes the probability of generated samples being classified as real
Form: $L_D = -\mathbb{E}_{x \sim P}[\log D(x)] - \mathbb{E}_{\hat{x} \sim Q}[\log(1 - D(\hat{x}))]$, $L_G = -\mathbb{E}_{\hat{x} \sim Q}[\log D(\hat{x})]$
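A minimal sketch of these loss forms, assuming $D$ outputs the probability that its input is real (e.g. after a sigmoid); the helper names and the eps stabilizer are mine, not from the paper.

```python
import torch

def d_loss(d_real, d_fake, eps=1e-8):
    # L_D = -E_{x~P}[log D(x)] - E_{x_hat~Q}[log(1 - D(x_hat))]
    return -(torch.log(d_real + eps).mean() + torch.log(1 - d_fake + eps).mean())

def g_loss_non_saturating(d_fake, eps=1e-8):
    # NS GAN: maximize log D(x_hat), i.e. minimize -E_{x_hat~Q}[log D(x_hat)]
    return -torch.log(d_fake + eps).mean()

def g_loss_minimax(d_fake, eps=1e-8):
    # MM GAN: minimize E_{x_hat~Q}[log(1 - D(x_hat))]; saturates early in
    # training, which is why the NS variant is usually preferred
    return torch.log(1 - d_fake + eps).mean()
```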