Generative Network Paper Reading: PGGAN (Part 2): PROGRESSIVE GROWING OF GANS FOR IMPROVED QUALITY, STABILITY, AND VARIATION

0. ABSTRACT

0.1 Sentence-by-sentence reading

We describe a new training methodology for generative adversarial networks.

The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses.

This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CELEBA images at 1024².

We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10.

Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator.

Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation.

As an additional contribution, we construct a higher-quality version of the CELEBA dataset.

0.2 Summary

This section outlines the paper as a whole:

  • 1. It proposes a progressive training method that ultimately produces high-resolution images
  • 2. It describes several implementation/training details
  • 3. It proposes a new metric for evaluating GAN results
  • 4. The authors build a new, higher-quality training dataset

1 INTRODUCTION

Paragraph 1 (surveys the strengths and weaknesses of current generative models; in the end the authors decide to study how to raise the resolution of GAN outputs)

Generative methods that produce novel samples from high-dimensional data distributions, such as images, are finding widespread use, for example in speech synthesis (van den Oord et al., 2016a), image-to-image translation (Zhu et al., 2017; Liu et al., 2017; Wang et al., 2017), and image inpainting (Iizuka et al., 2017).

Currently the most prominent approaches are autoregressive models (van den Oord et al., 2016b;c), variational autoencoders (VAE) (Kingma & Welling, 2014), and generative adversarial networks (GAN) (Goodfellow et al., 2014).

Currently they all have significant strengths and weaknesses. Autoregressive models – such as PixelCNN – produce sharp images but are slow to evaluate and do not have a latent representation as they directly model the conditional distribution over pixels, potentially limiting their applicability.

VAEs are easy to train but tend to produce blurry results due to restrictions in the model, although recent work is improving this (Kingma et al., 2016).

GANs produce sharp images, albeit only in fairly small resolutions and with somewhat limited variation, and the training continues to be unstable despite recent progress (Salimans et al., 2016; Gulrajani et al., 2017; Berthelot et al., 2017; Kodali et al., 2017).

Hybrid methods combine various strengths of the three, but so far lag behind GANs in image quality (Makhzani & Frey, 2017; Ulyanov et al., 2017; Dumoulin et al., 2016).

Paragraph 2 (describes the structure of a GAN)

Typically, a GAN consists of two networks: generator and discriminator (aka critic).

The generator produces a sample, e.g., an image, from a latent code, and the distribution of these images should ideally be indistinguishable from the training distribution.

Since it is generally infeasible to engineer a function that tells whether that is the case, a discriminator network is trained to do the assessment, and since networks are differentiable, we also get a gradient we can use to steer both networks to the right direction.

Typically, the generator is of main interest – the discriminator is an adaptive loss function that gets discarded once the generator has been trained.

Paragraph 3 (roughly: for training stability, the paper adopts the earth-mover (Wasserstein) loss used in WGAN)

There are multiple potential problems with this formulation.

When we measure the distance between the training distribution and the generated distribution, the gradients can point to more or less random directions if the distributions do not have substantial overlap, i.e., are too easy to tell apart (Arjovsky & Bottou, 2017).

Originally, Jensen-Shannon divergence was used as a distance metric (Goodfellow et al., 2014), and recently that formulation has been improved (Hjelm et al., 2017) and a number of more stable alternatives have been proposed, including least squares (Mao et al., 2016b), absolute deviation with margin (Zhao et al., 2017), and Wasserstein distance (Arjovsky et al., 2017; Gulrajani et al., 2017). Our contributions are largely orthogonal to this ongoing discussion, and we primarily use the improved Wasserstein loss, but also experiment with least-squares loss.
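For reference, a standard statement of the improved Wasserstein (WGAN-GP) critic loss from Gulrajani et al. (2017), which the authors say they primarily use (the λ = 10 default and the sampling of x̂ along lines between real and generated samples come from that paper, not from the excerpt above):

```latex
% WGAN-GP critic loss (Gulrajani et al., 2017):
% \tilde{x} ~ P_g (generated), x ~ P_r (real),
% \hat{x} is sampled uniformly along straight lines between real and generated samples.
L_D = \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim \mathbb{P}_r}\big[D(x)\big]
    + \lambda \, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big], \qquad \lambda = 10
```

The generator is then trained to minimize −E[D(x̃)].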

Paragraph 4 (the authors note that training directly on high-resolution images is hard: early in training the generated images differ too much from real images, and large images also consume a lot of memory)

The generation of high-resolution images is difficult because higher resolution makes it easier to tell the generated images apart from training images (Odena et al., 2017), thus drastically amplifying the gradient problem.
(Note: early in training the generated images are still far from real photos, so the discriminator separates them easily and the gradient signal is nearly random.)

Large resolutions also necessitate using smaller minibatches due to memory constraints, further compromising training stability.

Our key insight is that we can grow both the generator and discriminator progressively, starting from easier low-resolution images, and add new layers that introduce higher-resolution details as the training progresses.

This greatly speeds up training and improves stability in high resolutions, as we will discuss in Section 2.

Paragraph 5 (the authors discuss their evaluation criteria)

The GAN formulation does not explicitly require the entire training data distribution to be represented by the resulting generative model.

The conventional wisdom has been that there is a tradeoff between image quality and variation, but that view has been recently challenged (Odena et al., 2017).

The degree of preserved variation is currently receiving attention and various methods have been suggested for measuring it, including inception score (Salimans et al., 2016), multi-scale structural similarity (MS-SSIM) (Odena et al., 2017; Wang et al., 2003), birthday paradox (Arora & Zhang, 2017), and explicit tests for the number of discrete modes discovered (Metz et al., 2016).
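As a reminder of the first metric in that list, the inception score (Salimans et al., 2016) runs generated samples through a pretrained Inception classifier and compares each sample's predicted label distribution p(y|x) with the marginal p(y):

```latex
% Inception score over generated samples x ~ p_g:
\mathrm{IS}(p_g) = \exp\Big( \mathbb{E}_{x \sim p_g}\, D_{\mathrm{KL}}\big( p(y \mid x) \,\Vert\, p(y) \big) \Big)
```

Confident per-sample predictions (quality) and a broad marginal over classes (variation) both raise the score, which is why it is used as a joint quality/variation proxy.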

We will describe our method for encouraging variation in Section 3, and propose a new metric for evaluating the quality and variation in Section 5.

Paragraph 6 (problems that arise during training)

Section 4.1 discusses a subtle modification to the initialization of networks, leading to a more balanced learning speed for different layers.

Furthermore, we observe that mode collapses traditionally plaguing GANs tend to happen very quickly, over the course of a dozen minibatches.

Commonly they start when the discriminator overshoots, leading to exaggerated gradients, and an unhealthy competition follows where the signal magnitudes escalate in both networks.

We propose a mechanism to stop the generator from participating in such escalation, overcoming the issue (Section 4.2).

Paragraph 7 (the authors describe their results)

We evaluate our contributions using the CELEBA, LSUN, CIFAR10 datasets.

We improve the best published inception score for CIFAR10. Since the datasets commonly used in benchmarking generative methods are limited to a fairly low resolution, we have also created a higher quality version of the CELEBA dataset that allows experimentation with output resolutions up to 1024 × 1024 pixels.

This dataset and our full implementation are available at [the authors' GitHub repository]; trained networks can be found at [the authors' website] along with result images, and a supplementary video illustrating the datasets, additional results, and latent space interpolations is at [the authors' link].

2 PROGRESSIVE GROWING OF GANS

Paragraph 1 (the core contribution: a training methodology that grows the networks and the resolution progressively)

Our primary contribution is a training methodology for GANs where we start with low-resolution images, and then progressively increase the resolution by adding layers to the networks as visualized in Figure 1.

This incremental nature allows the training to first discover large-scale structure of the image distribution and then shift attention to increasingly finer scale detail, instead of having to learn all scales simultaneously.

Paragraph 2 (generator and discriminator mirror each other, and new layers are faded in smoothly)

We use generator and discriminator networks that are mirror images of each other and always grow in synchrony.

All existing layers in both networks remain trainable throughout the training process.

When new layers are added to the networks, we fade them in smoothly, as illustrated in Figure 2.

This avoids sudden shocks to the already well-trained, smaller-resolution layers. Appendix A describes structure of the generator and discriminator in detail, along with other training parameters.
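To make the fade-in concrete, below is a minimal PyTorch-style sketch (not the authors' code; the module names `old_to_rgb`/`new_to_rgb` and the nearest-neighbour upsampling are illustrative assumptions) of blending a newly added generator block in with a weight alpha that is ramped from 0 to 1:

```python
import torch.nn.functional as F

def faded_generator_output(x, old_to_rgb, new_block, new_to_rgb, alpha):
    """Blend a newly added higher-resolution block into the generator output.

    x           -- feature maps at the previous (lower) resolution
    old_to_rgb  -- 1x1 conv projecting low-res features to RGB (already trained)
    new_block   -- the newly added block operating at 2x resolution
    new_to_rgb  -- 1x1 conv projecting the new block's features to RGB
    alpha       -- fade-in weight, increased from 0 to 1 while the block is phased in
    """
    # Old path: project to RGB at the low resolution, then upsample the image.
    old_rgb = F.interpolate(old_to_rgb(x), scale_factor=2, mode="nearest")

    # New path: upsample the features, run the new block, project to RGB.
    new_rgb = new_to_rgb(new_block(F.interpolate(x, scale_factor=2, mode="nearest")))

    # At alpha = 0 the network behaves exactly as before the new block was added;
    # at alpha = 1 the new block is fully active.
    return (1.0 - alpha) * old_rgb + alpha * new_rgb
```

In the paper the discriminator grows symmetrically and its new block is faded in the same way, so both networks change smoothly while every existing layer keeps training.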

4 NORMALIZATION IN GENERATOR AND DISCRIMINATOR

Brief introduction at the start of the section

GANs are prone to the escalation of signal magnitudes as a result of unhealthy competition between the two networks. Most if not all earlier solutions discourage this by using a variant of batch normalization (Ioffe & Szegedy, 2015; Salimans & Kingma, 2016; Ba et al., 2016) in the generator, and often also in the discriminator.

These normalization methods were originally introduced to eliminate covariate shift.

However, we have not observed that to be an issue in GANs, and thus believe that the actual need in GANs is constraining signal magnitudes and competition.

We use a different approach that consists of two ingredients, neither of which include learnable parameters.

4.1 EQUALIZED LEARNING RATE

We deviate from the current trend of careful weight initialization, and instead use a trivial N(0, 1) initialization and then explicitly scale the weights at runtime.

To be precise, we set ŵᵢ = wᵢ / c, where wᵢ are the weights and c is the per-layer normalization constant from He's initializer (He et al., 2015).
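A minimal sketch of what this can look like in practice (this is not the official implementation; it follows the common reimplementation convention of storing weights as N(0, 1) draws and multiplying them by the He-initializer constant sqrt(2 / fan_in) on every forward pass):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class EqualizedConv2d(nn.Module):
    """Convolution with runtime weight scaling ("equalized learning rate").

    Weights are stored as plain N(0, 1) draws; at every forward pass they are
    scaled by the per-layer He constant, so the effective weights match He
    initialization while all stored parameters share the same dynamic range.
    """

    def __init__(self, in_channels, out_channels, kernel_size, padding=0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_channels, in_channels,
                                               kernel_size, kernel_size))
        self.bias = nn.Parameter(torch.zeros(out_channels))
        fan_in = in_channels * kernel_size * kernel_size
        self.scale = math.sqrt(2.0 / fan_in)  # He-initializer constant, applied at runtime
        self.padding = padding

    def forward(self, x):
        return F.conv2d(x, self.weight * self.scale, self.bias, padding=self.padding)
```

Because the scaling happens inside the forward pass, the optimizer sees every stored weight on the same N(0, 1) scale, which is exactly the point made in the next paragraph.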

The benefit of doing this dynamically instead of during initialization is somewhat subtle, and relates to the scale-invariance in commonly used adaptive stochastic gradient descent methods such as RMSProp (Tieleman & Hinton, 2012) and Adam (Kingma & Ba, 2015).

These methods normalize a gradient update by its estimated standard deviation, thus making the update independent of the scale of the parameter.

As a result, if some parameters have a larger dynamic range than others, they will take longer to adjust.

This is a scenario modern initializers cause, and thus it is possible that a learning rate is both too large and too small at the same time.
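Concretely, the scale-invariance referred to above comes from the form of the Adam update, where m̂ and v̂ are running estimates of the gradient's mean and uncentered variance:

```latex
% Adam update step (Kingma & Ba, 2015):
\Delta\theta_t = -\,\eta \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```

Since the gradient is divided by its own running RMS, each step in parameter space has magnitude on the order of η regardless of how the parameter is scaled; a weight whose intended dynamic range is ten times larger therefore needs correspondingly more steps to move the same relative distance, which is the sense in which one learning rate can be both too large and too small.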

Our approach ensures that the dynamic range, and thus the learning speed, is the same for all weights.

A similar reasoning was independently used by van Laarhoven (2017).
