[Paper Translation] Transferring GANs: generating images from limited data

The paper investigates the use of pre-trained networks for image generation with generative adversarial networks (GANs), particularly on limited datasets. It finds that pre-trained models can accelerate learning and improve the quality of generated images, especially when data is scarce, and that pre-training also helps conditional GANs. The results further indicate that density matters more than diversity: a source model trained on one or a few densely sampled classes outperforms one trained on a more diverse dataset.

Paper download

Abstract.

Transferring knowledge of pre-trained networks to new domains by means of fine-tuning is a widely used practice for applications based on discriminative models. To the best of our knowledge this practice has not been studied within the context of generative deep networks. Therefore, we study domain adaptation applied to image generation with generative adversarial networks. We evaluate several aspects of domain adaptation, including the impact of target domain size, the relative distance between source and target domain, and the initialization of conditional GANs. Our results show that using knowledge from pre-trained networks can shorten the convergence time and can significantly improve the quality of the generated images, especially when target data is limited. We show that these conclusions can also be drawn for conditional GANs even when the pre-trained model was trained without conditioning. Our results also suggest that density is more important than diversity and a dataset with one or few densely sampled classes is a better source model than more diverse datasets such as ImageNet or Places.
Keywords: Generative adversarial networks, transfer learning, domain adaptation, image generation


1 Introduction

Generative Adversarial Networks (GANs) can generate samples from complex image distributions [1]. They consist of two networks: a discriminator which aims to separate real images from fake (or generated) images, and a generator which is simultaneously optimized to generate images which are classified as real by the discriminator. The theory was later extended to the case of conditional GANs where the generative process is constrained using a conditioning prior [2] which is provided as an additional input. GANs have further been widely applied in applications, including super-resolution [3], 3D object generation and reconstruction [4], human pose estimation [5], and age estimation [6].

Deep neural networks have obtained excellent results for discriminative classification problems for which large datasets exist; for example on the ImageNet dataset which consists of over 1M images [7]. However, for many problems the amount of labeled data is not sufficient to train the millions of parameters typically present in these networks. Fortunately, it was found that the knowledge contained in a network trained on a large dataset (such as ImageNet) can easily be transferred to other computer vision tasks: either by using these networks as off-the-shelf feature extractors [8], or by adapting them to a new domain by a process called fine-tuning [9]. In the latter case, the pre-trained network is used to initialize the weights for a new task (effectively transferring the knowledge learned from the source domain), which are then fine-tuned with the training images from the new domain. It has been shown that much fewer images were required to train networks which were initialized with a pre-trained network.

GANs are in general trained from scratch. The procedure of using a pre-trained network for initialization (which is very popular for discriminative networks) is to the best of our knowledge not used for GANs.
However, like in the case of discriminative networks, the number of parameters in a GAN is vast; for example the popular DC-GAN architecture [10] requires 36M parameters to generate an image of 64x64. Especially in the case of domains which lack many training images, the usage of pre-trained GANs could significantly improve the quality of the generated images. Therefore, in this paper, we set out to evaluate the usage of pre-trained networks for GANs. The paper has the following contributions:

  1. We evaluate several transfer configurations, and show that pre-trained networks can effectively accelerate the learning process and provide useful prior knowledge when data is limited.
  2. We study how the relation between source and target domains impacts the results, and discuss the problem of choosing a suitable pre-trained model, which seems more difficult than in the case of discriminative tasks.
  3. We evaluate the transfer from unconditional GANs to conditional GANs for two commonly used methods to condition GANs.
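The transfer configuration behind contribution 1, initializing a target-domain GAN with weights pre-trained on a source domain before fine-tuning, can be sketched in a framework-agnostic way. This is illustrative code, not code from the paper: `transfer_init` and the dict-of-arrays representation stand in for a real framework's state dicts, where parameters whose names and shapes match the source network are copied and the rest keep their fresh random initialization.

```python
import numpy as np

def transfer_init(target, source):
    """Initialize target-network parameters from a pre-trained source network.

    Both arguments are dicts mapping parameter names to numpy arrays (a
    stand-in for a framework state dict). Every source parameter whose name
    and shape match the target is copied over, transferring the source
    knowledge; the remaining target parameters keep their random init and
    everything is subsequently fine-tuned on the (limited) target data.
    Returns the names of the copied parameters.
    """
    copied = []
    for name, weights in source.items():
        if name in target and target[name].shape == weights.shape:
            target[name] = weights.copy()
            copied.append(name)
    return copied
```

For example, a generator's convolutional weights transfer directly, while a final layer whose shape differs (say, a different output resolution) stays randomly initialized.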


2 Related Work

Transfer learning/domain transfer: Learning how to transfer knowledge from a source domain to a target domain is a well studied problem in computer vision [11]. In the deep learning era, complex knowledge is extracted during the training stage on large datasets [12,13]. Domain adaptation by means of fine-tuning a pre-trained network has become the default approach for many applications with limited training data or slow convergence [14,9]. Several works have investigated transferring knowledge to unsupervised or sparsely labeled domains. Tzeng et al. [15] optimized for domain invariance, while transferring task information that is present in the correlation between the classes of the source domain. Ganin et al. [16] proposed to learn domain invariant features by means of a gradient reversal layer. A network simultaneously trained on these invariant features can be transferred to the target domain. Finally, domain transfer has also been studied for networks that learn metrics [17]. In contrast to these methods, we do not focus on transferring discriminative features, but on transferring knowledge for image generation.
GAN: Goodfellow et al. [1] introduced the first GAN model for image generation. Their architecture uses a series of fully connected layers and thus is limited to simple datasets. When approaching the generation of real images of higher complexity, convolutional architectures have shown to be a more suitable option. Shortly afterwards, Deep Convolutional GANs (DC-GAN) quickly became the standard GAN architecture for image generation problems [10]. In DC-GAN, the generator sequentially up-samples the input features by using fractionally-strided convolutions, whereas the discriminator uses normal convolutions to classify the input images. Recent multi-scale architectures [18,19,20] can effectively generate high resolution images. It was also found that ensembles can be used to improve the quality of the generated distribution [21].

Independently of the type of architecture used, GANs present multiple challenges regarding their training, such as convergence properties, stability issues, or mode collapse. Arjovsky et al. [22] showed that the original GAN loss [1] is unable to properly deal with ill-suited distributions such as those with disjoint supports, often found during GAN training. Addressing these limitations, the Wasserstein GAN [23] uses the Wasserstein distance as a robust loss, yet requiring the critic (discriminator) to be 1-Lipschitz. This constraint is originally enforced by clipping the weights. Alternatively, an even more stable solution is adding a gradient penalty term to the loss (known as WGAN-GP) [24].

cGAN: Conditional GANs (cGANs) [2] are a class of GANs that use a particular attribute as a prior to build conditional generative models. Examples of conditions are class labels [25,26,27], text [28,29], another image (image translation [30,31]), and style transfer [32].
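The fractionally-strided (transposed) convolutions that DC-GAN's generator uses to up-sample can be understood as zero-insertion followed by an ordinary convolution. The following 1-D numpy sketch shows that equivalence; it is illustrative only, as real DC-GAN layers are 2-D with learned kernels and batch normalization.

```python
import numpy as np

def fractionally_strided_conv(x, kernel, stride=2):
    """Up-sample a 1-D signal with a fractionally-strided (transposed) convolution.

    Equivalent to inserting (stride - 1) zeros between input samples and then
    running an ordinary convolution; this is the operation a DC-GAN generator
    stacks to grow a latent code into a full-resolution image, reduced here to
    1-D for clarity.
    """
    # Zero-insertion: each input sample is followed by (stride - 1) zeros.
    upsampled = np.zeros(len(x) * stride)
    upsampled[::stride] = x
    # 'full' convolution so every kernel tap contributes to the output.
    return np.convolve(upsampled, kernel, mode="full")
```

With stride 2, a length-n input yields an output of length 2n + len(kernel) - 1, which is why a stack of such layers doubles the spatial resolution at each step.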
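The WGAN-GP penalty term mentioned above can be written out directly: it penalizes the squared deviation of the critic's input-gradient norm from 1, evaluated at random interpolates between real and fake samples [24]. A numpy sketch, under the assumption that the critic's input gradient is supplied by a callable (a deep learning framework would obtain it by automatic differentiation); `lam` is the penalty weight, 10 in [24].

```python
import numpy as np

def gradient_penalty(critic_grad_fn, real, fake, lam=10.0, seed=None):
    """WGAN-GP penalty: lam * mean((||grad_xhat D(xhat)||_2 - 1)^2), where each
    interpolate xhat is sampled uniformly on the line between a real and a
    fake sample.

    critic_grad_fn(x) returns the critic's gradient w.r.t. its input, one row
    per sample; real and fake are (batch, dim) arrays.
    """
    rng = np.random.default_rng(seed)
    eps = rng.uniform(size=(real.shape[0], 1))   # one mixing weight per sample
    interpolates = eps * real + (1 - eps) * fake
    grads = critic_grad_fn(interpolates)         # shape: (batch, dim)
    norms = np.linalg.norm(grads, axis=1)
    return lam * np.mean((norms - 1.0) ** 2)
```

For a linear critic D(x) = w.x the input gradient is w everywhere, so the penalty reduces to lam * (||w|| - 1)^2 regardless of the interpolation points, which makes the function easy to sanity-check.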
