Dive into PyTorch - Check-in 3

Generative Adversarial Networks

Throughout most of this book, we have talked about how to make predictions. In some form or another, we used deep neural networks to learn mappings from data points to labels. This kind of learning is called discriminative learning, as in, we'd like to be able to discriminate between photos of cats and photos of dogs. Classifiers and regressors are both examples of discriminative learning. And neural networks trained by backpropagation have upended everything we thought we knew about discriminative learning on large, complicated datasets. Classification accuracy on high-resolution images has gone from useless to human-level (with some caveats) in just 5-6 years. We will spare you another spiel about all the other discriminative tasks where deep neural networks do astoundingly well.

But there is more to machine learning than just solving discriminative tasks. For example, given a large dataset, without any labels, we might want to learn a model that concisely captures the characteristics of this data. Given such a model, we could sample synthetic data points that resemble the distribution of the training data. For example, given a large corpus of photographs of faces, we might want to be able to generate a new photorealistic image that looks like it might plausibly have come from the same dataset. This kind of learning is called generative modeling.

Until recently, we had no method that could synthesize novel photorealistic images. But the success of deep neural networks for discriminative learning opened up new possibilities. One big trend over the last three years has been the application of discriminative deep nets to overcome challenges in problems that we do not generally think of as supervised learning problems. Recurrent neural network language models are one example: a discriminative network (trained to predict the next character) that, once trained, can act as a generative model.

In 2014, a breakthrough paper introduced Generative Adversarial Networks (GANs) (Goodfellow et al., 2014), a clever new way to leverage the power of discriminative models to get good generative models. At their heart, GANs rely on the idea that a data generator is good if we cannot tell fake data apart from real data. In statistics, this is called a two-sample test - a test to answer the question whether the datasets $X=\{x_1,\ldots, x_n\}$ and $X'=\{x'_1,\ldots, x'_n\}$ were drawn from the same distribution. The main difference between most statistics papers and GANs is that the latter use this idea in a constructive way. In other words, rather than just training a model to say "hey, these two datasets do not look like they came from the same distribution", they use the two-sample test to provide training signals to a generative model. This allows us to improve the data generator until it generates something that resembles the real data. At the very least, it needs to fool the classifier, even if our classifier is a state-of-the-art deep neural network.

[Figure: The GAN architecture]

The GAN architecture is illustrated above. As you can see, there are two pieces in the GAN architecture - first off, we need a device (say, a deep network, but it really could be anything, such as a game rendering engine) that might potentially be able to generate data that looks just like the real thing. If we are dealing with images, this needs to generate images. If we are dealing with speech, it needs to generate audio sequences, and so on. We call this the generator network. The second component is the discriminator network. It attempts to distinguish fake and real data from each other. Both networks are in competition with each other. The generator network attempts to fool the discriminator network. At that point, the discriminator network adapts to the new fake data. This information, in turn, is used to improve the generator network, and so on.

The discriminator is a binary classifier to distinguish if the input $\mathbf x$ is real (from real data) or fake (from the generator). Typically, the discriminator outputs a scalar prediction $o\in\mathbb R$ for input $\mathbf x$, such as using a dense layer with hidden size 1, and then applies the sigmoid function to obtain the predicted probability $D(\mathbf x) = 1/(1+e^{-o})$. Assume the label $y$ for the true data is $1$ and $0$ for the fake data. We train the discriminator to minimize the cross-entropy loss, i.e.,

$$\min_D \{ - y \log D(\mathbf x) - (1-y)\log(1-D(\mathbf x)) \},$$
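
As a concrete sketch, the discriminator and this loss can be written in a few lines of PyTorch. The architecture and input dimension below are illustrative assumptions (a tiny MLP on 2-dimensional toy data), not anything prescribed by the text:

```python
import torch
from torch import nn

# A minimal discriminator: maps a 2-D input to a single scalar logit o.
# BCEWithLogitsLoss applies the sigmoid D(x) = 1/(1 + e^{-o}) internally
# and computes the cross-entropy -y*log D(x) - (1-y)*log(1 - D(x)).
net_D = nn.Sequential(
    nn.Linear(2, 5), nn.Tanh(),
    nn.Linear(5, 3), nn.Tanh(),
    nn.Linear(3, 1))  # dense layer with hidden size 1: the scalar o

loss = nn.BCEWithLogitsLoss()

X = torch.randn(8, 2)       # a batch of data points (assumed 2-D toy data)
y = torch.ones(8, 1)        # label 1 for real data, 0 for fake data
d_loss = loss(net_D(X), y)  # the discriminator's cross-entropy loss
```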

For the generator, it first draws some parameter $\mathbf z\in\mathbb R^d$ from a source of randomness, e.g., a normal distribution $\mathbf z \sim \mathcal{N}(0, 1)$. We often call $\mathbf z$ the latent variable. It then applies a function to generate $\mathbf x'=G(\mathbf z)$. The goal of the generator is to fool the discriminator into classifying $\mathbf x'=G(\mathbf z)$ as true data, i.e., we want $D(G(\mathbf z)) \approx 1$. In other words, for a given discriminator $D$, we update the parameters of the generator $G$ to maximize the cross-entropy loss when $y=0$, i.e.,

$$\max_G \{ - (1-y) \log(1-D(G(\mathbf z))) \} = \max_G \{ - \log(1-D(G(\mathbf z))) \}.$$
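
Continuing the sketch above, a matching toy generator is just another small network that maps latent noise to the data space. A single linear layer and a latent dimension of 2 are assumptions for this toy setting:

```python
# A minimal generator: maps latent noise z in R^2 to the 2-D data space.
net_G = nn.Sequential(nn.Linear(2, 2))

Z = torch.randn(8, 2)  # z ~ N(0, 1), one latent vector per sample
X_fake = net_G(Z)      # x' = G(z), candidate fake data
```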

If the discriminator does a perfect job, then $D(\mathbf x')\approx 0$, so the above loss is near 0, and the resulting gradients are too small for the generator to make good progress. So commonly we minimize the following loss instead:

$$\min_G \{ - y \log(D(G(\mathbf z))) \} = \min_G \{ - \log(D(G(\mathbf z))) \},$$

which is just feeding $\mathbf x'=G(\mathbf z)$ into the discriminator but giving it the label $y=1$.
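
In code, this label flip is a one-liner: reuse the discriminator's loss on fake data, but with "real" labels. A sketch continuing the example above:

```python
# Generator update trick: feed fake data to D, but label it as real (y = 1).
ones = torch.ones(8, 1)
g_loss = loss(net_D(net_G(Z)), ones)  # equals -log D(G(z)) on average
g_loss.backward()  # gradients flow through D and reach G's parameters
```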

To sum up, D D D and G G G are playing a “minimax” game with the comprehensive objective function:

$$\min_D \max_G \{ -E_{x \sim \text{Data}} \log D(\mathbf x) - E_{z \sim \text{Noise}} \log(1 - D(G(\mathbf z))) \}.$$
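
Putting the pieces together, one alternating step of this game might look like the following sketch, reusing net_D, net_G, and loss from above. The optimizers, learning rates, and batch size are assumptions for the toy setting, not values given in the text:

```python
# Hypothetical training step: alternate a discriminator update and a
# generator update, as in the minimax game described above.
opt_D = torch.optim.Adam(net_D.parameters(), lr=0.05)
opt_G = torch.optim.Adam(net_G.parameters(), lr=0.005)

def train_step(X_real):
    batch = X_real.shape[0]
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)
    Z = torch.randn(batch, 2)

    # (1) Update D: real data is labeled 1, fake data is labeled 0.
    #     detach() keeps this step from updating the generator.
    opt_D.zero_grad()
    d_loss = (loss(net_D(X_real), ones) +
              loss(net_D(net_G(Z).detach()), zeros))
    d_loss.backward()
    opt_D.step()

    # (2) Update G: label the fake data as real (y = 1) to fool D.
    opt_G.zero_grad()
    g_loss = loss(net_D(net_G(Z)), ones)
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```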

