Dive into Deep Learning, task10_2: Generative Adversarial Networks (GAN)

To study Dive into Deep Learning systematically, follow the link below, which has the full table of contents.
https://blog.csdn.net/Shine_rise/article/details/104754764

Generative Adversarial Networks

Throughout most of this book, we have talked about how to make predictions. In some form or another, we used deep neural networks to learn mappings from data points to labels. This kind of learning is called discriminative learning, as in, we’d like to be able to discriminate between photos of cats and photos of dogs. Classifiers and regressors are both examples of discriminative learning. And neural networks trained by backpropagation have upended everything we thought we knew about discriminative learning on large complicated datasets. Classification accuracies on high-res images have gone from useless to human-level (with some caveats) in just 5-6 years. We will spare you another spiel about all the other discriminative tasks where deep neural networks do astoundingly well.

But there is more to machine learning than just solving discriminative tasks. For example, given a large dataset, without any labels, we might want to learn a model that concisely captures the characteristics of this data. Given such a model, we could sample synthetic data points that resemble the distribution of the training data. For example, given a large corpus of photographs of faces, we might want to be able to generate a new photorealistic image that looks like it might plausibly have come from the same dataset. This kind of learning is called generative modeling.

Until recently, we had no method that could synthesize novel photorealistic images. But the success of deep neural networks for discriminative learning opened up new possibilities. One big trend over the last three years has been the application of discriminative deep nets to overcome challenges in problems that we do not generally think of as supervised learning problems. The recurrent neural network language models are one example of using a discriminative network (trained to predict the next character) that once trained can act as a generative model.

In 2014, a breakthrough paper introduced generative adversarial networks (GANs) (Goodfellow, Pouget-Abadie, Mirza, et al., 2014), a clever new way to leverage the power of discriminative models to get good generative models. At their heart, GANs rely on the idea that a data generator is good if we cannot tell fake data apart from real data. In statistics, this is called a two-sample test: a test to answer the question of whether datasets $X=\{x_1,\ldots, x_n\}$ and $X'=\{x'_1,\ldots, x'_n\}$ were drawn from the same distribution. The main difference between most statistics papers and GANs is that the latter use this idea in a constructive way. In other words, rather than just training a model to say “hey, these two datasets do not look like they came from the same distribution”, they use the two-sample test to provide training signals to a generative model. This allows us to improve the data generator until it generates something that resembles the real data. At the very least, it needs to fool the classifier, even if our classifier is a state-of-the-art deep neural network.

[Figure: the GAN architecture]

The figure illustrates the GAN architecture. As you can see, there are two pieces in the GAN architecture. First off, we need a device (say, a deep network, but it really could be anything, such as a game rendering engine) that might potentially be able to generate data that looks just like the real thing. If we are dealing with images, it needs to generate images. If we are dealing with speech, it needs to generate audio sequences, and so on. We call this the generator network. The second component is the discriminator network. It attempts to distinguish fake and real data from each other. Both networks are in competition with each other. The generator network attempts to fool the discriminator network. At that point, the discriminator network adapts to the new fake data. This information, in turn, is used to improve the generator network, and so on.

The discriminator is a binary classifier that distinguishes whether the input $\mathbf x$ is real (from the real data) or fake (from the generator). Typically, the discriminator outputs a scalar prediction $o\in\mathbb R$ for input $\mathbf x$, for example by using a dense layer with hidden size 1, and then applies the sigmoid function to obtain the predicted probability $D(\mathbf x) = 1/(1+e^{-o})$. Assume the label $y$ is $1$ for the true data and $0$ for the fake data. We train the discriminator to minimize the cross-entropy loss, i.e.,
$$\min_D \{ - y \log D(\mathbf x) - (1-y)\log(1-D(\mathbf x)) \}.$$
The generator first draws a parameter $\mathbf z\in\mathbb R^d$ from a source of randomness, e.g., a normal distribution $\mathbf z \sim \mathcal{N} (0, 1)$. We often call $\mathbf z$ the latent variable. It then applies a function to generate $\mathbf x'=G(\mathbf z)$. The goal of the generator is to fool the discriminator into classifying $\mathbf x'=G(\mathbf z)$ as true data, i.e., we want $D( G(\mathbf z)) \approx 1$. In other words, for a given discriminator $D$, we update the parameters of the generator $G$ to maximize the cross-entropy loss when $y=0$, i.e.,
$$\max_G \{ - (1-y) \log(1-D(G(\mathbf z))) \} = \max_G \{ - \log(1-D(G(\mathbf z))) \}.$$
If the discriminator does a perfect job, then $D(\mathbf x')\approx 0$, so the above loss is near $0$ and the gradients are too small for the generator to make good progress. So commonly we minimize the following loss instead:
$$\min_G \{ - y \log(D(G(\mathbf z))) \} = \min_G \{ - \log(D(G(\mathbf z))) \},$$
which just feeds $\mathbf x'=G(\mathbf z)$ into the discriminator but gives it the label $y=1$.
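
To see why the second loss is preferred, it can help to compare the gradients of the two generator losses with respect to the discriminator's logit $o$ when the discriminator confidently rejects a fake sample. The following is a minimal sketch in plain Python; the function names and the chosen logit value are illustrative assumptions and are not part of the original code.

```python
import math

def sigmoid(o):
    """Logistic function: maps a logit o to a probability D in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-o))

def saturating_loss(o):
    """-log(1 - D): the quantity the generator would maximize."""
    return -math.log(1.0 - sigmoid(o))

def non_saturating_loss(o):
    """-log(D): the quantity the generator minimizes in practice."""
    return -math.log(sigmoid(o))

def numerical_grad(f, o, eps=1e-6):
    """Central finite-difference estimate of df/do."""
    return (f(o + eps) - f(o - eps)) / (2 * eps)

# Hypothetical logit for a fake sample that the discriminator rejects
o = -6.0
grad_sat = numerical_grad(saturating_loss, o)
grad_non_sat = numerical_grad(non_saturating_loss, o)

print(f"D(x') = {sigmoid(o):.4f}")
print(f"gradient of -log(1 - D) w.r.t. o: {grad_sat:.4f}")
print(f"gradient of -log(D)     w.r.t. o: {grad_non_sat:.4f}")
```

Analytically, the first gradient equals $D(\mathbf x')$ and so vanishes when the discriminator wins, while the second equals $-(1-D(\mathbf x'))$ and keeps a magnitude close to 1.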

To sum up, $D$ and $G$ are playing a “minimax” game with the comprehensive objective function:
$$\min_D \max_G \{ -E_{x \sim \text{Data}} \log D(\mathbf x) - E_{z \sim \text{Noise}} \log(1 - D(G(\mathbf z))) \}.$$
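
For a fixed generator, a standard result from the original GAN paper is that the discriminator minimizing this objective is $D^*(x) = p_{\text{Data}}(x) / (p_{\text{Data}}(x) + p_G(x))$, where $p_G$ is the density of the generated samples. The sketch below evaluates it in one dimension, assuming Gaussian densities whose means are hypothetical values picked only for illustration.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def optimal_discriminator(x, data_mean, gen_mean, std=1.0):
    """D*(x) = p_data(x) / (p_data(x) + p_gen(x)) for two Gaussians."""
    p_data = gaussian_pdf(x, data_mean, std)
    p_gen = gaussian_pdf(x, gen_mean, std)
    return p_data / (p_data + p_gen)

# Generator far from the data: the best discriminator is confident at x = 0
d_mismatch = optimal_discriminator(0.0, data_mean=0.0, gen_mean=3.0)

# Generator matches the data: the best discriminator can only guess
d_match = optimal_discriminator(0.0, data_mean=0.0, gen_mean=0.0)

# Value of the objective when D outputs 1/2 everywhere
objective_at_equilibrium = -math.log(d_match) - math.log(1.0 - d_match)

print(d_mismatch, d_match, objective_at_equilibrium)
```

When the two distributions coincide, $D^*$ is $1/2$ everywhere and the objective equals $2\log 2$.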

Many GAN applications are in the context of images. For demonstration purposes, we are going to content ourselves with fitting a much simpler distribution first. We will illustrate what happens if we use GANs to build the world’s most inefficient estimator of parameters for a Gaussian. Let’s get started.

%matplotlib inline
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader
from torch import nn
import numpy as np
from torch.autograd import Variable
import torch

Generate some “real” data

Since this is going to be the world’s lamest example, we simply generate data drawn from a Gaussian.

X=np.random.normal(size=(1000,2))
A=np.array([[1,2],[-0.1,0.5]])
b=np.array([1,2])
data=X.dot(A)+b

Let’s see what we got. This should be a Gaussian shifted in some rather arbitrary way with mean $b$ and covariance matrix $A^TA$.

plt.figure(figsize=(3.5,2.5))
plt.scatter(X[:100,0],X[:100,1],color='red')
plt.show()
plt.figure(figsize=(3.5,2.5))
plt.scatter(data[:100,0],data[:100,1],color='blue')
plt.show()
print("The covariance matrix is\n%s" % np.dot(A.T, A))
The covariance matrix is
[[1.01 1.95]
 [1.95 4.25]]
batch_size=8
data_iter=DataLoader(data,batch_size=batch_size)
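
As a sanity check on the claim that the data has mean $b$ and covariance $A^TA$, one can compare these against empirical estimates. This is a small sketch that assumes a larger sample than the 1000 points used above and an arbitrarily chosen seed, so that the estimates are tight and reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
A = np.array([[1, 2], [-0.1, 0.5]])
b = np.array([1, 2])

# Same construction as in the text, but with many more samples
X = rng.normal(size=(100_000, 2))
data = X.dot(A) + b

# Rows are samples, so columns are the variables (rowvar=False)
empirical_mean = data.mean(axis=0)
empirical_cov = np.cov(data, rowvar=False)

print("empirical mean:", empirical_mean)
print("empirical covariance:\n", empirical_cov)
print("A^T A:\n", A.T @ A)
```

The covariance is $A^TA$ rather than $AA^T$ because each sample is a row vector multiplied by $A$ on the right.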

Generator

Our generator network will be the simplest network possible: a single-layer linear model. This is because we will be driving that linear network with a Gaussian data generator. Hence, it literally only needs to learn the parameters to fake things perfectly.

class net_G(nn.Module):
    def __init__(self):
        super(net_G,self).__init__()
        self.model=nn.Sequential(
            nn.Linear(2,2),
        )
        self._initialize_weights()
    def forward(self,x):
        x=self.model(x)
        return x
    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m,nn.Linear):
                m.weight.data.normal_(0,0.02)
                m.bias.data.zero_()
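
It may help to spell out what "learning the parameters" means here. A linear layer computes $\mathbf x' = \mathbf z W^T + \mathbf c$, so for $\mathbf z \sim \mathcal N(0, I)$ the output has mean $\mathbf c$ and covariance $WW^T$. A perfect generator therefore needs $\mathbf c = b$ and $WW^T = A^TA$. The NumPy sketch below checks one such solution and shows that it is not unique; the symbols `W`, `Q`, and the angle `theta` are introduced only for this illustration.

```python
import numpy as np

A = np.array([[1, 2], [-0.1, 0.5]])
target_cov = A.T @ A

# One exact solution: W = A^T, since then W W^T = A^T A
W = A.T

# Any rotation Q gives another one: (W Q)(W Q)^T = W Q Q^T W^T = W W^T
theta = 0.7  # arbitrary angle, for illustration only
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
W_rotated = W @ Q

print(np.allclose(W @ W.T, target_cov))
print(np.allclose(W_rotated @ W_rotated.T, target_cov))
print(np.allclose(W, W_rotated))
```

Both products match $A^TA$ even though the two weight matrices differ, so a trained generator can reproduce the data distribution without its weights being equal to $A^T$.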

Discriminator

For the discriminator we will be a bit more discriminating: we will use an MLP with 3 layers to make things a bit more interesting.

class net_D(nn.Module):
    def __init__(self):
        super(net_D,self).__init__()
        self.model=nn.Sequential(
            nn.Linear(2,5),
            nn.Tanh(),
            nn.Linear(5,3),
            nn.Tanh(),
            nn.Linear(3,1),
            nn.Sigmoid()
        )
        self._initialize_weights()
    def forward(self,x):
        x=self.model(x)
        return x
    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m,nn.Linear):
                m.weight.data.normal_(0,0.02)
                m.bias.data.zero_()

Training

First we define a function to update the discriminator.

# Saved in the d2l package for later use
def update_D(X,Z,net_D,net_G,loss,trainer_D):
    """Update the discriminator on one batch and return its loss."""
    batch_size=X.shape[0]
    # Targets: 1 for real samples, 0 for generated samples
    ones=torch.ones(batch_size,1)
    zeros=torch.zeros(batch_size,1)
    real_Y=net_D(X.float())
    # Detach so that this step does not backpropagate into net_G
    fake_X=net_G(Z).detach()
    fake_Y=net_D(fake_X)
    loss_D=(loss(real_Y,ones)+loss(fake_Y,zeros))/2
    loss_D.backward()
    trainer_D.step()
    return float(loss_D)

The generator is updated similarly. Here we reuse the cross-entropy loss but change the label of the fake data from $0$ to $1$.

# Saved in the d2l package for later use
def update_G(Z,net_D,net_G,loss,trainer_G):
    """Update the generator on one batch and return its loss."""
    batch_size=Z.shape[0]
    # Label the generated samples as real (1) so that the loss is -log D(G(z))
    ones=torch.ones(batch_size,1)
    fake_X=net_G(Z)
    fake_Y=net_D(fake_X)
    loss_G=loss(fake_Y,ones)
    loss_G.backward()
    trainer_G.step()
    return float(loss_G)

Both the discriminator and the generator perform a binary logistic regression with the cross-entropy loss. We use Adam to smooth the training process. In each iteration, we first update the discriminator and then the generator. We visualize both losses and generated examples.

def train(net_D,net_G,data_iter,num_epochs,lr_D,lr_G,latent_dim,data):
    loss=nn.BCELoss()
    Tensor=torch.FloatTensor
    trainer_D=torch.optim.Adam(net_D.parameters(),lr=lr_D)
    trainer_G=torch.optim.Adam(net_G.parameters(),lr=lr_G)
    plt.figure(figsize=(7,4))
    d_loss_point=[]
    g_loss_point=[]
    d_loss=0
    g_loss=0
    for epoch in range(1,num_epochs+1):
        d_loss_sum=0
        g_loss_sum=0
        batch=0
        for X in data_iter:
            batch+=1
            X=Variable(X)
            batch_size=X.shape[0]
            Z=Variable(Tensor(np.random.normal(0,1,(batch_size,latent_dim))))
            trainer_D.zero_grad()
            d_loss = update_D(X, Z, net_D, net_G, loss, trainer_D)
            d_loss_sum+=d_loss
            trainer_G.zero_grad()
            g_loss = update_G(Z, net_D, net_G, loss, trainer_G)
            g_loss_sum+=g_loss
        d_loss_point.append(d_loss_sum/batch)
        g_loss_point.append(g_loss_sum/batch)
    plt.ylabel('Loss', fontdict={'size': 14})
    plt.xlabel('epoch', fontdict={'size': 14})
    plt.xticks(range(0,num_epochs+1,3))
    plt.plot(range(1,num_epochs+1),d_loss_point,color='orange',label='discriminator')
    plt.plot(range(1,num_epochs+1),g_loss_point,color='blue',label='generator')
    plt.legend()
    plt.show()
    print(d_loss,g_loss)
    
    Z =Variable(Tensor( np.random.normal(0, 1, size=(100, latent_dim))))
    fake_X=net_G(Z).detach().numpy()
    plt.figure(figsize=(3.5,2.5))
    plt.scatter(data[:,0],data[:,1],color='blue',label='real')
    plt.scatter(fake_X[:,0],fake_X[:,1],color='orange',label='generated')
    plt.legend()
    plt.show()

Now we specify the hyper-parameters to fit the Gaussian distribution.

if __name__ == '__main__':
    lr_D,lr_G,latent_dim,num_epochs=0.05,0.005,2,20
    generator=net_G()
    discriminator=net_D()
    train(discriminator,generator,data_iter,num_epochs,lr_D,lr_G,latent_dim,data)
0.6932446360588074 0.6927103996276855
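
Both printed losses are close to $\log 2 \approx 0.6931$. This is the value the binary cross-entropy takes when the discriminator outputs $0.5$ for every input, that is, when it can no longer tell real from generated samples. The plain-Python sketch below reproduces that number; the helper `bce` is a hypothetical stand-in written for this check and is not the `nn.BCELoss` used in training.

```python
import math

def bce(prediction, label):
    """Binary cross-entropy for one prediction in (0, 1) and a label in {0, 1}."""
    return -(label * math.log(prediction) + (1 - label) * math.log(1 - prediction))

# A discriminator that cannot tell real from fake outputs 0.5 everywhere
d_out = 0.5
loss_D = (bce(d_out, 1) + bce(d_out, 0)) / 2  # same averaging as in update_D
loss_G = bce(d_out, 1)                        # same labeling as in update_G

print(loss_D, loss_G, math.log(2))
```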

Summary

  • Generative adversarial networks (GANs) are composed of two deep networks, the generator and the discriminator.
  • The generator generates images as close to the true images as possible to fool the discriminator, by minimizing the cross-entropy loss with the generated data labeled as real, i.e., $\min -\log(D(\mathbf{x'}))$, which is equivalent to $\max \log(D(\mathbf{x'}))$.
  • The discriminator tries to distinguish the generated images from the true images, by minimizing the cross-entropy loss, i.e., $\min - y \log D(\mathbf{x}) - (1-y)\log(1-D(\mathbf{x}))$.

Exercises

  • Does an equilibrium exist where the generator wins, i.e. the discriminator ends up unable to distinguish the two distributions on finite samples?