
GAN-MNIST-Pytorch

In this blog post we’ll implement a generative image model that converts random noise into images of digits! The full code is available here; just clone it to your machine and it’s ready to run. As a former Torch7 user, I attempt to reproduce the results from the Torch7 post.

For this, we employ a Generative Adversarial Network. A GAN consists of two components: a generator, which converts random noise into images, and a discriminator, which tries to distinguish between generated and real images. Here, ‘real’ means that the image came from our training set of images, in contrast to the generated fakes.

To train the model we let the discriminator and generator play a game against each other. We first show the discriminator a mixed batch of real images from our training set and fake images generated by the generator. We then simultaneously optimize the discriminator to answer NO to fake images and YES to real images, and optimize the generator to fool the discriminator into believing that the fake images were real. This corresponds to minimizing the classification error with respect to the discriminator and maximizing it with respect to the generator. With careful optimization both generator and discriminator will improve, and the generator will eventually start generating convincing images.
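The game described above can be sketched as a single training step (a minimal sketch, assuming `D`, `G`, their optimizers, a flattened batch `real_images`, and `latent_size` are defined; the binary-cross-entropy formulation used here is one standard choice):

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()

def train_step(D, G, d_optimizer, g_optimizer, real_images, latent_size):
    batch_size = real_images.size(0)
    real_labels = torch.ones(batch_size, 1)   # "YES" targets for real images
    fake_labels = torch.zeros(batch_size, 1)  # "NO" targets for fakes

    # --- Discriminator: answer YES to real images, NO to fakes ---
    d_loss_real = criterion(D(real_images), real_labels)
    z = torch.randn(batch_size, latent_size)
    fake_images = G(z)
    # detach() so the discriminator update does not backprop into G
    d_loss_fake = criterion(D(fake_images.detach()), fake_labels)
    d_loss = d_loss_real + d_loss_fake
    d_optimizer.zero_grad()
    d_loss.backward()
    d_optimizer.step()

    # --- Generator: fool the discriminator into answering YES ---
    g_loss = criterion(D(fake_images), real_labels)
    g_optimizer.zero_grad()
    g_loss.backward()
    g_optimizer.step()

    return d_loss.item(), g_loss.item()
```

Note that the generator is trained to maximize log D(G(z)) rather than minimize log(1 − D(G(z))); this non-saturating variant gives stronger gradients early in training.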

Implementing a GAN

We implement the generator and discriminator as multilayer perceptrons (MLPs) and train them with stochastic gradient descent.

The discriminator is an MLP built from consecutive blocks of a linear layer followed by a LeakyReLU activation.

import torch.nn as nn

# Example hyperparameters (the exact values are not specified in the post)
image_size = 784    # 28x28 flattened MNIST image
hidden_size = 256
latent_size = 64

D = nn.Sequential(
    nn.Linear(image_size, hidden_size),
    nn.LeakyReLU(0.2),
    nn.Linear(hidden_size, hidden_size),
    nn.LeakyReLU(0.2),
    nn.Linear(hidden_size, 1),
    nn.Sigmoid())

This is a pretty standard architecture. The 28x28 grayscale images of digits are converted into 784x1 vectors by stacking their columns. The discriminator takes a 784x1 vector as input and predicts YES or NO with a single sigmoid output.
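As a quick illustration of the flattening step (variable names here are illustrative, not from the original code):

```python
import torch

images = torch.rand(64, 1, 28, 28)       # a batch as it comes from the MNIST loader
flat = images.view(images.size(0), -1)   # flatten each image -> shape (64, 784)
```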

The generator is an MLP that maps a latent vector through repeated blocks of a linear layer and ReLU activation:

G = nn.Sequential(
    nn.Linear(latent_size, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, image_size),
    nn.Tanh())

To generate an image, we feed the generator noise drawn from N(0, 1). After successful training, the output should be meaningful images!

z = torch.randn(batch_size, latent_size).cuda()  # sample noise on the GPU
fake_images = G(z)  # G must also be on the GPU, e.g. G = G.cuda()
# (torch.autograd.Variable is no longer needed since PyTorch 0.4;
# plain tensors track gradients directly)

Generating digits

We train our GAN using images of digits from the MNIST dataset. After around 5 epochs you should start to see blurry digits, and after 80 epochs the results look pleasant.
(figure: generated samples after 5 epochs)

(figure: generated samples after 100 epochs)

Loss and Discriminator Accuracy

(figures: discriminator accuracy per epoch; discriminator and generator loss per epoch)

We also record the accuracy of the discriminator and the losses of the discriminator and the generator after each epoch. According to the original GAN paper, when the global minimum of the training criterion is achieved, the loss equals -log 4 and the discriminator accuracy is 0.5. Our model does not achieve this ideal result. Moreover, as we can see from the figures, our model starts to overfit at around 80 epochs. The model architecture and the training strategy should be improved in the future.
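One simple way to measure discriminator accuracy on a mixed batch (a sketch; the original post does not show its exact bookkeeping code) is to threshold the sigmoid output at 0.5:

```python
import torch

def discriminator_accuracy(D, real_images, fake_images):
    # Real images should score >= 0.5 (YES), fakes should score < 0.5 (NO).
    real_correct = (D(real_images) >= 0.5).float().sum()
    fake_correct = (D(fake_images) < 0.5).float().sum()
    total = real_images.size(0) + fake_images.size(0)
    return ((real_correct + fake_correct) / total).item()
```

At the theoretical optimum the discriminator cannot tell real from fake, so this value should hover around 0.5.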
