I Tried (and Failed) to Use GANs to Create Art, but It Was Still Worth It


This work borrows heavily from the PyTorch DCGAN Tutorial and the NVIDIA paper on progressive GANs.

One area of computer vision I've been wanting to explore is GANs. So when my wife and I moved into a home that had some extra wall space, I realized I could create a network to make some wall art and avoid a trip to Bed Bath & Beyond (two birds with one code!).

What are GANs?

GANs (Generative Adversarial Networks) work using two synergistic neural networks: one that creates forgery images (the generator), and another neural net that takes in the forgery images along with real examples of art and attempts to classify them as either real or fake (the discriminator). The networks then iterate, the generator getting better at making fakes and the discriminator getting better at detecting them. At the end of the process, you hopefully have a generator that can randomly create authentic-looking art. This method can be applied to generate more than images. In her book You Look Like a Thing and I Love You, Janelle Shane discusses using GANs to make everything from cookie recipes to pick-up lines (which is where the book gets its name).
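To make the adversarial loop concrete, here is a minimal sketch of a single training step in PyTorch. It follows the standard DCGAN recipe rather than my exact code; the generator, discriminator, their optimizers, and the noise size are assumed to already exist (the full model classes appear later in this article).

import torch
import torch.nn as nn

criterion = nn.BCELoss()

def train_step(real_batch, generator, discriminator, opt_g, opt_d, noise_size):
    batch_size = real_batch.size(0)
    real_labels = torch.ones(batch_size)
    fake_labels = torch.zeros(batch_size)

    # 1) Train the discriminator: push D(real) toward 1 and D(fake) toward 0
    opt_d.zero_grad()
    loss_real = criterion(discriminator(real_batch).view(-1), real_labels)
    noise = torch.randn(batch_size, noise_size, 1, 1)
    fakes = generator(noise)
    loss_fake = criterion(discriminator(fakes.detach()).view(-1), fake_labels)
    (loss_real + loss_fake).backward()
    opt_d.step()

    # 2) Train the generator: push D(fake) toward 1, i.e. fool the discriminator
    opt_g.zero_grad()
    loss_g = criterion(discriminator(fakes).view(-1), real_labels)
    loss_g.backward()
    opt_g.step()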

If you don't know what GANs are, I suggest reading this PyTorch article for an in-depth explanation.

Challenges

Creating a GANs model that generates satisfactory results comes with several difficulties that I'll need to address in this project.

Data. Like all neural networks, you'll need a lot of data; however, GANs appear to have an even more voracious appetite. Most GAN projects I've read about have leveraged tens or hundreds of thousands of images. In contrast, my dataset is only a few thousand images that I was able to pull from a Google image search. In terms of style, I'd love to end up with something that resembles a Rothko, but I'll settle for generic Bed Bath & Beyond.

Training time. In NVIDIA's paper on progressive GANs, they trained their network for days using multiple GPUs. In my case I'll be using Google Colab and hoping the free-tier hardware will be good enough.

Mode Collapse. Besides being the name of my new dubstep project, mode collapse is what happens when the variety of the generated images begins to converge. Essentially, the generator sees that a few images are doing well at fooling the discriminator and decides to make all of its output look like those few images.

Image Resolution. The larger the desired image, the larger the needed network. So how high of a resolution will I need? Well, the recommended number of pixels per inch for digital prints is 300, so if I want something I can hang in a 12x15" frame I'll need a final resolution of 3,600 x 4,500 pixels, over 16 million in total! I obviously won't be able to build a model at that high of a resolution, but for this experiment I'll say that's the goal and see where I end up. To help with this, I'll also be using a progressive GANs approach. This was pioneered by NVIDIA: they first trained a model at a low resolution and then progressively added the extra layers needed to increase the image resolution. You can think of it as wading into the pool instead of diving directly into the deep end. In their paper they were able to generate celebrity images at a resolution of 1024 x 1024 pixels (my target is roughly 15x that amount).
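Spelled out as a quick back-of-the-envelope calculation:

PPI = 300                                  # recommended pixels per inch for digital prints
width_px, height_px = 12 * PPI, 15 * PPI   # 3600 x 4500 pixels for a 12x15" frame
total_px = width_px * height_px            # 16,200,000 pixels
progan_px = 1024 * 1024                    # NVIDIA's 1024 x 1024 output, about 1.05M pixels
print(total_px / progan_px)                # ~15.4, so my goal is roughly 15x their result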

Getting into the Code

My full code can be found on GitHub. The main things I want to show in this article are the generator and the discriminator.

The Discriminator. My discriminator looks like any other image classification network. The unique thing about this class is that it takes the number of layers (based on the image size) as a parameter. This allows me to do the "progressive" part of progressive GANs without having to rewrite my classes each time I increment the image size.

import torch.nn as nn

# N_CHANNELS and N_DISC_CHANNELS are hyperparameters defined elsewhere in the project
class Discriminator(nn.Module):
    def __init__(self, ngpu, n_layers):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.n_layers = n_layers

        # makes the desired number of convolutional layers
        self.layers = nn.ModuleList([nn.Conv2d(N_CHANNELS, N_DISC_CHANNELS * 2, 4, 2, 1, bias=False)])
        self.layers.extend([nn.Conv2d(N_DISC_CHANNELS * 2, N_DISC_CHANNELS * 2, 4, 2, 1, bias=False) for i in range(self.n_layers - 2)])
        self.layers.append(nn.Conv2d(N_DISC_CHANNELS * 2, 1, 4, 1, 0, bias=False))

        # transformations
        self.batch2 = nn.BatchNorm2d(N_DISC_CHANNELS * 2)
        self.LeakyReLU = nn.LeakyReLU(0.2)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            x = layer(x)

            if i == 0:
                # first layer: activation only, no batch norm
                x = self.LeakyReLU(x)
            elif layer.out_channels == N_DISC_CHANNELS * 2:
                # middle layers: batch norm plus activation
                x = self.batch2(x)
                x = self.LeakyReLU(x)
            else:
                # final layer: squash to a real/fake probability
                x = self.sigmoid(x)

        return x
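As a quick sanity check on the layer count: each stride-2 convolution halves the spatial dimensions and the final 4x4 convolution collapses a 4x4 map to a single value, so an image of size s needs log2(s) - 1 layers. Here is a hypothetical smoke test; the constant values are stand-ins for illustration, not my actual settings.

import math
import torch

N_CHANNELS = 3         # assumed: RGB images
N_DISC_CHANNELS = 64   # assumed: base channel count for the discriminator

image_size = 32
n_layers = int(math.log2(image_size)) - 1   # 32 pixels -> 4 layers

disc = Discriminator(ngpu=1, n_layers=n_layers)
scores = disc(torch.randn(16, N_CHANNELS, image_size, image_size))
print(scores.shape)   # torch.Size([16, 1, 1, 1]): one real/fake score per image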

The Generator. The generator is essentially the reverse of the discriminator. It takes a vector of random values as noise and uses transposed convolutional layers to scale the noise up into an image. The more layers I have, the larger the final image.

# GEN_INPUT_SIZE, N_GEN_CHANNELS, and N_CHANNELS are hyperparameters defined elsewhere
class Generator(nn.Module):
    def __init__(self, ngpu, n_layers):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.n_layers = n_layers

        # makes the desired number of transposed convolutional layers
        self.layers = nn.ModuleList([nn.ConvTranspose2d(GEN_INPUT_SIZE, N_GEN_CHANNELS * 2, 4, 1, 0, bias=False)])
        self.layers.extend([nn.ConvTranspose2d(N_GEN_CHANNELS * 2, N_GEN_CHANNELS * 2, 4, 2, 1, bias=False) for i in range(self.n_layers - 3)])
        self.layers.extend([nn.ConvTranspose2d(N_GEN_CHANNELS * 2, N_GEN_CHANNELS, 4, 2, 1, bias=False),
                            nn.ConvTranspose2d(N_GEN_CHANNELS, N_CHANNELS, 4, 2, 1, bias=False)])

        # other transformations
        self.batch1 = nn.BatchNorm2d(N_GEN_CHANNELS)
        self.batch2 = nn.BatchNorm2d(N_GEN_CHANNELS * 2)
        self.relu = nn.ReLU(True)
        self.tanh = nn.Tanh()

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)

            if layer.out_channels == N_GEN_CHANNELS * 2:
                # early and middle layers: batch norm plus activation
                x = self.batch2(x)
                x = self.relu(x)
            elif layer.out_channels == N_GEN_CHANNELS:
                # penultimate layer: its own batch norm plus activation
                x = self.batch1(x)
                x = self.relu(x)
            else:
                # final layer: squash pixel values to [-1, 1]
                x = self.tanh(x)

        return x
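The generator mirrors that arithmetic in reverse: the first transposed convolution turns a 1x1 noise vector into a 4x4 map and each stride-2 layer doubles it, so n_layers layers yield an image of size 4 * 2^(n_layers - 1). A quick roundtrip check, reusing the assumed constants and the 4-layer discriminator from the smoke test above:

GEN_INPUT_SIZE = 100   # assumed: length of the input noise vector
N_GEN_CHANNELS = 64    # assumed: base channel count for the generator

gen = Generator(ngpu=1, n_layers=4)   # 4 * 2**(4 - 1) = 32-pixel images

noise = torch.randn(16, GEN_INPUT_SIZE, 1, 1)
fakes = gen(noise)
print(fakes.shape)        # torch.Size([16, 3, 32, 32])
print(disc(fakes).shape)  # feeds straight into the 4-layer discriminator above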

Testing the Network

Before I dive into trying to generate abstract art, I first want to test my network to make sure things are set up correctly. To do this, I'm going to run the network on a dataset of images from another GANs project and see if I get similar results. The animeGAN project is a good fit for this use case. For their project they used 143,000 images of anime characters' faces to create a generator that makes new characters. After downloading their dataset, I ran my model for 100 epochs with a target image size of 32 pixels, and voila!

[Image: Results from my GAN model]

The results are actually better than I expected. With these results, I'm confident that my network is set up correctly and I can move on to my own dataset.

Training

Now it's time to finally train the model on the art data. My initial image size is going to be a meager 32 pixels. I'll train at this size for a while, after which I'll add an additional layer to the generator and discriminator to double the image size to 64. Then it's rinse and repeat until I get to a satisfactory image resolution. But how do I know when to progress to the next size? There's a lot of work that's been done around this question; I'm going to take the simple approach of training until I hit Google's GPU usage limit and then manually checking the results. If they look like they need more time, I'll wait a day (so the usage limit is lifted) and train another round.
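Schematically, the progression looks like the sketch below. Here make_dataloader and train are hypothetical stand-ins for my data pipeline and training loop, and this simplified version rebuilds both networks from scratch at each size rather than showing how weights are carried between stages.

import math

for image_size in [32, 64, 128, 256]:
    n_layers = int(math.log2(image_size)) - 1
    generator = Generator(ngpu=1, n_layers=n_layers)
    discriminator = Discriminator(ngpu=1, n_layers=n_layers)
    dataloader = make_dataloader(image_size)    # hypothetical: re-crops the art data
    train(generator, discriminator, dataloader) # hypothetical: a standard DCGAN loop
    # ...inspect the generated samples, then move up to the next size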

[Image: Hello darkness my old friend]

32 Pixel Results. My first set of results looks great. Not only is there no sign of mode collapse, the generator even replicated the fact that some of the training images include a frame.

[Image: Generated images at size 32]

64 and 128 Pixel Results. The 64-pixel results also turned out pretty well; however, by the time I increased the size to 128 pixels I was starting to see signs of mode collapse in the generator results.

[Image: Starting to see identical output]

256 Pixel Results. By the time I got to this image size, mode collapse had reduced the results to only about 3 or 4 types of images. I suspect this may have to do with my limited dataset: at this resolution I only had about 1,000 images, and it's possible that the generator is just mimicking a few of the images in that collection.

[Image: Mode collapse]

Conclusion

In the end, my progressive GANs model didn't progress very far. However, I am still amazed by what a fairly simple network was able to create. It was shocking when it generated anime faces or when it placed some of its generated paintings in frames. I understand why people consider GANs one of the greatest machine learning breakthroughs in recent years. For now this was just my hello-world introduction to GANs, but I'll probably be coming back.

Translated from: https://towardsdatascience.com/i-tried-and-failed-to-use-gans-to-create-art-but-it-was-still-worth-it-c392bcd29f39
