Alphabet GAN: AI Generates English Letters!

First, you need to know what a GAN really is. Well, here’s a brief description. A Generative Adversarial Network is a combination of two models, namely a Generator and a Discriminator. The Generator tries to produce fake data mimicking the original data. On the other hand, the Discriminator tries to tell whether a given piece of data is original or fake. Thanks to this adversarial setup, both models keep getting better at their tasks. Of course, there’s much more to understand about GANs. Please watch this video if you are curious…

How do GANs work?

In this article, I want to show you how to implement one such GAN. I’ll also mention a whole bunch of tips that will help you train your first GAN. But before jumping into the model, let’s understand the dataset.

Dataset: A-Z Handwritten Alphabets

Here, I’m using an MNIST-style dataset of handwritten English alphabets. The A-Z dataset contains 372,450 characters from 26 classes. Each data sample is a grayscale image of a single letter. Like the MNIST dataset, each image is 28px*28px and is represented as a 784-dimensional (28*28) vector. Let’s visualize a few of them…

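The loading and plotting code isn’t embedded here, so here’s a minimal sketch. It assumes the dataset is available as the Kaggle CSV file; the file name "A_Z Handwritten Data.csv" and the absence of a header row are assumptions, so adjust them to your local copy.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Assumed file name of the A-Z handwritten alphabets CSV; adjust the path as needed.
# header=None assumes the CSV ships without a header row.
data = pd.read_csv("A_Z Handwritten Data.csv", header=None).values

labels = data[:, 0]          # first column: class index (0 = A, ..., 25 = Z)
pixels = data[:, 1:]         # remaining 784 columns: pixel values in [0, 255]

# Show a 10x10 grid of random samples.
fig, axes = plt.subplots(10, 10, figsize=(8, 8))
for ax, i in zip(axes.flat, np.random.randint(0, len(pixels), 100)):
    ax.imshow(pixels[i].reshape(28, 28), cmap="gray")
    ax.axis("off")
plt.show()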

[Image: 100 random images from the EMNIST Letters dataset]

Originally, the pixel values range between [0, 255], but we should normalize them before feeding them to any machine learning model. Generally, we normalize the pixels to [0, 1] by dividing by 255.0, but here we normalize them to [-1, 1]. This is because we will use the tanh function (range of tanh = [-1, 1]) later.

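Continuing the loading sketch above, the rescaling could look like this:

# Scale pixels from [0, 255] to [-1, 1] to match the generator's tanh output.
X_train = (pixels.astype("float32") - 127.5) / 127.5
X_train = X_train.reshape(-1, 28, 28, 1)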

Now let’s build our GAN. I like to do it in 4 steps.

1. Build the Generator (G)

The generator is a neural network that takes a noise vector (100-dimensional) as input and outputs an image of a single English letter. As we are working with image data, it makes sense to use a Convolutional Neural Network. The idea is to increase the spatial dimensions of the input as it passes through the different layers until it reaches the desired output shape (28px*28px). The first two layers of the network are Dense layers with ReLU activation. I’d highly recommend using BatchNormalization on the output of each layer.

Note: BatchNormalization makes the training converge faster. A lot faster.

Notice that the first Dense layer contains 1024 neurons and the second one contains 6272 neurons. After that comes the Reshape layer. The reshaping is important because we want to apply convolutions afterward, and convolutions need matrix-like inputs rather than column/row vectors.

Note: To find the correct dimensions we need to think backward! First, determine the dimensions of the matrices (7*7) and how many of them (128) you want, then multiply them to get the size (7*7*128 = 6272) of the Dense layer.

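In code, that backward calculation is simply:

# Think backward from the Reshape target: 128 feature maps of size 7x7.
height, width, channels = 7, 7, 128
dense_units = height * width * channels   # 7 * 7 * 128 = 6272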

Before applying convolution we will upsample the matrices. I’ve used (2, 2) upsampling, which increases the dimension from 7*7 to 14*14.

UpSampling is a kind of inverse function of Pooling.

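To see what it does, here is a tiny example; by default, Keras’ UpSampling2D simply repeats rows and columns (nearest-neighbour interpolation):

import numpy as np
from tensorflow.keras.layers import UpSampling2D

x = np.arange(4, dtype="float32").reshape(1, 2, 2, 1)   # one 2x2 single-channel "image"
y = UpSampling2D(size=(2, 2))(x)                        # nearest-neighbour repeat -> 4x4
print(np.squeeze(y))
# [[0. 0. 1. 1.]
#  [0. 0. 1. 1.]
#  [2. 2. 3. 3.]
#  [2. 2. 3. 3.]]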

After that, we have 64 convolution filters of size 2*2. Notice that I have initialized the kernel weights according to a Normal distribution. The activation for this layer is LeakyReLU. Then again we have an upsampling layer followed by a convolution layer. This time the UpSampling layer outputs 28*28 matrices. The last convolution layer contains only 1 filter because we want only one channel for our grayscale image. The activation function here is tanh. This is the reason why we normalized the pixel values between [-1, 1].

Note: We could have avoided the UpSampling layers by using transposed convolutions, because they can also increase the spatial dimensions.

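For example, a single strided transposed convolution could replace the UpSampling2D + Conv2D pair; this is only an alternative, not what the model above uses:

from tensorflow.keras.layers import Conv2DTranspose

# Maps 7x7x128 feature maps directly to 14x14x64 in one layer.
Conv2DTranspose(64, kernel_size=(2, 2), strides=(2, 2), padding="same")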

Code:

The generator
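
The embedded code gist isn’t reproduced here, so below is a Keras sketch reconstructed from the architecture summary that follows. The kernel sizes are inferred from the parameter counts; the 0.02 standard deviation for the kernel initializer and the LeakyReLU slope of 0.2 are assumptions.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Dense, BatchNormalization, Activation,
                                     Reshape, UpSampling2D, Conv2D, LeakyReLU)
from tensorflow.keras.initializers import RandomNormal

def build_generator(noise_dim=100):
    return Sequential([
        # Two Dense layers with ReLU; 6272 = 7 * 7 * 128 (think backward!).
        Dense(1024, input_dim=noise_dim),
        BatchNormalization(),
        Activation("relu"),
        Dense(7 * 7 * 128),
        BatchNormalization(),
        Activation("relu"),
        # Reshape the flat vector into 7x7 feature maps so convolutions can be applied.
        Reshape((7, 7, 128)),
        UpSampling2D(size=(2, 2)),                        # 7x7 -> 14x14
        Conv2D(64, kernel_size=(2, 2), padding="same",
               kernel_initializer=RandomNormal(stddev=0.02)),
        BatchNormalization(),
        LeakyReLU(0.2),
        UpSampling2D(size=(2, 2)),                        # 14x14 -> 28x28
        # One filter = one grayscale channel; tanh matches the [-1, 1] pixel range.
        Conv2D(1, kernel_size=(3, 3), padding="same", activation="tanh",
               kernel_initializer=RandomNormal(stddev=0.02)),
    ])

generator = build_generator()
generator.summary()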

Architecture:

Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 1024) 103424
_________________________________________________________________
batch_normalization_1 (Batch (None, 1024) 4096
_________________________________________________________________
activation_1 (Activation) (None, 1024) 0
_________________________________________________________________
dense_2 (Dense) (None, 6272) 6428800
_________________________________________________________________
batch_normalization_2 (Batch (None, 6272) 25088
_________________________________________________________________
activation_2 (Activation) (None, 6272) 0
_________________________________________________________________
reshape_1 (Reshape) (None, 7, 7, 128) 0
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 14, 14, 128) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 14, 14, 64) 32832
_________________________________________________________________
batch_normalization_3 (Batch (None, 14, 14, 64) 256
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU) (None, 14, 14, 64) 0
_________________________________________________________________
up_sampling2d_2 (UpSampling2 (None, 28, 28, 64) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 28, 28, 1) 577
=================================================================
Total params: 6,595,073
Trainable params: 6,580,353
Non-trainable params: 14,720
_________________________________________________________________

Did you notice that I didn’t compile the generator here? This will be done in the 3rd step.

2. Build the Discriminator (D)

Our discriminator is just a binary classifier that takes a grayscale image as input and predicts whether it’s an original image or a fake one, i.e. created by the generator. The first two layers are convolution layers. Notice that I’ve used a stride of 2, which means the output dimensions will be smaller than the input’s. So, we don’t need Pooling layers. The filter size is 5*5 for both layers, but the number of filters is greater in the second layer.

Note: While building the discriminator you should keep in mind that our aim is to favor the generator because we want to generate fake images. Hence, make the discriminator a bit weaker than the generator. For example, here I’ve used fewer convolution layers in the discriminator.

After the convolution layers, we need to Flatten the output so that we can pass it to a Dense layer. The size of the Dense layer is 256, with 50% dropout. Finally, we have the sigmoid layer, just like any other binary classifier. We have to compile the discriminator now. The loss should be binary cross-entropy, and I’ve used a custom Adam optimizer (learning rate = 0.0002).

Note: The default Adam learning rate (0.001) is too high for GANs, so always customize the Adam optimizer.

Code:

The discriminator
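
Again, the gist isn’t embedded here; the following is a sketch that matches the summary below (the LeakyReLU slope of 0.2 is an assumption):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, LeakyReLU, Flatten, Dense, Dropout
from tensorflow.keras.optimizers import Adam

def build_discriminator():
    d = Sequential([
        # Strided convolutions downsample the image, so no Pooling layers are needed.
        Conv2D(64, kernel_size=(5, 5), strides=2, padding="same",
               input_shape=(28, 28, 1)),                 # 28x28 -> 14x14
        LeakyReLU(0.2),
        Conv2D(128, kernel_size=(5, 5), strides=2),      # 14x14 -> 5x5 ('valid' padding)
        LeakyReLU(0.2),
        Flatten(),
        Dense(256),
        LeakyReLU(0.2),
        Dropout(0.5),
        Dense(1, activation="sigmoid"),                  # probability that the input is real
    ])
    # Custom Adam: the default learning rate (0.001) is too high for GANs.
    d.compile(loss="binary_crossentropy", optimizer=Adam(learning_rate=0.0002))
    return d

discriminator = build_discriminator()
discriminator.summary()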

Architecture:

Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_3 (Conv2D) (None, 14, 14, 64) 1664
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU) (None, 14, 14, 64) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 5, 5, 128) 204928
_________________________________________________________________
leaky_re_lu_3 (LeakyReLU) (None, 5, 5, 128) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 3200) 0
_________________________________________________________________
dense_3 (Dense) (None, 256) 819456
_________________________________________________________________
leaky_re_lu_4 (LeakyReLU) (None, 256) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 256) 0
_________________________________________________________________
dense_4 (Dense) (None, 1) 257
=================================================================
Total params: 1,026,305
Trainable params: 1,026,305
Non-trainable params: 0
_________________________________________________________________

3. Combine G & D

According to the original GAN paper, we have to train the generator and the discriminator separately. Then why this step?

The discriminator can be trained directly by back-propagating the loss computed at the last sigmoid layer. But for training the generator, we need to send this loss back to the generator without affecting the weights of the discriminator!

One way to achieve this is to create a new model by stacking the generator and the discriminator. And this is why I didn’t compile the generator before. Let’s call the new model gan. It takes the noise vector as input, passes it through the generator to create a fake image, and then passes that image through the discriminator, which computes the probability of it being an original image. When we train this gan, the discriminator should not learn anything. Hence, ‘discriminator.trainable = False’. Only the weights of the generator will be modified.

Code:

generator + discriminator = gan
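
A sketch of the combined model (frozen D, trainable G), following the description above:

from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def build_gan(generator, discriminator, noise_dim=100):
    # Freeze the discriminator inside the combined model;
    # only the generator's weights are updated when gan is trained.
    discriminator.trainable = False

    gan_input = Input(shape=(noise_dim,))
    fake_image = generator(gan_input)        # noise -> fake image
    validity = discriminator(fake_image)     # fake image -> probability "real"

    gan = Model(gan_input, validity)
    gan.compile(loss="binary_crossentropy", optimizer=Adam(learning_rate=0.0002))
    return gan

gan = build_gan(generator, discriminator)
gan.summary()

Note that because the discriminator was already compiled on its own before the trainable flag was flipped, calling discriminator.train_on_batch still updates its weights; it is frozen only inside gan.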

Architecture:

Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 100) 0
_________________________________________________________________
sequential_1 (Sequential) (None, 28, 28, 1) 6595073
_________________________________________________________________
sequential_2 (Sequential) (None, 1) 1026305
=================================================================
Total params: 7,621,378
Trainable params: 6,580,353
Non-trainable params: 1,041,025
_________________________________________________________________

4. Train

Finally, we are ready to train our GAN! Does the code look weird to you? Don’t worry, I’m gonna explain each step.

Code:

Training loop for GAN
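
A sketch of the training loop described below; it assumes X_train, generator, discriminator, and gan from the previous steps, and the logging line is only illustrative:

import numpy as np

epochs = 80
batch_size = 128
noise_dim = 100
steps_per_epoch = X_train.shape[0] // batch_size   # 372450 // 128 = 2909

for epoch in range(epochs):
    for step in range(steps_per_epoch):
        # ----- Train D while G is fixed -----
        noise = np.random.normal(0, 1, size=(batch_size, noise_dim))
        fake_images = generator.predict(noise, verbose=0)

        idx = np.random.randint(0, X_train.shape[0], batch_size)
        real_images = X_train[idx]

        x = np.concatenate([fake_images, real_images])
        # Label smoothing: 0.1 for fake and 0.9 for real instead of hard 0 / 1.
        y = np.concatenate([np.full(batch_size, 0.1), np.full(batch_size, 0.9)])
        d_loss = discriminator.train_on_batch(x, y)

        # ----- Train G while D is fixed -----
        noise = np.random.normal(0, 1, size=(batch_size, noise_dim))
        # Deliberately "wrong" labels (all 1s): G improves when D calls its fakes real.
        g_loss = gan.train_on_batch(noise, np.ones(batch_size))

    print(f"epoch {epoch + 1}/{epochs}  d_loss: {d_loss:.4f}  g_loss: {g_loss:.4f}")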

The outer loop traverses the epochs and the inner one the batches. I’ve trained the models for 80 epochs with a batch_size of 128. So, in one epoch we will have 2909 steps (steps_per_epoch = ⌊no. of data samples / batch_size⌋ = ⌊372,450 / 128⌋ = 2909).

Train D while G is fixed:

First, batch_size noise vectors are formed by drawing numbers randomly from a standard normal distribution. These vectors are given to the generator to create fake images. Then we draw batch_size real images from the training data. To build the input to the discriminator, we concatenate the fake and the real data. Accordingly, we need to provide the label vector (0: fake data, 1: real data). But wait, the code says 0.1 and 0.9 instead! WTH is going on?

This technique is called label smoothing. It prevents the discriminator from becoming overconfident about its predictions.

Then we call the train_on_batch function for the discriminator and pass the data-label pairs.

Train G while D is fixed:

Here, we need only the noise vectors and labels. The label vector contains all 1s. Wait, the generator makes fake data, so shouldn’t the labels be 0?

Yes. But here we are deliberately giving wrong labels so that the discriminator makes mistakes. The reason is that we want the generator to outperform the discriminator. By doing this, G will learn how D behaves when it is given real labels, and it (G) will change its weights accordingly to fool D. Remember that at this stage we are not changing the weights of the discriminator, so the discriminator is not ‘unlearning’ anything.

Now we call the train_on_batch function for the generator (via the combined gan model) and pass the data-label pairs. And that, friends, is how a GAN is trained!

Let me show you some of the best (hand-picked) results that my model has produced…

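For reference, sampling from the trained generator could look like this, rescaling from tanh’s [-1, 1] back to [0, 1] for display (a sketch continuing from the code above):

import numpy as np
import matplotlib.pyplot as plt

noise = np.random.normal(0, 1, size=(8, 100))
samples = generator.predict(noise, verbose=0)

fig, axes = plt.subplots(1, 8, figsize=(12, 2))
for ax, img in zip(axes, samples):
    ax.imshow((img.squeeze() + 1) / 2, cmap="gray")
    ax.axis("off")
plt.show()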

[Image: AI-generated alphabets (T, P, J, S, A, E, L, U)]

Here’s the full code of this project. Voilà! Now you know how to train a GAN!

If you want to explore a bit further, watch this…

AI generates alphabets of 2 languages (English & Bengali)

I hope you enjoyed the reading. Until next time…Happy learning!

Translated from: https://towardsdatascience.com/alphabet-gan-ai-generates-english-letters-589637068808
