Colorizing Gray-Scale Images Using ANN

On my usual hunt for learning a variety of network architectures, I came across Pix2Pix. Pix2Pix can be used to colorize images, convert map terrains, and convert building facades into real buildings. I decided to move forward with the first one, and it turned out to be quite interesting. Let's go ahead and see how it happens.

Before moving on…

If you choose to read more about it, do read the research paper here. If you would like to check out the inspiration tutorial, which is used for converting building facades to real buildings, check it out here.

Since the TensorFlow docs only had the guide for the buildings, I found another paper on the colorization of images. You can find it here.

Finally, if you are not in the mood to read, you can check out my notebook here for the code.

Introduction

The basis of this network is similar to what we find when creating new images from noise, i.e. GANs. This network has a generator, which in this case is a U-Net, and a discriminator, which determines whether the input image is real or not. They keep doing this until the loss is minimized. We will discuss the losses as well, further ahead.

If you are unfamiliar with the concept of convolutions or transposed convolutions, you can check them out here before moving on.

What all is required?

I would suggest opening the notebook here. Since I wanted to keep the code snippets in the article minimal, we will be reading the two side by side. I have added comments in each module so that it is self-explanatory.

After the imports and downloading the data, we observe that the facades dataset does not provide gray-scale inputs. Thus, during preprocessing we apply the steps only to the target image, and at the end of those steps we create a gray-scale image to return as the input image.

Modules created (sketched in code below):

  1. Load Image: load the image from its path.
  2. Resize: resize the input to the given dimensions.
  3. Random Crop: crop the image, loaded slightly larger than the required size, down to the required size.
  4. Normalize: normalize the pixel values of the image before passing it into the network.
  5. Random Jitter: using the modules defined above, resize and randomly crop the image. At the end of these steps, we create the input image and return it along with the real image.

Note: While testing, we do not apply random jitter, and thus we explicitly create the input image at the end of the load function.
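
For a concrete picture, here is a minimal sketch of what these preprocessing modules might look like. The 256x256 target size, the 286x286 resize before the random crop, and the use of tf.image.rgb_to_grayscale to build the gray-scale input are assumptions based on the pix2pix tutorial conventions; the exact names and sizes in the notebook may differ.

import tensorflow as tf

IMG_HEIGHT, IMG_WIDTH = 256, 256  # assumed output size

def load(image_file):
    # Load Image: read and decode the (colored) target image from its path
    image = tf.io.read_file(image_file)
    image = tf.image.decode_jpeg(image)
    return tf.cast(image, tf.float32)

def resize(image, height, width):
    # Resize: scale the image to the given dimensions
    return tf.image.resize(image, [height, width],
                           method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)

def random_crop(image):
    # Random Crop: crop the slightly larger image back to the required size
    return tf.image.random_crop(image, size=[IMG_HEIGHT, IMG_WIDTH, 3])

def normalize(image):
    # Normalize: map pixel values into the [-1, 1] range
    return (image / 127.5) - 1

def random_jitter(image):
    # Random Jitter: resize to 286x286, randomly crop back to 256x256, and randomly mirror
    image = resize(image, 286, 286)
    image = random_crop(image)
    image = tf.image.random_flip_left_right(image)
    return image

def make_input(target):
    # Create the gray-scale input image from the colored target
    gray = tf.image.rgb_to_grayscale(target)   # (H, W, 1)
    return tf.image.grayscale_to_rgb(gray)     # back to 3 channels for the network

def load_image_train(image_file):
    target = load(image_file)
    target = random_jitter(target)
    target = normalize(target)
    return make_input(target), target

def load_image_test(image_file):
    # No random jitter at test time; just resize, normalize, and build the input explicitly
    target = load(image_file)
    target = resize(target, IMG_HEIGHT, IMG_WIDTH)
    target = normalize(target)
    return make_input(target), target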

After these steps, we can create the input pipeline using the tf.data.Dataset API. This API helps us stream the data while applying our preprocessing steps, without worrying about keeping all the data in memory.
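
A minimal sketch of that pipeline, assuming the load_image_train / load_image_test functions above and a hypothetical PATH pointing at the downloaded dataset directory (the buffer and batch sizes follow the pix2pix tutorial defaults):

BUFFER_SIZE = 400   # assumed; roughly the size of the facades training set
BATCH_SIZE = 1

# Build the training pipeline: list files, map the preprocessing, shuffle, and batch
train_dataset = tf.data.Dataset.list_files(str(PATH) + '/train/*.jpg')
train_dataset = train_dataset.map(load_image_train,
                                  num_parallel_calls=tf.data.experimental.AUTOTUNE)
train_dataset = train_dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE)

# Build the test pipeline: no shuffling or jitter, just preprocessing and batching
test_dataset = tf.data.Dataset.list_files(str(PATH) + '/test/*.jpg')
test_dataset = test_dataset.map(load_image_test).batch(BATCH_SIZE)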

Let's generate the Image

Our generator in this case is slightly different from a typical GAN's. We take the gray-scale image as input and output a colorized one.

The generator is a modified U-Net. A U-Net, in simple terms, is a network where the first phase downsamples the input and the latter half upsamples it.

We created two modules, downsample and upsample, to be used for building the model.

  1. Downsample:

Each block in the encoder is Conv -> BatchNorm -> LeakyReLU. There are skip connections from the blocks in the first phase to the decoder blocks in the second phase.

  2. Upsample:

Each block in the decoder is Transposed Conv -> BatchNorm -> Dropout (applied to the first 3 blocks) -> ReLU.
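
Sketched in code, the two blocks might look roughly like this; the 4x4 kernels, stride-2 convolutions, and 0.02 initializer follow the pix2pix tutorial and may differ slightly from the notebook:

def downsample(filters, size, apply_batchnorm=True):
    # Conv -> BatchNorm -> LeakyReLU, halving the spatial resolution with stride 2
    initializer = tf.random_normal_initializer(0., 0.02)
    block = tf.keras.Sequential()
    block.add(tf.keras.layers.Conv2D(filters, size, strides=2, padding='same',
                                     kernel_initializer=initializer, use_bias=False))
    if apply_batchnorm:
        block.add(tf.keras.layers.BatchNormalization())
    block.add(tf.keras.layers.LeakyReLU())
    return block

def upsample(filters, size, apply_dropout=False):
    # Transposed Conv -> BatchNorm -> (Dropout on the first 3 blocks) -> ReLU,
    # doubling the spatial resolution with stride 2
    initializer = tf.random_normal_initializer(0., 0.02)
    block = tf.keras.Sequential()
    block.add(tf.keras.layers.Conv2DTranspose(filters, size, strides=2, padding='same',
                                              kernel_initializer=initializer, use_bias=False))
    block.add(tf.keras.layers.BatchNormalization())
    if apply_dropout:
        block.add(tf.keras.layers.Dropout(0.5))
    block.add(tf.keras.layers.ReLU())
    return block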

Figure: U-Net architecture for the generator | Observe how the skip connections work, as well as the input and output of the model.
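
Assembling the blocks into the U-Net with skip connections could look roughly as follows. The filter counts follow the pix2pix tutorial, and the 3-channel 256x256 input shape is an assumption (adjust the channel count if the notebook keeps a single-channel gray-scale input):

def Generator():
    # U-Net: a stack of downsample blocks, then upsample blocks, with skip
    # connections concatenating encoder activations into the decoder.
    inputs = tf.keras.layers.Input(shape=[256, 256, 3])

    down_stack = [
        downsample(64, 4, apply_batchnorm=False),  # (128, 128, 64)
        downsample(128, 4),                        # (64, 64, 128)
        downsample(256, 4),                        # (32, 32, 256)
        downsample(512, 4),                        # (16, 16, 512)
        downsample(512, 4),                        # (8, 8, 512)
        downsample(512, 4),                        # (4, 4, 512)
        downsample(512, 4),                        # (2, 2, 512)
        downsample(512, 4),                        # (1, 1, 512)
    ]
    up_stack = [
        upsample(512, 4, apply_dropout=True),      # (2, 2, 1024)
        upsample(512, 4, apply_dropout=True),      # (4, 4, 1024)
        upsample(512, 4, apply_dropout=True),      # (8, 8, 1024)
        upsample(512, 4),                          # (16, 16, 1024)
        upsample(256, 4),                          # (32, 32, 512)
        upsample(128, 4),                          # (64, 64, 256)
        upsample(64, 4),                           # (128, 128, 128)
    ]
    last = tf.keras.layers.Conv2DTranspose(3, 4, strides=2, padding='same',
                                           kernel_initializer=tf.random_normal_initializer(0., 0.02),
                                           activation='tanh')  # (256, 256, 3)

    # Downsample through the encoder, remembering each activation for the skip connections
    x = inputs
    skips = []
    for down in down_stack:
        x = down(x)
        skips.append(x)
    skips = reversed(skips[:-1])  # do not skip-connect the bottleneck to itself

    # Upsample through the decoder, concatenating the matching encoder activation at each step
    for up, skip in zip(up_stack, skips):
        x = up(x)
        x = tf.keras.layers.Concatenate()([x, skip])

    return tf.keras.Model(inputs=inputs, outputs=last(x))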

Loss for the Generator:

Two types of losses are used to calculate the total loss.

  1. The first is the sigmoid cross entropy of the discriminator's output on the generated image, against an array of ones.

  2. The other is the L1 loss (MAE: mean absolute error) between the generated image and the target image.

The total loss is then calculated as the sum of the first loss and a weighted L1 loss.

Total Loss = Sigmoid loss + LAMBDA * L1_loss, where LAMBDA = 100 as per the paper.

Figure: Flow of the calculations to get the generator loss. | Source: https://www.tensorflow.org/tutorials/generative/pix2pix
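
A sketch of that calculation, assuming a BinaryCrossentropy loss object with from_logits=True (as in the pix2pix tutorial) and matching the generator_loss signature used in the training step further below:

loss_object = tf.keras.losses.BinaryCrossentropy(from_logits=True)
LAMBDA = 100  # weight of the L1 term, as per the paper

def generator_loss(disc_generated_output, gen_output, target):
    # 1. Sigmoid cross entropy of the discriminator's verdict on the generated
    #    image against an array of ones (i.e. "fool the discriminator")
    gan_loss = loss_object(tf.ones_like(disc_generated_output), disc_generated_output)
    # 2. L1 loss (mean absolute error) between the generated image and the target
    l1_loss = tf.reduce_mean(tf.abs(target - gen_output))
    # Total = sigmoid loss + LAMBDA * L1 loss
    total_gen_loss = gan_loss + (LAMBDA * l1_loss)
    return total_gen_loss, gan_loss, l1_loss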

Let's Build the Discriminator

The discriminator will help us generate images similar to the target by classifying each image as real or fake.

  1. The Discriminator is a PatchGAN.
  2. Each block is Conv -> BatchNorm -> LeakyReLU.
  3. The shape of the last layer is (batch_size, 30, 30, 1). Each 30x30 patch of the output classifies a 70x70 portion of the input.
  4. The Discriminator receives 2 kinds of input:

Input image and target image, which it should classify as real.

Input image and generated image, which it should classify as fake.

Figure: The input of the discriminator is the concatenation of the input image and the image to be classified as real or fake.
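
A sketch of such a PatchGAN, reusing the downsample block defined earlier; the zero-padding and filter counts follow the pix2pix tutorial and bring the output down to the (batch_size, 30, 30, 1) shape mentioned above:

def Discriminator():
    # PatchGAN: each value of the 30x30 output classifies a 70x70 patch of the input as real/fake
    initializer = tf.random_normal_initializer(0., 0.02)

    inp = tf.keras.layers.Input(shape=[256, 256, 3], name='input_image')
    tar = tf.keras.layers.Input(shape=[256, 256, 3], name='target_image')
    x = tf.keras.layers.Concatenate()([inp, tar])           # (256, 256, 6)

    down1 = downsample(64, 4, apply_batchnorm=False)(x)      # (128, 128, 64)
    down2 = downsample(128, 4)(down1)                        # (64, 64, 128)
    down3 = downsample(256, 4)(down2)                        # (32, 32, 256)

    zero_pad1 = tf.keras.layers.ZeroPadding2D()(down3)       # (34, 34, 256)
    conv = tf.keras.layers.Conv2D(512, 4, strides=1,
                                  kernel_initializer=initializer,
                                  use_bias=False)(zero_pad1)  # (31, 31, 512)
    batchnorm = tf.keras.layers.BatchNormalization()(conv)
    leaky_relu = tf.keras.layers.LeakyReLU()(batchnorm)
    zero_pad2 = tf.keras.layers.ZeroPadding2D()(leaky_relu)  # (33, 33, 512)
    last = tf.keras.layers.Conv2D(1, 4, strides=1,
                                  kernel_initializer=initializer)(zero_pad2)  # (30, 30, 1)

    return tf.keras.Model(inputs=[inp, tar], outputs=last)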

Loss for the Discriminator:

The loss for the discriminator comprises two losses, defined as:

  1. Real Loss: the sigmoid cross entropy of the discriminator's output on the real (input, target) pair, against an array of ones.

  2. Generated Loss: the sigmoid cross entropy of the discriminator's output on the (input, generated) pair, against an array of zeros.

Total Loss = Real Loss + Generated Loss

Figure: Flow of the calculations to get the discriminator loss. | Source: https://www.tensorflow.org/tutorials/generative/pix2pix
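
In code, again assuming the same BinaryCrossentropy loss object used for the generator loss above, the discriminator loss might look like this:

def discriminator_loss(disc_real_output, disc_generated_output):
    # Real Loss: the (input, target) pair should be classified as real (ones)
    real_loss = loss_object(tf.ones_like(disc_real_output), disc_real_output)
    # Generated Loss: the (input, generated) pair should be classified as fake (zeros)
    generated_loss = loss_object(tf.zeros_like(disc_generated_output), disc_generated_output)
    # Total Loss = Real Loss + Generated Loss
    return real_loss + generated_loss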

Until now, we have created the networks and defined our losses (if you are following along in the notebook). The next step is training the network.

What would be the train steps?

  1. Each input image generates an output; let's call it the generated image.
  2. We pass the input image and the target image through the discriminator to get the output dis_real_output.
  3. Next, we pass the input image and the image generated by the generator to get dis_generated_output.
  4. Now, if you remember, we need gradients to update our parameters, and losses to calculate those gradients.
  5. Follow the comments in the snippet below if you would like to check the implementation in TensorFlow.
@tf.function
def train_step(input_image, target, epoch):
  with tf.GradientTape() as gen_tape, tf.GradientTape() as dis_tape:
    gen_output = generator(input_image, training=True)  # generated image

    dis_real_output = discriminator([input_image, target], training=True)  # the real pair should be classified as real by the discriminator
    dis_generated_output = discriminator([input_image, gen_output], training=True)  # passing the generated image to the discriminator

    gen_total_loss, gen_gan_loss, gen_l1_loss = generator_loss(dis_generated_output, gen_output, target)  # generator loss
    dis_loss = discriminator_loss(dis_real_output, dis_generated_output)  # discriminator loss

  generator_gradients = gen_tape.gradient(gen_total_loss, generator.trainable_variables)  # calculate gradients of the loss w.r.t. the variables to be updated
  discriminator_gradients = dis_tape.gradient(dis_loss, discriminator.trainable_variables)  # calculate gradients of the loss w.r.t. the variables to be updated

  # Once we have the gradients, we can update the respective trainable variables in our models for the next iteration
  generator_optimizer.apply_gradients(zip(generator_gradients, generator.trainable_variables))
  discriminator_optimizer.apply_gradients(zip(discriminator_gradients, discriminator.trainable_variables))

  with summary_writer.as_default():
    tf.summary.scalar('gen_total_loss', gen_total_loss, step=epoch)
    tf.summary.scalar('gen_gan_loss', gen_gan_loss, step=epoch)
    tf.summary.scalar('gen_l1_loss', gen_l1_loss, step=epoch)
    tf.summary.scalar('dis_loss', dis_loss, step=epoch)

At last, let's fit our model to the data

The fit method essentially calls the training step defined above repeatedly until we reach a point where the losses are minimized. Refer to the snippet below, with comments, to get a good idea of the method. A good practice is to save your model at the end of training so that it can be used later on.

def fit(train_dataset, epochs, test_dataset):
  # an epoch is one pass over the complete dataset
  for epoch in range(epochs):
    start = time.time()

    display.clear_output(wait=True)
    # print an example prediction at every epoch to visualize how the predictions improve
    for example_input, example_target in test_dataset.take(1):
      generate_images(generator, example_input, example_target)
    print("Epoch: ", epoch)

    # running train_step iteratively
    for n, (input_image, target) in train_dataset.enumerate():
      print('.', end='')
      if (n+1) % 100 == 0:
        print()
      train_step(input_image, target, epoch)
    print()

    if (epoch+1) % 20 == 0:  # checkpoint every 20 epochs
      checkpoint.save(file_prefix=checkpoint_prefix)
    print('Time taken for epoch {} is {} sec\n'.format(epoch+1, time.time()-start))
  # checkpoint after finishing the last epoch
  checkpoint.save(file_prefix=checkpoint_prefix)
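
Later on, restoring the saved weights and colorizing a test image might look like the sketch below. It assumes checkpoint is a tf.train.Checkpoint wrapping the generator, discriminator, and both optimizers (as used in fit above), and checkpoint_dir is assumed to be the directory that checkpoint_prefix points into:

# Restore the latest saved checkpoint and visualize a prediction on the test set
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))

for example_input, example_target in test_dataset.take(1):
  generate_images(generator, example_input, example_target)  # same helper used in fit() above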

Is that all?

Yup, pretty much. After training, you can try it out on a random gray-scale image to test it. If you would like to try another dataset with more dynamic range, feel free to try it out; you might have to change some parameters in the network, but it's attainable. If you do so, drop a link to your work in the responses.

If you need any clarifications or corrections, feel free to drop a mail or a response. Do follow for more articles on similar topics.

Until next time.

Translated from: https://medium.com/thenoobengineer/colorizing-gray-scale-images-using-ann-5fd4c2efbec8
