Using Machine Learning to Convert Your Image to Vaporwave or Other Artistic Styles

Original article: https://towardsdatascience.com/using-machine-learning-to-convert-your-image-to-vaporwave-or-other-artistic-styles-df6fb9aa60e0

TL;DR: This article walks through the mechanism of a popular machine learning algorithm called neural style transfer (NST), which can convert any image of your choice to your favorite artistic style. The algorithm is a direct application of the famed convolutional neural network and deftly frames the problem as the optimization of two loss terms. Thanks to its succinct formulation, it offers a straightforward route to your own implementation of a fun image converter (think of DeepArt or Prisma). Any topic in deep learning is vast, and this article only briefly walks through the NST algorithm. Its sequels will deal with implementation quirks and some other interesting applications of the algorithm; for now, let us build some intuition behind the algorithm and have fun playing with it.

Problem Setup

Our goal is clear: make an image S adopt the style of another image T. At this point the goal might sound a bit too high-level, and you may have legitimate questions, such as how we represent an image in a neural network and how we quantify style. These are answered in the following sections.

Numerical Representation

Simply put, an image is represented as a tensor, which can be thought of as a generalization of a matrix. For example, a color image of size 512*512 is represented as a tensor (a 3-D array) of size 512*512*3; the 3 comes from the fact that any color can be encoded as a tuple of R-G-B values, each ranging from 0 to 255. This tensor will be used as the input to the algorithm later.
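
As a minimal sketch of this representation, the snippet below builds a 512*512 color image as a NumPy array. The random pixel values and the variable names are placeholders chosen for illustration, not part of the original article.

```python
import numpy as np

# A hypothetical 512*512 RGB image filled with random pixel values (0 to 255).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)

print(image.shape)  # (height, width, channels)
print(image[0, 0])  # the R, G, B values of the top-left pixel

# Networks usually work with floats, so a common step is scaling to [0, 1].
image_float = image.astype(np.float32) / 255.0
```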

Convolutional Neural Network Basics

Since the algorithm builds on the convolutional neural network (CNN) architecture, it is helpful to clarify some points about it beforehand.

Two of the most important building blocks of a CNN that pertain to our task are the convolutional layer and the pooling layer. We will look at the inner workings of the convolutional layer first.

So how do we go from the input layer to the first convolutional layer? Let us look at the following illustration:

In the illustration above, the sub-matrix of size 3*3*3 at the top-left corner (only two dimensions are shown for ease of illustration) goes through a filter of the same size. The filter applies the convolution operation to the sub-matrix, and the result becomes the activation of the neuron at the top-left corner of the convolutional layer.

But what is a filter? For now you can think of it as a way to identify certain features that can describe an image: right angles, curvatures, textures, etc. It does so by convolving itself with the input sub-matrix. For a fuller treatment, please follow the pointers on this Wikipedia page.

Sliding the filter by one pixel to the right, we have the following:

The filter will be applied to every 3*3*3 sub-matrix and we will have our first complete convolutional layer.

You can check that if we apply a filter of size 3*3*3 to an input matrix of size 6*6*3 as above, the resulting convolutional layer will be of size 4*4*1, since the filter fits in 6 - 3 + 1 = 4 positions along each side. Generally, however, there is more than one filter: we apply several different filters to convert our input matrix into the first convolutional layer. Imagine that we stack the resulting matrices of size 4*4*1 from 4 different filters on top of each other; we end up with a convolutional layer of size 4*4*4, which in turn becomes the input to the next layer, be it another convolutional layer or a pooling layer. Note that in computer vision jargon, each filter output of size 4*4*1 is called a feature map; here we therefore have 4 feature maps.
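
The size bookkeeping above can be checked with a small sketch. It assumes stride 1 and no padding, leaves out the bias and activation function that a real layer would add, and uses random numbers for both the image and the filters. The helper name `convolve_valid` is made up for this example. As in most deep learning frameworks, the filter is not flipped, so strictly speaking this is a cross-correlation.

```python
import numpy as np

def convolve_valid(image, filters):
    """Slide each filter over the image with stride 1 and no padding.

    image:   array of shape (H, W, C)
    filters: array of shape (K, k, k, C), i.e. K filters of size k*k*C
    returns: array of shape (H - k + 1, W - k + 1, K), one feature map per filter
    """
    height, width, _ = image.shape
    num_filters, k, _, _ = filters.shape
    out = np.zeros((height - k + 1, width - k + 1, num_filters))
    for f in range(num_filters):
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                patch = image[i:i + k, j:j + k, :]
                # Element-wise product, summed to a single activation.
                out[i, j, f] = np.sum(patch * filters[f])
    return out

rng = np.random.default_rng(0)
image = rng.random((6, 6, 3))       # the 6*6*3 input from the text
filters = rng.random((4, 3, 3, 3))  # 4 filters of size 3*3*3
feature_maps = convolve_valid(image, filters)
print(feature_maps.shape)  # (4, 4, 4)
```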

The mechanism of the pooling layer can be understood as dimensionality reduction. In the illustration below, the effect of the pooling layer is to reduce any sub-matrix of size 2*2 to 1*1 in the next layer. Popular methods to do this down-sampling include taking the max or the average.

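
As a small sketch of this down-sampling, the function below reduces every non-overlapping 2*2 block of a single feature map to one number, using either the max or the average. The function name `pool_2x2` and the example values are made up for illustration.

```python
import numpy as np

def pool_2x2(feature_map, mode="max"):
    """Reduce each non-overlapping 2*2 block of a 2-D feature map to one value."""
    height, width = feature_map.shape
    # Drop a trailing odd row or column, then group into 2*2 blocks.
    blocks = feature_map[:height // 2 * 2, :width // 2 * 2]
    blocks = blocks.reshape(height // 2, 2, width // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 0, 5, 6],
                        [1, 2, 7, 8]], dtype=float)

print(pool_2x2(feature_map, "max"))   # [[4. 2.] [2. 8.]]
print(pool_2x2(feature_map, "mean"))  # [[2.5 1.] [0.75 6.5]]
```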

Generally speaking, the architecture of a CNN alternates between convolutional layers and pooling layers. An example of a typical architecture is provided below:

The Neural Style Transfer (NST) Algorithm

With the fundamentals covered, let us get into the details of the algorithm.

NST makes use of the VGG19 neural network illustrated above, which has been pre-trained to perform object recognition on the ImageNet dataset; the three fully connected layers at the right end are excluded. VGG19 ships with popular deep learning frameworks such as PyTorch and TensorFlow, so you don’t need to implement it yourself.

Let us review what we have so far. We have the pre-trained VGG19 CNN at our disposal, an image matrix S to be converted to the style of the image matrix T, and an intermediate image matrix S’, which is the image we will update step by step and whose feature maps we can read off at each intermediate layer of the network (the initial value of S’ can be set to white noise, or simply to S).

The next step is to formulate the whole problem as an optimization task. NST casts it as minimizing the sum of two loss functions, the content loss and the style loss. Let us dive in now.

Intuitively, the content loss quantifies the distance between the intermediate image and the content image as seen at a certain layer. At each layer l, we denote the current state of S’ as x and the original content image as p; we further denote the feature maps of x at that layer as F and those of p as P. The content loss at layer l is thus simply:
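
The equation image from the original article did not survive on this page. As a reconstruction in the notation above, following the formulation in the original NST paper by Gatys et al., the content loss at layer l is the squared-error distance between the two sets of feature maps, where the entry with indices i, j is read as the activation of filter i at position j:

```latex
\[
\mathcal{L}_{\text{content}}(p, x, l)
  = \frac{1}{2} \sum_{i,j} \left( F^{l}_{ij} - P^{l}_{ij} \right)^{2}
\]
```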

To get the total content loss, simply sum these terms over all the layers you use for content (the original paper uses a single layer).
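
A minimal NumPy sketch of the content loss, assuming the feature maps of each layer have already been extracted and flattened into an array of shape (number of filters, number of positions). The function names and the tiny example arrays are hypothetical; the factor 1/2 follows the original paper's convention.

```python
import numpy as np

def content_loss(F, P):
    """Half the squared distance between two sets of feature maps at one layer."""
    return 0.5 * np.sum((F - P) ** 2)

def total_content_loss(features_x, features_p):
    """Sum the per-layer content losses over the chosen layers."""
    return sum(content_loss(F, P) for F, P in zip(features_x, features_p))

# Toy feature maps for one layer: 2 filters, 2 positions each.
F = np.array([[1.0, 2.0], [3.0, 4.0]])  # from the intermediate image x
P = np.array([[1.0, 0.0], [3.0, 2.0]])  # from the content image p

print(content_loss(F, P))                 # 4.0
print(total_content_loss([F, F], [P, P])) # 8.0
```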

Let us now look at the style loss. Here the style can be loosely defined as the correlation between different feature maps. For our intermediate image x, let us define G:

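
The defining equation is missing from this page. In the original NST paper, G is the Gram matrix of the feature maps at layer l: its entry for filters i and j is the inner product of the two flattened feature maps, summed over positions k. Reconstructed in that notation:

```latex
\[
G^{l}_{ij} = \sum_{k} F^{l}_{ik} \, F^{l}_{jk}
\]
```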

Similarly, we can define a matrix A for the style image T, computed in the same way from its feature maps. Then our style loss at layer l can be defined as:
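
The equation itself was lost with the image. Following the original paper, with A taken as the Gram matrix of the style image's feature maps, N_l as the number of filters at layer l, and M_l as the number of positions in each feature map (these two symbols come from the paper, not from this article), the per-layer style loss can be written as:

```latex
\[
E_{l} = \frac{1}{4 N_{l}^{2} M_{l}^{2}}
        \sum_{i,j} \left( G^{l}_{ij} - A^{l}_{ij} \right)^{2}
\]
```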

Summing across all layers, we have the total style loss:

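
With a weight w_l for each layer and the per-layer style loss written as E_l (the symbol used in the original paper), the total style loss reads:

```latex
\[
\mathcal{L}_{\text{style}} = \sum_{l} w_{l} \, E_{l}
\]
```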

In practice, the weights w above can be set equal for all layers, i.e. 1/(number of layers), or you can make a more refined choice following the original paper.
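
The style loss can be sketched in NumPy as follows, under the same assumption as before that each layer's feature maps are flattened to shape (number of filters, number of positions). The function names are hypothetical, the normalization follows the original paper, and equal layer weights are used when none are given.

```python
import numpy as np

def gram_matrix(F):
    """Correlations between feature maps: entry (i, j) is the dot product of maps i and j."""
    return F @ F.T

def layer_style_loss(F, S):
    """Style loss at one layer between intermediate-image features F and style features S."""
    n, m = F.shape  # n filters, m positions per feature map
    G, A = gram_matrix(F), gram_matrix(S)
    return np.sum((G - A) ** 2) / (4.0 * n**2 * m**2)

def total_style_loss(features_x, features_style, weights=None):
    """Weighted sum of per-layer style losses; defaults to 1/(number of layers) each."""
    if weights is None:
        weights = [1.0 / len(features_x)] * len(features_x)
    return sum(w * layer_style_loss(F, S)
               for w, F, S in zip(weights, features_x, features_style))

F = np.array([[1.0, 0.0], [0.0, 1.0]])  # toy features of the intermediate image
S = np.array([[1.0, 1.0], [0.0, 0.0]])  # toy features of the style image

print(layer_style_loss(F, S))            # 0.03125
print(total_style_loss([F, F], [S, S]))  # 0.03125 (two equal layers, weight 1/2 each)
```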

Ultimately, the total loss function is the weighted sum of the content loss and the style loss: a*L(content) + b*L(style). It is minimized with respect to the pixels of the intermediate image x, which determine F(i,j) at each layer, using your favorite optimizer. The final x will be your result image.
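
To make the optimization loop concrete, here is a deliberately simplified toy: instead of VGG19 activations, the raw pixels themselves serve as the "feature maps" (3 channels, 64 pixels), so the gradient can be written by hand and plain gradient descent can stand in for the optimizer. The images are random, and the values of a, b, and the learning rate are arbitrary choices for this sketch. A real implementation would pass x through the network and rely on a framework's automatic differentiation.

```python
import numpy as np

def gram(F):
    return F @ F.T

def total_loss_and_grad(x, p, A, a, b):
    """a * content loss + b * style loss, with its gradient with respect to x."""
    n, m = x.shape
    G = gram(x)
    content = 0.5 * np.sum((x - p) ** 2)
    style = np.sum((G - A) ** 2) / (4.0 * n**2 * m**2)
    # Hand-derived gradients of the two terms for identity "features".
    grad = a * (x - p) + b * ((G - A) @ x) / (n**2 * m**2)
    return a * content + b * style, grad

rng = np.random.default_rng(0)
p = rng.random((3, 64))          # toy content image: 3 channels, 64 pixels
style_img = rng.random((3, 64))  # toy style image of the same size
A = gram(style_img)

x = p.copy()                     # start from the content image
a, b, lr = 1.0, 1000.0, 0.05
history = []
for _ in range(100):
    loss, grad = total_loss_and_grad(x, p, A, a, b)
    history.append(loss)
    x -= lr * grad               # gradient descent step on the pixels

print(history[0], history[-1])   # the total loss should go down
```

Starting from x = p makes the content loss zero at first, so the early steps trade a little content fidelity for a closer match to the style statistics.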

Results

PyTorch has provided some sample code for neural style transfer that is very easy to follow and experiment with. Please refer to the Further Reading section below for more information on the implementation.

Please note that you will need to make your input image and your style image have the same size before feeding them to the PyTorch implementation.

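
One simple way to meet this requirement is to resize the style image to the content image's height and width. The sketch below does a nearest-neighbor resize in plain NumPy; the helper name `resize_nearest` and the blank placeholder images are made up for illustration. In practice you would more likely use the resize function of an image library, which also gives smoother results.

```python
import numpy as np

def resize_nearest(image, height, width):
    """Nearest-neighbor resize of an (H, W, C) array to (height, width, C)."""
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    return image[rows][:, cols]

content = np.zeros((300, 400, 3), dtype=np.uint8)  # placeholder content image
style = np.zeros((128, 256, 3), dtype=np.uint8)    # placeholder style image

style_resized = resize_nearest(style, content.shape[0], content.shape[1])
print(style_resized.shape)  # (300, 400, 3)
```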

Now for the fun part: as a fan of Giorgio de Chirico, I ran the following experiments, trying to force his artworks into a vaporwave-like style:

Using the left image as the content input and the right image as the style image, we get the following:

Similarly, another of de Chirico’s masterpieces has a different chemistry with the modern illustration style:

This gave the following result:

I encourage you to explore neural style transfer further by following the pointers below, and to give new life to your photos or images by converting them to some unexpected style.