Slow and Arbitrary Style Transfer

Neural Style Transfer, Evolution

Introduction

Style transfer is the technique of combining two images, a content image and a style image, such that the generated image displays the properties of both its constituents. The goal is to generate an image that is similar in style (e.g., color combinations, brush strokes) to the style image and exhibits structural resemblance (e.g., edges, shapes) to the content image.

In this post, we describe an optimization-based approach proposed by Gatys et al. in their seminal work, “Image Style Transfer Using Convolutional Neural Networks”. But, let us first look at some of the building blocks that lead to the ultimate solution.

What are CNNs learning?

At the outset, you can imagine low-level features as features visible in a zoomed-in image. In contrast, high-level features can be best viewed when the image is zoomed-out. Now, how does a computer know how to distinguish between these details of an image? CNNs, to the rescue.

Learned filters of pre-trained convolutional neural networks are excellent general-purpose image feature extractors. Different layers of a CNN extract features at different scales. A hidden unit in a shallow layer, which sees only a relatively small part of the input image, extracts low-level features like edges, colors, and simple textures. Deeper layers, with their wider receptive fields, tend to extract high-level features such as shapes, patterns, intricate textures, and even objects.

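To make this concrete, here is a minimal sketch of using a frozen, pre-trained network as a multi-scale feature extractor. PyTorch and torchvision are assumptions here (the post names no framework), and the layer indices are illustrative picks of shallow-to-deep conv activations in VGG-19:

    import torch
    from torchvision.models import vgg19

    # Frozen, pre-trained VGG-19 used purely as a feature extractor.
    net = vgg19(pretrained=True).features.eval()
    for p in net.parameters():
        p.requires_grad_(False)

    x = torch.rand(1, 3, 224, 224)   # stand-in for a normalized input image
    activations = {}
    for i, layer in enumerate(net):
        x = layer(x)
        if i in {1, 6, 11, 20, 29}:  # shallow -> deep conv activations
            activations[i] = x

    for i, a in activations.items():
        print(i, tuple(a.shape))     # e.g. 1 (1, 64, 224, 224) for the shallowest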

So, how can we leverage these feature extractors for style transfer?

Deep Image Representations

In a convolutional neural network, a layer with N distinct filters (or C channels) has N (or C) feature maps, each of size H×W, where H and W are the height and width of the feature activation map respectively. The feature activation for this layer is a volume of shape N×H×W (or C×H×W). Let's see how to use these activations to separate content and style information from individual images.

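As a quick illustration of this layout (a sketch, again assuming PyTorch), a single convolutional layer maps an image to exactly such a volume:

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
    a = conv(torch.rand(1, 3, 224, 224))  # one image through one conv layer
    print(a.shape)  # torch.Size([1, 64, 224, 224]): N=64 feature maps of size H×W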

Content Representation

Traditionally, the similarity between two images is measured using L1/L2 loss functions in pixel-space. While these losses capture low-level similarity well, they miss the perceptual difference between images. For instance, two identical images offset from each other by a single pixel will have a high per-pixel loss despite being perceptually indistinguishable.

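A toy check of this claim (assuming PyTorch; the sizes are arbitrary): shifting an image by one pixel leaves it perceptually identical but still produces a sizeable per-pixel L2 loss.

    import torch
    import torch.nn.functional as F

    img = torch.rand(1, 3, 224, 224)
    shifted = torch.roll(img, shifts=1, dims=-1)  # offset by a single pixel
    print(F.mse_loss(img, shifted))  # sizeable, though nothing perceptual changed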

Intuitively, if the convolutional feature activations of two images are similar, they should be perceptually similar. Therefore, we refer to the feature responses of the network as the content representation, and the difference between feature responses for two images is called the perceptual loss. To find the content reconstruction of an original content image, we can perform gradient descent on a white noise image that triggers similar feature responses.

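A sketch of this reconstruction procedure is given below. The setup assumes PyTorch/torchvision; the mid-network layer index and the extract helper are illustrative choices, not code from the original post.

    import torch
    from torchvision.models import vgg19

    net = vgg19(pretrained=True).features.eval()
    for p in net.parameters():
        p.requires_grad_(False)

    def extract(x, layer_idx=20):
        # Hypothetical helper: return the activations of one chosen layer.
        for i, layer in enumerate(net):
            x = layer(x)
            if i == layer_idx:
                return x

    content = torch.rand(1, 3, 224, 224)  # stand-in for the content image
    target = extract(content).detach()    # fixed content representation

    g = torch.rand(1, 3, 224, 224, requires_grad=True)  # white noise image
    opt = torch.optim.Adam([g], lr=0.05)
    for _ in range(200):
        opt.zero_grad()
        loss = ((extract(g) - target) ** 2).sum()  # match feature responses
        loss.backward()
        opt.step()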

The scales of features captured by different layers of the network can be visualized by generating content reconstructions that match only the feature responses of a particular layer (refer Fig 2). Reconstructions from lower layers are almost perfect (a, b, c). In higher layers of the network, detailed pixel information is lost while high-level content is preserved (d, e).

Fig 2: Content and Style Representations. Image taken from “[R1] Image Style Transfer Using Convolutional Neural Networks”

Style Representation

To obtain a representation of the style of an input image, a feature space is built on top of the filter responses in each layer of the network. It consists of the correlations between different filter responses over the spatial extent of the feature maps. Mathematically, the correlation between two filter responses can be calculated as a dot product of their activation maps. Formally, the style representation of an image can be captured by a Gram Matrix (refer Fig 3), which captures the correlations of all pairs of feature activations. For N filters in a layer, the Gram Matrix is an N×N matrix.

  1. The diagonal element at location (i, i) of a Gram Matrix measures how active filter i is. Suppose filter i detects vertical edges in the image; then a high value at (i, i) means the image has a lot of vertical edges.

  2. The value at location (i, j) of a Gram Matrix measures the similarity between the activations of two different filters i and j. In other words, it measures how often the features captured by filters i and j occur together.

By capturing the prevalence of different types of features (i, i), as well as how much different features occur together (i, j), the Gram Matrix measures the style of an image. Essentially, by discarding the spatial information stored at each location in the feature activation maps, we can successfully extract style information.

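A common implementation follows directly from the dot-product description above; this is a sketch in PyTorch, and the normalization factor is one of several conventions in use:

    import torch

    def gram_matrix(a):
        # a: feature activations of shape (C, H, W) for a single image.
        C, H, W = a.shape
        f = a.view(C, H * W)            # C rows, one unrolled feature map each
        return f @ f.t() / (C * H * W)  # (C, C): entry (i, j) correlates filters i and j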

Fig 3: Gram Matrix. Image taken from the ‘Convolutional Neural Networks’ course by Andrew Ng

Similar to content reconstructions, style reconstructions can be generated by minimizing the difference between the Gram Matrices of a white noise image and a reference style image (refer Fig 2). This creates images that match the style of the given image on an increasing scale while discarding information about the global arrangement of the scene.

Now that we have all the key ingredients for defining our loss functions, let’s jump straight into it.

Loss Functions

Let C, S, and G be the original content image, the original style image, and the generated image, and aᶜ, aˢ, and aᴳ their respective feature activations from a layer l of a pre-trained CNN.

C,SG为原始内容图像,原始样式图像和生成的图像,以及aᶜ,aˢaᴳ分别来自预训练的CNN的第1层的特征激活。

Content Loss

The content loss, as described in Fig 4, can be defined as the squared-error loss between the feature representations of the content image and the generated image. Along the processing hierarchy of a CNN, the input image is transformed into representations that are increasingly sensitive to the actual content of the image but become relatively invariant to its precise appearance. In practice, we can best capture the content of an image by choosing a layer l somewhere in the middle of the network.

Fig 4: Content Loss
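
In code, a minimal version of this loss might look as follows (a sketch: the activations a_c and a_g are assumed to come from the same layer l of the frozen network, and the 1/2 factor follows the paper's formulation):

    import torch

    def content_loss(a_c, a_g):
        # Squared error between content and generated feature responses.
        return 0.5 * ((a_g - a_c) ** 2).sum()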

Style Loss

The style loss, as described in Fig 5, can be defined as the squared-error loss between Gram Matrices of the style and the generated image. We generally take a weighted contribution of style loss across multiple layers of the pre-trained network.

Fig 5: Style Loss
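
A sketch of this weighted, multi-layer loss, with the Gram Matrix helper inlined; the per-layer weights are an assumption (the paper weights its chosen layers equally):

    import torch

    def gram(a):                        # same Gram Matrix as the earlier sketch
        C, H, W = a.shape
        f = a.view(C, H * W)
        return f @ f.t() / (C * H * W)

    def style_loss(acts_s, acts_g, weights):
        # acts_*: per-layer (C, H, W) activations of the style and generated
        # images; weights: one scalar contribution per layer.
        return sum(w * ((gram(a_g) - gram(a_s)) ** 2).sum()
                   for a_s, a_g, w in zip(acts_s, acts_g, weights))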

Connecting the dots

Combining the separate content and style losses, the final loss formulation is defined in Fig 6. We start with a random image G and iteratively optimize it to match the content of image C and the style of image S, while keeping the weights of the pre-trained feature extractor network fixed.

Fig 6: Style Transfer Loss
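
Putting the pieces together, a minimal optimization loop might look like the one below. Everything named here (content_img, the extract_* helpers, the precomputed targets, layer_weights, and the alpha/beta values) is a hypothetical stand-in built on the sketches above rather than the paper's exact configuration; the paper itself optimizes with L-BFGS, which PyTorch provides:

    import torch

    # content_img, extract_content/extract_style, target_content, target_style,
    # and layer_weights: hypothetical, defined as in the earlier sketches.
    alpha, beta = 1.0, 1e3                  # illustrative content/style weighting
    g = torch.rand_like(content_img, requires_grad=True)  # random image G
    opt = torch.optim.LBFGS([g])

    def closure():
        opt.zero_grad()
        loss = (alpha * content_loss(extract_content(g), target_content)
                + beta * style_loss(extract_style(g), target_style, layer_weights))
        loss.backward()
        return loss

    for _ in range(50):
        opt.step(closure)                   # LBFGS re-evaluates via the closure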

In conclusion, it is important to note that, though the optimization process is slow, this method allows style transfer between any arbitrary pair of content and style images.

Translated from: https://towardsdatascience.com/slow-and-arbitrary-style-transfer-3860870c8f0e
