Fast and Arbitrary Style Transfer

Summary: The AdaIN style transfer network achieves real-time arbitrary style transfer via adaptive instance normalization. It combines a content image with an arbitrary style image and generates a stylized result through a simple encoder-decoder architecture, avoiding normalization layers in the decoder to preserve flexibility. A content loss and a style loss train the network, ensuring that content is preserved and style is matched. AdaIN transfers style by adjusting the mean and variance of feature activations, with no learnable affine parameters.

Neural Style Transfer, Evolution

Introduction

The seminal work of Gatys et al. [R1] showed that deep neural networks (DNNs) encode not only the content but also the style information of an image. Moreover, the image style and content are somewhat separable: it is possible to change the style of an image while preserving its content. Their approach is flexible enough to combine content and style of arbitrary images. However, it relies on an optimization process that is prohibitively slow.

Fast approximations [R2, R3] with feed-forward neural networks have been proposed to speed up neural style transfer. Unfortunately, the speed improvement comes at a cost: the network is either restricted to a single style or tied to a finite set of styles.

Huang and Belongie [R4] resolve this fundamental flexibility-speed dilemma. It has been known that the convolutional feature statistics of a CNN can capture the style of an image. While Gatys et al. [R1] use the second-order statistics as their optimization objective, Li et al. [R5] showed that matching many other statistics, including the channel-wise mean and variance, is also effective for style transfer. Hence, we can argue that instance normalization performs a form of style normalization by normalizing the feature statistics, namely the mean and variance.

Why not Batch Normalization?

Since BN normalizes the feature statistics of a batch of samples instead of a single sample, it can be intuitively understood as normalizing a batch of samples to be centred around a single style, although different target styles are desired.

On the other hand, IN can normalize the style of each individual sample to the target style: different affine parameters can normalize the feature statistics to different values, thereby normalizing the output image to different styles.

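The difference is easiest to see in the axes over which the statistics are computed. A small illustrative snippet (the tensor shape is our own example, not from the original post):

```python
import torch

x = torch.randn(8, 64, 32, 32)  # feature maps: (batch N, channels C, H, W)

# BatchNorm statistics: one mean per channel, pooled over the whole
# batch -> every sample is pulled toward a single shared "style".
bn_mean = x.mean(dim=(0, 2, 3))  # shape (C,)

# InstanceNorm statistics: one mean per channel *per sample*, so each
# image can be normalized (and re-styled) independently.
in_mean = x.mean(dim=(2, 3))     # shape (N, C)
```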

Adaptive Instance Normalization

AdaIN receives a content input x and a style input y, and simply aligns the channel-wise mean and variance of x to match those of y. Unlike BN, IN, or CIN (Conditional Instance Normalization), AdaIN has no learnable affine parameters. Instead, it adaptively computes the affine parameters from the style input.

Fig 1. Adaptive Instance Normalization
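
Fig 1 shows the AdaIN operation itself. Written out, the formulation from [R4] is:

AdaIN(x, y) = σ(y) · ((x − μ(x)) / σ(x)) + μ(y)

where μ(·) and σ(·) denote the channel-wise mean and standard deviation, computed per sample across the spatial dimensions.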

Intuitively, let us consider a feature channel that detects brushstrokes of a certain style. A style image with such strokes will produce a high average activation for this feature. Moreover, the subtler style information of this particular brushstroke is captured by the variance. Since AdaIN only scales and shifts the activations, the spatial information of the content image is preserved.

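To make this concrete, here is a minimal PyTorch sketch of the AdaIN operation (the function name adain and the eps stabilizer are our own choices; [R4] defines only the math shown in Fig 1):

```python
import torch

def adain(content_feat: torch.Tensor, style_feat: torch.Tensor,
          eps: float = 1e-5) -> torch.Tensor:
    """Align the channel-wise mean/std of content_feat to style_feat.

    Both inputs are feature maps of shape (N, C, H, W).
    """
    # Per-sample, per-channel statistics over the spatial dimensions.
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True)
    # Normalize the content features, then shift/scale them with the
    # style statistics; there are no learnable affine parameters.
    return s_std * (content_feat - c_mean) / c_std + s_mean
```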

Style Transfer Network

The AdaIN style transfer network T (Fig 2) takes a content image c and an arbitrary style image s as inputs, and synthesizes an output image T(c, s) that recombines the content and style of the respective input images. The network adopts a simple encoder-decoder architecture, in which the encoder f is fixed to the first few layers of a pre-trained VGG-19. After encoding the content and style images into the feature space, both feature maps are fed to an AdaIN layer that aligns the mean and variance of the content feature maps to those of the style feature maps, producing the target feature maps t. A randomly initialized decoder g is trained to invert t back to the image space, generating the stylized image T(c, s).

Fig 2. AdaIN-based style transfer network. Image taken from "[R4] Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization". Self-annotations in blue.
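
Putting the pieces together, a minimal sketch of T could look as follows (assumptions on our part: PyTorch with the torchvision ≥ 0.13 weights API, the adain helper sketched above, and an encoder cut at relu4_1 as in [R4]; the decoder module is discussed in the next section):

```python
import torch
import torch.nn as nn
from torchvision import models

class StyleTransferNet(nn.Module):
    """Sketch of T: encode content and style, AdaIN, decode."""

    def __init__(self, decoder: nn.Module):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features
        # Encoder f: fixed to the first few layers of a pre-trained
        # VGG-19 (up to relu4_1); never updated during training.
        self.encoder = nn.Sequential(*list(vgg.children())[:21]).eval()
        for p in self.encoder.parameters():
            p.requires_grad_(False)
        # Decoder g: randomly initialized, trained to invert t.
        self.decoder = decoder

    def forward(self, content: torch.Tensor, style: torch.Tensor):
        # Target feature maps t = AdaIN(f(c), f(s)).
        t = adain(self.encoder(content), self.encoder(style))
        # Stylized image T(c, s) = g(t); t is returned as well because
        # it also serves as the content target in the loss.
        return self.decoder(t), t
```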

Normalization Layers in the Decoder?

Apart from using nearest-neighbor up-sampling to reduce checkerboard effects, and using reflection padding in both f and g to avoid border artifacts, one key architectural choice is to not use normalization layers in the decoder. Since IN normalizes each sample to a single style while BN normalizes a batch of samples to be centred around a single style, both are undesirable when we want the decoder to generate images in vastly different styles.

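To illustrate all three choices in one place, a single decoder stage might be sketched as below (an assumption-level sketch, not the exact decoder of [R4]):

```python
import torch.nn as nn

def decoder_stage(in_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        # Nearest-neighbor up-sampling (rather than transposed
        # convolution) to reduce checkerboard effects.
        nn.Upsample(scale_factor=2, mode="nearest"),
        # Reflection padding (rather than zero padding) to avoid
        # border artifacts.
        nn.ReflectionPad2d(1),
        nn.Conv2d(in_ch, out_ch, kernel_size=3),
        # Deliberately no BN/IN layer here, so the decoder stays free
        # to generate images in vastly different styles.
        nn.ReLU(inplace=True),
    )
```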

Loss Functions

The style transfer network T is trained using a weighted combination of the content loss function Lc and the style loss function Ls.

Fig 3. Content and style loss functions
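
Written out in the notation of [R4], with φᵢ denoting the i-th chosen layer of VGG-19, the two losses shown in Fig 3 are:

Lc = ‖ f(g(t)) − t ‖₂

Ls = Σᵢ ‖ μ(φᵢ(g(t))) − μ(φᵢ(s)) ‖₂ + Σᵢ ‖ σ(φᵢ(g(t))) − σ(φᵢ(s)) ‖₂  (over i = 1 to L)

and the training objective is L = Lc + λ·Ls.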

The content loss is the Euclidean distance between the target features t and the features of the output image, f(g(t)). The AdaIN output t is used as the content target, instead of the commonly used feature responses of the content image, since this aligns with the goal of training the decoder to invert the AdaIN output.

Since the AdaIN layer only transfers the mean and standard deviation of the style features, the style loss only matches these statistics between the feature activations of the style image s and the output image g(t). The style loss is averaged over multiple layers (i = 1 to L) of VGG-19.

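A hedged PyTorch sketch of these losses follows (assumptions on our part: mse_loss as the squared Euclidean distance, a style-layer list such as relu1_1 to relu4_1 with the deepest layer last, and a weight λ = 10; none of these are fixed by the text above):

```python
import torch
import torch.nn.functional as F

def channel_stats(feat: torch.Tensor):
    """Channel-wise mean and std over spatial dims, shape (N, C, 1, 1)."""
    return (feat.mean(dim=(2, 3), keepdim=True),
            feat.std(dim=(2, 3), keepdim=True))

def total_loss(t, out_feats, style_feats, lam: float = 10.0):
    """t: AdaIN target; out_feats / style_feats: VGG activations of the
    output image g(t) and the style image s at layers i = 1..L."""
    # Content loss: distance between f(g(t)) and the AdaIN target t.
    loss_c = F.mse_loss(out_feats[-1], t)
    # Style loss: match only the mean/std statistics, layer by layer.
    loss_s = torch.zeros((), device=t.device)
    for of, sf in zip(out_feats, style_feats):
        o_mean, o_std = channel_stats(of)
        s_mean, s_std = channel_stats(sf)
        loss_s = loss_s + F.mse_loss(o_mean, s_mean) + F.mse_loss(o_std, s_std)
    return loss_c + lam * loss_s
```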

Conclusion

In essence, the AdaIN style transfer network described above provides the flexibility to combine arbitrary content and style images in real time.

Translated from: https://towardsdatascience.com/fast-and-arbitrary-style-transfer-40e29d308dd3
