Photo Animation

连理o

Tags: CVPR 2018

Article link: https://blog.csdn.net/weixin_42437114/article/details/129142321

[CVPR 2018] CartoonGAN: Generative Adversarial Networks for Photo Cartoonization
[ISICA 2020] AnimeGAN: A Novel Lightweight GAN for Photo Animation
AnimeGANv2
- Introduction
- Method
AnimeGANv3
References

[CVPR 2018] CartoonGAN: Generative Adversarial Networks for Photo Cartoonization

Chen, Yang, Yu-Kun Lai, and Yong-Jin Liu. “Cartoongan: Generative adversarial networks for photo cartoonization.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.

Introduction

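Existing methods for converting real-world photos into cartoon style give unsatisfactory results. The authors attribute this to two factors: (1) cartoon styles are highly abstracted and simplified; (2) cartoon images typically have clear edges, smooth color shading, and relatively simple textures, which existing texture-descriptor-based loss functions handle poorly.
To address this, the authors propose CartoonGAN, which is trained on unpaired photos and cartoon images. It introduces two new losses, a semantic content loss and an edge-promoting adversarial loss, together with an initialization phase. The authors report that it outperforms existing models in both generation quality and training efficiency.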

CartoonGAN

CartoonGAN architecture

Generator $G$. The generator learns the mapping from the photo manifold $\mathcal P$ to the cartoon manifold $\mathcal C$. The input image first passes through a flat convolution stage, then through two down-convolution blocks that downsample it and extract useful local signals. Eight residual blocks with identical layout then build the content and manifold features, and finally two up-convolution blocks upsample the result (via transposed convolution) to reconstruct the cartoon-style image.
Discriminator $D$. The discriminator judges whether an input image is a real cartoon image. Because this classification task is relatively easy and cartoon style discrimination relies on local features of the image, $D$ is kept lightweight. The input image first passes through flat layers, then two strided convolutional blocks reduce the resolution and extract local features, and finally a feature construction block followed by a $3\times3$ convolutional layer produces the classification result.
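As a rough check on the generator layout described above, the following sketch traces the spatial resolution through each stage. The kernel sizes, strides, paddings, and the 256-pixel input are assumptions chosen for illustration (the text only names the stages); the arithmetic uses the standard output-size formulas for convolution and transposed convolution.

```python
def conv_out(n, k, s, p):
    """Output size of a convolution with kernel k, stride s, padding p."""
    return (n + 2 * p - k) // s + 1


def deconv_out(n, k, s, p, output_padding=0):
    """Output size of a transposed convolution."""
    return (n - 1) * s - 2 * p + k + output_padding


def generator_resolution_trace(n=256):
    """Trace the spatial size through the stages (hyperparameters assumed)."""
    trace = [("input", n)]
    n = conv_out(n, k=7, s=1, p=3)            # flat convolution stage
    trace.append(("flat conv", n))
    for i in (1, 2):                          # two down-convolution blocks
        n = conv_out(n, k=3, s=2, p=1)
        trace.append((f"down-conv {i}", n))
    for i in range(1, 9):                     # eight residual blocks keep the size
        n = conv_out(n, k=3, s=1, p=1)
    trace.append(("residual x8", n))
    for i in (1, 2):                          # two up-convolution blocks
        n = deconv_out(n, k=3, s=2, p=1, output_padding=1)
        trace.append((f"up-conv {i}", n))
    return trace


for stage, size in generator_resolution_trace(256):
    print(f"{stage:12s} {size}")
```

Under these assumptions the residual blocks operate at a quarter of the input resolution, and the two up-convolution blocks restore the original size.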


Loss function

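The overall objective combines the two losses described in the following subsections:

$$\mathcal L(G,D)=\mathcal L_{adv}(G,D)+\omega\,\mathcal L_{con}(G,D)$$

where $\omega$ balances the amount of style transfer against content preservation (the paper sets $\omega=10$).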

Adversarial loss $\mathcal L_{adv}(G,D)$

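In a standard GAN, $D$ only distinguishes synthesized images from real ones, which amounts to measuring the JS divergence between the synthesized and real data distributions. The authors argue that this adversarial loss alone is not enough for cartoonization: clear edges are an important characteristic of cartoon images, yet they occupy only a very small fraction of the whole image, so an output with correct shading but without clear edges may still fool the discriminator.
To solve this, the authors remove the clear edges from the training cartoon images $S_{data}(c)\subset\mathcal C$ to produce a set of cartoon-like images without clear edges, $S_{data}(e)=\{e_i|i=1,...,M\}\subset\mathcal{E}$, and feed the images in this set to the discriminator as additional negative samples.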
In more detail, for each image $c_i\in\mathcal S_{data}(c)$ , we apply the following three steps: (1) detect edge pixels using a standard Canny edge detector, (2) dilate the edge regions, and (3) apply a Gaussian smoothing in the dilated edge regions.
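The three steps can be sketched in NumPy as follows. This is a simplified illustration, not the authors' pipeline: it works on a single grayscale image, and a thresholded gradient magnitude stands in for the Canny detector named in the paper. The function names, threshold, and kernel sizes are all hypothetical.

```python
import numpy as np


def gaussian_kernel(size=5, sigma=1.5):
    """Normalized 2D Gaussian kernel."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def filter2d(img, kernel):
    """Same-size 2D correlation with edge padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + img.shape[0],
                                           dx:dx + img.shape[1]]
    return out


def smooth_edges(gray, edge_thresh=0.2, dilate_size=5, blur_size=5, sigma=1.5):
    """Blur a grayscale image only around its edges."""
    gray = gray.astype(np.float64)
    # (1) Edge pixels: gradient-magnitude threshold (stand-in for Canny).
    gy, gx = np.gradient(gray)
    edges = np.hypot(gx, gy) > edge_thresh
    # (2) Dilation: a pixel joins the region if any neighbor is an edge.
    window = np.ones((dilate_size, dilate_size))
    dilated = filter2d(edges.astype(np.float64), window) > 0
    # (3) Gaussian smoothing applied only inside the dilated region.
    blurred = filter2d(gray, gaussian_kernel(blur_size, sigma))
    return np.where(dilated, blurred, gray), dilated


# Usage: a sharp vertical step is softened; flat areas are left untouched.
img = np.zeros((16, 16))
img[:, 8:] = 1.0
smoothed, region = smooth_edges(img)
```

Pixels outside the dilated edge region are copied through unchanged, so shading and color are preserved while the edges lose their sharpness, which is the property the negative samples are meant to have.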
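The resulting edge-promoting adversarial loss is

$$\mathcal L_{adv}(G,D)=\mathbb E_{c_i\sim S_{data}(c)}[\log D(c_i)]+\mathbb E_{e_j\sim S_{data}(e)}[\log(1-D(e_j))]+\mathbb E_{p_k\sim S_{data}(p)}[\log(1-D(G(p_k)))]$$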
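A minimal NumPy sketch of how the edge-smoothed cartoons enter the adversarial objective as an extra group of negatives. The inputs are assumed to be discriminator probabilities in $(0,1)$; the function name and the `eps` guard are illustrative additions, not part of the paper.

```python
import numpy as np


def edge_promoting_adv_value(d_cartoon, d_edge_smoothed, d_generated, eps=1e-8):
    """Value that D maximizes: real cartoons are positives; edge-smoothed
    cartoons and generated images are both negatives."""
    real_term = np.mean(np.log(d_cartoon + eps))
    edge_term = np.mean(np.log(1.0 - d_edge_smoothed + eps))
    fake_term = np.mean(np.log(1.0 - d_generated + eps))
    return real_term + edge_term + fake_term


# A discriminator that rejects edge-smoothed cartoons scores higher than
# one that is fooled by them.
good = edge_promoting_adv_value(np.array([0.9]), np.array([0.1]), np.array([0.1]))
fooled = edge_promoting_adv_value(np.array([0.9]), np.array([0.9]), np.array([0.1]))
print(good > fooled)
```

Because the discriminator is rewarded for rejecting images that differ from real cartoons only by their blurred edges, the generator can no longer succeed with correct shading alone.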

Content loss $\mathcal L_{con}(G,D)$

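The content loss ensures that the image produced by the generator retains the content of the input photo. The authors build it from the feature map $VGG_l$ output by layer 'conv4_4' of a pre-trained VGG model:

$$\mathcal L_{con}(G,D)=\mathbb E_{p_i\sim S_{data}(p)}\left[\|VGG_l(G(p_i))-VGG_l(p_i)\|_1\right]$$

Note that the authors use an L1 loss here. After cartoon stylization, the content feature maps extracted by VGG also change substantially, mainly in local regions (e.g., cartoonized images have clear edges and smooth shading), and an L1 loss copes with such changes better than an L2 loss.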
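To illustrate why an L1 penalty tolerates large but localized feature changes better than L2, the sketch below compares the two on toy arrays. The arrays merely stand in for VGG 'conv4_4' activations; their shapes and values are made up for the example.

```python
import numpy as np


def content_loss_l1(feat_generated, feat_photo):
    """Mean absolute difference between two feature maps."""
    return np.mean(np.abs(feat_generated - feat_photo))


def content_loss_l2(feat_generated, feat_photo):
    """Mean squared difference, shown only for comparison."""
    return np.mean((feat_generated - feat_photo) ** 2)


# Toy features of shape (C, H, W): 1% of the entries change by 10 units,
# mimicking a strong local change such as a newly sharpened edge.
feat_photo = np.zeros((4, 10, 10))
feat_cartoon = feat_photo.copy()
feat_cartoon[0, 0, :4] = 10.0

l1 = content_loss_l1(feat_cartoon, feat_photo)
l2 = content_loss_l2(feat_cartoon, feat_photo)
print(l1, l2)
```

In this toy case the L2 value is ten times the L1 value, because squaring amplifies the few large deviations; L1 grows only linearly with them.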

Initialization phase

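To speed up convergence, the authors propose an initialization phase: the generator is first trained with the content loss only. The paper shows the generator's reconstruction results after 10 epochs of this phase.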
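The schedule can be expressed as a small helper that decides which loss terms drive the generator update at a given epoch. This is only a sketch: the function name is hypothetical, and the default of 10 initialization epochs is an assumption taken from the reconstruction example mentioned above.

```python
def active_losses(epoch, init_epochs=10):
    """Loss terms used to update G at a 0-based epoch index.

    During the initialization phase only the content loss is used;
    afterwards the adversarial loss is added.
    """
    if epoch < init_epochs:
        return ("content",)
    return ("adversarial", "content")


# Usage with a short two-epoch initialization phase.
schedule = [active_losses(e, init_epochs=2) for e in range(4)]
for epoch, terms in enumerate(schedule):
    print(epoch, terms)
```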

Experiments

Different artists have their unique cartoon styles, which can be effectively learned by CartoonGAN.
Comparison with state of the art
Roles of components in loss function

[ISICA 2020] AnimeGAN: A Novel Lightweight GAN for Photo Animation

Chen, Jie, Gang Liu, and Xin Chen. “Animegan: A novel lightweight gan for photo animation.” International symposium on intelligence computation and applications. Springer, Singapore, 2020.
Online access: https://animegan.js.org/
github (tensorflow): https://github.com/TachibanaYoshino/AnimeGAN
github (pytorch): https://github.com/ptran1203/pytorch-animeGAN
Zhihu: https://zhuanlan.zhihu.com/p/76574388

Introduction

AnimeGAN improves on CartoonGAN: a new generator architecture makes the model more lightweight, and revised loss functions further improve the quality of the generated images.

AnimeGAN

AnimeGAN Architecture

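Generator $G$. The generator is a symmetric encoder-decoder that learns the mapping from the photo manifold $\mathcal P$ to the cartoon manifold $\mathcal C$. Its Down-Conv downsampling module avoids the loss of feature information caused by max pooling; its Up-Conv upsampling module avoids the checkerboard artifacts that transposed convolution can introduce in synthesized images; the IRB (inverted residual block) module uses far fewer parameters than a standard residual block; and a final $1\times1$ convolution with tanh activation outputs the cartoon image.
Discriminator $D$. The discriminator judges whether an input image is a real cartoon image. Its architecture is the same as in CartoonGAN.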
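The parameter saving of an inverted residual block can be estimated with simple counting. The sketch below assumes a standard residual block with two $3\times3$ convolutions and an inverted residual block made of a $1\times1$ expansion, a $3\times3$ depthwise convolution, and a $1\times1$ projection; the channel width of 256 and the expansion factor of 2 are illustrative assumptions, and biases and normalization parameters are ignored.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution without bias."""
    return (c_in // groups) * c_out * k * k


def residual_block_params(c):
    """Standard residual block: two 3x3 convolutions at width c."""
    return 2 * conv_params(c, c, 3)


def inverted_residual_params(c, expansion=2):
    """Inverted residual block: 1x1 expand, 3x3 depthwise, 1x1 project."""
    hidden = c * expansion
    expand = conv_params(c, hidden, 1)
    depthwise = conv_params(hidden, hidden, 3, groups=hidden)
    project = conv_params(hidden, c, 1)
    return expand + depthwise + project


standard = residual_block_params(256)
inverted = inverted_residual_params(256, expansion=2)
print(standard, inverted, round(standard / inverted, 2))
```

With these assumed settings the inverted block needs roughly a quarter of the weights, since the expensive $3\times3$ filtering is done per channel rather than across all channel pairs.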


Loss Function

Content loss $L_{con}(G,D)$

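$$L_{con}(G,D)=\mathbb E_{p_i\sim S_{data}(p)}\left[\|VGG_l(p_i)-VGG_l(G(p_i))\|_1\right]$$

where $S_{data}(p)$ is the data distribution of photos and $VGG_l$ denotes the semantic image features extracted by layer $l$ of a pre-trained VGG19 (the authors use the feature map output by "conv4-4").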

Grayscale Style loss $L_{gra}(G,D)$

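So that the generated images pick up the anime style in their textures and lines (in the paper's words, it "makes the generated images have the clear anime style on the textures and lines"), the authors convert every animation image in the training set $S_{data}(a)$ to grayscale, giving $S_{data}(x)$, and then compute the following grayscale style loss:

$$L_{gra}(G,D)=\mathbb E_{p_i\sim S_{data}(p)}\,\mathbb E_{x_i\sim S_{data}(x)}\left[\|Gram(VGG_l(G(p_i)))-Gram(VGG_l(x_i))\|_1\right]$$

where Gram is the Gram matrix of the output feature map. If the $C$ channels of the feature map are viewed as $C$ feature vectors $f_i\in\mathbb{R}^{hw}$, $i\in\{1,2,...,C\}$, then the Gram matrix is the $C\times C$ matrix of inner products between these vectors, which captures the correlations between features.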
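The Gram matrix and the resulting style loss can be sketched in NumPy as below. The feature arrays stand in for VGG activations of shape $(C, h, w)$; the normalization by $C\,h\,w$ is an assumption of this sketch, since normalization conventions vary between implementations.

```python
import numpy as np


def gram_matrix(feat):
    """feat: (C, H, W) feature map -> (C, C) matrix of channel inner products."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)        # one flattened vector per channel
    return f @ f.T / (c * h * w)      # normalization choice is an assumption


def grayscale_style_loss(feat_generated, feat_gray_anime):
    """L1 distance between the Gram matrices of two feature maps."""
    diff = gram_matrix(feat_generated) - gram_matrix(feat_gray_anime)
    return np.mean(np.abs(diff))


# The Gram matrix ignores where a feature occurs: shuffling spatial
# positions (identically for all channels) leaves the loss at zero.
rng = np.random.default_rng(0)
feat = rng.standard_normal((3, 4, 4))
perm = rng.permutation(16)
shuffled = feat.reshape(3, 16)[:, perm].reshape(3, 4, 4)
print(grayscale_style_loss(feat, shuffled))
```

This spatial invariance is why Gram-based losses describe texture statistics rather than image layout, leaving the layout to the content loss.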

Color Reconstruction loss $L_{col}(G,D)$

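The grayscale style loss may push the generated images toward grayscale, so an additional loss is needed to make the cartoon image keep the color information of the input. To this end, the authors first convert the images from RGB to YUV ("Y" is the luminance, i.e., the gray level, while "U" and "V" are the chrominance components, which describe the hue and saturation of a pixel) and then compute the following loss:

$$L_{col}(G,D)=\mathbb E_{p_i\sim S_{data}(p)}\left[\|Y(G(p_i))-Y(p_i)\|_1+\|U(G(p_i))-U(p_i)\|_H+\|V(G(p_i))-V(p_i)\|_H\right]$$

Here the Y channel uses an L1 loss, and the U and V channels use the Huber loss (denoted $\|\cdot\|_H$).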
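A NumPy sketch of the color reconstruction loss follows. The RGB-to-YUV matrix uses the common BT.601 coefficients and the Huber threshold `delta=1.0` is a default chosen for illustration; neither is confirmed by the text, and the function names are hypothetical.

```python
import numpy as np

# BT.601 RGB -> YUV coefficients (assumed; rows produce Y, U, V).
RGB_TO_YUV = np.array([
    [0.299, 0.587, 0.114],
    [-0.14713, -0.28886, 0.436],
    [0.615, -0.51499, -0.10001],
])


def rgb_to_yuv(img):
    """img: (..., 3) RGB array -> (..., 3) YUV array."""
    return img @ RGB_TO_YUV.T


def huber(residual, delta=1.0):
    """Elementwise Huber penalty: quadratic near zero, linear in the tails."""
    a = np.abs(residual)
    return np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta))


def color_reconstruction_loss(generated_rgb, photo_rgb):
    """L1 on the Y channel plus Huber on the U and V channels."""
    g, p = rgb_to_yuv(generated_rgb), rgb_to_yuv(photo_rgb)
    y_term = np.mean(np.abs(g[..., 0] - p[..., 0]))
    u_term = np.mean(huber(g[..., 1] - p[..., 1]))
    v_term = np.mean(huber(g[..., 2] - p[..., 2]))
    return y_term + u_term + v_term


# A grayscale copy of a photo keeps Y but loses U and V, so it is penalized.
rng = np.random.default_rng(0)
photo = rng.uniform(size=(4, 4, 3))
gray = np.repeat(rgb_to_yuv(photo)[..., :1], 3, axis=-1)
print(color_reconstruction_loss(gray, photo))
```

The usage lines show the intended effect: an output that matches the photo's luminance but drops its chrominance still incurs a positive loss.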

Adversarial loss

In order to enable AnimeGAN to generate the higher quality images and make the training of the entire network more stable, the least squares loss function in LSGAN is employed as the adversarial loss $L_{adv}(G, D)$ .

Total loss

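$$L(G,D)=\omega_{adv}L_{adv}(G,D)+\omega_{con}L_{con}(G,D)+\omega_{gra}L_{gra}(G,D)+\omega_{col}L_{col}(G,D)$$

$$L(G)=\omega_{adv}\,\mathbb E_{p_i\sim S_{data}(p)}\left[(D(G(p_i))-1)^2\right]+\omega_{con}L_{con}(G,D)+\omega_{gra}L_{gra}(G,D)+\omega_{col}L_{col}(G,D)$$

$$L(D)=\omega_{adv}\Big[\mathbb E_{a_i\sim S_{data}(a)}\left[(D(a_i)-1)^2\right]+\mathbb E_{p_i\sim S_{data}(p)}\left[(D(G(p_i)))^2\right]+\mathbb E_{x_i\sim S_{data}(x)}\left[(D(x_i))^2\right]+0.1\,\mathbb E_{y_i\sim S_{data}(y)}\left[(D(y_i))^2\right]\Big]$$

where $\omega_{adv} = 300$, $\omega_{con} = 1.5$, $\omega_{gra} = 3$, and $\omega_{col} = 10$. The authors also adopt the edge-promoting adversarial loss from CartoonGAN (the fourth term of $L(D)$), where $S_{data}(y)$ is the set of grayscale images obtained by removing the edges from the animation images. In addition, they add a grayscale adversarial loss (the third term of $L(D)$) to keep the output images from approaching grayscale. In the discriminator loss, the first term covers the positive samples (label 1) and the remaining three terms cover the negative samples (label 0).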
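The least-squares form of the generator and discriminator objectives can be sketched as follows, using the loss weights quoted above. The inputs are assumed to be raw discriminator outputs for each batch; the function names are hypothetical, and the `edge_scale=0.1` factor on the edge-smoothed term reflects my reading of the paper and should be treated as an assumption.

```python
import numpy as np


def discriminator_loss(d_anime, d_generated, d_gray, d_smooth_gray,
                       w_adv=300.0, edge_scale=0.1):
    """Least-squares D loss: one positive term and three negative terms."""
    real = np.mean((d_anime - 1.0) ** 2)      # real anime images -> 1
    fake = np.mean(d_generated ** 2)          # generated images -> 0
    gray = np.mean(d_gray ** 2)               # grayscale anime -> 0
    smooth = np.mean(d_smooth_gray ** 2)      # edge-smoothed grayscale -> 0
    return w_adv * (real + fake + gray + edge_scale * smooth)


def generator_loss(d_generated, l_con, l_gra, l_col,
                   w_adv=300.0, w_con=1.5, w_gra=3.0, w_col=10.0):
    """Weighted sum of the adversarial, content, style, and color terms."""
    adv = np.mean((d_generated - 1.0) ** 2)   # G wants D(G(p)) -> 1
    return w_adv * adv + w_con * l_con + w_gra * l_gra + w_col * l_col


# A discriminator that labels every group correctly has zero loss.
print(discriminator_loss(np.ones(4), np.zeros(4), np.zeros(4), np.zeros(4)))
# A generator that never fools D, with unit auxiliary losses.
print(generator_loss(np.zeros(4), 1.0, 1.0, 1.0))
```

The auxiliary losses `l_con`, `l_gra`, and `l_col` are passed in as precomputed scalars here so that the sketch stays independent of any particular feature extractor.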

Training

The authors adopt the initialization phase from CartoonGAN, training the generator with the content loss alone for one epoch.
For the weight of each convolutional layer, the spectral normalization is used to make the network training more stable.
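Spectral normalization divides a layer's weight by an estimate of its largest singular value, usually obtained by power iteration. The NumPy sketch below shows the idea on a single convolution kernel; it runs many iterations from a fresh random vector for clarity, whereas practical implementations typically keep the vector between training steps and run one iteration per step. The function name and arguments are illustrative.

```python
import numpy as np


def spectral_normalize(weight, n_iters=50, seed=0):
    """weight: conv kernel of shape (out, in, kh, kw).

    Returns the weight divided by its estimated largest singular value,
    together with that estimate.
    """
    w = weight.reshape(weight.shape[0], -1)   # flatten to a 2D matrix
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(w.shape[0])
    for _ in range(n_iters):                  # power iteration
        v = w.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = w @ v
        u /= np.linalg.norm(u) + 1e-12
    sigma = u @ w @ v                         # estimated spectral norm
    return weight / sigma, sigma


# Usage: after normalization the largest singular value is close to 1.
kernel = np.random.default_rng(1).standard_normal((8, 4, 3, 3))
kernel_sn, sigma = spectral_normalize(kernel, n_iters=500)
print(np.linalg.svd(kernel_sn.reshape(8, -1), compute_uv=False)[0])
```

Bounding the spectral norm of each layer limits how sharply the network output can change with its input, which is the stabilizing effect referred to above.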

Experiments

Results
Ablation Study

AnimeGANv2

project page: https://tachibanayoshino.github.io/AnimeGANv2/
github (tensorflow): https://github.com/TachibanaYoshino/AnimeGANv2
github (pytorch): https://github.com/TachibanaYoshino/AnimeGANv2

Introduction

The improvement directions of AnimeGANv2 mainly include the following 4 points:

1. Solve the problem of high-frequency artifacts in the generated images.
2. Make the model easy to train, so that the results shown in the paper can be reproduced directly.
3. Further reduce the number of parameters of the generator network (generator size: 8.17 Mb); the lite version has an even smaller generator.
4. Use new high-quality style data, taken from BD movies wherever possible.


Method

The problem of high-frequency artifacts. Instance normalization is generally regarded as the best normalization method in style transfer. It can make different channels in the feature map have different feature properties, thereby promoting the diversity of styles in the images generated by the model. Layer normalization can make different channels in the feature map have the same distribution of feature properties, which can effectively prevent the generation of local noise. Because AnimeGAN uses instance normalization, however, it tends to produce high-frequency artifacts, so the authors replaced every instance normalization layer with layer normalization.
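The difference between the two normalizations comes down to which axes the statistics are computed over. The NumPy sketch below omits the learnable scale and shift for brevity and uses toy data; it illustrates the mechanics only and is not taken from the AnimeGANv2 code.

```python
import numpy as np


def instance_norm(x, eps=1e-5):
    """x: (N, C, H, W). Statistics per sample and per channel."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def layer_norm(x, eps=1e-5):
    """x: (N, C, H, W). Statistics per sample, shared by all channels."""
    mean = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


# Toy input: channel 0 carries weak noise, channel 1 a strong signal.
rng = np.random.default_rng(0)
x = np.empty((1, 2, 8, 8))
x[:, 0] = 0.1 * rng.standard_normal((1, 8, 8))
x[:, 1] = 10.0 * rng.standard_normal((1, 8, 8))

print(instance_norm(x).std(axis=(2, 3)))   # both channels rescaled to ~1
print(layer_norm(x).std(axis=(2, 3)))      # relative scale is preserved
```

In this toy case instance normalization rescales the weak channel to the same magnitude as the strong one, while layer normalization applies one shared scale and leaves the weak channel weak.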
Generator structure. The generator parameter size of AnimeGANv2 is 8.6MB, and the generator parameter size of AnimeGAN is 15.8MB.
Anime style datasets