SinGAN: Learning a Generative Model from a Single Natural Image

宁弈

Article link: https://blog.csdn.net/qq_44516871/article/details/103430333

Abstract

SinGAN is an unconditional generative model that can be learned from a single natural image.
SinGAN contains a pyramid of fully convolutional GANs and can be used to generate diverse images.
This is achieved by a pyramid of fully convolutional light-weight GANs, each responsible for capturing the distribution of patches at a different scale. Once trained, SinGAN can produce diverse high-quality image samples (of arbitrary dimensions), which semantically resemble the training image, yet contain new object configurations and structures.

Introduction

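Modeling the internal distribution of the patches that make up a single image has long been used in tasks such as denoising, deblurring, super-resolution, dehazing, and image inpainting.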

Method

Our goal is to learn an unconditional generative model that captures the internal statistics of a single training image $x$. This task is conceptually similar to the conventional GAN setting, except that here the training samples are patches of a single image, rather than whole image samples from a database.

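As the figure below shows, to generate richer content, our generative framework consists of a hierarchy of patch-GANs, each responsible for capturing the patch distribution of the image $x$ at a different scale.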

[Figure: SinGAN's multi-scale pipeline]

2.1 Multi-scale architecture

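The model consists of a pyramid of generators $\{G_0, \ldots, G_N\}$, trained against an image pyramid of $x$: $\{x_0, \ldots, x_N\}$, where $x_n$ is a version of $x$ downsampled by a factor $r^n$, with $r > 1$. Each generator $G_n$ is responsible for producing image samples that match the patch distribution of the corresponding image $x_n$. $G_n$ tries to fool its associated discriminator $D_n$, whose role is to distinguish the generated samples from $x_n$.
The generation of an image sample starts at the coarsest scale and sequentially passes through all generators up to the finest scale, with noise injected at every scale. The generator and the discriminator at each scale have the same receptive field. Generation proceeds bottom-up, from coarse to fine, with upsampling between scales.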
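
To make the image pyramid concrete, here is a minimal sketch that computes the spatial size of each $x_n$ from the size of $x$ and the factor $r$. The function name `pyramid_shapes`, the rounding rule, and the choice $r = 2$ in the usage lines are assumptions made for illustration; they are not taken from the paper.

```python
def pyramid_shapes(height, width, r, num_scales):
    """Return [(h_0, w_0), ..., (h_N, w_N)], where scale n is downsampled by r**n."""
    if r <= 1:
        raise ValueError("the scale factor r must be greater than 1")
    shapes = []
    for n in range(num_scales):
        factor = r ** n
        # Round to the nearest pixel and never drop below one pixel per side.
        h = max(1, round(height / factor))
        w = max(1, round(width / factor))
        shapes.append((h, w))
    return shapes


# Index 0 is the finest scale x_0; the last entry is the coarsest scale x_N.
shapes = pyramid_shapes(height=256, width=192, r=2.0, num_scales=4)
print(shapes)
```

With these inputs the last entry is the 32 x 24 image that the coarsest generator $G_N$ would be trained on.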
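1) Generation at the coarsest scale:
$$\tilde{x}_N = G_N(z_N)$$
The effective receptive field at this scale is about 1/2 of the whole image, so the GAN at this scale generates the overall layout and global structure of the image.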

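2) Moving up from the coarsest scale, each generator adds details that the previous scale did not produce (the finer the scale, the richer the detail).
That is:
$$\tilde{x}_n = G_n\left(z_n, (\tilde{x}_{n+1})\uparrow^r\right), \quad n < N$$
where $z_n$ is the spatial noise map at scale $n$, and the second argument is the upsampled version of the image from the adjacent coarser scale.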

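All generators share a similar architecture, as shown in the figure below:
[Figure: single-scale generation]
From the figure, the operation performed by $G_n$ can be written explicitly as:

$$\tilde{x}_n = (\tilde{x}_{n+1})\uparrow^r + \psi_n\left(z_n + (\tilde{x}_{n+1})\uparrow^r\right)$$

where $\psi_n$ is the fully convolutional network inside $G_n$.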
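
The coarse-to-fine sampling procedure of this section can be sketched in plain Python. This is only an illustration under stated assumptions: `toy_psi` is a hypothetical stand-in for the trained convolutional network inside each generator (here a fixed horizontal blur, shared by all scales), images are single-channel nested lists, upsampling is nearest-neighbour, and the noise levels are arbitrary.

```python
import random


def upsample_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list to (out_h, out_w)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def make_noise(h, w, sigma, rng):
    """Spatial noise map z_n with i.i.d. Gaussian entries."""
    return [[rng.gauss(0.0, sigma) for _ in range(w)] for _ in range(h)]


def toy_psi(img):
    """Hypothetical stand-in for the network in G_n: a damped 3-tap horizontal blur."""
    w = len(img[0])
    return [[0.5 * (row[max(j - 1, 0)] + row[j] + row[min(j + 1, w - 1)]) / 3.0
             for j in range(w)]
            for row in img]


def generate(shapes, psi, sigma=0.1, seed=0):
    """Sample one image. shapes[0] is the finest scale, shapes[-1] the coarsest."""
    rng = random.Random(seed)
    h, w = shapes[-1]
    # Coarsest scale: the sample is produced from noise alone.
    x = psi(make_noise(h, w, 1.0, rng))
    # Finer scales: upsample the previous result, add noise, add the residual.
    for h, w in reversed(shapes[:-1]):
        up = upsample_nearest(x, h, w)
        z = make_noise(h, w, sigma, rng)
        mixed = [[z[i][j] + up[i][j] for j in range(w)] for i in range(h)]
        residual = psi(mixed)
        x = [[up[i][j] + residual[i][j] for j in range(w)] for i in range(h)]
    return x


sample = generate([(16, 16), (8, 8), (4, 4)], toy_psi)
print(len(sample), len(sample[0]))
```

Because each finer scale only adds a residual on top of the upsampled coarser image, the global layout is decided at the coarsest scale and later scales refine it. Changing `seed` changes every noise map and therefore the sample.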

2.2 Training

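Training is sequential, from the coarsest scale to the finest (bottom-up). Once a GAN has been trained, it is kept fixed. The training loss for the $n$-th GAN can be written as:
$$\min_{G_n} \max_{D_n} \mathcal{L}_{adv}(G_n, D_n) + \alpha \mathcal{L}_{rec}(G_n)$$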
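
The scale-by-scale schedule can be sketched as a simple loop. The names `train_pyramid` and `train_one_scale` are hypothetical; the callback stands for whatever routine actually optimises the GAN at one scale, and the stub below only records the order of the calls.

```python
def train_pyramid(num_scales, train_one_scale):
    """Train scales N, N-1, ..., 0 in that order; return the generators, coarsest first.

    train_one_scale(n, frozen) is a hypothetical callback that trains the GAN at
    scale n. `frozen` holds the already trained coarser generators as a tuple,
    so they can be used for sampling but are not updated.
    """
    trained = []
    for n in range(num_scales - 1, -1, -1):
        generator = train_one_scale(n, tuple(trained))
        trained.append(generator)  # kept fixed from now on
    return trained


# Stub trainer: records (scale index, number of frozen coarser generators).
calls = []


def stub_trainer(n, frozen):
    calls.append((n, len(frozen)))
    return f"G_{n}"


generators = train_pyramid(3, stub_trainer)
print(calls, generators)
```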

Adversarial Loss
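The WGAN-GP loss is used as the adversarial loss; the final discrimination score is the average over the patch discrimination map. The architecture of $D_n$ is the same as that of the network inside $G_n$.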
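
As a small illustration of the averaged patch score, the sketch below computes the Wasserstein part of the critic and generator losses from two score maps. The function names and the numbers are made up for the example, and the gradient penalty term of WGAN-GP is deliberately left out because it requires automatic differentiation.

```python
def mean_patch_score(score_map):
    """Average the per-patch scores of a 2-D discrimination map."""
    total = sum(sum(row) for row in score_map)
    count = sum(len(row) for row in score_map)
    return total / count


def critic_loss(real_map, fake_map):
    """Wasserstein critic loss without the gradient penalty term."""
    return mean_patch_score(fake_map) - mean_patch_score(real_map)


def generator_loss(fake_map):
    """The generator tries to raise the critic's score on generated samples."""
    return -mean_patch_score(fake_map)


# Made-up score maps: one score per overlapping patch location.
real_scores = [[1.0, 3.0], [5.0, 7.0]]
fake_scores = [[0.0, 0.0], [2.0, 2.0]]
d_loss = critic_loss(real_scores, fake_scores)
g_loss = generator_loss(fake_scores)
print(d_loss, g_loss)
```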
Reconstruction Loss
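Here a specific set of input noise maps is chosen:
$$\{z_N^{rec}, z_{N-1}^{rec}, \ldots, z_0^{rec}\} = \{z^*, 0, \ldots, 0\}$$
where $z^*$ is a fixed noise map (kept fixed throughout training).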
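
A minimal sketch of how such a set of reconstruction noise maps could be built: one fixed random map at the coarsest scale and all-zero maps at every finer scale. The function name `reconstruction_noise`, the seed, and the example shapes are assumptions for illustration only.

```python
import random


def reconstruction_noise(shapes, seed=0):
    """Return the noise maps coarsest first: [z*, 0, ..., 0].

    shapes[0] is the finest scale and shapes[-1] the coarsest.
    """
    rng = random.Random(seed)  # fixed seed, so z* is identical on every call
    coarse_h, coarse_w = shapes[-1]
    z_star = [[rng.gauss(0.0, 1.0) for _ in range(coarse_w)]
              for _ in range(coarse_h)]
    maps = [z_star]
    # Every finer scale receives an all-zero noise map.
    for fine_h, fine_w in reversed(shapes[:-1]):
        maps.append([[0.0] * fine_w for _ in range(fine_h)])
    return maps


noise = reconstruction_noise([(16, 16), (8, 8), (4, 4)])
print([(len(z), len(z[0])) for z in noise])
```

Feeding these maps through the generators yields one deterministic output per scale, which is what a reconstruction loss can compare against the training image at that scale.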