[Paper Reading] High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs

来日可期1314

Categories: Paper Reading, GAN. Tags: Paper Reading

Article link: https://blog.csdn.net/ssjq123/article/details/129071361

pix2pixHD
bib:

@INPROCEEDINGS{wang2018pix2pixHD,
    author    = {Ting-Chun Wang and Ming-Yu Liu and Jun-Yan Zhu and Andrew Tao and Jan Kautz and Bryan Catanzaro},
    title     = {High-Resolution Image Synthesis and Semantic Manipulation with Conditional {GAN}s},
    booktitle = {CVPR},
    year      = {2018},
    pages     = {8798--8807}
}

1. Abstract

We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs). Conditional GANs have enabled a variety of applications, but the results are often limited to low-resolution and still far from realistic. In this work, we generate 2048 × 1024 visually appealing results with a novel adversarial loss, as well as new multi-scale generator and discriminator architectures. Furthermore, we extend our framework to interactive visual manipulation with two additional features. First, we incorporate object instance segmentation information, which enables object manipulations such as removing/adding objects and changing the object category. Second, we propose a method to generate diverse results given the same input, allowing users to edit the object appearance interactively. Human opinion studies demonstrate that our method significantly outperforms existing methods, advancing both the quality and the resolution of deep image synthesis and editing.

Note:

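First, the problem this paper targets is generating high-quality, high-resolution images. Existing conditional GANs already have many applications, but there is still room for improvement when it comes to high-resolution output.
The paper's two contributions:
- Multi-scale generator and discriminator architectures
- An adversarial loss designed to go with them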

2. Algorithm description

Optimization objective:

$\min_{G}{((\max_{D_1, D_2, D_3}\sum_{k=1, 2, 3}{\mathcal{L}_{GAN}(G, D_k)}) + \lambda \sum_{k=1, 2, 3}\mathcal{L}_{\mathrm{FM}}(G, D_k))}$
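
For reference, the paper defines each adversarial term in this objective as the standard conditional GAN loss, where $\mathbf{s}$ is the semantic label map and $\mathbf{x}$ is the corresponding real image:

$\mathcal{L}_{GAN}(G, D) = \mathbb{E}_{(\mathbf{s}, \mathbf{x})}[\log D(\mathbf{s}, \mathbf{x})] + \mathbb{E}_{\mathbf{s}}[\log(1 - D(\mathbf{s}, G(\mathbf{s})))]$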

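The feature matching loss $\mathcal{L}_{\mathrm{FM}}(G, D_k)$ is defined as:
$\mathcal{L}_{\mathrm{FM}}(G, D_k) = \mathbb{E}_{(\mathbf{s}, \mathbf{x})}\sum_{i =1}^{T}\frac{1}{N_i}[\|D_k^{(i)}(\mathbf{s}, \mathbf{x}) - D_k^{(i)}(\mathbf{s}, G(\mathbf{s}))\|_1]$
Here $D_k^{(i)}$ is the feature extractor formed by the $i$-th layer of discriminator $D_k$, $T$ is the total number of layers, and $N_i$ is the number of elements in layer $i$.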

NOTE:

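The multi-scale design shows up in the three discriminators ( $D_1, D_2, D_3$ ), each of which judges the image at a different scale; this is what pushes the generator toward high-resolution output.
The second point is the added regularization term ( $\mathcal{L}_{\mathrm{FM}}$ ), which matches features extracted from the intermediate layers of the discriminators for real and generated images. My own reading is that it works as a guard against overfitting; the paper itself motivates it as stabilizing training, since the generator has to produce natural statistics at multiple scales.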

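3. Code

I am skipping the experiments: they are quite specialized, and since I do not do research in this area, I am reading this paper mainly to broaden my knowledge. The code is available on Github.

I want to read the code with specific questions in mind, focused on the two contributions from the abstract: (1) the multi-scale design and (2) the regularization term in the loss.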

Multi-scale
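
As a starting point for this question, here is a minimal, dependency-free sketch of the multi-scale idea: build an image pyramid and hand each level to its own discriminator. It is not the repository's code. The function names, the 2x2 average pooling, and the choice that the first discriminator sees the finest scale are all assumptions made for illustration, and the stand-in "discriminator" ignores the label map $\mathbf{s}$ to keep the example short.

```python
def downsample2x(image):
    """Halve height and width by averaging non-overlapping 2x2 blocks.

    Assumption: plain 2x2 average pooling; the real code may pool differently.
    """
    h, w = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, w - 1, 2)
        ]
        for r in range(0, h - 1, 2)
    ]


def build_pyramid(image, num_scales=3):
    """Return copies of a 2D image at full, 1/2, 1/4, ... resolution."""
    pyramid = [image]
    for _ in range(num_scales - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid


def multiscale_scores(image, discriminators):
    """Apply the k-th discriminator to the k-th pyramid level.

    Assumption: index 0 gets the finest scale; the ordering is illustrative.
    """
    pyramid = build_pyramid(image, num_scales=len(discriminators))
    return [d(level) for d, level in zip(discriminators, pyramid)]


# Toy 8x8 single-channel "image" and a hypothetical stand-in discriminator
# that just returns the mean intensity of whatever it is given.
toy_image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]


def mean_score(img):
    return sum(map(sum, img)) / (len(img) * len(img[0]))


shapes = [(len(p), len(p[0])) for p in build_pyramid(toy_image)]
scores = multiscale_scores(toy_image, [mean_score] * 3)
print(shapes)
print(scores)
```

In a real model each discriminator would be a convolutional network that takes both the label map and the image; the point here is only that the three discriminators can share one architecture while seeing inputs of different resolution.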
Regularization term in the loss
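
To make the $\mathcal{L}_{\mathrm{FM}}$ formula concrete, the sketch below computes it for a single training pair, with each layer's features flattened into a plain list. It is an illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, the expectation over $(\mathbf{s}, \mathbf{x})$ is replaced by one sample, and the weight `lam` is just an example value.

```python
def l1_mean(a, b):
    """(1 / N_i) * ||a - b||_1 for two flattened feature lists of equal length."""
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def feature_matching_loss(real_feats, fake_feats):
    """Sum over layers i = 1..T of the normalized L1 feature distance.

    real_feats[i] holds D_k^(i)(s, x); fake_feats[i] holds D_k^(i)(s, G(s)).
    """
    assert len(real_feats) == len(fake_feats)
    return sum(l1_mean(r, f) for r, f in zip(real_feats, fake_feats))


def total_fm_loss(real_feats_per_d, fake_feats_per_d, lam=10.0):
    """lambda * sum over discriminators k of L_FM(G, D_k).

    Assumption: lam=10.0 is only an illustrative default.
    """
    return lam * sum(
        feature_matching_loss(r, f)
        for r, f in zip(real_feats_per_d, fake_feats_per_d)
    )


# Toy features from one discriminator with T = 2 layers.
real = [[1.0, 2.0], [0.0, 0.0, 3.0]]
fake = [[0.0, 2.0], [0.0, 3.0, 3.0]]

fm_single = feature_matching_loss(real, fake)  # 1/2 * 1 + 1/3 * 3
fm_total = total_fm_loss([real], [fake])
print(fm_single, fm_total)
```

Note that, according to the paper, the discriminators act only as feature extractors for this term and do not maximize it; only the generator is trained against $\mathcal{L}_{\mathrm{FM}}$.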
