Paper and code: a generative model that generates image structure and style separately

This article is reposted from the WeChat official account createamind.


2016-09-29  zdx3578  Brain Simulation


Generative Image Modeling using Style and Structure Adversarial Networks

Xiaolong Wang, Abhinav Gupta

Robotics Institute, Carnegie Mellon University


Current generative frameworks use end-to-end learning and generate images by sampling from a uniform noise distribution. However, these approaches ignore the most basic principle of image formation: an image is the product of (a) structure, the underlying 3D model, and (b) style, the texture mapped onto that structure. In this paper, we factorize the image generation process and propose the Style and Structure Generative Adversarial Network (S2-GAN). Our S2-GAN has two components: the Structure-GAN generates a surface normal map, and the Style-GAN takes the surface normal map as input and produces the 2D image. In addition to the real vs. generated loss function, we use an extra loss based on surface normals computed from the generated images. The two GANs are first trained independently and then merged via joint learning. We show that our S2-GAN model is interpretable, generates more realistic images, and can be used to learn unsupervised RGBD representations.


Our Style-GAN can also serve as a rendering engine to generate different images.

Code: https://github.com/xiaolonw/ss-gan


The learned representations also transfer reasonably well to object classification and object detection.


[Figures from the paper omitted; a partial walkthrough follows.]
Partial walkthrough of the paper:


1 Introduction

Unsupervised learning of visual representations is one of the most fundamental problems in computer vision. There are two common approaches for unsupervised learning: (a) using a discriminative framework with auxiliary tasks where supervision comes for free, such as context prediction [1,2] or temporal embedding [3,4,5,6,7,8]; (b) using a generative framework where the underlying model is compositional and attempts to generate realistic images [9,10,11,12]. The underlying hypothesis of the generative framework is that if the model is good enough to generate novel and realistic images, it should be a good representation for vision tasks as well. Most of these generative frameworks use end-to-end learning to generate RGB images from control parameters (z, also called noise since it is sampled from a uniform distribution). Recently, some impressive results [13] have been shown on restrictive domains such as faces and bedrooms.

However, these approaches ignore one of the most basic underlying principles of image formation. Images are a product of two separate phenomena. Structure: this encodes the underlying geometry of the scene; it refers to the underlying mesh, voxel representation etc. Style: this encodes the texture on the objects and the illumination. In this paper, we build upon this IM101 principle of image formation and factor the generative adversarial network (GAN) into two generative processes as shown in Fig. 1. The first, a structure generative model (namely Structure-GAN), takes ẑ and generates the underlying 3D structure (y3D) for the scene.

The second, a conditional generative network (namely Style-GAN), takes y3D as input together with noise z̃ to generate the image yI. We call this factored generative network the Style and Structure Generative Adversarial Network (S2-GAN).
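
To make the factorization concrete, below is a minimal PyTorch-style sketch of the two-stage generator. This is an illustrative assumption, not the authors' released Torch code: the class names, layer sizes, and the 32x32 output resolution are placeholders. The structure generator maps ẑ to a 3-channel surface normal map, and the style generator, conditioned on that normal map together with z̃, produces an RGB image.

```python
# Illustrative sketch of the S2-GAN factorization (not the authors' code).
import torch
import torch.nn as nn

class StructureGenerator(nn.Module):
    """Maps a latent code z_hat to a 3-channel surface normal map (placeholder sizes)."""
    def __init__(self, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(True),  # 1x1 -> 4x4
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),    # 4x4 -> 8x8
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),      # 8x8 -> 16x16
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),                                 # 16x16 -> 32x32 normals
        )

    def forward(self, z_hat):
        return self.net(z_hat.view(z_hat.size(0), -1, 1, 1))

class StyleGenerator(nn.Module):
    """Conditioned on a surface normal map and a second latent code z_tilde, produces an RGB image."""
    def __init__(self, z_dim=100):
        super().__init__()
        self.encode_normals = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),                            # 32 -> 16
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, True),     # 16 -> 8
        )
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(128 + z_dim, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),  # 8 -> 16
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),            # 16 -> 32
            nn.Conv2d(64, 3, 3, 1, 1), nn.Tanh(),                                               # RGB image
        )

    def forward(self, normals, z_tilde):
        h = self.encode_normals(normals)
        # Broadcast z_tilde over the spatial grid and concatenate with the normal features.
        z_map = z_tilde.view(z_tilde.size(0), -1, 1, 1).expand(-1, -1, h.size(2), h.size(3))
        return self.decode(torch.cat([h, z_map], dim=1))

# Factored generation: z_hat -> surface normals -> image.
structure_gan, style_gan = StructureGenerator(), StyleGenerator()
z_hat, z_tilde = torch.randn(4, 100), torch.randn(4, 100)
normals = structure_gan(z_hat)         # (4, 3, 32, 32) surface normal map
image = style_gan(normals, z_tilde)    # (4, 3, 32, 32) generated RGB image
```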

Why S2-GAN? We believe there are fourfold advantages of factoring the style and structure in the image generation process. Firstly, factoring style and structure simplifies the overall generative process and leads to more realistic high-resolution images. It also leads to a highly stable and robust learning procedure. Secondly, due to the factoring process, S2-GAN is more interpretable as compared to its counterparts. One can even factor the errors and understand where the surface normal generation failed as compared to texture generation. Thirdly, as our results indicate, S2-GAN allows us to learn RGBD representations in an unsupervised manner. This can be crucial for many robotics and graphics applications. Finally, our Style-GAN can also be thought of as a learned rendering engine which, given any 3D input, allows us to render a corresponding image. It also allows us to build applications where one can modify the underlying 3D structure of an input image and render a completely new image.


However, learning S2-GAN is still not an easy task. To tackle this challenge, we first learn the Style-GAN and Structure-GAN in an independent manner. We use the NYUv2 RGBD dataset [14] with more than 200K frames for learning the initial networks. We train a Structure-GAN using the ground truth surface normals from Kinect. Because the perspective distortion of texture is more directly related to normals than to depth, we use surface normals to represent image structure in this paper. We learn in parallel our Style-GAN, which is conditional on the ground truth surface normals. While training the Style-GAN, we have two loss functions: the first loss function takes in an image and the surface normals and tries to predict if they correspond to a real scene or not. However, this loss function alone does not enforce explicit pixel-based constraints for aligning generated images with input surface normals. To enforce the pixel-wise constraints, we make the following assumption: if the generated image is realistic enough, we should be able to reconstruct or predict the 3D structure based on it. We achieve this by adding another discriminator network. More specifically, the generated image is not only forwarded to the discriminator network in the GAN but also used as input for the trained surface normal predictor network. Once we have trained an initial Style-GAN and Structure-GAN, we combine them together and perform end-to-end learning jointly, where images are generated from ẑ, z̃ and fed to the discriminators for the real/fake task.
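
As a rough illustration of how these two signals could be combined into a single generator objective, here is a hedged sketch. The names `pair_disc` (discriminator over image/normal pairs), `normal_fcn` (a pretrained, frozen normal-prediction network), and `lambda_normal` are assumptions for this sketch; the paper's actual formulation treats the normal predictor as a separate network with its own classification loss rather than a plain MSE term.

```python
import torch
import torch.nn.functional as F

def style_gan_generator_loss(style_gan, pair_disc, normal_fcn, normals, z_tilde,
                             lambda_normal=1.0):
    """Sketch of the Style-GAN generator objective (illustrative, not the paper's exact loss).

    style_gan:  conditional generator (normals, z_tilde) -> image
    pair_disc:  discriminator scoring (image, surface-normal) pairs
    normal_fcn: normal predictor pretrained on ground-truth RGB/normal pairs and kept
                frozen; it supplies the pixel-wise consistency signal.
    """
    fake_image = style_gan(normals, z_tilde)

    # (1) Adversarial term: the (generated image, input normals) pair should look real
    #     to the pair discriminator.
    logits = pair_disc(fake_image, normals)
    adv_loss = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

    # (2) Pixel-wise term: normals predicted back from the generated image should match
    #     the normals the image was conditioned on (the paper classifies quantized
    #     normal directions; MSE is used here only to keep the sketch short).
    predicted_normals = normal_fcn(fake_image)
    normal_loss = F.mse_loss(predicted_normals, normals)

    return adv_loss + lambda_normal * normal_loss
```

In the joint stage, the ground-truth normals above would be replaced by the Structure-GAN's output, so that gradients from both the pair discriminator and the normal predictor flow back through the entire ẑ -> normals -> image pipeline.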


2 Related Work

3 Background: GANs


4 Style and Structure GAN


GAN and DCGAN approaches directly generate images from the sampled z.

Instead, we use the fact that image generation has two components: (a) generating the underlying structure based on the objects in the scene; (b) generating the texture/style on top of this 3D structure. We use this simple observation to decompose the generative process into two procedures: (i) Structure-GAN - this process generates surface normals from the sampled ẑ, and (ii) Style-GAN - this model generates the images taking as input the surface normals and another latent variable z̃ sampled from a uniform distribution. We train both models with RGBD data, and the ground truth surface normals are obtained from the depth.
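
For completeness, the ground-truth normals used for supervision can be derived from the Kinect depth maps. Below is a simplified NumPy sketch of one common way to do this; the NYUv2 toolbox fits planes over local neighborhoods and handles missing depth, which this version does not, and the camera intrinsics fx, fy, cx, cy are assumed known.

```python
import numpy as np

def normals_from_depth(depth, fx, fy, cx, cy):
    """Per-pixel surface normals from a depth map via cross products of local tangents.

    depth: (H, W) array in meters; fx, fy, cx, cy: pinhole camera intrinsics.
    Returns an (H, W, 3) array of unit normals (simplified sketch, not the NYUv2 toolbox).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project every pixel to a 3D point (X, Y, Z) in camera coordinates.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1)              # (H, W, 3)

    # Tangent vectors along image columns/rows, then their cross product.
    du = np.gradient(points, axis=1)
    dv = np.gradient(points, axis=0)
    n = np.cross(du, dv)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8
    return n
```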



For more details, please read the full paper (link above).



To discuss and learn together: QQ group 325921031; to join the WeChat group, leave a message in the official account backend; for offline events, leave a message with your region.

More introductory material is available on the WeChat official account createamind.


