CVPR 2018 Summary: Part III

天涯海阁未走远

Reposted from: http://ai.yanxishe.com/page/TextTranslation/1195

English original: NeuroNuggets: CVPR 2018 in Review, Part III

NeuroNuggets: CVPR 2018 in Review, Part III

The CVPR 2018 (Computer Vision and Pattern Recognition) conference is long over, but we can’t stop reviewing its wonderful papers; today, Part III is upon us! In the first part, we briefly reviewed the most interesting papers on GANs for computer vision from CVPR 2018; in the second part, we added a human touch and talked about pose estimation and tracking for humans. Today, we turn to one of the main focal points of our own internal research here at Neuromation: synthetic data. As usual, the papers are in no particular order, and our reviews are very brief, so we definitely recommend reading the papers in full.

Synthetic data: imitate to learn

Synthetic data means data that has been generated artificially, either through 3D modeling and rendering (as usual for computer vision) or by other means, and then used to train machine learning models. Synthetic data is a surprising topic in machine learning, and the most surprising thing is how long it was mostly neglected. Some works on synthetic data can be traced to the 2000s, but before 2016 it attracted basically no interest at all. The only field where it had been used was training self-driving cars, where the need for simulated environments and the impossibility of collecting real datasets come together to make it the perfect setting for synthetic datasets.

Interest is now growing rapidly: we have the SUNCG dataset of simulated indoor environments, outdoor environments for driving and navigation, the SURREAL dataset of synthetic humans for learning pose estimation and tracking, and even recent works that apply GANs to generate and refine synthetic data (we hope to get back to this and explain how it works later). So let us see what CVPR 2018 authors have to say about synthetic data. Since this is our main focus, we will consider the works on synthetic data in slightly more detail than usual.

Generating Synthetic Data from GANs: Augmentation and Adaptation in Feature Space

R. Volpi et al., Adversarial Feature Augmentation for Unsupervised Domain Adaptation
S. Sankaranarayanan et al., Generate To Adapt: Aligning Domains using Generative Adversarial Networks

There is a very interesting and promising field of using GANs to produce synthetic datasets to train other models. On the surface it makes little sense: if you have enough data to train a GAN, why not just use it to train the model? Or even better, if you have a trained GAN, why don’t you just take the discriminator and use it for your problem?

But this idea becomes much more interesting in the domain adaptation setting. Suppose you have a large source dataset and a small target dataset, and you need to use a model trained on the source dataset for the target, which might be completely unlabeled. Here adversarial domain adaptation techniques train two networks, a generator and a discriminator, and use them to ensure that the network cannot distinguish between the data distributions in the source and target datasets. This field was started in the ICML 2015 paper by Ganin and Lempitsky, where the discriminator is used to ensure that the features stay domain-invariant.
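To make the adversarial game a bit more concrete, here is a minimal sketch of a gradient-reversal-style update on a toy model. Everything in it is an illustrative assumption rather than code from any of the papers: the feature extractor is a single weight `w`, the domain classifier is a logistic unit with parameters `v` and `k`, the label-prediction loss is left out entirely, and the names `domain_loss_and_grads`, `adversarial_step`, and `SAMPLES` are made up for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def domain_loss_and_grads(w, v, k, samples):
    """Cross-entropy of the domain classifier and its gradients.

    samples: list of (x, d) pairs, d = 0 for source and d = 1 for target.
    Returns (loss, dloss/dw, dloss/dv, dloss/dk).
    """
    loss = gw = gv = gk = 0.0
    n = len(samples)
    for x, d in samples:
        f = w * x                    # toy feature extractor: one weight
        q = sigmoid(v * f + k)       # domain classifier: P(target | feature)
        loss -= (d * math.log(q) + (1 - d) * math.log(1 - q)) / n
        err = (q - d) / n            # gradient of the loss w.r.t. the logit
        gw += err * v * x
        gv += err * f
        gk += err
    return loss, gw, gv, gk


def adversarial_step(w, v, k, samples, lr=0.1, lam=1.0):
    """One simultaneous update of both players."""
    _, gw, gv, gk = domain_loss_and_grads(w, v, k, samples)
    # The domain classifier descends the domain loss (it tries to tell domains apart).
    v_new = v - lr * gv
    k_new = k - lr * gk
    # The feature extractor receives the reversed gradient, scaled by lam,
    # so the same "descent" rule makes it ascend the domain loss.
    w_new = w - lr * (-lam * gw)
    return w_new, v_new, k_new


# Hypothetical 1-D data: source inputs are small, target inputs are larger.
SAMPLES = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 1), (2.5, 1), (3.0, 1)]

if __name__ == "__main__":
    w, v, k = 1.0, 0.5, 0.0
    for _ in range(5):
        w, v, k = adversarial_step(w, v, k, SAMPLES)
    print(w, v, k)
```

In a full setup the feature extractor would also be trained on a label loss from the source data, which is what keeps the features useful while the reversed gradient pushes them toward domain invariance; with the domain loss alone, as in this toy, nothing stops the features from becoming uninformative.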

This idea was slightly generalized in the Adversarial Discriminative Domain Adaptation paper from 2017.

In the CVPR 2018 paper by Volpi et al., researchers from Italy and Stanford made the adversarial training work not on the original images but rather in the feature space itself. The GAN operates on features extracted by a pretrained network, which makes it possible to achieve better domain invariance and ultimately improve the quality of domain adaptation.

Another approach in the same vein was presented at CVPR 2018 by Sankaranarayanan et al., researchers from the University of Maryland. They use GANs to leverage unsupervised data to bring the source and target distributions closer to each other in the feature space. Basically, the idea is to use the discriminator to ensure that images generated from an embedding remain realistic images for the source distribution even when the embedding was taken from a sample from the target distribution. Again, the authors report improved domain adaptation results.

How Well Should You Label? A Study of Label Quality

A. Zlateski et al., On the Importance of Label Quality for Semantic Segmentation

One of the main selling points of synthetic data has always been the pixel-perfect quality of labeling that it makes easy to achieve. A synthetic scene always comes with perfect segmentation, but just how important is it? The authors of this work studied how finely (or how coarsely) you have to label your training set to get good segmentation quality from modern convolutional architectures… and, of course, what better tool to perform this study than synthetic scenes?

The authors used their specially developed Auto City dataset.

In their experiments, the authors showed that the final segmentation quality, unsurprisingly, is indeed strongly correlated with the amount of time spent producing the labels… but not so much with the quality of each individual label. This suggests that it is better to produce lots of coarse labels (say, with crowdsourcing) than to perform strict quality control for every label.
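To get a feel for what "coarse" versus "pixel-perfect" can mean in numbers, here is a small sketch that compares a fine mask with a crude bounding-box label. It assumes intersection over union (IoU) as the agreement measure and uses a filled disk as a stand-in for a synthetic ground-truth object; both choices, and all function names, are illustrative assumptions and not the protocol from the paper.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = mask_a | mask_b
    if not union:
        return 1.0  # two empty masks agree perfectly
    return len(mask_a & mask_b) / len(union)


def disk_mask(center_row, center_col, radius):
    """Pixel-accurate mask of a filled disk (stand-in for a synthetic label)."""
    return {
        (r, c)
        for r in range(center_row - radius, center_row + radius + 1)
        for c in range(center_col - radius, center_col + radius + 1)
        if (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2
    }


def bounding_box_mask(mask):
    """Coarse label: fill the tight axis-aligned box around the object."""
    rows = [r for r, _ in mask]
    cols = [c for _, c in mask]
    return {
        (r, c)
        for r in range(min(rows), max(rows) + 1)
        for c in range(min(cols), max(cols) + 1)
    }


fine = disk_mask(5, 5, 3)
coarse = bounding_box_mask(fine)
print(f"fine: {len(fine)} px, coarse: {len(coarse)} px, IoU: {iou(fine, coarse):.3f}")
```

For this disk of radius 3 the fine mask has 29 pixels and the box has 49, so the coarse label agrees with the fine one at an IoU of about 0.592. A box is a deliberately extreme case of coarseness; a rough polygon would sit somewhere between it and the pixel-perfect mask.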

Soccer on Your Tabletop

K. Rematas et al., Soccer on Your Tabletop

Here at Neuromation, we love soccer (yes, the World Cup in Russia cost us a lot of work hours), and this research is just soooooooo cool. The authors present a system that can take a video stream of a soccer game and transform it… into a moving 3D reconstruction that can be projected onto your tabletop and viewed with an augmented reality device!

The system extracts bounding boxes of the players, analyzes the human figures with pose and depth estimation models, and produces a quite accurate 3D scene reconstruction. Training a model specifically for the soccer domain really improves the results.

It additionally warms our hearts that they actually trained on synthetic data extracted from FIFA games! And the results are simply very cool all around.

But wait, there is more…

Thank you for your attention! Next time we might take an even more detailed look at some of the CVPR 2018 papers regarding synthetic data and domain adaptation. Until then!

Sergey Nikolenko
Chief Research Officer, Neuromation

Aleksey Artamonov
Senior Researcher, Neuromation