A Compilation of 2018 Top-Conference Papers

For a comprehensive, systematic treatment of other machine learning and deep learning algorithms, see Machine Learning: Principles, Algorithms and Applications (Tsinghua University Press) by Lei Ming, author of the SIGAI public account.

CVPR 2018

Dates: June 18-22

Location: Salt Lake City, Utah

The Conference on Computer Vision and Pattern Recognition (CVPR) is an annual IEEE academic conference whose main subject is computer vision and pattern recognition technology. CVPR is a world-leading computer vision conference; in recent years it has drawn about 1,000 participants per year and accepted roughly 300 papers. Each edition features a fixed set of themes, and every year companies sponsor the conference in exchange for exhibition space at the venue.

 

Best Paper

"Taskonomy: Disentangling Task Transfer Learning"

Amir Zamir, Alexander Sax, William Shen, Leonidas Guibas, Jitendra Malik, Silvio Savarese

 

【Abstract】Do visual tasks have a relationship, or are they unrelated? For instance, could having surface normals simplify estimating the depth of an image? Intuition answers these questions positively, implying existence of a structure among visual tasks. Knowing this structure has notable values; it is the concept underlying transfer learning and provides a principled way for identifying redundancies across tasks, in order to, for instance, seamlessly reuse supervision among related tasks or solve many tasks in one system without piling up the complexity.

 

We propose a fully computational approach for modeling the structure of the space of visual tasks. This is done via finding (first and higher-order) transfer learning dependencies across a dictionary of twenty-six 2D, 2.5D, 3D, and semantic tasks in a latent space. The product is a computational taxonomic map for task transfer learning. We study the consequences of this structure, e.g. nontrivial emerged relationships, and exploit them to reduce the demand for labeled data. For example, we show that the total number of labeled data points needed for solving a set of 10 tasks can be reduced by roughly 2/3 (compared to training independently) while keeping the performance nearly the same. We provide a set of tools for computing and probing this taxonomical structure including a solver that users can employ to devise efficient supervision policies for their use cases.

【Translated Abstract】Are visual tasks related, or are they unrelated? For example, can surface normals simplify estimating the depth of an image? Intuition answers these questions in the affirmative, suggesting that a structure exists among visual tasks. Knowing this structure has notable value: it is the concept underlying transfer learning, and it provides a principled way to identify redundancies across tasks, for example to seamlessly reuse supervision among related tasks or to solve many tasks in one system without piling up complexity. We propose a fully computational approach for modeling the structure of the space of visual tasks, by finding (first-order and higher-order) transfer-learning dependencies across a dictionary of twenty-six 2D, 2.5D, 3D, and semantic tasks in a latent space. The product is a computational taxonomic map for task transfer learning. We study the consequences of this structure, such as the nontrivial relationships that emerge, and exploit them to reduce the demand for labeled data. For example, we show that the total number of labeled data points needed to solve a set of 10 tasks can be reduced by roughly two thirds (compared with training independently) while keeping performance nearly the same.
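
To make the idea concrete, the sketch below builds a toy transfer-affinity matrix and picks a supervision policy from it. The task names, affinity values, and budget are all invented for illustration, and a greedy search stands in for the paper's actual pipeline (analytic-hierarchy-process normalization followed by a Boolean integer program):

```python
from itertools import combinations
import numpy as np

tasks = ["depth", "normals", "edges", "segmentation"]

# Invented transfer-affinity matrix: entry [s, t] scores how well features
# trained for source task s transfer to target task t (the paper estimates
# these values from real transfer experiments).
affinity = np.array([
    [1.00, 0.72, 0.31, 0.45],   # depth        -> others
    [0.78, 1.00, 0.29, 0.41],   # normals      -> others
    [0.25, 0.22, 1.00, 0.38],   # edges        -> others
    [0.40, 0.35, 0.33, 1.00],   # segmentation -> others
])
budget = 2  # how many tasks we can afford to label and train from scratch

# Greedy stand-in for the paper's solver: choose the source set whose
# worst-served target task still gets the strongest available transfer.
best_sources, best_score = None, -np.inf
for sources in combinations(range(len(tasks)), budget):
    targets = [t for t in range(len(tasks)) if t not in sources]
    score = min(max(affinity[s][t] for s in sources) for t in targets)
    if score > best_score:
        best_sources, best_score = sources, score

print("label fully:", [tasks[s] for s in best_sources])
print("transfer to:", [tasks[t] for t in range(len(tasks))
                       if t not in best_sources])
print("worst-case affinity:", best_score)
```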

 

Best Paper Honorable Mentions

"Deep Learning of Graph Matching"

Andrei Zanfir, Cristian Sminchisescu

【Abstract】The problem of graph matching under node and pairwise constraints is fundamental in areas as diverse as combinatorial optimization, machine learning or computer vision, where representing both the relations between nodes and their neighborhood structure is essential. We present an end-to-end model that makes it possible to learn all parameters of the graph matching process, including the unary and pairwise node neighborhoods, represented as deep feature extraction hierarchies. The challenge is in the formulation of the different matrix computation layers of the model in a way that enables the consistent, efficient propagation of gradients in the complete pipeline from the loss function, through the combinatorial optimization layer solving the matching problem, and the feature extraction hierarchy. Our computer vision experiments and ablation studies on challenging datasets like PASCAL VOC keypoints, Sintel and CUB show that matching models refined end-to-end are superior to counterparts based on feature hierarchies trained for other problems.

 

【Translated Abstract】The problem of graph matching under node and pairwise constraints is fundamental in areas as diverse as combinatorial optimization, machine learning, and computer vision, where representing both the relations between nodes and their neighborhood structure is essential. This paper presents an end-to-end model that can learn all parameters of the graph-matching process, including the unary and pairwise node neighborhoods, represented as deep feature-extraction hierarchies. The challenge lies in formulating the model's matrix-computation layers so that gradients propagate consistently through the complete pipeline, from the loss function through the combinatorial-optimization layer that solves the matching problem and the feature-extraction hierarchy. The authors' computer vision experiments and ablation studies on challenging datasets such as PASCAL VOC keypoints, Sintel, and CUB show that matching models refined end-to-end are superior to counterparts based on feature hierarchies trained for other problems.
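
The combinatorial layer the authors differentiate through is built around power iteration, which computes the leading eigenvector of a pairwise affinity matrix and reads it as a soft assignment (the classic spectral relaxation of graph matching). A minimal sketch, with a random affinity matrix standing in for the learned one; the full model also adds bi-stochastic normalization and learns the affinities end to end:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 4, 4                        # nodes in the two graphs
m = n1 * n2                          # one entry per candidate match (i, j)

# Random symmetric affinity matrix between candidate matches; in the paper
# these affinities come from the learned deep feature hierarchy.
M = rng.random((m, m))
M = (M + M.T) / 2

# Power iteration: converges to the leading eigenvector of M, interpreted
# as a soft score for every candidate correspondence.
v = np.ones(m)
for _ in range(50):
    v = M @ v
    v /= np.linalg.norm(v)

scores = v.reshape(n1, n2)
print("soft assignment scores:\n", np.round(scores, 3))
print("greedy matches, graph 1 -> graph 2:", scores.argmax(axis=1))
```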

 

 

"SPLATNet: Sparse Lattice Networks for Point Cloud Processing"

Hang Su, Varun Jampani, Deqing Sun, Subhransu Maji, Evangelos Kalogerakis, Ming-Hsuan Yang, Jan Kautz

【Abstract】We present a network architecture for processing point clouds that directly operates on a collection of points represented as a sparse set of samples in a high-dimensional lattice. Naïvely applying convolutions on this lattice scales poorly, both in terms of memory and computational cost, as the size of the lattice increases. Instead, our network uses sparse bilateral convolutional layers as building blocks. These layers maintain efficiency by using indexing structures to apply convolutions only on occupied parts of the lattice, and allow flexible specifications of the lattice structure enabling hierarchical and spatially-aware feature learning, as well as joint 2D-3D reasoning. Both point-based and image-based representations can be easily incorporated in a network with such layers and the resulting model can be trained in an end-to-end manner. We present results on 3D segmentation tasks where our approach outperforms existing state-of-the-art techniques.

 

【Translated Abstract】This paper presents a network architecture for processing point clouds that operates directly on a collection of points represented as a sparse set of samples in a high-dimensional lattice. Naively applying convolutions on this lattice scales poorly in both memory and computational cost as the lattice grows. Instead, the network uses sparse bilateral convolutional layers as building blocks. These layers remain efficient by using indexing structures to apply convolutions only to the occupied parts of the lattice, and they allow flexible specification of the lattice structure, enabling hierarchical and spatially aware feature learning as well as joint 2D-3D reasoning. Both point-based and image-based representations can easily be incorporated into a network with such layers, and the resulting model can be trained end to end. Results on 3D segmentation tasks show that the approach outperforms existing state-of-the-art techniques.
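
The efficiency argument is easy to see in miniature: splat point features onto only the occupied cells of a lattice kept in a hash map, filter over occupied cells and their occupied neighbors, then slice back to the points. The sketch below uses a plain axis-aligned integer grid and an unweighted 6-neighbor average; the paper's layers instead use a permutohedral lattice with learned bilateral filter weights:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(1)
points = rng.random((100, 3))     # toy point cloud in the unit cube
feats = rng.random(100)           # one scalar feature per point
cell = 0.25                       # lattice cell size

def cell_of(p):
    return tuple(int(v) for v in p // cell)

# Splat: accumulate point features into occupied lattice cells only.
lattice = defaultdict(float)
for p, f in zip(points, feats):
    lattice[cell_of(p)] += f

# Convolve: average each occupied cell with its occupied 6-neighbors.
# Empty cells are never allocated or visited, which is what keeps memory
# and compute proportional to the occupied part of the lattice.
offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
out = {}
for c, val in lattice.items():
    vals = [val]
    for o in offsets:
        k = (c[0] + o[0], c[1] + o[1], c[2] + o[2])
        if k in lattice:
            vals.append(lattice[k])
    out[c] = sum(vals) / len(vals)

# Slice: read the filtered signal back out at each point's location.
sliced = np.array([out[cell_of(p)] for p in points])
print(f"{len(lattice)} occupied cells of {int(1 / cell) ** 3} possible; "
      f"sliced feature shape: {sliced.shape}")
```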

 

 

"CodeSLAM - Learning a Compact, Optimisable Representation for Dense Visual SLAM"

Michael Bloesch, Jan Czarnowski, Ronald Clark, Stefan Leutenegger, Andrew J. Davison

【Abstract】The representation of geometry in real-time 3D perception systems continues to be a critical research issue. Dense maps capture complete surface shape and can be augmented with semantic labels, but their high dimensionality makes them computationally costly to store and process, and unsuitable for rigorous probabilistic inference. Sparse feature-based representations avoid these problems, but capture only partial scene information and are mainly useful for localisation only.

 

We present a new compact but dense representation of scene geometry which is conditioned on the intensity data from a single image and generated from a code consisting of a small number of parameters. We are inspired by work both on learned depth from images, and auto-encoders. Our approach is suitable for use in a keyframe-based monocular dense SLAM system: While each keyframe with a code can produce a depth map, the code can be optimised efficiently jointly with pose variables and together with the codes of overlapping keyframes to attain global consistency. Conditioning the depth map on the image allows the code to only represent aspects of the local geometry which cannot directly be predicted from the image. We explain how to learn our code representation, and demonstrate its advantageous properties in monocular SLAM.

 

【Translated Abstract】The representation of geometry in real-time 3D perception systems remains a critical research issue. Dense maps capture complete surface shape and can be augmented with semantic labels, but their high dimensionality makes them computationally costly to store and process and unsuitable for rigorous probabilistic inference. Sparse feature-based representations avoid these problems but capture only partial scene information and are mainly useful for localisation. This paper presents a new compact yet dense representation of scene geometry that is conditioned on the intensity data of a single image and generated from a code consisting of a small number of parameters. The approach is inspired by work on both learned depth from images and autoencoders, and is suitable for use in a keyframe-based monocular dense SLAM system: while each keyframe with a code can produce a depth map, the code can be optimised efficiently jointly with pose variables and with the codes of overlapping keyframes to attain global consistency. Conditioning the depth map on the image allows the code to represent only those aspects of the local geometry that cannot be predicted directly from the image. The paper also explains how to learn the code representation and demonstrates its advantageous properties in monocular SLAM.
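
What makes the representation "optimisable" is that the depth map is a differentiable function of a small code, so a SLAM back end can refine the code with standard least-squares machinery. In the toy sketch below, a fixed linear map stands in for the learned, image-conditioned decoder, and plain gradient descent stands in for the joint optimization with poses:

```python
import numpy as np

rng = np.random.default_rng(2)
code_dim, n_pix = 8, 64

# Stand-in decoder: depth = image-conditioned prior + Jacobian @ code.
# In CodeSLAM the decoder is a learned network conditioned on image
# intensities; a linear map keeps the optimization transparent here.
depth_prior = rng.random(n_pix)                 # what the image alone predicts
J = 0.1 * rng.standard_normal((n_pix, code_dim))

def decode(c):
    return depth_prior + J @ c

target = decode(rng.standard_normal(code_dim))  # "observed" depth map

# Refine the code by gradient descent on the squared residual, the way a
# keyframe's code would be optimized jointly with poses in the full system.
c = np.zeros(code_dim)
for _ in range(200):
    residual = decode(c) - target
    c -= 0.5 * (J.T @ residual)    # gradient of 0.5 * ||residual||^2
print("residual norm after refinement:", np.linalg.norm(decode(c) - target))
```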

 

 

"Efficient Optimization for Rank-based Loss Functions"

Pritish Mohapatra, Michal Rolínek, C.V. Jawahar, Vladimir Kolmogorov, M. Pawan Kumar

【Abstract】The accuracy of information retrieval systems is often measured using complex loss functions such as the average precision (AP) or the normalized discounted cumulative gain (NDCG). Given a set of positive and negative samples, the parameters of a retrieval system can be estimated by minimizing these loss functions. However, the non-differentiability and non-decomposability of these loss functions does not allow for simple gradient based optimization algorithms. This issue is generally circumvented by either optimizing a structured hinge-loss upper bound to the loss function or by using asymptotic methods like the direct-loss minimization framework. Yet, the high computational complexity of loss-augmented inference, which is necessary for both the frameworks, prohibits its use in large training data sets. To alleviate this deficiency, we present a novel quicksort flavored algorithm for a large class of non-decomposable loss functions. We provide a complete characterization of the loss functions that are amenable to our algorithm, and show that it includes both AP and NDCG based loss functions. Furthermore, we prove that no comparison based algorithm can improve upon the computational complexity of our approach asymptotically. We demonstrate the effectiveness of our approach in the context of optimizing the structured hinge loss upper bound of AP and NDCG loss for learning models for a variety of vision tasks. We show that our approach provides significantly better results than simpler decomposable loss functions, while requiring a comparable training time.

 

【Translated Abstract】The accuracy of information-retrieval systems is often measured with complex loss functions such as average precision (AP) or normalized discounted cumulative gain (NDCG). Given a set of positive and negative samples, the parameters of a retrieval system can be estimated by minimizing these loss functions. However, the non-differentiability and non-decomposability of these losses rule out simple gradient-based optimization algorithms. The issue is usually circumvented either by optimizing a structured hinge-loss upper bound of the loss function or by using asymptotic methods such as the direct-loss-minimization framework. Yet the high computational complexity of loss-augmented inference prohibits its use on large training datasets. To overcome this deficiency, we present a quicksort-flavored algorithm for a large class of non-decomposable loss functions. We characterize the loss functions amenable to this algorithm and show that they include both AP- and NDCG-based losses. Furthermore, we prove that no comparison-based algorithm can asymptotically improve on the computational complexity of our approach. We demonstrate the method's effectiveness by optimizing the structured hinge-loss upper bound of the AP and NDCG losses when learning models for a variety of vision tasks, and we show that it yields significantly better results than simpler decomposable loss functions while requiring comparable training time.
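
For reference, the quantity being optimized, average precision, is itself simple to compute; the difficulty the paper addresses is the loss-augmented inference needed to optimize its structured hinge upper bound. A minimal NumPy version of AP (this naive ranking-based computation is not the paper's quicksort-flavored algorithm):

```python
import numpy as np

def average_precision(scores, labels):
    """AP of the ranking induced by `scores`; labels are 1 (pos) / 0 (neg)."""
    order = np.argsort(-scores)               # sort by descending score
    ranked = labels[order]
    hits = np.cumsum(ranked)                  # positives seen up to each rank
    ranks = np.arange(1, len(ranked) + 1)
    return float((hits[ranked == 1] / ranks[ranked == 1]).mean())

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
labels = np.array([1, 0, 1, 0, 0])
print(average_precision(scores, labels))      # (1/1 + 2/3) / 2 = 0.8333...
```

Because small changes to the scores leave the ranking, and hence AP, unchanged almost everywhere, the function is piecewise constant in the model parameters, which is exactly the non-differentiability the structured hinge bound works around.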

 

 

ECCV 2018

Dates: September 8-14

Location: Munich, Germany

The European Conference on Computer Vision (ECCV), held every two years, is one of the three top computer vision conferences (the other two being ICCV and CVPR). Each edition accepts roughly 300 papers worldwide, mostly from top laboratories and research institutes in the United States and Europe, with mainland China typically contributing 10-20 papers. The acceptance rate of ECCV 2010 was 27%.

 

This year's conference received 2,439 submissions and accepted 776 (31.8%): 59 orals and 717 posters. On the activities side, ECCV 2018 hosted 43 workshops and 11 tutorials.

 

Best Paper Award (one paper)

"Implicit 3D Orientation Learning for 6D Object Detection from RGB Images"

Martin Sundermeyer, Zoltan-Csaba Marton, Maximilian Durner, Manuel Brucker, Rudolph Triebel

【Abstract】We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization.

This so-called Augmented Autoencoder has several advantages over existing methods: It does not require real, pose-annotated training data, generalizes to various test sensors and inherently handles object and view symmetries. Instead of learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Experiments on the T-LESS and LineMOD datasets show that our method outperforms similar model-based approaches and competes with state-of-the-art approaches that require real pose-annotated images.

 

【Translated Abstract】This paper proposes a real-time, RGB-based pipeline for object detection and 6D pose estimation. Its novel 3D orientation estimation is based on a variant of the Denoising Autoencoder trained on simulated views of a 3D model using Domain Randomization. This "Augmented Autoencoder" (AAE) has several advantages over existing methods: it requires no real pose-annotated training data, generalizes to various test sensors, and inherently handles object and view symmetries. Instead of learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Experiments on the T-LESS and LineMOD datasets show that the method outperforms similar model-based approaches and is competitive with state-of-the-art approaches that require real pose-annotated images.
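
At test time the implicit orientation representation boils down to a codebook lookup: encode renders of the model at known orientations offline, then match the encoded detection crop against them by cosine similarity. In the sketch below a fixed random projection stands in for the trained AAE encoder, and the "views" and quaternions are synthetic placeholders:

```python
import numpy as np

rng = np.random.default_rng(3)
latent_dim, img_dim, n_views = 16, 128, 500

# Placeholder encoder: a fixed random projection. In the paper this is the
# encoder of the Augmented Autoencoder, trained on domain-randomized renders.
W = rng.standard_normal((latent_dim, img_dim))

def encode(x):
    z = W @ x
    return z / np.linalg.norm(z)

# Offline: encode synthetic views rendered at known orientations.
views = rng.random((n_views, img_dim))      # stand-ins for rendered views
quaternions = rng.random((n_views, 4))      # their known orientations
codebook = np.stack([encode(v) for v in views])

# Online: encode the detected crop and retrieve the most similar latent
# code; its orientation is the estimate (cosine similarity on unit codes).
crop = views[123] + 0.01 * rng.standard_normal(img_dim)
best = int((codebook @ encode(crop)).argmax())
print("retrieved view:", best, "orientation:", np.round(quaternions[best], 3))
```

Because symmetric objects map to the same latent code, several codebook entries tie, which is how the method handles symmetries without special casing.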

 

Best Paper Award, Honorable Mention (two papers)

"Group Normalization"

Yuxin Wu, Kaiming He

【Abstract】Batch Normalization (BN) is a milestone technique in the development of deep learning, enabling various networks to train. However, normalizing along the batch dimension introduces problems — BN’s error increases rapidly when the batch size becomes smaller, caused by inaccurate batch statistics estimation. This limits BN’s usage for training larger models and transferring features to computer vision tasks including detection, segmentation, and video, which require small batches constrained by memory consumption. In this paper, we present Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes within each group the mean and variance for normalization. GN’s computation is independent of batch sizes, and its accuracy is stable in a wide range of batch sizes. On ResNet-50 trained in ImageNet, GN has 10.6% lower error than its BN counterpart when using a batch size of 2; when using typical batch sizes, GN is comparably good with BN and outperforms other normalization variants. Moreover, GN can be naturally transferred from pre-training to fine-tuning. GN can outperform its BN-based counterparts for object detection and segmentation in COCO, and for video classification in Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks. GN can be easily implemented by a few lines of code in modern libraries.

 

【Translated Abstract】Batch Normalization (BN) is a milestone technique in the development of deep learning that has enabled a wide range of networks to train. However, normalizing along the batch dimension introduces problems: when the batch becomes small, inaccurate batch-statistics estimation makes BN's error increase rapidly. This limits BN's use for training larger models and for transferring features to computer vision tasks such as detection, segmentation, and video, where memory consumption forces small batches. In this paper the authors propose Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes the mean and variance within each group for normalization. GN's computation is independent of batch size, and its accuracy is stable across a wide range of batch sizes. On ResNet-50 trained on ImageNet, GN has 10.6% lower error than its BN counterpart at a batch size of 2; at typical batch sizes, GN is comparable to BN and outperforms other normalization variants. Moreover, GN transfers naturally from pre-training to fine-tuning. GN outperforms or matches its BN-based counterparts for object detection and segmentation on COCO and for video classification on Kinetics, showing that GN can effectively replace BN across a range of tasks; in modern deep learning libraries, GN can be implemented in just a few lines of code.
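
As the abstract notes, GN is only a few lines in a modern library (the paper itself gives a short TensorFlow snippet). A NumPy rendering of the same computation for an NCHW tensor:

```python
import numpy as np

def group_norm(x, gamma, beta, groups, eps=1e-5):
    """Group Normalization for an NCHW tensor: split the C channels into
    `groups`, normalize each group by its own mean/variance, then apply
    the usual per-channel affine transform."""
    n, c, h, w = x.shape
    x = x.reshape(n, groups, c // groups, h, w)
    mean = x.mean(axis=(2, 3, 4), keepdims=True)   # no batch axis involved,
    var = x.var(axis=(2, 3, 4), keepdims=True)     # hence batch-size independence
    x = (x - mean) / np.sqrt(var + eps)
    x = x.reshape(n, c, h, w)
    return x * gamma.reshape(1, c, 1, 1) + beta.reshape(1, c, 1, 1)

x = np.random.default_rng(4).standard_normal((2, 8, 5, 5))
y = group_norm(x, gamma=np.ones(8), beta=np.zeros(8), groups=4)
print(y.mean(), y.std())  # roughly 0 and 1
```

The statistics are computed per sample and per group, never across the batch axis, which is exactly why accuracy does not degrade at batch size 2.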

 

"GANimation: Anatomically-aware Facial Animation from a Single Image"

Albert Pumarola, Antonio Agudo, Aleix M. Martinez, Alberto Sanfeliu, Francesc Moreno-Noguer

【Abstract】Recent advances in Generative Adversarial Networks (GANs) have shown impressive results for the task of facial expression synthesis. The most successful architecture is StarGAN [4], that conditions GANs’ generation process with images of a specific domain, namely a set of images of persons sharing the same expression. While effective, this approach can only generate a discrete number of expressions, determined by the content of the dataset. To address this limitation, in this paper, we introduce a novel GAN conditioning scheme based on Action Units (AU) annotations, which describes in a continuous manifold the anatomical facial movements defining a human expression. Our approach allows controlling the magnitude of activation of each AU and combine several of them. Additionally, we propose a fully unsupervised strategy to train the model, which only requires images annotated with their activated AUs, and exploit attention mechanisms that make our network robust to changing backgrounds and lighting conditions. Extensive evaluation shows that our approach goes beyond competing conditional generators both in the capability to synthesize a much wider range of expressions ruled by anatomically feasible muscle movements, and in the capacity of dealing with images in the wild.

 

【Translated Abstract】Generative Adversarial Networks (GANs) have recently shown impressive results on the task of facial expression synthesis. The most successful architecture is StarGAN, which conditions the GAN's generation process on images of a specific domain, namely a set of images of different people displaying the same expression. While effective, this approach can only generate a discrete number of expressions, determined by the content of the training data. To address this limitation, this paper introduces a novel GAN conditioning scheme based on Action Unit (AU) annotations, which describe, in a continuous manifold, the anatomical facial movements that define a human expression. The approach allows controlling the activation magnitude of each AU and combining several of them. In addition, the paper proposes a fully unsupervised strategy for training the model that requires only images annotated with their activated AUs, and exploits attention mechanisms that make the network robust to changing backgrounds and lighting conditions. Extensive evaluation shows that the approach clearly outperforms competing conditional generators, both in its ability to synthesize a much wider range of expressions governed by anatomically feasible muscle movements and in its capacity to handle images in the wild.
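
The attention mechanism mentioned above works by letting the generator output both a color map C and a per-pixel attention mask A, which are blended with the input so that only expression-relevant pixels are synthesized. A toy composition with random arrays standing in for the network outputs, using the convention that mask values near 1 preserve the input pixel:

```python
import numpy as np

rng = np.random.default_rng(5)
h, w = 64, 64
img = rng.random((h, w, 3))       # input face image (stand-in)

# Stand-ins for the generator's two heads, given a target AU vector:
color = rng.random((h, w, 3))     # C: full color regression
attention = np.ones((h, w, 1))    # A: per-pixel mask, 1 = keep input pixel
attention[20:40, 16:48] = 0.2     # pretend only the mouth region must change

# Blend: untouched regions come straight from the input image, so the
# generator spends capacity only on expression-relevant pixels, which also
# helps robustness to background and illumination.
out = attention * img + (1.0 - attention) * color
print(out.shape, "mean change:", float(np.abs(out - img).mean()))
```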

 

 

IJCAI-ECAI-2018

Dates: July 13-19

Location: Stockholm, Sweden

The International Joint Conference on Artificial Intelligence (IJCAI) is one of the foremost academic conferences in artificial intelligence. Originally held in odd-numbered years, it has convened annually since 2015. Chinese participation in IJCAI has kept growing in recent years; in particular, Professor Zhi-Hua Zhou of Nanjing University will serve as program chair of IJCAI-21, the first Chinese program chair in IJCAI's history.

The European Conference on Artificial Intelligence (ECAI), the main AI and machine learning conference held in Europe, dates back to 1974 and is organized by the European Coordinating Committee for Artificial Intelligence. ECAI is commonly grouped with IJCAI and AAAI as one of the three top conferences in AI.

This year, IJCAI and ECAI were held jointly on July 13-19 in Stockholm, the capital of Sweden. Notably, IJCAI presented no best paper or best student paper awards this year; instead it announced seven distinguished papers at once, including work from Peking University, Wuhan University, Tsinghua University, and Beijing Institute of Technology.

 

Distinguished Papers:

"SentiGAN: Generating Sentimental Texts via Mixture Adversarial Networks"

Ke Wang, Xiaojun Wan

【Abstract】Generating texts of different sentiment labels is getting more and more attention in the area of natural language generation. Recently, Generative Adversarial Net (GAN) has shown promising results in text generation. However, the texts generated by GAN usually suffer from the problems of poor quality, lack of diversity and mode collapse. In this paper, we propose a novel framework SentiGAN, which has multiple generators and one multi-class discriminator, to address the above problems. In our framework, multiple generators are trained simultaneously, aiming at generating texts of different sentiment labels without supervision. We propose a penalty based objective in the generators to force each of them to generate diversified examples of a specific sentiment label. Moreover, the use of multiple generators and one multi-class discriminator can make each generator focus on generating its own examples of a specific sentiment label accurately. Experimental results on four datasets demonstrate that our model consistently outperforms several state-of-the-art text generation methods in the sentiment accuracy and quality of generated texts.

 

【Translated Abstract】Generating texts with different sentiment labels is receiving more and more attention in natural language generation. Recently, Generative Adversarial Nets (GANs) have shown promising results in text generation, but the texts GANs produce usually suffer from poor quality, lack of diversity, and mode collapse. This paper proposes a novel framework, SentiGAN, with multiple generators and one multi-class discriminator, to address these problems. In this framework, multiple generators are trained simultaneously, aiming to generate texts with different sentiment labels without supervision. A penalty-based objective in the generators forces each of them to generate diversified examples of a specific sentiment label. Moreover, using multiple generators with one multi-class discriminator lets each generator focus on accurately generating examples of its own sentiment label. Experimental results on four datasets show that the model consistently outperforms several state-of-the-art text-generation methods in both the sentiment accuracy and the quality of the generated texts.
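
Structurally, the framework wires k generators (one per sentiment) to a single multi-class discriminator. The sketch below shows one plausible reading of that wiring, with random placeholders for both the generators and the discriminator and an assumed (k+1)-class layout (k sentiment classes plus one "fake" class); the exact penalty-based objective is defined in the paper:

```python
import numpy as np

rng = np.random.default_rng(6)
k = 2                      # two sentiment labels -> two generators
vocab, seq_len = 50, 8

def generate(i, n):
    # Placeholder for generator i: in SentiGAN each generator is a text
    # model trained to emit sequences of sentiment i; here, random tokens.
    return rng.integers(0, vocab, size=(n, seq_len))

def discriminate(batch):
    # Placeholder multi-class discriminator over k sentiment classes plus
    # one assumed "fake" class; returns a probability row per sample.
    logits = rng.random((len(batch), k + 1))
    return logits / logits.sum(axis=1, keepdims=True)

# Penalty-style update signal: generator i is penalized to the extent the
# discriminator does not recognize its samples as sentiment class i.
# (Schematic only; the paper defines the exact penalty objective.)
for i in range(k):
    probs = discriminate(generate(i, n=32))
    penalty = float((1.0 - probs[:, i]).mean())
    print(f"generator {i}: mean penalty {penalty:.3f}")
```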

 

 

"Reasoning about Consensus when Opinions Diffuse through Majority Dynamics"

Vincenzo Auletta, Diodato Ferraioli, Gianluigi Greco

【Abstract】Opinion diffusion is studied on social graphs where agents hold binary opinions and where social pressure leads them to conform to the opinion manifested by the majority of their neighbors. Within this setting, questions related to whether a minority/majority can spread the opinion it supports to all the other agents are considered. It is shown that, no matter of the underlying graph, there is always a group formed by a half of the agents that can annihilate the opposite opinion. Instead, the influence power of minorities depends on certain features of the given graph, which are NP-hard to be identified. Deciding whether the two opinions can coexist in some stable configuration is NP-hard, too.

 

【Translated Abstract】Opinion diffusion is studied on social graphs in which agents hold binary opinions and social pressure leads them to conform to the opinion manifested by the majority of their neighbors. In this setting, the paper considers whether a minority/majority can spread the opinion it supports to all other agents. It is shown that, regardless of the underlying graph, there is always a group formed by half of the agents that can annihilate the opposite opinion. The influence power of minorities, by contrast, depends on certain features of the given graph, and identifying those features is NP-hard. Deciding whether the two opinions can coexist in some stable configuration is NP-hard as well.
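
The dynamics in question are straightforward to simulate: every agent repeatedly adopts the strict majority opinion of its neighbors. The sketch below runs a synchronous variant (one reading of the model; the paper states its exact update rule) on a random graph and detects when the process settles; for synchronous updates on undirected graphs, a classic result guarantees the process ends in a fixed point or a 2-cycle:

```python
import random

random.seed(7)
n = 12

# Random undirected graph as adjacency sets.
neighbors = {i: set() for i in range(n)}
for i in range(n):
    for j in range(i + 1, n):
        if random.random() < 0.3:
            neighbors[i].add(j)
            neighbors[j].add(i)

state = [random.randint(0, 1) for _ in range(n)]  # binary opinions

seen, step = {}, 0
while tuple(state) not in seen:
    seen[tuple(state)] = step
    step += 1
    # Synchronous update: adopt the strict majority of the neighbors,
    # keeping the current opinion on a tie or when there are no neighbors.
    new = []
    for i in range(n):
        ones = sum(state[j] for j in neighbors[i])
        deg = len(neighbors[i])
        if 2 * ones > deg:
            new.append(1)
        elif 2 * ones < deg:
            new.append(0)
        else:
            new.append(state[i])
    state = new

period = step - seen[tuple(state)]
print("opinions:", state,
      "| stabilized" if period == 1 else f"| cycle of length {period}")
```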

 

 

"R-SVM+: Robust Learning with Privileged Information"

Xue Li, Bo Du, Chang Xu, Yipe
