[Paper Notes] Fine-Grained Segmentation Networks: Self-Supervised Segmentation for Improved Long-Term Visual Localization

A segmentation network tailored to visual localization, ICCV 2019

Motivation of the paper:

using more segmentation labels to create more discriminative, yet still robust, representations for semantic visual localization

This paper targets long-term visual localization. It proposes a fine-grained segmentation network (FGSN) that produces a finer-grained segmentation of the scene, yielding richer segmentation labels than existing semantic segmentation networks.
At the same time, the network is robust to appearance changes such as seasonal variation: it outputs consistent labels for the same scene across different seasons, and the experiments show that this improves visual localization performance.
In addition, to reduce the annotation effort, the authors train the network in a self-supervised manner.

As shown in the figure below, k-means clustering is used to create training data with a rich set of segmentation labels, and datasets with 2D-2D point correspondences are used to enforce consistent labels for the same scene across images (a sketch of the clustering step follows the figure).
[Figure: training pipeline with k-means pseudo-labels and 2D-2D correspondence supervision]
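As a rough illustration (not the authors' exact implementation), the pseudo-labels can be thought of as k-means cluster indices computed over dense CNN features, in the spirit of DeepCluster; the names below (`generate_pseudo_labels`, `backbone`) are hypothetical:

```python
import torch
from sklearn.cluster import MiniBatchKMeans

def generate_pseudo_labels(backbone, images, num_clusters=100):
    """Run k-means over dense per-pixel features and use the cluster
    indices as pseudo segmentation labels (illustrative sketch only)."""
    backbone.eval()
    feats, shapes = [], []
    with torch.no_grad():
        for img in images:                            # img: (3, H, W) tensor
            f = backbone(img.unsqueeze(0))            # (1, C, h, w) dense features
            _, c, h, w = f.shape
            shapes.append((h, w))
            feats.append(f[0].permute(1, 2, 0).reshape(-1, c))
    all_feats = torch.cat(feats, dim=0).cpu().numpy()

    kmeans = MiniBatchKMeans(n_clusters=num_clusters, batch_size=4096)
    labels = kmeans.fit_predict(all_feats)            # one cluster index per pixel

    # Split the flat label vector back into per-image (h, w) label maps.
    maps, start = [], 0
    for h, w in shapes:
        maps.append(torch.from_numpy(labels[start:start + h * w]).reshape(h, w))
        start += h * w
    return maps
```

These per-pixel cluster indices then serve as training targets for the segmentation head, and the clustering is repeated periodically following Caron et al. (see the ablation notes below).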

For visual localization, semantic meaning is actually not required; consistent and stable segmentation labels are enough. The goal of this paper is only to output finer-grained labels for localization, not to recover semantics: under the paper's training scheme the labels come from k-means, and k-means has no notion of semantics. The paper therefore studies the correlation between the predicted class IDs (cluster indices, on one axis) and the dataset's ground-truth semantic labels (on the other axis), as shown below (a toy computation of such a matrix is sketched after the figure):
[Figure: mutual information between predicted cluster indices and ground-truth semantic classes]
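For reference, such a cluster-vs-class mutual-information matrix could be computed along the following lines (a hedged sketch; the paper's exact protocol may differ, and `cluster_maps`/`gt_maps` are hypothetical inputs):

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def cluster_class_mi(cluster_maps, gt_maps, num_clusters, num_classes):
    """Mutual information between each predicted cluster index and each
    ground-truth class, computed from flattened per-pixel label maps
    (slow but simple; intended only as an illustration)."""
    pred = np.concatenate([m.ravel() for m in cluster_maps])
    gt = np.concatenate([m.ravel() for m in gt_maps])

    mi = np.zeros((num_clusters, num_classes))
    for c in range(num_clusters):
        for k in range(num_classes):
            # binary views: "pixel is in cluster c" vs "pixel has class k"
            mi[c, k] = mutual_info_score(pred == c, gt == k)
    return mi
```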

From this we can observe:

  1. When the network is trained with only 20 clusters, cluster 19 has high mutual information with most of the ground-truth classes, i.e. many semantic classes collapse into a single cluster, so semantic information is lost.
  2. The vegetation class has high mutual information with a large fraction of the predicted clusters. The reason is that the CMU dataset contains a lot of vegetation, and splitting it into several clusters makes the representation more discriminative, which helps visual localization.
  3. Many predicted cluster indices show little correlation with any particular semantic label, which indicates that the network weights have changed considerably from their initial values (initialized from weights trained on a semantic segmentation dataset).

Experimental validation

Localization method used: SSMC (simple semantic match consistency). 2D-3D matches are established from local feature points, where the 3D points come from an SfM reconstruction. Matches whose 2D and 3D segmentation labels are inconsistent are then discarded, and the camera pose is estimated with P3P inside RANSAC; a rough sketch follows.
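A minimal sketch of this filtering plus P3P RANSAC step, using OpenCV's `solvePnPRansac` (the pipeline in the paper may differ in its details; all variable names are placeholders):

```python
import numpy as np
import cv2

def localize_ssmc(pts2d, pts3d, labels2d, labels3d, K):
    """Illustrative SSMC-style localization: drop 2D-3D matches whose
    segmentation labels disagree, then estimate the pose with P3P + RANSAC."""
    keep = labels2d == labels3d                       # label-consistent matches only
    obj = np.ascontiguousarray(pts3d[keep], dtype=np.float64)
    img = np.ascontiguousarray(pts2d[keep], dtype=np.float64)
    if len(obj) < 4:
        raise RuntimeError("not enough label-consistent matches")

    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj, img, K, distCoeffs=None,
        flags=cv2.SOLVEPNP_P3P,
        reprojectionError=8.0, iterationsCount=1000)
    if not ok:
        raise RuntimeError("pose estimation failed")
    R, _ = cv2.Rodrigues(rvec)                        # 3x3 rotation matrix
    return R, tvec, inliers
```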
[Figure: localization results for different numbers of clusters, pre-training tasks, and loss terms]

From the results above:

  • Using 100 clusters gives the best accuracy (too few or too many clusters both hurt localization accuracy).
  • Pre-training the network on a semantic segmentation task works better than pre-training on a classification task.
  • Using the fine-grained segmentation gives better results than using a standard semantic segmentation.
  • The loss term built from 2D-2D correspondences significantly improves localization performance (a sketch of such a term follows this list).
  • Generalization to other datasets is poor. Since the 2D-2D correspondence information is specific to each dataset, this result is not surprising.
  • Repetition of clustering: following the method of Caron et al. [15], the clustering is repeated after a set number of training iterations. Interestingly, the authors note that not resetting the network actually gives slightly better performance (see the entry marked with * in Table 2). They attribute this to the network, pretrained for semantic segmentation, retaining semantic information more easily when it is not reset; further investigation is left as future work.
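As a hedged sketch of what such a 2D-2D correspondence term can look like (assuming a cross-entropy between the prediction at a point in one image and the hard cluster assignment at its corresponding point in the other image; names are hypothetical, not the paper's exact loss):

```python
import torch.nn.functional as F

def correspondence_loss(logits_a, logits_b, pts_a, pts_b):
    """logits_*: (K, H, W) per-pixel class scores for two images of the same
    scene; pts_*: (N, 2) long tensors of corresponding pixel coords (x, y).
    The prediction at a point in image A is pushed towards the hard label
    predicted at the corresponding point in image B."""
    xa, ya = pts_a[:, 0], pts_a[:, 1]
    xb, yb = pts_b[:, 0], pts_b[:, 1]
    pred_a = logits_a[:, ya, xa].t()                 # (N, K) scores in image A
    target_b = logits_b[:, yb, xb].argmax(dim=0)     # (N,)  cluster ids in image B
    return F.cross_entropy(pred_a, target_b)
```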

Comparison with other state-of-the-art methods:
[Figure: comparison with state-of-the-art localization methods]

  • SSMC (simple semantic match consistency), as described above.
  • GSMC: SSMC with additional geometric verification, i.e. during RANSAC each pose hypothesis is scored by the 2D-3D label consistency it induces (see the sketch after this list).
  • PFSL: a particle-filter-based pose estimation method that uses a pose prior.
    All three localization methods are combined with the FGSN segmentation output and compared against existing state-of-the-art methods.
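A minimal sketch of how a GSMC-style hypothesis score based on 2D-3D label consistency could be computed (assuming a pinhole projection with known intrinsics `K`; this is illustrative, not the paper's exact scoring function):

```python
import numpy as np

def gsmc_score(R, t, K, pts3d, labels3d, seg_map):
    """Score a candidate pose (R, t) by projecting the labelled 3D points and
    counting how many land on a pixel whose predicted label matches."""
    P = K @ np.hstack([R, t.reshape(3, 1)])           # 3x4 projection matrix
    homog = np.hstack([pts3d, np.ones((len(pts3d), 1))])
    proj = (P @ homog.T).T
    uv = proj[:, :2] / proj[:, 2:3]
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)

    h, w = seg_map.shape
    inside = (proj[:, 2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    consistent = seg_map[v[inside], u[inside]] == labels3d[inside]
    return consistent.sum() / max(len(pts3d), 1)      # fraction of consistent points
```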
Conclusion:

With the fine-grained segmentation, the localization methods come close to the state of the art ("improves localization performance, closing the performance gap to the current state-of-the-art"), demonstrating that the proposed fine-grained segmentation idea is effective for localization and markedly improves robustness to seasonal and viewpoint changes.
