[CVPR 2019] Semantic Projection Network for Zero- and Few-Label Semantic Segmentation

Zero- and Few-Label Semantic Segmentation

Figure 1: We propose (generalized) zero- and few-label semantic segmentation tasks, i.e. segmenting classes whose labels are not seen by the model during training or the model has a few labeled samples of those classes. To tackle these tasks, we propose a model that transfers knowledge from seen classes to unseen classes using side information, e.g. semantic word embedding trained on free text corpus.

Approach

Embed class-level semantic information into the segmentation network, and use side information (e.g., semantic word embeddings trained on a free-text corpus) to transfer knowledge from seen classes to unseen classes.
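As a concrete illustration of this side information, the sketch below stacks pretrained word vectors of the class names into a fixed class-embedding matrix. The "ft + w2v" setting mentioned in Figure 4 combines fastText and word2vec vectors; the dictionaries `w2v` and `ft`, the concatenation, and the per-class L2 normalization are assumptions made here for illustration, not the authors' code.

```python
import numpy as np

def build_class_embeddings(class_names, w2v, ft):
    """Stack per-class word vectors into a fixed embedding matrix.

    w2v / ft are assumed to be dicts mapping a class name to its pretrained
    word2vec / fastText vector (illustrative inputs, not the paper's code).
    The 'ft + w2v' setting is sketched as a simple concatenation, with an
    (assumed) per-class L2 normalization.
    """
    rows = []
    for name in class_names:
        vec = np.concatenate([w2v[name], ft[name]])
        rows.append(vec / (np.linalg.norm(vec) + 1e-8))
    return np.stack(rows)  # shape: (num_classes, d_w2v + d_ft)
```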


Figure 2: Our zero-label and few-label semantic segmentation model, i.e. SPNet, consists of two steps: visual semantic embedding and semantic projection. Zero-label semantic segmentation is drawn as an instance of our model. Replacing different components of SPNet, four tasks are addressed (Solid/dashed lines show the training/test procedures respectively).

Two steps (a code sketch follows the list):
1. Visual-semantic embedding;
2. Semantic projection.
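Below is a minimal PyTorch-style sketch of the two steps, assuming a generic dilated backbone that outputs d-dimensional per-pixel features; the names `SPNetSketch`, `backbone`, and `class_embeddings` are illustrative placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPNetSketch(nn.Module):
    def __init__(self, backbone, class_embeddings):
        # class_embeddings: (num_classes, d) fixed word-embedding matrix
        super().__init__()
        self.backbone = backbone                      # step 1: visual-semantic embedding
        self.register_buffer("W", class_embeddings)   # step 2: fixed semantic projection

    def forward(self, x):
        # Step 1: map each pixel to a d-dimensional semantic feature.
        feats = self.backbone(x)                      # (B, d, H', W')
        feats = F.interpolate(feats, size=x.shape[-2:],
                              mode="bilinear", align_corners=False)
        # Step 2: project pixel features onto the class word embeddings.
        B, d, H, W = feats.shape
        feats = feats.permute(0, 2, 3, 1).reshape(-1, d)       # (B*H*W, d)
        scores = feats @ self.W.t()                            # (B*H*W, num_classes)
        return scores.view(B, H, W, -1).permute(0, 3, 1, 2)    # (B, C, H, W)
```

Training would use a standard per-pixel cross-entropy over the seen classes only; at test time the projection matrix can be extended with the unseen-class word embeddings, which is what enables zero-label prediction.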

Calibration for domain shift
The extreme case of the imbalanced data problem occurs when there are no labeled training images of unseen classes, and this results in predictions being biased to seen classes. To fix this issue, we follow [8] and calibrate the prediction by reducing the scores of seen classes, which leads to:

$$
\arg\max_{u \in \mathcal{S} \cup \mathcal{U}} \; p(\hat{y}_{ij} = u \mid x; [W_s; W_u]) - \gamma\, \mathbb{I}[u \in \mathcal{S}] \tag{5}
$$

where $\mathbb{I}[u \in \mathcal{S}] = 1$ if $u$ is a seen class and 0 otherwise, and $\gamma \in [0, 1]$ is the calibration factor tuned on a held-out validation set.
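A minimal sketch of this calibrated inference (Eq. 5), assuming per-pixel scores over the concatenated seen and unseen classes; tensor names and shapes are assumptions for illustration:

```python
import torch

def calibrated_predict(scores, seen_mask, gamma):
    """Eq. (5): subtract gamma from seen-class probabilities before the argmax.

    scores:    (B, C, H, W) scores over seen + unseen classes
    seen_mask: (C,) bool tensor, True for seen classes (the indicator I[u in S])
    gamma:     calibration factor in [0, 1], tuned on a held-out validation set
    """
    probs = torch.softmax(scores, dim=1)                      # p(y_ij = u | x; [W_s; W_u])
    probs = probs - gamma * seen_mask.view(1, -1, 1, 1).float()
    return probs.argmax(dim=1)                                # (B, H, W) predicted labels
```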

In classification, a whole image corresponds to a single class, so the semantic information has a corresponding visual region. In segmentation, however, all pixels of a given class appear largely indistinguishable, so the visual regions that carry the semantic information are much less distinct.

Experiments

Effect of word embeddings


Effect of network architecture


Effect of object size

Figure 3: mIoU of unseen classes on COCO-Stuff ordered wrt average object size (left to right).

GZLSS results

Figure 4: GZLSS results on COCO-Stuff and PASCAL VOC. We report mean IoU of unseen classes, seen classes and their harmonic mean (perception model is based on ResNet101 and the semantic embedding is ft + w2v). SPNet-C represents SPNet with calibration.

Generalized Zero-Shot Image Classification


Few-Label Semantic Segmentation Task


Qualitative results

