[Few-Shot Segmentation Paper Reading Notes] Prototype Mixture Models for Few-Shot Semantic Segmentation, ECCV, 2020

RaymondLove~

Article link: https://blog.csdn.net/Emma_Love/article/details/112354989

Abstract

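Target problem: a single prototype ==> semantic ambiguity problem

To address this, the paper proposes Prototype Mixture Models (PMMs).

PMMs use multiple prototypes, each corresponding to a different image region, which improves their semantic representation capacity.
The prototypes are estimated with the EM algorithm, so that PMMs carry rich channel-wise and spatial semantic information.
PMMs serve both as representations and as classifiers: they activate the foreground information in the query image while suppressing background regions.

Code: https://github.com/Yang-Bob/PMMs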

1. Introduction

Inspiration

Current few-shot segmentation methods are mostly built on the metric-learning framework and mainly adopt a prototype model. Existing methods have two shortcomings:

The prototype is computed by global average pooling (GAP) guided by ground-truth masks. GAP completely drops the spatial layout of objects ==> semantics from different object parts are easily mixed.
A single prototype cannot carry sufficient information, so its representation capacity is limited.

Both shortcomings lead to the semantic ambiguity problem.
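The mixing effect of mask-guided average pooling can be seen in a minimal pure-Python sketch. The function name `masked_average_pooling` and the toy feature map are illustrative assumptions, not code from the PMMs repository.

```python
def masked_average_pooling(features, mask):
    """Average the C-dim feature vectors at positions where mask is 1.

    features: H x W nested list of C-dim lists.
    mask:     H x W nested list of 0/1 (ground-truth foreground mask).
    """
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, m in zip(feat_row, mask_row):
            if m:
                count += 1
                for c in range(channels):
                    total[c] += vec[c]
    if count == 0:
        raise ValueError("mask selects no foreground position")
    return [t / count for t in total]


# Toy 2x2 feature map with C = 2. The two foreground positions stand for
# two different object parts with orthogonal features.
features = [[[1.0, 0.0], [0.0, 1.0]],
            [[5.0, 5.0], [5.0, 5.0]]]
mask = [[1, 1],
        [0, 0]]
prototype = masked_average_pooling(features, mask)
print(prototype)  # [0.5, 0.5]
```

The single prototype `[0.5, 0.5]` matches neither of the two foreground parts, and their positions no longer matter, which is the ambiguity the notes describe.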

Main Work:

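To solve this problem, the paper proposes PMMs. During training, the prototypes are estimated with the EM algorithm; foreground and background are modeled separately, each with its own set of multiple prototypes, to improve the model's discriminative power.

Advantages:

On one hand, PMMs serve as a spatially squeezed representation, which is matched (P-Match) with the query features to activate the feature channels related to the object class.
On the other hand, each prototype vector can be regarded as a C-dimensional linear classifier, which is multiplied with the query features element-wise (P-Conv) to produce a probability map.

In summary, PMMs capture both channel-wise and spatial semantic information, which substantially improves the accuracy of semantic segmentation on the query image.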

2. Related Work

Few-Shot Segmentation + Key references (to read)

Ref 12-19, 35
- 12. Tokmakov, P., Wang, Y., Hebert, M.: Learning compositional representations for few-shot recognition. In: IEEE ICCV. (2019) 6372–6381
- 13. Nguyen, K., Todorovic, S.: Feature weighting and boosting for few-shot segmentation. In: IEEE ICCV. (2019) 622–63
- 14. Shaban, A., Bansal, S., Liu, Z., Essa, I., Boots, B.: One-shot learning for semantic segmentation. In: BMVC. (2017)
- 15. Zhang, X., Wei, Y., Yang, Y., Huang, T.: SG-One: Similarity guidance network for one-shot semantic segmentation. CoRR abs/1810.09091 (2018)
- 16. Dong, N., Xing, E.P.: Few-shot semantic segmentation with prototype learning. In: BMVC. (2018) 79
- 17. Hao, F., He, F., Cheng, J., Wang, L., Cao, J., Tao, D.: Collect and select: Semantic alignment metric learning for few-shot learning. In: IEEE ICCV. (2019) 8460–8469
- 18. Wang, K., Liew, J., Zou, Y., Zhou, D., Feng, J.: PANet: Few-shot image semantic segmentation with prototype alignment. (2019) 622–631
- 19. Zhang, C., Lin, G., Liu, F., Yao, R., Shen, C.: CANet: Class-agnostic segmentation networks with iterative refinement and attentive few-shot learning. In: IEEE CVPR. (2019) 5217–5226
- 35. Rakelly, K., Shelhamer, E., Darrell, T., Efros, A.A., Levine, S.: Conditional networks for few-shot semantic segmentation. In: ICLR Workshop. (2018)
Ref 31, 4 (ASPP), 19, 32
- 31. Snell, J., Swersky, K., Zemel, R.S.: Prototypical networks for few-shot learning. In: NeurIPS. (2017) 4077–4087
- 4. Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4) (2018) 834–84
- 32. Banerjee, A., Dhillon, I.S., Ghosh, J., Sra, S.: Clustering on the unit hypersphere using von Mises-Fisher distributions. J. Mach. Learn. Res. 6 (2005) 1345–1382

3. Proposed Method

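The proposed method also follows the metric-learning framework and consists of two branches, a support branch and a query branch. The feature extraction networks of the two branches share weights and extract the feature maps $S, Q \in \mathbb{R}^{W \times H \times C}$ from the support image and the query image respectively, where $W \times H$ is the resolution of the feature map and $C$ is the number of channels.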

3.1 Prototype Mixture Models (PMMs)

Corresponding code: the generate_prototype() function in PMM.py

PMM models:

A probability mixture model that linearly combines probabilities from the base distributions.

$p_k(s_i | \theta)$ denotes the k-th base model, i.e. a probability model based on a kernel distance function, which can be written as:

The kernel distance function can be a Gaussian model (RBF distance) or a VMF model (cosine distance).

Because the method is built on the metric-learning framework, a vector distance function is more appropriate, i.e. the cosine distance of the VMF model:

==> $p_k(s_i | \theta)$ can be written as:

$\mu_k$ is the mean vector of the k-th model, i.e. the prototype.

The parameters of this base model are:
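The formulas of this subsection were images in the original post and are not reproduced in the text. The block below is a reconstruction from the surrounding definitions, using the standard mixture-model and von Mises-Fisher forms. The normalization term $\beta$ and the concentration parameter $\kappa$ are symbol choices assumed here and should be checked against the paper.

```latex
% Eq. (1): mixture of K base models with mixture coefficients w_k
p(s_i \mid \theta) = \sum_{k=1}^{K} w_k \, p_k(s_i \mid \theta)

% Base model built on a kernel distance function
p_k(s_i \mid \theta) = \beta(\theta) \, e^{\mathrm{Kernel}(s_i, \mu_k)}

% VMF (cosine) kernel with concentration \kappa
p_k(s_i \mid \theta) = \beta_c(\kappa) \, e^{\kappa \mu_k^{\top} s_i},
\qquad \theta = \{\mu, \kappa\}
```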

Model learning: the EM algorithm is used to estimate $\mu^+$ and $\mu^-$ as the prototype vectors:

Note: in this paper, the mixture coefficients $w_k$ of the PMM model (Eq. (1)) are all set to the same weight.

In addition, each prototype vector is the mean of a cluster of samples ==> because of the receptive field effect, each prototype represents a region around an object part in the original image.
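The EM estimation with a cosine (VMF) kernel and equal mixture weights can be sketched as follows. This is a pure-Python illustration, not the code in PMM.py: the function names, the default `kappa=20.0`, and the deterministic farthest-point initialisation are all assumptions made for this example.

```python
import math


def l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def em_prototypes(samples, k, kappa=20.0, iters=10):
    """Estimate k unit-norm prototypes from feature vectors (VMF kernel)."""
    samples = [l2_normalize(s) for s in samples]
    dim = len(samples[0])
    # Initialisation (assumption of this sketch): start from the first sample,
    # then repeatedly add the sample least similar to the chosen prototypes.
    mus = [list(samples[0])]
    while len(mus) < k:
        nxt = min(samples, key=lambda s: max(dot(mu, s) for mu in mus))
        mus.append(list(nxt))
    for _ in range(iters):
        # E-step: responsibility of each prototype for each sample, i.e. a
        # softmax over kappa * cosine similarity (equal mixture weights).
        resp = []
        for s in samples:
            logits = [kappa * dot(mu, s) for mu in mus]
            top = max(logits)
            exps = [math.exp(l - top) for l in logits]
            z = sum(exps)
            resp.append([e / z for e in exps])
        # M-step: each prototype becomes the responsibility-weighted mean of
        # the samples, projected back onto the unit sphere.
        for j in range(k):
            weight = sum(r[j] for r in resp)
            if weight == 0.0:
                continue
            mean = [sum(r[j] * s[c] for r, s in zip(resp, samples)) / weight
                    for c in range(dim)]
            mus[j] = l2_normalize(mean)
    return mus


# Six toy feature vectors forming two directions ("object parts").
samples = [[1.0, 0.1], [0.9, 0.0], [1.0, 0.05],
           [0.0, 1.0], [0.1, 0.9], [0.05, 1.0]]
prototypes = em_prototypes(samples, k=2)
print(prototypes)
```

In the method described above, the same procedure would be run separately on foreground and background support features to obtain $\mu^+$ and $\mu^-$; the ten iterations match the "10 rounds per image" setting reported in the experiments.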

3.2 Few-Shot Segmentation

P-Match (PMMs as Representation)

Code: discriminative_model in PMM.py

Each foreground prototype carries the representation of one object part; combined, they carry more information and cover the complete object extent as far as possible. The foreground prototypes can therefore be used to match and activate the query features Q.

P-Conv (PMMs as classifiers)

Each prototype vector carries discriminative information and can be regarded as a classifier.

The resulting probability maps are then concatenated with $Q'$ to obtain the activated query map $Q''$:

At this point, the semantic information across channels and the discriminative information related to object parts have been collected from the support features to activate the query features Q in a dense comparison manner.

Finally, the activated query map $Q''$ is fed into ASPP and then into a convolutional layer ==> the predicted mask.
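The two uses of the prototypes can be illustrated with a small pure-Python sketch. It is one plausible reading of the description, not the repository code. `p_match` only shows the concatenation of one prototype with every query feature and leaves out the learned convolution that fuses the result. `p_conv` assumes that the scores of all prototypes are normalised with a softmax and then summed per group (foreground, background). Both function names are hypothetical.

```python
import math


def p_match(query, prototype):
    """Concatenate one prototype to the feature vector of every position."""
    return [list(q) + list(prototype) for q in query]


def p_conv(query, fg_protos, bg_protos):
    """Per-position (foreground, background) probabilities.

    Each prototype acts as a linear classifier (a 1x1 convolution kernel):
    its score at a position is the dot product with the query feature there.
    """
    protos = list(fg_protos) + list(bg_protos)
    maps = []
    for q in query:
        scores = [sum(p_c * q_c for p_c, q_c in zip(p, q)) for p in protos]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        probs = [e / z for e in exps]
        fg = sum(probs[:len(fg_protos)])
        maps.append((fg, 1.0 - fg))
    return maps


# Query feature map flattened to a list of positions, C = 2.
query = [[1.0, 0.0], [0.0, 1.0]]
fg_protos = [[4.0, 0.0]]
bg_protos = [[0.0, 4.0]]

matched = p_match(query, fg_protos[0])         # stands in for Q' before fusion
probs = p_conv(query, fg_protos, bg_protos)    # probability maps
activated = [m + list(p) for m, p in zip(matched, probs)]  # stands in for Q''
print(activated[0])
```

In this toy case the first position is scored as foreground with probability close to 1 and the second close to 0, and each activated vector has 2 + 2 + 2 = 6 channels.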

Segmentation Model Learning

The segmentation module is implemented in an end-to-end framework, see Algorithm 1:

3.3 Residual Prototype Mixture Models (RPMM)

Implement an ensemble model by stacking multiple PMMs ==> RPMM.

RPMM leverages the residual from the previous query branch to supervise the next query branch for fine-grained segmentation.

Benefit: it not only further improves model performance but also defines a new model ensemble strategy.

4. Experiments

Baseline for comparison: CANet [19] without iterative optimization

During training:

Four data augmentation methods are used:
- Normalization, horizontal flipping, random cropping, random resizing.
EM algorithm: 10 iterations per image
Cross-entropy loss + SGD
Momentum = 0.9, learning rate = 0.0035, 200,000 iterations

Datasets:

PASCAL-5i [14]+[34]: 20 classes, split into 4 groups
COCO-20i [13]: 80 classes, split into 4 groups of 20 classes each
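The group structure of the two datasets can be written down as a short helper. The notes only give the class and group counts, so the contiguous assignment of class ids to groups and the "train on the other three groups" protocol are assumptions here (the latter is the usual cross-validation setup in few-shot segmentation). The function names are illustrative.

```python
def fold_classes(fold, n_classes=20, n_folds=4):
    """1-based class ids held out for testing in the given fold (assumed contiguous)."""
    if not 0 <= fold < n_folds:
        raise ValueError("fold out of range")
    per_fold = n_classes // n_folds
    start = fold * per_fold + 1
    return list(range(start, start + per_fold))


def train_classes(fold, n_classes=20, n_folds=4):
    """Classes of the remaining folds, used for training."""
    held_out = set(fold_classes(fold, n_classes, n_folds))
    return [c for c in range(1, n_classes + 1) if c not in held_out]


print(fold_classes(0))                      # [1, 2, 3, 4, 5]
print(len(train_classes(0)))                # 15
print(len(fold_classes(3, n_classes=80)))   # 20
```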

Evaluation metric: IoU
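For reference, the IoU of a predicted binary mask against the ground truth can be computed as in this minimal sketch. The function name `binary_iou` and the convention of returning 1.0 when both masks are empty are choices made for this example; benchmark scores are typically averaged over classes.

```python
def binary_iou(pred, target):
    """Intersection over union of two H x W binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Both masks empty: treated as a perfect match in this sketch.
    return inter / union if union else 1.0


pred = [[1, 1],
        [0, 0]]
target = [[1, 0],
          [1, 0]]
iou = binary_iou(pred, target)  # 1 overlapping pixel, 3 pixels in the union
print(round(iou, 3))  # 0.333
```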

4.1 Model Analysis

Fig. 6 visualizes:

Probability maps produced by positive prototypes of PMMs
Activation maps
Segmentation masks

The comparison shows that CANet, which uses a single prototype to activate the object, tends to miss object parts or whole objects, whereas PMMs alleviate the semantic ambiguity problem.

Fig. 7 compares the segmentation results with the baseline method:

P-Match improves the recall rate by segmenting more target pixels.
PMMs reduce false positive pixels by introducing background prototypes.
RPMMs further improve the segmentation results.

4.2 Ablation Study

Table 1

PMMs
- The P-Match model of PMMs outperforms global average pooling, which indicates that PMMs produce better prototypes.
- Using background and foreground prototypes together improves performance.
RPMMs: the results demonstrate the effectiveness of the residual ensemble strategy.

Table 2: Number of prototypes

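Why does performance drop at k=5?

Reason: PMMs generate the prototypes from a single support image, whose information and representation capacity are limited; a larger k therefore increases the risk of overfitting.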

Table 3: Kernel function

VMF kernel is better than Gaussian.

Inference Speed:

Model size: OSLSM [14] (272.6M) >> RPMM (19.6M) > PMMs (19.5M) > CANet [19] (19M); PMMs are only slightly larger than CANet.

Inference speed: with k=3, PMMs run at 26 FPS and RPMM at 20 FPS, while CANet runs at 29 FPS.

4.3 Performance

Conclusion

PMMs associate diverse image regions with multiple prototypes to solve the semantic ambiguity problem.

During training, rich channel-wise and spatial semantic information from the support images is incorporated.
During inference, PMMs are matched with the query images in a duplex manner, which yields accurate segmentation.
The method achieves good results on both PASCAL-5i and COCO-20i.
It can also be extended to few-shot learning problems.

Comments

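Strengths of the paper:

It makes use of multiple prototypes.
It also uses both background and foreground prototype information to activate the query features.

(The code is well written and worth studying.)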
