[Paper Notes] Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation

Latest recommended article published 2024-03-19 23:12:07

磨磨蹭蹭的笨鸟

Tags: paper reading, artificial intelligence, deep learning, computer vision

Article link: https://blog.csdn.net/weixin_46242230/article/details/128989571

Research area: weakly-supervised semantic segmentation (WSSS)

Research focus: the relative strengths and weaknesses of the BCE loss and the SCE loss, and how to apply the SCE loss to WSSS

Contents:

Abstract

Model

Takeaways

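Abstract

The abstract makes three main points:

The main cause of inaccurate pseudo masks is the binary cross-entropy loss (BCE). Because of BCE's sum-over-class pooling nature, the resulting pseudo masks can have unclear boundaries.

The paper's simple yet effective fix is to take a CAM model that has already converged under BCE training and re-activate it with a new loss, the softmax cross-entropy loss (SCE). The result is called ReCAM. SCE is chosen for its contrastive nature.

ReCAM not only produces higher-quality masks but also works as a plug-in for other CAM variants, which makes it very convenient to use.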

Extracting class activation maps (CAM) is arguably the most standard step of generating pseudo masks for weakly-supervised semantic segmentation (WSSS). Yet, we find that the crux of the unsatisfactory pseudo masks is the binary cross-entropy loss (BCE) widely used in CAM. Specifically, due to the sum-over-class pooling nature of BCE, each pixel in CAM may be responsive to multiple classes co-occurring in the same receptive field. As a result, given a class, its hot CAM pixels may wrongly invade the area belonging to other classes, or the non-hot ones may be actually a part of the class. To this end, we introduce an embarrassingly simple yet surprisingly effective method: Reactivating the converged CAM with BCE by using softmax cross-entropy loss (SCE), dubbed ReCAM. Given an image, we use CAM to extract the feature pixels of each single class, and use them with the class label to learn another fully-connected layer (after the backbone) with SCE. Once converged, we extract ReCAM in the same way as in CAM. Thanks to the contrastive nature of SCE, the pixel response is disentangled into different classes and hence less mask ambiguity is expected. The evaluation on both PASCAL VOC and MS COCO shows that ReCAM not only generates high-quality masks, but also supports plug-and-play in any CAM variant with little overhead.
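The re-activation step described in the abstract can be sketched roughly as follows. This is only one plausible reading, not the authors' implementation: it assumes that "extracting the feature pixels of a single class" means weighting the feature map by that class's CAM and averaging over space, and the names `class_specific_feature`, `fc_logits`, `sce_loss`, and `recam_fc` are hypothetical.

```python
import math

def class_specific_feature(features, cam_k):
    """Weight every location of f(x) by CAM_k, then average over space.

    features: C x H x W nested lists; cam_k: H x W values in [0, 1].
    Assumption: this CAM-weighted pooling stands in for "feature pixels
    of a single class"; the paper may define it differently.
    """
    height, width = len(cam_k), len(cam_k[0])
    return [
        sum(fmap[i][j] * cam_k[i][j]
            for i in range(height) for j in range(width)) / (height * width)
        for fmap in features
    ]

def fc_logits(weights, feature):
    """A bias-free fully-connected layer: one weight row per class."""
    return [sum(w * f for w, f in zip(row, feature)) for row in weights]

def sce_loss(logits, target):
    """Softmax cross-entropy for a single-label sample (stable log-sum-exp)."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[target]

# Toy input: C = 2 channels, H = 1, W = 2, and the CAM of class 0.
features = [[[1.0, 3.0]], [[2.0, 0.0]]]
cam_0 = [[1.0, 0.0]]
f_0 = class_specific_feature(features, cam_0)

# Hypothetical, untrained second FC layer with two classes.
recam_fc = [[0.0, 0.0], [0.0, 0.0]]
loss = sce_loss(fc_logits(recam_fc, f_0), target=0)
print(f_0, loss)
```

Each class present in an image contributes one single-label sample, so the softmax forces the classes to compete for the same feature pixels.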

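Preliminaries

CAM

The usual steps for CAM are:

Extract features with global average pooling (GAP)

Classify with a standard prediction layer (commonly the fully-connected layer of a ResNet)

Train the network with BCE (binary cross-entropy loss)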

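$\mathcal{L}_{bce} = -\frac{1}{K}\sum_{k=1}^{K}\Big( y[k]\log\sigma(z[k]) + (1-y[k])\log\big[1-\sigma(z[k])\big] \Big)$

Here z[k] is the model's prediction (logit) for class k, σ is the sigmoid activation function, and y[k] ∈ {0, 1} is the label for class k: 0 means class k is absent from the image and 1 means it is present. K is the number of classes, not counting the background.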

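This loss can also be written one class at a time:

$\mathcal{L}_{bce} = \frac{1}{K}\sum_{k=1}^{K} \ell_k$

$\ell_k = -\log\big[1-\sigma(z[k])\big]$ if class k is absent from the image ($y[k]=0$)

$\ell_k = -\log\sigma(z[k])$ if class k is present in the image ($y[k]=1$)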
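As a quick check, here is a minimal sketch of the multi-label BCE loss in both forms. It assumes the standard definition: the average over the K classes of -(y[k] log σ(z[k]) + (1 - y[k]) log(1 - σ(z[k]))). The function names and the toy logits are illustrative only.

```python
import math
from typing import Sequence

def sigmoid(x: float) -> float:
    """Logistic function: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def bce_loss(z: Sequence[float], y: Sequence[int]) -> float:
    """Compact form: one expression covering both label values."""
    total = 0.0
    for zk, yk in zip(z, y):
        p = sigmoid(zk)
        total += yk * math.log(p) + (1 - yk) * math.log(1.0 - p)
    return -total / len(z)

def bce_loss_piecewise(z: Sequence[float], y: Sequence[int]) -> float:
    """Per-class form: pick the term that matches the label."""
    terms = [
        -math.log(sigmoid(zk)) if yk == 1 else -math.log(1.0 - sigmoid(zk))
        for zk, yk in zip(z, y)
    ]
    return sum(terms) / len(terms)

# Toy example: three classes, the first and third present in the image.
logits = [2.0, -1.0, 0.5]
labels = [1, 0, 1]
print(bce_loss(logits, labels), bce_loss_piecewise(logits, labels))
```

The two functions return the same value. Each class contributes its own term and never looks at the logits of the other classes.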

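Once the model has converged, the CAM of an image for class k is obtained from:

$A_k = \mathbf{w}_k^{\top} f(x)$

Here $\mathbf{w}_k$ denotes the classification-layer weights for class k, and f(x) is the feature map extracted before it is fed into GAP. Applying the following operation to $A_k$ gives the CAM (it can be read as an activation followed by a normalization):

$\mathrm{CAM}_k(x) = \frac{\mathrm{ReLU}(A_k)}{\max(\mathrm{ReLU}(A_k))}$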
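The CAM computation just described (project the pre-GAP feature map onto the class weights, apply ReLU, divide by the maximum) might look like the following sketch. It assumes the feature map is a C x H x W nested list. The `eps` term is an added safeguard against division by zero and is not part of the description above.

```python
def class_activation_map(features, w_k, eps=1e-8):
    """Return an H x W map in [0, 1] for one class.

    features: C x H x W nested lists, the feature map before GAP.
    w_k:      length-C weight vector of the classifier for class k.
    eps:      assumption, only there to avoid dividing by zero.
    """
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    # A_k = w_k^T f(x), evaluated independently at every spatial location.
    a_k = [
        [sum(w_k[c] * features[c][i][j] for c in range(channels))
         for j in range(width)]
        for i in range(height)
    ]
    # Activation: keep only positive evidence for the class.
    relu = [[max(v, 0.0) for v in row] for row in a_k]
    # Normalization: scale so the strongest response is (almost exactly) 1.
    peak = max(max(row) for row in relu)
    return [[v / (peak + eps) for v in row] for row in relu]

# Toy example: two channels on a 2 x 2 grid.
features = [
    [[1.0, 2.0], [0.0, 1.0]],
    [[0.0, 1.0], [3.0, 1.0]],
]
cam = class_activation_map(features, w_k=[1.0, -1.0])
print(cam)
```

In this toy input the top row responds to the class and the bottom row is zeroed out by the ReLU.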

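Pseudo Masks

Because the CAM covers all classes, we need a separate mask for each class to serve as its pseudo mask. There are four ways to generate pseudo masks from CAM:

Threshold the CAM into a 0-1 mask (0 means the pixel is not in the class, 1 means it is)

Use the IRN method to turn the CAM into a pseudo mask

Iteratively use the classification network to generate the pseudo mask for each class

Combine methods 2 and 3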
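The first option in the list, thresholding, is simple enough to sketch directly. The threshold value 0.3 is an arbitrary illustrative choice. The second helper, `pseudo_label_map`, follows a common convention (take the strongest class per pixel, fall back to background) that the notes do not spell out, so treat it as an assumption.

```python
def threshold_cam(cam, threshold=0.3):
    """Turn one normalized CAM (H x W, values in [0, 1]) into a 0-1 mask."""
    return [[1 if v >= threshold else 0 for v in row] for row in cam]

def pseudo_label_map(cams, threshold=0.3):
    """Merge K per-class CAMs into one label map.

    Assumed convention: label 0 is background and label k + 1 is class k.
    A pixel takes the class with the highest CAM value, provided that
    value reaches the threshold.
    """
    height, width = len(cams[0]), len(cams[0][0])
    labels = []
    for i in range(height):
        row = []
        for j in range(width):
            scores = [cam[i][j] for cam in cams]
            best = max(range(len(scores)), key=scores.__getitem__)
            row.append(best + 1 if scores[best] >= threshold else 0)
        labels.append(row)
    return labels

# Toy example: two classes on a 1 x 3 strip.
cam_a = [[0.9, 0.1, 0.2]]
cam_b = [[0.2, 0.6, 0.1]]
print(threshold_cam(cam_a))               # per-class binary mask
print(pseudo_label_map([cam_a, cam_b]))   # merged label map
```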

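Semantic Segmentation

Next, the generated pseudo masks are used as labels to train a semantic segmentation network in a fully supervised manner. The training loss is:

$\mathcal{L}_{ss} = -\frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\sum_{k=1}^{K+1} y_{i,j}[k]\,\log\frac{\exp(z_{i,j}[k])}{\sum_{k'=1}^{K+1}\exp(z_{i,j}[k'])}$

The expression looks complicated at first glance, but the two outer sums simply run over every pixel of the image (which has size H×W), and the third sum runs over all K+1 classes (including the background). The core of the loss is the last part:

$y_{i,j}[k]\,\log\frac{\exp(z_{i,j}[k])}{\sum_{k'=1}^{K+1}\exp(z_{i,j}[k'])}$

Here $y_{i,j}[k]$ is the label for class k and $z_{i,j}[k]$ is the prediction for class k, both at pixel (i, j).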
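A minimal sketch of this pixel-wise softmax cross-entropy follows. It assumes one-hot labels stored as class indices and averages over the H×W pixels. With a one-hot label, the sum over classes collapses to the log-softmax value of the labelled class. The function names are illustrative.

```python
import math

def log_softmax(z):
    """Numerically stable log-softmax over one pixel's class logits."""
    peak = max(z)
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in z))
    return [v - log_sum_exp for v in z]

def segmentation_sce_loss(logits, labels):
    """Average softmax cross-entropy over all pixels.

    logits: H x W x (K+1) nested lists of class scores per pixel.
    labels: H x W class indices in [0, K], taken from the pseudo mask.
    """
    height, width = len(labels), len(labels[0])
    total = 0.0
    for i in range(height):
        for j in range(width):
            # One-hot y keeps only the term of the labelled class.
            total -= log_softmax(logits[i][j])[labels[i][j]]
    return total / (height * width)

# Toy example: a 1 x 2 image, three classes, uniform predictions.
logits = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
labels = [[0, 2]]
loss = segmentation_sce_loss(logits, labels)
print(loss)
```

With uniform logits every pixel contributes log(3), the loss of a three-way guess.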

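Motivation

The final segmentation quality depends mainly on the classification model. Two problems commonly arise when computing the CAM of an image:

False positive pixel: a pixel that belongs to class B is wrongly assigned to class A, where B is not the background.

False negative pixel: a pixel that belongs to class A is classified as background.

The paper also finds that these problems are especially pronounced when the BCE loss is used.

In the BCE loss, the function σ is the sigmoid function:

$\sigma(x) = \frac{1}{1+e^{-x}}$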

This loss represents the penalty strength for misclassification corresponding to x. The BCE loss is thus not class mutually exclusive—the misclassification of one class does not penalize the activation on others. This is indispensable for training multi-label classifiers. However, when extracting CAM via these classifiers, we see the drawbacks: non-exclusive activation across different classes (resulting in false positive pixels in CAM); and the activation on total classes is limited (resulting in false negative pixels) since partial activation is shared

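This passage points out that the BCE loss does not enforce mutual exclusivity between classes: misclassifying one class does not penalize the activation on the other classes. When CAM is extracted from a classifier trained this way, the non-exclusive activation across classes leads to false positive pixels in the CAM, and the limited total activation over all classes (part of the activation is shared) leads to false negative pixels.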
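One way to see the lack of mutual exclusivity is to compare gradients. With the standard definitions, the BCE gradient with respect to z[k] is (σ(z[k]) - y[k]) / K, which involves only class k. The SCE gradient is softmax(z)[k] - y[k], which depends on every logit. The sketch below checks this numerically; the function names and toy logits are illustrative.

```python
import math

def bce_grad(z, y):
    """Gradient of the class-averaged BCE loss w.r.t. each logit."""
    k = len(z)
    return [(1.0 / (1.0 + math.exp(-zk)) - yk) / k for zk, yk in zip(z, y)]

def sce_grad(z, y):
    """Gradient of softmax cross-entropy w.r.t. each logit (one-hot y)."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total - yk for e, yk in zip(exps, y)]

y = [1, 0]            # class 0 is the true class
z_low = [1.0, 0.0]    # the other class has a low logit
z_high = [1.0, 3.0]   # the other class is strongly (and wrongly) activated

# BCE: the gradient on class 0 ignores what class 1 is doing.
print(bce_grad(z_low, y)[0], bce_grad(z_high, y)[0])
# SCE: a strong competing class changes the gradient on class 0.
print(sce_grad(z_low, y)[0], sce_grad(z_high, y)[0])
```

Under BCE, raising the wrong class's logit leaves the update for the true class unchanged. Under SCE, the same change pushes the true class's logit up much harder, which is the contrastive behaviour ReCAM relies on.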