- Three components: the 2048-channel feature map, the prototype (the importance of each channel), and GAP
- feature map * prototype gives the CAM; some CAM values are very large, up to 300+
- If we assume the feature map can localize sub-discriminative regions and only the prototype fails to pick them, the fix is to design multiple groups of prototypes
- If we assume the feature map itself cannot show sub-discriminative regions, the fix is to erase/suppress the strongly discriminative region: erase it on the input image, or suppress it on the feature map
- Or correct the feature map directly, e.g. add a foreground branch and multiply it with the feature map
Anything that gets visualized lies in [0, 1]: does a value of 389 map to a color? No. Both feature maps and CAMs are normalized before visualization.
If a sub-discriminative region is absent from the feature map, no weight w, however large or small, can bring it back. So the feature map must contain the sub-discriminative region before it can possibly appear in the CAM, e.g. by clipping the strongest responses with f[f > 0.7] = 0.7
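A toy numpy sketch of that clipping trick (values are illustrative): capping the strongest responses before normalization raises the relative value of the secondary region, so it survives visualization.

```python
import numpy as np

# One toy feature-map channel: a strong peak (primary discriminative
# region, 1.0) and a weaker peak (sub-discriminative region, 0.4).
f = np.array([[0.05, 0.10, 0.05],
              [0.10, 1.00, 0.40],
              [0.05, 0.10, 0.05]], dtype=np.float32)

def normalize(x):
    return (x - x.min()) / (x.max() - x.min() + 1e-5)

plain = normalize(f)          # weak peak is ~0.37 after normalization

clipped = f.copy()
clipped[clipped > 0.7] = 0.7  # the f[f > 0.7] = 0.7 clip from the note
clipped = normalize(clipped)  # weak peak rises to ~0.54
```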
Multiple prototypes:
self.classifier = nn.Conv2d(2048, (self.num_cls)*8, 1, bias=False)
def forward(self, x, valid_mask):
    N, C, H, W = x.size()
    # backbone forward
    x0 = self.stage0(x)
    x1 = self.stage1(x0)
    x2 = self.stage2(x1).detach()
    x3 = self.stage3(x2)
    x4 = self.stage4(x3)
    cam = self.classifier(x4)  # (N, 20*8, h, w): 8 prototype maps per class
    # fuse the 8 prototype maps of each class by averaging
    batch_size, cc, hh, ww = cam.size()
    cam_multimaps = cam.view(batch_size, 20, 8, hh, ww)
    cam = torch.sum(cam_multimaps, 2) / 8
    score = F.adaptive_avg_pool2d(cam, 1)  # GAP -> (N, 20, 1, 1)
Even a tiny highlighted red spot means the values fed into GAP are already large; maxima of 351 were observed, verified experimentally.
The prototype encodes the importance of each feature map channel.
self.classifier = nn.Conv2d(2048, 20, 1, bias=False)

def forward(self, x, valid_mask):
    N, C, H, W = x.size()
    # backbone forward
    x0 = self.stage0(x)
    x1 = self.stage1(x0)
    x2 = self.stage2(x1).detach()
    x3 = self.stage3(x2)
    x4 = self.stage4(x3)
    cam_20ch = self.classifier(x4)  # 2048 -> 20 channels, one per class
    # No ReLU here: negative responses can partially cancel positives,
    # which tends to enlarge the discriminative region.
    score = F.adaptive_avg_pool2d(cam_20ch, 1).view(N, -1)  # GAP -> (N, 20)

loss_cls = F.multilabel_soft_margin_loss(score, label)
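To make the role of the prototype concrete, here is a toy numpy sketch (4 channels standing in for 2048, invented values): the CAM of a class is the prototype-weighted sum of feature-map channels, which is exactly what the 1x1 convolution computes, and GAP of that CAM is the class score.

```python
import numpy as np

C, H, W = 4, 2, 2  # toy sizes instead of 2048 channels
feat = np.arange(C * H * W, dtype=np.float32).reshape(C, H, W)
proto = np.array([0.5, -0.2, 0.0, 1.0], dtype=np.float32)  # one class's prototype

# CAM = sum_c proto[c] * feat[c]; the 1x1 conv does this at every pixel.
cam = np.tensordot(proto, feat, axes=1)  # (H, W)

# GAP over the CAM gives the classification score for this class.
score = cam.mean()
```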
Semantic segmentation results:
classification (PASCAL VOC 2012 train set)
classification → CRF (PASCAL VOC 2012 train set)
classification → CRF → IRN → DeepLab (PASCAL VOC 2012 val and test sets)
- PASCAL VOC 2012 train set: vanilla CAM with ResNet-50 reaches about 48.6%; "Self-supervised Image-specific Prototype Exploration for Weakly Supervised Semantic Segmentation" (SIPE) reaches 58.6% with ResNet-50
- ResNet-38 results are a bit better than ResNet-50
- CRF post-processing is very common and brings roughly a 5% gain; the CRF code is in IRN's cam_to_ir_label.py
CAM visualization only covers the foreground heatmaps; for the final segmentation result, the background region has to be brought in.
Background handling ①: IRN simply fixes the background score at 0.15, which gives the best result; SEAM (Wang Yude's paper) and AdvCAM instead sweep thresholds 0.15, 0.16, 0.17, 0.18, 0.19, ...
AdvCAM/run_sample.py at main · jbeomlee93/AdvCAM · GitHub, around line 106
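A hedged numpy sketch of the fixed-threshold trick (toy CAM values; the real script sweeps the thresholds and keeps whichever gives the best train-set mIoU):

```python
import numpy as np

# Toy normalized CAMs for 2 foreground classes over a 2x2 image.
cams = np.array([[[0.9, 0.05],
                  [0.1, 0.2]],
                 [[0.1, 0.05],
                  [0.5, 0.1]]], dtype=np.float32)

def add_bg_channel(cams, bg_thresh):
    # Concatenate a constant background score as channel 0 (IRN uses 0.15);
    # argmax then assigns background wherever every foreground CAM is below it.
    bg = np.full((1,) + cams.shape[1:], bg_thresh, dtype=cams.dtype)
    return np.concatenate([bg, cams], axis=0).argmax(0)

# Sweep candidate thresholds the way the sweep above describes.
for t in (0.15, 0.16, 0.17, 0.18, 0.19):
    pred = add_bg_channel(cams, t)  # 0 = background, 1..N = classes
```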
Background handling ②: use saliency maps as an auxiliary cue
- Download saliency maps used for background cues.
- GitHub - qjadud1994/DRS: Discriminative Region Suppression for Weakly-Supervised Semantic Segmentation
DRS uses saliency like this:
for idx, dat in tqdm(enumerate(val_loader)):
    img, label, sal_map, gt_map, _ = dat
    logit, cam = model(img, label)

    """ obtain CAMs """
    cam = cam.cpu().detach().numpy()
    gt_map = gt_map.detach().numpy()
    sal_map = sal_map.detach().numpy()

    """ segmentation label generation """
    cam[cam < 0.2] = 0  # object cue
    B, _, H, W = cam.shape
    bg = np.zeros((B, 1, H, W), dtype=np.float32)
    pred_map = np.concatenate([bg, cam], axis=1)  # [B, 21, H, W]
    pred_map[:, 0, :, :] = (1. - sal_map)  # background cue: low saliency -> background
    pred_map = pred_map.argmax(1)

    mIOU.add_batch(pred_map, gt_map)
Background handling ③: the SIPE paper ("Self-supervised Image-specific Prototype Exploration for Weakly Supervised Semantic Segmentation") uses the method of "Unlocking the Potential of Ordinary Classifier: Class-specific Adversarial Erasing Framework for Weakly Supervised Semantic Segmentation"
===============================================================
Processing the 20 foreground channels: many papers inject the image-level GT label here, turning the channels of absent classes entirely to 0.
From Wang Yude's code:
e = 1e-5  # small epsilon for numerical stability (not defined in the original snippet)
cam = F.relu(cam)
max_v = torch.max(cam.view(N, C, -1), dim=-1)[0].view(N, C, 1, 1)
min_v = torch.min(cam.view(N, C, -1), dim=-1)[0].view(N, C, 1, 1)
cam = F.relu(cam - min_v - e) / (max_v - min_v + e)  # per-channel min-max normalization
cam = cam * label  # zero out channels of classes absent from the image
cam[:, 0, :, :] = 1 - torch.max(cam[:, 1:, :, :], dim=1)[0]  # background = 1 - max foreground
cam_max = torch.max(cam[:, 1:, :, :], dim=1, keepdim=True)[0]
cam[:, 1:, :, :][cam[:, 1:, :, :] != cam_max] = 0  # keep only the strongest class per pixel
Additionally, the following is applied:
cam[cam < 0.2] = 0
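The background and keep-max steps above can be checked on toy data; the numpy sketch below (batch 1, two foreground classes, a 1x2 grid, invented values) mirrors the in-place masked assignment:

```python
import numpy as np

# cam: (N, 1 + num_fg, H, W); channel 0 is reserved for background.
cam = np.array([[[[0.0, 0.0]],    # background, filled below
                 [[0.8, 0.2]],    # foreground class 1
                 [[0.3, 0.6]]]],  # foreground class 2
               dtype=np.float32)

# Background = 1 - max over the foreground channels.
cam[:, 0] = 1 - cam[:, 1:].max(axis=1)

# Keep only the strongest foreground channel at each pixel, zero the rest.
fg = cam[:, 1:]                    # a view into cam
cam_max = fg.max(axis=1, keepdims=True)
fg[fg != cam_max] = 0              # in-place, so cam is updated too

# Finally, drop weak responses.
cam[cam < 0.2] = 0
```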
===============================================================
Method 1: obtain the CAM directly from the 20×128×128 feature map, unlike Bolei Zhou's original CAM.
In the classification code, the feature map goes from 2048 channels to 20, with no ReLU; GAP is applied directly, giving a logit of shape (20, 1) that is fed straight into F.multilabel_soft_margin_loss, as below.
F.multilabel_soft_margin_loss applies sigmoid internally, mapping values into (0, 1):
loss = F.multilabel_soft_margin_loss(logit, label)
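For reference, the loss can be reproduced by hand; the numpy sketch below (single sample, invented logits) implements the documented formula: mean over classes of binary cross-entropy on sigmoid(logit).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multilabel_soft_margin(logit, label):
    # Mean over classes of BCE on sigmoid(logit), the formula
    # F.multilabel_soft_margin_loss documents.
    p = sigmoid(logit)
    return -np.mean(label * np.log(p) + (1 - label) * np.log(1 - p))

logit = np.array([2.0, -1.0, 0.5])
label = np.array([1.0, 0.0, 1.0])
loss = multilabel_soft_margin(logit, label)  # ~0.305
```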
Method 2:
Both the IRN and SEAM codebases follow the same three steps:
1. train the classification network
2. make_cam
3. eval_cam (check the segmentation mIoU)
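Step 3's metric can be sketched minimally (a hypothetical helper, not IRN's actual eval_cam code): mIoU is computed from a confusion matrix between the predicted and ground-truth label maps.

```python
import numpy as np

def miou(pred, gt, num_cls):
    # Accumulate a confusion matrix, ignoring pixels labelled >= num_cls
    # (e.g. the 255 "ignore" label in PASCAL VOC).
    conf = np.zeros((num_cls, num_cls), dtype=np.int64)
    mask = gt < num_cls
    np.add.at(conf, (gt[mask], pred[mask]), 1)
    inter = np.diag(conf)
    union = conf.sum(0) + conf.sum(1) - inter
    return (inter / np.maximum(union, 1)).mean()

pred = np.array([[0, 1], [1, 1]])
gt   = np.array([[0, 1], [0, 1]])
m = miou(pred, gt, 2)  # IoU = [1/2, 2/3] -> mIoU = 7/12
```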
CAM characteristics: over-activation & under-activation.
For small objects, the CAM covers the whole object but also over-activates some background regions; this makes the prediction spill past the object boundary, giving false detections.
For large objects, the CAM covers only the most salient region: non-detection, i.e. false negatives (FNs).