Paper Reading Notes: Learning Deep Features for Discriminative Localization

Introduction

Task

Weakly supervised image classification and localization (detection).
Related work:

  • Weakly supervised object localization
  • Visualizing CNNs

Method

Class Activation Mapping (CAM)

The CAM technique shows, in a detailed yet concise way, how a CNN can be used for object localization (detection) and for visualization. The idea is simple and mainly builds on global average pooling (GAP).

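As a quick illustration of the GAP building block (not from the paper; the tensor shape below is an assumption for a GoogLeNet-like last conv layer), GAP simply collapses each channel's spatial map into a single number:

```python
import torch

# Assume the last conv layer outputs feature maps of shape (batch, n, H, W).
feature_maps = torch.randn(1, 1024, 14, 14)    # f_k(x, y) for k = 1..n

# Global average pooling: one scalar F_k per channel. The paper writes F_k as
# a spatial sum; averaging differs only by the constant factor 1/(H*W), which
# is absorbed into the learned weights w_k^c.
F_k = feature_maps.mean(dim=(2, 3))            # shape (batch, n)
```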

  • Firstly, take the feature maps of the last convolutional layer, $f_k(x, y)$, where $f_k$ is the $k$-th channel's feature map and $n$ is the number of channels.

  • Secondly, apply global average pooling to obtain $F_k$:
    $F_k = \sum_{x,y} f_k(x, y)$

  • Thirdly, apply an FC layer to obtain the class score $S_c$, which is used to compute the softmax cross-entropy loss for training:
    $S_c = \sum_{k} w_k^c F_k$

  • Finally, for every class $c$ the class activation map is obtained from the weights $w_k^c$. $M_c(x, y)$ has the same resolution as $f_k(x, y)$, and it can be upsampled to produce the final map at the original image size (a code sketch covering all four steps follows this list):
    $M_c(x, y) = \sum_{k} w_k^c f_k(x, y)$
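Putting the four steps together, here is a minimal PyTorch sketch (my own toy model: the backbone, the names `GAPClassifier` and `class_activation_map`, and the 224x224 input size are assumptions for illustration, not the paper's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GAPClassifier(nn.Module):
    """Toy CNN with the conv -> GAP -> single-FC head that CAM relies on."""

    def __init__(self, num_classes=1000, channels=1024):
        super().__init__()
        # Stand-in backbone; stride 16 gives coarse 14x14 maps for 224x224 input.
        self.features = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=7, stride=16, padding=3),
            nn.ReLU(inplace=True),
        )
        self.fc = nn.Linear(channels, num_classes, bias=False)  # weights w_k^c

    def forward(self, x):
        fmap = self.features(x)          # f_k(x, y): (B, n, H, W)
        pooled = fmap.mean(dim=(2, 3))   # GAP -> F_k: (B, n)
        scores = self.fc(pooled)         # S_c = sum_k w_k^c * F_k
        return scores, fmap


def class_activation_map(model, image, class_idx, out_size):
    """M_c(x, y) = sum_k w_k^c * f_k(x, y), upsampled to out_size."""
    scores, fmap = model(image)                           # fmap: (1, n, H, W)
    weights = model.fc.weight[class_idx]                  # w_k^c: (n,)
    cam = (weights[:, None, None] * fmap[0]).sum(dim=0)   # (H, W)
    cam = F.interpolate(cam[None, None], size=out_size,
                        mode="bilinear", align_corners=False)[0, 0]
    return cam


# Usage: CAM for the top-scoring class of a random input image.
model = GAPClassifier()
img = torch.randn(1, 3, 224, 224)
scores, _ = model(img)
top_class = scores.argmax(dim=1).item()
cam = class_activation_map(model, img, top_class, out_size=(224, 224))
```

In the paper's setting the backbone is a VGG or GoogLeNet conv stack with the layers after the last convolution removed, followed by exactly this GAP + FC head.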

Experiments

Classification results

Compared with the original networks (VGG, GoogLeNet, etc.), using GAP incurs only a small drop of 1%-2% in classification accuracy.

Localization

Compared with fully supervised methods, CAM still trails by a large margin in localization accuracy, but, after all, this method uses no bounding box annotations.

Conclusion

  • It is important that classification-trained CNNs can learn to localize objects without using any bounding box annotations.
  • The class activation mapping method transfers easily to other tasks, for example captioning and VQA.