Paper Reading Notes: Learning Deep Features for Discriminative Localization

Introduction

Task

Weakly supervised image classification and localization (detection).
Related work:

  • Weakly supervised object localization
  • Visualizing CNNs

Method

Class Activation Mapping (CAM)

The CAM technique shows, in a detailed yet concise way, how a CNN can be used for object localization (detection) and for visualization. The idea is simple and mainly builds on global average pooling (GAP).

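As a quick illustration of the GAP building block (not from the paper; the tensor shape below is an assumption for a GoogLeNet-like last conv layer), GAP simply collapses each channel's spatial map into a single number:

```python
import torch

# Assume the last conv layer outputs feature maps of shape (batch, n, H, W).
feature_maps = torch.randn(1, 1024, 14, 14)    # f_k(x, y) for k = 1..n

# Global average pooling: one scalar F_k per channel. The paper writes F_k as
# a spatial sum; averaging differs only by the constant factor 1/(H*W), which
# is absorbed into the learned weights w_k^c.
F_k = feature_maps.mean(dim=(2, 3))            # shape (batch, n)
```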

  • Firstly, take the feature maps of the last convolutional layer, $f_k(x, y)$, where $f_k$ is the $k$-th channel's feature map and $n$ is the number of channels.

  • Secondly, apply global average pooling to obtain $F_k$:
    $F_k = \sum_{x,y} f_k(x, y)$

  • Thirdly, apply an FC layer to obtain the class score $S_c$, which is used to compute the softmax cross-entropy loss for training:
    $S_c = \sum_{k} w_k^c F_k$

  • Finally, for every class $c$ the class activation map is obtained from the weights $w_k^c$. $M_c(x, y)$ has the same resolution as $f_k(x, y)$, and it can be upsampled to produce the final map at the original image size (a code sketch covering all four steps follows this list):
    $M_c(x, y) = \sum_{k} w_k^c f_k(x, y)$
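Putting the four steps together, here is a minimal PyTorch sketch (my own toy model: the backbone, the names `GAPClassifier` and `class_activation_map`, and the 224x224 input size are assumptions for illustration, not the paper's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GAPClassifier(nn.Module):
    """Toy CNN with the conv -> GAP -> single-FC head that CAM relies on."""

    def __init__(self, num_classes=1000, channels=1024):
        super().__init__()
        # Stand-in backbone; stride 16 gives coarse 14x14 maps for 224x224 input.
        self.features = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=7, stride=16, padding=3),
            nn.ReLU(inplace=True),
        )
        self.fc = nn.Linear(channels, num_classes, bias=False)  # weights w_k^c

    def forward(self, x):
        fmap = self.features(x)          # f_k(x, y): (B, n, H, W)
        pooled = fmap.mean(dim=(2, 3))   # GAP -> F_k: (B, n)
        scores = self.fc(pooled)         # S_c = sum_k w_k^c * F_k
        return scores, fmap


def class_activation_map(model, image, class_idx, out_size):
    """M_c(x, y) = sum_k w_k^c * f_k(x, y), upsampled to out_size."""
    scores, fmap = model(image)                           # fmap: (1, n, H, W)
    weights = model.fc.weight[class_idx]                  # w_k^c: (n,)
    cam = (weights[:, None, None] * fmap[0]).sum(dim=0)   # (H, W)
    cam = F.interpolate(cam[None, None], size=out_size,
                        mode="bilinear", align_corners=False)[0, 0]
    return cam


# Usage: CAM for the top-scoring class of a random input image.
model = GAPClassifier()
img = torch.randn(1, 3, 224, 224)
scores, _ = model(img)
top_class = scores.argmax(dim=1).item()
cam = class_activation_map(model, img, top_class, out_size=(224, 224))
```

In the paper's setting the backbone is a VGG or GoogLeNet conv stack with the layers after the last convolution removed, followed by exactly this GAP + FC head.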

Experiments

Classification results

Compared with the original networks (VGG, GoogLeNet, etc.), using GAP incurs only a small drop of 1%-2% in classification accuracy.

Localization

Compared with fully supervised methods, CAM still trails by a large margin in localization accuracy, but, after all, this method uses no bounding box annotations.

Conclusion

  • It is important that classification-trained CNNs can learn to localize objects without using any bounding box annotations.
  • The class activation mapping method transfers easily to other tasks, for example captioning and VQA.