Understanding the Visualization in R-CNN

仙女修炼史

Category: Object Detection. Tags: Deep Learning

Article link: https://blog.csdn.net/weixin_45209433/article/details/120822117

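This article walks through the visualization procedure in the R-CNN paper, focusing on how region proposals obtained with Selective Search are used to visualize pool5 features. Activations are computed on about 10 million region proposals, the proposals are sorted and filtered with non-maximum suppression, and the top-scoring regions are displayed. Each pool5 unit maps to 195x195 pixels of the original input, and showing the 16 highest activations for different channels reveals the different features the network has learned.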


How to understand the visualization in R-CNN

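The figure below is the visualization figure from the R-CNN paper. I did not really understand it the first time I read it, so this time I want to work through it properly.
[Figure: Figure 4 of the R-CNN paper, top regions for six pool5 units]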
The paper says:
That is, we compute the unit's activations on a large set of held-out region proposals (about 10 million), sort the proposals from highest to lowest activation, perform non-maximum suppression, and then display the top-scoring regions.
In other words: for about 10 million region proposals, compute the unit's activation on each proposal, sort the proposals by activation from largest to smallest, apply NMS, and display the regions with the highest scores.
Even restated like this, it is still not obvious what is meant, so let's look at what the paper says next:
We visualize units from layer pool5. The pool5 feature map is 6 × 6 × 256 = 9216 dimensional. Ignoring boundary effects, each pool5 unit has a receptive field of 195×195 pixels in the original 227×227 pixel input.
Each row in Figure 4 displays the top 16 activations for a pool5 unit from a CNN that we fine-tuned on VOC 2007 trainval. Six of the 256 functionally unique units are visualized (Appendix D includes more).

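We visualize units in the pool5 layer. The pool5 feature map has size 6x6x256. Each spatial position on the feature map has a receptive field of 195x195 pixels and carries a 256-dimensional feature vector. An activation value is one of those 256 values. The "6" refers to 6 of the 256 channels, and the "16" refers to 16 regions.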
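The sentence quoted from the paper (score, sort, suppress, show the top regions) can be sketched for a single pool5 unit as follows. This is only an illustration under stated assumptions: the function names, the `(x1, y1, x2, y2)` box format, suppressing only within the same image, and the IoU threshold of 0.3 are all hypothetical choices that the quoted passage does not specify.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an (M, 4) array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def top_regions_for_unit(activations, boxes, image_ids, top_k=16, iou_thresh=0.3):
    """Rank proposals by one unit's activation, drop near-duplicates, keep top_k.

    activations: (N,) activation of the chosen unit on each proposal
    boxes:       (N, 4) proposal boxes
    image_ids:   (N,) id of the image each proposal came from
    """
    activations = np.asarray(activations, dtype=float)
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(-activations, kind="stable")  # highest activation first
    kept = []
    for idx in order:
        # Assumption: a proposal is a duplicate only if it overlaps an
        # already kept proposal from the same image.
        duplicate = any(
            image_ids[idx] == image_ids[k]
            and iou(boxes[idx], boxes[k][None, :])[0] > iou_thresh
            for k in kept
        )
        if not duplicate:
            kept.append(int(idx))
        if len(kept) == top_k:
            break
    return kept
```

With `top_k=16`, the returned indices would correspond to one row of 16 regions in the figure.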

My understanding:
[Figure: the author's diagram of the procedure, panels A and B]

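In panel A, each image yields a number of region proposals via selective search. The region proposals from all test images are collected and passed through the R-CNN network one by one. What we want to see is what the pool5 features have learned, so the pool5 features are the ones displayed. The pool5 feature map has size 6x6x256. One way to think about it: a 227x227x3 image becomes 6x6x256 after the convolutional layers, and the receptive field on the feature map is 195x195, so a 1x1x256 feature on the feature map represents the 195x195x3 patch at the corresponding position in the original image.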
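The 6x6 output size and the 195x195 receptive field can be checked with a few lines of arithmetic. The sketch below assumes the standard AlexNet layer settings (kernel, stride, padding) as used in the Caffe reference model; that layer table is an assumption brought in for illustration, not something stated in this post.

```python
# (name, kernel, stride, padding): assumed AlexNet layout up to pool5
LAYERS = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]

def trace(input_size=227):
    """Return (name, output_size, receptive_field, cumulative_stride) per layer."""
    size, rf, jump = input_size, 1, 1
    rows = []
    for name, k, s, p in LAYERS:
        size = (size + 2 * p - k) // s + 1  # spatial output size
        rf += (k - 1) * jump                # receptive field grows by (k-1) input steps
        jump *= s                           # distance in input pixels between neighbours
        rows.append((name, size, rf, jump))
    return rows

for name, size, rf, jump in trace():
    print(f"{name}: output {size}x{size}, receptive field {rf}x{rf}, stride {jump}")
```

Under these assumptions the last row is pool5 with a 6x6 output, a 195x195 receptive field, and neighbouring units 32 input pixels apart, which agrees with the figure quoted from the paper.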
Next, find the maximum of the 6x6x256 = 9216 values. This maximum, max_value, must keep its (x, y, channel) location, so that every region proposal ends up with its own max_value.
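A small NumPy sketch of this step, taking the maximum while remembering where it came from. The `(height, width, channel)` array layout and the function name `strongest_unit` are assumptions for illustration (frameworks such as Caffe store features channel-first), and the random array merely stands in for real pool5 features.

```python
import numpy as np

def strongest_unit(pool5):
    """Return (max_value, (y, x, channel)) for one proposal's 6x6x256 pool5 map."""
    flat_index = int(np.argmax(pool5))
    y, x, channel = np.unravel_index(flat_index, pool5.shape)
    return float(pool5[y, x, channel]), (int(y), int(x), int(channel))

rng = np.random.default_rng(0)
pool5 = rng.random((6, 6, 256)).astype(np.float32)  # stand-in values in [0, 1)
pool5[2, 4, 17] = 5.0                                # plant a known maximum
print(strongest_unit(pool5))                         # (5.0, (2, 4, 17))
```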

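In panel B, all the max_values are sorted from largest to smallest, which also sorts the regions. Then the proposals whose maximum lies in channel 0 are pulled out and the top 16 are kept. For each of these 16, the (x, y) position on the feature map is mapped back to the original image, which gives the white box drawn on the image; the value shown is that max_value.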
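In the same way, the proposals whose maximum lies in channel 1 are pulled out and the top 16 are displayed. Continuing like this displays the first 6 features in turn.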
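The grouping and mapping steps described above can be sketched as follows. The geometry constants are assumptions derived from the standard AlexNet layer settings (receptive field 195, stride 32, first receptive-field centre at pixel 33 of the 227x227 crop); the function names are hypothetical, positions are given as (y, x, channel), and clipping the box to the crop is one simple way to handle the boundary effects that the paper ignores.

```python
import numpy as np

# Assumed AlexNet pool5 geometry on a 227x227 input
RF, STRIDE, FIRST_CENTER, INPUT = 195, 32, 33, 227

def receptive_box(y, x):
    """Input-pixel box (x1, y1, x2, y2) seen by pool5 position (y, x), clipped to the crop."""
    half = RF // 2
    cx = FIRST_CENTER + STRIDE * x
    cy = FIRST_CENTER + STRIDE * y
    clamp = lambda v: min(max(v, 0), INPUT - 1)
    return clamp(cx - half), clamp(cy - half), clamp(cx + half), clamp(cy + half)

def top_by_channel(max_values, positions, channel, top_k=16):
    """Indices of the proposals whose maximum lies in `channel`, best first.

    max_values: (N,) per-proposal maximum activation
    positions:  (N, 3) per-proposal (y, x, channel) of that maximum
    """
    max_values = np.asarray(max_values, dtype=float)
    positions = np.asarray(positions)
    candidates = np.flatnonzero(positions[:, 2] == channel)
    order = candidates[np.argsort(-max_values[candidates], kind="stable")]
    return [int(i) for i in order[:top_k]]

# Toy usage: four proposals, three of which peak in channel 0
values = [0.5, 0.9, 0.7, 0.8]
where = [(0, 0, 0), (1, 1, 1), (2, 3, 0), (5, 5, 0)]
for i in top_by_channel(values, where, channel=0, top_k=2):
    y, x, _ = where[i]
    print(i, values[i], receptive_box(y, x))
```

Calling `top_by_channel` once per channel of interest and drawing `receptive_box` on each selected proposal would give one row of white boxes per channel.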

If anything here is wrong, please point it out so we can learn from each other.