[Faster RCNN] Faster R-CNN Notes

One__Coder

Category: Paper Reading

Article link: https://blog.csdn.net/github_37973614/article/details/81258593

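Theory notes on the paper:

R-CNN runs every proposal through the convolutional layers separately. Fast R-CNN instead takes the image and the proposals as input; the proposals supply location information for extracting features from the feature map, for regression, and for classification. Faster R-CNN takes a single image as input. After the shared feature map is extracted, it is fed to the RPN to generate proposals, the RPN output is combined with the feature map in the RoI operation, and the result is passed to the Fast R-CNN network for regression and classification.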

Two modules:

- a deep fully convolutional network that proposes regions, acting as an attention mechanism.
- the Fast R-CNN detector that uses the proposed regions.

faster RCNN

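1. RPN (Region Proposal Network)

SPPnet and Fast R-CNN reduced the running time of the detection network, but region proposal still takes a lot of time. Faster R-CNN addresses this by introducing the Region Proposal Network (RPN) to replace the selective search step.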

Input: an image of any size.

Output: rectangular object proposals, each with an objectness score.

Ultimate goal: share computation with a Fast R-CNN detector and implement an end-to-end network.

Figure: Fast RCNN architecture diagram

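For the RPN and Fast R-CNN to share convolutional features, the two networks must use the same convolutional layers. The paper uses the convolutional layers of two networks, ZF and VGG-16, as the shared layers.

As shown in the figure above, to generate region proposals, a small n*n window (n=3, implemented as a convolutional layer) slides over every position of the last shared convolutional layer's feature map and maps each window to a lower-dimensional, 256-dimensional feature. This 256-dimensional feature is then fed into two sibling fully connected layers, cls and reg.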
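
The sliding-window head described above can be sketched as plain shape arithmetic. This is a minimal illustration, not the paper's code: the function name `rpn_head_shapes` is hypothetical, "same" padding is assumed so that the window visits every position, and the output widths (2 scores and 4 box coordinates per anchor, for k anchors per position) follow the Faster R-CNN paper.

```python
def rpn_head_shapes(feat_h, feat_w, k=9, mid_channels=256, n=3):
    """Return the (H, W, C) shape produced by each stage of the RPN head."""
    pad = n // 2  # assumption: "same" padding, so every position gets a window
    out_h = feat_h + 2 * pad - n + 1
    out_w = feat_w + 2 * pad - n + 1
    return {
        # n x n sliding window -> 256-d feature at every position
        "intermediate": (out_h, out_w, mid_channels),
        # cls: object / not-object scores for each of the k anchors
        "cls": (out_h, out_w, 2 * k),
        # reg: 4 box coordinates for each of the k anchors
        "reg": (out_h, out_w, 4 * k),
    }


shapes = rpn_head_shapes(feat_h=40, feat_w=60)
for name, shape in shapes.items():
    print(name, shape)
```

Because the same small head is applied at every position, the "fully connected" cls and reg layers are shared across all spatial locations.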

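2. Translation-Invariant Anchors:

If an object in an image is translated, its proposal should translate with it, and the same function should be able to predict the proposal at any location. MultiBox does not have this property. Translation invariance also reduces the model size.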
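
To make the model-size claim concrete, the sketch below reproduces the output-layer parameter counts as reported in the Faster R-CNN paper: a fully connected MultiBox output layer (1536-d input, 800 boxes with 4 coordinates + 1 score each) versus the convolutional RPN output layer for VGG-16 (512-d input, k=9 anchors with 4 coordinates + 2 scores each). The function name is illustrative and biases are ignored.

```python
def output_layer_params(in_dim, outputs_per_box, num_boxes):
    """Weight count of a linear output layer (biases ignored)."""
    return in_dim * outputs_per_box * num_boxes


# RPN output layer for VGG-16: 512 * (4 + 2) * 9
rpn_params = output_layer_params(512, 4 + 2, 9)
# MultiBox output layer: 1536 * (4 + 1) * 800
multibox_params = output_layer_params(1536, 4 + 1, 800)

print(rpn_params)       # 27648, about 2.8e4
print(multibox_params)  # 6144000, about 6.1e6
```

The RPN layer is roughly two orders of magnitude smaller because its weights are shared across all positions instead of being tied to a fixed set of box locations.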

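At each sliding-window position the network predicts k region proposals (k=9 by default in the experiments), each defined relative to a reference box called an anchor. By default the anchors use 3 scales (box areas of 128^2, 256^2, and 512^2 pixels in the experiments) and 3 aspect ratios (1:1, 1:2, and 2:1 in the experiments), and each is centered at the center of the sliding window ("An anchor is centered at the sliding window in question."). For a convolutional feature map of size W*H there are W*H*k anchors in total, because each sliding-window position corresponds to one feature-map cell and each cell has k anchors.

[Our anchor-based method is built on a pyramid of anchors, which is more cost-efficient. Our method classifies and regresses bounding boxes with reference to anchor boxes of multiple scales and aspect ratios.]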
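
The anchor layout above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `anchors_at` and `all_anchors` are hypothetical, a ratio is interpreted as height/width, and the feature-map stride of 16 pixels and the cell-center convention are assumptions rather than details given in these notes.

```python
import math


def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Return k = len(scales) * len(ratios) boxes (x1, y1, x2, y2) centered at (cx, cy)."""
    boxes = []
    for s in scales:
        for r in ratios:
            # Keep the area at s^2 while setting height / width = r (assumed convention).
            w = math.sqrt(s * s / r)
            h = r * w
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def all_anchors(feat_w, feat_h, stride=16):
    """Place the same k anchors at the image-space center of every feature-map cell."""
    boxes = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx = (col + 0.5) * stride  # assumed mapping from cell index to pixels
            cy = (row + 0.5) * stride
            boxes.extend(anchors_at(cx, cy))
    return boxes


grid = all_anchors(feat_w=4, feat_h=3)
print(len(grid))  # W * H * k = 4 * 3 * 9 = 108
```

Since every cell reuses the same k box shapes, shifting an object by one cell simply selects the same anchors one cell over, which is the translation-invariance property described in this section.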

3. Multi-Scale Anchors as Regression References

Two popular ways for multi-scale predictions:

- Based on image/feature pyramids, e.g. in DPM and CNN-based methods. The image is resized to multiple scales, and feature maps (HOG or deep convolutional features) are computed for each scale. This approach is time-consuming.
- Use sliding windows of multiple scales (and/or aspect ratios) on the feature maps, i.e. a pyramid of filters. The second way is usually adopted jointly with the first.

In this paper: a pyramid of anchors, which is more cost-efficient and relies only on images and feature maps of a single scale.

The design of multi-scale anchors is a key component for sharing features without extra cost for addressing scales.

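4. Loss Function for Learning Region Proposals

To train RPNs, each anchor is assigned a binary class label indicating whether it contains an object (object or not only, with no category classification). Labels are assigned to anchors as follows:

positive label: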
- the anchor/anchors with the highest IO
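
A minimal sketch of the overlap computation behind this rule, assuming the measure is Intersection-over-Union (IoU) between an anchor and a ground-truth box, with boxes given as (x1, y1, x2, y2) tuples. The helper names `iou` and `best_anchor` are illustrative and not taken from the paper, and only the "highest overlap" selection is shown.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_anchor(anchors, gt_box):
    """Index of the anchor with the highest overlap with one ground-truth box."""
    return max(range(len(anchors)), key=lambda i: iou(anchors[i], gt_box))


anchors = [(0, 0, 1, 1), (0, 0, 2, 2), (5, 5, 6, 6)]
gt = (0, 0, 2, 2)
print(best_anchor(anchors, gt), iou(anchors[0], gt))
```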
