Paper Notes: Faster RCNN

John2King

Category: DL. Tags: cnn, rcnn

Article link: https://blog.csdn.net/lebula/article/details/51699157


Part I. RCNN

Paper link: http://arxiv.org/pdf/1311.2524v5.pdf

Github link: https://github.com/rbgirshick/rcnn

Detailed notes: http://zhangliliang.com/2014/07/23/paper-note-rcnn/

(Covers CNN training, the definition of positive/negative samples, performance when different layers are used as features, bounding-box regression, etc.)

Notes:

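1. Architecture: selective search proposes about 2000 regions per image ⇒ each region is warped to a fixed size ⇒ CNN extracts a feature vector ⇒ per-class linear SVMs score it ⇒ bounding-box regression refines the box.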

2. Advantages & disadvantages:

a. Uses CNN features instead of traditional hand-crafted features (e.g. SIFT, HOG).

b. Each of the roughly 2000 proposals is fed through the CNN separately ==> computationally costly ==> motivates SPP net

Part II. SPP net (spatial pyramid pooling)

Paper link: http://arxiv.org/pdf/1406.4729v4.pdf

Github link: https://github.com/ShaoqingRen/SPP_net

Detailed notes: http://zhangliliang.com/2014/09/13/paper-note-sppnet/

Notes:

1. Motivation: a CNN requires a fixed input size ⇒ cropping/warping the image to that size loses information ⇒ but only the FC layers actually need a fixed-size input; the conv layers accept any size ⇒ add an SPP layer that turns conv outputs of varying size into a fixed-size FC input ⇒ application to RCNN: run the conv layers once per image and share them across all proposals to reduce cost.

2. Architecture:

Input the whole image ⇒ conv layers produce feature maps (256 channels) ⇒ project each proposal region onto the feature maps (how? See the detailed notes) ⇒ SPP layer: for each proposal, max-pool with different window sizes to get 4x4, 2x2 and 1x1 outputs (3 pyramid levels) and concatenate them, giving 16+4+1 = 21 bins per channel, i.e. a 21x256-dimensional vector (how to calculate window size and stride? Paper section 2.3: for an a x a map and n x n bins, window = ceil(a/n), stride = floor(a/n)) ⇒ FC + SVM + regression

3. Advantages: faster (about 24x at test time, since the conv layers run once per image instead of once per proposal); the multiple pyramid levels capture information at different spatial scales, which improves accuracy.
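
The SPP layer in the architecture step above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: `spp_window` follows the window/stride rule from Section 2.3 of the paper, while `spp_vector` (a name made up for this note) uses floor/ceil bin edges instead of a sliding window so that it also handles non-square regions. That substitution is an assumption of the sketch.

```python
import math
import numpy as np

def spp_window(a, n):
    """Window size and stride for pooling an a x a map into n x n bins
    (rule from SPP paper, Sec. 2.3)."""
    return math.ceil(a / n), math.floor(a / n)

def spp_vector(fmap, levels=(4, 2, 1)):
    """Max-pool a (C, H, W) feature map into a fixed-length vector.

    Assumption: bins are delimited by floor/ceil edges rather than a
    sliding window, so any H, W >= 1 works.
    """
    c, h, w = fmap.shape
    out = []
    for n in levels:
        for i in range(n):
            y0 = (i * h) // n               # floor(i * h / n)
            y1 = -(-(i + 1) * h // n)       # ceil((i + 1) * h / n)
            for j in range(n):
                x0 = (j * w) // n
                x1 = -(-(j + 1) * w // n)
                out.append(fmap[:, y0:y1, x0:x1].max(axis=(1, 2)))
    return np.concatenate(out)              # length = C * sum(n * n)

# Two proposals of different size map to the same vector length.
small = spp_vector(np.random.rand(256, 13, 13))
large = spp_vector(np.random.rand(256, 9, 17))
print(small.shape, large.shape)
```

With 256 channels and levels 4x4, 2x2, 1x1 both calls return a vector of length 21 * 256, which is what lets the FC layers accept proposals of any size.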

Part III. Fast RCNN

Paper link: http://arxiv.org/pdf/1504.08083v2.pdf

Github link: https://github.com/rbgirshick/fast-rcnn

Detailed notes: http://zhangliliang.com/2015/05/17/paper-note-fast-rcnn/

Notes:

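1. Motivation: bring SPP-style shared convolution to RCNN (via RoI pooling); fold the classifier (softmax replaces the SVM) and the bounding-box regressor into the network so they are trained jointly.

2. Architecture:

RoI pooling layer: a single-level SPP layer (one fixed-size grid per RoI);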

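Multi-task loss layer:

$$L(p, u, t^u, v) = L_{cls}(p, u) + \lambda [u \ge 1] L_{loc}(t^u, v)$$

where u is the true class, v is the true regression target, p is the predicted probability vector over the N+1 classes, t^u = (t_x, t_y, t_w, t_h) is the predicted box offset for class u (shift of the centre in x and y, and scaling of the width and height), and [u ≥ 1] is 1 for object classes and 0 for background (u = 0).

Classification loss (L_cls): softmax log loss over N+1 classes (the extra one is background), L_cls(p, u) = -log p_u.
Regression loss (L_loc): smooth L1 loss on the 4*N regressor outputs (for each class: t_x, t_y, t_w, t_h); only the outputs for the true class u contribute.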
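
A minimal NumPy sketch of the multi-task loss for a single RoI, to make the two terms concrete. It assumes the smooth L1 form of the regression loss from the Fast RCNN paper and, purely for simpler indexing, stores the box outputs as an (N+1, 4) array whose row 0 (background) is never used. The function names and the default `lam=1.0` are choices made for this sketch.

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: 0.5 * x^2 if |x| < 1, else |x| - 0.5 (elementwise)."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * ax ** 2, ax - 0.5)

def multitask_loss(scores, t, u, v, lam=1.0):
    """Classification + localization loss for one RoI.

    scores: (N+1,) raw class scores, index 0 = background
    t:      (N+1, 4) predicted offsets per class (row 0 unused; assumption)
    u:      int, true class
    v:      (4,) regression target for the true box
    """
    z = scores - scores.max()                 # stabilize the softmax
    log_p = z - np.log(np.exp(z).sum())
    l_cls = -log_p[u]                         # softmax log loss
    # Background RoIs (u == 0) have no box, so the loc term is switched off.
    l_loc = smooth_l1(t[u] - v).sum() if u >= 1 else 0.0
    return l_cls + lam * l_loc

scores = np.zeros(3)                          # N = 2 object classes
t = np.zeros((3, 4))
v = np.array([0.5, 2.0, 0.0, 0.0])
print(multitask_loss(scores, t, u=0, v=v))    # classification term only
print(multitask_loss(scores, t, u=1, v=v))    # classification + localization
```

Only row `t[u]` enters the loss, which is why the network can emit a separate set of four offsets per class without the classes interfering with each other during training.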

Part IV. Faster RCNN

Paper link: http://arxiv.org/pdf/1506.01497v3.pdf

Github link: https://github.com/ShaoqingRen/faster_rcnn

Notes:

1. Motivation: Fast RCNN still gets its proposals from a separate pipeline (e.g. selective search) that does not share computation with the feature maps ⇒ proposals can instead be generated by a CNN on the same conv features (Region Proposal Network, RPN)

2. Architecture of RPN

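Whole image is fed into a CNN (any standard backbone, e.g. ZF or VGG16) ==> last conv layer ==> 3x3 sliding window over the feature map to look at each position's neighbourhood; each sliding position has 9 anchors (3 aspect ratios * 3 scales), i.e. 9 candidate proposals ==> W*H*9 anchors in total for a W x H feature map ==> 1x1 convs (which mix information across channels) output, for each anchor, 2 objectness scores and 4 box offsets ==> cls and reg loss layers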
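
The anchor layout described above can be sketched as follows. The scales (128, 256, 512), the aspect ratios (1:2, 1:1, 2:1) and the stride of 16 are the defaults reported in the paper. Placing each anchor at the centre of its feature-map cell, and the function name `make_anchors`, are assumptions of this sketch and not necessarily what the released code does.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * 9, 4) anchors as (x1, y1, x2, y2) in image pixels."""
    base = []
    for r in ratios:              # r = height / width
        for s in scales:          # each anchor has area s * s
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            base.append([-w / 2, -h / 2, w / 2, h / 2])
    base = np.array(base)                              # (9, 4), centred at origin

    # Centre of every feature-map cell, mapped back to image coordinates
    # (cell-centre placement is an assumption of this sketch).
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)                       # each (H, W)
    centres = np.stack([cx, cy, cx, cy], axis=-1)      # (H, W, 4)

    # Broadcast (H, W, 1, 4) + (9, 4) -> (H, W, 9, 4)
    return (centres[:, :, None, :] + base).reshape(-1, 4)

anchors = make_anchors(feat_h=38, feat_w=50)
print(anchors.shape)   # one row per anchor: H * W * 9 rows
```

Because the anchors depend only on the feature-map size, they can be computed once per input resolution; the RPN then predicts scores and offsets relative to these fixed boxes.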

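Multi-task loss layer:

$$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$$

where i indexes the anchors in a mini-batch, p_i is the predicted probability that anchor i is an object, p_i^* is its ground-truth label (1 for a positive anchor, 0 for a negative one), and t_i, t_i^* are the predicted and ground-truth box offsets. L_cls is log loss over two classes (object vs. not object) and L_reg is the smooth L1 loss used in Fast RCNN.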

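3. Training process of RPN and Fast RCNN

Phase 1: train the RPN ⇒ use it to generate proposals ⇒ feed the proposals to Fast RCNN and train it (at this point the two networks do not yet share conv layers)

Phase 2: initialize the RPN conv layers with the Fast RCNN conv weights and freeze these shared conv layers (learning rate = 0); train only the RPN-specific layers and its loss layers ⇒ feed the new proposals to Fast RCNN and, with the shared conv layers still frozen, fine-tune only its FC layers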
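
The two phases correspond to the four-step alternating training in the paper. A small bookkeeping sketch of which layer groups are updated in each step may help; the group names (`conv`, `rpn_head`, `detector_head`) are hypothetical labels for this note, not identifiers from the released code.

```python
# Hypothetical layer-group labels: "conv" = shared conv layers,
# "rpn_head" = RPN-specific layers, "detector_head" = Fast RCNN FC layers.
STEPS = [
    {"step": 1, "trains": "RPN",       "init": "ImageNet model",
     "updated": {"conv", "rpn_head"}},
    {"step": 2, "trains": "Fast RCNN", "init": "ImageNet model",
     "updated": {"conv", "detector_head"}},
    {"step": 3, "trains": "RPN",       "init": "step 2 detector",
     "updated": {"rpn_head"}},
    {"step": 4, "trains": "Fast RCNN", "init": "step 3 network",
     "updated": {"detector_head"}},
]

def conv_is_frozen(step):
    """True if the shared conv layers get no updates in this step."""
    return "conv" not in STEPS[step - 1]["updated"]

for s in STEPS:
    frozen = "frozen" if conv_is_frozen(s["step"]) else "trained"
    print(f"step {s['step']}: train {s['trains']:<9} conv {frozen}")
```

Steps 1 and 2 make up phase 1 (separate conv layers, both trainable); steps 3 and 4 make up phase 2, where the conv layers are frozen so that both networks end up sharing the same features.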