Paper Notes: Faster RCNN

Part I. RCNN
(Covering CNN training; positive/negative sample definition; performance of using different layers as feature maps; bounding box regression, etc.)
Notes:
1. architecture
2. advantages & disadvantages:
a. uses learned CNN features instead of traditional hand-crafted features.
b. each of the ~2000 proposals is fed through the CNN separately ==> computationally costly ==> motivates SPP net

PART II. SPP net (spatial pyramid pooling)
Notes:
1. Motivation: CNNs require a fixed input size ⇒ crop/warp leads to information loss ⇒ in fact, only the FC layers need a uniform input size ⇒ construct an SPP layer to transform conv outputs of various sizes into a fixed-size FC input ⇒ application to RCNN: share the conv layers across all proposals to reduce cost.
2. Architecture:
Input the whole image ⇒ conv layers to get feature maps (256 channels) ⇒ project proposal regions onto the feature map (how? Discussed in the detailed notes) ⇒ SPP layer: for each proposal, apply different pooling kernels to get 4x4, 2x2, 1x1 outputs (3 pyramid levels) and concatenate them into a vector (16 + 4 + 1 = 21 bins) (how to calculate window size and stride? Paper section 2.3) ⇒ FC + SVM + regression
3. Advantages: much faster (~24x at test time); multiple pyramid levels extract information from the image at different scales, giving higher accuracy.
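The window/stride rule from paper section 2.3 can be sketched as follows; this is a minimal NumPy illustration (not the paper's code) of pooling an arbitrary-size feature map into the fixed 21-bin pyramid:

```python
# Minimal sketch of an SPP layer: for each pyramid level (4x4, 2x2, 1x1),
# max-pool an arbitrary-size feature map into n x n bins, then concatenate.
# Window/stride follow paper section 2.3: win = ceil(size/n), stride = floor(size/n).
import math
import numpy as np

def spp_layer(feat, levels=(4, 2, 1)):
    """feat: (channels, H, W) conv feature map of any spatial size."""
    c, h, w = feat.shape
    outputs = []
    for n in levels:
        win_h, str_h = math.ceil(h / n), math.floor(h / n)
        win_w, str_w = math.ceil(w / n), math.floor(w / n)
        for i in range(n):
            for j in range(n):
                patch = feat[:,
                             i * str_h : i * str_h + win_h,
                             j * str_w : j * str_w + win_w]
                outputs.append(patch.max(axis=(1, 2)))  # max pool per bin
    # (16 + 4 + 1) bins * channels -> fixed-length vector for the FC layers
    return np.concatenate(outputs)

feat = np.random.rand(256, 13, 17)           # arbitrary spatial size
vec = spp_layer(feat)                        # shape: (21 * 256,)
```

Whatever the spatial size of the input, the output vector always has 21 × 256 entries, which is what lets the FC layers accept proposals of any shape.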

PART III. FAST RCNN
Notes:
  1. Motivation: bring SPP into RCNN (as the RoI pooling layer); fold the separate SVM and bbox regression into the network via joint training
  2. Architecture:
RoI pooling layer: a single-level SPP layer;
Multi-task loss layer: L(p, u, t^u, v) = L_cls(p, u) + λ[u ≥ 1] L_loc(t^u, v),
where u is the true class (0 = background), v is the true regression target, p is the predicted probability vector, and t^u = (t_x, t_y, t_w, t_h) are the predicted box deltas for class u
  1. Classification loss (L_cls): softmax log loss over N+1 classes (1 extra for background)
  2. Regression loss (L_loc): 4N regression outputs (for each class, t_x, t_y, t_w, t_h)
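The two loss terms above can be made concrete with a small NumPy sketch; the function names are illustrative, and smooth L1 is the standard L_loc used by Fast RCNN:

```python
# Minimal sketch of Fast R-CNN's multi-task loss:
#   L(p, u, t^u, v) = L_cls(p, u) + lambda * [u >= 1] * L_loc(t^u, v)
# The indicator [u >= 1] turns off the regression term for background (u = 0).
import numpy as np

def smooth_l1(x):
    # 0.5 x^2 for |x| < 1, |x| - 0.5 otherwise (less sensitive to outliers)
    x = np.abs(x)
    return np.where(x < 1, 0.5 * x ** 2, x - 0.5)

def multitask_loss(p, u, t_u, v, lam=1.0):
    """p: (N+1,) class probabilities; u: true class (0 = background);
    t_u: (4,) predicted deltas for class u; v: (4,) regression target."""
    l_cls = -np.log(p[u])                 # softmax log loss
    l_loc = smooth_l1(t_u - v).sum()      # counted only for foreground
    return l_cls + lam * (u >= 1) * l_loc

p = np.array([0.1, 0.7, 0.2])             # background + 2 object classes
loss = multitask_loss(p, u=1, t_u=np.zeros(4), v=np.array([0.1, 0, 0, 0]))
```

For a background RoI (u = 0) the loss reduces to -log(p[0]); only foreground RoIs pay a localization penalty.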

PART IV. FASTER RCNN
GitHub link: https://github.com/ShaoqingRen/faster_rcnn
Notes:
1. Motivation: fast RCNN uses a separate pipeline for generating proposals from the one computing feature maps ⇒ proposal generation could also be done by a CNN (Region Proposal Network, RPN)
2. Architecture of RPN

Whole image fed into a CNN (any backbone) ==> last conv feature map ==> a 3x3 sliding window looks at each position's neighborhood, and each sliding position has 9 anchors (3 aspect ratios * 3 scales), i.e. 9 proposals ==> W*H*9 anchors in total ⇒ 1x1 conv (mixing channel information) gives a feature vector per position ==> cls and reg loss layers
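The 9 anchors per sliding position can be sketched as below; the scales and ratios are the paper's defaults, and the function name is illustrative:

```python
# Minimal sketch of anchor generation: at each sliding-window position,
# 9 anchors = 3 aspect ratios x 3 scales, centered on that position.
import numpy as np

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    anchors = []
    for s in scales:
        for r in ratios:
            # keep area = s*s while varying the height/width ratio
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors)  # (9, 4) boxes as [x1, y1, x2, y2]

# Anchors for one position; a W x H feature map yields W*H*9 anchors total.
a = make_anchors(100, 100)   # shape: (9, 4)
```

Note the anchors are defined in image coordinates around the mapped center of the sliding position, which is why a small 3x3 window can propose large boxes.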
Multi-task loss layer: L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*), where p_i is the predicted objectness probability of anchor i, p_i* its ground-truth label (1 for positive anchors, 0 otherwise), and t_i, t_i* the predicted and ground-truth box deltas; the p_i* factor activates the regression term only for positive anchors
3. Training process of RPN and fast RCNN
phase 1: train RPN ⇒ get proposals ⇒ feed them to fast RCNN and train it
phase 2: copy fast RCNN's conv weights into RPN's conv layers, set the learning rate of the shared conv layers to 0 (freeze them), and train only RPN's FC and loss layers ⇒ feed the new proposals to fast RCNN and fine-tune its FC layers
