Paper: Faster R-CNN

xxiaozr

Article link: https://blog.csdn.net/xxiaozr/article/details/78597794

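Paper:
SPPnet and Fast R-CNN reduced the running time of detection networks, which left region proposal computation as the bottleneck. The authors propose the Region Proposal Network (RPN), which shares convolutional features with the detection network.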

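1.Introduction
Region proposal methods and region-based convolutional networks have driven recent progress in object detection. Region-based CNNs were originally computationally expensive; Fast R-CNN then cut the cost sharply by sharing convolutions across proposals. Proposals are now the bottleneck.
RPN addresses this. The key observation is that the convolutional feature maps used by region-based detectors can also be used to generate region proposals.
Result: 5 fps with VGG-16 on a GPU, 70.4% mAP on PASCAL VOC 2012.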

2.Related Work
Deep Networks for Object Detection：
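R-CNN mainly acts as a classifier and does not predict object bounds (apart from refining them by bounding-box regression), so its accuracy depends on the quality of the region proposal module.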

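3.Region Proposal Networks
The input is an image of any size; the output is a set of rectangular object proposals, each with an objectness score.
A small window slides over the last shared convolutional layer, and each window position is mapped to a 256-d vector (256-d for ZF, 512-d for VGG). This vector is fed to two sibling fully-connected layers: a box-regression layer (reg) and a box-classification layer (cls).
At each sliding-window position, k region proposals are predicted, so the reg layer has 4k outputs and the cls layer has 2k outputs. The k proposals are parameterized relative to k reference boxes called anchors, so a W*H feature map has W*H*k anchors.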
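The anchor count can be checked with a minimal sketch. It assumes a feature stride of 16, three scales and three aspect ratios, and places anchor centres at the middle of each stride cell; the function names `base_anchors` and `all_anchors` are illustrative and not taken from any released code.

```python
import math

def base_anchors(base_size=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Return k = len(ratios) * len(scales) boxes (x1, y1, x2, y2) centred on the origin."""
    anchors = []
    for ratio in ratios:                      # ratio = height / width
        for scale in scales:
            area = (base_size * scale) ** 2   # 128^2, 256^2, 512^2 with the defaults
            w = math.sqrt(area / ratio)
            h = w * ratio
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors

def all_anchors(feat_w, feat_h, stride=16):
    """Shift the base anchors to every feature-map position, giving W*H*k boxes."""
    base = base_anchors(base_size=stride)
    boxes = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Assumed centre of this feature-map cell in image coordinates.
            cx, cy = x * stride + stride / 2, y * stride + stride / 2
            for (x1, y1, x2, y2) in base:
                boxes.append((x1 + cx, y1 + cy, x2 + cx, y2 + cy))
    return boxes

k = len(base_anchors())
anchors = all_anchors(feat_w=60, feat_h=40)
print(k, len(anchors))                        # 9 21600
print("cls channels:", 2 * k, "reg channels:", 4 * k)
```

A 60*40 feature map is roughly what a 1000*600 image gives at stride 16, so the RPN scores about 20k anchors per image.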
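Each anchor is assigned a binary class label (object or not an object).
An anchor gets a positive label in two cases:
1) it has the highest IoU with some ground-truth box;
2) it has an IoU above 0.7 with any ground-truth box.
An anchor whose IoU is below 0.3 for all ground-truth boxes gets a negative label.
Anchors that are neither positive nor negative do not contribute to training.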
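The labelling rules above can be sketched in a few lines. This is an illustration only: boxes are assumed to be (x1, y1, x2, y2) tuples, at least one ground-truth box is assumed to exist, and `label_anchors` is a hypothetical helper name.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return one label per anchor: 1 = positive, 0 = negative, -1 = ignored."""
    overlaps = [[iou(a, g) for g in gt_boxes] for a in anchors]
    labels = [-1] * len(anchors)
    for i, row in enumerate(overlaps):
        if max(row) < neg_thresh:      # below 0.3 for every ground-truth box
            labels[i] = 0
        elif max(row) > pos_thresh:    # rule 2: above 0.7 with any ground-truth box
            labels[i] = 1
    # Rule 1: the anchor(s) with the highest IoU for each ground-truth box are positive.
    for j in range(len(gt_boxes)):
        best = max(overlaps[i][j] for i in range(len(anchors)))
        if best > 0:
            for i in range(len(anchors)):
                if overlaps[i][j] == best:
                    labels[i] = 1
    return labels

gt = [(0, 0, 100, 100), (400, 400, 500, 500)]
anchors = [(0, 0, 100, 100), (0, 0, 100, 50), (200, 200, 300, 300), (400, 400, 500, 460)]
print(label_anchors(anchors, gt))      # [1, -1, 0, 1]
```

The last anchor only reaches IoU 0.6, but it is the best match for the second ground-truth box, so rule 1 still makes it positive. This is why rule 1 exists: it guarantees every ground-truth box at least one positive anchor.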
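The RPN multi-task loss is:
$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \lambda\frac{1}{N_{reg}}\sum_i p_i^* L_{reg}(t_i, t_i^*)$$
p_i is the predicted probability that anchor i is an object; p_i* is 1 if the anchor is positive and 0 if it is negative.
t_i is the vector of 4 parameterized coordinates of the predicted bounding box.
t_i* is the same vector for the ground-truth box associated with a positive anchor.
The classification loss L_cls is log loss; the regression loss L_reg is a robust loss function (smooth L1), and the factor p_i* makes it active only for positive anchors.
N_cls and N_reg normalize the two terms.
λ balances their weights.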
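As a rough numeric illustration of this loss, the sketch below combines a two-class log loss with smooth L1, following the definitions above. The function names and toy inputs are made up for the example, and the standard smooth L1 form (quadratic below 1, linear above) is assumed.

```python
import math

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5

def rpn_loss(p, p_star, t, t_star, n_cls, n_reg, lam=10.0):
    """p: predicted object probabilities, p_star: 0/1 labels,
    t / t_star: predicted and target 4-vectors per anchor."""
    cls_sum = 0.0
    reg_sum = 0.0
    for pi, pi_star, ti, ti_star in zip(p, p_star, t, t_star):
        # Log loss over the two classes (object vs. not object).
        cls_sum += -math.log(pi if pi_star == 1 else 1.0 - pi)
        # The regression term is switched on only for positive anchors.
        reg_sum += pi_star * sum(smooth_l1(a - b) for a, b in zip(ti, ti_star))
    return cls_sum / n_cls + lam * reg_sum / n_reg

p = [0.9, 0.2]
p_star = [1, 0]
t = [(0.1, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
t_star = [(0.0, 0.0, 0.0, 0.0), (5.0, 5.0, 5.0, 5.0)]
loss = rpn_loss(p, p_star, t, t_star, n_cls=2, n_reg=2, lam=1.0)
print(loss)
```

The second anchor is negative, so its large regression error contributes nothing to the loss.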

Box prediction can be viewed as bounding-box regression from an anchor box to a nearby ground-truth box.
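The paper parameterizes this regression as t_x = (x - x_a) / w_a, t_y = (y - y_a) / h_a, t_w = log(w / w_a), t_h = log(h / h_a), where (x, y, w, h) is a box centre and size and the subscript a marks the anchor. A minimal sketch of the encoding and its inverse follows; the names `encode` and `decode` are illustrative.

```python
import math

def encode(box, anchor):
    """Regression targets for a box relative to an anchor, both as (cx, cy, w, h)."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))

def decode(t, anchor):
    """Inverse of encode: apply predicted offsets t to an anchor."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))

anchor = (100.0, 100.0, 128.0, 128.0)
gt_box = (110.0, 90.0, 256.0, 64.0)
targets = encode(gt_box, anchor)
print(targets)
print(decode(targets, anchor))
```

Dividing by the anchor size and taking the log of the size ratio keeps the targets in a similar range for small and large anchors.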

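Optimization:
The RPN is trained end-to-end by back-propagation and stochastic gradient descent.
Each mini-batch comes from a single image that contains many positive and negative example anchors.
New layers are initialized from a zero-mean Gaussian with standard deviation 0.01.
The learning rate is 0.001 for the first 60k mini-batches and 0.0001 for the next 20k.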
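In the paper, the loss for one mini-batch is computed on 256 anchors sampled from the image, with positives and negatives at a ratio of up to 1:1 and negatives filling the batch when positives are scarce. The sketch below shows one way to do this by setting surplus labels to -1 (ignored); the function name and the `seed` parameter are assumptions for the example.

```python
import random

def sample_minibatch(labels, batch_size=256, fg_fraction=0.5, seed=None):
    """Keep at most batch_size labelled anchors; surplus ones are set to -1."""
    rng = random.Random(seed)
    labels = list(labels)
    num_fg = int(batch_size * fg_fraction)
    pos = [i for i, lab in enumerate(labels) if lab == 1]
    if len(pos) > num_fg:
        for i in rng.sample(pos, len(pos) - num_fg):
            labels[i] = -1
    # Negatives fill whatever room the positives leave in the batch.
    num_bg = batch_size - sum(1 for lab in labels if lab == 1)
    neg = [i for i, lab in enumerate(labels) if lab == 0]
    if len(neg) > num_bg:
        for i in rng.sample(neg, len(neg) - num_bg):
            labels[i] = -1
    return labels

many_pos = sample_minibatch([1] * 300 + [0] * 5000 + [-1] * 100, seed=0)
few_pos = sample_minibatch([1] * 10 + [0] * 5000, seed=0)
print(many_pos.count(1), many_pos.count(0))   # 128 128
print(few_pos.count(1), few_pos.count(0))     # 10 246
```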

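Implementation Details
Images are re-scaled so that the shorter side is 600 pixels. Multi-scale feature extraction may improve accuracy, but at a cost in speed.
Anchors use three aspect ratios (1:1, 1:2, 2:1) at three scales (box areas of 128², 256² and 512² pixels).
Cross-boundary anchors are ignored during training. At test time the RPN still generates cross-boundary proposal boxes, which are clipped to the image boundary.
NMS is applied to the proposals based on their cls scores (IoU threshold 0.7) to remove highly overlapping ones.
This leaves about 2k proposals per image.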
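The clipping and NMS steps can be sketched as follows. This is a plain greedy NMS written for clarity, not the optimized routine used in practice; boxes are assumed to be (x1, y1, x2, y2) tuples and the helper names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def clip_box(box, im_w, im_h):
    """Clip a box so that it lies inside an im_w x im_h image."""
    x1, y1, x2, y2 = box
    return (max(0.0, min(x1, im_w - 1)), max(0.0, min(y1, im_h - 1)),
            max(0.0, min(x2, im_w - 1)), max(0.0, min(y2, im_h - 1)))

def nms(boxes, scores, iou_thresh=0.7):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep

boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))                       # [0, 2]
print(clip_box((-10, 20, 650, 500), 600, 400))  # (0.0, 20, 599, 399)
```

Note that NMS discards the lower-scoring box of an overlapping pair; it does not merge the two.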

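After the shared convolutional layers comes the RPN: a 3*3 convolution maps the features to 256 channels, then the network splits into two branches, each a 1*1 convolution. One branch decides whether each anchor box at a position is foreground or background; the other predicts the box offsets.
Next, im_info and the offsets are used to compute the offset loss and produce refined proposals; proposals that contain foreground and overlap strongly with a ground-truth box are labelled 1.
RoI Pooling is applied on the feature map to the proposals labelled 1.
Why RoI Pooling? The generated proposals all differ in size, and RoI Pooling turns each of them into a fixed-size feature map.
These are then sent to the cls layer to determine the specific object class, and regression is run once more to get per-proposal offsets and hence more accurate boxes.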
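To make the fixed-size output concrete, here is a minimal single-channel RoI max-pooling sketch. It assumes the RoI is already given in feature-map coordinates with inclusive corners, and uses floor/ceil bin edges; `roi_max_pool` is an illustrative name.

```python
import math

def roi_max_pool(feature, roi, out_h=2, out_w=2):
    """Max-pool the region roi = (x1, y1, x2, y2) of a 2-D list into out_h x out_w bins."""
    feat_h, feat_w = len(feature), len(feature[0])
    x1, y1, x2, y2 = roi
    roi_w = max(x2 - x1 + 1, 1)
    roi_h = max(y2 - y1 + 1, 1)
    bin_w, bin_h = roi_w / out_w, roi_h / out_h
    pooled = []
    for ph in range(out_h):
        h0 = min(max(y1 + math.floor(ph * bin_h), 0), feat_h)
        h1 = min(max(y1 + math.ceil((ph + 1) * bin_h), 0), feat_h)
        row = []
        for pw in range(out_w):
            w0 = min(max(x1 + math.floor(pw * bin_w), 0), feat_w)
            w1 = min(max(x1 + math.ceil((pw + 1) * bin_w), 0), feat_w)
            values = [feature[y][x] for y in range(h0, h1) for x in range(w0, w1)]
            row.append(max(values) if values else 0.0)   # empty bins pool to 0
        pooled.append(row)
    return pooled

# A 4 x 6 feature map whose value at (y, x) is y * 6 + x.
feature = [[y * 6 + x for x in range(6)] for y in range(4)]
print(roi_max_pool(feature, (0, 0, 3, 3)))   # [[7, 9], [19, 21]]
print(roi_max_pool(feature, (1, 0, 5, 2)))   # [[9, 11], [15, 17]]
```

The two RoIs cover 4*4 and 5*3 cells, yet both produce a 2*2 output, which is what lets the following fully-connected layers accept proposals of any size.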

With unshared features, RPN+VGG with Fast R-CNN gives slightly better results than SS.
With shared features, it outperforms even the strong SS baseline.

Two-stage Proposal + Detection:
The RPN first produces proposals, then RoI pooling is applied to them.

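Code reading:
The first layers are a standard convolutional network that produces the convolutional feature map.
The last of these layers is conv5_3.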

# conv5_3
layer {
  name: "relu5_3"
  type: "ReLU"
  bottom: "conv5_3"
  top: "conv5_3"
}

The RPN comes next.

#========= RPN ============
# convolve the conv feature map; the output has 512 channels
layer {
  name: "rpn_conv/3x3"
  type: "Convolution"
  bottom: "conv5_3"
  top: "rpn/output"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 512
    kernel_size: 3 pad: 1 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "rpn_relu/3x3"
  type: "ReLU"
  bottom: "rpn/output"
  top: "rpn/output"
}
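# Convolve the previous layer's output (rpn/output); the number of output
# channels is bg/fg * anchors = 2 * 12 = 24. The result is used with the labels
# (rpn_labels) computed later for the softmax loss, the first term of the RPN loss.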
layer {
  name: "rpn_cls_score"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_cls_score"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 24   # 2(bg/fg) * 12(anchors)
    kernel_size: 1 pad: 0 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
# Convolve rpn/output; the number of channels is coordinates * anchors = 4 * 12 = 48.
# The result is compared with the anchor targets (rpn_bbox_targets) computed later
# in SmoothL1Loss, the second term of the RPN loss.
layer {
  name: "rpn_bbox_pred"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_bbox_pred"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 48   # 4 * 12(anchors)
    kernel_size: 1 pad: 0 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}

layer {
   bottom: "rpn_cls_score"
   top: "rpn_cls_score_reshape"
   name: "rpn_cls_score_reshape"
   type: "Reshape"
   reshape_param { shape { dim: 0 dim: 2 dim: -1 dim: 0 } }
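}
# Implemented by anchor_target_layer.py.
# From the size of the conv feature map and the stride, compute the position in the
# original image that corresponds to each feature-map position. From base_size, generate one anchor (0, 0, base_size-1, base_size-1), then generate several anchors per position from the ratios and scales.
# Using the feature-map coordinates in the original image, obtain W*H*num_anchors anchors
# on the original image (W, H are the feature map size). Compute the overlap of these
# anchors with gt_boxes and label each anchor with {1, 0, -1}; these are rpn_labels. Then compute
# the offset from each anchor to the gt_box it overlaps most; these are rpn_bbox_targets.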

layer {
  name: 'rpn-data'
  type: 'Python'
  bottom: 'rpn_cls_score'
  bottom: 'gt_boxes'
  bottom: 'im_info'
  bottom: 'data'
  top: 'rpn_labels'
  top: 'rpn_bbox_targets'
  top: 'rpn_bbox_inside_weights'
  top: 'rpn_bbox_outside_weights'
  python_param {
    module: 'rpn.anchor_target_layer'
    layer: 'AnchorTargetLayer'
    param_str: "'feat_stride': 16 \n'scales': !!python/tuple [4, 8, 16, 32]"
  }
}
# softmax classification against the labels
layer {
  name: "rpn_loss_cls"
  type: "SoftmaxWithLoss"
  bottom: "rpn_cls_score_reshape"
  bottom: "rpn_labels"
  propagate_down: 1
  propagate_down: 0
  top: "rpn_cls_loss"
  loss_weight: 1
  loss_param {
    ignore_label: -1
    normalize: true
  }
}
# anchor coordinate regression
layer {
  name: "rpn_loss_bbox"
  type: "SmoothL1Loss"
  bottom: "rpn_bbox_pred"
  bottom: "rpn_bbox_targets"
  bottom: 'rpn_bbox_inside_weights'
  bottom: 'rpn_bbox_outside_weights'
  top: "rpn_loss_bbox"
  loss_weight: 1
  smooth_l1_loss_param { sigma: 3.0 }
}


#========= RoI Proposal ============

layer {
  name: "rpn_cls_prob"
  type: "Softmax"
  bottom: "rpn_cls_score_reshape"
  top: "rpn_cls_prob"
}

layer {
  name: 'rpn_cls_prob_reshape'
  type: 'Reshape'
  bottom: 'rpn_cls_prob'
  top: 'rpn_cls_prob_reshape'
  reshape_param { shape { dim: 0 dim: 24 dim: -1 dim: 0 } }
}
# Add the offsets in rpn_bbox_pred computed by the RPN to the anchors
# to get proposals, then process them (clip, sort, NMS, etc.)
# and output the rois used in later computation.
layer {
  name: 'proposal'
  type: 'Python'
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'im_info'
  top: 'rpn_rois'
  python_param {
    module: 'rpn.proposal_layer'
    layer: 'ProposalLayer'
    param_str: "'feat_stride': 16 \n'scales': !!python/tuple [4, 8, 16, 32]"
  }
}
# From rpn_rois and the ground-truth boxes, compute bbox_targets (the offsets), and process rpn_rois to produce rois
layer {
  name: 'roi-data'
  type: 'Python'
  bottom: 'rpn_rois'
  bottom: 'gt_boxes'
  top: 'rois'
  top: 'labels'
  top: 'bbox_targets'
  top: 'bbox_inside_weights'
  top: 'bbox_outside_weights'
  python_param {
    module: 'rpn.proposal_target_layer'
    layer: 'ProposalTargetLayer'
    param_str: "'num_classes': 81"
  }
}
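The two Reshape layers in the listing rely on Caffe's shape conventions, where a dim of 0 copies the corresponding input dimension and -1 is inferred from the rest. The sketch below mimics that shape inference in plain Python to show why the score blob becomes 2-channel for the softmax and 24-channel again afterwards; `caffe_reshape` is a hypothetical helper, and the 40*60 feature map size is an assumed example.

```python
def caffe_reshape(in_shape, spec):
    """Output shape for a Caffe-style reshape spec (0 = copy input dim, -1 = infer)."""
    out = [in_shape[i] if d == 0 else d for i, d in enumerate(spec)]
    total = 1
    for d in in_shape:
        total *= d
    known = 1
    for d in out:
        if d != -1:
            known *= d
    return tuple(total // known if d == -1 else d for d in out)

scores = (1, 24, 40, 60)                            # N, 2 * 12 anchors, H, W
for_softmax = caffe_reshape(scores, (0, 2, -1, 0))
print(for_softmax)                                  # (1, 2, 480, 60)
print(caffe_reshape(for_softmax, (0, 24, -1, 0)))   # (1, 24, 40, 60)
```

With only 2 channels, the softmax over the channel axis becomes a bg/fg decision for each anchor at each position.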

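Summary:
R-CNN applies proposals to the original image, resizes each region, feeds it to a CNN to extract features, and then classifies with an SVM. R-CNN is mainly a classifier; it does not predict object locations itself (apart from bounding-box refinement).
Fast R-CNN applies RoI pooling on the shared convolutional layer according to the proposals. Using the proposals after the shared convolutions speeds things up, but the proposals still have to be generated separately.
Faster R-CNN brings proposal generation into the CNN: the RPN generates the proposals.
Box regression in the RPN is class-agnostic, as is the regression in SSD; the final box regression in Fast R-CNN and similar detectors is class-specific.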