14 - Object Detection: Faster R-CNN

遥远的阿勒泰

Last modified: 2024-07-28 00:21:34

Tags: object detection, artificial intelligence, computer vision

First published: 2024-07-28 00:20:40

Article link: https://blog.csdn.net/Kang_Kang330/article/details/140741775

IOU

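IoU = S_intersection / S_union (area of overlap divided by area of union). A box with IoU >= 0.5 counts as a positive sample.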
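The ratio above can be sketched in a few lines of Python. This is a minimal illustration, assuming boxes are given as (x1, y1, x2, y2) corner coordinates; the function names `iou` and `is_positive` are made up for this example.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width/height are clamped to 0 when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_positive(pred_box, gt_box, threshold=0.5):
    """A box counts as a positive sample when its IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold
```

For example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` has an overlap of 1 and a union of 7, so the box would not count as positive at the 0.5 threshold.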
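TP (true positive): the second letter is the class the classifier predicted (Positive or Negative); the first letter says whether that prediction was correct (True or False). A TP is a positive sample correctly predicted as positive.
TN (true negative): a negative sample correctly predicted as negative.
FP (false positive): a negative sample wrongly predicted as positive.
FN (false negative): a positive sample wrongly predicted as negative.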

Precision and recall

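precision: of all samples the classifier predicted as positive, the fraction that really are positive, i.e. how accurate the detections are: TP / (TP + FP)
recall: of all samples that really are positive, the fraction the classifier predicted as positive, i.e. how complete the detections are: TP / (TP + FN)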
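As a small worked example of the two formulas, here is a sketch that computes both from raw counts. The function name and the counts are made up for illustration.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# 10 detections, 8 of them correct, while the image set contains 16 real objects
p, r = precision_recall(tp=8, fp=2, fn=8)
print(p, r)  # 0.8 0.5
```

The detector here is fairly accurate (80% of what it reports is right) but not very complete (it finds only half of the objects).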

Bounding-box regression

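For reference, the parameterization commonly used for bounding-box regression in the R-CNN family can be written as follows. This is a sketch under assumed notation: $(x_a, y_a, w_a, h_a)$ is the center and size of an anchor, $(x, y, w, h)$ the refined box, $(x^*, y^*, w^*, h^*)$ the ground-truth box, and $(d_x, d_y, d_w, d_h)$ the four regression values.

```latex
% Applying predicted offsets to an anchor
x = x_a + d_x \, w_a, \qquad
y = y_a + d_y \, h_a, \qquad
w = w_a \, e^{d_w}, \qquad
h = h_a \, e^{d_h}

% Regression targets computed from a ground-truth box
d_x^* = \frac{x^* - x_a}{w_a}, \qquad
d_y^* = \frac{y^* - y_a}{h_a}, \qquad
d_w^* = \log\frac{w^*}{w_a}, \qquad
d_h^* = \log\frac{h^*}{h_a}
```

The shifts are scaled by the anchor size and the scale changes are taken in log space, so the same offsets mean the same relative correction for small and large anchors.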

Faster-RCNN

Two-stage detector

1. Conv layers

Feature extraction: a stack of convolution, ReLU and pooling layers.
13 conv layers (3*3 convolutions)
13 relu layers
4 pooling layers
In the Conv layers:

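Every conv layer uses kernel_size=3, pad=1, stride=1.
Every pooling layer uses kernel_size=2, pad=0, stride=2.
The conv and relu layers do not change the spatial size; only each pooling layer halves the height and width of its input.
So an MxN input always comes out of the Conv layers as (M/16)x(N/16), since the 4 pooling layers give 2^4 = 16.
As a result, every position on the feature map produced by the Conv layers can be mapped back to a location in the original image.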
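The size arithmetic can be checked with the standard output-size formula floor((n + 2*pad - kernel) / stride) + 1. The sketch below assumes the pooling layers use no padding (pad=0), which is what makes each one halve the size exactly; the helper names and the 800x608 input are chosen only for illustration.

```python
def out_size(n, kernel_size, pad, stride):
    """Output length of a conv/pooling layer along one dimension."""
    return (n + 2 * pad - kernel_size) // stride + 1


def conv_layers_output(m, n, num_pools=4):
    """Spatial size after the Conv layers for an m x n input."""
    for _ in range(num_pools):
        # 3x3 conv, pad 1, stride 1: size is unchanged, so one conv stands in
        # for however many conv layers precede this pooling layer
        m, n = out_size(m, 3, 1, 1), out_size(n, 3, 1, 1)
        # 2x2 pooling, pad 0 (assumed), stride 2: size is halved
        m, n = out_size(m, 2, 0, 2), out_size(n, 2, 0, 2)
    return m, n


print(conv_layers_output(800, 608))  # (50, 38)
```

Both 800/16 = 50 and 608/16 = 38 match the (M/16)x(N/16) rule.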

2. RPN (Region Proposal Network)


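It has 2 branches:

The upper branch classifies the anchors with softmax into Positive and Negative.
The lower branch computes the bounding-box regression offsets for the anchors, to obtain accurate proposals.
Detailed steps: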
1. Anchors: 9 boxes are drawn at every point of the feature map.
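A rough sketch of how 9 boxes per point could be generated is given below. The notes do not say which sizes are used, so the 3 scales (128, 256, 512) and 3 aspect ratios (0.5, 1, 2) are assumptions taken from common Faster R-CNN settings, and placing each anchor at the center of its 16-pixel cell is a simplification. The function names are made up for this example.

```python
import numpy as np


def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """9 anchors (x1, y1, x2, y2) centered at (cx, cy): 3 scales x 3 ratios (assumed values)."""
    boxes = []
    for s in scales:
        for r in ratios:
            # Keep the area at s*s while setting height / width = r
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(boxes)


def all_anchors(feat_h, feat_w, stride=16):
    """Anchors for every feature-map point, mapped back to image coordinates."""
    out = []
    for y in range(feat_h):
        for x in range(feat_w):
            out.append(anchors_at(x * stride + stride / 2, y * stride + stride / 2))
    return np.concatenate(out, axis=0)


print(all_anchors(38, 50).shape)  # (17100, 4)
```

With a 50x38 feature map this gives 50 * 38 * 9 = 17100 anchors, which is why the later steps have to filter them.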

The resulting anchors are then narrowed down: softmax decides whether each one is Positive or Negative.

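After the 1*1 convolution the output has 18 channels for the 9 anchors (2 channels per anchor, one for Positive and one for Negative).
In caffe, this positive/negative anchor matrix is stored as [1, 18, H, W]. Softmax has to perform a two-way positive/negative classification, so a reshape layer first turns it into [1, 2, 9xH, W], that is, it frees up a separate dimension of size 2 for softmax to work on, and afterwards the result is reshaped back to its original shape.
In short, the RPN uses anchors and softmax to make a first pass and pick out the positive anchors as candidate regions.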
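The reshape trick can be imitated with NumPy, which uses the same row-major layout as a caffe blob. This is only a sketch of the idea, not the caffe layers themselves; the function name is made up. One detail worth noting: with this reshape, channel k is paired with channel k + 9, so those are the two scores that compete in the softmax for anchor k.

```python
import numpy as np


def rpn_cls_softmax(scores):
    """scores: [1, 18, H, W] raw scores -> probabilities in the same layout."""
    n, c, h, w = scores.shape                    # c == 2 * 9
    x = scores.reshape(n, 2, (c // 2) * h, w)    # [1, 2, 9*H, W]
    x = x - x.max(axis=1, keepdims=True)         # subtract the max for numerical stability
    e = np.exp(x)
    prob = e / e.sum(axis=1, keepdims=True)      # softmax over the size-2 dimension
    return prob.reshape(n, c, h, w)              # back to [1, 18, H, W]


rng = np.random.default_rng(0)
prob = rpn_cls_softmax(rng.normal(size=(1, 18, 4, 5)))
print(prob.shape)  # (1, 18, 4, 5)
```

After the call, `prob[0, k] + prob[0, k + 9]` equals 1 at every spatial position for each of the 9 anchors.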
Bounding-box regression.
The output has 36 channels: each point has 9 anchors, and each anchor has 4 regression values (dx, dy, dw, dh).
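To make the 36 channels concrete, here is a minimal sketch that flattens them into one (dx, dy, dw, dh) row per anchor and applies the offsets to anchor boxes. It assumes the 4 values of an anchor sit in consecutive channels and that boxes are (x1, y1, x2, y2); both function names are invented for this example.

```python
import numpy as np


def deltas_from_map(rpn_bbox_pred, num_anchors=9):
    """[1, 36, H, W] -> (H * W * 9, 4), one (dx, dy, dw, dh) row per anchor (assumed channel order)."""
    n, c, h, w = rpn_bbox_pred.shape
    assert n == 1 and c == 4 * num_anchors
    return rpn_bbox_pred.transpose(0, 2, 3, 1).reshape(-1, 4)


def apply_deltas(anchors, deltas):
    """Refine (N, 4) anchors with (N, 4) offsets; returns (N, 4) boxes."""
    w = anchors[:, 2] - anchors[:, 0]
    h = anchors[:, 3] - anchors[:, 1]
    cx = anchors[:, 0] + 0.5 * w
    cy = anchors[:, 1] + 0.5 * h
    dx, dy, dw, dh = deltas.T
    new_cx = cx + dx * w          # shift the center, scaled by the anchor size
    new_cy = cy + dy * h
    new_w = w * np.exp(dw)        # rescale width and height in log space
    new_h = h * np.exp(dh)
    return np.stack([new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
                     new_cx + 0.5 * new_w, new_cy + 0.5 * new_h], axis=1)


anchor = np.array([[0.0, 0.0, 10.0, 10.0]])
print(apply_deltas(anchor, np.array([[0.5, 0.0, 0.0, 0.0]])))  # [[ 5.  0. 15. 10.]]
```

A dx of 0.5 moves the box right by half its width, while all-zero offsets leave the anchor unchanged.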
Proposal Layer
The Proposal Layer combines all the regression offsets with the positive anchors to compute accurate proposals, which are passed on to the RoI Pooling Layer.
The Proposal Layer takes 4 inputs:
1. rpn_cls_prob_reshape, the output of the positive vs negative anchor classifier,
2. rpn_bbox_pred, the corresponding bbox regression offsets,
3. im_info,
4. the parameter feature_stride=16.
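The Proposal Layer processes them in the following order:
1. Apply the offsets to all positive anchors (bbox regression).
2. Sort the anchors by their input positive softmax scores in descending order and keep the first pre_nms_topN (e.g. 6000), i.e. the top-scoring positive anchors after their positions have been corrected.
3. Run NMS (non-maximum suppression) on the remaining positive anchors.
4. Output the proposals.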
Summary:
generate anchors -> softmax classifier picks out positive anchors -> bbox regression refines the positive anchors -> Proposal Layer generates proposals
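The sort, top-N and NMS steps of the Proposal Layer can be sketched with NumPy as below. The sketch assumes the boxes have already been refined by bbox regression and have positive area; `post_nms_topN` and the 0.7 IoU threshold are assumed values that the notes do not specify, and the function names are made up.

```python
import numpy as np


def nms(boxes, scores, iou_thresh=0.7):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = scores.argsort()[::-1]               # highest score first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Overlap between the current best box and all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_thresh]          # drop boxes that overlap too much
    return np.array(keep, dtype=int)


def select_proposals(boxes, scores, pre_nms_topN=6000, post_nms_topN=300, iou_thresh=0.7):
    """Sort by score, keep the top pre_nms_topN, run NMS, output the proposals."""
    order = scores.argsort()[::-1][:pre_nms_topN]
    boxes, scores = boxes[order], scores[order]
    keep = nms(boxes, scores, iou_thresh)[:post_nms_topN]
    return boxes[keep], scores[keep]
```

For instance, with three boxes where the two highest-scoring ones overlap heavily, calling `select_proposals` with a low enough `iou_thresh` keeps the best box of the overlapping pair plus the separate third box.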

3. RoI pooling

How RoI Pooling works
New parameters: pooled_w, pooled_h and spatial_scale (1/16)
Forward pass of the RoI Pooling layer:

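Proposals are expressed at the MxN scale of the input image, so spatial_scale is first used to map each one onto the (M/16)x(N/16) feature map.
The feature-map region of each proposal is then divided into a pooled_w * pooled_h grid.
Max pooling is applied to every cell of the grid.
After this, proposals of different sizes all produce an output of the same fixed size pooled_w * pooled_h, which gives a fixed-length output.
Drawback: rounding the coordinates to whole feature-map cells introduces quantization error.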
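The forward pass described above can be sketched for a single proposal as follows. This is a simplified NumPy illustration, not the caffe layer; the default 7x7 output size is an assumption, the function name is made up, and the rounding calls mark where the quantization error comes from.

```python
import numpy as np


def roi_max_pool(feature, roi, pooled_h=7, pooled_w=7, spatial_scale=1 / 16):
    """feature: (C, H, W); roi: (x1, y1, x2, y2) in image coordinates -> (C, pooled_h, pooled_w)."""
    c, fh, fw = feature.shape
    # Map the proposal onto the feature map; rounding here loses sub-cell precision
    x1, y1, x2, y2 = [int(round(v * spatial_scale)) for v in roi]
    roi_w = max(x2 - x1 + 1, 1)
    roi_h = max(y2 - y1 + 1, 1)
    out = np.zeros((c, pooled_h, pooled_w), dtype=feature.dtype)
    for ph in range(pooled_h):
        # Row range of this grid cell, clipped to the feature map
        hs = min(max(y1 + int(np.floor(ph * roi_h / pooled_h)), 0), fh)
        he = min(max(y1 + int(np.ceil((ph + 1) * roi_h / pooled_h)), 0), fh)
        for pw in range(pooled_w):
            ws = min(max(x1 + int(np.floor(pw * roi_w / pooled_w)), 0), fw)
            we = min(max(x1 + int(np.ceil((pw + 1) * roi_w / pooled_w)), 0), fw)
            if he > hs and we > ws:
                out[:, ph, pw] = feature[:, hs:he, ws:we].max(axis=(1, 2))
    return out


feature = np.arange(16, dtype=float).reshape(1, 4, 4)
print(roi_max_pool(feature, (0, 0, 48, 48), pooled_h=2, pooled_w=2))
# [[[ 5.  7.]
#   [13. 15.]]]
```

Whatever the size of the proposal, the output shape depends only on pooled_h and pooled_w.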

4. Classification

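The classification part takes the proposal feature maps obtained above and uses fully connected layers and softmax to compute which class each proposal belongs to (e.g. person, car, TV), outputting the probability vector cls_prob.
It also applies bounding box regression once more to obtain a position offset bbox_pred for each proposal, which is used to regress a more accurate detection box.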
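The two outputs of this stage can be illustrated with a toy head. This is a heavily simplified sketch: it assumes the pooled features of each proposal are already flattened into a vector, uses a single linear layer per output instead of the full stack of fully connected layers, and predicts 4 offsets per class, which is an assumption. The weights are random placeholders and all names are hypothetical.

```python
import numpy as np


def classify_proposals(roi_feats, w_cls, b_cls, w_box, b_box):
    """roi_feats: (R, D) flattened proposal features -> (cls_prob, bbox_pred)."""
    logits = roi_feats @ w_cls + b_cls                   # (R, num_classes)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    cls_prob = e / e.sum(axis=1, keepdims=True)          # softmax over classes
    bbox_pred = roi_feats @ w_box + b_box                # (R, 4 * num_classes), assumed layout
    return cls_prob, bbox_pred


rng = np.random.default_rng(0)
num_rois, dim, num_classes = 5, 32, 4                    # toy sizes for illustration
feats = rng.normal(size=(num_rois, dim))
cls_prob, bbox_pred = classify_proposals(
    feats,
    rng.normal(size=(dim, num_classes)), np.zeros(num_classes),
    rng.normal(size=(dim, 4 * num_classes)), np.zeros(4 * num_classes),
)
print(cls_prob.shape, bbox_pred.shape)  # (5, 4) (5, 16)
```

Each row of `cls_prob` sums to 1, and `cls_prob.argmax(axis=1)` gives the predicted class of each proposal.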