Faster RCNN

zhulf0804

Article link: https://blog.csdn.net/zhulf0804/article/details/100748979

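These notes are rough. They only record my learning process and are mostly takeaways from reading blogs and code…

I first read about Faster RCNN a long time ago but never fully understood it. Recently, working through online tutorials together with code, I finally sorted out the whole pipeline.

The data and backbone parts are fairly standard: a batch of images goes into the backbone, which outputs a feature map of shape batch_size * channels * height * width. The rest of this post focuses on what Faster RCNN does once it has the backbone feature map.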

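The feature map first passes through a convolution with kernel size 3 and then, in parallel, through two convolutions with kernel size 1. This yields two feature maps with n_anchor * 4 and n_anchor * 2 channels, which carry location information and class information respectively, as shown in the figure below (the channel counts are not drawn accurately). This step is easy to follow. Two things happen next: the first is generating RoIs (regions of interest), and the second is computing the RPN loss.
Generating RoIs

Generating RoIs uses the anchors together with the location and class information output by the network. The anchors have three sizes (128, 256, 512) and three aspect ratios (1:1, 1:2, 2:1), all fixed in advance. Every position of the feature map has 9 corresponding boxes, as shown in the figure below: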
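To make the 3 sizes x 3 ratios concrete, here is a minimal NumPy sketch that builds the 9 base anchors for one feature-map cell. The function name `generate_anchor_base`, the `(y_min, x_min, y_max, x_max)` box order, and the parameterization `base_size * scale` (16 x 8, 16, 32 = 128, 256, 512) are my assumptions for illustration, not something the post specifies.

```python
import numpy as np


def generate_anchor_base(base_size=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Return 9 anchors (y_min, x_min, y_max, x_max) centered on one cell.

    Hypothetical helper: ratio is height / width, and each anchor keeps
    an area of (base_size * scale) ** 2.
    """
    ctr = base_size / 2.0
    anchors = []
    for r in ratios:
        for s in scales:
            h = base_size * s * np.sqrt(r)
            w = base_size * s * np.sqrt(1.0 / r)
            anchors.append([ctr - h / 2, ctr - w / 2, ctr + h / 2, ctr + w / 2])
    return np.array(anchors, dtype=np.float32)


anchor_base = generate_anchor_base()
heights = anchor_base[:, 2] - anchor_base[:, 0]
widths = anchor_base[:, 3] - anchor_base[:, 1]
```

Each row is one anchor; `heights / widths` recovers the three ratios and `heights * widths` the three areas.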

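At every position of the feature map we have the anchor boxes and the predicted location values. The location values are offsets relative to an anchor, defined by the formulas below, where $(x, y, w, h)$ is the center and size of a box and $(x_a, y_a, w_a, h_a)$ those of its anchor. Inverting them, i.e. $x = t_x w_a + x_a$ and $w = w_a \exp(t_w)$ (likewise for $y$ and $h$), decodes the offsets into RoIs.

$t_x = (x - x_a)/w_a, \quad t_y = (y - y_a)/h_a$
$t_w = \log(w/w_a), \quad t_h = \log(h/h_a)$

The corresponding implementation is: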
```
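import numpy as np


def loc2bbox(anchor_bbox, loc):
    """Decode offsets loc (R, 4) against anchors (R, 4), both in (y, x, y, x) order."""
    # Height, width and center of every anchor.
    anchor_height = anchor_bbox[:, 2] - anchor_bbox[:, 0]
    anchor_width = anchor_bbox[:, 3] - anchor_bbox[:, 1]
    anchor_ctr_y = anchor_bbox[:, 0] + 0.5 * anchor_height
    anchor_ctr_x = anchor_bbox[:, 1] + 0.5 * anchor_width

    # Strided slices keep a trailing axis, so each term broadcasts as (R, 1).
    dy = loc[:, 0::4]
    dx = loc[:, 1::4]
    dh = loc[:, 2::4]
    dw = loc[:, 3::4]

    # Invert the offset formulas: shift the center, scale the size.
    ctr_y = dy * anchor_height[:, np.newaxis] + anchor_ctr_y[:, np.newaxis]
    ctr_x = dx * anchor_width[:, np.newaxis] + anchor_ctr_x[:, np.newaxis]
    h = np.exp(dh) * anchor_height[:, np.newaxis]
    w = np.exp(dw) * anchor_width[:, np.newaxis]

    # Convert center/size back to corner coordinates.
    dst_bbox = np.zeros(loc.shape, dtype=loc.dtype)
    dst_bbox[:, 0::4] = ctr_y - 0.5 * h
    dst_bbox[:, 1::4] = ctr_x - 0.5 * w
    dst_bbox[:, 2::4] = ctr_y + 0.5 * h
    dst_bbox[:, 3::4] = ctr_x + 0.5 * w
    return dst_bbox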
```
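This produces far too many RoIs, so they are filtered further:
- Clip boxes that extend outside the image
- Remove boxes whose width or height is below a given threshold
- Sort the remaining RoIs by score in descending order and keep the top 12000 (top 6000 at test time)
- Apply NMS and keep the top 2000 (top 300 at test time)
- At this point we have 2000 candidate boxes (300 at test time)
The corresponding code is: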
```
# Convert anchors into proposals via bbox transformations.
roi = loc2bbox(anchor, loc)  # (hh * ww * 9, 4)

# Clip predicted boxes to image.
roi[:, slice(0, 4, 2)] = np.clip(
    roi[:, slice(0, 4, 2)], 0, img_size[0])
roi[:, slice(1, 4, 2)] = np.clip(
    roi[:, slice(1, 4, 2)], 0, img_size[1])

# Remove predicted boxes with either height or width < threshold.
min_size = self.min_size * scale
hs = roi[:, 2] - roi[:, 0]
ws = roi[:, 3] - roi[:, 1]
keep = np.where((hs >= min_size) & (ws >= min_size))[0]
roi = roi[keep, :]
score = score[keep]

# Sort all (proposal, score) pairs by score from highest to lowest.
# Take top pre_nms_topN (e.g. 6000).
order = score.ravel().argsort()[::-1]
if n_pre_nms > 0:
    order = order[:n_pre_nms]
roi = roi[order, :]

# Apply nms (e.g. threshold = 0.7).
# Take after_nms_topN (e.g. 300).

# NOTE: something is wrong here!
# TODO: remove cuda.to_gpu
keep = non_maximum_suppression(
    cp.ascontiguousarray(cp.asarray(roi)),
    thresh=self.nms_thresh)
if n_post_nms > 0:
    keep = keep[:n_post_nms]
roi = roi[keep]
```
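The snippet above depends on `self`, cupy and an external `non_maximum_suppression`, so it cannot be run on its own. Below is a CPU-only sketch of the same four steps (clip, size filter, top-k by score, NMS) that I wrote to check my understanding. The names `nms` and `filter_proposals`, the default `min_size=16`, and the simple greedy NMS are my assumptions, not the code used in the post.

```python
import numpy as np


def nms(boxes, scores, thresh):
    """Greedy NMS on (y_min, x_min, y_max, x_max) boxes; returns kept indices."""
    order = scores.argsort()[::-1]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Intersection of the current best box with all remaining boxes.
        tl = np.maximum(boxes[i, :2], boxes[rest, :2])
        br = np.minimum(boxes[i, 2:], boxes[rest, 2:])
        inter = np.prod(np.clip(br - tl, 0, None), axis=1)
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop everything that overlaps the kept box too much.
        order = rest[iou <= thresh]
    return np.array(keep, dtype=np.int64)


def filter_proposals(roi, score, img_size, min_size=16,
                     n_pre_nms=12000, n_post_nms=2000, nms_thresh=0.7):
    """Hypothetical helper: clip, drop small boxes, top-k by score, then NMS."""
    roi = roi.copy()
    roi[:, 0::2] = np.clip(roi[:, 0::2], 0, img_size[0])  # y coordinates
    roi[:, 1::2] = np.clip(roi[:, 1::2], 0, img_size[1])  # x coordinates
    hs = roi[:, 2] - roi[:, 0]
    ws = roi[:, 3] - roi[:, 1]
    keep = np.where((hs >= min_size) & (ws >= min_size))[0]
    roi, score = roi[keep], score[keep]
    order = score.argsort()[::-1][:n_pre_nms]
    roi, score = roi[order], score[order]
    keep = nms(roi, score, nms_thresh)[:n_post_nms]
    return roi[keep]


roi = np.array([[0, 0, 100, 100],
                [5, 5, 105, 105],      # overlaps the first box heavily
                [200, 200, 300, 300],
                [50, 50, 55, 55]],     # too small, removed before NMS
               dtype=np.float32)
score = np.array([0.9, 0.8, 0.7, 0.95], dtype=np.float32)
proposals = filter_proposals(roi, score, img_size=(400, 400))
```

In this toy input the 5 x 5 box is dropped by the size filter even though it has the highest score, and the second box is suppressed by the first, so two proposals remain.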
Computing the RPN loss

After obtaining the location output (h*w*n_anchor*4) and the class output (h*w*n_anchor*2), we reshape them into tensors of shape (R, 4) and (R, 2) respectively, where R = h*w*n_anchor.
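The reshape is easy to get wrong, so here is a small NumPy sketch of it. I assume the network output is laid out as (batch, channels, height, width) and that the channel axis groups the 4 (or 2) values of each anchor together; the variable names and the 37 x 62 feature-map size are only for illustration.

```python
import numpy as np

n, n_anchor, h, w = 1, 9, 37, 62

# Stand-ins for the outputs of the two 1 x 1 convolutions.
rpn_locs = np.zeros((n, n_anchor * 4, h, w), dtype=np.float32)
rpn_scores = np.zeros((n, n_anchor * 2, h, w), dtype=np.float32)

# Move channels last, then flatten positions and anchors into one axis R.
rpn_locs = rpn_locs.transpose(0, 2, 3, 1).reshape(n, -1, 4)
rpn_scores = rpn_scores.transpose(0, 2, 3, 1).reshape(n, -1, 2)

R = h * w * n_anchor
```

Moving the channel axis last before reshaping keeps the 9 anchors of one position next to each other, which matches the order in which anchors are usually enumerated.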

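Computing the loss also requires ground truth. In addition, not all R boxes take part in the loss: the paper samples 256 anchors, of which at most 128 are positive. Here is how positive and negative samples are selected and how the ground truth is obtained.

For one input image there are S ground truth bboxes:
- Compute the IoU between the R anchor boxes of the RPN and the S ground truth bboxes
- For each of the S ground truth bboxes, the anchor box with the highest IoU is a positive sample
- Anchor boxes whose IoU with any ground truth bbox exceeds 0.7 are positive samples
- Anchor boxes whose IoU with every ground truth bbox is below 0.3 are negative samples; anchors that are neither positive nor negative are ignored
- Sample at most 128 of the positive samples
- Sample at most (256 - number of selected positives) of the negative samples
- Compute the cross-entropy loss over the selected positive and negative samples (at most 256)
- Compute the smooth L1 loss over the selected positive samples (at most 128)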
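The selection rules above can be sketched in NumPy as follows. This is my own minimal version under stated assumptions: boxes are `(y_min, x_min, y_max, x_max)`, the label -1 marks ignored anchors, and the names `bbox_iou` and `label_anchors` are hypothetical helpers rather than the code of any particular repository.

```python
import numpy as np


def bbox_iou(a, b):
    """Pairwise IoU between boxes a (R, 4) and b (S, 4); returns (R, S)."""
    tl = np.maximum(a[:, None, :2], b[None, :, :2])
    br = np.minimum(a[:, None, 2:], b[None, :, 2:])
    inter = np.prod(np.clip(br - tl, 0, None), axis=2)
    area_a = np.prod(a[:, 2:] - a[:, :2], axis=1)
    area_b = np.prod(b[:, 2:] - b[:, :2], axis=1)
    return inter / (area_a[:, None] + area_b[None, :] - inter)


def label_anchors(anchor, gt_bbox, n_sample=256, pos_ratio=0.5,
                  pos_thresh=0.7, neg_thresh=0.3, rng=None):
    """Return (label, gt_index): label is 1 positive, 0 negative, -1 ignored."""
    rng = np.random.default_rng() if rng is None else rng
    iou = bbox_iou(anchor, gt_bbox)
    max_iou = iou.max(axis=1)

    label = np.full(len(anchor), -1, dtype=np.int32)
    label[max_iou < neg_thresh] = 0        # low overlap with every gt box
    label[iou.argmax(axis=0)] = 1          # best anchor for each gt box
    label[max_iou >= pos_thresh] = 1       # high overlap with some gt box

    def subsample(value, limit):
        # Randomly set surplus anchors of this label to "ignored".
        idx = np.where(label == value)[0]
        if idx.size > limit:
            drop = rng.choice(idx, size=idx.size - limit, replace=False)
            label[drop] = -1

    subsample(1, int(n_sample * pos_ratio))
    subsample(0, n_sample - int((label == 1).sum()))
    return label, iou.argmax(axis=1)


anchor = np.array([[0, 0, 10, 10],
                   [0, 0, 10, 12],
                   [20, 20, 30, 30],
                   [0, 5, 10, 15]], dtype=np.float32)
gt_bbox = np.array([[0, 0, 10, 10]], dtype=np.float32)
label, gt_index = label_anchors(anchor, gt_bbox)
```

In this toy case the four anchors have IoU 1.0, about 0.83, 0 and about 0.33 with the single ground truth box, so the last one falls between the two thresholds and is ignored.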

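RoIHead (Fast RCNN)

From the 2000 RoIs produced by the RPN, 128 samples are selected for training Fast RCNN:

- Append the ground truth bboxes to the RoIs to get a new set of RoIs
- Compute the IoU between the new RoIs and the ground truth bboxes
- For each RoI, its class is the class of the ground truth bbox with which it has the highest IoU
- For each RoI, its location target comes from that same ground truth bbox, after two steps: (1) convert the ground truth box into offsets relative to the RoI, and (2) normalize the offsets
- Select 128 positive and negative samples by IoU thresholds (positive:negative = 1:3) for training
The relevant code is: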

```
# The method name below is assumed; the body is the sampling code itself.
def sample_rois(self, roi, bbox, label, loc_normalize_mean, loc_normalize_std):
    n_bbox, _ = bbox.shape

    roi = np.concatenate((roi, bbox), axis=0)

    pos_roi_per_image = np.round(self.n_sample * self.pos_ratio)
    iou = bbox_iou(roi, bbox)
    gt_assignment = iou.argmax(axis=1)
    max_iou = iou.max(axis=1)
    # Offset range of classes from [0, n_fg_class - 1] to [1, n_fg_class].
    # The label with value 0 is the background.
    gt_roi_label = label[gt_assignment] + 1

    # Select foreground RoIs as those with >= pos_iou_thresh IoU.
    pos_index = np.where(max_iou >= self.pos_iou_thresh)[0]
    pos_roi_per_this_image = int(min(pos_roi_per_image, pos_index.size))
    if pos_index.size > 0:
        pos_index = np.random.choice(
            pos_index, size=pos_roi_per_this_image, replace=False)

    # Select background RoIs as those within
    # [neg_iou_thresh_lo, neg_iou_thresh_hi).
    neg_index = np.where((max_iou < self.neg_iou_thresh_hi) &
                         (max_iou >= self.neg_iou_thresh_lo))[0]
    neg_roi_per_this_image = self.n_sample - pos_roi_per_this_image
    neg_roi_per_this_image = int(min(neg_roi_per_this_image,
                                     neg_index.size))
    if neg_index.size > 0:
        neg_index = np.random.choice(
            neg_index, size=neg_roi_per_this_image, replace=False)

    # The indices that we're selecting (both positive and negative).
    keep_index = np.append(pos_index, neg_index)
    gt_roi_label = gt_roi_label[keep_index]
    gt_roi_label[pos_roi_per_this_image:] = 0  # negative labels --> 0
    sample_roi = roi[keep_index]

    # Compute offsets and scales to match sampled RoIs to the GTs.
    gt_roi_loc = bbox2loc(sample_roi, bbox[gt_assignment[keep_index]])
    gt_roi_loc = ((gt_roi_loc - np.array(loc_normalize_mean, np.float32)
                   ) / np.array(loc_normalize_std, np.float32))

    return sample_roi, gt_roi_loc, gt_roi_label
```
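The code above calls `bbox2loc` without showing it. A minimal sketch of what such a function would compute, following the $t_x, t_y, t_w, t_h$ formulas from earlier, is given below. The implementation and the normalization constants (mean 0, std 0.1 for centers and 0.2 for sizes) are my assumptions; the post does not state the values it uses.

```python
import numpy as np


def bbox2loc(src_bbox, dst_bbox):
    """Encode dst boxes as offsets relative to src boxes, both (y, x, y, x)."""
    eps = np.finfo(np.float32).eps
    h = np.maximum(src_bbox[:, 2] - src_bbox[:, 0], eps)  # avoid divide by zero
    w = np.maximum(src_bbox[:, 3] - src_bbox[:, 1], eps)
    ctr_y = src_bbox[:, 0] + 0.5 * h
    ctr_x = src_bbox[:, 1] + 0.5 * w

    dst_h = dst_bbox[:, 2] - dst_bbox[:, 0]
    dst_w = dst_bbox[:, 3] - dst_bbox[:, 1]
    dst_ctr_y = dst_bbox[:, 0] + 0.5 * dst_h
    dst_ctr_x = dst_bbox[:, 1] + 0.5 * dst_w

    dy = (dst_ctr_y - ctr_y) / h
    dx = (dst_ctr_x - ctr_x) / w
    dh = np.log(dst_h / h)
    dw = np.log(dst_w / w)
    return np.stack([dy, dx, dh, dw], axis=1)


roi = np.array([[0.0, 0.0, 10.0, 10.0]])
gt = np.array([[5.0, 0.0, 15.0, 20.0]])
loc = bbox2loc(roi, gt)

# Assumed normalization constants, for illustration only.
loc_normalize_mean = np.array([0.0, 0.0, 0.0, 0.0])
loc_normalize_std = np.array([0.1, 0.1, 0.2, 0.2])
loc_normalized = (loc - loc_normalize_mean) / loc_normalize_std
```

Here the ground truth box is shifted by half the RoI size in both directions and is twice as wide, so the raw offsets are (0.5, 0.5, 0, log 2). Dividing by a small std enlarges the regression targets so they are on a scale closer to 1.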

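The selected RoIs, together with the backbone feature map, go through RoI pooling and fully connected layers, producing two tensors of shape 128 x 21 and 128 x (21 x 4) that hold class information and location information respectively (21 being the number of classes including background).

During training, the class output is trained with cross-entropy loss and the location output with smooth L1 loss.

At test time, NMS is applied to produce the model's final results.

Below are some notes I took while reading blogs.

Faster RCNN consists of four main parts: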

[Figure: ./images/faster_rcnn.png]

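- Dataset: provides data in the required format
- Extractor: uses a CNN to extract image features
- RPN (Region Proposal Network): provides candidate regions (RoIs), roughly 2000 per image
- RoIHead: classifies and refines the RoIs. For each RoI found by the RPN, it determines whether the RoI contains an object and refines the position and coordinates of the box.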

RPN

The most notable contribution of Faster RCNN is the Region Proposal Network (RPN), which replaces Selective Search and cuts the time spent extracting candidate regions from 2s to 0.01s.

Anchor

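The RPN introduces anchors. An anchor is a candidate box with a fixed size and aspect ratio. The paper uses three sizes (128, 256, 512) and three ratios (1:1, 1:2, 2:1), and the 3 x 3 combinations give 9 kinds of anchors.

These 9 anchors are then slid over the feature map so that every feature-map point has 9 anchors, giving (H/16) * (W/16) * 9 anchors in total. For a 512 x 62 x 37 feature map, that is 62 x 37 x 9 ≈ 20000 anchors.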
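As a sanity check on the count, here is a NumPy sketch that shifts 9 base anchors to every feature-map position. The function name `enumerate_anchors` and the `(y_min, x_min, y_max, x_max)` order are my assumptions, and the all-zero `anchor_base` is only a placeholder for the real 9 base anchors.

```python
import numpy as np


def enumerate_anchors(anchor_base, feat_stride, height, width):
    """Shift the base anchors to every feature-map cell; returns (H * W * A, 4)."""
    shift_y = np.arange(0, height * feat_stride, feat_stride)
    shift_x = np.arange(0, width * feat_stride, feat_stride)
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    # One (y, x, y, x) shift per cell, in row-major order.
    shift = np.stack((shift_y.ravel(), shift_x.ravel(),
                      shift_y.ravel(), shift_x.ravel()), axis=1)
    # Broadcast: (1, A, 4) + (K, 1, 4) -> (K, A, 4).
    anchor = anchor_base[None, :, :] + shift[:, None, :]
    return anchor.reshape(-1, 4).astype(np.float32)


anchor_base = np.zeros((9, 4), dtype=np.float32)  # placeholder base anchors
anchors = enumerate_anchors(anchor_base, feat_stride=16, height=37, width=62)
n_anchors = anchors.shape[0]
```

With a stride of 16 and a 37 x 62 feature map this gives 20646 anchors, which is the "about 20000" quoted above.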
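Training the RPN

On top of the feature map output by the Extractor, the RPN first adds one convolution and then uses two 1 x 1 convolutions for binary classification and location regression. The classification convolution has 9 * 2 output channels and the regression convolution has 9 * 4.

Next, the RPN uses AnchorTargetCreator to pick 256 of the 20000+ candidate anchors for classification and location regression. The selection works as follows:
- For each ground truth bounding box (gt_bbox), the anchor with the highest overlap (IoU) is a positive sample.
- Among the remaining anchors, those whose IoU with any gt_bbox exceeds 0.7 are positive samples. The number of positives is capped at 128.
- Anchors whose IoU with the gt_bboxes is below 0.3 are randomly sampled as negatives, so that negatives and positives total 256.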
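For each sampled anchor, gt_label is 1 (positive) or 0 (negative); implementations commonly mark the remaining anchors with -1 so that they are ignored. gt_loc consists of the 4 location parameters $(t_x, t_y, t_w, t_h)$, which works better than regressing coordinates directly. Here $(x, y, w, h)$ is the predicted box, $(x_a, y_a, w_a, h_a)$ the anchor, and $(x^*, y^*, w^*, h^*)$ the ground truth box:

$t_x = (x - x_a)/w_a, \quad t_y = (y - y_a)/h_a$
$t_w = \log(w/w_a), \quad t_h = \log(h/h_a)$
$t_x^* = (x^* - x_a)/w_a, \quad t_y^* = (y^* - y_a)/h_a$
$t_w^* = \log(w^*/w_a), \quad t_h^* = \log(h^*/h_a)$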

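Classification uses cross-entropy loss and regression uses smooth L1 loss. The regression loss is computed only over positive (foreground) samples; negative samples contribute no location loss.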
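A minimal NumPy sketch of the positive-only regression loss is given below. The `sigma` parameter and the choice to divide by the number of sampled anchors (labels 0 and 1) are assumptions on my part; implementations differ in how they normalize, and the function names are hypothetical.

```python
import numpy as np


def smooth_l1(pred, target, sigma=1.0):
    """Element-wise smooth L1: quadratic near zero, linear elsewhere."""
    diff = np.abs(pred - target)
    sigma2 = sigma ** 2
    return np.where(diff < 1.0 / sigma2,
                    0.5 * sigma2 * diff ** 2,
                    diff - 0.5 / sigma2)


def rpn_loc_loss(pred_loc, gt_loc, label, sigma=1.0):
    """Sum smooth L1 over positive anchors only (label == 1)."""
    pos = label == 1
    loss = smooth_l1(pred_loc[pos], gt_loc[pos], sigma).sum()
    # Assumed normalizer: number of sampled anchors (label 0 or 1).
    return loss / max(int((label >= 0).sum()), 1)


pred_loc = np.array([[0.0, 0.0, 0.0, 0.0],
                     [2.0, 0.0, 0.0, 0.0],
                     [9.0, 9.0, 9.0, 9.0]])   # negative anchor, ignored by the loss
gt_loc = np.zeros((3, 4))
label = np.array([1, 1, 0])
loss = rpn_loc_loss(pred_loc, gt_loc, label)
elementwise = smooth_l1(np.array([0.5, 2.0]), np.zeros(2))
```

The third row has a large error but belongs to a negative anchor, so it does not change the location loss; it only counts toward the normalizer.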
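Generating RoIs with the RPN

While training itself, the RPN also supplies RoIs to Fast RCNN as training samples. The RPN generates RoIs (ProposalCreator) as follows:
- For each image, use its feature map to compute, for all (H/16) x (W/16) x 9 (roughly 20000) anchors, the probability of being foreground and the corresponding location parameters.
- Keep the 12000 anchors with the highest probability
- Use the regressed location parameters to adjust these 12000 anchors, giving the RoIs
- Apply non-maximum suppression and keep the 2000 RoIs with the highest probability
- At inference, to speed things up, 12000 and 2000 become 6000 and 300.

Output of the RPN: RoIs, a tensor of shape 2000 x 4 or 300 x 4.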
