Faster RCNN: Principles and Code Walkthrough

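This post walks through the Faster RCNN pipeline, based mainly on the Keras implementation at https://github.com/dishen12/keras_frcnn (the original author deleted their repository; this is someone else's fork). The TensorFlow implementation at https://github.com/endernewton/tf-faster-rcnn is also fairly clear (though in my opinion not as simple as the Keras version) and is worth reading side by side.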

Data processing

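The backbone is vgg16, whose output feature_map has stride=16 relative to the network input. Each point of the feature_map corresponds to 9 anchors (3 ratios * 3 scales), so a feature_map of width w and height h yields w*h*9 anchors in total. The function calc_rpn computes the IoU between each anchor and each gt box to decide which anchors are positive classification samples (foreground containing an object, regardless of its class), which are negative, which are ignored, and which take part in bounding-box regression.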
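To make the anchor count concrete before reading the code, here is a small sketch that enumerates the 9 anchor shapes and counts the anchors on a feature map. The scale and ratio values are copied from the config comments in calc_rpn; the helper names anchor_shapes and total_anchors, and the 56 x 38 feature map, are my own illustrative assumptions and not part of the repository.

```python
import math

# Values mirror the config comments in calc_rpn; treat them as assumptions.
anchor_box_scales = [128, 256, 512]
anchor_box_ratios = [
    [1, 1],
    [1.0 / math.sqrt(2), 2.0 / math.sqrt(2)],
    [2.0 / math.sqrt(2), 1.0 / math.sqrt(2)],
]


def anchor_shapes(scales, ratios):
    """Return the (width, height) of every scale/ratio combination."""
    return [(s * r[0], s * r[1]) for s in scales for r in ratios]


def total_anchors(feat_w, feat_h, scales, ratios):
    """Number of anchors over a feat_w x feat_h feature map."""
    return feat_w * feat_h * len(scales) * len(ratios)


shapes = anchor_shapes(anchor_box_scales, anchor_box_ratios)
for w, h in shapes:
    print(f"{w:7.1f} x {h:7.1f}")

# A hypothetical 56 x 38 feature map gives 56 * 38 * 9 anchors.
print(total_anchors(56, 38, anchor_box_scales, anchor_box_ratios))  # 19152
```

The calc_rpn listing from the repository follows.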

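import random

import numpy as np

# iou() and the config object C come from the repository and are not shown here.
def calc_rpn(C, img_data, width, height, resized_width, resized_height, img_length_calc_function):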
    downscale = float(C.rpn_stride)  # 16
    anchor_sizes = C.anchor_box_scales  # [128, 256, 512]
    anchor_ratios = C.anchor_box_ratios  # [[1, 1], [1./math.sqrt(2), 2./math.sqrt(2)], [2./math.sqrt(2), 1./math.sqrt(2)]]
    num_anchors = len(anchor_sizes) * len(anchor_ratios)  # 9

    # calculate the output map size based on the network architecture
    (output_width, output_height) = img_length_calc_function(resized_width, resized_height)  # //16

    n_anchratios = len(anchor_ratios)  # 3

    # initialize empty output objectives
    y_rpn_overlap = np.zeros((output_height, output_width, num_anchors))
    y_is_box_valid = np.zeros((output_height, output_width, num_anchors))
    y_rpn_regr = np.zeros((output_height, output_width, num_anchors * 4))

    num_bboxes = len(img_data['bboxes'])  # e.g. 2

    num_anchors_for_bbox = np.zeros(num_bboxes).astype(int)
    best_anchor_for_bbox = -1 * np.ones((num_bboxes, 4)).astype(int)
    best_iou_for_bbox = np.zeros(num_bboxes).astype(np.float32)
    best_x_for_bbox = np.zeros((num_bboxes, 4)).astype(int)
    best_dx_for_bbox = np.zeros((num_bboxes, 4)).astype(np.float32)

    # get the GT box coordinates, and resize to account for image resizing
    gta = np.zeros((num_bboxes, 4))
    for bbox_num, bbox in enumerate(img_data['bboxes']):
        # get the GT box coordinates, and resize to account for image resizing
        gta[bbox_num, 0] = bbox['x1'] * (resized_width / float(width))
        gta[bbox_num, 1] = bbox['x2'] * (resized_width / float(width))
        gta[bbox_num, 2] = bbox['y1'] * (resized_height / float(height))
        gta[bbox_num, 3] = bbox['y2'] * (resized_height / float(height))

    # rpn ground truth
    for anchor_size_idx in range(len(anchor_sizes)):  # 3
        for anchor_ratio_idx in range(n_anchratios):  # 3
            anchor_x = anchor_sizes[anchor_size_idx] * anchor_ratios[anchor_ratio_idx][0]
            anchor_y = anchor_sizes[anchor_size_idx] * anchor_ratios[anchor_ratio_idx][1]

            for ix in range(output_width):
                # x-coordinates of the current anchor box
                x1_anc = downscale * (ix + 0.5) - anchor_x / 2
                x2_anc = downscale * (ix + 0.5) + anchor_x / 2

                # ignore boxes that go across image boundaries
                if x1_anc < 0 or x2_anc > resized_width:
                    continue

                for jy in range(output_height):
                    # y-coordinates of the current anchor box
                    y1_anc = downscale * (jy + 0.5) - anchor_y / 2
                    y2_anc = downscale * (jy + 0.5) + anchor_y / 2

                    # ignore boxes that go across image boundaries
                    if y1_anc < 0 or y2_anc > resized_height:
                        continue

                    # bbox_type indicates whether an anchor should be a target
                    bbox_type = 'neg'

                    # this is the best IOU for the (x,y) coord and the current anchor
                    # note that this is different from the best IOU for a GT bbox
                    best_iou_for_loc = 0.0  # one of two

                    for bbox_num in range(num_bboxes):
                        # get IOU of the current GT box and the current anchor box
                        curr_iou = iou([gta[bbox_num, 0], gta[bbox_num, 2], gta[bbox_num, 1], gta[bbox_num, 3]],
                                       [x1_anc, y1_anc, x2_anc, y2_anc])
                        # calculate the regression targets if they will be needed
                        if curr_iou > best_iou_for_bbox[bbox_num] or curr_iou > C.rpn_max_overlap:
                            cx = (gta[bbox_num, 0] + gta[bbox_num, 1]) / 2.0
                            cy = (gta[bbox_num, 2] + gta[bbox_num, 3]) / 2.0
                            cxa = (x1_anc + x2_anc) / 2.0
                            cya = (y1_anc + y2_anc) / 2.0

                            tx = (cx - cxa) / (x2_anc - x1_anc)
                            ty = (cy - cya) / (y2_anc - y1_anc)
                            tw = np.log((gta[bbox_num, 1] - gta[bbox_num, 0]) / (x2_anc - x1_anc))
                            th = np.log((gta[bbox_num, 3] - gta[bbox_num, 2]) / (y2_anc - y1_anc))

                        if img_data['bboxes'][bbox_num]['class'] != 'bg':
                            # all GT boxes should be mapped to an anchor box,
                            # so we keep track of which anchor box was best
                            if curr_iou > best_iou_for_bbox[bbox_num]:
                                best_anchor_for_bbox[bbox_num] = [jy, ix, anchor_ratio_idx, anchor_size_idx]  # enough to recover the anchor's coordinates
                                best_iou_for_bbox[bbox_num] = curr_iou
                                best_x_for_bbox[bbox_num, :] = [x1_anc, x2_anc, y1_anc, y2_anc]  # anchor coordinates; apparently unused afterwards
                                best_dx_for_bbox[bbox_num, :] = [tx, ty, tw, th]

                            # we set the anchor to positive if the IOU is > 0.7
                            # (it does not matter if there was another better box, it just indicates overlap)
                            if curr_iou > C.rpn_max_overlap:
                                bbox_type = 'pos'
                                num_anchors_for_bbox[bbox_num] += 1
                                # we update the regression layer target if this IOU
                                # is the best for the current (x,y) and anchor position
                                if curr_iou > best_iou_for_loc:
                                    best_iou_for_loc = curr_iou
                                    best_regr = (tx, ty, tw, th)

                            # if the IOU is > 0.3 and < 0.7, it is ambiguous and not included in the objective
                            if C.rpn_min_overlap < curr_iou < C.rpn_max_overlap:
                                # gray zone between neg and pos
                                if bbox_type != 'pos':
                                    bbox_type = 'neutral'

                    # turn on or off outputs depending on IOUs
                    if bbox_type == 'neg':
                        y_is_box_valid[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 1
                        y_rpn_overlap[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 0
                    elif bbox_type == 'neutral':
                        y_is_box_valid[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 0
                        y_rpn_overlap[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 0
                    elif bbox_type == 'pos':
                        y_is_box_valid[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 1
                        y_rpn_overlap[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 1

                        start = 4 * (anchor_ratio_idx + n_anchratios * anchor_size_idx)
                        y_rpn_regr[jy, ix, start:start + 4] = best_regr

    # we ensure that every bbox has at least one positive RPN region
    for idx in range(num_anchors_for_bbox.shape[0]):
        if num_anchors_for_bbox[idx] == 0:
            # no box with an IOU greater than zero ...
            if best_anchor_for_bbox[idx, 0] == -1:
                continue
            y_is_box_valid[best_anchor_for_bbox[idx, 0], best_anchor_for_bbox[idx, 1],
                           best_anchor_for_bbox[idx, 2] + n_anchratios * best_anchor_for_bbox[idx, 3]] = 1
            y_rpn_overlap[best_anchor_for_bbox[idx, 0], best_anchor_for_bbox[idx, 1],
                          best_anchor_for_bbox[idx, 2] + n_anchratios * best_anchor_for_bbox[idx, 3]] = 1
            start = 4 * (best_anchor_for_bbox[idx, 2] + n_anchratios * best_anchor_for_bbox[idx, 3])
            y_rpn_regr[best_anchor_for_bbox[idx, 0], best_anchor_for_bbox[idx, 1], start:start + 4] \
                = best_dx_for_bbox[idx, :]

    y_rpn_overlap = np.transpose(y_rpn_overlap, (2, 0, 1))
    y_rpn_overlap = np.expand_dims(y_rpn_overlap, axis=0)

    y_is_box_valid = np.transpose(y_is_box_valid, (2, 0, 1))
    y_is_box_valid = np.expand_dims(y_is_box_valid, axis=0)

    y_rpn_regr = np.transpose(y_rpn_regr, (2, 0, 1))
    y_rpn_regr = np.expand_dims(y_rpn_regr, axis=0)

    pos_locs = np.where(np.logical_and(y_rpn_overlap[0, :, :, :] == 1, y_is_box_valid[0, :, :, :] == 1))
    neg_locs = np.where(np.logical_and(y_rpn_overlap[0, :, :, :] == 0, y_is_box_valid[0, :, :, :] == 1))

    num_pos = len(pos_locs[0])

    # one issue is that the RPN has many more negative than positive regions, so we turn off some of the negative
    # regions. We also limit it to 256 regions.
    num_regions = 256

    if len(pos_locs[0]) > num_regions // 2:
        # integer division: random.sample needs an int sample size in Python 3
        val_locs = random.sample(range(len(pos_locs[0])), len(pos_locs[0]) - num_regions // 2)
        y_is_box_valid[0, pos_locs[0][val_locs], pos_locs[1][val_locs], pos_locs[2][val_locs]] = 0
        num_pos = num_regions // 2

    if len(neg_locs[0]) + num_pos > num_regions:
        val_locs = random.sample(range(len(neg_locs[0])), len(neg_locs[0]) - num_pos)
        y_is_box_valid[0, neg_locs[0][val_locs], neg_locs[1][val_locs], neg_locs[2][val_locs]] = 0

    y_rpn_cls = np.concatenate([y_is_box_valid, y_rpn_overlap], axis=1)
    y_rpn_regr = np.concatenate([np.repeat(y_rpn_overlap, 4, axis=1), y_rpn_regr], axis=1)

    return np.copy(y_rpn_cls), np.copy(y_rpn_regr)
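
calc_rpn relies on an iou helper that is not listed in this post. A minimal sketch of what such a helper computes is given below, assuming boxes are passed as [x1, y1, x2, y2], which is the order used in the call above. The name box_iou is hypothetical and the repository's own version may differ in details.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    # Degenerate boxes have no area and therefore no overlap.
    if a[0] >= a[2] or a[1] >= a[3] or b[0] >= b[2] or b[1] >= b[3]:
        return 0.0

    # Width and height of the overlapping rectangle (zero if disjoint).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union


print(box_iou([0, 0, 10, 10], [0, 0, 10, 10]))    # 1.0
print(box_iou([0, 0, 10, 10], [20, 20, 30, 30]))  # 0.0
```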

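For each of the 9 anchor shapes (different sizes and aspect ratios), the code visits every point of the output feature_map and maps it back to the original input using the stride of 16 to obtain the anchor's center. The IoU between each anchor and all gt boxes is then computed, and the anchor's label is decided by comparing the IoU with preset thresholds. The rules are:

  1. Anchors that extend beyond the image boundary are skipped.
  2. Classification: anchors with iou>0.7 are positive, those with 0.3<iou<0.7 are ignored, and those with iou<0.3 are negative.
  3. Regression: only anchors with iou>0.7 take part in regression. If an anchor has iou>0.7 with several gt boxes, the gt box with the largest iou provides the regression target. If a gt box has iou below 0.7 with every anchor, the anchor with the largest iou is used for regression (unless its iou with every anchor is 0).
  4. Because negatives far outnumber positives, the paper limits the total number of samples to 256. If there are more than 128 positives, they are cut down to 128 with 128 negatives, and the rest are ignored. If there are fewer than 128 positives, all of them are kept, the number of negatives is 256 minus the number of positives, and the rest are ignored. Note that calc_rpn as listed above trims the negatives down to num_pos (it drops len(neg_locs[0]) - num_pos of them), so with fewer than 128 positives it keeps as many negatives as positives rather than filling up to 256.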
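
The rules above can also be written in vectorized form. The sketch below follows the rules as listed (paper-style filling up to 256 samples) and is not a rewrite of calc_rpn. It assumes the IoU matrix between anchors and gt boxes has already been computed and that out-of-image anchors were removed beforehand. The function names, the -1 marker for ignored anchors and the handling of IoU values exactly at the thresholds are my own assumptions.

```python
import numpy as np


def label_anchors(ious, pos_thresh=0.7, neg_thresh=0.3):
    """Label anchors from a (num_anchors, num_gt) IoU matrix.

    Returns 1 for positive, 0 for negative and -1 for ignored anchors.
    """
    max_iou = ious.max(axis=1)
    labels = np.full(ious.shape[0], -1, dtype=int)
    labels[max_iou < neg_thresh] = 0
    labels[max_iou > pos_thresh] = 1
    # Every GT box with some overlap gets its best anchor as a positive.
    best_anchor = ious.argmax(axis=0)
    has_overlap = ious.max(axis=0) > 0
    labels[best_anchor[has_overlap]] = 1
    return labels


def subsample(labels, num_regions=256, rng=None):
    """Keep at most num_regions // 2 positives, fill the rest with negatives."""
    rng = np.random.default_rng() if rng is None else rng
    labels = labels.copy()
    pos = np.flatnonzero(labels == 1)
    max_pos = num_regions // 2
    if len(pos) > max_pos:
        drop = rng.choice(pos, size=len(pos) - max_pos, replace=False)
        labels[drop] = -1
    num_pos = min(len(pos), max_pos)
    neg = np.flatnonzero(labels == 0)
    max_neg = num_regions - num_pos
    if len(neg) > max_neg:
        drop = rng.choice(neg, size=len(neg) - max_neg, replace=False)
        labels[drop] = -1
    return labels


# 4 anchors, 2 GT boxes: anchor 2 becomes positive only as the best match of GT 1.
ious = np.array([[0.8, 0.0], [0.5, 0.1], [0.1, 0.2], [0.0, 0.0]])
print(label_anchors(ious).tolist())  # [1, -1, 1, 0]
```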

RPN

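Assume the network input has shape (1,900,600,3), i.e. (batch_size, w, h, channel). The feature_map output by the vgg16 backbone then has shape (1,56,38,512). The exact spatial size depends on how the backbone rounds when downsampling; a plain floor division by 16 would give 56 and 37.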

from keras.layers import Conv2D

def rpn(base_layers, num_anchors):
    x = Conv2D(512, (3, 3), padding='same', activation='relu', kernel_initializer='normal', name='rpn_conv1')(base_layers)

    x_class = Conv2D(num_anchors, (1, 1), activation='sigmoid', kernel_initializer='uniform', name='rpn_out_class')(x)
    x_regr = Conv2D(num_anchors * 4, (1, 1), activation='linear', kernel_initializer='zero', name='rpn_out_regress')(x)

    return [x_class, x_regr, base_layers]

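In the code, base_layers is the feature_map output by the backbone and num_anchors=9. The RPN output x_class has shape (1,56,38,9) and x_regr has shape (1,56,38,36), i.e. a class score and four box regression values for each anchor. The job of the RPN is to pick, out of the very large number of anchors (56*38*9=19152), a small number that are likely to contain an object; these are the proposals. Note that the classification here only separates foreground from background and does not predict a specific class. This is the end of the first stage of the network.

During training, two things happen after the RPN. First, the loss against the output of calc_rpn is computed: cross-entropy loss for classification and smooth L1 loss for regression. Second, proposals are selected from all anchors. The regression values output by the RPN are applied to the anchors to obtain the predicted box coordinates, the results are clipped so that the boxes stay inside the image, and boxes with invalid coordinates (for example a top-left x larger than the bottom-right x) are removed. NMS is then run on the classification scores to keep the 300 candidate boxes most likely to contain an object (the result may contain fewer than 300).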
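
The first of the two steps, the loss, can be sketched in NumPy as follows. This is only an illustration of masked cross-entropy and smooth L1. The repository implements its losses with the Keras backend, and the function names and the normalization used here (averaging over valid anchors and over positive coordinates) are my assumptions rather than the repository's exact formulas.

```python
import numpy as np


def smooth_l1(diff):
    """Elementwise smooth L1: 0.5 * x^2 if |x| <= 1, else |x| - 0.5."""
    abs_diff = np.abs(diff)
    return np.where(abs_diff <= 1.0, 0.5 * diff ** 2, abs_diff - 0.5)


def rpn_cls_loss(is_valid, is_pos, pred, eps=1e-7):
    """Binary cross-entropy averaged over valid (non-ignored) anchors."""
    pred = np.clip(pred, eps, 1.0 - eps)
    bce = -(is_pos * np.log(pred) + (1.0 - is_pos) * np.log(1.0 - pred))
    return np.sum(is_valid * bce) / max(np.sum(is_valid), 1.0)


def rpn_regr_loss(is_pos, target, pred):
    """Smooth L1 averaged over the coordinates of positive anchors.

    is_pos has one entry per coordinate (the positive flag repeated 4 times),
    which is what np.repeat(y_rpn_overlap, 4, ...) produces in calc_rpn.
    """
    return np.sum(is_pos * smooth_l1(target - pred)) / max(np.sum(is_pos), 1.0)


# Three anchors: one positive, one negative, one ignored.
is_valid = np.array([1.0, 1.0, 0.0])
is_pos = np.array([1.0, 0.0, 0.0])
scores = np.array([0.5, 0.5, 0.9])
print(rpn_cls_loss(is_valid, is_pos, scores))
```

The second step, proposal selection, is the rpn_to_roi call that follows.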

R = roi_helpers.rpn_to_roi(P_rpn[0], P_rpn[1], C, K.image_dim_ordering(), use_regr=True, overlap_thresh=0.7, max_boxes=300)  # _proposal_layer
# first apply the regression values P_rpn[1] on the feature map, then run NMS using the scores P_rpn[0]
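
Conceptually, the proposal step decodes the regression values, clips the boxes, drops invalid ones and runs NMS. The NumPy sketch below shows that sequence under my own assumptions: boxes are [x1, y1, x2, y2], the regression values use the same (tx, ty, tw, th) encoding as calc_rpn, and the function names are hypothetical. It is not the body of roi_helpers.rpn_to_roi.

```python
import numpy as np


def decode(anchors, deltas):
    """Apply (tx, ty, tw, th) to anchors given as [x1, y1, x2, y2]."""
    wa = anchors[:, 2] - anchors[:, 0]
    ha = anchors[:, 3] - anchors[:, 1]
    cxa = anchors[:, 0] + 0.5 * wa
    cya = anchors[:, 1] + 0.5 * ha
    cx = deltas[:, 0] * wa + cxa
    cy = deltas[:, 1] * ha + cya
    w = np.exp(deltas[:, 2]) * wa
    h = np.exp(deltas[:, 3]) * ha
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)


def clip_boxes(boxes, width, height):
    """Clamp boxes to the extent of the image (or feature map)."""
    boxes = boxes.copy()
    boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0, width)
    boxes[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0, height)
    return boxes


def nms(boxes, scores, overlap_thresh=0.7, max_boxes=300):
    """Greedy NMS; returns indices of kept boxes, best score first."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0 and len(keep) < max_boxes:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= overlap_thresh]
    return keep


def proposals(anchors, deltas, scores, width, height):
    """Decode, clip, drop degenerate boxes, then keep the NMS survivors."""
    boxes = clip_boxes(decode(anchors, deltas), width, height)
    ok = (boxes[:, 2] > boxes[:, 0]) & (boxes[:, 3] > boxes[:, 1])
    boxes, scores = boxes[ok], scores[ok]
    return boxes[nms(boxes, scores)]
```

With three anchors of which two overlap heavily, only the higher-scoring one of the pair and the isolated one survive, so the result has two rows.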

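The 300 proposals are then matched against the gt boxes by IoU. Proposals with iou<0.1, i.e. easy-to-classify background, are discarded; those with 0.1<iou<0.5 are labeled as background; those with iou>0.5 are labeled with the specific class (unlike the RPN ground truth, which only marks positive and negative samples). The offsets to the gt box are also computed once more. These class labels and box offsets are the targets for the final classification and regression outputs of the second stage.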

X2, Y1, Y2, IouS = roi_helpers.calc_iou(R, img_data, C, class_mapping)  # proposal_target_layer
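# drops easy-to-classify background, i.e. proposals whose IoU with the GT is below 0.1, and computes the regression offsets to the GT once more
# example shapes of X2, Y1, Y2: (1,245,4), (1,245,21), (1,245,160)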
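
The labeling rule for proposals can be sketched as below. It assumes an IoU matrix between proposals and gt boxes is already available and that classes are integer indices with a separate background index (20 foreground classes plus background would match the 21 in the shapes above). The function name and the treatment of IoU values exactly at 0.1 and 0.5 are my assumptions, and this is not the body of roi_helpers.calc_iou.

```python
import numpy as np


def label_proposals(ious, gt_classes, bg_index, min_iou=0.1, fg_iou=0.5):
    """Assign a class index to each proposal from a (num_rois, num_gt) IoU matrix.

    Returns (keep_mask, class_ids): proposals whose best IoU is below min_iou
    are dropped, those in [min_iou, fg_iou) become background, and the rest
    take the class of their best-matching GT box.
    """
    best_gt = ious.argmax(axis=1)
    best_iou = ious.max(axis=1)
    keep = best_iou >= min_iou
    class_ids = np.where(best_iou >= fg_iou, np.asarray(gt_classes)[best_gt], bg_index)
    return keep, class_ids


# 3 proposals, 2 GT boxes of classes 3 and 7, background index 20.
ious = np.array([[0.05, 0.0], [0.3, 0.2], [0.1, 0.8]])
keep, class_ids = label_proposals(ious, gt_classes=[3, 7], bg_index=20)
print(keep.tolist())       # [False, True, True]
print(class_ids.tolist())  # [20, 20, 7]
```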

ROI Pooling

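This is where the second stage begins. Before ROI Pooling, the network again limits the number of rois (proposals) that take part in the computation; the code sets it to 32. As in the first stage, if there are more than 16 positives, 16 are picked at random; if there are fewer, all are kept; negatives are sampled at random to make up 32 minus the number of positives. Because the final classification and regression use fully connected layers, whose input must have a fixed size, the role of roi pooling is to bring the selected rois of different sizes to one fixed size. Concretely, the region of the output feature_map corresponding to each of the 32 rois is divided evenly into pool_w*pool_h bins (pool_w=pool_h=7 in the paper), and max pooling is applied within each bin, so rois of any size become 7*7. Fully connected layers then produce the final class prediction and box regression prediction.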
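
The pooling step for a single roi can be sketched in NumPy as follows. It assumes the roi is given as (x, y, w, h) in feature-map cells and lies inside an (H, W, C) feature map. The function name and the choice of rounding bin edges outward (floor for the start, ceil for the end) are my assumptions; the repository's pooling layer may be implemented differently.

```python
import numpy as np


def roi_max_pool(feature_map, roi, pool_size=7):
    """Max-pool one roi of an (H, W, C) feature map to (pool_size, pool_size, C).

    roi is (x, y, w, h) in feature-map cells, with w > 0 and h > 0.
    """
    x, y, w, h = roi
    channels = feature_map.shape[2]
    out = np.empty((pool_size, pool_size, channels), dtype=feature_map.dtype)

    # Bin edges along each axis; rounding outward keeps every bin non-empty.
    edges = np.arange(pool_size + 1)
    x_lo = np.floor(x + edges[:-1] * w / pool_size).astype(int)
    x_hi = np.ceil(x + edges[1:] * w / pool_size).astype(int)
    y_lo = np.floor(y + edges[:-1] * h / pool_size).astype(int)
    y_hi = np.ceil(y + edges[1:] * h / pool_size).astype(int)

    for j in range(pool_size):
        for i in range(pool_size):
            window = feature_map[y_lo[j]:y_hi[j], x_lo[i]:x_hi[i]]
            out[j, i] = window.max(axis=(0, 1))
    return out


# Rois of different sizes all come out as 7 x 7 x C.
fmap = np.arange(14 * 14, dtype=float).reshape(14, 14, 1)
print(roi_max_pool(fmap, (0, 0, 14, 14)).shape)  # (7, 7, 1)
print(roi_max_pool(fmap, (3, 2, 5, 4)).shape)    # (7, 7, 1)
```

When a roi is smaller than 7 cells along an axis, neighbouring bins overlap and reuse the same cells, which is one simple way to keep the output size fixed.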
