目标检测Faster-RCNN详解

目录

概述

代码与环境说明

网络架构

backbone

整体架构

正负例划分

RPN正负例

模型预测正负例

损失函数

RPN损失

回归损失

分类损失

模型预测损失

回归损失

分类损失


概述

        Faster-RCNN是两阶段目标检测算法的典型算法,它不再像古典的目标检测算法使用类似于selective search提取候选框,而是使用RPN(region proposal network)网络提取候选框,因为有RPN网络的加入,Faster-RCNN是可以端对端训练。本文将详解算法结构,正负例划分,损失函数等

代码与环境说明

代码:GitHub - talhuam/faster-rcnn-tf2: A framework for object detection

环境:tensorflow-gpu==2.4.0

显卡:RTX3080 16G

网络架构

backbone

        backbone采用的是VGG16,最后一层maxpooling没有做,最终的特征图相较于最初的图片长和宽都缩小了16倍,如果输入图片是600 x 600 x 3,所得的特征图是37 x 37 x 512

代码实现:

def VGG16(inputs):
    # block1
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(inputs)
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
    x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='block1_pool')(x)

    # block2
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)
    x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='block2_pool')(x)

    # block3
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x)
    x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='block3_pool')(x)

    # block4
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x)
    x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='block4_pool')(x)

    # block5
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x)
    # x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='block4_pool')

    return x

整体架构

         经过vgg16主干网络之后会在经过一次3*3的卷积,此后会分别经过1*1的卷积进行通道降维,两次1*1的卷积如下:

  • 上部分卷积之后会生成(37,37,9)的tensor,9用来预测有物体的概率;
  • 下部分会生成(37,37,9 x 4)的tensor,9*4用来预测框的坐标;

        RPN网络之后会进行NMS(non_max_suppression)极大值抑制生成候选框(proposal)。再结合主干网络生成的特征层经过 7 x 7 的ROI pooling层,此过程就是用候选框在特征层上截取,再将截取的部分分成7 x 7的网格,网格内部进行最大池化。截取之前需要将预测出来的[t_x,t_y,t_w,t_h]转化为[x_{min},y_{min},x_{max},y_{max}]

        最后再将ROI pooling的结果经过全连接层,预测出结果。最后多分类的全连接层用的激活函数是softmax,进行多分类,预测出属于各个类别的概率。回归结果需要将预测出来的[t_x,t_y,t_w,t_h]转化为[x_{min},y_{min},x_{max},y_{max}]同类别的框需要再进行一次NMS(阈值0.3),最终输出结果

正负例划分

RPN正负例

        先验框(anchor)的大小有三种,为[128, 256, 512];三种比例尺寸,为[1:1, 1:2, 2:1],特征图的每个特征点对应三种大小三种比例共9个先验框。如上所述,如果输入图片的尺寸为600*600,则特征图的尺寸为37*37,那么共有37*37*9=12321个先验框。如下代码会基于特征图的尺寸计算所有先验框的[x_{min},y_{min},x_{max},y_{max}]的值并进行归一化

def generate_anchors(sizes=[128, 256, 512], ratios=[[1, 1], [1, 2], [2, 1]]):
    """
    生成基础的9个不同尺寸不同比例的框
    :param sizes:
    :param ratios:
    :return:
    """
    num_anchors = len(sizes) * len(ratios)
    anchors = np.zeros((num_anchors, 4))
    anchors[:, 2:] = np.tile(sizes, [2, len(ratios)]).T

    for i in range(len(ratios)):
        anchors[3 * i:3 * i + 3, 2] = anchors[3 * i:3 * i + 3, 2] * ratios[i][0]
        anchors[3 * i:3 * i + 3, 3] = anchors[3 * i:3 * i + 3, 3] * ratios[i][1]

    anchors[:, 0::2] -= np.tile(anchors[:, 2] * 0.5, [2, 1]).T
    anchors[:, 1::2] -= np.tile(anchors[:, 3] * 0.5, [2, 1]).T
    return anchors


def shift(feature_shape, anchors, stride=16):
    """
    对基础的先验框扩展获得所有的先验框
    :param feature_shape:
    :param anchors:
    :param stride:
    :return:
    """
    shift_x = (np.arange(0, feature_shape[1], dtype=np.float_) + 0.5) * stride
    shift_y = (np.arange(0, feature_shape[0], dtype=np.float_) + 0.5) * stride
    # 框中心点的位置
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    # 在一维的shift_x中的元素和shift_y中的元素一一对应形成坐标的x和y
    shift_x = np.reshape(shift_x, (-1))
    shift_y = np.reshape(shift_y, (-1))

    # 将shift_x和shift_y堆叠两次,分别来调整左上角和右下角的坐标
    shifts = np.stack([
        shift_x, shift_y, shift_x, shift_y
    ], axis=0)
    shifts = np.transpose(shifts)
    num_anchors = anchors.shape[0]  # 9

    k = shifts.shape[0]  # 37*37 = 1369
    # 对应维度广播,生成先验框在原图上的左上角和右下角的坐标,shape:[1369(7*7),9,4]
    shifted_anchors = np.reshape(anchors, (1, num_anchors, 4)) + tf.reshape(shifts, (k, 1, 4))
    # (1369 * 9,4)
    shifted_anchors = np.reshape(shifted_anchors, [k * num_anchors, 4])

    return shifted_anchors


def get_anchors(input_shape, sizes=[128, 256, 512], ratios=[[1, 1], [1, 2], [2, 1]], stride=16):
    # ------------------------ #
    # vgg16作为主干特征提取网络,最后一次池化没有做,故而是原有的尺寸的1/16
    # 输入如果是600 * 600,则feature_shape就是37 * 37
    # ------------------------ #
    feature_shape = (int(input_shape[0] / 16), int(input_shape[1] / 16))
    anchors = generate_anchors(sizes, ratios)
    anchors = shift(feature_shape, anchors, stride=stride)
    anchors = anchors.copy()
    # 进行归一化
    anchors[:, 0::2] /= input_shape[1]
    anchors[:, 1::2] /= input_shape[0]
    # 裁剪小于0和大于1的值
    anchors = np.clip(anchors, 0, 1)

    return anchors

        所有的先验框和每个真实框(Groud Truth box)计算IoU,IoU大于0.7的作为正例(标签为1);介于0.3到0.7之间的先验框忽略掉(标签为-1),不参与计算loss;小于0.3的先验框为负例(标签为0)。其中正例的 [x_{min},y_{min},x_{max},y_{max}] 需要转化为 [t_x^*,t_y^*,t_w^*,t_h^*],转化公式如下:

\begin{aligned} &t_x^*=(x^*-x_a)/w_a & t_y^*=(y^*-y_a)/h_a \\ &t_w^*=log(w^*/w_a) & t_h^*=log(h^*/h_a) \end{aligned}

代码实现:

# ----------------------------------------------------------------- #
# 将 真实框 和 重合度高的先验框 转化为FRCNN预测结果的格式:[center_x, center_y, width, height]
# ----------------------------------------------------------------- #
# 真实框转化
box_center = 0.5 * (box[2:] + box[:2])
box_wh = box[2:] - box[:2]
# 先验框转化
assigned_anchors_center = 0.5 * (assigned_anchors[:, 2:4] + assigned_anchors[:, :2])
assigned_anchors_wh = assigned_anchors[:, 2:4] - assigned_anchors[:, :2]
# ----------------------------------------------------------------- #
# 计算t_star
# ----------------------------------------------------------------- #
encoded_box[:, :2][assign_mask] = (box_center - assigned_anchors_center) / assigned_anchors_wh
encoded_box[:, :2][assign_mask] /= np.array(variance)[:2]

encoded_box[:, 2:4][assign_mask] = np.log(box_wh/assigned_anchors_wh)
encoded_box[:, 2:4][assign_mask] /= np.array(variance)[2:]

正负例样本数总和256,正负例各128,正例不足用负例填充,超过总数256的标签值置为-1忽略掉,代码实现如下:

# --------------------------------------- #
# 对正样本和负样本进行筛选,训练样本之和为256
# --------------------------------------- #
pos_idx = np.where(classification > 0)[0]
num_pos = len(pos_idx)
if num_pos > self.num_sample // 2:
	num_pos = self.num_sample // 2
	disable_index = np.random.choice(pos_idx, size=(len(pos_idx) - self.num_sample // 2), replace=False)
	classification[disable_index] = -1
	regression[disable_index, -1] = -1

neg_idx = np.where(classification == 0)[0]
num_neg = self.num_sample - num_pos
if len(neg_idx) > num_neg:
	disable_index = np.random.choice(neg_idx, size=(len(neg_idx) - num_neg), replace=False)
	classification[disable_index] = -1
	regression[disable_index, -1] = -1

模型预测正负例

        RPN网络正向传播之后,会进行NMS非极大值抑制过滤出k个候选框(proposal),NMS阈值为0.7。k个候选框将和每个真实框计算IoU,候选框的标签值(cat,dog,...)为对应IoU最大的真实框的标签值,和真实框最大IoU大于等于0.5的作为正例,介于0到0.5之间的作为负例,正例和负例数相加为128,正例不足的用负例填充

# 获得每个建议框roi最对应的真实框的iou
max_iou = np.max(iou, axis=1)  # [len(R) + len(bboxes),] 又 [num_roi,]
gt_assignment = np.argmax(iou, axis=1)  # [num_roi,]

# 和哪个GT的IoU最大,标签就是哪个GT的标签,[num_roi,]
gt_roi_label = label[gt_assignment]

# ------------------------------------------------------------ #
# 和GT的IoU大于pos_iou_thresh是正例
# 将正例控制在n_sample/2之下,如果超过了则随机截取,如果不够则用负例填充
# ------------------------------------------------------------ #
pos_indices = np.where(max_iou >= self.pos_iou_thresh)[0]
pos_roi_per_this_image = int(min(self.n_sample // 2, pos_indices.size))
if pos_indices.size > pos_roi_per_this_image:
	# replace:True可以重复先择,False不可以重复选择,元素不够则报错
	pos_indices = np.random.choice(pos_indices, size=pos_roi_per_this_image, replace=False)

# ------------------------------------------------------------ #
# 和GT的IoU大于neg_iou_thresh_low,小于neg_iou_thresh_high的作为负例
# 正例数量和负例的数量相加等于n_sample
# ------------------------------------------------------------ #
neg_indices = np.where((max_iou >= self.neg_iou_thresh_low) & (max_iou < self.neg_iou_thresh_high))[0]
neg_roi_per_this_image = self.n_sample - pos_roi_per_this_image
if neg_roi_per_this_image > neg_indices.size:
	neg_indices = np.random.choice(neg_indices, size=neg_roi_per_this_image, replace=True)
else:
	neg_indices = np.random.choice(neg_indices, size=neg_roi_per_this_image, replace=False)

# 正例和负例的索引
keep_indices = np.append(pos_indices, neg_indices)
# 保留下来的正负例框,[n_samples, 4]
sample_roi = R[keep_indices]

损失函数

RPN损失

        RPN损失包含两部分,一部分是坐标的回归损失,另一部分是二分类损失(有无object)

回归损失

        回归损失使用的是smooth L1损失,公式如下,公式中的x=\hat{y}-y

smooth\_l1\left\{\begin{matrix}0.5*x^2, if&|x|<1 & \\ |x|-0.5&otherwise & \end{matrix}\right.

实现代码如下:

def rpn_smooth_l1(sigma=1.0):
    """
    rpn smooth l1 损失,只有正例和GT计算损失
    :param sigma:
    :return:
    """
    sigma_squared = sigma ** 2

    def _rpn_smooth_l1(y_true, y_pred):
        # ----------------------- #
        # y_true [batch_size, num_anchors, 4 + 1]
        # y_pred [batch_size, num_anchors, 4]
        # ----------------------- #
        regression = y_pred
        regression_target = y_true[:, :, :-1]
        # ----------------------- #
        # -1是要忽略的,0是背景,1是存在目标
        # ----------------------- #
        anchor_state = y_true[:, :, -1]
        # ----------------------- #
        # 获取正样本
        # ----------------------- #
        indices = tf.where(tf.keras.backend.equal(anchor_state, 1))
        regression = tf.gather_nd(regression, indices)  # 2-D array
        regression_target = tf.gather_nd(regression_target, indices)  # 2-D array
        # ----------------------- #
        # 计算smooth l1损失:
        # 0.5*x^2 if |x|<1
        # |x|-0.5 otherwise
        # ----------------------- #
        regression_diff = regression - regression_target
        x = tf.abs(regression_diff)
        loss_arr = tf.where(x < 1.0 / sigma_squared,
                            0.5 * sigma_squared * tf.pow(x, 2),
                            x - 0.5 / sigma_squared
                            )
        # 将loss全部加起来
        total_loss = tf.reduce_sum(loss_arr)
        # total_loss 除以样本数,计算平均loss
        num_indices = tf.maximum(1., tf.cast(tf.shape(indices)[0], tf.float32))
        avg_loss = total_loss / num_indices
        return avg_loss

    return _rpn_smooth_l1

分类损失

        分类损失使用的是二分类的交叉熵损失,实现代码如下:

def rpn_cls_loss():
    """
    rpn只做二分类,是否有object
    二分类交叉熵损失
    :return:
    """

    def _rpn_cls_loss(y_true, y_pred):
        # ----------------------- #
        # y_true [batch_size,num_anchor,1]
        # y_pred [batch_size,num_anchor,1]
        # -1是要忽略的,0是背景,1是存在目标
        # ----------------------- #
        anchor_state = y_true
        # ----------------------- #
        # 获得无需忽略的所有样本
        # ----------------------- #
        indices_for_not_ignore = tf.where(tf.keras.backend.not_equal(anchor_state, -1))
        y_true_no_ignore = tf.gather_nd(y_true, indices_for_not_ignore)  # 1-D array
        y_pred_no_ignore = tf.gather_nd(y_pred, indices_for_not_ignore)  # 1-D array
        # ----------------------- #
        # 计算交叉熵
        # ----------------------- #
        y_true_no_ignore = tf.cast(y_true_no_ignore, dtype=tf.float32)
        y_pred_no_ignore = tf.cast(y_pred_no_ignore, dtype=tf.float32)
        cross_entropy_loss = tf.keras.losses.binary_crossentropy(y_true_no_ignore, y_pred_no_ignore)
        return cross_entropy_loss

    return _rpn_cls_loss

模型预测损失

        模型最终的预测损失也有回归损失和分类损失,和RPN分类损失不一样的是,RPN分类是二分类,即是否有物体,最终模型分类损失是多分类,即属于哪一类(cat? dog? car?...)

回归损失

        和RPN回归损失一样,这里用的也是smooth L1损失,代码如下:

def classifier_smooth_l1(num_classes, sigma=1.0):
    """
    最后框的回归损失函数
    :param num_classes:
    :param sigma:
    :return:
    """
    sigma_squared = sigma ** 2
    epsilon = 1e-4

    def _classifier_smooth_l1(y_true, y_pred):
        regression = y_pred
        regression_target = y_true[:, :, 4 * num_classes:]

        regression_diff = regression_target - regression
        x = tf.abs(regression_diff)
        loss_arr = tf.where(x < 1 / sigma_squared,
                            0.5 / sigma_squared * tf.pow(x, 2),
                            x - 0.5 / sigma_squared
                            )

        loss = tf.reduce_sum(loss_arr * y_true[:, :, :4 * num_classes]) * 4
        normalizer = tf.keras.backend.sum(epsilon + y_true[:, :, :4 * num_classes])
        loss = loss / normalizer
        return loss
    return _classifier_smooth_l1

分类损失

        多分类交叉熵,实现代码如下:

def classifier_cls_loss():
    """
    最后分类的损失函数
    :return:
    """
    def _classifier_cls_loss(y_true, y_pred):
        return tf.keras.losses.categorical_crossentropy(y_true, y_pred)

    return _classifier_cls_loss

  • 2
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值