An Analysis of the RPN in the Faster R-CNN Source Code (Personal Notes)

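Reference blogs (be sure to read the first two)

一文看懂Faster R-CNN (Understanding Faster R-CNN in One Article)

详细的Faster R-CNN源码解析之RPN源码解析 (A Detailed Faster R-CNN Source Code Walkthrough: The RPN Source Code)

Some of my thoughts on the RPN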

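The central idea of the RPN is the anchors. How are they produced? If you know the code a little, you will know that generate_anchors.py produces 9 base anchors, and that all the other anchors are obtained by shifting these 9. One question is how the network learns the anchors. In fact it does not need to learn them, because the anchors are fixed. What the network does learn is a 1*1 convolution that maps the features directly to a 50*38*36 output (assuming an 800*600 input image, so 800/16 = 50 and 600/16 = 37.5, which becomes 38; the 36 channels are 9 anchors times 4 values). What are these learned outputs, and what are they for? They are not, as one might imagine, the center coordinates and width/height of the boxes directly.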

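# Subtract the targets from the predicted (tx, ty, tw, th) and multiply by rpn_inside_weights, so that the bbox loss is computed only for positive anchors; rpn_bbox_pred is produced by the 1x1 convolution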
diff = tf.multiply(rpn_inside_weights, rpn_bbox_pred - rpn_bbox_targets)

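The rpn_bbox_pred here is the mapped output mentioned above. The loss is designed to bring rpn_bbox_pred as close as possible to rpn_bbox_targets, so we need to know what rpn_bbox_targets is: it holds the offsets between the anchors that satisfy certain conditions and the gt_boxes (the positions of the foreground objects). In other words, the network only has to learn to produce these offsets. _anchor_target_layer_py is the function that computes the offsets between anchors and gt_boxes; it is described below.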

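Following the Faster R-CNN paper, the translation (t_x, t_y) and the scale factors (t_w, t_h) between an anchor (x_a, y_a, w_a, h_a) and the ground truth (x, y, w, h) are:

t_x=(x-x_a)/w_a\ \ \ \  t_y=(y-y_a)/h_a\\

t_w=\log(w/w_a)\ \ \ \ t_h=\log(h/h_a)\\

(t_x, t_y, t_w, t_h) are the offsets between an anchor and its gt_box. These are what the network has to learn, i.e. they are the 50*38*36 output mapped from the feature map.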
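As a small worked example of the formulas above, the following sketch computes the four offsets for one anchor and one ground-truth box, both in center form (x, y, w, h). The function name encode_targets and the box values are made up for illustration and do not come from the repository.

```python
import numpy as np

def encode_targets(anchor, gt):
    """Offsets (t_x, t_y, t_w, t_h) of a gt box relative to an anchor.

    Both boxes are given in center form (x, y, w, h).
    """
    x_a, y_a, w_a, h_a = anchor
    x, y, w, h = gt
    t_x = (x - x_a) / w_a
    t_y = (y - y_a) / h_a
    t_w = np.log(w / w_a)
    t_h = np.log(h / h_a)
    return np.array([t_x, t_y, t_w, t_h])

# Hypothetical boxes, chosen only for illustration
anchor = (100.0, 100.0, 64.0, 32.0)
gt = (116.0, 92.0, 128.0, 32.0)
targets = encode_targets(anchor, gt)
print(targets)
```

The gt box is shifted by a quarter of the anchor width to the right, a quarter of the anchor height upward, and is twice as wide, so the targets are 0.25, -0.25, log(2) and 0.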

For why it is done this way, see the principle of bounding box regression below.

The principle of bounding box regression

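As shown in Figure 9, the green box is the Ground Truth (GT) of the airplane and the red box is the extracted foreground anchor. Even if the classifier recognizes the red box as an airplane, the red box is poorly localized, so it is as if the airplane had not been detected correctly. We therefore want a way to fine-tune the red box so that the foreground anchor gets closer to the GT.

Figure 10

A window is usually represented by a four-dimensional vector (x, y, w, h): the coordinates of its center point plus its width and height. In Figure 11, the red box A is the original foreground anchor and the green box G is the GT of the target. Our goal is to find a mapping such that the input anchor A is mapped to a regressed window G' that is closer to the real window G, that is:

  • Given: anchor A=(A_{x}, A_{y}, A_{w}, A_{h}) and GT=(G_{x}, G_{y}, G_{w}, G_{h})
  • Find a transformation F such that F(A_{x}, A_{y}, A_{w}, A_{h})=(G_{x}^{'}, G_{y}^{'}, G_{w}^{'}, G_{h}^{'}), where (G_{x}^{'}, G_{y}^{'}, G_{w}^{'}, G_{h}^{'})≈(G_{x}, G_{y}, G_{w}, G_{h})

Figure 11

So what transformation F takes anchor A in Figure 10 to G'? A fairly simple idea is:

  • First translate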

G_x'=A_w\cdot d_x(A) +A_x\\

G_y'=A_h\cdot d_y(A) +A_y\\

  • Then scale

G_w'=A_w\cdot \exp(d_w(A))\\

G_h'=A_h\cdot \exp(d_h(A))\\

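Looking at the four formulas above, what has to be learned are the four transformations d_{x}(A), d_{y}(A), d_{w}(A), d_{h}(A). When the input anchor A differs only slightly from the GT, the transformation can be treated as linear, so linear regression can be used to model the fine-tuning of the window. (Note that linear regression is only applicable when anchor A is close to the GT; otherwise the problem is a complex nonlinear one.)
The next question is how to obtain d_{x}(A), d_{y}(A), d_{w}(A), d_{h}(A) by linear regression. Linear regression means: given an input feature vector X, learn a set of parameters W such that the regressed value is very close to the true value Y, i.e. Y=WX. For this problem, the input X is the CNN feature map, denoted Φ; during training, the transformation between A and the GT, (t_{x}, t_{y}, t_{w}, t_{h}), is passed in as well. The outputs are the four transformations d_{x}(A), d_{y}(A), d_{w}(A), d_{h}(A). The objective can then be written as: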

d_*(A)=W_*^T\cdot \phi(A)\\

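where \phi(A) is the feature vector formed from the feature map for the anchor, W_* is the parameter to be learned, and d_*(A) is the predicted value (* stands for x, y, w, h, i.e. each transformation has its own objective of the form above). To minimize the gap between the prediction d_*(A) and the true value t_*, the loss function is designed as:

\text{Loss}=\sum_{i}^{N}{(t_*^i-W_*^T\cdot \phi(A^i))^2}\\

The optimization objective is:

\hat{W}_*=\text{argmin}_{W_*}\sum_{i}^{N}(t_*^i- W_*^T\cdot \phi(A^i))^2+\lambda||W_*||^2\\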

Note that the linear transformation above holds approximately only when the GT and the box to be regressed are fairly close.
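The regularized least-squares objective above has a closed-form minimizer, W = (Φ^T Φ + λI)^{-1} Φ^T t. The sketch below fits it on synthetic data for a single coordinate. The data, the dimensions and the name fit_ridge are assumptions made for illustration; the actual RPN trains its regression branch by gradient descent rather than by solving this system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: N anchors, each with a D-dimensional feature phi(A^i)
N, D = 200, 8
phi = rng.normal(size=(N, D))
w_true = rng.normal(size=D)
t = phi @ w_true + 0.01 * rng.normal(size=N)  # noisy targets t_*^i

def fit_ridge(phi, t, lam):
    """Closed-form minimizer of sum_i (t^i - w^T phi^i)^2 + lam * ||w||^2."""
    d = phi.shape[1]
    return np.linalg.solve(phi.T @ phi + lam * np.eye(d), phi.T @ t)

w_hat = fit_ridge(phi, t, lam=1e-3)
d_pred = phi @ w_hat  # predicted d_*(A^i) for every anchor
print(np.abs(w_hat - w_true).max())
```

With little noise and a small λ, the recovered weights are close to the ones used to generate the targets.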
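With the principle covered, and following the Faster R-CNN paper, the translation (t_x, t_y) and the scale factors (t_w, t_h) between a foreground anchor and the ground truth are:

t_x=(x-x_a)/w_a\ \ \ \  t_y=(y-y_a)/h_a\\

t_w=\log(w/w_a)\ \ \ \ t_h=\log(h/h_a)\\

For training the regression branch of the bounding box regression network, the input is the CNN feature Φ and the supervision signal is the gap between the anchor and the GT, (t_x, t_y, t_w, t_h). That is, the training objective is: given the input Φ, make the network output as close as possible to the supervision signal.
Then, when bounding box regression is actually run and Φ is fed in again, the output of the regression branch is the translation and scale (t_x, t_y, t_w, t_h) for each anchor, which can be used to correct the anchor's position.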
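To make the correction step concrete, here is a minimal sketch that applies predicted offsets to an anchor using the translate-then-scale formulas from the section above. The name decode_box and the numbers are hypothetical; boxes are in center form (x, y, w, h).

```python
import numpy as np

def decode_box(anchor, deltas):
    """Apply (d_x, d_y, d_w, d_h) to an anchor given in center form (x, y, w, h)."""
    a_x, a_y, a_w, a_h = anchor
    d_x, d_y, d_w, d_h = deltas
    g_x = a_w * d_x + a_x          # translate the center
    g_y = a_h * d_y + a_y
    g_w = a_w * np.exp(d_w)        # scale the width and height
    g_h = a_h * np.exp(d_h)
    return np.array([g_x, g_y, g_w, g_h])

# Hypothetical anchor and predicted offsets
anchor = (100.0, 100.0, 64.0, 32.0)
deltas = (0.25, -0.25, np.log(2.0), 0.0)
box = decode_box(anchor, deltas)
print(box)
```

Here the corrected box has center (116, 92), width 128 and height 32: the center moves by a fraction of the anchor size and the width doubles.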

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Tue Jan 17 15:05:05 2017
@author: Kevin Liang
Loss functions
"""
 
from .faster_rcnn_config import cfg
 
import tensorflow as tf
 
 
def rpn_cls_loss(rpn_cls_score,rpn_labels):
    '''
    Calculate the Region Proposal Network classifier loss. Measures how well 
    the RPN is able to propose regions by the performance of its "objectness" 
    classifier.
    
    Standard cross-entropy loss on logits
    '''
    with tf.variable_scope('rpn_cls_loss'):
        # input shape dimensions
        shape = tf.shape(rpn_cls_score)
        
        # Stack all classification scores into 2D matrix
        rpn_cls_score = tf.transpose(rpn_cls_score,[0,3,1,2])
        rpn_cls_score = tf.reshape(rpn_cls_score,[shape[0],2,shape[3]//2*shape[1],shape[2]])
        rpn_cls_score = tf.transpose(rpn_cls_score,[0,2,3,1])
        rpn_cls_score = tf.reshape(rpn_cls_score,[-1,2])
        
        # Stack labels
        rpn_labels = tf.reshape(rpn_labels,[-1])  # flatten the labels into a 1-D vector
        
        # Ignore label=-1 (Neither object nor background: IoU between 0.3 and 0.7)
        # Drop the scores at positions where the label is -1, then reshape to [-1,2] for the cross-entropy loss
        rpn_cls_score = tf.reshape(tf.gather(rpn_cls_score,tf.where(tf.not_equal(rpn_labels,-1))),[-1,2])
        # Keep only the labels that are not -1, i.e. the sampled foreground (1) and background (0) anchors
        rpn_labels = tf.reshape(tf.gather(rpn_labels,tf.where(tf.not_equal(rpn_labels,-1))),[-1]) 
        
        # Cross entropy error: compute the cross-entropy loss here
        rpn_cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=rpn_cls_score, labels=rpn_labels))
    
    return rpn_cross_entropy
    
    
def rpn_bbox_loss(rpn_bbox_pred, rpn_bbox_targets, rpn_inside_weights, rpn_outside_weights):
    '''
    Calculate the Region Proposal Network bounding box loss. Measures how well 
    the RPN is able to propose regions by the performance of its localization.
    lam/N_reg * sum_i(p_i^* * L_reg(t_i,t_i^*))
    lam: classification vs bbox loss balance parameter     
    N_reg: Number of anchor locations (~2500)
    p_i^*: ground truth label for anchor (loss only for positive anchors)
    L_reg: smoothL1 loss
    t_i: Parameterized prediction of bounding box
    t_i^*: Parameterized ground truth of closest bounding box
    '''    
    with tf.variable_scope('rpn_bbox_loss'):
        # Transposing
        rpn_bbox_targets = tf.transpose(rpn_bbox_targets, [0,2,3,1])
        rpn_inside_weights = tf.transpose(rpn_inside_weights, [0,2,3,1])
        rpn_outside_weights = tf.transpose(rpn_outside_weights, [0,2,3,1])
        
        # How far off was the prediction?
        # Subtract the targets from the predicted (tx, ty, tw, th) and multiply by rpn_inside_weights,
        # so that the bbox loss is computed only for positive anchors
        diff = tf.multiply(rpn_inside_weights, rpn_bbox_pred - rpn_bbox_targets)
        # Compute the smooth L1 result
        diff_sL1 = smoothL1(diff, 3.0)
        
        # Only count loss for positive anchors. Make sure it's a sum.
        # Multiply the result above by rpn_outside_weights and sum it up
        rpn_bbox_reg = tf.reduce_sum(tf.multiply(rpn_outside_weights, diff_sL1))
    
        # Constant for weighting bounding box loss with classification loss
        # Multiply the box error by the lambda parameter to get the final box loss
        rpn_bbox_reg = cfg.TRAIN.RPN_BBOX_LAMBDA * rpn_bbox_reg
    
    return rpn_bbox_reg  # return the final loss
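The smoothL1 helper called above is not shown in this excerpt. The NumPy sketch below assumes it follows the sigma-parameterized smooth L1 commonly used in Faster R-CNN implementations (quadratic near zero, linear elsewhere); the name smooth_l1 and the exact form are assumptions, and the repository's own helper may differ in detail.

```python
import numpy as np

def smooth_l1(x, sigma=3.0):
    """Element-wise smooth L1 with a sigma parameter (assumed form).

    0.5 * (sigma * x)^2   if |x| < 1 / sigma^2
    |x| - 0.5 / sigma^2   otherwise
    """
    x = np.asarray(x, dtype=np.float64)
    sigma2 = sigma ** 2
    abs_x = np.abs(x)
    quadratic = 0.5 * sigma2 * x ** 2
    linear = abs_x - 0.5 / sigma2
    return np.where(abs_x < 1.0 / sigma2, quadratic, linear)

diff = np.array([0.0, 0.05, -0.05, 1.0, -2.0])
loss = smooth_l1(diff, 3.0)
print(loss)
```

With sigma = 3 the switch between the two branches happens at |x| = 1/9, so small residuals are penalized quadratically and large ones only linearly, which makes the loss less sensitive to outliers than a plain squared error.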

 

 

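RPN code walkthrough (most of this was actually written by others)

First, in faster_rcnn_resnet50ish.py, let us look at what the data layer outputs during training:

# Train data
self.x['TRAIN'] = tf.placeholder(tf.float32, [1, None, None, 3])  # image
self.im_dims['TRAIN'] = tf.placeholder(tf.int32, [None, 2])  # image dimensions [height, width]
self.gt_boxes['TRAIN'] = tf.placeholder(tf.int32, [None, 5])  # ground-truth boxes

As you can see, the first input to the network is the image. Next come the image height and width, because images of different sizes produce different anchor coordinates. Last is the ground-truth box information, whose second dimension has five elements: the first four are the coordinates of the object and the last is its class.

A closer look at the _anchor_target_layer_py function, which consists of the following steps

In this function, generate_anchors first produces 9 candidate boxes; then candidate boxes are generated at the original-image position corresponding to each sliding step over the shared feature map, giving all_anchors. Next, every candidate box that extends beyond the image boundary is discarded, leaving anchors; all later operations work on these anchors inside the image. Then bbox_overlaps computes the IoU between every in-bounds anchor and the ground-truth boxes. Anchors whose IoU lies between 0.3 and 0.7 are excluded (by setting their labels to -1), and a suitable number of foreground and background anchors is selected for training. Then _compute_targets computes the coordinate transformation values (tx, ty, tw, th) for each anchor and stores them in bbox_targets. bbox_inside_weights and bbox_outside_weights are computed next; these two arrays play an important role in training the anchor box refinement. Finally, _unmap maps the anchors inside the image back to the full set of anchors.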

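(1) First, generate_anchors produces 9 candidate boxes, and then candidate boxes are generated at the original-image position corresponding to each sliding step over the shared feature map, giving all_anchors.

If the image fed to the network is 800*600, the feature map produced after a series of convolutions is 50*38*512, and a 3*3 convolution then gives a 50*38*256 feature map.

Suppose the 50*38*256 map is squeezed to 50*38. The anchors produced by generate_anchors are just the 9 anchors at point [0,0] of this 50*38 grid. Every other point of the grid is related to [0,0] by an offset, which is the shifts array in the code, and all the anchors are derived through this relation. Note: these anchors are all expressed in original-image coordinates, as top-left and bottom-right corners.

The relevant code in _anchor_target_layer_py: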

    im_dims = im_dims[0]  # dimensions of the original image, [height, width]
    _anchors = generate_anchors(scales=np.array(anchor_scales))  # generate the 9 base anchors, shape: [9,4]
    _num_anchors = _anchors.shape[0]  # _num_anchors is 9

    # allow boxes to sit over the edge by a small amount
    _allowed_border = 0  # anchors may not extend past the image border at all

    # Only minibatch of 1 supported; check that batch_size is 1
    assert rpn_cls_score.shape[0] == 1, \
        'Only single item batches are supported'

    # map of shape (..., H, W)
    height, width = rpn_cls_score.shape[1:3]  # H and W of the RPN output; the total number of anchors is H*W*9

    # 1. Generate proposals from bbox deltas and shifted anchors
    # generate the anchors on the original image
    shift_x = np.arange(0, width) * _feat_stride  # shape: [width,]
    shift_y = np.arange(0, height) * _feat_stride  # shape: [height,]
    shift_x, shift_y = np.meshgrid(shift_x,
                                   shift_y)  # build the grid; shift_x shape: [height, width], shift_y shape: [height, width]
    shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                        shift_x.ravel(), shift_y.ravel())).transpose()  # shape [height*width, 4]

    # add A anchors (1, A, 4) to
    # cell K shifts (K, 1, 4) to get
    # shift anchors (K, A, 4)
    # reshape to (K*A, 4) shifted anchors
    A = _num_anchors  # A = 9
    K = shifts.shape[0]  # K = height*width (of the feature map)
    all_anchors = (_anchors.reshape((1, A, 4)) +
                   shifts.reshape((1, K, 4)).transpose((1, 0, 2)))  # shape [K,A,4]; all the anchors
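The broadcasting in this excerpt is easier to follow on a tiny case. The sketch below repeats the same steps on a hypothetical 2x3 feature map with only 2 made-up base anchors; feat_stride = 16 and all box values are assumptions for illustration.

```python
import numpy as np

feat_stride = 16          # assumed stride between feature map and image
height, width = 2, 3      # tiny hypothetical feature map
base_anchors = np.array([[-8, -8, 23, 23],
                         [0, 0, 15, 15]])  # 2 made-up base anchors instead of 9

shift_x = np.arange(0, width) * feat_stride
shift_y = np.arange(0, height) * feat_stride
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                    shift_x.ravel(), shift_y.ravel())).transpose()  # (K, 4)

A = base_anchors.shape[0]
K = shifts.shape[0]
# (1, A, 4) + (K, 1, 4) broadcasts to (K, A, 4)
all_anchors = base_anchors.reshape((1, A, 4)) + shifts.reshape((K, 1, 4))
all_anchors = all_anchors.reshape((K * A, 4))

print(all_anchors.shape)
print(all_anchors[-1])
```

Reshaping shifts to (K, 1, 4) is equivalent to the reshape((1, K, 4)).transpose((1, 0, 2)) used in the excerpt. The last row is the second base anchor moved to the last grid cell, whose shift is (32, 16, 32, 16).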

Next, an explanation of the generate_anchors function.

# -*- coding: utf-8 -*-

import numpy as np
 
# Verify that we compute the same anchors as Shaoqing's matlab implementation:
#
#    >> load output/rpn_cachedir/faster_rcnn_VOC2007_ZF_stage1_rpn/anchors.mat
#    >> anchors
#
#    anchors =
#
#       -83   -39   100    56
#      -175   -87   192   104
#      -359  -183   376   200
#       -55   -55    72    72
#      -119  -119   136   136
#      -247  -247   264   264
#       -35   -79    52    96
#       -79  -167    96   184
#      -167  -343   184   360
 
# Python output of generate_anchors() (0-based, so each coordinate is 1 less than the MATLAB values above):
#array([[ -84.,  -40.,   99.,   55.],
#       [-176.,  -88.,  191.,  103.],
#       [-360., -184.,  375.,  199.],
#       [ -56.,  -56.,   71.,   71.],
#       [-120., -120.,  135.,  135.],
#       [-248., -248.,  263.,  263.],
#       [ -36.,  -80.,   51.,   95.],
#       [ -80., -168.,   95.,  183.],
#       [-168., -344.,  183.,  359.]])
 
def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                     scales=2**np.arange(3, 6)):
    """
    Generate anchor (reference) windows by enumerating aspect ratios X
    scales wrt a reference (0, 0, 15, 15) window.
    """
    # Note that an anchor can be represented in two ways: by its top-left and bottom-right corners, or by its center and width/height
    # Build the base anchor here in corner form: [0,0,15,15]
    base_anchor = np.array([1, 1, base_size, base_size]) - 1  # [0,0,15,15]
    ratio_anchors = _ratio_enum(base_anchor, ratios)  # shape: [3,4], anchors with different aspect ratios
    anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                         for i in range(ratio_anchors.shape[0])])  # the nine candidate boxes, shape: [9,4]
    return anchors
 
def _whctrs(anchor):  # takes an anchor's top-left and bottom-right corners, returns its width, height and center
    """
    Return width, height, x center, and y center for an anchor (window).
    """
 
    w = anchor[2] - anchor[0] + 1
    h = anchor[3] - anchor[1] + 1
    x_ctr = anchor[0] + 0.5 * (w - 1)
    y_ctr = anchor[1] + 0.5 * (h - 1)
    return w, h, x_ctr, y_ctr
 
def _mkanchors(ws, hs, x_ctr, y_ctr):  # builds windows (top-left and bottom-right corners) from a center and widths/heights
    """
    Given a vector of widths (ws) and heights (hs) around a center
    (x_ctr, y_ctr), output a set of anchors (windows).
    """
 
    ws = ws[:, np.newaxis]  # shape: [3,1]
    hs = hs[:, np.newaxis]  # shape: [3,1]
    anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
                         y_ctr - 0.5 * (hs - 1),
                         x_ctr + 0.5 * (ws - 1),
                         y_ctr + 0.5 * (hs - 1)))
    return anchors  # shape [3,4]: top-left and bottom-right corners of each anchor
 
def _ratio_enum(anchor, ratios):  # computes the anchor coordinates for each aspect ratio
    """
    Enumerate a set of anchors for each aspect ratio wrt an anchor.
    """
 
    w, h, x_ctr, y_ctr = _whctrs(anchor)  # width, height and center of the anchor
    size = w * h  # area of the anchor
    size_ratios = size / ratios  # helper array for the aspect ratios: array([512.,256.,128.])
    ws = np.round(np.sqrt(size_ratios))  # widths for each aspect ratio: array([23.,16.,11.])
    hs = np.round(ws * ratios)  # heights for each aspect ratio: array([12.,16.,22.])
    # Note that ws*hs at each position gives an area of roughly 256
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)  # anchors with the new aspect ratios, shape: [3,4], in corner form
    return anchors
 
def _scale_enum(anchor, scales):  # for an anchor of a given aspect ratio, computes the anchors at each scale
    """
    Enumerate a set of anchors for each scale wrt an anchor.
    """
 
    w, h, x_ctr, y_ctr = _whctrs(anchor)  # width, height and center of the anchor
    ws = w * scales  # shape [3,]: widths at each scale
    hs = h * scales  # shape [3,]: heights at each scale
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)  # anchors at each scale, in corner form
    return anchors
 
if __name__ == '__main__':
    import time
    t = time.time()
    a = generate_anchors()
    print(time.time() - t)
    print(a)
    from IPython import embed; embed()
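The ratio-then-scale enumeration can be condensed into a few lines, which makes the arithmetic easier to check by hand. This is a compact sketch with the same defaults (base size 16, ratios 0.5/1/2, scales 8/16/32); the name make_anchors is hypothetical and the function is a restatement for illustration, not the repository's code. Note that with the 0-based (0, 0, 15, 15) reference window the first anchor comes out as (-84, -40, 99, 55), one less per coordinate than the 1-based MATLAB values quoted in the file's header comment.

```python
import numpy as np

def make_anchors(base_size=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Compact sketch of the ratio-then-scale enumeration, in corner form."""
    ctr = 0.5 * (base_size - 1)          # center of the (0, 0, 15, 15) window
    area = float(base_size * base_size)
    anchors = []
    for r in ratios:
        w = np.round(np.sqrt(area / r))  # width for this aspect ratio
        h = np.round(w * r)              # height for this aspect ratio
        for s in scales:
            ws, hs = w * s, h * s
            anchors.append([ctr - 0.5 * (ws - 1), ctr - 0.5 * (hs - 1),
                            ctr + 0.5 * (ws - 1), ctr + 0.5 * (hs - 1)])
    return np.array(anchors)

anchors = make_anchors()
widths = anchors[:, 2] - anchors[:, 0] + 1
heights = anchors[:, 3] - anchors[:, 1] + 1
print(anchors)
print(widths, heights)
```

The widths and heights show the structure directly: three aspect ratios (184x96, 128x128, 88x176 at the smallest scale), each doubled twice.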

(2) Next, every candidate box that extends beyond the image boundary is discarded, leaving anchors.

inds_inside = np.where(
    (all_anchors[:, 0] >= -_allowed_border) &
    (all_anchors[:, 1] >= -_allowed_border) &
    (all_anchors[:, 2] < im_dims[1] + _allowed_border) &  # width
    (all_anchors[:, 3] < im_dims[0] + _allowed_border)  # height
)[0]

# keep only inside anchors
anchors = all_anchors[inds_inside, :]  # keep the valid anchors, i.e. those that stay inside the image
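A toy run of the same filter shows which boxes survive. The image size and the four boxes below are hypothetical values picked to hit each case.

```python
import numpy as np

im_height, im_width = 600, 800     # hypothetical image size
allowed_border = 0

all_anchors = np.array([
    [-84, -40, 99, 55],      # sticks out on the left and top
    [100, 100, 227, 227],    # fully inside
    [700, 500, 827, 627],    # sticks out on the right and bottom
    [672, 472, 799, 599],    # touches the last valid pixel, still inside
])

inds_inside = np.where(
    (all_anchors[:, 0] >= -allowed_border) &
    (all_anchors[:, 1] >= -allowed_border) &
    (all_anchors[:, 2] < im_width + allowed_border) &
    (all_anchors[:, 3] < im_height + allowed_border)
)[0]
anchors = all_anchors[inds_inside, :]
print(inds_inside)
```

Only the second and fourth boxes are kept. Because the comparison on the right and bottom edges is strict, a box whose corner equals the image width or height would be dropped, while one ending at width - 1 or height - 1 stays.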

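(3) Then bbox_overlaps computes the IoU between every in-bounds anchor and the ground-truth boxes. Anchors whose IoU lies between 0.3 and 0.7 are excluded (by setting their labels to -1), and a suitable number of foreground and background anchors is selected for training.

First a labels array is created to record which anchors qualify.

# label: 1 is positive, 0 is negative, -1 is dont care
labels = np.empty((len(inds_inside),), dtype=np.float32)  # one label per in-bounds anchor
labels.fill(-1)  # fill labels with -1 to start

The bbox_overlaps function computes the IoU (overlaps) between anchors and gt_boxes. These IoU values are stored in a two-dimensional matrix, overlaps.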

    # overlaps between the anchors and the gt boxes
    # overlaps (ex, gt)
    # compute the overlap for every anchor inside the image; resulting shape: [len(anchors), len(gt_boxes)]
    overlaps = bbox_overlaps(
        np.ascontiguousarray(anchors, dtype=np.float64),
        np.ascontiguousarray(gt_boxes, dtype=np.float64))
    argmax_overlaps = overlaps.argmax(axis=1)  # for each anchor, the index of the gt_box with the largest overlap. shape: [len(anchors),]
    max_overlaps = overlaps[
        np.arange(len(inds_inside)), argmax_overlaps]  # for each anchor, its largest overlap with any gt_box. shape: [len(anchors),]
    gt_argmax_overlaps = overlaps.argmax(axis=0)  # for each gt_box, the index of the anchor with the largest overlap. shape: [len(gt_boxes),]
    gt_max_overlaps = overlaps[gt_argmax_overlaps,
                               np.arange(overlaps.shape[1])]  # for each gt_box, its largest IoU with any anchor. shape: [len(gt_boxes),]
    gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[
        0]  # again for each gt_box, every anchor that attains its largest overlap (ties included)

    if not cfg.TRAIN.RPN_CLOBBER_POSITIVES:  # if positives must not be clobbered, assign the background labels first so that the foreground labels can overwrite them
        # assign bg labels first so that positive labels can clobber them
        labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0  # anchors whose largest IoU is still below the threshold (0.3) are set to 0

    # fg label: for each gt, anchor with highest overlap
    labels[gt_argmax_overlaps] = 1  # for each gt_box, the anchor with the largest IoU is set to 1

    # fg label: above threshold IOU
    labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1  # anchors whose largest IoU reaches the threshold (0.7) are set to 1

    if cfg.TRAIN.RPN_CLOBBER_POSITIVES:  # if positives may be clobbered, assign the background labels last
        # assign bg labels last so that negative labels can clobber positives
        labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0  # anchors whose largest IoU is still below the threshold (0.3) are set to 0

    # subsample positive labels if we have too many
    num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)  # number of foreground anchors wanted in one training batch
    fg_inds = np.where(labels == 1)[0]  # anchors labeled as foreground
    if len(fg_inds) > num_fg:
        disable_inds = npr.choice(
            fg_inds, size=(len(fg_inds) - num_fg), replace=False)
        labels[disable_inds] = -1  # if there are more foreground anchors than needed, randomly discard some

    # subsample negative labels if we have too many
    num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1)  # number of background anchors wanted in one training batch
    bg_inds = np.where(labels == 0)[0]  # anchors labeled as background
    if len(bg_inds) > num_bg:
        disable_inds = npr.choice(
            bg_inds, size=(len(bg_inds) - num_bg), replace=False)
        labels[disable_inds] = -1  # if there are more background anchors than needed, randomly discard some
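The labeling rules are easiest to see on a small IoU matrix. The sketch below applies the same three assignments in the non-clobbering order, using the 0.3 and 0.7 thresholds quoted in the text; the overlap values are hypothetical and the subsampling step is left out.

```python
import numpy as np

NEG_THRESH, POS_THRESH = 0.3, 0.7   # thresholds quoted in the text

# Hypothetical IoU matrix: 5 anchors (rows) x 2 gt boxes (columns)
overlaps = np.array([
    [0.80, 0.10],
    [0.50, 0.20],
    [0.10, 0.05],
    [0.20, 0.60],
    [0.00, 0.00],
])

labels = np.full(overlaps.shape[0], -1, dtype=np.float32)
max_overlaps = overlaps.max(axis=1)                    # best IoU per anchor
gt_max_overlaps = overlaps.max(axis=0)                 # best IoU per gt box
gt_argmax = np.where(overlaps == gt_max_overlaps)[0]   # anchors achieving it

labels[max_overlaps < NEG_THRESH] = 0          # background first
labels[gt_argmax] = 1                          # best anchor for each gt box
labels[max_overlaps >= POS_THRESH] = 1         # above the positive threshold
print(labels)
```

Anchor 3 only reaches an IoU of 0.6, below the positive threshold, but it is still labeled foreground because it is the best match for the second gt box. Anchor 1 (IoU 0.5) falls between the thresholds and keeps the "don't care" label -1.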

The bbox_overlaps function computes, for every anchor, the IoU with every ground truth box. The code is as follows:

# -*- coding: utf-8 -*-
 
cimport cython
import numpy as np
cimport numpy as np
 
DTYPE = np.float64
ctypedef np.float_t DTYPE_t
 
def bbox_overlaps(  # overlap = area of the intersection of two boxes / area of their union
        np.ndarray[DTYPE_t, ndim=2] boxes,
        np.ndarray[DTYPE_t, ndim=2] query_boxes):
    """
    Parameters
    ----------
    boxes: (N, 4) ndarray of float
    query_boxes: (K, 4) ndarray of float
    Returns
    -------
    overlaps: (N, K) ndarray of overlap between boxes and query_boxes
    """
    cdef unsigned int N = boxes.shape[0]
    cdef unsigned int K = query_boxes.shape[0]
    cdef np.ndarray[DTYPE_t, ndim=2] overlaps = np.zeros((N, K), dtype=DTYPE)
    cdef DTYPE_t iw, ih, box_area
    cdef DTYPE_t ua
    cdef unsigned int k, n
    for k in range(K):
        box_area = (
            (query_boxes[k, 2] - query_boxes[k, 0] + 1) *
            (query_boxes[k, 3] - query_boxes[k, 1] + 1)
        )
        for n in range(N):
            iw = (
                min(boxes[n, 2], query_boxes[k, 2]) -
                max(boxes[n, 0], query_boxes[k, 0]) + 1
            )
            if iw > 0:
                ih = (
                    min(boxes[n, 3], query_boxes[k, 3]) -
                    max(boxes[n, 1], query_boxes[k, 1]) + 1
                )
                if ih > 0:
                    ua = float(
                        (boxes[n, 2] - boxes[n, 0] + 1) *
                        (boxes[n, 3] - boxes[n, 1] + 1) +
                        box_area - iw * ih
                    )
                    overlaps[n, k] = iw * ih / ua
    return overlaps
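For experimenting without compiling the Cython module, the same IoU can be written with NumPy broadcasting. This is a sketch intended to mirror the loop above, including the +1 for inclusive pixel coordinates; the name bbox_overlaps_np and the example boxes are made up for illustration.

```python
import numpy as np

def bbox_overlaps_np(boxes, query_boxes):
    """IoU between every box (N, 4) and every query box (K, 4), shape (N, K).

    Boxes are (x1, y1, x2, y2) with inclusive pixel coordinates, hence the +1.
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    query_boxes = np.asarray(query_boxes, dtype=np.float64)

    area = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    q_area = ((query_boxes[:, 2] - query_boxes[:, 0] + 1) *
              (query_boxes[:, 3] - query_boxes[:, 1] + 1))

    # Intersection width/height via broadcasting (N, 1) against (K,)
    iw = (np.minimum(boxes[:, None, 2], query_boxes[:, 2]) -
          np.maximum(boxes[:, None, 0], query_boxes[:, 0]) + 1)
    ih = (np.minimum(boxes[:, None, 3], query_boxes[:, 3]) -
          np.maximum(boxes[:, None, 1], query_boxes[:, 1]) + 1)
    inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)

    union = area[:, None] + q_area[None, :] - inter
    return inter / union

boxes = [[0, 0, 9, 9], [20, 20, 29, 29]]
gt = [[0, 0, 9, 9], [5, 0, 14, 9]]
iou = bbox_overlaps_np(boxes, gt)
print(iou)
```

The first box matches the first gt box exactly (IoU 1) and shares half of its area with the second (50 / 150 = 1/3); the second box overlaps neither.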

 

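(4) Then _compute_targets computes the coordinate transformation values (tx, ty, tw, th) for each anchor and stores them in bbox_targets.

bbox_targets = np.zeros((len(inds_inside), 4), dtype=np.float32)  # initialize the transformation values with zeros for every anchor inside the image
bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])  # for each anchor, the four values that transform it to the gt_box it overlaps most

Computing the transformation values relies on the bbox_transform function. Note that for this computation the anchors are converted to center coordinates plus width and height.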

# -*- coding: utf-8 -*-
 
import numpy as np
 
def bbox_transform(ex_rois, gt_rois):
    '''
    Receives two sets of bounding boxes, denoted by two opposite corners 
    (x1,y1,x2,y2), and returns the target deltas that Faster R-CNN should aim 
    for.
    '''
    ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
    ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
    ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths
    ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights  # center coordinates and width/height of each anchor

    gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
    gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
    gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths
    gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights  # center coordinates and width/height of the ground truth box matched to each anchor

    targets_dx = (gt_ctr_x - ex_ctr_x) / ex_widths  # the four coordinate transformation values
    targets_dy = (gt_ctr_y - ex_ctr_y) / ex_heights
    targets_dw = np.log(gt_widths / ex_widths)
    targets_dh = np.log(gt_heights / ex_heights)

    targets = np.vstack(
        (targets_dx, targets_dy, targets_dw, targets_dh)).transpose()  # four values per anchor, shape: [num_anchors, 4]
    return targets

(5) Next, bbox_inside_weights and bbox_outside_weights are computed; these two arrays play an important role in training the anchor box refinement.

    bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)  # initialize inside_weights with zeros
    bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS)  # set the weights at foreground anchors; TRAIN.RPN_BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0)

    bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)  # initialize outside_weights with zeros
    if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:  # if RPN_POSITIVE_WEIGHT is negative,
        # uniform weighting of examples (given non-uniform sampling)
        num_examples = np.sum(labels >= 0)
        positive_weights = np.ones((1, 4)) * 1.0 / num_examples  # positive_weights and negative_weights are the same
        negative_weights = np.ones((1, 4)) * 1.0 / num_examples
    else:
        assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) &
                (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1))  # if RPN_POSITIVE_WEIGHT lies between 0 and 1,
        positive_weights = (cfg.TRAIN.RPN_POSITIVE_WEIGHT /
                            np.sum(labels == 1))
        negative_weights = ((1.0 - cfg.TRAIN.RPN_POSITIVE_WEIGHT) /
                            np.sum(labels == 0))  # positive_weights and negative_weights are set separately
    bbox_outside_weights[labels == 1, :] = positive_weights
    bbox_outside_weights[labels == 0, :] = negative_weights  # store positive_weights and negative_weights in bbox_outside_weights

(6) Finally, _unmap maps the anchors inside the image back to the full set of anchors.

# map up to original set of anchors
labels = _unmap(labels, total_anchors, inds_inside,
                fill=-1)  # map the labels of the in-bounds anchors back onto all anchors (out-of-bounds anchors get label -1)
bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside,
                      fill=0)  # map the bbox_targets of the in-bounds anchors back onto all anchors (out-of-bounds anchors get 0)
bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside,
                             fill=0)  # map the inside_weights of the in-bounds anchors back onto all anchors (out-of-bounds anchors get 0)
bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside,
                              fill=0)  # map the outside_weights of the in-bounds anchors back onto all anchors (out-of-bounds anchors get 0)
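What the unmapping does can be seen on a handful of values. The sketch below uses a simplified stand-in named unmap (a hypothetical helper written for this example, not the repository's _unmap) and made-up indices: values computed for the in-bounds anchors are scattered back to their original positions, and every other position receives the fill value.

```python
import numpy as np

def unmap(data, count, inds, fill=0):
    """Scatter `data` (values for the kept items) back into an array of size `count`."""
    ret = np.full((count,) + data.shape[1:], fill, dtype=np.float32)
    ret[inds] = data
    return ret

total_anchors = 6
inds_inside = np.array([1, 3, 4])                 # hypothetical in-bounds indices
labels = np.array([1, 0, -1], dtype=np.float32)   # labels for those 3 anchors
bbox_targets = np.array([[0.1, 0.2, 0.0, 0.0],
                         [0.0, 0.0, 0.0, 0.0],
                         [0.3, -0.1, 0.2, 0.1]], dtype=np.float32)

labels_full = unmap(labels, total_anchors, inds_inside, fill=-1)
targets_full = unmap(bbox_targets, total_anchors, inds_inside, fill=0)
print(labels_full)
print(targets_full.shape)
```

Out-of-bounds anchors end up with label -1, so the classification loss ignores them, and with zero targets and weights, so they contribute nothing to the box loss.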

Full source code

# -*- coding: utf-8 -*-
"""
Created on Sun Jan  1 16:11:17 2017
@author: Kevin Liang (modifications)
Anchor Target Layer: Creates all the anchors in the final convolutional feature
map, assigns anchors to ground truth boxes, and applies labels of "objectness"
Adapted from the official Faster R-CNN repo:
https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/rpn/anchor_target_layer.py
"""

# --------------------------------------------------------
# Faster R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick and Sean Bell
# --------------------------------------------------------

import sys

sys.path.append('../')

import numpy as np
import numpy.random as npr
import tensorflow as tf

from Lib.bbox_overlaps import bbox_overlaps
from Lib.bbox_transform import bbox_transform
from Lib.faster_rcnn_config import cfg
from Lib.generate_anchors import generate_anchors


# Computes the ground truth for every anchor (foreground/background label and coordinate offsets)
def anchor_target_layer(rpn_cls_score, gt_boxes, im_dims, _feat_stride, anchor_scales):
    '''
    Make Python version of _anchor_target_layer_py below Tensorflow compatible
    '''
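    # Run _anchor_target_layer_py; the arguments are the RPN classification scores predicted by the network, the ground-truth boxes, the image dimensions, the feature stride relative to the original image, and the anchor scales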
    rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights = \
        tf.py_func(_anchor_target_layer_py, [rpn_cls_score, gt_boxes, im_dims, _feat_stride, anchor_scales],
                   [tf.float32, tf.float32, tf.float32, tf.float32])

    # convert to tensors
    rpn_labels = tf.convert_to_tensor(tf.cast(rpn_labels, tf.int32), name='rpn_labels')
    rpn_bbox_targets = tf.convert_to_tensor(rpn_bbox_targets, name='rpn_bbox_targets')
    rpn_bbox_inside_weights = tf.convert_to_tensor(rpn_bbox_inside_weights, name='rpn_bbox_inside_weights')
    rpn_bbox_outside_weights = tf.convert_to_tensor(rpn_bbox_outside_weights, name='rpn_bbox_outside_weights')

    return rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights


def _anchor_target_layer_py(rpn_cls_score, gt_boxes, im_dims, _feat_stride, anchor_scales):
    """
    Python version

    Assign anchors to ground-truth targets. Produces anchor classification
    labels and bounding-box regression targets.

    # Algorithm:
    #
    # for each (H, W) location i
    #   generate 9 anchor boxes centered on cell i
    #   apply predicted bbox deltas at cell i to each of the 9 anchors
    # filter out-of-image anchors
    # measure GT overlap
    """
    im_dims = im_dims[0]  # dimensions of the original image, [height, width]
    _anchors = generate_anchors(scales=np.array(anchor_scales))  # generate the 9 base anchors, shape: [9,4]
    _num_anchors = _anchors.shape[0]  # _num_anchors is 9

    # allow boxes to sit over the edge by a small amount
    _allowed_border = 0  # anchors may not extend past the image border at all

    # Only minibatch of 1 supported; check that batch_size is 1
    assert rpn_cls_score.shape[0] == 1, \
        'Only single item batches are supported'

    # map of shape (..., H, W)
    height, width = rpn_cls_score.shape[1:3]  # H and W of the RPN output; the total number of anchors is H*W*9

    # 1. Generate proposals from bbox deltas and shifted anchors
    # generate the anchors on the original image
    shift_x = np.arange(0, width) * _feat_stride  # shape: [width,]
    shift_y = np.arange(0, height) * _feat_stride  # shape: [height,]
    shift_x, shift_y = np.meshgrid(shift_x,
                                   shift_y)  # build the grid; shift_x shape: [height, width], shift_y shape: [height, width]
    shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                        shift_x.ravel(), shift_y.ravel())).transpose()  # shape [height*width, 4]

    # add A anchors (1, A, 4) to
    # cell K shifts (K, 1, 4) to get
    # shift anchors (K, A, 4)
    # reshape to (K*A, 4) shifted anchors
    A = _num_anchors  # A = 9
    K = shifts.shape[0]  # K = height*width (of the feature map)
    all_anchors = (_anchors.reshape((1, A, 4)) +
                   shifts.reshape((1, K, 4)).transpose((1, 0, 2)))  # shape [K,A,4]; all the anchors
    all_anchors = all_anchors.reshape((K * A, 4))
    total_anchors = int(K * A)  # total number of anchors

    # anchors inside the image: inds_inside indexes the anchors that do not cross the image border
    inds_inside = np.where(
        (all_anchors[:, 0] >= -_allowed_border) &
        (all_anchors[:, 1] >= -_allowed_border) &
        (all_anchors[:, 2] < im_dims[1] + _allowed_border) &  # width
        (all_anchors[:, 3] < im_dims[0] + _allowed_border)  # height
    )[0]

    # keep only inside anchors
    anchors = all_anchors[inds_inside, :]  # keep the valid anchors, i.e. those that stay inside the image

    # label: 1 is positive, 0 is negative, -1 is dont care
    labels = np.empty((len(inds_inside),), dtype=np.float32)  # one label per in-bounds anchor
    labels.fill(-1)  # fill labels with -1 to start

    # overlaps between the anchors and the gt boxes
    # overlaps (ex, gt)
    # compute the overlap for every anchor inside the image; resulting shape: [len(anchors), len(gt_boxes)]
    overlaps = bbox_overlaps(
        np.ascontiguousarray(anchors, dtype=np.float64),
        np.ascontiguousarray(gt_boxes, dtype=np.float64))
    argmax_overlaps = overlaps.argmax(axis=1)  # for each anchor, the index of the gt_box with the largest overlap. shape: [len(anchors),]
    max_overlaps = overlaps[
        np.arange(len(inds_inside)), argmax_overlaps]  # for each anchor, its largest overlap with any gt_box. shape: [len(anchors),]
    gt_argmax_overlaps = overlaps.argmax(axis=0)  # for each gt_box, the index of the anchor with the largest overlap. shape: [len(gt_boxes),]
    gt_max_overlaps = overlaps[gt_argmax_overlaps,
                               np.arange(overlaps.shape[1])]  # for each gt_box, its largest IoU with any anchor. shape: [len(gt_boxes),]
    gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[
        0]  # again for each gt_box, every anchor that attains its largest overlap (ties included)

    if not cfg.TRAIN.RPN_CLOBBER_POSITIVES:  # if positives must not be clobbered, assign the background labels first so that the foreground labels can overwrite them
        # assign bg labels first so that positive labels can clobber them
        labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0  # anchors whose largest IoU is still below the threshold (0.3) are set to 0

    # fg label: for each gt, anchor with highest overlap
    labels[gt_argmax_overlaps] = 1  # for each gt_box, the anchor with the largest IoU is set to 1

    # fg label: above threshold IOU
    labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1  # anchors whose largest IoU reaches the threshold (0.7) are set to 1

    if cfg.TRAIN.RPN_CLOBBER_POSITIVES:  # if positives may be clobbered, assign the background labels last
        # assign bg labels last so that negative labels can clobber positives
        labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0  # anchors whose largest IoU is still below the threshold (0.3) are set to 0

    # subsample positive labels if we have too many
    num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)  # number of foreground anchors wanted in one training batch
    fg_inds = np.where(labels == 1)[0]  # anchors labeled as foreground
    if len(fg_inds) > num_fg:
        disable_inds = npr.choice(
            fg_inds, size=(len(fg_inds) - num_fg), replace=False)
        labels[disable_inds] = -1  # if there are more foreground anchors than needed, randomly discard some

    # subsample negative labels if we have too many
    num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1)  # number of background anchors wanted in one training batch
    bg_inds = np.where(labels == 0)[0]  # anchors labeled as background
    if len(bg_inds) > num_bg:
        disable_inds = npr.choice(
            bg_inds, size=(len(bg_inds) - num_bg), replace=False)
        labels[disable_inds] = -1  # if there are more background anchors than needed, randomly discard some

    # bbox_targets: The deltas (relative to anchors) that Faster R-CNN should
    # try to predict at each anchor
    # TODO: This "weights" business might be deprecated. Requires investigation
    # For each anchor, this yields the four coordinate transformation values (tx, ty, tw, th).
    bbox_targets = np.zeros((len(inds_inside), 4), dtype=np.float32)  # initialize the transformation values with zeros for every anchor inside the image
    bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])  # for each anchor, the four values that transform it to the gt_box it overlaps most

    bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)  # initialize inside_weights with zeros
    bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS)  # set the weights at foreground anchors; TRAIN.RPN_BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0)

    bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)  # initialize outside_weights with zeros
    if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:  # if RPN_POSITIVE_WEIGHT is negative,
        # uniform weighting of examples (given non-uniform sampling)
        num_examples = np.sum(labels >= 0)
        positive_weights = np.ones((1, 4)) * 1.0 / num_examples  # positive_weights and negative_weights are the same
        negative_weights = np.ones((1, 4)) * 1.0 / num_examples
    else:
        assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) &
                (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1))  # if RPN_POSITIVE_WEIGHT lies between 0 and 1,
        positive_weights = (cfg.TRAIN.RPN_POSITIVE_WEIGHT /
                            np.sum(labels == 1))
        negative_weights = ((1.0 - cfg.TRAIN.RPN_POSITIVE_WEIGHT) /
                            np.sum(labels == 0))  # positive_weights and negative_weights are set separately
    bbox_outside_weights[labels == 1, :] = positive_weights
    bbox_outside_weights[labels == 0, :] = negative_weights  # store positive_weights and negative_weights in bbox_outside_weights

    # map up to original set of anchors
    labels = _unmap(labels, total_anchors, inds_inside,
                    fill=-1)  # map the labels of the in-bounds anchors back onto all anchors (out-of-bounds anchors get label -1)
    bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside,
                          fill=0)  # map the bbox_targets of the in-bounds anchors back onto all anchors (out-of-bounds anchors get 0)
    bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside,
                                 fill=0)  # map the inside_weights of the in-bounds anchors back onto all anchors (out-of-bounds anchors get 0)
    bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside,
                                  fill=0)  # map the outside_weights of the in-bounds anchors back onto all anchors (out-of-bounds anchors get 0)

    # labels
    labels = labels.reshape((1, height, width, A)).transpose(0, 3, 1, 2)
    labels = labels.reshape((1, 1, A * height, width))  # reshape the anchor labels to [1,1,9*height,width]
    rpn_labels = labels

    # bbox_targets: reshape the anchor regression targets to [1,9*4,height,width]
    rpn_bbox_targets = bbox_targets.reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)

    # bbox_inside_weights: reshape the anchor inside_weights to [1,9*4,height,width]
    rpn_bbox_inside_weights = bbox_inside_weights.reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)

    # bbox_outside_weights: reshape the anchor outside_weights to [1,9*4,height,width]
    rpn_bbox_outside_weights = bbox_outside_weights.reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)

    return rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights  # return all the ground-truth values


def _unmap(data, count, inds, fill=0):  # maps values for the in-bounds anchors back onto the full set of generated anchors
    """ Unmap a subset of item (data) back to the original set of items (of
    size count) """
    if len(data.shape) == 1:
        ret = np.empty((count,), dtype=np.float32)
        ret.fill(fill)
        ret[inds] = data
    else:
        ret = np.empty((count,) + data.shape[1:], dtype=np.float32)
        ret.fill(fill)
        ret[inds, :] = data
    return ret


def _compute_targets(ex_rois, gt_rois):  # computes the regression targets between anchors and their matched gt_boxes
    """Compute bounding-box regression targets for an image."""

    assert ex_rois.shape[0] == gt_rois.shape[0]
    assert ex_rois.shape[1] == 4
    assert gt_rois.shape[1] == 5

    return bbox_transform(ex_rois, gt_rois[:, :4]).astype(np.float32, copy=False)

 

 
