[Faster R-CNN Close-Reading Series] Code Walkthrough and In-Depth Understanding of the Region Proposal Network

Latest recommended article published on 2024-07-08 21:22:29

Gerwels_JI

Category: DeepLearning. Tags: faster rcnn


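This article analyses the Faster R-CNN object detection model in depth. It covers the RPN, ROI pooling, and the classification and regression stages, and explains anchor generation, IoU computation, and the loss functions.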

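[Faster R-CNN Close-Reading Series]

(Suggested reading order)
1 [Faster R-CNN Close-Reading Series] What do we "learn" from the Faster R-CNN source code?
2 [Faster R-CNN Close-Reading Series] Code Walkthrough and In-Depth Understanding of the Region Proposal Network
3 [Faster R-CNN Close-Reading Series] Code Walkthrough and In-Depth Understanding of Anchors and Anchor Boxes
4 [Faster R-CNN Close-Reading Series] A Close Analysis of the Original Paper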

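1 Reading the train.prototxt file

The directory py-faster-rcnn/models/pascal_voc/VGG16/faster_rcnn_end2end contains a train.prototxt file that lays out the architecture of the whole paper. This article walks through the classic VGG16 model; the ZF version is also in the source code and is well worth reading. I covered this part in detail in "[Faster R-CNN Close-Reading Series] What do we "learn" from the Faster R-CNN source code?". The listing below repeats part of that analysis, and I recommend reading it alongside the original paper.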

# ============ RPN ==============

# rpn_conv/3x3
# First RPN step on the feature map coming from the backbone: a 3x3 convolution
layer {
  name: "rpn_conv/3x3"
  type: "Convolution"
  bottom: "conv5_3" # the RPN layers are attached after conv5_3
  top: "rpn/output"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 512
    kernel_size: 3 pad: 1 stride: 1 # convolution settings
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
# ReLU activation, which adds non-linearity
layer {
  name: "rpn_relu/3x3"
  type: "ReLU"
  bottom: "rpn/output"
  top: "rpn/output"
}
# Start of the cls and reg branches
# rpn_cls_score
layer {
  name: "rpn_cls_score"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_cls_score"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 18   # 2(bg/fg) * 9(anchors)
                     # 18 outputs: each anchor gets two scores, background and foreground
    kernel_size: 1 pad: 0 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
# rpn_bbox_pred
layer {
  name: "rpn_bbox_pred"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_bbox_pred"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 36   # 4 * 9(anchors)
                     # 36 outputs: 4 box-regression values per anchor
    kernel_size: 1 pad: 0 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}

layer {
   bottom: "rpn_cls_score"
   top: "rpn_cls_score_reshape"
   name: "rpn_cls_score_reshape"
   type: "Reshape"
   reshape_param { shape { dim: 0 dim: 2 dim: -1 dim: 0 } }
}

# rpn-data
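# This layer does a lot of work: it builds the training targets for the RPN losses.
# Anchors that cross the image border are discarded here. NMS over overlapping
# boxes is not done here; it happens later, in the 'proposal' layer.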
layer {
  name: 'rpn-data'
  type: 'Python'
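  # inputs
  bottom: 'rpn_cls_score'   # classification scores (used only for their H x W shape)
  bottom: 'gt_boxes'
  bottom: 'im_info'
  bottom: 'data'    # input image blob
  # outputs
  top: 'rpn_labels' # one label per anchor: 1 = foreground (IoU >= 0.7), 0 = background (IoU < 0.3), -1 = ignored
  top: 'rpn_bbox_targets'   # regression targets, using the parameterization from Section 3.1.2 of the paper
  top: 'rpn_bbox_inside_weights'
  top: 'rpn_bbox_outside_weights'   # the two weight blobs exist only for computing the loss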
  python_param {
    module: 'rpn.anchor_target_layer'
    layer: 'AnchorTargetLayer'
    param_str: "'feat_stride': 16"
  }
}
# rpn_loss_cls
# classification loss
layer {
  name: "rpn_loss_cls"
  type: "SoftmaxWithLoss"
  bottom: "rpn_cls_score_reshape"
  bottom: "rpn_labels"
  propagate_down: 1
  propagate_down: 0
  top: "rpn_cls_loss"
  loss_weight: 1
  loss_param {
    ignore_label: -1
    normalize: true
  }
}
# rpn_loss_bbox
# bounding-box regression loss
layer {
  name: "rpn_loss_bbox"
  type: "SmoothL1Loss"
  bottom: "rpn_bbox_pred"
  bottom: "rpn_bbox_targets"
  bottom: 'rpn_bbox_inside_weights'
  bottom: 'rpn_bbox_outside_weights'
  top: "rpn_loss_bbox"
  loss_weight: 1
  smooth_l1_loss_param { sigma: 3.0 }
}

#============ RoI Proposal ===============

layer {
  name: "rpn_cls_prob"
  type: "Softmax"
  bottom: "rpn_cls_score_reshape"
  top: "rpn_cls_prob"
}

layer {
  name: 'rpn_cls_prob_reshape'
  type: 'Reshape'
  bottom: 'rpn_cls_prob'
  top: 'rpn_cls_prob_reshape'
  reshape_param { shape { dim: 0 dim: 18 dim: -1 dim: 0 } }
}

layer {
  name: 'proposal'
  type: 'Python'
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'im_info' # three inputs: the reshaped class probabilities, the predicted box deltas, and the image info
  top: 'rpn_rois'   # the proposed regions (RoIs)
#  top: 'rpn_scores'
  python_param {
    module: 'rpn.proposal_layer'
    layer: 'ProposalLayer'
    param_str: "'feat_stride': 16"
  }
}

#layer {
#  name: 'debug-data'
#  type: 'Python'
#  bottom: 'data'
#  bottom: 'rpn_rois'
#  bottom: 'rpn_scores'
#  python_param {
#    module: 'rpn.debug_layer'
#    layer: 'RPNDebugLayer'
#  }
#}

layer {
  name: 'roi-data'
  type: 'Python'
  bottom: 'rpn_rois'
  bottom: 'gt_boxes'
  top: 'rois'
  top: 'labels'
  top: 'bbox_targets'
  top: 'bbox_inside_weights'
  top: 'bbox_outside_weights'
  python_param {
    module: 'rpn.proposal_target_layer'
    layer: 'ProposalTargetLayer'
    param_str: "'num_classes': 21"
  }
}
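
The two Reshape layers around the softmax are easy to misread. Below is a minimal NumPy sketch of what `dim: 0 dim: 2 dim: -1 dim: 0` and `dim: 0 dim: 18 dim: -1 dim: 0` do, assuming Caffe's NCHW blob layout. The feature-map size (H=4, W=5) and the random scores are made up for illustration and are not taken from the prototxt.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

A, H, W = 9, 4, 5                        # 9 anchors; H and W are illustrative
rng = np.random.default_rng(0)
rpn_cls_score = rng.normal(size=(1, 2 * A, H, W))   # (N, 18, H, W)

# rpn_cls_score_reshape: 0 -> copy N, 2, -1 -> inferred (9 * H), 0 -> copy W
score_reshape = rpn_cls_score.reshape(1, 2, A * H, W)

# rpn_cls_prob: softmax over axis 1, i.e. background vs. foreground
prob = softmax(score_reshape, axis=1)

# rpn_cls_prob_reshape: back to (N, 18, H, W) for the proposal layer
prob_reshape = prob.reshape(1, 2 * A, H, W)

# Channels 0..8 hold the background probabilities, channels 9..17 the foreground ones
bg, fg = prob_reshape[:, :A], prob_reshape[:, A:]
```

Under this layout, the background and foreground probabilities of the same anchor at the same cell sit 9 channels apart and sum to 1.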

2 Kevin Liang's Python code

https://github.com/kevinjliang/tf-Faster-RCNN

2.1 anchor_target_layer

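This layer generates the anchor boxes, filters some of them out by fixed rules, computes their IoU with the ground truth, and assigns each anchor a label (-1, 0, 1). These labels and the box targets then feed the classification and regression losses.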

# -*- coding: utf-8 -*-
"""
Created on Sun Jan  1 16:11:17 2017
@author: Kevin Liang (modifications)

Anchor Target Layer: Creates all the anchors in the final convolutional feature map, assigns anchors to ground truth boxes, and applies labels of "objectness"

Adapted from the official Faster R-CNN repo: 
https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/rpn/anchor_target_layer.py
"""
 
# --------------------------------------------------------
# Faster R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick and Sean Bell
# --------------------------------------------------------
 
import sys
sys.path.append('../')
 
import numpy as np
import numpy.random as npr
import tensorflow as tf
 
from Lib.bbox_overlaps import bbox_overlaps
from Lib.bbox_transform import bbox_transform
from Lib.faster_rcnn_config import cfg
from Lib.generate_anchors import generate_anchors
 
# Computes the ground truth for every anchor (foreground/background label and box offsets)
def anchor_target_layer(rpn_cls_score, gt_boxes, im_dims, _feat_stride, anchor_scales):
    '''
    Make Python version of _anchor_target_layer_py below Tensorflow compatible
    '''
    # Run _anchor_target_layer_py with the RPN class scores, the ground-truth boxes, the image size, the feature stride (how much smaller the feature map is than the image) and the anchor scales
    rpn_labels,rpn_bbox_targets,rpn_bbox_inside_weights,rpn_bbox_outside_weights = \
        tf.py_func(_anchor_target_layer_py, [rpn_cls_score, gt_boxes, im_dims, _feat_stride, anchor_scales],
                   [tf.float32, tf.float32, tf.float32, tf.float32])
 
    # Convert to tensors
    rpn_labels = tf.convert_to_tensor(tf.cast(rpn_labels,tf.int32), name = 'rpn_labels')
    rpn_bbox_targets = tf.convert_to_tensor(rpn_bbox_targets, name = 'rpn_bbox_targets')
    rpn_bbox_inside_weights = tf.convert_to_tensor(rpn_bbox_inside_weights , name = 'rpn_bbox_inside_weights')
    rpn_bbox_outside_weights = tf.convert_to_tensor(rpn_bbox_outside_weights , name = 'rpn_bbox_outside_weights')
 
    return rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights
 
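# For each cell of the (H, W) feature map, generate 9 anchor boxes; anchors that cross the image border are filtered out (their label ends up as -1)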
def _anchor_target_layer_py(rpn_cls_score, gt_boxes, im_dims, _feat_stride, anchor_scales):
    """
    Python version    
    
    Assign anchors to ground-truth targets. Produces anchor classification
    labels and bounding-box regression targets.
    
    # Algorithm:
    #
    # for each (H, W) location i
    #   generate 9 anchor boxes centered on cell i
    #   apply predicted bbox deltas at cell i to each of the 9 anchors
    # filter out-of-image anchors
    # measure GT overlap
    """
    im_dims = im_dims[0] # size of the input image, [height, width]
    _anchors = generate_anchors(scales=np.array(anchor_scales)) # the 9 base anchors, shape: [9, 4]
    _num_anchors = _anchors.shape[0] # _num_anchors is 9
    
    # allow boxes to sit over the edge by a small amount
    _allowed_border =  0 # anchors may not extend past the image border at all
    
    # Only minibatch of 1 supported; check that batch_size is 1
    assert rpn_cls_score.shape[0] == 1, \
        'Only single item batches are supported'    
    
    # map of shape (..., H, W)
    height, width = rpn_cls_score.shape[1:3] # H and W of the RPN output; the total number of anchors is H×W×9
    
    # 1. Generate proposals from bbox deltas and shifted anchors
    # Lay the anchors out over the original image
    shift_x = np.arange(0, width) * _feat_stride # shape: [width,]
    shift_y = np.arange(0, height) * _feat_stride # shape: [height,]
    shift_x, shift_y = np.meshgrid(shift_x, shift_y) # build the grid; shift_x and shift_y both have shape [height, width]
    shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                        shift_x.ravel(), shift_y.ravel())).transpose() # shape: [height*width, 4]
 
    # add A anchors (1, A, 4) to
    # cell K shifts (K, 1, 4) to get
    # shift anchors (K, A, 4)
    # reshape to (K*A, 4) shifted anchors
    A = _num_anchors # A = 9
    K = shifts.shape[0] # K = height*width (of the feature map)
    all_anchors = (_anchors.reshape((1, A, 4)) +
                   shifts.reshape((1, K, 4)).transpose((1, 0, 2))) # shape: [K, A, 4], all the anchors
    all_anchors = all_anchors.reshape((K * A, 4))
    total_anchors = int(K * A) # total number of anchors
    
    # anchors inside the image: inds_inside indexes the anchors that do not cross the image border
    inds_inside = np.where(
        (all_anchors[:, 0] >= -_allowed_border) &
        (all_anchors[:, 1] >= -_allowed_border) &
        (all_anchors[:, 2] < im_dims[1] + _allowed_border) &  # width
        (all_anchors[:, 3] < im_dims[0] + _allowed_border)    # height
    )[0]
    
    # keep only inside anchors
    anchors = all_anchors[inds_inside, :] # keep only the valid anchors, i.e. those fully inside the image
    
    # label: 1 is positive, 0 is negative, -1 is dont care
    labels = np.empty((len(inds_inside), ), dtype=np.float32) # one label per valid anchor
    labels.fill(-1) # initialise every label to -1
    
    # overlaps between the anchors and the gt boxes
    # overlaps (ex, gt)
    # IoU between every inside-image anchor and every gt box, shape: [len(anchors), len(gt_boxes)]
    overlaps = bbox_overlaps(
        np.ascontiguousarray(anchors, dtype=np.float),
        np.ascontiguousarray(gt_boxes, dtype=np.float))
    argmax_overlaps = overlaps.argmax(axis=1) # for each anchor, index of the gt_box it overlaps most. shape: [len(anchors),]
    max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps] # for each anchor, its largest IoU with any gt_box. shape: [len(anchors),]
    gt_argmax_overlaps = overlaps.argmax(axis=0) # for each gt_box, index of the anchor with the largest overlap. shape: [len(gt_boxes),]
    gt_max_overlaps = overlaps[gt_argmax_overlaps,
                               np.arange(overlaps.shape[1])] # for each gt_box, its largest IoU with any anchor. shape: [len(gt_boxes),]
    gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0] # indices of every anchor that ties for the best overlap with some gt_box (can be longer than len(gt_boxes))
    
    if not cfg.TRAIN.RPN_CLOBBER_POSITIVES: # if negatives must not overwrite positives, assign background first so the foreground assignments below can overwrite it
        # assign bg labels first so that positive labels can clobber them
        labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0 # anchors whose best IoU is still below the threshold (0.3) become background
 
    # fg label: for each gt, anchor with highest overlap
    labels[gt_argmax_overlaps] = 1 # the anchor(s) with the highest IoU for each gt_box become foreground
 
    # fg label: above threshold IOU
    labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1 # anchors whose best IoU reaches the threshold (0.7) become foreground
 
    if cfg.TRAIN.RPN_CLOBBER_POSITIVES: # if negatives may overwrite positives, assign background last
        # assign bg labels last so that negative labels can clobber positives
        labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0 # anchors whose best IoU is still below the threshold (0.3) become background
 
    # subsample positive labels if we have too many
    num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE) # maximum number of foreground anchors in one training batch
    fg_inds = np.where(labels == 1)[0] # anchors labelled foreground
    if len(fg_inds) > num_fg:
        disable_inds = npr.choice(
            fg_inds, size=(len(fg_inds) - num_fg), replace=False)
        labels[disable_inds] = -1 # too many foreground anchors: randomly drop the surplus
 
    # subsample negative labels if we have too many
    num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1) # number of background anchors needed in one training batch
    bg_inds = np.where(labels == 0)[0] # anchors labelled background
    if len(bg_inds) > num_bg:
        disable_inds = npr.choice(
            bg_inds, size=(len(bg_inds) - num_bg), replace=False)
        labels[disable_inds] = -1 # too many background anchors: randomly drop the surplus
 
    # bbox_targets: The deltas (relative to anchors) that Faster R-CNN should 
    # try to predict at each anchor
    # TODO: This "weights" business might be deprecated. Requires investigation
    # For every anchor, the four box-transform values (tx, ty, tw, th)
    bbox_targets = np.zeros((len(inds_inside), 4), dtype=np.float32) # zero-initialise the targets for every inside-image anchor
    bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :]) # for each anchor, the four values that map it onto the gt_box it overlaps most
 
    bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32) # zero-initialise inside_weights
    bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS) # set the weights at the foreground anchors
 
    bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32) # zero-initialise outside_weights
    if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0: # if RPN_POSITIVE_WEIGHT is negative,
        # uniform weighting of examples (given non-uniform sampling)
        num_examples = np.sum(labels >= 0)
        positive_weights = np.ones((1, 4)) * 1.0 / num_examples # positive_weights and negative_weights are identical
        negative_weights = np.ones((1, 4)) * 1.0 / num_examples
    else:
        assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) &
                (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1)) # if RPN_POSITIVE_WEIGHT lies between 0 and 1,
        positive_weights = (cfg.TRAIN.RPN_POSITIVE_WEIGHT /
                            np.sum(labels == 1))
        negative_weights = ((1.0 - cfg.TRAIN.RPN_POSITIVE_WEIGHT) /
                            np.sum(labels == 0)) # positive_weights and negative_weights are set separately
    bbox_outside_weights[labels == 1, :] = positive_weights
    bbox_outside_weights[labels == 0, :] = negative_weights # write positive_weights and negative_weights into bbox_outside_weights
 
    # map up to original set of anchors
    labels = _unmap(labels, total_anchors, inds_inside, fill=-1) # map the labels of the inside-image anchors back onto all anchors (out-of-image anchors get label -1)
    bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0) # map the bbox_targets back onto all anchors (out-of-image anchors get 0)
    bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, fill=0) # map the inside_weights back onto all anchors (out-of-image anchors get 0)
    bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, fill=0) # map the outside_weights back onto all anchors (out-of-image anchors get 0)
    
    # labels
    labels = labels.reshape((1, height, width, A)).transpose(0, 3, 1, 2)
    labels = labels.reshape((1, 1, A * height, width)) # reshape the anchor labels to [1, 1, 9*height, width]
    rpn_labels = labels
 
    # bbox_targets
    rpn_bbox_targets = bbox_targets.reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2) # reshape the box targets to [1, 9*4, height, width]
    
    # bbox_inside_weights
    rpn_bbox_inside_weights = bbox_inside_weights.reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2) # reshape the inside_weights to [1, 9*4, height, width]
 
    # bbox_outside_weights
    rpn_bbox_outside_weights = bbox_outside_weights.reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2) # reshape the outside_weights to [1, 9*4, height, width]
 
    return rpn_labels,rpn_bbox_targets,rpn_bbox_inside_weights,rpn_bbox_outside_weights # return all the ground-truth values
    
 
def _unmap(data, count, inds, fill=0): # maps values for the inside-image anchors back onto the full set of generated anchors
    """ Unmap a subset of item (data) back to the original set of items (of
    size count) """
    if len(data.shape) == 1:
        ret = np.empty((count, ), dtype=np.float32)
        ret.fill(fill)
        ret[inds] = data
    else:
        ret = np.empty((count, ) + data.shape[1:], dtype=np.float32)
        ret.fill(fill)
        ret[inds, :] = data
    return ret
 
def _compute_targets(ex_rois, gt_rois): # computes the box transform from each anchor to its matched gt_box
    """Compute bounding-box regression targets for an image."""
 
    assert ex_rois.shape[0] == gt_rois.shape[0]
    assert ex_rois.shape[1] == 4
    assert gt_rois.shape[1] == 5
 
    return bbox_transform(ex_rois, gt_rois[:, :4]).astype(np.float32, copy=False)

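anchor_target_layer mostly just calls _anchor_target_layer_py and then converts the outputs into tensors.

Let us look at _anchor_target_layer_py in detail. It first calls generate_anchors to produce the 9 base anchors, then places them at the image position that corresponds to each sliding-window position on the shared feature map. The result is all_anchors.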
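
The broadcasting step that builds all_anchors can be sketched on a tiny grid. This is a minimal illustration, assuming a 2×3 feature map and three hypothetical base boxes standing in for the output of generate_anchors; only `feat_stride = 16` comes from the source.

```python
import numpy as np

feat_stride = 16
height, width = 2, 3                     # illustrative feature-map size

# Stand-in for generate_anchors(): three hypothetical base boxes (x1, y1, x2, y2)
base_anchors = np.array([[-8, -8, 23, 23],
                         [-24, -24, 39, 39],
                         [-56, -56, 71, 71]], dtype=np.float32)

# One (x, y, x, y) shift per feature-map cell, in image pixels
shift_x, shift_y = np.meshgrid(np.arange(width) * feat_stride,
                               np.arange(height) * feat_stride)
shifts = np.stack([shift_x.ravel(), shift_y.ravel(),
                   shift_x.ravel(), shift_y.ravel()], axis=1)   # (K, 4)

A, K = base_anchors.shape[0], shifts.shape[0]
# (1, A, 4) + (K, 1, 4) broadcasts to (K, A, 4): every base anchor at every cell
all_anchors = (base_anchors[None, :, :] + shifts[:, None, :]).reshape(K * A, 4)
```

The rows are ordered cell by cell, with the A anchors of one cell stored consecutively. With 9 base anchors and a feature map of roughly 38×63 cells (about what a 600×1000 input gives at stride 16), the same code would produce about 21,500 anchors.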

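Next it discards every anchor that crosses the image border, which gives anchors; all later steps work only on these inside-image anchors. bbox_overlaps then computes the IoU between every inside-image anchor and every ground-truth box. Anchors whose best IoU lies between 0.3 and 0.7 are ignored (their label stays -1), except for the anchor with the highest IoU for each ground-truth box, which is always labelled positive. Finally, the foreground and background anchors are subsampled to the numbers needed for training.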
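
The labelling rules can be sketched on a hand-made IoU matrix. This is a simplified illustration, assuming the 0.7 and 0.3 thresholds quoted in the code comments and the branch where RPN_CLOBBER_POSITIVES is false; `label_anchors` is a hypothetical helper, not a function from the repository, and subsampling is left out.

```python
import numpy as np

def label_anchors(overlaps, pos_thresh=0.7, neg_thresh=0.3):
    """overlaps: (num_anchors, num_gt) IoU matrix. Returns labels in {-1, 0, 1}."""
    labels = np.full(overlaps.shape[0], -1, dtype=np.float32)
    max_overlaps = overlaps.max(axis=1)          # best IoU per anchor
    gt_max_overlaps = overlaps.max(axis=0)       # best IoU per ground-truth box
    # Background first, so the positive rules below can overwrite it
    labels[max_overlaps < neg_thresh] = 0
    # Rule 1: anchors tying for the best overlap with some gt box are positive
    labels[np.where(overlaps == gt_max_overlaps)[0]] = 1
    # Rule 2: any anchor whose best IoU reaches the positive threshold is positive
    labels[max_overlaps >= pos_thresh] = 1
    return labels

overlaps = np.array([[0.80, 0.10],    # clearly matches gt 0
                     [0.50, 0.20],    # in between: ignored
                     [0.10, 0.25],    # best match for gt 1, despite the low IoU
                     [0.05, 0.00]])   # background
labels = label_anchors(overlaps)
```

The third anchor shows why rule 1 exists: its IoU is below 0.3, but without it the second ground-truth box would have no positive anchor at all.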

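Then _compute_targets computes the box-transform values (tx, ty, tw, th) for every anchor and stores them in bbox_targets. bbox_inside_weights and bbox_outside_weights are computed next; these two arrays play a key role when training the anchor box regression. Last, _unmap maps the values for the inside-image anchors back onto the full set of anchors.

In short, anchor_target_layer exists to produce two things:

First, the class label of every anchor box generated for an image. Training needs a certain number of positive samples (foreground, label=1) and negative samples (background, label=0); all the remaining anchors are set to -1, which means they are ignored during training.
Second, the regression targets for every anchor box. Only foreground anchors (label=1) contribute to the box-regression loss, which is what bbox_inside_weights and bbox_outside_weights implement: bbox_inside_weights is non-zero only for foreground anchors, and anchors that are neither foreground nor background have both weights set to 0.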
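
How the two weight arrays select the anchors that count can be sketched as follows. This is a minimal example, assuming RPN_BBOX_INSIDE_WEIGHTS is (1, 1, 1, 1) and the uniform-weighting branch (RPN_POSITIVE_WEIGHT < 0); both config values and the label vector are assumptions made for illustration.

```python
import numpy as np

labels = np.array([1, -1, 0, 0, 1], dtype=np.float32)   # illustrative labels
inside_w = np.zeros((labels.size, 4), dtype=np.float32)
outside_w = np.zeros((labels.size, 4), dtype=np.float32)

inside_w[labels == 1] = 1.0                  # assumes RPN_BBOX_INSIDE_WEIGHTS = (1, 1, 1, 1)
num_examples = np.sum(labels >= 0)           # sampled anchors: foreground + background
outside_w[labels >= 0] = 1.0 / num_examples  # uniform branch (RPN_POSITIVE_WEIGHT < 0)

# An anchor contributes to the box loss only if both of its weights are non-zero
contributes = (inside_w != 0) & (outside_w != 0)
```

Background anchors get a non-zero outside weight, but their inside weight is 0, so the difference fed into smooth L1 is zeroed out and they add nothing to the regression loss.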

2.2 faster_rcnn_networks.py

Next we move to faster_rcnn_networks.py, which defines the rpn class, the roi_proposal class, and the fast_rcnn class.

# -*- coding: utf-8 -*-
"""
Created on Fri Dec 30 16:14:48 2016
@author: Kevin Liang
Faster R-CNN detection and classification networks.
Contains the Region Proposal Network (RPN), ROI proposal layer, and the RCNN.
TODO: -Split off these three networks into their own files OR add to Layers
"""
import sys
 
sys.path.append('../')
 
from Lib.TensorBase.tensorbase.base import Layers
 
from Lib.faster_rcnn_config import cfg
from Lib.loss_functions import rpn_cls_loss, rpn_bbox_loss, fast_rcnn_cls_loss, fast_rcnn_bbox_loss
from Lib.roi_pool import roi_pool
from Lib.rpn_softmax import rpn_softmax
from Networks.anchor_target_layer import anchor_target_layer
from Networks.proposal_layer import proposal_layer
from Networks.proposal_target_layer import proposal_target_layer
 
import tensorflow as tf

# The region proposal network class
class rpn:
    '''
    Region Proposal Network (RPN): From the convolutional feature maps
    (TensorBase Layers object) of the last layer, generate bounding boxes
    relative to anchor boxes and give an "objectness" score to each
    In evaluation mode (eval_mode==True), gt_boxes should be None.
    '''
 
    def __init__(self, featureMaps, gt_boxes, im_dims, _feat_stride, eval_mode):
        self.featureMaps = featureMaps # shared feature maps
        self.gt_boxes = gt_boxes # ground truth, shape: [None, 5]: top-left and bottom-right corners plus the class
        self.im_dims = im_dims # image size, shape: [None, 2]: image height and width
        self._feat_stride = _feat_stride # stride of the feature map relative to the input image
        self.anchor_scales = cfg.RPN_ANCHOR_SCALES # anchor scales [8, 16, 32]
        self.eval_mode = eval_mode # True for evaluation, False for training
        self._network() # build the network
 
    def _network(self):
        # There shouldn't be any gt_boxes if in evaluation mode
        if self.eval_mode is True: # no ground truth is available in evaluation mode
            assert self.gt_boxes is None, \
                'Evaluation mode should not have ground truth boxes (or else what are you detecting for?)'
 
        _num_anchors = len(self.anchor_scales)*3 # _num_anchors is 9 (3 scales × 3 ratios): 9 anchors per sliding position
 
        rpn_layers = Layers(self.featureMaps) # wrap the shared features in a Layers object
 
        with tf.variable_scope('rpn'):
            # Spatial windowing
            for i in range(len(cfg.RPN_OUTPUT_CHANNELS)): # first a 3×3 convolution with 512 output channels
                rpn_layers.conv2d(filter_size=cfg.RPN_FILTER_SIZES[i], output_channels=cfg.RPN_OUTPUT_CHANNELS[i])
                
            features = rpn_layers.get_output()
 
            with tf.variable_scope('cls'):
                # Box-classification layer (objectness)
                self.rpn_bbox_cls_layers = Layers(features) # 1×1 convolution with 18 (9×2) output channels
                self.rpn_bbox_cls_layers.conv2d(filter_size=1, output_channels=_num_anchors*2, activation_fn=None)
 
            with tf.variable_scope('target'): # compute the target for every anchor
                # Only calculate targets in train mode. No ground truth boxes in evaluation mode
                if self.eval_mode is False:
                    # Anchor Target Layer (anchors and deltas)
                    rpn_cls_score = self.rpn_bbox_cls_layers.get_output()
                    self.rpn_labels, self.rpn_bbox_targets, self.rpn_bbox_inside_weights, self.rpn_bbox_outside_weights = \
                        anchor_target_layer(rpn_cls_score=rpn_cls_score, gt_boxes=self.gt_boxes, im_dims=self.im_dims,
                                            _feat_stride=self._feat_stride, anchor_scales=self.anchor_scales)
 
            with tf.variable_scope('bbox'): # 1×1 convolution with 36 (9×4) output channels
                # Bounding-Box regression layer (bounding box predictions)
                self.rpn_bbox_pred_layers = Layers(features)
                self.rpn_bbox_pred_layers.conv2d(filter_size=1, output_channels=_num_anchors*4, activation_fn=None)
 
    # Get functions
    def get_rpn_cls_score(self): # foreground/background scores predicted by the RPN for each anchor
        return self.rpn_bbox_cls_layers.get_output()
 
    def get_rpn_labels(self): # ground-truth foreground/background label of each anchor
        assert self.eval_mode is False, 'No RPN labels without ground truth boxes'
        return self.rpn_labels
 
    def get_rpn_bbox_pred(self): # the four offsets predicted by the RPN for each anchor
        return self.rpn_bbox_pred_layers.get_output()
 
    def get_rpn_bbox_targets(self): # the four ground-truth offsets for each anchor
        assert self.eval_mode is False, 'No RPN bounding box targets without ground truth boxes'
        return self.rpn_bbox_targets
 
    def get_rpn_bbox_inside_weights(self): # used in the box-regression loss; non-zero only for foreground anchors
        assert self.eval_mode is False, 'No RPN inside weights without ground truth boxes'
        return self.rpn_bbox_inside_weights
 
    def get_rpn_bbox_outside_weights(self): # used in the box-regression loss; non-zero only for sampled foreground and background anchors
        assert self.eval_mode is False, 'No RPN outside weights without ground truth boxes'
        return self.rpn_bbox_outside_weights
 
    # Loss functions
    def get_rpn_cls_loss(self): # RPN classification loss
        assert self.eval_mode is False, 'No RPN cls loss without ground truth boxes'
        rpn_cls_score = self.get_rpn_cls_score()
        rpn_labels = self.get_rpn_labels()
        return rpn_cls_loss(rpn_cls_score, rpn_labels)
 
    def get_rpn_bbox_loss(self): # RPN box-regression loss; note that it uses the inside and outside weights
        assert self.eval_mode is False, 'No RPN bbox loss without ground truth boxes'
        rpn_bbox_pred = self.get_rpn_bbox_pred()
        rpn_bbox_targets = self.get_rpn_bbox_targets()
        rpn_bbox_inside_weights = self.get_rpn_bbox_inside_weights()
        rpn_bbox_outside_weights = self.get_rpn_bbox_outside_weights()
        return rpn_bbox_loss(rpn_bbox_pred, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights)

class roi_proposal:
    '''
    Propose highest scoring boxes to the RCNN classifier
    In evaluation mode (eval_mode==True), gt_boxes should be None.
    '''

    def __init__(self, rpn_net, gt_boxes, im_dims, eval_mode):
        self.rpn_net = rpn_net
        self.rpn_cls_score = rpn_net.get_rpn_cls_score()
        self.rpn_bbox_pred = rpn_net.get_rpn_bbox_pred()
        self.gt_boxes = gt_boxes
        self.im_dims = im_dims
        self.num_classes = cfg.NUM_CLASSES
        self.anchor_scales = cfg.RPN_ANCHOR_SCALES
        self.eval_mode = eval_mode
        
        self._network()

    def _network(self):
        # There shouldn't be any gt_boxes if in evaluation mode
        if self.eval_mode is True:
            assert self.gt_boxes is None, \
                'Evaluation mode should not have ground truth boxes (or else what are you detecting for?)'

        with tf.variable_scope('roi_proposal'):
            # Convert scores to probabilities
            self.rpn_cls_prob = rpn_softmax(self.rpn_cls_score)
    
            # Determine best proposals
            key = 'TRAIN' if self.eval_mode is False else 'TEST'
            self.blobs = proposal_layer(rpn_bbox_cls_prob=self.rpn_cls_prob, rpn_bbox_pred=self.rpn_bbox_pred,
                                        im_dims=self.im_dims, cfg_key=key, _feat_stride=self.rpn_net._feat_stride,
                                        anchor_scales=self.anchor_scales)
    
            if self.eval_mode is False:
                # Calculate targets for proposals
                self.rois, self.labels, self.bbox_targets, self.bbox_inside_weights, self.bbox_outside_weights = \
                    proposal_target_layer(rpn_rois=self.blobs, gt_boxes=self.gt_boxes,
                                          _num_classes=self.num_classes)

    def get_rois(self):
        return self.rois if self.eval_mode is False else self.blobs

    def get_labels(self):
        assert self.eval_mode is False, 'No labels without ground truth boxes'
        return self.labels

    def get_bbox_targets(self):
        assert self.eval_mode is False, 'No bounding box targets without ground truth boxes'
        return self.bbox_targets

    def get_bbox_inside_weights(self):
        assert self.eval_mode is False, 'No RPN inside weights without ground truth boxes'
        return self.bbox_inside_weights

    def get_bbox_outside_weights(self):
        assert self.eval_mode is False, 'No RPN outside weights without ground truth boxes'
        return self.bbox_outside_weights


class fast_rcnn:
    '''
    Crop and resize areas from the feature-extracting CNN's feature maps
    according to the ROIs generated from the ROI proposal layer
    '''

    def __init__(self, featureMaps, roi_proposal_net, eval_mode):
        self.featureMaps = featureMaps
        self.roi_proposal_net = roi_proposal_net
        self.rois = roi_proposal_net.get_rois()
        self.im_dims = roi_proposal_net.im_dims
        self.num_classes = cfg.NUM_CLASSES
        self.eval_mode = eval_mode
        
        self._network()

    def _network(self):
        with tf.variable_scope('fast_rcnn'):
            # No dropout in evaluation mode
            keep_prob = cfg.FRCNN_DROPOUT_KEEP_RATE if self.eval_mode is False else 1.0

            # ROI pooling
            pooledFeatures = roi_pool(self.featureMaps, self.rois, self.im_dims)

            # Fully Connect layers (with dropout)
            with tf.variable_scope('fc'):
                self.rcnn_fc_layers = Layers(pooledFeatures)
                self.rcnn_fc_layers.flatten()
                for i in range(len(cfg.FRCNN_FC_HIDDEN)):
                    self.rcnn_fc_layers.fc(output_nodes=cfg.FRCNN_FC_HIDDEN[i], keep_prob=keep_prob)

                hidden = self.rcnn_fc_layers.get_output()

            # Classifier score
            with tf.variable_scope('cls'):
                self.rcnn_cls_layers = Layers(hidden)
                self.rcnn_cls_layers.fc(output_nodes=self.num_classes, activation_fn=None)

            # Bounding Box refinement
            with tf.variable_scope('bbox'):
                self.rcnn_bbox_layers = Layers(hidden)
                self.rcnn_bbox_layers.fc(output_nodes=self.num_classes*4, activation_fn=None)

    # Get functions
    def get_cls_score(self):
        return self.rcnn_cls_layers.get_output()

    def get_cls_prob(self):
        logits = self.get_cls_score()
        return tf.nn.softmax(logits)

    def get_bbox_refinement(self):
        return self.rcnn_bbox_layers.get_output()

    # Loss functions
    def get_fast_rcnn_cls_loss(self):
        assert self.eval_mode is False, 'No Fast RCNN cls loss without ground truth boxes'
        fast_rcnn_cls_score = self.get_cls_score()
        labels = self.roi_proposal_net.get_labels()
        return fast_rcnn_cls_loss(fast_rcnn_cls_score, labels)

    def get_fast_rcnn_bbox_loss(self):
        assert self.eval_mode is False, 'No Fast RCNN bbox loss without ground truth boxes'
        fast_rcnn_bbox_pred = self.get_bbox_refinement()
        bbox_targets = self.roi_proposal_net.get_bbox_targets()
        roi_inside_weights = self.roi_proposal_net.get_bbox_inside_weights()
        roi_outside_weights = self.roi_proposal_net.get_bbox_outside_weights()
        return fast_rcnn_bbox_loss(fast_rcnn_bbox_pred, bbox_targets, roi_inside_weights, roi_outside_weights)

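As the code shows, the rpn class has two main jobs during training:

get_rpn_cls_loss, which computes the RPN classification loss;
get_rpn_bbox_loss, which computes the RPN anchor box-regression loss.
The hardest part of computing these two losses is obtaining the ground truth to compare against. That ground truth is produced by the anchor_target_layer function described above.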

2.3 anchor_target_layer

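anchor_target_layer relies on several important helper functions. The first is generate_anchors, whose job is to produce the 9 base anchors, covering 3 aspect ratios and 3 areas. See my article on this topic:
[Faster R-CNN Close-Reading Series] Code Walkthrough and In-Depth Understanding of Anchors and Anchor Boxes
The idea is to start from a single reference anchor, derive from it three anchors with different aspect ratios but the same area, and then scale each of those to three different sizes, which gives 9 anchors in total.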
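
The ratio-then-scale construction can be sketched as follows. This is a simplified version, not the repository's generate_anchors: it skips the integer rounding the original applies, and `base_size=16` and `ratios=(0.5, 1, 2)` are assumed defaults (only the scales [8, 16, 32] appear in the code above). `simple_anchors` is a hypothetical name.

```python
import numpy as np

def simple_anchors(base_size=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Return (len(ratios) * len(scales), 4) anchors as (x1, y1, x2, y2).

    Simplified: no integer rounding, so values differ slightly from the repo's.
    """
    ctr = (base_size - 1) / 2.0               # centre of the reference box
    anchors = []
    for r in ratios:                          # r = height / width
        for s in scales:
            side = base_size * s              # side of the square with the target area
            w, h = side / np.sqrt(r), side * np.sqrt(r)   # same area, h / w = r
            anchors.append([ctr - (w - 1) / 2, ctr - (h - 1) / 2,
                            ctr + (w - 1) / 2, ctr + (h - 1) / 2])
    return np.array(anchors)

anchors = simple_anchors()
widths = anchors[:, 2] - anchors[:, 0] + 1
heights = anchors[:, 3] - anchors[:, 1] + 1
```

All nine boxes share the same centre; within one ratio the area grows with the scale, and within one scale the three ratios keep the area fixed.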

2.4 bbox_overlaps & bbox_transform

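bbox_overlaps: computes the IoU (intersection over union) between every anchor and every ground-truth box.
bbox_transform: computes the box-transform values of each anchor; to do so, it first converts the boxes to centre-coordinate, width and height form.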
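
Both helpers are short enough to sketch in plain NumPy. The versions below are illustrative stand-ins under the usual pixel-inclusive (+1) box convention, not the repository's implementations (the original bbox_overlaps is compiled code); `iou_matrix` and `box_deltas` are hypothetical names.

```python
import numpy as np

def iou_matrix(boxes, gt):
    """IoU between every box (N, 4) and every gt box (M, 4), pixel-inclusive (+1)."""
    area_b = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    area_g = (gt[:, 2] - gt[:, 0] + 1) * (gt[:, 3] - gt[:, 1] + 1)
    iw = (np.minimum(boxes[:, None, 2], gt[None, :, 2])
          - np.maximum(boxes[:, None, 0], gt[None, :, 0]) + 1)
    ih = (np.minimum(boxes[:, None, 3], gt[None, :, 3])
          - np.maximum(boxes[:, None, 1], gt[None, :, 1]) + 1)
    inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)
    return inter / (area_b[:, None] + area_g[None, :] - inter)

def box_deltas(anchors, gt):
    """Regression targets (tx, ty, tw, th) mapping each anchor to its gt box."""
    aw = anchors[:, 2] - anchors[:, 0] + 1.0
    ah = anchors[:, 3] - anchors[:, 1] + 1.0
    ax = anchors[:, 0] + 0.5 * aw              # anchor centre
    ay = anchors[:, 1] + 0.5 * ah
    gw = gt[:, 2] - gt[:, 0] + 1.0
    gh = gt[:, 3] - gt[:, 1] + 1.0
    gx = gt[:, 0] + 0.5 * gw                   # gt centre
    gy = gt[:, 1] + 0.5 * gh
    return np.stack([(gx - ax) / aw, (gy - ay) / ah,
                     np.log(gw / aw), np.log(gh / ah)], axis=1)

anchors = np.array([[0.0, 0.0, 9.0, 9.0], [0.0, 0.0, 9.0, 9.0]])
gt = np.array([[0.0, 0.0, 9.0, 9.0], [5.0, 0.0, 14.0, 9.0]])
overlaps = iou_matrix(anchors, gt)     # (2, 2) IoU matrix
deltas = box_deltas(anchors, gt)       # row i maps anchors[i] onto gt[i]
```

A box matched with itself has IoU 1 and all-zero deltas; shifting a 10-pixel-wide box right by half its width halves the intersection (IoU 1/3) and gives tx = 0.5.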

2.5 The RPN loss functions

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Tue Jan 17 15:05:05 2017
@author: Kevin Liang
Loss functions
"""
 
from .faster_rcnn_config import cfg
 
import tensorflow as tf
 
 
def rpn_cls_loss(rpn_cls_score,rpn_labels):
    '''
    Calculate the Region Proposal Network classifier loss. Measures how well 
    the RPN is able to propose regions by the performance of its "objectness" 
    classifier.
    
    Standard cross-entropy loss on logits
    '''
    with tf.variable_scope('rpn_cls_loss'):
        # input shape dimensions
        shape = tf.shape(rpn_cls_score)
        
        # Stack all classification scores into 2D matrix
        rpn_cls_score = tf.transpose(rpn_cls_score,[0,3,1,2])
        rpn_cls_score = tf.reshape(rpn_cls_score,[shape[0],2,shape[3]//2*shape[1],shape[2]])
        rpn_cls_score = tf.transpose(rpn_cls_score,[0,2,3,1])
        rpn_cls_score = tf.reshape(rpn_cls_score,[-1,2])
        
        # Stack labels
        rpn_labels = tf.reshape(rpn_labels,[-1]) # flatten the labels into a 1-D vector
        
        # Ignore label=-1 (Neither object nor background: IoU between 0.3 and 0.7)
        # Drop the scores at positions where the label is -1, and reshape to [-1, 2] for the cross-entropy loss
        rpn_cls_score = tf.reshape(tf.gather(rpn_cls_score,tf.where(tf.not_equal(rpn_labels,-1))),[-1,2])
        # Keep only the labels that are not -1, i.e. the sampled foreground (1) and background (0) anchors
        rpn_labels = tf.reshape(tf.gather(rpn_labels,tf.where(tf.not_equal(rpn_labels,-1))),[-1]) 
        
        # Cross entropy error: compute the cross-entropy loss
        rpn_cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=rpn_cls_score, labels=rpn_labels))
    
    return rpn_cross_entropy
    
    
def rpn_bbox_loss(rpn_bbox_pred, rpn_bbox_targets, rpn_inside_weights, rpn_outside_weights):
    '''
    Calculate the Region Proposal Network bounding box loss. Measures how well 
    the RPN is able to propose regions by the performance of its localization.
    lam/N_reg * sum_i(p_i^* * L_reg(t_i,t_i^*))
    lam: classification vs bbox loss balance parameter     
    N_reg: Number of anchor locations (~2500)
    p_i^*: ground truth label for anchor (loss only for positive anchors)
    L_reg: smoothL1 loss
    t_i: Parameterized prediction of bounding box
    t_i^*: Parameterized ground truth of closest bounding box
    '''    
    with tf.variable_scope('rpn_bbox_loss'):
        # Transposing
        rpn_bbox_targets = tf.transpose(rpn_bbox_targets, [0,2,3,1])
        rpn_inside_weights = tf.transpose(rpn_inside_weights, [0,2,3,1])
        rpn_outside_weights = tf.transpose(rpn_outside_weights, [0,2,3,1])
        
        # How far off was the prediction?
        # Subtract the targets from the predicted tx, ty, tw, th and multiply by rpn_inside_weights, so that only positive anchors contribute to the bbox loss
        diff = tf.multiply(rpn_inside_weights, rpn_bbox_pred - rpn_bbox_targets)
        # Compute the smooth L1 values
        diff_sL1 = smoothL1(diff, 3.0)
        
        # Only count loss for positive anchors. Make sure it's a sum.
        # Multiply by rpn_outside_weights and sum; the loss is still non-zero only for positive anchors
 
        rpn_bbox_reg = tf.reduce_sum(tf.multiply(rpn_outside_weights, diff_sL1))
    
        # Constant for weighting bounding box loss with classification loss
        # Scale the box loss by the lambda parameter to get the final box loss
        rpn_bbox_reg = cfg.TRAIN.RPN_BBOX_LAMBDA * rpn_bbox_reg
    
    return rpn_bbox_reg

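Important!

When rpn_cls_loss is computed, entries whose label is -1 are dropped. In other words, only anchors labelled 1 or 0 are kept: inside-image anchors whose largest IoU with a ground-truth box is at least 0.7 (or that are the best match for some ground-truth box) or below 0.3, after subsampling.
When rpn_bbox_loss is computed, the initial multiplication by rpn_inside_weights means that only foreground anchors contribute to the bbox loss, because rpn_inside_weights is 0 for every other anchor. In other words, the RPN's bbox regression is trained only on positively labelled anchors.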
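
The smoothL1 helper called above is not shown in this excerpt. Below is a NumPy sketch of the weighted box loss, assuming the sigma-parameterised smooth L1 used by py-faster-rcnn's SmoothL1Loss layer (quadratic for |x| < 1/sigma², linear beyond that); the inputs are made up, and the repository's own helper may differ in detail.

```python
import numpy as np

def smooth_l1(x, sigma=3.0):
    """0.5 * (sigma * x)^2 if |x| < 1 / sigma^2, else |x| - 0.5 / sigma^2."""
    sigma2 = sigma ** 2
    ax = np.abs(x)
    return np.where(ax < 1.0 / sigma2, 0.5 * sigma2 * x ** 2, ax - 0.5 / sigma2)

def weighted_bbox_loss(pred, target, inside_w, outside_w, sigma=3.0):
    """sum(outside_w * smooth_l1(inside_w * (pred - target)))."""
    return np.sum(outside_w * smooth_l1(inside_w * (pred - target), sigma))

pred = np.array([[0.1, 0.0, 0.0, 0.0], [5.0, 5.0, 5.0, 5.0]])
target = np.zeros((2, 4))
inside_w = np.array([[1.0] * 4, [0.0] * 4])   # row 0: foreground, row 1: not foreground
outside_w = np.full((2, 4), 0.5)
loss = weighted_bbox_loss(pred, target, inside_w, outside_w)
```

The second row has a large prediction error, but its inside weight of 0 removes it before smooth L1 is applied, so only the foreground row contributes to the loss.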

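3 Summary

How to generate the H×W×9 anchors: first generate 9 base anchors with different ratios and scales, then slide them over every cell of the feature map (512-d for VGG16, 256-d for ZF) to obtain the anchor boxes.
How to compute each anchor's class (foreground or background) and box-transform values: first compute the IoU of each anchor with the ground-truth boxes and ignore anchors whose IoU is between 0.3 and 0.7; anchors below 0.3 are background and anchors above 0.7 are foreground. The box-transform values are the four values tx, ty, tw, th between the anchor and the ground-truth box with which it has the largest IoU.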