Killer questions about anchors in deep object detection networks (with annotated code) (Part 2)

Latest recommended article published on 2024-01-26 17:33:16

恋蛩音


Article link: https://blog.csdn.net/qq_17846375/article/details/98039046


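Are all of the generated anchors used in the computation? How are the anchors used to compute proposals selected?

How are anchors used to compute proposals (classification and bounding-box regression)?

How is bounding-box regression done from the foreground anchors and the GT?

Which layers should anchors be placed on? (What factors influence the choice of anchors?)

What advantages do anchors have over earlier techniques? Why are they so widely used?

Are all of the generated anchors used in the computation? How are the anchors used to compute proposals selected?

First, let's follow the code to see what happens after the anchors are obtained, using the RPN from a TensorFlow implementation as the example: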

    im_dims = im_dims[0]  # size of the original image, [height, width]
    _anchors = generate_anchors(scales=np.array(anchor_scales))  # the 9 base anchors, shape: [9, 4]
    _num_anchors = _anchors.shape[0]  # _num_anchors is 9
    
    # allow boxes to sit over the edge by a small amount
    _allowed_border =  0  # anchors may not extend past the image boundary at all
    
    # Only minibatch of 1 supported: check that batch_size is 1
    assert rpn_cls_score.shape[0] == 1, \
        'Only single item batches are supported'    
    
    # rpn_cls_score is indexed as (1, H, W, ...) here
    height, width = rpn_cls_score.shape[1:3]  # H and W of the RPN output; the total number of anchors is H*W*9
    
    # 1. Generate proposals from bbox deltas and shifted anchors
    # Lay the anchors out on the original image
    shift_x = np.arange(0, width) * _feat_stride  # shape: [width,]
    shift_y = np.arange(0, height) * _feat_stride  # shape: [height,]
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)  # build the grid; shift_x and shift_y both have shape [height, width]
    shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                        shift_x.ravel(), shift_y.ravel())).transpose()  # shape [height*width, 4]
 
    # add A anchors (1, A, 4) to
    # cell K shifts (K, 1, 4) to get
    # shift anchors (K, A, 4)
    # reshape to (K*A, 4) shifted anchors
    A = _num_anchors  # A = 9
    K = shifts.shape[0]  # K = height * width (of the feature map)
    all_anchors = (_anchors.reshape((1, A, 4)) +
                   shifts.reshape((1, K, 4)).transpose((1, 0, 2)))  # shape [K, A, 4]: all anchors
    all_anchors = all_anchors.reshape((K * A, 4))
    total_anchors = int(K * A)  # total number of anchors

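I cannot do justice to every detail of the code, so here is the overall flow. First, the full set of anchors is computed and stored in all_anchors. This set is large (H×W×9, typically well over ten thousand for an ordinary image), and using every anchor would cost a great deal of computation, so the authors filter them as follows:

First, anchors that cross the image boundary are removed. Dropping them from the analysis already cuts the set down.

    # anchors inside the image: inds_inside indexes the anchors that do not cross the image boundary
    inds_inside = np.where(
        (all_anchors[:, 0] >= -_allowed_border) &
        (all_anchors[:, 1] >= -_allowed_border) &
        (all_anchors[:, 2] < im_dims[1] + _allowed_border) &  # width
        (all_anchors[:, 3] < im_dims[0] + _allowed_border)    # height
    )[0]
    
    # keep only inside anchors
    anchors = all_anchors[inds_inside, :]  # the valid anchors, i.e. those fully inside the image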
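
To make the tiling and the boundary filter concrete, here is a minimal NumPy sketch on a toy grid. The three base anchors, the 4x5 feature map and the 64x80 image are made-up values for illustration only (the real code gets its 9 base anchors from generate_anchors), and the helper names shift_anchors and inside_image are hypothetical. Broadcasting is used instead of the reshape/transpose pair, which should give the same (K*A, 4) layout.

```python
import numpy as np

def shift_anchors(base_anchors, feat_h, feat_w, feat_stride):
    """Tile base anchors (A, 4) over a feat_h x feat_w grid, giving (K*A, 4)."""
    shift_x = np.arange(feat_w) * feat_stride
    shift_y = np.arange(feat_h) * feat_stride
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    # One (x, y, x, y) shift per feature-map cell, shape (K, 4)
    shifts = np.stack([shift_x.ravel(), shift_y.ravel(),
                       shift_x.ravel(), shift_y.ravel()], axis=1)
    # (1, A, 4) + (K, 1, 4) broadcasts to (K, A, 4)
    all_anchors = base_anchors[None, :, :] + shifts[:, None, :]
    return all_anchors.reshape(-1, 4)

def inside_image(anchors, im_h, im_w, border=0):
    """Indices of anchors that lie fully inside the image."""
    return np.where(
        (anchors[:, 0] >= -border) & (anchors[:, 1] >= -border) &
        (anchors[:, 2] < im_w + border) & (anchors[:, 3] < im_h + border))[0]

# Hypothetical base anchors: squares of side 32, 64 and 128 centred on a 16x16 cell
base = np.array([[-8, -8, 23, 23],
                 [-24, -24, 39, 39],
                 [-56, -56, 71, 71]], dtype=np.float64)
all_anchors = shift_anchors(base, feat_h=4, feat_w=5, feat_stride=16)
keep = inside_image(all_anchors, im_h=64, im_w=80)
print(all_anchors.shape, len(keep))
```

On this toy grid only 6 of the 60 anchors survive the filter, which shows how strongly the boundary test alone thins out the set when the anchors are large relative to the image.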

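Next, every remaining anchor is first given the label -1, and its IoU with the ground-truth (GT) boxes is computed. If an anchor's largest IoU is at least 0.7, it is labelled 1, meaning it contains an object to detect; if its largest IoU is below 0.3, it is labelled 0, meaning it does not. In addition, for each GT box the anchor with the highest IoU is also labelled 1, so every GT box gets at least one positive anchor. Anchors still labelled -1 are ignored; label 1 marks positive samples and label 0 marks negative samples.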


    # label: 1 is positive, 0 is negative, -1 is dont care
    labels = np.empty((len(inds_inside), ), dtype=np.float32)  # one label per inside-image anchor
    labels.fill(-1)  # start with every label at -1
    
    # overlaps between the anchors and the gt boxes
    # overlaps (ex, gt)
    # IoU of every inside-image anchor with every gt box, shape: [len(anchors), len(gt_boxes)]
    overlaps = bbox_overlaps(
        np.ascontiguousarray(anchors, dtype=np.float64),  # np.float was removed in NumPy 1.24
        np.ascontiguousarray(gt_boxes, dtype=np.float64))
    argmax_overlaps = overlaps.argmax(axis=1)  # for each anchor, index of the gt_box with the largest IoU. shape: [len(anchors),]
    max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps]  # for each anchor, that largest IoU value. shape: [len(anchors),]
    gt_argmax_overlaps = overlaps.argmax(axis=0)  # for each gt_box, index of the anchor with the largest IoU. shape: [len(gt_boxes),]
    gt_max_overlaps = overlaps[gt_argmax_overlaps,
                               np.arange(overlaps.shape[1])]  # for each gt_box, its largest IoU with any anchor. shape: [len(gt_boxes),]
    gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]  # recomputed so that every anchor tied for a gt_box's largest IoU is included
    
    if not cfg.TRAIN.RPN_CLOBBER_POSITIVES:  # positives take priority: label the background first so foreground labels can overwrite it
        # assign bg labels first so that positive labels can clobber them
        labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0  # anchors whose largest IoU is still below the threshold (0.3) become 0
 
    # fg label: for each gt, anchor with highest overlap
    labels[gt_argmax_overlaps] = 1  # the anchor(s) with the largest IoU for each gt_box become 1
 
    # fg label: above threshold IOU
    labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1  # anchors whose largest IoU reaches the threshold (0.7) become 1
 
    if cfg.TRAIN.RPN_CLOBBER_POSITIVES:  # negatives take priority: label the background last
        # assign bg labels last so that negative labels can clobber positives
        labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0  # anchors whose largest IoU is still below the threshold (0.3) become 0
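
The labelling rule can be tried on a handful of boxes. The sketch below is only an illustration: bbox_overlaps is not shown in this post, so iou_matrix is a hypothetical stand-in that assumes [x1, y1, x2, y2] boxes with inclusive pixel coordinates (hence the +1), and assign_labels reproduces the non-clobbering branch with the 0.3 / 0.7 thresholds mentioned in the comments above.

```python
import numpy as np

def iou_matrix(anchors, gt_boxes):
    """Pairwise IoU of (N, 4) anchors and (M, 4) GT boxes, inclusive pixel coords."""
    ax1, ay1, ax2, ay2 = [anchors[:, i][:, None] for i in range(4)]
    gx1, gy1, gx2, gy2 = [gt_boxes[:, i][None, :] for i in range(4)]
    iw = np.clip(np.minimum(ax2, gx2) - np.maximum(ax1, gx1) + 1, 0, None)
    ih = np.clip(np.minimum(ay2, gy2) - np.maximum(ay1, gy1) + 1, 0, None)
    inter = iw * ih
    area_a = (ax2 - ax1 + 1) * (ay2 - ay1 + 1)
    area_g = (gx2 - gx1 + 1) * (gy2 - gy1 + 1)
    return inter / (area_a + area_g - inter)

def assign_labels(overlaps, neg_thresh=0.3, pos_thresh=0.7):
    """1 = foreground, 0 = background, -1 = ignored."""
    labels = np.full(overlaps.shape[0], -1, dtype=np.int64)
    max_overlaps = overlaps.max(axis=1)   # best IoU per anchor
    gt_max = overlaps.max(axis=0)         # best IoU per GT box
    labels[max_overlaps < neg_thresh] = 0
    labels[np.where(overlaps == gt_max)[0]] = 1   # best anchor(s) for each GT box
    labels[max_overlaps >= pos_thresh] = 1
    return labels

# Made-up boxes: two GT boxes and four anchors
gt = np.array([[0, 0, 9, 9], [100, 100, 109, 109]], dtype=np.float64)
anchors = np.array([[0, 0, 9, 9],            # IoU 1.0 with the first GT box
                    [0, 0, 9, 4],            # IoU 0.5: neither positive nor negative
                    [20, 20, 29, 29],        # no overlap at all
                    [100, 100, 109, 105]],   # IoU 0.6, but the best match for the second GT box
                   dtype=np.float64)
labels = assign_labels(iou_matrix(anchors, gt))
print(labels)
```

The last anchor is the interesting case: its IoU of 0.6 is below the positive threshold, yet it is labelled 1 because no other anchor matches the second GT box better.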

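Finally, a cap is set. If the number of remaining positive samples exceeds the cap, positive anchors are randomly discarded (set back to -1) until only the cap remains. Negative samples are handled the same way.

PS: the positive and negative samples here are what is usually called foreground and background.

    # subsample positive labels if we have too many
    num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)  # number of foreground anchors wanted in one training batch
    fg_inds = np.where(labels == 1)[0]  # anchors currently labelled foreground
    if len(fg_inds) > num_fg:
        disable_inds = npr.choice(
            fg_inds, size=(len(fg_inds) - num_fg), replace=False)
        labels[disable_inds] = -1  # more foreground anchors than needed: randomly drop the surplus
 
    # subsample negative labels if we have too many
    num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1)  # number of background anchors wanted in one training batch
    bg_inds = np.where(labels == 0)[0]  # anchors currently labelled background
    if len(bg_inds) > num_bg:
        disable_inds = npr.choice(
            bg_inds, size=(len(bg_inds) - num_bg), replace=False)
        labels[disable_inds] = -1  # more background anchors than needed: randomly drop the surplus

These anchors are then used in the computations that follow. That completes the selection.

Complete code involved

Quoted from: https://blog.csdn.net/jiongnima/article/details/79781792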
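
The subsampling step can be isolated as a small function. This is a sketch under assumptions: the batch size of 256 and foreground fraction of 0.5 stand in for cfg.TRAIN.RPN_BATCHSIZE and cfg.TRAIN.RPN_FG_FRACTION, whose values are not given in this post, and the function name subsample is hypothetical.

```python
import numpy as np

def subsample(labels, batch_size=256, fg_fraction=0.5, rng=None):
    """Cap the number of foreground and background labels; surplus becomes -1."""
    rng = np.random.default_rng() if rng is None else rng
    labels = labels.copy()

    num_fg = int(fg_fraction * batch_size)
    fg_inds = np.where(labels == 1)[0]
    if len(fg_inds) > num_fg:
        drop = rng.choice(fg_inds, size=len(fg_inds) - num_fg, replace=False)
        labels[drop] = -1

    # Background fills whatever the foreground left free in the batch
    num_bg = batch_size - int(np.sum(labels == 1))
    bg_inds = np.where(labels == 0)[0]
    if len(bg_inds) > num_bg:
        drop = rng.choice(bg_inds, size=len(bg_inds) - num_bg, replace=False)
        labels[drop] = -1
    return labels

# Made-up label vector: 10 foreground, 1000 background, the rest ignored
labels = np.full(2000, -1, dtype=np.int64)
labels[:10] = 1
labels[10:1010] = 0
sampled = subsample(labels, rng=np.random.default_rng(0))
print(int(np.sum(sampled == 1)), int(np.sum(sampled == 0)))
```

Note that when there are few positives, the background quota grows to fill the batch, so the batch is not necessarily balanced.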

# -*- coding: utf-8 -*-
"""
Created on Sun Jan  1 16:11:17 2017
@author: Kevin Liang (modifications)
Anchor Target Layer: Creates all the anchors in the final convolutional feature
map, assigns anchors to ground truth boxes, and applies labels of "objectness"
Adapted from the official Faster R-CNN repo: 
https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/rpn/anchor_target_layer.py
"""
 
# --------------------------------------------------------
# Faster R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick and Sean Bell
# --------------------------------------------------------
 
import sys
sys.path.append('../')
 
import numpy as np
import numpy.random as npr
import tensorflow as tf
 
from Lib.bbox_overlaps import bbox_overlaps
from Lib.bbox_transform import bbox_transform
from Lib.faster_rcnn_config import cfg
from Lib.generate_anchors import generate_anchors
 
# Computes the ground truth for each anchor (foreground/background label and box offsets)
def anchor_target_layer(rpn_cls_score, gt_boxes, im_dims, _feat_stride, anchor_scales):
    '''
    Make Python version of _anchor_target_layer_py below Tensorflow compatible
    '''
    # Run _anchor_target_layer_py on the RPN classification scores, the ground-truth boxes, the image size, the feature-map stride relative to the original image, and the anchor scales
    rpn_labels,rpn_bbox_targets,rpn_bbox_inside_weights,rpn_bbox_outside_weights = \
        tf.py_func(_anchor_target_layer_py, [rpn_cls_score, gt_boxes, im_dims, _feat_stride, anchor_scales],
                   [tf.float32, tf.float32, tf.float32, tf.float32])
 
    # Convert to tensors
    rpn_labels = tf.convert_to_tensor(tf.cast(rpn_labels,tf.int32), name = 'rpn_labels')
    rpn_bbox_targets = tf.convert_to_tensor(rpn_bbox_targets, name = 'rpn_bbox_targets')
    rpn_bbox_inside_weights = tf.convert_to_tensor(rpn_bbox_inside_weights , name = 'rpn_bbox_inside_weights')
    rpn_bbox_outside_weights = tf.convert_to_tensor(rpn_bbox_outside_weights , name = 'rpn_bbox_outside_weights')
 
    return rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights
 
 
def _anchor_target_layer_py(rpn_cls_score, gt_boxes, im_dims, _feat_stride, anchor_scales):
    """
    Python version    
    
    Assign anchors to ground-truth targets. Produces anchor classification
    labels and bounding-box regression targets.
    
    # Algorithm:
    #
    # for each (H, W) location i
    #   generate 9 anchor boxes centered on cell i
    #   apply predicted bbox deltas at cell i to each of the 9 anchors
    # filter out-of-image anchors
    # measure GT overlap
    """
    im_dims = im_dims[0]  # size of the original image, [height, width]
    _anchors = generate_anchors(scales=np.array(anchor_scales))  # the 9 base anchors, shape: [9, 4]
    _num_anchors = _anchors.shape[0]  # _num_anchors is 9
    
    # allow boxes to sit over the edge by a small amount
    _allowed_border =  0  # anchors may not extend past the image boundary at all
    
    # Only minibatch of 1 supported: check that batch_size is 1
    assert rpn_cls_score.shape[0] == 1, \
        'Only single item batches are supported'    
    
    # rpn_cls_score is indexed as (1, H, W, ...) here
    height, width = rpn_cls_score.shape[1:3]  # H and W of the RPN output; the total number of anchors is H*W*9
    
    # 1. Generate proposals from bbox deltas and shifted anchors
    # Lay the anchors out on the original image
    shift_x = np.arange(0, width) * _feat_stride  # shape: [width,]
    shift_y = np.arange(0, height) * _feat_stride  # shape: [height,]
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)  # build the grid; shift_x and shift_y both have shape [height, width]
    shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                        shift_x.ravel(), shift_y.ravel())).transpose()  # shape [height*width, 4]
 
    # add A anchors (1, A, 4) to
    # cell K shifts (K, 1, 4) to get
    # shift anchors (K, A, 4)
    # reshape to (K*A, 4) shifted anchors
    A = _num_anchors  # A = 9
    K = shifts.shape[0]  # K = height * width (of the feature map)
    all_anchors = (_anchors.reshape((1, A, 4)) +
                   shifts.reshape((1, K, 4)).transpose((1, 0, 2)))  # shape [K, A, 4]: all anchors
    all_anchors = all_anchors.reshape((K * A, 4))
    total_anchors = int(K * A)  # total number of anchors
    
    # anchors inside the image: inds_inside indexes the anchors that do not cross the image boundary
    inds_inside = np.where(
        (all_anchors[:, 0] >= -_allowed_border) &
        (all_anchors[:, 1] >= -_allowed_border) &
        (all_anchors[:, 2] < im_dims[1] + _allowed_border) &  # width
        (all_anchors[:, 3] < im_dims[0] + _allowed_border)    # height
    )[0]
    
    # keep only inside anchors
    anchors = all_anchors[inds_inside, :]  # the valid anchors, i.e. those fully inside the image
    
    # label: 1 is positive, 0 is negative, -1 is dont care
    labels = np.empty((len(inds_inside), ), dtype=np.float32)  # one label per inside-image anchor
    labels.fill(-1)  # start with every label at -1
    
    # overlaps between the anchors and the gt boxes
    # overlaps (ex, gt)
    # IoU of every inside-image anchor with every gt box, shape: [len(anchors), len(gt_boxes)]
    overlaps = bbox_overlaps(
        np.ascontiguousarray(anchors, dtype=np.float64),  # np.float was removed in NumPy 1.24
        np.ascontiguousarray(gt_boxes, dtype=np.float64))
    argmax_overlaps = overlaps.argmax(axis=1)  # for each anchor, index of the gt_box with the largest IoU. shape: [len(anchors),]
    max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps]  # for each anchor, that largest IoU value. shape: [len(anchors),]
    gt_argmax_overlaps = overlaps.argmax(axis=0)  # for each gt_box, index of the anchor with the largest IoU. shape: [len(gt_boxes),]
    gt_max_overlaps = overlaps[gt_argmax_overlaps,
                               np.arange(overlaps.shape[1])]  # for each gt_box, its largest IoU with any anchor. shape: [len(gt_boxes),]
    gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]  # recomputed so that every anchor tied for a gt_box's largest IoU is included
    
    if not cfg.TRAIN.RPN_CLOBBER_POSITIVES:  # positives take priority: label the background first so foreground labels can overwrite it
        # assign bg labels first so that positive labels can clobber them
        labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0  # anchors whose largest IoU is still below the threshold (0.3) become 0
 
    # fg label: for each gt, anchor with highest overlap
    labels[gt_argmax_overlaps] = 1  # the anchor(s) with the largest IoU for each gt_box become 1
 
    # fg label: above threshold IOU
    labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1  # anchors whose largest IoU reaches the threshold (0.7) become 1
 
    if cfg.TRAIN.RPN_CLOBBER_POSITIVES:  # negatives take priority: label the background last
        # assign bg labels last so that negative labels can clobber positives
        labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0  # anchors whose largest IoU is still below the threshold (0.3) become 0
 
    # subsample positive labels if we have too many
    num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)  # number of foreground anchors wanted in one training batch
    fg_inds = np.where(labels == 1)[0]  # anchors currently labelled foreground
    if len(fg_inds) > num_fg:
        disable_inds = npr.choice(
            fg_inds, size=(len(fg_inds) - num_fg), replace=False)
        labels[disable_inds] = -1  # more foreground anchors than needed: randomly drop the surplus
 
    # subsample negative labels if we have too many
    num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1)  # number of background anchors wanted in one training batch
    bg_inds = np.where(labels == 0)[0]  # anchors currently labelled background
    if len(bg_inds) > num_bg:
        disable_inds = npr.choice(
            bg_inds, size=(len(bg_inds) - num_bg), replace=False)
        labels[disable_inds] = -1  # more background anchors than needed: randomly drop the surplus
 
    # bbox_targets: The deltas (relative to anchors) that Faster R-CNN should 
    # try to predict at each anchor
    # TODO: This "weights" business might be deprecated. Requires investigation
    # For each anchor this yields four box-transform values (tx, ty, tw, th).
    bbox_targets = np.zeros((len(inds_inside), 4), dtype=np.float32)  # zero-initialise the targets for every inside-image anchor (overwritten on the next line)
    bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])  # deltas from each anchor to the gt_box it overlaps most
 
    bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)  # inside_weights start at 0
    bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS)  # non-zero only for foreground anchors
 
    bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)  # outside_weights start at 0
    if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:  # a negative RPN_POSITIVE_WEIGHT selects uniform weighting
        # uniform weighting of examples (given non-uniform sampling)
        num_examples = np.sum(labels >= 0)
        positive_weights = np.ones((1, 4)) * 1.0 / num_examples  # positives and negatives get the same weight
        negative_weights = np.ones((1, 4)) * 1.0 / num_examples
    else:
        assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) &
                (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1))  # otherwise RPN_POSITIVE_WEIGHT must lie strictly between 0 and 1
        positive_weights = (cfg.TRAIN.RPN_POSITIVE_WEIGHT /
                            np.sum(labels == 1))
        negative_weights = ((1.0 - cfg.TRAIN.RPN_POSITIVE_WEIGHT) /
                            np.sum(labels == 0))  # positives and negatives are weighted separately
    bbox_outside_weights[labels == 1, :] = positive_weights
    bbox_outside_weights[labels == 0, :] = negative_weights  # store both weights in bbox_outside_weights
 
    # map up to original set of anchors
    labels = _unmap(labels, total_anchors, inds_inside, fill=-1)  # map the labels back to the full anchor set (out-of-image anchors get label -1)
    bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0)  # same for bbox_targets (out-of-image anchors are filled with 0)
    bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, fill=0)  # same for inside_weights (filled with 0)
    bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, fill=0)  # same for outside_weights (filled with 0)
    
    # labels
    labels = labels.reshape((1, height, width, A)).transpose(0, 3, 1, 2)
    labels = labels.reshape((1, 1, A * height, width))  # anchor class labels reshaped to [1, 1, 9*height, width]
    rpn_labels = labels
 
    # bbox_targets
    rpn_bbox_targets = bbox_targets.reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)  # box targets reshaped to [1, 9*4, height, width]
    
    # bbox_inside_weights
    rpn_bbox_inside_weights = bbox_inside_weights.reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)  # inside_weights reshaped to [1, 9*4, height, width]
 
    # bbox_outside_weights
    rpn_bbox_outside_weights = bbox_outside_weights.reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)  # outside_weights reshaped to [1, 9*4, height, width]
 
    return rpn_labels,rpn_bbox_targets,rpn_bbox_inside_weights,rpn_bbox_outside_weights  # return all ground-truth values
    
 
def _unmap(data, count, inds, fill=0):  # maps data for the inside-image anchors back onto the full set of generated anchors
    """ Unmap a subset of item (data) back to the original set of items (of
    size count) """
    if len(data.shape) == 1:
        ret = np.empty((count, ), dtype=np.float32)
        ret.fill(fill)
        ret[inds] = data
    else:
        ret = np.empty((count, ) + data.shape[1:], dtype=np.float32)
        ret.fill(fill)
        ret[inds, :] = data
    return ret
 
def _compute_targets(ex_rois, gt_rois):  # computes the box transform from each anchor to its matched gt_box
    """Compute bounding-box regression targets for an image."""
 
    assert ex_rois.shape[0] == gt_rois.shape[0]
    assert ex_rois.shape[1] == 4
    assert gt_rois.shape[1] == 5
 
    return bbox_transform(ex_rois, gt_rois[:, :4]).astype(np.float32, copy=False)

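How are anchors used to compute proposals (classification and bounding-box regression)?

After the filtering described in the previous question, only a manageable number of anchors remain, and they fall into two classes: foreground and background. The computation itself has two parts, classification and regression. Object detection has to do two things: first decide what is an object, and second put an accurate box around it. Hence classification and regression.

Classification first:

The foreground and background anchors (the positive and negative samples) have already been determined, so we can train directly with a softmax. One thing worth mentioning is that this is really many independent binary classifications: 13*13*256 -> 13*13*18, where 13*13 is the height and width of the feature map and 18 stands for the 9 anchors with two classes (positive and negative) each, i.e. 9*2 = 18. Every anchor is therefore classified as foreground or background:

    rpn_cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=rpn_cls_score, labels=rpn_labels))

As the code shows, this is handled by sparse_softmax_cross_entropy_with_logits. The function works in two steps: it first applies a softmax to the two scores of each anchor, then computes the cross-entropy against the integer label (0 or 1) of that anchor.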


def rpn_cls_loss(rpn_cls_score,rpn_labels):
    '''
    Calculate the Region Proposal Network classifier loss. Measures how well 
    the RPN is able to propose regions by the performance of its "objectness" 
    classifier.
    
    Standard cross-entropy loss on logits
    '''
    with tf.variable_scope('rpn_cls_loss'):
        # input shape dimensions
        shape = tf.shape(rpn_cls_score)
        
        # Stack all classification scores into 2D matrix
        rpn_cls_score = tf.transpose(rpn_cls_score,[0,3,1,2])
        rpn_cls_score = tf.reshape(rpn_cls_score,[shape[0],2,shape[3]//2*shape[1],shape[2]])
        rpn_cls_score = tf.transpose(rpn_cls_score,[0,2,3,1])
        rpn_cls_score = tf.reshape(rpn_cls_score,[-1,2])
        
        # Stack labels
        rpn_labels = tf.reshape(rpn_labels,[-1])  # flatten the labels to 1-D (integer class ids, not one-hot vectors)
        
        # Ignore label=-1 (Neither object nor background: IoU between 0.3 and 0.7)
        # Drop the scores at positions whose label is -1 and reshape to [-1, 2] for the cross-entropy loss
        rpn_cls_score = tf.reshape(tf.gather(rpn_cls_score,tf.where(tf.not_equal(rpn_labels,-1))),[-1,2])
        # Keep only the labels that are not -1, i.e. the sampled foreground (1) and background (0) anchors
        rpn_labels = tf.reshape(tf.gather(rpn_labels,tf.where(tf.not_equal(rpn_labels,-1))),[-1]) 
        
        # Cross entropy error
        rpn_cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=rpn_cls_score, labels=rpn_labels))
    
    return rpn_cross_entropy
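
To see what the loss above computes without TensorFlow, here is a minimal NumPy sketch. It assumes the scores have already been stacked into an (N, 2) matrix with column 0 for background and column 1 for foreground, matching the integer labels; the function name rpn_cls_loss_np is hypothetical.

```python
import numpy as np

def rpn_cls_loss_np(scores, labels):
    """Mean softmax cross-entropy over anchors whose label is not -1.

    scores: (N, 2) logits, labels: (N,) integers in {-1, 0, 1}.
    """
    keep = labels != -1                     # ignore the "don't care" anchors
    s = scores[keep]
    y = labels[keep]
    s = s - s.max(axis=1, keepdims=True)    # shift for numerical stability
    log_prob = s - np.log(np.exp(s).sum(axis=1, keepdims=True))  # log-softmax
    return -log_prob[np.arange(len(y)), y].mean()

# Made-up scores: the third anchor is ignored, so its large logits do not matter
scores = np.array([[0.0, 0.0], [0.0, 0.0], [50.0, -50.0]])
labels = np.array([1, 0, -1])
loss = rpn_cls_loss_np(scores, labels)
print(loss)
```

With equal logits for both classes the softmax gives probability 0.5 to each, so the loss equals log 2, which is the value an untrained classifier would start from.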

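Now for bounding-box regression. The motivation is this: without box regression the object can still be boxed, but the size of the box differs from the actual GT, so the object is not localized precisely and the box may cover only part of it. For example:

The red box does mark the airplane, but only part of it.

So the red box needs to be regressed toward the ground-truth box (GT, in green) so that the output box comes as close to the truth as possible.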


def rpn_bbox_loss(rpn_bbox_pred, rpn_bbox_targets, rpn_inside_weights, rpn_outside_weights):
    '''
    Calculate the Region Proposal Network bounding box loss. Measures how well 
    the RPN is able to propose regions by the performance of its localization.
    lam/N_reg * sum_i(p_i^* * L_reg(t_i,t_i^*))
    lam: classification vs bbox loss balance parameter     
    N_reg: Number of anchor locations (~2500)
    p_i^*: ground truth label for anchor (loss only for positive anchors)
    L_reg: smoothL1 loss
    t_i: Parameterized prediction of bounding box
    t_i^*: Parameterized ground truth of closest bounding box
    '''    
    with tf.variable_scope('rpn_bbox_loss'):
        # Transposing
        rpn_bbox_targets = tf.transpose(rpn_bbox_targets, [0,2,3,1])
        rpn_inside_weights = tf.transpose(rpn_inside_weights, [0,2,3,1])
        rpn_outside_weights = tf.transpose(rpn_outside_weights, [0,2,3,1])
        
        # How far off was the prediction?
        # Subtract the targets from the predicted (tx, ty, tw, th) and multiply by rpn_inside_weights, so only positive anchors contribute to the bbox loss
        diff = tf.multiply(rpn_inside_weights, rpn_bbox_pred - rpn_bbox_targets)
        # Apply smooth L1
        diff_sL1 = smoothL1(diff, 3.0)
        
        # Only count loss for positive anchors. Make sure it's a sum.
        # Multiply the result by rpn_outside_weights and sum; again only positive anchors contribute
 
        rpn_bbox_reg = tf.reduce_sum(tf.multiply(rpn_outside_weights, diff_sL1))
    
        # Constant for weighting bounding box loss with classification loss
        # Scale the box loss by the lambda parameter to get the final box loss
        rpn_bbox_reg = cfg.TRAIN.RPN_BBOX_LAMBDA * rpn_bbox_reg
    
    return rpn_bbox_reg  # return the final loss

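This part works by regression. Keep in mind that it builds on the classification step above; in other words, only the foreground anchors take part, and the goal is to find a mapping from the original foreground box to the ground-truth box. The code above only computes the loss between the two boxes and does not show the regression itself. The idea of the regression is to translate first and then scale; for details see the explanation at https://blog.csdn.net/elaine_bao/article/details/60469036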
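
The smoothL1 helper called in the loss above is not shown in this post. The sketch below assumes the sigma-parameterised smooth L1 form that is common in Faster R-CNN code (quadratic for |x| < 1/sigma^2, linear beyond), so treat it as an illustration and not as the exact helper from the repository. The names smooth_l1 and rpn_bbox_loss_np and the default lam=1.0 are hypothetical.

```python
import numpy as np

def smooth_l1(x, sigma=3.0):
    """Smooth L1: 0.5*(sigma*x)^2 if |x| < 1/sigma^2, else |x| - 0.5/sigma^2."""
    x = np.asarray(x, dtype=np.float64)
    sigma2 = sigma ** 2
    abs_x = np.abs(x)
    return np.where(abs_x < 1.0 / sigma2,
                    0.5 * sigma2 * x ** 2,
                    abs_x - 0.5 / sigma2)

def rpn_bbox_loss_np(pred, targets, inside_w, outside_w, lam=1.0, sigma=3.0):
    """Weighted smooth L1 sum, mirroring the structure of rpn_bbox_loss."""
    diff = inside_w * (pred - targets)   # zero for anchors with zero inside weight
    return lam * np.sum(outside_w * smooth_l1(diff, sigma))

# Made-up values: anchor 0 is positive, anchor 1 is not, so only anchor 0 counts
pred = np.array([[1.0, 0.0, 0.0, 0.0], [5.0, 5.0, 5.0, 5.0]])
targets = np.zeros((2, 4))
inside_w = np.array([[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]])
outside_w = np.full((2, 4), 0.5)
loss = rpn_bbox_loss_np(pred, targets, inside_w, outside_w)
print(loss)
```

The quadratic region keeps gradients small near zero error, while the linear region limits the influence of large errors compared with a plain squared loss.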

Complete code for the loss functions above, quoted from: https://blog.csdn.net/jiongnima/article/details/79781792


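How is bounding-box regression done from the foreground anchors and the GT?

Seen geometrically, the whole process is nothing more than a translation and a scaling, acting on the four values (x, y, w, h). When the ground-truth box and the foreground box differ only slightly (IoU > 0.6), the mapping can be treated as approximately linear, so linear regression can be used to model the fine adjustment of the window.

Linear regression means: given an input feature vector X, learn a set of parameters W such that the regressed value is very close to the true value Y (ground truth). That is, Y ≈ WX.

So we simply run linear regression on these four quantities: define a loss on the difference between the true value and the value predicted from the foreground box, then use gradient descent or least squares to find the weights that minimize it. For details see the explanation at https://blog.csdn.net/elaine_bao/article/details/60469036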
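
The "translate, then scale" idea corresponds to four regression targets per box. The sketch below uses the usual R-CNN style parameterisation (centre offsets divided by the anchor size, and log ratios of width and height). bbox_transform is not shown in this post, so the function names encode and decode and the +1 inclusive-pixel convention are assumptions made for illustration.

```python
import numpy as np

def _center_form(boxes):
    """[x1, y1, x2, y2] (inclusive pixels) to centre x, centre y, width, height."""
    w = boxes[:, 2] - boxes[:, 0] + 1.0
    h = boxes[:, 3] - boxes[:, 1] + 1.0
    cx = boxes[:, 0] + 0.5 * w
    cy = boxes[:, 1] + 0.5 * h
    return cx, cy, w, h

def encode(anchors, gt):
    """Regression targets (tx, ty, tw, th) that move each anchor onto its GT box."""
    ax, ay, aw, ah = _center_form(anchors)
    gx, gy, gw, gh = _center_form(gt)
    tx = (gx - ax) / aw          # translation, relative to the anchor size
    ty = (gy - ay) / ah
    tw = np.log(gw / aw)         # scaling, in log space
    th = np.log(gh / ah)
    return np.stack([tx, ty, tw, th], axis=1)

def decode(anchors, deltas):
    """Apply (tx, ty, tw, th) to anchors and return [x1, y1, x2, y2] boxes."""
    ax, ay, aw, ah = _center_form(anchors)
    cx = deltas[:, 0] * aw + ax
    cy = deltas[:, 1] * ah + ay
    w = np.exp(deltas[:, 2]) * aw
    h = np.exp(deltas[:, 3]) * ah
    return np.stack([cx - 0.5 * w, cy - 0.5 * h,
                     cx + 0.5 * w - 1.0, cy + 0.5 * h - 1.0], axis=1)

# Made-up example: a 10x10 anchor and a 20x10 GT box shifted right and down
anchors = np.array([[0.0, 0.0, 9.0, 9.0]])
gt = np.array([[2.0, 4.0, 21.0, 13.0]])
deltas = encode(anchors, gt)
restored = decode(anchors, deltas)
print(deltas, restored)
```

Dividing the offsets by the anchor size and taking logs of the size ratios makes the targets independent of the absolute box scale, which is what lets a single regressor serve anchors of different sizes.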

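Which layers should anchors be placed on? (What factors influence the choice of anchors?)

Anchors should cover the receptive field as fully as possible. If the objects to detect vary in size, the anchors need to be adjusted to the object sizes so that the objects can be covered. The receptive field of the layer where anchors are placed should match the anchor size: a receptive field much larger than the anchor is bad, and so is one much smaller. If the receptive field is much smaller than the anchor, it is like being shown a single foot and asked which bird it belongs to. If the receptive field is much larger than the anchor, it is like being handed a world map and asked to point out the Forbidden City.

Anchors usually need to cover all targets in the training set, i.e. every ground-truth box should be able to match an anchor. Ideally, the smaller the targets, the more numerous and denser the anchors should be, so that all candidate regions are covered; the larger the targets, the fewer and sparser the anchors should be, otherwise they overlap heavily and cause redundant computation.

What is a receptive field? Put simply:

The feature vector at a given position of a feature map is computed from a fixed region of the input to some earlier layer; that region is the receptive field of the position.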

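Start from the feature map of interest, where the receptive field is r = 1, and apply the rules below layer by layer while walking back toward the input:
For each conv kxk s1 layer, r = r + (k - 1); with the common k=3, r = r + 2; with k=5, r = r + 4
For each conv kxk s2 layer or max/avg pooling layer with kernel k and stride 2, r = (r x 2) + (k - 2); with the common kernel k=3, s=2, r = r x 2 + 1; with kernel k=7, s=2, r = r x 2 + 5
For each maxpool2x2 s2 (max/avg pooling) downsampling layer, r = r x 2
Special cases: conv1x1 s1 does not change the receptive field; after an FC layer or a GAP layer the receptive field is the whole input image

Quoted from: https://zhuanlan.zhihu.com/p/44106492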
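
The rules above are all instances of one recurrence, r = s x r + (k - s), applied from the target feature map back to the input. Here is a small sketch of that recurrence; the function name receptive_field and the (kernel, stride) list format are hypothetical, and the VGG16-style layer list is written out only as an example.

```python
def receptive_field(layers):
    """Receptive field of one position in the last layer's output.

    layers: list of (kernel, stride) tuples ordered from the input to the target layer.
    """
    r = 1                              # one position on the target feature map
    for k, s in reversed(layers):      # walk back toward the input
        r = s * r + (k - s)
    return r

conv3 = (3, 1)   # 3x3 convolution, stride 1
pool2 = (2, 2)   # 2x2 pooling, stride 2

# VGG16-style stack up to the last conv layer of the fifth block
vgg16_conv5_3 = ([conv3] * 2 + [pool2] + [conv3] * 2 + [pool2] +
                 [conv3] * 3 + [pool2] + [conv3] * 3 + [pool2] + [conv3] * 3)
print(receptive_field([conv3, conv3]), receptive_field(vgg16_conv5_3))
```

Two stacked 3x3 convolutions give a receptive field of 5, and the VGG16-style stack gives 196, which can then be compared with the anchor sizes planned for that layer.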