Faster R-CNN PyTorch Reproduction Series (3): Anchor_target_layer.py

Snoopy_Dream

Categories: pytorch, faster-rcnn.pytorch

Article link: https://blog.csdn.net/e01528/article/details/83650284

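Note: the background for this post is covered in the following articles:

Faster R-CNN PyTorch Reproduction Series (1): Understanding the input and output dimensions of the RPN layer

Faster R-CNN PyTorch Reproduction Series (2): generate_anchors source code walkthrough

RPN programming prerequisites (1): NumPy vs PyTorch (Anchor_target_layer)

RPN programming prerequisites (2): the functions in bbox_transform.py

Softmax, softmax loss, and cross entropy in Faster R-CNN

First, here are the two source-code analyses of Anchor_target_layer.py from the Caffe version of Faster R-CNN that I referred to:

Understanding the Faster R-CNN code (3)

R-CNN study notes (6): RPN and AnchorTargetLayer

Below is my walkthrough of Anchor_target_layer.py in the PyTorch version, which allows batch size > 1: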

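0. Variable names

1. Imports

2. __init__

3. Using meshgrid and ravel together to get the shifts, then building all anchors as base anchors + shifts

4. Dropping anchors that cross the image boundary, keeping inds_inside + anchors

5. Building labels from overlaps: for each anchor the index of its best gt, and for each gt its best anchor(s)

6. Subsampling (too many positives or negatives)

7. bbox_targets, bbox_inside_weights, bbox_outside_weights

8. Assigning labels, bbox_targets, bbox_inside_weights, and bbox_outside_weights to all anchors

9. _unmap

10. No backward pass needed

11. Hand-drawn flowchart

12. Caffe version

Supplement: the big picture of a CNN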

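y^ = wx + b, where x is the input and w is the weight vector applied to the features.

w: the 3*3*256 convolution we usually talk about is exactly this w.

y^: the (1, 36, H, W) output produced by the convolutions is y^.

y*: the corresponding target derived from the gt.

L(y^, y*): the loss is computed between y^ and y*. The optimizer iteratively updates the 3*3*256 convolution weights w, learning complex features (that is, w), so that the loss decreases and the predictions approach the ground truth.

For example, the smooth L1 loss in Faster R-CNN pushes rpn_box_pred (the predicted offsets) towards rpn_box_target (the true offsets). The loss is reduced by repeatedly adjusting each w so that complex features are learned. (Along the way, anchor_target_layer uses the gt as follows: it first drops the anchors that cross the image boundary, computes rpn_box_inside for the anchors that remain, then creates a new array covering all anchors filled with 0 and writes the inside values back into it. This identifies which of all the anchors are positive, and smooth L1 loss is applied only to those; in other words, only positive anchors are regressed.)

In this process rpn_box_pred is produced by a 3*3*256 convolution on the shared feature map followed by a 1*1*36 convolution (this is the w to be learned), giving y^ of shape (1, 36, H, W), which stores the predicted offsets for every anchor.

rpn_box_target is obtained by finding, for each original anchor, its closest gt (the one with the highest IoU) and computing the offset from the anchor to that gt. Mathematically this is y* of shape (1, 36, H, W), which can be read as (1, 4, 9*H, W), i.e. 4 offsets per anchor.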
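To make the role of the two weight arrays concrete, here is a minimal NumPy sketch of a weighted smooth L1 loss. The function name smooth_l1_sketch, the sigma parameter, and the toy numbers are assumptions for illustration only; this is not the repository's loss code.

```python
import numpy as np

def smooth_l1_sketch(pred, target, inside_w, outside_w, sigma=1.0):
    """Weighted smooth L1 summed over all elements (illustrative only)."""
    sigma2 = sigma ** 2
    diff = inside_w * (pred - target)        # zero wherever inside_w is 0
    abs_diff = np.abs(diff)
    quadratic = 0.5 * sigma2 * diff ** 2     # used where |diff| < 1 / sigma^2
    linear = abs_diff - 0.5 / sigma2         # used everywhere else
    per_elem = np.where(abs_diff < 1.0 / sigma2, quadratic, linear)
    return float(np.sum(outside_w * per_elem))

# Three anchors: one positive, one negative, one "dont care"
pred = np.zeros((3, 4))
target = np.array([[0.5, 0.0, 0.0, 2.0],
                   [1.0, 1.0, 1.0, 1.0],
                   [3.0, 3.0, 3.0, 3.0]])
inside_w = np.array([[1.0] * 4, [0.0] * 4, [0.0] * 4])    # positives only
outside_w = np.array([[0.5] * 4, [0.5] * 4, [0.0] * 4])   # 1 / num_examples

loss = smooth_l1_sketch(pred, target, inside_w, outside_w)
```

Only the positive anchor contributes: the negative anchor is masked by inside_w even though its outside weight is non-zero.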

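First, what I took away from this:

Reading source code

First understand the principle described in the paper.
Work out the inputs and outputs of this .py file.
Break the flow into steps: which steps are needed and which intermediate values are computed, then build the helper functions (inputs + outputs) and the intermediate variables.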

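0. Variable names

Learn from and summarize other people's naming.

shift_x and shift_y: the shifts along the x and y axes.
_num_anchors: the number of anchor types, 3*3 = 9. [Short qualifiers such as num go at the front of the name.]
gt_boxes: the gt boxes. [To keep the name from being top-heavy, the longer noun goes last.]
shifts: the shifts. Adding shifts to the base anchors yields the anchors at every position, row by row, i.e. all anchors.
inds_inside: the indices of the inside anchors.
overlaps: the IoU, shape (B, M, N), where B is the batch size, M the number of inside anchors, and N the number of gt boxes.
max_overlaps: for each inside anchor, the largest IoU over all gts, shape (B, M).
argmax_overlaps: for each inside anchor, the index of the gt with the largest IoU.
gt_max_overlaps: for each gt, the largest IoU over all inside anchors, shape (B, N). These correspond to the two rules for assigning positive samples.
offset: with batch size > 1, computing bbox_targets indexes into gt_boxes reshaped to (B*K, 5), so after the reshape the per-image indices have to be shifted by an offset.
bbox_targets: (1, 4*9, H, W), read as (1, 4, 9*H, W): the offsets for each anchor, the input to box regression. M denotes the number of inside anchors.
bbox_inside_weights: 1 where the label is 1 and 0 elsewhere, so the loss is computed only for positive samples. Shape (B, M); unlike the Caffe version, it uses a single value instead of 4 values for easy indexing. This is p_i* in the paper.
bbox_outside_weights: shape (B, M). After normalizing the sample weights, positives and negatives both get 1/256; everything else is 0.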

1. Imports

In from .generate_anchors, the leading . means the module lives in the same package directory.

import torch
import torch.nn as nn
import numpy as np
import numpy.random as npr
from model.utils.config import cfg
from .generate_anchors import generate_anchors
from .bbox_transform import clip_boxes, bbox_overlaps_batch, bbox_transform_batch
import pdb

DEBUG = False
try:
    long        # Python 2
except NameError:
    long = int  # Python 3

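2. __init__

The feature stride (the scale factor between the original image and the feature map), scales, and ratios are passed in. __init__ stores them and calls generate_anchors to create the base anchors as a tensor.

class _AnchorTargetLayer(nn.Module):
    """
        Assign anchors to ground-truth targets. Produces the binary anchor
        classification labels and the bbox regression targets.
        targets are: dx dy dw dh
    """
    # typical arguments: feat_stride=16, scales=(8, 16, 32), ratios=(0.5, 1, 2)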
    def __init__(self, feat_stride, scales, ratios):
        super(_AnchorTargetLayer, self).__init__()
        self._feat_stride = feat_stride
        self._scales = scales
        anchor_scales = scales
        # Converted from np to a float tensor here so that torch functions can be used.
        # scales and ratios arrive as tuples, so they are wrapped in np.array.
        self._anchors = torch.from_numpy(generate_anchors(scales=np.array(anchor_scales), ratios=np.array(ratios))).float()  # float tensor
        self._num_anchors = self._anchors.size(0)
        # Whether to allow anchors that extend slightly beyond the image border
        self._allowed_border = 0  # default is 0

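3. Using meshgrid and ravel together to get the shifts, then building all anchors as base anchors + shifts

shift_x and shift_y are the shifts along the x and y axes. They are applied to the base anchors at the top-left position, generated by generate_anchors() as described earlier, to obtain the anchors over the whole image. all_anchors stores all of these anchors and total_anchors stores their count, K×A, where K is the number of feature-map positions (H×W) and A is the number of anchors per position. The author then filters these anchors and discards every anchor that crosses the image boundary. Continuing: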

    def forward(self, input):
        """
        For the shared feature map (H, W), generate 9 anchors at every position i
        and drop the ones that cross the image boundary.
        """
        # input -> torch.Tensor
        rpn_cls_score = input[0]  # score map; provides the feature-map H and W
        gt_boxes = input[1]  # boxes + labels, 5 values per box
        im_info = input[2]  # im_info[0][1] is the image width, im_info[0][0] the image height
        num_boxes = input[3]

        # map of shape (..., H, W); in torch, size and shape are equivalent
        height, width = rpn_cls_score.size(2), rpn_cls_score.size(3)
        # batch size: the number of images in the batch
        batch_size = gt_boxes.size(0)

        feat_height, feat_width = rpn_cls_score.size(2), rpn_cls_score.size(3)
        # see the RPN programming prerequisites posts
        shift_x = np.arange(0, feat_width) * self._feat_stride
        shift_y = np.arange(0, feat_height) * self._feat_stride
        shift_x, shift_y = np.meshgrid(shift_x, shift_y)
        shifts = torch.from_numpy(np.vstack((shift_x.ravel(), shift_y.ravel(),
                                  shift_x.ravel(), shift_y.ravel())).transpose())
        # tensor.contiguous() -> Tensor
        # Returns a tensor with the same data laid out contiguously in memory; an already contiguous tensor is returned as is.
        # shifts.view needs contiguous memory; using reshape instead would make this step unnecessary.
        # type_as(tensor) casts to the type of the given tensor and returns the result; it is a no-op if the type already matches. Equivalent to self.type(tensor.type())
        # shifts: int -> float
        shifts = shifts.contiguous().type_as(rpn_cls_score).float()

        # add A anchors (1, A, 4) to
        # cell K shifts (K, 1, 4) to get
        # shift anchors (K, A, 4)
        # reshape to (K*A, 4) shifted anchors

        A = self._num_anchors
        K = shifts.size(0)
        self._anchors = self._anchors.type_as(gt_boxes)  # match the tensor type of gt_boxes
        # broadcast
        all_anchors = self._anchors.view(1, A, 4) + shifts.view(K, 1, 4)
        all_anchors = all_anchors.view(K * A, 4)

        total_anchors = int(K * A)
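The meshgrid + ravel + broadcast step is easier to follow on a tiny feature map. The sketch below uses NumPy only; the 2×3 feature map and the two base anchors are made-up values, not the output of generate_anchors.

```python
import numpy as np

feat_stride = 16
feat_height, feat_width = 2, 3          # toy feature map
# Two made-up base anchors (x1, y1, x2, y2); real ones come from generate_anchors
base_anchors = np.array([[-8, -8, 23, 23],
                         [-24, -24, 39, 39]], dtype=np.float32)

shift_x = np.arange(0, feat_width) * feat_stride     # [0, 16, 32]
shift_y = np.arange(0, feat_height) * feat_stride    # [0, 16]
shift_x, shift_y = np.meshgrid(shift_x, shift_y)     # both have shape (H, W)
# One row (dx, dy, dx, dy) per feature-map cell, in row-major (h, w) order
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                    shift_x.ravel(), shift_y.ravel())).transpose()  # (K, 4)

A = base_anchors.shape[0]
K = shifts.shape[0]
# (1, A, 4) + (K, 1, 4) broadcasts to (K, A, 4)
all_anchors = base_anchors.reshape(1, A, 4) + shifts.reshape(K, 1, 4)
all_anchors = all_anchors.reshape(K * A, 4)          # row index = k * A + a
```

The flattened row order is (h, w, a) with a varying fastest, which is what the later view(batch_size, height, width, A) relies on.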

4. Dropping anchors that cross the image boundary, keeping inds_inside + anchors (less convenient than NumPy's anchors[np.where(...)[0], :])

        # anchors inside the image: xmin >= 0, ymin >= 0, xmax < w + 0, ymax < h + 0
        keep = ((all_anchors[:, 0] >= -self._allowed_border) &
                (all_anchors[:, 1] >= -self._allowed_border) &
                (all_anchors[:, 2] < long(im_info[0][1]) + self._allowed_border) &
                (all_anchors[:, 3] < long(im_info[0][0]) + self._allowed_border))

        # inds_inside: the indices of the anchors that do not cross the boundary
        inds_inside = torch.nonzero(keep).view(-1)

        # anchors: the anchors that do not cross the boundary
        # keep only inside anchors
        anchors = all_anchors[inds_inside, :]
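As a small illustration of the boundary filter, the following NumPy sketch applies the same four conditions to four hand-written anchors. The image size and anchor coordinates are invented for the example.

```python
import numpy as np

im_height, im_width = 32, 48      # toy image size, playing the role of im_info[0][0], im_info[0][1]
allowed_border = 0
all_anchors = np.array([[-8, -8, 23, 23],    # crosses the top-left border
                        [0, 0, 31, 31],      # fully inside
                        [16, 0, 47, 31],     # fully inside
                        [32, 16, 63, 47]],   # crosses the bottom-right border
                       dtype=np.float32)

keep = ((all_anchors[:, 0] >= -allowed_border) &
        (all_anchors[:, 1] >= -allowed_border) &
        (all_anchors[:, 2] < im_width + allowed_border) &
        (all_anchors[:, 3] < im_height + allowed_border))
inds_inside = np.where(keep)[0]       # NumPy counterpart of torch.nonzero(keep).view(-1)
anchors = all_anchors[inds_inside, :]
```

inds_inside is kept around because it is needed later to scatter the per-anchor results back to the full set of anchors.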

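5. Building labels from overlaps: for each anchor the index of its best gt, and for each gt its best anchor(s)

This part computes the maximum overlap between the anchors and the gts and applies the rules for positive samples:

a. For each gt, the anchor with the highest overlap is fg.

b. For each anchor, if its highest overlap with any gt is at least 0.7, it is fg.

cfg.TRAIN.RPN_CLOBBER_POSITIVES covers the conflict case: an anchor satisfies a positive rule, but its maximum overlap is below cfg.TRAIN.RPN_NEGATIVE_OVERLAP = 0.3, so should it be positive or negative? cfg.TRAIN.RPN_CLOBBER_POSITIVES defaults to False, in which case the positive label wins.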

        # label: 1 is positive, 0 is negative, -1 is dont care
        labels = gt_boxes.new(batch_size, inds_inside.size(0)).fill_(-1)  # labels (B, M)
        bbox_inside_weights = gt_boxes.new(batch_size, inds_inside.size(0)).zero_()
        bbox_outside_weights = gt_boxes.new(batch_size, inds_inside.size(0)).zero_()
        # returns (B, M, N): B is the batch size, M the number of inside anchors, N the number of gts
        overlaps = bbox_overlaps_batch(anchors, gt_boxes)
        # dim=2: for each anchor, take the largest IoU over the N gts of (B, M, N)
        # returns (B, M)
        max_overlaps, argmax_overlaps = torch.max(overlaps, 2)
        # dim=1: for each gt, take the largest IoU over the M anchors of (B, M, N)
        # returns (B, N)
        gt_max_overlaps, _ = torch.max(overlaps, 1)

        if not cfg.TRAIN.RPN_CLOBBER_POSITIVES:
            labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
        # if a gt's best IoU over all anchors is 0, replace it with 1e-5 so that
        # the equality test below does not match every zero-overlap anchor
        gt_max_overlaps[gt_max_overlaps==0] = 1e-5
        # gt_max_overlaps.view(batch_size,1,-1) has shape (B, 1, N)
        # overlaps has shape (B, M, N)
        # expand_as is probably unnecessary here, since the comparison would broadcast anyway
        # eq returns 1 where equal and 0 otherwise (newer PyTorch versions return a bool tensor)
        # torch.sum(..., 2) returns (B, M)
        keep = torch.sum(overlaps.eq(gt_max_overlaps.view(batch_size,1,-1).expand_as(overlaps)), 2)  # keep (B, M)

        if torch.sum(keep) > 0:
            # anchors that have the highest IoU for some gt (with IoU not 0) become positives
            labels[keep>0] = 1

        # fg label: above threshold IOU
        # an anchor whose max IoU with the gts is >= 0.7 is marked positive
        labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1
        # False by default
        if cfg.TRAIN.RPN_CLOBBER_POSITIVES:
            # if an anchor's max IoU is below 0.3, this anchor is background
            labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
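The labelling rules can be traced on a small overlap matrix. This is a single-image NumPy sketch with made-up IoU values, assuming RPN_CLOBBER_POSITIVES is False; the constant names are local stand-ins for the cfg entries.

```python
import numpy as np

NEGATIVE_OVERLAP, POSITIVE_OVERLAP = 0.3, 0.7   # thresholds quoted in the text

# IoU of 4 inside anchors (rows) against 2 gts (columns), made-up values
overlaps = np.array([[0.80, 0.10],
                     [0.20, 0.25],
                     [0.50, 0.10],
                     [0.05, 0.00]])

labels = np.full(overlaps.shape[0], -1.0)
max_overlaps = overlaps.max(axis=1)          # best IoU per anchor
gt_max_overlaps = overlaps.max(axis=0)       # best IoU per gt
gt_max_overlaps[gt_max_overlaps == 0] = 1e-5

labels[max_overlaps < NEGATIVE_OVERLAP] = 0              # background first
keep = (overlaps == gt_max_overlaps).sum(axis=1)         # rule a: best anchor per gt
labels[keep > 0] = 1
labels[max_overlaps >= POSITIVE_OVERLAP] = 1             # rule b: IoU above threshold
```

Anchor 1 shows the conflict case: its best IoU is only 0.25, but it is the best match for the second gt, so the positive label overwrites the background label.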

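6. Subsampling (too many positives or negatives)

If we end up with too many positive or negative samples, a fixed number is kept and the remaining anchors are discarded (set to -1). This is presumably done for speed; it also caps each image at RPN_BATCHSIZE samples. The selection is simply random. Continuing: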

        # number of foreground samples wanted
        num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)

        sum_fg = torch.sum((labels == 1).int(), 1)  # (B,)
        sum_bg = torch.sum((labels == 0).int(), 1)  # (B,)
        # if there are too many, disable the surplus at random (this does not feel rigorous)
        for i in range(batch_size):
            # subsample positive labels if we have too many
            if sum_fg[i] > num_fg:
                fg_inds = torch.nonzero(labels[i] == 1).view(-1)
                # torch.randperm seems has a bug on multi-gpu setting that cause the segfault.
                # See https://github.com/pytorch/pytorch/issues/1868 for more details.
                # use numpy instead.
                #rand_num = torch.randperm(fg_inds.size(0)).type_as(gt_boxes).long()
                rand_num = torch.from_numpy(np.random.permutation(fg_inds.size(0))).type_as(gt_boxes).long()
                disable_inds = fg_inds[rand_num[:fg_inds.size(0)-num_fg]]
                labels[i][disable_inds] = -1

#           num_bg = cfg.TRAIN.RPN_BATCHSIZE - sum_fg[i]
            num_bg = cfg.TRAIN.RPN_BATCHSIZE - torch.sum((labels == 1).int(), 1)[i]
            # RPN_BATCHSIZE=256 RPN_FG_FRACTION=0.5
            # if there are fewer than 128 (256*0.5) positives to begin with, negatives fill the gap and exceed 128
            # subsample negative labels if we have too many
            if sum_bg[i] > num_bg:
                bg_inds = torch.nonzero(labels[i] == 0).view(-1)
                #rand_num = torch.randperm(bg_inds.size(0)).type_as(gt_boxes).long()

                rand_num = torch.from_numpy(np.random.permutation(bg_inds.size(0))).type_as(gt_boxes).long()
                disable_inds = bg_inds[rand_num[:bg_inds.size(0)-num_bg]]
                labels[i][disable_inds] = -1
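The subsampling logic for one image can be sketched in NumPy as follows. The toy batch size of 8 and the seeded generator are assumptions made only so the example is small and repeatable; the text uses RPN_BATCHSIZE=256.

```python
import numpy as np

RPN_BATCHSIZE, RPN_FG_FRACTION = 8, 0.5     # toy values
rng = np.random.default_rng(0)              # seeded only to make the sketch repeatable

# 6 positives, 7 negatives, 1 dont care
labels = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, -1], dtype=np.float32)

num_fg = int(RPN_FG_FRACTION * RPN_BATCHSIZE)
fg_inds = np.where(labels == 1)[0]
if len(fg_inds) > num_fg:
    # too many positives: turn a random surplus into dont care
    disable_inds = rng.choice(fg_inds, size=len(fg_inds) - num_fg, replace=False)
    labels[disable_inds] = -1

# negatives fill whatever the positives left free
num_bg = RPN_BATCHSIZE - int(np.sum(labels == 1))
bg_inds = np.where(labels == 0)[0]
if len(bg_inds) > num_bg:
    disable_inds = rng.choice(bg_inds, size=len(bg_inds) - num_bg, replace=False)
    labels[disable_inds] = -1
```

Because num_bg is computed after the positives were subsampled, an image with few positives ends up with more than half of its samples being negatives.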

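7. bbox_targets, bbox_inside_weights, bbox_outside_weights

bbox_targets holds the four regression values tx, ty, tw, th between an anchor and its closest gt (the gt with the highest IoU for that anchor).

This part generates bbox_targets, bbox_inside_weights, and bbox_outside_weights. For bbox_targets it calls _compute_targets_batch(), which wraps bbox_transform_batch:

As for bbox_inside_weights and bbox_outside_weights: in the Caffe version bbox_inside_weights is initialized as an n×4 array of zeros and the coordinates of positive samples get weight 1 (the PyTorch version stores a single value per anchor, shape (B, M), and expands it to 4 values later). bbox_outside_weights is initialized the same way, and positives and negatives are both set to 1/num_examples, where num_examples is the number of sampled anchors.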
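For reference, the four regression values can be sketched as below. This follows the parameterization from the Faster R-CNN paper; the function name bbox_transform_sketch and the +1 width convention are assumptions mirroring the usual NumPy implementation, not a copy of bbox_transform_batch.

```python
import numpy as np

def bbox_transform_sketch(ex_rois, gt_rois):
    """Return (dx, dy, dw, dh) from anchors ex_rois to matched boxes gt_rois."""
    ex_w = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
    ex_h = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
    ex_cx = ex_rois[:, 0] + 0.5 * ex_w
    ex_cy = ex_rois[:, 1] + 0.5 * ex_h

    gt_w = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
    gt_h = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
    gt_cx = gt_rois[:, 0] + 0.5 * gt_w
    gt_cy = gt_rois[:, 1] + 0.5 * gt_h

    dx = (gt_cx - ex_cx) / ex_w          # centre shift, in units of anchor width
    dy = (gt_cy - ex_cy) / ex_h
    dw = np.log(gt_w / ex_w)             # log scale change
    dh = np.log(gt_h / ex_h)
    return np.stack((dx, dy, dw, dh), axis=1)

anchors = np.array([[0.0, 0.0, 15.0, 15.0]])
gts = np.array([[8.0, 0.0, 23.0, 31.0]])
targets = bbox_transform_sketch(anchors, gts)
```

Here the gt is shifted by half an anchor width and height and is twice as tall, so the targets are (0.5, 0.5, 0, log 2).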

def _compute_targets_batch(ex_rois, gt_rois):
    """Compute bounding-box regression targets for an image."""

    return bbox_transform_batch(ex_rois, gt_rois[:, :, :4])

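        # gt_boxes (B, K, 5)
        offset = torch.arange(0, batch_size)*gt_boxes.size(1)  # offset (batch_size,)
        # argmax_overlaps (B, M) + (B, 1): required to index the flattened gt_boxes
        argmax_overlaps = argmax_overlaps + offset.view(batch_size, 1).type_as(argmax_overlaps)
        # gt_boxes.view(-1,5) is (B*K, 5), hence the offset
        # the gathered gts have shape (batch_size, -1, 5); bbox_targets is (batch_size, -1, 4)
        bbox_targets = _compute_targets_batch(anchors, gt_boxes.view(-1,5)[argmax_overlaps.view(-1), :].view(batch_size, -1, 5))

        # use a single value instead of 4 values for easy index.
        # The offset is computed against the gt closest to this anchor (if none overlaps, argmax gives 0, so it is presumably the offset to the first gt)
        bbox_inside_weights[labels==1] = cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS[0]
        # RPN_POSITIVE_WEIGHT defaults to -1
        if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:
            #num_examples = torch.sum(labels[i] >= 0)
            # normalize the sample weights
            # note: i is left over from the subsampling loop above, so this counts the samples of the last image in the batch
            num_examples = torch.sum(labels[i] >= 0).item()  # total number of positive and negative samples
            positive_weights = 1.0 / num_examples  # positive weight = 1 / total samples; arguably this could just be 1/256
            negative_weights = 1.0 / num_examples

        else:
            # note: this branch only asserts; positive_weights and negative_weights are not defined here
            assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) &
                    (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1))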
        bbox_outside_weights[labels == 1] = positive_weights
        bbox_outside_weights[labels == 0] = negative_weights
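The offset trick used to gather the matched gt for every anchor in a batch is worth checking on toy data. The sketch below is NumPy, with invented sizes and index values, and compares the flattened gather against plain per-image indexing.

```python
import numpy as np

B, K = 2, 3                                   # batch size, gts per image
gt_boxes = np.arange(B * K * 5, dtype=np.float32).reshape(B, K, 5)
argmax_overlaps = np.array([[2, 0, 1, 1],     # best gt index per anchor, image 0
                            [0, 0, 2, 1]])    # image 1; shape (B, M)

offset = np.arange(0, B) * K                  # [0, 3]: start row of each image after flattening
flat_idx = argmax_overlaps + offset.reshape(B, 1)
matched = gt_boxes.reshape(-1, 5)[flat_idx.reshape(-1), :].reshape(B, -1, 5)

# Same result with per-image indexing, for comparison
expected = np.stack([gt_boxes[b][argmax_overlaps[b]] for b in range(B)])
```

Without the offset, the indices of the second image would point into the rows of the first image once gt_boxes is flattened to (B*K, 5).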

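8. Assigning labels, bbox_targets, bbox_inside_weights, and bbox_outside_weights to all anchors

As for why view comes first and permute second, see the difference between reshape and transpose.

        # Using the inds_inside indices, build arrays over all anchors: inside anchors keep their 1/0 labels, everything else is filled with -1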
        labels = _unmap(labels, total_anchors, inds_inside, batch_size, fill=-1)
        bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, batch_size, fill=0)
        # (B, K*A), i.e. (B, H*W*A)
        bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, batch_size, fill=0)
        bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, batch_size, fill=0)

        outputs = []
        # (B, 9*H*W)
        labels = labels.view(batch_size, height, width, A).permute(0,3,1,2).contiguous()  # (B, 9, H, W)
        labels = labels.view(batch_size, 1, A * height, width)  # (B, 1, 9*H, W)
        outputs.append(labels)

        bbox_targets = bbox_targets.view(batch_size, height, width, A*4).permute(0,3,1,2).contiguous()  # (B, 9*4, H, W)
        outputs.append(bbox_targets)

        anchors_count = bbox_inside_weights.size(1)
        bbox_inside_weights = bbox_inside_weights.view(batch_size,anchors_count,1).expand(batch_size, anchors_count, 4)

        bbox_inside_weights = bbox_inside_weights.contiguous().view(batch_size, height, width, 4*A)\
                            .permute(0,3,1,2).contiguous()  # (B, 9*4, H, W)

        outputs.append(bbox_inside_weights)

        bbox_outside_weights = bbox_outside_weights.view(batch_size,anchors_count,1).expand(batch_size, anchors_count, 4)
        bbox_outside_weights = bbox_outside_weights.contiguous().view(batch_size, height, width, 4*A)\
                            .permute(0,3,1,2).contiguous()
        outputs.append(bbox_outside_weights)

        return outputs
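The reason for view followed by permute, rather than a single view to the final shape, can be checked with a small NumPy sketch (reshape and transpose play the roles of view and permute). The encoding of each position as h*100 + w*10 + a is just a labelling device for the example.

```python
import numpy as np

B, H, W, A = 1, 2, 3, 2
# Label every anchor with h*100 + w*10 + a, stored in the (h, w, a) order
# in which all_anchors was built
flat = np.array([h * 100 + w * 10 + a
                 for h in range(H) for w in range(W) for a in range(A)])
flat = flat.reshape(B, H * W * A)

# Recover the true axes first, then move the anchor axis to the front
correct = flat.reshape(B, H, W, A).transpose(0, 3, 1, 2)   # (B, A, H, W)
# Reshaping straight to the target shape gives the same shape but scrambled content
wrong = flat.reshape(B, A, H, W)
```

correct[0, a, h, w] really holds the entry for anchor a at cell (h, w), whereas the direct reshape reinterprets the memory order and mixes positions with anchor types.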

9. _unmap: see the function below

def _unmap(data, count, inds, batch_size, fill=0):
    """ Unmap a subset of item (data) back to the original set of items (of
    size count) """

    if data.dim() == 2:
        ret = torch.Tensor(batch_size, count).fill_(fill).type_as(data)
        ret[:, inds] = data
    else:
        ret = torch.Tensor(batch_size, count, data.size(2)).fill_(fill).type_as(data)
        ret[:, inds,:] = data
    return ret
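A single-image NumPy sketch of the same idea, with invented indices, shows what _unmap does to labels and to 4-column targets. The name unmap_sketch is hypothetical.

```python
import numpy as np

def unmap_sketch(data, count, inds, fill=0):
    """Scatter data (one row per inside anchor) back into an array of size count."""
    ret = np.full((count,) + data.shape[1:], fill, dtype=np.float32)
    ret[inds] = data
    return ret

inds_inside = np.array([1, 2, 4])                     # 3 of 6 anchors are inside
labels_inside = np.array([1, 0, -1], dtype=np.float32)

labels_all = unmap_sketch(labels_inside, 6, inds_inside, fill=-1)
targets_all = unmap_sketch(np.ones((3, 4), dtype=np.float32), 6, inds_inside, fill=0)
```

Anchors outside the image receive the fill value: -1 (dont care) for labels and 0 for targets and weights, so they never contribute to the loss.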

These attributes are then reshaped and packed into the layer's outputs (top[0], top[1], top[2], and top[3] in the Caffe version). After that:

10. No backward pass needed

    def backward(self, top, propagate_down, bottom):
        """This layer does not propagate gradients."""
        pass

    def reshape(self, bottom, top):
        """Reshaping happens during the call to forward."""
        pass

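Since this layer does not need to back-propagate, backward has nothing to do; and since the reshaping already happens during forward, reshape is left empty as well.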

11. Hand-drawn flowchart

12. Caffe version source code walkthrough

# --------------------------------------------------------
# Faster R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick and Sean Bell
# --------------------------------------------------------

import os
import caffe
import yaml
from fast_rcnn.config import cfg
import numpy as np
import numpy.random as npr
from generate_anchors import generate_anchors
from utils.cython_bbox import bbox_overlaps
from fast_rcnn.bbox_transform import bbox_transform

DEBUG = False

class AnchorTargetLayer(caffe.Layer):
    """
    Assign anchors to ground-truth targets. Produces anchor classification
    labels and bounding-box regression targets.
    """

    def setup(self, bottom, top):
        layer_params = yaml.load(self.param_str_)
        anchor_scales = layer_params.get('scales', (8, 16, 32))
        self._anchors = generate_anchors(scales=np.array(anchor_scales))
        self._num_anchors = self._anchors.shape[0]
        self._feat_stride = layer_params['feat_stride']

        if DEBUG:
            print 'anchors:'
            print self._anchors
            print 'anchor shapes:'
            print np.hstack((
                self._anchors[:, 2::4] - self._anchors[:, 0::4],
                self._anchors[:, 3::4] - self._anchors[:, 1::4],
            ))
            self._counts = cfg.EPS
            self._sums = np.zeros((1, 4))
            self._squared_sums = np.zeros((1, 4))
            self._fg_sum = 0
            self._bg_sum = 0
            self._count = 0

        # allow boxes to sit over the edge by a small amount
        self._allowed_border = layer_params.get('allowed_border', 0)

        height, width = bottom[0].data.shape[-2:]
        if DEBUG:
            print 'AnchorTargetLayer: height', height, 'width', width

        A = self._num_anchors
        # labels
        top[0].reshape(1, 1, A * height, width)
        # bbox_targets
        top[1].reshape(1, A * 4, height, width)
        # bbox_inside_weights
        top[2].reshape(1, A * 4, height, width)
        # bbox_outside_weights
        top[3].reshape(1, A * 4, height, width)

    def forward(self, bottom, top):
        # Algorithm:
        #
        # for each (H, W) location i
        #   generate 9 anchor boxes centered on cell i
        #   apply predicted bbox deltas at cell i to each of the 9 anchors
        # filter out-of-image anchors
        # measure GT overlap

        assert bottom[0].data.shape[0] == 1, \
            'Only single item batches are supported'

        # map of shape (..., H, W)
        height, width = bottom[0].data.shape[-2:]
        # GT boxes (x1, y1, x2, y2, label)
        gt_boxes = bottom[1].data
        # im_info
        im_info = bottom[2].data[0, :]

        if DEBUG:
            print ''
            print 'im_size: ({}, {})'.format(im_info[0], im_info[1])
            print 'scale: {}'.format(im_info[2])
            print 'height, width: ({}, {})'.format(height, width)
            print 'rpn: gt_boxes.shape', gt_boxes.shape
            print 'rpn: gt_boxes', gt_boxes

        # 1. Generate proposals from bbox deltas and shifted anchors
        shift_x = np.arange(0, width) * self._feat_stride
        shift_y = np.arange(0, height) * self._feat_stride
        shift_x, shift_y = np.meshgrid(shift_x, shift_y)
        shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                            shift_x.ravel(), shift_y.ravel())).transpose()
        # add A anchors (1, A, 4) to
        # cell K shifts (K, 1, 4) to get
        # shift anchors (K, A, 4)
        # reshape to (K*A, 4) shifted anchors
        A = self._num_anchors
        K = shifts.shape[0]
        all_anchors = (self._anchors.reshape((1, A, 4)) +
                       shifts.reshape((1, K, 4)).transpose((1, 0, 2)))
        all_anchors = all_anchors.reshape((K * A, 4))
        total_anchors = int(K * A)

        # only keep anchors inside the image
        # note: np.where returns a tuple of index arrays, one per dimension; [0] selects the row indices
        inds_inside = np.where(
            (all_anchors[:, 0] >= -self._allowed_border) &
            (all_anchors[:, 1] >= -self._allowed_border) &
            (all_anchors[:, 2] < im_info[1] + self._allowed_border) &  # width
            (all_anchors[:, 3] < im_info[0] + self._allowed_border)    # height
        )[0]

        if DEBUG:
            print 'total_anchors', total_anchors
            print 'inds_inside', len(inds_inside)

        # keep only inside anchors
        # the inside anchors
        anchors = all_anchors[inds_inside, :]
        if DEBUG:
            print 'anchors.shape', anchors.shape

        # label: 1 is positive, 0 is negative, -1 is dont care
        # create an uninitialized array of shape (n,) (np.empty leaves arbitrary values in it)
        labels = np.empty((len(inds_inside), ), dtype=np.float32)
        # fill sets every entry to -1
        labels.fill(-1)

        # overlaps between the anchors and the gt boxes
        # overlaps (ex, gt)     K is the number of gts, N = len(inds_inside) is the number of inside anchors
        # overlaps: (N, K) ndarray of overlap between boxes and query_boxes
        # overlaps[m, n]: IoU of the m-th anchor (row) with the n-th gt box (column)
        overlaps = bbox_overlaps(
            np.ascontiguousarray(anchors, dtype=np.float),
            np.ascontiguousarray(gt_boxes, dtype=np.float))
        # for each anchor, find the gt box with the largest overlap: search along each row, return the column index
        # axis=1 returns shape (rows,); axis=0 returns shape (cols,)
        argmax_overlaps = overlaps.argmax(axis=1)  # (N,); argmax returns an np.array
        # the max overlap values: overlaps[row indices, column indices] picks the maximum of each row
        max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps]  # (N,)
        # argmax down each column: for each of the K gt boxes, the row (anchor) with the largest IoU; returns row indices
        gt_argmax_overlaps = overlaps.argmax(axis=0)  # only the first maximum per column; gt_argmax_overlaps (K,)
        # index with those row indices and np.arange(number of gt boxes) to get the corresponding IoU values
        gt_max_overlaps = overlaps[gt_argmax_overlaps,  # (K,)
                                   np.arange(overlaps.shape[1])]  # (K,)
        # overlaps (N, K), gt_max_overlaps (K,): broadcasting expands the single row of K values to N rows of K
        # this finds every maximum, since several anchors may tie
        # overlaps == gt_max_overlaps is an (N, K) boolean array
        gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]  # [0] rows, [1] columns

        # RPN_CLOBBER_POSITIVES defaults to False
        if not cfg.TRAIN.RPN_CLOBBER_POSITIVES:
            # assign bg labels first so that positive labels can clobber them
            # if an anchor's max IoU is below 0.3, this anchor is background
            labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0  # max_overlaps (N,), used as a boolean mask
        # the anchors with the highest overlap for each gt (there may be more than one) are labeled 1
        # fg label: for each gt, anchor with highest overlap
        labels[gt_argmax_overlaps] = 1
        # IoU >= 0.7 is labeled foreground
        # fg label: above threshold IOU
        labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1

        if cfg.TRAIN.RPN_CLOBBER_POSITIVES:  # False by default; if True, an anchor that meets both the positive and the negative condition becomes negative
            # assign bg labels last so that negative labels can clobber positives
            labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0

        # subsample positive labels if we have too many
        num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)
        
        fg_inds = np.where(labels == 1)[0]  # row indices of all positive samples
        # if there are too many, disable the surplus at random (this does not feel rigorous)
        if len(fg_inds) > num_fg:
            disable_inds = npr.choice(
                fg_inds, size=(len(fg_inds) - num_fg), replace=False)
            labels[disable_inds] = -1

        # subsample negative labels if we have too many
        # RPN_BATCHSIZE=256 RPN_FG_FRACTION=0.5
        # if there are fewer than 128 (256*0.5) positives to begin with, negatives fill the gap and exceed 128
        num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1)  # handles the case of few positives
        bg_inds = np.where(labels == 0)[0]

        if len(bg_inds) > num_bg:
            disable_inds = npr.choice(
                bg_inds, size=(len(bg_inds) - num_bg), replace=False)
            labels[disable_inds] = -1  # randomly drop background samples and treat them as dont care
            #print "was %s inds, disabling %s, now %s inds" % (
                #len(bg_inds), len(disable_inds), np.sum(labels == 0))

        bbox_targets = np.zeros((len(inds_inside), 4), dtype=np.float32)
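        # The offset is computed against the gt closest to this anchor (if none overlaps, argmax gives 0, so it is presumably the offset to the first gt)
        bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])  # argmax_overlaps: for every anchor, the index of the gt box with the largest IoU

        bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
        # note the boolean-mask (==) style of slice assignment here; the foreground weight is 1.0
        bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS)  # 1.0 1.0 1.0 1.0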

        bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
        
        
        # RPN_POSITIVE_WEIGHT defaults to -1
        if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:
            # normalize the sample weights
            # uniform weighting of examples (given non-uniform sampling)
            num_examples = np.sum(labels >= 0)  # total number of positive and negative samples
            positive_weights = np.ones((1, 4)) * 1.0 / num_examples  # positive weight = 1 / total samples; arguably this could just be 1/256
            negative_weights = np.ones((1, 4)) * 1.0 / num_examples  # negative weight = 1 / total samples
        # RPN_POSITIVE_WEIGHT=0.5 actually seems more suitable to me
        else:
            assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) &
                    (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1))
            positive_weights = (cfg.TRAIN.RPN_POSITIVE_WEIGHT /
                                np.sum(labels == 1))  # set the positive/negative weights yourself
            negative_weights = ((1.0 - cfg.TRAIN.RPN_POSITIVE_WEIGHT) /
                                np.sum(labels == 0))
                                
        bbox_outside_weights[labels == 1, :] = positive_weights
        bbox_outside_weights[labels == 0, :] = negative_weights

        if DEBUG:
            self._sums += bbox_targets[labels == 1, :].sum(axis=0)
            self._squared_sums += (bbox_targets[labels == 1, :] ** 2).sum(axis=0)
            self._counts += np.sum(labels == 1)
            means = self._sums / self._counts
            stds = np.sqrt(self._squared_sums / self._counts - means ** 2)
            print 'means:'
            print means
            print 'stdevs:'
            print stds

        # map up to original set of anchors
        # total_anchors: the number of all anchors

        # Using the inds_inside indices, build arrays over all anchors: inside anchors keep their 1/0 labels, everything else is filled with -1
        labels = _unmap(labels, total_anchors, inds_inside, fill=-1)
        bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0)
        bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, fill=0)
        bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, fill=0)

        if DEBUG:
            print 'rpn: max max_overlap', np.max(max_overlaps)
            print 'rpn: num_positive', np.sum(labels == 1)
            print 'rpn: num_negative', np.sum(labels == 0)
            self._fg_sum += np.sum(labels == 1)
            self._bg_sum += np.sum(labels == 0)
            self._count += 1
            print 'rpn: num_positive avg', self._fg_sum / self._count
            print 'rpn: num_negative avg', self._bg_sum / self._count

        # reshape and assign to top[0]
        # labels
        labels = labels.reshape((1, height, width, A)).transpose(0, 3, 1, 2)
        labels = labels.reshape((1, 1, A * height, width))
        top[0].reshape(*labels.shape)
        top[0].data[...] = labels

        # bbox_targets
        bbox_targets = bbox_targets \
            .reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)
        top[1].reshape(*bbox_targets.shape)
        top[1].data[...] = bbox_targets

        # bbox_inside_weights
        bbox_inside_weights = bbox_inside_weights \
            .reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)
        assert bbox_inside_weights.shape[2] == height
        assert bbox_inside_weights.shape[3] == width
        top[2].reshape(*bbox_inside_weights.shape)
        top[2].data[...] = bbox_inside_weights

        # bbox_outside_weights
        bbox_outside_weights = bbox_outside_weights \
            .reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)
        assert bbox_outside_weights.shape[2] == height
        assert bbox_outside_weights.shape[3] == width
        top[3].reshape(*bbox_outside_weights.shape)
        top[3].data[...] = bbox_outside_weights

    def backward(self, top, propagate_down, bottom):
        """This layer does not propagate gradients."""
        pass

    def reshape(self, bottom, top):
        """Reshaping happens during the call to forward."""
        pass



def _unmap(data, count, inds, fill=0):
    """ Unmap a subset of item (data) back to the original set of items (of
    size count) """
    if len(data.shape) == 1:
        ret = np.empty((count, ), dtype=np.float32)
        ret.fill(fill)
        ret[inds] = data
    else:
        ret = np.empty((count, ) + data.shape[1:], dtype=np.float32)
        ret.fill(fill)
        ret[inds, :] = data
    return ret



def _compute_targets(ex_rois, gt_rois):
    """Compute bounding-box regression targets for an image."""

    assert ex_rois.shape[0] == gt_rois.shape[0]  # both have N rows
    assert ex_rois.shape[1] == 4
    assert gt_rois.shape[1] == 5  # GT boxes (x1, y1, x2, y2, label)

    return bbox_transform(ex_rois, gt_rois[:, :4]).astype(np.float32, copy=False)
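One detail of the Caffe forward pass that is easy to miss is why gt_argmax_overlaps is recomputed with np.where after argmax. A short NumPy sketch with made-up overlaps shows the difference when two anchors tie for the same gt.

```python
import numpy as np

overlaps = np.array([[0.6, 0.1],
                     [0.6, 0.2],     # ties with anchor 0 for gt 0
                     [0.3, 0.4]])

# argmax keeps only the first maximum in each column
first_only = overlaps.argmax(axis=0)
gt_max_overlaps = overlaps[first_only, np.arange(overlaps.shape[1])]
# comparing against the broadcast maxima recovers every tied anchor
all_ties = np.where(overlaps == gt_max_overlaps)[0]
```

With argmax alone, anchor 1 would not be marked positive even though it matches gt 0 exactly as well as anchor 0 does.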