Faster-RCNN Code Reading Notes (Part 2)

License: CC 4.0 BY-SA

Tags: #deep-learning #neural-networks #computer-vision


Code: https://github.com/chenyuntc/simple-faster-rcnn-pytorch

To start, here is a diagram that I think summarizes the pipeline well; it comes from another blog.

3. VGG16RoIHead

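As the previous note showed, the RPN outputs 2000 RoIs, which are passed to the RoIHead. The RoI pooling layer first turns every RoI into a feature map of fixed size, so that the fully connected layers that follow can perform classification and localization. In other words, RoI pooling takes RoIs of different sizes and pools each into a feature map of the same size, which is what a fully connected network needs as input.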
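To make the "different sizes in, same size out" idea concrete, here is a minimal pure-Python sketch of max pooling over a region. The function `roi_max_pool` and its bin-splitting rule are illustrative assumptions, not the repository's or torchvision's implementation (the real `RoIPool` also rescales and quantizes coordinates, and its bin boundaries may differ in detail).

```python
def roi_max_pool(feature, roi, out_size):
    """Max-pool one region of a 2-D feature map into an out_size x out_size grid.

    Hypothetical helper for illustration only.
    feature: list of rows; roi: (y0, x0, y1, x1) in feature-map cells, end-exclusive.
    """
    y0, x0, y1, x1 = roi
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_size):
        # Bin i covers rows [ys, ye): floor of the start, ceil of the end,
        # and at least one cell tall.
        ys = y0 + (i * h) // out_size
        ye = max(y0 + ((i + 1) * h + out_size - 1) // out_size, ys + 1)
        row = []
        for j in range(out_size):
            xs = x0 + (j * w) // out_size
            xe = max(x0 + ((j + 1) * w + out_size - 1) // out_size, xs + 1)
            row.append(max(feature[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# A 6x6 toy feature map whose value at (y, x) is y * 6 + x.
feature = [[y * 6 + x for x in range(6)] for y in range(6)]
small = roi_max_pool(feature, (0, 0, 4, 4), 2)   # a 4x4 region
large = roi_max_pool(feature, (1, 2, 6, 6), 2)   # a 5x4 region
print(small)  # [[7, 9], [19, 21]]
print(large)  # [[21, 23], [33, 35]]
```

Both regions come out as 2x2 grids even though their input sizes differ; with `roi_size=7` the head gets a 7x7 grid per channel for every RoI.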


head = VGG16RoIHead(
            n_class=n_fg_class + 1,  # 21
            roi_size=7,
            spatial_scale=(1. / self.feat_stride), # 1/16
            classifier=classifier
        )

class VGG16RoIHead(nn.Module):
    def __init__(self, n_class, roi_size, spatial_scale, classifier):
        super(VGG16RoIHead, self).__init__()

        self.classifier = classifier
        self.cls_loc = nn.Linear(4096, n_class * 4)
        self.score = nn.Linear(4096, n_class)

        normal_init(self.cls_loc, 0, 0.001)
        normal_init(self.score, 0, 0.01)

        self.n_class = n_class
        self.roi_size = roi_size
        self.spatial_scale = spatial_scale
        self.roi = RoIPool((self.roi_size, self.roi_size), self.spatial_scale)

    def forward(self, x, rois, roi_indices):
        roi_indices = at.totensor(roi_indices).float()
        rois = at.totensor(rois).float()
        # Prepend the batch index: each row becomes (index, y_min, x_min, y_max, x_max).
        indices_and_rois = torch.cat([roi_indices[:, None], rois], dim=1)

        # Reorder to (index, x_min, y_min, x_max, y_max), the layout RoIPool expects.
        xy_indices_and_rois = indices_and_rois[:, [0, 2, 1, 4, 3]]
        indices_and_rois = xy_indices_and_rois.contiguous()

        pool = self.roi(x, indices_and_rois)  # RoI pooling
        pool = pool.view(pool.size(0), -1)    # flatten each pooled RoI to one vector
        fc7 = self.classifier(pool)           # fully connected classifier layers
        roi_cls_locs = self.cls_loc(fc7)      # localization branch
        roi_scores = self.score(fc7)          # classification branch
        return roi_cls_locs, roi_scores
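The column shuffle `[0, 2, 1, 4, 3]` in `forward` and the `spatial_scale` argument can be traced by hand on one box. The sketch below uses a hypothetical helper, `to_roipool_row`, and an invented example RoI; it only mirrors the arithmetic. In the real pipeline the scaling happens inside `RoIPool`, which also rounds to whole cells.

```python
def to_roipool_row(roi_yx, batch_index, spatial_scale):
    """Hypothetical helper: reorder one RoI and show its feature-map extent.

    roi_yx is (y_min, x_min, y_max, x_max) in image pixels, the order used
    by the code in this note.
    """
    y_min, x_min, y_max, x_max = roi_yx
    # Same permutation as indices_and_rois[:, [0, 2, 1, 4, 3]].
    row = (float(batch_index), x_min, y_min, x_max, y_max)
    # Image pixels -> feature-map cells (before any rounding).
    scaled = tuple(v * spatial_scale for v in row[1:])
    return row, scaled


row, scaled = to_roipool_row((64.0, 160.0, 288.0, 480.0), 0, 1.0 / 16)
print(row)     # (0.0, 160.0, 64.0, 480.0, 288.0)
print(scaled)  # (10.0, 4.0, 30.0, 18.0)
```

So a 224x320-pixel box on the image corresponds to a 14x20-cell window on the stride-16 feature map, and that window is what gets pooled to 7x7.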

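The operation worth a closer look here is RoI pooling. One may wonder: RoI pooling takes the feature map, of size $60 \times 40$, as input, while the RoIs are the 2000 boxes output by the RPN, expressed in coordinates of the full image. How do the two fit together? RoIPool handles this through spatial_scale (1/16 here): it scales each RoI from image coordinates down to feature-map coordinates before pooling. During actual training, VGG16RoIHead is called as follows: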


# proposal_target_creator = ProposalTargetCreator()
sample_roi, gt_roi_loc, gt_roi_label = self.proposal_target_creator(
            roi,  # RoIs output by the RPN
            at.tonumpy(bbox),
            at.tonumpy(label),
            self.loc_normalize_mean,
            self.loc_normalize_std
            )

sample_roi_index = torch.zeros(len(sample_roi))

roi_cls_loc, roi_score = self.faster_rcnn.head(
            features, # feature map output by the extractor
            sample_roi,
            sample_roi_index
            )

In other words, the RoIs fed to the head do not come straight from the RPN; they are produced by the ProposalTargetCreator class, which is described next.

(1) ProposalTargetCreator

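As shown above, proposal_target_creator takes the RoIs output by the RPN as input, so let us look at what the ProposalTargetCreator class does. The class defines __call__, which lets its instances be called like functions.

Purpose: pick 128 positive and negative samples out of the 2000 RoIs and assign a ground truth to each of them.

Input: the 2000 RoIs output by the RPN; all ground-truth bboxes in one batch (one image), shape (R, 4); the label of each bbox, shape (R,) (for VOC2007, 20 classes numbered 0-19).

Output: 128 sample RoIs, shape (128, 4); 128 gt_roi_loc, shape (128, 4); 128 gt_roi_label, shape (128,).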

class ProposalTargetCreator(object):
    def __init__(self, n_sample=128,
                 pos_ratio=0.25, pos_iou_thresh=0.5,
                 neg_iou_thresh_hi=0.5, neg_iou_thresh_lo=0.0):
        self.n_sample = n_sample
        self.pos_ratio = pos_ratio
        self.pos_iou_thresh = pos_iou_thresh
        self.neg_iou_thresh_hi = neg_iou_thresh_hi
        self.neg_iou_thresh_lo = neg_iou_thresh_lo  # NOTE:default 0.1 in py-faster-rcnn

    def __call__(self, roi, bbox, label,
                 loc_normalize_mean=(0., 0., 0., 0.),
                 loc_normalize_std=(0.1, 0.1, 0.2, 0.2)):
        n_bbox, _ = bbox.shape
        # Concatenate the 2000 RoIs with the m ground-truth bboxes into a new roi array of shape (2000 + m, 4).
        roi = np.concatenate((roi, bbox), axis=0)
        # n_sample = 128 and pos_ratio = 0.25, so at most round(128 * 0.25) = 32 positives per image.
        pos_roi_per_image = np.round(self.n_sample * self.pos_ratio)
        # IoU between every RoI and every bbox.
        iou = bbox_iou(roi, bbox)
        # Row-wise argmax: for each RoI, the index of the bbox it overlaps most.
        gt_assignment = iou.argmax(axis=1)
        # Row-wise max: each RoI's IoU with its best-matching bbox.
        max_iou = iou.max(axis=1)

        # Offset range of classes from [0, n_fg_class - 1] to [1, n_fg_class].
        # The label with value 0 is the background.
        gt_roi_label = label[gt_assignment] + 1  # shift class ids from 0-19 to 1-20

        # Select foreground RoIs as those with >= pos_iou_thresh IoU.
        # Positives are chosen by max IoU; pos_iou_thresh = 0.5.
        pos_index = np.where(max_iou >= self.pos_iou_thresh)[0]
        # Number of positives to keep: the smaller of 32 and the number of RoIs that pass the threshold.
        pos_roi_per_this_image = int(min(pos_roi_per_image, pos_index.size))
        if pos_index.size > 0:
            pos_index = np.random.choice(
                pos_index, size=pos_roi_per_this_image, replace=False)  # too many candidates: drop some at random

        # Select background RoIs as those within
        # [neg_iou_thresh_lo, neg_iou_thresh_hi).
        neg_index = np.where((max_iou < self.neg_iou_thresh_hi) &
                             (max_iou >= self.neg_iou_thresh_lo))[0]  # neg_iou_thresh_hi = 0.5, neg_iou_thresh_lo = 0.0
        # Number of negatives to keep: the smaller of (128 - number of positives kept)
        # and the number of RoIs whose max IoU lies in [0.0, 0.5).
        neg_roi_per_this_image = self.n_sample - pos_roi_per_this_image
        neg_roi_per_this_image = int(min(neg_roi_per_this_image,
                                         neg_index.size))
        if neg_index.size > 0:
            neg_index = np.random.choice(
                neg_index, size=neg_roi_per_this_image, replace=False)  # too many candidates: drop some at random

        # The indices that we're selecting (both positive and negative).
        keep_index = np.append(pos_index, neg_index)
        gt_roi_label = gt_roi_label[keep_index]
        # Negative samples get label 0.
        gt_roi_label[pos_roi_per_this_image:] = 0  # negative labels --> 0
        sample_roi = roi[keep_index]

        # The (128, 4) sample_roi can now be passed to the RoIHead for classification and regression.
        # The RoIHead takes sample_roi plus the feature map as input and outputs predictions for
        # classification (21 classes) and regression (a further refinement of the bbox). The ground
        # truth for those two outputs is the gt_roi_label and gt_roi_loc returned by this class.

        # Compute offsets and scales to match sampled RoIs to the GTs.
        # Regression ground truth for the 128 samples.
        gt_roi_loc = bbox2loc(sample_roi, bbox[gt_assignment[keep_index]])
        # ProposalTargetCreator is the first place the real 21-class labels are used. It also
        # normalizes loc with the given mean and std, so predictions have to be de-normalized
        # (multiply by std, add mean) at inference time.
        gt_roi_loc = ((gt_roi_loc - np.array(loc_normalize_mean, np.float32)
                       ) / np.array(loc_normalize_std, np.float32))

        return sample_roi, gt_roi_loc, gt_roi_label

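The 2000 RoIs output by the RPN are the input to ProposalTargetCreator, together with the ground truth of every bbox and label in the image. If the input image contains 5 objects, there are 5 bboxes and 5 labels, so the three inputs have shapes (2000, 4), (5, 4) and (5,). The walkthrough below uses this example, R = 5.

The code first concatenates the 2000 RoIs and the 5 bboxes into a new roi array of shape (2005, 4). From these 2005 candidates we only need to pick 128 RoIs as training samples for Faster-RCNN. It calls bbox_iou to compute the IoU matrix between roi and bbox, of shape (2005, 5), then records the maximum of each row and its index: for each of the 2005 RoIs, the bbox it overlaps most determines which label that RoI is assigned. Next, 128 RoIs are selected and their indices recorded, positives first (at most 32) and negatives after (96 when all 32 positive slots are filled). Indexing with these 128 values, keep_index, gives 128 sample RoIs and 128 gt_label values; passing sample_roi and the bbox each sample is assigned to through bbox2loc gives 128 gt_loc values.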
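The selection step can be replayed without NumPy to see that "32 positives, 96 negatives" is an upper bound on positives rather than a fixed split. The sketch below is an illustration under assumptions: `sample_rois` is a hypothetical stand-in for the sampling part of `__call__`, and the IoU values, labels and assignments are invented.

```python
import random


def sample_rois(max_iou, gt_assignment, label, n_sample=128, pos_ratio=0.25,
                pos_iou_thresh=0.5, neg_hi=0.5, neg_lo=0.0, seed=0):
    """Hypothetical re-implementation of the sampling logic, for illustration."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    pos = [i for i, v in enumerate(max_iou) if v >= pos_iou_thresh]
    neg = [i for i, v in enumerate(max_iou) if neg_lo <= v < neg_hi]
    n_pos = min(int(round(n_sample * pos_ratio)), len(pos))
    n_neg = min(n_sample - n_pos, len(neg))
    # Positives first, negatives after, as with np.append(pos_index, neg_index).
    keep = rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
    # Foreground classes shift from 0..19 to 1..20; negatives become 0.
    labels = [label[gt_assignment[i]] + 1 if k < n_pos else 0
              for k, i in enumerate(keep)]
    return keep, labels, n_pos


label = [3, 7, 7, 0, 19]                       # invented labels for 5 objects
gt_assignment = [i % 5 for i in range(2005)]   # invented best-match indices
plenty = [0.7] * 50 + [0.1] * 1950 + [1.0] * 5  # 55 candidates pass the 0.5 threshold
scarce = [0.7] * 5 + [0.1] * 1995 + [1.0] * 5   # only 10 pass

keep_a, labels_a, n_pos_a = sample_rois(plenty, gt_assignment, label)
keep_b, labels_b, n_pos_b = sample_rois(scarce, gt_assignment, label)
print(n_pos_a, len(keep_a) - n_pos_a)  # 32 96
print(n_pos_b, len(keep_b) - n_pos_b)  # 10 118
```

Because the ground-truth boxes themselves are appended to the candidates (IoU 1.0 with themselves), there is always at least one positive per object.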

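This class also calls the bbox_iou and bbox2loc functions, so let us look at how they are implemented:

bbox_iou computes the intersection over union (IoU): given two sets of bboxes with shapes (N, 4) and (K, 4), it returns an array of shape (N, K) holding the IoU of every pair formed from the two sets.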

def bbox_iou(bbox_a, bbox_b):
    if bbox_a.shape[1] != 4 or bbox_b.shape[1] != 4:
        raise IndexError
    
    # top left
    tl = np.maximum(bbox_a[:, None, :2], bbox_b[:, :2])
    # bottom right
    br = np.minimum(bbox_a[:, None, 2:], bbox_b[:, 2:])

    area_i = np.prod(br - tl, axis=2) * (tl < br).all(axis=2)
    area_a = np.prod(bbox_a[:, 2:] - bbox_a[:, :2], axis=1)
    area_b = np.prod(bbox_b[:, 2:] - bbox_b[:, :2], axis=1)

    return area_i / (area_a[:, None] + area_b - area_i)
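The broadcasting in `bbox_iou` computes all N x K pairs at once. A scalar version makes the arithmetic easier to check by hand. This is a sketch with hypothetical helpers (`iou_pair`, `iou_matrix`) and invented boxes, in the same (y_min, x_min, y_max, x_max) order.

```python
def iou_pair(a, b):
    """IoU of two boxes given as (y_min, x_min, y_max, x_max). Illustrative helper."""
    top, left = max(a[0], b[0]), max(a[1], b[1])
    bottom, right = min(a[2], b[2]), min(a[3], b[3])
    # Zero intersection if the boxes do not overlap on either axis.
    inter = max(bottom - top, 0.0) * max(right - left, 0.0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def iou_matrix(boxes_a, boxes_b):
    """(N, K) nested list of pairwise IoUs, the loop equivalent of broadcasting."""
    return [[iou_pair(a, b) for b in boxes_b] for a in boxes_a]


rois = [(0, 0, 10, 10), (20, 20, 30, 30)]
gts = [(5, 5, 15, 15), (0, 0, 10, 10), (20, 20, 30, 40)]
iou = iou_matrix(rois, gts)
# Row 0: 25 / (100 + 100 - 25) = 1/7, then 1.0, then 0.0
# Row 1: 0.0, 0.0, then 100 / (100 + 200 - 100) = 0.5
gt_assignment = [row.index(max(row)) for row in iou]  # row-wise argmax
print(gt_assignment)  # [1, 2]
```

The row-wise argmax and max here play the role of `gt_assignment` and `max_iou` in ProposalTargetCreator.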

The 128 RoIs obtained above are used to train the network, which means each of them needs a corresponding ground truth. According to the formula in the R-CNN paper:

$w_{*} = \argmin_{\hat{w}_*}\sum_i^N(t^i_*-\hat{w}_*^T\phi_5(P^i))^2+\lambda||\hat{w}_*||^2$

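In other words, we need the corresponding $t_*^i$ in order to train the network. bbox2loc takes the 128 sample_roi together with their corresponding ground-truth bounding boxes and computes these targets: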

gt_roi_loc = bbox2loc(sample_roi, bbox[gt_assignment[keep_index]])

The formulas are:

$t_x = (G_x - P_x)/P_w \\ t_y = (G_y - P_y)/P_h \\ t_w = \log(G_w/P_w) \\ t_h = \log(G_h/P_h)$

def bbox2loc(src_bbox, dst_bbox):
    height = src_bbox[:, 2] - src_bbox[:, 0]
    width  = src_bbox[:, 3] - src_bbox[:, 1]
    ctr_y  = src_bbox[:, 0] + 0.5 * height
    ctr_x  = src_bbox[:, 1] + 0.5 * width

    base_height = dst_bbox[:, 2] - dst_bbox[:, 0]
    base_width  = dst_bbox[:, 3] - dst_bbox[:, 1]

    base_ctr_y  = dst_bbox[:, 0] + 0.5 * base_height
    base_ctr_x  = dst_bbox[:, 1] + 0.5 * base_width

    eps = np.finfo(height.dtype).eps
    height = np.maximum(height, eps)
    width  = np.maximum(width, eps)

    dy = (base_ctr_y - ctr_y) / height
    dx = (base_ctr_x - ctr_x) / width
    dh = np.log(base_height / height)
    dw = np.log(base_width / width)

    loc = np.vstack((dy, dx, dh, dw)).transpose()
    return loc
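A worked number check helps confirm the formulas and the normalization step. The sketch below uses hypothetical scalar helpers, `encode` (the same arithmetic as bbox2loc for one box) and `decode` (its inverse, written here as an assumption about how predictions would be turned back into boxes), with an invented RoI and ground-truth box. Note that the code orders the targets as (dy, dx, dh, dw), while the formulas above list x before y.

```python
import math


def encode(src, dst):
    """Scalar bbox2loc: src and dst are (y_min, x_min, y_max, x_max)."""
    h, w = src[2] - src[0], src[3] - src[1]
    cy, cx = src[0] + 0.5 * h, src[1] + 0.5 * w
    gh, gw = dst[2] - dst[0], dst[3] - dst[1]
    gcy, gcx = dst[0] + 0.5 * gh, dst[1] + 0.5 * gw
    return ((gcy - cy) / h, (gcx - cx) / w, math.log(gh / h), math.log(gw / w))


def decode(src, loc):
    """Inverse of encode (illustrative): apply (dy, dx, dh, dw) to src."""
    h, w = src[2] - src[0], src[3] - src[1]
    cy, cx = src[0] + 0.5 * h, src[1] + 0.5 * w
    dy, dx, dh, dw = loc
    ncy, ncx = cy + dy * h, cx + dx * w
    nh, nw = h * math.exp(dh), w * math.exp(dw)
    return (ncy - 0.5 * nh, ncx - 0.5 * nw, ncy + 0.5 * nh, ncx + 0.5 * nw)


MEAN = (0.0, 0.0, 0.0, 0.0)   # loc_normalize_mean defaults from __call__
STD = (0.1, 0.1, 0.2, 0.2)    # loc_normalize_std defaults from __call__

roi = (10.0, 10.0, 50.0, 90.0)    # invented sample RoI: 40 tall, 80 wide
gt = (20.0, 30.0, 100.0, 110.0)   # invented ground truth: 80 tall, 80 wide

loc = encode(roi, gt)                                       # dy=0.75, dx=0.25, dh=log 2, dw=0
target = tuple((l - m) / s for l, m, s in zip(loc, MEAN, STD))  # normalized training target
restored = tuple(t * s + m for t, m, s in zip(target, MEAN, STD))  # undo the normalization
box = decode(roi, restored)                                 # recovers gt up to rounding
```

Dividing by a std of 0.1 or 0.2 simply enlarges the targets (0.75 becomes 7.5), and the same mean and std must be applied in reverse before decoding predictions.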

Conclusion

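This part covered VGG16RoIHead. The 2000 RoIs output by the RPN do not go straight into the head for training. Instead, up to 32 positives are selected first, and negatives fill the rest (96 when all 32 positive slots are taken), for 128 samples in total. ProposalTargetCreator implements this selection, and it also assigns a ground truth to each of the 128 samples.

The 128 sampled RoIs are then turned into equally sized feature maps by RoI pooling and fed into two branches, one for regression and one for classification.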