An object detection paper from Zhejiang University, CUHK, and SenseTime. Paper link
Overview
The paper presents three main contributions:
- IoU-balanced sampling — reducing the imbalance at the sample level, making the selected samples more representative;
- balanced feature pyramid — reducing the imbalance at the feature level, integrating and exploiting multi-scale features more effectively;
- balanced L1 loss — reducing the imbalance at the objective level, a better-designed loss that guides the whole training toward better convergence.
1. IoU-balanced Sampling
In anchor-based object detection, the network head outputs a huge number of anchors (a.k.a. default boxes). Each box has its own IoU with the ground truth it is matched to, and most of these IoUs are very small (negatives vastly outnumber positives). Feeding every positive and negative sample directly into the loss would overfit the model toward the background, so much prior work has addressed this positive/negative imbalance (e.g., the RPN in Faster R-CNN samples randomly to keep a 1:1 positive/negative ratio; SSD uses hard negative mining; OHEM; and so on). The authors argue, however, that these methods each have drawbacks: OHEM is not robust to noisy datasets, and focal loss is not well suited to two-stage pipelines.
So how can we make the model focus on hard examples while avoiding these problems?
The authors' idea: since boxes are selected by sampling, perhaps some regularity can be found in the sampling itself. They therefore collected the following statistics:
The statistics above show how random samples and hard negatives are distributed over IoU.
They reveal that over 60% of hard negatives have an IoU above 0.05 with the ground truth, whereas only 30% of randomly sampled negatives do. In other words, hard examples are not uniformly distributed over IoU, so the two distributions do not match! Given that the model should focus more on hard examples, how can this regularity be put to use?
The authors' approach: replace random sampling with stratified sampling, defined by the following quantities:
- $K$: the number of intervals the negative-sampling IoU range is split into (not necessarily uniform);
- $N$: the total number of negatives to sample;
- $M_k$: the number of sampling candidates in interval $k$;
- $p_k$: the resulting sampling probability for each interval, $p_k = \frac{N}{K} \cdot \frac{1}{M_k}, \ k \in [0, K)$.
As the formula shows, the last two quantities are inversely related: the fewer candidates an interval holds, the more likely each of them is to be sampled. On the surface this simply raises the proportion of samples drawn from high-IoU intervals, but the deeper effect is to impose a prior distribution that matches the sampling distribution to that of the hard examples (note that this prior is effectively a hyper-parameter and depends on the dataset).
There is no guarantee that every sampled box is hard, but undeniably this raises the proportion of hard samples as much as possible without adding any computation.
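To make this concrete, here is a minimal, self-contained sketch (the function name and the toy data are mine, not from the paper or mmdetection) that computes the per-bin quota $N/K$ and draws negatives bin by bin:

import numpy as np

def iou_balanced_sample(ious, num_neg, num_bins=3, max_iou=0.5):
    """Stratified negative sampling over IoU bins (conceptual sketch)."""
    rng = np.random.default_rng(0)
    edges = np.linspace(0.0, max_iou, num_bins + 1)   # K bin boundaries
    per_bin = num_neg // num_bins                     # quota N / K per bin
    picked = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = np.where((ious >= lo) & (ious < hi))[0]  # the M_k candidates
        if len(idx) > per_bin:                         # p_k = (N/K) / M_k
            idx = rng.choice(idx, per_bin, replace=False)
        picked.append(idx)  # sparse bins contribute everything they have
    picked = np.concatenate(picked)
    if len(picked) < num_neg:  # top up from the not-yet-sampled pool
        rest = np.setdiff1d(np.arange(len(ious)), picked)
        extra = rng.choice(rest, num_neg - len(picked), replace=False)
        picked = np.concatenate([picked, extra])
    return picked

ious = np.random.default_rng(1).random(2000)**3 * 0.5  # skewed toward low IoU
inds = iou_balanced_sample(ious, num_neg=128)
print(np.histogram(ious[inds], bins=3, range=(0, 0.5))[0])

With IoUs skewed toward zero, as negatives typically are, the printed histogram comes out roughly even across the three bins, unlike plain random sampling which would be dominated by the lowest bin.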
mmdetection code
The mmdetection config defines the training-stage sampling as follows:
train_cfg=dict(
    rpn=dict(sampler=dict(neg_pos_ub=5), allowed_border=-1),
    rcnn=dict(
        sampler=dict(
            _delete_=True,
            type='CombinedSampler',
            num=512,
            pos_fraction=0.25,
            add_gt_as_proposals=True,
            pos_sampler=dict(type='InstanceBalancedPosSampler'),
            neg_sampler=dict(
                type='IoUBalancedNegSampler',
                floor_thr=-1,
                floor_fraction=0,
                num_bins=3))))
From the config we can see that the IoU-balanced sampling proposed in the paper sits in the R-CNN head and targets the sampling of negative samples.
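If you want to see the merged result, a quick inspection sketch (the config path follows mmdetection's repo layout; in newer versions train_cfg lives under cfg.model instead, so treat the attribute path as an assumption):

from mmcv import Config

# run from the mmdetection repo root; path assumed from the repo layout
cfg = Config.fromfile(
    'configs/libra_rcnn/libra_faster_rcnn_r50_fpn_1x_coco.py')
print(cfg.train_cfg.rcnn.sampler)  # the merged CombinedSampler dict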
The following code defines the IoUBalancedNegSampler class.
- It involves two methods, sample_via_interval and _sample_neg, the latter calling the former. _sample_neg takes an argument of class AssignResult (see the link for its definition), which stores the matching results between predicted boxes and ground-truth boxes.
- The implementation deviates slightly from the paper. The code additionally defines an IoU threshold floor_thr: a floor_fraction of the needed boxes is sampled randomly from those below floor_thr, and only boxes above floor_thr go through the IoU-balanced sampling proposed in the paper.
import numpy as np
import torch
from ..builder import BBOX_SAMPLERS
from .random_sampler import RandomSampler
@BBOX_SAMPLERS.register_module()
class IoUBalancedNegSampler(RandomSampler):
"""IoU Balanced Sampling.
arXiv: https://arxiv.org/pdf/1904.02701.pdf (CVPR 2019)
Sampling proposals according to their IoU. `floor_fraction` of needed RoIs
are sampled from proposals whose IoU are lower than `floor_thr` randomly.
The others are sampled from proposals whose IoU are higher than
`floor_thr`. These proposals are sampled from some bins evenly, which are
split by `num_bins` via IoU evenly.
Args:
num (int): number of proposals.
pos_fraction (float): fraction of positive proposals.
floor_thr (float): threshold (minimum) IoU for IoU balanced sampling,
set to -1 if all using IoU balanced sampling.
floor_fraction (float): sampling fraction of proposals under floor_thr.
num_bins (int): number of bins in IoU balanced sampling.
"""
def __init__(self,
num,
pos_fraction,
floor_thr=-1,
floor_fraction=0,
num_bins=3,
**kwargs):
super(IoUBalancedNegSampler, self).__init__(num, pos_fraction,
**kwargs)
assert floor_thr >= 0 or floor_thr == -1
assert 0 <= floor_fraction <= 1
assert num_bins >= 1
self.floor_thr = floor_thr
self.floor_fraction = floor_fraction
self.num_bins = num_bins
def sample_via_interval(self, max_overlaps, full_set, num_expected):
"""Sample according to the iou interval.
Args:
            max_overlaps (torch.Tensor): IoU between bounding boxes and ground
                truth boxes, for all candidate boxes.
            full_set (set(int)): A full set of indices of boxes.
            num_expected (int): Number of expected samples.
Returns:
np.ndarray: Indices of samples
"""
        # compute the width of each IoU bin
        max_iou = max_overlaps.max()
        iou_interval = (max_iou - self.floor_thr) / self.num_bins
        per_num_expected = int(num_expected / self.num_bins)
        sampled_inds = []  # proposal indices collected per IoU bin
        for i in range(self.num_bins):  # iterate over the IoU bins
            start_iou = self.floor_thr + i * iou_interval
            end_iou = self.floor_thr + (i + 1) * iou_interval
            tmp_set = set(
                np.where(
                    np.logical_and(max_overlaps >= start_iou,
                                   max_overlaps < end_iou))[0])
            tmp_inds = list(tmp_set & full_set)  # boxes whose IoU falls in this bin
            # if the bin has too many candidates, sample per_num_expected
            # of them; otherwise take all of them
            if len(tmp_inds) > per_num_expected:
                tmp_sampled_set = self.random_choice(tmp_inds,
                                                     per_num_expected)
            else:
                tmp_sampled_set = np.array(tmp_inds, dtype=np.int64)
            sampled_inds.append(tmp_sampled_set)
        sampled_inds = np.concatenate(sampled_inds)
        # if fewer than num_expected proposals were drawn, top up randomly
        # from the boxes that have not been sampled yet
        if len(sampled_inds) < num_expected:
            num_extra = num_expected - len(sampled_inds)
            extra_inds = np.array(list(full_set - set(sampled_inds)))
            if len(extra_inds) > num_extra:
                extra_inds = self.random_choice(extra_inds, num_extra)
            sampled_inds = np.concatenate([sampled_inds, extra_inds])
        return sampled_inds
def _sample_neg(self, assign_result, num_expected, **kwargs):
"""Sample negative boxes.
Args:
assign_result (:obj:`AssignResult`): The assigned results of boxes.
num_expected (int): The number of expected negative samples
Returns:
Tensor or ndarray: sampled indices.
"""
        neg_inds = torch.nonzero(assign_result.gt_inds == 0, as_tuple=False)
        # AssignResult marks boxes with gt_inds == 0 as unmatched, i.e.
        # negative samples
        if neg_inds.numel() != 0:
            neg_inds = neg_inds.squeeze(1)
        if len(neg_inds) <= num_expected:
            return neg_inds
        max_overlaps = assign_result.max_overlaps.cpu().numpy()
        # balanced sampling for negative samples
        neg_set = set(neg_inds.cpu().numpy())
        if self.floor_thr > 0:
            # boxes below floor_thr: random sampling
            # (with a single array argument, np.where returns the nonzero
            # indices, like torch.nonzero)
            floor_set = set(
                np.where(
                    np.logical_and(max_overlaps >= 0,
                                   max_overlaps < self.floor_thr))[0])
            # boxes above floor_thr: IoU-balanced sampling
            iou_sampling_set = set(
                np.where(max_overlaps >= self.floor_thr)[0])
        elif self.floor_thr == 0:
            floor_set = set(np.where(max_overlaps == 0)[0])
            iou_sampling_set = set(
                np.where(max_overlaps > self.floor_thr)[0])
        else:
            floor_set = set()
            iou_sampling_set = set(
                np.where(max_overlaps > self.floor_thr)[0])
            # for sampling interval calculation
            self.floor_thr = 0
        floor_neg_inds = list(floor_set & neg_set)
        iou_sampling_neg_inds = list(iou_sampling_set & neg_set)
        num_expected_iou_sampling = int(num_expected *
                                        (1 - self.floor_fraction))
        # Part 1: IoU-balanced sampling
        # if there are more candidates than needed for IoU-balanced sampling
        if len(iou_sampling_neg_inds) > num_expected_iou_sampling:
            if self.num_bins >= 2:
                iou_sampled_inds = self.sample_via_interval(
                    max_overlaps, set(iou_sampling_neg_inds),
                    num_expected_iou_sampling)
            else:  # with a single bin, fall back to plain random sampling
                iou_sampled_inds = self.random_choice(
                    iou_sampling_neg_inds, num_expected_iou_sampling)
        # not enough candidates for IoU-balanced sampling: take all of them
        else:
            iou_sampled_inds = np.array(
                iou_sampling_neg_inds, dtype=np.int64)
        # Part 2: floor negative sampling
        # whatever quota remains after IoU-balanced sampling goes to
        # floor negative sampling
        num_expected_floor = num_expected - len(iou_sampled_inds)
        if len(floor_neg_inds) > num_expected_floor:
            sampled_floor_inds = self.random_choice(
                floor_neg_inds, num_expected_floor)
        else:
            sampled_floor_inds = np.array(floor_neg_inds, dtype=np.int64)
        # concatenate the results of the two sampling branches
        sampled_inds = np.concatenate((sampled_floor_inds, iou_sampled_inds))
        return sampled_inds
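For reference, here is a sketch of driving the sampler directly; the import paths and the AssignResult constructor are my assumptions about mmdetection 2.x and should be checked against your installed version:

import torch
# assumed mmdetection 2.x import paths; verify against your install
from mmdet.core.bbox import AssignResult
from mmdet.core.bbox.samplers import IoUBalancedNegSampler

num_boxes = 1000
gt_inds = torch.zeros(num_boxes, dtype=torch.long)  # gt_inds == 0: negatives
max_overlaps = torch.rand(num_boxes) * 0.5          # IoU with best-matched gt
assign_result = AssignResult(
    num_gts=5, gt_inds=gt_inds, max_overlaps=max_overlaps, labels=None)

sampler = IoUBalancedNegSampler(
    num=512, pos_fraction=0.25, floor_thr=-1, floor_fraction=0, num_bins=3)
neg_inds = sampler._sample_neg(assign_result, num_expected=384)
print(len(neg_inds))  # 384 negatives, stratified over the IoU bins

Note that with floor_thr=-1 the whole negative pool goes through interval sampling, and the code internally resets floor_thr to 0 for the interval calculation, as seen in the else branch above.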
2. Balanced Feature Pyramid
Earlier feature-fusion work such as FPN, PANet, and ZigZagNet relies on top-down and bottom-up pathways and mostly attends to adjacent resolutions; the semantic information of non-adjacent levels is diluted once per fusion step, so the authors argue this process is itself imbalanced.
Idea: starting from FPN (on Faster R-CNN), rescale the features of the four levels to a common resolution, integrate them by simple averaging ($C = \frac{1}{L}\sum_{l} C_l$), refine the result, and finally add it back to the original features to strengthen them.
The key refine operation uses Kaiming He et al.'s non-local block.
The non-local block, which borrows from classical non-local-means image denoising, aggregates global information at modest computational cost, and since its output has the same shape as its input it can be dropped into all sorts of baselines (many works already do this); for space reasons it is not elaborated here.
Libra R-CNN exploits the respective strengths of FPN and non-local to address this imbalance nicely (a bit reminiscent of BN, isn't it?).
non_local implementation
This part of the code follows mmdetection and is quite clear; focus mainly on the computation. Only excerpts are shown. I set self.g, self.theta, self.phi, and self.conv_out, defined in __init__, to Sequentials of a conv2d and a GN layer.
- The embedded_gaussian formula: the pairwise weight is $\mathrm{softmax}\big(\theta(x)^{T}\phi(x)\big)$, i.e. $f(x_i, x_j)=e^{\theta(x_i)^{T}\phi(x_j)}$ normalized over $j$:
def embedded_gaussian(self, theta_x, phi_x):
    # [N, HxW, C] x [N, C, HxW] -> pairwise_weight: [N, HxW, HxW]
    pairwise_weight = torch.matmul(theta_x, phi_x)
    if self.use_scale:
        # theta_x.shape[-1] is `self.inter_channels`;
        # scale by sqrt(inter_channels), as in mmcv's implementation
        pairwise_weight /= theta_x.shape[-1]**0.5
    pairwise_weight = pairwise_weight.softmax(dim=-1)
    return pairwise_weight
def forward(self, x):
n, _, h, w = x.shape
# g_x: [N, HxW, C]
g_x = self.g(x).view(n, self.inter_channels, -1)
g_x = g_x.permute(0, 2, 1)
# theta_x: [N, HxW, C]
theta_x = self.theta(x).view(n, self.inter_channels, -1)
theta_x = theta_x.permute(0, 2, 1)
# phi_x: [N, C, HxW]
phi_x = self.phi(x).view(n, self.inter_channels, -1)
# pairwise_weight: [N, HxW, HxW]
pairwise_weight = self.embedded_gaussian(theta_x, phi_x)
# y: [N, HxW, C]
y = torch.matmul(pairwise_weight, g_x)
# y: [N, C, H, W]
y = y.permute(0, 2, 1).reshape(n, self.inter_channels, h, w)
output = x + self.conv_out(y)
return output
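The excerpt above omits __init__. A minimal skeleton that the two methods assume might look like this (a sketch following the post's description of conv2d + GN Sequentials; mmcv's NonLocal2d is the reference implementation, and the group count of 32 is my assumption):

import torch.nn as nn

class NonLocal2d(nn.Module):
    """Minimal skeleton for the embedded-Gaussian non-local block above."""

    def __init__(self, in_channels, reduction=2, use_scale=True):
        super().__init__()
        self.inter_channels = in_channels // reduction
        self.use_scale = use_scale

        def conv_gn(cin, cout):
            # 1x1 conv followed by GroupNorm, as described in the post
            # (32 groups is an assumption; pick any divisor of cout)
            return nn.Sequential(nn.Conv2d(cin, cout, kernel_size=1),
                                 nn.GroupNorm(32, cout))

        self.g = conv_gn(in_channels, self.inter_channels)
        self.theta = conv_gn(in_channels, self.inter_channels)
        self.phi = conv_gn(in_channels, self.inter_channels)
        self.conv_out = conv_gn(self.inter_channels, in_channels)

    # paste embedded_gaussian and forward from above to complete the module

With the two methods pasted in, NonLocal2d(256) maps an [N, 256, H, W] tensor to an output of the same shape, which is what lets it slot into the BFP below.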
BFP implementation
This is the series of operations that integrates contextual information across the FPN levels. Only the forward function is shown.
def forward(self, inputs):
    assert len(inputs) == self.num_levels
    # inputs: {C2, C3, C4, C5}, each with 256 channels

    # step 1: gather the four feature maps, resizing them to the same
    # resolution. Note PyTorch version differences: F.upsample is
    # deprecated in favor of F.interpolate, so swap the call if needed.
    feats = []
    gather_size = inputs[self.refine_level].size()[2:]
    for i in range(self.num_levels):
        input_size = inputs[i].size()[2:]
        if input_size[0] > gather_size[0]:     # i < self.refine_level
            gathered = F.adaptive_max_pool2d(
                inputs[i], output_size=gather_size)
        elif input_size[0] == gather_size[0]:  # i == self.refine_level
            gathered = inputs[i]
        else:
            gathered = F.interpolate(
                inputs[i], size=gather_size, mode='bilinear')
        feats.append(gathered)
    bsf = sum(feats) / len(feats)  # integrate: average over levels

    # step 2: refine with non-local (self.refine is the non-local block)
    bsf = self.refine(bsf)

    # step 3: resize the refined feature back to each original resolution
    # and add it onto the corresponding input level
    outs = []
    for i in range(self.num_levels):
        out_size = inputs[i].size()[2:]
        if out_size[0] > gather_size[0]:       # i < self.refine_level
            residual = F.interpolate(bsf, size=out_size, mode='bilinear')
        elif out_size[0] == gather_size[0]:    # i == self.refine_level
            residual = bsf
        else:
            residual = F.adaptive_max_pool2d(bsf, output_size=out_size)
        outs.append(residual + inputs[i])
    return tuple(outs)
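As a self-contained shape check of the rescale / integrate / strengthen steps, here is a toy version (all names are mine; the refine step is stubbed with nn.Identity in place of the non-local block):

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBFP(nn.Module):
    """Shape-only sketch of Balanced Feature Pyramid (refine stubbed out)."""

    def __init__(self, num_levels=4, refine_level=1):
        super().__init__()
        self.num_levels = num_levels
        self.refine_level = refine_level
        self.refine = nn.Identity()  # stand-in for the non-local block

    def forward(self, inputs):
        gather_size = inputs[self.refine_level].size()[2:]
        feats = []
        for i in range(self.num_levels):
            if i < self.refine_level:    # higher resolution -> pool down
                feats.append(F.adaptive_max_pool2d(inputs[i], gather_size))
            elif i == self.refine_level:
                feats.append(inputs[i])
            else:                        # lower resolution -> upsample
                feats.append(F.interpolate(inputs[i], size=gather_size,
                                           mode='nearest'))
        bsf = self.refine(sum(feats) / len(feats))  # integrate + refine
        outs = []
        for i in range(self.num_levels):
            out_size = inputs[i].size()[2:]
            if i < self.refine_level:
                residual = F.interpolate(bsf, size=out_size, mode='nearest')
            elif i == self.refine_level:
                residual = bsf
            else:
                residual = F.adaptive_max_pool2d(bsf, out_size)
            outs.append(residual + inputs[i])  # strengthen original levels
        return tuple(outs)

feats = [torch.randn(2, 256, s, s) for s in (64, 32, 16, 8)]  # {C2..C5}
outs = TinyBFP()(feats)
assert all(o.shape == f.shape for o, f in zip(outs, feats))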
3. Balanced L1 Loss
Object detection is essentially multi-task learning (classification + regression), so how to balance the weights of the two tasks is worth exploring. The usual practice is to tune the inter-task weights by hand (e.g., multiply the regression loss by a coefficient), but because regression targets are unbounded, simply scaling up the regression loss makes the model more sensitive to outliers (which behave like noise). Here, samples with a loss of 1.0 or more are called outliers, and the rest inliers. This is not mere speculation: statistics show that outliers contribute more than 70% of the gradient, while the numerous inliers contribute only 30%. The authors instead increase the gradient contributed by inliers from the loss-function side, achieving more balanced training across classification, overall localization, and accurate localization. Concretely, the gradient of the original smooth L1 loss is replaced with:
$$\frac{\partial L_b}{\partial x}=
\begin{cases}
\alpha \ln(b|x|+1), & \text{if } |x|<1\\
\gamma, & \text{otherwise}
\end{cases}$$
where $b$ is tied to the other parameters by $\alpha \ln(b+1)=\gamma$ (i.e. $b=e^{\gamma/\alpha}-1$) so that the gradient is continuous at $|x|=1$; $\alpha$ raises the gradient of inliers, while $\gamma$ adjusts the gradient's upper bound so as to balance the gradient contributed by each task.
The gradient and loss curves are shown in the figure:
As α decreases, the gradient of inliers is clearly boosted.
Balanced L1 Loss implementation
import numpy as np
import torch

def _balanced_l1_loss(bbox_pred, bbox_targets, alpha, gamma):
    '''bbox_pred, bbox_targets: [batch, num_boxes, 4]'''
    diff = torch.abs(bbox_pred - bbox_targets)
    b = np.e**(gamma / alpha) - 1  # ensures alpha * log(b + 1) == gamma
    loss_box = torch.where(
        diff < 1,
        alpha / b * (b * diff + 1) * torch.log(b * diff + 1) - alpha * diff,
        gamma * diff + gamma / b - alpha)
    return loss_box.mean()
Usage of torch.where: the first argument is the condition, the second is the value taken where the condition holds, and the third is the value taken where it does not.
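A quick sanity check using the _balanced_l1_loss defined above (α = 0.5 and γ = 1.5 are the defaults reported in the paper): the two branches of the loss meet continuously at |x| = 1, which is exactly what the choice b = e^(γ/α) − 1 guarantees.

import numpy as np
import torch

alpha, gamma = 0.5, 1.5  # paper defaults
pred, target = torch.randn(2, 8, 4), torch.randn(2, 8, 4)
print(_balanced_l1_loss(pred, target, alpha, gamma))  # scalar loss

# continuity at |x| = 1: both branches evaluate to the same value
b = np.e ** (gamma / alpha) - 1
x = torch.tensor(1.0)
inlier = alpha / b * (b * x + 1) * torch.log(b * x + 1) - alpha * x
outlier = gamma * x + gamma / b - alpha
assert torch.allclose(inlier, outlier)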
Summary
All three techniques are clean, especially the sampling and the loss: they deliver solid gains without even changing the network structure. The underlying idea is the same throughout: use experiments or statistics to expose a latent imbalance, then act on the distribution of samples, losses, or features accordingly. Very practical.