4. RetinaNet Paper Summary

Latest recommended article published on 2024-08-05 19:57:46

一个热爱学习的深度渣渣

Category: Object Detection. Tags: deep learning, pytorch, machine learning

Article link: https://blog.csdn.net/weixin_40620310/article/details/120822207

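Paper Overview

Paper title: Focal Loss for Dense Object Detection;

Question raised: why is the accuracy of one-stage detectors lower?

The main reason: in an image, objects occupy far less area than the background, so positive and negative samples are imbalanced and negatives are far too numerous;

This causes two problems:

1. Taken together, the negatives are so numerous that their summed loss dominates the loss function, which hinders convergence;

2. Taken individually, an easy negative has a small loss and therefore a small gradient in backpropagation, so it contributes little to updating the parameters; what training needs are samples with a large loss that move the parameters substantially, i.e. hard examples;

How Faster R-CNN (two-stage) handles this:

1. Positive and negative samples are selected by IoU and sampled at a fixed ratio, e.g. 1:3, which prevents negatives from becoming too numerous (the prior-box matching in SSD uses this strategy as well);

2. The RPN uses the foreground score to filter out a large number of easy samples that have a high background probability;

How RetinaNet handles this:

It adopts the Focal Loss function to address the severe imbalance between positive/negative and easy/hard samples in one-stage detectors; this is the core of RetinaNet;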

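Paper Summary

Abstract

1. The main reason one-stage detectors are less accurate is sample imbalance;

2. Focal Loss is proposed to address the excess of negative samples;

3. A simple and efficient network, RetinaNet, is trained to verify the effectiveness of Focal Loss;

4. Experiments show that RetinaNet matches the speed of earlier one-stage detectors while surpassing the accuracy of the then state-of-the-art two-stage detectors;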

Focal Loss Explained

First, recall the commonly used classification loss, the cross-entropy loss:

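[Formula image not preserved; the standard binary cross-entropy, with $p$ the predicted probability of the class $y=1$, is]
$\text{CE}(p, y) = \begin{cases} -\log(p) & \text{if } y = 1 \\ -\log(1 - p) & \text{otherwise} \end{cases}$
Writing $p_t = p$ when $y = 1$ and $p_t = 1 - p$ otherwise, this is $\text{CE}(p_t) = -\log(p_t)$.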

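Hard examples:

Easy examples (those predicted with high confidence) improve the model very little; training should focus on the hard examples;

The Focal Loss formula: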

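[Formula image not preserved; the focal loss as defined in the paper is]
$\text{FL}(p_t) = -(1 - p_t)^{\gamma} \log(p_t)$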

Example: with γ = 2 and p = 0.9, the modulating factor is (1 - 0.9)² = 0.01, so the loss of this sample is reduced by a factor of 100;
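The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the unweighted focal loss (no α term); the function names are made up for illustration, and `p_t` denotes the predicted probability of the true class.

```python
import math


def cross_entropy(p_t):
    """Cross-entropy for a sample whose true class has probability p_t."""
    return -math.log(p_t)


def focal_loss(p_t, gamma=2.0):
    """Cross-entropy scaled by the modulating factor (1 - p_t) ** gamma."""
    return (1.0 - p_t) ** gamma * cross_entropy(p_t)


# An easy sample (0.9), a borderline sample (0.5) and a hard sample (0.1)
for p_t in (0.9, 0.5, 0.1):
    ce = cross_entropy(p_t)
    fl = focal_loss(p_t)
    print(f"p_t={p_t}: CE={ce:.4f}, FL={fl:.4f}, reduction factor={ce / fl:.1f}")
```

The easy sample is down-weighted by about 100, while the hard sample at 0.1 keeps most of its loss (a reduction of only about 1.2).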

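Comparison plot:

[Figure not preserved: loss curves for different values of γ]

As the plot shows, when γ reaches 5 the loss produced by high-confidence samples is essentially zero;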

Final formula and parameter values:

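[Formula image not preserved; the α-balanced form from the paper is]
$\text{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$
with $\alpha = 0.25$ and $\gamma = 2$, the same values used in the code later in this article.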
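To see why this matters for imbalance, here is a small sketch under made-up numbers (1000 easy negatives predicted at 0.01 and one hard positive predicted at 0.3; these values are purely illustrative). It assumes the convention used in the loss code of this article: α for positives and 1 - α for negatives.

```python
import math


def focal_loss_binary(p, y, alpha=0.25, gamma=2.0):
    """Alpha-balanced focal loss for one prediction p = P(y = 1), y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical toy batch: many easy negatives, one hard positive
easy_negatives = [0.01] * 1000
hard_positive = 0.3

ce_neg = sum(-math.log(1.0 - p) for p in easy_negatives)
ce_pos = -math.log(hard_positive)
fl_neg = sum(focal_loss_binary(p, 0) for p in easy_negatives)
fl_pos = focal_loss_binary(hard_positive, 1)

print(f"cross-entropy: negatives={ce_neg:.3f}, positive={ce_pos:.3f}")
print(f"focal loss:    negatives={fl_neg:.5f}, positive={fl_pos:.5f}")
```

With plain cross-entropy the summed loss of the easy negatives outweighs the single positive; with focal loss the ordering flips and the hard positive dominates.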

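Network Architecture

[Figure not preserved: RetinaNet network architecture]

RetinaNet itself is essentially an improved network built on FPN;
The latter part, the detection backend, uses a subnet structure, shown below:

[Figure not preserved: subnet structure]

It consists of a classification subnet and a box subnet; Focal Loss is applied in the classification branch;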

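Conclusions

Key points:

1. Focal Loss;

2. The RetinaNet network architecture;

Contributions:

1. Addresses the imbalance between positive/negative and easy/hard samples;

2. Improves one-stage object detection to the point where it reaches two-stage accuracy;

Open questions:

How can RetinaNet's detection speed be improved, and is it necessary to use all of the anchors?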

Code

1. Build the top-down FPN; the code here implements levels P3-P7:

import torch.nn as nn


class PyramidFeatures(nn.Module):
    def __init__(self, C3_size, C4_size, C5_size, feature_size=256):  # as in the FPN code, every level keeps 256 channels
        super(PyramidFeatures, self).__init__()

        # upsample C5 to get P5 from the FPN paper
        self.P5_1 = nn.Conv2d(C5_size, feature_size, kernel_size=1, stride=1, padding=0)  # 1x1 stride-1 conv reduces the channels of C5
        self.P5_upsampled = nn.Upsample(scale_factor=2, mode='nearest')  # upsample to the spatial size of C4
        self.P5_2 = nn.Conv2d(feature_size, feature_size, kernel_size=3, stride=1, padding=1)  # 3x3 stride-1 conv produces the output P5

        # add P5 elementwise to C4
        self.P4_1 = nn.Conv2d(C4_size, feature_size, kernel_size=1, stride=1, padding=0)  # 1x1 stride-1 conv reduces the channels of C4
        self.P4_upsampled = nn.Upsample(scale_factor=2, mode='nearest')  # upsample to the spatial size of C3
        self.P4_2 = nn.Conv2d(feature_size, feature_size, kernel_size=3, stride=1, padding=1)  # 3x3 stride-1 conv produces the output P4

        # add P4 elementwise to C3
        self.P3_1 = nn.Conv2d(C3_size, feature_size, kernel_size=1, stride=1, padding=0)  # 1x1 stride-1 conv reduces the channels of C3
        self.P3_2 = nn.Conv2d(feature_size, feature_size, kernel_size=3, stride=1, padding=1)  # 3x3 stride-1 conv produces the output P3

        # "P6 is obtained via a 3x3 stride-2 conv on C5"
        self.P6 = nn.Conv2d(C5_size, feature_size, kernel_size=3, stride=2, padding=1)  # halves height and width of C5 (1/4 of the area); the result is P6

        # "P7 is computed by applying ReLU followed by a 3x3 stride-2 conv on P6"
        self.P7_1 = nn.ReLU()  # ReLU adds a non-linearity
        self.P7_2 = nn.Conv2d(feature_size, feature_size, kernel_size=3, stride=2, padding=1)  # halves height and width again; the result is P7

    def forward(self, inputs):
        C3, C4, C5 = inputs  # unlike FPN, the C2 features are not used

        P5_x = self.P5_1(C5)  # lateral connection at C5
        P5_upsampled_x = self.P5_upsampled(P5_x)  # upsample to the size of C4
        P5_x = self.P5_2(P5_x)  # final conv gives the output P5

        P4_x = self.P4_1(C4)  # lateral connection at C4
        P4_x = P5_upsampled_x + P4_x  # add the upsampled C5 features
        P4_upsampled_x = self.P4_upsampled(P4_x)  # upsample to the size of C3
        P4_x = self.P4_2(P4_x)  # final conv gives the output P4

        P3_x = self.P3_1(C3)  # lateral connection at C3
        P3_x = P3_x + P4_upsampled_x  # add the upsampled C4 features
        P3_x = self.P3_2(P3_x)  # final conv gives the output P3

        P6_x = self.P6(C5)  # P6 comes directly from a conv on C5

        P7_x = self.P7_1(P6_x)
        P7_x = self.P7_2(P7_x)  # ReLU followed by a conv gives P7

        return [P3_x, P4_x, P5_x, P6_x, P7_x]
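To get a feel for the pyramid, the sketch below estimates the spatial size of each level and the resulting number of anchors. It assumes a ResNet-style backbone in which C3, C4 and C5 have strides 8, 16 and 32 (so level Pl has stride 2^l), a square input, and 9 anchors per location as in the anchor code of this article; `pyramid_shapes` is a hypothetical helper, not part of the original code.

```python
import math


def pyramid_shapes(image_size, levels=(3, 4, 5, 6, 7)):
    """Side length of each pyramid level, assuming level l has stride 2 ** l."""
    return {f"P{level}": math.ceil(image_size / 2 ** level) for level in levels}


shapes = pyramid_shapes(512)
anchors_per_location = 9  # 3 ratios x 3 scales
total_anchors = anchors_per_location * sum(side * side for side in shapes.values())

print(shapes)
print("total anchors:", total_anchors)
```

For a 512x512 input this gives tens of thousands of anchors, almost all of them background, which is the imbalance that Focal Loss is meant to handle.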

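The classification subnet and the regression subnet each apply a stack of convolutions to every pyramid level P3-P7 (not only to P7), with weights shared across levels; they are not covered in detail here;

2. Anchor generation: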

import numpy as np


def generate_anchors(base_size=16, ratios=None, scales=None):
    """
    Generate anchor (reference) windows by enumerating aspect ratios X
    scales w.r.t. a reference window.
    """

    if ratios is None:
        ratios = np.array([0.5, 1, 2])

    if scales is None:
        scales = np.array([2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)])

    # total number of anchors per location: 3 ratios x 3 scales = 9
    num_anchors = len(ratios) * len(scales)

    # initialize output anchors, shape (9, 4)
    anchors = np.zeros((num_anchors, 4))

    # scale base_size
    # np.tile repeats scales into shape (2, 9); the transpose is (9, 2),
    # so each row holds width = height = base_size * scale for one ratio/scale pair
    anchors[:, 2:] = base_size * np.tile(scales, (2, len(ratios))).T

    # compute areas of anchors (columns 2 and 3 are still identical here)
    areas = anchors[:, 2] * anchors[:, 3]

    # correct for ratios: width = size / sqrt(ratio), height = size * sqrt(ratio),
    # which keeps the area fixed and makes height / width equal to the ratio
    anchors[:, 2] = np.sqrt(areas / np.repeat(ratios, len(scales)))
    anchors[:, 3] = anchors[:, 2] * np.repeat(ratios, len(scales))

    # transform from (x_ctr, y_ctr, w, h) -> (x1, y1, x2, y2), centered at the origin
    anchors[:, 0::2] -= np.tile(anchors[:, 2] * 0.5, (2, 1)).T
    anchors[:, 1::2] -= np.tile(anchors[:, 3] * 0.5, (2, 1)).T

    return anchors
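The ratio correction is the least obvious step, so here is a dependency-free sketch that recomputes only the widths and heights, in the same ratio-major order as the function above. `anchor_sizes` is a hypothetical helper written for this illustration.

```python
import math


def anchor_sizes(base_size=16, ratios=(0.5, 1, 2),
                 scales=(2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0))):
    """Return (width, height) for every ratio/scale pair, ratio-major order."""
    sizes = []
    for ratio in ratios:
        for scale in scales:
            area = (base_size * scale) ** 2   # area of the square anchor
            width = math.sqrt(area / ratio)   # shrink or stretch the width
            height = width * ratio            # so that height / width == ratio
            sizes.append((width, height))
    return sizes


for width, height in anchor_sizes():
    print(f"w={width:7.3f}  h={height:7.3f}  h/w={height / width:.2f}  area={width * height:8.2f}")
```

Each group of three rows shares one aspect ratio, and the area depends only on the scale, not on the ratio.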

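3. Focal Loss implementation:

In the loss of the box regression subnet, the relation between the prediction (regression) and the ground truth is essentially the same as in SSD. The notable difference is that the loss is defined as the following piecewise function (taken from the code; the paper does not describe it in detail):
$\text{diff} = |\text{targets} - \text{pred}|$
$\text{loss} = \begin{cases} 0.5 \cdot 9 \cdot \text{diff}^2 & \text{if } \text{diff} \le \frac{1}{9} \\ \text{diff} - \frac{0.5}{9} & \text{otherwise} \end{cases}$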
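As a quick check that the two branches of this piecewise loss join continuously, here is a scalar sketch. The parameter name `beta` (the switch point, 1/9) is chosen for this illustration and does not appear in the original code.

```python
def smooth_l1(diff, beta=1.0 / 9.0):
    """Piecewise regression loss: quadratic below beta, linear above it."""
    diff = abs(diff)
    if diff <= beta:
        return 0.5 * diff ** 2 / beta  # equals 0.5 * 9 * diff^2 when beta = 1/9
    return diff - 0.5 * beta           # equals diff - 0.5/9 when beta = 1/9


for d in (0.05, 1.0 / 9.0, 0.5, 1.0):
    print(f"diff={d:.4f} -> loss={smooth_l1(d):.5f}")
```

At `diff = 1/9` both branches evaluate to 1/18, so the loss has no jump; small errors get a gentle quadratic penalty and large errors a linear one, which limits the influence of outliers.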
First, the IoU implementation:

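import torch


def calc_iou(a, b):  # pairwise IoU between boxes a (N, 4) and b (M, 4), both in (x1, y1, x2, y2) format
    area = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])  # areas of the boxes in b
    # width and height of every pairwise intersection, shape (N, M)
    iw = torch.min(torch.unsqueeze(a[:, 2], dim=1), b[:, 2]) - torch.max(torch.unsqueeze(a[:, 0], 1), b[:, 0])
    ih = torch.min(torch.unsqueeze(a[:, 3], dim=1), b[:, 3]) - torch.max(torch.unsqueeze(a[:, 1], 1), b[:, 1])
    iw = torch.clamp(iw, min=0)  # no overlap gives a negative extent, clamp to 0
    ih = torch.clamp(ih, min=0)
    # union = area(a) + area(b) - intersection
    ua = torch.unsqueeze((a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1]), dim=1) + area - iw * ih
    ua = torch.clamp(ua, min=1e-8)  # avoid division by zero
    intersection = iw * ih
    IoU = intersection / ua

    return IoU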
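For a single pair of boxes the same computation can be followed by hand. The sketch below is a scalar version, together with the 0.4 / 0.5 thresholds that the loss code later in this article uses to label anchors; the helper names and the example boxes are made up for illustration.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))  # intersection height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / max(union, 1e-8)


def anchor_label(iou_value, neg_thresh=0.4, pos_thresh=0.5):
    """Positive at IoU >= 0.5, negative at IoU < 0.4, ignored in between."""
    if iou_value >= pos_thresh:
        return "positive"
    if iou_value < neg_thresh:
        return "negative"
    return "ignored"


gt = (0, 0, 10, 10)
for anchor in [(0, 0, 10, 10), (3, 0, 13, 10), (4, 0, 14, 10), (5, 0, 15, 10)]:
    value = iou(anchor, gt)
    print(anchor, f"IoU={value:.3f}", anchor_label(value))
```

Shifting a 10x10 anchor sideways by 3, 4 and 5 pixels moves it from positive through the ignored band to negative, which shows how narrow the positive region is.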

Next, the forward method of the Focal Loss class:

def forward(self, classifications, regressions, anchors, annotations):
    alpha = 0.25  # alpha and gamma take the values used in the paper
    gamma = 2.0
    batch_size = classifications.shape[0]
    classification_losses = []
    regression_losses = []

    anchor = anchors[0, :, :]

    # convert the anchors from (top-left, bottom-right) to (center, width, height) format
    anchor_widths  = anchor[:, 2] - anchor[:, 0]
    anchor_heights = anchor[:, 3] - anchor[:, 1]
    anchor_ctr_x   = anchor[:, 0] + 0.5 * anchor_widths
    anchor_ctr_y   = anchor[:, 1] + 0.5 * anchor_heights

    for j in range(batch_size):  # process each image in the batch

        classification = classifications[j, :, :]
        regression = regressions[j, :, :]

        bbox_annotation = annotations[j, :, :]
        bbox_annotation = bbox_annotation[bbox_annotation[:, 4] != -1]  # keep only boxes whose class label is not -1

        classification = torch.clamp(classification, 1e-4, 1.0 - 1e-4)  # clamp scores to [1e-4, 1 - 1e-4] so the logarithm stays finite

        if bbox_annotation.shape[0] == 0:  # no ground-truth boxes: every anchor is a negative, so only the classification loss is computed
            alpha_factor = torch.ones_like(classification) * alpha  # same device and dtype as classification

            alpha_factor = 1. - alpha_factor
            focal_weight = classification
            focal_weight = alpha_factor * torch.pow(focal_weight, gamma)

            bce = -(torch.log(1.0 - classification))

            cls_loss = focal_weight * bce
            classification_losses.append(cls_loss.sum())  # classification loss is kept
            regression_losses.append(torch.tensor(0.0, device=classification.device))  # regression loss is the constant 0
            continue  # skip the rest of the loop body for this image

The actual loss computation (continues inside the per-image loop of the forward method above):

        IoU = calc_iou(anchors[0, :, :], bbox_annotation[:, :4])  # num_anchors x num_annotations

        # for every anchor, find the ground-truth box with the largest IoU and that IoU value
        IoU_max, IoU_argmax = torch.max(IoU, dim=1)  # num_anchors x 1

        # compute the losses of the two subnets
        targets = torch.ones(classification.shape) * -1  # (num_anchors, num_classes), initialized to -1 (ignored)

        if torch.cuda.is_available():  # use the GPU when one is available
            targets = targets.cuda()

        targets[torch.lt(IoU_max, 0.4), :] = 0  # IoU < 0.4: negative sample, marked 0

        positive_indices = torch.ge(IoU_max, 0.5)  # IoU >= 0.5: positive sample (boolean mask)

        num_positive_anchors = positive_indices.sum()  # number of positive samples

        assigned_annotations = bbox_annotation[IoU_argmax, :]  # best-matching annotation for every anchor, (num_anchors, 5)

        # compute the loss for classification
        targets[positive_indices, :] = 0  # set all classes of the positive anchors to 0
        targets[positive_indices, assigned_annotations[positive_indices, 4].long()] = 1  # set the labelled class (5th column) to 1, giving a one-hot target

        if torch.cuda.is_available():  # use the GPU when one is available
            alpha_factor = torch.ones(targets.shape).cuda() * alpha
        else:
            alpha_factor = torch.ones(targets.shape) * alpha

        # torch.where(cond, x, y) takes x where cond holds and y elsewhere
        alpha_factor = torch.where(torch.eq(targets, 1.), alpha_factor, 1. - alpha_factor)  # alpha for positives, 1 - alpha for negatives
        focal_weight = torch.where(torch.eq(targets, 1.), 1. - classification, classification)  # 1 - p for positives, p for negatives
        focal_weight = alpha_factor * torch.pow(focal_weight, gamma)  # alpha_t * (1 - p_t)^gamma from the paper

        bce = -(targets * torch.log(classification) + (1.0 - targets) * torch.log(1.0 - classification))  # plain binary cross-entropy

        cls_loss = focal_weight * bce  # focal weight times binary cross-entropy gives the Focal Loss

        if torch.cuda.is_available():  # anchors whose target is -1 (ignored) contribute a loss of 0
            cls_loss = torch.where(torch.ne(targets, -1.0), cls_loss, torch.zeros(cls_loss.shape).cuda())
        else:
            cls_loss = torch.where(torch.ne(targets, -1.0), cls_loss, torch.zeros(cls_loss.shape))

        classification_losses.append(cls_loss.sum()/torch.clamp(num_positive_anchors.float(), min=1.0))  # sum the classification loss and normalize by the number of positive anchors

        # compute the loss for regression

        if positive_indices.sum() > 0:  # only when there is at least one positive anchor
            assigned_annotations = assigned_annotations[positive_indices, :]  # annotations matched to the positive anchors

            # the four values of the anchors selected by positive_indices
            anchor_widths_pi = anchor_widths[positive_indices]
            anchor_heights_pi = anchor_heights[positive_indices]
            anchor_ctr_x_pi = anchor_ctr_x[positive_indices]
            anchor_ctr_y_pi = anchor_ctr_y[positive_indices]

            # convert assigned_annotations from (top-left, bottom-right) to (center, width, height) format
            gt_widths  = assigned_annotations[:, 2] - assigned_annotations[:, 0]
            gt_heights = assigned_annotations[:, 3] - assigned_annotations[:, 1]
            gt_ctr_x   = assigned_annotations[:, 0] + 0.5 * gt_widths
            gt_ctr_y   = assigned_annotations[:, 1] + 0.5 * gt_heights

            # clip widths to 1: a box is never narrower or shorter than one pixel
            gt_widths  = torch.clamp(gt_widths, min=1)
            gt_heights = torch.clamp(gt_heights, min=1)

            # from the matched annotations and the anchors, compute the values the regression should predict (same procedure as in SSD)
            targets_dx = (gt_ctr_x - anchor_ctr_x_pi) / anchor_widths_pi
            targets_dy = (gt_ctr_y - anchor_ctr_y_pi) / anchor_heights_pi
            targets_dw = torch.log(gt_widths / anchor_widths_pi)
            targets_dh = torch.log(gt_heights / anchor_heights_pi)

            targets = torch.stack((targets_dx, targets_dy, targets_dw, targets_dh))
            targets = targets.t()

            if torch.cuda.is_available():  # scale the targets up, presumably to widen the range the regression output has to fit
                targets = targets/torch.Tensor([[0.1, 0.1, 0.2, 0.2]]).cuda()
            else:
                targets = targets/torch.Tensor([[0.1, 0.1, 0.2, 0.2]])

            negative_indices = 1 + (~positive_indices)  # unused; has no effect on the loss

            regression_diff = torch.abs(targets - regression[positive_indices, :])  # absolute difference between target and prediction

            regression_loss = torch.where(
                torch.le(regression_diff, 1.0 / 9.0),
                0.5 * 9.0 * torch.pow(regression_diff, 2),
                regression_diff - 0.5 / 9.0
            )  # piecewise loss: quadratic up to 1/9, linear (y = x + c) above
            regression_losses.append(regression_loss.mean())
        else:
            if torch.cuda.is_available():
                regression_losses.append(torch.tensor(0).float().cuda())
            else:
                regression_losses.append(torch.tensor(0).float())

    # return the batch means of classification_losses and regression_losses
    return torch.stack(classification_losses).mean(dim=0, keepdim=True), torch.stack(regression_losses).mean(dim=0, keepdim=True)
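The regression targets are easiest to understand on a single box. The sketch below encodes one ground-truth box relative to one anchor using the same formulas and the same [0.1, 0.1, 0.2, 0.2] scaling as the code above. The inverse function `decode_box` is not part of the article's code; it is added here only to show that the encoding can be undone, and both helper names and the example boxes are hypothetical.

```python
import math

SCALE = (0.1, 0.1, 0.2, 0.2)  # same factors the loss code divides the targets by


def encode_box(anchor, gt, scale=SCALE):
    """Regression targets (dx, dy, dw, dh) for gt relative to anchor; boxes are (x1, y1, x2, y2)."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    acx, acy = anchor[0] + 0.5 * aw, anchor[1] + 0.5 * ah
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    gcx, gcy = gt[0] + 0.5 * gw, gt[1] + 0.5 * gh
    gw, gh = max(gw, 1.0), max(gh, 1.0)  # at least one pixel, as in the loss code
    raw = ((gcx - acx) / aw, (gcy - acy) / ah, math.log(gw / aw), math.log(gh / ah))
    return tuple(t / s for t, s in zip(raw, scale))


def decode_box(anchor, deltas, scale=SCALE):
    """Inverse of encode_box: recover (x1, y1, x2, y2) from the scaled targets."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    acx, acy = anchor[0] + 0.5 * aw, anchor[1] + 0.5 * ah
    dx, dy, dw, dh = (d * s for d, s in zip(deltas, scale))
    cx, cy = acx + dx * aw, acy + dy * ah
    w, h = aw * math.exp(dw), ah * math.exp(dh)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


anchor = (0.0, 0.0, 10.0, 10.0)
gt = (2.0, 1.0, 12.0, 21.0)
targets = encode_box(anchor, gt)
print("targets:", [round(t, 4) for t in targets])
print("decoded:", [round(v, 4) for v in decode_box(anchor, targets)])
```

The center offsets are expressed in units of the anchor size and the size change as a log ratio, so the same target values mean the same relative shift for small and large anchors.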