Contents
1. References
Object detection algorithm - RetinaNet: https://zhuanlan.zhihu.com/p/67768433
https://zhuanlan.zhihu.com/p/59910080
RetinaNet paper: https://arxiv.org/pdf/1708.02002.pdf
A thorough walkthrough of the RetinaNet paper: https://blog.csdn.net/weixin_40400177/article/details/103449280
Unofficial RetinaNet code: https://github.com/ChingHo97/RetinaNet-Pytorch-36.4AP (the companion code for this article)
Prerequisites:
Online Hard Example Mining (OHEM)
The paper proposes a new loss function to address class imbalance. It is a dynamically scaled cross-entropy loss whose scaling factor decays to 0 as confidence in the correct class increases.
The authors design a one-stage detector, RetinaNet, to demonstrate the effectiveness of Focal Loss.
Related work
Two-stage detectors achieve higher accuracy than one-stage detectors largely because they handle class imbalance: the first (proposal) stage filters out most of the background.
2. RetinaNet
RetinaNet consists of three parts: a ResNet backbone, an FPN, and a RetinaHead.
Focal Loss is designed to solve the severe positive/negative sample imbalance in one-stage detectors. It down-weights the loss of the many easy positive and negative samples, which makes it a form of hard example mining.
The authors argue that one-stage detectors are less accurate than two-stage detectors because of class imbalance, and because easy examples dominate the loss:
(1) Negative examples are far too numerous and account for most of the total loss.
(2) Each easy positive/negative example contributes only a small loss and a small backward gradient, yet collectively they swamp the signal from the hard examples.
Therefore the authors propose Focal Loss.
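For reference, the Focal Loss from the paper scales the standard cross-entropy by a modulating factor (1 - p_t)^γ and a class-balancing weight α_t:

    FL(p_t) = -α_t · (1 - p_t)^γ · log(p_t),   where p_t = p if y = 1 (the ground-truth class), and p_t = 1 - p otherwise

With the paper's defaults α = 0.25 and γ = 2 (also the defaults in the focal_loss code later in this article), a well-classified easy example (p_t close to 1) is scaled down sharply, while a hard example (p_t small) keeps almost its full cross-entropy loss.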
class RetinaNet(nn.Module):
    def __init__(self, config=None):
        super(RetinaNet, self).__init__()
        self.config = config if config is not None else DefaultConfig
        self.backbone = resnet50(pretrained=self.config.pretrained)
        self.fpn = FPN(features=self.config.fpn_out_channels,
                       use_p5=self.config.use_p5)
        self.head = RetinaHead(config=self.config)

    def forward(self, images):
        C3, C4, C5 = self.backbone(images)    # three backbone feature maps
        all_p_level = self.fpn([C3, C4, C5])  # [P3, P4, P5, P6, P7]
        cls_logits, reg_preds = self.head(all_p_level)  # predictions
        return cls_logits, reg_preds
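A minimal smoke-test sketch (the exact DefaultConfig fields such as pretrained, fpn_out_channels and use_p5 come from the repository linked above, so the shapes below are illustrative rather than guaranteed):

import torch

model = RetinaNet()            # DefaultConfig comes from the repo; pretrained=True would download weights
model.eval()
dummy = torch.randn(1, 3, 640, 640)
with torch.no_grad():
    cls_logits, reg_preds = model(dummy)
# cls_logits: (1, sum(H*W)*9 over P3..P7, class_num)
# reg_preds : (1, sum(H*W)*9 over P3..P7, 4)
print(cls_logits.shape, reg_preds.shape)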
3. ResNet
ResNet paper walkthrough
As networks get deeper, two problems appear: vanishing/exploding gradients, and a degradation problem where accuracy drops. The degradation problem means that adding more layers leads to higher training error (not just higher test error). Vanishing/exploding gradients are largely handled by normalization (batch normalization, BN), but that only carries plain networks to roughly 20 layers. One remedy is to introduce an identity mapping. In short, where the original network learns a target mapping H(x), we rewrite it as H(x) = F(x) + x. Since x is an identity shortcut that needs no learning, the layers only have to learn the residual F(x) = H(x) - x, and learning this residual is much easier than learning the original mapping directly.
Why does this address exploding and vanishing gradients?
The general residual formulation, where F is a residual function and f is the activation applied after the addition:
    y_l = h(x_l) + F(x_l, W_l),    x_{l+1} = f(y_l)
ResNet v2 uses an identity shortcut h(x_l) = x_l and also makes f an identity (no activation after the addition), which gives:
    x_{l+1} = x_l + F(x_l, W_l)
Let L be a layer deeper than l. Unrolling the recursion up to layer L, the forward expression is:
    x_L = x_l + Σ_{i=l}^{L-1} F(x_i, W_i)
Taking gradients backward:
    ∂loss/∂x_l = ∂loss/∂x_L · ∂x_L/∂x_l = ∂loss/∂x_L · (1 + ∂(Σ_{i=l}^{L-1} F(x_i, W_i))/∂x_l)
The gradient at layer l therefore contains ∂loss/∂x_L as an additive term, i.e. the gradient of the deep layer L is passed directly to layer l; and since ∂(ΣF)/∂x_l is not equal to -1 for all samples throughout training, the factor in parentheses does not collapse to zero, so the gradient does not vanish.
Does introducing identity mappings let the network adjust its effective depth automatically?
Suppose there exists a best-performing "perfect" network N. Compared with N, some layers in the network we actually train must be redundant (or even harmful). The training target for those redundant layers should then be the identity transform (in other words, turning the redundant layers into identity mappings lets the whole network approach the perfect network N); only by reaching this target can our network perform as well as N.
So how do we turn these redundant layers into identity mappings? Simple: make the target their output has to fit equal to x itself (so H(x) becomes x), i.e. the input is x and the output is still x.
The authors therefore rebuild the mapping with a residual: the input x is added back onto the result, so the weights of the stacked layers are pushed towards zero. In formula form: F(x, w) + x = H(x) = x, i.e. F(x, w) → 0.
To sum up in one sentence: the residual structure builds the identity mapping in by hand, so the network can converge towards identity mappings where needed, which ensures the error does not keep getting worse as the depth grows.
(The explanation above is adapted from the CSDN post by lairongxuan: https://blog.csdn.net/lairongxuan/article/details/91040698)
4. ResNet code and resnet50
resnet18/50 network structure and PyTorch implementation: https://www.jianshu.com/p/085f4c8256f1
The dashed lines in the standard ResNet diagram indicate that a 1x1 convolution is needed to adjust the channels (and stride), because the tensor shapes of x and the block output must match when they are added.
Code:
Call chain, top to bottom (excerpts only):
class RetinaNetDetector(nn.Module):
    self.body = RetinaNet(config=config)          # the detector wraps the RetinaNet body

class RetinaNet(nn.Module):
    self.backbone = resnet50(pretrained=self.config.pretrained)  # the backbone is resnet50

def resnet50(pretrained=False, **kwargs):
    model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs)  # 3, 4, 6, 3 Bottleneck blocks per stage
The basic building block of resnet50: Bottleneck
class Bottleneck(nn.Module):
    # ResNet-B
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample  # the dashed-line branch: reshapes x to match the block output
        self.stride = stride

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        if self.downsample is not None:
            residual = self.downsample(x)
        out += residual
        out = self.relu(out)
        return out
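A quick shape check of the dashed-line (downsample) path. This is a small illustrative snippet; the 256-to-512-channel, stride-2 case below is one of the configurations _make_layer builds for resnet50:

import torch
import torch.nn as nn

# stride-2 bottleneck at the start of a new stage: channels 256 -> 512, spatial size halved
downsample = nn.Sequential(
    nn.Conv2d(256, 128 * Bottleneck.expansion, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128 * Bottleneck.expansion),
)
block = Bottleneck(inplanes=256, planes=128, stride=2, downsample=downsample)
x = torch.randn(1, 256, 56, 56)
print(block(x).shape)  # torch.Size([1, 512, 28, 28])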
The basic building block of resnet18: BasicBlock
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        if self.downsample is not None:
            residual = self.downsample(x)
        out += residual
        out = self.relu(out)
        return out

def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)
The ResNet network
class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000, if_include_top=False):
        self.inplanes = 64
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)            # spatial size /2
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)  # /2
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)  # /2
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)  # /2
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)  # /2
        self.avgpool = nn.AvgPool2d(7, stride=1)
        if if_include_top:
            self.fc = nn.Linear(512 * block.expansion, num_classes)
        self.if_include_top = if_include_top

        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):         # conv weights
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):  # BN weights
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:  # dashed-line branch
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))
        return nn.Sequential(*layers)

    def forward(self, x):  # this forward feeds the FPN, so it returns three levels (out3, out4, out5)
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        out3 = self.layer2(x)
        out4 = self.layer3(out3)
        out5 = self.layer4(out4)
        if self.if_include_top:
            x = self.avgpool(out5)
            x = x.view(x.size(0), -1)
            x = self.fc(x)
            return x
        else:
            return (out3, out4, out5)
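A quick shape check of the backbone outputs (illustrative; it assumes torch and math are already imported as in the snippets above):

import torch

backbone = ResNet(Bottleneck, [3, 4, 6, 3])    # resnet50 without the classification head
x = torch.randn(1, 3, 224, 224)
out3, out4, out5 = backbone(x)
print(out3.shape)  # torch.Size([1, 512, 28, 28])    stride 8
print(out4.shape)  # torch.Size([1, 1024, 14, 14])   stride 16
print(out5.shape)  # torch.Size([1, 2048, 7, 7])     stride 32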
5. FPN
The FPN code below assumes a ResNet50 backbone (hence the 512/1024/2048 input channels).
class FPN(nn.Module):
    '''only for resnet50,101,152'''
    def __init__(self, features=256, use_p5=True):
        super(FPN, self).__init__()
        self.prj_5 = nn.Conv2d(2048, features, kernel_size=1)
        self.prj_4 = nn.Conv2d(1024, features, kernel_size=1)
        self.prj_3 = nn.Conv2d(512, features, kernel_size=1)
        self.conv_5 = nn.Conv2d(features, features, kernel_size=3, padding=1)
        self.conv_4 = nn.Conv2d(features, features, kernel_size=3, padding=1)
        self.conv_3 = nn.Conv2d(features, features, kernel_size=3, padding=1)
        if use_p5:
            self.conv_out6 = nn.Conv2d(features, features, kernel_size=3, padding=1, stride=2)
        else:
            self.conv_out6 = nn.Conv2d(2048, features, kernel_size=3, padding=1, stride=2)
        self.conv_out7 = nn.Conv2d(features, features, kernel_size=3, padding=1, stride=2)
        self.use_p5 = use_p5
        self.apply(self.init_conv_kaiming)

    # upsampling
    def upsamplelike(self, inputs):
        src, target = inputs
        return F.interpolate(src, size=(target.shape[2], target.shape[3]),
                             mode='nearest')  # torch's interpolation, 'nearest' mode

    def init_conv_kaiming(self, module):
        if isinstance(module, nn.Conv2d):
            nn.init.kaiming_uniform_(module.weight, a=1)
            if module.bias is not None:
                nn.init.constant_(module.bias, 0)

    def forward(self, x):  # the FPN top-down pathway
        C3, C4, C5 = x  # the ResNet outputs (out3, out4, out5)
        P5 = self.prj_5(C5)
        P4 = self.prj_4(C4)
        P3 = self.prj_3(C3)
        P4 = P4 + self.upsamplelike([P5, C4])
        P3 = P3 + self.upsamplelike([P4, C3])
        P3 = self.conv_3(P3)
        P4 = self.conv_4(P4)
        P5 = self.conv_5(P5)
        # P6 = P5 if self.use_p5 else C5
        P6 = self.conv_out6(P5) if self.use_p5 else self.conv_out6(C5)
        P7 = self.conv_out7(F.relu(P6))
        return [P3, P4, P5, P6, P7]
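Continuing the shape check from the backbone snippet above (illustrative):

fpn = FPN(features=256, use_p5=True)
P3, P4, P5, P6, P7 = fpn([out3, out4, out5])   # outputs of the ResNet forward above
print([p.shape[-1] for p in [P3, P4, P5, P6, P7]])  # [28, 14, 7, 4, 2] for a 224x224 input
# all five levels have 256 channels; the strides are roughly 8, 16, 32, 64, 128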
6. RetinaHead
How to use GroupNorm() on 3D input, explained: https://blog.csdn.net/qq_40178291/article/details/101615391
The input to this part is the five feature maps [P3, P4, P5, P6, P7]; the head predicts classification and box regression from each feature map. Each branch has four 3x3 conv layers plus one final 3x3 output conv, i.e. five 3x3 convs in total.
Model initialization (paraphrasing the paper):
By default, a binary-classification model is initialized so that y = 0 and y = 1 are equally likely. Under such an initialization, on a dataset with severe class imbalance the frequent class contributes almost all of the loss, which makes training unstable in the early iterations. To counter this, the paper introduces a "prior": p denotes the value the model should predict for the rare class (foreground) at the start of training. This prior is set low, e.g. 0.01. Note that this is a change to the model initialization, not to the loss function. It improves training stability for both the plain cross-entropy loss and Focal Loss.
In the code this shows up as:
class RetinaHead(nn.Module):
    # excerpt: the relevant lines from __init__
    self.prior = self.config.prior  # 0.01, the class prior
    self.apply(self.init_conv_RandomNormal)
    nn.init.constant_(self.cls_out.bias, -math.log((1 - self.prior) / self.prior))

    def init_conv_RandomNormal(self, module, std=0.01):
        if isinstance(module, nn.Conv2d):
            nn.init.normal_(module.weight, std=std)
            if module.bias is not None:
                nn.init.constant_(module.bias, 0)
The corresponding file is model/retina_head.py:
class RetinaHead(nn.Module):
    def __init__(self, config=None):
        super(RetinaHead, self).__init__()
        if config is None:
            self.config = DefaultConfig
        else:
            self.config = config
        self.anchor_nums = self.config.anchor_nums  # 9 anchors per location
        cls_branch = []
        reg_branch = []
        for i in range(4):
            cls_branch.append(nn.Conv2d(self.config.fpn_out_channels, self.config.fpn_out_channels,
                                        kernel_size=3, stride=1, padding=1, bias=True))  # 256 channels
            if self.config.use_GN_head:  # True: GroupNorm, splits the channels into 32 groups
                cls_branch.append(nn.GroupNorm(32, self.config.fpn_out_channels))
            cls_branch.append(nn.ReLU(inplace=True))
            reg_branch.append(nn.Conv2d(self.config.fpn_out_channels, self.config.fpn_out_channels,
                                        kernel_size=3, stride=1, padding=1, bias=True))
            if self.config.use_GN_head:
                reg_branch.append(nn.GroupNorm(32, self.config.fpn_out_channels))
            reg_branch.append(nn.ReLU(inplace=True))
        self.cls_conv = nn.Sequential(*cls_branch)
        self.reg_conv = nn.Sequential(*reg_branch)
        self.cls_out = nn.Conv2d(self.config.fpn_out_channels, self.config.class_num * self.anchor_nums,
                                 kernel_size=3, stride=1, padding=1, bias=True)
        self.reg_out = nn.Conv2d(self.config.fpn_out_channels, self.anchor_nums * 4,
                                 kernel_size=3, stride=1, padding=1, bias=True)

        self.prior = self.config.prior  # 0.01, the class prior
        self.apply(self.init_conv_RandomNormal)
        nn.init.constant_(self.cls_out.bias, -math.log((1 - self.prior) / self.prior))

    def init_conv_RandomNormal(self, module, std=0.01):
        if isinstance(module, nn.Conv2d):
            nn.init.normal_(module.weight, std=std)
            if module.bias is not None:
                nn.init.constant_(module.bias, 0)

    def forward(self, inputs):
        """
        inputs: fpn output [P3, P4, P5, P6, P7]
        """
        cls_out = []
        reg_out = []
        for pred in inputs:
            batch_size, channel, H, W = pred.shape
            cls_convput = self.cls_conv(pred)
            cls_output = self.cls_out(cls_convput)  # (batch_size, cls_num*anchor_num, H, W)
            # permute to (batch_size, H, W, cls_num*anchor_num), then view as (batch_size, H*W*9, class_num)
            cls_output = cls_output.permute(0, 2, 3, 1).contiguous().view(batch_size, H * W * self.anchor_nums, -1)
            cls_out.append(cls_output)
            reg_output = self.reg_conv(pred)
            reg_output = self.reg_out(reg_output)  # (batch_size, anchor_num*4, H, W)
            # (batch_size, H*W*9, 4)
            reg_output = reg_output.permute(0, 2, 3, 1).contiguous().view(batch_size, H * W * self.anchor_nums, -1)
            reg_out.append(reg_output)
        # concatenate over the 5 fpn levels [P3, P4, P5, P6, P7]
        cls_logits = torch.cat(cls_out, dim=1)  # (batch_size, H*W*9 summed over the 5 levels, cls_num)
        reg_preds = torch.cat(reg_out, dim=1)   # (batch_size, H*W*9 summed over the 5 levels, 4)
        return cls_logits, reg_preds
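To make the output size concrete, here is a rough count of the anchor positions for a 640x640 input. This is a small illustrative calculation that assumes the usual RetinaNet strides of 8, 16, 32, 64, 128 for P3 to P7:

strides = [8, 16, 32, 64, 128]                       # P3..P7 (assumed, per the RetinaNet paper)
positions = sum((640 // s) ** 2 for s in strides)    # 80^2 + 40^2 + 20^2 + 10^2 + 5^2 = 8525
anchors_per_position = 9
print(positions * anchors_per_position)  # 76725 anchors: the middle dimension of cls_logits / reg_preds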
8. Getting targets from anchors
This part of the code lives in the LOSS class.
model/retina_loss.py
class LOSS(nn.Module):
    def __init__(self, reg_mode='giou'):
        super(LOSS, self).__init__()
        self.reg_mode = reg_mode

    def forward(self, inputs):  # called as self.loss_func([cls_logits, reg_preds, anchors, boxes, classes])
        """
        cls_logits: (n, sum(H*W)*A, class_num)  # (batch_size, H*W*9 summed over the 5 levels, cls_num)
        reg_preds:  (n, sum(H*W)*A, 4)
        anchors:    (sum(H*W)*A, 4)             # all anchors over the 5 levels
        boxes:      (n, max_num, 4)             # ground-truth boxes, padded to max_num per image
        classes:    (n, max_num)                # ground-truth classes
        """
        cls_logits, reg_preds, anchors, boxes, classes = inputs
        anchor_widths = anchors[:, 2] - anchors[:, 0]
        anchor_heights = anchors[:, 3] - anchors[:, 1]
        anchor_ctr_x = anchors[:, 0] + anchor_widths * 0.5
        anchor_ctr_y = anchors[:, 1] + anchor_heights * 0.5
        batch_size = cls_logits.shape[0]
        class_loss = []
        reg_loss = []
        for i in range(batch_size):  # per image in the batch
            per_cls_logit = cls_logits[i, :, :]  # (sum(H*W)*A, class_num)
            per_reg_pred = reg_preds[i, :, :]
            per_boxes = boxes[i, :, :]
            per_classes = classes[i, :]
            mask = per_boxes[:, 0] != -1     # drop the padded (dummy) gt boxes
            per_boxes = per_boxes[mask]      # (?, 4), ? = number of gt boxes in this image
            per_classes = per_classes[mask]  # (?,)
            if per_classes.shape[0] == 0:
                # no gt in this image: every anchor is background; sigmoid is applied here because
                # focal_loss (which applies it internally) is not called in this branch
                per_cls_prob = torch.clamp(per_cls_logit.sigmoid(), min=1e-4, max=1. - 1e-4)
                alpha_factor = torch.ones(per_cls_prob.shape).cuda() * 0.25 if torch.cuda.is_available() else torch.ones(per_cls_prob.shape) * 0.25
                alpha_factor = 1. - alpha_factor
                focal_weights = per_cls_prob
                focal_weights = alpha_factor * torch.pow(focal_weights, 2.0)
                bce = -(torch.log(1.0 - per_cls_prob))
                cls_loss = focal_weights * bce
                class_loss.append(cls_loss.sum())
                reg_loss.append(torch.tensor(0).float())
                continue
            IoU = calc_iou(anchors, per_boxes)  # IoU between anchors and gt boxes, (sum(H*W)*A, ?)
            iou_max, max_ind = torch.max(IoU, dim=1)  # (sum(H*W)*A,), best-matching gt per anchor; max_ind is its index
            targets = torch.ones_like(per_cls_logit) * -1  # (sum(H*W)*A, class_num), -1 = ignored sample
            targets[iou_max < 0.4, :] = 0     # background: IoU < 0.4 -> negative sample
            pos_anchors_ind = iou_max >= 0.5  # IoU >= 0.5 -> positive sample
            num_pos = torch.clamp(pos_anchors_ind.sum().float(), min=1.0)  # number of positives
            # per-anchor label (integer-encoded)
            assigned_classes = per_classes[max_ind]  # (sum(H*W)*A,), the class of each anchor's best gt
            assigned_boxes = per_boxes[max_ind, :]   # (sum(H*W)*A, 4), the box of each anchor's best gt
            # one-hot encoding
            targets[pos_anchors_ind, :] = 0  # zero out all class_num entries first
            targets[pos_anchors_ind, (assigned_classes[pos_anchors_ind]).long() - 1] = 1  # one-hot (classes are 1-indexed, hence the -1)
            # summary: IoU in [0, 0.4) -> negative, [0.4, 0.5) -> ignored, [0.5, 1.0] -> positive
            class_loss.append(focal_loss(per_cls_logit, targets).view(1) / num_pos)
            if self.reg_mode == 'smoothl1':
                reg_loss.append(smooth_l1(pos_anchors_ind, [anchor_widths, anchor_heights, anchor_ctr_x, anchor_ctr_y],
                                          assigned_boxes, per_reg_pred))
            elif self.reg_mode == 'giou':
                reg_loss.append(giou(pos_anchors_ind, [anchor_widths, anchor_heights, anchor_ctr_x, anchor_ctr_y],
                                     assigned_boxes, per_reg_pred))
        cls_loss = torch.stack(class_loss).mean()
        reg_loss = torch.stack(reg_loss).mean()
        total_loss = cls_loss + reg_loss
        return cls_loss, reg_loss, total_loss
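calc_iou is imported from elsewhere in the repository and is not shown in this article. A minimal sketch of what it needs to do (pairwise IoU between anchors and gt boxes, both in (x1, y1, x2, y2) form) could look like this:

def calc_iou(a, b):
    """Sketch: a is (N, 4) anchors, b is (M, 4) gt boxes, both (x1, y1, x2, y2); returns (N, M) IoU."""
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])   # (N,)
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])   # (M,)
    # pairwise intersection via broadcasting: (N, 1, 2) against (M, 2)
    lt = torch.max(a[:, None, :2], b[None, :, :2])
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, :, 0] * wh[:, :, 1]                    # (N, M)
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / union.clamp(min=1e-8)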
9. Focal Loss
def focal_loss(preds, targets, alpha=0.25, gamma=2.0):
    preds = preds.sigmoid()
    preds = torch.clamp(preds, min=1e-4, max=1. - 1e-4)
    if torch.cuda.is_available():
        alpha_factor = torch.ones(targets.shape).cuda() * alpha
    else:
        alpha_factor = torch.ones(targets.shape) * alpha
    # alpha for positives, (1 - alpha) for negatives
    alpha_factor = torch.where(torch.eq(targets, 1.), alpha_factor, (1. - alpha_factor))
    # (1 - p) for positives, p for negatives, i.e. (1 - p_t)
    focal_weights = torch.where(torch.eq(targets, 1.), 1 - preds, preds)
    focal_weights = alpha_factor * torch.pow(focal_weights, gamma)
    bce = -(targets * torch.log(preds) + (1. - targets) * torch.log(1. - preds))
    cls_loss = focal_weights * bce
    # ignored anchors (targets == -1) contribute zero loss
    if torch.cuda.is_available():
        cls_loss = torch.where(torch.ne(targets, -1.0), cls_loss, torch.zeros_like(cls_loss).cuda())
    else:
        cls_loss = torch.where(torch.ne(targets, -1.0), cls_loss, torch.zeros_like(cls_loss))
    return cls_loss.sum()
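A tiny numerical check of the down-weighting behaviour (illustrative; raw logits go in, since focal_loss applies the sigmoid itself):

import torch

target = torch.tensor([[1.0]])            # one positive anchor, one class
easy_pos = torch.tensor([[3.0]])          # sigmoid(3.0) ≈ 0.95, an easy positive
hard_pos = torch.tensor([[-1.0]])         # sigmoid(-1.0) ≈ 0.27, a hard positive
print(focal_loss(easy_pos, target))       # on the order of 1e-5: heavily down-weighted
print(focal_loss(hard_pos, target))       # orders of magnitude larger: keeps most of its loss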
10. reg_loss (smooth L1)
def smooth_l1(pos_inds, anchor_infos, boxes, reg_pred):
    """
    pos_inds: boolean mask of positive anchors, (sum(H*W)*A,)
    boxes:    assigned gt boxes per anchor, (sum(H*W)*A, 4)
    reg_pred: predicted regression deltas, (sum(H*W)*A, 4)
    """
    anchor_widths, anchor_heights, anchor_ctr_x, anchor_ctr_y = anchor_infos  # each (sum(H*W)*A,)
    if pos_inds.sum() > 0:
        pos_reg_pred = reg_pred[pos_inds, :]  # (num_pos, 4)
        gt_widths = boxes[pos_inds][:, 2] - boxes[pos_inds][:, 0]
        gt_heights = boxes[pos_inds][:, 3] - boxes[pos_inds][:, 1]
        gt_ctr_x = boxes[pos_inds][:, 0] + gt_widths * 0.5
        gt_ctr_y = boxes[pos_inds][:, 1] + gt_heights * 0.5
        pos_anchor_widths = anchor_widths[pos_inds]
        pos_anchor_heights = anchor_heights[pos_inds]
        pos_anchor_ctr_x = anchor_ctr_x[pos_inds]
        pos_anchor_ctr_y = anchor_ctr_y[pos_inds]
        gt_widths = torch.clamp(gt_widths, min=1.0)
        gt_heights = torch.clamp(gt_heights, min=1.0)
        # encode the gt boxes as deltas relative to the anchors
        target_dx = (gt_ctr_x - pos_anchor_ctr_x) / pos_anchor_widths
        target_dy = (gt_ctr_y - pos_anchor_ctr_y) / pos_anchor_heights
        target_dw = torch.log(gt_widths / pos_anchor_widths)
        target_dh = torch.log(gt_heights / pos_anchor_heights)
        targets = torch.stack([target_dx, target_dy, target_dw, target_dh], dim=0).t()  # (num_pos, 4)
        if torch.cuda.is_available():
            targets = targets / torch.FloatTensor([0.1, 0.1, 0.2, 0.2]).cuda()
        else:
            targets = targets / torch.FloatTensor([0.1, 0.1, 0.2, 0.2])
        reg_diff = torch.abs(targets - pos_reg_pred)  # (num_pos, 4)
        # smooth L1 with beta = 1/9
        reg_loss = torch.where(
            torch.le(reg_diff, 1.0 / 9.0),
            0.5 * 9.0 * torch.pow(reg_diff, 2),
            reg_diff - 0.5 / 9.0
        )
        return reg_loss.mean()
    else:
        if torch.cuda.is_available():
            reg_loss = torch.tensor(0).float().cuda()
        else:
            reg_loss = torch.tensor(0).float()
        return reg_loss
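The giou() branch (the default reg_mode in LOSS) is also defined in the repository but not shown here. Below is a minimal sketch of a GIoU regression loss with the same call signature, under the assumption that the predictions use the same [0.1, 0.1, 0.2, 0.2]-scaled delta encoding as smooth_l1 above; the real repository code may decode differently.

def giou(pos_inds, anchor_infos, boxes, reg_pred):
    """Sketch only: 1 - GIoU between decoded predictions and the assigned gt boxes."""
    anchor_widths, anchor_heights, anchor_ctr_x, anchor_ctr_y = anchor_infos
    if pos_inds.sum() == 0:
        zero = torch.tensor(0).float()
        return zero.cuda() if torch.cuda.is_available() else zero
    scale = torch.FloatTensor([0.1, 0.1, 0.2, 0.2]).to(reg_pred.device)
    deltas = reg_pred[pos_inds, :] * scale               # undo the target scaling (assumed encoding)
    aw, ah = anchor_widths[pos_inds], anchor_heights[pos_inds]
    ax, ay = anchor_ctr_x[pos_inds], anchor_ctr_y[pos_inds]
    # decode deltas back to absolute (x1, y1, x2, y2) boxes
    px = ax + deltas[:, 0] * aw
    py = ay + deltas[:, 1] * ah
    pw = aw * torch.exp(deltas[:, 2])
    ph = ah * torch.exp(deltas[:, 3])
    pred = torch.stack([px - pw * 0.5, py - ph * 0.5, px + pw * 0.5, py + ph * 0.5], dim=1)
    gt = boxes[pos_inds, :]
    # intersection and union
    lt = torch.max(pred[:, :2], gt[:, :2])
    rb = torch.min(pred[:, 2:], gt[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    union = area_p + area_g - inter
    iou = inter / union.clamp(min=1e-8)
    # smallest enclosing box
    lt_c = torch.min(pred[:, :2], gt[:, :2])
    rb_c = torch.max(pred[:, 2:], gt[:, 2:])
    area_c = ((rb_c - lt_c).clamp(min=0).prod(dim=1)).clamp(min=1e-8)
    giou_val = iou - (area_c - union) / area_c
    return (1. - giou_val).mean()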
[PyTorch] A detailed explanation of nn.BatchNorm2d(): https://blog.csdn.net/bigFatCat_Tom/article/details/91619977