[Revisiting Classic Networks 2] -- ResNet

AI小花猫

Published 2023-05-31 23:01:12


Article link: https://blog.csdn.net/caobin_cumt/article/details/130895167


1. Introduction

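A deep learning model can extract higher-level semantic information by adding depth, which strengthens its discriminative power and improves classification accuracy. As depth grows, however, the model becomes harder to get to converge, and vanishing or exploding gradients appear. The usual remedy is BN layers combined with a normalized parameter initialization scheme. But once BN layers and normalized initialization are in place, does the model keep improving without limit as more layers are added?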

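The answer is probably no, for two reasons. First, as depth grows the model becomes harder to optimize: backpropagation multiplies the gradient by the derivative of each activation function, which for saturating activations such as sigmoid is a number between 0 and 1, so the gradient moves ever closer to 0 as layers are added. Second, although a deeper model can still converge, it shows higher error on both the training set and the test set, which is the so-called degradation problem. This differs from overfitting, where the training error is very small and the test error is very large.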
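The first reason can be made concrete with a small numeric sketch. It assumes a sigmoid activation (an assumption, since the text does not name one) and ignores the weight matrices, so it only shows how a product of per-layer factors below 1 shrinks with depth.

```python
import math


def sigmoid_grad(z):
    """Derivative of the sigmoid function at z."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def chained_factor(depth, z=0.0):
    """Product of `depth` sigmoid derivatives, all evaluated at z.

    Illustrative only: real backpropagation also multiplies by the weights.
    """
    return sigmoid_grad(z) ** depth


# z = 0 is where the sigmoid derivative is largest, so this is the best case.
for depth in (1, 5, 10, 20):
    print(depth, chained_factor(depth))
```

Even in the best case the factor drops below one millionth after ten layers, which is why a purely multiplicative path makes very deep stacks hard to train.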

ResNet's main contribution is the residual block, which addresses the degradation problem described above.


2. Residual networks

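Deep models suffer from degradation. To address this, Kaiming He and his co-authors proposed the residual network.
This section draws on the posts "Deep Learning: Residual Networks (ResNet), Theory and Code Structure" and "Residual Networks".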

2.1 Identity mapping

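At its core, a deep neural network learns a mapping function f() such that passing an input x through f() yields the corresponding output.
Ideally, the more layers a network has, the easier it is to find f(x). As the figure below shows, adding layers enlarges the set of functions the network can fit, which is visible in the areas of the regions Fi (i = 1…6), and the fitted function gets closer and closer to f() as layers are added.
[Figure: nested function classes F1 to F6, each containing the previous one and approaching f()]
In practice, as the next figure shows (non-nested structure), adding layers still enlarges the range of functions that can be fitted (the area covered by the deepest network), but the best fit may well move further away from f(). For example, f() might already be expressible by F3 alone, in which case the extra layers make f() harder, not easier, to fit.
[Figure: non-nested function classes F1 to F6]
Therefore, only when the more complex function class contains the smaller one, that is, when the functions a deeper network can fit include those a shallower network can fit, can we be sure that adding depth keeps improving performance. Otherwise, degradation sets in once the network passes a certain depth.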

To counter degradation, the authors proposed the identity mapping. So how is an identity mapping implemented?

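The figure below shows two local views of a neural network. On the left is an ordinary block: the raw input is x, and the ideal mapping we want is f(x), which is fed into the subsequent activation function. The dashed box marks the mapping that has to be fitted. If we connect the input x directly to the output, we get the block on the right, and the mapping becomes:
f(x) = x + h(x)
When h(x) -> 0, f(x) = x, which is an identity mapping.
The block on the right is the residual block. The part inside its dashed box is h(x), the residual mapping to be fitted, and in practice the residual mapping is often easier to optimize.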

[Figure: ordinary block (left) and residual block with a direct connection from input to output (right)]
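The relation f(x) = x + h(x) can be checked directly in PyTorch. The following is a minimal sketch, not the ResNet block itself: `TinyResidual` and its two-convolution branch `h` are hypothetical names chosen for illustration. Setting every weight of h to zero makes h(x) = 0, so the block returns its input unchanged.

```python
import torch
import torch.nn as nn


class TinyResidual(nn.Module):
    """f(x) = x + h(x), where h is a small two-convolution branch (illustrative)."""

    def __init__(self, channels):
        super().__init__()
        self.h = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x):
        return x + self.h(x)


block = TinyResidual(channels=4)
# Drive the residual branch to zero, so that h(x) = 0 for every input.
for p in block.h.parameters():
    nn.init.zeros_(p)

x = torch.randn(2, 4, 8, 8)
with torch.no_grad():
    y = block(x)
print(torch.equal(y, x))  # checks that the block now acts as an identity mapping
```

An ordinary stack of layers would have to learn weights that reproduce x, whereas here the identity is the starting point and the branch only has to learn the deviation from it.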

2.2 Residual blocks and their implementation

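The residual mapping comes in two forms. The first, on the left of the figure below, consists of two 3x3 convolutions with BN and ReLU and is used in shallower networks such as ResNet18 and ResNet34. The second, on the right, consists of a 1x1 convolution, a 3x3 convolution and another 1x1 convolution, again with BN and ReLU, and is used in deeper networks such as ResNet50 and ResNet101.
[Figure: basic block with two 3x3 convolutions (left) and bottleneck block with 1x1, 3x3 and 1x1 convolutions (right)]
Residual blocks also fall into two kinds depending on whether the input and output feature maps have the same number of channels. An identity block handles the case where the channel counts match. A conv block handles the case where they differ, and uses a 1x1 convolution on the shortcut branch to adjust the channels.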
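The conv block case can be sketched with plain `nn.Sequential` modules. The channel counts (64 to 128), the stride of 2 and the 56x56 input are assumptions picked for illustration. The point is that the 1x1 convolution on the shortcut has to reproduce both the channel count and the spatial size of the main branch, otherwise the two tensors cannot be added.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)

# Main branch of a hypothetical conv block: 64 -> 128 channels, stride 2.
main = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(128),
    nn.ReLU(inplace=True),
    nn.Conv2d(128, 128, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(128),
)

# Shortcut branch: a 1x1 convolution with the same stride matches
# both the channel count and the spatial size of the main branch.
shortcut = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128),
)

main_out = main(x)
shortcut_out = shortcut(x)
out = torch.relu(main_out + shortcut_out)
print(main_out.shape, shortcut_out.shape, out.shape)
```

In an identity block the shortcut is simply `x` itself, which works only because the main branch leaves the shape unchanged.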

The implementation from the official PyTorch code is as follows:

# ************************* BasicBlock *************************
# BasicBlock implements the first form of residual mapping (two 3x3 convolutions)
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None):
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if groups != 1 or base_width != 64:
            raise ValueError('BasicBlock only supports groups=1 and base_width=64')
        if dilation > 1:
            raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
        self.conv1 = con3x3(inplanes, planes, stride)
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = con3x3(planes, planes)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        # Project the input when its shape differs from the branch output
        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)
        return out


# ************************* Bottleneck *************************
# Bottleneck implements the second form of residual mapping described above
class Bottleneck(nn.Module):
    expansion = 4  # the block outputs planes * 4 channels

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None):
        super(Bottleneck, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        width = int(planes * (base_width / 64.)) * groups
        self.conv1 = con1x1(inplanes, width)
        self.bn1 = norm_layer(width)
        self.conv2 = con3x3(width, width, stride, groups, dilation)
        self.bn2 = norm_layer(width)
        self.conv3 = con1x1(width, planes * self.expansion)
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        # No ReLU after bn3: the activation is applied after the shortcut is added
        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out
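One way to see why the bottleneck form suits deeper networks is to count convolution weights. The sketch below assumes a block with 256 input and output channels and a bottleneck width of 64 (256 divided by the expansion factor of 4). These numbers are illustrative assumptions, and BN parameters are ignored.

```python
def conv_params(in_ch, out_ch, k):
    """Number of weights in a k x k convolution without bias."""
    return in_ch * out_ch * k * k


channels = 256  # assumed input/output channels of the block
width = 64      # assumed bottleneck width: channels // expansion, expansion = 4

# Two 3x3 convolutions applied directly at full width.
basic = 2 * conv_params(channels, channels, 3)

# 1x1 reduce, 3x3 at reduced width, 1x1 expand.
bottleneck = (conv_params(channels, width, 1)
              + conv_params(width, width, 3)
              + conv_params(width, channels, 1))

print(basic, bottleneck, round(basic / bottleneck, 1))
```

Under these assumptions the bottleneck needs roughly one seventeenth of the weights, because the expensive 3x3 convolution runs on 64 channels instead of 256.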

3. The ResNet network

3.1 Origin of ResNet

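Having found that residual blocks address the degradation problem, the authors started from the VGG19 model and first derived a 34-layer plain (directly stacked) network. It is likewise split into 5 stages, with [1, 6, 8, 12, 6] convolutions and [64, 64, 128, 256, 512] output channels respectively. They then added residual connections to the last four stages of this 34-layer plain network, [3, 4, 6, 3] of them respectively, which gives the 34-layer residual network. The whole process is shown in the figure below.
[Figure: VGG19, the 34-layer plain network and the 34-layer residual network]
With both 34-layer networks in hand, the authors ran an ablation comparison to verify the effect of the residual block. As the figure below shows, among the plain networks the 34-layer one has a higher error rate than the 18-layer one, whereas with residual blocks the 34-layer network has a lower error rate than the 18-layer network.
[Figure: error rates of the 18-layer and 34-layer plain networks versus the 18-layer and 34-layer residual networks]
The authors conclude: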
(1) Our extremely deep residual nets are easy to optimize, but the counterpart “plain” nets (that simply stack layers) exhibit higher training error when the depth increases
(2) Our deep residual nets can easily enjoy accuracy gains from greatly increased depth, producing results substantially better than previous networks

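Starting from the 34-layer residual network, the authors replaced every residual block made of two 3x3 convolutions with the bottleneck structure (a 1x1 convolution, a 3x3 convolution and a 1x1 convolution), which yields the 50-layer ResNet. That is how the widely used ResNet50 came about.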
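The layer counts follow from simple arithmetic: one stem convolution, the convolutions inside the residual blocks, and one final fully connected layer. A short sketch, where `resnet_depth` is a hypothetical helper and the [2, 2, 2, 2] configuration for ResNet18 comes from the standard architecture, not from the text above:

```python
def resnet_depth(blocks_per_stage, convs_per_block):
    """Count weighted layers: stem conv + convs in residual blocks + final fc."""
    return 1 + convs_per_block * sum(blocks_per_stage) + 1


print(resnet_depth([2, 2, 2, 2], 2))   # basic blocks, two convs each
print(resnet_depth([3, 4, 6, 3], 2))   # basic blocks, two convs each
print(resnet_depth([3, 4, 6, 3], 3))   # bottleneck blocks, three convs each
print(resnet_depth([3, 4, 23, 3], 3))  # bottleneck blocks, three convs each
```

Swapping the two-convolution block for the three-convolution bottleneck in the same [3, 4, 6, 3] layout adds exactly 16 layers, which is the step from 34 to 50. The 1x1 convolutions on projection shortcuts are not counted.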

3.2 ResNet variants and code implementation

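Overall, every ResNet model is divided into 5 stages. In the first stage the input image passes through a 7x7 convolution and a 3x3 max pooling, after which it enters the four key stages. The number of residual blocks in each of these stages differs between variants, which is how the model scales to different depths. The details are shown below.
Note: pay attention to the two different kinds of block inside the square brackets.
[Figure: table of ResNet configurations at different depths]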
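The spatial size of the feature map through these stages can be traced with the standard output-size formula for convolution and pooling. The sketch assumes a 224x224 input and the kernel, stride and padding values used in the code listing that follows. `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224  # assumed input resolution
sizes = []

size = conv_out(size, kernel=7, stride=2, padding=3)  # stem 7x7 convolution
sizes.append(size)
size = conv_out(size, kernel=3, stride=2, padding=1)  # 3x3 max pooling
sizes.append(size)

# The first 3x3 convolution of each of the four stages sets the stage's
# resolution; the first stage keeps stride 1, the other three use stride 2.
for stride in (1, 2, 2, 2):
    size = conv_out(size, kernel=3, stride=stride, padding=1)
    sizes.append(size)

print(sizes)
```

Each stride-2 stage halves the resolution while the channel count doubles, and the final adaptive average pooling reduces whatever remains to 1x1.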
The ResNet code is fairly simple, as shown below:

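import torch
import torch.nn as nn
# Only needed to load pretrained weights. torch.hub provides this function,
# so the listing also runs outside the torchvision package.
from torch.hub import load_state_dict_from_url

# Download URLs of the official pretrained models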
model_urls = {
    'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
    'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth',
    'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
    'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth',
    'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth',
    'resnext50_32x4d': 'https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth',
    'resnext101_32x8d': 'https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth',
    'wide_resnet50_2': 'https://download.pytorch.org/models/wide_resnet50_2-95faca4d.pth',
    'wide_resnet101_2': 'https://download.pytorch.org/models/wide_resnet101_2-32ee1156.pth',
}

# Wrapper for a 3x3 convolution (bias=False because a BN layer follows the convolution, so the bias is redundant)
# For the Conv2d parameters see the PyTorch documentation: https://pytorch-cn.readthedocs.io/zh/latest/package_references/torch-nn/#_1
def con3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, groups=groups, bias=False, dilation=dilation)

# Wrapper for a 1x1 convolution
def con1x1(in_planes, out_planes, stride=1):
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)

# Define BasicBlock
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None):
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if groups != 1 or base_width != 64:
            raise ValueError('BasicBlock only supports groups=1 and base_width=64')
        if dilation > 1:
            raise NotImplementedError("Dilation > 1 not supported in BasicBlock")

        # Layers of the BasicBlock
        self.conv1 = con3x3(inplanes, planes, stride)
        self.bn1 = norm_layer(planes)
        # inplace=True overwrites the input tensor; the default False allocates a new tensor for the result
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = con3x3(planes, planes)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    # The forward pass connects the layers defined above
    def forward(self, x):
        identity = x  # the residual block has to keep the original input

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        # Make the original input match the shape of the convolution output before adding
        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out

# Define the Bottleneck block (the basic block used by ResNet50 and deeper)
class Bottleneck(nn.Module):
    expansion = 4  # the block outputs planes * 4 channels

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None):
        super(Bottleneck, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        width = int(planes * (base_width / 64.)) * groups
        # Layers of the Bottleneck
        self.conv1 = con1x1(inplanes, width)
        self.bn1 = norm_layer(width)
        self.conv2 = con3x3(width, width, stride, groups, dilation)
        self.bn2 = norm_layer(width)
        self.conv3 = con1x1(width, planes * self.expansion)
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    # Forward pass of the Bottleneck
    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        # No ReLU after bn3: the activation is applied after the shortcut is added
        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out

# Now the main part: the ResNet class
class ResNet(nn.Module):
    def __init__(self, block, layer, num_classes=1000, zero_init_residual=False,
                 groups=1, width_per_group=64, replace_stride_with_dilation=None,
                 norm_layer=None):
        super(ResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer

        self.inplanes = 64
        self.dilation = 1
        if replace_stride_with_dilation is None:
            replace_stride_with_dilation = [False, False, False]
        if len(replace_stride_with_dilation) != 3:
            raise ValueError("replace_stride_with_dilation should be None "
                             "or a 3-element tuple, got {}".format(replace_stride_with_dilation))
        self.groups = groups
        self.base_width = width_per_group
        # Stem: 7x7 convolution followed by 3x3 max pooling
        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = norm_layer(self.inplanes)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # Four residual stages; layer[i] is the number of blocks in stage i
        self.layer1 = self._make_layer(block, 64, layer[0])
        self.layer2 = self._make_layer(block, 128, layer[1], stride=2,
                                       dilate=replace_stride_with_dilation[0])
        self.layer3 = self._make_layer(block, 256, layer[2], stride=2,
                                       dilate=replace_stride_with_dilation[1])
        self.layer4 = self._make_layer(block, 512, layer[3], stride=2,
                                       dilate=replace_stride_with_dilation[2])
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        # Weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero the last BN in each block so every residual branch starts at zero
        # and each block initially behaves like an identity mapping
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)
                elif isinstance(m, BasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)

    # Build one stage made of `blocks` residual blocks
    def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
        norm_layer = self._norm_layer
        downsaple = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        # A projection shortcut is needed when the spatial size or channel count changes
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsaple = nn.Sequential(
                con1x1(self.inplanes, planes * block.expansion, stride),
                norm_layer(planes * block.expansion),
            )

        layers = []
        # Only the first block of the stage changes stride and channel count
        layers.append(block(self.inplanes, planes, stride, downsaple, self.groups,
                            self.base_width, previous_dilation, norm_layer))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, groups=self.groups,
                                base_width=self.base_width, dilation=self.dilation,
                                norm_layer=norm_layer))

        return nn.Sequential(*layers)

    def _forward_impl(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)

        return x

    def forward(self, x):
        return self._forward_impl(x)

# Module-level factory functions (defined outside the ResNet class)
def _resnet(arch, block, layers, pretrained, progress, **kwargs):
    model = ResNet(block, layers, **kwargs)
    if pretrained:
        state_dict = load_state_dict_from_url(model_urls[arch],
                                              progress=progress)
        model.load_state_dict(state_dict)
    return model

def resnet34(pretrained=False, progress=True, **kwargs):
    return _resnet('resnet34', BasicBlock, [3, 4, 6, 3], pretrained, progress,
                   **kwargs)

def resnet101(pretrained=False, progress=True, **kwargs):
    return _resnet('resnet101', Bottleneck, [3, 4, 23, 3], pretrained, progress,
                   **kwargs)