Classic Network Architectures (5): ResNet (Residual Network)

连理o

Last modified: 2022-03-03 10:15:18

First published: 2020-06-28 15:29:16

Article link: https://blog.csdn.net/weixin_42437114/article/details/106996019

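Let us start with a question: if we add new layers to a neural network and train it fully, can the resulting model only ever reduce the training error more effectively? In theory, the solution space of the original model is a subspace of the solution space of the new model. In other words, if the newly added layers can be trained into the identity mapping $f(x) = x$, the new model is exactly as effective as the original one. Because the new model may also find a better solution for fitting the training set, adding layers would seem to make it easier to reduce the training error. In practice, however, the training error often rises rather than falls once too many layers are added. The problem persists even when batch normalization provides the numerical stability that makes deep models easier to train.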

Residual block

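Let the input be $x$, and suppose the ideal mapping we want to learn is $f(x)$, which is fed to the activation function at the top of each diagram in the figure. The part inside the dashed box in the left diagram has to fit the mapping $f(x)$ directly, whereas the part inside the dashed box in the right diagram only has to fit the residual mapping $f(x) - x$ relative to the identity mapping. In practice the residual mapping is often easier to optimize.
- Suppose the ideal mapping $f(x)$ is the identity mapping. We then only need to learn zero weights and biases for the top weighted operation (for example, an affine layer) inside the dashed box of the right diagram, and $f(x)$ becomes the identity mapping. In other words, once the skip connection of a residual block is in place, two extra layers added on top of an existing network can easily learn the identity mapping, so the deeper network can represent at least everything the original network could. In practice, when the ideal mapping $f(x)$ is very close to the identity mapping, the residual mapping also captures small deviations from the identity easily.
The right diagram is also the basic building block of ResNet, the residual block. Inside a residual block the input propagates forward faster through the cross-layer data path. During backpropagation the skip connection likewise lets the gradient pass through the identity path without attenuation, which alleviates the vanishing or exploding gradients caused by added depth. It is also said to reduce overfitting in deep networks.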
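
The two claims above (a zeroed residual branch gives the identity mapping, and the skip path passes the gradient unchanged) can be checked numerically. The following is only a minimal sketch: the small fully connected `branch`, its width of 4, and the variable names are assumptions made for illustration and are not part of the ResNet design.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical residual branch g(x): two affine layers with a ReLU in between.
branch = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 4))

# Set the weight and bias of the last affine layer to zero, so g(x) = 0 for every x.
nn.init.zeros_(branch[2].weight)
nn.init.zeros_(branch[2].bias)

x = torch.randn(3, 4, requires_grad=True)
y = x + branch(x)  # skip connection: f(x) = x + g(x)

# With g(x) = 0 the block reproduces its input exactly.
is_identity = torch.equal(y, x)

# d(sum y)/dx = 1 + dg/dx, and dg/dx = 0 here, so the gradient is all ones.
y.sum().backward()
grad_is_one = torch.equal(x.grad, torch.ones_like(x))

print(is_identity, grad_is_one)
```

In a trained block $g$ is not zero, so the gradient is $1 + \partial g / \partial x$; the constant 1 contributed by the skip path is what keeps the signal from vanishing.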

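ResNet follows VGG's design of using $3 \times 3$ convolutional layers throughout. A residual block starts with two $3 \times 3$ convolutional layers that have the same number of output channels. Each convolutional layer is followed by a BN layer and a ReLU activation. The input then skips these two convolutions and is added directly in front of the final ReLU activation.
- Note: the skip connection is added before the activation function. If $x$ and $f(x)$ have different shapes, the $x$ on the shortcut path must be multiplied by a weight $w$, or passed through a $1 \times 1$ convolutional layer with a suitable stride, to transform its shape. In particular, changing the number of channels requires an extra $1 \times 1$ convolutional layer that maps the input to the required shape before the addition. This is also why ResNet uses so many $3 \times 3$ "same" convolutions with padding 1, which keep the height and width unchanged at stride 1.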
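
The shape bookkeeping in the note can be made concrete with the standard output-size formula for a convolution. The symbols $n$ (input height or width), $k$ (kernel size), $p$ (padding) and $s$ (stride) are notation introduced here for illustration.

```latex
\[
n_{\text{out}} = \left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1
\]

% "Same" 3x3 convolution (k = 3, p = 1, s = 1): the size is preserved.
\[
n_{\text{out}} = \left\lfloor \frac{n + 2 - 3}{1} \right\rfloor + 1 = n
\]

% Main path with stride 2 on a 6x6 input (k = 3, p = 1, s = 2):
\[
n_{\text{out}} = \left\lfloor \frac{6 + 2 - 3}{2} \right\rfloor + 1 = 3
\]

% Shortcut path, 1x1 convolution with stride 2 on the same input (k = 1, p = 0, s = 2):
\[
n_{\text{out}} = \left\lfloor \frac{6 + 0 - 1}{2} \right\rfloor + 1 = 3
\]
```

Both paths produce a $3 \times 3$ output, so the element-wise addition is well defined once the $1 \times 1$ convolution has also matched the channel count.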

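import torch
from torch import nn


class Residual(nn.Module):
    """Basic residual block: two 3x3 conv + BN layers plus a skip connection."""

    def __init__(self, in_channel, out_channel, stride=1):
        super().__init__()
        # Main path: conv-BN-ReLU-conv-BN (the final ReLU comes after the addition).
        self.bottleneck = nn.Sequential(
            nn.Conv2d(in_channel, out_channel, 3, stride, padding=1, bias=False),
            nn.BatchNorm2d(out_channel),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channel, out_channel, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_channel),
        )
        self.relu = nn.ReLU(inplace=True)
        if stride != 1 or in_channel != out_channel:
            # Shapes differ: project x with a strided 1x1 convolution.
            self.downsample = nn.Sequential(
                nn.Conv2d(in_channel, out_channel, 1, stride, bias=False),
                nn.BatchNorm2d(out_channel),
            )
        else:
            # Shapes match: the shortcut is a true identity mapping.
            self.downsample = nn.Identity()

    def forward(self, x):
        out = self.bottleneck(x)
        identity = self.downsample(x)
        out = out + identity  # add the shortcut before the final ReLU
        return self.relu(out)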

# Double the number of channels, halve the height and width
net = Residual(3, 6, 2)
x = torch.randn(4, 3, 6, 6)
net(x).shape

torch.Size([4, 6, 3, 3])

# Keep the shape unchanged
net = Residual(3, 3)
x = torch.randn(4, 3, 6, 6)
net(x).shape

torch.Size([4, 3, 6, 6])

For deeper networks, the ResNet paper introduces a "bottleneck" architecture to reduce model complexity:

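class Bottleneck(nn.Module):
    """Bottleneck residual block: 1x1 reduce, 3x3 conv, 1x1 expand, plus a skip connection."""

    def __init__(self, in_channel, out_channel, bottleneck_channel=None, stride=1):
        super().__init__()
        # Default bottleneck width: half the input channels (integer division, at least 1).
        if bottleneck_channel is None:
            bottleneck_channel = max(in_channel // 2, 1)

        self.bottleneck = nn.Sequential(
            # 1x1 convolution reduces the channel count.
            nn.Conv2d(in_channel, bottleneck_channel, 1, bias=False),
            nn.BatchNorm2d(bottleneck_channel),
            nn.ReLU(inplace=True),
            # 3x3 convolution works at the reduced width and applies the stride.
            nn.Conv2d(bottleneck_channel, bottleneck_channel, 3, stride, padding=1, bias=False),
            nn.BatchNorm2d(bottleneck_channel),
            nn.ReLU(inplace=True),
            # 1x1 convolution expands to the output channel count.
            nn.Conv2d(bottleneck_channel, out_channel, 1, bias=False),
            nn.BatchNorm2d(out_channel),
        )
        self.relu = nn.ReLU(inplace=True)
        if stride != 1 or in_channel != out_channel:
            # Shapes differ: project x with a strided 1x1 convolution.
            self.downsample = nn.Sequential(
                nn.Conv2d(in_channel, out_channel, 1, stride, bias=False),
                nn.BatchNorm2d(out_channel),
            )
        else:
            # Shapes match: the shortcut is a true identity mapping.
            self.downsample = nn.Identity()

    def forward(self, x):
        out = self.bottleneck(x)
        identity = self.downsample(x)
        out = out + identity  # add the shortcut before the final ReLU
        return self.relu(out)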
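
To see why the bottleneck design lowers model complexity, it helps to count convolution weights. The sketch below compares two stacked $3 \times 3$ convolutions with a $1 \times 1$, $3 \times 3$, $1 \times 1$ stack. The channel count 256 and bottleneck width 64 are illustrative values chosen here, and the helper `count_params` is a hypothetical name; BN layers and the shortcut are left out to keep the comparison simple.

```python
from torch import nn


def count_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


channels, width = 256, 64  # illustrative values, not taken from the code above

# Two 3x3 convolutions at full width, as in the basic residual block.
basic = nn.Sequential(
    nn.Conv2d(channels, channels, 3, padding=1, bias=False),
    nn.Conv2d(channels, channels, 3, padding=1, bias=False),
)

# 1x1 reduce, 3x3 at the reduced width, 1x1 expand, as in the bottleneck block.
bottleneck = nn.Sequential(
    nn.Conv2d(channels, width, 1, bias=False),
    nn.Conv2d(width, width, 3, padding=1, bias=False),
    nn.Conv2d(width, channels, 1, bias=False),
)

basic_params = count_params(basic)            # 2 * (3 * 3 * 256 * 256)
bottleneck_params = count_params(bottleneck)  # 256*64 + 3*3*64*64 + 64*256
print(basic_params, bottleneck_params)
```

Under these assumed widths the bottleneck stack needs 69,632 weights against 1,179,648 for the two full-width $3 \times 3$ convolutions, roughly a 17-fold reduction, because the expensive $3 \times 3$ convolution runs on far fewer channels.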

ResNet network architecture

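The first two layers of ResNet are the same as in GoogLeNet: a $7 \times 7$ convolutional layer with 64 output channels and stride 2, followed by a $3 \times 3$ max-pooling layer with stride 2. The difference is that ResNet adds a batch normalization layer after every convolutional layer.
GoogLeNet then attaches 4 modules made of Inception blocks. ResNet instead uses 4 modules made of residual blocks, and every module uses several residual blocks with the same number of output channels. The first module has as many channels as its input. Because a max-pooling layer with stride 2 has already been applied, it does not need to reduce the height and width. Each later module doubles the channel count of the previous module in its first residual block and halves the height and width.
Finally, as in GoogLeNet, a global average pooling layer is added, followed by a fully connected output layer.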
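
The description above can be traced as a sequence of feature-map sizes. The sketch below uses plain Python and the usual convolution output-size formula. The $224 \times 224$ input size, the padding values (3 for the $7 \times 7$ convolution, 1 for the $3 \times 3$ layers) and the helper name `conv_out` are assumptions made for this illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Output height/width of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224                                              # assumed input height and width
size = conv_out(size, kernel=7, stride=2, padding=3)    # 7x7 conv, stride 2
size = conv_out(size, kernel=3, stride=2, padding=1)    # 3x3 max pooling, stride 2

stages = []
for i, channels in enumerate([64, 128, 256, 512]):
    if i > 0:
        # The first residual block of every later module uses stride 2.
        size = conv_out(size, kernel=3, stride=2, padding=1)
    stages.append((channels, size))

for channels, size in stages:
    print(f"{channels} channels, {size}x{size}")
```

Under these assumptions the four modules produce 64 channels at $56 \times 56$, 128 at $28 \times 28$, 256 at $14 \times 14$ and 512 at $7 \times 7$, after which global average pooling reduces each channel to a single value.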

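def resnet_block(in_channels, out_channels, num_residuals, first_block=False):
    """Build one module made of `num_residuals` residual blocks."""
    blk = nn.Sequential()
    for i in range(num_residuals):
        # Only the first block of a non-first module downsamples (stride 2).
        stride = 2 if i == 0 and not first_block else 1
        blk.add_module(str(i), Residual(in_channels, out_channels, stride=stride))
        # Later blocks in the module take the module's output channels as input.
        in_channels = out_channels
    return blk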

class ResNet(nn.Module):
    def __init__(self, in_channel, class_num):
        super().__init__()
        # Stem: 7x7 conv (stride 2) + BN + ReLU, then 3x3 max pooling (stride 2).
        self.stem = nn.Sequential(
            nn.Conv2d(in_channel, 64, 7, stride=2, padding=3),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1),
        )
        # Four modules of two residual blocks each; only the first keeps the resolution.
        in_channels = [64, 64, 128, 256, 512]
        resnet_blocks = [
            resnet_block(in_channels[i], in_channels[i + 1], 2, first_block=(i == 0))
            for i in range(4)
        ]
        self.resnet_blocks = nn.Sequential(*resnet_blocks)
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.fc = nn.Linear(512, class_num)

    def forward(self, x):
        out = self.stem(x)
        out = self.resnet_blocks(out)
        out = self.avg_pool(out).flatten(1)  # (N, 512, 1, 1) -> (N, 512)
        out = self.fc(out)
        return out

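Each module here contains 4 convolutional layers (not counting the $1 \times 1$ convolutional layers), so the 4 modules contribute 16 layers; together with the first convolutional layer and the final fully connected layer this gives 18 layers in total. This model is therefore commonly called ResNet-18. Configuring different channel counts and different numbers of residual blocks per module yields other ResNet models, such as the deeper 152-layer ResNet-152.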
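
The layer count can be written as a small calculation. In the sketch below the ResNet-18 entry follows the code in this article, while the block counts for the other variants are the standard configurations from the ResNet paper and are not derived from this article. The function name `resnet_depth` is hypothetical. The $1 \times 1$ convolutions on the shortcut path are not counted; the deeper variants use bottleneck blocks, whose three main-path convolutions are all counted.

```python
def resnet_depth(blocks_per_module, convs_per_block):
    """Counted layers: main-path convolutions + first conv + final fully connected layer."""
    return sum(blocks_per_module) * convs_per_block + 2


# name -> (residual blocks in each of the 4 modules, counted conv layers per block)
configs = {
    "ResNet-18": ([2, 2, 2, 2], 2),    # basic blocks, as built in this article
    "ResNet-34": ([3, 4, 6, 3], 2),    # basic blocks
    "ResNet-50": ([3, 4, 6, 3], 3),    # bottleneck blocks
    "ResNet-101": ([3, 4, 23, 3], 3),  # bottleneck blocks
    "ResNet-152": ([3, 8, 36, 3], 3),  # bottleneck blocks
}

depths = {name: resnet_depth(blocks, convs) for name, (blocks, convs) in configs.items()}
for name, depth in depths.items():
    print(name, depth)
```

Each computed depth matches the number in the model's name, which is where names such as ResNet-18 and ResNet-152 come from.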