ResNet (PyTorch)

Paper

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/cvpr.2016.90

Paper notes

  1. Recent evidence[40, 43] reveals that network depth is of crucial importance, and the leading results [40, 43, 12, 16] on the challenging ImageNet dataset [35] all exploit “very deep” [40] models, with a depth of sixteen [40] to thirty [16].
    Neural networks need substantial depth to achieve better results.
  2. Driven by the significance of depth, a question arises: Is learning better networks as easy as stacking more layers? An obstacle to answering this question was the notorious problem of vanishing/exploding gradients [14, 1, 8], which hamper convergence from the beginning. This problem, however, has been largely addressed by normalized initialization [23, 8, 36, 12] and intermediate normalization layers [16], which enable networks with tens of layers to start converging for stochastic gradient descent (SGD) with backpropagation [22].
    When deeper networks are able to start converging, a degradation problem has been exposed: with the network depth increasing, accuracy gets saturated (which might be unsurprising) and then degrades rapidly. Unexpectedly, such degradation is not caused by overfitting, and adding more layers to a suitably deep model leads to higher training error, as reported in [10, 41] and thoroughly verified by our experiments. Fig. 1 shows a typical example.
    The vanishing/exploding gradient problem has largely been addressed by normalized initialization and intermediate normalization layers (e.g., batch normalization).
    As network depth increases, however, a degradation problem appears: accuracy saturates and then drops. This is not caused by overfitting, and the deeper the network, the higher the training error, as shown in Fig. 1.
    Figure 1: deeper plain networks have worse accuracy.
  3. Formally, denoting the desired underlying mapping as H(x), we let the stacked nonlinear layers fit another mapping of F(x) := H(x)−x. The original mapping is recast into F(x)+x. We hypothesize that it is easier to optimize the residual mapping than to optimize the original, unreferenced mapping. To the extreme, if an identity mapping were optimal, it would be easier to push the residual to zero than to fit an identity mapping by a stack of nonlinear layers.
    The formulation of F(x)+x can be realized by feedforward neural networks with “shortcut connections” (Fig. 2). Shortcut connections [2, 33, 48] are those skipping one or more layers. In our case, the shortcut connections simply perform identity mapping, and their outputs are added to the outputs of the stacked layers (Fig. 2). Identity shortcut connections add neither extra parameter nor computational complexity. The entire network can still be trained end-to-end by SGD with backpropagation, and can be easily implemented using common libraries (e.g., Caffe [19]) without modifying the solvers.
    The mapping of a block becomes F(x)+x, where F(x) is an ordinary stack of layers and x is the identity mapping, as in Fig. 2.
    In other words, the input x of the earlier layers is added to the output of the later layers; since x is an identity mapping, this adds neither parameters nor computational complexity.
    Concretely: suppose the input of a block is x and the desired output is H(x). If we pass the input x directly through to the output as the initial result, this behaves like a shallower network, which is easier to train; whatever this shallower path has not learned is left for the stacked layers F(x) to fit, which makes training easier. What we ultimately want the stacked layers to fit is F(x) = H(x) - x, which is exactly a residual structure.
    Figure 2: the residual block.
  4. Based on the above plain network, we insert shortcut connections (Fig. 3, right) which turn the network into its counterpart residual version. The identity shortcuts (Eqn.(1)) can be directly used when the input and output are of the same dimensions (solid line shortcuts in Fig. 3). When the dimensions increase (dotted line shortcuts in Fig. 3), we consider two options: (A) The shortcut still performs identity mapping, with extra zero entries padded for increasing dimensions. This option introduces no extra parameter; (B) The projection shortcut in Eqn.(2) is used to match dimensions (done by 1×1 convolutions). For both options, when the shortcuts go across feature maps of two sizes, they are performed with a stride of 2.
    There are two kinds of shortcut connections, shown as the solid and dotted lines in Fig. 3. Solid lines: the input and output dimensions match. Dotted lines: the dimensions differ, which is handled in one of two ways: (A) pad the identity shortcut with zeros, or (B) use a projection (a 1×1 convolution) to map the input to the matching dimensions. A minimal sketch of option (A) is given right after this list; the code later in this post implements option (B).
    Figure 3: the residual network.
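
A minimal sketch of option (A), the parameter-free zero-padding shortcut, assuming the spatial size halves and only the channel count grows (pad_shortcut is an illustrative name, not code from the paper):

import torch
import torch.nn.functional as F

def pad_shortcut(x, out_channels):
    # subsample with stride 2 so the spatial size matches the stacked layers' output
    x = x[:, :, ::2, ::2]
    # pad the missing channels with zeros; this introduces no extra parameters
    extra = out_channels - x.shape[1]
    return F.pad(x, (0, 0, 0, 0, 0, extra))  # pad order: (W_left, W_right, H_top, H_bottom, C_front, C_back)

# example: a 64-channel 32x32 map becomes a 128-channel 16x16 map
print(pad_shortcut(torch.zeros(1, 64, 32, 32), 128).shape)  # torch.Size([1, 128, 16, 16])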

Code

Main code block

We can define a residual block:

import torch
import torch.nn as nn
import torch.nn.functional as F

# define a 3x3 convolution layer
def conv3x3(in_channel, out_channel, stride=1):
    return nn.Conv2d(in_channel, out_channel, 3, stride=stride, padding=1, bias=False)

# residual block
class residual_block(nn.Module):
    def __init__(self, in_channel, out_channel, same_shape=True):
        super(residual_block, self).__init__()
        self.same_shape = same_shape
        stride = 1 if self.same_shape else 2  # stride=1 keeps the spatial size, stride=2 halves it

        self.conv1 = conv3x3(in_channel, out_channel, stride=stride)  # first convolution
        self.bn1 = nn.BatchNorm2d(out_channel)

        self.conv2 = conv3x3(out_channel, out_channel)  # second convolution
        self.bn2 = nn.BatchNorm2d(out_channel)
        if not self.same_shape:
            # when the output shape changes, a 1x1 convolution projects the input x to the new shape
            self.conv3 = nn.Conv2d(in_channel, out_channel, 1, stride=stride)

    def forward(self, x):
        out = self.conv1(x)
        out = F.relu(self.bn1(out), True)
        out = self.conv2(out)
        out = F.relu(self.bn2(out), True)

        if not self.same_shape:
            # project x with the 1x1 convolution so its shape matches out
            x = self.conv3(x)
        return F.relu(x + out, True)  # add the shortcut x to the output out to form the new feature
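
A quick shape check of the block above (a minimal sketch; the 1×32×96×96 dummy input is an arbitrary choice for illustration):

test_x = torch.zeros(1, 32, 96, 96)       # dummy input
block1 = residual_block(32, 32)           # same_shape=True: output stays (1, 32, 96, 96)
print(block1(test_x).shape)
block2 = residual_block(32, 64, False)    # same_shape=False: output becomes (1, 64, 48, 48)
print(block2(test_x).shape)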

ResNet is essentially a stack of many such residual blocks.

Network code

We implement a simple residual network with relatively few layers.

class resnet(nn.Module):
    def __init__(self, in_channel, num_classes, verbose=False):
        super(resnet, self).__init__()
        self.verbose = verbose  # flag: whether to print each block's output shape
        
        self.block1 = nn.Conv2d(in_channel, 64, 7, 2)
        
        self.block2 = nn.Sequential(
            nn.MaxPool2d(3, 2),
            residual_block(64, 64),
            residual_block(64, 64)
        )
        
        self.block3 = nn.Sequential(
            residual_block(64, 128, False),  # this block changes the feature-map size
            residual_block(128, 128)
        )
        
        self.block4 = nn.Sequential(
            residual_block(128, 256, False),
            residual_block(256, 256)
        )
        
        self.block5 = nn.Sequential(
            residual_block(256, 512, False),
            residual_block(512, 512),
            nn.AvgPool2d(3)
        )
        
        self.classifier = nn.Linear(512, num_classes)
        
    def forward(self, x):
        x = self.block1(x)
        if self.verbose:  # if True, print the output shape of this block
            print('block 1 output: {}'.format(x.shape))
        x = self.block2(x)
        if self.verbose:
            print('block 2 output: {}'.format(x.shape))
        x = self.block3(x)
        if self.verbose:
            print('block 3 output: {}'.format(x.shape))
        x = self.block4(x)
        if self.verbose:
            print('block 4 output: {}'.format(x.shape))
        x = self.block5(x)
        if self.verbose:
            print('block 5 output: {}'.format(x.shape))
        x = x.view(x.shape[0], -1)
        x = self.classifier(x)
        return x
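
A minimal usage sketch, assuming 3-channel 96×96 inputs and 10 classes (these numbers are illustrative; with them, the feature map entering block5's AvgPool2d(3) is 3×3, so pooling yields 1×1 and the Linear(512, num_classes) classifier matches):

net = resnet(3, 10, verbose=True)   # print each block's output shape
dummy = torch.zeros(1, 3, 96, 96)   # dummy 96x96 RGB input
out = net(dummy)
print(out.shape)                    # torch.Size([1, 10])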