Close Reading of the Literature (Part 6): ResNet

Latest recommended article published on 2023-12-31 18:42:37

我学数学我骄傲

Column: Literature Reading Notes (CNN, CV). Tags: deep learning, computer vision, artificial intelligence

Article link: https://blog.csdn.net/weixin_37799689/article/details/105220943

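Because this paper is of high value, this post does not outline its structure first and then focus on the highlights. Instead it reads the paper closely from start to finish, introducing the important ideas and explaining and deriving the key points.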

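I. Paper Background

Core claim of the paper (its significance): deep representations are central to many visual tasks, which is the reason for making networks deeper. (So a method for increasing network depth is needed.)

Why deep representations are central: a deeper mapping function composes more nonlinear transformations, and a more expressive mapping function can fit the target better.

Observation: deeper neural networks do not train well, and the reason is network degradation.

Network degradation: as the network gets deeper, accuracy saturates and then drops on both the training set and the test set. Because the training error also rises, this is not overfitting (where the number of parameters is large relative to the number of samples, so training performance is excellent while test performance is poor). This post attributes degradation to vanishing gradients; note that the paper itself argues vanishing gradients are unlikely to be the cause, because its plain networks are trained with batch normalization, and treats degradation as an optimization difficulty.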

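II. Contributions and Key Points

1) Proposing the residual structure

(1) Residual: the contribution of the features to the loss, that is, the contribution of (x1, x2, ...) to y - yk. In the paper's notation, the residual function of a block is F(x) := H(x) - x, where H(x) is the mapping the block should realize.

(2) Derivation of the residual structure: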

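(3) Supplement: what is the essential difference between the residual structure and the traditional structure? Are their outputs the same?

Plain net: the mapping F(x) of a building block has to fit H(x) directly, i.e. F(x) := H(x).

ResNet: a skip connection is added, so the task of a building block becomes F(x) := H(x) - x. The block's output F(x) + x is still H(x); what changes is the function the stacked layers have to learn.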
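The difference between the two formulations can be sketched in a few lines of plain Python. This is only a toy illustration: `linear`, `toy_plain_block` and `toy_residual_block` are hypothetical names, and a single weight matrix stands in for the stacked layers F. It shows that when the desired mapping H is the identity, the residual block only needs F(x) = 0, whereas the plain block with the same zero weights outputs zeros.

```python
def linear(weights, x):
    """Apply a square weight matrix to the vector x (a stand-in for the layers F)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def toy_plain_block(weights, x):
    # Plain block: the layers themselves must produce H(x).
    return linear(weights, x)


def toy_residual_block(weights, x):
    # Residual block: the layers produce F(x) = H(x) - x,
    # and the skip connection adds x back to give H(x).
    fx = linear(weights, x)
    return [f + xi for f, xi in zip(fx, x)]


x = [1.0, -2.0, 3.0]
zero_weights = [[0.0] * 3 for _ in range(3)]

plain_out = toy_plain_block(zero_weights, x)        # every entry is zero
residual_out = toy_residual_block(zero_weights, x)  # equal to x: identity mapping

print("plain:", plain_out)
print("residual:", residual_out)
```

With all weights at zero the residual block already realizes the identity, which is the intuition behind the paper's remark that it is easier to push a residual towards zero than to fit an identity mapping with a stack of nonlinear layers.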

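(4) Supplement: why is the residual structure easier to optimize than the traditional one? It strengthens the correlation between the gradient and the loss, which in turn improves the network's ability to learn and resolves the degradation problem.

For example, suppose 5 should be mapped to 5.1. Before introducing the residual, F'(5) = H(5) = 5.1. After introducing it, H(5) = F(5) + 5 = 5.1, so F(5) = 0.1. Here F' and F both denote the mapping realized by the network parameters. The residual mapping is more sensitive to changes in the output. If the output changes from 5.1 to 5.2, the output of F' increases by 0.1/5.1 = 1/51, about 2%, whereas in the residual structure F goes from 0.1 to 0.2, an increase of 100%. The latter change clearly has a larger effect on the weight updates, so it works better. The idea behind training on residuals is to remove the shared main part and thereby highlight small changes; a residual network can be viewed as a differential amplifier.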
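The arithmetic of this example can be checked with a short script. It is only a sketch of the numbers quoted above; `relative_change` is a hypothetical helper name.

```python
def relative_change(old, new):
    """Relative change of a quantity that moves from old to new."""
    return (new - old) / old


x = 5.0

# Plain mapping F' must output H(x) itself: 5.1 -> 5.2.
plain_change = relative_change(5.1, 5.2)

# Residual mapping F outputs H(x) - x: 0.1 -> 0.2.
residual_change = relative_change(5.1 - x, 5.2 - x)

print(f"plain mapping F':    {plain_change:.1%}")
print(f"residual mapping F:  {residual_change:.1%}")
```

The same absolute change of 0.1 in H(x) is a small fraction of what F' outputs but doubles what F outputs, which is the sensitivity argument made in the text.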

(5) Supplement: why can a residual network further alleviate vanishing gradients?
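One common way to answer this question is the propagation analysis from "Identity Mappings in Deep Residual Networks". The derivation below is a sketch under two assumptions that are not stated in this post: every shortcut is an identity mapping, and no nonlinearity is applied after the addition. The symbols x_l (input of block l), W_l (its weights), L (index of a deeper block) and \mathcal{E} (the loss) are introduced here for illustration.

```latex
% One residual block with an identity shortcut:
x_{l+1} = x_l + F(x_l, W_l)

% Unrolling from block l to a deeper block L:
x_L = x_l + \sum_{i=l}^{L-1} F(x_i, W_i)

% Backpropagation by the chain rule:
\frac{\partial \mathcal{E}}{\partial x_l}
  = \frac{\partial \mathcal{E}}{\partial x_L}
    \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F(x_i, W_i) \right)
```

The constant term 1 means the gradient at the deep block L reaches block l directly through the shortcuts, without being multiplied by any weight matrices. The gradient at x_l vanishes only if the second term happens to cancel it exactly, which is unlikely to hold for all samples in a mini-batch.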

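(6) Advantages:

Further alleviates the vanishing gradient problem

Learns residual functions, which are easier to optimize; accuracy improves markedly as depth increases

Learns better deep feature representations, i.e. what is commonly understood as high-level semantic information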

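2) Building the residual neural network

Supplement: the second version has fewer parameters than the first, and because it increases the number of channels it also trains better. Note: the padding modes used within the same residual block differ.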

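III. Network Details

1) Whether the input and output sizes of a residual module match

With a 5*5 input and a 3*3 kernel at stride 1, the output size is (5-3)/1+1 = 3, i.e. 3*3. With padding=same, the output size is ceil(5/1) = 5, i.e. 5*5, so the input and output have the same size and can be added together. That is, ResNet uses same padding.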
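The two size formulas can be written as a small helper. This is a sketch that follows the TensorFlow conventions for VALID and SAME padding; `conv_output_size` is a hypothetical name, not a function from the code later in this post.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (input_size - kernel_size) // stride + 1
    if padding == "same":
        # Input is padded so that the output size depends only on the stride.
        return math.ceil(input_size / stride)
    raise ValueError("padding must be 'valid' or 'same'")


print(conv_output_size(5, 3, stride=1, padding="valid"))  # the 3*3 case above
print(conv_output_size(5, 3, stride=1, padding="same"))   # the 5*5 case above
print(conv_output_size(32, 3, stride=2, padding="same"))  # stride 2 halves the size
```

With stride 1 and same padding the feature map keeps its size, so F(x) and x can be added element by element; with stride 2 the size is halved, which is why the shortcut also has to be downsampled in that case.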

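2) Basic composition of the network (1200 layers)

(1) A residual network is built from residual blocks rather than from individual convolutional layers. To prevent vanishing gradients, each skip connection spans several layers (2 or 3 layers works best).

(2) The difference between the dashed and solid lines in the figure: with shortcut connections, H(x) = F(x) + x. If F(x) and x have the same number of channels they can be added directly, but how can they be added when the channel counts differ? The solid and dashed lines in the figure distinguish these two cases:

Solid-line connections indicate that the channel counts are equal, for example the first and third pink rectangles in the figure, which are both 3x3 convolutions with 64 channels. Since the channels match, the computation is H(x) = F(x) + x.

Dashed-line connections indicate that the channel counts differ, for example the first and third green rectangles in the figure, which are 3x3 convolutions with 64 and 128 channels respectively. Since the channels differ, the computation is H(x) = F(x) + Wx, where W is a convolution that adjusts the dimensions of x.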
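The two cases can be sketched on the channel vector of a single pixel. This is only an illustration under simplifying assumptions: spatial dimensions and stride are ignored, W is treated as a 1x1 convolution (a matrix applied across channels), and all names and numbers are hypothetical.

```python
def identity_shortcut(x):
    # Solid line: channel counts match, so x is passed through unchanged.
    return list(x)


def projection_shortcut(w, x):
    # Dashed line: at one pixel, a 1x1 convolution is a matrix-vector
    # product across channels, mapping len(x) channels to len(w) channels.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def add(fx, shortcut):
    if len(fx) != len(shortcut):
        raise ValueError("channel counts differ; a projection is needed")
    return [a + b for a, b in zip(fx, shortcut)]


x = [1.0, 2.0]                    # 2 input channels at one pixel
fx_same = [0.5, -0.5]             # F(x) with 2 channels
fx_wide = [0.1, 0.2, 0.3, 0.4]    # F(x) with 4 channels
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]  # hypothetical 4x2 projection

h_same = add(fx_same, identity_shortcut(x))       # H(x) = F(x) + x
h_wide = add(fx_wide, projection_shortcut(w, x))  # H(x) = F(x) + Wx

print(h_same)
print(h_wide)
```

The code in section V takes a different route for the dashed case: instead of a learned projection W, it average-pools the input and zero-pads the channel axis, which adds no extra parameters.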

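IV. Experimental Results

Experiments confirm that the deep residual network does address the degradation problem. As the figure shows, in the left panel (plain networks) the deeper network (34 layers) has a higher error rate than the shallower one (18 layers), while in the right panel (ResNet) the deeper network (34 layers) has a lower error rate than the shallower one (18 layers).

In their second related paper, "Identity Mappings in Deep Residual Networks", the ResNet authors proposed ResNet V2. The main difference between ResNet V2 and ResNet V1 is as follows. By studying the propagation formula of the residual learning unit, the authors found that forward and backward signals can be passed directly from one unit to another, so the nonlinear activation (such as ReLU) applied after the addition on the shortcut path is replaced with an identity mapping. ResNet V2 also applies Batch Normalization in every layer, placing BN and ReLU before each convolution (pre-activation). With these changes, the new residual unit is easier to train and generalizes better than the original one.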

V. Core Code

1. Tip: read the README before running code from GitHub

2. Customized code:

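import tensorflow as tf  # TensorFlow 1.x API (tf.variable_scope, tf.nn.conv2d with filter=)

# Helpers such as conv_bn_relu_layer, bn_relu_conv_layer, batch_normalization_layer,
# output_layer, create_variables and activation_summary are not shown in this excerpt.


def inference(input_tensor_batch, n, reuse):  # ResNet architecture; customizable via n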
    '''
    The main function that defines the ResNet. total layers = 1 + 2n + 2n + 2n +1 = 6n + 2
    :param input_tensor_batch: 4D tensor
    :param n: num_residual_blocks
    :param reuse: To build train graph, reuse=False. To build validation graph and share weights
    with train graph, reuse=True
    :return: last layer in the network. Not softmax-ed
    '''

    layers = []
    with tf.variable_scope('conv0', reuse=reuse):
        conv0 = conv_bn_relu_layer(input_tensor_batch, [3, 3, 3, 16], 1)  # with SAME padding the spatial size is unchanged (channels go 3 -> 16)
        activation_summary(conv0)
        layers.append(conv0)

    for i in range(n):
        with tf.variable_scope('conv1_%d' %i, reuse=reuse):
            if i == 0:
                conv1 = residual_block(layers[-1], 16, first_block=True)
            else:
                conv1 = residual_block(layers[-1], 16)
            activation_summary(conv1)
            layers.append(conv1)

    for i in range(n):
        with tf.variable_scope('conv2_%d' %i, reuse=reuse):
            conv2 = residual_block(layers[-1], 32)
            activation_summary(conv2)
            layers.append(conv2)

    for i in range(n):
        with tf.variable_scope('conv3_%d' %i, reuse=reuse):
            conv3 = residual_block(layers[-1], 64)
            layers.append(conv3)
        assert conv3.get_shape().as_list()[1:] == [8, 8, 64]

    with tf.variable_scope('fc', reuse=reuse):
        in_channel = layers[-1].get_shape().as_list()[-1]
        bn_layer = batch_normalization_layer(layers[-1], in_channel)
        relu_layer = tf.nn.relu(bn_layer)
        global_pool = tf.reduce_mean(relu_layer, [1, 2])

        assert global_pool.get_shape().as_list()[-1:] == [64]
        output = output_layer(global_pool, 10)
        layers.append(output)

    return layers[-1]
#########################################################################################

def residual_block(input_layer, output_channel, first_block=False):
    '''
    Defines a residual block in ResNet
    :param input_layer: 4D tensor
    :param output_channel: int. return_tensor.get_shape().as_list()[-1] = output_channel
    :param first_block: if this is the first residual block of the whole network
    :return: 4D tensor.
    '''
    input_channel = input_layer.get_shape().as_list()[-1]

    # When it's time to "shrink" the image size, we use stride = 2
    if input_channel * 2 == output_channel:
        increase_dim = True
        stride = 2
    elif input_channel == output_channel:  # channels are not doubled
        increase_dim = False
        stride = 1
    else:
        raise ValueError('Output and input channel does not match in residual blocks!!!')

    # The first conv layer of the first residual block does not need to be normalized and relu-ed.
    with tf.variable_scope('conv1_in_block'):
        if first_block:
            filter = create_variables(name='conv', shape=[3, 3, input_channel, output_channel])
            conv1 = tf.nn.conv2d(input_layer, filter=filter, strides=[1, 1, 1, 1], padding='SAME')
        else:
            conv1 = bn_relu_conv_layer(input_layer, [3, 3, input_channel, output_channel], stride)

    with tf.variable_scope('conv2_in_block'):
        conv2 = bn_relu_conv_layer(conv1, [3, 3, output_channel, output_channel], 1)

    # When the channels of input layer and conv2 does not match, we add zero pads to increase the
    #  depth of input layers
    if increase_dim is True:
        pooled_input = tf.nn.avg_pool(input_layer, ksize=[1, 2, 2, 1],
                                      strides=[1, 2, 2, 1], padding='VALID')
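        # Tensor layout is (batch, H, W, C): pad only the channel axis, adding half of the
        # extra channels before and half after the existing ones.
        padded_input = tf.pad(pooled_input, [[0, 0], [0, 0], [0, 0],
                                             [input_channel // 2, input_channel // 2]])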
    else:
        padded_input = input_layer  # e.g. [128, 32, 32, 16] for the first block

    output = conv2 + padded_input
    return output
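
The docstring of inference gives the depth as 1 + 2n + 2n + 2n + 1 = 6n + 2. As a quick check, the sketch below evaluates that formula for a few values of n; `total_layers` is a hypothetical helper, and the chosen values of n are the ones that give the CIFAR-10 depths 20, 32, 56, 110 and 1202 reported in the ResNet paper.

```python
def total_layers(n):
    """Depth of the network built by inference() for n residual blocks per stage."""
    first_conv = 1
    convs_per_stage = 2 * n  # each residual block contains two 3x3 convolutions
    num_stages = 3           # stages with 16, 32 and 64 channels
    fc = 1
    return first_conv + num_stages * convs_per_stage + fc


for n in (3, 5, 9, 18, 200):
    print(f"n = {n:3d} -> {total_layers(n)} layers")
```

With n = 200 the formula gives 1202 layers, which matches the roughly 1200 layers mentioned in section III.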
