ResNet Explained

米小凡

Article link: https://blog.csdn.net/xiaomifanhxx/article/details/83749918

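ResNet was proposed in 2015 and took first place in the classification task of the ImageNet competition (ILSVRC 2015). Because it is both simple and practical, many later methods were built on ResNet50 or ResNet101. It is widely used in detection, segmentation, recognition and other areas, and AlphaZero also uses a ResNet, which shows how useful the architecture is.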

Overview of the ResNet network

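1   To address the degradation problem that appears when training very deep networks, Kaiming He et al. proposed the ResNet architecture. Thanks to residual learning, the number of layers that can be trained increased substantially.
2   By adding a shortcut (an identity mapping that carries x around the stacked layers), ResNet changes what those layers must learn: instead of approximating the unknown target function H(x) directly, they approximate the residual F(x)=H(x)-x. The authors argue that the two formulations can express the same functions but differ in how hard they are to optimize, and they hypothesize that F(x) is much easier to optimize than H(x). The idea is related to residual vector encoding in image processing, where a reformulation decomposes a problem into residual problems between multiple scales and makes optimization markedly easier.
3   For deeper networks (50 layers or more), ResNet introduces the bottleneck structure, which reduces the computational cost.
4   ResNet has two kinds of shortcut: the identity shortcut and the projection shortcut. The identity shortcut adds no parameters; when the dimensions increase, the extra channels are padded with zeros. The projection shortcut has the form y=F(x,Wi)+Wsx, where Ws matches the change in dimensions.
5   ResNet generalizes well across image-related tasks: in 2015 it took first place in ImageNet classification, ImageNet detection, ImageNet localization, COCO detection and COCO segmentation.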

1 Significance of ResNet

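As networks get deeper, accuracy on the training set starts to drop. This cannot be caused by overfitting, because an overfitted model would show high training accuracy. To address the problem, the authors proposed a new kind of network, the deep residual network, which allows the network to be made much deeper. It introduces the new structure shown in Figure 1.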

Figure 1

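To solve the degradation problem, the paper proposes residual learning, which is implemented by constructing a residual block. With the residual structure, the stacked layers no longer approximate the unknown target function H(x) directly. Instead they approximate the residual F(x)=H(x)-x, and an identity mapping adds x back. The authors argue that the two formulations can express the same functions but differ in how hard they are to optimize, and they hypothesize that F(x) is much easier to optimize than H(x). The idea is related to residual vector encoding in image processing, where a reformulation decomposes a problem into residual problems between multiple scales and makes optimization markedly easier. The identity mapping in the figure above adds the input x element-wise to F(x), the output of the two stacked layers, to form the block's output. It therefore adds no parameters and almost no computation, and the operation is easy to implement in common libraries (e.g., Caffe, TensorFlow).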
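
The residual formulation y=F(x)+x can be illustrated with a small pure-Python sketch. It is not from the original post: `relu`, `dense`, `residual_block` and the tiny weight matrices are hypothetical names and values chosen for illustration, fully connected layers stand in for the convolutions, and the ReLU that normally follows the addition is left out. It shows that when the weights of F are all zero, the block reduces to the identity mapping.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def dense(weights, v):
    # Matrix-vector product: one output per row of `weights`.
    return [sum(w * a for w, a in zip(row, v)) for row in weights]

def residual_block(x, w1, w2):
    # F(x) = W2 * relu(W1 * x): two stacked layers.
    fx = dense(w2, relu(dense(w1, x)))
    # Shortcut: element-wise addition of the input, y = F(x) + x.
    return [f + a for f, a in zip(fx, x)]

x = [1.0, -2.0, 3.0]

# All-zero weights give F(x) = 0, so the block outputs x unchanged.
zeros = [[0.0] * 3 for _ in range(3)]
identity_out = residual_block(x, zeros, zeros)

# Non-zero weights: the block only has to learn the correction to x.
w1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w2 = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
learned_out = residual_block(x, w1, w2)

print(identity_out)  # [1.0, -2.0, 3.0]
print(learned_out)   # [1.5, -2.0, 4.5]
```

A plain stack of layers would have to learn weights that reproduce x exactly in order to represent the identity, whereas here driving the weights toward zero is enough.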

2 Construction of ResNet

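These two structures are used in ResNet34 (left) and ResNet50/101/152 (right) respectively, and each whole structure is usually called a "building block". The one on the right is also called the "bottleneck design". Its purpose is to reduce the number of parameters: the first 1x1 convolution reduces the 256 channels to 64, and the last 1x1 convolution restores them. In total it uses 1x1x256x64 + 3x3x64x64 + 1x1x64x256 = 69632 parameters, whereas without the bottleneck there would be two 3x3 convolutions on 256 channels with 3x3x256x256x2 = 1179648 parameters, about 16.94 times as many.
The plain block is used in networks with 34 layers or fewer, while the bottleneck design is typically used in deeper networks such as ResNet101, in order to reduce computation and parameter count (a practical consideration).
Depending on whether F(x) and x have the same number of channels, there are two cases.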
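
The parameter arithmetic above can be checked with a few lines of Python. This is only a sketch: `conv_params` is a hypothetical helper that counts convolution weights and ignores biases and batch-normalization parameters, as the figures in the text do.

```python
def conv_params(kernel, in_channels, out_channels):
    # Weights of a kernel x kernel convolution, biases ignored.
    return kernel * kernel * in_channels * out_channels

bottleneck = (conv_params(1, 256, 64)      # 1x1, reduce 256 -> 64
              + conv_params(3, 64, 64)     # 3x3 at 64 channels
              + conv_params(1, 64, 256))   # 1x1, restore 64 -> 256
plain = 2 * conv_params(3, 256, 256)       # two 3x3 convolutions at 256 channels
ratio = plain / bottleneck

print(bottleneck, plain, round(ratio, 2))  # 69632 1179648 16.94
```
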

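As the figure above shows, there are two kinds of connection, drawn as solid and dashed lines.
For a solid-line connection (the first and third pink rectangles), both layers are 3x3x64 convolutions, so the channel counts match and the output is computed as y=F(x)+x.
For a dashed-line connection (the first and third green rectangles), the layers are 3x3x64 and 3x3x128 convolutions, so the channel counts differ (64 and 128) and the output is computed as y=F(x)+Wx,
where W is a convolution that adjusts the channel dimension of x.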
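
The two cases can be sketched in pure Python for a single spatial position. The names `apply_shortcut` and `add_shortcut` and the example matrix are hypothetical. The sketch assumes W is a 1x1 convolution, as in the ResNet paper, so at one position it acts as a matrix over the channels.

```python
def apply_shortcut(x, out_channels, w=None):
    # x is the channel vector at one spatial position.
    if len(x) == out_channels:
        return list(x)  # identity shortcut (solid line)
    if w is None:
        raise ValueError("channel counts differ, a projection W is required")
    # Projection shortcut (dashed line): W maps in_channels -> out_channels.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def add_shortcut(fx, x, w=None):
    # y = F(x) + x, or y = F(x) + Wx when the channel counts differ.
    shortcut = apply_shortcut(x, len(fx), w)
    return [a + b for a, b in zip(fx, shortcut)]

# Same channel count: plain element-wise addition.
same = add_shortcut([0.5, 0.5], [1.0, 2.0])

# 2 -> 4 channels: W (4 rows, 2 columns) projects x before the addition.
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
projected = add_shortcut([0.5, 0.5, 0.5, 0.5], [1.0, 2.0], w)

print(same)       # [1.5, 2.5]
print(projected)  # [1.5, 2.5, 3.5, 0.5]
```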

3 ResNet architecture table

Every network is divided into five parts: conv1, conv2_x, conv3_x, conv4_x and conv5_x.
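
As a rough illustration of how the five parts add up to the layer count in each network's name, the sketch below uses the block counts from the ResNet paper (for ResNet34 and ResNet50 they match the code in section 4). `CONFIGS` and `depth` are hypothetical names. Only conv1, the convolutions on the main path of each block, and the final fully connected layer are counted, so projection shortcuts are not included.

```python
# name -> (convolutions per block, blocks in conv2_x, conv3_x, conv4_x, conv5_x)
CONFIGS = {
    "ResNet-18": (2, [2, 2, 2, 2]),
    "ResNet-34": (2, [3, 4, 6, 3]),
    "ResNet-50": (3, [3, 4, 6, 3]),
    "ResNet-101": (3, [3, 4, 23, 3]),
    "ResNet-152": (3, [3, 8, 36, 3]),
}

def depth(convs_per_block, blocks):
    # conv1 + all block convolutions + the final fully connected layer
    return 1 + convs_per_block * sum(blocks) + 1

depths = {name: depth(*cfg) for name, cfg in CONFIGS.items()}
print(depths)
# {'ResNet-18': 18, 'ResNet-34': 34, 'ResNet-50': 50, 'ResNet-101': 101, 'ResNet-152': 152}
```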

4 Building ResNet

The identity_Block function is used to build ResNet34, and the bottleneck_Block function is used to build ResNet50.

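Batch normalization is applied after every convolutional layer. It normalizes each channel to roughly zero mean and unit variance over the mini-batch, which stabilizes and speeds up training and also has a mild regularizing effect.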
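
What batch normalization computes for one channel at training time can be sketched in a few lines. This is a simplified illustration, not Keras's implementation: `batch_norm` is a hypothetical helper, `gamma` and `beta` stand for the learned scale and shift, and the moving averages used at inference time are omitted.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize one channel over a mini-batch, then scale and shift.
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

batch = [2.0, 4.0, 6.0, 8.0]
normed = batch_norm(batch)

# The normalized activations have mean close to 0 and variance close to 1.
out_mean = sum(normed) / len(normed)
out_var = sum((v - out_mean) ** 2 for v in normed) / len(normed)
```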

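# Imports needed by all the code in this section (Keras functional API).
from keras.layers import (Input, Conv2D, BatchNormalization, Activation, ZeroPadding2D,
                          MaxPooling2D, AveragePooling2D, Flatten, Dense, add)
from keras.models import Model


def Conv2d_BN(x, nb_filter, kernel_size, strides=(1, 1), padding='same', name=None, activation='relu'):
    # Convolution -> batch normalization -> optional activation (the order used in the ResNet paper).
    if name is not None:
        bn_name = name + '_bn'
        conv_name = name + '_conv'
    else:
        bn_name = None
        conv_name = None
    x = Conv2D(nb_filter, kernel_size, padding=padding, strides=strides, name=conv_name)(x)
    x = BatchNormalization(axis=3, name=bn_name)(x)  # axis=3: channels-last input
    if activation is not None:
        x = Activation(activation)(x)
    return x


def identity_Block(inpt, nb_filter, kernel_size, strides=(1, 1), with_conv_shortcut=False):
    # Basic block (ResNet34): two convolutions plus a shortcut.
    x = Conv2d_BN(inpt, nb_filter=nb_filter, kernel_size=kernel_size, strides=strides, padding='same')
    x = Conv2d_BN(x, nb_filter=nb_filter, kernel_size=kernel_size, padding='same', activation=None)
    if with_conv_shortcut:
        # Projection shortcut: a 1x1 convolution matches the channel count and stride.
        shortcut = Conv2d_BN(inpt, nb_filter=nb_filter, kernel_size=1, strides=strides, activation=None)
    else:
        shortcut = inpt
    x = add([x, shortcut])
    return Activation('relu')(x)  # ReLU is applied after the addition


def bottleneck_Block(inpt, nb_filters, strides=(1, 1), with_conv_shortcut=False):
    # Bottleneck block (ResNet50/101/152): 1x1 reduce, 3x3, 1x1 restore.
    k1, k2, k3 = nb_filters
    x = Conv2d_BN(inpt, nb_filter=k1, kernel_size=1, strides=strides, padding='same')
    x = Conv2d_BN(x, nb_filter=k2, kernel_size=3, padding='same')
    x = Conv2d_BN(x, nb_filter=k3, kernel_size=1, padding='same', activation=None)
    if with_conv_shortcut:
        shortcut = Conv2d_BN(inpt, nb_filter=k3, kernel_size=1, strides=strides, activation=None)
    else:
        shortcut = inpt
    x = add([x, shortcut])
    return Activation('relu')(x)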
def resnet_34(width, height, channel, classes):
    # ResNet34 built from basic blocks; the 7x7 average pooling assumes a 224x224 input.
    inpt = Input(shape=(width, height, channel))
    x = ZeroPadding2D((3, 3))(inpt)
    # conv1
    x = Conv2d_BN(x, nb_filter=64, kernel_size=(7, 7), strides=(2, 2), padding='valid')
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='same')(x)
    # conv2_x
    x = identity_Block(x, nb_filter=64, kernel_size=(3, 3))
    x = identity_Block(x, nb_filter=64, kernel_size=(3, 3))
    x = identity_Block(x, nb_filter=64, kernel_size=(3, 3))
    # conv3_x
    x = identity_Block(x, nb_filter=128, kernel_size=(3, 3), strides=(2, 2), with_conv_shortcut=True)
    x = identity_Block(x, nb_filter=128, kernel_size=(3, 3))
    x = identity_Block(x, nb_filter=128, kernel_size=(3, 3))
    x = identity_Block(x, nb_filter=128, kernel_size=(3, 3))
    # conv4_x
    x = identity_Block(x, nb_filter=256, kernel_size=(3, 3), strides=(2, 2), with_conv_shortcut=True)
    x = identity_Block(x, nb_filter=256, kernel_size=(3, 3))
    x = identity_Block(x, nb_filter=256, kernel_size=(3, 3))
    x = identity_Block(x, nb_filter=256, kernel_size=(3, 3))
    x = identity_Block(x, nb_filter=256, kernel_size=(3, 3))
    x = identity_Block(x, nb_filter=256, kernel_size=(3, 3))
    # conv5_x
    x = identity_Block(x, nb_filter=512, kernel_size=(3, 3), strides=(2, 2), with_conv_shortcut=True)
    x = identity_Block(x, nb_filter=512, kernel_size=(3, 3))
    x = identity_Block(x, nb_filter=512, kernel_size=(3, 3))

    # average pooling and softmax classifier
    x = AveragePooling2D(pool_size=(7, 7))(x)
    x = Flatten()(x)
    x = Dense(classes, activation='softmax')(x)
    model = Model(inputs=inpt, outputs=x)
    return model

def resnet_50(width, height, channel, classes):
    # ResNet50 built from bottleneck blocks; the 7x7 average pooling assumes a 224x224 input.
    inpt = Input(shape=(width, height, channel))
    x = ZeroPadding2D((3, 3))(inpt)
    # conv1
    x = Conv2d_BN(x, nb_filter=64, kernel_size=(7, 7), strides=(2, 2), padding='valid')
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='same')(x)
    # conv2_x (the first block needs a projection because the channels go from 64 to 256)
    x = bottleneck_Block(x, nb_filters=[64, 64, 256], strides=(1, 1), with_conv_shortcut=True)
    x = bottleneck_Block(x, nb_filters=[64, 64, 256])
    x = bottleneck_Block(x, nb_filters=[64, 64, 256])
    # conv3_x
    x = bottleneck_Block(x, nb_filters=[128, 128, 512], strides=(2, 2), with_conv_shortcut=True)
    x = bottleneck_Block(x, nb_filters=[128, 128, 512])
    x = bottleneck_Block(x, nb_filters=[128, 128, 512])
    x = bottleneck_Block(x, nb_filters=[128, 128, 512])
    # conv4_x
    x = bottleneck_Block(x, nb_filters=[256, 256, 1024], strides=(2, 2), with_conv_shortcut=True)
    x = bottleneck_Block(x, nb_filters=[256, 256, 1024])
    x = bottleneck_Block(x, nb_filters=[256, 256, 1024])
    x = bottleneck_Block(x, nb_filters=[256, 256, 1024])
    x = bottleneck_Block(x, nb_filters=[256, 256, 1024])
    x = bottleneck_Block(x, nb_filters=[256, 256, 1024])
    # conv5_x
    x = bottleneck_Block(x, nb_filters=[512, 512, 2048], strides=(2, 2), with_conv_shortcut=True)
    x = bottleneck_Block(x, nb_filters=[512, 512, 2048])
    x = bottleneck_Block(x, nb_filters=[512, 512, 2048])
    # average pooling and softmax classifier
    x = AveragePooling2D(pool_size=(7, 7))(x)
    x = Flatten()(x)
    x = Dense(classes, activation='softmax')(x)
    model = Model(inputs=inpt, outputs=x)
    return model
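
Both builders end with a 7x7 average pooling, which matches a 224x224 input. The sketch below traces the spatial size of the feature map through the five parts to show where the 7 comes from. `conv_out`, `same_out` and `trace` are hypothetical helpers, and the sketch assumes Keras's usual size rules for 'valid' and 'same' padding.

```python
def conv_out(size, kernel, stride, pad):
    # Output size of a convolution with explicit symmetric zero padding.
    return (size + 2 * pad - kernel) // stride + 1

def same_out(size, stride):
    # Output size with padding='same': ceil(size / stride).
    return -(-size // stride)

def trace(size=224):
    sizes = {}
    size = conv_out(size, kernel=7, stride=2, pad=3)  # ZeroPadding2D(3) + 7x7 conv, stride 2
    sizes["conv1"] = size
    size = same_out(size, 2)                          # 3x3 max pooling, stride 2
    sizes["conv2_x"] = size
    for stage in ("conv3_x", "conv4_x", "conv5_x"):
        size = same_out(size, 2)                      # first block of each stage has stride 2
        sizes[stage] = size
    return sizes

print(trace(224))
# {'conv1': 112, 'conv2_x': 56, 'conv3_x': 28, 'conv4_x': 14, 'conv5_x': 7}
```

For other input sizes the final feature map is no longer 7x7, so the pooling size would need to be adjusted or replaced by a global average pooling layer.
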

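Why does training accuracy drop when the network becomes too deep, and why is this not overfitting? Intuitively, a deeper network should have greater learning capacity. Degradation refers to the following phenomenon: as the number of layers grows, performance first improves until it saturates and then degrades rapidly.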