Series index
- See section 4.1 of the blog post 《Paper》: the 【Keras】Classification in CIFAR-10 series
Learning references
- GitHub: BIGBALLON/cifar-10-cnn
- Zhihu column: 写给妹子的深度学习教程
- ResNet caffe model: https://github.com/soeaver/caffe-model
References
- 给妹纸的深度学习教学(4)——同Residual玩耍
- 《Deep Residual Learning for Image Recognition》
- 《Identity Mappings in Deep Residual Networks》
- 本地远程访问Ubuntu16.04.3服务器上的TensorBoard
Code
- Link: https://pan.baidu.com/s/1JApBTf5oV4jIA3sV71vYfw
Extraction code: 5v7l
Hardware
- TITAN XP
1 ResNet
Simply stacking layers makes a deeper network perform worse, so Kaiming He proposed learning the residual instead.
Image source: 给妹纸的深度学习教学(4)——同Residual玩耍
- Learning the residual is easier than learning the underlying mapping directly
- Easier to optimize
In 《Deep Residual Learning for Image Recognition》, networks deeper than 50 layers use the structure on the right (the bottleneck), while shallower networks use the one on the left; their parameter counts are the same:
64*3*3*64 + 64*3*3*64 = 73728
64*1*1*64 + 64*3*3*64 + 64*1*1*256 + 64*1*1*256 = 73728
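As a quick sanity check (a small sketch, not part of the original post), Keras can verify this arithmetic directly; use_bias=False is assumed so that only the kernel weights are counted, matching the hand calculation above.

from keras.layers import Conv2D, Input
from keras.models import Model

# two-layer (left) block: 3x3x64 -> 3x3x64 on a 64-channel input
inp = Input(shape=(8, 8, 64))                     # any spatial size works
x = Conv2D(64, (3, 3), padding='same', use_bias=False)(inp)
x = Conv2D(64, (3, 3), padding='same', use_bias=False)(x)
print(Model(inp, x).count_params())               # 36864 + 36864 = 73728

# bottleneck (right) block as counted above: 1x1x64 -> 3x3x64 -> 1x1x256,
# plus a 1x1x256 projection on the shortcut (64-channel input assumed)
y = Conv2D(64, (1, 1), use_bias=False)(inp)
y = Conv2D(64, (3, 3), padding='same', use_bias=False)(y)
y = Conv2D(256, (1, 1), use_bias=False)(y)
proj = Conv2D(256, (1, 1), use_bias=False)(inp)
print(Model(inp, [y, proj]).count_params())       # 4096 + 36864 + 16384 + 16384 = 73728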
《Identity Mappings in Deep Residual Networks》 analyzes and discusses this structure further; for reasons of space, the details of that paper are not covered in depth here.
We first implement the (e) version of the structure.
2 resnet_32
2.1 my_resnet_32
1) Import libraries and set the hyperparameters
import keras
import numpy as np
from keras.datasets import cifar10, cifar100
from keras.preprocessing.image import ImageDataGenerator
from keras.layers.normalization import BatchNormalization
from keras.layers import Conv2D, Dense, Input, add, Activation, GlobalAveragePooling2D
from keras.callbacks import LearningRateScheduler, TensorBoard
from keras.models import Model
from keras import optimizers, regularizers
from keras import backend as K
stack_n = 5
layers = 6 * stack_n + 2
num_classes = 10
batch_size = 128
epochs = 200
iterations = 50000 // batch_size + 1
weight_decay = 1e-4
log_filepath = './my_resnet_32/'
2) Data preprocessing and training schedule
def color_preprocessing(x_train, x_test):
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    mean = [125.307, 122.95, 113.865]
    std = [62.9932, 62.0887, 66.7048]
    for i in range(3):
        x_train[:,:,:,i] = (x_train[:,:,:,i] - mean[i]) / std[i]
        x_test[:,:,:,i] = (x_test[:,:,:,i] - mean[i]) / std[i]
    return x_train, x_test

def scheduler(epoch):
    if epoch < 81:
        return 0.1
    if epoch < 122:
        return 0.01
    return 0.001
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
y_train = keras.utils.to_categorical(y_train, 10) # number of classes
y_test = keras.utils.to_categorical(y_test,10)# number of classes
# color preprocessing
x_train, x_test = color_preprocessing(x_train, x_test)
3) Build the resnet_32 network
The feature map resolution is reduced with strides rather than pooling.
The 32 layers consist of one convolutional layer + 15 residual blocks + one fc layer (1 + 15*2 + 1 = 32), with 2 convolutional layers in each residual block:
- residual block 1-5: the left structure in the figure above
- residual block 6: the middle structure in the figure above
- residual block 7-10: the left structure in the figure above
- residual block 11: the middle structure in the figure above
- residual block 12-15: the left structure in the figure above
def res_32(img_input):
    # input: 32x32x3 output: 32x32x16
    x = Conv2D(16, (3, 3), strides=(1,1), padding='same',
               kernel_regularizer=keras.regularizers.l2(weight_decay),
               kernel_initializer="he_normal")(img_input)

    # res_block1 to res_block5  input: 32x32x16 output: 32x32x16
    for _ in range(5):
        b0 = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
        a0 = Activation('relu')(b0)
        conv_1 = Conv2D(16, kernel_size=(3,3), strides=(1,1), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a0)
        b1 = BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1)
        a1 = Activation('relu')(b1)
        conv_2 = Conv2D(16, kernel_size=(3,3), strides=(1,1), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a1)
        x = add([x, conv_2])

    # res_block6  input: 32x32x16 output: 16x16x32
    b0 = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
    a0 = Activation('relu')(b0)
    conv_1 = Conv2D(32, kernel_size=(3,3), strides=(2,2), padding='same',
                    kernel_regularizer=regularizers.l2(weight_decay),
                    kernel_initializer="he_normal")(a0)
    b1 = BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1)
    a1 = Activation('relu')(b1)
    conv_2 = Conv2D(32, kernel_size=(3,3), strides=(1,1), padding='same',
                    kernel_regularizer=regularizers.l2(weight_decay),
                    kernel_initializer="he_normal")(a1)
    projection = Conv2D(32, kernel_size=(1,1), strides=(2,2), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a0)
    x = add([projection, conv_2])

    # res_block7 to res_block10  input: 16x16x32 output: 16x16x32
    for _ in range(1, 5):
        b0 = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
        a0 = Activation('relu')(b0)
        conv_1 = Conv2D(32, kernel_size=(3,3), strides=(1,1), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a0)
        b1 = BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1)
        a1 = Activation('relu')(b1)
        conv_2 = Conv2D(32, kernel_size=(3,3), strides=(1,1), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a1)
        x = add([x, conv_2])

    # res_block11  input: 16x16x32 output: 8x8x64
    b0 = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
    a0 = Activation('relu')(b0)
    conv_1 = Conv2D(64, kernel_size=(3,3), strides=(2,2), padding='same',
                    kernel_regularizer=regularizers.l2(weight_decay),
                    kernel_initializer="he_normal")(a0)
    b1 = BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1)
    a1 = Activation('relu')(b1)
    conv_2 = Conv2D(64, kernel_size=(3,3), strides=(1,1), padding='same',
                    kernel_regularizer=regularizers.l2(weight_decay),
                    kernel_initializer="he_normal")(a1)
    projection = Conv2D(64, kernel_size=(1,1), strides=(2,2), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a0)
    x = add([projection, conv_2])

    # res_block12 to res_block15  input: 8x8x64 output: 8x8x64
    for _ in range(1, 5):
        b0 = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
        a0 = Activation('relu')(b0)
        conv_1 = Conv2D(64, kernel_size=(3,3), strides=(1,1), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a0)
        b1 = BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1)
        a1 = Activation('relu')(b1)
        conv_2 = Conv2D(64, kernel_size=(3,3), strides=(1,1), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a1)
        x = add([x, conv_2])

    # BN + ReLU + global average pooling  input: 8x8x64 output: 64
    x = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
    x = Activation('relu')(x)
    x = GlobalAveragePooling2D()(x)

    # Dense  input: 64 output: 10
    x = Dense(10, activation='softmax', kernel_initializer="he_normal",
              kernel_regularizer=regularizers.l2(weight_decay))(x)
    return x
Then build the model:
img_input = Input(shape=(32,32,3))
output = res_32(img_input)
resnet = Model(img_input, output)
4) Start training; the training tricks are the same as in 【Keras-VGG19】CIFAR-10 and are not repeated here.
# set optimizer
sgd = optimizers.SGD(lr=.1, momentum=0.9, nesterov=True)
resnet.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
# set callback
tb_cb = TensorBoard(log_dir=log_filepath, histogram_freq=0)
change_lr = LearningRateScheduler(scheduler)
cbks = [change_lr,tb_cb]
# dump checkpoint if you need.(add it to cbks)
# ModelCheckpoint('./checkpoint-{epoch}.h5', save_best_only=False, mode='auto', period=10)
# set data augmentation
datagen = ImageDataGenerator(horizontal_flip=True,
                             width_shift_range=0.125,
                             height_shift_range=0.125,
                             fill_mode='constant', cval=0.)
datagen.fit(x_train)

# start training
resnet.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                     steps_per_epoch=iterations,
                     epochs=epochs,
                     callbacks=cbks,
                     validation_data=(x_test, y_test))
resnet.save('my_resnet_32.h5')
5) Results
resnet32 (no pre-trained model was used)
training accuracy and training loss
test accuracy and test loss
6) Parameter count and model size
The sheer size of VGG really is no joke...
- my_resnet_32
  Total params: 470,218
  Trainable params: 467,946
  Non-trainable params: 2,272
- vgg19_pretrain_0.0005
  Total params: 39,002,738
  Trainable params: 38,975,326
  Non-trainable params: 27,412
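For reference, these numbers come straight from Keras; a quick check on the model built above (assuming the resnet variable from step 3) looks like this:

# print the layer-by-layer table together with the totals listed above
resnet.summary()
print('total params:', resnet.count_params())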
2.2 resnet_32(resnet_32_e)
While writing the code above you will notice that the residual block structure is repeated over and over, so it can be implemented in a much simpler form. This section refactors the previous section's code without changing the architecture; only the network-definition part changes, everything else stays the same.
1) Define the residual block, with two cases corresponding to the left and middle structures in the figure of the previous section:
def residual_block(x, o_filters, increase=False):
    stride = (1,1)
    if increase:
        stride = (2,2)

    o1 = Activation('relu')(BatchNormalization(momentum=0.9, epsilon=1e-5)(x))
    conv_1 = Conv2D(o_filters, kernel_size=(3,3), strides=stride, padding='same',
                    kernel_initializer="he_normal",
                    kernel_regularizer=regularizers.l2(weight_decay))(o1)
    o2 = Activation('relu')(BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1))
    conv_2 = Conv2D(o_filters, kernel_size=(3,3), strides=(1,1), padding='same',
                    kernel_initializer="he_normal",
                    kernel_regularizer=regularizers.l2(weight_decay))(o2)
    if increase:
        projection = Conv2D(o_filters, kernel_size=(1,1), strides=(2,2), padding='same',
                            kernel_initializer="he_normal",
                            kernel_regularizer=regularizers.l2(weight_decay))(o1)
        block = add([conv_2, projection])
    else:
        block = add([conv_2, x])
    return block
2) Build the network
def residual_network(img_input, classes_num=10, stack_n=5):
    # build model (total layers = stack_n * 3 * 2 + 2)
    # stack_n = 5 by default, total layers = 32
    # input: 32x32x3 output: 32x32x16
    x = Conv2D(filters=16, kernel_size=(3,3), strides=(1,1), padding='same',
               kernel_initializer="he_normal",
               kernel_regularizer=regularizers.l2(weight_decay))(img_input)

    # input: 32x32x16 output: 32x32x16
    for _ in range(stack_n):
        x = residual_block(x, 16, False)

    # input: 32x32x16 output: 16x16x32
    x = residual_block(x, 32, True)
    for _ in range(1, stack_n):
        x = residual_block(x, 32, False)

    # input: 16x16x32 output: 8x8x64
    x = residual_block(x, 64, True)
    for _ in range(1, stack_n):
        x = residual_block(x, 64, False)

    x = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
    x = Activation('relu')(x)
    x = GlobalAveragePooling2D()(x)

    # input: 64 output: 10
    x = Dense(classes_num, activation='softmax', kernel_initializer="he_normal",
              kernel_regularizer=regularizers.l2(weight_decay))(x)
    return x
3) Build the model
# build network
img_input = Input(shape=(32,32,3))
output = residual_network(img_input,10,stack_n) # 5
resnet = Model(img_input, output)
Compare the results:
OK, much cleaner, and writing it this way also makes it easier to explore the a, b, c, d, e structures from 《Identity Mappings in Deep Residual Networks》 later on.
2.3 my_resnet_34
Replace the left-form structure of residual_block6 and residual_block11 in my_resnet_32 with the right-form (bottleneck) structure, which adds two layers (32 + 2 = 34); a hedged sketch of one possible replacement is shown below.
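The post does not show the modified code; the following is a hypothetical sketch of one possible replacement for the two down-sampling blocks, keeping the pre-activation style of section 2.2 and reusing the imports and weight_decay from section 2.1. The bottleneck width o_filters // 2 is an assumption, not a value taken from the original.

# Hypothetical sketch: replace the 2-conv down-sampling block with a
# 1x1 -> 3x3 -> 1x1 bottleneck (one extra layer per block, so 32 + 2 = 34)
def bottleneck_downsample_block(x, o_filters):
    o1 = Activation('relu')(BatchNormalization(momentum=0.9, epsilon=1e-5)(x))
    c1 = Conv2D(o_filters // 2, kernel_size=(1,1), strides=(2,2), padding='same',
                kernel_initializer="he_normal",
                kernel_regularizer=regularizers.l2(weight_decay))(o1)
    o2 = Activation('relu')(BatchNormalization(momentum=0.9, epsilon=1e-5)(c1))
    c2 = Conv2D(o_filters // 2, kernel_size=(3,3), strides=(1,1), padding='same',
                kernel_initializer="he_normal",
                kernel_regularizer=regularizers.l2(weight_decay))(o2)
    o3 = Activation('relu')(BatchNormalization(momentum=0.9, epsilon=1e-5)(c2))
    c3 = Conv2D(o_filters, kernel_size=(1,1), strides=(1,1), padding='same',
                kernel_initializer="he_normal",
                kernel_regularizer=regularizers.l2(weight_decay))(o3)
    # 1x1 projection on the shortcut to match the halved size and new channels
    projection = Conv2D(o_filters, kernel_size=(1,1), strides=(2,2), padding='same',
                        kernel_initializer="he_normal",
                        kernel_regularizer=regularizers.l2(weight_decay))(o1)
    return add([c3, projection])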
Now the results.
The accuracy is about the same, but resnet_34 has somewhat fewer parameters:
- my_resnet_32
  Total params: 470,218
  Trainable params: 467,946
  Non-trainable params: 2,272
- my_resnet_34
  Total params: 416,458
  Trainable params: 414,186
  Non-trainable params: 2,272
2.4 resnet_32_a/b/c/d/e
The resnet_32 of the previous sections is exactly the (e) structure in the figure. Note that when the feature map size is halved, the skip connection is taken from after the activation; (d) is handled the same way.
The code is as follows.
2.4.1 resnet_32_a
residual_block is changed to:
def residual_block(x, o_filters, increase=False):
    stride = (1,1)
    if increase:
        stride = (2,2)

    conv_1 = Conv2D(o_filters, kernel_size=(3,3), strides=stride, padding='same',
                    kernel_initializer="he_normal",
                    kernel_regularizer=regularizers.l2(weight_decay))(x)
    o1 = Activation('relu')(BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1))
    conv_2 = Conv2D(o_filters, kernel_size=(3,3), strides=(1,1), padding='same',
                    kernel_initializer="he_normal",
                    kernel_regularizer=regularizers.l2(weight_decay))(o1)
    o2 = BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_2)
    if increase:
        projection = Conv2D(o_filters, kernel_size=(1,1), strides=(2,2), padding='same',
                            kernel_initializer="he_normal",
                            kernel_regularizer=regularizers.l2(weight_decay))(x)
        block = add([o2, projection])
        block = Activation('relu')(block)
    else:
        block = add([o2, x])
        block = Activation('relu')(block)
    return block
2.4.2 resnet_32_b
residual_block is changed to:
def residual_block(x, o_filters, increase=False):
    stride = (1,1)
    if increase:
        stride = (2,2)

    conv_1 = Conv2D(o_filters, kernel_size=(3,3), strides=stride, padding='same',
                    kernel_initializer="he_normal",
                    kernel_regularizer=regularizers.l2(weight_decay))(x)
    o1 = Activation('relu')(BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1))
    conv_2 = Conv2D(o_filters, kernel_size=(3,3), strides=(1,1), padding='same',
                    kernel_initializer="he_normal",
                    kernel_regularizer=regularizers.l2(weight_decay))(o1)
    if increase:
        projection = Conv2D(o_filters, kernel_size=(1,1), strides=(2,2), padding='same',
                            kernel_initializer="he_normal",
                            kernel_regularizer=regularizers.l2(weight_decay))(x)
        block = add([conv_2, projection])
        block = Activation('relu')(BatchNormalization(momentum=0.9, epsilon=1e-5)(block))
    else:
        block = add([conv_2, x])
        block = Activation('relu')(BatchNormalization(momentum=0.9, epsilon=1e-5)(block))
    return block
2.4.3 resnet_32_c
residual_block is changed to:
def residual_block(x, o_filters, increase=False):
    stride = (1,1)
    if increase:
        stride = (2,2)

    conv_1 = Conv2D(o_filters, kernel_size=(3,3), strides=stride, padding='same',
                    kernel_initializer="he_normal",
                    kernel_regularizer=regularizers.l2(weight_decay))(x)
    o1 = Activation('relu')(BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1))
    conv_2 = Conv2D(o_filters, kernel_size=(3,3), strides=(1,1), padding='same',
                    kernel_initializer="he_normal",
                    kernel_regularizer=regularizers.l2(weight_decay))(o1)
    o2 = Activation('relu')(BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_2))
    if increase:
        projection = Conv2D(o_filters, kernel_size=(1,1), strides=(2,2), padding='same',
                            kernel_initializer="he_normal",
                            kernel_regularizer=regularizers.l2(weight_decay))(x)
        block = add([o2, projection])
    else:
        block = add([o2, x])
    return block
2.4.4 resnet_32_d
residual_block is changed to:
def residual_block(x, o_filters, increase=False):
    stride = (1,1)
    if increase:
        stride = (2,2)

    o1 = Activation('relu')(x)
    conv_1 = Conv2D(o_filters, kernel_size=(3,3), strides=stride, padding='same',
                    kernel_initializer="he_normal",
                    kernel_regularizer=regularizers.l2(weight_decay))(o1)
    o2 = Activation('relu')(BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1))
    conv_2 = Conv2D(o_filters, kernel_size=(3,3), strides=(1,1), padding='same',
                    kernel_initializer="he_normal",
                    kernel_regularizer=regularizers.l2(weight_decay))(o2)
    o3 = BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_2)
    if increase:
        projection = Conv2D(o_filters, kernel_size=(1,1), strides=(2,2), padding='same',
                            kernel_initializer="he_normal",
                            kernel_regularizer=regularizers.l2(weight_decay))(o1)
        block = add([o3, projection])
    else:
        block = add([o3, x])
    return block
- resnet_32_a/b/c/d (all have the same parameter count)
  Total params: 470,410
  Trainable params: 468,042
  Non-trainable params: 2,368
- resnet_32
  Total params: 470,218
  Trainable params: 467,946
  Non-trainable params: 2,272
training accuracy and training loss
testing accuracy and testing loss
They are neck and neck; let us zoom in for a closer look.
2.5 resnet32_d/e-v2
Unlike resnet_32_d / e, resnet_32_d / e_v2 uses the right structure rather than the left when the feature map size is reduced; only the input to the projection needs to change in the code, as in the sketch below.
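The post only describes the change; the following is a minimal sketch of the assumed v2 block (reusing the imports, regularizers and weight_decay from section 2.1), where the only difference from the section 2.2 block is that the 1x1 projection is assumed to read from the raw block input x rather than from the BN+ReLU output o1:

def residual_block_v2(x, o_filters, increase=False):
    stride = (1,1)
    if increase:
        stride = (2,2)

    o1 = Activation('relu')(BatchNormalization(momentum=0.9, epsilon=1e-5)(x))
    conv_1 = Conv2D(o_filters, kernel_size=(3,3), strides=stride, padding='same',
                    kernel_initializer="he_normal",
                    kernel_regularizer=regularizers.l2(weight_decay))(o1)
    o2 = Activation('relu')(BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1))
    conv_2 = Conv2D(o_filters, kernel_size=(3,3), strides=(1,1), padding='same',
                    kernel_initializer="he_normal",
                    kernel_regularizer=regularizers.l2(weight_decay))(o2)
    if increase:
        # the assumed v2 change: the projection reads x instead of o1
        projection = Conv2D(o_filters, kernel_size=(1,1), strides=(2,2), padding='same',
                            kernel_initializer="he_normal",
                            kernel_regularizer=regularizers.l2(weight_decay))(x)
        return add([conv_2, projection])
    return add([conv_2, x])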
Ha, the differences are hardly noticeable.
Model size
3 resnet_50 / 101 / 152
Referencing the ResNet-50 architecture (caffe).
Following the figure below, we adapt resnet_50 / 101 / 152; the residual block uses the ReLU-before-addition structure, i.e. the same structure as resnet_32_c.
The modifications are: the first convolution outputs 64 channels while the spatial size stays 32; conv2_x to conv5_x follow the table; and after the final average pooling the fc layer has 10 outputs.
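The depths follow from the block counts: each bottleneck block contributes 3 convolutions, plus the first convolution and the final fc layer, so (3+4+6+3)*3 + 2 = 50 and (3+4+23+3)*3 + 2 = 101. Note that the paper's 152-layer configuration uses 8 blocks in conv3_x, i.e. (3+8+36+3)*3 + 2 = 152; the resnet_152 definition below keeps 4 blocks there, which works out to (3+4+36+3)*3 + 2 = 140 layers.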
1) The residual block design is the same for resnet_50 / 101 / 152, as shown below:
def residual_block(x, o_filters_1, o_filters_2, increase=False):
    """
    increase: halve the feature map size and increase the number of channels
    """
    stride = (1,1)
    if increase:
        stride = (2,2)

    conv_1 = Conv2D(o_filters_1, kernel_size=(1,1), strides=stride, padding='same',
                    kernel_initializer="he_normal",
                    kernel_regularizer=regularizers.l2(weight_decay))(x)
    o1 = Activation('relu')(BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1))
    conv_2 = Conv2D(o_filters_1, kernel_size=(3,3), strides=(1,1), padding='same',
                    kernel_initializer="he_normal",
                    kernel_regularizer=regularizers.l2(weight_decay))(o1)
    o2 = Activation('relu')(BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_2))
    conv_3 = Conv2D(o_filters_2, kernel_size=(1,1), strides=(1,1), padding='same',
                    kernel_initializer="he_normal",
                    kernel_regularizer=regularizers.l2(weight_decay))(o2)
    o3 = Activation('relu')(BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_3))
    if increase:
        projection = Conv2D(o_filters_2, kernel_size=(1,1), strides=stride, padding='same',
                            kernel_initializer="he_normal",
                            kernel_regularizer=regularizers.l2(weight_decay))(x)
        block = add([o3, projection])
    else:
        block = add([o3, x])
    return block
2) The network definitions are as follows:
- resnet_50
def residual_network(img_input, classes_num=10):
    # input: 32x32x3 output: 32x32x64
    x = Conv2D(filters=64, kernel_size=(3,3), strides=(1,1), padding='same',
               kernel_initializer="he_normal",
               kernel_regularizer=regularizers.l2(weight_decay))(img_input)

    # input: 32x32x64 output: 16x16x256
    x = residual_block(x, 64, 256, True)
    for _ in range(2):
        x = residual_block(x, 64, 256, False)

    # input: 16x16x256 output: 8x8x512
    x = residual_block(x, 128, 512, True)
    for _ in range(3):
        x = residual_block(x, 128, 512, False)

    # input: 8x8x512 output: 4x4x1024
    x = residual_block(x, 256, 1024, True)
    for _ in range(5):
        x = residual_block(x, 256, 1024, False)

    # input: 4x4x1024 output: 2x2x2048
    x = residual_block(x, 512, 2048, True)
    for _ in range(2):
        x = residual_block(x, 512, 2048, False)

    x = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
    x = Activation('relu')(x)
    x = GlobalAveragePooling2D()(x)

    # input: 2048 output: 10
    x = Dense(classes_num, activation='softmax', kernel_initializer="he_normal",
              kernel_regularizer=regularizers.l2(weight_decay))(x)
    return x
- resnet_101
def residual_network(img_input, classes_num=10):
    # input: 32x32x3 output: 32x32x64
    x = Conv2D(filters=64, kernel_size=(3,3), strides=(1,1), padding='same',
               kernel_initializer="he_normal",
               kernel_regularizer=regularizers.l2(weight_decay))(img_input)

    # input: 32x32x64 output: 16x16x256
    x = residual_block(x, 64, 256, True)
    for _ in range(2):
        x = residual_block(x, 64, 256, False)

    # input: 16x16x256 output: 8x8x512
    x = residual_block(x, 128, 512, True)
    for _ in range(3):
        x = residual_block(x, 128, 512, False)

    # input: 8x8x512 output: 4x4x1024
    x = residual_block(x, 256, 1024, True)
    for _ in range(22):
        x = residual_block(x, 256, 1024, False)

    # input: 4x4x1024 output: 2x2x2048
    x = residual_block(x, 512, 2048, True)
    for _ in range(2):
        x = residual_block(x, 512, 2048, False)

    x = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
    x = Activation('relu')(x)
    x = GlobalAveragePooling2D()(x)

    # input: 2048 output: 10
    x = Dense(classes_num, activation='softmax', kernel_initializer="he_normal",
              kernel_regularizer=regularizers.l2(weight_decay))(x)
    return x
- resnet_152
def residual_network(img_input, classes_num=10):
    # input: 32x32x3 output: 32x32x64
    x = Conv2D(filters=64, kernel_size=(3,3), strides=(1,1), padding='same',
               kernel_initializer="he_normal",
               kernel_regularizer=regularizers.l2(weight_decay))(img_input)

    # input: 32x32x64 output: 16x16x256
    x = residual_block(x, 64, 256, True)
    for _ in range(2):
        x = residual_block(x, 64, 256, False)

    # input: 16x16x256 output: 8x8x512
    x = residual_block(x, 128, 512, True)
    for _ in range(3):
        x = residual_block(x, 128, 512, False)

    # input: 8x8x512 output: 4x4x1024
    x = residual_block(x, 256, 1024, True)
    for _ in range(35):
        x = residual_block(x, 256, 1024, False)

    # input: 4x4x1024 output: 2x2x2048
    x = residual_block(x, 512, 2048, True)
    for _ in range(2):
        x = residual_block(x, 512, 2048, False)

    x = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
    x = Activation('relu')(x)
    x = GlobalAveragePooling2D()(x)

    # input: 2048 output: 10
    x = Dense(classes_num, activation='softmax', kernel_initializer="he_normal",
              kernel_regularizer=regularizers.l2(weight_decay))(x)
    return x
3) Results:
- training accuracy and training loss
- test accuracy and test loss
4) Zoomed-in test accuracy:
5) Parameter counts
- resnet_32_c
  Total params: 470,410
  Trainable params: 468,042
  Non-trainable params: 2,368
- resnet_50
  Total params: 23,593,098
  Trainable params: 23,543,690
  Non-trainable params: 49,408
- resnet_101
  Total params: 42,663,562
  Trainable params: 42,561,930
  Non-trainable params: 101,632
- resnet_152
  Total params: 57,246,858
  Trainable params: 57,105,290
  Non-trainable params: 141,568
6) Size of the generated models
4 Summary
Instead of max pooling, the feature map size is halved with strides, controlled by the strides of the first conv in each residual block: if increase is True, the first conv uses strides = (2,2) and the skip connection also gets a 1x1 conv with strides = (2,2); otherwise the first conv uses strides = (1,1), the skip connection contains no conv, and the input is added directly to the convolution output.
5 Appendix
Thoughts after reading 《Deep Residual Learning for Image Recognition》
Q1:Is learning better networks as easy as stacking more layers?
A1: No; you run into vanishing/exploding gradients. Two remedies:
1) normalized initialization
2) intermediate normalization layers (BN)
Q2: With the two remedies above, "When deeper networks are able to start converging, a degradation problem has been exposed": once accuracy saturates, making the network deeper actually lowers performance (see Figure 1)...
A2: Kaiming He proposed the deep residual learning framework so that deeper networks keep improving. The reasoning: if the conv layers were learning the identity mapping, WX = X, performance would not drop as the network grows deeper; but it does drop, so the layers are evidently not learning the identity. He therefore adds an identity shortcut, which acts as a prior: the network only needs to refine on top of the identity, and the identity at least guarantees that performance does not degrade as depth increases.
Even if WX = X were the optimal solution, adding the identity shortcut would not rule it out: with the shortcut in place the weights W of the residual path simply go towards zero ("To the extreme, if an identity mapping were optimal, it would be easier to push the residual to zero than to fit an identity mapping by a stack of nonlinear layers.").
The figure below reflects the soundness of this idea well.
The statistics of the 3x3 layers' outputs (after BN, before the activation) show that the std decreases with depth; in other words, the deeper the network goes, the less each layer changes on top of the identity.
Q3: Do you know why, when down-sampling, the resolution is halved while the number of channels is doubled? (The paper gives the explanation.)
A3:if the feature map size is halved, the number of filters is doubled so as to preserve the time complexity per layer.
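A quick arithmetic check of this: the cost of a 3x3 convolution is roughly proportional to H*W*3*3*C_in*C_out. Halving the feature map size turns H*W into (H/2)*(W/2) = H*W/4, while doubling both C_in and C_out multiplies the channel term by 4, so the per-layer cost is preserved.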
Q4: Could the shortcut skip only a single layer?
A4: Skipping one layer is pointless, because WX + X is still a linear transform (note that in Figure 2 the activation is applied after the addition; if the activation came before the addition the situation would be different, and one layer could make sense).
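In equation form: y = W*x + x = (W + I)*x is still a single linear map, so before the nonlinearity a one-layer shortcut adds nothing a plain linear layer could not express.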
Q5: Do you know what 07+12 and 07++12 mean for PASCAL VOC?
A5:
for the PASCAL VOC 2007 test set,
we use the 5k trainval images in VOC 2007 and 16k trainval images in VOC 2012 for training (“07+12”).
For the PASCAL VOC 2012 test set,
we use the 10k trainval+test images in VOC 2007 and 16k trainval images in VOC 2012 for training (“07++12”).
The article 对ResNet本质的一些思考 offers a nice perspective, e.g.: if a piece of information can flow through a nonlinear activation layer completely unchanged, then for that information the nonlinear activation is effectively just a linear activation.