ResNet

Paper Information

Paper: Deep Residual Learning for Image Recognition

Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Achievements: the model won a total of five first places in the ImageNet ILSVRC and COCO 2015 competitions.

Design Motivation of ResNet

In deep convolutional neural networks, adding layers lets the network extract more complex feature patterns, so in theory a deeper model should achieve better results. The vanishing/exploding gradient problem can also largely be addressed with normalized initialization and normalization layers.

However, the authors observed that when deeper networks start to converge, a degradation problem appears: as depth increases, accuracy saturates and then drops. In the experiment shown in the figure below, a 56-layer network performs worse than a 20-layer one. This cannot be overfitting, because the training error of the 56-layer network is also higher.

Deep Residual Learning

Consider building a deep network by stacking new layers on top of a shallow one. In the extreme case, the added layers act as identity mappings: they learn nothing and simply copy the shallow network's features. In that case, the deep network should perform at least as well as the shallow one, and no degradation should appear.

Based on this assumption, the authors propose deep residual learning to solve the degradation problem. The residual learning structure is shown in the figure below; it resembles a "short circuit" in an electrical circuit. The identity connection is a kind of shortcut connection, which adds neither extra parameters nor extra computational complexity.

  • For a stack of a few layers, denote the feature it learns from an input $x$ as $H(x)$.

  • We now ask the stack to learn the residual $F(x) = H(x) - x$ instead, so the original feature becomes $F(x) + x$. Learning the residual is easier than learning the original mapping directly (a minimal sketch follows this list).

  • In the extreme case where an identity mapping is optimal, pushing the residual to 0 is easier than fitting an identity mapping with a stack of nonlinear layers, so at worst the network's performance does not degrade.

  • When the residual is not 0, the stacked layers learn new features on top of the input features, which gives better performance.
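
To make the idea concrete, here is a minimal Keras sketch (my own illustration, not code from the paper) of a residual unit: two stacked convolutions learn $F(x)$, and the shortcut adds the input $x$ back before the final activation. The full blocks used for ResNet 50 appear in the fine-tuning code further below.

from keras.layers import Input, Conv2D, Activation, Add
from keras.models import Model

# Minimal residual unit: the stacked layers learn F(x),
# and the shortcut adds x back, so the unit outputs H(x) = F(x) + x.
inputs = Input(shape=(56, 56, 64))                 # example feature map
f = Conv2D(64, (3, 3), padding='same', activation='relu')(inputs)
f = Conv2D(64, (3, 3), padding='same')(f)          # F(x), no activation yet
outputs = Activation('relu')(Add()([f, inputs]))   # H(x) = F(x) + x
model = Model(inputs, outputs)
model.summary()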

Building Block with Identity Mapping

The building block with an identity mapping is defined as:

$$y = F(x, \{W_i\}) + x$$

  • $x$ and $y$ are the input and output vectors of the layers.

  • The function $F(x, \{W_i\})$ denotes the residual mapping to be learned.

  • The dimensions of $x$ and $F$ must be equal. When the input and output channel counts differ, a linear projection $W_s$ can be applied to the shortcut connection to match the dimensions:

    $$y = F(x, \{W_i\}) + W_s x$$

  • With these two mapping types in ResNet, if the network has already reached the optimum, further deepening it pushes the residual mapping toward 0, leaving only the identity mapping. In theory the network then stays in the optimal state, so its performance no longer drops as depth increases.

  • The form of the residual function $F$ is flexible. The authors propose two designs, one for ResNet 34 (left figure) and one for ResNet 50/101/152 (right figure). The whole structure is usually called a building block, and the design on the right is also known as the bottleneck design.

  • The bottleneck design is used in the deeper networks to reduce computation and parameters: the first $1 \times 1$ convolution reduces the 256 input channels to 64, and the last $1 \times 1$ convolution restores the channel count to 256 (see the back-of-the-envelope count after this list).
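
As a rough check of the savings (my own back-of-the-envelope arithmetic, counting weights only and ignoring biases and BatchNorm; the baseline of two plain 3×3 convolutions on 256 channels is chosen purely for illustration):

# Weight counts: 256-64-256 bottleneck vs. two plain 3x3 convs on 256 channels.
bottleneck = 256 * 64 * 1 * 1 + 64 * 64 * 3 * 3 + 64 * 256 * 1 * 1
plain_3x3 = 2 * (256 * 256 * 3 * 3)
print(bottleneck)              # 69632
print(plain_3x3)               # 1179648
print(plain_3x3 / bottleneck)  # roughly 17x more weights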

ResNet Network Architecture

View the resnet-50 network architecture: link

The ResNet architecture follows two simple design rules:

  • Layers with the same output feature map size have the same number of filters.

  • If the feature map size is halved, the number of filters is doubled, so that the time complexity per layer is preserved.

Downsampling is performed directly by convolutional layers with a stride of 2. The network ends with a global average pooling layer and a 1000-way fully connected layer with softmax. The resulting stage-by-stage sizes are sketched below.
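
For a 224×224 input, the two rules play out as follows in ResNet 34 (a sketch of the standard configuration; conv1 and the first pooling layer each halve the resolution):

# Stage-by-stage output sizes and filter counts for ResNet 34 (224x224 input).
# Whenever the feature map size is halved, the number of filters doubles.
stages = [
	('conv1',   112, 64),   # 7x7 conv, stride 2
	('conv2_x',  56, 64),   # after 3x3 max pooling, stride 2
	('conv3_x',  28, 128),
	('conv4_x',  14, 256),
	('conv5_x',   7, 512),
]
for name, size, filters in stages:
	print('%-8s %3dx%-3d %4d filters' % (name, size, size, filters))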

Notably, ResNet 34 has fewer filters and lower complexity than VGG networks: 3.6 billion FLOPs (multiply-adds), only 18% of VGG-19's 19.6 billion FLOPs.

  • The authors propose ResNets at 5 depths: 18, 34, 50, 101, and 152 layers.

  • Each network is divided into 5 parts: conv1, conv2_x, conv3_x, conv4_x, and conv5_x. Later papers also use these names to refer to the corresponding parts of ResNet 50 or ResNet 101.

  • Take the 101-layer network as an example: it starts with a $7 \times 7$, 64-filter convolution, followed by $3 + 4 + 23 + 3 = 33$ building blocks. Each block has 3 layers, giving $33 \times 3 = 99$ layers, and it ends with an fc layer for classification, so the total is $1 + 99 + 1 = 101$ layers.

Note: the 101 layers count only convolutional and fully connected layers; activation and pooling layers are not included. A quick arithmetic check for all five depths follows.
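
The same counting applies to every depth (ResNet 18/34 use basic blocks with 2 conv layers each, ResNet 50/101/152 use bottleneck blocks with 3, plus conv1 and the final fc layer):

# Layer count check: conv1 + (number of blocks * conv layers per block) + fc.
configs = {
	18:  ([2, 2, 2, 2], 2),    # basic blocks, 2 conv layers each
	34:  ([3, 4, 6, 3], 2),
	50:  ([3, 4, 6, 3], 3),    # bottleneck blocks, 3 conv layers each
	101: ([3, 4, 23, 3], 3),
	152: ([3, 8, 36, 3], 3),
}
for depth, (blocks, per_block) in configs.items():
	total = 1 + sum(blocks) * per_block + 1
	assert total == depth
	print('%d = 1 + %d x %d + 1' % (depth, sum(blocks), per_block))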

Fine-tuning ResNet 50

ResNet 50 weights file: link

# -*- coding: utf-8 -*-
from keras.optimizers import SGD
from keras.layers import (Input, Dense, Conv2D, MaxPooling2D, AveragePooling2D,
                          ZeroPadding2D, Flatten, Add, Activation, BatchNormalization)
from keras.models import Model
from keras import backend as K

from sklearn.metrics import log_loss
from get_traffic_dataset import TrafficImageDataGenerator 

train_file = './citySpace/outData/train/'
val_file = './citySpace/outData/val/'

def identity_block(input_tensor, kernel_size, filters, stage, block):
	"""
	The identity_block is the block that has no conv layer at shortcut
	Arguments
		input_tensor: input tensor
		kernel_size: default 3, the kernel size of middle conv layer at main path
		filters: list of integers, the nb_filters of 3 conv layer at main path
		stage: integer, current stage label, used for generating layer names
		block: 'a','b'..., current block label, used for generating layer names
	"""

	nb_filter1, nb_filter2, nb_filter3 = filters
	conv_name_base = 'res' + str(stage) + block + '_branch'
	bn_name_base = 'bn' + str(stage) + block + '_branch'

	x = Conv2D(nb_filter1, (1, 1), name=conv_name_base + '2a')(input_tensor)
	x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2a')(x)
	x = Activation('relu')(x)

	x = Conv2D(nb_filter2, (kernel_size, kernel_size),
	           padding='same', name=conv_name_base + '2b')(x)
	x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2b')(x)
	x = Activation('relu')(x)

	x = Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c')(x)
	x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2c')(x)

	x = Add()([x, input_tensor])
	x = Activation('relu')(x)
	return x


def conv_block(input_tensor, kernel_size, filters, stage, block, strides=(2, 2)):
	"""
	conv_block is the block that has a conv layer at shortcut
	# Arguments
		input_tensor: input tensor
		kernel_size: default 3, the kernel size of middle conv layer at main path
		filters: list of integers, the nb_filters of 3 conv layer at main path
		stage: integer, current stage label, used for generating layer names
		block: 'a','b'..., current block label, used for generating layer names
	Note that from stage 3, the first conv layer at main path has strides=(2, 2),
	and the shortcut should have strides=(2, 2) as well
	"""

	nb_filter1, nb_filter2, nb_filter3 = filters
	conv_name_base = 'res' + str(stage) + block + '_branch'
	bn_name_base = 'bn' + str(stage) + block + '_branch'

	x = Conv2D(nb_filter1, (1, 1), strides=strides,
	           name=conv_name_base + '2a')(input_tensor)
	x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2a')(x)
	x = Activation('relu')(x)

	x = Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
	           name=conv_name_base + '2b')(x)
	x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2b')(x)
	x = Activation('relu')(x)

	x = Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c')(x)
	x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2c')(x)

	shortcut = Conv2D(nb_filter3, (1, 1), strides=strides,
	                  name=conv_name_base + '1')(input_tensor)
	shortcut = BatchNormalization(axis=bn_axis, name=bn_name_base + '1')(shortcut)

	x = Add()([x, shortcut])
	x = Activation('relu')(x)
	return x


def resnet50_model(img_rows, img_cols, color_type=1, num_classes=None):
	"""
	Resnet 50 Model for Keras
	Parameters:
	  img_rows, img_cols - resolution of inputs
	  color_type - 1 for grayscale, 3 for color
	  num_classes - number of class labels for our classification task
	"""

	# Handle Dimension Ordering for different backends
	global bn_axis
	if K.image_data_format() == 'channels_last':
		bn_axis = 3
		img_input = Input(shape=(img_rows, img_cols, color_type))
	else:
		bn_axis = 1
		img_input = Input(shape=(color_type, img_rows, img_cols))

	x = ZeroPadding2D((3, 3))(img_input)
	x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1')(x)
	x = BatchNormalization(axis=bn_axis, name='bn_conv1')(x)
	x = Activation('relu')(x)
	x = MaxPooling2D((3, 3), strides=(2, 2))(x)

	x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
	x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
	x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')

	x = conv_block(x, 3, [128, 128, 512], stage=3, block='a')
	x = identity_block(x, 3, [128, 128, 512], stage=3, block='b')
	x = identity_block(x, 3, [128, 128, 512], stage=3, block='c')
	x = identity_block(x, 3, [128, 128, 512], stage=3, block='d')

	x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a')
	x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b')
	x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c')
	x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d')
	x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e')
	x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f')

	x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a')
	x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b')
	x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c')

	# Fully Connected Softmax Layer
	x_fc = AveragePooling2D((7, 7), name='avg_pool')(x)
	x_fc = Flatten()(x_fc)
	x_fc = Dense(1000, activation='softmax', name='fc1000')(x_fc)

	# Create model
	model = Model(img_input, x_fc)

	# Load ImageNet pre-trained data
	if K.image_data_format() == 'channels_first':
		# Use pre-trained weights for Theano backend
		weights_path = 'imagenet_models/resnet50_weights_th_dim_ordering_th_kernels.h5'
	else:
		# Use pre-trained weights for Tensorflow backend
		weights_path = 'resnet50_weights_tf_dim_ordering_tf_kernels.h5'

	model.load_weights(weights_path)

	# Truncate and replace softmax layer for transfer learning
	# Cannot use model.layers.pop() since model is not of Sequential() type
	# The method below works since pre-trained weights are stored in layers but not in the model
	x_newfc = AveragePooling2D((7, 7), name='avg_pool')(x)
	x_newfc = Flatten()(x_newfc)
	x_newfc = Dense(num_classes, activation='softmax', name='fc10')(x_newfc)

	# Create another model with our customized softmax
	model = Model(img_input, x_newfc)

	# Learning rate is changed to 0.001
	sgd = SGD(lr=1e-3, decay=1e-6, momentum=0.9, nesterov=True)
	model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])

	return model


if __name__ == '__main__':
	# Example: fine-tune on the traffic image dataset (3 classes)

	img_rows, img_cols = 224, 224  # Resolution of inputs
	channel = 3
	num_classes = 3
	batch_size = 16
	nb_epoch = 10

	# Initialize the data generator separately for the training and validation set
	train_generator = TrafficImageDataGenerator(train_file, horizontal_flip=True, shuffle=True)
	val_generator = TrafficImageDataGenerator(val_file, horizontal_flip=True, shuffle=True)
 
	X_valid, Y_valid, val_labels = val_generator.all(10000)
	X_train, Y_train, train_labels = train_generator.all(10000)

	# Load our model
	model = resnet50_model(img_rows, img_cols, channel, num_classes)

	# Start Fine-tuning
	model.fit(X_train, Y_train,
	          batch_size=batch_size,
	          epochs=nb_epoch,
	          shuffle=True,
	          verbose=1,
	          validation_data=(X_valid, Y_valid),
	          )

	# Make predictions
	predictions_valid = model.predict(X_valid, batch_size=batch_size, verbose=1)

	# Cross-entropy loss score
	score = log_loss(Y_valid, predictions_valid)
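
A common variant when fine-tuning on a small dataset (an optional sketch, not part of the original script above) is to first freeze the pre-trained layers and train only the new softmax head, recompiling so the trainability change takes effect; it reuses the names defined in the script:

# Optional: freeze everything except the new classification head.
for layer in model.layers[:-1]:
	layer.trainable = False
model.compile(optimizer=SGD(lr=1e-3, momentum=0.9, nesterov=True),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size=batch_size, epochs=nb_epoch,
          validation_data=(X_valid, Y_valid), verbose=1)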
