1 AlexNet
"ImageNet Classification with Deep Convolutional Neural Networks"
https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
1.1 Overview and Training
Dataset: ImageNet LSVRC-2010, 1.2 million images, 1000 classes
Error rates: top-1 37.5%, top-5 17.0%
Architecture: 5 convolutional layers + 3 pooling layers + 3 fully connected layers
Overfitting countermeasures: Dropout + data augmentation
Preprocessing: 1. Resize the short side of the image to 256. 2. Crop the middle 256 pixels of the other side, giving [256,256]. 3. Subtract each channel's mean.
Training data: 1. Start from the [256,256,3] image. 2. Randomly crop a [224,224,3] patch. 3. Flip horizontally.
Test data: 1. Crop [224,224] patches anchored at the four corners and the center of the [256,256] image. 2. Mirror-flip those 5 patches. 3. Average the predictions over the ten images to get the final result (a sketch follows below).
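A minimal NumPy sketch of this ten-crop averaging; `ten_crop_predict` is a hypothetical helper, and `predict` stands in for the trained model's forward pass on one patch:
import numpy as np

def ten_crop_predict(img, predict, crop=224):
    # img: [256, 256, 3] array; predict: maps one [crop, crop, 3] patch to class probabilities
    h, w = img.shape[:2]
    offsets = [(0, 0), (0, w - crop), (h - crop, 0), (h - crop, w - crop),  # four corners
               ((h - crop) // 2, (w - crop) // 2)]                          # center
    patches = [img[y:y + crop, x:x + crop] for y, x in offsets]
    patches += [p[:, ::-1] for p in patches]  # horizontal mirrors of the five crops
    return np.mean([predict(p) for p in patches], axis=0)  # average the ten predictions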
Weight initialization: every layer's w ~ N(0, 0.01). The biases of the 2nd, 4th and 5th convolutional layers and of the fully connected layers are initialized to 1; all other biases to 0.
Training: 1. Mini-batches (batch_size=128). 2. SGD with momentum (mu=0.9). 3. L2 regularization (lambda=0.0005, the regularization coefficient). 4. The same learning rate for all layers, lr=0.01; whenever the loss stops decreasing, lr is divided by 10, which happened three times over the course of training. 5. img_num=1,200,000, epochs=90.
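A minimal Keras sketch of these hyper-parameters. The use of ReduceLROnPlateau and its patience value are assumptions (the paper adjusted the learning rate by hand when validation error plateaued):
from keras.optimizers import SGD
from keras.callbacks import ReduceLROnPlateau
from keras.initializers import RandomNormal, Constant
from keras.regularizers import l2
from keras.layers import Dense

# w ~ N(0, 0.01), bias = 1, L2 weight decay lambda = 0.0005, as described above
fc = Dense(4096, activation='relu',
           kernel_initializer=RandomNormal(stddev=0.01),
           bias_initializer=Constant(1.0),
           kernel_regularizer=l2(0.0005))

sgd = SGD(lr=0.01, momentum=0.9)  # lr = 0.01, momentum mu = 0.9
# divide the learning rate by 10 when the monitored loss stops decreasing
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=5)  # patience is an assumption
# model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(x_train, y_train, batch_size=128, epochs=90, callbacks=[reduce_lr])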
1.2 AlexNet Innovations
- Nonlinear activation: ReLU, i.e. f(x) = max(0, x). Cheap to compute, which speeds up training --> for inputs above 0 the derivative is 1, which alleviates vanishing gradients --> for inputs below 0 the gradient is 0, so some neurons go inactive, which acts as regularization but also loses some information.
- Multi-GPU training. Fewer parameters per GPU + faster training. The paper's experiments show the two-GPU scheme lowers the top-1 and top-5 error rates by 1.7% and 1.2% compared with a one-GPU network with half as many kernels. Of course, the half-size one-GPU network and the two-GPU network have different structures, so some improvement for the two-GPU version is not surprising.
- Local Response Normalization (LRN). When a pixel is larger than its neighboring pixels it is amplified; otherwise it is suppressed. LRN reduces the top-1 and top-5 error rates by 1.4% and 1.2% (the formula is given after this list).
- Overlapping MaxPooling, with strides < kernel_size. Overlapping pooling avoids the blurring effect of average pooling. Experiments show it beats traditional non-overlapping pooling, improving top-1 and top-5 by 0.4% and 0.3%, and it helps avoid overfitting during training; it does, however, increase computation and introduce redundant information.
- Dropout. Neurons are deactivated at random, which reduces their mutual dependence and so encourages the extraction of mutually independent, important features.
- Data augmentation. Random cropping + mirror flipping + PCA on the RGB channels, with an N(0, 0.1) Gaussian perturbation of the principal components; this lowers the top-1 error rate by 1%.
$$I_{xy} = [I_{xy}^R,\, I_{xy}^G,\, I_{xy}^B]^T + [p_1,\, p_2,\, p_3]\,[\alpha_1\lambda_1,\, \alpha_2\lambda_2,\, \alpha_3\lambda_3]^T, \qquad \alpha_i \sim N(0,\, 0.1)$$
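For reference, the LRN formula from the paper (mentioned in the LRN bullet above): the activity $a_{x,y}^i$ of kernel map $i$ is normalized over $n$ adjacent maps, with constants $k=2$, $n=5$, $\alpha=10^{-4}$, $\beta=0.75$:
$$b_{x,y}^i = a_{x,y}^i \Big/ \left(k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \left(a_{x,y}^j\right)^2\right)^{\beta}$$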
2 Network Architecture
input[227,227,3]
↓
Conv1(k=11,f=2*48,s=4) + ReLU [55,55,96]
↓
MaxPool1(k=3,s=2) [27,27,96]
↓
Norm1(local_size=5)
↓
Conv2(k=5,f=2*128,s=1,p=2) + ReLU [27,27,256]
↓
MaxPool2(k=3,s=2) [13,13,256]
↓
Norm2(local_size=5)
↓
Conv3(k=3,f=2*192,s=1,p=1) + ReLU [13,13,384] (input concatenated across both GPUs)
↓
Conv4(k=3,f=2*192,s=1,p=1) + ReLU [13,13,384]
↓
Conv5(k=3,f=2*128,s=1,p=1) + ReLU [13,13,256]
↓
MaxPool3(k=3,s=2) [6,6,256]
↓
FC(4096) + ReLU (2×2048, concatenated across both GPUs)
↓
Dropout(rate=0.5)
↓
FC(4096) + ReLU (2×2048, concatenated across both GPUs)
↓
Dropout(rate=0.5)
↓
FC(1000) + Softmax (on the concatenated features)
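The spatial sizes in the diagram follow the standard output-size formula for convolution and pooling; for example, the first layer gives $(227 - 11)/4 + 1 = 55$ and the first pooling $(55 - 3)/2 + 1 = 27$:
$$o = \left\lfloor \frac{i - k + 2p}{s} \right\rfloor + 1$$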
3 Code (LRN not included)
from keras.layers import Input,Conv2D, Concatenate,Flatten, MaxPooling2D,Dense,Dropout
from keras.models import Model
def alexNet(input):
x1 = Conv2D(filters=48, kernel_size=(11,11),
activation='relu',
padding='valid',
strides= 4,
name='conv1_1')(input)
x1 = MaxPooling2D(pool_size=(3,3),strides=(2, 2),padding='valid')(x1)
x1 = Conv2D(filters=128, kernel_size=(5,5),
activation='relu',
padding='same',
strides= 1,
name='conv1_2')(x1)
x1 = MaxPooling2D(pool_size=(3,3),strides=(2, 2),padding='valid')(x1)
x2 = Conv2D(filters=48, kernel_size=(11,11),
activation='relu',
padding='valid',
strides= 4,
name='conv2_1')(input)
x2 = MaxPooling2D(pool_size=(3,3),strides=(2, 2),padding='valid')(x2)
x2 = Conv2D(filters=128, kernel_size=(5,5),
activation='relu',
padding='same',
strides= 1,
name='conv2_2')(x2)
x2 = MaxPooling2D(pool_size=(3,3),strides=(2, 2),padding='valid')(x2)
x = Concatenate(axis=3)([x1,x2])
x1 = Conv2D(filters=192, kernel_size=(3,3),
activation='relu',
padding='same',
strides= 1,
name='conv3_1')(x)
x1 = Conv2D(filters=192, kernel_size=(3,3),
activation='relu',
padding='same',
strides= 1,
name='conv4_1')(x1)
x1 = Conv2D(filters=128, kernel_size=(3,3),
activation='relu',
padding='same',
strides= 1,
name='conv5_1')(x1)
x1 = MaxPooling2D(pool_size=(3,3),strides=(2, 2),padding='valid')(x1)
x2 = Conv2D(filters=192, kernel_size=(3,3),
activation='relu',
padding='same',
strides= 1,
name='conv3_2')(x)
x2 = Conv2D(filters=192, kernel_size=(3,3),
activation='relu',
padding='same',
strides= 1,
name='conv4_2')(x2)
x2 = Conv2D(filters=128, kernel_size=(3,3),
activation='relu',
padding='same',
strides= 1,
name='conv5_2')(x2)
x2 = MaxPooling2D(pool_size=(3,3),strides=(2, 2),padding='valid')(x2)
x = Concatenate(axis=3)([x1,x2])
x = Flatten()(x)
x1 = Dense(2048,activation='relu')(x)
x2 = Dense(2048,activation='relu')(x)
    x = Concatenate(axis=1)([x1,x2])  # merge the two 2048-d streams into 4096 features; axis=0 would wrongly concatenate along the batch dimension
x = Dropout(rate=0.5)(x)
x1 = Dense(2048,activation='relu')(x)
x2 = Dense(2048,activation='relu')(x)
    x = Concatenate(axis=1)([x1,x2])  # again merge along the feature axis, not the batch axis
x = Dropout(rate=0.5)(x)
x = Dense(1000,activation='softmax')(x)
return x
if __name__ == '__main__':
input = Input(shape=[227,227,3])
output = alexNet(input)
model = Model(input,output)
print(model.summary())
'''
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 227, 227, 3) 0
__________________________________________________________________________________________________
conv1_1 (Conv2D) (None, 55, 55, 48) 17472 input_1[0][0]
__________________________________________________________________________________________________
conv2_1 (Conv2D) (None, 55, 55, 48) 17472 input_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 27, 27, 48) 0 conv1_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D) (None, 27, 27, 48) 0 conv2_1[0][0]
__________________________________________________________________________________________________
conv1_2 (Conv2D) (None, 27, 27, 128) 153728 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
conv2_2 (Conv2D) (None, 27, 27, 128) 153728 max_pooling2d_3[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, 13, 13, 128) 0 conv1_2[0][0]
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D) (None, 13, 13, 128) 0 conv2_2[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 13, 13, 256) 0 max_pooling2d_2[0][0]
max_pooling2d_4[0][0]
__________________________________________________________________________________________________
conv3_1 (Conv2D) (None, 13, 13, 192) 442560 concatenate_1[0][0]
__________________________________________________________________________________________________
conv3_2 (Conv2D) (None, 13, 13, 192) 442560 concatenate_1[0][0]
__________________________________________________________________________________________________
conv4_1 (Conv2D) (None, 13, 13, 192) 331968 conv3_1[0][0]
__________________________________________________________________________________________________
conv4_2 (Conv2D) (None, 13, 13, 192) 331968 conv3_2[0][0]
__________________________________________________________________________________________________
conv5_1 (Conv2D) (None, 13, 13, 128) 221312 conv4_1[0][0]
__________________________________________________________________________________________________
conv5_2 (Conv2D) (None, 13, 13, 128) 221312 conv4_2[0][0]
__________________________________________________________________________________________________
max_pooling2d_5 (MaxPooling2D) (None, 6, 6, 128) 0 conv5_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_6 (MaxPooling2D) (None, 6, 6, 128) 0 conv5_2[0][0]
__________________________________________________________________________________________________
concatenate_2 (Concatenate) (None, 6, 6, 256) 0 max_pooling2d_5[0][0]
max_pooling2d_6[0][0]
__________________________________________________________________________________________________
flatten_1 (Flatten) (None, 9216) 0 concatenate_2[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 2048) 18876416 flatten_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (None, 2048) 18876416 flatten_1[0][0]
__________________________________________________________________________________________________
concatenate_3 (Concatenate) (None, 4096) 0 dense_1[0][0]
dense_2[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout) (None, 4096) 0 concatenate_3[0][0]
__________________________________________________________________________________________________
dense_3 (Dense) (None, 2048) 8390656 dropout_1[0][0]
__________________________________________________________________________________________________
dense_4 (Dense) (None, 2048) 8390656 dropout_1[0][0]
__________________________________________________________________________________________________
concatenate_4 (Concatenate) (None, 4096) 0 dense_3[0][0]
dense_4[0][0]
__________________________________________________________________________________________________
dropout_2 (Dropout) (None, 4096) 0 concatenate_4[0][0]
__________________________________________________________________________________________________
dense_5 (Dense) (None, 1000) 4097000 dropout_2[0][0]
==================================================================================================
Total params: 60,965,224
Trainable params: 60,965,224
Non-trainable params: 0
__________________________________________________________________________________________________
None
'''
4 The Dropout Function
- Dropout: the paper notes that averaging the predictions of multiple models gives more accurate results, but training multiple models drives up the cost; Dropout solves this nicely. Randomly deactivating neurons on each training pass amounts to training a different model each time:
$$r_j^l \sim \mathrm{Bernoulli}(p)$$
$$\tilde{y}^l = r^l * y^l$$
$$z_i^{l+1} = w_i^{l+1} \tilde{y}^l + b_i^{l+1}$$
$$y_i^{l+1} = f(z_i^{l+1})$$
Training: each neuron in layer l is kept with probability p.
Testing: each weight w is multiplied by p (w becomes pw), so every neuron's expected output stays the same as during training. (The code below uses the equivalent "inverted dropout": the surviving activations are divided by the keep probability at training time, so nothing changes at test time.)
Dropout has a regularization effect: it lowers the correlation between neurons.
- Code (reference: https://zhuanlan.zhihu.com/p/38200980)
# Implementation of the dropout function
import numpy as np

def dropout(x, level):
    if level < 0. or level >= 1:  # level is the drop probability and must lie in [0, 1)
        raise ValueError('Dropout level must be in interval [0, 1[.')
    retain_prob = 1. - level
    # binomial generates a 0/1 tensor with the same shape as x. Think of flipping one
    # coin per neuron: heads (keep) comes up with probability retain_prob; n is the
    # number of trials per neuron (one flip each, so n=1) and size is how many coins we flip.
    random_tensor = np.random.binomial(n=1, p=retain_prob, size=x.shape)  # 0 means the neuron is dropped
    print(random_tensor)
    x *= random_tensor
    print(x)
    x /= retain_prob  # inverted dropout: rescale so the expected activation is unchanged
    return x

# A quick test of dropout: run the function on the vector x to see what it does to an input
x = np.asarray([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=np.float32)
dropout(x, 0.4)
5 Data Processing
Resize the shortest side to 224 with the aspect ratio unchanged + random crop + mirror flip + PCA on the RGB channels with an N(0, 0.1) Gaussian perturbation of the principal components.
# 1 Resize the shortest side to 224, keeping the aspect ratio
import numpy as np
from PIL import Image

def resize_image(image, size):
    iw, ih = image.size
    w, h = size
    scale = max(w/iw, h/ih)  # max makes the SHORT side match the target, so a crop of `size` fits
    nw = int(iw*scale)
    nh = int(ih*scale)
    image = image.resize((nw, nh), Image.BICUBIC)
    return image
# 2 Random crop
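# A minimal sketch (a hypothetical helper; assumes a PIL image at least crop_size pixels on each side):
def random_crop(image, crop_size=224):
    w, h = image.size
    x = np.random.randint(0, w - crop_size + 1)  # left offset
    y = np.random.randint(0, h - crop_size + 1)  # top offset
    return image.crop((x, y, x + crop_size, y + crop_size))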
# 3 Mirror flip
image = image.transpose(Image.FLIP_LEFT_RIGHT)
# 4 PCA on the RGB channels, with an N(0, 0.1) Gaussian perturbation of the principal components
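# A hedged "fancy PCA" sketch: eigendecompose the RGB covariance and shift the image
# along the principal components. (The paper computes the PCA once over the whole
# training set; doing it per image, as here, is a simplification.)
def fancy_pca(img, sigma=0.1):
    flat = img.reshape(-1, 3).astype(np.float64)  # img: [H, W, 3] array scaled to [0, 1]
    eigvals, eigvecs = np.linalg.eigh(np.cov(flat, rowvar=False))  # lambda_i and p_i
    alpha = np.random.normal(0.0, sigma, 3)       # alpha_i ~ N(0, 0.1)
    delta = eigvecs @ (alpha * eigvals)           # [p1, p2, p3][a1*l1, a2*l2, a3*l3]^T
    return np.clip(img + delta, 0.0, 1.0)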
6 Paper Highlights
- Dropout
- Data augmentation
- Grouped convolution