1. Reading the Paper
- What did the authors set out to do or achieve?
Compared with traditional machine learning methods, a dataset that is large, has many classes, and is high resolution requires a deeper and wider model. ("To learn about thousands of objects from millions of images, we need a model with a large learning capacity.")
- What are the key points of the new method?
1. The ReLU activation function (compared against f(x) = |tanh(x)|, which was proposed around the same time)
2. Multi-GPU training as it was done at the time (hardware resources were limited)
3. LRN (Local Response Normalization)
4. Overlapping Pooling (pooling where the stride is smaller than the kernel size, so neighboring windows overlap; see the sketch after this list)
5. Reducing Overfitting (1. Data Augmentation 2. Dropout)
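As a quick illustration of point 4, here is a minimal sketch (my own toy example, not from the paper's code) contrasting the paper's overlapping pooling with ordinary non-overlapping pooling:

```python
import tensorflow as tf

x = tf.random.normal([1, 55, 55, 96])
# Overlapping pooling as in AlexNet: 3x3 windows, stride 2, so adjacent windows share pixels
overlap = tf.keras.layers.MaxPool2D(pool_size=3, strides=2)(x)
# Conventional non-overlapping pooling: 2x2 windows, stride 2
non_overlap = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)(x)
print(overlap.shape, non_overlap.shape)  # both (1, 27, 27, 96); only the receptive fields differ
```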
- What other research is there in this area?
A lot.
- Overview of the model structure
2. Model Overview
- Components of the model
COMPONENT | TYPE |
---|---|
Convolution | Convolutions |
Grouped Convolution | Convolutions |
Dense Connections | Feedforward Networks |
Dropout | Regularization |
Local Response Normalization | Normalization |
Max Pooling | Pooling Operations |
ReLU | Activation Functions |
Softmax | Output Functions |
3. Model Details
- Local Response Normalization
1. In Keras, tf.nn.local_response_normalization can simply be wrapped in a Lambda layer and called, which is enough to build the model, but it does not explain how LRN actually works.
2. Looked up reference material [really well written!] and started to make sense of it.
The formula given in the AlexNet paper is as follows:
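The normalized response $b^{i}_{x,y}$ of the activity $a^{i}_{x,y}$ produced by kernel $i$ at position $(x, y)$ is (in the paper's notation):

$$
b^{i}_{x,y} = \frac{a^{i}_{x,y}}{\left(k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \left(a^{j}_{x,y}\right)^{2}\right)^{\beta}}
$$

where the sum runs over $n$ "adjacent" kernel maps out of $N$ in total; the paper uses $k = 2$, $n = 5$, $\alpha = 10^{-4}$, $\beta = 0.75$.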
Why Normalization?
Normalization is used to bound the output of unbounded activation functions such as ReLU and ELU.
What is Local Response Normalization?
LRN is inspired by lateral inhibition, a phenomenon in neurobiology where a strongly excited neuron suppresses the responses of its neighbors: "This creates a peak in the form of a local maximum, which produces contrast in that region and heightens sensory perception." (My own way of picturing it: the strongest response suppresses everything around it, and it is exactly that contrast with its quiet neighbors that makes it stand out and easy to spot.)
A more formal explanation:
LRN is a non-trainable layer that square-normalizes the pixel values of a feature map within a local neighborhood.
There are two kinds of Local Response Normalization
Depending on how the neighborhood around the current pixel is defined, LRN comes in two flavors:
1. Inter-Channel LRN
As the name suggests, in Inter-Channel LRN the neighborhood is defined along the channel dimension; read on.
The computation uses the formula given above, where:
- $i$: the index of the $i$-th kernel (channel)
- $a_{x,y}$, $b_{x,y}$: the pixel values at position $(x, y)$ before and after normalization, respectively
- $n$: how far the "neighboring" channels extend when computing the sum
- $N$: the total number of channels
- $k$: a constant that keeps the denominator from becoming 0
- $\alpha$: normalization constant
- $\beta$: contrast constant

In summary, the hyperparameters are $(k, \alpha, \beta, n)$; when they take the values $(0, 1, 1, N)$ the formula becomes standard normalization. In the first row of the figure above, $n = 2$ and $N = 4$.
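Substituting $(k, \alpha, \beta, n) = (0, 1, 1, N)$ into the formula reduces the denominator to a plain sum of squares over all channels:

$$
b^{i}_{x,y} = \frac{a^{i}_{x,y}}{\sum_{j=0}^{N-1} \left(a^{j}_{x,y}\right)^{2}}
$$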
Here's an example:
In the figure above, the different colors stand for different channels, so $N = 4$ as noted above;
We set the hyperparameters to $(k, \alpha, \beta, n) = (0, 1, 1, 2)$, where:
- $n = 2$ means that when computing the normalized value at position $(i, x, y)$ we also consider the previous filter and the next filter at the same position, i.e. $(i-1, x, y)$ and $(i+1, x, y)$;
- At $(i, x, y) = (0, 0, 0)$ in the figure, the next filter $(i+1, x, y) = (1, 0, 0)$ has pixel value 1 and there is no previous filter, so from the formula:

$$
b^{1}_{0,0} = \frac{a^{1}_{0,0}}{\sum^{\min(3,2)}_{j=\max(0,0)} \left(a^{j}_{0,0}\right)^{2}} = 0.5
$$

The other pixel values are computed in the same way.
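To check the arithmetic, here is a minimal NumPy sketch of inter-channel LRN following the formula above; the function name `inter_channel_lrn` and the toy channel values are my own, only the first two channel values (1, 1) come from the worked example.

```python
import numpy as np

def inter_channel_lrn(a, k=0.0, alpha=1.0, beta=1.0, n=2):
    """Square-normalize each pixel across a local neighborhood of channels.

    a: feature map of shape (H, W, N); returns an array of the same shape.
    k=0 as in the example above (beware of all-zero neighborhoods).
    """
    H, W, N = a.shape
    b = np.zeros_like(a, dtype=np.float64)
    half = n // 2
    for i in range(N):
        lo, hi = max(0, i - half), min(N - 1, i + half)
        # Sum of squares over the neighboring channels at every (x, y)
        denom = (k + alpha * np.sum(a[:, :, lo:hi + 1] ** 2, axis=-1)) ** beta
        b[:, :, i] = a[:, :, i] / denom
    return b

# Toy 1x1 feature map with 4 channels; channel 0 normalizes to 0.5 as in the example
a = np.array([[[1.0, 1.0, 0.0, 2.0]]])
print(inter_channel_lrn(a, n=2))
```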
2. Intra-Channel LRN
In this case the formula becomes:
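(My reconstruction from the symbol definitions below, not copied from the original figure.) The neighborhood is now taken spatially within a single channel:

$$
b_{x,y} = \frac{a_{x,y}}{\left(k + \alpha \sum_{p=\max(0,\, x-n/2)}^{\min(W-1,\, x+n/2)} \;\sum_{q=\max(0,\, y-n/2)}^{\min(H-1,\, y+n/2)} \left(a_{p,q}\right)^{2}\right)^{\beta}}
$$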
- $(W, H)$ are the width and height of the current feature map, e.g. $(W, H) = (5, 5)$ in the figure below;
- The computation then proceeds exactly as in the inter-channel case;
Comparison
The key difference between inter-channel and intra-channel LRN is how the "neighborhood" is defined: inter-channel LRN compares points at $(x, y, c)$, where $c$ is the channel, while intra-channel LRN compares points at $(x, y)$. The former normalizes in 3D, the latter in 2D.
Question and answers on reddit
TensorFlow's official implementation interface
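A minimal usage sketch of that interface with the paper's hyperparameters plugged in (the mapping depth_radius ≈ n/2 and bias = k is my reading of the docs):

```python
import tensorflow as tf

x = tf.random.normal([1, 55, 55, 96])
# AlexNet's reported hyperparameters: k=2, n=5, alpha=1e-4, beta=0.75
y = tf.nn.local_response_normalization(
    x, depth_radius=2, bias=2.0, alpha=1e-4, beta=0.75)
print(y.shape)  # (1, 55, 55, 96)
```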
- Dropout
- Grouped Convolution
The original motivation in AlexNet was simply that a single GPU did not have enough resources, but many later models borrowed the idea: "use grouped convolutions, with several kernels per layer producing multiple channel outputs per layer, and then merge them."
Models such as ResNeXt have also shown that grouped convolutions can improve accuracy; a small sketch follows.
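A minimal sketch of a grouped convolution built from a channel split plus per-group Conv2D layers; the helper `grouped_conv2d` and the example shapes are my own, not from the original code.

```python
import tensorflow as tf
from tensorflow.keras import layers

def grouped_conv2d(x, filters, kernel_size, groups, **kwargs):
    """Split the input channels into `groups`, convolve each group
    separately, then concatenate the results."""
    splits = layers.Lambda(lambda t: tf.split(t, groups, axis=-1))(x)
    outs = [layers.Conv2D(filters // groups, kernel_size, **kwargs)(s)
            for s in splits]
    return layers.Concatenate(axis=-1)(outs)

# Example: AlexNet's second convolution was split across 2 GPUs
inp = layers.Input((27, 27, 96))
out = grouped_conv2d(inp, filters=256, kernel_size=(5, 5),
                     groups=2, padding="same", activation="relu")
model = tf.keras.Model(inp, out)
model.summary()  # 2 * (5*5*48*128 + 128) = 307,456 params vs 614,656 for the ungrouped Conv_2 in section 5
```

Recent versions of tf.keras also expose a `groups` argument directly on Conv2D, which does the same thing in one layer.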
To do list:
Dropout
4. Preparing the Dataset
- Datasets
1. Hackathon Blossom (102 flower classes)
Link: https://pan.baidu.com/s/1ZPeKrPLR2_81qgi012DgEg
Extraction code: dk96
2. 104 Flowers Garden of Eden (104 flower classes)
Link: https://pan.baidu.com/s/1VIEVF3-dMOGiFPyl4hsxgA
Extraction code: kdur
3. More to be added later
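The paper's "Reducing Overfitting" section relies on data augmentation (random crops and horizontal flips). In Keras this can be approximated with ImageDataGenerator; a minimal sketch where the exact ranges are my own choices, not taken from the paper or the code in section 6:

```python
from tensorflow import keras

# Rough stand-in for the paper's augmentation: shifts instead of random crops, plus flips
train_datagen = keras.preprocessing.image.ImageDataGenerator(
    rescale=1. / 255,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
)
```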
5. Model Structure Tables
Details table
operation | input | filter (kW, kH, Cin, Cout, stride) | output |
---|---|---|---|
Conv1 | (227,227,3) | (11,11,3,96,4) | (55,55,96) |
LRN | (55,55,96) | None | (55,55,96) |
MaxPooling | (55,55,96) | None | (27,27,96) |
Conv2 | (27,27,96) | (5,5,96,256,1) | (27,27,256) |
LRN | (27,27,256) | None | (27,27,256) |
MaxPooling | (27,27,256) | None | (13,13,256) |
Conv3 | (13,13,256) | (3,3,256,384,1) | (13,13,384) |
Conv4 | (13,13,384) | (3,3,384,384,1) | (13,13,384) |
Conv5 | (13,13,384) | (3,3,384,256,1) | (13,13,256) |
MaxPooling | (13,13,256) | None | (6,6,256) |
Dense | (6×6×256 = 9216,) | None | (4096,) |
Dropout | (4096,) | None | (4096,) |
Dense | (4096,) | None | (4096,) |
Dropout | (4096,) | None | (4096,) |
Dense | (4096,) | None | (1000,) |

Note: the 224×224×3 input is zero-padded to 227×227×3 before Conv1 (see the code in section 6), so that (227 − 11) / 4 + 1 = 55.
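A quick sanity check of the spatial sizes in the table using the usual convolution output formula, output = (input − kernel + 2·padding) / stride + 1 (a small sketch, not part of the original code):

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size - kernel + 2 * padding) // stride + 1

print(conv_out(227, 11, stride=4))           # Conv1: 55
print(conv_out(55, 3, stride=2))             # Overlapping max-pool: 27
print(conv_out(27, 5, stride=1, padding=2))  # Conv2 with 'same' padding: 27
print(conv_out(13, 3, stride=2))             # Final max-pool: 6
```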
Structure (Keras model summary)
Model: "AlexNet"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
Input (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
zero_padding2d (ZeroPadding2 (None, 227, 227, 3) 0
_________________________________________________________________
Conv_1 (Conv2D) (None, 55, 55, 96) 34944
_________________________________________________________________
Lrn_1 (Lambda) (None, 55, 55, 96) 0
_________________________________________________________________
activation (Activation) (None, 55, 55, 96) 0
_________________________________________________________________
Maxpool_1 (MaxPooling2D) (None, 27, 27, 96) 0
_________________________________________________________________
Conv_2 (Conv2D) (None, 27, 27, 256) 614656
_________________________________________________________________
Lrn_2 (Lambda) (None, 27, 27, 256) 0
_________________________________________________________________
activation_1 (Activation) (None, 27, 27, 256) 0
_________________________________________________________________
Maxpool_2 (MaxPooling2D) (None, 13, 13, 256) 0
_________________________________________________________________
Conv_3_1 (Conv2D) (None, 13, 13, 384) 885120
_________________________________________________________________
Conv_3_2 (Conv2D) (None, 13, 13, 384) 1327488
_________________________________________________________________
Conv_3_3 (Conv2D) (None, 13, 13, 256) 884992
_________________________________________________________________
Maxpool_3 (MaxPooling2D) (None, 6, 6, 256) 0
_________________________________________________________________
Flt_1 (Flatten) (None, 9216) 0
_________________________________________________________________
fc_1 (Dense) (None, 4096) 37752832
_________________________________________________________________
drop_1 (Dropout) (None, 4096) 0
_________________________________________________________________
fc_2 (Dense) (None, 4096) 16781312
_________________________________________________________________
drop_2 (Dropout) (None, 4096) 0
_________________________________________________________________
Output (Dense) (None, 102) 417894
=================================================================
Total params: 58,699,238
Trainable params: 58,699,238
Non-trainable params: 0
_________________________________________________________________
6. Code Implementation
import tensorflow as tf
from tensorflow import keras
import os
import shutil
from tensorflow.keras.layers import Conv2D, Lambda, MaxPool2D, Flatten, Dense, Dropout, Activation, ZeroPadding2D, Input
from tensorflow.keras.utils import plot_model
from tensorflow.keras import optimizers, losses, initializers
def AlexNet(input_shape, num_classes):
    inputs = Input(input_shape, name="Input")
    # Pad the 224x224 input to 227x227 so that (227 - 11) / 4 + 1 = 55
    x = ZeroPadding2D(((3, 0), (3, 0)))(inputs)
    x = Conv2D(96,
               (11, 11),
               4,
               kernel_initializer=initializers.RandomNormal(stddev=0.01),
               name="Conv_1")(x)
    x = Lambda(tf.nn.local_response_normalization, name="Lrn_1")(x)
    x = Activation(activation="relu")(x)
    # Overlapping pooling: 3x3 windows with stride 2
    x = MaxPool2D(pool_size=(3, 3), strides=2, name="Maxpool_1")(x)
    x = Conv2D(256,
               (5, 5),
               kernel_initializer=initializers.RandomNormal(stddev=0.01),
               padding="same",
               name="Conv_2")(x)
    x = Lambda(tf.nn.local_response_normalization, name="Lrn_2")(x)
    x = Activation(activation="relu")(x)
    x = MaxPool2D(pool_size=(3, 3), strides=2, name="Maxpool_2")(x)
    x = Conv2D(384,
               (3, 3),
               activation="relu",
               padding="same",
               kernel_initializer=initializers.RandomNormal(stddev=0.01),
               name="Conv_3_1")(x)
    x = Conv2D(384,
               (3, 3),
               activation="relu",
               padding="same",
               kernel_initializer=initializers.RandomNormal(stddev=0.01),
               name="Conv_3_2")(x)
    x = Conv2D(256,
               (3, 3),
               activation="relu",
               padding="same",
               kernel_initializer=initializers.RandomNormal(stddev=0.01),
               name="Conv_3_3")(x)
    x = MaxPool2D(pool_size=(3, 3), strides=2, name="Maxpool_3")(x)
    x = Flatten(name="Flt_1")(x)
    x = Dense(4096,
              activation="relu",
              kernel_initializer=initializers.RandomNormal(stddev=0.01),
              name="fc_1")(x)
    x = Dropout(0.5, name="drop_1")(x)
    x = Dense(4096,
              activation="relu",
              kernel_initializer=initializers.RandomNormal(stddev=0.01),
              name="fc_2")(x)
    x = Dropout(0.5, name="drop_2")(x)
    output = Dense(num_classes, activation="softmax", name="Output")(x)
    m = keras.Model(inputs, output, name="AlexNet")
    m.summary()
    return m
if __name__ == '__main__':
    # Pre-training: recreate the directory used to dump generated batches
    saved_dir = "data"
    if os.path.isdir(saved_dir):
        shutil.rmtree(saved_dir)
    os.makedirs(saved_dir)
    # Prepare the data set
    base_path = r"D:\keras_dataset\Hackathon Blossom\flower_data"
    train_datagen = keras.preprocessing.image.ImageDataGenerator(rescale=1. / 255)
    valid_datagen = keras.preprocessing.image.ImageDataGenerator(rescale=1. / 255)
    train_generator = train_datagen.flow_from_directory(
        os.path.join(base_path, "train"),
        target_size=(224, 224),
        batch_size=32,
        class_mode="categorical",
        save_prefix="train",
        save_to_dir="data"
    )
    valid_generator = valid_datagen.flow_from_directory(
        os.path.join(base_path, "valid"),
        target_size=(224, 224),
        batch_size=32,
        class_mode="categorical",
        save_prefix="valid",
        save_to_dir="data"
    )
    input_shape = (224, 224, 3)
    num_classes = 102
    model = AlexNet(input_shape, num_classes)
    plot_model(model)
    model.compile(
        # optimizer=optimizers.SGD(momentum=0.9),
        optimizer=optimizers.Adam(1e-3),
        loss=losses.categorical_crossentropy,
        metrics=["acc"]
    )
    model.fit(
        train_generator,
        # 6552 training / 818 validation images, batch size 32
        steps_per_epoch=int(6552 / 32),
        epochs=100,
        validation_data=valid_generator,
        validation_steps=int(818 / 32)
    )