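AlexNet (the paper itself never names the network, so who came up with the name?) is the network created by Alex Krizhevsky et al. that won first place in the 2012 ImageNet classification task. More than four years have passed, plenty of analyses have been written, and newer, better networks keep appearing, but as a classic this model still has a lot worth discussing.
I. Network structure
AlexNet has 8 layers (counting only layers with trainable parameters): 5 convolutional layers and 3 fully connected layers. It uses ReLU, an activation function that converges faster, and applies LRN and dropout to reduce overfitting. Its simple structure and rich set of tricks make it well suited to study and experimentation, and many papers use AlexNet as an experimental subject or as a baseline for comparison. The overall structure is as follows: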
| op | size/stride | output | padding | value |
|---|---|---|---|---|
| Input | | 227*227*3 | | |
| conv | 11*11/4 | 55*55*96 | VALID | |
| relu | | | | |
| lrn | | | | |
| pool | 3*3/2 | 27*27*96 | VALID | |
| conv | 5*5/1 | 27*27*256 | SAME | |
| relu | | | | |
| lrn | | | | |
| pool | 3*3/2 | 13*13*256 | | |
| conv | 3*3/1 | 13*13*384 | SAME | |
| relu | | | | |
| conv | 3*3/1 | 13*13*384 | SAME | |
| relu | | | | |
| conv | 3*3/1 | 13*13*256 | SAME | |
| relu | | | | |
| pool | 3*3/2 | 6*6*256 | | |
| reshape | | | | |
| fc | | 1*1*4096 | | |
| relu | | | | |
| dropout | | | | 0.5 |
| fc | | 1*1*4096 | | |
| relu | | | | |
| dropout | | | | 0.5 |
| fc | | 1*1*1000 | | |
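The table does not follow the conventional way of drawing an architecture diagram. It lists every op in execution order, so that readers can implement the network directly from it.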
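The output sizes in the table can be re-derived mechanically. The following is a minimal sketch, not part of the original implementation: the helper names `out_size`, `LAYERS` and `trace_shapes` are made up for illustration, the size formulas are the TensorFlow SAME/VALID conventions discussed in this post, and the pooling layers are assumed to use VALID padding, as in the code listing later in this post.

```python
import math


def out_size(size, kernel, stride, padding):
    """Spatial output size under TensorFlow's SAME/VALID conventions."""
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        return math.ceil((size - kernel + 1) / stride)
    raise ValueError("unknown padding: " + padding)


# (op, kernel, stride, padding, output channels).
# None means the op keeps the channel count (pooling).
LAYERS = [
    ("conv1", 11, 4, "VALID", 96),
    ("pool1", 3, 2, "VALID", None),
    ("conv2", 5, 1, "SAME", 256),
    ("pool2", 3, 2, "VALID", None),
    ("conv3", 3, 1, "SAME", 384),
    ("conv4", 3, 1, "SAME", 384),
    ("conv5", 3, 1, "SAME", 256),
    ("pool3", 3, 2, "VALID", None),
]


def trace_shapes(input_size=227, input_channels=3):
    """Return (op, height, width, channels) after every conv/pool op."""
    size, channels = input_size, input_channels
    shapes = []
    for name, kernel, stride, padding, out_channels in LAYERS:
        size = out_size(size, kernel, stride, padding)
        if out_channels is not None:
            channels = out_channels
        shapes.append((name, size, size, channels))
    return shapes


if __name__ == "__main__":
    for name, h, w, c in trace_shapes():
        print(f"{name}: {h}*{w}*{c}")
```

relu, lrn and dropout are left out because they do not change the shape. The last entry, 6*6*256, is what the reshape op flattens into a vector of 9216 values before the first fully connected layer.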
Beyond that, there are three things to watch out for with AlexNet:
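1. Caffe's model zoo includes CaffeNet, whose structure is essentially the same as AlexNet except that the order of LRN and pooling is swapped. According to the Caffe issues, this was an oversight by the CaffeNet author when reproducing the network; he meant to implement an identical one. I don't blame him: I read the paper very carefully, and it really is not clear on this point. CaffeNet is nevertheless widely used. Why? In terms of implementation, swapping LRN and pooling does not change the overall structure of the network. On the contrary, the swap saves computation: in AlexNet, LRN is computed first and many of the resulting values are then discarded by the pooling that follows. GoogLeNet also pools first and applies LRN afterwards.
2. In the paper, the model input is 224*224*3, while the feature map output by the first convolutional layer is 55*55*96 = 290400. The two do not match, and after checking, this appears to be a mistake on the authors' part.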
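3. The first convolutional layer uses an 11*11 kernel with stride=4. Under TensorFlow's convolution arithmetic, padding=SAME gives an output of ceil(224/4)=56 and padding=VALID gives ceil((224-11+1)/4)=54, so 55 is impossible either way. After looking into it, I find the explanation in CS231 the most convincing.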
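The arithmetic in point 3 is easy to reproduce. The sketch below is only an illustration; the function names `conv_output_size` and `first_layer_sizes` are hypothetical, and the two formulas are the SAME/VALID rules quoted above. It evaluates the first layer (11*11 kernel, stride 4) for both candidate input sizes, 224 and 227.

```python
import math


def conv_output_size(input_size, kernel, stride, padding):
    """Output width/height of a convolution under TensorFlow's padding rules."""
    if padding == "SAME":
        return math.ceil(input_size / stride)
    if padding == "VALID":
        return math.ceil((input_size - kernel + 1) / stride)
    raise ValueError("unknown padding: " + padding)


def first_layer_sizes(kernel=11, stride=4):
    """Map (input size, padding) to the output size of the first conv layer."""
    return {
        (size, padding): conv_output_size(size, kernel, stride, padding)
        for size in (224, 227)
        for padding in ("SAME", "VALID")
    }


if __name__ == "__main__":
    for (size, padding), out in sorted(first_layer_sizes().items()):
        print(f"input {size}, padding {padding}: output {out}")
```

Of the four combinations, only a 227*227 input with VALID padding produces the 55*55 feature map reported in the paper.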
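It was exactly these doubts, "an inexplicable urge that drives me to keep searching", that pushed me on. Looking for the answers gave me a deeper understanding of both the theory and the network structure, and I think this is what study and research should feel like.
II. Model implementation
I went through a lot of material, and many TensorFlow implementations of AlexNet do not fully follow the original paper. Perhaps because of the two puzzles mentioned above, they all deviate from it to some degree. So I wrote my own version, trying to follow the paper strictly and to stay consistent with it in the details. The code is as follows: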
# Written against the TensorFlow 1.x API (tf.truncated_normal, tf.name_scope).
import numpy as np
import tensorflow as tf

# Number of output classes; 1000 matches the last fc layer in the table above.
NUM_CLASSES = 1000


def inference(images):
    """A reimplementation of AlexNet.

    images: float32 tensor of shape [batch, 227, 227, 3].
    Returns the unnormalized logits of shape [batch, NUM_CLASSES].
    """
    # conv1
    with tf.name_scope('conv1') as scope:
        kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 96], dtype=tf.float32, stddev=1e-2), name='weights')
        conv = tf.nn.conv2d(images, kernel, [1, 4, 4, 1], padding='VALID')
        # According to the paper, initializing biases with 1 may accelerate
        # learning. The paper does this for conv2, conv4, conv5 and the hidden
        # fc layers (0 elsewhere); this code uses 1.0 in every layer.
        biases = tf.Variable(tf.constant(1.0, shape=[96], dtype=tf.float32), name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(bias, name=scope)
    # lrn1
    # Paper: k=2, n=5, alpha=1e-4, beta=0.75. depth_radius is the half-width
    # of the window, so the paper's n=5 corresponds to depth_radius=2.
    # (In tflearn the default bias is 1.0.)
    lrn1 = tf.nn.local_response_normalization(conv1, depth_radius=2, bias=2, alpha=1e-4, beta=0.75, name='lrn1')
    # max_pool
    pool1 = tf.nn.max_pool(lrn1, [1, 3, 3, 1], [1, 2, 2, 1], padding='VALID', name='pool1')

    # conv2
    with tf.name_scope('conv2') as scope:
        kernel = tf.Variable(tf.truncated_normal([5, 5, 96, 256], dtype=tf.float32, stddev=1e-2), name='weights')
        conv = tf.nn.conv2d(pool1, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(1.0, shape=[256], dtype=tf.float32), name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(bias, name=scope)
    # lrn2
    lrn2 = tf.nn.local_response_normalization(conv2, depth_radius=2, bias=2, alpha=1e-4, beta=0.75, name='lrn2')
    # max_pool
    pool2 = tf.nn.max_pool(lrn2, [1, 3, 3, 1], [1, 2, 2, 1], padding='VALID', name='pool2')

    # conv3
    with tf.name_scope('conv3') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 384], dtype=tf.float32, stddev=1e-2), name='weights')
        conv = tf.nn.conv2d(pool2, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(1.0, shape=[384], dtype=tf.float32), name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv3 = tf.nn.relu(bias, name=scope)

    # conv4
    with tf.name_scope('conv4') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 384], dtype=tf.float32, stddev=1e-2), name='weights')
        conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(1.0, shape=[384], dtype=tf.float32), name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv4 = tf.nn.relu(bias, name=scope)

    # conv5
    with tf.name_scope('conv5') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 256], dtype=tf.float32, stddev=1e-2), name='weights')
        conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(1.0, shape=[256], dtype=tf.float32), name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv5 = tf.nn.relu(bias, name=scope)
    # max_pool
    pool3 = tf.nn.max_pool(conv5, [1, 3, 3, 1], [1, 2, 2, 1], padding='VALID', name='pool3')

    # fully connected layers
    # fc1
    with tf.name_scope('fc1') as scope:
        # Flatten [batch, 6, 6, 256] into [batch, 9216].
        shape = pool3.get_shape().as_list()
        dim = int(np.prod(shape[1:]))
        flat = tf.reshape(pool3, [-1, dim])
        weight = tf.Variable(tf.truncated_normal([dim, 4096], dtype=tf.float32, stddev=1e-2), name='weights')
        fc = tf.matmul(flat, weight)
        biases = tf.Variable(tf.constant(1.0, shape=[4096], dtype=tf.float32), name='biases')
        bias = fc + biases
        fc1 = tf.nn.relu(bias, name=scope)
    # dropout with keep probability 0.5; it is applied unconditionally here,
    # so this graph corresponds to training-time behavior.
    dropout1 = tf.nn.dropout(fc1, 0.5)

    # fc2
    with tf.name_scope('fc2') as scope:
        weight = tf.Variable(tf.truncated_normal([4096, 4096], dtype=tf.float32, stddev=1e-2), name='weights')
        fc = tf.matmul(dropout1, weight)
        biases = tf.Variable(tf.constant(1.0, shape=[4096], dtype=tf.float32), name='biases')
        bias = fc + biases
        fc2 = tf.nn.relu(bias, name=scope)
    # dropout
    dropout2 = tf.nn.dropout(fc2, 0.5)

    # fc3: from 4096 units to the number of classes; no ReLU on the output.
    with tf.name_scope('fc3') as scope:
        weight = tf.Variable(tf.truncated_normal([4096, NUM_CLASSES], dtype=tf.float32, stddev=1e-2), name='weights')
        fc = tf.matmul(dropout2, weight)
        biases = tf.Variable(tf.constant(1.0, shape=[NUM_CLASSES], dtype=tf.float32), name='biases')
        bias = fc + biases
    return bias
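To make the LRN step less of a black box, here is a small pure-Python sketch of the normalization for a single spatial position. It is an illustration under stated assumptions rather than TensorFlow's actual implementation: the function name `lrn_at_position` is made up, the constants are the ones the paper reports (k=2, n=5, alpha=1e-4, beta=0.75), and the window is expressed as a half-width, which is how TensorFlow's `depth_radius` is defined, so the paper's n=5 corresponds to a half-width of 2.

```python
def lrn_at_position(activations, depth_radius=2, bias=2.0, alpha=1e-4, beta=0.75):
    """Local response normalization across channels at one pixel.

    activations: list of channel values a[0..N-1] at a single (x, y).
    Each value is divided by (bias + alpha * sum of squares of its
    neighbouring channels) ** beta, with the window clipped at the edges.
    """
    n_channels = len(activations)
    normalized = []
    for i, a_i in enumerate(activations):
        lo = max(0, i - depth_radius)
        hi = min(n_channels - 1, i + depth_radius)
        square_sum = sum(a * a for a in activations[lo:hi + 1])
        normalized.append(a_i / (bias + alpha * square_sum) ** beta)
    return normalized


if __name__ == "__main__":
    # Eight channels with identical activations: edge channels see a
    # smaller window than the channels in the middle.
    print(lrn_at_position([1.0] * 8))
```

Because the window is clipped at the first and last channels, edge channels are divided by a slightly smaller denominator than channels in the middle.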
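Notes:
1. The paper splits the network into two branches only because GPU memory was insufficient; current implementations merge them into a single network.
2. The input issue was discussed in the previous section: to make the first convolutional layer output a 55*55*96 feature map, the input here is changed to 227*227*3.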
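As a sanity check on the merged single-network layout, the trainable parameters can be counted directly from the weight shapes used in the code. This is a minimal sketch: the names `CONV_LAYERS`, `FC_LAYERS` and `count_parameters` are hypothetical, and the shapes are copied from the listing above with 1000 output classes assumed.

```python
# (name, kernel height, kernel width, input channels, output channels)
CONV_LAYERS = [
    ("conv1", 11, 11, 3, 96),
    ("conv2", 5, 5, 96, 256),
    ("conv3", 3, 3, 256, 384),
    ("conv4", 3, 3, 384, 384),
    ("conv5", 3, 3, 384, 256),
]

# (name, input units, output units); fc1 takes the flattened 6*6*256 map.
FC_LAYERS = [
    ("fc1", 6 * 6 * 256, 4096),
    ("fc2", 4096, 4096),
    ("fc3", 4096, 1000),
]


def count_parameters():
    """Return a dict of layer name -> number of weights plus biases."""
    counts = {}
    for name, kh, kw, c_in, c_out in CONV_LAYERS:
        counts[name] = kh * kw * c_in * c_out + c_out
    for name, n_in, n_out in FC_LAYERS:
        counts[name] = n_in * n_out + n_out
    return counts


if __name__ == "__main__":
    counts = count_parameters()
    for name, n in counts.items():
        print(f"{name}: {n}")
    print("total:", sum(counts.values()))
```

The total comes to about 62.4 million, slightly above the roughly 60 million the paper reports. A plausible reason is that in the two-GPU layout some convolutional layers only see the feature maps on their own GPU, whereas the merged network connects every layer to all channels. Most of the parameters sit in the three fully connected layers.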