Computer Vision Study Notes: AlexNet Principles and a TensorFlow Implementation


interstellar-ai


Article link: https://blog.csdn.net/m0_38128647/article/details/80158381


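The AlexNet architecture, proposed by Alex Krizhevsky and his co-authors in 2012, set off the current wave of neural network applications. It won the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and established the CNN as the core model for image classification.

AlexNet has 60 million parameters and 650,000 neurons. It consists of 5 convolutional layers, some of which are followed by a max-pooling layer, and 3 fully connected layers, ending in a 1000-way softmax.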


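My walkthrough differs slightly from the paper: the paper splits the network into two parts (one per GPU) for training, whereas the sizes computed here treat the two parts as one merged network.
Note: convolution and pooling use the same output-size formula: (imageSize + 2*paddingSize - filterSize) / strideSize + 1 = newfeatureSize. The two differ only in rounding, and the convention depends on the framework: Caffe rounds down for convolution and up for pooling, while TensorFlow's 'VALID' padding rounds down for both. With a 227*227 input, every division in the layers below is exact, so the rounding convention does not change any of the sizes.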
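
The formula above can be checked with a few lines of Python. This is a minimal sketch: the function name `output_size` and its `round_up` flag are illustrative names chosen here, not part of any library.

```python
import math


def output_size(image_size, filter_size, stride, padding=0, round_up=False):
    """Spatial output size of a convolution or pooling window.

    Implements (imageSize + 2*paddingSize - filterSize) / strideSize + 1,
    rounding the division down by default or up when round_up is True.
    """
    span = image_size + 2 * padding - filter_size
    steps = math.ceil(span / stride) if round_up else span // stride
    return steps + 1


conv1 = output_size(227, 11, 4)            # first convolution, no padding
pool1 = output_size(conv1, 3, 2)           # 3*3 max pooling, stride 2
conv2 = output_size(pool1, 5, 1, padding=2)
conv1_from_224 = output_size(224, 11, 4)   # what a raw 224 input would give

print(conv1, pool1, conv2, conv1_from_224)
```

With a 227*227 input the first convolution gives 55, whereas a 224*224 input gives 54 when rounding down, which is why the 227 figure is used in the sizes that follow.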
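Layer 1 (convolutional):
1. The network input is nominally a 224*224*3 image; in practice, preprocessing resizes it to 227*227*3.
2. The kernel size is 11*11*3, with 96 kernels in total (two groups of 48 in the paper), no padding, stride 4, outputSize = 55*55*96. The convolution is followed by ReLU.
3. Local response normalization with a window of size 5 (the paper normalizes across 5 adjacent channels, not over a 5*5 spatial window); the feature size is unchanged.
4. Max pooling with a 3*3 window and stride 2; the output is 27*27*96.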

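Layer 2 (convolutional):
1. inputSize = 27*27*96
2. The kernel size is 5*5*96, with 256 kernels in total, padding 2, stride 1, outputSize = 27*27*256. The convolution is followed by ReLU.
3. Local response normalization with a window of size 5 (across adjacent channels); the feature size is unchanged.
4. Max pooling with a 3*3 window and stride 2; the output is 13*13*256.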

Layer 3 (convolutional):
1. inputSize = 13*13*256
2. kernelSize = 3*3*256, 384 kernels in total, paddingSize = 1, strideSize = 1, outputSize = 13*13*384. The convolution is followed by ReLU.

Layer 4 (convolutional):
1. inputSize = 13*13*384
2. kernelSize = 3*3*384, 384 kernels in total, paddingSize = 1, strideSize = 1, outputSize = 13*13*384. The convolution is followed by ReLU.

Layer 5 (convolutional):
1. inputSize = 13*13*384
2. kernelSize = 3*3*384, 256 kernels in total, paddingSize = 1, strideSize = 1, outputSize = 13*13*256. The convolution is followed by ReLU.
3. Max pooling with a 3*3 window, no padding, and stride 2; the output is 6*6*256.

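Layer 6 (fully connected):
1. inputSize = 6*6*256
2. kernelSize = 6*6*256, 4096 kernels in total, no padding, no stride, outputSize = 4096*1, followed by ReLU.
3. Dropout is applied, and the layer outputs 4096 values.
Because the layer 6 filter (6*6*256) has the same size as the feature map it processes (6*6*256), each filter coefficient is multiplied by exactly one value of the feature map. In the other convolutional layers, each filter coefficient is multiplied by many feature-map values as the filter slides across the input. This is why layer 6 is called a fully connected layer.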
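
The equivalence described above can be seen on a toy example. The sketch below uses a 2*2 single-channel feature map in place of the real 6*6*256 one; `valid_conv2d` is a hypothetical helper written only for this illustration.

```python
def valid_conv2d(feature_map, kernel):
    """Single-channel, stride-1, no-padding convolution on nested lists
    (cross-correlation, as deep learning frameworks compute it)."""
    h, w = len(feature_map), len(feature_map[0])
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(feature_map[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(w - kw + 1)
        ]
        for i in range(h - kh + 1)
    ]


feature_map = [[1.0, 2.0], [3.0, 4.0]]
kernel = [[0.5, -1.0], [2.0, 0.25]]  # same size as the feature map

# The filter fits exactly once, so the output is a single value ...
conv_out = valid_conv2d(feature_map, kernel)

# ... equal to the dot product of the flattened input and weights,
# which is what one fully connected neuron computes.
flat_dot = sum(x * k
               for row_x, row_k in zip(feature_map, kernel)
               for x, k in zip(row_x, row_k))

print(conv_out, flat_dot)
```

Each of the 4096 filters in layer 6 behaves like one such neuron, only over 6*6*256 = 9216 inputs instead of 4.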

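Layer 7 (fully connected):
1. inputSize = 4096*1
2. 4096 neurons, fully connected to the 4096 neurons of layer 6 and followed by ReLU.
3. Dropout is applied, and the layer outputs 4096 values.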

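Layer 8 (fully connected):
1. inputSize = 4096*1
2. 1000 neurons, fully connected to the 4096 neurons of layer 7. This layer is followed by the 1000-way softmax rather than ReLU.
3. The softmax output is the predicted distribution over the 1000 classes.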
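
As a sanity check on the "60 million parameters" figure, the layer sizes listed above can be summed directly. This is a rough sketch under the assumptions of this article: the two GPU halves are merged, every layer has one bias per output channel, and the fully connected layers are written as kernels (6*6 for layer 6, 1*1 for layers 7 and 8).

```python
# (name, kernel_height, kernel_width, in_channels, out_channels)
layers = [
    ("conv1", 11, 11, 3, 96),
    ("conv2", 5, 5, 96, 256),
    ("conv3", 3, 3, 256, 384),
    ("conv4", 3, 3, 384, 384),
    ("conv5", 3, 3, 384, 256),
    ("fc6", 6, 6, 256, 4096),
    ("fc7", 1, 1, 4096, 4096),
    ("fc8", 1, 1, 4096, 1000),
]


def param_count(kh, kw, c_in, c_out):
    """Weights plus one bias per output channel."""
    return kh * kw * c_in * c_out + c_out


counts = {name: param_count(*shape) for name, *shape in layers}
total = sum(counts.values())

# In the paper's two-GPU layout, conv2, conv4 and conv5 each see only half
# of the input channels, which halves their weight counts.
split_weights = (5 * 5 * 96 * 256 + 3 * 3 * 384 * 384 + 3 * 3 * 384 * 256) // 2
grouped_total = total - split_weights

for name, n in counts.items():
    print(f"{name}: {n:,}")
print(f"merged total: {total:,}  two-GPU total: {grouped_total:,}")
```

The merged layout comes to about 62.4 million parameters and the two-GPU layout to about 61 million, both consistent with the rounded figure of 60 million. Most of them sit in layers 6 and 7.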

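Some training details, following the paper:
The network is trained with stochastic gradient descent, with batch size = 128, momentum 0.9, and weight decay = 0.0005. The authors found that this small amount of weight decay was important for the model to learn.
The weights in every layer are initialized from a zero-mean Gaussian distribution with standard deviation 0.01. The neuron biases in the second, fourth, and fifth convolutional layers, as well as in the fully connected hidden layers, are initialized with the constant 1. This speeds up the early stages of learning by providing the ReLUs with positive inputs. The neuron biases in the remaining layers are initialized with the constant 0.
The same learning rate is used for all layers and is adjusted manually throughout training. The heuristic is to divide the learning rate by 10 when the validation error rate stops improving at the current learning rate. The learning rate is initialized at 0.01 and reduced three times before termination. The authors trained the network for roughly 90 passes over the training set of 1.2 million images, which took five to six days on two NVIDIA GTX 580 3GB GPUs.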
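
The hyperparameters above combine into a single update step. The sketch below follows the update rule as given in the AlexNet paper (velocity = momentum * velocity - weight_decay * lr * w - lr * grad), applied to one scalar weight for clarity. The function name `sgd_step` is an illustrative choice, and the list of learning rates simply assumes three divisions by 10 starting from 0.01.

```python
def sgd_step(w, v, grad, lr, momentum=0.9, weight_decay=0.0005):
    """One SGD update with momentum and weight decay for a scalar weight.

    w: current weight, v: current velocity, grad: gradient of the loss
    with respect to w averaged over the batch, lr: learning rate.
    Returns the updated (weight, velocity) pair.
    """
    v_new = momentum * v - weight_decay * lr * w - lr * grad
    return w + v_new, v_new


# Learning rates visited when 0.01 is divided by 10 three times.
learning_rates = [0.01 / 10 ** k for k in range(4)]

w, v = 1.0, 0.0
w, v = sgd_step(w, v, grad=0.5, lr=learning_rates[0])
print(w, v)
```

Note that the weight decay term is scaled by the learning rate, so it shrinks along with the learning rate each time the rate is divided by 10.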

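TensorFlow implementation:
The code below is written against the TensorFlow 1.x graph-mode API. First, define a function that prints the name and output shape of each layer:

import tensorflow as tf  # TensorFlow 1.x API (tf.truncated_normal, t.op.name)

def print_activations(t):
  print(t.op.name, ' ', t.get_shape().as_list())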

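Next comes the code for the convolutional layers. Note that this code uses 64, 192, 384, 256, and 256 output channels for conv1 through conv5, and 'SAME' padding for conv1, so its layer sizes differ from the ones described above: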

def inference(images):
  """Build the AlexNet model.

  Args:
    images: Images Tensor

  Returns:
    pool5: the last Tensor in the convolutional component of AlexNet.
    parameters: a list of Tensors corresponding to the weights and biases of the
        AlexNet model.
  """
  parameters = []
  # conv1
  with tf.name_scope('conv1') as scope:
    kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 64], dtype=tf.float32,
                                             stddev=1e-1), name='weights')
    conv = tf.nn.conv2d(images, kernel, [1, 4, 4, 1], padding='SAME')
    biases = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32),
                         trainable=True, name='biases')
    bias = tf.nn.bias_add(conv, biases)
    conv1 = tf.nn.relu(bias, name=scope)
    print_activations(conv1)
    parameters += [kernel, biases]

  # lrn1
  with tf.name_scope('lrn1') as scope:
    lrn1 = tf.nn.local_response_normalization(conv1,
                                              alpha=1e-4,
                                              beta=0.75,
                                              depth_radius=2,
                                              bias=2.0)

  # pool1
  pool1 = tf.nn.max_pool(lrn1,
                         ksize=[1, 3, 3, 1],
                         strides=[1, 2, 2, 1],
                         padding='VALID',
                         name='pool1')
  print_activations(pool1)

  # conv2
  with tf.name_scope('conv2') as scope:
    kernel = tf.Variable(tf.truncated_normal([5, 5, 64, 192], dtype=tf.float32,
                                             stddev=1e-1), name='weights')
    conv = tf.nn.conv2d(pool1, kernel, [1, 1, 1, 1], padding='SAME')
    biases = tf.Variable(tf.constant(0.0, shape=[192], dtype=tf.float32),
                         trainable=True, name='biases')
    bias = tf.nn.bias_add(conv, biases)
    conv2 = tf.nn.relu(bias, name=scope)
    parameters += [kernel, biases]
  print_activations(conv2)

  # lrn2
  with tf.name_scope('lrn2') as scope:
    lrn2 = tf.nn.local_response_normalization(conv2,
                                              alpha=1e-4,
                                              beta=0.75,
                                              depth_radius=2,
                                              bias=2.0)

  # pool2
  pool2 = tf.nn.max_pool(lrn2,
                         ksize=[1, 3, 3, 1],
                         strides=[1, 2, 2, 1],
                         padding='VALID',
                         name='pool2')
  print_activations(pool2)

  # conv3
  with tf.name_scope('conv3') as scope:
    kernel = tf.Variable(tf.truncated_normal([3, 3, 192, 384],
                                             dtype=tf.float32,
                                             stddev=1e-1), name='weights')
    conv = tf.nn.conv2d(pool2, kernel, [1, 1, 1, 1], padding='SAME')
    biases = tf.Variable(tf.constant(0.0, shape=[384], dtype=tf.float32),
                         trainable=True, name='biases')
    bias = tf.nn.bias_add(conv, biases)
    conv3 = tf.nn.relu(bias, name=scope)
    parameters += [kernel, biases]
    print_activations(conv3)

  # conv4
  with tf.name_scope('conv4') as scope:
    kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 256],
                                             dtype=tf.float32,
                                             stddev=1e-1), name='weights')
    conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
    biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32),
                         trainable=True, name='biases')
    bias = tf.nn.bias_add(conv, biases)
    conv4 = tf.nn.relu(bias, name=scope)
    parameters += [kernel, biases]
    print_activations(conv4)

  # conv5
  with tf.name_scope('conv5') as scope:
    kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 256],
                                             dtype=tf.float32,
                                             stddev=1e-1), name='weights')
    conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
    biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32),
                         trainable=True, name='biases')
    bias = tf.nn.bias_add(conv, biases)
    conv5 = tf.nn.relu(bias, name=scope)
    parameters += [kernel, biases]
    print_activations(conv5)

  # pool5
  pool5 = tf.nn.max_pool(conv5,
                         ksize=[1, 3, 3, 1],
                         strides=[1, 2, 2, 1],
                         padding='VALID',
                         name='pool5')
  print_activations(pool5)

  return pool5, parameters
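
To see which shapes `print_activations` would report without building the graph, the padding modes used in the code above can be traced in plain Python. This is a sketch under two assumptions: the input is 224*224 (the excerpt does not fix the image size), and the output sizes follow TensorFlow's documented rules, ceil(input / stride) for 'SAME' and ceil((input - window + 1) / stride) for 'VALID'. The names `OPS` and `trace_shapes` are illustrative.

```python
import math

# (name, padding mode, window size, stride), in the order used by inference()
OPS = [
    ("conv1", "SAME", 11, 4),
    ("pool1", "VALID", 3, 2),
    ("conv2", "SAME", 5, 1),
    ("pool2", "VALID", 3, 2),
    ("conv3", "SAME", 3, 1),
    ("conv4", "SAME", 3, 1),
    ("conv5", "SAME", 3, 1),
    ("pool5", "VALID", 3, 2),
]


def trace_shapes(input_size):
    """Return the spatial size (height == width) after each op."""
    sizes = {}
    size = input_size
    for name, padding, window, stride in OPS:
        if padding == "SAME":
            size = math.ceil(size / stride)
        else:  # "VALID"
            size = math.ceil((size - window + 1) / stride)
        sizes[name] = size
    return sizes


print(trace_shapes(224))
```

Under these assumptions pool5 comes out as 6*6 with 256 channels, the same spatial size as the 6*6*256 input to layer 6 described earlier, even though the intermediate channel counts differ.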

The full code is available in the TensorFlow models repository on GitHub:

https://github.com/tensorflow/models/blob/master/tutorials/image/alexnet/alexnet_benchmark.py

This article draws on the following post:

https://blog.csdn.net/zyqdragon/article/details/72353420
