Understanding the Caffe Framework (2): A Detailed Look at AlexNet

Introduction

In 2012, Geoffrey Hinton and his student Alex Krizhevsky answered the skeptics by using AlexNet to shatter the image classification record in the ImageNet competition, establishing the standing of deep learning in computer vision. Here we use an analysis of this model to study the structure of Caffe.

The Structure of the AlexNet Model

The model file is models/bvlc_reference_caffenet/deploy.prototxt under the Caffe root directory (the commands below use a local copy named deploy-gph.prototxt); its contents are reproduced in Appendix 1. A diagram of the model can be generated with draw_net.py:
python python/draw_net.py models/bvlc_reference_caffenet/deploy-gph.prototxt examples/AlexNet-gph/pic/alexnet.png --rankdir=TB --phase=ALL
The resulting image is shown in Appendix 2.

Layer-by-Layer Analysis of the Model

1. The data structures of each layer

Enter the following in a terminal to prepare the environment:

gph@gph-pc:~ $ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import caffe
>>> import cv2
>>> import cv2.cv as cv
>>> caffe.set_mode_gpu()
>>> caffe_root = '/home/gph/Desktop/caffe-ssd/'
>>> model_def = caffe_root + 'models/bvlc_reference_caffenet/deploy-gph.prototxt'
>>> model_weights = caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'
>>> img_file = caffe_root + 'examples/images/cat.jpg'
>>> 

Load the model:

>>> net = caffe.Net(model_def, model_weights, caffe.TEST)

Display every layer together with the shapes of its data and diff blobs.

Enter:

>>> for layer, blob in net.blobs.iteritems():
...     print layer + ' ' + str(blob.data.shape) + ' ' + str(blob.diff.shape)
... 

The output is:

data (10, 3, 227, 227) (10, 3, 227, 227)
conv1 (10, 96, 55, 55) (10, 96, 55, 55)
pool1 (10, 96, 27, 27) (10, 96, 27, 27)
norm1 (10, 96, 27, 27) (10, 96, 27, 27)
conv2 (10, 256, 27, 27) (10, 256, 27, 27)
pool2 (10, 256, 13, 13) (10, 256, 13, 13)
norm2 (10, 256, 13, 13) (10, 256, 13, 13)
conv3 (10, 384, 13, 13) (10, 384, 13, 13)
conv4 (10, 384, 13, 13) (10, 384, 13, 13)
conv5 (10, 256, 13, 13) (10, 256, 13, 13)
pool5 (10, 256, 6, 6) (10, 256, 6, 6)
fc6 (10, 4096) (10, 4096)
fc7 (10, 4096) (10, 4096)
fc8 (10, 1000) (10, 1000)
prob (10, 1000) (10, 1000)
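
The batch size of 10 in every shape comes from the first dim of input_param in deploy.prototxt, and these blobs are only filled once a forward pass runs. As a minimal sketch, continuing the session above, we can push the cat image through the net with cv2; note that this skips the usual mean subtraction for brevity, which a real deployment should not (the class probabilities will be off):

>>> img = cv2.imread(img_file)                       # HxWx3, BGR order, as CaffeNet expects
>>> img = cv2.resize(img, (227, 227)).astype(np.float32)
>>> img = img.transpose(2, 0, 1)                     # reorder to CxHxW
>>> net.blobs['data'].data[...] = img                # broadcast over the batch of 10
>>> out = net.forward()
>>> out['prob'].shape
(10, 1000)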

Display the layers that carry weights.

Enter:

>>> for layer, param in net.params.iteritems():
...     print layer + ' ' + str(param[0].data.shape) + ' ' + str(param[1].data.shape)
... 

The output is:

conv1 (96, 3, 11, 11) (96,)
conv2 (256, 48, 5, 5) (256,)
conv3 (384, 256, 3, 3) (384,)
conv4 (384, 192, 3, 3) (384,)
conv5 (256, 192, 3, 3) (256,)
fc6 (4096, 9216) (4096,)
fc7 (4096, 4096) (4096,)
fc8 (1000, 4096) (1000,)

2. Analysis

Two kinds of data flow through a Caffe network:
One is the data being processed: it enters at the input layer, is transformed by each layer in turn, and emerges from the output layer. This data lives in the data field of the blobs in net.blobs; each blob's diff field holds the corresponding gradients. These two arrays are the ones we usually care about.
The other kind is the parameters each layer uses in its computation, namely the weights and the bias terms; for each layer with parameters, these are stored in param[0] and param[1] of its entry in net.params.
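
Both kinds are easy to inspect interactively; a small sketch using the net loaded above:

>>> net.blobs['conv1'].data.shape        # activations flowing through the net
(10, 96, 55, 55)
>>> net.blobs['conv1'].diff.shape        # gradients, always the same shape as data
(10, 96, 55, 55)
>>> net.params['conv1'][0].data.shape    # weights of conv1
(96, 3, 11, 11)
>>> net.params['conv1'][1].data.shape    # biases of conv1
(96,)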

In the AlexNet model, the layers that slide a kernel over their input may change the spatial size of the data. Convolution and pooling layers both take a kernel_size parameter, so both can change the size; whether they do depends on the parameters kernel_size, pad, and stride. The output size is

y = (x + 2*pad - kernel_size) / stride + 1

where x is the input size.
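
Plugging the numbers from deploy.prototxt into this formula reproduces the shapes printed earlier. A quick check (integer division, as in Caffe's convolution layer; Caffe's pooling layer actually rounds up rather than down, but the AlexNet sizes happen to divide evenly, so the same formula works here):

>>> def out_size(x, kernel_size, pad=0, stride=1):
...     return (x + 2 * pad - kernel_size) // stride + 1
... 
>>> out_size(227, 11, stride=4)    # data -> conv1
55
>>> out_size(55, 3, stride=2)      # conv1 -> pool1
27
>>> out_size(27, 5, pad=2)         # norm1 -> conv2: the padding preserves the size
27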

The shape of the weights stored by a convolution layer is determined by that layer's kernel_size, group, and num_output together with the previous layer's number of output channels: (this layer's num_output, previous layer's channels / group, kernel_h, kernel_w). The reason lies in how the convolution is actually carried out.
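
For example, conv2 declares num_output: 256, kernel_size: 5 and group: 2, and its input norm1 has 96 channels; with group: 2 each filter only sees 96 / 2 = 48 input channels, which gives the (256, 48, 5, 5) shape listed above. A small helper (hypothetical, just to verify the rule against the shapes printed earlier):

>>> def conv_weight_shape(num_output, prev_channels, kernel_size, group=1):
...     return (num_output, prev_channels // group, kernel_size, kernel_size)
... 
>>> conv_weight_shape(96, 3, 11)              # conv1
(96, 3, 11, 11)
>>> conv_weight_shape(256, 96, 5, group=2)    # conv2
(256, 48, 5, 5)
>>> conv_weight_shape(384, 384, 3, group=2)   # conv4
(384, 192, 3, 3)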

The weights of a fully connected layer are much simpler. Because every input connects to every output, the parameter shape depends only on the number of input neurons and the number of output neurons: (num_output, number of neurons in the previous layer).
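
fc6 illustrates this: pool5 outputs (10, 256, 6, 6), which the InnerProduct layer flattens to 256 * 6 * 6 = 9216 inputs per image, so its weight matrix is (4096, 9216):

>>> 256 * 6 * 6                        # pool5, flattened per image
9216
>>> net.params['fc6'][0].data.shape
(4096, 9216)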

Appendix 1: contents of deploy.prototxt

name: "CaffeNet"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 10 dim: 3 dim: 227 dim: 227 } }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  inner_product_param {
    num_output: 1000
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc8"
  top: "prob"
}

Appendix 2: the AlexNet model

[Figure: the AlexNet network structure drawn by draw_net.py]
