Caffe Learning (5): Vision Layers

Yan_Joy

Article link: https://blog.csdn.net/Yan_Joy/article/details/53057403

The previous post covered data layers; this one covers vision layers. It draws on the official Caffe site and on other bloggers' posts.

Vision Layers
Caffe Learning Series (3): Vision Layers and Their Parameters, denny402
Caffe Source Code Analysis 5: Conv_Layer, 楼燚航's blog

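Vision layers usually take images as input and produce other images as output. The input image may be grayscale (channels C=1) or RGB (channels C=3). An image also has a two-dimensional spatial structure, with height $h>1$ and width $w>1$. Most vision layers apply a particular operation to a region of the input to produce the corresponding region of the output, which is somewhat like traditional digital image processing. In contrast, other layers usually ignore the spatial structure of the input and treat it as one big vector of dimension $c \times h \times w$.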
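
To make the contrast concrete, here is a minimal Python sketch of how a layer that ignores spatial structure could view a c x h x w image as one flat vector. The row-major ordering and the helper names `flat_index` and `flatten` are assumptions made for illustration; they are not taken from Caffe's source.

```python
def flat_index(ch, row, col, h, w):
    """Offset of element (ch, row, col) in a row-major vector of length c*h*w (assumed layout)."""
    return (ch * h + row) * w + col


def flatten(image):
    """Flatten a nested list image[c][h][w] into one list of length c*h*w."""
    return [value for channel in image for row in channel for value in row]


# A tiny RGB "image" (c=3, h=2, w=2) filled with distinct values.
c, h, w = 3, 2, 2
image = [[[ch * 100 + r * 10 + col for col in range(w)]
          for r in range(h)]
         for ch in range(c)]
vector = flatten(image)
```

After flattening, neighbouring pixels in the same column end up `w` positions apart, so a layer working on `vector` has no direct notion of which values were spatially adjacent.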

Convolution Layer

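Layer type: Convolution
CPU implementation: ./src/caffe/layers/convolution_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/convolution_layer.cu
Parameters (ConvolutionParameter convolution_param):
- Required
  - num_output (c_o): the number of convolution kernels (filters).
  - kernel_size (or kernel_h and kernel_w): the kernel size; use kernel_h and kernel_w for non-square kernels.
- Recommended
  - weight_filler [default type: 'constant' value: 0]: how the kernels are initialized. The default is all zeros; the "xavier" algorithm or "gaussian" can be used instead.
- Optional
  - bias_term [default true]: whether to add a bias term; enabled by default.
  - pad (or pad_h and pad_w) [default 0]: zero padding; the default of 0 means no padding. Zeros are added around the input so that the kernel can also be applied at the image border. Padding is symmetric left/right and top/bottom: with a 5*5 kernel and pad set to 2, each of the four edges grows by 2 pixels, so the width and height each grow by 4 pixels, and with a stride of 1 the output has the same size as the input.
  - stride (or stride_h and stride_w) [default 1]: the step with which the kernel moves; the default is 1.
  - group (g) [default 1]: the number of filter groups; the default is 1. If it is greater than 1, the connectivity of the convolution is restricted to a subset of the input: the input and output channels are split into g groups, and the i-th output group is connected only to the i-th input group. Groups were introduced mainly to connect the input and output channels of the convolution layer selectively; otherwise there would be too many parameters.
    
    It was there to implement the grouped convolution in Alex Krizhevsky’s paper: when group=2, the first half of the filters are only connected to the first half of the input channels, and the second half only connected to the second half.
    
Input: n * c_i * h_i * w_i
Output: n * c_o * h_o * w_o, where h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1 (integer division) and w_o likewise.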
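
The output-size formula is easy to check with a few lines of Python. This is a small sketch rather than Caffe code: the helper name `conv_output_size` is made up here, and the 32-pixel and 227-pixel input sizes are assumed values chosen only for illustration.

```python
def conv_output_size(in_size, kernel, pad=0, stride=1):
    """h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1, using integer (floor) division."""
    return (in_size + 2 * pad - kernel) // stride + 1


# 5x5 kernel with pad 2 and stride 1: the output keeps the input size.
same = conv_output_size(32, kernel=5, pad=2, stride=1)

# 11x11 kernel, stride 4, no padding, on an assumed 227-pixel-wide input.
conv1 = conv_output_size(227, kernel=11, stride=4)
```

With these assumed inputs, `same` is 32 and `conv1` is 55. The same function applies to the width by passing w_i, pad_w, kernel_w and stride_w.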

Example (./models/bvlc_reference_caffenet/train_val.prototxt)

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  # learning rate and decay multipliers for the filters
  param { lr_mult: 1 decay_mult: 1 }
  # learning rate and decay multipliers for the biases
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 96     # learn 96 filters
    kernel_size: 11    # each filter is 11x11
    stride: 4          # step 4 pixels between each filter application
    weight_filler {
      type: "gaussian" # initialize the filters from a Gaussian
      std: 0.01        # distribution with stdev 0.01 (default mean: 0)
    }
    bias_filler {
      type: "constant" # initialize the biases to zero (0)
      value: 0
    }
  }
}
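
To see why grouping reduces the number of parameters, the learnable values of a convolution layer can be counted directly. The sketch below is illustrative only: `conv_param_count` is a hypothetical helper, the 3-channel input for conv1 is an assumption (an RGB image), and the 256-filter layer is a made-up example rather than something taken from this post.

```python
def conv_param_count(num_output, in_channels, kernel_h, kernel_w,
                     group=1, bias_term=True):
    """Count the learnable values of a convolution layer.

    With group > 1, each filter only sees in_channels / group input channels.
    """
    if in_channels % group or num_output % group:
        raise ValueError("group must divide both in_channels and num_output")
    weights = num_output * (in_channels // group) * kernel_h * kernel_w
    biases = num_output if bias_term else 0
    return weights + biases


# conv1 above: 96 filters of 11x11 over an assumed 3-channel (RGB) input.
conv1_params = conv_param_count(96, 3, 11, 11)

# Hypothetical layer: 256 filters of 5x5 over 96 input channels.
ungrouped = conv_param_count(256, 96, 5, 5, group=1)
grouped = conv_param_count(256, 96, 5, 5, group=2)
```

Under these assumptions conv1 has 34944 learnable values, and setting group=2 on the hypothetical layer halves its weight count (614656 values without grouping, 307456 with it, biases included).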

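The convolution layer convolves the input image with a set of learnable filters, each of which produces one feature map in the output image.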
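
The following is a deliberately naive Python sketch of that operation, written for readability and not for speed. It assumes no padding and group=1, applies each filter without flipping it (cross-correlation), and uses the made-up name `conv2d_forward`; it is not how Caffe implements the layer.

```python
def conv2d_forward(image, filters, biases, stride=1):
    """Naive convolution forward pass (no padding, group=1).

    image:   nested list [c_i][h_i][w_i]
    filters: nested list [c_o][c_i][k_h][k_w]
    biases:  list of length c_o
    Returns a nested list [c_o][h_o][w_o], one feature map per filter.
    """
    c_i, h_i, w_i = len(image), len(image[0]), len(image[0][0])
    k_h, k_w = len(filters[0][0]), len(filters[0][0][0])
    h_o = (h_i - k_h) // stride + 1
    w_o = (w_i - k_w) // stride + 1
    output = []
    for filt, bias in zip(filters, biases):
        fmap = [[bias] * w_o for _ in range(h_o)]
        for y in range(h_o):
            for x in range(w_o):
                total = bias
                # Sum over every input channel and every kernel position.
                for ch in range(c_i):
                    for dy in range(k_h):
                        for dx in range(k_w):
                            pixel = image[ch][y * stride + dy][x * stride + dx]
                            total += filt[ch][dy][dx] * pixel
                fmap[y][x] = total
        output.append(fmap)
    return output


# One-channel 3x3 input and two 2x2 filters.
image = [[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]
filters = [
    [[[1, 0], [0, 1]]],  # sums the main diagonal of each window
    [[[1, 1], [1, 1]]],  # sums all four values of each window
]
feature_maps = conv2d_forward(image, filters, biases=[0, 0])
```

Two filters give two 2x2 feature maps here, which matches num_output being the channel count c_o of the output.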

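Pooling Layer

A Pooling layer usually follows a convolution (Conv) layer in the network and downsamples its output. This further shrinks the feature map and also enlarges the receptive field of the neurons.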

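Layer type: Pooling
CPU implementation: ./src/caffe/layers/pooling_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/pooling_layer.cu
Parameters (PoolingParameter pooling_param):
- Required
  - kernel_size (or kernel_h and kernel_w): the size of the pooling kernel.
- Optional
  - pool [default MAX]: the pooling method: MAX (the default), AVE, or STOCHASTIC.
  - pad (or pad_h and pad_w) [default 0]: zero padding.
  - stride (or stride_h and stride_w) [default 1]: the step size.
- Input: n * c * h_i * w_i
- Output: n * c * h_o * w_o, where h_o and w_o are computed as for the convolution layer, except that Caffe's pooling implementation rounds up rather than down when the stride does not divide evenly.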

Example (./models/bvlc_reference_caffenet/train_val.prototxt)

layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3 # pool over a 3x3 region
    stride: 2      # step two pixels (in the bottom blob) between pooling regions
  }
}
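
As a rough illustration of what MAX pooling with kernel_size 3 and stride 2 does to a single feature map, here is a small Python sketch. The function name `max_pool2d` and the 5x5 input are invented for this example, and it assumes no padding and an input size for which the windows fit exactly, so that rounding of the output size does not matter.

```python
def max_pool2d(fmap, kernel, stride):
    """Max-pool one 2D feature map (a list of rows), without padding."""
    h_i, w_i = len(fmap), len(fmap[0])
    h_o = (h_i - kernel) // stride + 1
    w_o = (w_i - kernel) // stride + 1
    pooled = []
    for y in range(h_o):
        row = []
        for x in range(w_o):
            # Collect the kernel x kernel window whose top-left corner
            # sits at (y * stride, x * stride) in the input.
            window = [fmap[y * stride + dy][x * stride + dx]
                      for dy in range(kernel)
                      for dx in range(kernel)]
            row.append(max(window))
        pooled.append(row)
    return pooled


fmap = [
    [1, 3, 2, 0, 1],
    [4, 6, 5, 1, 0],
    [7, 2, 0, 3, 2],
    [0, 1, 4, 8, 5],
    [2, 3, 1, 6, 7],
]
pooled = max_pool2d(fmap, kernel=3, stride=2)
```

The 5x5 map shrinks to 2x2, and because the 3x3 windows overlap (stride 2 is smaller than the kernel), one input value can be the maximum of more than one output cell.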

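Local Response Normalization (LRN) Layer

The local response normalization layer performs a kind of "lateral inhibition" by normalizing over local input regions. Its effect seems somewhat similar to feature scaling, which gives gradient descent the same curvature in all directions. LRN is also simpler to compute than normalizing the input of every neuron.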

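Layer type: LRN
CPU implementation: ./src/caffe/layers/lrn_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/lrn_layer.cu
Parameters (LRNParameter lrn_param):
- Optional
  - local_size [default 5]: the number of channels to sum over (for cross-channel LRN), or the side length of the square region to sum over (for within-channel LRN).
  - alpha [default 1]: the scaling parameter.
  - beta [default 0.75]: the exponent.
  - norm_region [default ACROSS_CHANNELS]: ACROSS_CHANNELS sums and normalizes over adjacent channels with no spatial extent, so the region has size local_size x 1 x 1. WITHIN_CHANNEL sums and normalizes over a spatial region inside a single channel, so the region has size 1 x local_size x local_size. Each input value is divided by $(1 + (\alpha/n) \sum_i x_i^2)^\beta$, where $n$ is the size of each local region and the sum is taken over the region centered at that value.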

Example:

layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
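
The normalization formula from the norm_region description can be sketched in plain Python for the ACROSS_CHANNELS case, using the parameter values of the norm1 example. This is an illustrative sketch, not Caffe's implementation: the name `lrn_across_channels` is made up, and it assumes that channels missing at the edges contribute zero to the sum while n stays equal to local_size.

```python
def lrn_across_channels(blob, local_size=5, alpha=0.0001, beta=0.75):
    """Cross-channel LRN for one image stored as a nested list blob[c][h][w].

    Each value is divided by (1 + (alpha / n) * sum_i x_i^2) ** beta, where the
    sum runs over up to local_size neighbouring channels at the same (h, w).
    """
    c, h, w = len(blob), len(blob[0]), len(blob[0][0])
    half = local_size // 2
    out = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for ch in range(c):
        # Channel window centered at ch, clipped at the edges
        # (assumption: missing neighbours count as zeros).
        lo, hi = max(0, ch - half), min(c, ch + half + 1)
        for y in range(h):
            for x in range(w):
                sq_sum = sum(blob[k][y][x] ** 2 for k in range(lo, hi))
                scale = (1.0 + (alpha / local_size) * sq_sum) ** beta
                out[ch][y][x] = blob[ch][y][x] / scale
    return out


# Six channels of a 1x1 "image" with values 1.0 .. 6.0.
blob = [[[float(ch + 1)]] for ch in range(6)]
normalized = lrn_across_channels(blob, local_size=5, alpha=0.0001, beta=0.75)
```

With an alpha this small the scale factor stays close to 1, so every value is only slightly reduced; larger activations in neighbouring channels reduce it more, which is the "lateral inhibition" effect.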