Initialization Methods for Layer Parameters in Caffe

Author: ShaowenLin
Blog: http://blog.csdn.net/wenlin33/article/details/53378613

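When learning or using Caffe, we work with prototxt files all the time: they describe the architecture of the whole deep learning model as well as the hyperparameters needed for training. In the Layer definitions of a model's prototxt, settings for weight_filler and bias_filler appear frequently.
For example, the conv layer in the prototxt of Caffe's MNIST example is described as: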

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

And the inner product layer:

layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

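As you can see, both layers specify how their parameters are initialized, and these settings are handled by Caffe's Fillers. A Filler fills the initial values of a network parameter blob using a specified algorithm, in most cases by drawing random values. It is only used during parameter initialization and does not involve the GPU. Below we go through the filler types available for layer parameters in Caffe and how each one initializes the blob.
First, here is the FillerParameter message as defined in the proto file; the corresponding header in the Caffe source is filler.hpp.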

message FillerParameter {
  // The filler type.
  optional string type = 1 [default = 'constant'];
  optional float value = 2 [default = 0]; // the value in constant filler
  optional float min = 3 [default = 0]; // the min value in uniform filler
  optional float max = 4 [default = 1]; // the max value in uniform filler
  optional float mean = 5 [default = 0]; // the mean value in Gaussian filler
  optional float std = 6 [default = 1]; // the std value in Gaussian filler
  // The expected number of non-zero output weights for a given input in
  // Gaussian filler -- the default -1 means don't perform sparsification.
  optional int32 sparse = 7 [default = -1];
  // Normalize the filler variance by fan_in, fan_out, or their average.
  // Applies to 'xavier' and 'msra' fillers.
  enum VarianceNorm {
    FAN_IN = 0;
    FAN_OUT = 1;
    AVERAGE = 2;
  }
  optional VarianceNorm variance_norm = 8 [default = FAN_IN];
}

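The message lists the filler parameters and their defaults: the default filler type is constant, value defaults to 0, the VarianceNorm of xavier and msra defaults to FAN_IN, and so on. Pay particular attention to sparse, which controls whether the initialized data is sparsified.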
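
The defaults matter because the prototxt snippets earlier only set type. A minimal sketch of the fallback behaviour, assuming a hypothetical Python dataclass that mirrors the proto fields (this is not pycaffe's API, only an illustration of the defaults listed in the message):

```python
from dataclasses import dataclass


@dataclass
class FillerParameter:
    """Hypothetical mirror of the proto message; defaults copied from it."""
    type: str = "constant"
    value: float = 0.0          # constant filler
    min: float = 0.0            # uniform filler
    max: float = 1.0            # uniform filler
    mean: float = 0.0           # Gaussian filler
    std: float = 1.0            # Gaussian filler
    sparse: int = -1            # -1 means no sparsification
    variance_norm: str = "FAN_IN"  # used by xavier and msra


# Equivalent of `bias_filler { type: "constant" }`: value falls back to 0.
bias_filler = FillerParameter(type="constant")
# Equivalent of `weight_filler { type: "xavier" }`: variance_norm is FAN_IN.
weight_filler = FillerParameter(type="xavier")
```

So a bias_filler that only says type: "constant" initializes the bias to 0, and a weight_filler that only says type: "xavier" uses FAN_IN.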

Constant
Effect: by default, initializes every blob coefficient x to 0. If value is set, value = a, then x = a. (sparse is not supported.)
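Gaussian
Effect: by default, draws the blob coefficients x from a Gaussian distribution with mean=0 and std=1, $x \sim N(mean, std^2)$. Both mean and std can be customized, and sparse is supported.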
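
To make the role of sparse in the Gaussian filler concrete, here is a rough pure-Python sketch. It assumes the blob is a num_outputs x fan_in matrix and that each weight is kept with probability sparse / num_outputs, which is one way to obtain the "expected number of non-zero output weights for a given input" described in the proto comment. The function name and the list-of-lists layout are illustrative, not Caffe's code.

```python
import random


def gaussian_fill(num_outputs, fan_in, mean=0.0, std=1.0, sparse=-1, seed=0):
    """Return a num_outputs x fan_in matrix drawn from N(mean, std^2)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(mean, std) for _ in range(fan_in)]
               for _ in range(num_outputs)]
    if sparse >= 0:
        # Assumption: keep each weight with probability sparse / num_outputs,
        # so each input column has `sparse` non-zero weights on average.
        keep_prob = sparse / num_outputs
        for row in weights:
            for j in range(fan_in):
                if rng.random() >= keep_prob:
                    row[j] = 0.0
    return weights


dense = gaussian_fill(10, 50)                       # sparse = -1: no zeroing
sparse_weights = gaussian_fill(10, 50, std=0.01, sparse=3)
```

With sparse left at -1 the matrix stays dense; with sparse = 3 and 10 outputs, roughly 30% of the entries survive.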
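Positive_unitball
Effect: initializes the blob coefficients so that $x \in [0, 1]$ and $\forall i \sum_j x_{ij} = 1$, that is, the entries belonging to each output i are non-negative and sum to 1. (sparse is not supported.)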
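
One simple way to satisfy the positive_unitball constraint is to draw uniform values in [0, 1) and divide each row by its sum. The sketch below assumes exactly that procedure; the function name and the num x dim layout are made up for illustration.

```python
import random


def positive_unitball_fill(num, dim, seed=0):
    """Return num rows of dim non-negative values, each row summing to 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(num):
        row = [rng.random() for _ in range(dim)]   # uniform in [0, 1)
        total = sum(row)
        rows.append([v / total for v in row])      # normalize row i
    return rows


rows = positive_unitball_fill(num=4, dim=6)
```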
Uniform
Effect: by default, draws the blob coefficients x from a uniform distribution with min=0 and max=1, $x \sim U(min, max)$. (sparse is not supported.)
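Xavier (the blob shape assumed below does not hold for inner product layers)
Effect: draws the blob coefficients x from the uniform distribution $x \sim U(-s, +s)$, where $s$ = sqrt(3 / n). (sparse is not supported.)
Assume the blob being filled has shape (num, a, b, c). The value of n depends on variance_norm, with three cases:
- FAN_IN: the default. Here n = a * b * c
- FAN_OUT: n = num * b * c
- AVERAGE: n = (fan_in + fan_out) / 2, using the two values above
  Reference paper [Glorot and Bengio 2010]: Understanding the difficulty of training deep feedforward neural networks.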
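
The bound sqrt(3 / n) follows from the variance of a uniform distribution: U(-s, +s) has variance s^2 / 3, so s = sqrt(3 / n) gives variance 1 / n. A minimal sketch of the computation, assuming a 4-D blob shape (num, a, b, c); the helper names are hypothetical, and the shape (20, 1, 5, 5) assumes conv1 sees a single-channel MNIST image:

```python
import math
import random


def fan_n(shape, variance_norm="FAN_IN"):
    """n for a blob of shape (num, a, b, c) under the given variance_norm."""
    count = math.prod(shape)
    fan_in = count // shape[0]                              # a * b * c
    fan_out = count // shape[1] if len(shape) > 1 else count  # num * b * c
    if variance_norm == "FAN_IN":
        return fan_in
    if variance_norm == "FAN_OUT":
        return fan_out
    if variance_norm == "AVERAGE":
        return (fan_in + fan_out) / 2
    raise ValueError(f"unknown variance_norm: {variance_norm}")


def xavier_fill(shape, variance_norm="FAN_IN", seed=0):
    """Return (flat list of values drawn from U(-scale, scale), scale)."""
    scale = math.sqrt(3.0 / fan_n(shape, variance_norm))
    rng = random.Random(seed)
    values = [rng.uniform(-scale, scale) for _ in range(math.prod(shape))]
    return values, scale


conv1_shape = (20, 1, 5, 5)   # num_output=20, 1 input channel, 5x5 kernel
values, scale = xavier_fill(conv1_shape)
```

For this shape fan_in is 25 and fan_out is 500, so the default FAN_IN setting draws from a noticeably wider interval than FAN_OUT would.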
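Msra (the blob shape (num, a, b, c) assumed below does not hold for inner product layers)
Effect: draws the blob coefficients x from the Gaussian distribution $x \sim N(0, \sigma^2)$, where $\sigma$ = sqrt(2 / n). As with Xavier, n takes one of three values for a blob of shape (num, a, b, c):
- FAN_IN: the default. Here n = a * b * c
- FAN_OUT: n = num * b * c
- AVERAGE: n = (fan_in + fan_out) / 2, using the two values above
  Reference paper [He, Zhang, Ren and Sun 2015], which specifically accounts for ReLU nonlinearities.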
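
The Msra rule can be sketched the same way, with a Gaussian in place of the uniform distribution. This is an illustrative pure-Python version under the same (num, a, b, c) shape assumption; msra_std and msra_fill are hypothetical names, and the 64 x 3 x 3 x 3 shape is just an example.

```python
import math
import random


def msra_std(shape, variance_norm="FAN_IN"):
    """sigma = sqrt(2 / n) for a blob of shape (num, a, b, c)."""
    count = math.prod(shape)
    fan_in = count // shape[0]
    fan_out = count // shape[1] if len(shape) > 1 else count
    n = {
        "FAN_IN": fan_in,
        "FAN_OUT": fan_out,
        "AVERAGE": (fan_in + fan_out) / 2,
    }[variance_norm]
    return math.sqrt(2.0 / n)


def msra_fill(shape, variance_norm="FAN_IN", seed=0):
    """Return a flat list of values drawn from N(0, sigma^2)."""
    sigma = msra_std(shape, variance_norm)
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(math.prod(shape))]


shape = (64, 3, 3, 3)          # example: 64 filters, 3 channels, 3x3 kernel
weights = msra_fill(shape)
```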
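Bilinear
Effect: typically used in a Deconvolution layer to perform upsampling. In the example below, the {{...}} parts are template placeholders to replace with your own values: factor is the upsampling factor, and setting both num_output and group to the channel count C makes the layer operate channel-wise.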

layer {
  name: "upsample", type: "Deconvolution"
  bottom: "{{bottom_name}}" top: "{{top_name}}"
  convolution_param {
    kernel_size: {{2 * factor - factor % 2}} stride: {{factor}}
    num_output: {{C}} group: {{C}}
    pad: {{ceil((factor - 1) / 2.)}}
    weight_filler: { type: "bilinear" } bias_term: false
  }
  param { lr_mult: 0 decay_mult: 0 }
}
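
To see what the placeholders in the Bilinear example evaluate to, the sketch below computes kernel_size, stride and pad for a given factor and builds a 2-D bilinear interpolation kernel. The weight formula is the standard bilinear upsampling kernel as I understand it from filler.hpp; treat it as an assumption rather than a verbatim copy of the source. All function names are hypothetical.

```python
import math


def upsample_params(factor):
    """kernel_size, stride, pad as given by the template expressions."""
    kernel_size = 2 * factor - factor % 2
    pad = math.ceil((factor - 1) / 2.0)
    return kernel_size, factor, pad


def deconv_output_size(n, kernel_size, stride, pad):
    """Spatial output size of a deconvolution applied to n input pixels."""
    return stride * (n - 1) + kernel_size - 2 * pad


def bilinear_kernel(kernel_size):
    """kernel_size x kernel_size bilinear interpolation weights."""
    f = math.ceil(kernel_size / 2.0)
    c = (2 * f - 1 - f % 2) / (2.0 * f)
    return [[(1 - abs(x / f - c)) * (1 - abs(y / f - c))
             for x in range(kernel_size)]
            for y in range(kernel_size)]


kernel_size, stride, pad = upsample_params(2)
kernel = bilinear_kernel(kernel_size)
```

With these settings the deconvolution output is exactly factor times the input size, which is why lr_mult and decay_mult are set to 0 in the example: the weights are meant to stay a fixed interpolation kernel.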

Source code reference: GitHub link