Initialization of Layer Parameters in Caffe

信大傲飞

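When learning or using Caffe, we often work with prototxt files. A prototxt file typically describes the architecture of the whole deep learning model, along with some of the hyperparameters needed for training. In the prototxt description of certain layers, we frequently see weight_filler and bias_filler settings.
For example, the convolution layer in Caffe's MNIST prototxt is described as follows: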

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

And here is the description of an InnerProduct layer:

layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

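As you can see, both layers include settings for initializing the layer parameters, and these settings are handled by Caffe's Fillers. A Filler fills in the initial values of the network parameters stored in a blob, using either a constant or values generated randomly by a specific algorithm. It acts only during parameter initialization and does not involve the GPU. Below we go through the layer parameter initialization types in Caffe one by one and how each of them initializes a blob.
First, here is the FillerParameter message as defined in the proto file (caffe.proto); the corresponding header in the Caffe source is filler.hpp.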

message FillerParameter {
  // The filler type.
  optional string type = 1 [default = 'constant'];
  optional float value = 2 [default = 0]; // the value in constant filler
  optional float min = 3 [default = 0]; // the min value in uniform filler
  optional float max = 4 [default = 1]; // the max value in uniform filler
  optional float mean = 5 [default = 0]; // the mean value in Gaussian filler
  optional float std = 6 [default = 1]; // the std value in Gaussian filler
  // The expected number of non-zero output weights for a given input in
  // Gaussian filler -- the default -1 means don't perform sparsification.
  optional int32 sparse = 7 [default = -1];
  // Normalize the filler variance by fan_in, fan_out, or their average.
  // Applies to 'xavier' and 'msra' fillers.
  enum VarianceNorm {
    FAN_IN = 0;
    FAN_OUT = 1;
    AVERAGE = 2;
  }
  optional VarianceNorm variance_norm = 8 [default = FAN_IN];
}

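From the above we can see the filler parameters and their default values: the default filler type is constant, value defaults to 0, variance_norm for xavier and msra defaults to FAN_IN, and so on. Pay particular attention to the sparse parameter, which controls whether the initialized data is sparsified (the default of -1 means no sparsification).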

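1. Constant
Purpose: by default, initializes every blob coefficient x to 0. If value is set, say value = a, then x = a. (sparse is not supported)

2. Gaussian
Purpose: by default, initializes the blob coefficients x from a Gaussian distribution with mean = 0 and std = 1, that is $x \sim N(\mathrm{mean}, \mathrm{std}^2)$. Both mean and std can be customized, and sparse is supported.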
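
To make the role of sparse more concrete, here is a minimal Python sketch of a Gaussian fill with optional sparsification. It is not Caffe's code: the function name gaussian_fill and the flat-list representation are made up for illustration, and the masking rule (keep each value with probability sparse / shape[0]) is my reading of the "expected number of non-zero output weights for a given input" comment in the proto, so treat it as an assumption.

```python
import random

def gaussian_fill(shape, mean=0.0, std=1.0, sparse=-1, seed=None):
    """Return a flat list of prod(shape) samples from N(mean, std^2).

    Assumption: when sparse >= 0, each sample is kept with probability
    sparse / shape[0] and set to zero otherwise (a Bernoulli mask).
    """
    rng = random.Random(seed)
    count = 1
    for dim in shape:
        count *= dim
    values = [rng.gauss(mean, std) for _ in range(count)]
    if sparse >= 0:
        keep_prob = sparse / shape[0]
        values = [v if rng.random() < keep_prob else 0.0 for v in values]
    return values

# Dense fill for a (20, 1, 5, 5) blob, then a sparsified one.
dense = gaussian_fill((20, 1, 5, 5), mean=0.0, std=0.01, seed=0)
sparse_fill = gaussian_fill((20, 1, 5, 5), std=0.01, sparse=5, seed=0)
```

With sparse = 5 and 20 outputs, roughly a quarter of the entries are expected to stay non-zero under this assumption.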

3. Positive_unitball
Purpose: initializes the blob coefficients x so that $x \in [0, 1]$ and $\forall i: \sum_j x_{ij} = 1$. (sparse is not supported)
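
The two conditions above can be satisfied by sampling uniformly in [0, 1] and then dividing each row by its sum. The sketch below shows that idea in plain Python; the function name and the list-of-rows layout are hypothetical, and I am assuming that index i runs over the first blob axis while j runs over everything else.

```python
import random

def positive_unitball_fill(num, dim, seed=None):
    """Return num rows of dim non-negative values, each row summing to 1.

    Assumption: a row corresponds to one index i along the first blob
    axis, and dim is the product of the remaining axes.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(num):
        row = [rng.random() for _ in range(dim)]
        total = sum(row)
        rows.append([v / total for v in row])
    return rows

rows = positive_unitball_fill(num=10, dim=25, seed=0)
```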

4. Uniform
Purpose: by default, initializes the blob coefficients x from a uniform distribution with min = 0 and max = 1, that is $x \sim U(\min, \max)$. (sparse is not supported)

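5. Xavier
Purpose: initializes the blob coefficients x from the uniform distribution $x \sim U(-a, +a)$, where $a = \sqrt{3 / n}$. (sparse is not supported)
Assume the blob being filled has shape (num, a, b, c); the shape dimension a here is unrelated to the bound a above. Note that the weight blob of an inner product layer does not have this shape, even though the filler can still be specified for such a layer, as in the ip2 example earlier. The value of n depends on variance_norm, with three cases:

FAN_IN: the default. In this case n = a * b * c
FAN_OUT: n = num * b * c
AVERAGE: n = (FAN_IN + FAN_OUT) / 2

Reference: [Bengio and Glorot 2010], Understanding the difficulty of training deep feedforward neural networks.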
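
As a worked example of the three cases, the following Python sketch computes n and the bound sqrt(3 / n) for a blob of shape (num, a, b, c) and then samples from the resulting uniform distribution. The helper names are made up for illustration. The shape (20, 1, 5, 5) corresponds to conv1 above only if the input has a single channel, which is an assumption on my part (grayscale MNIST).

```python
import math
import random

def fan(shape, variance_norm="FAN_IN"):
    """Return n for a blob of shape (num, a, b, c)."""
    num, a, b, c = shape
    fan_in = a * b * c
    fan_out = num * b * c
    if variance_norm == "FAN_IN":
        return fan_in
    if variance_norm == "FAN_OUT":
        return fan_out
    if variance_norm == "AVERAGE":
        return (fan_in + fan_out) / 2.0
    raise ValueError("unknown variance_norm: " + variance_norm)

def xavier_fill(shape, variance_norm="FAN_IN", seed=None):
    """Return (bound, values) with values drawn from U(-bound, +bound)."""
    bound = math.sqrt(3.0 / fan(shape, variance_norm))
    rng = random.Random(seed)
    count = shape[0] * shape[1] * shape[2] * shape[3]
    return bound, [rng.uniform(-bound, bound) for _ in range(count)]

# Assumed conv1 weight shape: 20 outputs, 1 input channel, 5x5 kernel.
bound, weights = xavier_fill((20, 1, 5, 5), seed=0)
```

For this shape FAN_IN gives n = 25, FAN_OUT gives n = 500, and AVERAGE gives n = 262.5, so the default bound is sqrt(3 / 25), about 0.346.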
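
6. Msra
Purpose: initializes the blob coefficients x from the Gaussian distribution $x \sim N(0, \sigma^2)$, where $\sigma = \sqrt{2 / n}$. As with Xavier, the blob being filled is assumed to have shape (num, a, b, c), which is not the shape of an inner product layer's weight blob, and n takes one of three values depending on variance_norm:

FAN_IN: the default. In this case n = a * b * c
FAN_OUT: n = num * b * c
AVERAGE: n = (FAN_IN + FAN_OUT) / 2

Reference: [He, Zhang, Ren and Sun 2015], which specifically accounts for ReLU nonlinearities.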
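
The same calculation can be sketched for the msra case. The only differences from the uniform Xavier case are the constant (2 instead of 3) and the distribution (Gaussian instead of uniform). The function names are again hypothetical, and the example shape is only an illustration.

```python
import math
import random

def msra_std(shape, variance_norm="FAN_IN"):
    """Return sigma = sqrt(2 / n) for a blob of shape (num, a, b, c)."""
    num, a, b, c = shape
    fan_in = a * b * c
    fan_out = num * b * c
    n = {
        "FAN_IN": fan_in,
        "FAN_OUT": fan_out,
        "AVERAGE": (fan_in + fan_out) / 2.0,
    }[variance_norm]
    return math.sqrt(2.0 / n)

def msra_fill(shape, variance_norm="FAN_IN", seed=None):
    """Return a flat list of samples from N(0, sigma^2)."""
    std = msra_std(shape, variance_norm)
    rng = random.Random(seed)
    count = shape[0] * shape[1] * shape[2] * shape[3]
    return [rng.gauss(0.0, std) for _ in range(count)]

# Illustrative shape: 64 outputs, 16 input channels, 3x3 kernel.
weights = msra_fill((64, 16, 3, 3), seed=0)
```

Here FAN_IN gives n = 16 * 3 * 3 = 144, so sigma = sqrt(2 / 144), about 0.118.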
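
7. Bilinear
Purpose: typically used in a Deconvolution layer to perform upsampling. An example follows, where the {{...}} placeholders are template values to be filled in for a given upsampling factor and channel count C: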

layer {
  name: "upsample", type: "Deconvolution"
  bottom: "{{bottom_name}}" top: "{{top_name}}"
  convolution_param {
    kernel_size: {{2 * factor - factor % 2}} stride: {{factor}}
    num_output: {{C}} group: {{C}}
    pad: {{ceil((factor - 1) / 2.)}}
    weight_filler: { type: "bilinear" } bias_term: false
  }
  param { lr_mult: 0 decay_mult: 0 }
}
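
To see what the template expressions evaluate to, the sketch below computes kernel_size, stride and pad for a given factor, and checks that a deconvolution with those settings enlarges the input by exactly that factor. The output-size formula assumes no dilation. The 1D weight function reflects the usual bilinear interpolation kernel as I understand it, not code copied from Caffe, and all function names are made up for illustration.

```python
import math

def bilinear_upsample_params(factor):
    """Evaluate the template expressions for a given upsampling factor."""
    kernel_size = 2 * factor - factor % 2
    stride = factor
    pad = math.ceil((factor - 1) / 2.0)
    return kernel_size, stride, pad

def deconv_output_size(in_size, kernel_size, stride, pad):
    """Output length of a deconvolution along one axis (no dilation)."""
    return stride * (in_size - 1) + kernel_size - 2 * pad

def bilinear_kernel_1d(kernel_size):
    """Assumed 1D bilinear weights; the 2D kernel is their outer product."""
    f = math.ceil(kernel_size / 2.0)
    center = (kernel_size - 1) / (2.0 * f)
    return [1 - abs(x / f - center) for x in range(kernel_size)]

kernel_size, stride, pad = bilinear_upsample_params(2)
out_size = deconv_output_size(16, kernel_size, stride, pad)
weights_1d = bilinear_kernel_1d(kernel_size)
```

For factor = 2 this gives kernel_size = 4, stride = 2 and pad = 1, so a 16-pixel axis becomes 32 pixels, and the 1D weights are [0.25, 0.75, 0.75, 0.25].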