Weight Initialization Methods in Caffe

faithenXX

Category: caffe

Article link: https://blog.csdn.net/zyf19930610/article/details/71502169

Caffe provides the following weight initialization methods (fillers):

namespace caffe {

// Factory: returns the filler that matches the `type` field of the
// FillerParameter. The caller owns the returned pointer.
template <typename Dtype>
Filler<Dtype>* GetFiller(const FillerParameter& param) {
  const std::string& type = param.type();
  if (type == "constant") {
    return new ConstantFiller<Dtype>(param);
  } else if (type == "gaussian") {
    return new GaussianFiller<Dtype>(param);
  } else if (type == "positive_unitball") {
    return new PositiveUnitballFiller<Dtype>(param);
  } else if (type == "uniform") {
    return new UniformFiller<Dtype>(param);
  } else if (type == "xavier") {
    return new XavierFiller<Dtype>(param);
  } else if (type == "msra") {
    return new MSRAFiller<Dtype>(param);
  } else if (type == "bilinear") {
    return new BilinearFiller<Dtype>(param);
  } else {
    // Abort with an error message on an unrecognized type name.
    CHECK(false) << "Unknown filler name: " << param.type();
  }
  return (Filler<Dtype>*)(NULL);
}

}  // namespace caffe

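Each of them is explained below:

1.1 Xavier initialization (normalized initialization)

When the activation function is sigmoid, standard initialization tends to perform poorly: convergence is slow and training easily gets trapped in a local optimum.

"xavier" initialization is an effective way to initialize neural networks. It comes from the 2010 paper by Xavier Glorot and Yoshua Bengio, "Understanding the difficulty of training deep feedforward neural networks", and works well with activation functions such as tanh.

Main idea: keep the variance of each layer's activations (forward pass) and gradients (backward pass) as close to equal as possible across layers.

Assumption of the derivation: the activation function is linear. ReLU and PReLU do not satisfy this condition.

Let n be the input dimension and m the output dimension of the layer that holds the parameters. The weights are then drawn uniformly from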

$$ W \sim U\left[-\sqrt{\frac{6}{n+m}},\ \sqrt{\frac{6}{n+m}}\right] $$
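The bound can be recovered from a short variance argument. The following is a sketch under the linear-activation assumption stated above, additionally assuming zero-mean, independent weights and inputs; here x is a layer input, y a layer output, and a the half-width of the uniform interval (symbols introduced only for this sketch):

```latex
\begin{aligned}
\text{forward:}\quad & \mathrm{Var}(y) = n\,\mathrm{Var}(W)\,\mathrm{Var}(x)
  \;\Rightarrow\; n\,\mathrm{Var}(W) = 1 \\
\text{backward:}\quad & m\,\mathrm{Var}(W) = 1 \\
\text{compromise:}\quad & \mathrm{Var}(W) = \frac{2}{n+m} \\
W \sim U[-a, a]:\quad & \mathrm{Var}(W) = \frac{a^{2}}{3}
  \;\Rightarrow\; a = \sqrt{\frac{6}{n+m}}
\end{aligned}
```

Both conditions cannot hold at once unless n = m, which is why the average of the two dimensions is used.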

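Caffe offers three variants:

(1) Default (FillerParameter_VarianceNorm_FAN_IN): only the input dimension is considered

$$ W \sim U\left[-\sqrt{\frac{3}{n}},\ \sqrt{\frac{3}{n}}\right] $$

(2) FillerParameter_VarianceNorm_FAN_OUT: only the output dimension is considered

$$ W \sim U\left[-\sqrt{\frac{3}{m}},\ \sqrt{\frac{3}{m}}\right] $$

(3) FillerParameter_VarianceNorm_AVERAGE: the average of the input and output dimensions is used

$$ W \sim U\left[-\sqrt{\frac{6}{n+m}},\ \sqrt{\frac{6}{n+m}}\right] $$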
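The three variants differ only in which dimension is plugged into the bound. The sketch below is a standalone illustration using the C++ standard library, not Caffe's own implementation; the names `VarianceNorm`, `xavier_bound`, and `xavier_fill` are hypothetical and chosen just for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for the three normalization options listed above.
enum class VarianceNorm { FAN_IN, FAN_OUT, AVERAGE };

// Half-width a of the sampling interval for the chosen option.
// fan_in is n (input dimension), fan_out is m (output dimension).
double xavier_bound(VarianceNorm norm, double fan_in, double fan_out) {
  assert(fan_in > 0 && fan_out > 0);
  double n = fan_in;  // default: only the input dimension
  if (norm == VarianceNorm::FAN_OUT) {
    n = fan_out;
  } else if (norm == VarianceNorm::AVERAGE) {
    n = (fan_in + fan_out) / 2.0;  // sqrt(3 / n) then equals sqrt(6 / (n + m))
  }
  return std::sqrt(3.0 / n);
}

// Draw `count` weights uniformly from the interval between -a and a.
std::vector<double> xavier_fill(std::size_t count, VarianceNorm norm,
                                double fan_in, double fan_out,
                                unsigned seed = 0) {
  const double a = xavier_bound(norm, fan_in, fan_out);
  std::mt19937 rng(seed);
  std::uniform_real_distribution<double> dist(-a, a);
  std::vector<double> w(count);
  for (double& v : w) v = dist(rng);
  return w;
}
```

For instance, with n = 300 and m = 100 the default variant yields a bound of sqrt(3/300) = 0.1, so every sampled weight lies within 0.1 of zero.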

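1.2 MSRA initialization

It comes from the 2015 paper by Kaiming He, a researcher at MSRA (Microsoft Research Asia):

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

The traditional Gaussian initialization with a fixed variance makes the model hard to converge as the network gets deeper.
One remedy is to initialize some layers of the network from a pre-trained model. Xavier is also a good initialization method, but it relies on the assumption that the activation function is linear. MSRA initialization takes ReLU and PReLU into account.

MSRA initialization draws the weights from a Gaussian distribution with mean 0 and variance $\frac{2}{n}$, that is, a standard deviation of $\sqrt{\frac{2}{n}}$: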

$$ W \sim G\left[0,\ \sqrt{\frac{2}{n}}\right] $$
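The factor 2 comes from the rectifier. As a sketch, assuming zero-mean weights and pre-activations y that are symmetric around zero (so that ReLU zeroes about half of them), with l indexing the layer and x_l = max(0, y_{l-1}) (symbols introduced only for this sketch):

```latex
\begin{aligned}
\mathbb{E}\left[x_l^{2}\right] &= \tfrac{1}{2}\,\mathrm{Var}(y_{l-1}) \\
\mathrm{Var}(y_l) &= n\,\mathrm{Var}(W)\,\mathbb{E}\left[x_l^{2}\right]
                   = \tfrac{1}{2}\,n\,\mathrm{Var}(W)\,\mathrm{Var}(y_{l-1}) \\
\tfrac{1}{2}\,n\,\mathrm{Var}(W) = 1
  \;&\Rightarrow\; \mathrm{Var}(W) = \frac{2}{n},
  \qquad \sigma = \sqrt{\frac{2}{n}}
\end{aligned}
```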

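As with Xavier, there are three options:

(1) By default, n is the input dimension of the layer

(2) n is the output dimension of the layer

(3) n is the average of the input and output dimensions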
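The three options can be sketched in the same way as for Xavier, with a Gaussian in place of the uniform distribution. This is a standalone illustration built on the C++ standard library rather than Caffe's own code; `VarianceNorm`, `msra_stddev`, and `msra_fill` are hypothetical names used only here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for the three choices of n listed above.
enum class VarianceNorm { FAN_IN, FAN_OUT, AVERAGE };

// Standard deviation sqrt(2 / n) for the chosen option.
double msra_stddev(VarianceNorm norm, double fan_in, double fan_out) {
  assert(fan_in > 0 && fan_out > 0);
  double n = fan_in;  // default: input dimension
  if (norm == VarianceNorm::FAN_OUT) {
    n = fan_out;
  } else if (norm == VarianceNorm::AVERAGE) {
    n = (fan_in + fan_out) / 2.0;
  }
  return std::sqrt(2.0 / n);
}

// Draw `count` weights from a zero-mean Gaussian with that standard deviation.
std::vector<double> msra_fill(std::size_t count, VarianceNorm norm,
                              double fan_in, double fan_out,
                              unsigned seed = 0) {
  const double sigma = msra_stddev(norm, fan_in, fan_out);
  std::mt19937 rng(seed);
  std::normal_distribution<double> dist(0.0, sigma);
  std::vector<double> w(count);
  for (double& v : w) v = dist(rng);
  return w;
}
```

With n = 200 the standard deviation is sqrt(2/200) = 0.1, so the sample variance of a large batch of weights should come out close to 0.01.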

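1.3 Other initializations

(1) "constant": constant initialization

(2) "gaussian": Gaussian initialization with a fixed variance

(3) "positive_unitball": every value lies in [0, 1], and for each i:

$$ \forall i:\ \sum_j x_{ij} = 1 $$

(4) "uniform": uniform initialization

(5) "bilinear": bilinear initialization, typically used for deconvolution kernels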
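Two of these fillers are less obvious than the others, so a rough sketch may help. The code below is a standalone illustration, not Caffe's implementation: `positive_unitball_fill` and `bilinear_kernel` are hypothetical names, and the bilinear formula is the commonly used upsampling-kernel formula as I understand it, which should be checked against Caffe's filler source before relying on it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// "positive_unitball" sketch: fill a rows x cols matrix (row-major) with
// values in [0, 1], then rescale each row i so that sum_j x_ij = 1.
std::vector<double> positive_unitball_fill(std::size_t rows, std::size_t cols,
                                           unsigned seed = 0) {
  assert(rows > 0 && cols > 0);
  std::mt19937 rng(seed);
  std::uniform_real_distribution<double> dist(0.0, 1.0);
  std::vector<double> x(rows * cols);
  for (double& v : x) v = dist(rng);
  for (std::size_t i = 0; i < rows; ++i) {
    double sum = 0.0;
    for (std::size_t j = 0; j < cols; ++j) sum += x[i * cols + j];
    assert(sum > 0.0);  // an all-zero row is practically impossible
    for (std::size_t j = 0; j < cols; ++j) x[i * cols + j] /= sum;
  }
  return x;
}

// "bilinear" sketch: one height x width kernel for bilinear upsampling.
// Assumes a square kernel whose size matches the upsampling factor f,
// i.e. size = 2 * f - f % 2 (for example size 4 for f = 2).
std::vector<double> bilinear_kernel(int height, int width) {
  assert(height > 0 && width > 0);
  const int f = static_cast<int>(std::ceil(width / 2.0));
  const double c = (2 * f - 1 - f % 2) / (2.0 * f);  // normalized center
  std::vector<double> k(static_cast<std::size_t>(height) * width);
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      k[static_cast<std::size_t>(y) * width + x] =
          (1.0 - std::fabs(x / static_cast<double>(f) - c)) *
          (1.0 - std::fabs(y / static_cast<double>(f) - c));
    }
  }
  return k;
}
```

For a 4 x 4 kernel the one-dimensional profile is 0.25, 0.75, 0.75, 0.25, and each kernel entry is the product of the profile values for its row and column.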
