Implementing a Mish Layer in Caffe

zhaokai5

Tags: caffe, deep learning

Article link: https://blog.csdn.net/zhaokai5/article/details/108263542

Implementing a Mish Activation Layer with Caffe

Introduction to the Mish Activation Function
Implementing the Mish Activation Layer in Caffe
References

Introduction to the Mish Activation Function

The Mish function is defined as follows:

$\mathrm{Mish}(x) = x \cdot \tanh(\mathrm{soft}(x))$

$\tanh(x) = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$

$\mathrm{soft}(x) = \ln(1+e^x)$

To simplify the computation, substitute $\mathrm{soft}(x)$ into $\tanh(x)$. Since $e^{\mathrm{soft}(x)} = 1+e^x$, this gives:

$\tanh(\mathrm{soft}(x)) = \frac{(1+e^x)^2-1}{(1+e^x)^2+1} = 1-\frac{2}{(1+e^x)^2+1}$

So $\mathrm{Mish}(x)$ can be computed as:

$\mathrm{Mish}(x) = x \cdot \left(1-\frac{2}{(1+e^x)^2+1}\right) = x-\frac{2x}{(1+e^x)^2+1}$
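
As a quick sanity check on the simplification above, the two forms can be compared numerically. This is a minimal standalone sketch rather than part of the Caffe layer; the function names `mish_reference`, `mish_simplified`, and `max_form_gap` are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Textbook definition: x * tanh(ln(1 + e^x)).
double mish_reference(double x) {
  return x * std::tanh(std::log1p(std::exp(x)));
}

// Simplified form from the text: x - 2x / ((1 + e^x)^2 + 1).
double mish_simplified(double x) {
  const double t = std::exp(x) + 1.0;  // t = 1 + e^x
  return x - 2.0 * x / (t * t + 1.0);
}

// Largest absolute gap between the two forms on an evenly spaced grid
// of `steps` points covering [lo, hi] (steps must be at least 2).
double max_form_gap(double lo, double hi, int steps) {
  assert(steps >= 2);
  double worst = 0.0;
  for (int i = 0; i < steps; ++i) {
    const double x = lo + (hi - lo) * i / (steps - 1);
    const double gap = std::fabs(mish_reference(x) - mish_simplified(x));
    if (gap > worst) worst = gap;
  }
  return worst;
}
```

On a moderate range such as [-10, 10], `max_form_gap` should return a value at the level of floating-point rounding, which is what one would expect if the two expressions are algebraically identical.
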
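Differentiating this expression gives the derivative:

$\mathrm{Mish}'(x) = 1-\frac{2}{(1+e^x)^2+1}+\frac{4x \cdot e^x \cdot (1+e^x)}{((1+e^x)^2+1)^2}$

$= 1-\frac{2}{(1+e^x)^2+1}+4x \cdot \frac{(1+e^x)^2-(1+e^x)}{((1+e^x)^2+1)^2}$

The second line uses $e^x(1+e^x) = (1+e^x)^2-(1+e^x)$. With $\mathrm{Mish}'(x)$ written in this form, $(1+e^x)$ can be kept as a temporary variable, which reduces the amount of computation.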
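
One way to gain confidence in the derivative formula is to compare it with a central finite difference. The sketch below is standalone C++ and makes no claim about how the Caffe layer is tested; the names `mish_grad`, `mish_grad_numeric`, and `max_grad_gap`, as well as the step size `h`, are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>

// Mish via the simplified closed form; t = 1 + e^x is the shared temporary.
double mish(double x) {
  const double t = std::exp(x) + 1.0;
  return x - 2.0 * x / (t * t + 1.0);
}

// Derivative in the second form from the text, reusing t and u = t^2 + 1.
double mish_grad(double x) {
  const double t = std::exp(x) + 1.0;
  const double u = t * t + 1.0;
  return 1.0 - 2.0 / u + 4.0 * x * (t * t - t) / (u * u);
}

// Central finite difference of mish, used only as an independent reference.
double mish_grad_numeric(double x, double h) {
  return (mish(x + h) - mish(x - h)) / (2.0 * h);
}

// Largest absolute gap between the analytic and numeric derivative on an
// evenly spaced grid of `steps` points covering [lo, hi].
double max_grad_gap(double lo, double hi, int steps, double h) {
  assert(steps >= 2);
  double worst = 0.0;
  for (int i = 0; i < steps; ++i) {
    const double x = lo + (hi - lo) * i / (steps - 1);
    const double gap = std::fabs(mish_grad(x) - mish_grad_numeric(x, h));
    if (gap > worst) worst = gap;
  }
  return worst;
}
```

At $x = 0$ the temporary is $t = 2$, so the formula reduces to $1 - 2/5 = 0.6$, which gives a convenient hand-checkable value.
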

Implementing the Mish Activation Layer in Caffe

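Activation layers in Caffe such as ReLU, PReLU, and Sigmoid inherit from NeuronLayer, which mainly defines the Reshape function. To implement a Mish layer, we inherit from NeuronLayer in the same way.
The Mish layer has no learnable parameters, but computing the gradient during backpropagation requires the bottom data. Activation layers are often run in-place, which overwrites the bottom data with the output, so the bottom data has to be copied into a separate buffer first.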
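
The in-place problem described above can be illustrated without Caffe. In the sketch below, `InPlaceMishSketch` is a hypothetical stand-in that uses `std::vector<float>` instead of Caffe blobs, and `saved_input` plays the role of the backup buffer; it is meant only to show why the copy has to happen before the forward pass overwrites the data.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an in-place activation; not a Caffe class.
struct InPlaceMishSketch {
  std::vector<float> saved_input;  // backup of the input for the backward pass

  // Overwrites `data` with Mish(data), as an in-place layer would.
  void forward(std::vector<float>& data) {
    saved_input = data;  // copy first: the loop below destroys the input
    for (float& v : data) {
      const float t = std::exp(v) + 1.0f;
      v = v - 2.0f * v / (t * t + 1.0f);
    }
  }

  // bottom_diff[i] = Mish'(x_i) * top_diff[i], using the saved input x_i.
  void backward(const std::vector<float>& top_diff,
                std::vector<float>& bottom_diff) const {
    assert(top_diff.size() == saved_input.size());
    bottom_diff.resize(top_diff.size());
    for (std::size_t i = 0; i < top_diff.size(); ++i) {
      const float x = saved_input[i];
      const float t = std::exp(x) + 1.0f;
      const float u = t * t + 1.0f;
      const float grad = 1.0f - 2.0f / u + 4.0f * x * (t * t - t) / (u * u);
      bottom_diff[i] = grad * top_diff[i];
    }
  }
};
```

After `forward`, the caller's vector holds the activations, so evaluating the derivative on it would use Mish(x) in place of x. Reading from `saved_input` avoids that.
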
The main code for the forward and backward passes of $\mathrm{Mish}(x)$ is as follows:

#include <cmath>  // std::exp

// Forward value: Mish(x) = x - 2x / ((1 + e^x)^2 + 1).
template <typename Dtype>
inline Dtype mish(Dtype x) {
  Dtype tmp = std::exp(x) + 1;  // shared temporary: 1 + e^x
  return x - 2 * x / (tmp * tmp + 1);
}

// Derivative: Mish'(x) = 1 - 2/u + 4x(t^2 - t)/u^2,
// with t = 1 + e^x and u = t^2 + 1.
template <typename Dtype>
inline Dtype mish_back(Dtype x) {
  Dtype tmp = std::exp(x) + 1;
  Dtype tmp1 = tmp * tmp + 1;
  return 1 - 2 / tmp1 + 4 * x * (tmp * tmp - tmp) / (tmp1 * tmp1);
}

template <typename Dtype>
void MishLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = top[0]->mutable_cpu_data();
  const int count = bottom[0]->count();
  // In-place case (bottom and top are the same blob): back up the input,
  // because the loop below overwrites it and Backward_cpu still needs it.
  if (bottom[0] == top[0]) {
    caffe_copy(count, bottom_data, backward_buff_.mutable_cpu_data());
  }
  for (int i = 0; i < count; ++i) {
    top_data[i] = mish(bottom_data[i]);
  }
}

template <typename Dtype>
void MishLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  const Dtype* top_diff = top[0]->cpu_diff();
  const int count = bottom[0]->count();
  const Dtype* bottom_data = bottom[0]->cpu_data();
  // In-place case: the original input lives in the backup buffer.
  if (top[0] == bottom[0]) {
    bottom_data = backward_buff_.cpu_data();
  }
  Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
  // Gate on propagate_down[0] (gradient w.r.t. the bottom blob).
  // param_propagate_down_ tracks parameter blobs, and Mish has none.
  if (propagate_down[0]) {
    for (int i = 0; i < count; ++i) {
      bottom_diff[i] = mish_back(bottom_data[i]) * top_diff[i];
    }
  }
}
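
One numerical caveat with the derivative as written: in `float`, once x is above roughly 44 the product `tmp*tmp` overflows to infinity, and under IEEE-754 arithmetic the last term then evaluates to infinity divided by infinity, which is NaN. The following is a hedged sketch of one possible guard, not the author's code. The name `mish_back_guarded` and the cutoff value of 20 are assumptions chosen for illustration; beyond that cutoff the exact derivative differs from 1 by less than 1e-15.

```cpp
#include <cassert>
#include <cmath>

// Guarded variant of the derivative (illustrative sketch).
// kCutoff = 20 is an assumed threshold, not a value from the original post.
template <typename Dtype>
inline Dtype mish_back_guarded(Dtype x) {
  const Dtype kCutoff = 20;
  if (x > kCutoff) return Dtype(1);  // Mish'(x) is 1 to within rounding here
  const Dtype tmp = std::exp(x) + 1;   // 1 + e^x, at most about 4.9e8
  const Dtype tmp1 = tmp * tmp + 1;    // stays far below the float maximum
  return 1 - 2 / tmp1 + 4 * x * (tmp * tmp - tmp) / (tmp1 * tmp1);
}

// Small self-check: the guarded derivative stays finite on [-100, 100].
inline void check_guarded_is_finite() {
  for (float x = -100.0f; x <= 100.0f; x += 0.5f) {
    assert(std::isfinite(mish_back_guarded(x)));
  }
}
```

The forward function does not need the same guard for finiteness: when `tmp*tmp` overflows there, `2*x/(tmp*tmp+1)` becomes zero and the function returns x, which matches the limit of Mish for large x.
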

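All of the code has been uploaded to https://github.com/zhaokai5/MishLayer_caffe
This version implements the forward and backward passes of $\mathrm{Mish}(x)$ in their original form, without a fast algorithm. A faster version may follow later when time permits.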

References

Mish: A Self Regularized Non-Monotonic Neural Activation Function
