Caffe Relu_layer.cpp 学习_caffe如何将conv和relu分开-CSDN博客

本文链接：https://blog.csdn.net/iamzhangzhuping/article/details/50733267

template <typename Dtype>
void ReLULayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = top[0]->mutable_cpu_data();
  const int count = bottom[0]->count();
  Dtype negative_slope = this->layer_param_.relu_param().negative_slope();
  for (int i = 0; i < count; ++i) {
    top_data[i] = std::max(bottom_data[i], Dtype(0))
        + negative_slope * std::min(bottom_data[i], Dtype(0));
  }
}

Backward_cpu()

后向传播函数：

template <typename Dtype>
void ReLULayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  propagate_down的妙处于此！caffe.proto里面也有一个相同名字的定义
  if (propagate_down[0]) {
    const Dtype* bottom_data = bottom[0]->cpu_data();
    const Dtype* top_diff = top[0]->cpu_diff();
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    const int count = bottom[0]->count();
    Dtype negative_slope = this->layer_param_.relu_param().negative_slope();
    for (int i = 0; i < count; ++i) {
      bottom_diff[i] = top_diff[i] * ((bottom_data[i] > 0)
          + negative_slope * (bottom_data[i] <= 0));
    }
  }
}

关于Backward_cpu()作几点说明：
propagate_down：在caffe.proto里有一段说明：Specifies on which bottoms the backpropagation should be skipped.The size must be either 0 or equal to the number of bottoms. propagate_down是与计算关于bottom的梯度相关的。大家在通常的理解上，求梯度都是相对于参数weights而言的，但是，在caffe里为什么还有求“bottom的导数”一说呢？？？原因在于caffe的实现。公式（1）中， $\delta^{(l)}$ 其实就是损失函数关于当前层的输入（bottom）的偏导数，而这个 propagate_down则是计算这个 $\delta^{(l)}$ 的控制条件，由公式就可以知道，这个 $\delta^{(l)}$ 在caffe的BP实现中是非常重要。

caffe中BP的实现： caffe的模块化非常强，它将W与X的线性求和，激励函数以及pooling都分开了，也就是说caffe里的conv_layer只是一个线性求和运算，并没有激励运算，而且分别对应了caffe里的conv_layer, Relu_layer, 以及pooling_layer。与公式（3）对应，我们以Relu_layer为例来说明一下caffe的BP实现，公式中的 $\delta^{(l+1)}$ 其实与top_diff对应， $\delta^{(l)}$ 其实与bottom_diff对应。而top_diff = top[0]->cpu_diff()，bottom_diff = bottom[0]->mutable_cpu_diff()，又因为Relu_layer并没有weight（卷积核），所以公式（3）化简为 $\delta^{(l)}=\delta^{(l+1)} \cdot f'(z^{(l)})$ ， $\delta^{(l+1)}$ 来自上一层, $f'(z^{(l)})$ 与代码中的for循环语句块对应。而对于conv_layer，公式（3）化简为 $\delta^{(l)}=(W^{(l)})^T\delta^{(l+1)}$ ，可以查看ConvolutionLayer::Backward_cpu()以验证。对应于公式（4），求的是关于weights卷积核的梯度， $a^{(l)}$ 与当前卷积层的bottom相关。总之，基于caffe的高度模块性，其BP实现的梯度有关于bottom的，也有关于weights的。

公式（2）其实是公式（1）中J取EuclideanLoss的特例， $(y-a^{(n_l)})$ 就是top_diff。基于此，可以理解happynear大神所说的：CNN特征 = FP（输入图像）， diff = 原始特征 - CNN特征， loss = lossfunction（diff），新的grad=BP（loss）

数学公式

反向求导公式，来自UFLDL：

$δ (l) = \partial J ( W , b ; x , y ) \partial z ( l ) (1)$ $\delta^{(l)} = \frac{\partial{J(W,b;x,y)}}{\partial{z^{(l)}}}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)$
$δ (n l) = - (y - a (n l)) \cdot f' (z n l) (2)$ $\delta^{(n_l)} = - (y-a^{(n_l)}) \cdot f'(z^{n_l}) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)$
$δ (l) = ((W (l)) T δ (l + 1)) \cdot f' (z (l)) (3)$ $\delta^{(l)}=((W^{(l)})^T\delta^{(l+1)}) \cdot f'(z^{(l)}) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)$
$\nabla W (l) J (W, b; x, y) = δ (l + 1) (a (l)) T (4)$ $\nabla_{W^{(l)}}J(W,b;x,y) = \delta^{(l+1)}(a^{(l)})^{T}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4)$
$\nabla (b (l)) J (W, b; x, y) = δ (l + 1) (5)$ $\nabla_(b^{(l)})J(W,b;x,y) = \delta^{(l+1)}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (5)$