[Caffe]: On the ReLU, LeakyReLU, and PReLU layers

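This post covers the ReLU activation function and its variants LeakyReLU and PReLU: their mathematical definitions, how they are implemented in the Caffe framework, and how their parameters are set. It looks in particular at how LeakyReLU introduces a non-zero slope so that negative inputs still receive a gradient, and at how PReLU goes one step further by making that slope a learnable parameter.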


ReLU and LeakyReLU

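ReLU is widely used as the activation function in all kinds of deep neural networks. In this post I mainly record how it and its variants are implemented in Caffe.
Start with an illustration from Wikipedia, in which the blue line is the ReLU function.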

The ReLU activation function is $f(x)=\max(0,x)$. LeakyReLU is a variant of it, $f(x)=\max(0,x)+negative\_slope\times\min(0,x)$, where $negative\_slope$ is a small non-zero number.
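To make the two definitions concrete, here is a minimal pure-Python sketch of the forward computation. The function name `relu` and the sample inputs are illustrative choices, not Caffe code; the only thing taken from the text is the formula and the idea that a slope of 0 gives plain ReLU.

```python
def relu(x, negative_slope=0.0):
    """Evaluate max(0, x) + negative_slope * min(0, x) for one scalar.

    negative_slope == 0 gives plain ReLU; a small non-zero value gives
    LeakyReLU. (Hypothetical helper, not part of Caffe.)
    """
    return max(0.0, x) + negative_slope * min(0.0, x)


inputs = [-2.0, -0.5, 0.0, 1.5]

# Plain ReLU: every negative input is clamped to zero.
relu_out = [relu(v) for v in inputs]

# LeakyReLU: negative inputs are scaled by the slope instead of clamped.
leaky_out = [relu(v, negative_slope=0.1) for v in inputs]

print(relu_out)
print(leaky_out)
```

With a slope of 0.1, the input -2.0 maps to roughly -0.2 rather than 0, which is the only difference between the two functions.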

Because the two differ only in this slope, Caffe implements both ReLU and LeakyReLU in relu_layer.
In the backward pass, ReLU computes:

$\frac{\partial E}{\partial x} = \left\{ \begin{array}{lr} 0 & \mathrm{if} \; x \le 0 \\ \frac{\partial E}{\partial y} & \mathrm{if} \; x > 0 \end{array} \right.$
The LeakyReLU variant instead computes the following, where $\nu$ denotes $negative\_slope$:

$\frac{\partial E}{\partial x} = \left\{ \begin{array}{lr} \nu \frac{\partial E}{\partial y} & \mathrm{if} \; x \le 0 \\ \frac{\partial E}{\partial y} & \mathrm{if} \; x > 0 \end{array} \right.$
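The two backward rules can be checked numerically. The sketch below is a pure-Python illustration, not Caffe's implementation: `relu_forward`, `relu_backward`, and `numeric_grad` are hypothetical helper names, and the loss is assumed to be $E = y$ so that $\partial E / \partial y = 1$. It compares the analytic gradient with a central finite difference at points away from $x = 0$, where the function is not differentiable.

```python
def relu_forward(x, negative_slope=0.0):
    """y = max(0, x) + negative_slope * min(0, x)."""
    return max(0.0, x) + negative_slope * min(0.0, x)


def relu_backward(x, dE_dy, negative_slope=0.0):
    """dE/dx = dE/dy if x > 0, else negative_slope * dE/dy.

    With negative_slope == 0 this is the plain ReLU rule (gradient 0 for
    x <= 0); otherwise it is the LeakyReLU rule.
    """
    return dE_dy if x > 0 else negative_slope * dE_dy


def numeric_grad(x, negative_slope, eps=1e-6):
    """Central finite difference of the forward function at x."""
    upper = relu_forward(x + eps, negative_slope)
    lower = relu_forward(x - eps, negative_slope)
    return (upper - lower) / (2 * eps)


# Assume E = y, so the incoming gradient dE/dy is 1 everywhere.
max_error = 0.0
for x in (-1.5, 2.0):
    for slope in (0.0, 0.1):
        analytic = relu_backward(x, 1.0, slope)
        max_error = max(max_error, abs(analytic - numeric_grad(x, slope)))

print("largest analytic/numeric mismatch:", max_error)
```

The mismatch should be on the order of floating-point noise, which is consistent with the piecewise formulas above.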
Next, let us look at the parameters of the ReLU layer.

// Message that stores parameters used by ReLULayer
message ReLUParameter {
  // Allow non-zero slope for negative inputs to speed up optimization
  // Described in:
  // Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities
  // improve neural network acoustic models. In ICML Workshop on Deep Learning
  // for Audio, Speech, and Language Processing.
  optional float negative_slope = 1 [default = 0]; // As discussed above: the default 0 gives ReLU, a non-zero value gives LeakyReLU
  enum Engine {
    DEFAULT = 0;
    CAFFE = 1;
    CUDNN = 2;
  }
  optional Engine engine = 2 [default = DEFAULT]; // Compute engine; the default is usually the right choice
}
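As a rough illustration of how these parameters are set in a network definition, the Python sketch below assembles the text of a prototxt layer entry. The helper `relu_layer_prototxt` and the names `relu1` and `conv1` are hypothetical; the sketch assumes the usual Caffe convention of applying ReLU in place (same bottom and top blob) and of passing `ReLUParameter` through the layer's `relu_param` field.

```python
def relu_layer_prototxt(name, blob, negative_slope=0.0):
    """Return prototxt text for a ReLU layer applied in place on `blob`.

    Hypothetical helper for illustration. The relu_param block is only
    written when a non-zero slope is requested, since the default of 0
    already yields plain ReLU.
    """
    lines = [
        "layer {",
        f'  name: "{name}"',
        '  type: "ReLU"',
        f'  bottom: "{blob}"',
        f'  top: "{blob}"',
    ]
    if negative_slope != 0.0:
        lines.append(f"  relu_param {{ negative_slope: {negative_slope} }}")
    lines.append("}")
    return "\n".join(lines)


plain_text = relu_layer_prototxt("relu1", "conv1")
leaky_text = relu_layer_prototxt("relu1", "conv1", negative_slope=0.1)

print(plain_text)
print(leaky_text)
```

The `engine` field is left out here, which matches the advice above to keep its default.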

PReLU

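PReLU, or Parametric ReLU, is an improved ReLU proposed by Kaiming He and colleagues. It is defined as $y_i = \max(0, x_i) + a_i\times\min(0, x_i)$, where $a_i$ is a learnable parameter. When $a_i$ is fixed to a small non-zero value, PReLU is equivalent to LeakyReLU; when $a_i$ is 0, it is equivalent to ReLU. Its backward pass computes: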

$\frac{\partial E}{\partial x_i} = \left\{ \begin{array}{lr} a_i \frac{\partial E}{\partial y_i} & \mathrm{if} \; x_i \le 0 \\ \frac{\partial E}{\partial y_i} & \mathrm{if} \; x_i > 0 \end{array} \right.$
The gradient used to update the parameter $a_i$ is:

$\frac{\partial E}{\partial a_i} = \left\{ \begin{array}{lr} \sum_{x_i} x_i \frac{\partial E}{\partial y_i} & \mathrm{if} \; x_i \le 0 \\ 0 & \mathrm{if} \; x_i > 0 \end{array} \right.$
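The PReLU formulas can be traced with a small worked example. The sketch below is pure Python and not Caffe's code: `prelu_forward` and `prelu_backward` are hypothetical helpers, a single slope `a` is shared by all four sample inputs, and the final update is assumed to be a plain gradient-descent step with an arbitrary learning rate of 0.1 (a real solver may add momentum or weight decay).

```python
def prelu_forward(xs, a):
    """y_i = max(0, x_i) + a * min(0, x_i) for every input sharing slope a."""
    return [max(0.0, x) + a * min(0.0, x) for x in xs]


def prelu_backward(xs, dE_dys, a):
    """Return (dE/dx for each input, dE/da for the shared slope).

    dE/dx_i is dE/dy_i for x_i > 0 and a * dE/dy_i otherwise.
    dE/da sums x_i * dE/dy_i over the inputs with x_i <= 0; inputs with
    x_i > 0 contribute nothing.
    """
    dE_dxs = [g if x > 0 else a * g for x, g in zip(xs, dE_dys)]
    dE_da = sum(x * g for x, g in zip(xs, dE_dys) if x <= 0)
    return dE_dxs, dE_da


xs = [-2.0, -1.0, 0.5, 3.0]
top_grads = [1.0, 0.5, 2.0, -1.0]  # assumed incoming dE/dy values
a = 0.25                           # initial slope

ys = prelu_forward(xs, a)
dE_dxs, dE_da = prelu_backward(xs, top_grads, a)

# Assumed plain gradient-descent step on the slope.
learning_rate = 0.1
a_updated = a - learning_rate * dE_da

print(ys, dE_dxs, dE_da, a_updated)
```

Only the two negative inputs contribute to the slope gradient: $(-2.0)(1.0) + (-1.0)(0.5) = -2.5$.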
PReLU is not implemented inside the ReLU layer, mainly because it has the learnable parameter $a$; it is implemented in prelu_layer instead.

message PReLUParameter {
  // Parametric ReLU described in K. He et al, Delving Deep into Rectifiers:
  // Surpassing Human-Level Performance on ImageNet Classification, 2015.

  // Initial value of a_i. Default is a_i=0.25 for all i.
  optional FillerParameter filler = 1;  // Filler for the slopes; by default a_i is initialized to 0.25
  // Whether or not slope parameters are shared across channels.
  optional bool channel_shared = 2 [default = false]; // Whether the slope is shared across channels; not shared by default
}
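The effect of `channel_shared` can be sketched in a few lines of pure Python. This is an illustration under simplifying assumptions, not Caffe's implementation: the input is a plain list of channels, each a list of values, and `make_slopes` and `prelu_channels` are hypothetical helpers. The initial value 0.25 follows the default stated in the comment above.

```python
def make_slopes(num_channels, channel_shared=False, init=0.25):
    """One slope per channel, or a single slope when channel_shared is set."""
    return [init] if channel_shared else [init] * num_channels


def prelu_channels(x, slopes):
    """Apply PReLU to x, a list of channels (each a list of values).

    A one-element slopes list is treated as shared by every channel.
    """
    out = []
    for c, channel in enumerate(x):
        a = slopes[0] if len(slopes) == 1 else slopes[c]
        out.append([max(0.0, v) + a * min(0.0, v) for v in channel])
    return out


x = [[-4.0, 2.0], [-8.0, 1.0], [-1.0, 0.0]]  # 3 channels, 2 values each

# Default: each channel owns its slope, so they can diverge during training.
per_channel = make_slopes(3)
per_channel[1] = 0.5  # pretend channel 1 has learned a different slope
per_channel_out = prelu_channels(x, per_channel)

# channel_shared: a single learnable slope for the whole layer.
shared = make_slopes(3, channel_shared=True)
shared_out = prelu_channels(x, shared)

print(len(per_channel), per_channel_out)
print(len(shared), shared_out)
```

Sharing reduces the layer to one extra parameter, at the cost of forcing every channel to use the same negative slope.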