Rectified Linear Unit (ReLU)

Reposted, November 18, 2015, 15:57:21

The Rectified Linear Unit (ReLU) computes the function f(x) = max(0, x); in other words, the activation is simply thresholded at zero.
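A minimal sketch of the ReLU forward pass and its derivative, written in plain Python over a list of activations. The function names `relu` and `relu_grad` are illustrative, not taken from any library. The derivative is undefined at exactly x = 0; this sketch assumes the common convention of using 0 there.

```python
def relu(xs):
    """Element-wise ReLU: f(x) = max(0, x)."""
    return [max(0.0, x) for x in xs]


def relu_grad(xs):
    """Derivative of ReLU w.r.t. its input: 1 for x > 0, 0 otherwise.

    At x == 0 the derivative is undefined; 0 is used here by convention.
    """
    return [1.0 if x > 0 else 0.0 for x in xs]


if __name__ == "__main__":
    activations = [-2.0, -0.5, 0.0, 1.5, 3.0]
    print(relu(activations))       # [0.0, 0.0, 0.0, 1.5, 3.0]
    print(relu_grad(activations))  # [0.0, 0.0, 0.0, 1.0, 1.0]
```

Because the gradient is exactly 1 for every positive input, it does not shrink as x grows, unlike the sigmoid/tanh gradients.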

There are several pros and cons to using ReLUs:

  1. (Pros) Compared to sigmoid/tanh neurons that involve expensive operations (exponentials, etc.), the ReLU can be implemented by simply thresholding a matrix of activations at zero. Moreover, ReLUs do not saturate for positive inputs.
  2. (Pros) It was found to greatly accelerate the convergence of stochastic gradient descent compared to the sigmoid/tanh functions. It is argued that this is due to its linear, non-saturating form.
  3. (Cons) Unfortunately, ReLU units can be fragile during training and can “die”. For example, a large gradient flowing through a ReLU neuron could cause the weights to update in such a way that the neuron will never activate on any datapoint again. If this happens, the gradient flowing through the unit will be zero from that point on. That is, ReLU units can irreversibly die during training since they can get knocked off the data manifold. In practice, you may find that as much as 40% of your network can be “dead” (i.e., neurons that never activate across the entire training dataset) if the learning rate is set too high. With a proper setting of the learning rate this is less frequently an issue.
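To make the “dead unit” idea concrete, the hypothetical helper below takes one fully connected ReLU layer (a list of weight vectors and a list of biases, an assumed layout) and reports the fraction of units whose pre-activation is never positive on any input. Such a unit outputs zero everywhere, and since the ReLU derivative is zero there, its weights receive no gradient and cannot recover. The example imitates a bias knocked far negative by a large update; all numbers are made up for illustration.

```python
def dead_fraction(weights, biases, dataset):
    """Fraction of ReLU units in one layer that never activate on `dataset`.

    weights: one weight vector per unit (hypothetical layer layout)
    biases:  one bias per unit
    dataset: list of input vectors
    A unit is counted as dead if its pre-activation w.x + b is <= 0 for
    every input, so both its output and its gradient are zero everywhere.
    """
    dead = 0
    for w, b in zip(weights, biases):
        fires = any(
            sum(wi * xi for wi, xi in zip(w, x)) + b > 0 for x in dataset
        )
        if not fires:
            dead += 1
    return dead / len(weights)


if __name__ == "__main__":
    weights = [[1.0, 1.0], [0.5, -0.5], [1.0, 1.0]]
    biases = [0.0, 0.0, -10.0]  # third unit: bias pushed far negative
    dataset = [[1.0, 2.0], [-1.0, 0.5], [0.3, -0.2]]
    print(dead_fraction(weights, biases, dataset))  # 0.3333333333333333
```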

Leaky ReLU

Leaky ReLUs are one attempt to fix the “dying ReLU” problem. Instead of the function being zero when x < 0, a leaky ReLU has a small slope on the negative side (of 0.01, or so). That is, the function computes f(x) = ax if x < 0 and f(x) = x if x ≥ 0, where a is a small constant. Some people report success with this form of activation function, but the results are not always consistent.
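A minimal sketch of the leaky ReLU and its derivative, assuming a = 0.01 as the default constant. The names are illustrative. Unlike the plain ReLU, the derivative for negative inputs is a rather than 0, so gradient still flows through a unit whose input is negative.

```python
def leaky_relu(xs, a=0.01):
    """Leaky ReLU: f(x) = x if x >= 0, else a * x (a is a small constant)."""
    return [x if x >= 0 else a * x for x in xs]


def leaky_relu_grad(xs, a=0.01):
    """Derivative: 1 for x >= 0 and a for x < 0 (nonzero whenever a > 0)."""
    return [1.0 if x >= 0 else a for x in xs]


if __name__ == "__main__":
    xs = [-2.0, 0.0, 3.0]
    print(leaky_relu(xs))       # [-0.02, 0.0, 3.0]
    print(leaky_relu_grad(xs))  # [0.01, 1.0, 1.0]
```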

Parametric ReLU

[Figure: rectified unit family]
This variant is called the parametric rectified linear unit (PReLU). In PReLU, the slope of the negative part is learned from data rather than predefined.
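A sketch of how the negative slope a can be learned by gradient descent, assuming a single slope shared by all units (one slope per channel is another common option). Since f(x) = ax for x < 0, ∂f/∂a = x there and 0 otherwise. The toy fit below, including its data, learning rate, and helper names, is hypothetical; it only shows that plain SGD on a recovers a target slope of 0.2.

```python
def prelu(xs, a):
    """PReLU forward: f(x) = x if x >= 0, else a * x, where a is learned."""
    return [x if x >= 0 else a * x for x in xs]


def prelu_grad_a(xs, upstream):
    """dL/da for a single slope shared by all units.

    df/da = x for x < 0 and 0 otherwise; the chain rule multiplies this by
    the upstream gradient dL/df of each element and sums over elements.
    """
    return sum(g * x for x, g in zip(xs, upstream) if x < 0)


def fit_slope(xs, targets, a=0.25, lr=0.1, steps=100):
    """Toy fit: learn a by plain SGD on the loss 0.5 * sum((f(x) - t)^2)."""
    for _ in range(steps):
        ys = prelu(xs, a)
        upstream = [y - t for y, t in zip(ys, targets)]  # dL/df
        a -= lr * prelu_grad_a(xs, upstream)
    return a


if __name__ == "__main__":
    xs = [-2.0, -1.0, 0.5, 1.0]
    targets = prelu(xs, 0.2)  # synthetic data generated with slope 0.2
    print(fit_slope(xs, targets))  # converges to approximately 0.2
```

Only negative inputs contribute to ∂L/∂a, so the slope is driven entirely by how the network should respond to negative pre-activations.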

Randomized ReLU

In RReLU (randomized leaky ReLU), the slopes of the negative part are sampled at random from a given range during training and then fixed at test time. As mentioned in [B. Xu, N. Wang, T. Chen, and M. Li. Empirical Evaluation of Rectified Activations in Convolutional Network. In ICML Deep Learning Workshop, 2015.], in the recent Kaggle National Data Science Bowl (NDSB) competition, it was reported that RReLU could reduce overfitting due to its randomized nature. As suggested by the NDSB competition winner, the random slope a_i is sampled during training from 1/U(l, u) with l = 3 and u = 8, i.e. a_i = 1/r_i with r_i drawn uniformly from [3, 8]; at test time it is fixed to 2/(l + u) = 2/11, the reciprocal of the mean of U(3, 8).
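A sketch of RReLU with the NDSB setting quoted above (l = 3, u = 8). Assumptions: a fresh slope is drawn for every negative element on each training forward pass, and the names `rrelu`, `LOW`, and `HIGH` are illustrative rather than from any library.

```python
import random

LOW, HIGH = 3.0, 8.0  # l and u of U(l, u) in the NDSB setting


def rrelu(xs, training, rng=None):
    """Randomized leaky ReLU.

    Training: each negative input gets its own slope a_i = 1 / r_i with
    r_i ~ U(3, 8), so slopes lie in [1/8, 1/3].
    Test: the slope is fixed to 2 / (l + u) = 2/11.
    """
    if not training:
        a_test = 2.0 / (LOW + HIGH)
        return [x if x >= 0 else a_test * x for x in xs]
    rng = rng or random.Random()
    # a_i * x == x / r_i with r_i drawn uniformly from [LOW, HIGH]
    return [x if x >= 0 else x / rng.uniform(LOW, HIGH) for x in xs]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    print(rrelu([-1.0, -1.0, 2.0], training=True, rng=rng))  # random slopes
    print(rrelu([-11.0, 2.0], training=False))  # deterministic slope 2/11
```

The randomness acts only during training, which is consistent with the claim that RReLU mainly helps against overfitting; at test time the layer is an ordinary leaky ReLU with a = 2/11.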

In conclusion, all three ReLU variants (Leaky ReLU, PReLU, and RReLU) consistently outperform the original ReLU on the three data sets evaluated by Xu et al. (CIFAR-10, CIFAR-100, and NDSB), and PReLU and RReLU seem to be the better choices.