TensorFlow provides a variety of activation functions:
1. The sigmoid function
tf.sigmoid(x, name=None) == tf.nn.sigmoid(x, name=None)
# y = 1 / (1 + exp(-x))
Computes sigmoid of x element-wise.
Specifically, y = 1 / (1 + exp(-x)).
x: A Tensor with type float, double, int32, complex64, int64, or qint32.
name: A name for the operation (optional).
y = 1 / (1 + exp(-x))
- Pros and cons of the sigmoid function:
  - Pros:
    - Maps the input to the interval (0, 1), so the output can be interpreted as a probability (e.g. logistic regression)
    - Of the common activations, it is the closest in physical terms to a biological neuron
  - Cons:
    - The vanishing-gradient problem
      - First, be clear on one point: during backpropagation the gradient contains two multiplicative factors, f′(z^l) and the error term of the layer above (which itself contains f′(z^(l+1)); z is the weighted sum of the inputs). See the backpropagation derivation.
      - Since the derivative of sigmoid, f′(z^l), lies in (0, 0.25], the unit easily falls into the saturated region where the gradient is tiny, so the weights barely change and cannot update normally (see the sketch after this list)
      - As the error propagates down toward the bottom layers, the f′(z^l) factors accumulate multiplicatively; because each lies in (0, 0.25], the gradient shrinks exponentially the further back it travels, and eventually the weights cannot update normally
    - The sigmoid output is not zero-mean: every output is positive, which increases the instability of the gradient updates
    - When the output is near saturation or changing sharply, this squashing of the output range tends to have some adverse effects
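To make the (0, 0.25] derivative range concrete, here is a minimal sketch (assuming TensorFlow 2.x eager execution; the input values are just illustrative):

import tensorflow as tf
x = tf.constant([-10.0, -1.0, 0.0, 1.0, 10.0])
with tf.GradientTape() as tape:
    tape.watch(x)              # x is a constant, so watch it explicitly
    y = tf.nn.sigmoid(x)
grad = tape.gradient(y, x)     # element-wise dy/dx = y * (1 - y)
print(y.numpy())               # all outputs lie in (0, 1)
print(grad.numpy())            # peaks at 0.25 at x = 0, near 0 in the saturated tails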
2. The tanh function
tf.tanh(x, name=None) == tf.nn.tanh(x, name=None)
# y = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
Computes hyperbolic tangent of x element-wise.
x: A Tensor with type float, double, int32, complex64, int64, or qint32.
name: A name for the operation (optional).
tanh(x) = sinh(x)/cosh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
Pros and cons of the tanh function:
  - Pros:
    - Tanh outputs are zero-centered; the input is mapped to the interval (-1, 1)
  - Cons:
    - Although the derivative f′(z^l) of tanh lies in the larger range (0, 1], it still leads to the vanishing-gradient problem! (demonstrated in the sketch below)
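The same kind of sketch for tanh (again assuming TensorFlow 2.x eager execution) shows the zero-centered outputs and a derivative 1 - tanh(x)^2 that reaches 1 at x = 0 but still vanishes in the tails:

import tensorflow as tf
x = tf.constant([-3.0, -1.0, 0.0, 1.0, 3.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.nn.tanh(x)
grad = tape.gradient(y, x)     # element-wise dy/dx = 1 - tanh(x)^2
print(y.numpy())               # zero-centered outputs in (-1, 1)
print(grad.numpy())            # 1.0 at x = 0, but already ~0.01 at |x| = 3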
3. The ReLU function
tf.nn.relu(features, name=None)
# y = max(features, 0)
Computes rectified linear: max(features, 0).
features: A Tensor. Must be one of the following types: float32, float64, int32, int64, uint8, int16, int8.
name: A name for the operation (optional).
ReLU(x) = max(0, x)
Pros and cons of the ReLU function:
  - Pros:
    - Converges faster than sigmoid/tanh (roughly 6x), and creates sparse representations with true zeros (which are more likely to be linearly separable)
    - Its derivative is 1 wherever the weighted sum z is greater than 0, so the error propagates well and the weights update normally
  - Cons:
    - Its derivative is 0 wherever the weighted sum z is less than 0, so the gradient there is 0 and the weights cannot update (the "dying ReLU" problem; see the sketch after this list)
    - The output is biased: the mean of the outputs is always greater than zero
    - When a large learning rate is used, it is easily affected by neurons that saturate (die and stay at zero).
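Both the sparsity and the dead-gradient behavior are easy to see in a minimal sketch (assuming TensorFlow 2.x eager execution):

import tensorflow as tf
x = tf.constant([-2.0, -0.5, 0.0, 0.5, 2.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.nn.relu(x)
grad = tape.gradient(y, x)
print(y.numpy())               # [0.  0.  0.  0.5 2. ] -- true zeros, sparse output
print(grad.numpy())            # [0. 0. 0. 1. 1.] -- gradient is 0 wherever z <= 0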
4. Variants of the ReLU function
- Leaky-ReLU
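A hedged sketch of the corresponding op, in the same format as the entries below (assuming TensorFlow ≥ 1.4, where tf.nn.leaky_relu was introduced; alpha is the negative-slope coefficient):
tf.nn.leaky_relu(features, alpha=0.2, name=None)
# y = max(alpha * features, features) -- the small slope alpha keeps a nonzero gradient for z < 0, avoiding dead neurons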
- ReLU6
tf.nn.relu6(features, name=None)
# y = min(max(features, 0), 6)
Computes Rectified Linear 6: min(max(features, 0), 6).
features: A Tensor with type float, double, int32, int64, uint8, int16, or int8.
name: A name for the operation (optional).
relu6(x) = min(max(x, 0), 6)
- ELU
tf.nn.elu(features, name=None)
# y = exp(features) - 1 if features < 0, features otherwise
Computes exponential linear: exp(features) - 1 if < 0, features otherwise.
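A quick side-by-side of the three variants on the same inputs (a sketch assuming TensorFlow 2.x eager execution): relu6 caps the output at 6, while elu stays smooth and nonzero below zero:

import tensorflow as tf
x = tf.constant([-2.0, -1.0, 0.0, 3.0, 8.0])
print(tf.nn.relu6(x).numpy())                  # [0. 0. 0. 3. 6.] -- clipped at 6
print(tf.nn.elu(x).numpy())                    # [-0.8647 -0.6321 0. 3. 8.] -- exp(x) - 1 for x < 0
print(tf.nn.leaky_relu(x, alpha=0.2).numpy())  # [-0.4 -0.2 0. 3. 8.]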
5. The softplus function
tf.nn.softplus(features, name=None)
# y = log(exp(features) + 1)
Computes softplus: log(exp(features) + 1).
features: A Tensor. Must be one of the following types: float32, float64, int32, int64, uint8, int16, int8.
name: A name for the operation (optional).
softplus(x) = log(exp(x) + 1)
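Softplus is a smooth approximation of ReLU, and its derivative is exactly the sigmoid; a minimal check (assuming TensorFlow 2.x eager execution):

import tensorflow as tf
x = tf.constant([-3.0, 0.0, 3.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.nn.softplus(x)      # log(exp(x) + 1)
grad = tape.gradient(y, x)
print(y.numpy())               # ~[0.0486 0.6931 3.0486] -- always positive, ReLU-like for large x
print(grad.numpy())            # identical to tf.nn.sigmoid(x).numpy()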
6. The softsign function
tf.nn.softsign(features, name=None)
# y = features / (abs(features) + 1)
Computes softsign: features / (abs(features) + 1).
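Softsign squashes the input into (-1, 1) like tanh, but it approaches its asymptotes polynomially rather than exponentially, so it saturates more gently; a small comparison sketch (assuming TensorFlow 2.x eager execution):

import tensorflow as tf
x = tf.constant([-10.0, -1.0, 0.0, 1.0, 10.0])
print(tf.nn.softsign(x).numpy())  # [-0.9091 -0.5 0. 0.5 0.9091]
print(tf.nn.tanh(x).numpy())      # tanh is already ~±1 at |x| = 10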