Neural Networks and Deep Learning

3.6 Activation Function


sigmoid: $a = \frac{1}{1 + e^{-z}}$
Its output lies in (0, 1). Apart from the output layer of a binary classifier, it is generally not chosen, because tanh performs better than sigmoid.
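
As a quick illustration, here is a minimal NumPy sketch of sigmoid and its derivative (the function names are mine, not from the course notes):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """Derivative of sigmoid: a * (1 - a), where a = sigmoid(z)."""
    a = sigmoid(z)
    return a * (1.0 - a)
```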

tanh: $a = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$
Its output lies in (-1, 1), which centers the data and makes training the network easier, so it performs better than sigmoid.
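
A matching sketch for tanh (again just an illustrative helper, not code from the course):

```python
import numpy as np

def tanh(z):
    """tanh activation: maps z into (-1, 1); outputs are roughly zero-centered."""
    return np.tanh(z)  # same as (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

def tanh_derivative(z):
    """Derivative of tanh: 1 - tanh(z)^2."""
    return 1.0 - np.tanh(z) ** 2
```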

When $z$ is very large or very small, the derivatives of sigmoid(z) and tanh(z) are both close to 0, which slows down training.
One of the downsides of both the sigmoid function and the tanh function is that if $z$ is either very large or very small, the derivative (the slope) of the function becomes very close to zero, and this can slow down gradient descent.
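
A small numeric check of this point, evaluating both derivatives at a few values of $z$ (illustrative only):

```python
import numpy as np

z = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

sigmoid_grad = sigmoid(z) * (1.0 - sigmoid(z))  # peaks at 0.25 when z = 0
tanh_grad = 1.0 - np.tanh(z) ** 2               # peaks at 1.0 when z = 0

print(sigmoid_grad)  # ~4.5e-05 at |z| = 10: the gradient has all but vanished
print(tanh_grad)     # ~8.2e-09 at |z| = 10
```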

ReLU: $a = \max(0, z)$
If you are not sure which activation to choose, use ReLU (rectified linear unit).
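
A minimal NumPy version of ReLU, assuming element-wise application to a vector or matrix of pre-activations:

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)
```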

Leaky ReLU: $a = \max(0.01z, z)$
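
And a corresponding Leaky ReLU sketch; the 0.01 slope follows the formula above, exposed here as a parameter purely for illustration:

```python
import numpy as np

def leaky_relu(z, slope=0.01):
    """Leaky ReLU: max(slope * z, z); the small slope keeps the gradient nonzero for z < 0."""
    return np.maximum(slope * z, z)
```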

The advantage of ReLU is that over most of the range of $z$ its slope is far from zero, which makes the network train faster; a concrete comparison follows below.
The advantage of ReLU is that for a lot of the space of $z$, the derivative (the slope) of the activation function is very different from zero, so in practice a network using the ReLU activation function will often learn much faster than one using the tanh or sigmoid activation function. It is true that for half of the range of $z$ the slope of ReLU is zero, but in practice enough of the hidden units will have $z$ greater than zero, so learning can still be quite fast for most training examples.
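
To make that last point concrete, the following sketch compares the slope of ReLU with that of tanh over a range of $z$ values (my own illustration, not from the course):

```python
import numpy as np

z = np.linspace(-6.0, 6.0, 7)          # [-6, -4, -2, 0, 2, 4, 6]

relu_grad = (z > 0).astype(float)      # slope is exactly 1 wherever z > 0, 0 otherwise
tanh_grad = 1.0 - np.tanh(z) ** 2      # decays quickly as |z| grows

print(relu_grad)  # [0. 0. 0. 0. 1. 1. 1.]
print(tanh_grad)  # close to 0 once |z| >= 4
```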
