Neural Networks and Deep Learning

3.6 Activation Function


sigmoid: $a = \frac{1}{1 + e^{-z}}$
Its output lies in (0, 1). Apart from the output layer of a binary classifier, it is generally not chosen, because tanh performs better than sigmoid.
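
As a quick illustration, here is a minimal NumPy sketch of sigmoid and its derivative (the function names are mine, not from the course notes):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """Derivative of sigmoid: a * (1 - a), where a = sigmoid(z)."""
    a = sigmoid(z)
    return a * (1.0 - a)
```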

tanh: $a = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$
Its output lies in (-1, 1), which centers the data and makes training the network easier, so it performs better than sigmoid.
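
A matching sketch for tanh (again just an illustrative helper, not code from the course):

```python
import numpy as np

def tanh(z):
    """tanh activation: maps z into (-1, 1); outputs are roughly zero-centered."""
    return np.tanh(z)  # same as (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

def tanh_derivative(z):
    """Derivative of tanh: 1 - tanh(z)^2."""
    return 1.0 - np.tanh(z) ** 2
```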

When $z$ is very large or very small, the derivatives of sigmoid(z) and tanh(z) are both close to 0, which slows down training.
One of the downsides of both the sigmoid function and the tanh function is that if $z$ is either very large or very small, the derivative (the slope) of the function becomes very close to zero, and this can slow down gradient descent.
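
A small numeric check of this point, evaluating both derivatives at a few values of $z$ (illustrative only):

```python
import numpy as np

z = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

sigmoid_grad = sigmoid(z) * (1.0 - sigmoid(z))  # peaks at 0.25 when z = 0
tanh_grad = 1.0 - np.tanh(z) ** 2               # peaks at 1.0 when z = 0

print(sigmoid_grad)  # ~4.5e-05 at |z| = 10: the gradient has all but vanished
print(tanh_grad)     # ~8.2e-09 at |z| = 10
```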

ReLU: $a = \max(0, z)$
If you are not sure which activation to choose, use ReLU (rectified linear unit).
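
A minimal NumPy version of ReLU, assuming element-wise application to a vector or matrix of pre-activations:

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)
```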

Leaky ReLU: $a = \max(0.01z, z)$
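
And a corresponding Leaky ReLU sketch; the 0.01 slope follows the formula above, exposed here as a parameter purely for illustration:

```python
import numpy as np

def leaky_relu(z, slope=0.01):
    """Leaky ReLU: max(slope * z, z); the small slope keeps the gradient nonzero for z < 0."""
    return np.maximum(slope * z, z)
```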

The advantage of ReLU is that over most of the range of $z$ its slope is far from zero, which makes the network train faster; a concrete comparison follows below.
The advantage of ReLU is that for a lot of the space of $z$, the derivative (the slope) of the activation function is very different from zero, so in practice a network using the ReLU activation function will often learn much faster than one using the tanh or sigmoid activation function. It is true that for half of the range of $z$ the slope of ReLU is zero, but in practice enough of the hidden units will have $z$ greater than zero, so learning can still be quite fast for most training examples.
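
To make that last point concrete, the following sketch compares the slope of ReLU with that of tanh over a range of $z$ values (my own illustration, not from the course):

```python
import numpy as np

z = np.linspace(-6.0, 6.0, 7)          # [-6, -4, -2, 0, 2, 4, 6]

relu_grad = (z > 0).astype(float)      # slope is exactly 1 wherever z > 0, 0 otherwise
tanh_grad = 1.0 - np.tanh(z) ** 2      # decays quickly as |z| grows

print(relu_grad)  # [0. 0. 0. 0. 1. 1. 1.]
print(tanh_grad)  # close to 0 once |z| >= 4
```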
