Coursera | Andrew Ng (01-week-3-3.8): Derivatives of activation functions

ZJ_Improve

Categories: Deep Learning | Andrew Ng - 01. Neural Networks and Deep Learning; Deep Learning | Andrew Ng. Tags: Andrew Ng, deep learning, Coursera

Article link: https://blog.csdn.net/junjun_zhao/article/details/79001761

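This series adds personal study notes and supplementary derivations to selected points of the original course. If you find mistakes, corrections are welcome. After studying Andrew Ng's course, I wrote it up as text so that it is easier to look up and review. Because I am still learning English, the series is mainly in English, and I recommend that readers also rely mainly on the English, with the Chinese as an aid, as preparation for reading academic papers in related fields later on. - ZJ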

Coursera course | deeplearning.ai | NetEase Cloud Classroom

When reposting, please credit the author and source: ZJ, WeChat official account "SelfImprovementLab"

Zhihu: https://zhuanlan.zhihu.com/c_147249273

CSDN: http://blog.csdn.net/junjun_zhao/article/details/79001761

3.8 Derivatives of activation functions

(Subtitle source: NetEase Cloud Classroom)

When you implement back-propagation for your neural network, you need to compute the slope, or derivative, of the activation functions. So let's take a look at our choices of activation functions and how you can compute the slope of each one. Here is the familiar sigmoid activation function. For any given value of $z$, the function has some slope, or derivative, at that point: if you draw a little tangent line there, the slope is the height over the width of the little triangle it forms. So if $g(z)$ is the sigmoid function, then the slope of the function is $\dfrac{d}{dz} g(z)$, which we know from calculus is the slope of $g$ at $z$.

If you take the derivative of the sigmoid function $g(z) = \dfrac{1}{1 + e^{-z}}$, it is possible to show that $\dfrac{d}{dz} g(z) = g(z)\,(1 - g(z))$. Again, I'm not going to do the calculus steps, but if you're familiar with calculus, feel free to pause the video and try to prove this yourself. Let's sanity check that this expression makes sense. First, if $z$ is very large, say $z = 10$, then $g(z)$ will be close to 1, so the formula tells us that $\dfrac{d}{dz} g(z)$ must be close to $1 \cdot (1 - 1)$, which is very close to 0. This is indeed correct, because when $z$ is very large the slope is close to 0. Conversely, if $z = -10$, way out on the left, then $g(z)$ is close to 0, so the formula tells us that $\dfrac{d}{dz} g(z)$ will be close to $0 \cdot (1 - 0)$, which is also very close to 0, as it should be. Finally, at $z = 0$, $g(z) = 1/2$; that is the midpoint of the sigmoid curve.
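For readers who want the calculus steps that the lecture skips, one possible derivation is sketched here. It assumes the standard definition of the sigmoid, $g(z) = \dfrac{1}{1 + e^{-z}}$, and uses only the chain rule.

$$
\frac{d}{dz} g(z) = \frac{d}{dz}\left(1 + e^{-z}\right)^{-1} = \frac{e^{-z}}{\left(1 + e^{-z}\right)^{2}} = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}}
$$

$$
1 - g(z) = \frac{e^{-z}}{1 + e^{-z}} \quad\Longrightarrow\quad \frac{d}{dz} g(z) = g(z)\,\bigl(1 - g(z)\bigr)
$$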

At $z = 0$, the derivative of the sigmoid is therefore equal to $\frac{1}{2} \cdot (1 - \frac{1}{2}) = \frac{1}{4}$, and that turns out to be the correct value of the derivative, or slope, of this function at $z = 0$. Finally, just to introduce one more piece of notation: instead of writing $\dfrac{d}{dz} g(z)$, the shorthand for the derivative is $g'(z)$, read "g prime of z". The little dash on top is called prime, and $g'(z)$ is calculus shorthand for the derivative of the function $g$ with respect to its input variable $z$. In a neural network we have $a = g(z)$, so this formula also simplifies to $a\,(1 - a)$. So sometimes in an implementation you might see something like $g'(z) = a\,(1 - a)$, which just refers to the observation that the derivative $g'(z)$ is equal to $g(z)\,(1 - g(z))$. The advantage of this formula is that if you have already computed the value of $a$, you can very quickly compute the slope $g'(z)$ from it. So that was the sigmoid activation function.
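The shortcut $g'(z) = a\,(1 - a)$ can be illustrated with a small sketch. This is a scalar version using only the Python standard library, not code from the course; the function names `sigmoid` and `sigmoid_prime_from_a` are hypothetical and chosen only for illustration. It repeats the sanity checks at $z = -10$, $0$ and $10$.

```python
import math


def sigmoid(z):
    """Sigmoid activation g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime_from_a(a):
    """Slope g'(z), written in terms of the already computed a = g(z)."""
    return a * (1.0 - a)


# Sanity checks from the text: the slope is near 0 at the tails
# and equals 1/4 at z = 0.
for z in (-10.0, 0.0, 10.0):
    a = sigmoid(z)  # forward value, computed once
    print(z, a, sigmoid_prime_from_a(a))  # slope reuses a, no second exp
```

Passing `a` rather than `z` mirrors the point made above: once the forward value is available, the slope costs only one multiplication.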

Let's now look at the tanh activation function. Similar to what we had previously, $\dfrac{d}{dz} g(z)$ is the slope of $g(z)$ at a particular point $z$. If you look at the formula for the hyperbolic tangent function, $g(z) = \tanh(z) = \dfrac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$, and if you know calculus, you can take derivatives and show that this simplifies to $1 - (\tanh(z))^2$. Using the shorthand from before, we call this $g'(z)$. Again, if you want, you can sanity check that this formula makes sense. For example, if $z = 10$, $\tanh(z)$ will be very close to 1 (the function ranges from $-1$ to $+1$), and $g'(z)$ according to this formula will be about $1 - 1^2$, so $g'(z)$ is close to 0. So when $z$ is very large, the slope is close to 0. Conversely, if $z$ is very negative, say $z = -10$, then $\tanh(z)$ will be close to $-1$, so $g'(z)$ will be close to $1 - (-1)^2 = 1 - 1$, which is also close to 0. Finally, if $z = 0$, then $\tanh(z) = 0$ and the formula gives a slope of 1, which is indeed the slope of tanh at $z = 0$. So, to summarize: if $a = g(z) = \tanh(z)$, then the derivative $g'(z)$ is equal to $1 - a^2$. Once again, if you have already computed the value of $a$, you can use this formula to very quickly compute the derivative as well.
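One way to convince yourself of the $1 - a^2$ formula without doing the calculus is to compare it against a numerical slope. The sketch below is an illustration rather than course code: `tanh_prime_from_a` and `central_difference` are hypothetical helper names, and the step size `h` is an arbitrary small value chosen for this example.

```python
import math


def tanh_prime_from_a(a):
    """Slope g'(z) of tanh, written in terms of a = tanh(z)."""
    return 1.0 - a * a


def central_difference(f, z, h=1e-5):
    """Numerical slope of f at z: rise over run across a small interval."""
    return (f(z + h) - f(z - h)) / (2.0 * h)


for z in (-10.0, 0.0, 0.5, 10.0):
    a = math.tanh(z)
    analytic = tanh_prime_from_a(a)
    numeric = central_difference(math.tanh, z)
    print(z, analytic, numeric)  # the two columns should agree closely
```

The numerical slope is the "height over width of a little triangle" idea made literal, so it serves as an independent check on the closed-form expression.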

Finally, here is how you compute the derivatives for the ReLU and leaky ReLU activation functions. For ReLU, $g(z) = \max(0, z)$, so the derivative turns out to be 0 if $z$ is less than 0 and 1 if $z$ is greater than 0. It is technically undefined if $z$ is exactly 0. But if you are implementing this in software, you can set the derivative at $z = 0$ to be either 1 or 0. That might not be a hundred percent mathematically correct, but it works just fine, and it kind of doesn't matter which one you pick. If you are an expert in optimization: technically, $g'$ then becomes what is called a subgradient of the activation function $g(z)$, which is why gradient descent still works.
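The piecewise rule for ReLU can be written down directly. The following is a minimal scalar sketch, not the course's implementation; the parameter `at_zero` is a hypothetical addition that makes the choice at $z = 0$ explicit.

```python
def relu(z):
    """ReLU activation g(z) = max(0, z)."""
    return max(0.0, z)


def relu_prime(z, at_zero=0.0):
    """Slope of ReLU: 0 for z < 0, 1 for z > 0.

    At exactly z == 0 the derivative is undefined, so the caller picks
    the value to use there (0 or 1) via the hypothetical `at_zero` argument.
    """
    if z > 0:
        return 1.0
    if z < 0:
        return 0.0
    return at_zero


for z in (-2.0, 0.0, 3.0):
    print(z, relu(z), relu_prime(z))
```

Any value between 0 and 1 for `at_zero` is a valid subgradient at that point; 0 and 1 are simply the two choices mentioned in the text.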

For ReLU, you can also think of it this way: the chance of $z$ being exactly 0.000000 is so small that it almost doesn't matter what you set the derivative to when $z$ is equal to 0. So in practice, this is what people implement for the derivative of $g$. Finally, if you are training your own network with the leaky ReLU activation function, then $g(z) = \max(0.01z, z)$, and so $g'(z)$ is equal to 0.01 if $z$ is less than 0 and 1 if $z$ is greater than 0. Once again, the gradient is technically not defined when $z$ is exactly 0, but you can implement a piece of code that sets $g'(z)$ to either 0.01 or 1 there. Either way it doesn't really matter, and your code will work just fine. Armed with these formulas, you should be able to compute the slopes, or derivatives, of your activation functions. Now that we have these building blocks, you are ready to see how to implement gradient descent for your neural network. Let's go on to the next video to see that.
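The leaky ReLU case follows the same pattern. This sketch is again only illustrative: the names `slope` and `at_zero` are hypothetical parameters, with `slope=0.01` taken from the example value in the text, and the `max` form assumes `0 < slope < 1`.

```python
def leaky_relu(z, slope=0.01):
    """Leaky ReLU g(z) = max(slope * z, z), assuming 0 < slope < 1."""
    return max(slope * z, z)


def leaky_relu_prime(z, slope=0.01, at_zero=1.0):
    """Slope of leaky ReLU: `slope` for z < 0, 1 for z > 0.

    At exactly z == 0 the derivative is undefined; `at_zero` (hypothetical
    argument) is the value used there, typically `slope` or 1.
    """
    if z > 0:
        return 1.0
    if z < 0:
        return slope
    return at_zero


for z in (-2.0, 0.0, 3.0):
    print(z, leaky_relu(z), leaky_relu_prime(z))
```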