Coursera | Andrew Ng (01-week-3-3.6): Activation Functions

This series only adds personal study notes and supplementary derivations to selected points of the original course; corrections and criticism are welcome. Having studied Andrew Ng's course, I organized it into text to make review easier. Because I have been studying English, the series is primarily in English with Chinese as a supplement, which I also recommend to readers as preparation for reading academic papers in related fields later on. - ZJ

Coursera course | deeplearning.ai | NetEase Cloud Classroom (网易云课堂)


Please credit the author and source when reposting: ZJ, WeChat official account 「SelfImprovementLab」

Zhihu: https://zhuanlan.zhihu.com/c_147249273

CSDN: http://blog.csdn.net/junjun_zhao/article/details/78998236


3.6 Activation Functions

(Subtitle source: NetEase Cloud Classroom 网易云课堂)


When you build a neural network, one of the choices you get to make is which activation function to use in the hidden layers, as well as at the output unit of your neural network. So far we have just been using the sigmoid activation function, but sometimes other choices can work much better, so let's take a look at some of the options. In the forward-propagation steps for the neural network, we have two steps where we use the sigmoid function; that sigmoid is called an activation function, and it is the familiar $a = \frac{1}{1 + e^{-z}}$. In the more general case, we can use a different function $g(z)$, where $g$ can be a nonlinear function that need not be the sigmoid function.
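As a concrete, hedged illustration of these two steps, here is a minimal NumPy sketch (not code from the course) with a pluggable activation $g$; the layer sizes and variable names are my own assumptions:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: a = 1 / (1 + e^(-z)), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(A_prev, W, b, g):
    """One forward-propagation step: Z = W @ A_prev + b, then A = g(Z)."""
    Z = W @ A_prev + b
    return g(Z)

# Tiny example: 2 input features, 3 hidden units, 1 training example (shapes assumed).
rng = np.random.default_rng(0)
A0 = rng.standard_normal((2, 1))
W1, b1 = rng.standard_normal((3, 2)), np.zeros((3, 1))
A1 = layer_forward(A0, W1, b1, sigmoid)  # g is pluggable: any nonlinearity can be passed here
```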

For example, the sigmoid function goes between 0 and 1. An activation function that almost always works better than the sigmoid function is the tanh function, or hyperbolic tangent function: $a = \tanh(z)$, which goes between +1 and -1. The formula is $\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$, and mathematically it is a shifted version of the sigmoid function: take the sigmoid, shift it so that it crosses the origin, and rescale it so that it goes from -1 to +1.
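The "shifted and rescaled sigmoid" claim can be checked directly from the definitions, writing $\sigma(z) = \frac{1}{1 + e^{-z}}$:

$$
\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} = \frac{1 - e^{-2z}}{1 + e^{-2z}} = \frac{2}{1 + e^{-2z}} - 1 = 2\,\sigma(2z) - 1
$$

So tanh is the sigmoid squeezed by a factor of 2 along the input axis, stretched by 2 along the output axis, and shifted down by 1, which is why it crosses the origin and runs from -1 to +1.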


It turns out that for hidden units, if you let the activation function $g(z)$ be $\tanh(z)$, it almost always works better than the sigmoid function, because with values between +1 and -1 the mean of the activations coming out of the hidden layer is closer to zero. Just as you sometimes center your data so that it has zero mean when training a learning algorithm, using tanh instead of sigmoid has the effect of centering the data, so that the mean of the activations is close to 0 rather than 0.5, and this actually makes learning for the next layer a little bit easier. We will say more about this in the second course, when we talk about optimization algorithms. One takeaway is that I pretty much never use the sigmoid activation function anymore; the tanh function is almost always strictly superior.
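A quick numeric check of this centering effect (my own illustration; the pre-activations here are just standard-normal samples, not outputs of a trained network):

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal(100_000)                    # roughly zero-mean pre-activations

sigmoid_mean = (1.0 / (1.0 + np.exp(-z))).mean()    # sigmoid outputs cluster around 0.5
tanh_mean = np.tanh(z).mean()                       # tanh outputs cluster around 0.0

print(f"mean sigmoid activation: {sigmoid_mean:.3f}")   # close to 0.5
print(f"mean tanh activation:    {tanh_mean:.3f}")      # close to 0.0
```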

The one exception is the output layer, because if $y$ is either 0 or 1, then it makes sense for $\hat{y}$ to be a number between 0 and 1 rather than between -1 and 1. So the one exception where I would use the sigmoid activation function is binary classification, where you might use sigmoid for the output layer, i.e. $g(z^{[2]}) = \sigma(z^{[2]})$. In this example you might have a tanh activation function for the hidden layer and sigmoid for the output layer, so the activation functions can be different for different layers. To denote that, we use square-bracket superscripts: $g^{[1]}$ may be different from $g^{[2]}$, where superscript $[1]$ refers to the hidden layer and superscript $[2]$ refers to the output layer.
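Putting the two choices together, here is a hedged sketch of a two-layer forward pass with $g^{[1]} = \tanh$ and $g^{[2]} = \sigma$ (the shapes, initialization scale, and names are assumptions made for the example):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Hidden layer uses tanh (g[1]); output layer uses sigmoid (g[2])."""
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)      # g[1] = tanh, hidden activations in (-1, 1)
    Z2 = W2 @ A1 + b2
    A2 = sigmoid(Z2)      # g[2] = sigmoid, y_hat in (0, 1) for binary classification
    return A2

rng = np.random.default_rng(2)
X = rng.standard_normal((2, 5))                                # 2 features, 5 examples
W1, b1 = 0.01 * rng.standard_normal((4, 2)), np.zeros((4, 1))
W2, b2 = 0.01 * rng.standard_normal((1, 4)), np.zeros((1, 1))
y_hat = forward(X, W1, b1, W2, b2)                             # shape (1, 5), values in (0, 1)
```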


One downside of both the sigmoid function and the tanh function is that if $z$ is very large or very small, then the derivative, the slope of the function, becomes very close to zero, and this can slow down gradient descent. So one choice that is very popular in machine learning is the rectified linear unit (ReLU). The ReLU function is $a = \max(0, z)$: the derivative is 1 as long as $z$ is positive, and the derivative (slope) is 0 when $z$ is negative. Technically, the derivative when $z$ is exactly 0 is not well defined, but when you implement this on a computer, the odds that you get $z$ exactly equal to 0 are very small.
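A minimal sketch of the ReLU and its slope; as the next paragraph notes, the value assigned to the derivative at exactly $z = 0$ is an arbitrary convention (here it is 0):

```python
import numpy as np

def relu(z):
    """ReLU: a = max(0, z), applied elementwise."""
    return np.maximum(0, z)

def relu_derivative(z):
    """Slope is 1 for z > 0 and 0 for z < 0; at z == 0 we arbitrarily return 0,
    which is fine in practice because landing exactly on 0 is vanishingly unlikely."""
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))             # [0.0, 0.0, 0.0, 0.5, 2.0]
print(relu_derivative(z))  # [0.0, 0.0, 0.0, 1.0, 1.0]
```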


So you don't need to worry about this in practice: when $z$ is exactly 0, you can pretend the derivative is either 1 or 0 and it works just fine, even though, strictly speaking, the function is not differentiable there. Here are some rules of thumb for choosing activation functions. If your output is a 0/1 value, that is, if you are doing binary classification, then the sigmoid activation function is a very natural choice for the output layer. For all other units, the ReLU, or rectified linear unit, is increasingly the default choice of activation function.

So if you are not sure what to use for your hidden layer, I would just use the ReLU activation function; that is what you see most people using these days, although sometimes people also use the tanh activation function. One disadvantage of the ReLU is that the derivative is equal to zero when $z$ is negative. In practice this works just fine, but there is another version called the leaky ReLU (the formula appears on a later slide): instead of being zero when $z$ is negative, it takes a slight slope. The leaky ReLU usually works better than the ReLU activation function, although it is just not used as much in practice. Either one should be fine.

If you had to pick one, I usually just use the ReLU. The advantage of both the ReLU and the leaky ReLU is that for much of the space of $z$, the derivative of the activation function, its slope, is far from zero. So in practice, using the ReLU activation function your neural network will often learn much faster than with the tanh or sigmoid activation function, mainly because there is less of the effect where the slope of the function goes to zero and slows down learning. For half of the range of $z$ the slope of the ReLU is zero, but in practice enough of your hidden units will have $z$ greater than zero, so learning can still be quite fast for most training examples.


Let's quickly recap the pros and cons of the different activation functions. The sigmoid activation function: I would say never use it except for the output layer when you are doing binary classification, or maybe almost never use it, because the tanh is pretty much strictly superior. The tanh activation function is the second option. Then the default, most commonly used activation function is the ReLU; if you are not sure what else to use, use this one. And feel free also to try the leaky ReLU, $a = \max(0.01z, z)$, i.e. $a$ is the maximum of $0.01z$ and $z$, which gives the function a slight bend on the negative side.
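A hedged sketch of the leaky ReLU with the 0.01 negative-side slope from the recap above (the parameter name is my own):

```python
import numpy as np

def leaky_relu(z, negative_slope=0.01):
    """Leaky ReLU: a = max(negative_slope * z, z), applied elementwise."""
    return np.maximum(negative_slope * z, z)

def leaky_relu_derivative(z, negative_slope=0.01):
    """Slope is 1 for z > 0 and negative_slope for z <= 0, so it never vanishes."""
    return np.where(z > 0, 1.0, negative_slope)

z = np.array([-2.0, -0.5, 0.5, 2.0])
print(leaky_relu(z))             # approximately [-0.02, -0.005, 0.5, 2.0]
print(leaky_relu_derivative(z))  # [0.01, 0.01, 1.0, 1.0]
```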


You might ask: why is that constant 0.01? Well, you can also make it another parameter of the learning algorithm (a sketch of that idea follows this paragraph), and some people say that works even better, but I hardly ever see people do that. If you feel like trying it in your application, please feel free to do so; see how well it works, and stick with it if it gives you a good result. I hope that gives you a sense of the choices of activation functions you can use in your network. One of the themes we will see in deep learning is that you often have a lot of different choices in how you build your neural network, ranging from the number of hidden units, to the choice of activation function, to how you initialize the weights (which we will see later), and it turns out that it is sometimes difficult to get good guidelines for exactly what will work best for your problem.
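Making that constant learnable is usually called a parametric ReLU (PReLU). This is a minimal sketch of the idea, not something covered in the course; the names and the scalar-slope choice are my own assumptions:

```python
import numpy as np

def prelu(z, alpha):
    """Parametric ReLU: the negative-side slope alpha is learned instead of fixed at 0.01."""
    return np.where(z > 0, z, alpha * z)

def prelu_backward(z, alpha, dA):
    """Backprop through PReLU: gradients w.r.t. the pre-activation z and the slope alpha."""
    dZ = dA * np.where(z > 0, 1.0, alpha)
    dalpha = np.sum(dA * np.where(z > 0, 0.0, z))  # alpha only affects the negative side
    return dZ, dalpha

z = np.array([-1.5, 0.3, -0.2])
dZ, dalpha = prelu_backward(z, alpha=0.05, dA=np.ones_like(z))
# dalpha would then feed a gradient step such as: alpha -= learning_rate * dalpha
```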

Throughout these three courses I will keep giving you a sense of what I see in the industry, in terms of what is more or less popular, but for your application, with its own idiosyncrasies, it is very difficult to know in advance exactly what will work best. So a piece of advice: if you are not sure which of these activation functions works best, try them all, evaluate on a holdout validation set or a development set (which we will talk about later), see which one works better, and go with that (a code sketch of this loop follows this paragraph). I think that by testing these different choices for your application, you will be better at future-proofing your neural network architecture against the idiosyncrasies of your problem, as well as against evolutions of the algorithms, compared with being told to always use a ReLU activation and nothing else, which may or may not apply to whatever problem you end up working on in the near or distant future.
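One way to follow this advice in code is to treat the hidden-layer activation as a hyperparameter and compare dev-set scores; the `train_and_score` callback below is hypothetical, standing in for whatever training loop you already have:

```python
import numpy as np

CANDIDATE_ACTIVATIONS = {
    "tanh": np.tanh,
    "relu": lambda z: np.maximum(0, z),
    "leaky_relu": lambda z: np.maximum(0.01 * z, z),
}

def pick_activation(X_train, y_train, X_dev, y_dev, train_and_score):
    """Train one model per candidate activation and keep the best dev-set score.

    `train_and_score` is a hypothetical callback:
    (X_train, y_train, X_dev, y_dev, activation) -> dev-set accuracy.
    """
    scores = {name: train_and_score(X_train, y_train, X_dev, y_dev, g)
              for name, g in CANDIDATE_ACTIVATIONS.items()}
    best = max(scores, key=scores.get)
    return best, scores
```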

All right, so that was the choice of activation functions; you have seen the most popular ones. There is one other question that is sometimes asked: why do you even need an activation function at all? Why not just do away with it? Let's talk about that in the next video, where you will see why neural networks do need some sort of nonlinear activation function.


Key takeaways:

Choosing an activation function

Several different activation functions $g(z)$ (a consolidated code sketch follows the list below):


Sigmoid: $a = \dfrac{1}{1 + e^{-z}}$

  • Derivative: $a' = a(1 - a)$

tanh: $a = \dfrac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$

  • Derivative: $a' = 1 - a^{2}$

ReLU (rectified linear unit): $a = \max(0, z)$

Leaky ReLU: $a = \max(0.01z, z)$
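A consolidated sketch of these activations and the derivatives listed above, written in terms of the activation value $a$ where the formula allows it (for the tanh activation itself, `np.tanh` can be used directly):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime_from_a(a):
    return a * (1.0 - a)          # a' = a(1 - a)

def tanh_prime_from_a(a):
    return 1.0 - a ** 2           # a' = 1 - a^2

def relu(z):
    return np.maximum(0, z)

def relu_prime(z):
    return (z > 0).astype(float)  # slope 1 for z > 0, else 0

def leaky_relu(z):
    return np.maximum(0.01 * z, z)

def leaky_relu_prime(z):
    return np.where(z > 0, 1.0, 0.01)
```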

Choosing an activation function:

Comparing the sigmoid and tanh functions:

Hidden layers: tanh performs better than sigmoid, because tanh's range is [-1, +1] and its outputs are distributed around 0 with zero mean, so the data passed from the hidden layer onward is effectively normalized (zero mean).

Output layer: for binary classification the targets take values in {0, 1}, so sigmoid is usually chosen.
However, when $|z|$ is large, the gradients of both sigmoid and tanh become very small, so gradient-based updates slow down late in training. In practice, try to keep $|z|$ close to 0.

ReLU remedies this shortcoming of the previous two: when $z > 0$ the gradient is always 1, which speeds up gradient-based training of the network. When $z < 0$ the gradient is 0, but in practice this has little impact.

Leaky ReLU keeps the gradient nonzero even when $z < 0$.

When choosing an activation function, pick ReLU if you do not know what else to use. Of course there is no fixed answer; validate the choice on a cross-validation (dev) set for your actual problem.

References:

[1] 大树先生. 吴恩达 Coursera 深度学习课程 DeepLearning.ai 提炼笔记 (1-3) - 浅层神经网络 (Andrew Ng's Coursera Deep Learning course, DeepLearning.ai distilled notes (1-3): Shallow Neural Networks).


PS: You are welcome to scan the QR code and follow the official account 「SelfImprovementLab」, which focuses on deep learning, machine learning, and artificial intelligence, and also runs occasional check-in groups for early rising, reading, exercise, English, and more.
