Deep Learning book notes: MLP

1. The relationship between feedforward neural networks and RNNs:

--There are no feedback connections in which outputs of the model are fed back into itself. When feedforward neural networks are extended to include feedback connections, they are called recurrent neural networks, presented in chapter 10.
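
A minimal sketch of that distinction (the names and shapes here are illustrative, not from the book): a feedforward layer maps the current input straight to an output, while a recurrent layer feeds its own previous output back in at the next step.

```python
import numpy as np

rng = np.random.default_rng(0)
W_in, W_h = rng.normal(size=(4, 3)), rng.normal(size=(4, 4))

def feedforward_step(x):
    # Output depends only on the current input x.
    return np.tanh(W_in @ x)

def recurrent_step(x, h_prev):
    # Output also depends on the layer's previous output h_prev:
    # the feedback connection that makes the network recurrent.
    return np.tanh(W_in @ x + W_h @ h_prev)

x_seq = rng.normal(size=(5, 3))
h = np.zeros(4)
for x in x_seq:
    h = recurrent_step(x, h)   # state is carried across time steps
```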

2. How to understand the current research direction of feedforward neural networks:

--However, modern neural network research is guided by many mathematical and engineering disciplines, and the goal of neural networks is not to perfectly model the brain. It is best to think of feedforward networks as function approximation machines that are designed to achieve statistical generalization, occasionally drawing some insights from what we know about the brain, rather than as models of brain function.

3. A recurring theme throughout neural network design:

--One recurring theme throughout neural network design is that the gradient of the cost function must be large and predictable enough to serve as a good guide for the learning algorithm. Functions that saturate (become very flat) undermine this objective because they make the gradient become very small. In many cases this happens because the activation functions used to produce the output of the hidden units or the output units saturate. The negative log-likelihood helps to avoid this problem for many models. Many output units involve an exp function that can saturate when its argument is very negative. The log function in the negative log-likelihood cost function undoes the exp of some output units.
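
A small numeric check of that point (a sketch assuming a sigmoid output unit and a binary target, not code from the book): for the negative log-likelihood, the gradient with respect to the pre-activation z is sigmoid(z) - y, so the log has cancelled the exp inside the sigmoid and the gradient stays large when the unit is confidently wrong, whereas the squared-error gradient picks up an extra sigmoid'(z) factor and saturates.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y = 1.0       # true label
z = -10.0     # pre-activation: the unit is confidently wrong

p = sigmoid(z)

# d/dz of the negative log-likelihood -[y*log(p) + (1-y)*log(1-p)]:
grad_nll = p - y                      # ~ -1.0: still a strong learning signal

# d/dz of the squared error 0.5*(p - y)**2:
grad_mse = (p - y) * p * (1.0 - p)    # ~ -4.5e-5: the gradient has saturated

print(grad_nll, grad_mse)
```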

4. The relationship between the choice of cost function and the choice of output unit:

--The choice of cost function is tightly coupled with the choice of output unit. Most of the time, we simply use the cross-entropy between the data distribution and the model distribution. The choice of how to represent the output then determines the form of the cross-entropy function.
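
A brief illustration of that coupling (a sketch, not the book's code): with a sigmoid output unit the model distribution is Bernoulli and the cross-entropy becomes the binary log loss, while with a softmax output unit it is categorical and the cross-entropy becomes the usual multi-class log loss.

```python
import numpy as np

def binary_cross_entropy(y, z):
    # Sigmoid output unit -> Bernoulli model distribution.
    p = 1.0 / (1.0 + np.exp(-z))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def categorical_cross_entropy(y_index, z):
    # Softmax output unit -> categorical (multinoulli) model distribution.
    z = z - np.max(z)                            # for numerical stability
    log_softmax = z - np.log(np.sum(np.exp(z)))
    return -log_softmax[y_index]

print(binary_cross_entropy(1.0, 2.0))                          # scalar logit, binary label
print(categorical_cross_entropy(2, np.array([0.5, 1.0, 3.0]))) # logits, class index
```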

//2017/1/19

1. Rectified linear units and their three generalizations:

--Rectified linear units use the activation function g(z) = max{0, z}.

--Three generalizations of rectified linear units are based on using a non-zero slope α_i when z_i < 0: h_i = g(z, α)_i = max(0, z_i) + α_i min(0, z_i). Absolute value rectification fixes α_i = −1 to obtain g(z) = |z|. It is used for object recognition from images (Jarrett et al., 2009), where it makes sense to seek features that are invariant under a polarity reversal of the input illumination. Other generalizations of rectified linear units are more broadly applicable. A leaky ReLU (Maas et al., 2013) fixes α_i to a small value like 0.01, while a parametric ReLU or PReLU treats α_i as a learnable parameter (He et al., 2015).
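
A minimal sketch of the general form h_i = max(0, z_i) + α_i min(0, z_i) and its special cases (the array values are illustrative):

```python
import numpy as np

def generalized_relu(z, alpha):
    # h_i = max(0, z_i) + alpha_i * min(0, z_i)
    return np.maximum(0, z) + alpha * np.minimum(0, z)

z = np.array([-2.0, -0.5, 0.0, 1.5])

relu     = generalized_relu(z, alpha=0.0)    # g(z) = max{0, z}
abs_rect = generalized_relu(z, alpha=-1.0)   # absolute value rectification: g(z) = |z|
leaky    = generalized_relu(z, alpha=0.01)   # leaky ReLU: small fixed slope

alpha_learned = np.full_like(z, 0.25)        # PReLU: alpha_i is a learnable parameter
prelu = generalized_relu(z, alpha_learned)
```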

2. Another generalization of rectified linear units:

--Maxout units (Goodfellow et al., 2013a) generalize rectified linear units further. Instead of applying an element-wise function g(z), maxout units divide z into groups of k values. Each maxout unit then outputs the maximum element of one of these groups.

--A maxout unit can learn a piecewise linear, convex function with up to k pieces. Maxout units can thus be seen as learning the activation function itself rather than just the relationship between units. With large enough k, a maxout unit can learn to approximate any convex function with arbitrary fidelity. In particular, a maxout layer with two pieces can learn to implement the same function of the input x as a traditional layer using the rectified linear activation function, absolute value rectification function, or the leaky or parametric ReLU, or can learn to implement a totally different function altogether.
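
A minimal sketch of a maxout layer (shapes and names are illustrative): the pre-activation vector z is split into groups of k values, and each maxout unit outputs the maximum of its group.

```python
import numpy as np

def maxout(z, k):
    # z has length (num_units * k); each unit takes the max over its group of k values.
    return z.reshape(-1, k).max(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=3)
k, num_units = 2, 4
W = rng.normal(size=(num_units * k, 3))   # k affine pieces per maxout unit
b = rng.normal(size=num_units * k)

h = maxout(W @ x + b, k)   # with k=2 pieces this can emulate ReLU, |z|, leaky/PReLU, etc.
```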

//2017/1/20

1. Correctly understanding the term back-propagation:

--The term back-propagation is often misunderstood as meaning the whole learning algorithm for multi-layer neural networks. Actually, back-propagation refers only to the method for computing the gradient, while another algorithm, such as stochastic gradient descent, is used to perform learning using this gradient. Furthermore, back-propagation is often misunderstood as being specific to multi-layer neural networks, but in principle it can compute derivatives of any function.
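
A tiny sketch of that separation (illustrative, not the book's code): back-propagation produces the gradient of the loss for a one-hidden-layer network, and a separate stochastic gradient descent step consumes that gradient to update the parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3)) * 0.1
W2 = rng.normal(size=(1, 4)) * 0.1

def forward(x):
    h = np.maximum(0, W1 @ x)   # ReLU hidden layer
    return h, W2 @ h            # linear output

def backprop(x, y):
    # Back-propagation: only computes gradients of the loss 0.5*(yhat - y)^2.
    h, yhat = forward(x)
    d_yhat = yhat - y                    # dL/dyhat
    dW2 = np.outer(d_yhat, h)            # dL/dW2
    d_h = (W2.T @ d_yhat) * (h > 0)      # chain rule through the ReLU
    dW1 = np.outer(d_h, x)               # dL/dW1
    return dW1, dW2

def sgd_step(x, y, lr=0.01):
    # The learning algorithm (SGD) is separate: it just consumes the gradients.
    global W1, W2
    dW1, dW2 = backprop(x, y)
    W1 -= lr * dW1
    W2 -= lr * dW2

sgd_step(rng.normal(size=3), np.array([1.0]))
```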
