Deep Learning (Goodfellow, Bengio, and Courville), notes part 1
Information theory
• Likely events should have low information content, and in the extreme case, events that are guaranteed to happen should have no information content whatsoever.
• Less likely events should have higher information content.
• Independent events should have additive information.
In order to satisfy all three of these properties, we define the self-information of an event X = x to be

I(x) = − log P(x).
The Shannon entropy of a distribution is the expected amount of information in an event drawn from that distribution:

H(X) = E_{x∼P}[I(x)] = −E_{x∼P}[log P(x)].
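A minimal sketch of these two definitions in NumPy (function names are my own, using natural logarithms, so information is measured in nats):

```python
import numpy as np

def self_information(p):
    # I(x) = -log P(x): a guaranteed event (p = 1) carries zero information.
    return -np.log(p)

def entropy(probs):
    # Shannon entropy H(P) = E[I(x)] = -sum_x P(x) log P(x).
    probs = np.asarray(probs, dtype=float)
    nz = probs[probs > 0]  # use the convention 0 * log 0 = 0
    return float(-np.sum(nz * np.log(nz)))
```

For example, `self_information(1.0)` is `0.0`, matching the first property above, and a fair coin has entropy `log 2`.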
The Kullback–Leibler (KL) divergence:

D_KL(P‖Q) = E_{x∼P}[log P(x) − log Q(x)].

Because the KL divergence is non-negative and measures the difference between two distributions, it is often conceptualized as measuring some sort of distance between these distributions. However, it is not a true distance measure because it is not symmetric: in general, D_KL(P‖Q) ≠ D_KL(Q‖P).
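The asymmetry is easy to check numerically. A small sketch (discrete distributions of my own choosing, both with full support so the formula is well defined):

```python
import numpy as np

def kl_divergence(p, q):
    # D_KL(P || Q) = sum_x P(x) * (log P(x) - log Q(x))
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p) - np.log(q))))

p = np.array([0.9, 0.1])  # a skewed coin
q = np.array([0.5, 0.5])  # a fair coin

# Both directions are non-negative, but they differ:
# kl_divergence(p, q) != kl_divergence(q, p)
```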
Rounding error:
• Underflow occurs when numbers near zero are rounded to zero.
• Overflow occurs when numbers with large magnitude are approximated as ∞ or −∞.
One example of a function that must be stabilized against underflow and overflow is the softmax function:

softmax(x)_i = exp(x_i) / Σ_j exp(x_j).
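The standard stabilization is to evaluate softmax(z) with z = x − max_i x_i, which leaves the result mathematically unchanged. A sketch of both versions:

```python
import numpy as np

def softmax_naive(x):
    # Overflows for large inputs: exp(1000) is already inf in float64.
    e = np.exp(x)
    return e / e.sum()

def softmax_stable(x):
    # Shifting by max(x) keeps the largest exponent at 0, preventing
    # overflow; the denominator then contains at least one term equal
    # to exp(0) = 1, so it cannot underflow to zero either.
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

x = np.array([1000.0, 1001.0])
# softmax_naive(x) produces nan (inf / inf); softmax_stable(x) is finite.
```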
Poor conditioning: conditioning refers to how rapidly a function changes with respect to small changes in its inputs. Functions that change rapidly are problematic because rounding errors in the inputs can produce large changes in the output.
Gradient-based optimization
Beyond the Gradient: Jacobian and Hessian Matrices
Equivalently, the Hessian is the Jacobian of the gradient.
Most of the functions we encounter in the context of deep learning have a symmetric
Hessian almost everywhere.
The (directional) second derivative tells us how well we can expect a gradient
descent step to perform.
We can make a second-order Taylor series approximation to the function f(x) around the current point x⁽⁰⁾:

f(x) ≈ f(x⁽⁰⁾) + (x − x⁽⁰⁾)ᵀ g + ½ (x − x⁽⁰⁾)ᵀ H (x − x⁽⁰⁾),

where g is the gradient and H is the Hessian at x⁽⁰⁾.
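As a sanity check on the expansion, here is a 1-D sketch (my own example function, f(x) = exp(x) around x⁽⁰⁾ = 0), where the gradient and Hessian reduce to the scalar first and second derivatives:

```python
import numpy as np

f = np.exp
x0 = 0.0
g = np.exp(x0)  # f'(x0)
h = np.exp(x0)  # f''(x0)

def taylor2(x):
    # f(x0) + g * (x - x0) + 0.5 * h * (x - x0)^2
    d = x - x0
    return f(x0) + g * d + 0.5 * h * d * d

# Near x0 the quadratic model tracks f closely; the error at
# x = 0.1 is on the order of the cubed step size.
```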
The second derivative can be used to determine whether a critical point is a local maximum, a local minimum, or a saddle point.
In more than one dimension, it is not necessary to have an eigenvalue
of 0 in order to get a saddle point: it is only necessary to have both positive and negative
eigenvalues.
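This second-derivative test can be sketched directly from the eigenvalues of the Hessian (a helper of my own; the "inconclusive" case covers zero eigenvalues, where the test fails):

```python
import numpy as np

def classify_critical_point(hessian, tol=1e-8):
    # Eigenvalues of the symmetric Hessian at a critical point:
    #   all positive -> local minimum
    #   all negative -> local maximum
    #   mixed signs  -> saddle point
    eig = np.linalg.eigvalsh(hessian)
    if np.all(eig > tol):
        return "local minimum"
    if np.all(eig < -tol):
        return "local maximum"
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle point"
    return "inconclusive"

# f(x, y) = x^2 - y^2 has Hessian diag(2, -2) everywhere:
# a saddle at the origin, with no zero eigenvalue needed.
H = np.array([[2.0, 0.0], [0.0, -2.0]])
```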
Constrained Optimization
We wish to find the maximal or minimal value of f(x) for values of x in some set S.
The Karush–Kuhn–Tucker (KKT) approach provides a very general solution to constrained optimization. With the KKT approach, we introduce a new function called the generalized Lagrangian or generalized Lagrange function (the method of Lagrange multipliers).
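Spelled out, with equality constraints g⁽ⁱ⁾(x) = 0 and inequality constraints h⁽ʲ⁾(x) ≤ 0, the generalized Lagrangian takes the form:

```latex
L(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\alpha})
  = f(\boldsymbol{x})
  + \sum_i \lambda_i\, g^{(i)}(\boldsymbol{x})
  + \sum_j \alpha_j\, h^{(j)}(\boldsymbol{x}),
\qquad \alpha_j \ge 0,
```

where the λᵢ and αⱼ are the KKT multipliers; constrained optima of f correspond to solutions of min_x max_λ max_{α≥0} L.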