《Deep Learning》 ("Flower Book") Study Notes 4 - Numerical Computation

Numerical Computation

1 Overflow and Underflow

overflow: when numbers with large magnitude are approximated as ∞ or −∞.
underflow: when numbers near zero are rounded to zero.

an example function that must be stabilized against underflow and overflow: the softmax function:

softmax(x)_i = exp(x_i) / Σ_j exp(x_j)

use softmax(z) instead of softmax(x), where z = x − max_i x_i:
subtracting max_i x_i makes the max argument to exp() equal to 0, so there is no possibility of overflow.
at least one term in the denominator equals exp(0) = 1, so there is no possibility of underflow in the denominator.
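The shift-by-max trick above can be sketched in NumPy (a minimal illustration, not the book's code; the function name `softmax` is my own):

```python
import numpy as np

def softmax(x):
    # Shift by the max so the largest argument to exp() is 0:
    # no exponent is large enough to overflow, and exp(0) = 1
    # guarantees the denominator never underflows to zero.
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

# A naive softmax would overflow here (exp(1000) = inf),
# but the shifted version behaves correctly.
print(softmax(np.array([1000.0, 1000.0])))  # [0.5 0.5]
```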

2 Poor Conditioning 病态条件

condition number: measures how rapidly the function changes with respect to small changes in its input; a poorly conditioned function amplifies rounding errors in its input into large changes in its output.
for matrix inversion f(x) = A⁻¹x, the condition number is max_{i,j} |λ_i / λ_j|, the ratio of the largest to smallest eigenvalue magnitudes.
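A quick way to see conditioning in practice is NumPy's built-in matrix condition number (by default the ratio of largest to smallest singular value); the matrices `well` and `ill` are my own toy examples:

```python
import numpy as np

well = np.eye(2)                        # identity: perfectly conditioned
ill = np.array([[1.0, 0.0],
                [0.0, 1e-10]])          # nearly singular: ill-conditioned

print(np.linalg.cond(well))  # 1.0
print(np.linalg.cond(ill))   # ~1e10: inverting `ill` amplifies input error
```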

3 Gradient-Based Optimization 基于梯度的优化方法

most optimization algorithms are phrased as minimization algorithms
objective function / criterion: the function we want to minimize or maximize
cost function / loss function / error function: common names for the objective function when we minimize it
gradient descent: reduce f by moving x in small steps with the opposite sign of the derivative

critical points / stationary points: points where the derivative f′(x) equals 0
these divide into three classes: local minimum / local maximum / saddle point

global minimum: a point that obtains the absolute lowest value of f(x)

For functions with multiple inputs:
partial derivative ∂f/∂x_i: measures how f changes as only the variable x_i increases
gradient ∇_x f(x): the vector containing all of the partial derivatives
directional derivative in direction u (a unit vector): the slope of the function f in direction u, given by u^T ∇_x f(x)
to minimize f, we want to find the direction in which f decreases the fastest:

min_{u, u^T u = 1} u^T ∇_x f(x) = min_u ‖u‖₂ ‖∇_x f(x)‖₂ cos θ

which is minimized when u points in the opposite direction of the gradient.
the method of steepest descent / gradient descent proposes a new point:

x′ = x − ε ∇_x f(x)

where epsilon (ε) is the learning rate, a positive scalar determining the size of the step
popular approaches to choose epsilon:
set it to a small constant / line search: evaluate f(x − ε ∇_x f(x)) for several values of ε and choose the one yielding the smallest objective value
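The gradient descent update can be sketched on a toy objective (my own example, using a fixed small constant for the learning rate):

```python
import numpy as np

# Minimize f(x) = 0.5 * ||x||^2, whose gradient is simply x,
# with global minimum at the origin.
def grad_f(x):
    return x

x = np.array([3.0, -4.0])
epsilon = 0.1                      # learning rate: small positive constant
for _ in range(100):
    x = x - epsilon * grad_f(x)    # x' = x - epsilon * grad f(x)

print(x)  # close to the global minimum at [0, 0]
```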

4 Beyond the gradient: Jacobian and Hessian Matrices

Jacobian matrix: for a function f : ℝ^m → ℝ^n, the matrix J ∈ ℝ^{n×m} containing all partial derivatives, J_{i,j} = ∂f(x)_i / ∂x_j
Hessian matrix:

H(f)(x)_{i,j} = ∂²f(x) / (∂x_i ∂x_j)

Equivalently, the Hessian is the Jacobian of the gradient.
Anywhere that the second partial derivatives are continuous, the differential operators are commutative, H_{i,j} = H_{j,i}, so the Hessian matrix is symmetric at such points.
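The symmetry of the Hessian can be checked numerically; below is a sketch using central finite differences (the function `hessian_fd` and the test function `f` are my own illustrative choices, not from the book):

```python
import numpy as np

# f(x, y) = x^2 * y + y^3; its second partials are continuous everywhere,
# so its Hessian should be symmetric.
def f(v):
    x, y = v
    return x**2 * y + y**3

def hessian_fd(f, v, h=1e-5):
    """Central finite-difference approximation of the Hessian of f at v."""
    n = len(v)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n); e_i[i] = h
            e_j = np.zeros(n); e_j[j] = h
            H[i, j] = (f(v + e_i + e_j) - f(v + e_i - e_j)
                       - f(v - e_i + e_j) + f(v - e_i - e_j)) / (4 * h * h)
    return H

H = hessian_fd(f, np.array([1.0, 2.0]))
# H[0, 1] and H[1, 0] both approximate d2f/(dx dy) = 2x = 2
print(H)
```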

5 Constrained Optimization
