dm pieces

stochastic gradient descent

The score (or score function, efficient score) is the gradient of the log-likelihood. If the observation is X and its likelihood is L(θ;X), then the score V can be found through the chain rule:

$$V \equiv V(\theta, X) = \frac{\partial}{\partial\theta} \log L(\theta; X) = \frac{1}{L(\theta; X)} \, \frac{\partial L(\theta; X)}{\partial\theta}$$
Thus the score V indicates the sensitivity of L(θ;X) (its derivative normalized by its value). Note that V is a function of θ and the observation X, so that, in general, it is not a statistic. However, in certain applications, such as the score test, the score is evaluated at a specific value of θ (such as a null-hypothesis value, or the maximum likelihood estimate of θ), in which case the result is a statistic.
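As a concrete illustration, here is a minimal sketch, assuming i.i.d. observations from a normal model with unknown mean μ and known variance (the model and all names below are illustrative choices, not from the text). It checks the analytic score against a numerical derivative of the log-likelihood:

```python
import numpy as np

def log_likelihood(mu, x, sigma=1.0):
    # log L(mu; x) for i.i.d. N(mu, sigma^2) observations x
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu) ** 2 / (2 * sigma**2))

def score(mu, x, sigma=1.0):
    # analytic score: d/dmu log L(mu; x) = sum(x - mu) / sigma^2
    return np.sum(x - mu) / sigma**2

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=100)

mu = 1.5
eps = 1e-6
numeric = (log_likelihood(mu + eps, x) - log_likelihood(mu - eps, x)) / (2 * eps)
print(score(mu, x), numeric)  # the two values agree to numerical precision
```

Evaluated at the maximum likelihood estimate (here, the sample mean of x), the analytic score above is exactly zero, which matches the remark that fixing θ turns the score into a statistic.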

gradient methods and more

stochastic gradient method
from gradient method

Gradient descent can be combined with a line search, finding the locally optimal step size γ on every iteration. Performing the line search can be time-consuming. Conversely, using a fixed small γ can yield poor convergence.
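As a sketch of this trade-off, the snippet below compares a fixed step size with a backtracking (Armijo) line search on an ill-conditioned quadratic; the objective, the backtracking parameters, and all names are illustrative assumptions rather than anything prescribed by the text:

```python
import numpy as np

def gd_fixed(grad, x0, gamma, n_iter=100):
    # plain gradient descent with a fixed step size gamma
    x = x0
    for _ in range(n_iter):
        x = x - gamma * grad(x)
    return x

def gd_backtracking(f, grad, x0, n_iter=100, beta=0.5, c=1e-4):
    # gradient descent with a backtracking (Armijo) line search:
    # start from gamma = 1 and shrink until sufficient decrease holds
    x = x0
    for _ in range(n_iter):
        g = grad(x)
        gamma = 1.0
        while f(x - gamma * g) > f(x) - c * gamma * (g @ g):
            gamma *= beta
        x = x - gamma * g
    return x

A = np.diag([1.0, 100.0])          # ill-conditioned quadratic f(x) = 0.5 x^T A x
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x0 = np.array([1.0, 1.0])

print(gd_fixed(grad, x0, gamma=0.005))   # fixed small gamma: slow in the flat direction
print(gd_backtracking(f, grad, x0))      # line search adapts gamma every iteration
```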

Methods based on Newton’s method and inversion of the Hessian using conjugate gradient techniques can be better alternatives. Generally, such methods converge in fewer iterations, but the cost of each iteration is higher. An example is the BFGS method, which consists of calculating on every step a matrix by which the gradient vector is multiplied to go in a “better” direction, combined with a more sophisticated line search algorithm to find the “best” value of γ. For extremely large problems, where computer-memory issues dominate, a limited-memory method such as L-BFGS should be used instead of BFGS or steepest descent.
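In practice these methods are usually called through a library rather than written by hand; for example, SciPy exposes both. A minimal sketch minimizing the Rosenbrock function (the test function is just an illustrative choice):

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.2, 1.0])

# BFGS maintains a dense approximation to the inverse Hessian
res_bfgs = minimize(rosen, x0, jac=rosen_der, method="BFGS")

# L-BFGS-B stores only a short history of updates, so memory stays
# linear in the problem size rather than quadratic
res_lbfgs = minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B")

print(res_bfgs.x, res_bfgs.nit)    # solution and iteration count
print(res_lbfgs.x, res_lbfgs.nit)
```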

Gradient descent can be viewed as Euler’s method applied to the ordinary differential equation x′(t) = −∇f(x(t)) of a gradient flow.
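To make the correspondence concrete: applying Euler’s method with step size γ to x′(t) = −∇f(x(t)) gives the update x ← x − γ∇f(x), which is exactly gradient descent. A minimal sketch with an assumed toy objective f(x) = ½‖x‖²:

```python
import numpy as np

def grad_f(x):
    # gradient of f(x) = 0.5 * ||x||^2, so the flow is x'(t) = -x(t)
    return x

x = np.array([1.0, -2.0])
gamma = 0.1   # Euler step size = gradient descent learning rate

for _ in range(5):
    # one Euler step for x'(t) = -grad f(x(t)),
    # identical to the gradient descent update x <- x - gamma * grad f(x)
    x = x - gamma * grad_f(x)
    print(x)
```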
