stochastic gradient descent
The score (or score function, efficient score) is the gradient of the log-likelihood with respect to the parameter θ. If the observation is X and its likelihood is L(θ; X), then the score V can be found through the chain rule:
V ≡ V(θ, X) = ∂/∂θ log L(θ; X) = (1/L(θ; X)) · ∂L(θ; X)/∂θ
Thus the score V indicates the sensitivity of L(θ; X) (its derivative normalized by its value). Note that V is a function of θ and the observation X, so that, in general, it is not a statistic. However, in certain applications, such as the score test, the score is evaluated at a specific value of θ (such as a null-hypothesis value, or at the maximum likelihood estimate of θ), in which case the result is a statistic.
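To make the definition concrete, here is a minimal sketch (assuming a normal model N(θ, 1) with known variance, an illustrative example not taken from the text) that compares the analytic score x − θ with a central-difference approximation of the log-likelihood gradient:

```python
import numpy as np

# Sketch (assumed example): for a single observation x from N(theta, 1),
#   log L(theta; x) = -0.5*log(2*pi) - 0.5*(x - theta)**2,
# so the analytic score is d/dtheta log L = x - theta.

def log_likelihood(theta, x):
    return -0.5 * np.log(2 * np.pi) - 0.5 * (x - theta) ** 2

def score(theta, x):
    # analytic derivative of the log-likelihood with respect to theta
    return x - theta

x, theta, h = 1.3, 0.5, 1e-6
numeric = (log_likelihood(theta + h, x) - log_likelihood(theta - h, x)) / (2 * h)
print(score(theta, x), numeric)  # both ~0.8: the score matches the numerical gradient
```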
gradient methods and more
stochastic gradient method (from gradient method)
Gradient descent can be combined with a line search, finding the locally optimal step size γ on every iteration. Performing the line search can be time-consuming; conversely, using a fixed small γ can yield poor convergence. A sketch of one common line-search strategy follows.
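As a minimal sketch (an illustrative example, not from the text), gradient descent with a backtracking (Armijo) line search picks γ on each iteration by shrinking it until a sufficient-decrease condition holds:

```python
import numpy as np

def grad_descent(f, grad, x0, iters=100, c=1e-4, shrink=0.5):
    # Gradient descent with backtracking (Armijo) line search.
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        gamma = 1.0
        # backtrack until the Armijo sufficient-decrease condition holds:
        #   f(x - gamma*g) <= f(x) - c * gamma * ||g||^2
        while f(x - gamma * g) > f(x) - c * gamma * (g @ g):
            gamma *= shrink
        x = x - gamma * g
    return x

# quadratic test function f(x) = x^T A x with minimum at the origin
A = np.array([[3.0, 0.0], [0.0, 1.0]])
f = lambda x: x @ A @ x
grad = lambda x: 2 * A @ x
print(grad_descent(f, grad, [2.0, -1.5]))  # approaches [0, 0]
```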
Methods based on Newton’s method and inversion of the Hessian using conjugate gradient techniques can be better alternatives. Generally, such methods converge in fewer iterations, but the cost of each iteration is higher. An example is the BFGS method, which consists of calculating at every step a matrix by which the gradient vector is multiplied to go in a “better” direction, combined with a more sophisticated line search algorithm to find the “best” value of γ. For extremely large problems, where memory constraints dominate, a limited-memory method such as L-BFGS should be used instead of BFGS or steepest descent.
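As a brief sketch (assuming SciPy is available; the Rosenbrock test function used here is a standard example, not something from the text), SciPy's "L-BFGS-B" method implements the limited-memory variant mentioned above:

```python
from scipy.optimize import minimize, rosen, rosen_der

# Minimize the Rosenbrock function with limited-memory BFGS.
x0 = [-1.2, 1.0]
res = minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B")
print(res.x)    # close to the minimizer [1.0, 1.0]
print(res.nit)  # typically far fewer iterations than plain gradient descent
```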
Gradient descent can be viewed as Euler’s method for solving the ordinary differential equation x′(t) = −∇f(x(t)) of a gradient flow.
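A minimal sketch of this correspondence (an illustrative example, not from the text): applying forward Euler with step h to the gradient flow reproduces the gradient descent update x_{k+1} = x_k − h∇f(x_k) exactly:

```python
# Forward Euler on x'(t) = -grad f(x(t)) versus gradient descent, for f(x) = x^2.
grad = lambda x: 2 * x  # gradient of f(x) = x^2
h = 0.1                 # Euler step size == gradient descent step size

x_euler = x_gd = 3.0
for _ in range(5):
    x_euler = x_euler + h * (-grad(x_euler))  # Euler step for the ODE
    x_gd = x_gd - h * grad(x_gd)              # gradient descent step
    print(x_euler, x_gd)                      # identical iterates
```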