A review of gradient descent optimization methods

Suppose we want to optimize a parameterized function \(J(\theta)\), where \(\theta \in \mathbb{R}^d\); for example, \(\theta\) could be the parameters of a neural network.

More specifically, we want to minimize \(J(\theta; \mathcal{D})\) on a dataset \(\mathcal{D}\), where each point in \(\mathcal{D}\) is a pair \((x_i, y_i)\).

There are different ways to apply gradient descent.

Let \(\eta\) be the learning rate.

  1. Vanilla batch update
    \(\theta \gets \theta - \eta \nabla J(\theta; \mathcal{D})\)
    Note that \(\nabla J(\theta; \mathcal{D})\) computes the gradient over the whole dataset \(\mathcal{D}\).
    for i in range(n_epochs):
        # one gradient over all of D per parameter update
        gradient = compute_gradient(J, theta, D)
        theta = theta - eta * gradient
        eta = eta * 0.95  # decay the learning rate after each epoch

Clearly, when \(\mathcal{D}\) is large this approach becomes infeasible: every single update requires computing the gradient over the entire dataset.
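A minimal runnable sketch of the batch update, assuming \(J\) is the mean-squared-error loss of a linear model on a small synthetic dataset; the data, the loss, and the hyper-parameters below are illustrative choices rather than anything prescribed above.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))                  # 200 examples, d = 3 features
    true_theta = np.array([2.0, -1.0, 0.5])
    y = X @ true_theta + 0.1 * rng.normal(size=200)

    def compute_gradient(theta, X, y):
        # gradient of J(theta; D) = mean_i (x_i . theta - y_i)^2 over the whole dataset
        return 2.0 * X.T @ (X @ theta - y) / len(y)

    theta = np.zeros(3)
    eta = 0.1
    for i in range(100):                           # n_epochs = 100
        gradient = compute_gradient(theta, X, y)   # one full pass over D per update
        theta = theta - eta * gradient
        eta = eta * 0.95
    print(theta)                                   # roughly recovers true_theta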

  2. Stochastic Gradient Descent
    Stochastic gradient descent, on the other hand, updates the parameters one example at a time:
    \(\theta \gets \theta - \eta \nabla J(\theta; x_i, y_i)\), where \((x_i, y_i) \in \mathcal{D}\).
    for n in range(n_epochs):
        for x_i, y_i in D:  # in practice D is shuffled before each epoch
            gradient = compute_gradient(J, theta, x_i, y_i)
            theta = theta - eta * gradient
        eta = eta * 0.95
  3. Mini-batch Stochastic Gradient Descent
    Updating \(\theta\) one example at a time can lead to high-variance updates; an alternative is to update \(\theta\) on mini-batches \(M\) with \(|M| \ll |\mathcal{D}|\), i.e. \(\theta \gets \theta - \eta \nabla J(\theta; M)\) (a runnable sketch of this and the previous variant follows this list).
    for n in range(n_epochs):
        for M in D:  # here D is assumed to be pre-split into mini-batches M
            gradient = compute_gradient(J, theta, M)
            theta = theta - eta * gradient
        eta = eta * 0.95
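For concreteness, here is a matching sketch of both stochastic variants, reusing the synthetic data rng, X, y and the compute_gradient from the batch example above; the per-epoch shuffling, the batch size of 16, and the learning rates are illustrative assumptions.

    def sgd(theta, X, y, eta=0.05, n_epochs=50):
        for n in range(n_epochs):
            for i in rng.permutation(len(y)):                 # visit examples in random order
                x_i, y_i = X[i], y[i]
                gradient = 2.0 * (x_i @ theta - y_i) * x_i    # gradient on a single example
                theta = theta - eta * gradient
            eta = eta * 0.95
        return theta

    def minibatch_sgd(theta, X, y, eta=0.05, n_epochs=50, batch_size=16):
        for n in range(n_epochs):
            order = rng.permutation(len(y))
            for start in range(0, len(y), batch_size):
                M = order[start:start + batch_size]           # indices of one mini-batch
                gradient = compute_gradient(theta, X[M], y[M])
                theta = theta - eta * gradient
            eta = eta * 0.95
        return theta

    print(sgd(np.zeros(3), X, y))            # noisier trajectory, similar final estimate
    print(minibatch_sgd(np.zeros(3), X, y))

Since the mini-batch gradient averages over \(|M|\) examples, each update is a lower-variance estimate of the full-batch gradient while remaining cheap to compute.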

Questions: why does decaying the learning rate lead to convergence?
Why is \(\sum_{i=1}^{\infty} \eta_i = \infty\) together with \(\sum_{i=1}^{\infty} \eta_i^2 < \infty\) the condition for convergence, and under what assumptions on \(J(\theta)\)?
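A sketch of the standard answer: these are the Robbins–Monro step-size conditions from stochastic approximation, usually stated under assumptions such as a convex (or at least Lipschitz-smooth) \(J\) with bounded gradient noise. The first condition lets the iterates travel arbitrarily far from the initialization; the second makes the accumulated variance of the stochastic steps finite. For example, polynomial decay \(\eta_t = \eta_0 / t\) satisfies both, whereas the geometric decay \(\eta_t = \eta_0 \cdot 0.95^{t}\) used in the pseudocode above violates the first, so its iterates can stop short of a minimizer:

\[
\sum_{t=1}^{\infty} \frac{\eta_0}{t} = \infty, \qquad
\sum_{t=1}^{\infty} \frac{\eta_0^2}{t^2} = \frac{\pi^2 \eta_0^2}{6} < \infty, \qquad
\text{whereas}\quad \sum_{t=1}^{\infty} \eta_0 \cdot 0.95^{t} = 19\,\eta_0 < \infty .
\]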

Reposted from: https://www.cnblogs.com/gaoqichao/p/9153675.html
