Optimization Algorithms - Gradient Descent

Latest recommended article published on 2024-03-15 12:12:54

未来影子

Category: Deep Learning. Tags: algorithms, python, deep learning

Article link: https://blog.csdn.net/mynameisgt/article/details/126860805

Stochastic Gradient Descent

In earlier chapters we used stochastic gradient descent throughout training without explaining why it works. This section looks at stochastic gradient descent in more detail.

%matplotlib inline
import math
import torch
from d2l import torch as d2l

1 - Stochastic Gradient Updates

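def f(x1, x2):  # objective function
    return x1 ** 2 + 2 * x2 ** 2

def f_grad(x1, x2):  # gradient of the objective function
    return 2 * x1, 4 * x2

def sgd(x1, x2, s1, s2, f_grad):
    # s1 and s2 are state slots passed in by d2l.train_2d; they are unused here
    g1, g2 = f_grad(x1, x2)
    # Simulate a noisy gradient by adding zero-mean, unit-variance Gaussian noise.
    # .item() converts the one-element tensor to a Python float so that every
    # point in the trace has the same type.
    g1 += torch.normal(0.0, 1, (1,)).item()
    g2 += torch.normal(0.0, 1, (1,)).item()
    eta_t = eta * lr()  # eta and lr are globals defined below
    return (x1 - eta_t * g1, x2 - eta_t * g2, 0, 0)

def constant_lr():
    return 1

eta = 0.1
lr = constant_lr  # constant learning rate
d2l.show_trace_2d(f, d2l.train_2d(sgd, steps=50, f_grad=f_grad))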

epoch 50, x1: 0.022305, x2: 0.014646



[Figure: trajectory of (x1, x2) over the contours of f with a constant learning rate. Image unavailable; original at https://yingziimage.oss-cn-beijing.aliyuncs.com/img/202209142121675.svg]
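For readers without d2l installed, the noisy update can be reproduced with the standard library alone. The following is a minimal sketch, not the d2l implementation: `noisy_sgd_trace`, its `lr_fn(t)` hook, the fixed `seed`, and the starting point (-5, -2) are all assumptions made for illustration.

```python
import random

def f_grad(x1, x2):
    # Gradient of f(x1, x2) = x1**2 + 2 * x2**2
    return 2 * x1, 4 * x2

def noisy_sgd_trace(steps, eta, lr_fn, x_init=(-5.0, -2.0), seed=0):
    """Run the noisy update and return the list of visited points.

    lr_fn(t) is a hypothetical schedule hook: it maps the step index t
    (starting at 1) to a multiplier applied to the base rate eta.
    x_init is an assumed starting point chosen for illustration.
    """
    rng = random.Random(seed)
    x1, x2 = x_init
    trace = [(x1, x2)]
    for t in range(1, steps + 1):
        g1, g2 = f_grad(x1, x2)
        # Simulate a noisy gradient with zero-mean, unit-variance Gaussian noise
        g1 += rng.gauss(0.0, 1.0)
        g2 += rng.gauss(0.0, 1.0)
        eta_t = eta * lr_fn(t)
        x1, x2 = x1 - eta_t * g1, x2 - eta_t * g2
        trace.append((x1, x2))
    return trace

# Constant learning rate: the multiplier is always 1
trace = noisy_sgd_trace(steps=50, eta=0.1, lr_fn=lambda t: 1.0)
print(f"final point: x1={trace[-1][0]:.4f}, x2={trace[-1][1]:.4f}")
```

With a constant rate the iterate reaches the neighborhood of (0, 0) quickly but keeps jittering there, because every step still injects noise scaled by the same `eta`.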

2 - Dynamic Learning Rate

def exponential_lr():
    # t is a global variable defined outside this function and updated inside it
    global t
    t += 1
    return math.exp(-0.1 * t)

t = 1
lr = exponential_lr  # exponentially decaying learning rate
d2l.show_trace_2d(f, d2l.train_2d(sgd, steps=1000, f_grad=f_grad))

epoch 1000, x1: -0.866794, x2: 0.028221



[Figure: trajectory of (x1, x2) over the contours of f with an exponentially decaying learning rate. Image unavailable; original at https://yingziimage.oss-cn-beijing.aliyuncs.com/img/202209142121677.svg]

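As expected, the variance of the parameters drops sharply. However, this comes at the cost of failing to converge to the optimal solution x = (0, 0). Even after 1000 iteration steps we are still far from the optimum. In fact, the algorithm does not converge at all, because the learning rate shrinks so fast that the updates become negligible before the iterate gets there. On the other hand, if we use polynomial decay, where the learning rate decays with the inverse square root of the number of iterations, convergence is better after only 50 iterations.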

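def polynomial_lr():
    # t is a global variable defined outside this function and updated inside it
    global t
    t += 1
    return (1 + 0.1 * t) ** (-0.5)

t = 1
lr = polynomial_lr  # learning rate decays as the inverse square root of t
d2l.show_trace_2d(f, d2l.train_2d(sgd, steps=50, f_grad=f_grad))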

epoch 50, x1: 0.064155, x2: 0.037703



[Figure: trajectory of (x1, x2) over the contours of f with a polynomially decaying learning rate. Image unavailable; original at https://yingziimage.oss-cn-beijing.aliyuncs.com/img/202209142121678.svg]
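One way to see the difference between the two schedules is to add up the step sizes each one hands out. The exponential multipliers form a geometric series, so their sum stays bounded no matter how many steps are taken, while the inverse square root multipliers keep accumulating. The sketch below assumes the same `eta = 0.1` and decay constants as the code above; `total_step_budget` is a hypothetical helper name, not part of d2l.

```python
import math

def exponential_multiplier(t):
    return math.exp(-0.1 * t)

def polynomial_multiplier(t):
    return (1 + 0.1 * t) ** (-0.5)

def total_step_budget(multiplier, steps, eta=0.1, t_start=2):
    # Sum eta * multiplier(t) over the steps actually taken.
    # t_start=2 mirrors the code above, where t = 1 is incremented
    # before its first use.
    return sum(eta * multiplier(t) for t in range(t_start, t_start + steps))

exp_budget = total_step_budget(exponential_multiplier, steps=1000)
poly_budget = total_step_budget(polynomial_multiplier, steps=1000)
print(f"exponential: {exp_budget:.3f}, polynomial: {poly_budget:.3f}")
```

A small, bounded budget means the iterate can only shrink its distance to the optimum by a limited factor, which is consistent with x1 stalling well short of 0 in the exponential run.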

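There are many more ways to set the learning rate. For example, we could start with a small learning rate, ramp it up rapidly, and then decrease it again, albeit more slowly. We could even alternate between smaller and larger learning rates. Such schedules come in a wide variety.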
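As an illustration of the "ramp up, then decay more slowly" idea, here is a minimal sketch of one possible schedule. The function name `warmup_then_decay` and all of its parameters (`warmup_steps`, `peak`, `start`, the 0.1 decay constant) are assumptions picked for the example, not values from the original text.

```python
def warmup_then_decay(t, warmup_steps=10, peak=1.0, start=0.1):
    """Hypothetical schedule multiplier: linear warmup, then inverse-sqrt decay."""
    if t <= warmup_steps:
        # Ramp linearly from `start` (at t = 0) to `peak` (at t = warmup_steps)
        return start + (peak - start) * t / warmup_steps
    # After the warmup, decay with the inverse square root of the elapsed steps,
    # which falls off more slowly than the ramp-up rose
    return peak * (1 + 0.1 * (t - warmup_steps)) ** (-0.5)

schedule = [warmup_then_decay(t) for t in range(0, 51)]
print(", ".join(f"{m:.2f}" for m in schedule[::5]))
```

The returned value is a multiplier in the same sense as `lr()` in the code above: the effective step size would be `eta * warmup_then_decay(t)`.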

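For now, let us focus on learning rate schedules that admit a comprehensive theoretical analysis, namely learning rates in the convex setting. For general nonconvex problems it is very hard to obtain meaningful convergence guarantees, because minimizing nonlinear nonconvex problems is NP-hard in general.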