pytorch.optimizer 优化算法

最新推荐文章于 2023-11-10 18:06:08 发布

东东就是我

最新推荐文章于 2023-11-10 18:06:08 发布

阅读量998

点赞数 1

分类专栏： AI框架文章标签： pytorch 算法深度学习

本文链接：https://blog.csdn.net/qq_33228039/article/details/122089254

版权

AI框架专栏收录该内容

20 篇文章 2 订阅

订阅专栏

https://zhuanlan.zhihu.com/p/346205754
https://blog.csdn.net/google19890102/article/details/69942970
https://zhuanlan.zhihu.com/p/32626442
https://zhuanlan.zhihu.com/p/32230623

1.优化器optimizer

在这里插入图片描述

import torch
import numpy as np
import warnings
warnings.filterwarnings('ignore') #ignore warnings

x = torch.linspace(-np.pi, np.pi, 2000)
y = torch.sin(x)

p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-3
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)
for t in range(1, 1001):
    y_pred = model(xx)
    loss = loss_fn(y_pred, y)
    if t % 100 == 0:
        print('No.{: 5d}, loss: {:.6f}'.format(t, loss.item()))
    optimizer.zero_grad() # 梯度清零
    loss.backward() # 反向传播计算梯度
    optimizer.step() # 梯度下降法更新参数

1.1step才是更新参数

    def step(self, closure=None):
        """Performs a single optimization step.

        Arguments:
            closure (callable, optional): A closure that reevaluates the model
                and returns the loss.
        """
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()

        for group in self.param_groups:
            weight_decay = group['weight_decay']
            momentum = group['momentum']
            dampening = group['dampening']
            nesterov = group['nesterov']

            for p in group['params']:
                if p.grad is None:
                    continue
                d_p = p.grad  #获取参数梯度
                if weight_decay != 0:
                    d_p = d_p.add(p, alpha=weight_decay) 
                if momentum != 0:
                    param_state = self.state[p]
                    if 'momentum_buffer' not in param_state:
                        buf = param_state['momentum_buffer'] = torch.clone(d_p).detach()
                    else:
                        buf = param_state['momentum_buffer']
                        buf.mul_(momentum).add_(d_p, alpha=1 - dampening)
                    if nesterov:
                        d_p = d_p.add(buf, alpha=momentum)
                    else:
                        d_p = buf

                p.add_(d_p, alpha=-group['lr']) #更新参数

        return loss

1.0 常见优化器变量定义

模型参数
$\theta$ ,
目标函数
$J(\theta)$
每一个时刻t（假设是一个batch）的梯度
$g_{t}=\bigtriangledown _{\theta}J(\theta)$
学习率为
$\eta$
根据历史梯度的一阶动量
$m_{t}=\phi (g_{1},g_{2},...g_{t})$
根据历史梯度的二阶动量
$v_{t}=\psi (g_{1},g_{2},...g_{t})$
更新模型参数
$\theta_{t+1}=\theta_{t}-\frac{1}{\sqrt{v_{t}+\epsilon }}m_{t}$ ,
平滑项，防止分母为0

1.1 SGD

$m_{t}=\eta*g_{t}$
$v_{t}=I^2$

SGD 的缺点在于收敛速度慢，可能在鞍点处震荡。并且，如何合理的选择学习率是 SGD 的一大难点。

1.2 Momentum 动量

$m_{t}=\gamma*m_{t-1}+\eta*g_{t}$
也就是按照一定比例的前一次的变化方向和大小，一般比例是0.9

1.3SGD-M

带一阶动量的SGD
$m_{t}=\gamma*m_{t-1}+\eta*g_{t}$
$v_{t}=I^2$

1.4Nesterov Accelerated Gradient

在这里插入图片描述

也就是在求解梯度的时候把参数改成参数经过动量变换后的参数的梯度
$g_{t}=\bigtriangledown _{\theta}J(\theta-\gamma*m_{t-1})$
$m_{t}=\gamma*m_{t-1}+\eta*g_{t}$
$v_{t}=I^2$

1.5Adagrad

在这里插入图片描述
也就是根据参数改变的频率修改相应参数变化的快慢
$v_{t}=\sum g_{t}^{2}$

1.6RMSprop

在这里插入图片描述
$v_{t}=\gamma*v_{t-1}+(1-\gamma)*g_{t}^{2}$

1.7Adam

$m_{t}=\eta*(\beta _{1}*m_{t-1}+(1-\beta _{1})g_{t})$
$v_{t}=\beta _{2}*v_{t-1}+(1-\beta_{2} )*g_{t}^{2}$
其中
$m_0=0,v_{0}=0.$
$m_1=0.1*g_{0},v_{1}=0.1*g^2_{0}.$
所以初始阶段有偏移，偏向0 。所以做一个偏执矫正
$\hat m_t=\frac{m_t}{1-\beta_1^t}$
$\hat v_t=\frac{v_t}{1-\beta_1^t}$

2.学习率调节器

在这里插入图片描述

2.1 CosineAnnealingLR

在这里插入图片描述

from torch.optim import lr_scheduler
from matplotlib import pyplot as plt

from torch.optim import SGD

from torch import nn

class DummyModel(nn.Module):
    def __init__(self, class_num=10):
        super(DummyModel, self).__init__()
        self.base = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(128, class_num)

    def forward(self, x):
        x = self.base(x)
        x = self.gap(x)
        x = x.view(x.shape[0], -1)
        x = self.fc(x)
        return x

model = DummyModel().cuda()
def create_optimizer():
    return SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)


def plot_lr(scheduler, title='', labels=['base'], nrof_epoch=100):
    lr_li = [[] for _ in range(len(labels))]
    epoch_li = list(range(nrof_epoch))
    for epoch in epoch_li:
        scheduler.step()  # 调用step()方法,计算和更新optimizer管理的参数基于当前epoch的学习率
        lr = scheduler.get_last_lr()  # 获取当前epoch的学习率
        for i in range(len(labels)):
            lr_li[i].append(lr[i])
    for lr, label in zip(lr_li, labels):
        plt.plot(epoch_li, lr, label=label)
    plt.grid()
    plt.xlabel('epoch')
    plt.ylabel('lr')
    plt.title(title)
    plt.legend()
    plt.show()
optimizer = create_optimizer()
scheduler = lr_scheduler.CosineAnnealingLR(optimizer, 50, 1e-5)
plot_lr(scheduler, title='CosineAnnealingLR')

2.2 step（）更新参数到Optimizer

 def step(self, epoch=None):
        # Raise a warning if old pattern is detected
        # https://github.com/pytorch/pytorch/issues/20124
        if self._step_count == 1:
            if not hasattr(self.optimizer.step, "_with_counter"):
                warnings.warn("Seems like `optimizer.step()` has been overridden after learning rate scheduler "
                              "initialization. Please, make sure to call `optimizer.step()` before "
                              "`lr_scheduler.step()`. See more details at "
                              "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)

            # Just check if there were two first lr_scheduler.step() calls before optimizer.step()
            elif self.optimizer._step_count < 1:
                warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
                              "In PyTorch 1.1.0 and later, you should call them in the opposite order: "
                              "`optimizer.step()` before `lr_scheduler.step()`.  Failure to do this "
                              "will result in PyTorch skipping the first value of the learning rate schedule. "
                              "See more details at "
                              "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
        self._step_count += 1

        class _enable_get_lr_call:

            def __init__(self, o):
                self.o = o

            def __enter__(self):
                self.o._get_lr_called_within_step = True
                return self

            def __exit__(self, type, value, traceback):
                self.o._get_lr_called_within_step = False

        with _enable_get_lr_call(self):
            if epoch is None:
                self.last_epoch += 1
                values = self.get_lr()  #获取新的学习率，每个学习率函数分别实现不同的代码
            else:
                warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
                self.last_epoch = epoch
                if hasattr(self, "_get_closed_form_lr"):
                    values = self._get_closed_form_lr()
                else:
                    values = self.get_lr()

        for i, data in enumerate(zip(self.optimizer.param_groups, values)):
            param_group, lr = data
            param_group['lr'] = lr #更新optimizer学习率参数
            self.print_lr(self.verbose, i, lr, epoch)

        self._last_lr = [group['lr'] for group in self.optimizer.param_groups]

2.3 get_lr() 真正的实现余弦退火的实现

    def get_lr(self):
        if not self._get_lr_called_within_step:
            warnings.warn("To get the last learning rate computed by the scheduler, "
                          "please use `get_last_lr()`.", UserWarning)

        if self.last_epoch == 0:
            return self.base_lrs
        elif (self.last_epoch - 1 - self.T_max) % (2 * self.T_max) == 0:
            return [group['lr'] + (base_lr - self.eta_min) *
                    (1 - math.cos(math.pi / self.T_max)) / 2
                    for base_lr, group in
                    zip(self.base_lrs, self.optimizer.param_groups)]
        return [(1 + math.cos(math.pi * self.last_epoch / self.T_max)) /
                (1 + math.cos(math.pi * (self.last_epoch - 1) / self.T_max)) *
                (group['lr'] - self.eta_min) + self.eta_min
                for group in self.optimizer.param_groups]

东东就是我

关注

1
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
pytorch.optimizer 优化算法

https://zhuanlan.zhihu.com/p/346205754https://blog.csdn.net/google19890102/article/details/69942970https://zhuanlan.zhihu.com/p/32626442https://zhuanlan.zhihu.com/p/322306231.优化器optimizerimport torchimport numpy as npimport warningswarnings.filter
复制链接

扫一扫