1. What is an optimizer
A PyTorch optimizer manages and updates the model's learnable parameters so that the model's output gets closer to the ground-truth labels.
"Manage" means the optimizer holds and modifies the parameters; "update" refers to its update strategy, which is usually gradient descent. The gradient is a vector pointing in the direction of the largest directional derivative (steepest ascent), so the parameters are updated along the negative gradient, the direction in which the loss decreases fastest.
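To make this concrete, here is a minimal, self-contained sketch of that idea (the single toy parameter, the target value, and the learning rate are made up purely for illustration): the optimizer repeatedly moves one parameter along the negative gradient until its value matches the "label".

```python
import torch
import torch.optim as optim

w = torch.tensor(0.0, requires_grad=True)   # a single learnable "parameter"
target = torch.tensor(3.0)                  # the "true label" it should reach
optimizer = optim.SGD([w], lr=0.1)

for _ in range(50):
    loss = (w - target) ** 2                # how far the output is from the label
    optimizer.zero_grad()                   # gradients accumulate, so clear them first
    loss.backward()                         # compute d(loss)/dw
    optimizer.step()                        # move w a small step along the negative gradient

print(w.item())                             # close to 3.0
```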
2. Attributes of the optimizer
Basic attributes of an optimizer:

- defaults: the optimizer's hyperparameters (e.g. lr, momentum, weight_decay);
- state: per-parameter cached values, such as momentum buffers;
- param_groups: the list of parameter groups being managed;
- _step_count: the number of update steps taken, used when scheduling the learning rate.
```python
class Optimizer(object):
    def __init__(self, params, defaults):
        self.defaults = defaults                        # hyperparameters shared within each group
        self.state = defaultdict(dict)                  # per-parameter state (e.g. momentum buffers)
        self.param_groups = []
        param_groups = list(params)
        if not isinstance(param_groups[0], dict):
            param_groups = [{'params': param_groups}]   # wrap a plain list of tensors into one group
        for param_group in param_groups:
            self.add_param_group(param_group)
```
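For a quick look at what these attributes actually hold, the short sketch below builds an SGD optimizer over one random tensor (the tensor `w` and the hyperparameter values are just placeholders) and prints the three attributes:

```python
import torch
import torch.optim as optim

w = torch.randn(2, 2, requires_grad=True)
optimizer = optim.SGD([w], lr=0.1, momentum=0.9)

print(optimizer.defaults)       # hyperparameters: {'lr': 0.1, 'momentum': 0.9, 'dampening': 0, ...}
print(optimizer.state)          # empty until step() has been called
print(optimizer.param_groups)   # one group holding w together with its hyperparameters
```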
3. Methods of the optimizer
zero_grad(): clears the gradients of all managed parameters (a PyTorch quirk: gradient tensors are not cleared automatically, they accumulate across backward passes):
```python
class Optimizer(object):
    def zero_grad(self):
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is not None:
                    p.grad.detach_()
                    p.grad.zero_()
```
step(): performs a single update step. The base class only stores the parameter groups (shown below); each concrete optimizer such as optim.SGD implements its own step() that updates every parameter in param_groups:
```python
class Optimizer(object):
    def __init__(self, params, defaults):
        self.defaults = defaults
        self.state = defaultdict(dict)
        self.param_groups = []      # step() walks these groups and updates every parameter
```
add_param_group(): adds a group of parameters, e.g. to train some layers with a different learning rate:
```python
class Optimizer(object):
    def add_param_group(self, param_group):
        param_set = set()
        for group in self.param_groups:
            param_set.update(set(group['params']))   # collect already-managed parameters to rule out duplicates
        self.param_groups.append(param_group)
```
state_dict(): returns a dict describing the optimizer's current state;
load_state_dict(): loads such a state dict, e.g. to resume training from a checkpoint:
```python
class Optimizer(object):
    def state_dict(self):
        # ... pack self.state and self.param_groups ...
        return {'state': packed_state, 'param_groups': param_groups}

    def load_state_dict(self, state_dict):
        # restores self.state and self.param_groups from a previously saved state_dict
        ...
```
4. Using the basic optimizer methods
```python
import os
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
import torch
import torch.optim as optim
from tools.common_tools import set_seed   # seeding helper from the course code; torch.manual_seed(1) works just as well

set_seed(1)  # set the random seed

weight = torch.randn((2, 2), requires_grad=True)
weight.grad = torch.ones((2, 2))

optimizer = optim.SGD([weight], lr=0.1)

# ----------------------------------- step -----------------------------------
flag = 0
# flag = 1
if flag:
    print("weight before step:{}".format(weight.data))
    optimizer.step()        # set lr to 1 or 0.1 and compare the results
    print("weight after step:{}".format(weight.data))

# ----------------------------------- zero_grad -----------------------------------
flag = 0
# flag = 1
if flag:
    print("weight before step:{}".format(weight.data))
    optimizer.step()        # set lr to 1 or 0.1 and compare the results
    print("weight after step:{}".format(weight.data))

    print("weight in optimizer:{}\nweight in weight:{}\n".format(
        id(optimizer.param_groups[0]['params'][0]), id(weight)))  # same memory address: the optimizer stores a reference to the tensor

    print("weight.grad is {}\n".format(weight.grad))
    optimizer.zero_grad()
    print("after optimizer.zero_grad(), weight.grad is\n{}".format(weight.grad))

# ----------------------------------- add_param_group -----------------------------------
# add another group of parameters
flag = 0
# flag = 1
if flag:
    print("optimizer.param_groups is\n{}".format(optimizer.param_groups))

    w2 = torch.randn((3, 3), requires_grad=True)
    optimizer.add_param_group({"params": w2, 'lr': 0.0001})

    print("optimizer.param_groups is\n{}".format(optimizer.param_groups))

# ----------------------------------- state_dict -----------------------------------
flag = 0
# flag = 1
if flag:
    optimizer = optim.SGD([weight], lr=0.1, momentum=0.9)
    opt_state_dict = optimizer.state_dict()

    print("state_dict before step:\n", opt_state_dict)

    for i in range(10):
        optimizer.step()

    print("state_dict after step:\n", optimizer.state_dict())

    torch.save(optimizer.state_dict(), os.path.join(BASE_DIR, "optimizer_state_dict.pkl"))

# ----------------------------------- load state_dict -----------------------------------
flag = 0
# flag = 1
if flag:
    optimizer = optim.SGD([weight], lr=0.1, momentum=0.9)
    state_dict = torch.load(os.path.join(BASE_DIR, "optimizer_state_dict.pkl"))

    print("state_dict before load state:\n", optimizer.state_dict())
    optimizer.load_state_dict(state_dict)
    print("state_dict after load state:\n", optimizer.state_dict())
```
5. Learning rate
Parameters are updated along the negative gradient direction; the learning rate controls the size of each update step. Too large a step can overshoot or diverge, too small a step makes training slow, as the sketch below shows.
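Here is a small sketch that minimizes y = x² with two learning rates (the starting point and the values 0.1 and 1.1 are chosen purely for illustration): a moderate learning rate walks toward the minimum, while one that is too large overshoots further on every step.

```python
import torch
import torch.optim as optim

def minimize(lr, steps=5, x0=2.0):
    x = torch.tensor(x0, requires_grad=True)
    opt = optim.SGD([x], lr=lr)
    for _ in range(steps):
        loss = x ** 2                # y = x^2, gradient 2x
        opt.zero_grad()
        loss.backward()
        opt.step()                   # x <- x - lr * 2x
    return x.item()

print(minimize(lr=0.1))   # shrinks toward the minimum at 0
print(minimize(lr=1.1))   # each step overshoots: |x| grows instead of shrinking
```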
6. Momentum
Momentum combines the current gradient with information from earlier updates to form the current update.
Before looking at momentum, recall the exponentially weighted moving average: v_t = β · v_{t−1} + (1 − β) · θ_t, where θ_t is the current value and β controls how quickly older values are forgotten.
The gradient update formula used in PyTorch is:
![PyTorch momentum update formula](https://img-blog.csdnimg.cn/20201231113017827.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L20wXzQ1ODY2NzE4,size_16,color_FFFFFF,t_70)
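Spelled out, the momentum update that torch.optim.SGD applies (with its default dampening of 0 and no Nesterov) is v_t = m · v_{t−1} + g_t followed by w ← w − lr · v_t. The sketch below replays a few such steps by hand and compares them with optimizer.step(); the constant gradient of ones and the names `w_manual` and `v` are only for illustration:

```python
import torch
import torch.optim as optim

lr, m = 0.1, 0.9
w = torch.zeros(1, requires_grad=True)      # parameter updated by optimizer.step()
w_manual = w.detach().clone()               # same starting point, updated by hand
v = torch.zeros(1)                          # momentum buffer for the manual update
optimizer = optim.SGD([w], lr=lr, momentum=m)

for step in range(3):
    grad = torch.ones(1)                    # pretend every backward pass gives a gradient of 1
    w.grad = grad.clone()
    optimizer.step()

    # manual version of the same update: v <- m*v + g, then w <- w - lr*v
    # (on the very first step the buffer is initialised directly with the gradient)
    v = m * v + grad if step > 0 else grad.clone()
    w_manual = w_manual - lr * v

print(w.detach(), w_manual)                 # the two update paths should agree
```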
7. torch.optim.SGD
Related background:
In PyTorch, every tensor has a requires_grad attribute. When it is set to True, autograd computes and stores gradients for that tensor during the backward pass; it defaults to False.
requires_grad therefore indicates whether gradient information needs to be kept for that tensor during the computation.
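A small sketch of requires_grad in practice (tensor names are illustrative): after backward(), gradients are stored only for tensors created with requires_grad=True.

```python
import torch

a = torch.randn(2, 2, requires_grad=True)   # tracked by autograd
b = torch.randn(2, 2)                        # requires_grad defaults to False

loss = (a * b).sum()
loss.backward()

print(a.grad)    # d(loss)/da = b, so this holds b's values
print(b.grad)    # None: no gradient is kept for b
```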