1. Calling the optimizers built into torch.optim directly (Adam() and SGD())
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
| Adam() parameter | Description |
|---|---|
| params | iterable of parameters to optimize |
| lr | learning rate (default: 1e-3) |
| betas (Tuple[float, float]) | coefficients used for computing running averages of the gradient and its square |
| weight_decay (float) | weight decay (L2 penalty) |
| eps (float) | term added to the denominator for numerical stability |
| SGD() parameter | Description |
|---|---|
| params | iterable of parameters to optimize |
| lr | learning rate (default: 1e-3) |
| momentum | momentum factor |
| weight_decay (float) | weight decay (L2 penalty) |
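Once constructed, either optimizer is driven by the same three-call loop: zero_grad(), backward(), step(). A minimal sketch with a toy model and random data (the shapes, data, and hyperparameters here are illustrative, not from the text):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model and random data, purely to show the optimizer workflow.
model = nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.MSELoss()

x = torch.randn(8, 4)
y = torch.randn(8, 1)

for _ in range(5):
    optimizer.zero_grad()        # clear gradients accumulated by the last step
    loss = loss_fn(model(x), y)  # forward pass
    loss.backward()              # backprop: fills .grad on each parameter
    optimizer.step()             # apply the Adam update rule
```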
2. Setting options separately for each layer (parameter groups)
import torch
import torch.nn as nn

class NetWord(nn.Module):
    def __init__(self):
        super(NetWord, self).__init__()
        self.main_1 = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.LeakyReLU(),
        )
        self.main_2 = nn.Sequential(
            nn.Linear(512, 256),
            nn.LeakyReLU(),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = self.main_1(x)
        x = self.main_2(x)
        return x

model = NetWord()
optimizer = torch.optim.SGD([
    {'params': model.main_1.parameters()},
    {'params': model.main_2.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
print(optimizer)
Output:
SGD (
Parameter Group 0
    dampening: 0
    differentiable: False
    foreach: None
    lr: 0.01
    maximize: False
    momentum: 0.9
    nesterov: False
    weight_decay: 0

Parameter Group 1
    dampening: 0
    differentiable: False
    foreach: None
    lr: 0.001
    maximize: False
    momentum: 0.9
    nesterov: False
    weight_decay: 0
)
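The same per-group settings can also be inspected programmatically through optimizer.param_groups, which holds one dict of hyperparameters per group. A self-contained sketch (net_a and net_b are stand-ins for the sub-modules above):

```python
import torch
import torch.nn as nn

# Two stand-in sub-modules with different learning rates.
net_a = nn.Linear(10, 5)
net_b = nn.Linear(5, 1)
optimizer = torch.optim.SGD([
    {'params': net_a.parameters()},              # inherits the default lr below
    {'params': net_b.parameters(), 'lr': 1e-3},  # per-group override
], lr=1e-2, momentum=0.9)

# param_groups exposes each group's hyperparameters as a dict
for i, group in enumerate(optimizer.param_groups):
    print(f"group {i}: lr={group['lr']}")
```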
3. Learning-rate decay
Common schedulers: StepLR, ExponentialLR, MultiStepLR, and ReduceLROnPlateau
import torch
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.9)  # step_size is required: decay every 30 epochs
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20, 80], gamma=0.9)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, verbose=False, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0.0001, eps=1e-08)
scheduler.step()  # called once per epoch; ReduceLROnPlateau instead needs scheduler.step(metric)
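The epoch-based schedulers above are stepped once per epoch, after the optimizer. A minimal sketch (the model, lr, step_size, and gamma values are illustrative, not from the text):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# halve the lr every 2 epochs (step_size/gamma chosen for illustration)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

for epoch in range(6):
    # ... one epoch of training would go here ...
    optimizer.step()       # step the optimizer first, then the scheduler
    scheduler.step()       # decay the lr according to the schedule
    print(epoch, scheduler.get_last_lr())
```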
| ReduceLROnPlateau parameter | Trigger condition |
|---|---|
| mode | 'min' watches for the metric to stop decreasing; 'max' watches for it to stop increasing |
| factor | when triggered, lr *= factor |
| patience | number of consecutive epochs without improvement tolerated before the lr is reduced |
| verbose | print a message when the lr is reduced |
| threshold | only changes larger than the threshold count as improvement |
| threshold_mode | 'rel': in max mode an improvement must exceed best*(1+threshold), in min mode it must fall below best*(1-threshold); 'abs': in max mode it must exceed best+threshold, in min mode it must fall below best-threshold |
| cooldown | number of epochs to wait after an lr reduction, slowing further decay |
| min_lr | lower bound on the lr |
| eps | if the difference between the old and new lr is smaller than eps, the update is skipped |
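Unlike the epoch-based schedulers, ReduceLROnPlateau is driven by a monitored metric passed to step(). A sketch with patience=0, so the lr is cut as soon as the metric stops improving (the model, lr, and loss values are illustrative, not from the text):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# patience=0: reduce the lr immediately when the metric stalls
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=0)

# a stalled validation loss (illustrative numbers, not from real training)
for val_loss in [1.0, 0.8, 0.8, 0.8]:
    scheduler.step(val_loss)   # ReduceLROnPlateau needs the metric here

# the two stalled epochs each triggered a factor-of-10 reduction
print(optimizer.param_groups[0]['lr'])
```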