Reposted from: https://github.com/jwyang/faster-rcnn.pytorch/issues/222
1. The optimizer state is saved as CUDA tensors, but to save memory the checkpoint is loaded back with map_location to CPU, so after loading, the optimizer state tensors have to be converted back to CUDA tensors.
You can re-initialise the weights manually using this:
import torch
import torch.optim as optim

model.load_state_dict(checkpoint['model'])
model.cuda()
# lr is a required argument of SGD; the value here is only a placeholder, because
# optimizer.load_state_dict() below restores the lr saved in the checkpoint.
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=0.0001)
optimizer.load_state_dict(checkpoint['optimizer_weight'])
# We must convert the resumed optimizer state to GPU tensors.
# The previous training was done on the GPU, so the state tensors stored in
# optimizer.state_dict() are CUDA tensors. In this project the checkpoint is loaded
# with map_location to CPU, so after load_state_dict() the restored state tensors
# live on the CPU. Training needs the CUDA versions, so we move them back to the GPU.
for state in optimizer.state.values():
    for k, v in state.items():
        if torch.is_tensor(v):
            state[k] = v.cuda()
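For context, here is a minimal sketch of the save/load pattern the snippet above assumes (the key names 'model' and 'optimizer_weight' follow the code above; the file name is a placeholder). Loading with map_location to CPU is exactly what leaves the optimizer state on the CPU:

import torch

# Saving: both state dicts contain CUDA tensors because training ran on the GPU.
torch.save({'model': model.state_dict(),
            'optimizer_weight': optimizer.state_dict()}, 'checkpoint.pth')

# Loading with map_location='cpu' keeps GPU memory low, but every tensor in
# checkpoint['optimizer_weight'] is now a CPU tensor and must be moved back.
checkpoint = torch.load('checkpoint.pth', map_location='cpu')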
Additionally, for anyone who runs into this problem with the Adam optimizer, use this:
optimizer.load_state_dict(checkpoint['optimizer'])
# Read the lr / weight_decay that were in effect when the checkpoint was saved,
# then rebuild the param groups (biases get double lr and, optionally, no weight
# decay) and create a fresh Adam optimizer from them.
lr = optimizer.param_groups[0]['lr']
weight_decay = optimizer.param_groups[0]['weight_decay']
double_bias = True
bias_decay = True
params = []
for key, value in dict(fasterRCNN.named_parameters()).items():
    if value.requires_grad:
        if 'bias' in key:
            params += [{'params': [value], 'lr': lr * (double_bias + 1),
                        'weight_decay': bias_decay and weight_decay or 0}]
        else:
            params += [{'params': [value], 'lr': lr, 'weight_decay': weight_decay}]
optimizer = torch.optim.Adam(params)
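The same to-GPU conversion can also be wrapped in a small helper (a sketch, not from the issue) that works for SGD momentum buffers as well as Adam's exp_avg / exp_avg_sq tensors:

def optimizer_to(optimizer, device):
    # Move every tensor held in the optimizer state to the given device.
    for state in optimizer.state.values():
        for k, v in state.items():
            if torch.is_tensor(v):
                state[k] = v.to(device)

optimizer_to(optimizer, torch.device('cuda'))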
2. After resuming the optimizer state, the current learning rate does not match the number of completed epochs.
optimizer = optim.SGD(posenet.parameters(), lr=opt.learning_rate, momentum=0.9, weight_decay=1e-4)
checkpoint = torch.load(opt.ckpt_path)
posenet.load_state_dict(checkpoint['weights'])
optimizer.load_state_dict(checkpoint['optimizer_weight'])
print('Optimizer has been resumed from checkpoint...')
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.2, last_epoch=-1)
for i in range(start_epoch):
    # step the scheduler start_epoch times so the learning rate matches the resumed epoch
    scheduler.step()
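A less error-prone alternative (a sketch, not from the original post; the 'scheduler' checkpoint key is an assumption) is to save and restore the scheduler state instead of replaying scheduler.step() start_epoch times:

# At save time, also store the scheduler:
torch.save({'weights': posenet.state_dict(),
            'optimizer_weight': optimizer.state_dict(),
            'scheduler': scheduler.state_dict()}, opt.ckpt_path)

# At resume time, after optimizer.load_state_dict(...):
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.2)
scheduler.load_state_dict(checkpoint['scheduler'])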
def train(epoch):
    print('\n############################# Train phase, Epoch: {} #############################'.format(epoch))
    posenet.train()
    train_loss = 0
    scheduler.step()
    print('\nLearning rate at this epoch is: %0.9f' % scheduler.get_lr()[0])  # changes every epoch
    # print('\nLearning rate at this epoch is: ', optimizer.param_groups[0]['lr'], '\n')  # never changes
    for batch_idx, target_tuple in enumerate(train_loader):
        # do sth.....
        pass
Why does scheduler.get_lr()[0] change after scheduler.step(), while optimizer.param_groups[0]['lr'] never changes in the loop? Am I missing something? Hope for your help, thank you!

Answer:
Ah, it behaves normally now... scheduler.get_lr()[0] and optimizer.param_groups[0]['lr'] now output the same value. Thank you very much, ptrblck, you have helped me several times! Best wishes to you.

Note: it is recommended to adjust the learning rate inside the optimizer with an external function such as adjust_learning_rate (as in this example) rather than with scheduler.step(); otherwise the learning rate will be overwritten by the value restored from the previous optimizer checkpoint.
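A common shape for such a function (a sketch only; the name adjust_learning_rate and the step decay used here are illustrative, not taken from the linked example):

def adjust_learning_rate(optimizer, epoch, base_lr, gamma=0.2, step_size=10):
    # Make the lr a pure function of the epoch index, so a stale value restored
    # from the optimizer checkpoint cannot clobber it.
    lr = base_lr * (gamma ** (epoch // step_size))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    return lr

Called once at the start of every epoch, this makes the current learning rate depend only on the epoch number, not on whatever value was stored in the resumed optimizer.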