Reposted from: https://github.com/jwyang/faster-rcnn.pytorch/issues/222
1. The optimizer state is saved as CUDA tensors, but to save memory the checkpoint is loaded back with map_location to CPU, so after loading, the optimizer state tensors have to be converted back to CUDA tensors.
You can re-initialise the weights manually using this:
import torch
import torch.optim as optim

model.load_state_dict(checkpoint['model'])
model.cuda()
# lr is a required argument of SGD; the value here is only a placeholder, because
# optimizer.load_state_dict() below restores the lr saved in the checkpoint.
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=0.0001)
optimizer.load_state_dict(checkpoint['optimizer_weight'])
# We must convert the resumed optimizer state to GPU tensors.
# The previous training was done on the GPU, so the state tensors stored in
# optimizer.state_dict() are CUDA tensors. In this project the checkpoint is loaded
# with map_location to CPU, so after load_state_dict() the restored state tensors
# live on the CPU. Training needs the CUDA versions, so we move them back to the GPU.
for state in optimizer.state.values():
    for k, v in state.items():
        if torch.is_tensor(v):
            state[k] = v.cuda()
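For context, here is a minimal sketch of the save/load pattern the snippet above assumes (the key names 'model' and 'optimizer_weight' follow the code above; the file name is a placeholder). Loading with map_location to CPU is exactly what leaves the optimizer state on the CPU:

import torch

# Saving: both state dicts contain CUDA tensors because training ran on the GPU.
torch.save({'model': model.state_dict(),
            'optimizer_weight': optimizer.state_dict()}, 'checkpoint.pth')

# Loading with map_location='cpu' keeps GPU memory low, but every tensor in
# checkpoint['optimizer_weight'] is now a CPU tensor and must be moved back.
checkpoint = torch.load('checkpoint.pth', map_location='cpu')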
Additionally, for anyone who runs into this problem with the Adam optimizer, use this:
optimizer.load_state_dict(checkpoint['optimizer'])
# Read the lr / weight_decay that were in effect when the checkpoint was saved,
# then rebuild the param groups (biases get double lr and, optionally, no weight
# decay) and create a fresh Adam optimizer from them.
lr = optimizer.param_groups[0]['lr']
weight_decay = optimizer.param_groups[0]['weight_decay']
double_bias = True
bias_decay = True
params = []
for key, value in dict(fasterRCNN.named_parameters()).items():
    if value.requires_grad:
        if 'bias' in key:
            params += [{'params': [value], 'lr': lr * (double_bias + 1),
                        'weight_decay': bias_decay and weight_decay or 0}]
        else:
            params += [{'params': [value], 'lr': lr, 'weight_decay': weight_decay}]
optimizer = torch.optim.Adam(params)
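The same to-GPU conversion can also be wrapped in a small helper (a sketch, not from the issue) that works for SGD momentum buffers as well as Adam's exp_avg / exp_avg_sq tensors:

def optimizer_to(optimizer, device):
    # Move every tensor held in the optimizer state to the given device.
    for state in optimizer.state.values():
        for k, v in state.items():
            if torch.is_tensor(v):
                state[k] = v.to(device)

optimizer_to(optimizer, torch.device('cuda'))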
2. After resuming the optimizer state, the current learning rate does not match the number of completed epochs.
optimizer = optim.SGD(posenet.parameters(), lr=opt.learning_rate, momentum=0.9, weight_decay=1e-4)
checkpoint = torch.load(opt.ckpt_path)
posenet.load_state_dict(checkpoint['weights'])
optimizer.load_state_dict(checkpoint['optimizer_weight'])
print('Optimizer has been resumed from checkpoint...')
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.2, last_epoch=-1)
for i in range(start_epoch):
    # step the scheduler start_epoch times so the learning rate matches the resumed epoch
    scheduler.step()
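A less error-prone alternative (a sketch, not from the original post; the 'scheduler' checkpoint key is an assumption) is to save and restore the scheduler state instead of replaying scheduler.step() start_epoch times:

# At save time, also store the scheduler:
torch.save({'weights': posenet.state_dict(),
            'optimizer_weight': optimizer.state_dict(),
            'scheduler': scheduler.state_dict()}, opt.ckpt_path)

# At resume time, after optimizer.load_state_dict(...):
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.2)
scheduler.load_state_dict(checkpoint['scheduler'])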
def train(epoch):
    print('\n############################# Train phase, Epoch: {} #############################'.format(epoch))
    posenet.train()
    train_loss = 0
    scheduler.step()
    print('\nLearning rate at this epoch is: %0.9f' % scheduler.get_lr()[0])  # changes every epoch
    # print('\nLearning rate at this epoch is: ', optimizer.param_groups[0]['lr'], '\n')  # never changes
    for batch_idx, target_tuple in enumerate(train_loader):
        # do sth.....
        pass
Why does scheduler.get_lr()[0] change after scheduler.step(), while optimizer.param_groups[0]['lr'] never changes in the loop? Am I missing something? Hope for your help, thank you!

Answer:
Ah, it behaves normally now... scheduler.get_lr()[0] and optimizer.param_groups[0]['lr'] now output the same value. Thank you very much, ptrblck, you have helped me several times! Best wishes to you.

Note: it is recommended to adjust the learning rate inside the optimizer with an external function such as adjust_learning_rate (as in this example) rather than with scheduler.step(); otherwise the learning rate will be overwritten by the value restored from the previous optimizer checkpoint.
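A common shape for such a function (a sketch only; the name adjust_learning_rate and the step decay used here are illustrative, not taken from the linked example):

def adjust_learning_rate(optimizer, epoch, base_lr, gamma=0.2, step_size=10):
    # Make the lr a pure function of the epoch index, so a stale value restored
    # from the optimizer checkpoint cannot clobber it.
    lr = base_lr * (gamma ** (epoch // step_size))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    return lr

Called once at the start of every epoch, this makes the current learning rate depend only on the epoch number, not on whatever value was stored in the resumed optimizer.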