PyTorch: restoring a saved optimizer state and resuming training

Reposted from: https://github.com/jwyang/faster-rcnn.pytorch/issues/222

1. The optimizer state is saved as CUDA tensors, but at load time the checkpoint is mapped to the CPU (map_location) to save memory, so after loading, the state must be converted back to CUDA tensors.

You can re-initialise the weights manually like this:

model.load_state_dict(checkpoint['model'])
model.cuda()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0001)  # lr value is a placeholder

optimizer.load_state_dict(checkpoint['optimizer_weight'])

# We must convert the resumed optimizer state back to GPU tensors.
"""The previous training ran on the GPU, so when optimizer.state_dict() was saved, the stored
state tensors were CUDA tensors. In this project the checkpoint is loaded with map_location
mapping everything to the CPU, so after load_state_dict() the optimizer state lives on the CPU.
Training needs CUDA tensors, so we have to move the state back to the GPU."""

for state in optimizer.state.values():
    for k, v in state.items():
        if torch.is_tensor(v):
            state[k] = v.cuda()
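
The same loop generalizes to any target device. A reusable sketch (the helper name optimizer_to is mine, not from the issue; recent PyTorch versions already move floating-point optimizer state to the matching parameter's device inside load_state_dict(), so the explicit loop mainly matters on older versions):

```
import torch

def optimizer_to(optimizer, device):
    # Move every tensor in the optimizer's per-parameter state
    # (e.g. momentum_buffer for SGD, exp_avg/exp_avg_sq for Adam)
    # to the target device.
    for state in optimizer.state.values():
        for k, v in state.items():
            if torch.is_tensor(v):
                state[k] = v.to(device)

# Usage after optimizer.load_state_dict(checkpoint['optimizer_weight']):
# optimizer_to(optimizer, torch.device('cuda'))
```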

Additionally, for others who may encounter this problem with the Adam optimizer, use this:

optimizer.load_state_dict(checkpoint['optimizer'])

# Read the hyperparameters back out of the restored param groups.
lr = optimizer.param_groups[0]['lr']
weight_decay = optimizer.param_groups[0]['weight_decay']
double_bias = True   # give bias terms twice the base learning rate
bias_decay = True    # apply weight decay to bias terms too

# Rebuild the param groups exactly as they were built for training
# (one group per parameter), so the group structure matches the checkpoint.
params = []
for key, value in dict(fasterRCNN.named_parameters()).items():
    if value.requires_grad:
        if 'bias' in key:
            params += [{'params': [value], 'lr': lr * (double_bias + 1),
                        'weight_decay': weight_decay if bias_decay else 0}]
        else:
            params += [{'params': [value], 'lr': lr, 'weight_decay': weight_decay}]

optimizer = torch.optim.Adam(params)
# The rebuilt optimizer starts with empty state, so load the saved state
# into it; otherwise Adam's exp_avg/exp_avg_sq buffers are lost.
optimizer.load_state_dict(checkpoint['optimizer'])
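
A note on why the param groups are rebuilt at all: Optimizer.load_state_dict() requires the checkpoint and the live optimizer to have the same parameter-group structure (same number of groups, same number of parameters per group) and raises a ValueError otherwise. Since faster-rcnn.pytorch constructs one group per parameter during training, the groups must be reconstructed the same way before the saved Adam state can be loaded.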

2. After the optimizer state is restored, the current learning rate does not match the epoch count.

Reposted from: https://discuss.pytorch.org/t/how-to-implement-torch-optim-lr-scheduler-cosineannealinglr/28797/10?u=jia_lee

optimizer = optim.SGD(posenet.parameters(), lr=opt.learning_rate, momentum=0.9, weight_decay=1e-4)
checkpoint = torch.load(opt.ckpt_path)
posenet.load_state_dict(checkpoint['weights'])
optimizer.load_state_dict(checkpoint['optimizer_weight'])
print('Optimizer has been resumed from checkpoint...')

scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.2, last_epoch=-1)

# Advance the scheduler start_epoch times so the learning rate
# matches the epoch we are resuming from.
for i in range(start_epoch):
    scheduler.step()
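
On recent PyTorch versions a simpler alternative (not shown in the original thread) is to checkpoint the scheduler itself with scheduler.state_dict(), so no fast-forward loop is needed; the key names below are assumptions:

```
# When saving a checkpoint:
torch.save({
    'weights': posenet.state_dict(),
    'optimizer_weight': optimizer.state_dict(),
    'scheduler': scheduler.state_dict(),  # stores last_epoch internally
    'epoch': epoch,
}, opt.ckpt_path)

# When resuming:
checkpoint = torch.load(opt.ckpt_path)
scheduler.load_state_dict(checkpoint['scheduler'])
```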


def train(epoch):
    print('\n############################# Train phase, Epoch: {} #############################'.format(epoch))
    posenet.train()
    train_loss = 0
    scheduler.step()
    print('\nLearning rate at this epoch is: %0.9f' % scheduler.get_lr()[0])  # changes every epoch
    # print('\nLearning rate at this epoch is: ', optimizer.param_groups[0]['lr'], '\n')  # Never changes

    for batch_idx, target_tuple in enumerate(train_loader):
        ...  # training step goes here

Why does scheduler.get_lr()[0] change after scheduler.step(), while optimizer.param_groups[0]['lr'] never changes in the loop? Am I missing something? Hope for your help, thank you!

Answer: 

Ah, it behaves normally now… scheduler.get_lr()[0] and optimizer.param_groups[0]['lr'] now output the same value. Thank you very much, ptrblck, you have helped me several times! Best wishes to you. Note: it is recommended to adjust the optimizer's internal learning rate with an external function such as adjust_learning_rate (see the sketch below); with the scheduler.step() approach, the learning rate can otherwise be overwritten by the value restored from the optimizer checkpoint.
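
The adjust_learning_rate function mentioned above is not shown in the thread; here is a minimal sketch of the usual pattern (the function name is conventional, and the default hyperparameters are placeholders matching the StepLR settings above):

```
def adjust_learning_rate(optimizer, epoch, base_lr=0.001, gamma=0.2, step_size=10):
    """Recompute the learning rate from the epoch number and write it
    directly into the optimizer's param groups. Because the value is
    derived from `epoch` alone, it cannot be left stale by a restored
    optimizer checkpoint the way a scheduler's internal counter can."""
    lr = base_lr * (gamma ** (epoch // step_size))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    return lr
```

Call it once at the start of each epoch, e.g. adjust_learning_rate(optimizer, epoch) at the top of train(epoch), instead of scheduler.step().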

 

When training with the Adam optimizer, you can save the optimizer's parameters together with the model like this:

```
state = {
    'model': model.state_dict(),
    'optimizer': optimizer.state_dict(),
    'epoch': epoch
}
torch.save(state, 'optimizer.pth')
```

This writes the model parameters and the optimizer state into a single file named optimizer.pth; when you later need to continue training, load this file and restore the model and optimizer state from it. Note that the code above is a PyTorch example; other frameworks will differ.
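
A matching resume sketch, assuming the same key names as above (if you load with map_location='cpu', remember to move the optimizer state back to the GPU as in section 1):

```
checkpoint = torch.load('optimizer.pth', map_location='cpu')
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
start_epoch = checkpoint['epoch'] + 1  # continue from the next epoch
```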