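Scenario:
An error occurred while running train.py from an open-source PyTorch implementation of Mask RCNN.
Code link: https://github.com/facebookresearch/maskrcnn-benchmark
Note that the file paths in the traceback below point to the semantic-segmentation-pytorch project and its mit_semseg package.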
Problem description
Loading the pretrained model fails. The error output is as follows:
[2024-01-18 16:00:47,212 INFO train.py line 246 70433] Outputing checkpoints to: ckpt/ade20k-resnet50dilated-ppm_deepsup
Traceback (most recent call last):
File "train.py", line 273, in <module>
main(cfg, gpus)
File "train.py", line 144, in main
net_encoder = ModelBuilder.build_encoder(
File "/mnt/tl/d2l-zh/pytorch/semantic-segmentation-pytorch/mit_semseg/models/models.py", line 88, in build_encoder
orig_resnet = resnet.__dict__['resnet50'](pretrained=pretrained)
File "/mnt/tl/d2l-zh/pytorch/semantic-segmentation-pytorch/mit_semseg/models/resnet.py", line 192, in resnet50
model.load_state_dict(load_url(model_urls['resnet50']), strict=False)
File "/mnt/tl/d2l-zh/pytorch/semantic-segmentation-pytorch/mit_semseg/models/utils.py", line 18, in load_url
return torch.load(cached_file, map_location=map_location)
File "/mnt/tl/anaconda3/envs/d2l-zh/lib/python3.8/site-packages/torch/serialization.py", line 815, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/mnt/tl/anaconda3/envs/d2l-zh/lib/python3.8/site-packages/torch/serialization.py", line 1033, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.
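Solution:
It turned out that the pretrained weights file had not downloaded correctly. Downloading it again by hand and moving it into place fixed the problem.
The URLs the code uses to download the model parameters are listed below; download the file you need in a browser.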
model_urls = {
    'resnet18': 'http://sceneparsing.csail.mit.edu/model/pretrained_resnet/resnet18-imagenet.pth',
    'resnet50': 'http://sceneparsing.csail.mit.edu/model/pretrained_resnet/resnet50-imagenet.pth',
    'resnet101': 'http://sceneparsing.csail.mit.edu/model/pretrained_resnet/resnet101-imagenet.pth'
}
Then move the downloaded file to the location the code expects, shown below.
def load_url(url, model_dir='./pretrained', map_location=None):
    pass  # ... body omitted; just put the downloaded model file under './pretrained'
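The manual step above can be sketched in a few lines. This is only an illustration: `expected_cache_path` and `install_manual_download` are hypothetical helper names, and the sketch assumes the cached file is named after the last path component of the URL, which the omitted body of `load_url` may or may not do.

```python
import os
import shutil


def expected_cache_path(url, model_dir="./pretrained"):
    """Guess the cache location (assumption: file is named after the URL basename)."""
    filename = url.split("/")[-1]
    return os.path.join(model_dir, filename)


def install_manual_download(downloaded_path, url, model_dir="./pretrained"):
    """Copy a manually downloaded file over the cached copy and return the target path."""
    os.makedirs(model_dir, exist_ok=True)
    target = expected_cache_path(url, model_dir)
    # copyfile overwrites an existing (possibly corrupt) file with the same name
    shutil.copyfile(downloaded_path, target)
    return target
```

For example, `install_manual_download("/path/to/browser/download.pth", model_urls['resnet50'])` would place the file under './pretrained', where the source path is a placeholder for wherever the browser saved it.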
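Cause analysis:
The traceback shows that the failure happens while pickle is deserializing the model parameters. The rejected load key '<' is the first byte of the file, which suggests the file begins with markup such as an HTML page rather than pickled data.
Some sources I found attribute this error to a version mismatch: parameters saved under Python 2.7 can fail to load under Python 3.
Others say that simply downloading the file again fixes it. I tried re-downloading, and that worked.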
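To see why a bad download produces this particular message, the error can be reproduced without PyTorch. The sketch below feeds pickle a short HTML string; the byte string is made up for illustration and stands in for whatever page the server actually returned.

```python
import pickle

# Stand-in for a failed download: an HTML page saved under a .pth name.
fake_checkpoint = b"<html><body>403 Forbidden</body></html>"

try:
    pickle.loads(fake_checkpoint)
    message = "no error"
except pickle.UnpicklingError as exc:
    # pickle treats the first byte as an opcode, and '<' is not a valid one.
    message = str(exc)

print(message)
```

The printed message should match the last line of the traceback above, which points to the file contents rather than the Python or PyTorch version.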
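Guess:
When I moved my own download into place, I found that the directory already contained a file with the target name. Some sites probably require a cookie, so the automatic download ends up saving a junk file of only a few KB, and a model parameter file cannot be that small. I have seen the same thing with scripts that call wget, so this kind of problem is worth watching for.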
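A quick check along these lines can catch such a file before it reaches torch.load. This is a heuristic sketch, not part of the project: `looks_like_failed_download` is a hypothetical helper, and the 1 MiB size threshold is an assumed value chosen only because real weight files are far larger than a few KB.

```python
import os


def looks_like_failed_download(path, min_bytes=1024 * 1024):
    """Heuristic: True if the file is suspiciously small or starts like HTML/XML."""
    # A few-KB file cannot hold ResNet weights (threshold is an assumption).
    if os.path.getsize(path) < min_bytes:
        return True
    with open(path, "rb") as f:
        head = f.read(64).lstrip()
    # Error pages and login redirects typically start with '<'.
    return head.startswith(b"<")
```

If it returns True for the file under './pretrained', deleting that file and downloading it again by hand is the likely fix.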