When I tried to load PyTorch's pretrained resnet101 model on Linux, I got the following error:
Traceback (most recent call last):
  File "train_baseline.py", line 272, in <module>
    cnn = resnet101(pretrained=True).to(device)
  File "/home/user/anaconda3/envs/my/lib/python3.6/site-packages/torchvision/models/resnet.py", line 200, in resnet101
    model.load_state_dict(model_zoo.load_url(model_urls['resnet101']))
  File "/home/user/anaconda3/envs/my/lib/python3.6/site-packages/torch/utils/model_zoo.py", line 67, in load_url
    return torch.load(cached_file, map_location=map_location)
  File "/home/user/anaconda3/envs/my/lib/python3.6/site-packages/torch/serialization.py", line 368, in load
    return _load(f, map_location, pickle_module)
  File "/home/user/anaconda3/envs/my/lib/python3.6/site-packages/torch/serialization.py", line 532, in _load
    magic_number = pickle_module.load(f)
_pickle.UnpicklingError: unpickling stack underflow
The problem started the first time resnet101 was downloaded. That download failed because the disk ran out of space:
OSError: [Errno 18] Invalid cross-device link: '/tmp/tmpjqtk1ks_' -> '/home/user/.torch/models/resnet101-5d3b4d8f.pth'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_baseline.py", line 272, in <module>
    cnn = resnet101(pretrained=True).to(device)
  File "/home/user/anaconda3/envs/my/lib/python3.6/site-packages/torchvision/models/resnet.py", line 200, in resnet101
    model.load_state_dict(model_zoo.load_url(model_urls['resnet101']))
  File "/home/user/anaconda3/envs/my/lib/python3.6/site-packages/torch/utils/model_zoo.py", line 66, in load_url
    _download_url_to_file(url, cached_file, hash_prefix, progress=progress)
  File "/home/user/anaconda3/envs/my/lib/python3.6/site-packages/torch/utils/model_zoo.py", line 107, in _download_url_to_file
    shutil.move(f.name, dst)
  File "/home/user/anaconda3/envs/my/lib/python3.6/shutil.py", line 564, in move
    copy_function(src, real_dst)
  File "/home/user/anaconda3/envs/my/lib/python3.6/shutil.py", line 263, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/home/user/anaconda3/envs/my/lib/python3.6/shutil.py", line 122, in copyfile
    copyfileobj(fsrc, fdst)
  File "/home/user/anaconda3/envs/my/lib/python3.6/shutil.py", line 82, in copyfileobj
    fdst.write(buf)
OSError: [Errno 28] No space left on device
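The Errno 18 "Invalid cross-device link" is not the real failure by itself. shutil.move falls back to copying when the temporary file and the cache directory are on different filesystems. That copy is what hits Errno 28, so it is worth checking free space on both disks, for example with df -h.

To fix the space problem, create a new temporary directory on a disk with more free space and make it the system's temporary directory. In my case that is /mnt/tmp:
mkdir -p /mnt/tmp
echo 'export TMPDIR=/mnt/tmp' >> ~/.bashrc
source ~/.bashrc
Appending the export line to ~/.bashrc keeps the setting for future shells, and source applies it to the current one. Running export on its own affects only the current shell. Sourcing ~/.bashrc changes nothing unless the line is actually in that file.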
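The /tmp/tmpjqtk1ks_ name in the traceback is a Python tempfile-style name. Python's tempfile module reads TMPDIR (then TEMP and TMP) when it picks its temporary directory. A quick way to confirm that the new setting took effect, and that the location has room, is a standard-library sketch like the one below. The function name temp_dir_report is hypothetical, and /mnt/tmp is just the example path from above.

```python
import os
import shutil
import tempfile


def temp_dir_report():
    """Return (temp_dir, free_bytes) for the directory tempfile will actually use."""
    # tempfile caches its choice after the first lookup; clearing the cache
    # makes it re-read TMPDIR/TEMP/TMP from the current environment.
    tempfile.tempdir = None
    temp_dir = tempfile.gettempdir()
    free_bytes = shutil.disk_usage(temp_dir).free
    return temp_dir, free_bytes


if __name__ == "__main__":
    print("TMPDIR =", os.environ.get("TMPDIR"))
    path, free = temp_dir_report()
    # gettempdir() silently falls back to /tmp (or another default) when the
    # TMPDIR directory does not exist or is not writable, so compare the two.
    print(f"tempfile will use {path} ({free / 2**30:.1f} GiB free)")
```

If the printed directory is still /tmp, the usual cause is that /mnt/tmp does not exist or is not writable by your user.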
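However, even after the temporary directory was redefined, rerunning the code still failed with the error shown at the start of this post: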
_pickle.UnpicklingError: unpickling stack underflow
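This happens because the earlier, failed download left a partially written resnet101 file in the model cache directory. load_url only downloads when the cached file does not exist, so on the next run it skips the download and tries to unpickle the truncated file, which fails. The fix is to delete that incomplete resnet101 file from the cache.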
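torchvision checkpoint names end in the first hex digits of the file's SHA-256 digest: resnet101-5d3b4d8f.pth carries the prefix 5d3b4d8f. That prefix is what model_zoo checks while downloading (note the hash_prefix argument in the traceback above), but it is not checked again when an existing cached file is loaded. The standard-library sketch below applies the same check to a cached file before you load it. The function name checkpoint_is_complete is hypothetical.

```python
import hashlib
import re
from pathlib import Path

# torchvision checkpoint names look like "resnet101-5d3b4d8f.pth": the hex
# digits after the last "-" are a prefix of the file's SHA-256 digest.
HASH_PREFIX = re.compile(r"-([a-f0-9]+)\.pth$")


def checkpoint_is_complete(path):
    """Return True if the file's SHA-256 starts with the prefix in its name.

    Returns None when the file name carries no hash prefix, since then
    there is nothing to compare against.
    """
    path = Path(path)
    match = HASH_PREFIX.search(path.name)
    if match is None:
        return None
    sha256 = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in 1 MiB chunks so a large checkpoint is never fully in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha256.update(chunk)
    return sha256.hexdigest().startswith(match.group(1))
```

For example, checkpoint_is_complete(Path.home() / ".torch/models/resnet101-5d3b4d8f.pth") should return False for a half-downloaded file.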
The cached resnet101 file is in one of these two directories:
/home/user/.cache/torch/checkpoints
or
/home/user/.torch/models
Replace user in these paths with your own username.
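The exact path differs between setups (mainly between PyTorch versions), so check both. Plain ls does not list hidden directories such as ~/.cache or ~/.torch. Use ls -a, or simply cd into them directly.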
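The search-and-delete step can also be scripted. The sketch below checks the two directories listed above. It builds them from Path.home(), so there is no username to substitute and hidden directories are no obstacle. It lists the matching files and deletes them only when dry_run is False. The function names and the resnet101*.pth pattern are assumptions for illustration, and cache directories configured elsewhere are not covered.

```python
from pathlib import Path

# The two cache locations mentioned above, relative to the current user's home.
CANDIDATE_DIRS = [
    Path.home() / ".cache" / "torch" / "checkpoints",
    Path.home() / ".torch" / "models",
]


def find_cached_checkpoints(pattern="resnet101*.pth", dirs=CANDIDATE_DIRS):
    """Return every file matching `pattern` in the candidate cache directories."""
    found = []
    for d in dirs:
        if d.is_dir():  # a missing directory simply has nothing to delete
            found.extend(sorted(p for p in d.glob(pattern) if p.is_file()))
    return found


def remove_cached_checkpoints(pattern="resnet101*.pth", dirs=CANDIDATE_DIRS, dry_run=True):
    """Print, and unless dry_run is True delete, the matching checkpoint files."""
    files = find_cached_checkpoints(pattern, dirs)
    for p in files:
        print(("would delete " if dry_run else "deleting ") + str(p))
        if not dry_run:
            p.unlink()
    return files


if __name__ == "__main__":
    # Dry run first; call again with dry_run=False once the list looks right.
    remove_cached_checkpoints(dry_run=True)
```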
Delete the half-downloaded model, make sure the new temporary directory is set, and run the code again. This solved the problem.