Running PyTorch code inside a Docker container fails with: "RuntimeError: DataLoader worker (pid 83709) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit."
I. Cause of the error:
The Docker container's shared memory (/dev/shm) is too small. Docker defaults to only 64 MB of shm, and DataLoader worker processes pass batch tensors to the main process through shared memory, so multiple workers exhaust it quickly.
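You can confirm the limit from inside the container by checking the size of the shared-memory tmpfs:

```shell
# Shows the size and current usage of /dev/shm, the tmpfs that
# DataLoader workers allocate batch tensors from.
df -h /dev/shm
```

If the Size column reads 64M, you are on Docker's default and any multi-worker DataLoader with non-trivial batches is likely to hit this error.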
II. Solutions:
1. Add the following code at the top of your script (recommended; it is the simplest to apply). It stops the DataLoader from moving batches through shared memory, so they are pickled through ordinary pipes instead:
import sys
import torch
from torch.utils.data import dataloader
from torch.multiprocessing import reductions
from multiprocessing.reduction import ForkingPickler

# Wrap the stock collate function so that shared memory is disabled
# before each batch is assembled.
default_collate_func = dataloader.default_collate

def default_collate_override(batch):
    dataloader._use_shared_memory = False
    return default_collate_func(batch)

setattr(dataloader, 'default_collate', default_collate_override)

# Remove torch's custom reducers so tensors are pickled normally
# instead of being handed over through shared-memory file descriptors.
for t in torch._storage_classes:
    if sys.version_info[0] == 2:
        if t in ForkingPickler.dispatch:
            del ForkingPickler.dispatch[t]
    else:
        if t in ForkingPickler._extra_reducers:
            del ForkingPickler._extra_reducers[t]
2. Add the following flag to the docker run command when starting the container:
--shm-size="64g"
Choose the value according to the host's physical RAM: /dev/shm is a RAM-backed tmpfs, so whatever you grant here competes with the host's ordinary memory.
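As a sizing aid, here is a minimal sketch (Linux-only, standard library) that reads the host's physical RAM and prints a matching --shm-size flag; the "half of RAM" cap is an assumption for illustration, not a Docker recommendation:

```python
import os

# Total physical RAM in bytes (Linux-specific sysconf names).
total_ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")

# Assumption: cap shm at half of host RAM, since /dev/shm is a
# RAM-backed tmpfs and competes with ordinary memory.
shm_gb = (total_ram // 2) // (1024 ** 3)
print(f'--shm-size="{max(shm_gb, 1)}g"')
```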
3. Edit the corresponding container's configuration file directly.
First stop the Docker service:
$ systemctl stop docker
Then edit the container's config file as root:
$ su root
$ cd /var/lib/docker/containers/<container ID>
$ ls
$ vim hostconfig.json
Change the value of the "ShmSize" key (note the capitalization; the value is in bytes).
Restart the Docker service:
$ systemctl restart docker
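Since ShmSize in hostconfig.json is a plain integer number of bytes, the gigabyte figure from method 2 has to be converted first. For example, 64 GiB:

```python
# 64 GiB expressed in bytes, the unit ShmSize uses in hostconfig.json.
shm_size = 64 * 1024 ** 3
print(shm_size)  # 68719476736
```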
Done!!!