Today, while running a PyTorch program inside Docker, I ran into this error:
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Cause:
Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders) the default shared memory segment size that container runs with is not enough, and you should increase shared memory size either with --ipc=host or --shm-size command line options to nvidia-docker run.
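Explanation: PyTorch's inter-process communication (IPC) relies on shared memory, so the environment the code runs in must have enough shared memory available.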
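To see how much shared memory the container actually has, one option is to inspect /dev/shm from Python. The snippet below is a minimal sketch, assuming a Linux container where shared memory is mounted at /dev/shm; the helper name shm_usage_gib is made up for illustration. Docker's default for this mount is typically 64 MB, which is easy to exhaust with several DataLoader workers.

```python
import os
import shutil


def shm_usage_gib(path="/dev/shm"):
    """Return (total, used, free) in GiB for the filesystem mounted at `path`.

    `path` defaults to /dev/shm, which is an assumption about where the
    container mounts its shared memory.
    """
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return usage.total / gib, usage.used / gib, usage.free / gib


# Only print when the mount exists (it will not on non-Linux systems).
if os.path.exists("/dev/shm"):
    total, used, free = shm_usage_gib()
    print(f"/dev/shm: total={total:.2f} GiB, used={used:.2f} GiB, free={free:.2f} GiB")
```

If the reported total is only a few tens of megabytes, the container is most likely running with the default shared memory size.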
Solutions:
1: Increase the container's shm-size (set it when starting the container with docker run):
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0,1 --shm-size 8G -it ******* env LANG=C.UTF-8 /bin/bash
2: Set num_workers to 0 in the DataLoader, so that data is loaded in the main process and no worker processes need shared memory:
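import torch

# dataset: your existing torch.utils.data.Dataset instance
dataloader = torch.utils.data.DataLoader(
    dataset,
    batch_size=16,
    shuffle=True,
    num_workers=0,  # 0 = load data in the main process, no worker processes
    pin_memory=True,
    collate_fn=dataset.collate_fn,
)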
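Setting num_workers=0 permanently gives up parallel data loading even on machines with plenty of shared memory. As a rough sketch (not part of the original fix), the worker count could be chosen at runtime instead. The helper name choose_num_workers and the 1 GiB threshold are hypothetical; the right threshold depends on batch size and sample size.

```python
import os
import shutil


def choose_num_workers(requested, shm_path="/dev/shm", min_free_bytes=1024 ** 3):
    """Return `requested` workers, or 0 if free shared memory looks too small.

    `min_free_bytes` (default 1 GiB) is an arbitrary illustrative threshold,
    and `shm_path` assumes the usual Linux mount point.
    """
    if not os.path.exists(shm_path):
        # Cannot inspect shared memory here; leave the request unchanged.
        return requested
    free = shutil.disk_usage(shm_path).free
    return requested if free >= min_free_bytes else 0


# The result can then be passed as num_workers= to torch.utils.data.DataLoader.
num_workers = choose_num_workers(4)
print(f"using num_workers={num_workers}")
```

This keeps the fallback from solution 2 for small containers while still allowing worker processes when the container was started with a larger --shm-size.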
Problem solved!