@[TOC](PyTorch in Docker fails with ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).)
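Cause
The container does not have enough shared memory (shm, mounted at /dev/shm), so PyTorch fails when the DataLoader starts its worker processes. Docker gives a container 64 MB of shm by default. The PyTorch documentation describes the issue as follows: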
Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders) the default shared memory segment size that container runs with is not enough, and you should increase shared memory size either with --ipc=host or --shm-size command line options to nvidia-docker run.
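To confirm that shm is the bottleneck, it helps to look at how large /dev/shm is inside the container. The following is a minimal sketch using only the Python standard library; the function name `shm_usage` is made up for illustration, and it assumes shm is mounted at /dev/shm, which is the usual location on Linux.

```python
import os
import shutil


def shm_usage(path="/dev/shm"):
    """Return (total, used, free) in MiB for the filesystem mounted at `path`."""
    usage = shutil.disk_usage(path)
    mib = 1024 * 1024
    return usage.total / mib, usage.used / mib, usage.free / mib


if __name__ == "__main__":
    # Only report when the mount point exists (it may not outside Linux).
    if os.path.isdir("/dev/shm"):
        total, used, free = shm_usage()
        print(f"/dev/shm: total={total:.0f} MiB, used={used:.0f} MiB, free={free:.0f} MiB")
```

Run inside the container, a total of about 64 MiB would indicate that the container is still using Docker's default shm size.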
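Solution
1) Option 1: reduce the DataLoader's num_workers, so that fewer worker processes share data through shm.
2) Option 2: increase the container's shm size (recommended)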
(1) Stop the container
sudo docker stop dockerID (look up the actual ID with docker ps -a)
(2) Go to the container's directory
cd /var/lib/docker/containers/<directory whose name starts with dockerID>/
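(3) Edit hostconfig.json
vim hostconfig.json
In hostconfig.json, find the "ShmSize" key and increase the number after it. The value is in bytes; the default 67108864 is 64 MiB. (For example, I set it to 67108864222, which is about 62.5 GiB.)
Then save and exit.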
(4) Restart the container
Start the container:
sudo docker start dockerID
Enter the container:
sudo docker exec -it dockerID bash
or
sudo nvidia-docker exec -it dockerID bash
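The manual edit in step (3) can also be scripted, which avoids typos in the byte count. The sketch below is only an illustration under the assumptions of this post: the container is stopped, the file is the hostconfig.json described above, and it contains a top-level "ShmSize" key. The helper names `set_shm_size` and `gib` are hypothetical, not part of Docker or any library.

```python
import json
from pathlib import Path


def gib(n):
    """Convert a size in GiB to bytes."""
    return n * 1024 ** 3


def set_shm_size(hostconfig_path, size_bytes):
    """Set the "ShmSize" field of a hostconfig.json file; return the old value."""
    path = Path(hostconfig_path)
    config = json.loads(path.read_text())
    old = config.get("ShmSize")
    config["ShmSize"] = int(size_bytes)
    # All other keys are written back unchanged.
    path.write_text(json.dumps(config))
    return old


# Example call (run as root; the directory name is a placeholder, not a real path):
# set_shm_size("/var/lib/docker/containers/<container directory>/hostconfig.json", gib(8))
```

Here `gib(8)` yields 8589934592 bytes. If the new size does not appear inside the container after it starts, the Docker daemon may have to be restarted (for example with sudo systemctl restart docker) before starting the container, since the daemon may keep its own copy of the container configuration. For a new container, the same result is obtained without editing any file by passing --shm-size or --ipc=host to docker run, as the quoted note above says.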