When training a model, the container ran out of shared memory and the DataLoader workers crashed with:
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
RuntimeError: DataLoader worker (pid 4236) is killed by signal: Bus error. It is possible that dataloader’s workers are out of shared memory. Please try to raise your shared memory limit.
RuntimeError: DataLoader worker (pid(s) 4236) exited unexpectedly
Solution:
1. List all containers:
docker ps -a
Find your container's ID in the CONTAINER ID column. In my case it is 7331d7806ecd.
2. Run the following command to get the container's full ID:
docker inspect 7331d7806ecd | grep Id
"Id": "7331d7806ecd42bf6490c6e0aa2a631f9bd66694f97c9ac91d77b9b1240f52c2",
3. Run cd /var/lib/docker/containers, use ls to find the directory named after the full Id, and cd into it.
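Steps 2 and 3 amount to turning the short ID into a directory path. The sketch below builds that path directly. The helper name hostconfig_path and the DOCKER_ROOT variable are hypothetical, and it assumes Docker's default data root /var/lib/docker unless DOCKER_ROOT is set (docker info --format '{{.DockerRootDir}}' shows the actual root on your host).

```shell
# Hypothetical helper (not from the original post): print the path to a
# container's hostconfig.json given its full 64-character ID.
# Assumes the default Docker data root /var/lib/docker unless DOCKER_ROOT is set.
hostconfig_path() {
  # $1: full container ID, e.g. from: docker inspect --format '{{.Id}}' <short id>
  printf '%s/containers/%s/hostconfig.json\n' "${DOCKER_ROOT:-/var/lib/docker}" "$1"
}
```

Usage on the Docker host: hostconfig_path "$(docker inspect --format '{{.Id}}' 7331d7806ecd)". The --format '{{.Id}}' template prints only the full ID, so no grep is needed.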
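4. Run ls again and you will see a hostconfig.json file. Do not edit it yet: first stop the Docker daemon (this stops every running container, and a running daemon may overwrite your changes) with
systemctl stop docker
or
service docker stop
Then edit the file:
vi hostconfig.json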
5. Change ShmSize to 8 GiB (8 × 1024³ = 8589934592 bytes):
"ShmSize":8589934592
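If you prefer not to edit the file by hand in vi, step 5 can be scripted. This is a minimal sketch, not the author's method: the helper name set_shm_size is hypothetical, it assumes the key appears as "ShmSize":<number> (optionally with whitespace after the colon), and it should only be run while the Docker daemon is stopped. It keeps a .bak copy of the original file.

```shell
# Hypothetical helper: set "ShmSize" in a hostconfig.json to a whole number of GiB.
# Run only while the Docker daemon is stopped; a backup is written to <file>.bak.
set_shm_size() {
  file="$1"                             # path to hostconfig.json
  gib="$2"                              # desired size in GiB (integer)
  bytes=$((gib * 1024 * 1024 * 1024))   # 8 GiB -> 8589934592 bytes
  cp "$file" "$file.bak" || return 1
  # Replace only the numeric value after the "ShmSize" key; everything else is untouched.
  sed -E "s/\"ShmSize\":[[:space:]]*[0-9]+/\"ShmSize\":${bytes}/" "$file.bak" > "$file"
}
```

Usage, from inside the container's directory: set_shm_size hostconfig.json 8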
6. Finally, restart Docker, then start the container and open a shell in it:
systemctl restart docker
docker start 7331d7806ecd
docker exec -it 7331d7806ecd /bin/bash
Inside the container, check the result with df, e.g. df -h /dev/shm.
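To check the new size from a script instead of by eye, you can read the total from df output. A small sketch under stated assumptions: the helper name shm_total_kib is hypothetical, and it expects POSIX-format output (df -Pk), where the second line holds the numbers and column 2 is the total size in 1 KiB blocks. For 8 GiB that total should be 8 × 1024 × 1024 = 8388608.

```shell
# Hypothetical helper: read the total size (in KiB) from `df -Pk` output on stdin.
# Line 2 of the output is the filesystem; column 2 is its total size in 1 KiB blocks.
shm_total_kib() {
  awk 'NR == 2 { print $2 }'
}

# Expected total for 8 GiB, in KiB
expected_kib=$((8 * 1024 * 1024))
```

Usage inside the container: [ "$(df -Pk /dev/shm | shm_total_kib)" -eq "$expected_kib" ] && echo "shm is 8 GiB"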