Error message
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Traceback (most recent call last):
  File "train_raf-db.py", line 214, in <module>
    run_training()
  File "train_raf-db.py", line 158, in run_training
    outputs, alpha = model(imgs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/workspace/nanfang-pytorch1.7/Amend-Representation-Module/src/Networks.py", line 39, in forward
    x = self.features(x)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 419, in _conv_forward
    return F.conv2d(input, weight, self.bias, self.stride,
  File "/opt/conda/lib/python3.8/site-packages/apex/amp/wrap.py", line 28, in wrapper
    return orig_fn(*new_args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 103) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.
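Before changing anything, it can help to confirm how much shared memory the container actually has. The following is a minimal sketch using only the standard library. It assumes a Linux container where shared memory is mounted at /dev/shm, which is where Docker mounts it; the helper name shm_usage_mib is made up for this example.

```python
import os
import shutil

MIB = 1024 * 1024


def shm_usage_mib(path="/dev/shm"):
    """Return (total, used, free) for the filesystem at `path`, in MiB."""
    usage = shutil.disk_usage(path)
    return usage.total // MIB, usage.used // MIB, usage.free // MIB


if __name__ == "__main__":
    # Assumption: Linux container with shared memory mounted at /dev/shm.
    if os.path.isdir("/dev/shm"):
        total, used, free = shm_usage_mib()
        print(f"/dev/shm: total={total} MiB, used={used} MiB, free={free} MiB")
    else:
        print("/dev/shm not found; this check assumes a Linux container")
```

In a container started with Docker's default settings, the reported total should be about 64 MiB, which matches the cause described below.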
Solution
The DataLoader num_workers setting and Docker shared memory
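Summary: the error occurs because the default shm_size of a Docker container (64 MB) is too small. DataLoader worker processes hand batches back to the main process through shared memory (/dev/shm), so with num_workers > 0 this limit is easily exhausted. There are two ways to fix it: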
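1. Set num_workers to 0 in the DataLoader, so that data is loaded in the main process. This only requires a small code change, but training becomes slower, especially with large data such as video frames.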
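2. Increase the container's shared memory size. The usual methods require either restarting the Docker service or recreating the container. Restarting the Docker service is rarely practical on a shared server, so recreating the container with the new setting is the more convenient option.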
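The two approaches can be combined in the training script: fall back to num_workers=0 when shared memory is too small, and use more workers once the container has been recreated with a larger limit. This is a hedged sketch, not part of the original code. The function pick_num_workers and the per-worker budget DEFAULT_MIB_PER_WORKER are hypothetical, and the 256 MiB figure is an assumption you should tune to your batch and image sizes.

```python
import os
import shutil

# Hypothetical budget: free /dev/shm (MiB) assumed to be needed per worker.
# This is not a PyTorch constant; tune it to your batch and image size.
DEFAULT_MIB_PER_WORKER = 256


def pick_num_workers(desired, shm_path="/dev/shm",
                     mib_per_worker=DEFAULT_MIB_PER_WORKER):
    """Choose a num_workers value for torch.utils.data.DataLoader.

    Returns 0 (load batches in the main process, so no worker-to-main
    shared memory transfer) when `shm_path` is missing or too small;
    otherwise caps `desired` by what the free space can support.
    """
    if desired <= 0 or not os.path.isdir(shm_path):
        return 0
    free_mib = shutil.disk_usage(shm_path).free // (1024 * 1024)
    return min(desired, free_mib // mib_per_worker)


if __name__ == "__main__":
    n = pick_num_workers(8)
    print(f"num_workers={n}")
    # Hypothetical usage with the training script's dataset:
    # loader = torch.utils.data.DataLoader(train_dataset, batch_size=64,
    #                                      shuffle=True, num_workers=n)
```

With Docker's default 64 MiB and the assumed 256 MiB per worker, this returns 0, which is approach 1. For approach 2, Docker sets the limit when the container is created, for example `docker run --shm-size=8g ...`; after that, the same call returns the requested worker count, provided enough of /dev/shm is free.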