Program stops or fails after running for a while: RuntimeError: Too many open files. Communication with the workers is no longer possible.
Error:
RuntimeError: Too many open files. Communication with the workers is no longer possible. Please increase the limit using
ulimit -n
in the shell or change the sharing strategy by calling
torch.multiprocessing.set_sharing_strategy('file_system')
at the beginning of your code
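Every time I ran the program, it stopped on its own after a few epochs. After some searching I found the following fix.
Solution:
At the top of the code, right after importing torch, add: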
import torch
import torch.multiprocessing
torch.multiprocessing.set_sharing_strategy('file_system')
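The error message also suggests raising the open-file limit with ulimit -n. As an alternative to doing that in the shell, the limit can be raised from inside the script with Python's standard resource module (Unix only). This is a minimal sketch: the helper name raise_open_file_limit and the target of 4096 are assumptions for illustration, not values from the original fix.

```python
import resource


def raise_open_file_limit(target=4096):
    """Raise the soft open-file limit toward `target` (hypothetical helper).

    The soft limit can never exceed the hard limit, so `target` is capped.
    Returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # Some systems reject the request; keep the current limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = raise_open_file_limit()
print(f"open-file limit: soft={soft}, hard={hard}")
```

Call it before the DataLoader is created so that the worker processes inherit the new limit.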
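Cause:
The PyTorch DataLoader was configured with a large batch_size and num_workers. On Linux the default sharing strategy is file_descriptor, under which every tensor a worker passes back through shared memory keeps a file descriptor open, so many workers returning large batches can exhaust the per-process open-file limit.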
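To check whether open file descriptors really are piling up, the count for the current process can be logged during training. The sketch below assumes a system that exposes /proc/self/fd (Linux) or /dev/fd (macOS); count_open_fds is a hypothetical helper name.

```python
import os


def count_open_fds():
    """Return the number of file descriptors open in this process.

    Assumes /proc/self/fd (Linux) or /dev/fd (macOS) is available. The
    directory listing itself briefly uses one descriptor, so compare
    successive readings rather than trusting the absolute value.
    """
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            return len(os.listdir(fd_dir))
    raise OSError("no file-descriptor directory found on this platform")


print("open file descriptors:", count_open_fds())
```

Printing this once per epoch and comparing it with the output of ulimit -n shows whether the count is creeping toward the limit.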