Copyright notice: this article is a repost. Source: "PyTorch error: RuntimeError: received 0 items of ancdata, and how to fix it" - 碧水青山 - 博客园
1. Error description
RuntimeError: received 0 items of ancdata
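This error is raised while a DataLoader is loading data.
2. Cause
PyTorch shares tensors between DataLoader worker processes (the sharing happens across processes, not threads) by opening files, and the number of files a process may have open is limited.
Run ulimit -a to see the limits. When the number of tensors that need to be shared exceeds the open files limit, this error appears. Sample output: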
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 128088
max locked memory (kbytes, -l) 16384
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 128088
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
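The same open files value can also be read from inside the training script, which helps confirm what limit the Python process actually inherited. The following is a minimal sketch using the standard-library resource module (Unix only); the helper name open_files_limit is made up for this illustration and does not come from the original post.

```python
import resource


def open_files_limit():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    # RLIMIT_NOFILE is the limit that `ulimit -n` reports as "open files".
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = open_files_limit()
print(f"open files: soft={soft}, hard={hard}")
```

The soft value is the one enforced at runtime; the hard value is the ceiling up to which an unprivileged process may raise the soft value.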
3. Solutions
(1) Raise the open files limit:
sudo ulimit -n does not work; instead, run sudo sh -c "ulimit -n 65535 && exec su $LOGNAME"
The explanation is as follows:
ulimit is a shell builtin like cd, not a separate program. sudo
looks for a binary to run, but there is no ulimit binary, which
is why you get the error message. You need to run it in a shell.
However, while you do need to be root to raise the limit to
65535, you probably don’t want to run your program as root. So
after you raise the limit you should switch back to the current
user.
To do this, run:
sudo sh -c "ulimit -n 65535 && exec su $LOGNAME"
and you will get a new shell, without root privileges, but with
the raised limit. The exec causes the new shell to replace the
process with sudo privileges, so after you exit that shell, you
won’t accidentally end up as root again.
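If the hard limit is already high enough, the soft limit can probably be raised from within the Python process itself, without sudo. This is an alternative not described in the original post. The sketch below assumes a Linux system, and the helper name raise_open_files_limit and its default target of 65535 are chosen only for illustration. It cannot go beyond the hard limit; for that, the sudo command above is still needed.

```python
import resource


def raise_open_files_limit(target=65535):
    """Raise the soft open-files limit toward `target`, capped at the hard limit.

    Returns the soft limit in effect after the call.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # An unprivileged process cannot raise the soft limit above the hard limit.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)

    # Never lower a soft limit that is already at or above the target.
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft

    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target
```

It would typically be called once at the top of the training script, before any DataLoader is created, so that worker processes inherit the raised limit.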
(2) Change the tensor sharing strategy used between processes to file_system:
torch.multiprocessing.set_sharing_strategy('file_system')
The default strategy is file_descriptor, which is constrained by the open files limit.
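As a rough illustration of where this call might go, the sketch below wraps it in a small helper that first checks whether file_system is available on the current platform and then reports the strategy in effect. The helper name use_file_system_sharing is hypothetical; the torch.multiprocessing functions it calls (get_all_sharing_strategies, set_sharing_strategy, get_sharing_strategy) are part of PyTorch.

```python
def use_file_system_sharing():
    """Switch PyTorch's tensor sharing strategy to 'file_system' if available.

    Returns the name of the strategy in effect after the call.
    """
    # Imported inside the helper so that torch is only loaded when it is called.
    import torch.multiprocessing as mp

    # Only switch if the platform actually offers the file_system strategy.
    if "file_system" in mp.get_all_sharing_strategies():
        mp.set_sharing_strategy("file_system")
    return mp.get_sharing_strategy()
```

Call it once near the start of the script, before constructing a DataLoader with num_workers greater than 0. One trade-off noted in the PyTorch documentation is that file_system can leave shared memory files behind if a process dies abruptly, so raising the open files limit may be the cleaner fix when that is possible.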