MPI TCP connection error: Open MPI cross-node failure with tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)...

A customer reported that jobs could not run across nodes. The test command was:

mpirun -np 8 -hostfile hostfilt.txt sleep 5

Running it produced the following error:

[test02:01719] [[24772,0],1] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

* compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--------------------------------------------------------------------------

A solution found online:

I had this same problem on Cygwin with OpenMPI 1.10.4.

Try adding "-report-uri -" to your mpirun command to see what IP address it's trying to use for connection:

mpirun -report-uri - -np 2 a.exe

It should print out a line that looks something like this:

568328192.0;tcp://192.168.10.103,169.254.247.11,0.0.0.0,0.0.0.0,0.0.0.0:55600

If the first IP address after the "tcp://" is not a current valid address for your machine, that's the problem and things are likely to break (even if the correct IP appears later in the list). Apparently ORTE is not smart enough to order the interfaces based on what is actually enabled and online.
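The check described above can be sketched in shell. This is only an illustration, assuming the URI has the IPv4 form shown in the sample line; the helper names first_uri_ip and ip_in_list are hypothetical and are not part of Open MPI.

```shell
# Print the first address listed after "tcp://" in an ORTE contact URI.
# Assumes IPv4 addresses and a single trailing ":port".
first_uri_ip() {
    uri=$1
    rest=${uri#*tcp://}            # drop everything up to and including "tcp://"
    addrs=${rest%%:*}              # drop the trailing ":port"
    printf '%s\n' "${addrs%%,*}"   # keep only the first comma-separated entry
}

# Return 0 if the first argument equals any of the remaining arguments.
ip_in_list() {
    needle=$1
    shift
    for addr in "$@"; do
        [ "$addr" = "$needle" ] && return 0
    done
    return 1
}

# Sample line copied from the answer above.
uri='568328192.0;tcp://192.168.10.103,169.254.247.11,0.0.0.0,0.0.0.0,0.0.0.0:55600'
first=$(first_uri_ip "$uri")
echo "first advertised address: $first"
```

On a Linux node, the result could then be compared with the addresses that are actually configured, for example with ip_in_list "$first" $(hostname -I). Note that hostname -I is Linux-specific; on Cygwin or other systems the local address list would have to come from a different tool.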

If the wrong IP corresponds to an old/disabled interface, uninstall it (if possible).
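When the stale interface cannot be removed, a commonly used alternative is to tell Open MPI which interface to use. The command below is a sketch, not something the original answer proposed. It assumes an ORTE-based Open MPI release that provides the oob_tcp_if_include and btl_tcp_if_include MCA parameters, and that every node has a working interface named eth0 (a placeholder name).

```shell
# eth0 is a placeholder; substitute the interface that is up on every node.
# oob_tcp_if_include restricts the daemon (out-of-band) connections,
# btl_tcp_if_include restricts the MPI point-to-point TCP traffic.
mpirun --mca oob_tcp_if_include eth0 --mca btl_tcp_if_include eth0 \
       -np 8 -hostfile hostfilt.txt sleep 5
```

Whether this resolves the Broken pipe error depends on the cause. It addresses only the last item in the ORTE message (no common network interface or route back to mpirun).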

Reference: https://stackoverflow.com/questions/34032655/cygwin-error-tcp-peer-send-blocking-send-to-socket
