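1. OpenMPI error when running across multiple nodes
Problem description: node 1 (host3) uses mpirun to launch an MPI program on node 2 (host4), and the run fails with the following error.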
$ mpirun -np 1 --host host4 ./main
[[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 367
[[INVALID],INVALID]-[[59225,0],0] mca_oob_tcp_peer_try_connect: connect to 255.255.255.255:51754 failed: Network is unreachable (101)
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmp