When I run the command
mpirun --mca bt1_tcp_if_exclude "p2p1,lo,virbr0,virbr0-nic" -n 5 -v -show-progress --display-map -hostfile my_hostfile.txt my_mpi_program -in infile
Data for JOB [1392,1] offset 0
======================== JOB MAP ========================
Data for node: mschramm Num slots: 4 Max slots: 4 Num procs: 4
Process OMPI jobid: [1392,1] App: 0 Process rank: 0
Process OMPI jobid: [1392,1] App: 0 Process rank: 1
Process OMPI jobid: [1392,1] App: 0 Process rank: 2
Process OMPI jobid: [1392,1] App: 0 Process rank: 3
Data for node: client_1 Num slots: 4 Max slots: 4 Num procs: 1
Process OMPI jobid: [1392,1] App: 0 Process rank: 4
=============================================================
App launch reported: 2 (out of 2) daemons - 0 (out of 5) procs
App launch reported: 2 (out of 2) daemons - 4 (out of 5) procs
[mschramm][[1392,1],0][btl_tcp_endpoint.c:818:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.122.1 failed: Connection refused (111)
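This says the connection to 192.168.122.1 was refused, but I already excluded that interface in the original command (--mca bt1_tcp_if_exclude "p2p1,lo,virbr0,virbr0-nic").
Any and all help would be greatly appreciated.
The MPI program either aborts (as shown above) or hangs. Which one happens seems to be random.
After being told that the parameter is spelled with an "l" and not a "1" (btl_tcp_if_exclude, not bt1_tcp_if_exclude), I now get the following error: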
App launch reported: 2 (out of 2) daemons - 4 (out of 6) procs
[mschramm][[6062,1],0][btl_tcp_endpoint.c:818:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.1.118 failed: No route to host (113)
[mschramm][[6062,1],1][btl_tcp_endpoint.c:818:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.1.118 failed: No route to host (113)
Running the following commands on the master node gives:
[mpi_user@mschramm ~]$ ifconfig em1
em1: flags=4163 mtu 1500
inet 192.168.1.143 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::8d16:c4ff:a398:26c1 prefixlen 64 scopeid 0x20
ether f8:b1:56:cd:91:f1 txqueuelen 1000 (Ethernet)
RX packets 15649 bytes 2240360 (2.1 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 35427 bytes 34470024 (32.8 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 20 memory 0xf7200000-f7220000
[mpi_user@mschramm ~]$ netstat -nr
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 10.27.15.254 0.0.0.0 UG 0 0 0 p2p1
0.0.0.0 192.168.1.1 0.0.0.0 UG 0 0 0 em1
10.27.12.0 0.0.0.0 255.255.252.0 U 0 0 0 p2p1
192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 em1
192.168.122.0 0.0.0.0 255.255.255.0 U 0 0 0 virbr0
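To make it easier to see which interface each failing address belongs to, here is a minimal Python sketch (standard library only) that does a longest-prefix lookup over the master's routing table. The route list is hand-copied from the netstat output above with the Genmask column converted to CIDR prefix lengths, and pick_interface is a hypothetical helper name. It only approximates the kernel's choice: it ignores metrics and simply keeps the first of two equally specific routes.

```python
import ipaddress

# Routes hand-copied from `netstat -nr` on the master node.
# Genmask converted to a prefix length: 255.255.252.0 -> /22, 255.255.255.0 -> /24.
MASTER_ROUTES = [
    ("0.0.0.0/0", "p2p1"),
    ("0.0.0.0/0", "em1"),
    ("10.27.12.0/22", "p2p1"),
    ("192.168.1.0/24", "em1"),
    ("192.168.122.0/24", "virbr0"),
]


def pick_interface(routes, destination):
    """Return the interface of the most specific route that contains destination.

    Hypothetical helper for illustration; ties keep the first matching route.
    """
    dest = ipaddress.ip_address(destination)
    best_network = None
    best_iface = None
    for cidr, iface in routes:
        network = ipaddress.ip_network(cidr)
        if dest not in network:
            continue
        if best_network is None or network.prefixlen > best_network.prefixlen:
            best_network, best_iface = network, iface
    return best_iface


if __name__ == "__main__":
    # The two addresses that appear in the connect() errors.
    for address in ("192.168.122.1", "192.168.1.118"):
        print(address, "->", pick_interface(MASTER_ROUTES, address))
```

With this table, 192.168.122.1 falls under the virbr0 route and 192.168.1.118 under the directly connected em1 route.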
On the first client node:
[mpi_user@client_1 ~]$ ifconfig em1
em1: flags=4163 mtu 1500
inet 192.168.1.118 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::1b26:a452:58ac:1cdd prefixlen 64 scopeid 0x20
inet6 fe80::8d16:c4ff:a398:26c1 prefixlen 64 scopeid 0x20
ether f8:b1:56:cd:97:74 txqueuelen 1000 (Ethernet)
RX packets 36341 bytes 34562000 (32.9 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 14544 bytes 2126373 (2.0 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 20 memory 0xf7100000-f7120000
[mpi_user@client_1 ~]$ netstat -nr
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 192.168.1.1 0.0.0.0 UG 0 0 0 em1
192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 em1
192.168.122.0 0.0.0.0 255.255.255.0 U 0 0 0 virbr0
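As a sanity check on the em1 addresses, the following minimal Python sketch (standard library only) confirms that both nodes sit on the same subnet. The addresses and netmasks are copied from the two ifconfig outputs above, and the variable names are just for illustration. It checks address arithmetic only and says nothing about firewalls or whether the hosts are actually reachable.

```python
import ipaddress

# Address/netmask pairs copied from the `ifconfig em1` output on each node.
master = ipaddress.ip_interface("192.168.1.143/255.255.255.0")
client = ipaddress.ip_interface("192.168.1.118/255.255.255.0")

# Two interfaces can talk without a gateway only if their networks match.
same_subnet = master.network == client.network

print("master network:", master.network)
print("client network:", client.network)
print("same subnet:", same_subnet)
```

Both interfaces resolve to 192.168.1.0/24, so a plain subnet mismatch on em1 does not explain the "No route to host" error.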