rank :0 ,source: -1 ,dest: 1
rank :2 ,source: 1 ,dest: 0
Fatal error in MPI_Send: Unknown error class, error stack:
MPI_Send(174)..............: MPI_Send(buf=0x7ffd4cc4db30, count=5, MPI_INT, dest=1, tag=5, MPI_COMM_WORLD) failed
MPID_nem_tcp_connpoll(1832): Communication error with rank 1: Connection refused
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 18691 RUNNING AT yuanhe
= EXIT CODE: 1
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:1@centos] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
[proxy:0:1@centos] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:1@centos] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@yuanhe] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@yuanhe] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@yuanhe] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
[mpiexec@yuanhe] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion
example1.c
#include <stdio.h>
#include "mpi.h"

const int COUNT = 5;

int
main (int argc, char *argv[])
{
  MPI_Status status;
  int tag = 5, size, rank;

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);
  MPI_Comm_size (MPI_COMM_WORLD, &size);

  int A[5], B[5], C[5];
  int i;
  for (i = 0; i < COUNT; i++)
    {
      A[i] = 2;
      B[i] = 0;
      C[i] = 0;
    }

  /* Ring neighbours.  In C, (0 - 1) % size is -1 when size > 1, so rank 0
     prints "source: -1".  The value is unused there because rank 0 never
     receives.  */
  int source = (rank - 1) % size;
  int dest = (rank + 1) % size;
  printf ("rank :%d ,source: %d ,dest: %d\n", rank, source, dest);

  /* Every rank sends before it receives, and the message sent to rank 0 is
     never received, so this relies on MPI buffering small messages.  */
  MPI_Send (A, COUNT, MPI_INT, dest, tag, MPI_COMM_WORLD);
  if (rank != 0)
    MPI_Recv (B, COUNT, MPI_INT, source, tag, MPI_COMM_WORLD, &status);

  /* Local sum of A + B on each rank.  */
  int sum = 0;
  for (i = 0; i < COUNT; i++)
    {
      C[i] = A[i] + B[i];
      sum += C[i];
    }

  /* Add up the local sums on rank 0 and print the total there.  */
  int ans = 0;
  MPI_Reduce (&sum, &ans, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
  if (rank == 0)
    printf ("the sum of numbers is %d\n", ans);

  MPI_Finalize ();
  return 0;
}
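Cause: the hostnames are not configured correctly. As a result, SSH does not meet the requirement stated in this article: "every machine in the list must be able to SSH to all machines in the list, including itself (localhost), without entering a password." A hostname that resolves to 127.0.0.1 is a likely trigger for the "Connection refused" error above, since a process on the other machine may then be told to connect to a loopback address.
On the yuanhe machine, both ssh yuanhe and ssh centos must log in without a password,
and on the centos machine, both ssh yuanhe and ssh centos must log in without a password as well.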
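The passwordless-login requirement above can be checked with a small script before launching mpiexec again. This is only a sketch: the function name check_passwordless and the SSH_CMD override are hypothetical names introduced here (the override exists so the loop can be dry-run without a network), and "user" in the comments is a placeholder account name.

```shell
#!/bin/sh
# Hypothetical helper: report whether each host accepts key-based login.
# SSH_CMD is an assumption added for dry runs; by default it is plain ssh.
SSH_CMD=${SSH_CMD:-ssh}

check_passwordless() {
    failed=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if $SSH_CMD -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}

# Run this on yuanhe, then again on centos:
#   check_passwordless yuanhe centos
# If a host fails, one common way to install a key with the OpenSSH tools:
#   ssh-keygen -t rsa -b 4096   # once per machine
#   ssh-copy-id user@centos     # "user" is a placeholder
```

The function returns non-zero if any host fails, so it can gate a launch script. Both machines need to pass for both hostnames, including their own.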
On host yuanhe:
sudo vim /etc/hosts
Comment out the line 127.0.0.1 yuanhe and keep 127.0.0.1 localhost.
On host centos:
sudo vim /etc/hosts
Comment out the line 127.0.0.1 centos and keep 127.0.0.1 localhost.
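The same /etc/hosts edit can be previewed non-interactively. The sketch below is not part of the original fix: comment_loopback_host is a hypothetical helper that prints the file with loopback lines naming the given host commented out, and it never writes to the file itself. If a loopback line also carries other names (for example localhost), it re-emits the line without the host so those names keep resolving.

```shell
#!/bin/sh
# Hypothetical helper: print FILE with 127.x lines that name HOST commented out.
# Usage: comment_loopback_host FILE HOST
comment_loopback_host() {
    file=$1
    host=$2
    awk -v h="$host" '
        # Only active lines whose address field starts with 127. are examined.
        $1 ~ /^127\./ {
            keep = $1; found = 0; names = 0
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break          # stop at a trailing comment
                if ($i == h) found = 1
                else { keep = keep " " $i; names++ }
            }
            if (found) {
                print "#" $0                  # comment out the original line
                if (names > 0) print keep     # keep other names, e.g. localhost
                next
            }
        }
        { print }
    ' "$file"
}

# Preview on yuanhe (nothing is modified):
#   comment_loopback_host /etc/hosts yuanhe
# After applying the change by hand, this should not print a 127.x address:
#   getent hosts yuanhe
```

Printing to standard output rather than editing in place is deliberate: /etc/hosts is easy to break, so the result can be reviewed before it is copied over with sudo.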