In the previous post I looked at some characteristics of point-to-point communication; this one runs some simple tests of the collective communication routines.
First up is MPI_Barrier. Strictly speaking it moves no data, so it is more a synchronization call than a communication call, but it is invoked on a communicator like the other collectives. The test program:
void MPI_Barrier_commworld()
{
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (0 == rank) sleep(5);
    MPI_Barrier(MPI_COMM_WORLD);
    printf("%d of %d has reached\n", rank, size);
}
Running it gives:
0 of 4 has reached   // the run pauses for a while before this appears
2 of 4 has reached
1 of 4 has reached
3 of 4 has reached
Changing the communicator to MPI_COMM_SELF, the output becomes:
2 of 4 has reached
1 of 4 has reached
3 of 4 has reached   // the run pauses here for a while
0 of 4 has reached
There does not seem to be much else to test for MPI_Barrier.
Next, MPI_Bcast, mainly in the MPI_COMM_SELF case. There is not much to test here either: each process is the root of its own one-member communicator, so the buffer cannot change. The test program:
void MPI_Bcast_commself()
{
    int rank, size;
    int temp = 5;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Bcast(&temp, 1, MPI_INT, 0, MPI_COMM_SELF);
    printf("%d of %d has received %d\n", rank, size, temp);
}
The output is unremarkable, so I will not paste it here.
Next is MPI_Gather, starting with a conventional case:
void MPI_Gather_commworld()
{
    int i;
    int rank, size;
    int* temp = (int*)malloc(sizeof(int)*4);
    temp[0] = 3;
    temp[1] = 2;
    temp[2] = 1;
    temp[3] = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Gather(&rank, 1, MPI_INT, temp, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("%d has gather ", rank);
    for (i = 0; i < 4; i++) {
        printf("%d ", temp[i]);
    }
    printf("\n");
}
The output:
1 has gather 3 has gather 3 2 1 0
3 2 1 0
0 has gather 0 1 2 3
2 has gather 3 2 1 0
The output lines are interleaved, but the result still shows that only rank 0's buffer changed (it now holds 0 1 2 3); the other ranks' buffers are untouched, which matches expectations, since only the root receives. Now the MPI_COMM_SELF case:
void MPI_Gather_commself()
{
    int i;
    int rank, size;
    int* temp = (int*)malloc(sizeof(int)*4);
    temp[0] = 4;
    temp[1] = 4;
    temp[2] = 4;
    temp[3] = 4;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Gather(&rank, 1, MPI_INT, temp, 1, MPI_INT, 0, MPI_COMM_SELF);
    //MPI_Gather(&rank, 1, MPI_INT, temp, 2, MPI_INT, 0, MPI_COMM_SELF); // recvcount > sendcount: only sendcount elements are received
    //MPI_Gather(&rank, 2, MPI_INT, temp, 2, MPI_INT, 0, MPI_COMM_SELF); // sendcount exceeds the actual send buffer: sendcount elements are sent anyway, until something breaks
    //MPI_Gather(&rank, 2, MPI_INT, temp, 1, MPI_INT, 0, MPI_COMM_SELF); // fails immediately
    printf("%d has gather ", rank);
    for (i = 0; i < 4; i++) {
        printf("%d ", temp[i]);
    }
    printf("\n");
}
The output:
1 has gather 2 has gather 1 4 4 4
0 has gather 0 4 4 4
2 4 4 4
3 has gather 3 4 4 4
Only the first int is copied into each process's buffer, since with MPI_COMM_SELF every process is its own root. The fourth commented-out variant above fails like this:
Fatal error in PMPI_Gather: Message truncated, error stack:
PMPI_Gather(904)......: MPI_Gather(sbuf=0x7ffdbb561064, scount=2, MPI_INT, rbuf=0xbd0220, rcount=1, MPI_INT, root=0, MPI_COMM_SELF) failed
MPIR_Gather_impl(726).:
MPIR_Gather(686)......:
MPIR_Gather_intra(187):
MPIR_Localcopy(74)....: Message truncated; 8 bytes received but buffer size is 4
So we get a buffer-truncation error: 8 bytes arrived, but the receive buffer described only 4 bytes.
From these four cases we can conclude that MPI_Gather transfers are governed by sendcount; recvcount must describe at least that much room per process at the root.
Now for MPI_Reduce, starting with a special case: sendbuf and recvbuf pointing at the same memory. The source:
void MPI_Reduce_commworld()
{
    int i;
    int rank, size;
    int* temp = (int*)malloc(sizeof(int)*4);
    temp[0] = 4;
    temp[1] = 3;
    temp[2] = 2;
    temp[3] = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Reduce(temp, temp, 1, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
    for (i = 0; i < 4; i++) {
        printf("%d", temp[i]);
    }
    printf("\n");
}
The output:
4Fatal error in PMPI_Reduce: Invalid buffer pointer, error stack:
PMPI_Reduce(1258): MPI_Reduce(sbuf=0x776220, rbuf=0x776220, count=1, MPI_INT, MPI_MIN, root=0, MPI_COMM_WORLD) failed
PMPI_Reduce(1185): Buffers must not be aliased
321
The last line of the stack says it plainly: the send and receive buffers must not alias each other. MPI_COMM_SELF raises the same error.
Next, a simple test of what the reduction actually computes:
void MPI_Reduce_commworld()
{
    int i;
    int rank, size;
    int* sendbuf = (int*)malloc(sizeof(int)*4);
    int* recvbuf = (int*)malloc(sizeof(int)*4);
    sendbuf[0] = 0;
    sendbuf[1] = 1;
    sendbuf[2] = 2;
    sendbuf[3] = 3;
    recvbuf[0] = 0;
    recvbuf[1] = 0;
    recvbuf[2] = 0;
    recvbuf[3] = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Reduce(sendbuf, recvbuf, 4, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
    for (i = 0; i < 4; i++) {
        printf("%d ", recvbuf[i]);
    }
    printf("\n");
}
The output:
0 0 0 0
0 0 0 0
0 0 0 0
0 1 2 3
From this run we can see that MPI_MIN yields, at each position of the buffer, the minimum of the corresponding elements across all ranks. With MPI_COMM_SELF every process is its own root, so the result lands in every process's recvbuf.
One more thing to note: both the send and receive sides are measured in count elements of the given datatype, not bytes. count may be smaller than the allocated buffer, in which case the trailing elements are simply left alone; making it larger than the buffer runs past the end, as the MPI_Gather experiments above showed.
Finally, an example of creating a user-defined reduction operator. The source:
void
CblacsAbsMax(void *invec, void *inoutvec, int *len, MPI_Datatype *datatype)
{
    int i, n = *len;
    double *dinvec, *dinoutvec;
    if (MPI_DOUBLE == *datatype) {
        dinvec = (double *)invec;
        dinoutvec = (double *)inoutvec;
        for (i = n; i; i--, dinvec++, dinoutvec++)
            if (fabs(*dinvec) > fabs(*dinoutvec)) *dinoutvec = *dinvec;
    }
}
I lifted this function from the HPCC benchmark source.
MPI_Op op;
MPI_Op_create( CblacsAbsMax, 1, &op );   /* 1 marks the operation as commutative */
MPI_Reduce( sendbuf, recvbuf, 2, MPI_DOUBLE, op, 0, MPI_COMM_WORLD );
MPI_Op_free( &op );
Be sure to pass the freshly created op to MPI_Reduce rather than a builtin like MPI_MIN, or the custom function is never invoked. With op, each element of recvbuf at the root ends up as the contribution with the largest absolute value across all ranks.
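To see what the operator computes without launching MPI at all, one can apply the same fold directly, the way MPI applies a user op pairwise to two processes' contributions. A plain-C sketch of the logic (this is my own standalone mirror of CblacsAbsMax, not MPI itself):

```c
#include <math.h>

/* Same logic as CblacsAbsMax, minus the MPI types: fold invec into
 * inoutvec, keeping at each position the entry with the larger |value|. */
void abs_max(const double *invec, double *inoutvec, int len)
{
    for (int i = 0; i < len; i++)
        if (fabs(invec[i]) > fabs(inoutvec[i]))
            inoutvec[i] = invec[i];
}
```

For example, folding one rank's { -5.0, 1.0 } into another's { 2.0, 3.0 } leaves { -5.0, 3.0 }: -5.0 wins on magnitude even though a plain MPI_MAX would have kept 2.0.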