The textbook is An Introduction to Parallel Programming (《并行程序设计导论》), and the code below is based on it.
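I used MPI_Scatter and MPI_Gather. (MPI_Allgather is called the same way as MPI_Gather, just without the destination process argument, since every process receives the gathered vector.)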
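Running in parallel with four processes was slower than the serial version. I asked other people and they got the same result, so my guess is that communication overhead is the cause. Note that the timed interval in the code below covers Print_vector, that is, the MPI_Gather and the printing, and not just the addition. If anyone knows for sure and is willing to explain, please leave a comment qwq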
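The commented-out parts of the code were used during testing to type the vectors in at the command line; back then I set n to 12. Since the assignment requires n = 10000, I simply set element i to i, and I only print the element at index i = 1126 along with the run time we need.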
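One pitfall I ran into while writing this: I had always assumed that char * and char [] were the same thing, so I passed char * variables straight to Read_vector and Print_vector and kept getting warnings (in fact, passing "string" directly works fine qwq). For the record, as function parameter types the two are equivalent; they differ as variable declarations, where an array owns its storage and a pointer only refers to storage somewhere else.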
#include <stdio.h>
#include <stdlib.h>  /* malloc, free */
#include <string.h>
#include <mpi.h>

/* Fill an n-element vector on rank 0 and scatter local_n elements to each process. */
void Read_vector(double local_a[], int local_n, int n, char vec_name[], int my_rank, MPI_Comm comm){
    double* a = NULL;
    int i;

    if(my_rank == 0){
        a = malloc(n*sizeof(double));
        //printf("Enter the vector %s\n",vec_name);
        for(i = 0; i < n; i++)
            a[i] = i;//scanf("%lf", &a[i]);
        MPI_Scatter(a, local_n, MPI_DOUBLE, local_a, local_n, MPI_DOUBLE, 0, comm);
        free(a);
    } else{
        /* The send buffer is ignored on non-root ranks, so a may stay NULL. */
        MPI_Scatter(a, local_n, MPI_DOUBLE, local_a, local_n, MPI_DOUBLE, 0, comm);
    }
}

/* Gather the local blocks onto rank 0 and print the element at index 1126. */
void Print_vector(double local_b[], int local_n, int n, char title[], int my_rank, MPI_Comm comm){
    double* b = NULL;
    //int i;

    if(my_rank == 0){
        b = malloc(n*sizeof(double));
        MPI_Gather(local_b, local_n, MPI_DOUBLE, b, local_n, MPI_DOUBLE, 0, comm);
        printf("%s\n", title);
        printf("%f ", b[1126]);  /* index 1126 is in range only because n = 10000 */
        /*for(i = 0; i < n; i++)
            printf("%f ", b[i]);*/
        printf("\n");
        free(b);
    } else {
        /* The receive buffer is ignored on non-root ranks, so b may stay NULL. */
        MPI_Gather(local_b, local_n, MPI_DOUBLE, b, local_n, MPI_DOUBLE, 0, comm);
    }
}

/* Each process adds its own block: local_z = local_x + local_y. */
void Parrel_vector_sum(double local_x[], double local_y[], double local_z[], int local_n){
    int local_i;
    for(local_i = 0; local_i < local_n; local_i++)
        local_z[local_i] = local_x[local_i] + local_y[local_i];
}

int main(void){
    int comm_sz;
    int my_rank;
    int n = 10000;
    double *local_a,*local_b,*local_c;
    char A[10]="A";
    char B[10]="B";
    char C[50]="the 1126th element of'A + B' is ";

    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    int local_n = n/comm_sz;  /* assumes comm_sz divides n evenly */
    local_a = malloc(local_n*sizeof(double));
    local_b = malloc(local_n*sizeof(double));
    local_c = malloc(local_n*sizeof(double));

    Read_vector(local_a, local_n, n, A, my_rank, MPI_COMM_WORLD);
    Read_vector(local_b, local_n, n, B, my_rank, MPI_COMM_WORLD);

    /* The timed interval covers the addition, the gather, and the printing. */
    double start = MPI_Wtime();
    Parrel_vector_sum(local_a, local_b, local_c, local_n);
    Print_vector(local_c, local_n, n, C, my_rank, MPI_COMM_WORLD);
    double finish = MPI_Wtime();
    printf("the time needed is %e\n",finish-start);  /* printed once per process */

    free(local_a);
    free(local_b);
    free(local_c);
    MPI_Finalize();
    return 0;
}
The output after running looks like this:
If you spot any mistakes, corrections are welcome.