How do I convert a C program to MPI? How can part of the code be made to run in parallel with C and MPI?

I have been writing some code with the PETSc library and now I need to change part of it to run in parallel. Most of what I want to parallelize is the matrix initialization and the parts where I generate and compute a large number of values. In any case, my problem is that, for some reason, every part of the code runs as many times as the number of cores I use.

This is just a simple sample program with which I am testing PETSc and MPI:

#include <petscmat.h>
#include <ctime>
#include <iostream>
#include <string>

using namespace std;

int main(int argc, char** argv)
{
  time_t rawtime;
  time(&rawtime);
  string sta = ctime(&rawtime);
  cout << "Solving began..." << endl;

  PetscInitialize(&argc, &argv, 0, 0);

  Mat            A;                 /* linear system matrix */
  PetscInt       i, j, Ii, J, Istart, Iend, m = 120000, n = 3, its;
  PetscErrorCode ierr;
  PetscBool      flg = PETSC_FALSE;
  PetscScalar    v;
#if defined(PETSC_USE_LOG)
  PetscLogStage  stage;
#endif

  /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     Compute the matrix and right-hand-side vector that define
     the linear system, Ax = b.
     - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */

  /*
     Create parallel matrix, specifying only its global dimensions.
     When using MatCreate(), the matrix format can be specified at
     runtime. Also, the parallel partitioning of the matrix is
     determined by PETSc at runtime.

     Performance tuning note: For problems of substantial size,
     preallocation of matrix memory is crucial for attaining good
     performance. See the matrix chapter of the users manual for details.
  */
  ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
  ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,m,n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocation(A,5,PETSC_NULL,5,PETSC_NULL);CHKERRQ(ierr);
  ierr = MatSeqAIJSetPreallocation(A,5,PETSC_NULL);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);

  /*
     Currently, all PETSc parallel matrix formats are partitioned by
     contiguous chunks of rows across the processors. Determine which
     rows of the matrix are locally owned.
  */
  ierr = MatGetOwnershipRange(A,&Istart,&Iend);CHKERRQ(ierr);

  /*
     Set matrix elements for the 2-D, five-point stencil in parallel.
      - Each processor needs to insert only elements that it owns
        locally (but any non-local elements will be sent to the
        appropriate processor during matrix assembly).
      - Always specify global rows and columns of matrix entries.

     Note: this uses the less common natural ordering that orders first
     all the unknowns for x = h then for x = 2h etc; Hence you see J = Ii +- n
     instead of J = I +- m as you might expect. The more standard ordering
     would first do all variables for y = h, then y = 2h etc.
  */
  PetscMPIInt rank;   // processor rank
  PetscMPIInt size;   // size of communicator
  MPI_Comm_rank(PETSC_COMM_WORLD,&rank);
  MPI_Comm_size(PETSC_COMM_WORLD,&size);

  cout << "Rank = " << rank << endl;
  cout << "Size = " << size << endl;
  cout << "Generating 2D-Array" << endl;

  double temp2D[120000][3];
  for (Ii = Istart; Ii < Iend; Ii++) {
    for (J = 0; J < n; J++) {
      temp2D[Ii][J] = 1;
    }
  }
  cout << "Processor " << rank << " set values : " << Istart << " - " << Iend << " into 2D-Array" << endl;

  v = -1.0;
  for (Ii = Istart; Ii < Iend; Ii++) {
    for (J = 0; J < n; J++) {
      ierr = MatSetValues(A,1,&Ii,1,&J,&v,INSERT_VALUES);CHKERRQ(ierr);
    }
  }
  cout << "Ii = " << Ii << " processor " << rank << " and it owns: " << Istart << " - " << Iend << endl;

  /*
     Assemble matrix, using the 2-step process:
       MatAssemblyBegin(), MatAssemblyEnd()
     Computations can be done while messages are in transition
     by placing code between these two statements.
  */
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  MPI_Finalize();
  cout << "No more MPI" << endl;
  return 0;
}

My real program consists of several different .cpp files. I initialize MPI in the main program and call a function in another .cpp file where I do the same kind of matrix filling, but every cout the program performs before filling the matrix is printed as many times as I have cores.

I can run my test program as mpiexec -n 4 test and it runs successfully, but for some reason I have to run my real program as mpiexec -n 4 ./myprog.

The output of my test program is the following:

Solving began...

Solving began...

Solving began...

Solving began...

Rank = 0

Size = 4

Generating 2D-Array

Processor 0 set values : 0 - 30000 into 2D-Array

Rank = 2

Size = 4

Generating 2D-Array

Processor 2 set values : 60000 - 90000 into 2D-Array

Rank = 3

Size = 4

Generating 2D-Array

Processor 3 set values : 90000 - 120000 into 2D-Array

Rank = 1

Size = 4

Generating 2D-Array

Processor 1 set values : 30000 - 60000 into 2D-Array

Ii = 30000 processor 0 and it owns: 0 - 30000

Ii = 90000 processor 2 and it owns: 60000 - 90000

Ii = 120000 processor 3 and it owns: 90000 - 120000

Ii = 60000 processor 1 and it owns: 30000 - 60000

no more MPI

no more MPI

no more MPI

no more MPI

Edit after two comments:

My goal is to run this on a small cluster with 20 nodes and 2 cores per node. Later it should run on a supercomputer, so MPI is definitely the way I need to go. I am currently testing it on two different machines, one of which has 1 processor / 4 cores and the second 4 processors / 16 cores.

Answer:

MPI is an implementation of the SPMD/MPMD model (Single Program Multiple Data / Multiple Programs Multiple Data). An MPI job consists of simultaneously running processes that exchange messages with one another in order to cooperate on solving a problem. You cannot run only part of the code in parallel; you can only have parts of the code that do not communicate with each other but still execute concurrently. You should use mpirun or mpiexec to launch your application in parallel mode.
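This is why every cout before the matrix fill appears once per core: each MPI rank executes the whole of main(). As a minimal sketch (not from the original question), output can be limited to one process either with an explicit rank check or with PetscPrintf, which prints only from the first rank of the communicator:

#include <petscsys.h>
#include <iostream>

int main(int argc, char** argv)
{
  PetscInitialize(&argc, &argv, 0, 0);

  PetscMPIInt rank, size;
  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
  MPI_Comm_size(PETSC_COMM_WORLD, &size);

  /* Every rank runs this whole function, so an unguarded cout would
     appear once per process. Guard it by rank ... */
  if (rank == 0) {
    std::cout << "Solving began..." << std::endl;
  }

  /* ... or use PetscPrintf, which prints only from the first rank of
     the communicator. */
  PetscPrintf(PETSC_COMM_WORLD, "Running on %d processes\n", (int)size);

  PetscFinalize();
  return 0;
}

The number of processes is still fixed at launch time, e.g. mpiexec -n 4 ./myprog, not inside the code.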

If you only want to make part of the code parallel and can live with the restriction of running the code on a single machine, then what you need is OpenMP rather than MPI. Alternatively, you could use low-level POSIX threads programming, which, according to the PETSc website, PETSc supports. OpenMP is built on top of pthreads, so using PETSc together with OpenMP may be possible.
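As a rough illustration of the OpenMP alternative (a sketch only, not tied to the PETSc data structures above), a value-generation loop like the one in the question can be split across threads of a single process with one pragma:

#include <omp.h>
#include <vector>
#include <cstdio>

int main()
{
  const int m = 120000, n = 3;
  std::vector<double> temp2D(static_cast<size_t>(m) * n);

  /* All threads belong to one process and share this array, so each
     thread fills its own chunk of rows; there is no message passing
     and no rank bookkeeping. */
  #pragma omp parallel for
  for (int i = 0; i < m; i++) {
    for (int j = 0; j < n; j++) {
      temp2D[static_cast<size_t>(i) * n + j] = 1.0;
    }
  }

  std::printf("Filled %d x %d values using up to %d threads\n",
              m, n, omp_get_max_threads());
  return 0;
}

Compiled with -fopenmp, this runs as a single process whose threads stay on one machine, which is exactly the restriction mentioned above; for the 20-node cluster and the supercomputer, MPI remains the right tool.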

Tags: c, parallel-processing, mpi

Source: https://codeday.me/bug/20190826/1727941.html
