The program uses the nonblocking communication functions MPI_Isend() and MPI_Irecv(), and calls MPI_Wait() on the receive side.
After a little more than 5000 iterations, the following error appears:
5280 -1.272734378291617E-004 1.271885446338949E-004
1.93516788631215 -0.246120726174522 9.005226840169125E-006
1.00000247207768
[cli_3]: aborting job:
Fatal error in MPI_Isend: Internal MPI error!, error stack:
MPI_Isend(145): MPI_Isend(buf=0x12e37e40, count=5000, MPI_DOUBLE_PRECISION, dest=4, tag=77, MPI_COMM_WORLD, request=0x1890221c) failed
(unknown)(): Internal MPI error!
[cli_2]: aborting job:
Fatal error in MPI_Isend: Internal MPI error!, error stack:
MPI_Isend(145): MPI_Isend(buf=0x12dbdd20, count=5000, MPI_DOUBLE_PRECISION, dest=3, tag=77, MPI_COMM_WORLD, request=0x1890221c) failed
(unknown)(): Internal MPI error!
[cli_5]: aborting job:
Fatal error in MPI_Isend: Internal MPI error!, error stack:
MPI_Isend(145): MPI_Isend(buf=0x12f2c080, count=5000, MPI_DOUBLE_PRECISION, dest=6, tag=77, MPI_COMM_WORLD, request=0x1890221c) failed
(unknown)(): Internal MPI error!
[cli_6]: aborting job:
Fatal error in MPI_Isend: Internal MPI error!, error stack:
MPI_Isend(145): MPI_Isend(buf=0x12fa61a0, count=5000, MPI_DOUBLE_PRECISION, dest=7, tag=77, MPI_COMM_WORLD, request=0x1890221c) failed
(unknown)(): Internal MPI error!
rank 6 in job 2 v3901_33329 caused collective abort of all ranks
exit status of rank 6: return code 13
rank 5 in job 2 v3901_33329 caused collective abort of all ranks
exit status of rank 5: return code 13
rank 2 in job 2 v3901_33329 caused collective abort of all ranks
exit status of rank 2: return code 13
I still don't know the cause. Sometimes an "out of memory" error appears instead. Could MPI_Isend() be using up all the memory? It shouldn't be.
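One possible explanation, which is an assumption because the code itself is not shown, is that MPI_Wait is called only on the receive requests. In MPI, every request returned by MPI_Isend or MPI_Irecv stays allocated until it is completed (MPI_Wait, MPI_Test and their variants) or released with MPI_Request_free. A send request that is never completed therefore leaks a little memory on every iteration, which would fit an error that shows up only after thousands of iterations and sometimes as "out of memory". The toy model below is plain Python, not the MPI API. `RequestPool`, `run` and the capacity value are hypothetical, with a fixed capacity standing in for the library's finite resources.

```python
# Toy model (hypothetical, NOT the MPI API): each nonblocking call allocates
# a request that is released only when it is explicitly completed.
class RequestPool:
    def __init__(self, capacity):
        self.capacity = capacity  # stand-in for the library's finite memory
        self.live = set()         # requests allocated but not yet completed
        self.next_id = 0

    def isend(self):
        """Allocate a request handle, as MPI_Isend/MPI_Irecv would."""
        if len(self.live) >= self.capacity:
            raise MemoryError("request pool exhausted")
        self.next_id += 1
        self.live.add(self.next_id)
        return self.next_id

    irecv = isend  # receives allocate requests the same way in this model

    def wait(self, req):
        """Complete and release a request, as MPI_Wait does."""
        self.live.remove(req)


def run(iterations, capacity, wait_on_send):
    """Return the iteration at which the pool ran out, or None if it never did."""
    pool = RequestPool(capacity)
    for it in range(1, iterations + 1):
        try:
            send_req = pool.isend()
            recv_req = pool.irecv()
        except MemoryError:
            return it
        pool.wait(recv_req)          # the pattern in the post: wait on the receive only
        if wait_on_send:
            pool.wait(send_req)      # the fix: also complete the send request
    return None
```

With `wait_on_send=False`, `run(10000, 100, False)` runs out at iteration 100 (the capacity), while completing both requests keeps at most two alive and the loop finishes. In the real code the corresponding change would be to complete the send request as well, for example by calling MPI_Waitall on both the send and the receive request each iteration; completing the send is also required before the send buffer may safely be reused.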