In the API MPI tests, on the CUDA+OpenCL configuration on bs_1310, the following intermittent failure appears when the tests are run with multiple ranks:
00:09:11.740 [==========] Running 7 tests from 4 test cases.
00:09:11.740 [----------] Global test environment set-up.
00:09:11.740 [----------] 3 tests from ApiRunner
00:09:11.740 [ RUN ] ApiRunner.BasicMD
00:09:11.740 Checkpoint file is from part 5, new output files will be suffixed '.part0006'.
00:09:11.740 The current CPU can measure timings more accurately than the code in
00:09:11.740 gmxapi-mpi-test was configured to use. This might affect your simulation
00:09:11.740 speed as accurate timings are needed for load-balancing.
00:09:11.740 Please consider rebuilding gmxapi-mpi-test with the GMX_USE_RDTSCP=ON CMake option.
00:09:11.740 Reading file /mnt/workspace/Matrix_PreSubmit_master/87cf13c6/gromacs/src/api/cpp/tests/topol.tpr, VERSION 2019-dev-20181010-8a92135 (single precision)
00:09:11.740
00:09:11.740 Overriding nsteps with value passed on the command line: 10 steps, 0.01 ps
00:09:11.740 Changing nstlist from 10 to 100, rlist from 1 to 1
00:09:11.740
00:09:11.740
00:09:11.740 Using 2 MPI processes
00:09:11.740 Using 4 OpenMP threads per MPI process
00:09:11.740
00:09:11.740 On host bs-nix1310 2 GPUs auto-selected for this run.
00:09:11.740 Mapping of GPU IDs to the 2 GPU tasks in the 2 ranks on this node:
00:09:11.740 PP:0,PP:1
00:09:11.740 starting mdrun 'spc-and-methanol'
00:09:11.740 80 steps, 0.1 ps (continuing from step 70, 0.1 ps).
00:09:11.740
00:09:11.740 Writing final coordinates.
00:09:11.740
00:09:11.740 NOTE: 45 % of the run time was spent communicating energies,
00:09:11.740 you might want to use the -gcom option of mdrun
00:09:11.740
00:09:11.740 Core t (s) Wall t (s) (%)
00:09:11.740 Time: 0.175 0.022 795.8
00:09:11.740 (ns/day) (hour/ns)
00:09:11.740 Performance: 43.232 0.555
00:09:11.740 [ OK ] ApiRunner.BasicMD (1319 ms)
00:09:11.740 [ RUN ] ApiRunner.Reinitialize
00:09:11.740 Checkpoint file is from part 6, new output files will be suffixed '.part0007'.
00:09:11.740 The current CPU can measure timings more accurately than the code in
00:09:11.740 gmxapi-mpi-test was configured to use. This might affect your simulation
00:09:11.740 speed as accurate timings are needed for load-balancing.
00:09:11.740 Please consider rebuilding gmxapi-mpi-test with the GMX_USE_RDTSCP=ON CMake option.
00:09:11.740 Reading file /mnt/workspace/Matrix_PreSubmit_master/87cf13c6/gromacs/src/api/cpp/tests/topol.tpr, VERSION 2019-dev-20181010-8a92135 (single precision)
00:09:11.740
00:09:11.740 Overriding nsteps with value passed on the command line: 20 steps, 0.02 ps
00:09:11.740 Changing nstlist from 10 to 100, rlist from 1 to 1
00:09:11.740
00:09:11.740
00:09:11.740 Using 2 MPI processes
00:09:11.740 Using 4 OpenMP threads per MPI process
00:09:11.740
00:09:11.740 On host bs-nix1310 2 GPUs auto-selected for this run.
00:09:11.740 Mapping of GPU IDs to the 2 GPU tasks in the 2 ranks on this node:
00:09:11.740 PP:0,PP:1
00:09:11.740
00:09:11.740 Non-default thread affinity set probably by the OpenMP library,
00:09:11.740 disabling internal thread affinity
00:09:11.740 starting mdrun 'spc-and-methanol'
00:09:11.740 100 steps, 0.1 ps (continuing from step 80, 0.1 ps).
00:09:11.740
00:09:11.740
00:09:11.740 Received the remote INT/TERM signal, stopping within 100 steps
00:09:11.740
00:09:11.740
00:09:11.740
00:09:11.740 Received the remote INT/TERM signal, stopping within 100 steps
00:09:11.740
00:09:11.740
00:09:11.740 NOTE: 47 % of the run time was spent communicating energies,
00:09:11.740 you might want to use the -gcom option of mdrun
00:09:11.740
00:09:11.740 Core t (s) Wall t (s) (%)
00:09:11.740 Time: 0.144 0.018 796.5
00:09:11.740 (ns/day) (hour/ns)
00:09:11.740 Performance: 9.589 2.503
00:09:11.740 Checkpoint file is from part 7, new output files will be suffixed '.part0008'.
00:09:11.740 The current CPU can measure timings more accurately than the code in
00:09:11.740 gmxapi-mpi-test was configured to use. This might affect your simulation
00:09:11.740 speed as accurate timings are needed for load-balancing.
00:09:11.740 Please consider rebuilding gmxapi-mpi-test with the GMX_USE_RDTSCP=ON CMake option.
00:09:11.740 Reading file /mnt/workspace/Matrix_PreSubmit_master/87cf13c6/gromacs/src/api/cpp/tests/topol.tpr, VERSION 2019-dev-20181010-8a92135 (single precision)
00:09:11.740
00:09:11.740 Overriding nsteps with value passed on the command line: 20 steps, 0.02 ps
00:09:11.740 Changing nstlist from 10 to 100, rlist from 1 to 1
00:09:11.740
00:09:11.740
00:09:11.740 Using 2 MPI processes
00:09:11.740 Using 4 OpenMP threads per MPI process
00:09:11.740
00:09:11.740 On host bs-nix1310 2 GPUs auto-selected for this run.
00:09:11.740 Mapping of GPU IDs to the 2 GPU tasks in the 2 ranks on this node:
00:09:11.740 PP:0,PP:1
00:09:11.740
00:09:11.740 Non-default thread affinity set probably by the OpenMP library,
00:09:11.740 disabling internal thread affinity
00:09:11.740 starting mdrun 'spc-and-methanol'
00:09:11.740 101 steps, 0.1 ps (continuing from step 81, 0.1 ps).
00:09:11.740
00:09:11.740 -------------------------------------------------------
00:09:11.740 Program: gmxapi-mpi-test, version 2019-dev-20181010-8a92135
00:09:11.740 Source file: src/gromacs/gpu_utils/gpuregiontimer.h (line 94)
00:09:11.740 Function: GpuRegionTimerWrapper::openTimingRegion(CommandStream):: [with GpuRegionTimerImpl = GpuRegionTimerImpl]
00:09:11.740 MPI rank: 1 (out of 2)
00:09:11.740
00:09:11.740 Assertion failed:
00:09:11.740 Condition: debugState_ == TimerState::Idle
00:09:11.740 GPU timer should be idle, but is stopped.
00:09:11.740
00:09:11.740 For more information and tips for troubleshooting, please check the GROMACS
00:09:11.740 website at http://www.gromacs.org/Documentation/Errors
00:09:11.740 -------------------------------------------------------
00:09:11.740 --------------------------------------------------------------------------
00:09:11.740 MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
00:09:11.740 with errorcode 1.
00:09:11.740
00:09:11.740 NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
00:09:11.740 You may or may not see output from other processes, depending on
00:09:11.740 exactly when Open MPI kills them.
00:09:11.740 --------------------------------------------------------------------------
00:09:11.740 --------------------------------------------------------------------------
00:09:11.740 mpiexec has exited due to process rank 1 with PID 19321 on
00:09:11.740 node bs-nix1310 exiting improperly. There are two reasons this could occur:
00:09:11.740
00:09:11.740 1. this process did not call "init" before exiting, but others in
00:09:11.740 the job did. This can cause a job to hang indefinitely while it waits
00:09:11.740 for all processes to call "init". By rule, if one process calls "init",
00:09:11.740 then ALL processes must call "init" prior to termination.
00:09:11.740
00:09:11.740 2. this process called "init", but exited without calling "finalize".
00:09:11.740 By rule, all processes that call "init" MUST call "finalize" prior to
00:09:11.740 exiting or it will be considered an "abnormal termination"
00:09:11.740
00:09:11.740 This may have caused other processes in the application to be
00:09:11.740 terminated by signals sent by mpiexec (as reported here).
00:09:11.740 --------------------------------------------------------------------------
I have reproduced this with a manual build on that host; sometimes BasicMD fails as well.
This looks like a GPU infrastructure issue with running multiple simulation parts within the same process. It is exposed by the API use case, but probably not caused by it. I am not sure why our existing continuation test cases do not trigger it; perhaps they are not run with multiple ranks per simulation.