Install mpirun on Ubuntu:
sudo apt install mpich
Install mpirun on CentOS (build MPICH from source):
Download mpich-4.1.2.tar.gz from the MPICH Downloads page (https://www.mpich.org/downloads/)
tar xf mpich-4.1.2.tar.gz
cd mpich-4.1.2
./configure --disable-fortran
make && make install    (make install writes under /usr/local by default, so run it as root)
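Because the multi-host command below relies on options of one specific MPI implementation, it helps to confirm which mpirun is actually first on PATH after installing. On Ubuntu, mpirun is typically managed by update-alternatives, so it can still point at Open MPI if openmpi-bin is also installed. The sketch below is a hypothetical helper (mpi_flavor is not part of any MPI package). It assumes that Open MPI's `mpirun --version` banner contains "Open MPI" and that MPICH's Hydra launcher prints "HYDRA build details"; any other banner is reported as unknown.

```sh
#!/bin/sh
# Sketch: classify the output of `mpirun --version`.
# Assumption: Open MPI prints "Open MPI", MPICH (Hydra) prints "HYDRA build details".

mpi_flavor() {
    # Reads version text on stdin; prints "openmpi", "mpich" or "unknown".
    banner=$(cat)
    case "$banner" in
        *"Open MPI"*) echo openmpi ;;
        *HYDRA*|*MPICH*) echo mpich ;;
        *) echo unknown ;;
    esac
}

# Only inspect the real launcher if one is installed on this machine.
if command -v mpirun >/dev/null 2>&1; then
    printf 'mpirun on PATH: %s -> ' "$(command -v mpirun)"
    mpirun --version 2>&1 | mpi_flavor
fi
```

Run it on every host; all hosts should report the same flavor, since mixing MPI implementations across nodes does not work.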
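Run nccl-tests across multiple hosts with mpirun:
Passwordless SSH login between all hosts must be set up for the current user (the one launching mpirun). If it is not set up correctly, ssh either hangs or stops with a prompt.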
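A quick way to catch the hang before starting mpirun is to try a non-interactive login to every host. The sketch below is a hypothetical helper (check_ssh and the SSH_CMD override are illustrative names, not part of OpenSSH or MPI). It relies on the ssh option BatchMode=yes, which disables password prompts and unknown-host-key confirmations so that ssh fails immediately instead of waiting. The one-time key setup is shown in comments because it is interactive.

```sh
#!/bin/sh
# One-time setup, run manually as the user that will launch mpirun:
#   ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
#   ssh-copy-id h2            # repeat for every other host
#
# SSH_CMD is a hypothetical override (e.g. SSH_CMD="ssh -p 2222");
# it is left unquoted on purpose so extra words become separate arguments.
SSH_CMD=${SSH_CMD:-ssh}

check_ssh() {
    # Try "true" on each host; BatchMode=yes turns any prompt into a failure,
    # and ConnectTimeout bounds the wait for unreachable hosts.
    failed=0
    for host in "$@"; do
        if $SSH_CMD -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
                </dev/null >/dev/null 2>&1; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done
    return $failed
}
```

For the two-host run below, `check_ssh h1 h2` should print "ok" for both hosts before mpirun is started.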
mpirun --allow-run-as-root --bind-to none -v -H h1:4,h2:4 \
    -x NCCL_IB_DISABLE=0 -x NCCL_SOCKET_IFNAME=eth0 \
    -x NCCL_IB_HCA=mlx5_0,mlx5_1,mlx5_4,mlx5_5 \
    -x CUDA_VISIBLE_DEVICES=0,1,2,3 -x NCCL_DEBUG=INFO \
    /home/notebook/code/group/caofulin/nccl-tests/build/all_reduce_perf -b 8 -e 4G -f 2 -g 1
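h1 and h2 are the IP addresses the two hosts use to communicate, usually on the management network. h1:4,h2:4 starts 4 processes per host, and with -g 1 each process drives one GPU, matching the four devices in CUDA_VISIBLE_DEVICES. Note that --allow-run-as-root and -x NAME=VALUE are Open MPI mpirun options; MPICH's mpirun (Hydra) passes environment variables with -genv NAME VALUE instead, so run this command with an Open MPI mpirun or translate the options for MPICH.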
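If the MPICH built above is the mpirun in use, the Open MPI options need translating. The sketch below only prints an equivalent command for inspection rather than running it. Assumptions: Hydra accepts `-f hostfile` with one `host:nprocs` per line, `-n` for the total rank count and `-genv NAME VALUE` for environment variables; Hydra is assumed not to bind processes by default, so --bind-to none is dropped, and --allow-run-as-root is dropped because it is an Open MPI option. It also assumes all_reduce_perf was built against this MPICH (nccl-tests' `make MPI=1 MPI_HOME=...`). write_hostfile and hydra_cmd are hypothetical helper names; check `mpirun -help` of your build before relying on the flags.

```sh
#!/bin/sh
# Sketch: translate the Open MPI style command into MPICH (Hydra) syntax.

write_hostfile() {
    # $1 = Open MPI style host list such as "h1:4,h2:4", $2 = output file.
    # Writes one "host:nprocs" line per host and prints the total rank count
    # (a host without ":n" counts as one rank).
    printf '%s\n' "$1" | tr ',' '\n' > "$2"
    awk -F: '{ n += ($2 == "" ? 1 : $2) } END { print n }' "$2"
}

hydra_cmd() {
    # $1 = hostfile, $2 = total ranks, $3 = benchmark binary;
    # remaining args are NAME=VALUE pairs that Open MPI would pass with -x.
    hostfile=$1; np=$2; bin=$3; shift 3
    printf 'mpirun -f %s -n %s' "$hostfile" "$np"
    for kv in "$@"; do
        # Split at the first "=" so values like 0,1,2,3 stay intact.
        printf ' -genv %s %s' "${kv%%=*}" "${kv#*=}"
    done
    # Benchmark arguments copied from the Open MPI command above.
    printf ' %s -b 8 -e 4G -f 2 -g 1\n' "$bin"
}

# Example (prints the command instead of running it):
#   np=$(write_hostfile "h1:4,h2:4" hosts.txt)
#   hydra_cmd hosts.txt "$np" ./all_reduce_perf NCCL_IB_DISABLE=0 NCCL_DEBUG=INFO
```

Printing instead of executing keeps the translation easy to compare against the Open MPI command before launching a multi-host job.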