When training on multiple GPUs with DDP, the run hangs at the following point:
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 8 processes
----------------------------------------------------------------------------------------------------
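Solution: set the following environment variable before launching training. It disables NCCL peer-to-peer (P2P) transfers between GPUs: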
export NCCL_P2P_DISABLE=1
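
If exporting the variable in the shell is inconvenient (for example, when the job is started from a notebook or a wrapper script), the same setting can probably be applied from Python, as long as it happens before the distributed process group is created. The sketch below assumes the training entry point is a Python script; the helper name disable_nccl_p2p is hypothetical and not part of any library.

```python
import os


def disable_nccl_p2p(env=None):
    """Set NCCL_P2P_DISABLE=1 in an environment mapping.

    `env` defaults to os.environ. This helper is a hypothetical
    convenience wrapper, not a PyTorch or NCCL API.
    """
    target = os.environ if env is None else env
    target["NCCL_P2P_DISABLE"] = "1"
    return target


# Call this at the very top of the training script, before the
# trainer or process group is created, so that NCCL sees the
# variable when it initializes.
disable_nccl_p2p()
```

With P2P disabled, NCCL falls back to other transports, so inter-GPU bandwidth may drop. It is best treated as a workaround for the hang rather than a default setting.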