Error message:
raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
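Cause:
Windows does not support NCCL, so PyTorch builds for Windows ship without it; use the gloo backend instead.
Fix:
Add at the top of the script: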
import os
os.environ["PL_TORCH_DISTRIBUTED_BACKEND"] = "gloo"
Change the call in the code:
dist.init_process_group(backend=backend, init_method="env://")
# Replace the original backend value with 'gloo'
dist.init_process_group(backend='gloo', init_method="env://")
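
If the same script also has to run on Linux machines with NCCL, hard-coding 'gloo' gives up the faster backend there. A minimal sketch of picking the backend at runtime follows. The helper name choose_backend and its two parameters are hypothetical and only for illustration; dist.is_available() and dist.is_nccl_available() are existing torch.distributed functions.

```python
import platform


def choose_backend(system=None, nccl_available=None):
    """Return "nccl" when it is usable, otherwise fall back to "gloo".

    Both arguments are hypothetical overrides, mainly useful for testing;
    when left as None they are detected from the running environment.
    """
    if system is None:
        system = platform.system()  # "Windows", "Linux", "Darwin", ...
    if nccl_available is None:
        # Imported lazily so the helper can be exercised without PyTorch.
        import torch.distributed as dist
        nccl_available = dist.is_available() and dist.is_nccl_available()
    # Windows builds have no NCCL, so always use gloo there.
    if system == "Windows" or not nccl_available:
        return "gloo"
    return "nccl"


# Usage, assuming `dist` is torch.distributed as in the snippet above:
# dist.init_process_group(backend=choose_backend(), init_method="env://")
```

With this, the init_process_group call stays the same on every platform, and only the helper decides between 'nccl' and 'gloo'.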