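Contents
insightface training
The main issue is multi-GPU training: Windows does not support NCCL, so the process group is initialized with gloo instead: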
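from torch import distributed

try:
    # Single-process run: values hard-coded instead of read from the environment.
    world_size = 1  # int(os.environ["WORLD_SIZE"])
    rank = 0  # int(os.environ["RANK"])
    # distributed.init_process_group("nccl")
    distributed.init_process_group("gloo")
except KeyError:
    world_size = 1
    rank = 0
    distributed.init_process_group(
        backend="gloo",  # changed from "nccl", which is unavailable on Windows
        init_method="tcp://127.0.0.1:12584",
        rank=rank,
        world_size=world_size,
    )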
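
The backend choice above can also be made at runtime rather than by editing the script for each platform. The following is only a minimal sketch, not part of insightface: pick_backend and init_single_process are hypothetical helper names, and the port 12584 is simply reused from the snippet above. It assumes a single process (rank 0, world size 1).

```python
import sys


def pick_backend(platform: str = sys.platform, nccl_available: bool = False) -> str:
    """Return "nccl" only where it can work; otherwise fall back to "gloo"."""
    # PyTorch does not ship NCCL on Windows, so gloo is the only option there.
    if platform.startswith("win") or not nccl_available:
        return "gloo"
    return "nccl"


def init_single_process(port: int = 12584) -> str:
    """Initialize a one-process group (rank 0, world size 1); return the backend used."""
    # Imported lazily so pick_backend can be checked without PyTorch installed.
    import torch
    import torch.distributed as dist

    backend = pick_backend(
        nccl_available=dist.is_nccl_available() and torch.cuda.is_available()
    )
    dist.init_process_group(
        backend=backend,
        init_method=f"tcp://127.0.0.1:{port}",
        rank=0,
        world_size=1,
    )
    return backend
```

Calling init_single_process() once at the start of training would then select gloo on Windows and NCCL on a Linux machine with CUDA, without touching the rest of the script.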
Out-of-memory problems
1. Out of memory in torch distributed.init,