The following error came up during distributed training:
Error detected in CudnnBatchNormBackward0. No forward pass information available. Enable detect anomaly during forward pass for more information
Passing broadcast_buffers=False to DistributedDataParallel fixed it:
torch.nn.parallel.DistributedDataParallel(net, device_ids=[args.local_rank], find_unused_parameters=True, broadcast_buffers=False)
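Some background on why this flag helps, offered as a likely explanation rather than a confirmed diagnosis. With the default broadcast_buffers=True, DDP broadcasts the module's buffers (for BatchNorm, running_mean and running_var) from rank 0 at the beginning of every forward call, writing them in place. A commonly reported trigger, assumed here since the note does not say what the training loop looked like, is running forward more than once before calling backward: the second broadcast overwrites buffers that the first pass saved for CudnnBatchNormBackward0. With broadcast_buffers=False that per-forward broadcast is skipped, so each rank keeps its own BatchNorm running statistics, which can drift apart across ranks (torch.nn.SyncBatchNorm is the usual option if they must stay in sync). The sketch below shows the fixed wrapper in a minimal training step. The network, the run_demo/train_step helpers, the default port, and reading LOCAL_RANK from the environment (as torchrun sets it) instead of args.local_rank are all illustrative assumptions; it falls back to CPU with the gloo backend so it can run as a single process without a GPU.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def build_ddp_model(local_rank: int) -> DDP:
    # Hypothetical toy network. BatchNorm1d owns running_mean/running_var
    # buffers, which DDP would broadcast on every forward if broadcast_buffers=True.
    net = nn.Sequential(nn.Linear(8, 16), nn.BatchNorm1d(16), nn.ReLU(), nn.Linear(16, 1))
    if torch.cuda.is_available():
        torch.cuda.set_device(local_rank)
        net = net.cuda(local_rank)
        device_ids = [local_rank]
    else:
        device_ids = None  # CPU modules: leave device_ids unset
    # Same flags as in the note: skip the per-forward buffer broadcast.
    return DDP(net, device_ids=device_ids,
               find_unused_parameters=True, broadcast_buffers=False)


def train_step(model: DDP, optimizer: torch.optim.Optimizer) -> float:
    # One ordinary step on random data (batch of 4, so BatchNorm has >1 sample).
    device = next(model.parameters()).device
    x = torch.randn(4, 8, device=device)
    y = torch.randn(4, 1, device=device)
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()  # DDP all-reduces gradients across ranks during backward
    optimizer.step()
    return loss.item()


def run_demo() -> tuple[bool, float]:
    # torchrun sets RANK/WORLD_SIZE/LOCAL_RANK; the defaults (assumed here)
    # allow a single-process local check.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29533")
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend, rank=rank, world_size=world_size)
    try:
        model = build_ddp_model(local_rank)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        loss = train_step(model, optimizer)
        return model.broadcast_buffers, loss
    finally:
        dist.destroy_process_group()
```

To use it across GPUs, call run_demo() at the bottom of the script and launch it with something like torchrun --nproc_per_node=2 train.py; each process then picks up its rank and local rank from the environment.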