AssertionError: Default process group is not initialized
# BatchNorm2d_class = BatchNorm2d = torch.nn.SyncBatchNorm
BatchNorm2d_class = BatchNorm2d = torch.nn.BatchNorm2d
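Change line 19 of bn_help.py as shown above: comment out the SyncBatchNorm assignment and use torch.nn.BatchNorm2d instead. (This gives up distributed training.)
That produced a new error: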
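As an alternative to hard-coding the swap, the choice can be made at import time. This is only a sketch: the helper name `pick_batchnorm_class` is hypothetical and not part of bn_help.py, and it assumes the process group (if any) is initialized before this module is imported.

```python
import torch
import torch.distributed as dist


def pick_batchnorm_class():
    """Return SyncBatchNorm only when a default process group exists.

    Hypothetical helper, not part of the original bn_help.py.
    """
    if dist.is_available() and dist.is_initialized():
        return torch.nn.SyncBatchNorm
    # No process group: fall back to plain BatchNorm2d (single-process training).
    return torch.nn.BatchNorm2d


BatchNorm2d_class = BatchNorm2d = pick_batchnorm_class()
```

With this in place, a run started without `torch.distributed.init_process_group` would fall back to `torch.nn.BatchNorm2d` instead of hitting the assertion.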
RuntimeError: CUDA out of memory. Tried to allocate 48.00 MiB (GPU 1; 11.91 GiB total capacity; 1.44 GiB already allocated; 33.94 MiB free; 1.54 GiB reserved in total by PyTorch)
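Attempted fix: reduce the batch size to 1 and use cuda:0. The error still occurred.
Next, add a statement that releases the cached memory just before the line that raises the error: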
if hasattr(torch.cuda, 'empty_cache'):
    torch.cuda.empty_cache()
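Switching to a different GPU for training finally worked! (The error message reports only 1.54 GiB reserved by PyTorch on an 11.91 GiB card with 33.94 MiB free, which suggests that GPU 1 was mostly occupied by another process.)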
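To avoid guessing which GPU is free, the free memory on each card can be checked before training. A minimal sketch, assuming a recent PyTorch release where `torch.cuda.mem_get_info` is available; the helper name `pick_freest_device` is made up for illustration.

```python
import torch


def pick_freest_device():
    """Return the CUDA device with the most free memory, or CPU if CUDA is unavailable.

    Hypothetical helper for illustration only.
    """
    if not torch.cuda.is_available():
        return torch.device("cpu")
    best_index, best_free = 0, -1
    for index in range(torch.cuda.device_count()):
        # mem_get_info returns (free_bytes, total_bytes) for the given device.
        free_bytes, _total_bytes = torch.cuda.mem_get_info(index)
        if free_bytes > best_free:
            best_index, best_free = index, free_bytes
    return torch.device(f"cuda:{best_index}")


device = pick_freest_device()
print(device)
```

The model and input tensors can then be moved with `.to(device)`. Running `nvidia-smi` in a shell gives the same information and also shows which processes are holding the memory.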