A problem encountered when setting up single-machine multi-GPU training with nn.DataParallel
Consider the following code:
model = model.cuda()
model = nn.DataParallel(model, device_ids=[1, 2, 3])
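Here the selected GPU ids are 1, 2, and 3. However, model.cuda() called with no argument places the model on the current CUDA device, which is GPU 0 by default, while nn.DataParallel requires the model's parameters and buffers to be on device_ids[0] (here cuda:1). The code above therefore raises the following error: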
RuntimeError: module must have its parameters and buffers on device cuda:1 (device_ids[0]) but found one of them on device: cuda:0
Solution:
use_cuda = torch.cuda.is_available()  # fall back to CPU when no GPU is present
device = torch.device("cuda:1" if use_cuda else "cpu")
model = model.to(device)
model = nn.DataParallel(model, device_ids=[1, 2, 3])
Note that "cuda:1" must match the first entry of device_ids=[1, 2, 3], that is, device_ids[0].
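To avoid keeping the device string and device_ids in sync by hand, the fix can be wrapped in a small helper. The following is only a minimal sketch: the names primary_device_name and wrap_data_parallel are hypothetical and not part of PyTorch, and the CPU fallback (returning the unwrapped model) is an assumption about what is wanted on a machine without GPUs.

```python
from typing import Sequence


def primary_device_name(device_ids: Sequence[int], use_cuda: bool) -> str:
    """Return the device the model must live on before wrapping.

    nn.DataParallel expects parameters and buffers on device_ids[0].
    (Hypothetical helper, not a PyTorch API.)
    """
    if use_cuda and device_ids:
        return f"cuda:{device_ids[0]}"
    return "cpu"


def wrap_data_parallel(model, device_ids: Sequence[int]):
    """Move `model` to device_ids[0], then wrap it in nn.DataParallel."""
    # Imported here so primary_device_name stays usable without PyTorch installed.
    import torch
    from torch import nn

    use_cuda = torch.cuda.is_available()
    device = torch.device(primary_device_name(device_ids, use_cuda))
    model = model.to(device)
    if not use_cuda:
        # Assumption: on a CPU-only machine, return the plain model.
        return model
    return nn.DataParallel(model, device_ids=list(device_ids))
```

With this helper, the example above becomes model = wrap_data_parallel(model, [1, 2, 3]), and the model is always moved to cuda:1 before being wrapped.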