Pitfalls I ran into
Error 1:
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Error 2:
RuntimeError: sizes of tensors must match except in dimension 1. expected size 1 but got size 2 for tensor number 1 in the list.
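Cause:
The number of samples and the number of GPUs must be strict multiples of each other: data_number % GPU_number == 0. Otherwise errors like the ones above appear. The reason is that DP (DataParallel) splits the input along its first dimension (that is, batch_size) and hands the pieces to the different GPUs, so a batch that does not divide evenly leaves the replicas with different numbers of samples.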
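To make the splitting concrete, here is a minimal sketch in plain Python (it does not call PyTorch). The helper names chunk_sizes and uneven_batches are hypothetical, made up for illustration. The sketch assumes the batch is cut into ceil-sized chunks along the first dimension, the way torch.chunk does, and it only reports which batches would be split unevenly.

```python
import math

def chunk_sizes(batch_size, num_gpus):
    """Per-GPU sample counts when one batch is split along dim 0.

    Assumption: ceil-division chunking (as torch.chunk does), so the
    last chunk can be smaller and there can be fewer chunks than GPUs.
    """
    per_gpu = math.ceil(batch_size / num_gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(per_gpu, remaining)
        sizes.append(take)
        remaining -= take
    return sizes

def uneven_batches(data_number, batch_size, num_gpus):
    """Return (batch_index, sizes) for each batch that splits unevenly."""
    problems = []
    num_batches = math.ceil(data_number / batch_size)
    for i in range(num_batches):
        # The last batch may be shorter than batch_size.
        current = min(batch_size, data_number - i * batch_size)
        sizes = chunk_sizes(current, num_gpus)
        # Uneven: some GPU gets nothing, or the chunk sizes differ.
        if len(sizes) != num_gpus or len(set(sizes)) != 1:
            problems.append((i, sizes))
    return problems

# A batch of 3 on 2 GPUs: one replica gets 2 samples, the other gets 1.
print(chunk_sizes(3, 2))
# 11 samples, batch_size 4, 2 GPUs: only the last batch (3 samples) is uneven.
print(uneven_batches(11, 4, 2))
```

If only the final, shorter batch is the problem, one possible workaround is to pass drop_last=True to the DataLoader so that the incomplete batch is skipped. Whether that fits depends on your setup.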