YOLOv7: "RuntimeError: Tensors must be CUDA and dense" when training with DDP and --image-weights
Background:
I was training the model in DDP mode with the --image-weights option enabled, and the run failed with the following error:
Traceback (most recent call last):
  File "train.py", line 614, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 320, in train
    dist.broadcast(indices, 0)
  File "/home/xxx/conda/envs/landmark/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1193, in broadcast
    work = default_pg.broadcast([tensor], opts)
RuntimeError: Tensors must be CUDA and dense
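Cause:
--image-weights is not compatible with DDP. The traceback shows the failure at dist.broadcast(indices, 0) in train.py, and the error message indicates that the tensor being broadcast is not a CUDA tensor, which the process group used here requires.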
Solution:
Option 1: Do not pass --image-weights.
Option 2: If you need --image-weights, train on a single GPU instead of using DDP.
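Either option can be enforced before training starts, so the run stops with a clear message instead of crashing at the broadcast. The following is only a minimal sketch and is not code from the YOLOv7 repository: check_image_weights and world_size_from_env are hypothetical helper names, and it assumes the launcher exports the WORLD_SIZE environment variable, as the torch.distributed launchers do.

```python
import os


def world_size_from_env() -> int:
    # Assumption: the DDP launcher exports WORLD_SIZE; treat a missing value as single-process.
    return int(os.environ.get("WORLD_SIZE", "1"))


def check_image_weights(image_weights: bool, world_size: int) -> None:
    """Hypothetical guard: reject --image-weights when more than one process is training."""
    if image_weights and world_size > 1:
        raise ValueError(
            "--image-weights is not compatible with DDP training; "
            "drop the flag or train on a single GPU"
        )
```

It would be called once after argument parsing, for example check_image_weights(opt.image_weights, world_size_from_env()), where opt is the parsed argument namespace in train.py.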
Reference:
The YOLOv5 author has stated that the two are incompatible and added a message about it in newer versions of the code.
https://github.com/ultralytics/yolov5/issues?q=tensors+must+be+cuda+and+dense+