YOLOv5 single-machine multi-GPU training error

Traceback (most recent call last):
  File "train.py", line 638, in <module>
    main(opt)
  File "train.py", line 532, in main
    train(opt.hyp, opt, device, callbacks)
  File "train.py", line 113, in train
    data_dict = data_dict or check_dataset(data)  # check if None
  File "/home/stxx/renxs/anaconda3/envs/mmyolo/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/home/stxx/syy/yolov5-3class/yolov5/utils/torch_utils.py", line 94, in torch_distributed_zero_first
    dist.barrier(device_ids=[0])
TypeError: barrier() got an unexpected keyword argument 'device_ids'

Traceback (most recent call last):
  File "train.py", line 638, in <module>
    main(opt)
  File "train.py", line 532, in main
    train(opt.hyp, opt, device, callbacks)
  File "train.py", line 112, in train
    with torch_distributed_zero_first(LOCAL_RANK):
  File "/home/stxx/renxs/anaconda3/envs/mmyolo/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/stxx/syy/yolov5-3class/yolov5/utils/torch_utils.py", line 91, in torch_distributed_zero_first
    dist.barrier(device_ids=[local_rank])
TypeError: barrier() got an unexpected keyword argument 'device_ids'

Traceback (most recent call last):
  File "/home/stxx/renxs/anaconda3/envs/mmyolo/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/stxx/renxs/anaconda3/envs/mmyolo/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/stxx/syy/mmyolo_venv/mmyolo_venv/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/stxx/syy/mmyolo_venv/mmyolo_venv/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
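
The TypeError means that the installed `torch.distributed.barrier` does not accept a `device_ids` keyword, while YOLOv5's `torch_distributed_zero_first` passes one. As a rough sketch of how this could be confirmed without launching a training run, the helper below inspects a callable's signature. The name `accepts_kwarg` is hypothetical, and `barrier_old` / `barrier_new` are stand-ins that only imitate the two signatures; they are not the real PyTorch functions.

```python
import inspect


def accepts_kwarg(fn, name):
    """Return True if callable `fn` can be called with keyword argument `name`."""
    params = inspect.signature(fn).parameters
    if name in params:
        # Positional-only parameters cannot be passed by keyword.
        return params[name].kind is not inspect.Parameter.POSITIONAL_ONLY
    # A **kwargs catch-all would also accept the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Stand-ins imitating the two signatures (assumptions for illustration only).
def barrier_old(group=None, async_op=False):
    pass


def barrier_new(group=None, async_op=False, device_ids=None):
    pass


print(accepts_kwarg(barrier_old, "device_ids"))  # False
print(accepts_kwarg(barrier_new, "device_ids"))  # True
```

In the training environment, the same check would presumably be `accepts_kwarg(torch.distributed.barrier, "device_ids")` after `import torch.distributed`; a result of `False` would match the error above.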

Solution

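Switch to a newer PyTorch version. The `device_ids` argument of `dist.barrier` was added in PyTorch 1.8, so older releases reject it. My original versions were torch 1.7.1+cu110 and torchvision 0.8.2; I changed to torch 1.8.0+cu111 and torchvision 0.9.0+cu111. Since my CUDA is version 11, I installed them as follows: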

pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
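
After installing, it may be worth confirming that the environment really picked up the new build, since the traceback shows paths from two different environments. The following is a minimal sketch that parses a version string such as `torch.__version__` and compares it with 1.8. The helper names `version_tuple` and `has_barrier_device_ids` are hypothetical, and the 1.8 threshold is an assumption based on the versions that did and did not work here.

```python
import re


def version_tuple(version):
    """Parse '1.8.0+cu111' into (1, 8, 0), ignoring the local '+cu111' suffix."""
    public = version.split("+", 1)[0]
    parts = re.findall(r"\d+", public)
    return tuple(int(p) for p in parts[:3])


def has_barrier_device_ids(version):
    # Assumption: dist.barrier accepts device_ids from the 1.8 release onward.
    return version_tuple(version) >= (1, 8)


print(has_barrier_device_ids("1.7.1+cu110"))  # False
print(has_barrier_device_ids("1.8.0+cu111"))  # True
```

In practice the argument would be `torch.__version__`, read from the same interpreter that runs `train.py`.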

With the new versions installed, multi-GPU training runs:

python -m torch.distributed.launch --nproc_per_node=2 --master_port 8089 train.py --device 2,3