Pytorch分布式训练错误

参考   PyTorch分布式训练简介 - 云+社区 - 腾讯云

参考  Pytorch分布式训练错误 - 云+社区 - 腾讯云

subprocess.CalledProcessError: Command ‘[’/home/labpos/anaconda3/envs/idr/bin/python’, ‘-u’, ‘main_distribute.py’, ‘–local_rank=1’]’ returned non-zero exit status 1.

pytorch DistributedDataParallel训练时遇到的问题

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel; (2) making sure all forward function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn’t able to locate the output tensors in the return value of your module’s forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable). (prepare_for_backward at /opt/conda/conda-

在DistributedDataParallel 中加入find_unused_parameters=True

model = torch.nn.parallel.DistributedDataParallel(model,device_ids=[args.local_rank],output_device=args.local_rank, find_unused_parameters=True)

评论 12 您还未登录,请先 登录 后发表或查看评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
©️2022 CSDN 皮肤主题:数字20 设计师:CSDN官方博客 返回首页

打赏作者

Wanderer001

ROIAlign原理

¥2 ¥4 ¥6 ¥10 ¥20
输入1-500的整数
余额支付 (余额:-- )
扫码支付
扫码支付:¥2
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、C币套餐、付费专栏及课程。

余额充值