[fairseq] Error: TypeError: _broadcast_coalesced(): incompatible function arguments

Background

I overrode the model's state_dict method as shown below, adding two extra entries to it: dynamic_mask (a dict whose values are tensors) and allocated_neuron_num (a plain int).

def state_dict(self, destination=None, prefix='', keep_vars=False):
    # Start from the normal module state dict...
    state_dict = super().state_dict(destination, prefix, keep_vars)
    # ...then attach extra training state kept in the global gloVar module:
    # dynamic_mask is a dict of tensors, allocated_neuron_num is an int.
    state_dict['model.dynamic_mask'] = gloVar.dynamic_mask
    state_dict['model.allocated_neuron_num'] = gloVar.allocated_neuron_num
    return state_dict

This produced the following error:

  File "/data3/syxu/sparsenmt_exp/sparsenmt/fairseq/fairseq/models/distributed_fairseq_model.py", line 58, in DistributedFairseqModel
    wrapped_model = DistributedDataParallel(
  File "/data3/syxu/anaconda3/envs/torch18/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 580, in __init__
    self._sync_params_and_buffers(authoritative_rank=0)
  File "/data3/syxu/anaconda3/envs/torch18/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 597, in _sync_params_and_buffers
    self._distributed_broadcast_coalesced(
  File "/data3/syxu/anaconda3/envs/torch18/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1334, in _distributed_broadcast_coalesced
    dist._broadcast_coalesced(
TypeError: _broadcast_coalesced(): incompatible function arguments. The following argument types are supported:
    1. (process_group: torch._C._distributed_c10d.ProcessGroup, tensors: List[at::Tensor], buffer_size: int, src: int = 0) -> None
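
Why this happens: judging from the traceback, when DistributedDataParallel is constructed it synchronizes module state across ranks by collecting the values of module.state_dict() and handing them to dist._broadcast_coalesced, whose binding accepts only a List[at::Tensor]. The extra dict and int entries added above end up in that list and fail the type check. A rough sketch of the relevant logic (paraphrasing PyTorch 1.8's distributed.py, not the exact source):

# Rough sketch of DDP's _sync_params_and_buffers in PyTorch 1.8 (paraphrased,
# not the real implementation). Every value of module.state_dict() is assumed
# to be a tensor.
import torch.distributed as dist

def sync_params_and_buffers(module, process_group, bucket_size, authoritative_rank=0):
    module_states = list(module.state_dict().values())  # the custom dict and int land here too
    if module_states:
        # Bound as (ProcessGroup, List[at::Tensor], int, int) -> None;
        # any non-tensor entry raises "incompatible function arguments".
        dist._broadcast_coalesced(process_group, module_states, bucket_size, authoritative_rank)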

Solution

Avoid the DistributedDataParallel wrapper shown in the traceback. According to the documentation, in fairseq this is controlled by the --ddp-backend parameter.
The error occurred with --ddp-backend=pytorch_ddp (the default); switching to legacy_ddp or no_c10d makes the error go away.
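
For example, a minimal sketch of the training invocation (the dataset path and --arch value below are placeholders; keep the rest of your usual training flags and only change --ddp-backend):

fairseq-train data-bin/your_dataset \
    --arch transformer \
    --ddp-backend legacy_ddp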

References

https://fairseq.readthedocs.io/en/latest/command_line_tools.html
https://blog.csdn.net/j___t/article/details/104368597
