[fairseq] Error: TypeError: _broadcast_coalesced(): incompatible function arguments

Background

I overrode the model's state_dict method as shown below, adding two extra entries to it: dynamic_mask (a dict whose values are tensors) and allocated_neuron_num (a plain int).

def state_dict(self, destination=None, prefix='', keep_vars=False):
    # Start from the normal module state dict...
    state_dict = super().state_dict(destination, prefix, keep_vars)
    # ...then attach extra training state kept in the global gloVar module:
    # dynamic_mask is a dict of tensors, allocated_neuron_num is an int.
    state_dict['model.dynamic_mask'] = gloVar.dynamic_mask
    state_dict['model.allocated_neuron_num'] = gloVar.allocated_neuron_num
    return state_dict

This produced the following error:

  File "/data3/syxu/sparsenmt_exp/sparsenmt/fairseq/fairseq/models/distributed_fairseq_model.py", line 58, in DistributedFairseqModel
    wrapped_model = DistributedDataParallel(
  File "/data3/syxu/anaconda3/envs/torch18/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 580, in __init__
    self._sync_params_and_buffers(authoritative_rank=0)
  File "/data3/syxu/anaconda3/envs/torch18/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 597, in _sync_params_and_buffers
    self._distributed_broadcast_coalesced(
  File "/data3/syxu/anaconda3/envs/torch18/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1334, in _distributed_broadcast_coalesced
    dist._broadcast_coalesced(
TypeError: _broadcast_coalesced(): incompatible function arguments. The following argument types are supported:
    1. (process_group: torch._C._distributed_c10d.ProcessGroup, tensors: List[at::Tensor], buffer_size: int, src: int = 0) -> None
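
Why this happens: judging from the traceback, when DistributedDataParallel is constructed it synchronizes module state across ranks by collecting the values of module.state_dict() and handing them to dist._broadcast_coalesced, whose binding accepts only a List[at::Tensor]. The extra dict and int entries added above end up in that list and fail the type check. A rough sketch of the relevant logic (paraphrasing PyTorch 1.8's distributed.py, not the exact source):

# Rough sketch of DDP's _sync_params_and_buffers in PyTorch 1.8 (paraphrased,
# not the real implementation). Every value of module.state_dict() is assumed
# to be a tensor.
import torch.distributed as dist

def sync_params_and_buffers(module, process_group, bucket_size, authoritative_rank=0):
    module_states = list(module.state_dict().values())  # the custom dict and int land here too
    if module_states:
        # Bound as (ProcessGroup, List[at::Tensor], int, int) -> None;
        # any non-tensor entry raises "incompatible function arguments".
        dist._broadcast_coalesced(process_group, module_states, bucket_size, authoritative_rank)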

Solution

Avoid the DistributedDataParallel wrapper shown in the traceback. According to the documentation, in fairseq this is controlled by the --ddp-backend parameter.
The error occurred with --ddp-backend=pytorch_ddp (the default); switching to legacy_ddp or no_c10d makes the error go away.
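
For example, a minimal sketch of the training invocation (the dataset path and --arch value below are placeholders; keep the rest of your usual training flags and only change --ddp-backend):

fairseq-train data-bin/your_dataset \
    --arch transformer \
    --ddp-backend legacy_ddp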

References

https://fairseq.readthedocs.io/en/latest/command_line_tools.html
https://blog.csdn.net/j___t/article/details/104368597
