Running multi-GPU training with torch produces the message
"Reducer buckets have been rebuilt in this iteration." The cause is the torch version: distributed.py was changed in torch 1.7 and later, which leads to this error. The message is emitted from the forward function in distributed.py:
def forward(self, *inputs, **kwargs):
    if self.ddp_join_enabled:
        ones = torch.ones(1, device=self.device)
        work = dist.all_reduce(ones, group=self.process_group, async_op=True)
        self.reducer._set_forward_pass_work_handle(
            work, self.ddp_join_divide_by_initial_world_size
        )

    # Calling _rebuild_buckets before forward computation,
    # It may allocate new buckets before deallocating old buckets
    # inside _rebuild_buckets. To save peak memory usage,
    # call _rebuild_buckets before the peak memory usage increases
    # during forward computation.
    # This should be called only once during whole training period.
    if self.reducer._rebuild_buckets():
        logging.info("Reducer buckets have been rebuilt in this iteration.")

    if self.require_forward_param_sync:
        self._sync_params()

    if self.ddp_join_enabled:
        # Notify joined ranks whether they should sync in backwards pass or not.
        self._check_global_requires_backward_grad_sync(is_joined_rank=False)

    # !!!
    if self.device_ids:
        inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
        if len(self.device_ids) == 1:
            output = self.module(*inputs[0], **kwargs[0])
        else:
            # the single-process, multi-thread, multi-GPU case
            outputs = self.parallel_apply(self._module_copies[:len(inputs)], inputs, kwargs)
            output = self.gather(outputs, self.output_device)
    else:
        output = self.module(*inputs, **kwargs)

    if torch.is_grad_enabled() and self.require_backward_grad_sync:
        self.require_forward_param_sync = True
        # We'll return the output object verbatim since it is a freeform
        # object. We need to find any tensors in this object, though,
        # because we need to figure out which parameters were used during
        # this forward pass, to ensure we short circuit reduction for any
        # unused parameters. Only if `find_unused_parameters` is set.
        if self.find_unused_parameters:
            # When the DDP argument find_unused_parameters is True, DDP starts
            # a traversal from the outputs at the end of forward, marks every
            # parameter that was not used, and sets those as ready in advance,
            # so backward can run on a subgraph -- at the cost of some time.
            self.reducer.prepare_for_backward(list(_find_tensors(output)))
        else:
            self.reducer.prepare_for_backward([])
    else:
        self.require_forward_param_sync = False
    return output
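The `_find_tensors(output)` call above walks the freeform output object and collects every tensor nested inside it, so the reducer knows which parameters the forward pass actually used. A simplified pure-Python sketch of that traversal (the function name `find_matching` is mine, and a plain predicate stands in for `torch.is_tensor`):

```python
def find_matching(obj, is_match):
    """Recursively collect objects satisfying is_match from a freeform
    output object (items may be nested in lists, tuples, and dicts),
    mirroring the spirit of torch's internal _find_tensors helper."""
    if is_match(obj):
        return [obj]
    if isinstance(obj, (list, tuple)):
        found = []
        for item in obj:
            found.extend(find_matching(item, is_match))
        return found
    if isinstance(obj, dict):
        found = []
        for value in obj.values():
            found.extend(find_matching(value, is_match))
        return found
    return []

# Example: collect the floats from a nested "output" structure.
output = {"loss": 0.5, "stats": [1.0, {"acc": 0.9}], "name": "step1"}
floats = find_matching(output, lambda o: isinstance(o, float))
# floats == [0.5, 1.0, 0.9]
```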
Solutions:
1. Downgrade torch and build a torch 1.6 environment (torch 1.6, CUDA 10, torchvision 0.7.0):
# CUDA 10.2
pip install torch==1.6.0 torchvision==0.7.0
# CUDA 10.1
pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
# CUDA 9.2
pip install torch==1.6.0+cu92 torchvision==0.7.0+cu92 -f https://download.pytorch.org/whl/torch_stable.html
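After reinstalling, it is worth confirming that the downgrade actually took effect. A small standard-library sketch (the helper name `installed_version` is mine, not part of torch or pip):

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of a package, or None if missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# e.g. installed_version("torch") should report "1.6.0" after the downgrade,
# and installed_version("torchvision") should report "0.7.0".
```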
2. Modify the code (this is what solved my problem).
My original code:
predicts, loss, loss_statics = model(data)
was restructured to:
loss, loss_statics = model(data)
For DistributedDataParallel in distributed.py, the forward should only return values that feed into the loss; predicts must not be included in the return. This was not a problem in torch 1.6.
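The restructuring above can be illustrated with a toy sketch in plain Python (no real torch here; `ToyModel` and `LossOnlyWrapper` are hypothetical names): the forward that the DDP wrapper would see returns only loss-related values, while predictions stay reachable through a separate call for evaluation.

```python
class ToyModel:
    """Stand-in for an nn.Module whose forward computes predictions and loss."""
    def forward(self, data):
        predicts = [x * 2 for x in data]                        # fake predictions
        loss = sum(abs(p - x) for p, x in zip(predicts, data))  # fake scalar loss
        return predicts, loss

class LossOnlyWrapper:
    """Expose a forward that returns only loss terms (what DDP would wrap);
    predictions are obtained via a separate predict() path."""
    def __init__(self, model):
        self.module = model

    def forward(self, data):
        # Training path: drop predicts so only loss values are returned.
        _, loss = self.module.forward(data)
        loss_statics = {"total": loss}
        return loss, loss_statics

    def predict(self, data):
        # Inference path: bypass the loss-only forward to get predictions.
        predicts, _ = self.module.forward(data)
        return predicts

model = LossOnlyWrapper(ToyModel())
loss, loss_statics = model.forward([1, 2, 3])
predicts = model.predict([1, 2, 3])
```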