[Multi-GPU training error] error indicates that your module has parameters that were not used in producing loss

Error record

Error message

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel, and by
making sure all forward function outputs participate in calculating loss.
If you already have done the above, then the distributed data parallel module wasn’t able to locate the output tensors in the return value of your module’s forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable).
Parameter indices which did not receive grad for rank 0: 160 161 182 183 204 205 230 231 252 253 274 275 330 331 414 415 438 439 462 463 486 487 512 513 536 537 560 561 584 585
In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error
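The last hint can be taken literally: TORCH_DISTRIBUTED_DEBUG is a standard PyTorch environment variable, and with it set to INFO or DETAIL the error also reports the names of the parameters that received no gradient. A minimal sketch of setting it (the torchrun command and the script name train.py are only placeholders):

import os

# Must take effect before torch.distributed is initialized. The most reliable
# place is the launch environment itself, e.g.
#   TORCH_DISTRIBUTED_DEBUG=DETAIL torchrun --nproc_per_node=2 train.py
# Setting it at the very top of the training script, before
# init_process_group() runs, should also work:
os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"  # or "INFO"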

How it was fixed

Original code
class AttentionBlock(nn.Module):
    def __init__(self, channels, resolution):
        super().__init__()
        # (the norm and qkv layers used in forward are defined in the full module; omitted in this excerpt)
        self.encoder_kv = conv_nd(1, 512, channels * 2, 1)  # this line is NOT commented out
        self.encoder_qkv = conv_nd(1, 512, channels * 3, 1)
        self.trans = nn.Linear(resolution * resolution * 9 + 128, resolution * resolution * 9)

    def forward(self, x, encoder_out=None):
        b, c, *spatial = x.shape
        x = x.reshape(b, c, -1)
        qkv = self.qkv(self.norm(x))
        if encoder_out is not None:
            # encoder_out = self.encoder_kv(encoder_out)  # this call is commented out, so self.encoder_kv is never used
            encoder_out = self.encoder_qkv(encoder_out)
        return encoder_out
Cause of the error

self.encoder_kv is defined in __init__ but never used in forward, so its parameters never receive a gradient and torch.nn.parallel.DistributedDataParallel raises the error above. There are two ways to fix it.
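The situation can be reproduced in miniature with any module that registers a layer it never calls in forward. The sketch below is purely illustrative (the toy module, sizes, and single-process group are made up for demonstration, and the exact point at which the RuntimeError surfaces can vary across PyTorch versions): DDP is left at its default find_unused_parameters=False and one submodule never contributes to the loss.

import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# single-process "distributed" group, only to demonstrate the reducer's behaviour
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(8, 8)
        self.unused = nn.Linear(8, 8)  # registered in __init__ but never called in forward

    def forward(self, x):
        return self.used(x)

model = DDP(Toy())  # find_unused_parameters defaults to False

for step in range(2):
    loss = model(torch.randn(4, 8)).sum()
    loss.backward()  # self.unused gets no gradient, so DDP's reduction never finishes,
                     # and the next iteration triggers the RuntimeError quoted above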

Corrected code

Method 1:

class AttentionBlock(nn.Module):
    def __init__(self, channels, resolution):
        super().__init__()
        # self.encoder_kv = conv_nd(1, 512, channels * 2, 1)  # not used in forward, so comment it out
        self.encoder_qkv = conv_nd(1, 512, channels * 3, 1)
        self.trans = nn.Linear(resolution * resolution * 9 + 128, resolution * resolution * 9)

    def forward(self, x, encoder_out=None):
        b, c, *spatial = x.shape
        x = x.reshape(b, c, -1)
        qkv = self.qkv(self.norm(x))
        if encoder_out is not None:
            # encoder_out = self.encoder_kv(encoder_out)
            encoder_out = self.encoder_qkv(encoder_out)
        return encoder_out

Commenting out self.encoder_kv = conv_nd(1, 512, channels * 2, 1), the layer that forward never uses, is enough; the program then runs normally.

Method 2:

from torch.nn.parallel.distributed import DistributedDataParallel as DDP

self.ddp_model = DDP(
    self.model,
    device_ids=[self.device],
    # output_device=self.device,
    # broadcast_buffers=False,
    # bucket_cap_mb=128,
    find_unused_parameters=True,  # add this argument
)

Passing find_unused_parameters=True is also very effective: DDP then traverses the autograd graph each iteration and marks parameters that did not take part in producing the loss, instead of erroring out.

Note: when you set find_unused_parameters=True, also add the following check so you can see which parameters are actually unused (and judge for yourself whether they are really meant to be unused):

ls = [name for name, para in model.named_parameters() if para.grad is None]
print(ls)
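Since para.grad is only populated after a backward pass, the check is best run right after loss.backward() in the first training iteration. A small helper along these lines (the function name is just a suggestion) keeps it reusable:

def report_unused_parameters(model):
    """Call right after loss.backward(); lists parameters whose .grad is still None."""
    unused = [name for name, para in model.named_parameters() if para.grad is None]
    if unused:
        print("parameters without gradients:", unused)
    return unused

If every name it reports is a layer you genuinely do not need, Method 1 (removing the layer) is the cleaner fix; otherwise keep find_unused_parameters=True.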