Custom BatchNorm layer's running_mean and running_var not updating under nn.DataParallel

Problem Description

I implemented a custom layer with BatchNorm-like behavior. On a single GPU, the momentum-updated statistics running_mean and running_var accumulate correctly across training iterations. After switching to a multi-GPU setup (wrapping the model in nn.DataParallel), model performance dropped drastically. Inspecting the BatchNorm layer showed that running_mean and running_var were reset to their initial values (0 and 1) on every iteration instead of accumulating, and the values saved via state_dict() were also just 0 and 1.
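For reference, a minimal sketch of how one might observe this (the toy model here uses the built-in BatchNorm2d as a stand-in for the custom layer; the structure and shapes are my own illustrative choices):

```python
import torch
import torch.nn as nn

# Toy model containing a BatchNorm layer; stands in for the custom layer.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8))
model = nn.DataParallel(model.cuda())

for step in range(3):
    x = torch.randn(8, 3, 32, 32).cuda()
    model(x)
    # With the built-in BatchNorm2d this accumulates as expected;
    # with the buggy custom layer it stays at the initial zeros.
    print(step, model.module[1].running_mean[:4])
```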

Solution

I searched for a long time without finding anyone reporting the same problem, and eventually found the answer in the PyTorch community:

link

Replace the `=` assignment with an in-place `.copy_`:

```python
self.running_mean.copy_(...)
# instead of
self.running_mean = (...)
```

The reason, as I understand it: nn.DataParallel re-replicates the module onto each GPU at every forward pass. A plain `=` merely rebinds the attribute on the throwaway replica, so the master module's buffer is never updated; `.copy_` writes into the buffer itself, and the replica on the first device shares that buffer's storage with the master module, so the update persists.
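For context, here is a minimal sketch of what such a custom layer might look like after the fix (the class name and the momentum/eps values are my own illustrative choices, not from the original post):

```python
import torch
import torch.nn as nn

class MyBatchNorm2d(nn.Module):
    """Minimal BatchNorm-like layer for 4D input; illustrative only."""
    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        super().__init__()
        self.momentum = momentum
        self.eps = eps
        # register_buffer puts the stats into state_dict() and lets
        # DataParallel broadcast them to the replicas each forward pass.
        self.register_buffer('running_mean', torch.zeros(num_features))
        self.register_buffer('running_var', torch.ones(num_features))

    def forward(self, x):
        if self.training:
            # Per-channel statistics over batch and spatial dimensions.
            mean = x.mean(dim=(0, 2, 3))
            var = x.var(dim=(0, 2, 3), unbiased=False)
            with torch.no_grad():
                # In-place copy_ writes into the buffer that the device-0
                # replica shares with the master module; a plain `=` would
                # only rebind the attribute on the throwaway replica.
                self.running_mean.copy_((1 - self.momentum) * self.running_mean
                                        + self.momentum * mean)
                self.running_var.copy_((1 - self.momentum) * self.running_var
                                       + self.momentum * var)
        else:
            mean, var = self.running_mean, self.running_var
        return (x - mean.reshape(1, -1, 1, 1)) / torch.sqrt(
            var.reshape(1, -1, 1, 1) + self.eps)
```

With this version, the running statistics accumulate on device 0 and survive both nn.DataParallel training and state_dict() saving.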
Reader question: when loading the pretrained model InpaintingModel_gen.pth, I get:

```
RuntimeError: Error(s) in loading state_dict for ContextEncoder:
Missing key(s) in state_dict: "encoder.0.weight", "encoder.0.bias", "encoder.2.weight", "encoder.2.bias", "encoder.3.weight", "encoder.3.bias", "encoder.3.running_mean", "encoder.3.running_var", "encoder.5.weight", "encoder.5.bias", "encoder.6.weight", "encoder.6.bias", "encoder.6.running_mean", "encoder.6.running_var", ...
```

The model being loaded into is:

```python
class ContextEncoder(nn.Module):
    def __init__(self):
        super(ContextEncoder, self).__init__()
        # Encoder
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 512, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(512, 512, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(512, 512, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(512, 512, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(512, 512, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(512, 512, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace=True),
        )
        # Decoder
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 512, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(512, 512, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(512, 512, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(512, 512, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
```

How should this be fixed?
Answer: "Missing key(s) in state_dict" means the checkpoint file does not contain parameters under those names; it says nothing about how the model's own weights are initialized. Re-initializing the `self.encoder` weights to zeros (or wrapping `running_mean` in `nn.Parameter`) does not change any key names and will not resolve the error. Two causes are likely: either the checkpoint's keys carry a prefix (for example `module.` if it was saved from an `nn.DataParallel`-wrapped model, or the weights are nested inside a sub-dict), or `InpaintingModel_gen.pth` was saved for a different generator architecture than this `ContextEncoder`, in which case no renaming will line the weights up. Start by inspecting what the checkpoint actually contains (the `'generator'` sub-key below is an assumption; print the top-level keys and check yours):

```python
import torch

model = ContextEncoder()  # the class defined above
checkpoint = torch.load('InpaintingModel_gen.pth', map_location='cpu')

# Some checkpoints nest the weights under a key such as 'generator'
# (an assumption -- inspect the top-level keys of yours).
state_dict = checkpoint.get('generator', checkpoint)
print(list(state_dict.keys())[:10])
print(list(model.state_dict().keys())[:10])

# If the only difference is a 'module.' prefix left by nn.DataParallel,
# strip it and load normally:
state_dict = {k[len('module.'):] if k.startswith('module.') else k: v
              for k, v in state_dict.items()}
model.load_state_dict(state_dict)
```

If the two key lists name entirely different layers, the checkpoint belongs to a different architecture, and the fix is to define the model that matches the checkpoint (or retrain) rather than to force-load it.
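As a sanity check, `load_state_dict(strict=False)` returns the lists of missing and unexpected keys instead of raising, which makes the mismatch explicit:

```python
# missing_keys    = names the model expects but the checkpoint lacks
# unexpected_keys = names the checkpoint has but the model does not
result = model.load_state_dict(state_dict, strict=False)
print('missing:', result.missing_keys)
print('unexpected:', result.unexpected_keys)
```

If `unexpected_keys` covers essentially the whole checkpoint, the architectures do not match and partial loading is pointless.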
