Naming rules for layer weight parameters stored in PyTorch, and why some layer names contain module.

1. A variable defined with self in __init__ is stored under that attribute name.

self.conv1 = nn.Conv2d(3, 12, kernel_size=3, stride=1, padding=1) 

    A convolution layer has two parameters, the weight and the bias; in the example above they are stored as conv1.weight and conv1.bias.

self.bn1 = nn.BatchNorm2d(12)

    A BatchNorm layer contributes five entries (two learnable parameters plus three buffers): bn1.weight, bn1.bias, bn1.running_mean, bn1.running_var, bn1.num_batches_tracked.
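
You can verify these names directly by printing the state_dict keys of the standalone layers; a minimal check:

import torch.nn as nn

conv1 = nn.Conv2d(3, 12, kernel_size=3, stride=1, padding=1)
print(list(conv1.state_dict().keys()))
# ['weight', 'bias']

bn1 = nn.BatchNorm2d(12)
print(list(bn1.state_dict().keys()))
# ['weight', 'bias', 'running_mean', 'running_var', 'num_batches_tracked']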

2. When nn.Sequential is used, the wrapped submodules are numbered by the order in which they are passed in, starting from 0.

conv1 = nn.Conv2d(3, 12, kernel_size=3, stride=1, padding=1)
bn1 = nn.BatchNorm2d(12)
s1 = [conv1, bn1]
self.stage1 = nn.Sequential(*s1)

Note that conv1 and bn1 here are not assigned to self; only stage1 is. Since Sequential wraps conv1 and bn1 in order, conv1 becomes stage1.0 and bn1 becomes stage1.1. The resulting keys are:

stage1.0.weight, stage1.0.bias
stage1.1.weight, stage1.1.bias, stage1.1.running_mean, stage1.1.running_var, stage1.1.num_batches_tracked
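
As an aside, if you prefer readable names over numbers, nn.Sequential also accepts an OrderedDict of name/module pairs (the names conv and bn below are arbitrary choices for illustration):

from collections import OrderedDict
import torch.nn as nn

stage1 = nn.Sequential(OrderedDict([
    ('conv', nn.Conv2d(3, 12, kernel_size=3, stride=1, padding=1)),
    ('bn', nn.BatchNorm2d(12)),
]))
print(list(stage1.state_dict().keys()))
# ['conv.weight', 'conv.bias', 'bn.weight', 'bn.bias',
#  'bn.running_mean', 'bn.running_var', 'bn.num_batches_tracked']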

3. When a module is wrapped in DataParallel (from torch.nn) or DistributedDataParallel (from torch.nn.parallel), an extra module. segment is inserted after that variable's name.

conv1 = nn.Conv2d(3, 12, kernel_size=3, stride=1, padding=1)
bn1 = nn.BatchNorm2d(12)
s1 = [conv1, bn1]
stage1 = nn.Sequential(*s1)
self.stage2 = DataParallel(stage1)

Note that only stage2 is assigned to self. The output keys are:

stage2.module.0.weight, stage2.module.0.bias
stage2.module.1.weight, stage2.module.1.bias, stage2.module.1.running_mean, stage2.module.1.running_var, stage2.module.1.num_batches_tracked
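
This prefix is what trips people up when loading a checkpoint saved from a DataParallel-wrapped model into a plain model. A common workaround is to strip the leading module. from every key before load_state_dict; a sketch, where ckpt.pth is a hypothetical checkpoint path and model is the plain, unwrapped network:

import torch

state = torch.load('ckpt.pth', map_location='cpu')
# drop the leading 'module.' that the DataParallel wrapper added to each key
state = {k[len('module.'):] if k.startswith('module.') else k: v
         for k, v in state.items()}
model.load_state_dict(state)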

Example 1:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import DataParallel
from torch.nn.parallel import DistributedDataParallel
import torch.distributed as dist


class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 12, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(12)
        self.conv2 = nn.Conv2d(12, 24, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(24)
        self.fc1 = nn.Linear(24 * 5 * 5, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = x.view(-1, 24 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x


if __name__ == '__main__':
    # torch.cuda.set_device(dist.get_rank())
    # dist.init_process_group(backend="nccl", init_method="tcp://localhost:23456",
    #                         rank=dist.get_rank(), world_size=dist.get_world_size())
    model = CNN()
    # model = DataParallel(model)
    for name in model.state_dict():
        print(name)

Output: the keys follow the attribute names assigned with self:

conv1.weight, conv1.bias
bn1.weight, bn1.bias, bn1.running_mean, bn1.running_var, bn1.num_batches_tracked
conv2.weight, conv2.bias
bn2.weight, bn2.bias, bn2.running_mean, bn2.running_var, bn2.num_batches_tracked
fc1.weight, fc1.bias
fc2.weight, fc2.bias
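
One detail worth noting: running_mean, running_var, and num_batches_tracked are buffers, not learnable parameters, so they appear in state_dict() but not in named_parameters(). A quick check on the model above:

for name, _ in model.named_parameters():
    print(name)
# prints only the conv/bn/fc weight and bias entries;
# no running_mean, running_var, or num_batches_tracked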

Example 2:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import DataParallel
from torch.nn.parallel import DistributedDataParallel
import torch.distributed as dist


class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 12, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(12)
        # s1 = [self.conv1, self.bn1]
        self.s1 = [self.conv1, self.bn1]
        self.stage1 = nn.Sequential(*self.s1)
        self.conv2 = nn.Conv2d(12, 24, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(24)
        self.fc1 = nn.Linear(24 * 5 * 5, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = x.view(-1, 24 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x


if __name__ == '__main__':
    # torch.cuda.set_device(dist.get_rank())
    # dist.init_process_group(backend="nccl", init_method="tcp://localhost:23456",
    #                         rank=dist.get_rank(), world_size=dist.get_world_size())
    model = CNN()
    # model = DataParallel(model)
    for name in model.state_dict():
        print(name)

Result: self.conv1 and self.bn1 are passed into Sequential through self.s1, so self.stage1 numbers them by order of appearance; the original conv1 and bn1 entries still exist as well (conv1.weight and stage1.0.weight are two keys for the same underlying tensor, since the same module object is registered twice). self.s1 itself does not appear: although it is assigned to self, it is a plain Python list rather than a PyTorch module, so it is not registered.

conv1.weight, conv1.bias
bn1.weight, bn1.bias, bn1.running_mean, bn1.running_var, bn1.num_batches_tracked
stage1.0.weight, stage1.0.bias
stage1.1.weight, stage1.1.bias, stage1.1.running_mean, stage1.1.running_var, stage1.1.num_batches_tracked
conv2.weight, conv2.bias
bn2.weight, bn2.bias, bn2.running_mean, bn2.running_var, bn2.num_batches_tracked
fc1.weight, fc1.bias
fc2.weight, fc2.bias
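
If you do want a list of layers to be registered under its own name, use nn.ModuleList instead of a plain Python list; a minimal sketch (Tiny is a throwaway class for illustration):

import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super(Tiny, self).__init__()
        # nn.ModuleList registers its contents, unlike a plain Python list
        self.s1 = nn.ModuleList([
            nn.Conv2d(3, 12, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(12),
        ])

print(list(Tiny().state_dict().keys()))
# ['s1.0.weight', 's1.0.bias', 's1.1.weight', 's1.1.bias',
#  's1.1.running_mean', 's1.1.running_var', 's1.1.num_batches_tracked']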

Example 3:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import DataParallel
from torch.nn.parallel import DistributedDataParallel
import torch.distributed as dist


class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        conv1 = nn.Conv2d(3, 12, kernel_size=3, stride=1, padding=1)
        bn1 = nn.BatchNorm2d(12)
        s1 = [conv1, bn1]
        self.stage1 = nn.Sequential(*s1)
        self.stage2 = DataParallel(self.stage1)
        self.conv2 = nn.Conv2d(12, 24, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(24)
        self.fc1 = nn.Linear(24 * 5 * 5, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.stage2(x)
        x = F.relu(self.bn2(self.conv2(x)))
        x = x.view(-1, 24 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x


if __name__ == '__main__':
    # torch.cuda.set_device(dist.get_rank())
    # dist.init_process_group(backend="nccl", init_method="tcp://localhost:23456",
    #                         rank=dist.get_rank(), world_size=dist.get_world_size())
    model = CNN()
    model = DataParallel(model)
    for name in model.state_dict():
        print(name)

Result: self.stage1 is numbered by Sequential, and self.stage2 wraps stage1 in DataParallel, so module. appears after stage2. Because the final model is itself wrapped in DataParallel as well, every key in CNN gains an additional leading module. prefix:

module.stage1.0.weight, module.stage1.0.bias
module.stage1.1.weight, module.stage1.1.bias, module.stage1.1.running_mean, module.stage1.1.running_var, module.stage1.1.num_batches_tracked
module.stage2.module.0.weight, module.stage2.module.0.bias
module.stage2.module.1.weight, module.stage2.module.1.bias, module.stage2.module.1.running_mean, module.stage2.module.1.running_var, module.stage2.module.1.num_batches_tracked
module.conv2.weight, module.conv2.bias
module.bn2.weight, module.bn2.bias, module.bn2.running_mean, module.bn2.running_var, module.bn2.num_batches_tracked
module.fc1.weight, module.fc1.bias
module.fc2.weight, module.fc2.bias
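
To avoid the outer module. prefix in saved checkpoints, a common practice is to save the inner module's state_dict rather than the wrapper's; a sketch continuing Example 3, where model.pth is a placeholder path (the inner stage2.module. segment remains, since that DataParallel lives inside CNN itself):

# save the unwrapped CNN's state_dict, so keys carry no leading 'module.'
torch.save(model.module.state_dict(), 'model.pth')

# later: load into a fresh CNN without any outer DataParallel
plain = CNN()
plain.load_state_dict(torch.load('model.pth', map_location='cpu'))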

 
