[Intermediate] Building networks with nn.Module, nn.Sequential, and nn.functional

Latest recommended article published on 2023-10-09 16:02:04

WeissSama


Category: pytorch  Tags: deep learning, pytorch

Article link: https://blog.csdn.net/Bismarckczy/article/details/128778845


A previous post briefly introduced how to build a neural network with nn.Sequential. This post goes further and covers more flexible ways to build one.

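In PyTorch, model parameters are represented by nn.Parameter. Managing them by hand is tedious, so models are usually built on nn.Module, and parameters are managed through nn.Module and its subclasses.
nn.Parameter has requires_grad=True by default.
nn.ParameterList groups several nn.Parameter objects into a list.
nn.ParameterDict groups several nn.Parameter objects into a dictionary.
Examples follow.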

import torch
from torch import nn

# Wrap a tensor as a trainable parameter
w = torch.nn.Parameter(torch.randn(2,2))
print(w)
print(w.requires_grad)
Output:
Parameter containing:
tensor([[0.5738, 0.6093],
        [0.1924, 0.0886]], requires_grad=True)
True
================================================================

w1 = nn.Parameter(torch.rand(2, 2))
w2 = nn.Parameter(torch.rand(2, 2))
para_l = nn.ParameterList([w1, w2])
print(para_l)
print(para_l[0])
Output:
ParameterList(
    (0): Parameter containing: [torch.float32 of size 2x2]
    (1): Parameter containing: [torch.float32 of size 2x2]
)
Parameter containing:
tensor([[0.9076, 0.4244],
        [0.1826, 0.7291]], requires_grad=True)
================================================================
para_dict = nn.ParameterDict({
    'a': nn.Parameter(torch.rand(2, 2)),
    'b': nn.Parameter(torch.rand(2, 2))
})
print(para_dict)
print(para_dict['a'])
Output:
ParameterDict(
    (a): Parameter containing: [torch.FloatTensor of size 2x2]
    (b): Parameter containing: [torch.FloatTensor of size 2x2]
)
Parameter containing:
tensor([[0.8930, 0.3071],
        [0.3121, 0.7276]], requires_grad=True)
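
One reason these containers matter: a module only reports parameters that were registered with it, so parameters stored in a plain Python list are invisible to parameters(). The sketch below is a minimal illustration of that difference. The class names PlainListHolder and ParamListHolder are hypothetical and do not come from the original post.

```python
import torch
from torch import nn

class PlainListHolder(nn.Module):
    def __init__(self):
        super().__init__()
        # A plain Python list is stored as an ordinary attribute,
        # so its entries are not registered as module parameters.
        self.weights = [nn.Parameter(torch.rand(2, 2)) for _ in range(2)]

class ParamListHolder(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ParameterList registers every entry with the owning module.
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.rand(2, 2)) for _ in range(2)]
        )

plain_count = len(list(PlainListHolder().parameters()))
registered_count = len(list(ParamListHolder().parameters()))
print(plain_count, registered_count)
```

An optimizer built from PlainListHolder().parameters() would therefore receive nothing to update, which is the practical reason to prefer the container classes.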

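Managing nn.Parameter with nn.Module [this part is less important]

# A Module can manage all of them together.
# module.parameters() returns a generator over every parameter registered under the module.

module = nn.Module()
module.w = w
module.params_list = para_l     # the ParameterList defined above
module.params_dict = para_dict  # the ParameterDict defined above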

num_param = 0
for param in module.parameters():
    print(param,"\n")
    num_param = num_param + 1
print("number of Parameters =",num_param)
Output:
Parameter containing:
tensor([[0.1886, 0.7707],
        [0.7998, 0.8668]], requires_grad=True) 
Parameter containing:
tensor([[0.3607, 0.2619],
        [0.0187, 0.2842]], requires_grad=True) 
Parameter containing:
tensor([[0.0505, 0.9494],
        [0.1531, 0.6120]], requires_grad=True) 
Parameter containing:
tensor([[0.5063, 0.5944],
        [0.0197, 0.2520]], requires_grad=True) 
Parameter containing:
tensor([[0.1467, 0.2571],
        [0.7267, 0.6117]], requires_grad=True) 
number of Parameters = 5
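
Besides counting parameter tensors, it is often useful to see their registered names and the total number of scalar values. The following is a small sketch along the same lines as the example above, using named_parameters() and numel(). The attribute names para_l and para_dict are chosen here for illustration.

```python
import torch
from torch import nn

module = nn.Module()
module.w = nn.Parameter(torch.randn(2, 2))
module.para_l = nn.ParameterList(
    [nn.Parameter(torch.rand(2, 2)), nn.Parameter(torch.rand(2, 2))]
)
module.para_dict = nn.ParameterDict({
    "a": nn.Parameter(torch.rand(2, 2)),
    "b": nn.Parameter(torch.rand(2, 2)),
})

# named_parameters() yields (name, parameter) pairs; nested containers
# contribute dotted names built from the attribute name and the index or key.
names = [name for name, _ in module.named_parameters()]
# numel() gives the number of scalar values in each parameter tensor.
total_elements = sum(p.numel() for p in module.parameters())
print(names)
print("total scalar elements =", total_elements)
```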

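Layers with learnable parameters are usually created in the constructor.
The following shows how nn.Linear, a subclass of nn.Module, works together with nn.Parameter and nn.functional: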

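import torch.nn.functional as F

class Linear(nn.Module):
    __constants__ = ['in_features', 'out_features']

    def __init__(self, in_features, out_features, bias=True):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        # Learnable tensors are wrapped in nn.Parameter so the module registers them.
        # Note: torch.Tensor(...) allocates uninitialized memory; the real nn.Linear
        # initializes these values in reset_parameters().
        self.weight = nn.Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = nn.Parameter(torch.Tensor(out_features))
        else:
            # Register 'bias' as None so the attribute still exists.
            self.register_parameter('bias', None)

    def forward(self, input):
        # The computation itself is delegated to the stateless functional API.
        return F.linear(input, self.weight, self.bias)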
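
To make the division of labour concrete, the sketch below checks that calling an nn.Linear module, calling F.linear with that module's parameters, and computing x @ W^T + b by hand all agree. The shapes (4 inputs, 3 outputs, batch of 2) are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(in_features=4, out_features=3)
x = torch.randn(2, 4)

# The module holds the state (weight, bias) as nn.Parameter objects.
y_module = layer(x)
# The functional form is stateless: parameters are passed in explicitly.
y_functional = F.linear(x, layer.weight, layer.bias)
# The underlying arithmetic: y = x @ W^T + b.
y_manual = x @ layer.weight.t() + layer.bias

same_as_functional = torch.allclose(y_module, y_functional)
same_as_manual = torch.allclose(y_functional, y_manual, atol=1e-6)
print(same_as_functional, same_as_manual)
```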

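Managing submodules with nn.Module [this part is more important]

nn.Module provides many methods for managing submodules.
The most important ones are:
children():
Returns a generator over the module's direct submodules.
named_children():
Returns a generator over the module's direct submodules together with their names.
modules():
Returns a generator over modules at every level of nesting, including the module itself.
named_modules():
Returns a generator over modules at every level of nesting together with their names, including the module itself.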

class Net(nn.Module):
    
    def __init__(self):
        super(Net, self).__init__()
        
        self.embedding = nn.Embedding(num_embeddings = 10000,embedding_dim = 3,padding_idx = 1)
        self.conv = nn.Sequential()
        self.conv.add_module("conv_1",nn.Conv1d(in_channels = 3,out_channels = 16,kernel_size = 5))
        self.conv.add_module("pool_1",nn.MaxPool1d(kernel_size = 2))
        self.conv.add_module("relu_1",nn.ReLU())
        self.conv.add_module("conv_2",nn.Conv1d(in_channels = 16,out_channels = 128,kernel_size = 2))
        self.conv.add_module("pool_2",nn.MaxPool1d(kernel_size = 2))
        self.conv.add_module("relu_2",nn.ReLU())
        
        self.dense = nn.Sequential()
        self.dense.add_module("flatten",nn.Flatten())
        self.dense.add_module("linear",nn.Linear(6144,1))
        self.dense.add_module("sigmoid",nn.Sigmoid())
        
    def forward(self,x):
        x = self.embedding(x).transpose(1,2)
        x = self.conv(x)
        y = self.dense(x)
        return y
    
net = Net()
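
The nn.Linear(6144, 1) layer fixes the size of the flattened convolution output, which in turn implies a fixed input length. The original post does not state that length. The sketch below assumes sequences of 200 tokens and rebuilds only the convolution stack to check that the flattened size comes out as 6144 under that assumption.

```python
import torch
from torch import nn

# Same layers as the conv block in Net, written positionally.
conv = nn.Sequential(
    nn.Conv1d(in_channels=3, out_channels=16, kernel_size=5),
    nn.MaxPool1d(kernel_size=2),
    nn.ReLU(),
    nn.Conv1d(in_channels=16, out_channels=128, kernel_size=2),
    nn.MaxPool1d(kernel_size=2),
    nn.ReLU(),
)

seq_len = 200  # assumed input length, not stated in the original post
# Shape after embedding and transpose(1, 2): (batch, channels=3, length)
x = torch.zeros(4, 3, seq_len)
features = conv(x)
# Length per stage: 200 -> 196 (conv) -> 98 (pool) -> 97 (conv) -> 48 (pool)
flat_size = features.flatten(start_dim=1).shape[1]
print(features.shape, flat_size)
```

With a different input length the flattened size changes, and the in_features argument of the final nn.Linear would have to change with it.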

i = 0
for child in net.children():
    i+=1
    print(child,"\n")
print("child number",i)
# children() yields only the direct submodules: the Embedding and the two Sequential containers

Embedding(10000, 3, padding_idx=1)    # child 1

Sequential(    # child 2
  (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_1): ReLU()
  (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_2): ReLU()
) 

Sequential(    # child 3
  (flatten): Flatten()
  (linear): Linear(in_features=6144, out_features=1, bias=True)
  (sigmoid): Sigmoid()
) 

child number 3

i = 0
for name,child in net.named_children():
    i+=1
    print(name,":",child,"\n")
print("child number",i)

embedding : Embedding(10000, 3, padding_idx=1) 

conv : Sequential(
  (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_1): ReLU()
  (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_2): ReLU()
) 

dense : Sequential(
  (flatten): Flatten()
  (linear): Linear(in_features=6144, out_features=1, bias=True)
  (sigmoid): Sigmoid()
) 

child number 3

A specific layer can be looked up by name through named_children():

children_dict = {name:module for name,module in net.named_children()}
print(children_dict)
embedding = children_dict['embedding']
embedding.requires_grad_(False)  # freeze its parameters to reduce the number of trainable parameters
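
The effect of freezing can be verified by counting trainable values before and after. The sketch below uses a deliberately small hypothetical model (SmallNet) and a hypothetical helper (count_trainable). Neither appears in the original post.

```python
import torch
from torch import nn

class SmallNet(nn.Module):  # hypothetical model for illustration
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(num_embeddings=100, embedding_dim=3)
        self.linear = nn.Linear(3, 1)

def count_trainable(model):
    # Sum the sizes of all parameters that still receive gradients.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

net = SmallNet()
before = count_trainable(net)  # 100*3 embedding + 3 weights + 1 bias

# Look the child up by name and freeze all of its parameters in place.
dict(net.named_children())["embedding"].requires_grad_(False)
after = count_trainable(net)   # only the linear layer remains trainable
print(before, after)
```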

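(1) model.children() and model.named_children() both return an iterator.
(2) model.children(): each element is a direct submodule, here the Embedding layer or one of the Sequential containers. A Sequential can in turn be indexed by integer position to get a specific layer inside it, such as a Conv1d or Linear layer.
(3) model.named_children(): each element is a tuple whose first item is the name and whose second item is the corresponding layer or Sequential.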
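
Indexing into a Sequential, as described in point (2), can be sketched as follows. The layer names mirror the conv block of the Net class, shortened to three layers for brevity.

```python
from torch import nn

conv = nn.Sequential()
conv.add_module("conv_1", nn.Conv1d(in_channels=3, out_channels=16, kernel_size=5))
conv.add_module("pool_1", nn.MaxPool1d(kernel_size=2))
conv.add_module("relu_1", nn.ReLU())

first_layer = conv[0]     # integer index
same_layer = conv.conv_1  # attribute access by the registered name
front = conv[:2]          # a slice returns a new Sequential

print(first_layer is same_layer)
print(type(front).__name__, len(front))
```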

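Similarly:
(1) model.modules() and model.named_modules() both return an iterator.
(2) Both modules() and named_modules() walk through every component of the model (container modules, standalone layers, custom layers, and so on) from the outermost level inward. The difference is that modules() yields the layer object itself, while named_modules() yields a tuple whose first item is the name and whose second item is the layer object.
(3) How to understand the difference between children and modules: in PyTorch, models, layers, activation functions, and loss functions are all subclasses of Module. modules() and named_modules() therefore recurse level by level, treating each custom block and then every layer inside it as a module. children() is more literal: it yields only the immediate "children" and does not recurse further.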
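
The contrast can be seen on a small nested model. The sketch below uses a hypothetical TinyNet (not from the original post) and compares the names produced by named_children() with those produced by named_modules().

```python
import torch
from torch import nn

class TinyNet(nn.Module):  # hypothetical model for illustration
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(4, 8), nn.ReLU())
        self.head = nn.Linear(8, 1)

    def forward(self, x):
        return self.head(self.features(x))

net = TinyNet()

# Direct children only: the Sequential container and the final Linear.
child_names = [name for name, _ in net.named_children()]
# Every level: the model itself (empty name), the container,
# the layers inside the container, and the final Linear.
module_names = [name for name, _ in net.named_modules()]

print(child_names)
print(module_names)
```

The empty string in the second list is the name of the root module, which is why modules() always yields one more entry than the number of nested submodules.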