4-3.nn.functional和nn.Module

菜鸟Octopus

已于 2024-07-09 10:39:34 修改

阅读量825

点赞数 8

分类专栏： pytorch 文章标签：机器学习数据挖掘人工智能

于 2024-07-09 10:36:32 首次发布

本文链接：https://blog.csdn.net/zy345293721/article/details/140288997

版权

pytorch 专栏收录该内容

10 篇文章 0 订阅

订阅专栏

文章最前：我是Octopus，这个名字来源于我的中文名–章鱼；我热爱编程、热爱算法、热爱开源。所有源码在我的个人github ；这博客是记录我学习的点点滴滴，如果您对 Python、Java、AI、算法有兴趣，可以关注我的动态，一起学习，共同进步。

import torch 
import torchkeras
print("torch.__version__="+torch.__version__) 
print("torchkeras.__version__="+torchkeras.__version__)

torch.__version__=2.0.1
torchkeras.__version__=3.9.3

一，nn.functional 和 nn.Module

前面我们介绍了Pytorch的张量的结构操作和数学运算中的一些常用API。

利用这些张量的API我们可以构建出神经网络相关的组件(如激活函数，模型层，损失函数)。

Pytorch和神经网络相关的功能组件大多都封装在 torch.nn模块下。

这些功能组件的绝大部分既有函数形式实现，也有类形式实现。

其中nn.functional(一般引入后改名为F)有各种功能组件的函数实现。例如：

(激活函数)

F.relu
F.sigmoid
F.tanh
F.softmax

(模型层)

F.linear
F.conv2d
F.max_pool2d
F.dropout2d
F.embedding

(损失函数)

F.binary_cross_entropy
F.mse_loss
F.cross_entropy

为了便于对参数进行管理，一般通过继承 nn.Module 转换成为类的实现形式，并直接封装在 nn 模块下。例如：

(激活函数)

nn.ReLU
nn.Sigmoid
nn.Tanh
nn.Softmax

(模型层)

nn.Linear
nn.Conv2d
nn.MaxPool2d
nn.Dropout2d
nn.Embedding

(损失函数)

nn.BCELoss
nn.MSELoss
nn.CrossEntropyLoss

实际上nn.Module除了可以管理其引用的各种参数，还可以管理其引用的子模块，功能十分强大。

import torch

torch.relu(torch.tensor(-1.0))

tensor(0.)

import torch.nn.functional as F

F.relu(torch.tensor(-1.0))

tensor(0.)

二，使用nn.Module来管理参数(配合nn.Parameter使用)

在Pytorch中，模型的参数是需要被优化器训练的，因此，通常要设置参数为 requires_grad = True 的张量。

同时，在一个模型中，往往有许多的参数，要手动管理这些参数并不是一件容易的事情。

Pytorch一般将参数用nn.Parameter来表示，并且用nn.Module来管理其结构下的所有参数。

import torch 
from torch import nn 
import torch.nn.functional  as F

torch.randn(2,2,requires_grad = True)

tensor([[0.1829, 0.0693],
        [0.0767, 1.2441]], requires_grad=True)

# nn.Parameter 具有 requires_grad = True 属性
w = nn.Parameter(torch.randn(2,2))
print(w)
print(w.requires_grad)

Parameter containing:
tensor([[-0.8092, -0.8830],
        [ 1.6357, -0.1740]], requires_grad=True)
True

# nn.ParameterList 可以将多个nn.Parameter组成一个列表
params_list = nn.ParameterList([nn.Parameter(torch.rand(8,i)) for i in range(1,3)])
print(params_list)
print(params_list[0].requires_grad)

ParameterList(
    (0): Parameter containing: [torch.float32 of size 8x1]
    (1): Parameter containing: [torch.float32 of size 8x2]
)
True

# nn.ParameterDict 可以将多个nn.Parameter组成一个字典
params_dict = nn.ParameterDict({"a":nn.Parameter(torch.rand(2,2)),
                               "b":nn.Parameter(torch.zeros(2))})
print(params_dict)
print(params_dict["a"].requires_grad)

ParameterDict(
    (a): Parameter containing: [torch.FloatTensor of size 2x2]
    (b): Parameter containing: [torch.FloatTensor of size 2]
)
True

# 可以用Module将它们管理起来
# module.parameters()返回一个生成器，包括其结构下的所有parameters

module = nn.Module()
module.w = nn.Parameter(torch.randn(2,2))
module.params_list = nn.ParameterList([nn.Parameter(torch.rand(8,i)) for i in range(1,3)])
module.params_dict = nn.ParameterDict({"a":nn.Parameter(torch.rand(2,2)),
                               "b":nn.Parameter(torch.zeros(2))})

num_param = 0
for param in module.named_parameters():
    print(param,"\n")
    num_param = num_param + 1
print("number of Parameters =",num_param)

('w', Parameter containing:
tensor([[-1.2390,  0.3316],
        [-0.4232, -0.0090]], requires_grad=True)) 

('params_list.0', Parameter containing:
tensor([[0.8785],
        [0.6456],
        [0.4697],
        [0.8962],
        [0.1122],
        [0.4837],
        [0.8089],
        [0.0515]], requires_grad=True)) 

('params_list.1', Parameter containing:
tensor([[0.7440, 0.5626],
        [0.2430, 0.0113],
        [0.5884, 0.0815],
        [0.7125, 0.4120],
        [0.7275, 0.1608],
        [0.4658, 0.0085],
        [0.8578, 0.7290],
        [0.0327, 0.2239]], requires_grad=True)) 

('params_dict.a', Parameter containing:
tensor([[0.6698, 0.5646],
        [0.2482, 0.8258]], requires_grad=True)) 

('params_dict.b', Parameter containing:
tensor([0., 0.], requires_grad=True)) 

number of Parameters = 5

#实践当中，一般通过继承nn.Module来构建模块类，并将所有含有需要学习的参数的部分放在构造函数中。

#以下范例为Pytorch中nn.Linear的源码的简化版本
#可以看到它将需要学习的参数放在了__init__构造函数中，并在forward中调用F.linear函数来实现计算逻辑。

class Linear(nn.Module):
    __constants__ = ['in_features', 'out_features']

    def __init__(self, in_features, out_features, bias=True):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = nn.Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = nn.Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)

    def forward(self, input):
        return F.linear(input, self.weight, self.bias)

三，使用nn.Module来管理子模块

一般情况下，我们都很少直接使用 nn.Parameter来定义参数构建模型，而是通过一些拼装一些常用的模型层来构造模型。

这些模型层也是继承自nn.Module的对象,本身也包括参数，属于我们要定义的模块的子模块。

nn.Module提供了一些方法可以管理这些子模块。

children() 方法: 返回生成器，包括模块下的所有子模块。
named_children()方法：返回一个生成器，包括模块下的所有子模块，以及它们的名字。
modules()方法：返回一个生成器，包括模块下的所有各个层级的模块，包括模块本身。
named_modules()方法：返回一个生成器，包括模块下的所有各个层级的模块以及它们的名字，包括模块本身。

其中chidren()方法和named_children()方法较多使用。

modules()方法和named_modules()方法较少使用，其功能可以通过多个named_children()的嵌套使用实现。

class Net(nn.Module):
    
    def __init__(self):
        super(Net, self).__init__()
        
        self.embedding = nn.Embedding(num_embeddings = 10000,embedding_dim = 3,padding_idx = 1)
        self.conv = nn.Sequential()
        self.conv.add_module("conv_1",nn.Conv1d(in_channels = 3,out_channels = 16,kernel_size = 5))
        self.conv.add_module("pool_1",nn.MaxPool1d(kernel_size = 2))
        self.conv.add_module("relu_1",nn.ReLU())
        self.conv.add_module("conv_2",nn.Conv1d(in_channels = 16,out_channels = 128,kernel_size = 2))
        self.conv.add_module("pool_2",nn.MaxPool1d(kernel_size = 2))
        self.conv.add_module("relu_2",nn.ReLU())
        
        self.dense = nn.Sequential()
        self.dense.add_module("flatten",nn.Flatten())
        self.dense.add_module("linear",nn.Linear(6144,1))
        
    def forward(self,x):
        x = self.embedding(x).transpose(1,2)
        x = self.conv(x)
        y = self.dense(x)
        return y
    
net = Net()

i = 0
for child in net.children():
    i+=1
    print(child,"\n")
print("child number",i)

Embedding(10000, 3, padding_idx=1) 

Sequential(
  (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_1): ReLU()
  (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_2): ReLU()
) 

Sequential(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear): Linear(in_features=6144, out_features=1, bias=True)
) 

child number 3

i = 0
for name,child in net.named_children():
    i+=1
    print(name,":",child,"\n")
print("child number",i)

embedding : Embedding(10000, 3, padding_idx=1) 

conv : Sequential(
  (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_1): ReLU()
  (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_2): ReLU()
) 

dense : Sequential(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear): Linear(in_features=6144, out_features=1, bias=True)
) 

child number 3

i = 0
for module in net.modules():
    i+=1
    print(module)
print("module number:",i)

Net(
  (embedding): Embedding(10000, 3, padding_idx=1)
  (conv): Sequential(
    (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
    (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (relu_1): ReLU()
    (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
    (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (relu_2): ReLU()
  )
  (dense): Sequential(
    (flatten): Flatten(start_dim=1, end_dim=-1)
    (linear): Linear(in_features=6144, out_features=1, bias=True)
  )
)
Embedding(10000, 3, padding_idx=1)
Sequential(
  (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_1): ReLU()
  (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_2): ReLU()
)
Conv1d(3, 16, kernel_size=(5,), stride=(1,))
MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
ReLU()
Conv1d(16, 128, kernel_size=(2,), stride=(1,))
MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
ReLU()
Sequential(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear): Linear(in_features=6144, out_features=1, bias=True)
)
Flatten(start_dim=1, end_dim=-1)
Linear(in_features=6144, out_features=1, bias=True)
module number: 12

下面我们通过named_children方法找到embedding层，并将其参数设置为不可训练(相当于冻结embedding层)。

children_dict = {name:module for name,module in net.named_children()}

print(children_dict)
embedding = children_dict["embedding"]
embedding.requires_grad_(False) #冻结其参数

{'embedding': Embedding(10000, 3, padding_idx=1), 'conv': Sequential(
  (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_1): ReLU()
  (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_2): ReLU()
), 'dense': Sequential(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear): Linear(in_features=6144, out_features=1, bias=True)
)}





Embedding(10000, 3, padding_idx=1)

#可以看到其第一层的参数已经不可以被训练了。
for param in embedding.parameters():
    print(param.requires_grad)
    print(param.numel())

False
30000

from torchkeras import summary
summary(net,input_shape = (200,),input_dtype = torch.LongTensor);
# 不可训练参数数量增加

--------------------------------------------------------------------------
Layer (type)                            Output Shape              Param #
==========================================================================
Embedding-1                             [-1, 200, 3]               30,000
Conv1d-2                               [-1, 16, 196]                  256
MaxPool1d-3                             [-1, 16, 98]                    0
ReLU-4                                  [-1, 16, 98]                    0
Conv1d-5                               [-1, 128, 97]                4,224
MaxPool1d-6                            [-1, 128, 48]                    0
ReLU-7                                 [-1, 128, 48]                    0
Flatten-8                                 [-1, 6144]                    0
Linear-9                                     [-1, 1]                6,145
==========================================================================
Total params: 40,625
Trainable params: 10,625
Non-trainable params: 30,000
--------------------------------------------------------------------------
Input size (MB): 0.000763
Forward/backward pass size (MB): 0.287788
Params size (MB): 0.154972
Estimated Total Size (MB): 0.443523
--------------------------------------------------------------------------

菜鸟Octopus

关注

8
点赞
踩
6

收藏

觉得还不错? 一键收藏
0
评论
4-3.nn.functional和nn.Module

文章最前：我是Octopus，这个名字来源于我的中文名–章鱼；我热爱编程、热爱算法、热爱开源。所有源码在我的个人github；这博客是记录我学习的点点滴滴，如果您对 Python、Java、AI、算法有兴趣，可以关注我的动态，一起学习，共同进步。
复制链接

扫一扫

专栏目录