[PyTorch Study Notes] Model Module 02: Model Containers

越轨

Article link: https://blog.csdn.net/qq_50040241/article/details/148153461

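Module Containers

A container stores submodules (network layers) or parameters so that, during training, the layers can be executed in a defined order and according to defined rules. The commonly used containers are Sequential, ModuleList, ModuleDict, ParameterList and ParameterDict.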

Sequential

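Sequential is one of the most commonly used containers in PyTorch. It holds several network layers in order, and data flows through each layer in the order in which the layers were added.

1. Basic characteristics of Sequential

Modules are executed in the order they were passed in at construction
The forward method is already implemented, so you do not need to define it yourself
The structure is simple and suits networks that are a plain chain of layers

2. Creating a Sequential

There are two main ways to create a Sequential: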

import torch.nn as nn

# Option 1: pass the layers directly to the Sequential constructor
model1 = nn.Sequential(
    nn.Linear(10, 20),
    nn.ReLU(),
    nn.Linear(20, 2)
)

# Option 2: pass an OrderedDict so that every layer gets a name
from collections import OrderedDict
model2 = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(10, 20)),
    ('relu', nn.ReLU()),
    ('fc2', nn.Linear(20, 2))
]))
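
Both forms build the same three-layer network; the OrderedDict form also lets you reach a layer by name instead of only by index. A quick check (the batch size of 4 is arbitrary):

```python
from collections import OrderedDict

import torch
import torch.nn as nn

model1 = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 2))
model2 = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(10, 20)),
    ('relu', nn.ReLU()),
    ('fc2', nn.Linear(20, 2)),
]))

# Both support integer indexing; only the OrderedDict form has meaningful names
print(model2[0] is model2.fc1)                        # True
print([name for name, _ in model1.named_children()])  # ['0', '1', '2']
print([name for name, _ in model2.named_children()])  # ['fc1', 'relu', 'fc2']

x = torch.randn(4, 10)
print(model1(x).shape, model2(x).shape)  # torch.Size([4, 2]) torch.Size([4, 2])
```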

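3. Pros and cons of Sequential

Pros:

Concise code that is easy to read and use
Layers run in order automatically, with no forward method to write
Individual layers can be accessed by index

Cons:

Only handles a single-input, single-output chain of layers
Not suited to networks with multiple inputs or branches (e.g., skip connections)

4. A practical example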

import torch.nn as nn

# A classification network built from two Sequential blocks
class ClassifierNet(nn.Module):
    def __init__(self, input_size, num_classes):
        super(ClassifierNet, self).__init__()
        
        self.features = nn.Sequential(
            nn.Linear(input_size, 512),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Dropout(0.3)
        )
        
        self.classifier = nn.Sequential(
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes)
        )
    
    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x

# Create a model instance
model = ClassifierNet(input_size=784, num_classes=10)

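5. Common Sequential operations

Access a specific layer: model[0]
Append a layer: model.append(new_layer)
Extend with several layers: model.extend([layer1, layer2])
Insert a layer: model.insert(index, new_layer)

With these operations we can build and modify sequential models flexibly. Its simplicity makes Sequential the first choice for building simple neural networks.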
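
A short sketch of these operations. It assumes a recent PyTorch release: append, extend and insert on nn.Sequential were added in later versions, so on older installations add_module is the fallback. The layer choices here are arbitrary.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 20), nn.ReLU())

first = model[0]                               # access a layer by index
model.append(nn.Linear(20, 2))                 # add one layer at the end
model.extend([nn.ReLU(), nn.Softmax(dim=1)])   # add several layers at the end
model.insert(1, nn.BatchNorm1d(20))            # insert a layer at position 1

print(len(model))                # 6
print(type(model[1]).__name__)   # BatchNorm1d
out = model(torch.randn(4, 10))
print(out.shape)                 # torch.Size([4, 2])
```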

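6. How Sequential is implemented

Sequential's implementation rests on a few key points:

Inherits from Module: Sequential is a subclass of nn.Module and has all of Module's basic functionality
Ordered container: submodules are stored in an insertion-ordered dictionary (the module's _modules), which fixes the execution order
Built-in forward: forward is already implemented and calls the submodules one after another

7. How a Sequential is called

When data passes through a Sequential model, the call proceeds as follows: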

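from torch.nn import Module

# Simplified excerpt of torch.nn.Sequential; the real class also defines
# __init__, __iter__, __getitem__, append and other methods
class Sequential(Module):
    def forward(self, input):
        for module in self:          # iterate over submodules in registration order
            input = module(input)    # the output of one layer is the input of the next
        return input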

A concrete example:

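import torch.nn as nn

# Create a Sequential model
model = nn.Sequential(
    nn.Linear(10, 20),    # first layer, model[0]
    nn.ReLU(),            # second layer, model[1]
    nn.Linear(20, 2)      # third layer, model[2]
)

# Calling model(x) is equivalent to:
def forward(x):
    x = model[0](x)  # linear layer
    x = model[1](x)  # ReLU activation
    x = model[2](x)  # linear layer
    return x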
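
To see that nothing more is needed, here is a minimal re-implementation built only on the public add_module and children() APIs. MySequential is a hypothetical name for illustration, not a PyTorch class; given the same layer objects, it should produce exactly the same output as nn.Sequential.

```python
import torch
import torch.nn as nn

class MySequential(nn.Module):
    """Hypothetical minimal Sequential: registers modules in order and chains them."""
    def __init__(self, *modules):
        super().__init__()
        for idx, module in enumerate(modules):
            self.add_module(str(idx), module)  # keys '0', '1', ... preserve the order

    def forward(self, x):
        for module in self.children():  # children() follows registration order
            x = module(x)               # output of one layer is input of the next
        return x

layers = [nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 2)]
mine = MySequential(*layers)
ref = nn.Sequential(*layers)  # same layer objects, hence the same weights

x = torch.randn(3, 10)
print(torch.allclose(mine(x), ref(x)))  # True
```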

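8. Advanced uses of Sequential

Adding layers dynamically: layers can be added at runtime with add_module, which also gives each one a name
Conditional branching: Sequential itself always runs every layer, but a custom Module can combine several Sequential blocks and choose between them with an if statement in forward
Naming layers: an OrderedDict (or add_module) assigns a name to every layer

Example code: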

import torch.nn as nn

# Add layers dynamically, each under its own name
model = nn.Sequential()
model.add_module('fc1', nn.Linear(10, 20))
model.add_module('relu1', nn.ReLU())

# Conditional branching: a custom Module decides in forward whether to run the classifier
class ConditionalNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.feature_extractor = nn.Sequential(
            nn.Linear(10, 20),
            nn.ReLU()
        )
        self.classifier = nn.Sequential(
            nn.Linear(20, 2)
        )
    
    def forward(self, x, use_classifier=True):
        x = self.feature_extractor(x)
        if use_classifier:
            x = self.classifier(x)
        return x

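With this design, Sequential stays simple to use while still offering enough flexibility to build a wide range of network structures.

ModuleList

ModuleList is another important PyTorch container. It offers a flexible way to manage and organize a network's submodules.

1. Basic characteristics of ModuleList

Can store instances of any Module subclass
Supports indexing and iteration
Registers the parameters of every submodule automatically
Has no forward method of its own: you must write forward to decide how the modules are called

2. ModuleList vs. Python list

The main differences between ModuleList and a plain Python list: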

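Feature	ModuleList	Python list
Parameter registration	Registered with the model automatically	Parameters are not registered
Device transfer (.to / .cuda)	All submodules are moved automatically	Must be handled manually
state_dict	Included in state_dict	Not included in state_dict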
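
The registration difference is easy to observe. In this sketch (class names are hypothetical), the model that keeps its layers in a plain list reports no parameters at all, so an optimizer built from model.parameters() would train nothing, and .to(device) would leave those layers behind.

```python
import torch.nn as nn

class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(10, 10) for _ in range(3)])

class WithPythonList(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(10, 10) for _ in range(3)]  # NOT registered

a, b = WithModuleList(), WithPythonList()
print(len(list(a.parameters())))               # 6 (3 weights + 3 biases)
print(len(list(b.parameters())))               # 0
print(len(a.state_dict()), len(b.state_dict()))  # 6 0
```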

3. Creating and using a ModuleList

import torch.nn as nn

class DynamicNet(nn.Module):
    def __init__(self, num_layers):
        super(DynamicNet, self).__init__()
        # Create several linear layers
        self.layers = nn.ModuleList([
            nn.Linear(10, 10) for _ in range(num_layers)
        ])
    
    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

# Create a model with 3 layers
model = DynamicNet(num_layers=3)

4. Common ModuleList operations

Append a module: layers.append(new_module)
Extend with several modules: layers.extend([module1, module2])
Insert a module: layers.insert(index, module)
Access a module: layers[index]

Example code:

class FlexibleNet(nn.Module):
    def __init__(self):
        super(FlexibleNet, self).__init__()
        self.layers = nn.ModuleList([nn.Linear(10, 10)])
        
    def add_layer(self, in_features, out_features):
        self.layers.append(nn.Linear(in_features, out_features))
        
    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x
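
A usage sketch for FlexibleNet (the class is repeated so the block runs on its own). The parameters of a layer added with add_layer are registered immediately. One caveat not covered above: an optimizer created before add_layer holds only the old parameters, so it is safest to create the optimizer once the structure is final.

```python
import torch
import torch.nn as nn

class FlexibleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(10, 10)])

    def add_layer(self, in_features, out_features):
        self.layers.append(nn.Linear(in_features, out_features))

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

net = FlexibleNet()
print(len(list(net.parameters())))    # 2
net.add_layer(10, 5)                  # must accept the previous layer's 10 outputs
print(len(list(net.parameters())))    # 4
print(net(torch.randn(2, 10)).shape)  # torch.Size([2, 5])

# Create the optimizer after the structure is final so it sees every parameter
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
```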

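5. Typical use cases for ModuleList

Dynamic network structures: add or remove layers as needed
Recurrent networks: build stacked, multi-layer RNNs
Residual networks: build ResNet-style models of variable depth

Implementation example (a residual network whose depth is set by num_blocks):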

class DynamicResNet(nn.Module):
    def __init__(self, num_blocks):
        super(DynamicResNet, self).__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Linear(256, 256),
                nn.ReLU(),
                nn.Linear(256, 256)
            ) for _ in range(num_blocks)
        ])
    
    def forward(self, x):
        identity = x
        for block in self.blocks:
            out = block(x)
            x = out + identity  # residual connection
            identity = x
        return x
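
The multi-layer RNN use case from the list above can be sketched in the same way. StackedRNN below is a hypothetical class that stacks nn.RNNCell layers in a ModuleList and runs them step by step; for real projects, the built-in nn.RNN with its num_layers argument already does this.

```python
import torch
import torch.nn as nn

class StackedRNN(nn.Module):
    """Hypothetical multi-layer RNN built from nn.RNNCell layers in a ModuleList."""
    def __init__(self, input_size, hidden_size, num_layers):
        super().__init__()
        self.hidden_size = hidden_size
        self.cells = nn.ModuleList([
            nn.RNNCell(input_size if i == 0 else hidden_size, hidden_size)
            for i in range(num_layers)
        ])

    def forward(self, x):
        # x: (seq_len, batch, input_size)
        seq_len, batch, _ = x.shape
        h = [x.new_zeros(batch, self.hidden_size) for _ in self.cells]  # one state per layer
        outputs = []
        for t in range(seq_len):
            inp = x[t]
            for i, cell in enumerate(self.cells):
                h[i] = cell(inp, h[i])  # update layer i's hidden state
                inp = h[i]              # and feed it to the next layer
            outputs.append(inp)         # top-layer output at step t
        return torch.stack(outputs)     # (seq_len, batch, hidden_size)

rnn = StackedRNN(input_size=8, hidden_size=16, num_layers=3)
out = rnn(torch.randn(5, 2, 8))
print(out.shape)  # torch.Size([5, 2, 16])
```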

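6. Performance considerations for ModuleList

Keep the following in mind when using ModuleList:

Every submodule is registered, so its parameters take up memory and appear in parameters() and state_dict() even if forward never calls it
It is a good fit when the network structure needs to change often
For a fixed chain of layers, Sequential is usually the simpler choice

Used well, ModuleList lets us build more flexible, dynamic architectures, and it is particularly suited to cases where the structure is modified at runtime.

ModuleDict

ModuleDict is a PyTorch container that lets us organize and manage a network's submodules like a dictionary.

1. Basic characteristics of ModuleDict

Stores modules as key-value pairs
Supports adding and removing modules dynamically
Registers the parameters of every submodule automatically
Supports dictionary-style access

2. Creating and using a ModuleDict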

import torch.nn as nn

class DynamicNet(nn.Module):
    def __init__(self):
        super(DynamicNet, self).__init__()
        self.choices = nn.ModuleDict({
            'conv': nn.Conv2d(10, 10, 3),
            'pool': nn.MaxPool2d(3),
            'linear': nn.Linear(10, 1)
        })
    
    def forward(self, x, choice):
        return self.choices[choice](x)

# Create a model instance
model = DynamicNet()

3. Common ModuleDict operations

Add a module: choices['new_key'] = new_module
Delete a module: del choices['key']
Update modules: choices.update({'key': module})
Check whether a key exists: 'key' in choices
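
These operations in action on a small ModuleDict (the modules themselves are arbitrary). ModuleDict keeps insertion order, which the printed key list reflects:

```python
import torch.nn as nn

choices = nn.ModuleDict({
    'conv': nn.Conv2d(10, 10, 3),
    'pool': nn.MaxPool2d(3),
})

choices['linear'] = nn.Linear(10, 1)         # add
del choices['pool']                          # delete
choices.update({'act': nn.ReLU()})           # add or overwrite via update
print('pool' in choices, 'act' in choices)   # False True
print(list(choices.keys()))                  # ['conv', 'linear', 'act']
```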

4. Typical use cases for ModuleDict

The following example uses ModuleDict to choose among several activation functions:

class MultiActivationNet(nn.Module):
    def __init__(self):
        super(MultiActivationNet, self).__init__()
        self.linear = nn.Linear(10, 20)
        self.activations = nn.ModuleDict({
            'relu': nn.ReLU(),
            'leaky_relu': nn.LeakyReLU(),
            'sigmoid': nn.Sigmoid(),
            'tanh': nn.Tanh()
        })
    
    def forward(self, x, activation='relu'):
        x = self.linear(x)
        x = self.activations[activation](x)
        return x

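5. ModuleDict vs. a plain dict

The main differences between ModuleDict and a plain Python dict:

Feature	ModuleDict	Python dict
Parameter management	Registered with the model automatically	Parameters are not registered
Device transfer	All modules are moved automatically	Must be handled manually
Saving state	Included in state_dict	Not saved in state_dict

6. Advanced uses of ModuleDict

A dynamic feature extractor: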

class DynamicFeatureExtractor(nn.Module):
    def __init__(self):
        super(DynamicFeatureExtractor, self).__init__()
        self.extractors = nn.ModuleDict({
            'cnn': nn.Sequential(
                nn.Conv2d(3, 16, 3),
                nn.ReLU(),
                nn.MaxPool2d(2)
            ),
            'transformer': nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model=256, nhead=8),
                num_layers=3
            ),
            'mlp': nn.Sequential(
                nn.Linear(256, 128),
                nn.ReLU(),
                nn.Linear(128, 64)
            )
        })
    
    def add_extractor(self, name, module):
        self.extractors[name] = module
    
    def forward(self, x, extractor_type):
        return self.extractors[extractor_type](x)
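
A usage sketch for DynamicFeatureExtractor (the class is repeated so the block runs standalone). Each entry expects a differently shaped input; the shapes below are assumptions chosen to fit each module, with the transformer using its default batch_first=False layout of (sequence, batch, d_model). nn.Identity serves only as a placeholder for a newly added extractor.

```python
import torch
import torch.nn as nn

class DynamicFeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        self.extractors = nn.ModuleDict({
            'cnn': nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.MaxPool2d(2)),
            'transformer': nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model=256, nhead=8), num_layers=3),
            'mlp': nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 64)),
        })

    def add_extractor(self, name, module):
        self.extractors[name] = module

    def forward(self, x, extractor_type):
        return self.extractors[extractor_type](x)

model = DynamicFeatureExtractor().eval()  # eval() turns off dropout in the transformer
with torch.no_grad():
    cnn_out = model(torch.randn(1, 3, 32, 32), 'cnn')       # (batch, channels, H, W)
    seq_out = model(torch.randn(5, 2, 256), 'transformer')  # (seq, batch, d_model)
    mlp_out = model(torch.randn(4, 256), 'mlp')             # (batch, features)
print(cnn_out.shape)  # torch.Size([1, 16, 15, 15])
print(seq_out.shape)  # torch.Size([5, 2, 256])
print(mlp_out.shape)  # torch.Size([4, 64])

model.add_extractor('identity', nn.Identity())  # register a new branch at runtime
print('identity' in model.extractors)           # True
```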

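7. Best practices and caveats

Use meaningful key names to keep the code maintainable
Check whether a key already exists before adding a module, since assigning to an existing key silently replaces it
ModuleDict does not document a get() method, so check with 'key' in module_dict before indexing (or handle the KeyError)
Remove modules that are no longer used, since registered modules keep their parameters in memory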
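
For the safe-access point above, a tiny helper is enough. get_module is a hypothetical name, not part of PyTorch; it simply combines the in check with indexing:

```python
import torch.nn as nn

def get_module(module_dict, key, default=None):
    """Hypothetical helper: dict.get-style lookup for an nn.ModuleDict."""
    return module_dict[key] if key in module_dict else default

acts = nn.ModuleDict({'relu': nn.ReLU()})
print(type(get_module(acts, 'relu')).__name__)  # ReLU
print(get_module(acts, 'gelu', nn.Identity()))  # Identity()
```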

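Used well, ModuleDict lets us build more flexible and modular architectures, and it is particularly suited to cases where different modules must be swapped in dynamically.

ParameterList

ParameterList is a PyTorch container for managing model parameters; it lets us store and manipulate learnable parameters as a list.

1. Basic characteristics of ParameterList

Stores and manages parameters of type nn.Parameter
Supports indexing and iteration
Registers its parameters with the model automatically
The parameters take part in backpropagation

2. Creating and using a ParameterList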

import torch
import torch.nn as nn

class CustomModel(nn.Module):
    def __init__(self, num_layers):
        super(CustomModel, self).__init__()
        # Create the parameter list
        self.weights = nn.ParameterList([
            nn.Parameter(torch.randn(10, 10)) for _ in range(num_layers)
        ])
    
    def forward(self, x):
        for weight in self.weights:
            x = torch.mm(x, weight)
        return x

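# Create a model instance
model = CustomModel(num_layers=3)

3. Common ParameterList operations

Append a parameter: weights.append(nn.Parameter(torch.randn(10, 10)))
Access a parameter: weights[index]
Get the number of parameters: len(weights)
Iterate over the parameters: for weight in weights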
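
The operations above on a standalone ParameterList. Assigning the list to an attribute of a module (here a bare nn.Module used as a holder, purely for illustration) is what registers its entries with that module:

```python
import torch
import torch.nn as nn

weights = nn.ParameterList([nn.Parameter(torch.randn(10, 10)) for _ in range(2)])

weights.append(nn.Parameter(torch.randn(10, 10)))  # append a parameter
print(len(weights))                                # 3
print(weights[0].shape)                            # torch.Size([10, 10])
for weight in weights:                             # iterate over the parameters
    print(weight.requires_grad)                    # True (printed three times)

holder = nn.Module()      # bare Module used only as a container for this demo
holder.weights = weights  # assigning the ParameterList registers its entries
print(len(list(holder.parameters())))  # 3
```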

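4. ParameterList vs. a plain list of Parameters

Feature	ParameterList	Plain Python list of Parameters
Parameter registration	Automatic	Must be registered one by one (e.g., with register_parameter)
Dynamic modification	Supported (append, extend)	Possible, but every new parameter must also be registered by hand
Device transfer and state_dict	Handled automatically	Unregistered parameters are neither moved nor saved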
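
A sketch comparing the three options in the table (class names are hypothetical): a plain list registers nothing, manual register_parameter calls work but must be repeated for every parameter, and ParameterList does the registration for you.

```python
import torch
import torch.nn as nn

class PlainList(nn.Module):
    def __init__(self):
        super().__init__()
        # A plain Python list: these Parameters are NOT registered
        self.weights = [nn.Parameter(torch.randn(10, 10)) for _ in range(3)]

class ManualRegister(nn.Module):
    def __init__(self):
        super().__init__()
        self.weights = []
        for i in range(3):
            w = nn.Parameter(torch.randn(10, 10))
            self.register_parameter(f"weight_{i}", w)  # manual registration, one by one
            self.weights.append(w)                     # list kept only for iteration

class WithParameterList(nn.Module):
    def __init__(self):
        super().__init__()
        self.weights = nn.ParameterList([nn.Parameter(torch.randn(10, 10)) for _ in range(3)])

for cls in (PlainList, ManualRegister, WithParameterList):
    print(cls.__name__, len(list(cls().parameters())))
# PlainList 0
# ManualRegister 3
# WithParameterList 3
```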

5. A practical example

class DynamicLinear(nn.Module):
    def __init__(self, input_size, hidden_sizes):
        super(DynamicLinear, self).__init__()
        sizes = [input_size] + hidden_sizes
        self.weights = nn.ParameterList([
            nn.Parameter(torch.randn(sizes[i], sizes[i+1]))
            for i in range(len(sizes)-1)
        ])
        self.biases = nn.ParameterList([
            nn.Parameter(torch.zeros(sizes[i+1]))
            for i in range(len(sizes)-1)
        ])
    
    def forward(self, x):
        for w, b in zip(self.weights, self.biases):
            x = torch.mm(x, w) + b
            x = torch.relu(x)
        return x

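# Usage example
model = DynamicLinear(input_size=10, hidden_sizes=[20, 15, 5])

This is a dynamic linear network implemented in PyTorch. The code works as follows:

1. Class definition and initialization

DynamicLinear inherits from nn.Module and builds a network of linear layers whose sizes are determined by its arguments
Its arguments are:
- input_size: the dimension of the input features
- hidden_sizes: a list giving the output dimension of each layer

2. Creating the parameters

Two ParameterLists are created: weights and biases
weights holds the transformation matrix of each layer, initialized randomly with torch.randn
biases holds the bias vector of each layer, initialized to zeros

3. Forward pass

forward uses zip to iterate over the weights and biases together
For each layer it:
- performs a matrix multiplication (torch.mm)
- adds the bias
- applies the ReLU activation

4. Usage example

The example builds a network with input dimension 10 and three layers whose output dimensions are 20, 15 and 5; the last one (5) is the output dimension, and ReLU is applied after it as well

The advantage of this approach is that networks with different numbers of layers and sizes can be created flexibly, without defining each layer by hand.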
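
To confirm the shapes described above, the sketch below repeats the class and inspects it. The parameter count is simple arithmetic: 10×20 + 20 + 20×15 + 15 + 15×5 + 5 = 615.

```python
import torch
import torch.nn as nn

class DynamicLinear(nn.Module):
    def __init__(self, input_size, hidden_sizes):
        super().__init__()
        sizes = [input_size] + hidden_sizes
        self.weights = nn.ParameterList([
            nn.Parameter(torch.randn(sizes[i], sizes[i + 1])) for i in range(len(sizes) - 1)])
        self.biases = nn.ParameterList([
            nn.Parameter(torch.zeros(sizes[i + 1])) for i in range(len(sizes) - 1)])

    def forward(self, x):
        for w, b in zip(self.weights, self.biases):
            x = torch.relu(torch.mm(x, w) + b)  # linear transform, bias, ReLU
        return x

model = DynamicLinear(input_size=10, hidden_sizes=[20, 15, 5])
print([tuple(w.shape) for w in model.weights])     # [(10, 20), (20, 15), (15, 5)]
print(model(torch.randn(8, 10)).shape)             # torch.Size([8, 5])
print(sum(p.numel() for p in model.parameters()))  # 615
```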

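6. Notes and best practices

Make sure each parameter you add has the right shape for the layers around it
Pay attention to how parameters are initialized
Keep the number of parameters under control to avoid running out of memory
Use ParameterList when parameters need to be adjusted dynamically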
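
On initialization: torch.randn draws weights with unit variance, which can make activations grow from layer to layer. A common alternative, swapped in here and not used in the original code, is to create empty tensors and fill them with a torch.nn.init function such as Xavier uniform:

```python
import math

import torch
import torch.nn as nn

sizes = [10, 20, 15, 5]
weights = nn.ParameterList([
    nn.Parameter(torch.empty(sizes[i], sizes[i + 1])) for i in range(len(sizes) - 1)
])
for w in weights:
    # In-place fill from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)); the bound is
    # symmetric in fan_in and fan_out, so the (in, out) layout used with torch.mm is fine
    nn.init.xavier_uniform_(w)

for w in weights:
    bound = math.sqrt(6 / (w.shape[0] + w.shape[1]))
    print(w.abs().max().item() <= bound + 1e-6)  # True
```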

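ParameterList is especially useful when model parameters must be managed dynamically or when implementing custom network structures: it offers flexible parameter management while ensuring that the parameters are registered correctly and receive gradients.

ParameterDict

ParameterDict is PyTorch's dictionary-style container for model parameters; it lets us store and access parameters as key-value pairs.

1. Basic characteristics of ParameterDict

Stores parameters of type nn.Parameter in dictionary form
Supports key-value access and dynamic updates
Registers its parameters with the model automatically
The parameters take part in backpropagation automatically

2. Creating and using a ParameterDict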

import torch
import torch.nn as nn

class LayerSelect(nn.Module):
    def __init__(self):
        super(LayerSelect, self).__init__()
        # Create the parameter dictionary
        self.params = nn.ParameterDict({
            'weight1': nn.Parameter(torch.randn(10, 5)),
            'weight2': nn.Parameter(torch.randn(5, 3)),
            'bias1': nn.Parameter(torch.zeros(5)),
            'bias2': nn.Parameter(torch.zeros(3))
        })
    
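    def forward(self, x, layer_name):
        # Only the 2-D weight entries can be used with torch.mm; bias entries are 1-D
        if layer_name in self.params and self.params[layer_name].dim() == 2:
            return torch.mm(x, self.params[layer_name])
        return x

# Create a model instance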
model = LayerSelect()

3. Common ParameterDict operations

Add a parameter: params['new_key'] = nn.Parameter(torch.randn(size))
Delete a parameter: del params['key']
Update parameters: params.update({'key': nn.Parameter(torch.randn(size))})
Check whether a key exists: 'key' in params
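
The same operations on a small ParameterDict (keys and shapes are arbitrary):

```python
import torch
import torch.nn as nn

params = nn.ParameterDict({'weight1': nn.Parameter(torch.randn(10, 5))})

params['bias1'] = nn.Parameter(torch.zeros(5))               # add
params.update({'weight2': nn.Parameter(torch.randn(5, 3))})  # add or overwrite
del params['bias1']                                          # delete
print('bias1' in params, 'weight2' in params)                # False True
print(list(params.keys()))                                   # ['weight1', 'weight2']
```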

4. A practical example

class DynamicNetwork(nn.Module):
    def __init__(self):
        super(DynamicNetwork, self).__init__()
        self.layer_params = nn.ParameterDict({
            'input_layer': nn.Parameter(torch.randn(784, 256)),
            'hidden_layer': nn.Parameter(torch.randn(256, 128)),
            'output_layer': nn.Parameter(torch.randn(128, 10))
        })
        
        self.layer_biases = nn.ParameterDict({
            'input_bias': nn.Parameter(torch.zeros(256)),
            'hidden_bias': nn.Parameter(torch.zeros(128)),
            'output_bias': nn.Parameter(torch.zeros(10))
        })
    
    def add_layer(self, name, input_size, output_size):
        self.layer_params[name] = nn.Parameter(torch.randn(input_size, output_size))
        self.layer_biases[f"{name}_bias"] = nn.Parameter(torch.zeros(output_size))
    
    def forward(self, x):
        x = torch.mm(x, self.layer_params['input_layer']) + self.layer_biases['input_bias']
        x = torch.relu(x)
        x = torch.mm(x, self.layer_params['hidden_layer']) + self.layer_biases['hidden_bias']
        x = torch.relu(x)
        x = torch.mm(x, self.layer_params['output_layer']) + self.layer_biases['output_bias']
        return x

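This code implements a dynamic neural network class, DynamicNetwork. Its structure and behavior are as follows:

1. Initialization

Two ParameterDicts store the network's parameters:
- layer_params: the weight matrix of each layer
- layer_biases: the bias vector of each layer
The default network has three layers:
- Input layer: 784→256
- Hidden layer: 256→128
- Output layer: 128→10

2. Adding layers dynamically

The add_layer method registers the parameters of a new layer
It creates a randomly initialized weight matrix and a zero-initialized bias vector for the new layer
Note that forward is hard-coded to the three default layers, so a layer added this way is registered (it appears in parameters() and state_dict()) but is not used and receives no gradient until forward is changed to use it

3. Forward pass

The forward pass consists of three main steps:
- Input layer: linear transform + bias + ReLU activation
- Hidden layer: linear transform + bias + ReLU activation
- Output layer: linear transform + bias

4. Characteristics

ParameterDict makes parameter management more flexible
New layers' parameters can be added dynamically (forward must also be written to use them)
Parameter registration is handled automatically, and gradients are computed for every parameter used in forward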
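
As noted above, DynamicNetwork's forward ignores layers added later. One possible fix, sketched here as a hypothetical variant (OrderedParamNet), is to let forward walk the ParameterDicts in insertion order so that add_layer really extends the network. It assumes each added layer's input size matches the previous layer's output size, and it applies no ReLU after the last layer.

```python
import torch
import torch.nn as nn

class OrderedParamNet(nn.Module):
    """Hypothetical variant: forward uses every layer in insertion order."""
    def __init__(self, sizes):
        super().__init__()
        self.weights = nn.ParameterDict()
        self.biases = nn.ParameterDict()
        for i in range(len(sizes) - 1):
            self.add_layer(f"layer{i}", sizes[i], sizes[i + 1])

    def add_layer(self, name, input_size, output_size):
        # Small random weights and zero biases, stored under the same key
        self.weights[name] = nn.Parameter(torch.randn(input_size, output_size) * 0.1)
        self.biases[name] = nn.Parameter(torch.zeros(output_size))

    def forward(self, x):
        names = list(self.weights.keys())  # ParameterDict preserves insertion order
        for i, name in enumerate(names):
            x = torch.mm(x, self.weights[name]) + self.biases[name]
            if i < len(names) - 1:
                x = torch.relu(x)          # no activation after the last layer
        return x

net = OrderedParamNet([784, 256, 128, 10])
print(net(torch.randn(2, 784)).shape)  # torch.Size([2, 10])
net.add_layer('extra', 10, 3)          # now actually used by forward
print(net(torch.randn(2, 784)).shape)  # torch.Size([2, 3])
print(len(list(net.parameters())))     # 8
```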

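5. ParameterDict vs. a plain dict

Feature	ParameterDict	Python dict
Parameter registration	Registered with the model automatically	Parameters are not registered
Optimizer visibility	Returned by model.parameters(), so the optimizer updates them	Not returned by model.parameters(); gradients are still computed if the parameter is used, but the optimizer never sees it
Device transfer	Parameters are moved automatically	Must be handled manually
Saving state	Included in state_dict	Not saved in state_dict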
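
The optimizer-visibility row can be checked directly. In this sketch (class names are hypothetical), the Parameter stored in a plain dict still receives a gradient, but model.parameters() and state_dict() do not contain it:

```python
import torch
import torch.nn as nn

class PlainDict(nn.Module):
    def __init__(self):
        super().__init__()
        self.params = {'w': nn.Parameter(torch.randn(4, 2))}  # plain dict: not registered

    def forward(self, x):
        return x @ self.params['w']

class WithParameterDict(nn.Module):
    def __init__(self):
        super().__init__()
        self.params = nn.ParameterDict({'w': nn.Parameter(torch.randn(4, 2))})

    def forward(self, x):
        return x @ self.params['w']

a, b = PlainDict(), WithParameterDict()
a(torch.randn(3, 4)).sum().backward()
b(torch.randn(3, 4)).sum().backward()

print(a.params['w'].grad is not None, b.params['w'].grad is not None)  # True True
print(len(list(a.parameters())), len(list(b.parameters())))            # 0 1
print(list(a.state_dict().keys()), list(b.state_dict().keys()))        # [] ['params.w']
```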