Build Your First Neural Network with PyTorch

summer_8102

Published 2024-03-31 15:14:54

Tags: python, pytorch, neural networks

Article link: https://blog.csdn.net/qq_41715032/article/details/137195861

Contents

1. Defining a neural network with nn.Sequential
2. Custom neural networks
3. Parameter management
4. Built-in initialization

1. Defining a neural network with nn.Sequential

import torch 
from torch import nn
from torch.nn import functional as F

net = nn.Sequential(nn.Linear(20,256),nn.ReLU(),nn.Linear(256,10))
x = torch.rand(2,20)
net(x)

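In the example above, we first build a simple neural network net, then generate a random input tensor x, and finally pass x through the network by calling net(x). The steps are as follows: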

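1.1 Building the network:

The network net is built with nn.Sequential, which contains the following layers in order:

① nn.Linear(20, 256): the first linear (fully connected) layer. It takes a 20-dimensional input vector and outputs a 256-dimensional vector.

② nn.ReLU(): a ReLU activation applied to the output of the first linear layer. The nonlinearity lets the network learn more complex mappings than a purely linear model could.

③ nn.Linear(256, 10): the second linear layer, which maps the 256-dimensional ReLU output to a 10-dimensional output vector. A 10-dimensional output like this is typically used for classification, with each dimension holding the score for one class.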

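1.2 Generating the input tensor:

torch.rand(2, 20) creates a tensor x of shape (2, 20), that is, two 20-dimensional vectors whose elements are drawn from a uniform distribution on [0, 1). The tensor can be viewed as a batch of two samples, each a 20-dimensional input vector.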
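As a quick sanity check of the description above, the following minimal sketch inspects the shape and value range of such a tensor. It uses only torch.rand and basic tensor methods; the variable name in_range is just illustrative.

```python
import torch

# A batch of 2 samples with 20 features each, drawn uniformly from [0, 1)
x = torch.rand(2, 20)

print(x.shape)  # torch.Size([2, 20])

# Every element should satisfy 0 <= value < 1
in_range = bool((x >= 0).all() and (x < 1).all())
print(in_range)
```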

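1.3 Passing data through the network:

When net(x) is executed, the input tensor x flows through the layers in the order net defines them:

First, x passes through the first linear layer, giving a tensor of shape (2, 256), that is, two 256-dimensional vectors.
Next, this (2, 256) tensor passes through the ReLU activation. The shape stays (2, 256), but every negative value is set to 0.
Finally, the activated tensor passes through the second linear layer and becomes a tensor of shape (2, 10): the outputs for the two samples, each with scores for 10 classes.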
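The three steps above can be traced layer by layer. The sketch below rebuilds the same network and records the output shape after each layer; the names h, shapes, and relu_out are illustrative and not part of the original code.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
x = torch.rand(2, 20)

# Feed the input through one layer at a time and record each output shape
h = x
shapes = []
for layer in net:
    h = layer(h)
    shapes.append(tuple(h.shape))
    print(type(layer).__name__, tuple(h.shape))

# The output of the ReLU step contains no negative values
relu_out = net[1](net[0](x))
print(bool((relu_out >= 0).all()))

# Stepping through the layers manually matches calling net(x) directly
print(torch.allclose(h, net(x)))
```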

2. Custom neural networks

nn.Sequential is itself a special kind of Module. Here we define a custom MLP that performs the same computation as the nn.Sequential network above.

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20,256)
        self.out = nn.Linear(256,10)
        
    def forward(self, x):
        return self.out(F.relu(self.hidden(x)))

net = MLP()
net(x)
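One way to convince yourself that the custom MLP and the nn.Sequential version compute the same function is to copy the weights from one into the other and compare outputs. This is a minimal sketch; the names mlp, seq, and same are illustrative.

```python
import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.out = nn.Linear(256, 10)

    def forward(self, x):
        return self.out(F.relu(self.hidden(x)))

mlp = MLP()
seq = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))

# Copy the MLP's weights into the matching Sequential layers
seq[0].load_state_dict(mlp.hidden.state_dict())
seq[2].load_state_dict(mlp.out.state_dict())

x = torch.rand(2, 20)
same = torch.allclose(mlp(x), seq(x))
print(same)
```

With identical weights the two networks should agree on any input, since both apply linear, ReLU, linear in the same order.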

Another approach is to write our own version of nn.Sequential:

class MySequential(nn.Module):
    def __init__(self, *args):
        super(MySequential, self).__init__()
        # Give each module a name and add it to the current module
        for idx, module in enumerate(args):
            # add_module registers the submodule so its parameters are tracked
            self.add_module(str(idx), module)
    
    def forward(self, x):
        # Pass the input through every module in order
        for module in self._modules.values():
            x = module(x)
        return x

# Test the custom MySequential
net = MySequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
x = torch.rand(2, 20)  # create the input tensor

output = net(x)  # forward pass
print(output.shape)  # the output shape should be [2, 10]
print(output)
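The reason add_module matters is registration: only registered submodules contribute to parameters(). The sketch below contrasts MySequential with a hypothetical Unregistered class (not from the original post) that stores its layers in a plain Python list.

```python
import torch
from torch import nn

class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            self.add_module(str(idx), module)

    def forward(self, x):
        for module in self._modules.values():
            x = module(x)
        return x

class Unregistered(nn.Module):
    """Hypothetical counter-example: a plain list does not register submodules."""
    def __init__(self, *args):
        super().__init__()
        self.layers = list(args)  # plain list, invisible to nn.Module

    def forward(self, x):
        for module in self.layers:
            x = module(x)
        return x

net = MySequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
names = [name for name, _ in net.named_parameters()]
print(names)  # ['0.weight', '0.bias', '2.weight', '2.bias']

bad = Unregistered(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
n_unreg = len(list(bad.parameters()))
print(n_unreg)  # 0: an optimizer would find nothing to update
```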

3. Parameter management

Suppose we have defined net with nn.Sequential and want to get the parameters of net[2]. How do we do that?

net = nn.Sequential(nn.Linear(20,256),nn.ReLU(),nn.Linear(256,10))
x = torch.rand(2,20)
net(x)
print(net[2].state_dict())

The result is:

OrderedDict([('weight', tensor([[-0.0282, -0.0167, -0.0511, ..., 0.0577, -0.0041, -0.0491],
[-0.0166, -0.0429, 0.0390, ..., 0.0090, 0.0607, -0.0525],
[-0.0106, 0.0355, 0.0616, ..., 0.0281, 0.0251, 0.0412],
...,
[ 0.0369, 0.0303, 0.0138, ..., -0.0400, 0.0557, -0.0274],
[ 0.0587, 0.0082, -0.0232, ..., -0.0490, -0.0268, -0.0015],
[-0.0077, -0.0336, 0.0506, ..., 0.0259, -0.0537, -0.0339]])), ('bias', tensor([-0.0415, -0.0138, -0.0404, -0.0211, -0.0504, 0.0538, -0.0407, -0.0125,
0.0438, 0.0278]))])

It has two parameters: weight and bias.
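Since the full tensors are long, it can be easier to look only at the keys and shapes of the state dict. A minimal sketch, with summary as an illustrative name:

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))

# state_dict() maps parameter names to tensors; keep only the shapes
sd = net[2].state_dict()
summary = {name: tuple(tensor.shape) for name, tensor in sd.items()}
print(summary)  # {'weight': (10, 256), 'bias': (10,)}
```

The weight has shape (out_features, in_features), which is why the last layer's weight is (10, 256) rather than (256, 10).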

We can also access each layer's parameters directly.

print(type(net[2].bias))  # print its type
print(net[2].bias)
print(net[2].bias.data)

The printed result is:

<class 'torch.nn.parameter.Parameter'>
Parameter containing:
tensor([-0.0415, -0.0138, -0.0404, -0.0211, -0.0504, 0.0538, -0.0407, -0.0125,
0.0438, 0.0278], requires_grad=True)
tensor([-0.0415, -0.0138, -0.0404, -0.0211, -0.0504, 0.0538, -0.0407, -0.0125,
0.0438, 0.0278])

Because we have not run a backward pass yet, grad is None:

net[2].weight.grad is None

The result is: True
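To see the gradient get populated, one can run a backward pass on any scalar derived from the output. The sketch below uses the sum of the outputs as a stand-in loss; that choice is an assumption made for illustration, not something from the original post.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
x = torch.rand(2, 20)

before = net[2].weight.grad is None  # no backward pass yet
print(before)

# Stand-in scalar "loss": the sum of all outputs (illustrative only)
net(x).sum().backward()

after = net[2].weight.grad
print(after.shape)  # torch.Size([10, 256])

# Each output bias contributes once per sample, so its gradient is the batch size
print(net[2].bias.grad)
```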

To pull out the parameters of one layer, or of the whole network, we can do this:

print(*[(name, param.shape) for name, param in net[0].named_parameters()])
print(*[(name, param.shape) for name, param in net.named_parameters()])

The network has two fully connected layers, at index 0 and index 2. Index 1 is the ReLU, which has no parameters, so nothing is listed for it.
('weight', torch.Size([256, 20])) ('bias', torch.Size([256]))
('0.weight', torch.Size([256, 20])) ('0.bias', torch.Size([256])) ('2.weight', torch.Size([10, 256])) ('2.bias', torch.Size([10]))
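The shapes above also give the total parameter count. A small sketch that sums numel() over named_parameters(); counts and total are illustrative names.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))

# Number of elements in each named parameter
counts = {name: p.numel() for name, p in net.named_parameters()}
print(counts)

# 256*20 + 256 + 10*256 + 10 = 7946
total = sum(counts.values())
print(total)
```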

4. Built-in initialization

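For each fully connected layer, we initialize its weight from a normal distribution with mean 0 and standard deviation 0.01 (that is, variance 0.0001), and set its bias to 0.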

def init_normal(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight,mean=0,std=0.01)
        nn.init.zeros_(m.bias)

net.apply(init_normal)
net[0].weight.data[0], net[0].bias.data[0]
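A rough way to check that the initialization took effect is to look at the empirical statistics of the weights. In this sketch w_mean and w_std are illustrative names; the values vary from run to run but should land close to 0 and 0.01, since std in nn.init.normal_ is a standard deviation.

```python
import torch
from torch import nn

def init_normal(m):
    # Only touch fully connected layers; ReLU has nothing to initialize
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, mean=0, std=0.01)
        nn.init.zeros_(m.bias)

net = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
net.apply(init_normal)  # apply() visits every submodule recursively

w = net[0].weight.data
w_mean, w_std = w.mean().item(), w.std().item()
print(w_mean, w_std)

# All biases should be exactly zero
print(bool((net[0].bias.data == 0).all()), bool((net[2].bias.data == 0).all()))
```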