Deep Learning with PyTorch

兔子牙丫丫

Last modified: 2023-12-11 15:28:12


Category: Deep Learning. Tags: pytorch, neural networks, artificial intelligence

First published: 2023-12-08 17:09:43

Article link: https://blog.csdn.net/qq_43570025/article/details/134883130


Building Custom Layers

Layers Without Parameters

The CenteredLayer class below subtracts the mean from its input.

import torch
import torch.nn.functional as F
from torch import nn

# We only need to subclass the base layer class nn.Module and implement the forward pass.
class CenteredLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, X):
        return X - X.mean()

# Feed some data through the layer to verify that it works as intended.

layer = CenteredLayer()
layer(torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0]))

net = nn.Sequential(nn.Linear(8, 128), CenteredLayer())

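# As an additional sanity check, we can send random data through the network and check that the mean is 0.
# Because we are working with floating-point numbers, limited precision means we may still see a very small nonzero value.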


Y = net(torch.rand(4, 8))
Y.mean()

A Linear Layer with Parameters

class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        # nn.Parameter registers the tensors so that they show up in parameters()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))

    def forward(self, X):
        # Use the parameters directly (not .data) so that autograd can track gradients
        linear = torch.matmul(X, self.weight) + self.bias
        return F.relu(linear)

linear = MyLinear(5, 3)
linear.weight

linear(torch.rand(2, 5))


net = nn.Sequential(MyLinear(64, 8), MyLinear(8, 1))
net(torch.rand(2, 64))
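
To see what `nn.Parameter` provides, the sketch below redefines the layer so that it stands alone, lists its registered parameters, and runs a backward pass. The names `param_shapes` and `has_grad` are illustrative, and the forward pass here uses the parameters themselves so that gradients can flow to them.

```python
import torch
import torch.nn.functional as F
from torch import nn

class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units))

    def forward(self, X):
        # Using the parameters themselves keeps the operation in the autograd graph.
        return F.relu(torch.matmul(X, self.weight) + self.bias)

layer = MyLinear(5, 3)

# Tensors wrapped in nn.Parameter are discovered automatically by the module.
param_shapes = {name: tuple(p.shape) for name, p in layer.named_parameters()}
print(param_shapes)

# After backward(), every registered parameter holds a gradient.
out = layer(torch.rand(2, 5))
out.sum().backward()
has_grad = layer.weight.grad is not None and layer.bias.grad is not None
print(has_grad)
```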

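Saving and Loading Tensors and Network Models

Saving and Loading Tensors

For a single tensor, we can call the `save` and `load` functions directly to write and read it.
Both functions require a file name, and `save` additionally takes the variable to be saved as input.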

import torch
from torch import nn
from torch.nn import functional as F

x = torch.arange(4)
torch.save(x, 'x-file')

# We can now read the data stored in the file back into memory.


x2 = torch.load('x-file')
x2

# We can store a list of tensors and then read them back into memory.


y = torch.zeros(4)
torch.save([x, y],'x-files')
x2, y2 = torch.load('x-files')
(x2, y2)

# We can even write and read a dictionary that maps strings to tensors. This is convenient when we want to read or write all the weights of a model.

mydict = {'x': x, 'y': y}
torch.save(mydict, 'mydict')
mydict2 = torch.load('mydict')
mydict2
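
As a hedged sketch of the same round trip, the snippet below writes the dictionary into a temporary directory (an assumption made here so that no files are left behind) and verifies that the restored tensors match the originals. The names `restored`, `same_keys`, and `same_values` are illustrative.

```python
import os
import tempfile
import torch

x = torch.arange(4)
y = torch.zeros(4)

# Save and reload inside a temporary directory that is removed afterwards.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'mydict')
    torch.save({'x': x, 'y': y}, path)
    restored = torch.load(path)

# torch.equal checks that shape and every element agree.
same_keys = sorted(restored) == ['x', 'y']
same_values = torch.equal(restored['x'], x) and torch.equal(restored['y'], y)
print(same_keys, same_values)
```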

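Saving and Loading Network Models

Saving individual weight vectors (or other tensors) is useful, but it becomes tedious if we want to save an entire model and load it later. After all, we may have hundreds of parameters scattered throughout the model. For this reason, deep learning frameworks provide built-in functions to save and load entire networks. An important detail is that this saves the model's parameters, not the entire model. For example, if we have a 3-layer multilayer perceptron, we need to specify the architecture separately. Because a model can contain arbitrary code, the model itself is hard to serialize. Therefore, to restore a model, we need to generate the architecture in code and then load the parameters from disk. Let's start with the familiar multilayer perceptron.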

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.output = nn.Linear(256, 10)

    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))

net = MLP()
X = torch.randn(size=(2, 20))
Y = net(X)

# Next, we store the model's parameters in a file named "mlp.params".
torch.save(net.state_dict(), 'mlp.params')

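To restore the model, we instantiate a copy of the original multilayer perceptron.
Instead of randomly initializing the model parameters, we read the parameters stored in the file directly.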

clone = MLP()
clone.load_state_dict(torch.load('mlp.params'))
clone.eval()

Because both instances have the same model parameters, they should produce the same result for the same input `X`. Let's verify this.

Y_clone = clone(X)
Y_clone == Y
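
The whole save-and-restore cycle can be sketched in one self-contained snippet. Writing to a temporary directory and reducing the comparison to a single boolean with `torch.equal` are choices made for this example, not part of the original listing; `state_keys` and `outputs_match` are illustrative names.

```python
import os
import tempfile
import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.output = nn.Linear(256, 10)

    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))

net = MLP()
X = torch.randn(2, 20)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'mlp.params')
    torch.save(net.state_dict(), path)        # only the parameters are written
    clone = MLP()                             # the architecture comes from code
    clone.load_state_dict(torch.load(path))
clone.eval()

# The state dict maps parameter names to tensors.
state_keys = list(net.state_dict().keys())
print(state_keys)

with torch.no_grad():
    outputs_match = torch.equal(net(X), clone(X))
print(outputs_match)
```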

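Network Parameter Initialization

Uniform Distribution

torch.nn.init.uniform_(tensor, a=0.0, b=1.0)

Fills the input tensor in place with values drawn from the uniform distribution U(a, b) and returns it.
Parameters

tensor – an n-dimensional tensor
a – the lower bound of the uniform distribution
b – the upper bound of the uniform distribution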
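
A minimal sketch of `uniform_` in use, with a = -0.5 and b = 0.5 chosen arbitrarily for illustration. It checks that every value lies between the two bounds and that the function returns the tensor it modified.

```python
import torch
from torch import nn

w = torch.empty(3, 5)

# uniform_ fills w in place and also returns it.
returned = nn.init.uniform_(w, a=-0.5, b=0.5)

in_range = bool((w >= -0.5).all() and (w <= 0.5).all())
shares_storage = returned.data_ptr() == w.data_ptr()
print(in_range, shares_storage)
```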

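Normal Distribution

torch.nn.init.normal_(tensor, mean=0.0, std=1.0)

Fills the input tensor with values drawn from the normal distribution $N(\text{mean}, \text{std}^2)$ with the given mean and standard deviation.
Parameters

tensor – an n-dimensional tensor
mean – the mean of the normal distribution
std – the standard deviation of the normal distribution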

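Constant

torch.nn.init.constant_(tensor, val)

Fills the tensor with a fixed value.

Parameters

tensor – an n-dimensional tensor
val – the value to fill the tensor with

Similarly, torch.nn.init.ones_(tensor) and torch.nn.init.zeros_(tensor) fill the tensor with 1 and 0 respectively.

Initializing with the Constant 1

torch.nn.init.ones_(tensor)

Initializing with the Constant 0

torch.nn.init.zeros_(tensor)

Initializing with the Identity Matrix

torch.nn.init.eye_(tensor)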
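
The deterministic initializers can be compared side by side. In this sketch the fill value 0.5 and the tensor shapes are arbitrary choices; note that `eye_` is applied to a 2-dimensional tensor, which is what it expects.

```python
import torch
from torch import nn

const = nn.init.constant_(torch.empty(2, 3), 0.5)   # every element becomes 0.5
ones = nn.init.ones_(torch.empty(2, 3))             # every element becomes 1
zeros = nn.init.zeros_(torch.empty(2, 3))           # every element becomes 0
eye = nn.init.eye_(torch.empty(3, 3))               # identity matrix

print(const)
print(eye)
```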

xavier_uniform

torch.nn.init.xavier_uniform_(tensor, gain=1.0)

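Xavier initialization is a widely used weight initialization method designed to mitigate vanishing and exploding gradients when training deep neural networks. It initializes the weights according to the layer's input and output dimensions so that signals keep a roughly constant variance during both forward and backward propagation.

In Xavier initialization, the range of the initial weights is determined jointly by the input and output dimensions. Specifically, the weights are drawn so that their variance equals 2 / (fan_in + fan_out) when gain = 1, which keeps signals from shrinking or growing excessively as they pass through the network.

nn.init.xavier_uniform_ is one implementation of Xavier initialization. It modifies the given tensor in place, filling it with random values from a uniform distribution over ±sqrt(6 / (fan_in + fan_out)) when gain = 1, where fan_in and fan_out are the input and output dimensions of the tensor.

Following the method described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010), the input tensor is filled with values sampled from the uniform distribution U(−a, a), where a is given by

$a = \text{gain} \times \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}}$

Parameters

tensor – an n-dimensional tensor
gain – an optional scaling factor

The gain can be obtained from torch.nn.init.calculate_gain(nonlinearity, param=None), which is essentially a table lookup: 1 for linear, convolutional, and sigmoid layers, 5/3 for tanh, sqrt(2) for relu, sqrt(2 / (1 + negative_slope^2)) for leaky_relu, and 3/4 for selu.

The following example shows how to use nn.init.xavier_uniform_: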

import torch
import torch.nn as nn

# Create an uninitialized tensor of shape (3, 3)
weight = torch.empty(3, 3)

# Initialize the weights with Xavier initialization
nn.init.xavier_uniform_(weight)

# Print the initialized weights
print(weight)

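nn.init.xavier_uniform_ is meant for weights, not bias terms: it needs a tensor with at least two dimensions to compute fan_in and fan_out, so it cannot be applied to a one-dimensional bias. Biases can be initialized with another suitable method, such as constant or zero initialization.

nn.init.xavier_uniform_(self.gate_weight, gain=nn.init.calculate_gain('sigmoid'))
In the line above, it initializes a parameter named self.gate_weight, using the gain for the sigmoid activation function.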
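
The sampling range can be checked numerically. The sketch below assumes a 2-dimensional weight of shape (64, 128), for which PyTorch takes fan_in = 128 and fan_out = 64, and uses the tanh gain purely as an example; the small tolerance added to the bound only guards against float32 rounding.

```python
import math
import torch
from torch import nn

fan_out, fan_in = 64, 128                 # a 2-D weight has shape (fan_out, fan_in)
weight = torch.empty(fan_out, fan_in)

gain = nn.init.calculate_gain('tanh')     # 5/3 for tanh
nn.init.xavier_uniform_(weight, gain=gain)

# Bound of the uniform distribution U(-a, a) for the chosen gain.
bound = gain * math.sqrt(6.0 / (fan_in + fan_out))
within_bound = bool(weight.abs().max() <= bound + 1e-6)
print(gain, bound, within_bound)
```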

xavier_normal

torch.nn.init.xavier_normal_(tensor, gain=1.0)

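Following the method described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010), the input tensor is filled with values sampled from the normal distribution $N(0, \text{std}^2)$, where std is given by

$\text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan\_in} + \text{fan\_out}}}$

Parameters

tensor – an n-dimensional tensor
gain – an optional scaling factor

Note: Xavier initialization usually works poorly with ReLU.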
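
As a rough numerical check of the standard deviation used by `xavier_normal_` (std = gain × sqrt(2 / (fan_in + fan_out))), the sketch below fills a fairly large tensor so that the empirical standard deviation is a stable estimate. The shape (400, 600) and the seed are arbitrary assumptions.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)

fan_out, fan_in = 400, 600
weight = torch.empty(fan_out, fan_in)
nn.init.xavier_normal_(weight)            # gain defaults to 1.0

expected_std = math.sqrt(2.0 / (fan_in + fan_out))
empirical_std = weight.std().item()
print(expected_std, empirical_std)
```

The two numbers should agree closely but not exactly, since the empirical value is estimated from a finite sample.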

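Kaiming Uniform Distribution

torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')

This method comes from Kaiming He. Following the description in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015), the tensor is filled with values from the uniform distribution U(−bound, bound), where bound is given by

$\text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}}$

Parameters:

tensor - the tensor to initialize
a - the negative slope of the rectifier used after this layer (only used when nonlinearity is 'leaky_relu')
mode - either "fan_in" (default) or "fan_out". "fan_in" preserves the magnitude of the variance of the weights in the forward pass; "fan_out" preserves it in the backward pass
nonlinearity - the nonlinear function (the name of the function in nn.functional); PyTorch recommends using it only with "relu" or "leaky_relu" (default).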
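
A minimal sketch of the bound used by `kaiming_uniform_` (bound = gain × sqrt(3 / fan_mode)), assuming nonlinearity='relu' (gain sqrt(2)) and mode='fan_in' on a weight of shape (64, 128). The shape and the small rounding tolerance are illustrative choices.

```python
import math
import torch
from torch import nn

fan_in = 128
weight = torch.empty(64, fan_in)          # shape (fan_out, fan_in)
nn.init.kaiming_uniform_(weight, mode='fan_in', nonlinearity='relu')

gain = nn.init.calculate_gain('relu')     # sqrt(2) for relu
bound = gain * math.sqrt(3.0 / fan_in)    # equals sqrt(6 / fan_in)
within_bound = bool(weight.abs().max() <= bound + 1e-6)
print(bound, within_bound)
```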

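Kaiming Normal Distribution

torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')

Following the description in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015), the tensor is filled with values sampled from the normal distribution $N(0, \text{std}^2)$, where std is given by

$\text{std} = \frac{\text{gain}}{\sqrt{\text{fan\_mode}}}$

Parameters:

tensor - the tensor to initialize
a - the negative slope of the rectifier used after this layer (only used when nonlinearity is 'leaky_relu')
mode - either "fan_in" (default) or "fan_out". "fan_in" preserves the magnitude of the variance of the weights in the forward pass; "fan_out" preserves it in the backward pass
nonlinearity - the nonlinear function (the name of the function in nn.functional); PyTorch recommends using it only with "relu" or "leaky_relu" (default).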
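
In practice these initializers are usually applied to a whole network. One common pattern, sketched here under the assumption of a small ReLU network, is to pass a function to `Module.apply`; the helper name `init_weights` and the layer sizes are hypothetical.

```python
import math
import torch
from torch import nn

def init_weights(module):
    """Hypothetical helper: Kaiming-normal weights and zero biases for Linear layers."""
    if isinstance(module, nn.Linear):
        nn.init.kaiming_normal_(module.weight, mode='fan_in', nonlinearity='relu')
        nn.init.zeros_(module.bias)

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
net.apply(init_weights)                   # visits every submodule recursively

# std = gain / sqrt(fan_in) with gain = sqrt(2) for relu and fan_in = 512.
expected_std = math.sqrt(2.0 / 512)
empirical_std = net[0].weight.std().item()
biases_are_zero = bool((net[0].bias == 0).all() and (net[2].bias == 0).all())
print(expected_std, empirical_std, biases_are_zero)
```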

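Truncated Normal Distribution

Values are drawn from a normal distribution; any value that falls outside [a, b] is redrawn until it lies within the bounds.

torch.nn.init.trunc_normal_(tensor, mean=0.0, std=1.0, a=-2.0, b=2.0)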
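
A short sketch that confirms the truncation. The tensor size of 1000 is an arbitrary choice, large enough that an ordinary N(0, 1) sample would very likely contain values beyond ±2.

```python
import torch
from torch import nn

w = torch.empty(1000)
nn.init.trunc_normal_(w, mean=0.0, std=1.0, a=-2.0, b=2.0)

# Every value lies inside the closed interval [a, b].
in_bounds = bool((w >= -2.0).all() and (w <= 2.0).all())
print(w.min().item(), w.max().item(), in_bounds)
```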

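Initializing a Sparse Matrix

torch.nn.init.sparse_(tensor, sparsity, std=0.01)

sparsity is the fraction of elements in each column that are set to 0.

std is the standard deviation of the normal distribution $N(0, \text{std}^2)$ from which the nonzero values in each column are drawn.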
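
The effect of `sparsity` can be seen by counting zeros per column. This sketch assumes a 10 × 4 matrix and sparsity = 0.5, so 5 entries in each column are expected to be zero; the variable names are illustrative.

```python
import math
import torch
from torch import nn

rows, cols, sparsity = 10, 4, 0.5
w = torch.empty(rows, cols)               # sparse_ expects a 2-D tensor
nn.init.sparse_(w, sparsity=sparsity, std=0.01)

# Count the zero entries in each column.
zeros_per_column = (w == 0).sum(dim=0).tolist()
expected_zeros = math.ceil(sparsity * rows)
print(zeros_per_column, expected_zeros)
```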

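fan_in and fan_out

The Kaiming initializers describe the fan mode as follows.

"fan_in" preserves the magnitude of the variance of the weights in the forward pass.
- Linear: the input dimension
- Conv2d: $in\_channel*kernel\_width*kernel\_height$
"fan_out" preserves the magnitude of the variance in the backward pass.
- Linear: the output dimension
- Conv2d: $out\_channel*kernel\_width*kernel\_height$

Linear: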

net=torch.nn.Linear(3,5)
net
#Linear(in_features=3, out_features=5, bias=True)
 
torch.nn.init._calculate_fan_in_and_fan_out(net.weight)
#(3,5)
 
torch.nn.init._calculate_correct_fan(net.weight,
                                    mode='fan_in')
#3
 
torch.nn.init._calculate_correct_fan(net.weight,
                                    mode='fan_out')
#5

Conv2d:

net=torch.nn.Conv2d(kernel_size=(3,5),
                    in_channels=2,
                    out_channels=10)
net
#Conv2d(2, 10, kernel_size=(3, 5), stride=(1, 1))
 
torch.nn.init._calculate_fan_in_and_fan_out(net.weight)
#(30,150)
 
 
 
torch.nn.init._calculate_correct_fan(net.weight,
                                    mode='fan_in')
#30 (2*3*5)
 
 
torch.nn.init._calculate_correct_fan(net.weight,
                                    mode='fan_out')
#150 (10*3*5)
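
The helpers used above begin with an underscore, which marks them as private to PyTorch. The same numbers can be derived from the weight shape alone; the function `fans_from_shape` below is a hypothetical helper written for illustration, assuming weights laid out as (out, in, *kernel_size).

```python
import math
import torch

def fans_from_shape(shape):
    """Hypothetical helper: return (fan_in, fan_out) for a weight of shape (out, in, *kernel)."""
    if len(shape) < 2:
        raise ValueError("fan_in and fan_out need a tensor with at least 2 dimensions")
    receptive_field = math.prod(shape[2:])   # 1 for Linear, kernel area for Conv2d
    return shape[1] * receptive_field, shape[0] * receptive_field

linear_fans = fans_from_shape(torch.nn.Linear(3, 5).weight.shape)
conv_fans = fans_from_shape(
    torch.nn.Conv2d(in_channels=2, out_channels=10, kernel_size=(3, 5)).weight.shape)
print(linear_fans, conv_fans)
```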

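Using GPUs

In PyTorch, the CPU and GPU are represented by torch.device('cpu') and torch.device('cuda').
Note that the cpu device stands for all physical CPUs and memory, which means PyTorch computations will try to use all CPU cores. A gpu device, however, represents only one card and its memory. If there are multiple GPUs, we use torch.device(f'cuda:{i}') to refer to the i-th GPU (i starts from 0). In addition, cuda refers to the current GPU, which is cuda:0 by default, so the two are equivalent unless the current device has been changed.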
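
Device objects are lightweight descriptions, so constructing them does not require a GPU to be present. The sketch below inspects their `type` and `index` attributes; the variable names are illustrative.

```python
import torch

cpu = torch.device('cpu')
default_gpu = torch.device('cuda')        # no explicit index: the current GPU
second_gpu = torch.device('cuda:1')

# The type is the device kind; the index is None when none was given.
print(cpu.type, default_gpu.type, second_gpu.type)
print(default_gpu.index, second_gpu.index)

# The string form and the (type, index) form describe the same device.
same_device = torch.device('cuda', 0) == torch.device('cuda:0')
print(same_device)
```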

Specifying a GPU Device

import torch
from torch import nn

torch.device('cpu'), torch.device('cuda'), torch.device('cuda:1')


# We can query the number of available GPUs.

torch.cuda.device_count()

Next we define two convenient functions
that allow us to run code even when the requested GPUs do not exist.

def try_gpu(i=0):
    """Return gpu(i) if it exists, otherwise return cpu()."""
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

def try_all_gpus():
    """Return all available GPUs, or [cpu()] if there is no GPU."""
    devices = [torch.device(f'cuda:{i}')
             for i in range(torch.cuda.device_count())]
    return devices if devices else [torch.device('cpu')]

try_gpu(), try_gpu(10), try_all_gpus()


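# Create tensors on specific devices; try_gpu falls back to the CPU when the GPU is missing.
X = torch.ones(2, 3, device=try_gpu())

Y = torch.rand(2, 3, device=try_gpu(1))

# Copy X to the second GPU (this call requires at least two GPUs).
Z = X.cuda(1)
print(X)
print(Z)

# Y and Z are on the same device, so they can be added.
Y + Z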

Neural Networks and GPUs

net = nn.Sequential(nn.Linear(3, 1))
net = net.to(device=try_gpu())

net(X)

net[0].weight.data.device

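device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tensor_cpu = torch.tensor([1, 2, 3])       # create a tensor on the CPU
tensor_cpu = tensor_cpu.to(device)     # move the tensor to the selected device (GPU if available)

model = MyModel()                         # create a model (MyModel stands for any nn.Module subclass)
model = model.to(device)              # move the model to the selected device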