(Dive into Deep Learning) Chapter 5: Deep Learning Computation

深度学习炼丹师-CXD

Last modified 2023-08-27 17:33:18

First published 2023-08-27 08:45:14

Article link: https://blog.csdn.net/weixin_44342777/article/details/132508228

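5.1 Layers and Blocks

First, a quick review of the multilayer perceptron (MLP).

nn.Sequential()
net = nn.Sequential(): this defines a special kind of Module, one that runs its child modules in order.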

import torch
from torch import nn
from torch.nn import functional as F

net = nn.Sequential(
    nn.Linear(20, 256),
    nn.ReLU(),
    nn.Linear(256, 10))
X = torch.rand(2, 20)
net(X), X.shape

Defining net with a custom block

class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        # In Python 3 this can be shortened to super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.out = nn.Linear(256, 10)

    def forward(self, X):
        # Option 1: the explicit form, one statement per layer
        # X = F.relu(self.hidden(X))
        # X = self.out(X)
        # return X

        # Option 2: for a simple network, the compact form
        return self.out(F.relu(self.hidden(X)))

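The layers are created when the MLP is instantiated, and every call to the forward pass then invokes them.

X = torch.rand(2, 20)
net = MLP()  # instantiate the class first, then call the instance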
net(X), X.shape

Implementing a sequential block by hand

# A hand-written sequential block
class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            # module is an instance of an nn.Module subclass. We store it in
            # _modules, the OrderedDict that nn.Module uses to track its children
            self._modules[str(idx)] = module

    def forward(self, X):
        # The OrderedDict iterates over children in insertion order
        for block in self._modules.values():
            X = block(X)
        return X

X = torch.rand(2, 20)
net = MySequential(nn.Linear(20, 256), nn.ReLU(), 
                             nn.Linear(256, 10))
net(X)
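
Why go through _modules rather than a plain Python list? The sketch below is an illustration, not part of the original post: ListHolder and DictHolder are hypothetical names, and the comparison assumes the usual nn.Module behaviour that only registered submodules contribute to parameters().

```python
import torch
from torch import nn

class ListHolder(nn.Module):
    """Hypothetical counter-example: layers kept in a plain Python list."""
    def __init__(self, *layers):
        super().__init__()
        self.layers = list(layers)  # a list is not registered with nn.Module

class DictHolder(nn.Module):
    """Layers registered through add_module, which fills _modules."""
    def __init__(self, *layers):
        super().__init__()
        for idx, layer in enumerate(layers):
            self.add_module(str(idx), layer)

def make_layers():
    return nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10)

unregistered = ListHolder(*make_layers())
registered = DictHolder(*make_layers())

# Count the parameter tensors each container exposes to an optimizer
n_unregistered = len(list(unregistered.parameters()))
n_registered = len(list(registered.parameters()))
print(n_unregistered, n_registered)  # 0 4
```

With the plain list, an optimizer built from parameters() would have nothing to update, which is the reason MySequential stores its children in _modules.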

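Defining a block with control flow

Here arbitrary code runs inside the forward pass.
(The computation itself may be meaningless, but it shows how flexible subclassing nn.Module is.)

# Flexibility of subclassing nn.Module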
class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
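        # Random weights that receive no gradient, so they stay constant during training
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        # Use the constant weights together with relu and mm
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        # Reuse the fully connected layer, which amounts to two layers sharing parameters
        X = self.linear(X)
        # Plain Python control flow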
        while X.abs().sum() > 1 :
            X /= 2
        return X.sum()

net = FixedHiddenMLP()
net(X)
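
A constant tensor stored as an ordinary attribute is not a parameter, but it is also left out of state_dict() and is not moved by net.to(device). One possible alternative, sketched here under the assumption that the constant should travel with the model, is register_buffer. FixedWeight is a hypothetical name used only for this illustration.

```python
import torch
from torch import nn

class FixedWeight(nn.Module):
    """Hypothetical variant that stores the constant weights as a buffer."""
    def __init__(self):
        super().__init__()
        # A buffer is saved in state_dict and follows .to(device),
        # but it is not returned by parameters() and gets no gradient
        self.register_buffer("rand_weight", torch.rand(20, 20))
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        return torch.relu(self.linear(X) @ self.rand_weight + 1)

net = FixedWeight()
param_names = [name for name, _ in net.named_parameters()]
state_keys = list(net.state_dict().keys())
print(param_names)  # ['linear.weight', 'linear.bias']
print("rand_weight" in state_keys)  # True
```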

Mixing and matching blocks

# Mixing and matching blocks
class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(20, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU())
        self.linear = nn.Linear(32, 16)

    def forward(self, X):
        X = self.net(X)
        X = self.linear(X)
        return X
chimera = nn.Sequential(NestMLP(), nn.Linear(16, 20))  # chimera: a hybrid assembled from different blocks
chimera(X), X.shape

5.2 Parameter Management

5.2.1 Parameter Access

We start with an MLP that has a single hidden layer.

import torch
from torch import nn
net = nn.Sequential(
    nn.Linear(4, 8), nn.ReLU(), 
    nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
net(X), net(X).shape

Inspecting the network structure

net

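Accessing parameters

A layer such as nn.Linear creates its weight and bias automatically, and state_dict() returns them in an OrderedDict.

print(net[2].state_dict())  # weight has shape (out_features, in_features) because nn.Linear computes y = X @ W^T + b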
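
The layout of the weight can be checked by recomputing the layer's output by hand. This is a small sketch added for illustration; layer and manual are names introduced here.

```python
import torch
from torch import nn

layer = nn.Linear(8, 1)
X = torch.rand(2, 8)

# nn.Linear stores weight as (out_features, in_features),
# so the forward pass multiplies by the transpose
manual = X @ layer.weight.T + layer.bias

print(layer.weight.shape)  # torch.Size([1, 8])
print(torch.allclose(layer(X), manual, atol=1e-6))
```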

Target parameters: a Parameter carries two pieces of data, its value (data) and its gradient (grad).

print(type(net[2].bias))
print(net[2].bias)
print(net[2].bias.data)

print(net[2].weight.grad is None)
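
The gradient is None only because no backward pass has run yet. A minimal sketch of the before and after, using a freshly built copy of the same network so that it runs on its own:

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(2, 4)

grad_before = net[2].weight.grad   # nothing has been back-propagated yet
net(X).sum().backward()            # populate .grad on every parameter
grad_after = net[2].weight.grad

print(grad_before is None)  # True
print(grad_after.shape)     # torch.Size([1, 8])
```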

Accessing all parameters at once

print(*[(name, param.shape) for name, param in net[0].named_parameters()])  # parameters of layer 0
print()
print(*[(name, param.shape) for name, param in net.named_parameters()])  # all parameters

next(net.named_parameters())

print(net.state_dict()['0.bias'].data)

Both access paths return the same values; the latter is used more often.

print(net.state_dict()['0.bias'].data == net[0].bias.data)

5.2.2 Parameter Initialization

Collecting parameters from nested blocks

def block1():
    return nn.Sequential(
        nn.Linear(4, 8), nn.ReLU(),
        nn.Linear(8, 4), nn.ReLU())

def block2():
    net = nn.Sequential()
    for i in range(4):
        net.add_module(f'block{i}', block1())
    return net

X = torch.rand(size=(2, 4))
rgnet = nn.Sequential(
    block2(), nn.Linear(4, 1))
rgnet(X)

print(rgnet)

Built-in initializers

def init_normal(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, mean=0, std=0.01)
        nn.init.zeros_(m.bias)

net.apply(init_normal)
print(net[0].weight.data[0], net[0].bias.data[0])

def init_constant(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 1)  # fills the input tensor with the given value
        nn.init.zeros_(m.bias)

net.apply(init_constant)
print(net[0].weight.data[0], net[0].bias.data[0])

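Applying different initializers to different blocks
Available initializers include uniform, normal, constant, Xavier, and He initialization.

# Apply a different initializer to each layer
def xavier(m):
    if type(m) == nn.Linear:
        nn.init.xavier_uniform_(m.weight)  # aims to keep the output variance roughly equal across layers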

def init_42(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 42)

net[0].apply(xavier)
net[2].apply(init_42)
print(net[0].weight.data[0])
print(net[2].weight.data[0])

Custom initialization

# Custom initialization
def my_init(m):
    if type(m) == nn.Linear:
        print("Init", *[(name, param.shape) for name, param in m.named_parameters()][0])
        nn.init.uniform_(m.weight, -10, 10)
        m.weight.data *= m.weight.data.abs() >= 5
        # a = a * (abs(a) >= 5): entries with absolute value >= 5 are kept, all others become 0
net.apply(my_init)
print(net[0].weight[:2])
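
The masking step is easier to follow on a handful of numbers. The values below are made up for illustration; the multiplication is the same float-by-boolean product used in my_init.

```python
import torch

w = torch.tensor([-7.0, -3.0, 0.5, 6.0])
mask = w.abs() >= 5   # boolean tensor: True where the entry is kept
w *= mask             # True acts as 1 and False as 0

kept = w.tolist()
print(mask.tolist())  # [True, False, False, True]
print(kept)           # -7.0 and 6.0 survive; the other two entries are zeroed
```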

net[0].weight.data[:] += 1  # add 1 to every weight
net[0].weight.data[0, 0] = 42  # set the first entry to 42
print(net[0].weight.data[0])

Tied parameters (sharing parameters between layers)

shared = nn.Linear(8, 8)
net = nn.Sequential(
    nn.Linear(4, 8), nn.ReLU(), 
    shared, nn.ReLU(), 
    shared, nn.ReLU(), 
    nn.Linear(8, 1))
net(X)
print(net[2].weight.data[0] == net[4].weight.data[0])
net[2].weight.data[0, 0] = 100
print(net[4].weight.data[0, 0])

print(net[4].weight.data)
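
The two positions do not merely hold equal values, they hold the same Parameter object. The sketch below rebuilds the network so that it runs on its own and checks identity as well as the number of distinct parameter tensors.

```python
import torch
from torch import nn

shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU(),
                    nn.Linear(8, 1))

# Identity, not just equality: both slots point at one Parameter
same_object = net[2].weight is net[4].weight
# parameters() reports each shared tensor once: 3 distinct layers x (weight, bias)
n_tensors = len(list(net.parameters()))

print(same_object)  # True
print(n_tensors)    # 6
```

Because there is only one tensor, the gradient it receives during backpropagation is the sum of the contributions from both positions in the network.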

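5.3 Deferred Initialization

PyTorch relies on this less often, because its standard layers take explicit input and output sizes when the network is defined and adjacent sizes must agree; MXNet and TensorFlow use it more.

So far we have glossed over several things when setting up a network:

We defined the architecture without specifying the input dimension.
We added layers without specifying the output dimension of the previous layer.
We initialized parameters without having enough information to know how many parameters the model should contain.

Some readers may be surprised that such code runs at all. After all, the framework cannot know the input dimension of the network. The trick is that the framework defers initialization: it waits until data first passes through the model and then infers the size of each layer on the fly.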
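
PyTorch does offer this behaviour through its lazy modules. The sketch below assumes a PyTorch version that ships nn.LazyLinear (1.8 or later); it is an illustration added here and not code from the original post.

```python
import torch
from torch import nn
from torch.nn.parameter import UninitializedParameter

# Only the output sizes are given; the input sizes are left open
net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))
uninitialized = isinstance(net[0].weight, UninitializedParameter)

X = torch.rand(2, 20)
net(X)  # the first forward pass infers in_features from X
shape_after = tuple(net[0].weight.shape)

print(uninitialized)  # True
print(shape_after)    # (256, 20)
```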

5.4 Custom Layers

A custom layer without any parameters

%%time
import torch 
import torch.nn.functional as F
from torch import nn


class CenteredLayer(nn.Module):
    def __init__(self):
        super().__init__()
    
    def forward(self, X):
        return X - X.mean()

layer = CenteredLayer()
layer(torch.FloatTensor([1, 2, 3, 4, 5]))

Using the layer as a component of a more complex model

%%time
net = nn.Sequential(nn.Linear(8, 128), CenteredLayer())

Y = net(torch.rand(4, 8))
Y.mean()

class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,)) 

    def forward(self, X):
        linear = torch.matmul(X, self.weight) + self.bias  # use the Parameters directly so autograd can track them
        return F.relu(linear)

dense = MyLinear(5, 3)
print(dense.weight)
print(dense.bias)

Running a forward pass directly with the custom layer

# Forward pass with the custom layer
dense(torch.rand(2, 5))

Building a model from custom layers only

# A model built from custom layers only
net = nn.Sequential(
    MyLinear(64, 8),
    MyLinear(8, 1))
net(torch.rand(2, 64))
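
One detail matters when a custom layer is meant to be trained: reading a parameter through .data inside forward bypasses autograd, so that parameter never receives a gradient. The two classes below are hypothetical and exist only to show the difference.

```python
import torch
from torch import nn

class LinearWithData(nn.Module):
    """Hypothetical layer that reads weight.data, which autograd does not track."""
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))

    def forward(self, X):
        return X @ self.weight.data

class LinearTracked(nn.Module):
    """Hypothetical layer that uses the Parameter itself."""
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))

    def forward(self, X):
        return X @ self.weight

X = torch.rand(2, 5)

tracked = LinearTracked(5, 3)
tracked(X).sum().backward()
has_grad = tracked.weight.grad is not None

untracked_out = LinearWithData(5, 3)(X)

print(has_grad)                     # True
print(untracked_out.requires_grad)  # False
```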

5.5 File I/O

Loading and saving tensors

# Load and save a tensor
import torch
from torch import nn
from torch.nn import functional as F

x = torch.arange(4)
torch.save(x, 'x-file')  # x-file is a zip archive in PyTorch's own serialization format

x2 = torch.load("x-file")
x2

Checking where the file was saved

import os
file_path = os.path.abspath('x-file')
print(file_path)

Saving a list of tensors and reading them back into memory

# Save a list of tensors, then read them back
y = torch.zeros(4)
torch.save([x, y], 'x-file')
x2, y2 = torch.load('x-file')
x2, y2

Writing and reading a dictionary that maps strings to tensors

# Write and read a dict mapping strings to tensors
mydict = {'x': x, 'y': y}
torch.save(mydict, 'mydict')
mydict2 = torch.load('mydict')
print(mydict2)

Loading and saving model parameters

# Load and save model parameters
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.output = nn.Linear(256, 10)
    
    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))

net = MLP()
X = torch.randn(size=(2, 20))
y = net(X)

Save the model parameters to a file named "mlp.params".

torch.save(net.state_dict(), 'mlp.params')

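Instantiate a copy of the original MLP and load the parameters stored in the file directly.

# Instantiate a copy and load the saved parameters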
clone = MLP()
clone.load_state_dict(torch.load('mlp.params'))
clone.eval()

Because both instances now hold the same parameters, the clone produces the same output as the original for the same input X.

y_clone = clone(X)
print(y_clone == y)
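
Parameters saved on one device are often loaded on another, for example a checkpoint written on a GPU machine and read on a CPU-only one. A minimal sketch, assuming the map_location argument of torch.load is used to place the loaded tensors; the file name and the small nn.Linear model are made up for illustration.

```python
import os
import tempfile

import torch
from torch import nn

net = nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), "linear.params")  # hypothetical file name
torch.save(net.state_dict(), path)

# map_location places the loaded tensors on the given device,
# regardless of where they lived when they were saved
state = torch.load(path, map_location=torch.device("cpu"))
clone = nn.Linear(4, 2)
clone.load_state_dict(state)

same = all(torch.equal(a, b)
           for a, b in zip(net.state_dict().values(),
                           clone.state_dict().values()))
print(same)  # True
```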

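5.6 GPU

5.6.1 Computing Devices

In PyTorch every array has a device, often referred to as its context.

By default, all variables and the associated computation are assigned to the CPU. Sometimes the context is a GPU instead. Things get trickier when jobs are deployed across multiple servers. By assigning arrays to contexts sensibly, we can minimize the time spent moving data between devices. For example, when training a neural network on a server with a GPU, we usually want the model parameters to live on the GPU.

Use the nvidia-smi command to view GPU information.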

!nvidia-smi

In PyTorch, the CPU and GPU are represented by torch.device('cpu') and torch.device('cuda').
A cuda device stands for a single card and its memory.
With multiple GPUs, torch.device(f'cuda:{i}') denotes the i-th GPU (counting from 0).
cuda:0 and cuda refer to the same GPU as long as the current CUDA device has not been changed from its default.

import torch
from torch import nn

torch.device('cpu'), torch.device('cuda'), torch.device('cuda:1')

Query the number of available GPUs

torch.cuda.device_count()

Two helper functions for using GPUs:

try_gpu(): returns one GPU if available
try_all_gpus(): returns all available GPUs

# Using GPUs
def try_gpu(i=0):
    """Return gpu(i) if it exists, otherwise return cpu()."""
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

def try_all_gpus():
    """Return all available GPUs, or [cpu()] if there is no GPU."""
    devices = [torch.device(f'cuda:{i}')
             for i in range(torch.cuda.device_count())]
    return devices if devices else [torch.device('cpu')]

try_gpu(), try_gpu(10), try_all_gpus()

5.6.2 Tensors and GPUs

By default, tensors are created on the CPU.

X = torch.tensor([1, 2, 3])
X.device

Moving a tensor from the CPU to the GPU

device = try_gpu()
X = X.to(device)
X.device

The storage device can also be specified when the tensor is created.

X = torch.ones(2, 3, device=try_gpu())
# With at least two GPUs, a tensor can also be created on a specific one
# Y = torch.ones(2, 3, device=try_gpu(1))

X

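5.6.3 Computation on Tensors Across Devices

To compute X + Y, we have to decide where the operation runs.

Suppose X is stored on the first GPU and Y on the second.
We can transfer X to the second GPU and perform the operation there.
Do not simply add X and Y: this raises an exception, because the runtime cannot find both operands on the same device.
Since Y lives on the second GPU, X has to be moved there before the two can be added.

First check whether the machine has a GPU.

# Training on a GPU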
if not torch.cuda.is_available():
    print('CUDA is not available.  Training on CPU ...')
else:
    print('CUDA is available.  Training on GPU ...')

Define the tensors:

Here X is stored on the CPU and Y on the GPU.

X = torch.ones((2, 3))
Y = torch.randn((2, 3), device = try_gpu())
X, Y

To compute X + Y, we have to decide where the operation runs. Adding them directly fails when Y is on a GPU:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

# X + Y

Declare the device

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
device

Transfer X to the GPU

X = X.to(device)
X

Compute X + Y. (X and Y are now on the same device, so no error is raised.)

X + Y

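Note:
People use GPUs for machine learning because a single GPU is comparatively fast. But transferring data between devices (CPUs, GPUs, and other machines) is much slower than computation. It also makes parallelization harder, since we have to wait for data to be sent (or received) before continuing with further operations.
This is why copy operations call for great care. As a rule of thumb, many small operations are much worse than one large operation. Likewise, several operations issued together are much better than many single operations scattered through the code. Such operations may block if one device has to wait for another before doing anything else. It is a bit like queuing to order coffee rather than pre-ordering by phone and finding the coffee ready on arrival.
Finally, when we print a tensor or convert it to NumPy format, the framework first copies the data to main memory if it is not already there, which adds transfer overhead.
Worse still, the data is now subject to the global interpreter lock, so everything has to wait for Python to finish.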
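
PyTorch itself avoids needless copies: calling .to() with the device a tensor already lives on returns the tensor unchanged, and a copy is made only on request. A small sketch that runs on any machine, with or without a GPU:

```python
import torch

X = torch.ones(2, 3)

# Already on the requested device: no copy, the same object comes back
same = X.to(X.device) is X
# copy=True forces a new tensor even though the device matches
forced = X.to(X.device, copy=True)

print(same)         # True
print(forced is X)  # False
```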

5.6.4 Neural Networks and GPUs

A neural network model can be assigned to a device as well.

net = nn.Sequential(nn.Linear(3, 1))
net = net.to(device=try_gpu())

Confirm that the model parameters are stored on the same GPU.

net[0].weight.data.device

As long as all data and parameters are on the same device, the model can be trained efficiently.
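
In practice this means choosing one device up front and creating both the model and its inputs there. A minimal sketch that falls back to the CPU when no GPU is present:

```python
import torch
from torch import nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

net = nn.Sequential(nn.Linear(3, 1)).to(device)  # parameters move to the device
X = torch.ones(2, 3, device=device)              # input is created on the same device
out = net(X)

param_device = next(net.parameters()).device
print(out.shape)                   # torch.Size([2, 1])
print(out.device == param_device)  # True
```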

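Summary

We can specify the device used for storage and computation, such as a CPU or GPU. By default, data is created in main memory and computed on the CPU.
Deep learning frameworks require all input data of a computation to be on the same device, whether CPU or GPU; otherwise an error is raised.
Moving data carelessly can significantly reduce performance.
A typical mistake: computing the loss of every mini-batch on the GPU and reporting it to the user (or recording it in a NumPy ndarray) after each batch triggers the global interpreter lock and stalls all GPUs. It is better to allocate memory for logging on the GPU and move only larger logs.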
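
The logging advice can be sketched as follows. The random scalars stand in for per-batch losses and are an assumption of this example; the point is the difference between transferring a value on every batch and transferring once at the end.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Stand-ins for per-batch losses that already live on the device
losses = [torch.rand((), device=device) for _ in range(100)]

# Costly pattern: .item() copies to the host once per batch
total_per_batch = sum(loss.item() for loss in losses)

# Cheaper pattern: accumulate on the device, transfer a single value at the end
running = torch.zeros((), device=device)
for loss in losses:
    running += loss
total_once = running.item()

print(abs(total_per_batch - total_once) < 1e-3)  # True
```

Both totals agree up to floating-point rounding; the second version simply moves data across the device boundary once instead of a hundred times.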