d2l-ai Deep Learning Diary (4): Deep Learning Computation

吴耀好

Published 2024-09-28 14:09:55


Column: d2l学习 (d2l study) | Tags: artificial intelligence, deep learning, python, pytorch, conda

Article link: https://blog.csdn.net/Wyh666a/article/details/142614165


Preface:

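This blog series, "d2l-ai Deep Learning Diary", records my study and exploration of deep learning, in particular my progress through the classic textbook Dive into Deep Learning (《动手学深度学习》). Along the way I hope not only to summarize what I have learned but also to share my insights and grow together with like-minded readers. It is a way of consolidating knowledge, and also part of my preparation for the graduate entrance exam and my path toward further academic study.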

Previous diary entries:

d2l-ai Deep Learning Diary (3) - CSDN Blog

d2l-ai Deep Learning Diary (2) - CSDN Blog

d2l-ai Deep Learning Diary (1) - CSDN Blog

I. Layers and Blocks

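First, consider a single-layer neural network with one output unit (a single neuron). It (1) takes some inputs; (2) produces a corresponding scalar output; and (3) has a set of associated parameters that can be updated to optimize some objective function.

Layers with multiple outputs, and multilayer networks built from them, share the same shape: they (1) take a set of inputs, (2) produce corresponding outputs, and (3) are described by a set of tunable parameters.

So the description is essentially the same at every scale. To make complex neural networks easier to reason about, we introduce the concept of a block.

A block can describe a single layer, a component made up of multiple layers, or the entire model itself.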

1. Custom Blocks

import torch
from torch import nn
net = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
X = torch.rand(2, 20)
X, net(X)

Output:

(tensor([[0.9552, 0.1055, 0.4143, 0.6566, 0.0974, 0.4917, 0.3033, 0.5293, 0.8635,
          0.5454, 0.7582, 0.0841, 0.4433, 0.9050, 0.5753, 0.6413, 0.1913, 0.5681,
          0.7266, 0.0624],
         [0.3716, 0.0499, 0.7162, 0.1795, 0.5246, 0.1605, 0.0451, 0.9703, 0.3133,
          0.5547, 0.9010, 0.7549, 0.3573, 0.2265, 0.6316, 0.6850, 0.3806, 0.1511,
          0.7892, 0.8868]]),
 tensor([[ 2.0529e-01,  2.5533e-02,  4.4521e-02, -1.7059e-01,  7.9211e-03,
          -1.9895e-01,  2.1033e-04, -7.2666e-02, -1.3248e-01, -1.2476e-01],
         [ 2.1410e-01, -5.3917e-02,  2.1959e-02, -2.1866e-01,  5.0237e-02,
          -1.2994e-01, -8.8673e-02, -1.2499e-01, -2.1385e-01, -2.7487e-01]],
        grad_fn=<AddmmBackward0>))

Here nn.Linear(20, 256), nn.ReLU(), and nn.Linear(256, 10) can each be thought of as a block (even though each one is just a single layer), and Sequential chains these blocks together into a model.
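To make the idea of chaining blocks concrete, here is a minimal sketch (the seed and the names out_manual and out_seq are my own) checking that calling the Sequential model gives the same result as feeding the input through each of its three child modules in turn:

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed so the run is reproducible
net = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
X = torch.rand(2, 20)

# Iterating over a Sequential yields its child blocks in order;
# each child is itself an nn.Module
out_manual = X
for block in net:
    out_manual = block(out_manual)

out_seq = net(X)
print(all(isinstance(block, nn.Module) for block in net))  # True
print(torch.allclose(out_seq, out_manual))  # True
```

In other words, Sequential is itself just a block whose forward passes the data through its children one after another.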

However, this code hides what goes on inside a block, so let's write a block from scratch:

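from torch.nn import functional as F

class MLP(nn.Module):
    # Declare the layers that hold model parameters. Here we declare two fully connected layers
    def __init__(self):
        # Call the constructor of the parent class Module to perform the necessary initialization.
        # This way, other function arguments can also be specified at instantiation,
        # such as the model parameters params (introduced later)
        super().__init__()
        self.hidden = nn.Linear(20, 256)  # hidden layer
        self.out = nn.Linear(256, 10)  # output layer

    # Define the forward propagation of the model, i.e., how to return the required output given input X
    def forward(self, X):
        # Note that here we use the functional version of ReLU, defined in the nn.functional module
        return self.out(F.relu(self.hidden(X)))

net = MLP()
X, net(X)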

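First, our custom __init__ calls the parent class's __init__ through super().__init__(), which spares us the pain of rewriting boilerplate code. Then we instantiate two fully connected layers, self.hidden and self.out. Output: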

(tensor([[0.6323, 0.5777, 0.3461, 0.6884, 0.5336, 0.9907, 0.4930, 0.3447, 0.7401,
          0.8314, 0.0510, 0.8178, 0.9791, 0.4548, 0.3637, 0.2382, 0.1611, 0.9832,
          0.9458, 0.7899],
         [0.8188, 0.7822, 0.4751, 0.3578, 0.9700, 0.7996, 0.7000, 0.1862, 0.8203,
          0.6949, 0.8516, 0.2803, 0.2932, 0.3641, 0.1304, 0.3470, 0.2885, 0.3683,
          0.2968, 0.3663]]),
 tensor([[-0.0370, -0.0996, -0.0727,  0.3801,  0.0075,  0.2394,  0.0810,  0.0596,
          -0.0437,  0.1252],
         [ 0.1070, -0.1422, -0.1210,  0.2432,  0.0299,  0.2160, -0.0058,  0.1041,
          -0.0972,  0.0591]], grad_fn=<AddmmBackward0>))
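Simply assigning nn.Linear instances as attributes inside __init__ is enough for nn.Module to register them as submodules. The sketch below repeats the MLP class so it runs on its own, then lists the registered parameters and counts them:

```python
import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)  # registered automatically as a submodule
        self.out = nn.Linear(256, 10)     # registered automatically as a submodule

    def forward(self, X):
        return self.out(F.relu(self.hidden(X)))

net = MLP()
for name, param in net.named_parameters():
    print(name, tuple(param.shape))

# 20*256 + 256 + 256*10 + 10 = 7946 trainable numbers in total
total = sum(p.numel() for p in net.parameters())
print(total)
```

Because the layers are registered, an optimizer built from net.parameters() will see all four tensors without any extra bookkeeping.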

2. The Sequential Block

To understand it better, I went a step further and wrote my own MySequential class to stand in for Sequential:

class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            # Here, module is an instance of a Module subclass. We store it in the
            # Module member variable _modules, an ordered dictionary
            self._modules[str(idx)] = module

    def forward(self, X):
        # _modules preserves insertion order, so the blocks are visited in the order they were added
        for block in self._modules.values():
            X = block(X)
        return X

net = MySequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
net(X)

Output:

tensor([[-0.1880,  0.2571, -0.0548, -0.1871,  0.0098, -0.0714, -0.0674, -0.1154,
         -0.0302,  0.0968],
        [-0.0016,  0.1710, -0.1317, -0.2083, -0.0779, -0.0033, -0.0535, -0.1174,
          0.1162,  0.0705]], grad_fn=<AddmmBackward0>)
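Why store the blocks in self._modules rather than an ordinary Python list? A list attribute is not registered with nn.Module, so its layers' parameters become invisible. A sketch comparing the two (ListSequential is a hypothetical counter-example of my own):

```python
import torch
from torch import nn

class ListSequential(nn.Module):
    """Hypothetical counter-example: keeps the blocks in a plain Python list."""
    def __init__(self, *args):
        super().__init__()
        self.blocks = list(args)  # a plain list is NOT registered with nn.Module

    def forward(self, X):
        for block in self.blocks:
            X = block(X)
        return X

class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            self._modules[str(idx)] = module  # registered as a child module

    def forward(self, X):
        for block in self._modules.values():
            X = block(X)
        return X

layers = [nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10)]
bad = ListSequential(*layers)
good = MySequential(*layers)
print(len(list(bad.parameters())))   # 0: an optimizer would train nothing
print(len(list(good.parameters())))  # 4: two weights and two biases
```

Both forward passes compute the same thing; the difference only shows up when PyTorch needs to find the parameters (optimizers, state_dict, .to(device)). In practice nn.ModuleList is the standard container for a registered list of modules.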

3. Executing Code in the Forward Propagation Function

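Sequential makes networks convenient to write, but it has a limitation: it only chains layers one after another, so there is no room for my own mathematical computations or control flow. To get around this, we write a custom forward function and introduce the idea of "constant parameters":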

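class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Random weights that do not compute gradients, so they stay constant during training
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        # Use the constant parameters created above, along with the relu and mm functions
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        # Reuse the fully connected layer. This is equivalent to two fully connected layers sharing parameters
        X = self.linear(X)
        # Control flow
        while X.abs().sum() > 1:
            X /= 2
        return X.sum()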

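Before returning the output, the model does something unusual: it runs a while loop that keeps dividing the output vector by 2 as long as its L1 norm is greater than 1. Finally, the model returns the sum of all entries in X.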

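In addition, the weight self.rand_weight is randomly initialized at instantiation and stays constant afterward. Because it is a plain tensor rather than an nn.Parameter, it is not a model parameter and is never updated by backpropagation. Together with the while loop, this shows that forward can run arbitrary code, not just a fixed stack of layers.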

net = FixedHiddenMLP() 
net(X)

Output:

tensor(0.1690, grad_fn=<SumBackward0>)
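To check the claim that rand_weight is not a model parameter, here is a sketch that inspects named_parameters and state_dict. The use_buffer flag is my own addition: it shows register_buffer, an nn.Module method for constant tensors that should be saved with the model (and moved by .to(device)) but never trained.

```python
import torch
from torch import nn
from torch.nn import functional as F

class FixedHiddenMLP(nn.Module):
    def __init__(self, use_buffer=False):  # use_buffer is a hypothetical flag for comparison
        super().__init__()
        w = torch.rand((20, 20))  # requires_grad is False by default
        if use_buffer:
            self.register_buffer('rand_weight', w)  # saved in state_dict, never trained
        else:
            self.rand_weight = w  # plain attribute, as in the original class
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        X = self.linear(X)
        while X.abs().sum() > 1:
            X /= 2
        return X.sum()

plain = FixedHiddenMLP()
buffered = FixedHiddenMLP(use_buffer=True)
print([n for n, _ in plain.named_parameters()])  # only linear.weight and linear.bias
print('rand_weight' in plain.state_dict())       # False: not saved with the model
print('rand_weight' in buffered.state_dict())    # True: saved, but still not a parameter
```

In both variants the optimizer only ever sees linear.weight and linear.bias; the buffer version additionally keeps the constant weight when the model is saved or moved to another device.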

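II. Parameter Management

In my previous round of hands-on practice, I tuned hyperparameters haphazardly, with no clear direction, and never saved the model, which made the whole process very inefficient. This section covers the details of working with parameters.

1. Parameter Access

An example network: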

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1)) 
X = torch.rand(size=(2, 4))
net(X)

Accessing the parameters of a single layer:

print(net[2].state_dict())

Output:

OrderedDict([('weight', tensor([[-0.2519,  0.1542,  0.3514, -0.0779,  0.0612,  0.2336, -0.3134, -0.0249]])), ('bias', tensor([-0.0001]))])

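The output tells us something important: this fully connected layer contains two parameters, its weight and its bias, and both are stored as single-precision floating-point numbers (float32).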

Accessing a specific parameter:

print(type(net[2].bias)) 
print(net[2].bias)
print(net[2].bias.data)

Output:

<class 'torch.nn.parameter.Parameter'>
Parameter containing:
tensor([-0.0001], requires_grad=True)
tensor([-0.0001])
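Besides .data, every Parameter also carries a .grad attribute that stays None until a backward pass runs. A small sketch, using a dummy loss that is just the sum of the outputs (an assumption purely for illustration):

```python
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))

grad_before = net[2].bias.grad
print(grad_before)       # None: no backward pass has run yet

loss = net(X).sum()      # dummy scalar loss, only used to trigger backpropagation
loss.backward()
# The bias is added once per sample, so d(loss)/d(bias) equals the batch size, 2
print(net[2].bias.grad)
```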

Accessing all parameters at once:

print(*[(name, param.shape) for name, param in net[0].named_parameters()]) 
print(*[(name, param.shape) for name, param in net.named_parameters()])

Output:

('weight', torch.Size([8, 4])) ('bias', torch.Size([8]))
('0.weight', torch.Size([8, 4])) ('0.bias', torch.Size([8])) ('2.weight', torch.Size([1, 8])) ('2.bias', torch.Size([1]))
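The names printed by named_parameters() are also the keys of state_dict(), so a parameter can be looked up by name. A short sketch:

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Keys follow the "<index in Sequential>.<parameter name>" pattern seen above
params = net.state_dict()
print(list(params.keys()))
print(torch.equal(params['2.bias'], net[2].bias.data))  # True: same values, looked up by name

# Total number of trainable values: 4*8 + 8 + 8*1 + 1 = 49
total = sum(p.numel() for p in net.parameters())
print(total)
```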

Collecting parameters from nested blocks:

def block1():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 4), nn.ReLU())

def block2():
    net = nn.Sequential()
    for i in range(4):
        # Nest blocks here
        net.add_module(f'block {i}', block1())
    return net

rgnet = nn.Sequential(block2(), nn.Linear(4, 1))
print(rgnet(X))
print(rgnet)

Output:

tensor([[-0.1187],
        [-0.1187]], grad_fn=<AddmmBackward0>)
Sequential(
  (0): Sequential(
    (block 0): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
    (block 1): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
    (block 2): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
    (block 3): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
  )
  (1): Linear(in_features=4, out_features=1, bias=True)
)
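Since the printed structure is nested, parameters can be reached by indexing level by level, just like nested lists, and the fully qualified names mirror the same nesting. A sketch that rebuilds rgnet so it runs on its own:

```python
import torch
from torch import nn

def block1():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 4), nn.ReLU())

def block2():
    net = nn.Sequential()
    for i in range(4):
        net.add_module(f'block {i}', block1())
    return net

rgnet = nn.Sequential(block2(), nn.Linear(4, 1))

# rgnet[0] -> the block2 container, [1] -> 'block 1', [0] -> its first Linear layer
print(rgnet[0][1][0].bias.data.shape)  # torch.Size([8])

names = [name for name, _ in rgnet.named_parameters()]
print(names[:2])   # ['0.block 0.0.weight', '0.block 0.0.bias']
print(len(names))  # 4 blocks * 4 tensors + 2 for the output layer = 18
```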

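2. Parameter Initialization

Having accessed all these parameters, note that their values can all be modified by hand: a model's parameters are just many variables grouped together. Now let's look at how to initialize them properly.

1. Built-in Initialization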

def init_normal(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, mean=0, std=0.01) 
        nn.init.zeros_(m.bias)
net.apply(init_normal) 
print(net[0].weight.data[0])
print(net[0].bias.data[0])
def init_constant(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 1) 
        nn.init.zeros_(m.bias)
net.apply(init_constant) 
print(net[0].weight.data[0])
print(net[0].bias.data[0])

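The first function initializes the weights of every linear layer from a normal distribution with mean 0 and standard deviation 0.01, and sets the biases to 0.

The second function sets every weight of every linear layer to the constant 1 and the biases to 0.

Output: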

(tensor([-0.0128,  0.0003, -0.0035, -0.0125]), tensor(0.))
(tensor([1., 1., 1., 1.]), tensor(0.))
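apply() can also be called on a single sub-block, so different layers can get different initialization schemes. A sketch using nn.init.xavier_uniform_ for the first layer and a constant 42 for the last (both choices are just for illustration):

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

def init_xavier(m):
    if type(m) == nn.Linear:
        nn.init.xavier_uniform_(m.weight)

def init_42(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 42)

net[0].apply(init_xavier)  # only the first Linear layer
net[2].apply(init_42)      # only the last Linear layer

# With gain 1, xavier_uniform_ samples from U(-bound, bound), bound = sqrt(6 / (fan_in + fan_out))
bound = (6 / (4 + 8)) ** 0.5
print(net[0].weight.data[0])
print(net[2].weight.data)
```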

2. Custom Initialization

In the following example, I define an initialization method for any weight parameter w using the following distribution:

$w\sim\begin{cases} U(5,10) & \text{with probability } \frac{1}{4} \\ 0 & \text{with probability } \frac{1}{2} \\ U(-10,-5) & \text{with probability } \frac{1}{4} \end{cases}$

def my_init(m):
    if type(m) == nn.Linear:
        print("Init", *[(name, param.shape)for name, param in m.named_parameters()][0]) 
        nn.init.uniform_(m.weight, -10, 10)
        m.weight.data *= m.weight.data.abs() >= 5
net.apply(my_init) 
net[0].weight[:2]

Output:

Init weight torch.Size([8, 4])
Init weight torch.Size([1, 8])
tensor([[ 5.9444,  5.0421, -7.1173, -7.5744],
        [-8.0533, -5.4951, -6.0108,  6.8597]], grad_fn=<SliceBackward0>)
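Why does my_init produce that three-part distribution? Under U(-10, 10), P(|w| < 5) = 10/20 = 1/2, and P(w ≥ 5) = P(w ≤ -5) = 5/20 = 1/4; multiplying by the boolean mask zeros exactly the first case. A quick empirical check on a large tensor (the sample size and seed are arbitrary choices of mine):

```python
import torch

torch.manual_seed(0)
w = torch.empty(100_000)
torch.nn.init.uniform_(w, -10, 10)   # w ~ U(-10, 10)
w *= w.abs() >= 5                    # same masking trick as my_init: zero out |w| < 5

frac_zero = (w == 0).float().mean().item()
frac_pos = (w >= 5).float().mean().item()
frac_neg = (w <= -5).float().mean().item()
print(frac_zero, frac_pos, frac_neg)  # roughly 0.5, 0.25, 0.25
```

The surviving positive values are uniform on [5, 10] and the negative ones on [-10, -5], which matches the formula.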

3. Tied Parameters

# We need to give the shared layer a name so that we can refer to its parameters
shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), shared, nn.ReLU(), shared, nn.ReLU(), nn.Linear(8, 1))
net(X)
# Check whether the parameters are the same
print(net)
print(net[2].weight.data[0] == net[4].weight.data[0])
net[2].weight.data[0, 0] = 100
# Make sure they are actually the same object, not just equal in value
print(net[2].weight.data[0] == net[4].weight.data[0])

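This example is easy to follow: the same layer object appears at two positions in the network, so their parameters are bound together. When the third layer (net[2]) changes, the fifth layer (net[4]) necessarily changes with it, which is why the second comparison is still all True after setting an element to 100.

Output: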

Sequential(
  (0): Linear(in_features=4, out_features=8, bias=True)
  (1): ReLU()
  (2): Linear(in_features=8, out_features=8, bias=True)
  (3): ReLU()
  (4): Linear(in_features=8, out_features=8, bias=True)
  (5): ReLU()
  (6): Linear(in_features=8, out_features=1, bias=True)
)
tensor([True, True, True, True, True, True, True, True])
tensor([True, True, True, True, True, True, True, True])
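A few more consequences of tying, sketched below: both positions hold the very same module object, nn.Module.parameters() yields each shared Parameter only once, and during backpropagation the gradients from both uses are added into the shared layer's .grad.

```python
import torch
from torch import nn

shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), shared, nn.ReLU(), shared, nn.ReLU(), nn.Linear(8, 1))

print(net[2] is net[4])            # True: one object, two positions
# 3 distinct Linear layers * (weight + bias) = 6 unique parameter tensors
print(len(list(net.parameters())))

# Autograd accumulates the gradients from both uses into shared.weight.grad
net(torch.rand(2, 4)).sum().backward()
print(shared.weight.grad.shape)    # torch.Size([8, 8])
```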

III. Custom Layers

1. Layers Without Parameters

class CenteredLayer(nn.Module):
    def __init__(self):
        super().__init__()
    def forward(self, X):
        return X - X.mean()
layer = CenteredLayer() 
layer(torch.FloatTensor([1, 2, 3, 4, 5]))

Output:

(tensor([-2., -1.,  0.,  1.,  2.]))

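As you can see, this layer has no parameters and still works correctly.

Now we can incorporate the layer as a component into more complex models: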

net = nn.Sequential(nn.Linear(8, 128), CenteredLayer())
Y = net(torch.rand(4, 8)) 
Y.mean()

Here the custom layer is placed inside a Sequential after a linear layer. Output:

tensor(-3.7253e-09, grad_fn=<MeanBackward0>)

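It still runs correctly: since the layer subtracts the mean, the output's mean is 0, and the tiny nonzero value above is just float32 rounding error.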
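To confirm that CenteredLayer contributes no parameters of its own, a small sketch (it omits __init__, which is fine because nn.Module's constructor is then called automatically):

```python
import torch
from torch import nn

class CenteredLayer(nn.Module):
    def forward(self, X):
        return X - X.mean()  # subtract the mean over ALL elements of X

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(8, 128), CenteredLayer())
Y = net(torch.rand(4, 8))

print(len(list(CenteredLayer().parameters())))  # 0: nothing to learn
print(len(list(net.parameters())))              # 2: only the Linear layer's weight and bias
print(abs(Y.mean().item()) < 1e-5)              # True: nonzero only by float rounding
```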

2. Layers with Parameters

With a parameter-free layer done, let's try defining a layer that has parameters:

class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units)) 
        self.bias = nn.Parameter(torch.randn(units,))
    def forward(self, X):
        linear = torch.matmul(X, self.weight.data) + self.bias.data 
        return F.relu(linear)

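Note that MyLinear's constructor takes two extra arguments, in_units and units, which give the number of inputs and outputs.

Instantiate the layer, then look at its weight and run a forward pass: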

linear = MyLinear(5, 3) 
linear.weight,linear(torch.rand(2, 5))

Output:

(Parameter containing:
 tensor([[-0.2937,  0.7069,  0.3479],
         [-0.6273,  0.1089, -0.2632],
         [ 0.3398,  0.8632, -0.8374],
         [-0.2506,  0.4233,  0.2730],
         [-0.4176, -0.6453,  1.7846]], requires_grad=True),
 tensor([[0.0000, 0.3588, 0.0000],
         [0.0000, 1.3026, 0.0000]]))

It still runs correctly, and you can try instantiating it with different arguments.
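One caveat worth flagging: forward in MyLinear reads self.weight.data and self.bias.data. .data returns a tensor detached from autograd, so gradients never reach weight and bias, which is also why the printed output above has no grad_fn. For training, the parameters should be used directly. A sketch comparing both (the detach flag is a hypothetical addition of mine, for illustration only):

```python
import torch
from torch import nn
from torch.nn import functional as F

class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))

    def forward(self, X, detach=False):
        # detach=True mimics the .data version: autograd cannot see the parameters
        w = self.weight.data if detach else self.weight
        b = self.bias.data if detach else self.bias
        return F.relu(torch.matmul(X, w) + b)

torch.manual_seed(0)
linear = MyLinear(5, 3)
X = torch.rand(2, 5)

out_detached = linear(X, detach=True)
print(out_detached.requires_grad)  # False: there is nothing to backpropagate into

out = linear(X)
out.sum().backward()
print(linear.weight.grad.shape)    # torch.Size([5, 3]): gradients now reach the weight
```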

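IV. Reading and Writing Files

This section is somewhat similar to the DataFrame material in the preliminaries chapter: both are about reading and writing files, so I won't go into much detail.

1. Loading and Saving Tensors

Saving and loading a single tensor: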

x = torch.arange(4) 
torch.save(x, 'x-file')
x2 = torch.load('x-file')
x2

Output:

tensor([0, 1, 2, 3])

Saving and loading a list of tensors:

y = torch.zeros(4) 
torch.save([x, y],'x-files') 
x2, y2 = torch.load('x-files') 
(x2, y2)

Output:

(tensor([0, 1, 2, 3]), tensor([0., 0., 0., 0.]))

Saving and loading a dictionary of tensors:

mydict = {'x': x, 'y': y} 
torch.save(mydict, 'mydict') 
mydict2 = torch.load('mydict') 
mydict2

Output:

{'x': tensor([0, 1, 2, 3]), 'y': tensor([0., 0., 0., 0.])}
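torch.save and torch.load also accept file-like objects, so a round trip can be checked without creating files on disk. A sketch using an in-memory io.BytesIO buffer (my choice, so the example leaves nothing behind):

```python
import io
import torch

x = torch.arange(4)
y = torch.zeros(4)

buffer = io.BytesIO()               # in-memory stand-in for a file such as 'mydict'
torch.save({'x': x, 'y': y}, buffer)
buffer.seek(0)                      # rewind before reading back
loaded = torch.load(buffer)

print(loaded)
print(torch.equal(loaded['x'], x) and torch.equal(loaded['y'], y))  # True
```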

2. Loading and Saving Model Parameters

First, a simple network model:

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256) 
        self.output = nn.Linear(256, 10)
    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))
net = MLP()

Next, save its parameters, load them into a new instance, and print it:

X = torch.randn(size=(2, 20)) 
Y = net(X)
torch.save(net.state_dict(), 'mlp.params')
clone = MLP() 
clone.load_state_dict(torch.load('mlp.params')) 
clone.eval()

Output:

MLP(
  (hidden): Linear(in_features=20, out_features=256, bias=True)
  (output): Linear(in_features=256, out_features=10, bias=True)
)
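Printing clone only shows the architecture. To confirm that the parameters were really restored, a sketch that compares the clone's output with the original's on the same input (an in-memory buffer replaces the 'mlp.params' file here). Note that only the parameters are stored, so the MLP class must be defined in code before loading; clone.eval() switches to evaluation mode, which changes nothing for this MLP because it has no dropout or batch normalization.

```python
import io
import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.output = nn.Linear(256, 10)

    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))

net = MLP()
X = torch.randn(size=(2, 20))
Y = net(X)

buffer = io.BytesIO()  # in-memory stand-in for the 'mlp.params' file
torch.save(net.state_dict(), buffer)
buffer.seek(0)

clone = MLP()  # the architecture is rebuilt from code; only parameters were saved
clone.load_state_dict(torch.load(buffer))
clone.eval()

Y_clone = clone(X)
print(torch.equal(Y_clone, Y))  # True: same parameters and same input give the same output
```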

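The book also has a section on GPUs, but it only gives some simple example statements for storing tensors on a GPU and does not explain how to use a GPU for training to save time. I didn't find it especially useful, so I don't cover it here.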

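V. Summary

This session covered the computational building blocks of deep learning. The chapter is light on code and introduces no new models or datasets, so it feels a bit like preliminary material. Even so, working through it gave me a deeper and much clearer understanding of how network models are put together.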
