Dive into Deep Learning 5.4 Custom Layers: Notes & Exercises (PyTorch)

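Article link: https://blog.csdn.net/scdifsn/article/details/139932002

The following are study notes that supplement Mu Li's course and textbook, together with some thoughts on the end-of-section exercises. They are kept for my own review and shared with fellow learners for discussion and reference.

Textbook section: 5.4. Custom Layers - Dive into Deep Learning 2.0.0 documentation (d2l.ai)

Open-source code for this section: ...>d2l-zh>pytorch>chapter_multilayer-perceptrons>custom-layer.ipynb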

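Custom Layers

One factor behind the success of deep learning is the flexibility of neural networks: we can combine different layers in creative ways to design architectures suited to a wide range of tasks. For example, researchers have invented layers specifically for handling images, text, and sequential data, and for performing dynamic programming. Sometimes we encounter, or have to invent, a layer that does not yet exist in the deep learning framework. In those cases we must build a custom layer. This section shows how.

Layers without Parameters

First, we construct a custom layer that has no parameters of its own. If you recall the introduction to blocks in Section 5.1, this should look familiar. The CenteredLayer class below subtracts the mean (taken over all elements of the input) from its input. To build it, we only need to inherit from the base layer class and implement the forward propagation function.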

import torch
import torch.nn.functional as F
from torch import nn


class CenteredLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, X):
        return X - X.mean()

Let us feed some data through the layer to verify that it works as expected.

layer = CenteredLayer()
layer(torch.FloatTensor([1, 2, 3, 4, 5]))

Output:

tensor([-2., -1., 0., 1., 2.])

Now we can incorporate the layer as a component in more complex models.

net = nn.Sequential(nn.Linear(8, 128), CenteredLayer())

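As an extra sanity check, we can send random data through the network and check that the mean of the output is 0. Because we are working with floating-point numbers, limited precision means we may still see a very small nonzero value.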

Y = net(torch.rand(4, 8))
Y.mean()

Output:

tensor(-3.7253e-09, grad_fn=<MeanBackward0>)
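CenteredLayer subtracts a single scalar mean computed over every element, including the batch dimension. As a minimal sketch of how the same pattern could be adapted, the hypothetical `DimCenteredLayer` below (the name and the `dim` argument are not in the textbook) centers along one chosen dimension instead:

```python
import torch
from torch import nn


class DimCenteredLayer(nn.Module):
    """Hypothetical variant of CenteredLayer: subtracts the mean along one dimension."""

    def __init__(self, dim=-1):
        super().__init__()
        self.dim = dim

    def forward(self, X):
        # keepdim=True keeps the reduced axis so the subtraction broadcasts
        return X - X.mean(dim=self.dim, keepdim=True)


layer = DimCenteredLayer(dim=-1)
Y = layer(torch.rand(4, 8))
# Each row should now have a mean close to zero (up to floating-point error)
row_means = Y.mean(dim=-1)
print(torch.allclose(row_means, torch.zeros(4), atol=1e-6))
```

Whether to center globally or per sample depends on the model; the textbook layer uses the global mean.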

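Layers with Parameters

Now that we know how to define simple layers, let us move on to layers with parameters that can be adjusted through training. We can use built-in functions to create parameters; these provide basic housekeeping such as managing access, initialization, sharing, saving, and loading of model parameters. One benefit is that we do not need to write a custom serialization routine for every custom layer.

Now let us implement our own version of the fully connected layer. Recall that this layer needs two parameters, one for the weight and one for the bias. In this implementation we use the rectified linear unit (ReLU) as the activation function. The layer takes two arguments, in_units and units, which denote the number of inputs and the number of outputs, respectively.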

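class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        # nn.Parameter registers the tensors as parameters of this module
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))
    def forward(self, X):
        # Note: .data reads the raw tensors, so this computation is not
        # recorded by autograd and the output carries no grad_fn
        linear = torch.matmul(X, self.weight.data) + self.bias.data
        return F.relu(linear)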

Next, we instantiate the MyLinear class and access its model parameters.

linear = MyLinear(5, 3)
print(linear.weight)
print(linear.bias)

Output:

Parameter containing:
tensor([[ 0.6946,  0.7929,  0.9740],
        [-0.1089,  0.7256,  0.1643],
        [-0.0668,  0.5950, -1.5859],
        [ 0.6931, -0.1791, -0.7406],
        [-1.8750,  0.6077, -1.2801]], requires_grad=True)
Parameter containing:
tensor([-0.3876, -0.1714, -1.2957], requires_grad=True)
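To illustrate the parameter management mentioned earlier, here is a minimal sketch showing that parameters created with `nn.Parameter` are picked up by `named_parameters()` and `state_dict()` without any custom serialization code. The in-memory buffer and the `clone` instance are assumptions made purely for illustration:

```python
import io

import torch
import torch.nn.functional as F
from torch import nn


class MyLinear(nn.Module):
    # Same layer as in the text, repeated so this snippet runs on its own
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))

    def forward(self, X):
        linear = torch.matmul(X, self.weight.data) + self.bias.data
        return F.relu(linear)


layer = MyLinear(5, 3)
# nn.Parameter attributes are registered automatically, in definition order
names = [name for name, _ in layer.named_parameters()]
print(names)  # ['weight', 'bias']

# Round-trip the parameters through an in-memory buffer
buffer = io.BytesIO()
torch.save(layer.state_dict(), buffer)
buffer.seek(0)
clone = MyLinear(5, 3)
clone.load_state_dict(torch.load(buffer))
print(torch.equal(clone.weight, layer.weight))  # True
```

In practice the state dict would usually be written to a file rather than a buffer; the buffer just keeps the example free of side effects.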

We can carry out forward propagation directly with the custom layer.

linear(torch.rand(2, 5))

Output:

tensor([[0.0000, 1.3590, 0.0000],
        [0.1137, 1.2107, 0.0000]])

We can also build models out of custom layers, using them just like the built-in fully connected layer.

net = nn.Sequential(MyLinear(64, 8), MyLinear(8, 1))
net(torch.rand(2, 64))

Output:

tensor([[3.7411],
        [9.1770]])
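One caveat: the MyLinear forward pass reads `self.weight.data` and `self.bias.data`, so the outputs above carry no `grad_fn` and gradients would not reach the parameters during training. The sketch below is a hypothetical variant (the class name `MyTrainableLinear` is not from the textbook) that uses the parameters directly so that autograd tracks them:

```python
import torch
import torch.nn.functional as F
from torch import nn


class MyTrainableLinear(nn.Module):
    """Hypothetical variant of MyLinear that keeps the parameters in the autograd graph."""

    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))

    def forward(self, X):
        # Use the parameters directly (no .data) so gradients can flow back
        return F.relu(torch.matmul(X, self.weight) + self.bias)


layer = MyTrainableLinear(5, 3)
out = layer(torch.rand(2, 5))
out.sum().backward()
print(out.requires_grad)        # True
print(layer.weight.grad.shape)  # torch.Size([5, 3])
```

After `backward()`, both `weight.grad` and `bias.grad` are populated, which is what an optimizer needs in order to update the layer.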

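Summary

We can design custom layers via the basic layer class. This lets us define flexible new layers that behave differently from any existing layer in the deep learning framework.
Once defined, a custom layer can be invoked in arbitrary contexts and network architectures.
Layers can have local parameters, which can be created through built-in functions.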

Exercises

1. Design a layer that takes an input and computes a tensor reduction, i.e., it returns $y_k = \sum_{i, j} W_{ijk} x_i x_j$.

Solution:
The code is as follows:

# Loop over dimensions + matrix multiplication
class Linear5_4_1(nn.Module):
    def __init__(self, in_units, out_units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, in_units, out_units))
    def forward(self, X):
        b = X.shape[0] # b = batch_size
        o = self.weight.shape[2] # o = out_units
        y = torch.zeros(b,o) # y has shape (batch_size, out_units)
        for k in range(o):
            for i in range(b):
                # Shapes in the product for the example below: (1*4), (4*4), (4*1)
                y[i,k] = torch.matmul(torch.matmul(X[i,:],self.weight[:,:,k]),X[i,:].unsqueeze(1))
        return y
X = torch.randn(2, 4)
linear = Linear5_4_1(4, 2)

print(linear.weight.shape)
print(linear(X))

Output:

torch.Size([4, 4, 2])
tensor([[-1.3495,  0.6424],
        [-6.5971, -5.1928]], grad_fn=<CopySlices>)

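# Alternatively, simplify with einsum
# (the weight is stored as (out_units, in_units, in_units) here and is
# re-initialized at random, so the values differ from the loop version)
class Linear5_4_1_simplify(nn.Module):
    def __init__(self, in_units, out_units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_units, in_units, in_units))

    def forward(self, X):
        X1 = X.unsqueeze(1)  # X1.shape = (b, 1, i)
        X2 = X.unsqueeze(2)  # X2.shape = (b, i, 1)
        # Subscripts: b = batch, o = out_units, j and k = input units,
        # i and l = the singleton dimensions added above
        y = torch.einsum('bij,ojk,bkl->bo', X1, self.weight, X2)
        return y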
linear = Linear5_4_1_simplify(4, 2)

print(linear.weight.shape)
print(linear(X))

Output:

torch.Size([2, 4, 4])
tensor([[ 5.0040, -1.2855],
        [ 3.5123,  1.1651]], grad_fn=<ViewBackward0>)
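Because the two versions above use different weight layouts and fresh random weights, their printed values cannot be compared directly. The following sketch, under the assumption that the weight keeps the $W_{ijk}$ layout from the exercise, writes the reduction as a single einsum without the extra singleton dimensions and checks it against explicit loops. The class name `BilinearReduce` is hypothetical:

```python
import torch
from torch import nn


class BilinearReduce(nn.Module):
    """Hypothetical name: computes y_k = sum_{i,j} W_ijk * x_i * x_j per sample."""

    def __init__(self, in_units, out_units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, in_units, out_units))

    def forward(self, X):
        # b: batch, i and j: input features, k: output features
        return torch.einsum('bi,ijk,bj->bk', X, self.weight, X)


torch.manual_seed(0)
X = torch.randn(2, 4)
layer = BilinearReduce(4, 2)

# Reference: explicit loops over the batch and output indices
ref = torch.zeros(2, 2)
for b in range(2):
    for k in range(2):
        ref[b, k] = X[b] @ layer.weight[:, :, k].detach() @ X[b]

print(torch.allclose(layer(X), ref, atol=1e-4))
```

The einsum form avoids the Python loops, so it should scale better with batch size and output width.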

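2. Design a layer that returns the leading half of the Fourier coefficients of the data.

Solution:
Fourier series: $f(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty}[a_n\cos(n\omega t)+b_n\sin(n\omega t)]$

Reading "the leading half" as the leading (constant) term of the series, this is $\frac{a_0}{2} = \frac{1}{T} \int_0^T f(t)\,dt$, i.e., the mean of $f$ over one period $T$.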
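For sampled data, the link between this constant term and the discrete Fourier transform can be sketched as follows, assuming $N$ uniformly spaced samples $x_0, \dots, x_{N-1}$ covering one period (the symbols $x_n$, $X_k$ and $N$ are introduced here for illustration):

$$X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i k n / N} \quad\Rightarrow\quad X_0 = \sum_{n=0}^{N-1} x_n$$

$$\frac{a_0}{2} = \frac{1}{T} \int_0^T f(t)\,dt \approx \frac{1}{N} \sum_{n=0}^{N-1} x_n = \frac{X_0}{N}$$

So the constant term is the mean of the samples, or equivalently the DC coefficient of the FFT divided by the number of samples $N$.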

The code is as follows:

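# Compute directly from the formula above
class FourierFrontHalf_1(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim # dimension along which to compute
    def forward(self, X):
        # The constant term a_0/2 is the mean along the chosen dimension
        return X.mean(dim = self.dim)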
X = torch.randn(2,2)
X_fft = FourierFrontHalf_1(-1)

print(X_fft(X))

Output:

tensor([-0.5247, 0.1935])

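# Alternatively, compute the FFT and take the real part of the DC component
import torch.fft

class FourierFrontHalf_2(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim # dimension along which to compute
    def forward(self, X):
        X_fft = torch.fft.fft(X, dim=self.dim)
        # The element at index 0 along self.dim is the DC component, i.e. the
        # sum of the N samples; dividing by N = X.shape[self.dim] gives the mean
        return X_fft.select(self.dim, 0).real / X.shape[self.dim]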
X_fft = FourierFrontHalf_2(-1)

print(X_fft(X))

Output:

tensor([-0.5247, 0.1935])
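The exercise can also be read more literally, as asking for the first half of the FFT coefficients rather than only the constant term. The sketch below follows that alternative reading; the class name `FourierFirstHalf` and the choice to keep the first `n // 2` coefficients are assumptions, not part of the textbook or the solution above:

```python
import torch
from torch import nn


class FourierFirstHalf(nn.Module):
    """Hypothetical layer: returns the first half of the FFT coefficients along dim."""

    def __init__(self, dim=-1):
        super().__init__()
        self.dim = dim

    def forward(self, X):
        dim = self.dim % X.dim()  # normalize a negative dim
        n = X.shape[dim]
        coeffs = torch.fft.fft(X, dim=dim)
        # Keep the first n // 2 coefficients along the chosen dimension
        return coeffs.narrow(dim, 0, n // 2)


X = torch.randn(2, 8)
half = FourierFirstHalf(dim=-1)(X)
print(half.shape)  # torch.Size([2, 4])
# The DC coefficient divided by the length equals the mean along that dimension
print(torch.allclose(half[:, 0].real / 8, X.mean(dim=-1), atol=1e-6))
```

The output is complex-valued; its element at index 0, divided by the length, reproduces the mean returned by the two layers above.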