Understanding the dimensions of torch.nn.Linear

泡沫不是茶香

Last modified 2022-05-26 23:12:32

Tags: deep learning, python

First published 2022-05-26 23:12:08

Article link: https://blog.csdn.net/qq_44076218/article/details/124993303

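Question: why does Linear take only a single input size and a single output size as parameters, while the input tensor itself can be multi-dimensional?
1. How nn.Linear works:
As the name suggests, nn.Linear is a linear transformation. Its prototype is the linear function from elementary math: y = kx + b
In deep learning, however, the variables are multi-dimensional tensors, so the multiplication is a matrix multiplication and the addition is a tensor addition.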
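To make the formula concrete, here is a minimal sketch that computes y = xA^T + b by hand from a layer's own `weight` and `bias` and compares it with the layer's output. The sizes (4 input features, 3 output features, 2 samples) and the variable names are chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(4, 3)      # in_features=4, out_features=3
x = torch.randn(2, 4)        # two samples with four features each

# Manual version of y = x A^T + b, using the layer's own parameters.
manual = x @ layer.weight.T + layer.bias
builtin = layer(x)

matches = torch.allclose(manual, builtin, atol=1e-6)
print(builtin.shape)   # torch.Size([2, 3])
print(matches)         # True
```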
2. The nn.Linear source code

class Linear(Module):
    r"""Applies a linear transformation to the incoming data: :math:`y = xA^T + b`

    This module supports :ref:`TensorFloat32<tf32_on_ampere>`.

    Args:
        in_features: size of each input sample
        out_features: size of each output sample
        bias: If set to ``False``, the layer will not learn an additive bias.
            Default: ``True``

    Shape:
        - Input: :math:`(*, H_{in})` where :math:`*` means any number of
          dimensions including none and :math:`H_{in} = \text{in\_features}`.
        - Output: :math:`(*, H_{out})` where all but the last dimension
          are the same shape as the input and :math:`H_{out} = \text{out\_features}`.

    Attributes:
        weight: the learnable weights of the module of shape
            :math:`(\text{out\_features}, \text{in\_features})`. The values are
            initialized from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})`, where
            :math:`k = \frac{1}{\text{in\_features}}`
        bias:   the learnable bias of the module of shape :math:`(\text{out\_features})`.
                If :attr:`bias` is ``True``, the values are initialized from
                :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
                :math:`k = \frac{1}{\text{in\_features}}`

    Examples::

        >>> m = nn.Linear(20, 30)
        >>> input = torch.randn(128, 20)
        >>> output = m(input)
        >>> print(output.size())
        torch.Size([128, 30])
    """
    __constants__ = ['in_features', 'out_features']
    in_features: int
    out_features: int
    weight: Tensor

    def __init__(self, in_features: int, out_features: int, bias: bool = True,
                 device=None, dtype=None) -> None:
        factory_kwargs = {'device': device, 'dtype': dtype}
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
        if bias:
            self.bias = Parameter(torch.empty(out_features, **factory_kwargs))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self) -> None:
        # Setting a=sqrt(5) in kaiming_uniform is the same as initializing with
        # uniform(-1/sqrt(in_features), 1/sqrt(in_features)). For details, see
        # https://github.com/pytorch/pytorch/issues/57109
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
            init.uniform_(self.bias, -bound, bound)

    def forward(self, input: Tensor) -> Tensor:
        return F.linear(input, self.weight, self.bias)

    def extra_repr(self) -> str:
        return 'in_features={}, out_features={}, bias={}'.format(
            self.in_features, self.out_features, self.bias is not None
        )

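The __init__ function shows the shapes of the weight matrix and the bias:
weight = (out_features, in_features)
bias = (out_features,)
In other words, the weight is two-dimensional and the bias is one-dimensional.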
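These shapes can be checked directly on a constructed layer. The snippet below is a small sketch; the names `layer` and `no_bias` are illustrative only. It also shows that with `bias=False` the `bias` attribute is `None`, which matches the `register_parameter('bias', None)` branch in the source.

```python
import torch.nn as nn

layer = nn.Linear(128, 64)
weight_shape = tuple(layer.weight.shape)   # (out_features, in_features)
bias_shape = tuple(layer.bias.shape)       # (out_features,)
print(weight_shape)   # (64, 128)
print(bias_shape)     # (64,)

# Without a bias term, the attribute exists but holds None.
no_bias = nn.Linear(128, 64, bias=False)
print(no_bias.bias)   # None
```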

Here is an example that illustrates the computation:

import torch
import torch.nn as nn

m = nn.Linear(128, 64)
input = torch.randn(512, 3, 128, 128)
output = m(input)
print(output.size())  # torch.Size([512, 3, 128, 64])

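Here the input tensor has shape 512×3×128×128 and the weight matrix has shape 64×128, so input × weight is a product of a high-dimensional tensor and a lower-dimensional matrix. The first line of the docstring in the source gives the operation as y = xA^T + b, and the transpose of A has shape 128×64. Each 128×128 two-dimensional sub-matrix formed by the last two dimensions of input is therefore multiplied by the 128×64 transposed weight, which gives a 128×64 result. These results are stacked back in their original order into a four-dimensional tensor of shape 512×3×128×64.
Finally the bias b is added. The bias is one-dimensional with length 64, so this is an addition of a multi-dimensional tensor and a one-dimensional tensor.
[Figure: example of adding a multi-dimensional tensor and a one-dimensional tensor]
When a high-dimensional tensor and a one-dimensional tensor are added, the one-dimensional tensor is added along the last dimension of the high-dimensional one (PyTorch calls this broadcasting). With 2 rows and 3 columns, the one-dimensional tensor only needs length 3 and is added to each row in turn. Higher dimensions work the same way. Back to the example above: adding the length-64 bias to the 512×3×128×64 result leaves the shape at 512×3×128×64.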
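Both steps can be reproduced in a few lines. The following is a sketch under assumed sizes: the 2×3 values are made up for illustration, and the 4-D case uses a scaled-down shape (5×3×8×8 with `nn.Linear(8, 4)`) instead of 512×3×128×128 so that it runs quickly. `torch.matmul` applies the 2-D transposed weight to every sub-matrix formed by the last two dimensions, and `+` adds the 1-D bias along the last dimension.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Adding a 1-D tensor of length 3 to a 2x3 tensor: it is added to each row.
a = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
b = torch.tensor([10.0, 20.0, 30.0])
summed = a + b
print(summed)   # tensor([[11., 22., 33.], [14., 25., 36.]])

# Scaled-down version of the 4-D example.
layer = nn.Linear(8, 4)
x = torch.randn(5, 3, 8, 8)

# Each 8x8 sub-matrix of x is multiplied by the 8x4 transposed weight,
# then the length-4 bias is added along the last dimension.
manual = torch.matmul(x, layer.weight.T) + layer.bias
builtin = layer(x)

print(manual.shape)                                  # torch.Size([5, 3, 8, 4])
print(torch.allclose(manual, builtin, atol=1e-5))    # True
```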
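3. Summary:
The input and output of Linear can have any number of dimensions: two, three, or n. Only the last dimension of the input has to equal in_features. The output shape matches the input shape in every dimension except the last, which becomes out_features. For example, a tensor of shape [1, 2, 5] passed through the linear layer nn.Linear(5, 18) produces an output of shape [1, 2, 18]. Internally this relies on multiplying a high-dimensional tensor by a lower-dimensional matrix and on adding a lower-dimensional tensor to a high-dimensional one.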
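The [1, 2, 5] example can be run as a quick check. The sketch below also illustrates two related points, with names chosen only for illustration: flattening the leading dimensions, applying the layer, and reshaping back gives the same result, and an input whose last dimension differs from `in_features` is rejected with a `RuntimeError`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(5, 18)
x = torch.randn(1, 2, 5)
out = layer(x)
print(out.shape)   # torch.Size([1, 2, 18])

# Equivalent view: treat all leading dimensions as one batch dimension.
flat = layer(x.reshape(-1, 5)).reshape(1, 2, 18)
same = torch.allclose(out, flat, atol=1e-6)
print(same)        # True

# The last dimension must equal in_features (5 here); 6 does not fit.
try:
    layer(torch.randn(1, 2, 6))
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True
print(mismatch_raised)   # True
```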