Dimension Analysis of torch.nn.Linear

Question: why does nn.Linear take only a single input feature size and a single output feature size as parameters, while the tensor passed into it can be multi-dimensional?
1. The principle behind nn.Linear:
As the name suggests, nn.Linear applies a linear transformation; its prototype is the linear function from elementary math, y = kx + b. In deep learning, however, the variables are multi-dimensional tensors, so the multiplication becomes matrix multiplication and the addition becomes matrix addition (with broadcasting).
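As a minimal sketch of what this means in tensor terms (the shapes and names x, W, b below are arbitrary, chosen only for illustration), the plain 2-D case looks like this:

import torch

x = torch.randn(4, 3)   # input: 4 samples, 3 features each
W = torch.randn(2, 3)   # weight: (out_features, in_features)
b = torch.randn(2)      # bias: (out_features,)

y = x @ W.T + b         # y = xA^T + b
print(y.shape)          # torch.Size([4, 2])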
2. The source code of nn.Linear

import math

import torch
from torch import Tensor
from torch.nn import functional as F
from torch.nn import init
from torch.nn.modules.module import Module
from torch.nn.parameter import Parameter


class Linear(Module):
    r"""Applies a linear transformation to the incoming data: :math:`y = xA^T + b`

    This module supports :ref:`TensorFloat32<tf32_on_ampere>`.

    Args:
        in_features: size of each input sample
        out_features: size of each output sample
        bias: If set to ``False``, the layer will not learn an additive bias.
            Default: ``True``

    Shape:
        - Input: :math:`(*, H_{in})` where :math:`*` means any number of
          dimensions including none and :math:`H_{in} = \text{in\_features}`.
        - Output: :math:`(*, H_{out})` where all but the last dimension
          are the same shape as the input and :math:`H_{out} = \text{out\_features}`.

    Attributes:
        weight: the learnable weights of the module of shape
            :math:`(\text{out\_features}, \text{in\_features})`. The values are
            initialized from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})`, where
            :math:`k = \frac{1}{\text{in\_features}}`
        bias:   the learnable bias of the module of shape :math:`(\text{out\_features})`.
                If :attr:`bias` is ``True``, the values are initialized from
                :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
                :math:`k = \frac{1}{\text{in\_features}}`

    Examples::

        >>> m = nn.Linear(20, 30)
        >>> input = torch.randn(128, 20)
        >>> output = m(input)
        >>> print(output.size())
        torch.Size([128, 30])
    """
    __constants__ = ['in_features', 'out_features']
    in_features: int
    out_features: int
    weight: Tensor

    def __init__(self, in_features: int, out_features: int, bias: bool = True,
                 device=None, dtype=None) -> None:
        factory_kwargs = {'device': device, 'dtype': dtype}
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
        if bias:
            self.bias = Parameter(torch.empty(out_features, **factory_kwargs))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self) -> None:
        # Setting a=sqrt(5) in kaiming_uniform is the same as initializing with
        # uniform(-1/sqrt(in_features), 1/sqrt(in_features)). For details, see
        # https://github.com/pytorch/pytorch/issues/57109
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
            init.uniform_(self.bias, -bound, bound)

    def forward(self, input: Tensor) -> Tensor:
        return F.linear(input, self.weight, self.bias)

    def extra_repr(self) -> str:
        return 'in_features={}, out_features={}, bias={}'.format(
            self.in_features, self.out_features, self.bias is not None
        )

From the __init__ method we can see the shapes of the weight and the bias:
weight = (out_features, in_features)
bias = (out_features,)
That is, the weight is a two-dimensional matrix, while the bias is a one-dimensional vector.
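This is easy to verify by inspecting a layer's parameters directly (the layer sizes below are arbitrary):

import torch.nn as nn

m = nn.Linear(128, 64)
print(m.weight.shape)  # torch.Size([64, 128])  -> (out_features, in_features)
print(m.bias.shape)    # torch.Size([64])       -> (out_features,)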

Here is an example to walk through the computation:

m = nn.Linear(128, 64)
input = torch.randn(512, 3, 128, 128)
output = m(input)
print(output.size())  # torch.Size([512, 3, 128, 64])

Here the input tensor input has shape 512×3×128×128, so the weight matrix is 64×128. Multiplying input by weight is thus a case of a high-dimensional tensor times a 2-D matrix. The first line of the docstring in the source shows that the operation implemented is y = xA^T + b. The transpose A^T is 128×64, so the computation takes each 2-D sub-matrix formed by the last two dimensions of input (128×128) and multiplies it by the transposed weight matrix, giving a 128×64 result for each; stacking these results back in their original order yields a four-dimensional 512×3×128×64 tensor. Then the bias b is added. The bias is one-dimensional with length 64, so this is a case of adding a multi-dimensional tensor to a 1-D vector; an example of such an addition is shown below.
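A minimal sketch of this broadcast addition (the 2×3 matrix and the vector values are arbitrary):

import torch

a = torch.arange(6).reshape(2, 3)   # 2x3 matrix: [[0, 1, 2], [3, 4, 5]]
v = torch.tensor([10, 20, 30])      # 1-D vector of length 3

print(a + v)                        # v is broadcast and added to each row:
# tensor([[10, 21, 32],
#         [13, 24, 35]])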
So when a high-dimensional tensor is added to a 1-D vector, the vector is added once per slice of the tensor's last dimension: for a matrix with 2 rows and 3 columns, a vector of length 3 is simply added to each row, and the same rule carries over to higher dimensions. Returning to the example above: adding the length-64 bias to the 512×3×128×64 tensor leaves its shape unchanged at 512×3×128×64.
**3. Summary:** the input to nn.Linear can have any number of dimensions: 2-D, 3-D, even n-D inputs all work. The output matches the input in every dimension except the last, which becomes out_features; the sizes of all other dimensions pass through unchanged. For example, a tensor of shape [1, 2, 5] passed through nn.Linear(5, 18) produces an output of shape [1, 2, 18]. Internally, this relies on multiplying a high-dimensional tensor by a 2-D matrix and on broadcast-adding a 1-D vector to a high-dimensional tensor.
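A quick check of this summary (shapes taken from the example above); the reshape-based comparison is a sketch of the underlying math, not the library's actual implementation:

import torch
import torch.nn as nn

m = nn.Linear(5, 18)
x = torch.randn(1, 2, 5)
y = m(x)
print(y.shape)  # torch.Size([1, 2, 18])

# Equivalent view: flatten all leading dimensions, apply the 2-D
# linear transform, then restore the leading dimensions.
y2 = (x.reshape(-1, 5) @ m.weight.T + m.bias).reshape(1, 2, 18)
print(torch.allclose(y, y2))  # True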
