A Detailed Guide to torch.dot(), torch.outer(), torch.mul(), torch.mm(), torch.matmul(), and Related Functions

Author: 奋进的LY

Last modified: 2024-08-06 14:55:12

First published: 2023-03-24 20:02:39

Article link: https://blog.csdn.net/li1784506/article/details/129756233

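Summary: This article introduces several key matrix and vector functions in PyTorch: torch.matmul(), which multiplies vectors and matrices and supports many combinations of dimensions; torch.mul(), which performs element-wise multiplication; torch.mm() and torch.mv(), which handle matrix-matrix and matrix-vector multiplication respectively; and torch.bmm(), which performs batched matrix multiplication on 3-D tensors. These functions play an important role in deep learning and numerical computing.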

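Introduction

Multiplication between tensors in torch falls into vector multiplication and matrix multiplication. Vector multiplication is divided into the inner product and the outer product; matrix multiplication is divided into element-wise multiplication (the Hadamard product) and conventional matrix multiplication (where the number of columns of the first matrix equals the number of rows of the second). Beginners easily confuse and misuse these operations. Drawing on my own practical experience, this article describes the commonly used PyTorch functions torch.dot(), torch.outer(), torch.mul(), torch.mm(), and torch.matmul(), with examples.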

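1. Vector Operations

1.1 Inner Product

The inner product of two vectors multiplies corresponding elements and sums the products; the result is a scalar. For n-dimensional vectors $\vec{a} = (a_{1},a_{2},\dots,a_{n}) \in \mathbb{R}^{n}$ and $\vec{b} = (b_{1},b_{2},\dots,b_{n})\in \mathbb{R}^{n}$, $\vec{a}\cdot \vec{b} = a_{1}b_{1}+a_{2}b_{2}+\dots+a_{n}b_{n}$. In the literature the inner product is usually written $a^{T}b$, where $a^{T}\in \mathbb{R}^{1\times n}$ and $b\in \mathbb{R}^{n\times 1}$ (unless stated otherwise, a vector means a column vector).

In torch, use torch.dot() or torch.matmul() (the matmul section covers the vector inner product in more detail). Example code: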

import torch

x = torch.tensor([2,3])
y = torch.tensor([2,2])

out = torch.dot(x,y)    # out = tensor(10)

For NumPy arrays, call numpy.inner(x,y) to compute the inner product of two vectors.
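As a quick cross-check of the definition above, the following sketch computes the same inner product four ways. The variable names are illustrative only, and the explicit row-times-column form is included just to mirror the $a^{T}b$ notation; it returns a 1×1 matrix rather than a scalar.

```python
import torch

x = torch.tensor([2, 3])
y = torch.tensor([2, 2])

dot_result = torch.dot(x, y)          # dedicated 1-D inner product
matmul_result = torch.matmul(x, y)    # matmul on two 1-D tensors
sum_result = (x * y).sum()            # multiply element-wise, then sum

# a^T b written as an explicit (1 x n) by (n x 1) matrix product.
# The result keeps both matrix dimensions, so its shape is (1, 1).
row_times_col = x.view(1, -1) @ y.view(-1, 1)

print(dot_result, matmul_result, sum_result, row_times_col)
```

The first three results are 0-dimensional tensors; only the last one retains matrix shape.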

1.2 Outer Product

The outer product of two vectors multiplies every element of one vector by every element of the other; the result is a matrix rather than a scalar. For an m-dimensional vector $\vec{a} = (a_{1},a_{2},\dots,a_{m}) \in \mathbb{R}^{m}$ and an n-dimensional vector $\vec{b} = (b_{1},b_{2},\dots,b_{n})\in \mathbb{R}^{n}$, $\vec{a}\otimes\vec{b} \in \mathbb{R}^{m\times n}$. In English-language literature the outer product is written $ab^{T}$, where $a\in \mathbb{R}^{m\times 1}$ and $b^{T}\in \mathbb{R}^{1\times n}$.

$\vec{a}\otimes\vec{b} = \begin{bmatrix} a_{1}b_{1} & a_{1}b_{2} & \cdots & a_{1}b_{n} \\ a_{2}b_{1} & a_{2}b_{2} & \cdots & a_{2}b_{n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m}b_{1} & a_{m}b_{2} & \cdots & a_{m}b_{n} \end{bmatrix}$

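Computationally, the outer product of two vectors reduces to a Hadamard (element-wise) product: the column vector a (shape m×1) and the row vector $b^{T}$ (shape 1×n) are each broadcast to an m×n matrix, and the elements at corresponding positions are then multiplied. The outer product can be computed directly with torch.outer(), or with torch.mul() (the torch.mul() section shows example code for the Hadamard product). With torch.mul(), the vectors must first be given an extra dimension so that one has shape (m,1) and the other (1,n); broadcasting then expands both to the same m×n shape.

Example code: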

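# 1. Use torch.outer directly
x = torch.tensor([2,3,4,5])
y = torch.tensor([2,2,2])

out = torch.outer(x,y)
print(out)              # shape (4,3)

# 2. Add a dimension to each vector, then compute the Hadamard product with torch.mul
x = torch.tensor([2,3,4,5])
y = torch.tensor([2,2,2])

x = x.view(-1,1)        # column vector, shape (4,1)
y = y[None,:]           # row vector, shape (1,3)
print(torch.mul(x,y))   # both operands broadcast to shape (4,3)

For NumPy arrays, call np.outer(x,y) to compute the outer product of two vectors. (Note: the two vectors may have different numbers of elements.)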
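To make the equivalences described in this section concrete, here is a small sketch that builds the same outer product with torch.outer(), with broadcasting, with the matrix form $ab^{T}$, and with torch.einsum(). The einsum variant is an addition for comparison and is not part of the original discussion.

```python
import torch

x = torch.tensor([2, 3, 4, 5])   # length m = 4
y = torch.tensor([2, 2, 2])      # length n = 3

outer_ref = torch.outer(x, y)                   # shape (4, 3)
outer_bcast = x[:, None] * y[None, :]           # (4, 1) * (1, 3) broadcasts to (4, 3)
outer_matmul = x.view(-1, 1) @ y.view(1, -1)    # a b^T as a (4, 1) by (1, 3) matrix product
outer_einsum = torch.einsum('i,j->ij', x, y)    # index notation: out[i, j] = x[i] * y[j]

print(outer_ref)
print(torch.equal(outer_ref, outer_bcast),
      torch.equal(outer_ref, outer_matmul),
      torch.equal(outer_ref, outer_einsum))
```

Swapping the arguments, torch.outer(y, x), gives the transposed 3×4 matrix, so the order of the operands matters.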

For two matrices, the corresponding operation is the Kronecker product. For $A\in \mathbb{R}^{m\times n}$ and $B\in \mathbb{R}^{p\times q}$, the Kronecker product is $A\otimes B\in \mathbb{R}^{mp\times nq}$.

Example:

Matrix $A = \begin{bmatrix} a_{1} & a_{2}\\ a_{3} & a_{4} \end{bmatrix}$, matrix $B = \begin{bmatrix} b_{1} & b_{2} & b_{3}\\ b_{4} & b_{5} & b_{6} \end{bmatrix}$,

$A\otimes B = \begin{bmatrix} a_{1}B & a_{2}B\\ a_{3}B & a_{4}B \end{bmatrix} = \left[\begin{array}{ccc|ccc} a_{1}b_{1} & a_{1}b_{2} & a_{1}b_{3} & a_{2}b_{1} & a_{2}b_{2} & a_{2}b_{3} \\ a_{1}b_{4} & a_{1}b_{5} & a_{1}b_{6} & a_{2}b_{4} & a_{2}b_{5} & a_{2}b_{6} \\ \hline a_{3}b_{1} & a_{3}b_{2} & a_{3}b_{3} & a_{4}b_{1} & a_{4}b_{2} & a_{4}b_{3} \\ a_{3}b_{4} & a_{3}b_{5} & a_{3}b_{6} & a_{4}b_{4} & a_{4}b_{5} & a_{4}b_{6} \end{array}\right]$

You can call torch.kron(x,y) directly, or write your own function. Example code:

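# Method 1: Kronecker product of two 2-D matrices with explicit loops
def kronecker(x,y):
    # The result has shape (m*p, n*q); reuse the input dtype instead of hard-coding int32
    res = torch.zeros(x.shape[0]*y.shape[0],x.shape[1]*y.shape[1],dtype=x.dtype)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Block (i, j) of the result is the scalar x[i,j] times the whole matrix y
            res[i*y.shape[0]:(i+1)*y.shape[0],j*y.shape[1]:(j+1)*y.shape[1]] = torch.mul(x[i,j][None,None],y)

    return res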


# Method 2: Kronecker product via broadcasting
def kronecker_product(a, b):
    """
    Kronecker product of matrices a and b with leading batch dimensions.
    Batch dimensions are broadcast. The number of them mush
    :type a: torch.Tensor
    :type b: torch.Tensor
    :rtype: torch.Tensor
    """
    # Batched alternative: torch.stack([torch.kron(ai, bi) for ai, bi in zip(a,b)], dim=0)

    # Shape of the last two dimensions of the result: [m*p, n*q]
    siz1 = (a.shape[-2] * b.shape[-2], a.shape[-1] * b.shape[-1])
    # a: [..., m, 1, n, 1], b: [..., 1, p, 1, q] -> product: [..., m, p, n, q]
    res = a.unsqueeze(-1).unsqueeze(-3) * b.unsqueeze(-2).unsqueeze(-4)
    siz0 = res.shape[:-4]               # broadcast batch dimensions (empty for 2-D inputs)
    out = res.reshape(*siz0, *siz1)     # merge (m, p) -> m*p and (n, q) -> n*q
    return out


# Method 3: batched Kronecker product via einsum
def kronecker_product_einsum_batched(A: torch.Tensor, B: torch.Tensor):
    """
    Batched Version of Kronecker Products
    :param A: has shape (b, a, c)
    :param B: has shape (b, k, p)
    :return: (b, ak, cp)
    """
    assert A.dim() == 3 and B.dim() == 3
    # reshape (rather than view) also works if einsum returns a non-contiguous tensor
    res = torch.einsum('bac,bkp->bakcp', A, B).reshape(A.size(0),
                                                       A.size(1)*B.size(1),
                                                       A.size(2)*B.size(2))
    return res


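# Matrices x, y
x = torch.tensor([[2,3],[4,5]])
y = torch.tensor([[1,1,1],[2,2,2]])

out_1 = kronecker(x,y)
print(out_1)
out_2 = kronecker_product(x,y)
print(out_2)
# Method 3 expects a leading batch dimension, so add one to each input
out_3 = kronecker_product_einsum_batched(x.unsqueeze(0),y.unsqueeze(0))
print(out_3)

# Output (the 4x6 Kronecker product; out_3 holds the same matrix with an extra leading batch dimension)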
tensor([[ 2,  2,  2,  3,  3,  3],
        [ 4,  4,  4,  6,  6,  6],
        [ 4,  4,  4,  5,  5,  5],
        [ 8,  8,  8, 10, 10, 10]])
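As a sanity check on the block structure shown in the formula above, the sketch below compares torch.kron() against a matrix assembled directly from the blocks $x_{ij}\,y$ with torch.cat(). The block-assembly code is only an illustration of the definition, not a recommended implementation.

```python
import torch

x = torch.tensor([[2, 3], [4, 5]])
y = torch.tensor([[1, 1, 1], [2, 2, 2]])

built_in = torch.kron(x, y)   # shape (2*2, 2*3) = (4, 6)

# Block definition: block row i is [x[i,0]*y, x[i,1]*y, ...] laid side by side,
# and the block rows are then stacked vertically.
block_rows = [
    torch.cat([x[i, j] * y for j in range(x.shape[1])], dim=1)
    for i in range(x.shape[0])
]
by_blocks = torch.cat(block_rows, dim=0)

print(built_in)
print(torch.equal(built_in, by_blocks))
```

Note that the Kronecker product is not commutative: torch.kron(y, x) has the same shape here but a different arrangement of entries.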

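unsqueeze(dim) inserts a new dimension of size 1 at index dim of the output shape. With a non-negative dim, the new dimension is placed before the existing dimension at that index. With a negative dim, the index is counted from the end of the output shape (equivalent to dim + x.dim() + 1), so the new dimension ends up after the existing dimension at that negative index.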

x = torch.randn(4,5)
print(x.unsqueeze(0).shape)    #shape:torch.Size([1, 4, 5])
print(x.unsqueeze(1).shape)    #shape:torch.Size([4, 1, 5])
print(x.unsqueeze(2).shape)    #shape:torch.Size([4, 5, 1])
print(x.unsqueeze(-1).shape)   #shape:torch.Size([4, 5, 1])
print(x.unsqueeze(-3).shape)   #shape:torch.Size([1, 4, 5])
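The index rule for unsqueeze() can be checked mechanically. The short sketch below, written under the assumption that a negative dim wraps to dim + x.dim() + 1, loops over every valid negative index for a 2-D tensor and prints the matching non-negative index and resulting shape.

```python
import torch

x = torch.randn(4, 5)

shapes = {}
for d in range(-x.dim() - 1, 0):          # valid negative indices: -3, -2, -1
    wrapped = d + x.dim() + 1             # equivalent non-negative index
    shapes[d] = (wrapped, tuple(x.unsqueeze(d).shape), tuple(x.unsqueeze(wrapped).shape))
    print(d, wrapped, shapes[d][1])

# Indexing with None is a common shorthand for the same operation
print(tuple(x[None].shape), tuple(x[:, None].shape), tuple(x[..., None].shape))
```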

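Note: in Chinese-language literature, "exterior product" is also translated as 外积 (the same word used for the outer product), but there it usually refers to the vector product from solid analytic geometry, whose result is a vector; it is also called the cross product. English-language literature strictly distinguishes the cross product from the outer product, so take particular care when reading.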
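The difference between the two products is easy to see numerically. The sketch below uses torch.linalg.cross(), which is available in recent PyTorch releases (an assumption about the installed version), next to torch.outer() on the same pair of 3-D vectors.

```python
import torch

a = torch.tensor([1., 0., 0.])
b = torch.tensor([0., 1., 0.])

cross = torch.linalg.cross(a, b)   # cross product: a vector perpendicular to a and b
outer = torch.outer(a, b)          # outer product: a 3x3 matrix with entries a[i] * b[j]

print(cross)         # tensor([0., 0., 1.])
print(outer.shape)   # torch.Size([3, 3])
```

The cross product is only defined for 3-element vectors and changes sign when the operands are swapped, whereas the outer product accepts vectors of any lengths and is transposed when they are swapped.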

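2. torch.mul(): Element-wise Matrix Multiplication

torch.mul() computes the Hadamard product, an element-wise multiplication. The " * " operator can be used in place of torch.mul(). For two matrices of the same shape, $A\in \mathbb{R}^{m\times n}$ and $B\in \mathbb{R}^{m\times n}$, the Hadamard product is $A\odot B\in \mathbb{R}^{m\times n}$. More generally, the two operands only need shapes that can be broadcast to a common shape.

Element-wise multiplication also covers the [vector × matrix] case. The length of the vector must equal the size of the matrix's last dimension; broadcasting expands the vector to the shape of the matrix, and the two are then multiplied element by element.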

x = torch.tensor([[1,1],[3,3],[4,4]])
y = torch.tensor([2,2])
out1 = torch.mul(x,y)    # equivalent to out1 = x*y
print(out1)

# Output
tensor([[2, 2],
        [6, 6],
        [8, 8]])
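The broadcasting rule described above compares shapes from the last dimension backwards. The following sketch shows a vector that matches the last dimension, a vector that matches the first dimension and therefore needs an explicit reshape, and the error raised when the shapes cannot be broadcast. The names per_column and per_row are illustrative only.

```python
import torch

x = torch.tensor([[1, 1], [3, 3], [4, 4]])   # shape (3, 2)
per_column = torch.tensor([2, 5])             # length 2 matches the last dimension
per_row = torch.tensor([10, 20, 30])          # length 3 matches the first dimension

scaled_columns = torch.mul(x, per_column)     # column j is multiplied by per_column[j]
scaled_rows = x * per_row[:, None]            # reshape to (3, 1) so row i is multiplied by per_row[i]

try:
    torch.mul(x, per_row)                     # (3, 2) against (3,): trailing sizes 2 and 3 differ
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

print(scaled_columns)
print(scaled_rows)
print(mismatch_raised)
```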

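3. torch.mm(): 2-D Matrix Multiplication

torch.mm() only multiplies 2-D matrices; passing a tensor with any other number of dimensions raises an error. The number of columns of the first matrix must equal the number of rows of the second.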

import torch

A = torch.randint(1,5,size=(2,3))
B = torch.randint(1,5,(3,2))
print('A: \n',A)
print('B: \n',B)
result = torch.mm(A,B)
print('result: \n {}'.format(result))

## Output (one possible run; the inputs are random) ##
A: 
 tensor([[2, 3, 2],
        [1, 4, 4]])
B: 
 tensor([[2, 2],
        [4, 4],
        [2, 3]])
result: 
 tensor([[20, 22],
        [26, 30]])
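The restrictions on torch.mm() can be probed directly. This sketch tries one valid call and three invalid ones, recording whether each invalid call raises RuntimeError; the dictionary keys are just descriptive labels chosen for this example.

```python
import torch

A = torch.ones(2, 3)
B = torch.ones(3, 2)
ok = torch.mm(A, B)   # (2, 3) x (3, 2) -> (2, 2)

invalid_cases = {
    "inner dimensions differ": (A, torch.ones(4, 2)),
    "3-D first argument": (torch.ones(5, 2, 3), B),
    "1-D second argument": (A, torch.ones(3)),
}

raised = {}
for name, (lhs, rhs) in invalid_cases.items():
    try:
        torch.mm(lhs, rhs)
        raised[name] = False
    except RuntimeError:
        raised[name] = True

print(ok.shape)
print(raised)
```

For the last two cases, torch.matmul() accepts the same arguments, which is the main practical difference between the two functions.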

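4. torch.matmul(): General Matrix Multiplication

torch.matmul() is a generalized matrix multiplication. It handles 1-D vector × 1-D vector, 1-D vector × 2-D matrix, 2-D matrix × 1-D vector, and products of higher-dimensional tensors. Each case is described below with example code.

4.1 1-D Vector × 1-D Vector (Inner Product)

When applied to two 1-D vectors, which must have the same length, torch.matmul() computes their inner product (the result is a scalar). This is the same as torch.dot(), which only accepts 1-D vectors.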

x = torch.tensor([2,3,4])
y = torch.tensor([2,2,2])
out1 = torch.matmul(x,y)  #out1 : tensor(18)
out2 = torch.dot(x,y)     #out2 : tensor(18)

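4.2 1-D Vector × 2-D Matrix, or 2-D Matrix × 1-D Vector

When a vector is multiplied with a matrix, the vector is first given an extra dimension to make it a 2-D matrix; after the matrix multiplication, the added dimension is removed from the result. torch.matmul() does this automatically.

1) For a vector $a\in \mathbb{R}^{m}$ times a matrix $B\in \mathbb{R}^{m\times n}$, the vector is first turned into a matrix $A\in \mathbb{R}^{1\times m}$. The shapes go (1×m)×(m×n)->(1×n); the added dimension is then removed from the $\mathbb{R}^{1\times n}$ result, leaving a 1-D vector of length n in $\mathbb{R}^{n}$.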

x = torch.tensor([2,3])
y = torch.tensor([[1,1,1],[2,2,2]])
out = torch.matmul(x,y)  #out:tensor([8, 8, 8])
print(out.shape)         #torch.Size([3])

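2) For a matrix $B\in \mathbb{R}^{m\times n}$ times a vector $a\in \mathbb{R}^{n}$, the vector a is expanded into a matrix $A\in \mathbb{R}^{n\times 1}$. The shapes go (m×n)×(n×1)->(m×1); the added dimension is then removed from the $\mathbb{R}^{m\times 1}$ result, leaving a 1-D vector in $\mathbb{R}^{m}$.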

x = torch.tensor([[3,3,3],[4,4,4]])
y = torch.tensor([2,2,2])
out = torch.matmul(x,y)  #out:tensor([18, 24])
print(out.shape)         #out.shape:torch.Size([2])
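The add-then-remove-a-dimension behaviour described in both cases can be reproduced by hand. In this sketch, torch.mm() with explicit unsqueeze() and squeeze() calls stands in for what torch.matmul() is described as doing; it illustrates the shape bookkeeping and is not a claim about the internal implementation.

```python
import torch

M = torch.tensor([[1, 1, 1], [2, 2, 2]])   # shape (2, 3)
v_left = torch.tensor([2, 3])               # length 2, used as vector x matrix
v_right = torch.tensor([2, 2, 2])           # length 3, used as matrix x vector

# vector @ matrix: (2,) -> (1, 2), multiply to get (1, 3), then drop the leading dimension
manual_left = torch.mm(v_left.unsqueeze(0), M).squeeze(0)
auto_left = torch.matmul(v_left, M)

# matrix @ vector: (3,) -> (3, 1), multiply to get (2, 1), then drop the trailing dimension
manual_right = torch.mm(M, v_right.unsqueeze(1)).squeeze(1)
auto_right = torch.matmul(M, v_right)

print(auto_left, auto_left.shape)     # tensor([8, 8, 8]) torch.Size([3])
print(auto_right, auto_right.shape)   # tensor([ 6, 12]) torch.Size([2])
```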

4.3 2-D Matrix × 2-D Matrix

For two 2-D matrices, torch.matmul() is equivalent to torch.mm(): (m,n)×(n,t)->(m,t).

x = torch.tensor([[1,1],[3,3],[4,4]])
y = torch.tensor([[2,2,2],[5,5,5]])

out1 = torch.matmul(x,y)
print(f"out1: {out1}")

out2 = torch.mm(x,y)
print(f"out2: {out2}")

## Output ##
out1: tensor([[ 7,  7,  7],
        [21, 21, 21],
        [28, 28, 28]])
out2: tensor([[ 7,  7,  7],
        [21, 21, 21],
        [28, 28, 28]])

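4.4 3-D Tensor Multiplication

For tensors with more than two dimensions, the last two dimensions are treated as matrices and the leading dimensions as batch dimensions. The last dimension of the first tensor must equal the second-to-last dimension of the second tensor, and the batch dimensions must be equal or broadcastable. For two 3-D tensors with the same batch size, torch.bmm() can also be used.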

x = torch.randn(3,4,5)
y = torch.randn(3,5,2)
result = torch.matmul(x,y)
print(result.shape)    #shape: torch.Size([3, 4, 2])
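To illustrate the batched case, the sketch below checks torch.matmul() against torch.bmm() and against a plain Python loop over the batch, then shows two shapes that torch.matmul() accepts through batch broadcasting but torch.bmm() would not. The tensors w and z are hypothetical examples added for this comparison.

```python
import torch

torch.manual_seed(0)
x = torch.randn(3, 4, 5)
y = torch.randn(3, 5, 2)

via_matmul = torch.matmul(x, y)                                      # (3, 4, 2)
via_bmm = torch.bmm(x, y)                                            # same result, 3-D inputs only
per_batch = torch.stack([x[i] @ y[i] for i in range(x.shape[0])])    # one 2-D product per batch entry

# Broadcasting: a single 2-D matrix is shared across the whole batch
w = torch.randn(5, 2)
shared = torch.matmul(x, w)                                          # (3, 4, 5) @ (5, 2) -> (3, 4, 2)

# Broadcasting: batch dimensions (2, 1) and (3,) combine to (2, 3)
z = torch.randn(2, 1, 4, 5)
broadcast = torch.matmul(z, y)                                       # -> (2, 3, 4, 2)

print(via_matmul.shape, shared.shape, broadcast.shape)
```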

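5. torch.mv(): Matrix-Vector Multiplication

torch.mv() multiplies a 2-D matrix by a 1-D vector; the last dimension of the matrix must equal the length of the vector. Conceptually, a trailing dimension is added to the vector to make it a matrix, the matrix multiplication is performed, and the last dimension of the result is removed. torch.matmul(), described above, can be used instead.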

x = torch.randint(1,4,(3,5))
y = torch.randint(1,4,(5,))
print(f"x: {x}")
print(f"y: {y}")
result = torch.mv(x,y)
print("result: {}".format(result))
print(result.shape)

## Output (one possible run; the inputs are random) ##
x: tensor([[2, 1, 1, 1, 2],
        [1, 1, 1, 2, 3],
        [1, 2, 1, 3, 2]])
y: tensor([1, 1, 3, 2, 2])
result: tensor([12, 15, 16])
torch.Size([3])
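Using the same values as the printed run above, this sketch reproduces torch.mv() with an explicit trailing dimension and torch.mm(), following the conceptual description in this section. It is meant to show the shape handling only.

```python
import torch

x = torch.tensor([[2, 1, 1, 1, 2],
                  [1, 1, 1, 2, 3],
                  [1, 2, 1, 3, 2]])          # shape (3, 5)
y = torch.tensor([1, 1, 3, 2, 2])            # length 5

direct = torch.mv(x, y)                                # shape (3,)
manual = torch.mm(x, y.unsqueeze(-1)).squeeze(-1)      # (3, 5) x (5, 1) -> (3, 1) -> (3,)
general = torch.matmul(x, y)                           # matmul handles the same case

print(direct)   # tensor([12, 15, 16])
print(torch.equal(direct, manual), torch.equal(direct, general))
```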

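6. Matrix Multiplication with the @ Operator

The @ operator is equivalent to torch.matmul(). For mat1 @ mat2:

If mat1 and mat2 are both 1-D vectors, the operation corresponds to torch.dot().
If mat1 is a 2-D matrix and mat2 is a 1-D vector, the operation corresponds to torch.mv().
If mat1 and mat2 are both 2-D matrices, the operation corresponds to torch.mm().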
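The three correspondences listed above can be verified with a few small tensors. The values below are arbitrary examples; torch.allclose() is used for the comparison because the inputs are floating point.

```python
import torch

v = torch.tensor([1., 2., 3.])
u = torch.tensor([4., 5., 6.])
M = torch.arange(6.).reshape(2, 3)     # 2-D matrix, shape (2, 3)
N = torch.arange(12.).reshape(3, 4)    # 2-D matrix, shape (3, 4)

vec_vec = v @ u     # 1-D @ 1-D, compare with torch.dot
mat_vec = M @ v     # 2-D @ 1-D, compare with torch.mv
mat_mat = M @ N     # 2-D @ 2-D, compare with torch.mm

print(torch.allclose(vec_vec, torch.dot(v, u)))
print(torch.allclose(mat_vec, torch.mv(M, v)))
print(torch.allclose(mat_mat, torch.mm(M, N)))
print(vec_vec.shape, mat_vec.shape, mat_mat.shape)
```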

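7. Summary and Notes

1. Vector products are the inner product and the outer product. The inner product yields a scalar; the outer product yields a matrix, which can be computed as a broadcast Hadamard product. For matrices, the corresponding operation is the Kronecker product. When working with the outer product, take care to distinguish the cross product from the outer product.

2. torch.mul() is an element-wise operation; the operands must have the same shape or be broadcastable, so for vector × matrix the length of the vector must equal the matrix's last dimension. torch.mm() only handles 2-D matrices, while torch.matmul() also handles higher-dimensional (batched) matrix multiplication.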
