Vectorized Operations

Conventions


In this article, "vector" refers to a one-dimensional array/tensor, and "matrix" refers to a two-dimensional array/tensor.

Preface


Vectors and matrices are ubiquitous in ML. Out of habit from writing C, I often think about vector and matrix operations from a scalar point of view, i.e., I implement them with for loops. In practice, the for-loop style is slower than Python's built-in operators or PyTorch's functions, and it is also more cumbersome to write.

Next, I will demonstrate the effect of vectorization with addition and multiplication.
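The snippets below assume that torch and time have already been imported; a minimal setup they rely on:

import time

import torch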

Addition


First, define a for-loop based tensor addition function for later use:

def tensor_add(a, b):
    '''
    Element-wise tensor addition implemented with Python for loops.
    Supports only 1-D and 2-D tensors; broadcasting is not supported.
    '''
    if a.dim() == 1 and b.dim() == 1: # both are vectors
        length = a.size(0)
        c = torch.zeros(length, dtype=a.dtype)
        for i in range(length):
            c[i] = a[i] + b[i]
        return c
    elif a.dim() == 2 and b.dim() == 2: # both are matrices
        rows, cols = a.size()
        c = torch.zeros(rows, cols, dtype=a.dtype)
        for i in range(rows):
            for j in range(cols):
                c[i, j] = a[i, j] + b[i, j]
        return c
    else:
        raise Exception('Unsupported tensor dimensions')
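Before timing it, a quick sanity check that tensor_add agrees with the built-in + on small inputs (the example values here are arbitrary):

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])
# The for-loop version should match the built-in element-wise addition exactly.
assert torch.equal(tensor_add(a, b), a + b)

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
B = torch.ones(2, 3)
assert torch.equal(tensor_add(A, B), A + B)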

Then compare the performance of the for loop against Python's built-in + operator:

n = 1000
a = torch.ones(n)
b = torch.ones(n)

start_time = time.time()
c = tensor_add(a, b)
end_time = time.time()
print("Time taken(for-loop):", end_time - start_time)

start_time = time.time()
c = a + b
end_time = time.time()
print("Time taken(vectorized):", end_time - start_time)
Time taken(for-loop): 0.005979299545288086
Time taken(vectorized): 0.0001308917999267578

Matrix addition:

A = torch.ones((200, 200))
B = torch.ones((200, 200))

start_time = time.time()
C = tensor_add(A, B)
end_time = time.time()
print("Time taken(for-loop):", end_time - start_time)

start_time = time.time()
C = A + B
end_time = time.time()
print("Time taken(vectorized):", end_time - start_time)
Time taken(for-loop): 0.214155912399292
Time taken(vectorized): 0.0006866455078125

We can see that as the tensors get larger, the gap in time cost between the for loop and the vectorized operation also grows.
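To make the trend more explicit, here is a rough sketch that times both versions over a few matrix sizes (the absolute numbers will vary from machine to machine):

for n in (50, 100, 200, 400):
    X = torch.ones((n, n))
    Y = torch.ones((n, n))

    start_time = time.time()
    _ = tensor_add(X, Y)
    loop_time = time.time() - start_time

    start_time = time.time()
    _ = X + Y
    vec_time = time.time() - start_time

    print(f"n={n}: for-loop {loop_time:.6f}s, vectorized {vec_time:.6f}s")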

Multiplication


For multiplication, we likewise start by defining a function that implements it with for loops:

def tensor_multiply(a, b):
    '''
    Multiplication implemented with Python for loops.
    Supports only 1-D and 2-D tensors.
    For vector-vector, the outer product (not the cross product) is computed.
    For matrix-matrix, standard matrix multiplication is computed.
    Matrix-vector multiplication is not supported.
    '''
    if a.dim() == 1 and b.dim() == 1: # both are vectors
        if a.size(0) != b.size(0):
            raise Exception('Vector dimensions do not match')
        length = a.size(0)
        c = torch.zeros((length, length), dtype=a.dtype)
        for i in range(length):
            for j in range(length):
                c[i, j] = a[i] * b[j]
        return c
    elif a.dim() == 2 and b.dim() == 2: # both are matrices
        if a.size(1) != b.size(0):
            raise Exception('Matrix dimensions do not match')
        rows = a.size(0)
        cols = b.size(1)
        c = torch.zeros(rows, cols, dtype=a.dtype)
        for i in range(rows):
            for j in range(cols):
                for k in range(a.size(1)): # 3 loops for matrix multiplication
                    c[i, j] += a[i, k] * b[k, j]
        return c
    else:
        raise Exception('Unsupported tensor dimensions')

Then, compare the performance of the for-loop and vectorized approaches:

start_time = time.time()
c = tensor_multiply(a, b)
end_time = time.time()
print("Time taken(for-loop):", end_time - start_time)

start_time = time.time()
c = torch.outer(a, b) # outer product is different from cross product https://blog.csdn.net/Dust_Evc/article/details/127502272
end_time = time.time()
print("Time taken(vectorized):", end_time - start_time)
Time taken(for-loop): 3.7471888065338135
Time taken(vectorized): 0.00025653839111328125
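As the comment above points out, the outer product is not the cross product: torch.outer accepts vectors of any length and returns an n-by-n matrix of all pairwise products, while torch.cross is defined only for 3-element vectors. A small illustration with arbitrary values:

u = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([4.0, 5.0, 6.0])

print(torch.outer(u, v).shape)   # torch.Size([3, 3]): matrix of all pairwise products u[i] * v[j]
print(torch.cross(u, v, dim=0))  # tensor([-3.,  6., -3.]): a single 3-element vector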

For matrices:

start_time = time.time()
C = tensor_multiply(A, B)
end_time = time.time()
print("Time taken(for-loop):", end_time - start_time)

start_time = time.time()
C = A@B
end_time = time.time()
print("Time taken(vectorized):", end_time - start_time)
Time taken(for-loop): 54.04433226585388
Time taken(vectorized): 0.011148929595947266
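Speed aside, it is worth confirming that the for-loop and vectorized results actually agree. A quick check on smaller inputs (so the for-loop version finishes quickly), using torch.allclose to allow for floating-point rounding differences:

a_small = torch.rand(100)
b_small = torch.rand(100)
assert torch.allclose(tensor_multiply(a_small, b_small), torch.outer(a_small, b_small))

A_small = torch.rand(50, 50)
B_small = torch.rand(50, 50)
assert torch.allclose(tensor_multiply(A_small, B_small), A_small @ B_small)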

Summary


As we can see, using Python's built-in operators or PyTorch functions can dramatically speed up vector and matrix operations.

Resources


The Jupyter notebook file for this post
