Multithreaded parallel matrix multiplication in Python: multithreaded integer matrix multiplication in NumPy / SciPy - Stack Overflow...

weixin_39867893

Tags: python, multithreaded parallel matrix multiplication

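Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_39867893/article/details/111435261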

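This article shows how to parallelize matrix multiplication in Python using multithreading and NumPy. The matrix product is partitioned into sub-products that run in parallel on lightweight threads, which executes faster than a single-threaded `np.dot`. On a machine with 4 physical cores, the measured speedup was about 4x.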

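Option 5 - Roll a custom solution: partition the matrix product into several sub-products and compute them in parallel. This is relatively easy to implement with standard Python modules. The sub-products are computed with numpy.dot, which releases the global interpreter lock. Relatively lightweight threads can therefore be used, and they can access the arrays from the main thread directly, which keeps memory use low.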
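The partitioning idea can be checked on a small case before any threads are involved. The sketch below is an illustration and not part of the original answer; `blockwise_product` is a hypothetical helper name. It splits `a` into row blocks and `b` into column blocks, computes each sub-product serially with `numpy.dot`, and reassembles the pieces with `np.block`.

```python
import numpy as np


def blockwise_product(a, b, nblocks, mblocks):
    """Compute the product of a and b block by block (serially)."""
    # Split a into nblocks row blocks and b into mblocks column blocks.
    a_rows = np.array_split(a, nblocks, axis=0)
    b_cols = np.array_split(b, mblocks, axis=1)
    # Block (i, j) of the result depends only on row block i of a and
    # column block j of b, so the sub-products are independent of each other.
    blocks = [[np.dot(a_i, b_j) for b_j in b_cols] for a_i in a_rows]
    return np.block(blocks)


a = np.arange(12).reshape(4, 3)
b = np.arange(18).reshape(3, 6)
result = blockwise_product(a, b, 2, 2)
print(np.array_equal(result, np.dot(a, b)))  # prints True
```

Because no sub-product reads another's output, each one can be handed to its own thread, which is what the threaded version relies on.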

Implementation:

import numpy as np
from numpy.testing import assert_array_equal
import threading
from time import time


def blockshaped(arr, nrows, ncols):
    """
    Return an array of shape (nrows, ncols, n, m) where
    n * nrows, m * ncols = arr.shape.
    This should be a view of the original array.
    """
    h, w = arr.shape
    n, m = h // nrows, w // ncols
    return arr.reshape(nrows, n, ncols, m).swapaxes(1, 2)


def do_dot(a, b, out):
    #np.dot(a, b, out)  # does not work. maybe because out is not C-contiguous?
    out[:] = np.dot(a, b)  # less efficient because the output is stored in a temporary array?


def pardot(a, b, nblocks, mblocks, dot_func=do_dot):
    """
    Return the matrix product a * b.
    The product is split into nblocks * mblocks partitions that are performed
    in parallel threads.
    """
    n_jobs = nblocks * mblocks
    print('running {} jobs in parallel'.format(n_jobs))

    out = np.empty((a.shape[0], b.shape[1]), dtype=a.dtype)

    out_blocks = blockshaped(out, nblocks, mblocks)
    a_blocks = blockshaped(a, nblocks, 1)
    b_blocks = blockshaped(b, 1, mblocks)

    threads = []
    for i in range(nblocks):
        for j in range(mblocks):
            th = threading.Thread(target=dot_func,
                                  args=(a_blocks[i, 0, :, :],
                                        b_blocks[0, j, :, :],
                                        out_blocks[i, j, :, :]))
            th.start()
            threads.append(th)

    for th in threads:
        th.join()

    return out


if __name__ == '__main__':
    a = np.ones((4, 3), dtype=int)
    b = np.arange(18, dtype=int).reshape(3, 6)
    assert_array_equal(pardot(a, b, 2, 2), np.dot(a, b))

    a = np.random.randn(1500, 1500).astype(int)

    start = time()
    pardot(a, a, 2, 4)
    time_par = time() - start
    print('pardot: {:.2f} seconds taken'.format(time_par))

    start = time()
    np.dot(a, a)
    time_dot = time() - start
    print('np.dot: {:.2f} seconds taken'.format(time_dot))

With this implementation I get a speedup of roughly 4x, which matches the number of physical cores in my machine:

running 8 jobs in parallel

pardot: 5.45 seconds taken

np.dot: 22.30 seconds taken
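The comment in `do_dot` notes that passing `out` directly to `np.dot` did not work. One possible workaround, sketched here as an assumption rather than as part of the original answer, is to split only along the rows of `a`: a row slice of a C-contiguous array is itself C-contiguous, so `np.dot` can write straight into it without a temporary array. `pardot_rows` and `n_workers` are hypothetical names, and the sketch uses `concurrent.futures.ThreadPoolExecutor` from the standard library in place of raw `threading.Thread` objects. It has not been benchmarked against the timings above.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def pardot_rows(a, b, n_workers=4):
    """Return np.dot(a, b), computed in row blocks on a thread pool."""
    # np.dot requires `out` to have exactly the dtype the product would have.
    out = np.empty((a.shape[0], b.shape[1]), dtype=np.result_type(a, b))
    # Row boundaries; linspace copes with row counts not divisible by n_workers.
    bounds = np.linspace(0, a.shape[0], n_workers + 1).astype(int)

    def work(start, stop):
        if stop > start:  # skip empty blocks
            # out[start:stop] is a C-contiguous view, so it is a valid `out`.
            np.dot(a[start:stop], b, out=out[start:stop])

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # list() forces evaluation so that worker exceptions are re-raised here.
        list(pool.map(work, bounds[:-1], bounds[1:]))
    return out


# Small correctness check against the single-threaded product.
a = np.random.randint(-5, 5, size=(200, 150))
b = np.random.randint(-5, 5, size=(150, 100))
assert np.array_equal(pardot_rows(a, b), np.dot(a, b))
```

Splitting by rows alone gives fewer ways to shape the blocks than the two-dimensional `blockshaped` partition, so whether it matches the speedup reported above would need to be measured.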
