How can I speed up cosine similarity between a numpy array and a very large matrix?


犀牛故事陈墨


Article tags: numpy, speeding up matrix addition

Article link: https://blog.csdn.net/weixin_31570865/article/details/113635826


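I need to compute the cosine similarity between a numpy array of shape (1, 300) and a matrix of shape (5000000, 300). I have tried several different versions of the code, and now I am wondering whether there is a way to reduce the run time substantially:

Version 1: I split my big matrix into 5 smaller matrices of 1 million rows each: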

import concurrent.futures

from scipy import spatial

def cos_matrix_multiplication(vector, matrix_1):
    v = vector.reshape(1, -1)
    # cdist with 'cosine' returns the cosine distance, i.e. 1 - cosine similarity
    scores1 = spatial.distance.cdist(matrix_1, v, 'cosine')
    # note: [:1] keeps only the first row of the result
    return scores1[:1]

# the five sub-matrices of 1 million rows each
URLS = [mat_small1, mat_small2, mat_small3, mat_small4, mat_small5]
neighbors = []

with concurrent.futures.ThreadPoolExecutor(max_workers=30) as executor:
    # Submit one task per sub-matrix and map each future back to its sub-matrix
    future_to_url = {executor.submit(cos_matrix_multiplication, vec, mat_col): mat_col for mat_col in URLS}
    # as_completed yields futures in completion order, not submission order
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        data = future.result()
        neighbors.append(data)

Run time: 2.48 s
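Two details of Version 1 are easy to miss: `cdist(..., 'cosine')` returns a distance (1 minus the similarity), and `as_completed` hands results back in completion order, so the pieces in `neighbors` may not line up with the original rows. The following is only a sketch of one way to keep the chunked, threaded layout while returning similarities in row order. The function names, the chunk count, and the small random stand-in data are illustrative assumptions, not part of the original code, and the sketch assumes the matrix has no all-zero rows.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def cosine_similarity_chunk(chunk, vector):
    """Cosine similarity between every row of `chunk` and a 1-D `vector`."""
    dots = chunk @ vector                      # shape (rows_in_chunk,)
    row_norms = np.linalg.norm(chunk, axis=1)  # shape (rows_in_chunk,)
    return dots / (row_norms * np.linalg.norm(vector))


def cosine_similarity_threaded(matrix, vector, n_chunks=5, max_workers=5):
    """Split `matrix` row-wise, score the chunks in threads, keep row order."""
    vector = np.asarray(vector).ravel()
    chunks = np.array_split(matrix, n_chunks)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in submission order, unlike as_completed()
        parts = list(executor.map(lambda c: cosine_similarity_chunk(c, vector), chunks))
    return np.concatenate(parts)


# Small random stand-in for the 5M-row matrix (hypothetical demo data)
rng = np.random.default_rng(0)
demo_matrix = rng.standard_normal((1000, 300))
demo_vec = rng.standard_normal((1, 300))
scores = cosine_similarity_threaded(demo_matrix, demo_vec)
```

Whether the threads give any speed-up over a single call depends on the BLAS build NumPy is linked against, so this should be timed rather than assumed.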

Version 2: using Numba jit, inspired by this SO answer

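import numba
import numpy as np

# Argument types are inferred on the first call. The original string
# signature 'void(f4, f4)' declared two scalar floats and no return value,
# which does not match the array arguments or the returned scores.
@numba.njit(nogil=True)
def cosine_sim(A, B):
    scores = np.zeros(A.shape[0])
    m = B.shape[1]
    for i in range(A.shape[0]):
        v = A[i]
        udotv = 0.0
        u_norm = 0.0   # squared norm of the query row B[0]
        v_norm = 0.0   # squared norm of row i of A
        for j in range(m):
            udotv += B[0, j] * v[j]
            u_norm += B[0, j] * B[0, j]
            v_norm += v[j] * v[j]
        ratio = udotv / ((u_norm * v_norm) ** 0.5)
        scores[i] = ratio
    return scores

cosine_sim(matrix, vec)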

Run time: 2.34 s
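For comparison, the quantity the loop computes for each row, `udotv / sqrt(u_norm * v_norm)`, can also be written with whole-array NumPy operations. This is a sketch rather than a benchmarked replacement; the name `cosine_sim_vectorized` and the tiny example arrays are assumptions made for illustration, and `A` and `B` follow the argument order of `cosine_sim` above.

```python
import numpy as np


def cosine_sim_vectorized(A, B):
    """Same per-row quantity as the loop: udotv / sqrt(u_norm * v_norm)."""
    u = np.asarray(B)[0]                   # the single query row, shape (d,)
    udotv = A @ u                          # dot product of u with every row of A
    u_norm = u @ u                         # squared norm of u (a scalar)
    v_norm = np.einsum("ij,ij->i", A, A)   # squared norm of every row of A
    return udotv / np.sqrt(u_norm * v_norm)


# Tiny hand-checkable example (hypothetical data)
A = np.array([[1.0, 0.0], [1.0, 1.0], [-2.0, 0.0]])
B = np.array([[1.0, 0.0]])
result = cosine_sim_vectorized(A, B)
print(result)
```

The three rows are parallel, at 45 degrees, and anti-parallel to the query, so the similarities are 1, 1/sqrt(2), and -1.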

Version 3: using Cuda jit (does not work reproducibly from one run to the next)

import math

from numba import cuda

@cuda.jit
def cosine_sim(A, B, C):
    # scores = np.zeros(A.shape[0])
    # note: no thread index (e.g. cuda.grid) is used, so every launched
    # thread runs this entire loop over all rows of A
    for i in range(A.shape[0]):
        v = A[i]
        m = B.shape[1]
        udotv = 0
        u_norm = 0
        v_norm = 0
        for j in range(m):
            udotv += B[0][j] * v[j]
            u_norm += B[0][j] * B[0][j]
            v_norm += v[j] * v[j]
        u_norm = math.sqrt(u_norm)
        v_norm = math.sqrt(v_norm)
        if (u_norm == 0) or (v_norm == 0):
            ratio = 1.0
        else:
            ratio = udotv / (u_norm * v_norm)
        # note: C has shape (n, 1), so column index 1 is out of bounds
        C[i, 1] = ratio
        i += 1

matrix = mat_small1

# copy the inputs to the device and allocate the output there
A_global_mem = cuda.to_device(matrix)
B_global_mem = cuda.to_device(vec)
C_global_mem = cuda.device_array((matrix.shape[0], 1))

# 2-D launch configuration: one block row per 16 matrix rows, one block column per 16 vector columns
threadsperblock = (16, 16)
blockspergrid_x = int(math.ceil(A_global_mem.shape[0] / threadsperblock[0]))
blockspergrid_y = int(math.ceil(B_global_mem.shape[1] / threadsperblock[1]))
blockspergrid = (blockspergrid_x, blockspergrid_y)

cosine_sim[blockspergrid, threadsperblock](A_global_mem, B_global_mem, C_global_mem)
C = C_global_mem.copy_to_host()

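The result is:

CudaAPIError: [702] Call to cuMemcpyDtoH results in CUDA_ERROR_LAUNCH_TIMEOUT

The matrix is dense, my GPU has 8 GB of RAM, and the total size of the matrix is about 4.7 GB. Can a GPU speed this up?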
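Separately from the GPU question, all three versions recompute the norm of every matrix row on each call. If the large matrix stays fixed while the query vector changes (an assumption, since the question does not say so), the rows can be normalized once, after which each query costs a single matrix-vector product. The sketch below illustrates that idea on small random stand-in data. `normalize_rows` and `cosine_scores` are hypothetical helper names, the choice of float32 is an assumption made to halve memory relative to float64, and all-zero rows are assumed not to occur.

```python
import numpy as np


def normalize_rows(matrix, dtype=np.float32):
    """Return a copy of `matrix` whose rows have unit Euclidean length."""
    matrix = np.asarray(matrix, dtype=dtype)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / norms


def cosine_scores(unit_matrix, vector):
    """Cosine similarity of `vector` against rows that are already unit length."""
    vector = np.asarray(vector, dtype=unit_matrix.dtype).ravel()
    return unit_matrix @ (vector / np.linalg.norm(vector))


# Small random stand-in for the real data (hypothetical demo data)
rng = np.random.default_rng(0)
matrix = rng.standard_normal((1000, 300))
vec = rng.standard_normal((1, 300))

unit_matrix = normalize_rows(matrix)       # done once, reused for every query
scores = cosine_scores(unit_matrix, vec)   # one matrix-vector product per query
```

The same two-step structure (normalize once, then one matrix-vector product per query) is what a GPU version would need to beat, so timing it gives a useful CPU baseline.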
