方法1:我们可以简单地使用^{}及其{}距离功能-from scipy.spatial.distance import cdist
val_out = 1 - cdist(array1, array2, 'cosine')
方法2:另一种使用^{}-
^{pr2}$
方法3:使用^{}计算另一个的自平方和-def cosine_vectorized_v2(array1, array2):
sumyy = np.einsum('ij,ij->i',array2,array2)
sumxx = np.einsum('ij,ij->i',array1,array1)[:,None]
sumxy = array1.dot(array2.T)
return (sumxy/np.sqrt(sumxx))/np.sqrt(sumyy)
方法4:引入^{} module来卸载另一个方法的square-root计算-import numexpr as ne
def cosine_vectorized_v3(array1, array2):
sumyy = np.einsum('ij,ij->i',array2,array2)
sumxx = np.einsum('ij,ij->i',array1,array1)[:,None]
sumxy = array1.dot(array2.T)
sqrt_sumxx = ne.evaluate('sqrt(sumxx)')
sqrt_sumyy = ne.evaluate('sqrt(sumyy)')
return ne.evaluate('(sumxy/sqrt_sumxx)/sqrt_sumyy')
运行时测试# Using same sizes as stated in the question
In [185]: array1 = np.random.rand(10000,100)
...: array2 = np.random.rand(5000,100)
...:
In [194]: %timeit 1 - cdist(array1, array2, 'cosine')
1 loops, best of 3: 366 ms per loop
In [195]: %timeit cosine_vectorized(array1, array2)
1 loops, best of 3: 287 ms per loop
In [196]: %timeit cosine_vectorized_v2(array1, array2)
1 loops, best of 3: 283 ms per loop
In [197]: %timeit cosine_vectorized_v3(array1, array2)
1 loops, best of 3: 217 ms per loop