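I've recently been taking part in a competition where the output of an object detector has to be classified. The downstream classification step compares each detection against the images in a labeled gallery by similarity, which means computing a distance matrix. After reshaping, the features of the detected boxes form a 47943 × 2048 matrix, nearly 100 million elements in total, and the features of the gallery images to be queried form a 31069 × 2048 matrix. What I need is the pairwise distance matrix between these two very large matrices.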
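The first and most naive idea was a pair of nested for loops. I tried it and it took 1200 seconds, but the competition limits inference time to 30 minutes, so I wanted to speed it up.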
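After some searching I learned that NumPy's broadcasting mechanism speeds up matrix operations considerably, so I converted the for loops into NumPy operations as follows: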
# -*-coding: utf-8-*-
'''
*Author :Jianfeng Zhang
*e-mail :13052931019@163.com
*Blog :https://me.csdn.net/qq_39004111
*Github :https://github.com/JianfengZhang112358
*Data :
*Description:
'''
import numpy as np
import time
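def EuclideanDistances(A, B):
    # A: (n, d) np.matrix, B: (m, d) np.matrix; returns the (n, m) distance matrix.
    # Uses ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a.b, so the heavy work is one matrix product.
    BT = B.T
    vecProd = A * BT  # matrix product, because A and B are np.matrix
    SqA = A.getA()**2
    sumSqA = np.matrix(np.sum(SqA, axis=1))
    sumSqAEx = np.tile(sumSqA.transpose(), (1, vecProd.shape[1]))
    SqB = B.getA()**2
    sumSqB = np.sum(SqB, axis=1)
    sumSqBEx = np.tile(sumSqB, (vecProd.shape[0], 1))
    SqED = sumSqBEx + sumSqAEx - 2*vecProd
    # Clamp tiny negative values caused by floating-point round-off before the square root.
    ED = np.maximum(SqED.getA(), 0)**0.5
    return np.matrix(ED)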
a = np.array(
    [[ 4,  9,  9, 12],
     [11,  8,  1,  3],
     [ 2,  6,  4,  5],
     [ 4,  7, 11, 96],
     [ 6,  7,  9, 58]])
b = np.array(
    [[ 5,  8,  8, 78],
     [10,  5,  1, 12],
     [ 6,  6,  9, 54],
     [ 2,  1,  2,  8],
     [ 9,  1,  3, 12],
     [ 6,  1,  6, 45],
     [ 9, 10,  3, 33],
     [10,  4,  8,  7]]
)
# Naive nested for loops (the slowest approach)
results = np.zeros((len(a), len(b)))
s = time.time()
for i in range(len(a)):
    for j in range(len(b)):
        results[i, j] = np.sum((a[i] - b[j])**2, axis=-1)**0.5
e = time.time()
print('Time: %.6f s' % (e - s))
s = time.time()
c = np.sum((a[:, None] - b)**2, axis=-1)**.5  # this one is the fastest, but the (n, m, d) intermediate needs a lot of memory
e = time.time()
print('Time: %.6f s'%(e-s))
a = np.matrix(a)
b = np.matrix(b)
s = time.time()
c = EuclideanDistances(a,b)
e = time.time()
print('Time: %.6f s'%(e-s))
With NumPy's broadcasting, the computation became two orders of magnitude faster and took only 4.8 seconds, which works out very nicely.
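One caveat about the pure broadcasting one-liner: for the 47943 × 2048 and 31069 × 2048 matrices it would have to materialize an (n, m, d) intermediate of roughly 3 × 10^12 elements, which is far beyond ordinary RAM. Below is a minimal sketch of how the same identity used in `EuclideanDistances`, ||a - b||² = ||a||² + ||b||² - 2·a·b, could be applied block by block with plain `ndarray`s instead of `np.matrix`. The function name `pairwise_euclidean_chunked` and the `chunk_rows` parameter are my own assumptions for illustration, not something from the competition code, and I have not benchmarked this on the full-size matrices.

```python
import numpy as np


def pairwise_euclidean_chunked(A, B, chunk_rows=1024):
    """Pairwise Euclidean distances between rows of A (n, d) and B (m, d).

    Hypothetical helper for illustration. Processes A in blocks of
    `chunk_rows` rows so that only a (chunk_rows, m) temporary is alive
    at any time, besides the (n, m) result itself.
    """
    # float32 halves memory use; the expansion below loses some precision
    # to cancellation, so switch to float64 if accuracy matters more.
    A = np.asarray(A, dtype=np.float32)
    B = np.asarray(B, dtype=np.float32)

    # Squared norm of every row of B, shape (m,); computed once.
    sq_b = np.einsum("ij,ij->i", B, B)

    out = np.empty((A.shape[0], B.shape[0]), dtype=np.float32)
    for start in range(0, A.shape[0], chunk_rows):
        block = A[start:start + chunk_rows]
        # Squared norm of every row in this block, shape (rows_in_block,).
        sq_a = np.einsum("ij,ij->i", block, block)
        # ||a||^2 + ||b||^2 - 2 * a.b, broadcast to (rows_in_block, m).
        sq_dist = sq_a[:, None] + sq_b[None, :] - 2.0 * (block @ B.T)
        # Round-off can push values slightly below zero; clamp before sqrt.
        np.maximum(sq_dist, 0.0, out=sq_dist)
        out[start:start + chunk_rows] = np.sqrt(sq_dist)
    return out


# Small sanity check against the broadcasting one-liner.
rng = np.random.default_rng(0)
A = rng.random((7, 4))
B = rng.random((5, 4))
reference = np.sum((A[:, None] - B) ** 2, axis=-1) ** 0.5
result = pairwise_euclidean_chunked(A, B, chunk_rows=3)
print(np.allclose(result, reference, atol=1e-4))
```

Even with this approach the result itself is large: a 47943 × 31069 matrix in float32 is close to 6 GB. If only the nearest gallery entries are needed for each detection, it may be better to reduce each block (for example with `np.argmin`) inside the loop rather than storing the whole matrix.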