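A pure-Python implementation of NMS is still quite slow; rewriting it in Cython can speed it up considerably.
Let's walk through the Cython program, starting with the imports.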
import numpy as np
cimport numpy as np
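The first line imports NumPy's Python interface and the second imports its Cython (compile-time) interface. Not every library can be imported with cimport: it only works for libraries that have Cython declaration (.pxd) files available.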
cdef inline np.float32_t max(np.float32_t a, np.float32_t b):
    return a if a >= b else b
cdef inline np.float32_t min(np.float32_t a, np.float32_t b):
    return a if a <= b else b
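Here np.float32_t is a C-level type; the name without the _t suffix (np.float32) is the Python-level type.
The inline modifier marks a function as an inline function. inline is only suitable for functions whose bodies are simple, and the function must not be directly recursive.
The built-in min() and max() are Python functions, which is why typed C versions are defined here for use inside the inner loop.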
def py_cpu_nms(np.ndarray[np.float32_t, ndim=2] dets, np.float32_t thresh):
    # Top-left and bottom-right coordinates, and the score vector
    cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0]
    cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1]
    cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2]
    cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3]
    cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4]
    # Box areas, then the indices that sort the scores from high to low
    cdef np.ndarray[np.float32_t, ndim=1] areas = (y2 - y1 + 1) * (x2 - x1 + 1)
    # argsort returns np.intp indices, so the buffer is typed as np.intp_t
    cdef np.ndarray[np.intp_t, ndim=1] index = scores.argsort()[::-1]
    keep = []
    # ndets stands for "number of dets", i.e. how many boxes there are
    cdef int ndets = dets.shape[0]
    # The np.int alias was removed from recent NumPy releases; np.intp is used instead
    cdef np.ndarray[np.intp_t, ndim=1] suppressed = np.zeros(ndets, dtype=np.intp)
    cdef int _i, _j
    cdef int i, j
    cdef np.float32_t ix1, iy1, ix2, iy2, iarea
    cdef np.float32_t xx1, yy1, xx2, yy2
    cdef np.float32_t w, h
    cdef np.float32_t overlap, ious
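This is the long run of declarations that precedes the actual algorithm. Python spends a lot of time working out variable types at run time; declaring static types here spares it that work.
When declaring an array, it is best to state both its element type and its number of dimensions, for example: cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:,0]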
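Because the buffer declarations fix the element type and dimensionality, the caller has to pass arrays that match them. The following is a minimal sketch of preparing such an input on the Python side; the `raw` array is made-up sample data, not taken from the original post.

```python
import numpy as np

# Hypothetical sample detections: one row per box, [x1, y1, x2, y2, score].
raw = np.array([[10, 10, 50, 50, 0.9],
                [12, 12, 52, 52, 0.8]])   # NumPy defaults to float64 here

# Convert to the element type declared in the Cython signature (float32, 2-D).
dets = np.ascontiguousarray(raw, dtype=np.float32)

# argsort returns platform-sized integer indices (np.intp), so an index
# buffer should be declared with a matching integer type.
order = dets[:, 4].argsort()[::-1]
print(dets.dtype, dets.ndim, order.dtype)
```

Converting once at the call site keeps the typed function itself free of dtype checks.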
    j = 0
    for _i in range(ndets):
        # i is the index of the current box
        i = index[_i]
        # suppressed[i] == 1 means this box has already been discarded (merged into a higher-scoring box)
        if suppressed[i] == 1:
            continue
        keep.append(i)
        ix1 = x1[i]
        iy1 = y1[i]
        ix2 = x2[i]
        iy2 = y2[i]
        iarea = areas[i]
        for _j in range(_i + 1, ndets):
            # _i is the position of the current box; starting at _i+1 covers every box after it
            # j is the index of the box whose IoU with box i is computed; it changes on every iteration
            # suppressed[j] == 1 means this box has already been discarded
            j = index[_j]
            if suppressed[j] == 1:
                continue
            # Intersection rectangle: max of the top-left corners, min of the bottom-right corners
            xx1 = max(ix1, x1[j])
            yy1 = max(iy1, y1[j])
            xx2 = min(ix2, x2[j])
            yy2 = min(iy2, y2[j])
            w = max(0.0, xx2 - xx1 + 1)
            h = max(0.0, yy2 - yy1 + 1)
            overlap = w * h
            ious = overlap / (iarea + areas[j] - overlap)
            if ious > thresh:
                suppressed[j] = 1
    return keep
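To make the overlap arithmetic of the inner loop easier to check, here is a small pure-Python sketch of the IoU of two boxes. The function name `iou` and the sample boxes are illustrative assumptions; the sketch uses the same inclusive "+1" pixel convention as the code above, with the intersection taken as the max of the top-left corners and the min of the bottom-right corners.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with inclusive pixel coordinates."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle: largest top-left corner, smallest bottom-right corner.
    xx1, yy1 = max(ax1, bx1), max(ay1, by1)
    xx2, yy2 = min(ax2, bx2), min(ay2, by2)
    # Clamp at zero so disjoint boxes give an empty intersection.
    w = max(0.0, xx2 - xx1 + 1)
    h = max(0.0, yy2 - yy1 + 1)
    overlap = w * h
    area_a = (ax2 - ax1 + 1) * (ay2 - ay1 + 1)
    area_b = (bx2 - bx1 + 1) * (by2 - by1 + 1)
    return overlap / (area_a + area_b - overlap)

print(iou((0, 0, 9, 9), (0, 0, 9, 9)))       # identical boxes
print(iou((0, 0, 9, 9), (5, 0, 14, 9)))      # boxes sharing half their width
print(iou((0, 0, 9, 9), (20, 20, 29, 29)))   # disjoint boxes
```

Using max for the bottom-right corner instead of min would yield the enclosing rectangle rather than the intersection, so the IoU would be overestimated.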
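This program is written a little differently from the Python version.
The Python version's main idea: keep updating the list of remaining boxes until NMS has gone through every box.
Concretely: on each pass, remove the current box and the boxes merged into it.
Key code: >>idx = np.where(ious<=thresh)[0] >>index = index[idx+1].
The Cython version takes a different approach. Its main idea: iterate directly over all boxes and skip the ones that have already been merged.
Concretely: an extra suppressed array records the positions of the boxes to be merged, in other words the discarded boxes that no longer need to be visited.
Key code: >> if suppressed[i] == 1: continue >> if ious>thresh: suppressed[j] = 1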
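For reference, the index-shrinking approach described above can be sketched in NumPy as follows. This is a reconstruction built around the two key lines, not the original Python listing; the function name `nms_numpy` and the sample data are assumptions.

```python
import numpy as np

def nms_numpy(dets, thresh):
    """Index-shrinking NMS sketch. dets: (m, 5) rows of x1, y1, x2, y2, score."""
    x1, y1, x2, y2, scores = (dets[:, k] for k in range(5))
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    index = scores.argsort()[::-1]          # highest score first
    keep = []
    while index.size > 0:
        i = index[0]                        # best remaining box
        keep.append(int(i))
        rest = index[1:]
        # IoU of box i against every remaining box, vectorized.
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        overlap = w * h
        ious = overlap / (areas[i] + areas[rest] - overlap)
        # Keep only boxes that overlap box i by at most thresh;
        # the +1 compensates for index[0] being excluded from `rest`.
        idx = np.where(ious <= thresh)[0]
        index = index[idx + 1]
    return keep

# Hypothetical sample data: boxes 0 and 1 overlap heavily, box 2 is far away.
dets = np.array([[0, 0, 9, 9, 0.9],
                 [1, 1, 10, 10, 0.8],
                 [20, 20, 29, 29, 0.7]], dtype=np.float32)
print(nms_numpy(dets, 0.5))
```

Each pass shrinks `index`, so the loop ends once every box has either been kept or dropped.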
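The suppressed-flag approach can also be tried out without compiling anything. Below is a pure-Python sketch of the same control flow; the function name `nms_suppressed` and the sample boxes are assumptions, and plain lists stand in for the typed NumPy buffers.

```python
def nms_suppressed(dets, thresh):
    """NMS with a suppressed-flag array. dets: rows of (x1, y1, x2, y2, score)."""
    ndets = len(dets)
    areas = [(d[2] - d[0] + 1) * (d[3] - d[1] + 1) for d in dets]
    # Box indices ordered from highest to lowest score.
    index = sorted(range(ndets), key=lambda k: dets[k][4], reverse=True)
    suppressed = [0] * ndets
    keep = []
    for _i in range(ndets):
        i = index[_i]
        if suppressed[i] == 1:      # already merged into a better box
            continue
        keep.append(i)
        ix1, iy1, ix2, iy2 = dets[i][:4]
        for _j in range(_i + 1, ndets):
            j = index[_j]
            if suppressed[j] == 1:
                continue
            # Intersection of box i and box j.
            w = max(0.0, min(ix2, dets[j][2]) - max(ix1, dets[j][0]) + 1)
            h = max(0.0, min(iy2, dets[j][3]) - max(iy1, dets[j][1]) + 1)
            overlap = w * h
            ious = overlap / (areas[i] + areas[j] - overlap)
            if ious > thresh:
                suppressed[j] = 1   # flag it instead of rebuilding the index
    return keep

# Hypothetical sample data: boxes 0 and 1 overlap heavily, box 2 is far away.
boxes = [(0, 0, 9, 9, 0.9), (1, 1, 10, 10, 0.8), (20, 20, 29, 29, 0.7)]
print(nms_suppressed(boxes, 0.5))
```

Flagging boxes in place avoids allocating a new index array on every pass, which suits a statically typed inner loop.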
The complete program is as follows:
import numpy as np
cimport numpy as np
cdef inline np.float32_t max(np.float32_t a, np.float32_t b):
    return a if a >= b else b
cdef inline np.float32_t min(np.float32_t a, np.float32_t b):
    return a if a <= b else b
def py_cpu_nms(np.ndarray[np.float32_t, ndim=2] dets, np.float32_t thresh):
    # dets: (m, 5)  thresh: scalar
    cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0]
    cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1]
    cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2]
    cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3]
    cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4]
    cdef np.ndarray[np.float32_t, ndim=1] areas = (y2 - y1 + 1) * (x2 - x1 + 1)
    # argsort returns np.intp indices; this line can be rewritten
    cdef np.ndarray[np.intp_t, ndim=1] index = scores.argsort()[::-1]
    keep = []
    cdef int ndets = dets.shape[0]
    # The np.int alias was removed from recent NumPy releases; np.intp is used instead
    cdef np.ndarray[np.intp_t, ndim=1] suppressed = np.zeros(ndets, dtype=np.intp)
    cdef int _i, _j
    cdef int i, j
    cdef np.float32_t ix1, iy1, ix2, iy2, iarea
    cdef np.float32_t xx1, yy1, xx2, yy2
    cdef np.float32_t w, h
    cdef np.float32_t overlap, ious
    j = 0
    for _i in range(ndets):
        i = index[_i]
        if suppressed[i] == 1:
            continue
        keep.append(i)
        ix1 = x1[i]
        iy1 = y1[i]
        ix2 = x2[i]
        iy2 = y2[i]
        iarea = areas[i]
        for _j in range(_i + 1, ndets):
            j = index[_j]
            if suppressed[j] == 1:
                continue
            # Intersection rectangle: max of the top-left corners, min of the bottom-right corners
            xx1 = max(ix1, x1[j])
            yy1 = max(iy1, y1[j])
            xx2 = min(ix2, x2[j])
            yy2 = min(iy2, y2[j])
            w = max(0.0, xx2 - xx1 + 1)
            h = max(0.0, yy2 - yy1 + 1)
            overlap = w * h
            ious = overlap / (iarea + areas[j] - overlap)
            if ious > thresh:
                suppressed[j] = 1
    return keep