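Paper: Improving Object Detection With One Line of Code
Paper link: https://arxiv.org/pdf/1704.04503.pdf
This is an ICCV 2017 paper that proposes Soft-NMS, an improvement to the NMS (non-maximum suppression) algorithm.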
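Original NMS
Suppose the network predicts six boxes A1~A6. Sort the boxes by confidence in ascending order; say the sorted result is A1, A2, A3, A4, A5, A6.
(1) Starting from the box with the highest confidence, A6, check whether the IoU of each of A1, A2, A3, A4, A5 with A6 exceeds a preset threshold (say a = 0.7).
(2) Suppose the IoU of A2 and A3 with A6 exceeds the threshold. Discard A2 and A3 and mark A6 as kept. The remaining boxes are A1, A4, A5, A6.
(3) Among the unmarked boxes A1, A4, A5, pick the one with the highest confidence, A5, and check the IoU of A1 and A4 with A5; any box whose IoU exceeds the threshold is discarded. Suppose A1 is discarded; the remaining boxes are A4, A5, A6.
(4) Repeating this until no unmarked box is left yields the kept boxes A4, A5, A6, whose pairwise IoU is at most the threshold.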
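The four steps above can be sketched in NumPy as follows. This is a minimal illustration, not code from the paper: the helper names `iou` and `hard_nms`, the `[x1, y1, x2, y2]` box layout, and the six example boxes are all assumptions chosen so that A2 and A3 overlap A6, and A1 overlaps A5, by more than 0.7.

```python
import numpy as np

def iou(box, others):
    """IoU between one box and an (n, 4) array of boxes, all [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter)

def hard_nms(boxes, scores, iou_threshold=0.7):
    """Return indices of kept boxes, highest score first."""
    order = np.argsort(scores)[::-1]      # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]                   # current highest-scoring box
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]  # discard boxes that overlap too much
    return keep

# Hypothetical boxes A1..A6 (rows 0..5), scores in ascending order.
boxes = np.array([
    [205, 200, 305, 300],  # A1, overlaps A5
    [0, 5, 100, 105],      # A2, overlaps A6
    [5, 5, 105, 105],      # A3, overlaps A6
    [400, 400, 500, 500],  # A4, isolated
    [200, 200, 300, 300],  # A5
    [0, 0, 100, 100],      # A6
], dtype=np.float64)
scores = np.array([0.3, 0.4, 0.5, 0.6, 0.8, 0.9])

kept = hard_nms(boxes, scores)  # [5, 4, 3], i.e. A6, A5, A4
```

The sketch sorts once and filters the remaining indices on every round, which is equivalent to the mark-and-discard description because hard NMS never changes a score.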
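Soft-NMS
Suppose the network predicts six boxes A1~A6. Sort them by confidence in ascending order, giving BOXES = [A1, A2, A3, A4, A5, A6].
Let their confidences be (s1, s2, s3, s4, s5, s6).
(1) Starting from the highest-confidence box A6, check whether the IoU of each of A1, A2, A3, A4, A5 with A6 exceeds the IoU threshold (say a = 0.3).
(2) Suppose the IoUs of A2 and A3 with A6 are 0.4 and 0.45, both above the threshold. Instead of discarding A2 and A3, lower their confidences:
s2 = s2 * (1 - iou(A2, A6)) = s2 * 0.6
s3 = s3 * (1 - iou(A3, A6)) = s3 * 0.55
A box is discarded only if its updated confidence falls below a second threshold on the score (0.001 here); otherwise it is kept. Suppose the updated s2 is below 0.001 but the updated s3 is not: A2 is discarded and A3 is kept, so BOXES = [A1, A3, A4, A5, A6].
(3) Sort BOXES by confidence in ascending order again; suppose the result is BOXES = [A3, A1, A4, A5, A6]. Because the confidence of A3 was lowered in step (2), its position may change after re-sorting.
(4) A6 is done, so take the next highest box, A5, check whether the IoU of each of A3, A1, A4 with A5 exceeds the IoU threshold, and repeat step (2).
(5) The rest proceeds as in NMS.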
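The arithmetic in step (2) can be checked with a small sketch. The function name `decay_weight` and the starting scores are hypothetical; the thresholds (0.3 for IoU, 0.001 for the score) and the IoUs 0.4 and 0.45 are the ones used in the walkthrough. Note that with a weight of 0.6, s2 can only fall below 0.001 if it was already below roughly 0.00167, so the example picks such a value.

```python
import math

def decay_weight(iou, method="linear", Nt=0.3, sigma=0.5):
    """Factor that multiplies a box's score given its IoU with the current max box."""
    if method == "linear":
        return 1.0 - iou if iou > Nt else 1.0
    if method == "gaussian":
        return math.exp(-(iou * iou) / sigma)
    return 0.0 if iou > Nt else 1.0  # any other value: hard NMS

score_threshold = 0.001

# Hypothetical starting scores for A2 and A3.
s2, s3 = 0.0015, 0.6
new_s2 = s2 * decay_weight(0.40)   # 0.0015 * 0.6  = 0.0009
new_s3 = s3 * decay_weight(0.45)   # 0.6    * 0.55 = 0.33

keep_a2 = new_s2 >= score_threshold  # False: A2 is discarded
keep_a3 = new_s3 >= score_threshold  # True: A3 stays, with a lower score
```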
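Question:
1. Lowering the confidences of A2 and A3 only changes the ordering: it was A1, A2, A3, A4, A5, A6 and may become A3, A2, A1, A4, A5, A6. But after re-sorting we would still compare A3, A2, A1, A4, A5 against A6, and the IoU of A3 and A2 with A6 has not changed, so would they not be discarded anyway? That would make it no different from the original NMS.
Answer: The IoU of A3 and A2 with A6 is indeed unchanged, but in Soft-NMS the IoU never discards a box directly; it only determines how much the confidence is lowered. A box is discarded only if its lowered confidence is below the score threshold; if the lowered confidence is still above the threshold, the box is kept. Compared with the original NMS, Soft-NMS therefore retains more boxes.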
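The difference described in the answer can be written as rescoring rules. The notation is chosen here for illustration: M is the current highest-scoring box, b_i is any other box with score s_i, N_t is the IoU threshold, and sigma is the Gaussian parameter. The inequalities follow the conventions of the code discussed in this post (a box is affected when its IoU is strictly greater than N_t).

```latex
\begin{align*}
\text{hard NMS:}\quad
  & s_i \leftarrow
    \begin{cases}
      s_i, & \mathrm{iou}(M, b_i) \le N_t \\
      0,   & \mathrm{iou}(M, b_i) > N_t
    \end{cases} \\[4pt]
\text{linear Soft-NMS:}\quad
  & s_i \leftarrow
    \begin{cases}
      s_i, & \mathrm{iou}(M, b_i) \le N_t \\
      s_i \,\bigl(1 - \mathrm{iou}(M, b_i)\bigr), & \mathrm{iou}(M, b_i) > N_t
    \end{cases} \\[4pt]
\text{Gaussian Soft-NMS:}\quad
  & s_i \leftarrow s_i \, \exp\!\left(-\frac{\mathrm{iou}(M, b_i)^2}{\sigma}\right)
\end{align*}
```

In all three cases a box is finally removed only when its score ends up below the score threshold, which hard NMS guarantees by setting the score to zero.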
It is easier to just read the code. The following comes from soft_nms() in KL-LOSS-master/detectron/utils/cython_nms.pyx.
Code: https://github.com/yihui-he/KL-Loss
import numpy as np
cimport numpy as np

def soft_nms(
    np.ndarray[float, ndim=2] boxes_in,
    float sigma=0.5,
    float Nt=0.3,
    float threshold=0.001,
    unsigned int method=0
):
    boxes = boxes_in.copy()
    cdef unsigned int N = boxes.shape[0]  # number of input boxes
    cdef float iw, ih, box_area  # intersection width and height (box_area is unused)
    cdef float ua
    cdef int pos = 0
    cdef float maxscore = 0
    cdef int maxpos = 0
    cdef float x1, x2, y1, y2, tx1, tx2, ty1, ty2, ts, area, weight, ov
    inds = np.arange(N)

    for i in range(N):  # visit every box position
        maxscore = boxes[i, 4]  # score of box i
        maxpos = i  # start by assuming box i has the highest score among boxes i..N-1

        tx1 = boxes[i,0]
        ty1 = boxes[i,1]
        tx2 = boxes[i,2]
        ty2 = boxes[i,3]
        ts = boxes[i,4]
        ti = inds[i]

        pos = i + 1
        # get max box
        while pos < N:
            if maxscore < boxes[pos, 4]:  # a later box has a higher score
                maxscore = boxes[pos, 4]  # update the running maximum
                maxpos = pos  # remember which box holds it
            pos = pos + 1

        # add max box as a detection
        boxes[i,0] = boxes[maxpos,0]
        boxes[i,1] = boxes[maxpos,1]
        boxes[i,2] = boxes[maxpos,2]
        boxes[i,3] = boxes[maxpos,3]
        boxes[i,4] = boxes[maxpos,4]
        inds[i] = inds[maxpos]

        # swap ith box with position of max box
        boxes[maxpos,0] = tx1
        boxes[maxpos,1] = ty1
        boxes[maxpos,2] = tx2
        boxes[maxpos,3] = ty2
        boxes[maxpos,4] = ts
        inds[maxpos] = ti

        tx1 = boxes[i,0]
        ty1 = boxes[i,1]
        tx2 = boxes[i,2]
        ty2 = boxes[i,3]
        ts = boxes[i,4]

        pos = i + 1
        # NMS iterations, note that N changes if detection boxes fall below
        # threshold
        while pos < N:  # box i (call it b1) now has the highest score; compare it with every later box
            x1 = boxes[pos, 0]  # take a later box, call it b2
            y1 = boxes[pos, 1]
            x2 = boxes[pos, 2]
            y2 = boxes[pos, 3]
            s = boxes[pos, 4]

            area = (x2 - x1 + 1) * (y2 - y1 + 1)  # area of b2
            iw = (min(tx2, x2) - max(tx1, x1) + 1)  # width of the intersection
            if iw > 0:
                ih = (min(ty2, y2) - max(ty1, y1) + 1)  # height of the intersection
                if ih > 0:
                    ua = float((tx2 - tx1 + 1) * (ty2 - ty1 + 1) + area - iw * ih)  # area of the union
                    ov = iw * ih / ua  # iou between max box and detection box

                    if method == 1:  # linear: this is the Soft-NMS part
                        if ov > Nt:  # IoU above the threshold
                            weight = 1 - ov  # the larger the IoU, the smaller the weight of b2
                        else:  # b1 and b2 overlap, but only slightly
                            weight = 1
                    elif method == 2:  # gaussian
                        weight = np.exp(-(ov * ov)/sigma)
                    else:  # original NMS
                        if ov > Nt:
                            weight = 0
                        else:
                            weight = 1

                    # score_new = score_old * weight: a large IoU with b1 lowers the score of b2
                    boxes[pos, 4] = weight*boxes[pos, 4]

                    # if box score falls below threshold, discard the box by
                    # swapping with last box update N
                    if boxes[pos, 4] < threshold:  # updated score of b2 is below 0.001
                        boxes[pos,0] = boxes[N-1, 0]  # overwrite b2 with the last box bN
                        boxes[pos,1] = boxes[N-1, 1]
                        boxes[pos,2] = boxes[N-1, 2]
                        boxes[pos,3] = boxes[N-1, 3]
                        boxes[pos,4] = boxes[N-1, 4]
                        inds[pos] = inds[N-1]
                        # Shrink the list by one. Example: boxes = [b1, b2, b3, b4, b5, b6], N = 6,
                        # b1 has the highest score. If the updated score of b2 is below the threshold,
                        # then b2 = b6 and N = 5, i.e. boxes = [b1, b6, b3, b4, b5] and b2 is gone.
                        # If b2 overlaps b1 heavily but its updated score is still large enough,
                        # b2 is kept, whereas the original NMS would delete it outright.
                        N = N - 1
                        pos = pos - 1  # re-check this slot, which now holds the former last box

            pos = pos + 1

    return boxes[:N], inds[:N]
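The removal trick at the end of the inner loop (overwrite the rejected box with the last one, shrink N, then step pos back by one) is easy to get wrong, so here it is in isolation. This is a simplified pure-Python sketch on a plain list of scores; the function name `drop_low_scores` is made up for illustration and is not part of the repository.

```python
def drop_low_scores(scores, threshold=0.001):
    """Remove entries below threshold in place-style, without preserving order."""
    scores = list(scores)   # work on a copy
    n = len(scores)         # logical length, like N in soft_nms
    pos = 0
    while pos < n:
        if scores[pos] < threshold:
            scores[pos] = scores[n - 1]  # overwrite with the last live entry
            n -= 1                       # shrink the logical length
            pos -= 1                     # revisit this slot: it holds an unchecked entry
        pos += 1
    return scores[:n]

# The second entry is replaced by 0.0002, which is rejected too on the re-check.
result = drop_low_scores([0.9, 0.0005, 0.3, 0.0002])  # [0.9, 0.3]
```

Without the `pos -= 1` step, the entry moved in from the end would be skipped and could survive even though its score is below the threshold.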
There is also a pure-Python version, which is more direct:
import numpy as np

def softnms(dets, sc, Nt=0.3, sigma=0.5, thresh=0.001, method=2):
    """
    py_cpu_softnms
    :param dets: box coordinate matrix, format [y1, x1, y2, x2]
    :param sc: score of each box
    :param Nt: IoU overlap threshold
    :param sigma: variance of the Gaussian function
    :param thresh: final score threshold
    :param method: 1 = linear, 2 = gaussian, anything else = original NMS
    :return: indices of the boxes that are kept
    """
    # indexes concatenate boxes with the last column
    N = dets.shape[0]  # number of input boxes; e.g. for
    # box=[[200. 200. 400. 400.]
    #      [220. 220. 420. 420.]], N = 2
    indexes = np.array([np.arange(N)])  # original index of each box
    dets = np.concatenate((dets, indexes.T), axis=1)  # append the index to each row, giving
    # [[200. 200. 400. 400. 0.]
    #  [220. 220. 420. 420. 1.]]
    # the order of boxes coordinate is [y1,x1,y2,x2]
    y1 = dets[:, 0]
    x1 = dets[:, 1]
    y2 = dets[:, 2]
    x2 = dets[:, 3]
    scores = sc.copy()  # copy so the caller's scores are not modified
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)

    for i in range(N):
        # intermediate parameters for later parameters exchange
        tBD = dets[i, :].copy()
        tscore = scores[i].copy()
        tarea = areas[i].copy()
        pos = i + 1

        # find the highest score among the boxes after position i
        if i != N-1:
            maxscore = np.max(scores[pos:], axis=0)
            maxpos = np.argmax(scores[pos:], axis=0)
        else:
            maxscore = scores[-1]
            maxpos = 0
        # swap box i with that box if it scores higher
        if tscore < maxscore:
            dets[i, :] = dets[maxpos + i + 1, :]
            dets[maxpos + i + 1, :] = tBD
            tBD = dets[i, :]

            scores[i] = scores[maxpos + i + 1]
            scores[maxpos + i + 1] = tscore
            tscore = scores[i]

            areas[i] = areas[maxpos + i + 1]
            areas[maxpos + i + 1] = tarea
            tarea = areas[i]

        # IoU calculate
        xx1 = np.maximum(dets[i, 1], dets[pos:, 1])
        yy1 = np.maximum(dets[i, 0], dets[pos:, 0])
        xx2 = np.minimum(dets[i, 3], dets[pos:, 3])
        yy2 = np.minimum(dets[i, 2], dets[pos:, 2])

        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        ovr = inter / (areas[i] + areas[pos:] - inter)

        # Three methods: 1 = linear, 2 = gaussian, otherwise original NMS
        if method == 1:  # linear
            weight = np.ones(ovr.shape)
            weight[ovr > Nt] = weight[ovr > Nt] - ovr[ovr > Nt]
        elif method == 2:  # gaussian
            weight = np.exp(-(ovr * ovr) / sigma)
        else:  # original NMS
            weight = np.ones(ovr.shape)
            weight[ovr > Nt] = 0

        scores[pos:] = weight * scores[pos:]

    # select the boxes and keep the corresponding indexes
    inds = dets[:, 4][scores > thresh]
    keep = inds.astype(int)[:20]  # the original code returns at most 20 indices
    return keep
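# Assumed imports: the original snippet uses tf and K without importing them.
# tf.Session is the TensorFlow 1.x API; K is taken to be the Keras backend.
import tensorflow as tf
from keras import backend as K

if __name__ == '__main__':
    # boxes and scores
    boxes = np.array([[200, 200, 400, 400], [220, 220, 420, 420], [200, 240, 400, 440], [240, 200, 440, 400], [1, 1, 2, 2]], dtype=np.float32)
    boxscores = np.array([0.9, 0.8, 0.7, 0.6, 0.5], dtype=np.float32)
    # tf.image.non_max_suppression expects boxes ordered as [y1, x1, y2, x2].
    with tf.Session() as sess:
        # Hard NMS: boxes 1-3 overlap box 0 with IoU of about 0.67-0.68 > 0.5 and are suppressed.
        index = sess.run(tf.image.non_max_suppression(boxes=boxes, scores=boxscores, iou_threshold=0.5, max_output_size=5))
        print(sess.run(K.gather(boxes, index)))  # [[200. 200. 400. 400.]
                                                 #  [  1.   1.   2.   2.]]
        # Gaussian Soft-NMS only lowers the scores of the overlapping boxes
        # (e.g. 0.8 becomes about 0.32), which stays above thresh=0.001,
        # so with the default parameters those boxes are still returned.
        index2 = softnms(boxes, boxscores, method=2)
        selected_boxes = sess.run(K.gather(boxes, index2))
        print(selected_boxes)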