Non-Maximum Suppression (NMS) and Its Variants: Implementations

2401_84976170

Article link: https://blog.csdn.net/2401_84976170/article/details/138862582

print(Polygon(np.array([[343, 350], [448, 135],
                        [474, 143], [369, 359]])).area)


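**Applicability**  
 LNMS is generally used with axis-aligned rectangular boxes (i.e. horizontal bboxes), especially for inclined text instances that lie very close together.  
 When an image contains a lot of text, the detector produces a large number of boxes (the green boxes in the middle panel of the figure; more than 1,400 green boxes in total here, and the image is somewhat blurry because it was compressed). After LNMS, the final result is obtained (the blue boxes in the right-hand panel).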


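#### Inclined NMS (INMS)


INMS was proposed in a 2018 paper, mainly to handle the detection of inclined text lines.


**Basic steps**  
 (rbox denotes a rotated rectangular box)  
 1. Sort the output detections (rboxes) by score in descending order to obtain rbox\_lists;  
 2. Traverse rbox\_lists in order. For the current rbox, intersect it with each remaining rbox to get the set of intersection points, take the area of the convex polygon formed by those points as the intersection area, and from it compute the IoU of the two rboxes. Rboxes whose IoU exceeds the set threshold are removed; those below it are kept;  
 3. The remaining rboxes are the final detections.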
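
The steps above can be sketched without OpenCV or TensorFlow. The following is a minimal pure-Python illustration, not the article's implementation: the box layout `(cx, cy, w, h, theta_deg)`, the helper names, and the use of Sutherland-Hodgman clipping to obtain the intersection polygon are all assumptions made for this example.

```python
import math


def rbox_to_corners(cx, cy, w, h, theta_deg):
    """Corners of a rotated rectangle, with positive (counter-clockwise) orientation."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    offsets = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def polygon_area(pts):
    """Shoelace formula; returns 0 for fewer than 3 points."""
    if len(pts) < 3:
        return 0.0
    total = 0.0
    for k in range(len(pts)):
        x1, y1 = pts[k]
        x2, y2 = pts[(k + 1) % len(pts)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip_convex(subject, clip):
    """Sutherland-Hodgman clipping of `subject` by a convex polygon `clip`."""
    output = list(subject)
    for k in range(len(clip)):
        a, b = clip[k], clip[(k + 1) % len(clip)]
        current, output = output, []
        for m in range(len(current)):
            p, q = current[m], current[(m + 1) % len(current)]
            # Signed side of p and q relative to edge a->b (>= 0 means inside).
            sp = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
            sq = (b[0] - a[0]) * (q[1] - a[1]) - (b[1] - a[1]) * (q[0] - a[0])
            if sp >= 0:
                output.append(p)
            if sp * sq < 0:  # edge p->q crosses the clip line
                t = sp / (sp - sq)
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return output


def rotated_iou(r1, r2):
    """IoU of two rotated boxes given as (cx, cy, w, h, theta_deg)."""
    inter = polygon_area(clip_convex(rbox_to_corners(*r1), rbox_to_corners(*r2)))
    union = r1[2] * r1[3] + r2[2] * r2[3] - inter
    return inter / union if union > 0 else 0.0


def inclined_nms(rboxes, scores, iou_threshold):
    """Greedy NMS over rotated boxes; returns kept indices in score order."""
    order = sorted(range(len(rboxes)), key=lambda k: scores[k], reverse=True)
    keep, suppressed = [], set()
    for pos, i in enumerate(order):
        if i in suppressed:
            continue
        keep.append(i)
        for j in order[pos + 1:]:
            if j not in suppressed and rotated_iou(rboxes[i], rboxes[j]) >= iou_threshold:
                suppressed.add(j)
    return keep


rboxes = [(50, 40, 100, 100, 0), (60, 50, 100, 100, 0),
          (50, 30, 100, 100, -45.0), (200, 190, 100, 100, 0.0)]
scores = [0.99, 0.88, 0.66, 0.77]
print(inclined_nms(rboxes, scores, 0.5))  # [0, 3]
print(inclined_nms(rboxes, scores, 0.7))  # [0, 1, 3, 2]
```

With a threshold of 0.5 the two boxes that overlap the top-scoring box (IoU of roughly 0.68 each) are removed; with 0.7 all four survive, which shows how sensitive the result is to the threshold.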


**Implementation**

    # coding=utf-8
    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function

    import numpy as np
    import cv2
    import tensorflow as tf  # TensorFlow 1.x API (tf.py_func, tf.Session)


    def nms_rotate(decode_boxes, scores, iou_threshold, max_output_size,
                   use_angle_condition=False, angle_threshold=0, use_gpu=False, gpu_id=0):
        """
        :param decode_boxes: rotated boxes, one per row; the code below reads each
                             row as [y_c, x_c, h, w, theta]
        :param scores: scores of boxes
        :param iou_threshold: iou threshold (0.7 or 0.5)
        :param max_output_size: max number of output
        :return: the remaining index of boxes
        """
        if use_gpu:
            # GPU path
            keep = nms_rotate_gpu(boxes_list=decode_boxes,
                                  scores=scores,
                                  iou_threshold=iou_threshold,
                                  angle_gap_threshold=angle_threshold,
                                  use_angle_condition=use_angle_condition,
                                  device_id=gpu_id)

            # truncate to at most max_output_size indices
            keep = tf.cond(
                tf.greater(tf.shape(keep)[0], max_output_size),
                true_fn=lambda: tf.slice(keep, [0], [max_output_size]),
                false_fn=lambda: keep)
        else:
            # CPU path
            keep = tf.py_func(nms_rotate_cpu,
                              inp=[decode_boxes, scores, iou_threshold, max_output_size],
                              Tout=tf.int64)
        return keep

    def nms_rotate_cpu(boxes, scores, iou_threshold, max_output_size):
        keep = []  # indices of the boxes that are kept
        order = scores.argsort()[::-1]  # sort detections by score, descending
        num = boxes.shape[0]  # number of detections

        suppressed = np.zeros((num), dtype=np.int64)
        for _i in range(num):
            if len(keep) >= max_output_size:  # stop once max_output_size boxes are kept
                break

            i = order[_i]
            if suppressed[i] == 1:  # skip boxes that are already suppressed
                continue
            keep.append(i)  # keep the index of the current box
            # build an OpenCV rotated rect ((cx, cy), (w, h), angle) from the box
            r1 = ((boxes[i, 1], boxes[i, 0]), (boxes[i, 3], boxes[i, 2]), boxes[i, 4])
            print("r1:{}".format(r1))
            area_r1 = boxes[i, 2] * boxes[i, 3]  # area of the current box
            for _j in range(_i + 1, num):  # loop over the remaining boxes
                j = order[_j]
                if suppressed[j] == 1:  # skip boxes that are already suppressed
                    continue
                r2 = ((boxes[j, 1], boxes[j, 0]), (boxes[j, 3], boxes[j, 2]), boxes[j, 4])
                area_r2 = boxes[j, 2] * boxes[j, 3]
                inter = 0.0  # despite the name, this holds the IoU of r1 and r2

                # intersect the two rotated rects; [1] is the set of intersection points
                int_pts = cv2.rotatedRectangleIntersection(r1, r2)[1]
                if int_pts is not None:
                    order_pts = cv2.convexHull(int_pts, returnPoints=True)  # convex hull of the points
                    int_area = cv2.contourArea(order_pts)  # area of that convex polygon
                    inter = int_area * 1.0 / (area_r1 + area_r2 - int_area + 0.0000001)

                if inter >= iou_threshold:  # suppress boxes at or above the threshold
                    suppressed[j] = 1

        return np.array(keep, np.int64)

GPU implementation:

    # rotate_gpu_nms is a compiled CUDA routine that is not shown in this article;
    # it has to be imported from wherever it is built in your project.
    def nms_rotate_gpu(boxes_list, scores, iou_threshold, use_angle_condition=False,
                       angle_gap_threshold=0, device_id=0):
        if use_angle_condition:
            y_c, x_c, h, w, theta = tf.unstack(boxes_list, axis=1)
            boxes_list = tf.transpose(tf.stack([x_c, y_c, w, h, theta]))
            det_tensor = tf.concat([boxes_list, tf.expand_dims(scores, axis=1)], axis=1)
            keep = tf.py_func(rotate_gpu_nms,
                              inp=[det_tensor, iou_threshold, device_id],
                              Tout=tf.int64)
            return keep
        else:
            y_c, x_c, h, w, theta = tf.unstack(boxes_list, axis=1)
            boxes_list = tf.transpose(tf.stack([x_c, y_c, w, h, theta]))
            det_tensor = tf.concat([boxes_list, tf.expand_dims(scores, axis=1)], axis=1)
            keep = tf.py_func(rotate_gpu_nms,
                              inp=[det_tensor, iou_threshold, device_id],
                              Tout=tf.int64)
            keep = tf.reshape(keep, [-1])
            return keep

    if __name__ == '__main__':
        boxes = np.array([[50, 40, 100, 100, 0],
                          [60, 50, 100, 100, 0],
                          [50, 30, 100, 100, -45.],
                          [200, 190, 100, 100, 0.]])

        scores = np.array([0.99, 0.88, 0.66, 0.77])
        keep = nms_rotate(tf.convert_to_tensor(boxes, dtype=tf.float32),
                          tf.convert_to_tensor(scores, dtype=tf.float32),
                          0.7, 5)
        import os
        os.environ["CUDA_VISIBLE_DEVICES"] = '0'
        with tf.Session() as sess:
            print(sess.run(keep))


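**Applicability**  
 Generally suited to inclined text detection (i.e. text with an orientation).


#### Polygon NMS (PNMS)


Polygon NMS was proposed in the 2017 paper Detecting Curve Text in the Wild: New Dataset and New Solution, mainly to handle curved text.


**Basic steps**  
 The idea is the same as standard NMS, with the rectangles replaced by polygons, so it is not described in detail here.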
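
Because only the overlap computation changes, the greedy suppression loop can be written once against a precomputed pairwise IoU matrix and reused for rectangles, polygons, or any other shape. The sketch below is an illustration under that assumption; the function name `nms_from_iou_matrix` is hypothetical and the IoU values are made up for the example.

```python
def nms_from_iou_matrix(iou, scores, thresh):
    """Greedy NMS given a symmetric pairwise IoU matrix (list of lists).

    The geometry (rectangle, polygon, mask, ...) is hidden inside `iou`,
    so the same loop serves every variant.
    """
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    keep = []
    while order:
        i = order.pop(0)          # highest-scoring remaining detection
        keep.append(i)
        # drop every remaining detection that overlaps it too much
        order = [j for j in order if iou[i][j] <= thresh]
    return keep


# Illustrative values only: detections 0 and 1 overlap heavily.
iou = [[1.0, 0.8, 0.1],
       [0.8, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
scores = [0.9, 0.7, 0.8]
print(nms_from_iou_matrix(iou, scores, 0.5))  # [0, 2]
```

For polygons, each `iou[i][j]` would come from the intersection and union areas of polygons `i` and `j`.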


**Implementation**

    # coding=utf-8
    import numpy as np
    from shapely.geometry import Polygon


    def py_cpu_pnms(dets, thresh):
        # box coordinates and the corresponding scores
        bbox = dets[:, :4]
        scores = dets[:, 4]

        # each text instance is annotated with 14 points; these are the 14 (x, y) offsets
        info_bbox = dets[:, 5:33]

        # absolute point coordinates
        pts = []
        for i in range(dets.shape[0]):
            pts.append([[int(bbox[i, 0]) + info_bbox[i, j], int(bbox[i, 1]) + info_bbox[i, j + 1]]
                        for j in range(0, 28, 2)])

        areas = np.zeros(scores.shape)
        # scores in descending order
        order = scores.argsort()[::-1]
        inter_areas = np.zeros((scores.shape[0], scores.shape[0]))

        for il in range(len(pts)):
            # build a polygon from the current point set and compute its area
            poly = Polygon(pts[il])
            areas[il] = poly.area

            # loop over the remaining polygons
            for jl in range(il, len(pts)):
                polyj = Polygon(pts[jl])

                # intersection of the two polygons and its area
                inS = poly.intersection(polyj)
                inter_areas[il][jl] = inS.area
                inter_areas[jl][il] = inS.area

        # from here on, the same as standard NMS
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            ovr = inter_areas[i][order[1:]] / (areas[i] + areas[order[1:]] - inter_areas[i][order[1:]])
            inds = np.where(ovr <= thresh)[0]
            order = order[inds + 1]

        return keep


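**Applicability**  
 Generally suited to detecting irregularly shaped text (e.g. curved text).


#### Mask NMS (MNMS)


MNMS was proposed in the FTSN text-detection paper; it computes the overlap on the segmentation masks. If your text detector is segmentation-based, I personally recommend this method: 1) it separates adjacent text instances well; 2) it handles text instances of arbitrary shape.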


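**Steps**  
 1. Sort all detections by score in descending order to obtain box\_lists;  
 2. Traverse box\_lists; at each step, compute the overlap between the current box and every remaining box. The overlap is measured on the segmentation masks as

$$
MMI=\max \left(\frac{I}{I_{A}}, \frac{I}{I_{B}}\right)
$$

 where $I$ is the intersection area of the two instance masks and $I_{A}$, $I_{B}$ are the areas of the two masks. Boxes whose MMI reaches or exceeds the set threshold are removed;  
 3. The remaining boxes are the final detections.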
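
To see why MMI is used instead of plain IoU, consider a small mask nested inside a large one. The sketch below is a toy illustration, assuming each mask is represented as a set of `(row, col)` pixel coordinates (a simplification chosen for brevity; the article's own code works on OpenCV images). The helper names are hypothetical.

```python
def mmi(mask_a, mask_b):
    """max(I / I_A, I / I_B) for two masks given as sets of pixel coordinates."""
    if not mask_a or not mask_b:
        return 0.0
    inter = len(mask_a & mask_b)
    return max(inter / len(mask_a), inter / len(mask_b))


def mask_iou(mask_a, mask_b):
    """Ordinary IoU on the same representation, for comparison."""
    union = len(mask_a | mask_b)
    return len(mask_a & mask_b) / union if union else 0.0


def block(r0, c0, size):
    """Square block of pixels with its top-left corner at (r0, c0)."""
    return {(r, c) for r in range(r0, r0 + size) for c in range(c0, c0 + size)}


big = block(0, 0, 4)      # 16 pixels
small = block(1, 1, 2)    # 4 pixels, entirely inside `big`

print(mask_iou(big, small))  # 0.25
print(mmi(big, small))       # 1.0
```

The nested detection has an IoU of only 0.25 and would survive a typical IoU threshold, whereas its MMI of 1.0 marks it as a duplicate of the larger instance.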


**Implementation**

    # coding=utf-8
    #############################################
    # Mask NMS implementation
    #############################################
    import cv2
    import numpy as np

    EPS = 0.00001


    def get_mask(box, mask):
        """Return the part of `mask` covered by `box` and its non-zero pixel count."""
        tmp_mask = np.zeros(mask.shape, dtype="uint8")
        tmp = np.array(box.tolist(), dtype=np.int32).reshape(-1, 2)
        cv2.fillPoly(tmp_mask, [tmp], (255))
        tmp_mask = cv2.bitwise_and(tmp_mask, mask)
        return tmp_mask, cv2.countNonZero(tmp_mask)


    def comput_mmi(area_a, area_b, intersect):
        """
        Compute MMI (added 2018.11.23)
        :param area_a: mask area of text instance a
        :param area_b: mask area of text instance b
        :param intersect: intersection area of text instances a and b
        :return: max(intersect / area_a, intersect / area_b)
        """
        if area_a == 0 or area_b == 0:
            area_a += EPS
            area_b += EPS
            print("the area of text is 0")
        return max(float(intersect) / area_a, float(intersect) / area_b)


    def mask_nms(dets, mask, thres=0.3):
        """
        Mask NMS
        :param dets: detections, an N*9 numpy array (8 box coordinates followed by the score)
        :param mask: segmentation mask of the current image
        :param thres: MMI threshold
        """
        # boxes and their scores
        bbox_infos = dets[:, :8]
        scores = dets[:, 8]

        keep = []
        order = scores.argsort()[::-1]
        print("order:{}".format(order))
        nums = len(bbox_infos)
        suppressed = np.zeros((nums), dtype=np.int64)
        print("lens:{}".format(nums))

        # greedy loop over detections in descending score order
        for i in range(nums):
            idx = order[i]
            if suppressed[idx] == 1:
                continue
            keep.append(idx)
            mask_a, area_a = get_mask(bbox_infos[idx], mask)
            for j in range(i + 1, nums):
                idx_j = order[j]
                if suppressed[idx_j] == 1:
                    continue
                mask_b, area_b = get_mask(bbox_infos[idx_j], mask)

                # intersection area of the two text instances
                merge_mask = cv2.bitwise_and(mask_a, mask_b)
                area_intersect = cv2.countNonZero(merge_mask)

                # compute MMI
                mmi = comput_mmi(area_a, area_b, area_intersect)
                # print("area_a:{},area_b:{},inte:{},mmi:{}".format(area_a, area_b, area_intersect, mmi))

                if mmi >= thres:
                    suppressed[idx_j] = 1

        return dets[keep]


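**Applicability**  
 Any text detector that follows the segmentation route can use this method.


### Summary


In text detection, text orientation varies widely, so the choice of NMS depends on the setting:


* For horizontal text detection: standard NMS is sufficient
* For segmentation-based multi-oriented text detection: Mask NMS is the first choice, although Polygon NMS and Inclined NMS can also be used
* For detection-based multi-oriented text detection: Polygon NMS and Inclined NMS are the first choices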


### Soft-NMS


#### Motivation


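The vast majority of object detection methods end with NMS (non-maximum suppression) as a post-processing step. The usual procedure is to sort the detection boxes by score, keep the highest-scoring box, and delete every other box whose overlap with it exceeds a given ratio.


This greedy approach has the problem shown in the figure below. The red box and the green box are the current detections, with scores of 0.95 and 0.80 respectively. Under traditional NMS, the highest-scoring red box is selected first, and the green box is then deleted because it overlaps the red box too much.


The NMS threshold is also hard to choose. Set too low, it produces the situation in the figure (the green box is deleted because of its large overlap with the red box); set too high, it tends to increase false positives.


*Idea: rather than crudely deleting every box whose IoU exceeds the threshold, lower its confidence.*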


#### Method


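As shown in the figure below, and as the paper's title suggests, a single line of code replaces the original NMS. After the whole procedure in the figure has been run, a confidence threshold is chosen, and only the boxes whose final score exceeds that threshold are kept.  
 ![Figure](https://img-blog.csdnimg.cn/20190621115817519.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3h1X2Z1X3lvbmc=,size_16,color_FFFFFF,t_70)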


The original NMS can be described as follows: the score of every window whose IoU with the selected box $\mathcal{M}$ reaches the threshold $N_{t}$ is set to 0.

$$
s_{i}=\left\{\begin{array}{ll}{s_{i},} & {\operatorname{iou}\left(\mathcal{M}, b_{i}\right)<N_{t}} \\ {0,} & {\operatorname{iou}\left(\mathcal{M}, b_{i}\right) \geq N_{t}}\end{array}\right.
$$
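
The rescoring rule above can be sketched for axis-aligned boxes as follows. This is a minimal illustration rather than a reference implementation: the `(x1, y1, x2, y2)` box layout, the function names, and the example boxes are assumptions; only the scores 0.95 and 0.80 echo the red/green example from the Motivation section.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def hard_nms(boxes, scores, nt):
    """Standard NMS written as score rescoring: s_i becomes 0 when iou(M, b_i) >= nt."""
    scores = list(scores)                 # work on a copy
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        m = max(remaining, key=lambda k: scores[k])   # current box M
        remaining.remove(m)
        keep.append(m)
        for i in remaining:
            if iou(boxes[m], boxes[i]) >= nt:
                scores[i] = 0.0
        # boxes with a zero score are gone for good
        remaining = [i for i in remaining if scores[i] > 0.0]
    return keep, scores


boxes = [(0, 0, 100, 100), (20, 0, 120, 100), (300, 300, 400, 400)]
scores = [0.95, 0.80, 0.70]
keep, final_scores = hard_nms(boxes, scores, 0.5)
print(keep)          # [0, 2]
print(final_scores)  # [0.95, 0.0, 0.7]
```

The second box overlaps the first with an IoU of 2/3, so its score of 0.80 is wiped out entirely, which is exactly the behaviour the Motivation section criticises.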


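The paper proposes two forms of improvement. The first is linear weighting:

$$
s_{i}=\left\{\begin{array}{ll}{s_{i},} & {\operatorname{iou}\left(\mathcal{M}, b_{i}\right)<N_{t}} \\ {s_{i}\left(1-\operatorname{iou}\left(\mathcal{M}, b_{i}\right)\right),} & {\operatorname{iou}\left(\mathcal{M}, b_{i}\right) \geq N_{t}}\end{array}\right.
$$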

