[YOLO] Non-Maximum Suppression (NMS) in YOLOv5: usage and implementation in detail, with a simple example and a code walkthrough

忆世界

Last modified: 2024-07-02 09:47:03

First published: 2024-07-02 09:44:40

Article link: https://blog.csdn.net/qq_51677409/article/details/140118141

YOLOv5 Non-Maximum Suppression (NMS)

Non-Maximum Suppression (NMS)

The IoU metric

Introduction

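🚀 Intersection over Union (IoU) measures how much two regions overlap. It is widely used in deep learning for object detection and semantic segmentation.

Once the model has output its predicted boxes, we can compute the IoU between a predicted box and the ground-truth bounding box. The IoU lies in the range 0 to 1: 0 means the two boxes do not intersect, and 1 means they coincide exactly.

1 - IoU measures the difference between the ground-truth box and the predicted box. Its range is still 0 to 1, but now 0 means the boxes coincide and 1 means they do not intersect, which matches the fact that training minimizes its objective. 1 - IoU can therefore serve as the model's loss function.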

🎯 IoU is defined as follows, where $|\cdot|$ denotes area:
$IoU=\frac{|A\cap B|}{|A\cup B|}$

✨ Intuitively, IoU is the ratio of the area of the intersection of two shapes to the area of their union.

[Figure: IoU as the area of the intersection divided by the area of the union]

Calculation

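The figure below shows a simple IoU calculation. First compute the area of the yellow intersection, then the total area enclosed by the blue box and the green box together (their union), and finally take the ratio. Taking each grid cell to have area 1, the yellow intersection has area 2x2=4 and the union has area 3x3+4x4-2x2=21, so IoU=4/21≈0.19.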

[Figure: a blue box and a green box on a grid, with their yellow intersection]
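
The cell-counting argument above can be reproduced literally with sets of grid cells. This is only a sketch: the box positions are assumptions chosen so that a 3x3 box and a 4x4 box share a 2x2 region, and the helper name `cells` is hypothetical.

```python
def cells(box):
    """Set of unit grid cells covered by a box [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = box
    return {(x, y) for x in range(x1, x2) for y in range(y1, y2)}


# Assumed positions: a 3x3 box and a 4x4 box overlapping in a 2x2 region
blue = cells([0, 0, 3, 3])    # 9 cells
green = cells([1, 1, 5, 5])   # 16 cells

inter = len(blue & green)     # cells covered by both boxes
union = len(blue | green)     # cells covered by at least one box
print(inter, union, round(inter / union, 2))  # 4 21 0.19
```

Counting cells only works for integer-aligned boxes, which is one reason real code works from corner coordinates instead.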

Code does not count cells like this; it works from corner coordinates. In the figure below, rectangle AC intersects rectangle BD, and the corners are A(0,0), B(3,2), C(6,8), and D(9,10).

[Figure: rectangles AC and BD, with corners A, B, C, D]

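📟 The IoU formula is then:

$IoU=\frac{(x_C-x_B)(y_C-y_B)}{(x_C-x_A)(y_C-y_A)+(x_D-x_B)(y_D-y_B)-(x_C-x_B)(y_C-y_B)}$

Substituting the actual coordinates of A, B, C, and D gives:

$IoU=\frac{(6-3)(8-2)}{(6-0)(8-0)+(9-3)(10-2)-(6-3)(8-2)}=\frac{18}{48+48-18}=\frac{18}{78}\approx 0.23$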

def IoU(box1, box2):
    """Return the IoU of two boxes given as [x1, y1, x2, y2]."""
    # Width and height of the intersection rectangle
    in_w = min(box1[2], box2[2]) - max(box1[0], box2[0])
    in_h = min(box1[3], box2[3]) - max(box1[1], box2[1])

    # Intersection and union areas
    inter = 0 if in_w <= 0 or in_h <= 0 else in_h * in_w
    union = (box2[2] - box2[0]) * (box2[3] - box2[1]) +\
            (box1[2] - box1[0]) * (box1[3] - box1[1]) - inter
    # IoU (0 if both boxes are degenerate, to avoid dividing by zero)
    iou = inter / union if union > 0 else 0.0
    return iou

if __name__ == "__main__":
    box1 = [0, 0, 6, 8]  # [top-left x, top-left y, bottom-right x, bottom-right y]
    box2 = [3, 2, 9, 10]
    print(IoU(box1, box2))

Output:

0.23076923076923078
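
The introduction noted that 1 - IoU can act as a loss. A minimal sketch of that idea follows; the helper name `iou_loss` is hypothetical, and this is an illustration of the plain 1 - IoU formula rather than the loss any particular detector uses.

```python
def iou(box1, box2):
    """IoU of two boxes given as [x1, y1, x2, y2] (top-left, bottom-right)."""
    inter_w = min(box1[2], box2[2]) - max(box1[0], box2[0])
    inter_h = min(box1[3], box2[3]) - max(box1[1], box2[1])
    inter = inter_w * inter_h if inter_w > 0 and inter_h > 0 else 0.0
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred_box, true_box):
    """Hypothetical helper: 1 - IoU, so a perfect match costs 0."""
    return 1.0 - iou(pred_box, true_box)


truth = [0, 0, 6, 8]
print(iou_loss([0, 0, 6, 8], truth))      # identical boxes: 0.0
print(iou_loss([3, 2, 9, 10], truth))     # partial overlap: between 0 and 1
print(iou_loss([10, 10, 12, 12], truth))  # no overlap: 1.0
```

The loss is 0 for a perfect prediction and saturates at 1 as soon as the boxes stop touching, no matter how far apart they are.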

A simple NMS implementation

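Non-Maximum Suppression (NMS) is a technique from image processing that is most commonly used in object detection. It removes redundant detection boxes and keeps only the boxes most likely to contain a target object, that is, the best detection results.

In object detection, a detector proposes objects that may be present and predicts a box giving the position and size of each one. The same object is often detected several times, which produces several overlapping boxes. NMS removes these overlapping boxes and keeps only the best one.

The basic idea is to collect all candidate boxes that may contain a target and sort them by confidence. Starting from the box with the highest confidence, each box is compared with the boxes already kept, and it is discarded if its overlap (IoU) with any of them exceeds a threshold. This keeps the number of retained boxes small while preserving detection precision and recall. Concretely, the steps are: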

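1. For each class, sort the predicted boxes by confidence and take the box with the highest confidence as the reference box.
2. Compute the overlap (IoU) between the reference box and every remaining box, and delete each box whose overlap exceeds the threshold.
3. Among the boxes that are left, take the one with the highest confidence as the new reference box and repeat step 2 until no unprocessed boxes remain.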

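In this way NMS filters out every redundant box whose overlap with a reference box exceeds the threshold, which cleans up the detection result. Note that the NMS threshold usually has to be tuned for the specific dataset and application to balance precision and recall.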
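
To make the effect of the threshold concrete, here is a minimal pure-Python sketch of greedy NMS run at two thresholds. The three boxes and the helper names `iou` and `nms` are made up for illustration; boxes use continuous coordinates, so no +1 is added to widths and heights.

```python
def iou(a, b):
    """IoU of two boxes; only the first four entries (x1, y1, x2, y2) are used."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    inter = w * h if w > 0 and h > 0 else 0.0
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets, thresh):
    """Greedy NMS over dets = [[x1, y1, x2, y2, score], ...]; returns kept indices."""
    order = sorted(range(len(dets)), key=lambda k: dets[k][4], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest remaining score becomes the reference box
        keep.append(best)
        # drop every remaining box that overlaps the reference box too much
        order = [k for k in order if iou(dets[best], dets[k]) <= thresh]
    return keep


dets = [
    [0, 0, 10, 10, 0.9],    # reference box
    [1, 1, 11, 11, 0.8],    # IoU with the first box is 81/119, about 0.68
    [20, 20, 30, 30, 0.7],  # far away from both
]
print(nms(dets, 0.5))  # [0, 2]
print(nms(dets, 0.7))  # [0, 1, 2]
```

With a threshold of 0.5 the second box counts as a duplicate and is removed; raising the threshold to 0.7 lets it survive, which is the precision/recall trade-off described above.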

Dependencies

import numpy as np 
import matplotlib.pyplot as plt
# Install
#pip install numpy==1.19.5 -i https://pypi.tuna.tsinghua.edu.cn/simple/
#pip install matplotlib==3.2.2 -i https://pypi.tuna.tsinghua.edu.cn/simple/

The NMS algorithm

# NMS algorithm
def py_cpu_nms(dets, thresh):
    # Box coordinates; dets has shape (N, 5): x1, y1, x2, y2, score
    x1 = dets[:, 0]  # column 0 of every row
    y1 = dets[:, 1]  # column 1 of every row
    x2 = dets[:, 2]  # column 2 of every row
    y2 = dets[:, 3]  # column 3 of every row
    # Box areas; the +1 treats coordinates as inclusive pixel indices
    areas = (y2 - y1 + 1) * (x2 - x1 + 1)
    # Confidence score of each box
    scores = dets[:, 4]  # column 4 of every row

    keep = []  # indices of the boxes to keep

    # argsort sorts in ascending order; [::-1] reverses it so the highest score comes first
    index = scores.argsort()[::-1]  # box indices, sorted by descending score

    # Iterate over the boxes
    while index.size > 0:  # with the six sample boxes in the full script: 6, then 3, then 2
        i = index[0]  # the first entry always has the highest remaining score, so keep it
        keep.append(i)
        # Corners of the intersection of box i with each remaining box
        x11 = np.maximum(x1[i], x1[index[1:]])  # index[1:] is every remaining box except i
        y11 = np.maximum(y1[i], y1[index[1:]])
        x22 = np.minimum(x2[i], x2[index[1:]])
        y22 = np.minimum(y2[i], y2[index[1:]])

        # Intersection area (clamped to 0 when the boxes do not overlap)
        w = np.maximum(0, x22 - x11 + 1)  # width of the overlap
        h = np.maximum(0, y22 - y11 + 1)  # height of the overlap
        overlaps = w * h  # intersection area

        # IoU = intersection / union
        #     = overlap / (area[i] + area[index[1:]] - overlap)
        ious = overlaps / (areas[i] + areas[index[1:]] - overlaps)
        print("ious", ious)
        # Keep only the boxes whose IoU with box i does not exceed the threshold
        idx = np.where(ious <= thresh)[0]
        print("idx", idx)
        index = index[idx + 1]  # +1 because ious was computed against index[1:]
    return keep  # indices of the kept boxes

Full code

# NumPy for arrays
import numpy as np
import matplotlib.pyplot as plt  # plotting

# Plotting helper
def plot_bbox(dets, c='k'):  # c = line color, black by default
    # Box coordinates
    x1 = dets[:, 0]  # column 0 of every row
    y1 = dets[:, 1]  # column 1 of every row
    x2 = dets[:, 2]  # column 2 of every row
    y2 = dets[:, 3]  # column 3 of every row

    plt.plot([x1, x2], [y1, y1], c)  # edge at y = y1
    plt.plot([x1, x1], [y1, y2], c)  # edge at x = x1
    plt.plot([x1, x2], [y2, y2], c)  # edge at y = y2
    plt.plot([x2, x2], [y1, y2], c)  # edge at x = x2
    plt.title("nms")  # title
 
# NMS algorithm
def py_cpu_nms(dets, thresh):
    # Box coordinates; dets has shape (N, 5): x1, y1, x2, y2, score
    x1 = dets[:, 0]  # column 0 of every row
    y1 = dets[:, 1]  # column 1 of every row
    x2 = dets[:, 2]  # column 2 of every row
    y2 = dets[:, 3]  # column 3 of every row
    # Box areas; the +1 treats coordinates as inclusive pixel indices
    areas = (y2 - y1 + 1) * (x2 - x1 + 1)
    # Confidence score of each box
    scores = dets[:, 4]  # column 4 of every row

    keep = []  # indices of the boxes to keep

    # argsort sorts in ascending order; [::-1] reverses it so the highest score comes first
    index = scores.argsort()[::-1]  # box indices, sorted by descending score

    # Iterate over the boxes
    while index.size > 0:  # with the six sample boxes in main(): 6, then 3, then 2
        i = index[0]  # the first entry always has the highest remaining score, so keep it
        keep.append(i)
        # Corners of the intersection of box i with each remaining box
        x11 = np.maximum(x1[i], x1[index[1:]])  # index[1:] is every remaining box except i
        y11 = np.maximum(y1[i], y1[index[1:]])
        x22 = np.minimum(x2[i], x2[index[1:]])
        y22 = np.minimum(y2[i], y2[index[1:]])

        # Intersection area (clamped to 0 when the boxes do not overlap)
        w = np.maximum(0, x22 - x11 + 1)  # width of the overlap
        h = np.maximum(0, y22 - y11 + 1)  # height of the overlap
        overlaps = w * h  # intersection area

        # IoU = intersection / union
        #     = overlap / (area[i] + area[index[1:]] - overlap)
        ious = overlaps / (areas[i] + areas[index[1:]] - overlaps)
        print("ious", ious)
        # Keep only the boxes whose IoU with box i does not exceed the threshold
        idx = np.where(ious <= thresh)[0]
        print("idx", idx)
        index = index[idx + 1]  # +1 because ious was computed against index[1:]
    return keep  # indices of the kept boxes
 
def main():
    # Build the array of boxes; each row is x1, y1, x2, y2, score
    boxes = np.array([[100, 100, 210, 210, 0.72],
                      [250, 250, 420, 420, 0.8],
                      [220, 220, 320, 330, 0.92],
                      [100, 100, 210, 210, 0.72],
                      [230, 240, 325, 330, 0.81],
                      [220, 230, 315, 340, 0.9]])
    show(boxes)

def show(boxes):
    plt.figure(1)  # figure window
    plt.subplot(1, 2, 1)  # left subplot
    plot_bbox(boxes, 'k')  # before NMS
    plt.subplot(1, 2, 2)  # right subplot
    keep = py_cpu_nms(boxes, thresh=0.7)  # run NMS
    print(keep)
    plot_bbox(boxes[keep], 'r')  # after NMS
    plt.show()  # display the figure
 
if __name__ == '__main__':
    main()

[Figure: boxes before NMS (black, left) and after NMS (red, right)]

YOLOv5 source walkthrough

First, YOLOv5 calls NMS in detect.py as follows:

# NMS
with dt[2]:
    pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)

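Here pred[0] is a Tensor of shape (1,18900,85), max_det=100, and everything else is left at its default.

The input image here is 640x480, so the three detection layers produce 80x60, 40x30, and 20x15 feature maps with 3 anchors per cell, which gives 18900 = (20 * 15 + 40 * 30 + 80 * 60) * 3 candidate boxes.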
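
As a quick sanity check of the arithmetic above, the following sketch recomputes the candidate count and the last dimension of the tensor. The strides (8, 16, 32), the 3 anchors per grid cell, and the 80 classes are assumptions matching YOLOv5's default configuration; the variable names are illustrative.

```python
img_w, img_h = 640, 480
strides = (8, 16, 32)      # assumed downsampling factors of the three detection layers
anchors_per_cell = 3       # assumed number of anchors per grid cell
num_classes = 80           # assumed number of classes (COCO)

grids = [(img_w // s, img_h // s) for s in strides]
total = sum(w * h for w, h in grids) * anchors_per_cell
per_box = 4 + 1 + num_classes  # xywh + objectness + class scores

print(grids)    # [(80, 60), (40, 30), (20, 15)]
print(total)    # 18900
print(per_box)  # 85
```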

Inside the function, the flow is as follows:

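1. Check the parameters, and preprocess settings such as the number of classes, the candidate indices, and the output type.
2. Process each image in the batch in turn.
3. Select the candidate boxes whose confidence is greater than conf_thres.
4. Convert the candidate boxes from xywh to xyxy.
5. Find the class of each candidate box (max over the class scores).
6. Sort by confidence in descending order.
7. Add a per-class offset to each box's xyxy coordinates so that NMS is, in effect, run separately for each class.
8. Output the result.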
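
Two items in the list above are easy to check by hand: the xywh-to-xyxy conversion and the class offset. The sketch below is plain Python rather than the library code; the helper names `xywh_to_xyxy` and `offset_by_class` are hypothetical, and MAX_WH mirrors the max_wh = 7680 constant in the function listed after this.

```python
MAX_WH = 7680  # same value as max_wh in non_max_suppression


def xywh_to_xyxy(box):
    """Convert (center_x, center_y, width, height) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]


def offset_by_class(box, cls, agnostic=False):
    """Shift a box by cls * MAX_WH so boxes of different classes cannot overlap."""
    shift = 0 if agnostic else cls * MAX_WH
    return [v + shift for v in box]


def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    inter = w * h if w > 0 and h > 0 else 0.0
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


box = xywh_to_xyxy([50, 50, 20, 40])
same_spot_a = offset_by_class(box, cls=0)  # class 0: unchanged
same_spot_b = offset_by_class(box, cls=1)  # class 1: shifted by 7680 in x and y

print(box)                            # [40.0, 30.0, 60.0, 70.0]
print(iou(box, box))                  # 1.0
print(iou(same_spot_a, same_spot_b))  # 0.0
```

Because the shift is larger than any expected image dimension, two boxes at the same location but with different classes end up with IoU 0, so a single NMS call never suppresses one class in favor of another. Setting agnostic to True removes the shift and makes NMS class-agnostic.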

def non_max_suppression(
    prediction,
    conf_thres=0.25,
    iou_thres=0.45,
    classes=None,
    agnostic=False,
    multi_label=False,
    labels=(),
    max_det=300, # 100
    nm=0,  # number of masks
):

    if isinstance(prediction, (list, tuple)):  # YOLOv5 model in validation model, output = (inference_out, loss_out)
        prediction = prediction[0]  # select only inference output (1,18900,85)

    device = prediction.device
    mps = "mps" in device.type  # Apple MPS
    if mps:  # MPS not fully supported yet, convert tensors to CPU before NMS
        prediction = prediction.cpu()
    bs = prediction.shape[0]  # batch size 1
    nc = prediction.shape[2] - nm - 5  # number of classes 80
    xc = prediction[..., 4] > conf_thres  # candidates (1,18900)<True|False>

    # Settings
    # min_wh = 2  # (pixels) minimum box width and height
    max_wh = 7680  # (pixels) maximum box width and height
    max_nms = 30000  # maximum number of boxes into torchvision.ops.nms()
    time_limit = 0.5 + 0.05 * bs  # seconds to quit after 0.55
    redundant = True  # require redundant detections
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img) False
    merge = False  # use merge-NMS

    t = time.time()
    mi = 5 + nc  # mask start index 85
    output = [torch.zeros((0, 6 + nm), device=prediction.device)] * bs # [(0,6)]
    for xi, x in enumerate(prediction):  # image index, image inference xi: 0 x: (18900,85)
        # Apply constraints
        # x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x[xc[xi]]  # confidence (52,85): keep the candidates whose confidence exceeds the threshold

        # Cat apriori labels if auto-labelling
        if labels and len(labels[xi]):
            lb = labels[xi]
            v = torch.zeros((len(lb), nc + nm + 5), device=x.device)
            v[:, :4] = lb[:, 1:5]  # box
            v[:, 4] = 1.0  # conf
            v[range(len(lb)), lb[:, 0].long() + 5] = 1.0  # cls
            x = torch.cat((x, v), 0)

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Compute conf
        x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf

        # Box/Mask
        box = xywh2xyxy(x[:, :4])  # center_x, center_y, width, height) to (x1, y1, x2, y2)
        mask = x[:, mi:]  # zero columns if no masks  []

        # Detections matrix nx6 (xyxy, conf, cls)
        if multi_label: # False
            i, j = (x[:, 5:mi] > conf_thres).nonzero(as_tuple=False).T
            x = torch.cat((box[i], x[i, 5 + j, None], j[:, None].float(), mask[i]), 1)
        else:  # best class only
            conf, j = x[:, 5:mi].max(1, keepdim=True)  # best class for each box
            x = torch.cat((box, conf, j.float(), mask), 1)[conf.view(-1) > conf_thres]  # also drop boxes whose class confidence is too low (53,6) ==> (51,6)

        # Filter by class
        if classes is not None: # None
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]

        # Apply finite constraint
        # if not torch.isfinite(x).all():
        #     x = x[torch.isfinite(x).all(1)]

        # Check shape
        n = x.shape[0]  # number of boxes 51
        if not n:  # no boxes
            continue
        x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence (descending) and remove excess boxes

        # Batched NMS
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes: class index * 7680
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores; adding c makes NMS effectively run per class
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS via torchvision, e.g. i = tensor([ 0,  2,  4,  5, 31])
        i = i[:max_det]  # limit detections
        if merge and (1 < n < 3e3):  # Merge NMS (boxes merged using weighted mean) False
            # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
            iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix
            weights = iou * scores[None]  # box weights
            x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
            if redundant:
                i = i[iou.sum(1) > 1]  # require redundancy

        output[xi] = x[i]
        if mps:
            output[xi] = output[xi].to(device)
        if (time.time() - t) > time_limit:
            LOGGER.warning(f"WARNING ⚠️ NMS time limit {time_limit:.3f}s exceeded")
            break  # time limit exceeded

    return output
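
The two confidence filters in the function (objectness first, then objectness times class score) can be traced on a toy input. This is a plain-Python sketch of the best-class-only branch, not the library code; the three rows are made-up values with 3 classes instead of 80.

```python
conf_thres = 0.25

# Each row: [cx, cy, w, h, obj_conf, cls0, cls1, cls2] (made-up values)
rows = [
    [50, 50, 20, 40, 0.90, 0.10, 0.80, 0.10],
    [52, 51, 22, 38, 0.60, 0.30, 0.30, 0.40],
    [10, 10, 5, 5, 0.10, 0.90, 0.05, 0.05],
]

detections = []
for row in rows:
    obj_conf = row[4]
    if obj_conf <= conf_thres:  # first filter: objectness (xc in the function)
        continue
    cls_scores = [obj_conf * c for c in row[5:]]  # conf = obj_conf * cls_conf
    best_cls = max(range(len(cls_scores)), key=cls_scores.__getitem__)
    best_conf = cls_scores[best_cls]
    if best_conf > conf_thres:  # second filter: combined score of the best class
        detections.append((row[:4], round(best_conf, 2), best_cls))

print(detections)  # [([50, 50, 20, 40], 0.72, 1)]
```

The third row fails the objectness filter. The second row passes it but is dropped afterwards because its best combined score is 0.24, which is the same kind of reduction as the (53,6) ==> (51,6) note in the listing.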