PyTorch Model Building (5): Utility Functions (NMS, K-means)

Author: 要坚持写博客呀

Article link: https://blog.csdn.net/weixin_39263657/article/details/121780657

I. Introduction

This part covers two utilities used in object detection: NMS and k-means anchors.

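II. Non-Maximum Suppression (NMS)

NMS filters the output (candidate boxes) of an object detection model.
Typical NMS pipeline:
(1) Using minimum and maximum width/height thresholds, discard candidate boxes that are too small or too large;
(2) Using the objectness confidence (obj_conf, the confidence that the box contains an object), discard background boxes and boxes with very low confidence;
(3) Filter by class score ( $cls\_score=obj\_conf*cls\_conf$ ): keep only the highest-scoring class for each box. For multi-label problems, filter with a confidence threshold con_thre instead, so that one box can retain scores for several classes (labels);
(4) After this threshold filtering, run NMS. NMS can be applied to each class separately or to all classes together.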

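Per-class NMS indexes the boxes by class, so NMS is never applied between boxes of different classes.
Joint NMS over all classes uses the offset trick: add to every bbox an offset that depends on its class and is large enough that boxes of different classes cannot overlap. (Typically offset = class index, i.e. final bbox coordinates = the 4 relative coordinates in the range 0-1 + the class index such as 0, 1, 2, 3.) Boxes of different classes are then separated, and a single NMS pass handles all of them.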

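(5) General NMS steps:

1) Build the candidate bbox list of each class, [ [x_min, y_min, x_max, y_max, score], …], and sort each list by score in descending order (with the offset trick there is no need to split by class: all boxes go into one list and NMS runs on it directly);
2) Take the first bbox from the list (the one with the highest score), compute its IoU with all remaining bboxes, remove from the list every bbox whose IoU exceeds the threshold, and move the first bbox to the final output list;
3) Repeat step 2) on the remaining sorted list until the list is empty.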
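Steps (1) to (3) of the pipeline above can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the prediction layout `[x_min, y_min, x_max, y_max, obj_conf, cls_conf_0, ...]`, the function name `prefilter`, and the default thresholds are hypothetical and do not come from any particular detector.

```python
import numpy as np

def prefilter(pred, conf_thresh=0.25, min_wh=2.0, max_wh=4096.0):
    """Size filter, objectness filter and best-class selection before NMS.

    pred: (N, 5 + num_classes) array with columns
          [x_min, y_min, x_max, y_max, obj_conf, cls_conf_0, cls_conf_1, ...]
          (hypothetical layout, assumed for this sketch).
    Returns boxes (M, 4), scores (M,), class indices (M,).
    """
    # (1) drop boxes that are too small or too large
    w = pred[:, 2] - pred[:, 0]
    h = pred[:, 3] - pred[:, 1]
    size_ok = (w >= min_wh) & (h >= min_wh) & (w <= max_wh) & (h <= max_wh)

    # (2) drop background boxes and boxes with low objectness
    obj_ok = pred[:, 4] >= conf_thresh
    pred = pred[size_ok & obj_ok]

    # (3) cls_score = obj_conf * cls_conf; keep only the best class per box
    cls_score = pred[:, 4:5] * pred[:, 5:]
    classes = cls_score.argmax(axis=1)
    scores = cls_score.max(axis=1)
    keep = scores >= conf_thresh
    return pred[keep, :4], scores[keep], classes[keep]
```

The returned boxes, scores and class indices are the inputs that step (4), the NMS itself, works on.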

NMS implementation:


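import numpy as np

def NMS(boxes, scores, iou_thresh, score_thresh=0.5):
    '''
    Non-maximum suppression (NMS), used to filter the large number of candidate
    bboxes produced by a detection network. NumPy implementation.
    With several classes, either loop over the classes and run NMS per class,
    or use the offset trick shown in the comment below.
    @param boxes: candidate boxes, shape (N, 4), [[x_min, y_min, x_max, y_max], ...]
    @param scores: confidence score of each candidate box, shape (N,), numpy array
    @param iou_thresh: IoU threshold above which an overlapping box is suppressed
    @param score_thresh: boxes scoring below this value are dropped before NMS
    @return: indices (into the input boxes) of the bboxes kept by NMS
    '''

    # Multi-class NMS with the offset trick: c holds the class index of each
    # bbox (0, 1, 2, 3, ...), c.shape: (N,). Scale c if coordinates can exceed 1.
    # boxes = boxes[:, :4] + c[:, None]
    # torchvision.ops.nms can also be used directly on tensors;
    # the returned i holds the indices kept by NMS
    # i = torchvision.ops.nms(boxes, scores, iou_thresh)

    # Drop boxes below the score threshold, remembering their original indices
    valid = np.where(scores >= score_thresh)[0]
    boxes = boxes[valid]
    scores = scores[valid]

    # Top-left and bottom-right corners: x_min, y_min, x_max, y_max
    xmin = boxes[:, 0]   # xmin -> [xmin1, xmin2, ...]
    ymin = boxes[:, 1]   # ymin -> [ymin1, ymin2, ...]
    xmax = boxes[:, 2]   # xmax -> [xmax1, xmax2, ...]
    ymax = boxes[:, 3]   # ymax -> [ymax1, ymax2, ...]

    # Sort by score in descending order: argsort is ascending, [::-1] reverses it
    order = scores.argsort()[::-1]

    # Area of each bbox. The +1 is the inclusive-pixel convention, so this
    # function expects pixel coordinates; drop the +1 for normalized coordinates
    areas = (xmax - xmin + 1) * (ymax - ymin + 1)
    # Indices of the bboxes that survive NMS
    keep = []

    # Run NMS while candidates remain
    while order.size > 0:
        # Index of the highest-scoring remaining bbox
        top1_idx = order[0]
        # Add it to the keep list, mapped back to the original indexing
        keep.append(valid[top1_idx])

        # Compare the top box with the remaining boxes: these are the top-left
        # and bottom-right corners of the intersection rectangles
        xx1 = np.maximum(xmin[top1_idx], xmin[order[1:]])
        yy1 = np.maximum(ymin[top1_idx], ymin[order[1:]])
        xx2 = np.minimum(xmax[top1_idx], xmax[order[1:]])
        yy2 = np.minimum(ymax[top1_idx], ymax[order[1:]])

        # Intersection area (zero when the boxes do not overlap)
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        intersection = w * h

        # Union area
        union = areas[top1_idx] + areas[order[1:]] - intersection

        # Intersection over union
        iou = intersection / union

        # Keep only the boxes whose overlap does not exceed the threshold
        inds = np.where(iou <= iou_thresh)[0]

        # inds indexes order[1:], because order[0] has already gone into keep,
        # so shift by +1 to index order itself
        order = order[inds + 1]

    return keep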
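The offset trick mentioned in the comments can be sketched as follows, assuming normalized coordinates in the range 0-1. The helper names `nms_plain` and `nms_with_class_offset` and the default `offset=2.0` are hypothetical choices for this illustration; `nms_plain` is a compact greedy NMS without the +1 pixel convention, included only so the block runs on its own.

```python
import numpy as np

def nms_plain(boxes, scores, iou_thresh):
    """Greedy NMS on (N, 4) boxes; returns kept indices, highest score first."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        # width and height of the intersection with every remaining box
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_thresh]
    return keep

def nms_with_class_offset(boxes, scores, classes, iou_thresh, offset=2.0):
    """Class-aware NMS in one pass: shift every box by class_index * offset.

    offset must exceed the coordinate range (here: coordinates in 0-1),
    so that boxes of different classes can never overlap.
    """
    shifted = boxes + classes[:, None] * offset
    return nms_plain(shifted, scores, iou_thresh)

# Two heavily overlapping boxes of class 0 and an identical box of class 1
boxes = np.array([[0.10, 0.10, 0.50, 0.50],
                  [0.12, 0.12, 0.52, 0.52],
                  [0.10, 0.10, 0.50, 0.50]])
scores = np.array([0.9, 0.8, 0.7])
classes = np.array([0, 0, 1])

agnostic = nms_plain(boxes, scores, 0.5)
per_class = nms_with_class_offset(boxes, scores, classes, 0.5)
print(agnostic, per_class)
```

Without the offset, the class-1 box is suppressed by the identical class-0 box; with the offset it survives, because the shifted boxes no longer intersect.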

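III. k-means anchors

1. k-means clustering steps

(1) Initialize K cluster centers;
(2) Using a similarity measure (usually Euclidean distance), assign each sample to the nearest cluster center;
(3) Compute the mean of all samples in each cluster and use it as the new cluster center;
(4) Repeat steps (2) and (3) until the cluster centers stop changing or the maximum number of iterations is reached.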
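The four steps above can be sketched as a plain Euclidean k-means in NumPy. The function name `kmeans_euclidean` and the parameters `max_iter` and `seed` are assumptions for this illustration; it is a minimal version, not a replacement for a library implementation.

```python
import numpy as np

def kmeans_euclidean(points, k, max_iter=100, seed=0):
    """Plain k-means on an (N, D) array; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # (1) initialize: pick k distinct samples as the starting centers
    centers = points[rng.choice(len(points), k, replace=False)].astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(max_iter):
        # (2) assign every sample to its nearest center (Euclidean distance)
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # (3) update every center to the mean of its samples
        #     (an empty cluster keeps its previous center)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # (4) stop once the centers no longer move
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Two well-separated groups of points
points = np.array([[0, 0], [0, 1], [1, 0],
                   [10, 10], [10, 11], [11, 10]], dtype=float)
centers, labels = kmeans_euclidean(points, k=2)
print(centers, labels)
```

The anchor version in the rest of this section follows the same loop; only the distance (1 - IoU instead of Euclidean) and the update rule differ.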

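'''
Generate anchors with k-means
'''
import glob
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
import numpy as np

'''
Steps:
1. Normalize the width and height of each bbox by the width and height of its image;
2. Run k-means:
    (1) Initialize K cluster centers; (k is usually 9: in yolov5 each of the three feature maps has three anchors of different scales, and the initial anchor wh values are usually derived from public datasets such as coco or voc)
    (2) Using a similarity measure, assign each sample to the nearest cluster center; (1 - iou is the usual distance here, while yolov5 uses the width ratio and height ratio between gt boxes and anchors, consistent with the anchor-matching criterion yolov5 applies when building training targets.
         Computing the distance between the N bboxes and the 9 anchors gives an (N, 9) distance matrix)
    (3) Update each cluster center from the samples in that cluster; (the smallest distance in each row tells which cluster the bbox belongs to; each center is then recomputed from its members, with the mean in standard k-means and with the median by default in the code below)
    (4) Repeat (2) and (3) until the cluster centers stop changing or the maximum number of iterations is reached.
'''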


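# 1. Normalize the width and height of every bbox in the dataset
def load_dataset(path):
    '''
    Normalize the top-left and bottom-right corners of each bbox by the width
    and height of its image, then take xmax - xmin and ymax - ymin as the
    normalized width and height.
    @param path: directory containing the Pascal VOC style XML annotation files
    @return: numpy array of shape (N, 2), one normalized [width, height] per bbox
    '''
    dataset = []
    for xml_file in glob.glob("{}/*.xml".format(path)):
        tree = ET.parse(xml_file)

        # Image size, used to normalize the box coordinates
        height = int(tree.findtext("./size/height"))
        width = int(tree.findtext("./size/width"))

        for obj in tree.iter("object"):
            xmin = int(obj.findtext("bndbox/xmin")) / width
            ymin = int(obj.findtext("bndbox/ymin")) / height
            xmax = int(obj.findtext("bndbox/xmax")) / width
            ymax = int(obj.findtext("bndbox/ymax")) / height

            dataset.append([xmax - xmin, ymax - ymin])

    return np.array(dataset)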

# 2. k-means clustering
def kmeans(boxes, k, dist=np.median):
    """
    Calculates k-means clustering with the Intersection over Union (IoU) metric.
    :param boxes: numpy array of shape (r, 2), where r is the number of rows
    :param k: number of clusters
    :param dist: function that recomputes a cluster center from its members (median by default)
    :return: numpy array of shape (k, 2)
    """
    rows = boxes.shape[0]
    distances = np.empty((rows, k))

    last_clusters = np.zeros((rows,))
    np.random.seed()
    # the Forgy method will fail if the whole array contains the same rows
    # Initialize the k cluster centers (k rows picked at random from the dataset)
    clusters = boxes[np.random.choice(rows, k, replace=False)]
    while True:
        for row in range(rows):
            # Distance metric: d(box, centroid) = 1 - IOU(box, centroid).
            # A smaller distance to the center is better while a larger IOU is
            # better, so 1 - IOU makes a small distance mean a large IOU.
            # Steps 2-(1) and 2-(2): fill the (rows, k) distance matrix row by row
            distances[row] = 1 - iou(boxes[row], clusters)
            # print(distances)
        # Assign every box to the cluster center at the smallest "distance"
        nearest_clusters = np.argmin(distances, axis=1)
        # Stop once the assignments no longer change (the centers then stay fixed)
        if (last_clusters == nearest_clusters).all():
            break
        # Step 2-(3): update the cluster centers (anchors); with the default
        # dist, the median of each cluster becomes its new center
        for cluster in range(k):
            # Select the boxes assigned to this cluster
            members = boxes[nearest_clusters == cluster]
            # Skip an empty cluster, which would otherwise yield a NaN center
            if len(members) > 0:
                clusters[cluster] = dist(members, axis=0)
        last_clusters = nearest_clusters
    return clusters


# 2-(1), 2-(2): distance metric
def iou(box, clusters):
    '''
    Compute the IoU between the current box and each of the 9 anchors
    @param box: box.shape=[2], [width, height]
    @param clusters: clusters.shape=[9, 2], w and h of the 9 anchors
    @return: iou_.shape = [9], one IoU value per anchor
    '''
    # box      : [width, height] of a single bbox
    # clusters : the 9 current cluster centers, each [width, height]

    # Only wh is needed: the box and the anchors are assumed to share the same
    # top-left corner, so the intersection is min(w) * min(h)
    x = np.minimum(clusters[:, 0], box[0])
    y = np.minimum(clusters[:, 1], box[1])
    if np.count_nonzero(x == 0) > 0 or np.count_nonzero(y == 0) > 0:
        raise ValueError("Box has no area")
    intersection = x * y

    # Area of the box
    box_area = box[0] * box[1]
    # Area of every anchor
    cluster_area = clusters[:, 0] * clusters[:, 1]

    # IoU between the current box and the 9 anchors
    iou_ = intersection / (box_area + cluster_area - intersection)

    return iou_


# Evaluation
def avg_iou(boxes, clusters):
    '''
    Evaluate how well the anchors generated by k-means overlap the bboxes in the dataset
    @param boxes:    boxes.shape = [N, 2], w and h of the N bboxes
    @param clusters: clusters.shape = [9, 2], w and h of the 9 anchors
    @return: mean over all bboxes of the IoU with the best-matching anchor
    '''
    return np.mean([np.max(iou(boxes[i], clusters)) for i in range(boxes.shape[0])])
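To try the whole procedure without an annotation directory, the sketch below runs a vectorized variant of the same idea on synthetic box sizes. The names `wh_iou` and `kmeans_anchors`, the fixed seeds, the `max_iter` cap and the uniformly random boxes are all assumptions made for this illustration; real anchors should be computed from the normalized boxes returned by `load_dataset`.

```python
import numpy as np

def wh_iou(boxes, clusters):
    """IoU between (N, 2) box sizes and (k, 2) anchor sizes -> (N, k).

    As in iou() above, boxes and anchors are assumed to share a corner.
    """
    inter = (np.minimum(boxes[:, None, 0], clusters[None, :, 0]) *
             np.minimum(boxes[:, None, 1], clusters[None, :, 1]))
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_c = (clusters[:, 0] * clusters[:, 1])[None, :]
    return inter / (area_b + area_c - inter)

def kmeans_anchors(boxes, k, seed=0, max_iter=300):
    """k-means with the 1 - IoU distance and a median update."""
    rng = np.random.default_rng(seed)
    clusters = boxes[rng.choice(len(boxes), k, replace=False)].copy()
    last = None
    for _ in range(max_iter):
        nearest = np.argmin(1.0 - wh_iou(boxes, clusters), axis=1)
        # stop when the assignments no longer change
        if last is not None and np.array_equal(nearest, last):
            break
        for j in range(k):
            if np.any(nearest == j):  # leave an empty cluster unchanged
                clusters[j] = np.median(boxes[nearest == j], axis=0)
        last = nearest
    return clusters

# Synthetic normalized [width, height] pairs standing in for a real dataset
rng = np.random.default_rng(42)
boxes = rng.uniform(0.05, 0.9, size=(500, 2))

anchors = kmeans_anchors(boxes, k=9)
anchors = anchors[np.argsort(anchors.prod(axis=1))]  # sort by area, small to large
mean_iou = wh_iou(boxes, anchors).max(axis=1).mean()
print("anchors (normalized wh):\n", anchors)
print("average best IoU:", mean_iou)
```

Multiplying the normalized anchors by the network input size gives anchor sizes in pixels. The average best IoU plays the same role as `avg_iou` and can be used to compare different values of k.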