Anchor box generation in YOLOv3

Latest recommended article published on 2024-08-03 01:09:57

萌萌萌虎


Tags: deep learning

Article link: https://blog.csdn.net/m0_56171249/article/details/118219437


Table of contents

Preface
I. K-means clustering of anchor boxes
II. Generating the anchor boxes
- Code
Summary

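Preface

I have been studying the YOLOv3 series for a while, and there are already many explanations of YOLOv3 online. Here I describe my own understanding of how the anchor boxes (prior boxes) used for prediction are obtained. If anything is wrong, corrections are welcome.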


I. K-means anchor boxes

1. Introduction to the K-means algorithm

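Clustering is the process of organizing the members of a data set into groups whose members are similar in some respect. It is a technique for discovering this underlying structure, and it is usually described as unsupervised learning.

K-means is the best-known partitional clustering algorithm, and its simplicity and efficiency have made it the most widely used of all clustering algorithms. Given a set of data points and a number of clusters k chosen by the user, the algorithm repeatedly assigns the points to k clusters according to a distance function.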
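To make the description above concrete, here is a minimal sketch of conventional K-means with the Euclidean distance. The function name `euclidean_kmeans`, its parameters and the toy points are my own illustration and are not part of YOLOv3.

```python
import numpy as np


def euclidean_kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm) with the Euclidean distance.

    points: array of shape (n, d); k: number of clusters chosen by the user.
    Returns (centers, labels).
    """
    rng = np.random.default_rng(seed)
    # Pick k distinct data points as the initial cluster centers.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = np.full(len(points), -1)

    for _ in range(n_iter):
        # Distance from every point to every center, shape (n, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_labels = np.argmin(dists, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the centers are stable too
        labels = new_labels
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers, labels


# Two well-separated groups of 2-D points (toy data).
points = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0],
                   [100.0, 100.0], [101.0, 101.0], [102.0, 102.0]])
centers, labels = euclidean_kmeans(points, k=2)
print(centers)
print(labels)
```

The anchor clustering described next keeps this loop structure but swaps the Euclidean distance for an IoU-based one and the mean for the median.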

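The anchor boxes in YOLOv3 are obtained by K-means clustering. YOLOv3 predicts on three feature layers, and each layer uses three anchor boxes of different sizes, so nine anchor boxes are clustered in total. The steps of clustering the anchor boxes with K-means are as follows: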
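As a small illustration of the 3 x 3 = 9 arithmetic, the sketch below splits nine anchors into three groups, one per feature layer. It assumes the default YOLOv3 anchors for a 416x416 input and the usual convention that the largest anchors go to the coarsest (13x13) grid; the variable names are mine.

```python
import numpy as np

# The nine default YOLOv3 anchors as (width, height) in pixels for a 416x416 input.
anchors = np.array([[10, 13], [16, 30], [33, 23],
                    [30, 61], [62, 45], [59, 119],
                    [116, 90], [156, 198], [373, 326]])

# Sort by area so the grouping does not depend on the input order.
anchors = anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]

# Three consecutive anchors per feature layer: small, medium, large.
small, medium, large = np.split(anchors, 3)

# For a 416x416 input, strides 8, 16 and 32 give 52x52, 26x26 and 13x13 grids.
layers = {52: small, 26: medium, 13: large}
for grid, group in layers.items():
    print("{0}x{0} grid:".format(grid), group.tolist())
```

Small anchors end up on the fine 52x52 grid, which is responsible for small objects, and large anchors on the 13x13 grid.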

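1. Randomly pick 9 of the ground-truth boxes as the initial cluster centers, i.e. the initial anchor boxes.

2. Compute the intersection over union (IoU) between every ground-truth box and each of the 9 anchor boxes initialized in step 1. YOLOv3 uses the IoU to measure the clustering distance, whereas conventional K-means uses the Euclidean distance:

$D = 1 - IOU$

3. The previous step gives the distance D between every ground-truth box and every anchor box. Each ground-truth box is then assigned to the anchor box for which D is smallest, and is said to belong to that anchor box.

4. Each anchor box now has a set of ground-truth boxes that belong to it. Sort the widths and the heights of these boxes and take the median values as the new size of the anchor box. Update every anchor box in this way.

5. Repeat steps 2, 3 and 4 until the anchor box sizes no longer change. Some authors choose the initial cluster centers differently and propose further optimizations on that basis.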
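One pass through steps 2 to 4 can be worked out by hand on toy data. The sketch below assumes that boxes are compared by width and height only (as if they shared a top-left corner) and that step 3 assigns each box to its nearest anchor. The helper name `wh_iou` and the numbers are made up for illustration.

```python
import numpy as np


def wh_iou(box, anchors):
    """IoU between one (w, h) box and an array of (w, h) anchors,
    with all boxes aligned at a shared top-left corner."""
    inter = np.minimum(box[0], anchors[:, 0]) * np.minimum(box[1], anchors[:, 1])
    union = box[0] * box[1] + anchors[:, 0] * anchors[:, 1] - inter
    return inter / union


# Toy data: five ground-truth boxes and two current anchors, as (w, h).
boxes = np.array([[2.0, 2.0], [2.0, 4.0], [4.0, 2.0], [8.0, 8.0], [8.0, 10.0]])
anchors = np.array([[2.0, 2.0], [8.0, 8.0]])

# Step 2: distance D = 1 - IoU from every box to every anchor, shape (5, 2).
D = np.array([1.0 - wh_iou(b, anchors) for b in boxes])

# Step 3: each box joins the anchor with the smallest distance.
nearest = np.argmin(D, axis=1)

# Step 4: the new anchor is the per-dimension median of its boxes.
new_anchors = np.array([np.median(boxes[nearest == j], axis=0)
                        for j in range(len(anchors))])
print(nearest)      # [0 0 0 1 1]
print(new_anchors)  # rows (2, 2) and (8, 9)
```

For example, the box (2, 4) overlaps the anchor (2, 2) with an intersection of 4 and a union of 8, so IoU = 0.5 and D = 0.5. Against the anchor (8, 8) it gets IoU = 8 / 64 = 0.125 and D = 0.875, so it joins the first anchor.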

The figure below shows the clustering process of the conventional K-means algorithm.

II. Updating the anchor boxes with K-means

Code

The code is as follows (example):

import glob
import xml.etree.ElementTree as ET
import numpy as np

ANNOTATIONS_PATH = "path/to/Annotations"  # directory that holds the XML annotation files
CLUSTERS = 9

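def iou(box, clusters):
    """
    Calculates the Intersection over Union (IoU) between a box and k clusters.
    :param box: tuple or array, shifted to the origin (i.e. width and height)
    :param clusters: numpy array of shape (k, 2) where k is the number of clusters
    :return: numpy array of shape (k,) where entry i is the IoU between the box and cluster i
    """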
    x = np.minimum(clusters[:, 0], box[0])
    y = np.minimum(clusters[:, 1], box[1])
    if np.count_nonzero(x == 0) > 0 or np.count_nonzero(y == 0) > 0:
        raise ValueError("Box has no area")

    intersection = x * y
    box_area = box[0] * box[1]
    cluster_area = clusters[:, 0] * clusters[:, 1]

    iou_ = intersection / (box_area + cluster_area - intersection)

    return iou_


def avg_iou(boxes, clusters):
    """
    Calculates the average Intersection over Union (IoU) between a numpy array of boxes and k clusters.
    :param boxes: numpy array of shape (r, 2), where r is the number of rows
    :param clusters: numpy array of shape (k, 2) where k is the number of clusters
    :return: average IoU as a single float
    """
    return np.mean([np.max(iou(boxes[i], clusters)) for i in range(boxes.shape[0])])




def translate_boxes(boxes):
    """
    Translates all the boxes to the origin.
    :param boxes: numpy array of shape (r, 4)
    :return: numpy array of shape (r, 2)
    """
    new_boxes = boxes.copy()
    for row in range(new_boxes.shape[0]):
        new_boxes[row][2] = np.abs(new_boxes[row][2] - new_boxes[row][0])
        new_boxes[row][3] = np.abs(new_boxes[row][3] - new_boxes[row][1])
    return np.delete(new_boxes, [0, 1], axis=1)


def kmeans(boxes, k, dist=np.median):
    """
    Calculates k-means clustering with the Intersection over Union (IoU) metric.
    :param boxes: numpy array of shape (r, 2), where r is the number of rows
    :param k: number of clusters
    :param dist: function used to update each cluster center from its boxes (median by default)
    :return: numpy array of shape (k, 2)
    """
    rows = boxes.shape[0]

    distances = np.empty((rows, k))
    last_clusters = np.zeros((rows,))

    np.random.seed()


    clusters = boxes[np.random.choice(rows, k, replace=False)]
   

    while True:
        for row in range(rows):
            distances[row] = 1 - iou(boxes[row], clusters)
            # Distance metric: d(box, centroid) = 1 - IOU(box, centroid)

        nearest_clusters = np.argmin(distances, axis=1)

        # Stop once the cluster assignments no longer change
        if (last_clusters == nearest_clusters).all():
            break

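        # Update the cluster centers (the median of the boxes in each cluster becomes the new center)
        for cluster in range(k):
            members = boxes[nearest_clusters == cluster]
            if len(members) > 0:  # keep the previous center if no box was assigned
                clusters[cluster] = dist(members, axis=0)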

        last_clusters = nearest_clusters

    return clusters


def load_dataset(path):
    dataset = []
    for xml_file in glob.glob("{}/*xml".format(path)):
        tree = ET.parse(xml_file)

        height = int(tree.findtext("./size/height"))
        width = int(tree.findtext("./size/width"))

        for obj in tree.iter("object"):
            xmin = int(obj.findtext("bndbox/xmin")) / width
            ymin = int(obj.findtext("bndbox/ymin")) / height
            xmax = int(obj.findtext("bndbox/xmax")) / width
            ymax = int(obj.findtext("bndbox/ymax")) / height

            xmin = np.float64(xmin)
            ymin = np.float64(ymin)
            xmax = np.float64(xmax)
            ymax = np.float64(ymax)
            if xmax == xmin or ymax == ymin:
                # A zero-area box would make iou() raise, so report the file and skip the box
                print(xml_file)
                continue
            dataset.append([xmax - xmin, ymax - ymin])
    return np.array(dataset)

#--------------------------------------------------------------------------------#
# The resulting anchors can replace the contents of yolo_anchors.txt
#--------------------------------------------------------------------------------#

if __name__ == '__main__':
    # print(__file__)
    data = load_dataset(ANNOTATIONS_PATH)
    out = kmeans(data, k=CLUSTERS)
    print(out)
    print("Accuracy: {:.2f}%".format(avg_iou(data, out) * 100))
    print("Boxes:\n {}-{}".format(out[:, 0] * 416, out[:, 1] * 416))
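Since the result is meant to replace yolo_anchors.txt, a small helper can turn the normalized clusters into a line of text. This is only a sketch: the helper name `format_anchors` is hypothetical, and the single-line `w,h, w,h, ...` layout sorted by area is an assumption about the file format, so check it against the anchors file shipped with your own YOLOv3 implementation.

```python
import numpy as np


def format_anchors(clusters, input_size=416):
    """Scale normalized (w, h) clusters to pixels, sort them by area,
    and join them as 'w,h, w,h, ...'. The layout is an assumption."""
    pixels = np.round(np.asarray(clusters) * input_size).astype(int)
    pixels = pixels[np.argsort(pixels[:, 0] * pixels[:, 1])]
    return ", ".join("{},{}".format(w, h) for w, h in pixels)


# Hypothetical clusters standing in for the array returned by kmeans().
example = np.array([[0.25, 0.5], [0.05, 0.1], [0.75, 0.5]])
line = format_anchors(example)
print(line)  # 21,42, 104,208, 312,208

# Uncomment to overwrite the anchors file (the path is an assumption):
# with open("yolo_anchors.txt", "w") as f:
#     f.write(line)
```

Sorting by area keeps the anchors in small-to-large order, which is the order in which they are usually split across the three feature layers.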
