Generating anchor box sizes for YOLOv3 with k-means clustering (using the VOC dataset as an example)

import matplotlib.pyplot as plt
import numpy as np
import os, cv2
%matplotlib inline

LABELS = ['aeroplane',  'bicycle', 'bird',  'boat',      'bottle', 
          'bus',        'car',      'cat',  'chair',     'cow',
          'diningtable','dog',    'horse',  'motorbike', 'person',
          'pottedplant','sheep',  'sofa',   'train',   'tvmonitor']

Download the VOC dataset (not shown here)

train_image_folder = "../data/VOC/train/VOCdevkit/VOC2012/JPEGImages/"
train_annot_folder = "../data/VOC/train/VOCdevkit/VOC2012/Annotations/"
import xml.etree.ElementTree as ET

def parse_annotation(ann_dir, img_dir, labels=()):
    '''
    output:
    - all_imgs: a list in which each element is a dictionary containing the annotation information of one image.
    - seen_labels: a dictionary with
            (key, value) = (the object class, the number of objects found in the images)
    '''
    all_imgs = []
    seen_labels = {}
    
    for ann in sorted(os.listdir(ann_dir)):
        if "xml" not in ann:
            continue
        img = {'object':[]}

        tree = ET.parse(ann_dir + ann)
        
        for elem in tree.iter():
            if 'filename' in elem.tag:
                path_to_image = img_dir + elem.text
                img['filename'] = path_to_image
                ## make sure that the image exists:
                if not os.path.exists(path_to_image):
                    assert False, "file does not exist!\n{}".format(path_to_image)
            if 'width' in elem.tag:
                img['width'] = int(elem.text)
            if 'height' in elem.tag:
                img['height'] = int(elem.text)
            if 'object' in elem.tag or 'part' in elem.tag:
                obj = {}
                
                for attr in list(elem):
                    if 'name' in attr.tag:
                        
                        obj['name'] = attr.text
                        
                        if len(labels) > 0 and obj['name'] not in labels:
                            break
                        else:
                            img['object'] += [obj]
                            
                        

                        if obj['name'] in seen_labels:
                            seen_labels[obj['name']] += 1
                        else:
                            seen_labels[obj['name']]  = 1
                        

                            
                    if 'bndbox' in attr.tag:
                        for dim in list(attr):
                            if 'xmin' in dim.tag:
                                obj['xmin'] = int(round(float(dim.text)))
                            if 'ymin' in dim.tag:
                                obj['ymin'] = int(round(float(dim.text)))
                            if 'xmax' in dim.tag:
                                obj['xmax'] = int(round(float(dim.text)))
                            if 'ymax' in dim.tag:
                                obj['ymax'] = int(round(float(dim.text)))

        if len(img['object']) > 0:
            all_imgs += [img]
                        
    return all_imgs, seen_labels

## Parse annotations 
train_image, seen_train_labels = parse_annotation(train_annot_folder,train_image_folder, labels=LABELS)
print("N train = {}".format(len(train_image)))

Output : train_image

  • train_image is a list of dictionaries, one per image, each holding the image path, the image size, and the annotated objects

 

# show the first two terms of the dictionary
train_image[:2]

Visualize output : seen_train_labels

  • The VOC dataset has 20 classes. The plot below shows how many objects were found for each class:
y_pos = np.arange(len(seen_train_labels))
fig = plt.figure(figsize=(13,10))
ax = fig.add_subplot(1,1,1)
ax.barh(y_pos,list(seen_train_labels.values()))
ax.set_yticks(y_pos)
ax.set_yticklabels(list(seen_train_labels.keys()))
ax.set_title("The total number of objects = {} in {} images".format(
    np.sum(list(seen_train_labels.values())),len(train_image)
))
plt.show()

 

K-means clustering

The paper YOLO9000: Better, Faster, Stronger strongly recommends using cluster analysis to obtain the sizes of the anchor priors. In the authors' words:

Dimension Clusters: we encounter two issues with anchor boxes when using them with YOLO. The first is that the box dimensions are hand picked. The network can learn to adjust the boxes appropriately but if we pick better priors for the network to start with, we can make it easier for the network to learn to predict good detections.
Instead of choosing priors by hand, we run k-means clustering on the training set bounding boxes to automatically find good priors. If we use standard k-means with Euclidean distance larger boxes generate more error than smaller boxes. However, what we really want are priors that lead to good IOU scores, which is independent of the size of the box. Thus for our distance metric we use 1 - IOU(box, centroid).
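
To make the quoted argument concrete, here is a minimal sketch comparing the two distance measures. The helper `iou_wh` and the box sizes are illustrative assumptions, not taken from the paper; boxes are treated as sharing one corner, so only width and height matter.

```python
import numpy as np

def iou_wh(a, b):
    """IoU of two boxes given as (w, h), assuming they share the same top-left corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union

# Illustrative values: each centroid is 10% smaller than its box in both dimensions.
small_box, small_centroid = np.array([0.10, 0.10]), np.array([0.09, 0.09])
large_box, large_centroid = np.array([0.80, 0.80]), np.array([0.72, 0.72])

for name, box, centroid in [("small", small_box, small_centroid),
                            ("large", large_box, large_centroid)]:
    euclid = np.linalg.norm(box - centroid)   # grows with the box size
    iou_dist = 1 - iou_wh(box, centroid)      # depends only on the relative mismatch
    print("{}: Euclidean = {:.4f}, 1 - IoU = {:.4f}".format(name, euclid, iou_dist))
```

Both pairs have the same relative mismatch, so 1 - IoU is identical for them, while the Euclidean distance of the large pair is eight times that of the small pair.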

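So let us first prepare the input data for k-means clustering. The features are the width and height of each ground-truth bounding box. Because the images come in different sizes, the bounding boxes are on different scales, so each box's width and height must be normalized by the width and height of its image.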

 

wh = []
for anno in train_image:
    aw = float(anno['width'])  # width of the original image
    ah = float(anno['height']) # height of the original image
    for obj in anno["object"]:
        w = (obj["xmax"] - obj["xmin"])/aw # normalize the box width by the image width, giving a value in [0, 1]
        h = (obj["ymax"] - obj["ymin"])/ah # normalize the box height by the image height, giving a value in [0, 1]
        temp = [w,h]
        wh.append(temp)
wh = np.array(wh)
print("clustering feature data is ready. shape = (N object, width and height) =  {}".format(wh.shape))

Visualize the clustering data

First, look at how the normalized bounding box sizes are distributed:

 

plt.figure(figsize=(10,10))
plt.scatter(wh[:,0],wh[:,1],alpha=0.3)
plt.title("Clusters",fontsize=20)
plt.xlabel("normalized width",fontsize=20)
plt.ylabel("normalized height",fontsize=20)
plt.show()

Intersection over union

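Before clustering the box priors with k-means, it is worth discussing intersection over union (IoU), because it is used later to measure the distance between two bounding boxes. IoU is a standard measure of how accurately objects are localized in a given dataset, and it appears in many object detection challenges such as the PASCAL VOC challenge. In general, computing the IoU of two bounding boxes needs only their four position parameters (xmin, ymin, width, height). For clustering, the function below compares shapes only: it uses just the width and height, which amounts to aligning the two boxes at the same corner.

[Figure: illustration of IoU, borrowed from another source]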
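
For reference, the general case with all four position parameters can be sketched as follows. The function name `box_iou` and the sample boxes are hypothetical, chosen only for illustration.

```python
def box_iou(a, b):
    """IoU of two boxes, each given as (xmin, ymin, width, height)."""
    ax_min, ay_min, aw, ah = a
    bx_min, by_min, bw, bh = b
    # Width and height of the overlapping rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax_min + aw, bx_min + bw) - max(ax_min, bx_min))
    inter_h = max(0.0, min(ay_min + ah, by_min + bh) - max(ay_min, by_min))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0

print(box_iou((0, 0, 2, 2), (1, 1, 2, 2)))  # 1x1 overlap, union 4 + 4 - 1 = 7
print(box_iou((0, 0, 1, 1), (2, 2, 1, 1)))  # disjoint boxes
```

When both boxes start at the same (xmin, ymin), the overlap reduces to min(widths) * min(heights), which is the shortcut used for clustering.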

def iou(box, clusters):
    '''
    :param box:      np.array of shape (2,) containing w and h
    :param clusters: np.array of shape (N cluster, 2) 
    '''
    x = np.minimum(clusters[:, 0], box[0]) 
    y = np.minimum(clusters[:, 1], box[1])

    intersection = x * y
    box_area = box[0] * box[1]
    cluster_area = clusters[:, 0] * clusters[:, 1]

    iou_ = intersection / (box_area + cluster_area - intersection)

    return iou_
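
A small worked example may help check the arithmetic. The snippet repeats the same computation under the assumed name `iou_wh` so that it runs on its own; the box and cluster values are made up for illustration.

```python
import numpy as np

def iou_wh(box, clusters):
    """Same computation as `iou` above, repeated so this snippet is self-contained."""
    inter = np.minimum(clusters[:, 0], box[0]) * np.minimum(clusters[:, 1], box[1])
    return inter / (box[0] * box[1] + clusters[:, 0] * clusters[:, 1] - inter)

box = np.array([0.2, 0.4])
clusters = np.array([[0.2, 0.4],   # identical shape
                     [0.4, 0.2],   # same area, transposed
                     [0.1, 0.1]])  # fits entirely inside the box
print(iou_wh(box, clusters))
```

The three values are 1 for the identical shape, 0.04 / (0.08 + 0.08 - 0.04) = 1/3 for the transposed one, and 0.01 / 0.08 = 0.125 for the contained one.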

 

The k-means clustering

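The k-means procedure is simple. First choose the number of clusters and initialize the cluster centers. The algorithm then alternates between two steps:

  • Step 1: compute the distance (1 - IoU) between each bounding box and every cluster center, and assign the box to the nearest center
  • Step 2: use the mean of each cluster as that cluster's center for the next iteration

Repeat steps 1 and 2 until the cluster centers stop changing.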
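
One pass through the two steps can be sketched on toy data. Everything here (the six boxes, the hand-picked initial centers, the helper `iou_wh`) is a hypothetical illustration rather than part of the pipeline.

```python
import numpy as np

def iou_wh(box, boxes):
    """IoU between one (w, h) box and an array of (w, h) boxes sharing a corner."""
    inter = np.minimum(boxes[:, 0], box[0]) * np.minimum(boxes[:, 1], box[1])
    return inter / (box[0] * box[1] + boxes[:, 0] * boxes[:, 1] - inter)

# Hypothetical toy data: three small boxes and three large ones (normalized w, h).
boxes = np.array([[0.10, 0.10], [0.12, 0.10], [0.10, 0.14],
                  [0.60, 0.50], [0.70, 0.50], [0.60, 0.62]])
centers = boxes[[0, 3]].copy()  # initial centers picked by hand for the illustration

# Step 1: distance of every box to every center, then the nearest center per box.
distances = np.stack([1 - iou_wh(c, boxes) for c in centers], axis=1)
assignment = np.argmin(distances, axis=1)

# Step 2: move each center to the mean of the boxes assigned to it.
new_centers = np.array([boxes[assignment == j].mean(axis=0)
                        for j in range(len(centers))])
print(assignment)
print(new_centers)
```

The small boxes end up with the first center and the large ones with the second; each center then moves to the mean width and height of its group.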

 

def kmeans(boxes, k, dist=np.median,seed=1):
    """
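    Calculates k-means clustering with the Intersection over Union (IoU) metric.
    :param boxes: numpy array of shape (r, 2), where r is the number of rows
    :param k: number of clusters
    :param dist: function that aggregates the boxes of a cluster into its center (e.g. np.median or np.mean)
    :param seed: seed for the random choice of the initial cluster centers
    :return: clusters of shape (k, 2), nearest_clusters of shape (r,), distances of shape (r, k)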
    """
    rows = boxes.shape[0]

    distances     = np.empty((rows, k)) ## N row x N cluster
    last_clusters = np.zeros((rows,))

    np.random.seed(seed)

    # initialize the cluster centers to be k items
    clusters = boxes[np.random.choice(rows, k, replace=False)]

    while True:
        # Step 1: allocate each item to the closest cluster centers
        for icluster in range(k): # I made change to lars76's code here to make the code faster
            distances[:,icluster] = 1 - iou(clusters[icluster], boxes)

        nearest_clusters = np.argmin(distances, axis=1)

        if (last_clusters == nearest_clusters).all():
            break
            
        # Step 2: calculate the cluster centers as mean (or median) of all the cases in the clusters.
        for cluster in range(k):
            clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0)

        last_clusters = nearest_clusters

    return clusters,nearest_clusters,distances

The number of Clusters

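In general, the more anchor clusters there are, the better YOLO can regress to ground-truth boxes across different scales, but the amount of computation also grows. (That is awkward for a framework that advertises real-time object detection, so the authors keep the number of bounding boxes as small as they can.)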
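
As a rough illustration of that cost, the number of predicted boxes scales linearly with the number of anchors per grid cell. The sketch below assumes a 416x416 input, which gives a single 13x13 output grid in YOLOv2 and three grids (13, 26, 52) with three anchors each in YOLOv3; the function name is hypothetical.

```python
def num_predicted_boxes(grid_sizes, anchors_per_cell):
    """Total boxes predicted when every cell of every square grid gets `anchors_per_cell` anchors."""
    return sum(g * g for g in grid_sizes) * anchors_per_cell

# Assumed YOLOv2-style setup: one 13x13 grid, varying the number of anchors k.
for k in (3, 5, 9):
    print("k = {}: {} boxes".format(k, num_predicted_boxes([13], k)))

# Assumed YOLOv3-style setup: three grids, three anchors per cell on each.
print("three scales:", num_predicted_boxes([13, 26, 52], 3))
```

With k = 5 on a 13x13 grid this gives 845 boxes, and the three-scale setup gives 10647.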

 

kmax = 10
dist = np.mean
results = {}

for k in range(2,kmax):
    clusters, nearest_clusters, distances = kmeans(wh,k,seed=2,dist=dist)
    WithinClusterMeanDist = np.mean(distances[np.arange(distances.shape[0]),nearest_clusters])
    result = {"clusters":             clusters,
              "nearest_clusters":     nearest_clusters,
              "distances":            distances,
              "WithinClusterMeanDist": WithinClusterMeanDist}
    print("{:2.0f} clusters: mean IoU = {:5.4f}".format(k,1-result["WithinClusterMeanDist"]))
    results[k] = result

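The more clusters there are, the larger the mean IoU within each cluster, which means the bounding boxes in a cluster lie closer to their center. Deciding on the number of clusters is often hard, and this is a well-known pain point of k-means. The YOLOv2 paper uses 5 anchor priors, so let us look at the results for 5 to 8 clusters.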

 

Visualization of k-means results

def plot_cluster_result(plt,clusters,nearest_clusters,WithinClusterSumDist,wh,k):
    for icluster in np.unique(nearest_clusters):
        pick = nearest_clusters==icluster
        c = current_palette[icluster]
        plt.rc('font', size=8) 
        plt.plot(wh[pick,0],wh[pick,1],"p",
                 color=c,
                 alpha=0.5,label="cluster = {}, N = {:6.0f}".format(icluster,np.sum(pick)))
        plt.text(clusters[icluster,0],
                 clusters[icluster,1],
                 "c{}".format(icluster),
                 fontsize=20,color="red")
        plt.title("Clusters=%d" %k)
        plt.xlabel("width")
        plt.ylabel("height")
    plt.legend(title="Mean IoU = {:5.4f}".format(WithinClusterSumDist))  
    
import seaborn as sns
current_palette = list(sns.xkcd_rgb.values())

figsize = (15,35)
count =1 
fig = plt.figure(figsize=figsize)
for k in range(5,9):
    result               = results[k]
    clusters             = result["clusters"]
    nearest_clusters     = result["nearest_clusters"]
    WithinClusterSumDist = result["WithinClusterMeanDist"]
    
    ax = fig.add_subplot(kmax//2,2,count)  # integer division: subplot grid sizes must be integers
    plot_cluster_result(plt,clusters,nearest_clusters,1 - WithinClusterSumDist,wh,k)
    count += 1
plt.show()
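
The cluster centers are still normalized to the image size, so a final step is to scale them to the network input before using them as anchors. The following is a minimal sketch of that step, assuming a 416x416 input and anchors expressed in pixels; the function name and the placeholder centers are hypothetical.

```python
import numpy as np

def to_pixel_anchors(normalized_centers, input_size=416):
    """Scale normalized (w, h) cluster centers to pixels and sort them by area."""
    anchors = np.asarray(normalized_centers, dtype=float) * input_size
    order = np.argsort(anchors[:, 0] * anchors[:, 1])  # smallest anchor first
    return np.rint(anchors[order]).astype(int)

# Placeholder centers for illustration only; in practice pass results[k]["clusters"].
example_centers = [[0.50, 0.60], [0.10, 0.20], [0.25, 0.30]]
print(to_pixel_anchors(example_centers))
```

Sorting by area makes it easy to hand the smallest anchors to the finest detection scale and the largest to the coarsest one.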
