YOLOv3 Code Walkthrough (6): Obtaining anchor boxes with kmeans.py

Latest recommended article published on 2023-12-15 16:59:10

magic_ll


Category: YOLO series

Article link: https://blog.csdn.net/magic_ll/article/details/107710224


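Preface:

Paper-reading notes for the YOLO series
Paper reading || Object detection in deep learning: YOLOv3
Paper reading || Object detection in deep learning: YOLOv2
Paper reading || Object detection in deep learning: YOLOv1

The project covered in this post is:
YOLOv3 in TensorFlow: https://github.com/YunYang1994/tensorflow-yolov3

My own blog posts analyzing this project:
YOLOv3 || 1. Training YOLOv3 on your own dataset
YOLOv3 || 2. dataset.py walkthrough
YOLOv3 || 3. Converting dataset.py to multiprocessing
YOLOv3 || 4. yolov3.py: building the network and defining the loss
YOLOv3 || 5. train.py
YOLOv3 || 6. Obtaining anchor boxes with kmeans.py
YOLOv3 || 7. Testing the YOLOv3 pb file

This post covers how the anchor boxes for YOLOv3 are obtained.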

1 How it works

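YOLOv3 needs its anchor boxes to be set before training. Different datasets call for different anchor boxes; the official ones for the COCO dataset are already provided. When training on your own dataset, you have to compute the anchors for that dataset beforehand, as follows: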

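Step 1: Collect every bounding box in the dataset
Only the anchor sizes are needed, so position and class information are discarded.
The bbox coordinates from all images are pooled together regardless of class, and each label is converted from [top-left, bottom-right] coordinates to [w, h].
Step 2: Initialize k anchor boxes
Randomly pick k of the bounding boxes as the initial anchor boxes.
Step 3: Compute the IOU between every bounding box and every anchor box
Standard clustering measures dissimilarity with Euclidean distance, and it could cluster the bounding-box widths and heights directly to produce k (w, h) anchor boxes. With that metric, however, large boxes produce larger errors than small ones.
Clustering on IOU avoids this problem. IOU is the intersection over union of two boxes, and larger is better, so [d = 1 - IOU] is used as the error, giving the deviation d between each bounding box and each of the k reference anchor boxes. Because only sizes are compared, the two boxes are treated as sharing one corner, so the intersection is simply min(w) * min(h).
Step 4: Assign every bounding box to a cluster
For each bounding box $i$, pick the anchor with the smallest error among $\{d_{i,1}, d_{i,2}, \ldots, d_{i,k}\}$. For example, if $d_{0,3}$ is the smallest error for $boundingbox_{0}$, then $boundingbox_{0}$ belongs to anchor 3.
This splits all bounding boxes into k groups.
Step 5: Update the anchor boxes
For the bounding boxes assigned to each anchor box, compute the mean of their widths and heights and use it as the new size of that anchor box. (Note that the kmeans method in the code below defaults to dist=np.median, so it takes the median rather than the mean.)
Step 6: Repeat the assign/update cycle
Repeat steps 4-5 until the anchor boxes stop changing (that is, until the assignment of bounding boxes no longer changes).
Step 7: Sort the k anchors by size and save them (the code below sorts them by width in ascending order)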
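The argument in step 3 can be checked numerically. The sketch below compares Euclidean distance with d = 1 - IOU on a small and a large box that each exceed their anchor by the same 10% in width and height. The helper names `wh_iou` and `euclidean` and all sample numbers are illustrative assumptions and do not come from the original project.

```python
import math

def wh_iou(box, anchor):
    """IoU of two (w, h) boxes that are assumed to share the same top-left corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union

def euclidean(box, anchor):
    """Plain Euclidean distance between two (w, h) pairs."""
    return math.hypot(box[0] - anchor[0], box[1] - anchor[1])

# Hypothetical sample values: each box is 10% wider and taller than its anchor.
small_box, small_anchor = (11.0, 11.0), (10.0, 10.0)
large_box, large_anchor = (110.0, 110.0), (100.0, 100.0)

d_euc_small = euclidean(small_box, small_anchor)
d_euc_large = euclidean(large_box, large_anchor)
d_iou_small = 1 - wh_iou(small_box, small_anchor)
d_iou_large = 1 - wh_iou(large_box, large_anchor)

print("Euclidean:", d_euc_small, d_euc_large)  # the large pair is about 10x farther apart
print("1 - IOU:  ", d_iou_small, d_iou_large)  # both pairs get the same distance (up to rounding)
```

With Euclidean distance the large box looks ten times worse matched than the small one, although the relative mismatch is identical. The IOU-based distance depends only on the relative overlap, so it treats both cases the same.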
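In addition, the accuracy of the anchor boxes is computed.
Compute the IOU between the final anchor boxes and every bounding box, take the highest IOU for each bounding box (which also identifies the anchor box it belongs to), and average those values over all bounding boxes. That average is the reported accuracy.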
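As a small worked example of this accuracy measure, the sketch below computes the mean best-IOU for three boxes and two anchors. It uses NumPy broadcasting as a compact way to build the (n, k) IOU matrix; the function name `iou_matrix` and the sample values are assumptions made for illustration only.

```python
import numpy as np

def iou_matrix(boxes, anchors):
    """(n, k) IoU between (w, h) boxes and anchors, all aligned at one corner."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    box_area = boxes[:, 0] * boxes[:, 1]
    anchor_area = anchors[:, 0] * anchors[:, 1]
    return inter / (box_area[:, None] + anchor_area[None, :] - inter)

# Hypothetical (w, h) values, not taken from any real dataset.
boxes = np.array([[10.0, 10.0], [20.0, 40.0], [100.0, 50.0]])
anchors = np.array([[10.0, 10.0], [100.0, 100.0]])

ious = iou_matrix(boxes, anchors)
best_iou = ious.max(axis=1)        # highest IoU per box: 1.0, 0.125, 0.5
assignment = ious.argmax(axis=1)   # anchor each box belongs to: 0, 0, 1
avg_iou = best_iou.mean()          # (1.0 + 0.125 + 0.5) / 3

print(best_iou, assignment, avg_iou)
```

The second box matches neither anchor well (best IOU 0.125), which pulls the average down. A low average is therefore a hint that k is too small or that the anchors do not fit the dataset.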

2 Code walkthrough

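Suppose one line of the annotation text file is [./data/images/2_561.png 392,92,887,776,0].
The first field is the image path; the numbers that follow are the top-left and bottom-right coordinates of a box and the class the object belongs to.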
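To make the format concrete, here is a minimal sketch that splits the sample line above into the image path and the (w, h) of each box. The function name `parse_label_line` is hypothetical, and the sketch assumes that fields are separated by whitespace and that every box has exactly five comma-separated values.

```python
def parse_label_line(line):
    """Split one annotation line into the image path and a list of (w, h) pairs."""
    parts = line.strip().split()
    image_path, boxes_wh = parts[0], []
    for item in parts[1:]:
        # Each box is "x_min,y_min,x_max,y_max,class_id"; the class is ignored here.
        x_min, y_min, x_max, y_max, _class_id = (int(float(v)) for v in item.split(","))
        boxes_wh.append((x_max - x_min, y_max - y_min))
    return image_path, boxes_wh

path, wh = parse_label_line("./data/images/2_561.png 392,92,887,776,0")
print(path, wh)  # ./data/images/2_561.png [(495, 684)]
```

Only the width (887 - 392 = 495) and the height (776 - 92 = 684) are kept, matching step 1 of the procedure.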
import cv2
import numpy as np


class YOLO_Kmeans:

   def __init__(self, cluster_number, filename):
       """Store the number of clusters and the path of the annotation file."""
       self.cluster_number = cluster_number
       self.filename = filename

   def iou(self, boxes, clusters):  # 1 box -> k clusters
       """Return the (n, k) IoU matrix between n (w, h) boxes and k clusters."""
       n = boxes.shape[0]
       k = self.cluster_number

       box_area = boxes[:, 0] * boxes[:, 1]
       box_area = box_area.repeat(k)
       box_area = np.reshape(box_area, (n, k))

       cluster_area = clusters[:, 0] * clusters[:, 1]
       cluster_area = np.tile(cluster_area, [1, n])
       cluster_area = np.reshape(cluster_area, (n, k))

       box_w_matrix = np.reshape(boxes[:, 0].repeat(k), (n, k))
       cluster_w_matrix = np.reshape(np.tile(clusters[:, 0], (1, n)), (n, k))
       min_w_matrix = np.minimum(cluster_w_matrix, box_w_matrix)

       box_h_matrix = np.reshape(boxes[:, 1].repeat(k), (n, k))
       cluster_h_matrix = np.reshape(np.tile(clusters[:, 1], (1, n)), (n, k))
       min_h_matrix = np.minimum(cluster_h_matrix, box_h_matrix)
       inter_area = np.multiply(min_w_matrix, min_h_matrix)

       result = inter_area / (box_area + cluster_area - inter_area)
       return result
       
   def kmeans(self, boxes, k, dist=np.median):
       """Cluster the (w, h) boxes into k anchors using d = 1 - IoU."""
       box_number = boxes.shape[0]
       print(boxes.shape)
       distances = np.empty((box_number, k))    # distance from every box to every cluster
       last_nearest = np.zeros((box_number,))   # nearest cluster of every box in the previous pass
       np.random.seed()

       # Randomly pick k boxes as the initial clusters
       clusters = boxes[np.random.choice(box_number, k, replace=False)]  # init k clusters

       while True:
           # (1 - IoU) between every box and the k clusters; the clusters change on every pass
           distances = 1 - self.iou(boxes, clusters)

           # Index of the nearest cluster for every box
           current_nearest = np.argmin(distances, axis=1)

           if (last_nearest == current_nearest).all():
               break  # clusters won't change

           for cluster in range(k):
               # current_nearest == cluster is a boolean mask with the same shape as
               # current_nearest; boxes[mask] selects the boxes assigned to this cluster.
               # The cluster is updated with dist (the median by default) of those boxes.
               clusters[cluster] = dist(boxes[current_nearest == cluster], axis=0)
           last_nearest = current_nearest

       return clusters
       
   def getAllBoxes(self):
       """Read the width and height of every box from the annotation file."""
       dataSet = []
       with open(self.filename, 'r') as f:
           for line in f:
               infos = line.strip().split(" ")
               length = len(infos)

               # Images in the dataset may differ in size, so every box is rescaled
               # as if its image were resized to the 416x416 network input
               # (long side scaled to 416, aspect ratio preserved) before clustering.
               image = cv2.imread(infos[0])
               ih, iw = [416, 416]
               h, w, _ = image.shape
               scale = min(iw / w, ih / h)

               for i in range(1, length):
                   coords = infos[i].split(",")
                   width = int(float(coords[2])) - int(float(coords[0]))
                   height = int(float(coords[3])) - int(float(coords[1]))

                   width, height = width*scale, height*scale
                   dataSet.append([width, height])
       result = np.array(dataSet)
       return result
       
   def avg_iou(self, boxes, clusters):
       accuracy = np.mean([np.max(self.iou(boxes, clusters), axis=1)])
       return accuracy


   def result2txt(self, data):
       """Write the anchors as "w,h, w,h, ..." on a single line."""
       with open("./data/anchors/yolo_anchors416.txt", 'w') as f:
           row = np.shape(data)[0]
           for i in range(row):
               if i == 0:
                   x_y = "%d,%d" % (data[i][0], data[i][1])
               else:
                   x_y = ", %d,%d" % (data[i][0], data[i][1])
               f.write(x_y)
      
     
     
if __name__ == "__main__":
   cluster_number = 9
   filename = "./data/dataset/train.txt"

   kmeans = YOLO_Kmeans(cluster_number, filename)
   all_boxes = kmeans.getAllBoxes()                             # step 1: collect the (w, h) of every box
   result = kmeans.kmeans(all_boxes, k=cluster_number)          # steps 2-6: cluster into 9 representative boxes

   result = result[np.lexsort(result.T[0, None])]               # step 7: sort the 9 boxes by width (ascending)
   kmeans.result2txt(result)                                    # save the result
   print("K anchors:\n {}".format(result))
   print("Accuracy: {:.2f}%".format(kmeans.avg_iou(all_boxes, result) * 100))
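The rescaling done in getAllBoxes can be checked by hand. The sketch below applies the same scale factor, min(416 / w, 416 / h), to the sample box from the annotation line above. The image size of 1280x960 is an assumption made purely for illustration, since the real size of 2_561.png is not given, and the function name `scale_to_input` is hypothetical.

```python
def scale_to_input(box_wh, image_wh, input_size=416):
    """Scale a (w, h) box as if its image were resized to fit input_size x input_size."""
    box_w, box_h = box_wh
    image_w, image_h = image_wh
    # The long side of the image ends up at input_size; the aspect ratio is preserved.
    scale = min(input_size / image_w, input_size / image_h)
    return box_w * scale, box_h * scale

# Assumed image size (1280x960); the box (495, 684) comes from the sample line.
scaled_w, scaled_h = scale_to_input((495, 684), (1280, 960))
print(scaled_w, scaled_h)  # roughly 160.875 and 222.3
```

Here the scale is 416 / 1280 = 0.325, so the anchors written to the output file are expressed in pixels of the 416x416 network input rather than in pixels of the original images.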