YOLOv3 debugging (3): k-means clustering code to generate anchor box sizes for your own samples

贝猫说python


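The default anchor box sizes in the YOLOv3 cfg files (for example cfg/yolov3-voc.cfg) were obtained by running k-means clustering on a training set; the YOLOv3 paper reports these nine sizes for COCO. In practice we may need to detect objects with unusual shapes, such as a long ruler. In that case the generic anchor box sizes can hurt the accuracy of the trained model, so we should generate anchor box sizes from our own samples and use them in place of the defaults.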
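To make the mismatch concrete, here is a minimal sketch (not part of the original script) that compares a ruler-like box with the nine default anchors using an IoU computed from width and height only, assuming both boxes share the same center. The 400 x 20 pixel box and the helper names `wh_iou` and `best_anchor_iou` are made up for illustration.

```python
# Default YOLOv3 anchors (width, height) in pixels, as listed in yolov3-voc.cfg.
DEFAULT_ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
                   (59, 119), (116, 90), (156, 198), (373, 326)]


def wh_iou(w1, h1, w2, h2):
    """IoU of two boxes that share the same center, computed from sizes only."""
    inter = min(w1, w2) * min(h1, h2)
    union = w1 * h1 + w2 * h2 - inter
    return inter / union


def best_anchor_iou(w, h, anchors=DEFAULT_ANCHORS):
    """Return the highest IoU between a (w, h) box and any of the anchors."""
    return max(wh_iou(w, h, aw, ah) for aw, ah in anchors)


# Hypothetical long, thin object (for example a ruler) of 400 x 20 pixels.
ruler_iou = best_anchor_iou(400, 20)
# A box close to one of the default anchors, for comparison.
typical_iou = best_anchor_iou(60, 120)
print("ruler: %.3f, typical: %.3f" % (ruler_iou, typical_iou))
```

The ruler-shaped box overlaps poorly with every default anchor, while the box with ordinary proportions matches one of them almost exactly. That gap is what clustering on your own labels is meant to close.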

Below is a ready-to-use script that computes anchor box sizes for your own samples with k-means clustering.

Ready-to-use k-means anchor box script for YOLOv3

# coding=utf-8
# k-means++ for YOLOv3 anchors
# Compute the anchor sizes needed by YOLOv3 with the k-means++ algorithm
import numpy as np

# Box class describing a bounding box (center coordinates, width, height)
class Box():
    def __init__(self, x, y, w, h):
        self.x = x
        self.y = y
        self.w = w
        self.h = h


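# Compute the overlap of two boxes along one axis
# x1 is the center coordinate of box1 on that axis
# len1 is the length of box1 along that axis
# x2 is the center coordinate of box2 on that axis
# len2 is the length of box2 along that axis
# Returns the overlap length on that axis (negative if the boxes do not overlap)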
def overlap(x1, len1, x2, len2):
    len1_half = len1 / 2
    len2_half = len2 / 2

    left = max(x1 - len1_half, x2 - len2_half)
    right = min(x1 + len1_half, x2 + len2_half)

    return right - left


# Compute the intersection area of box a and box b
# a and b are both Box instances
# Returns area, the intersection area of box a and box b
def box_intersection(a, b):
    w = overlap(a.x, a.w, b.x, b.w)
    h = overlap(a.y, a.h, b.y, b.h)
    if w < 0 or h < 0:
        return 0

    area = w * h
    return area


# Compute the union area of box a and box b
# a and b are both Box instances
# Returns u, the union area of box a and box b
def box_union(a, b):
    i = box_intersection(a, b)
    u = a.w * a.h + b.w * b.h - i
    return u


# Compute the IoU of box a and box b
# a and b are both Box instances
# Returns the IoU of box a and box b
def box_iou(a, b):
    return box_intersection(a, b) / box_union(a, b)


# Initialize the centroids with k-means++ to reduce the influence of random initialization on the final result
# boxes is the list of Box objects for all bounding boxes
# n_anchors is the k of k-means
# Returns centroids, the n_anchors initial centroids
def init_centroids(boxes, n_anchors):
    centroids = []
    boxes_num = len(boxes)

    # Pick the first centroid uniformly at random.
    # Without a size argument np.random.choice returns a single integer,
    # which can be used to index a Python list.
    centroid_index = np.random.choice(boxes_num)
    centroids.append(boxes[centroid_index])

    print(centroids[0].w, centroids[0].h)

    for _ in range(n_anchors - 1):

        sum_distance = 0
        distance_list = []
        cur_sum = 0

        # Distance (1 - IoU) from every box to its nearest existing centroid
        for box in boxes:
            min_distance = 1
            for centroid in centroids:
                distance = 1 - box_iou(box, centroid)
                if distance < min_distance:
                    min_distance = distance
            sum_distance += min_distance
            distance_list.append(min_distance)

        # Pick the next centroid with probability proportional to that distance
        distance_thresh = sum_distance * np.random.random()

        for i in range(boxes_num):
            cur_sum += distance_list[i]
            if cur_sum > distance_thresh:
                centroids.append(boxes[i])
                print(boxes[i].w, boxes[i].h)
                break

    return centroids


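# Run one k-means iteration and compute the new centroids
# boxes is the list of Box objects for all bounding boxes
# n_anchors is the k of k-means
# centroids is the list of current cluster centers
# Returns new_centroids, the recomputed cluster centers
# Returns groups, the list of boxes in each of the n_anchors clusters
# Returns loss, the sum of the distances from every box to its nearest centroid
def do_kmeans(n_anchors, boxes, centroids):
    loss = 0
    groups = []
    new_centroids = []
    for i in range(n_anchors):
        groups.append([])
        new_centroids.append(Box(0, 0, 0, 0))

    # Assign every box to its nearest centroid and accumulate widths and heights
    for box in boxes:
        min_distance = 1
        group_index = 0
        for centroid_index, centroid in enumerate(centroids):
            distance = 1 - box_iou(box, centroid)
            if distance < min_distance:
                min_distance = distance
                group_index = centroid_index
        groups[group_index].append(box)
        loss += min_distance
        new_centroids[group_index].w += box.w
        new_centroids[group_index].h += box.h

    # The new centroid of each cluster is the mean width and height of its boxes
    for i in range(n_anchors):
        if len(groups[i]) == 0:
            # Empty cluster: keep the previous centroid to avoid dividing by zero
            new_centroids[i].w = centroids[i].w
            new_centroids[i].h = centroids[i].h
            continue
        new_centroids[i].w /= len(groups[i])
        new_centroids[i].h /= len(groups[i])

    return new_centroids, groups, loss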


# Compute n_anchors centroids for the given bounding boxes
# label_path is the path of the training-set list file (one label file path per line)
# n_anchors is the number of anchors
# loss_convergence is the smallest change in loss that still counts as progress
# grid_size * grid_size is the number of grid cells
# iterations_num is the maximum number of iterations
# plus = 1 enables k-means++ initialization of the centroids
def compute_centroids(label_path, n_anchors, loss_convergence, grid_size, iterations_num, plus):

    boxes = []
    label_files = []
    with open(label_path) as f:
        for line in f:
            # If the list holds image paths instead, derive the label paths here:
            #label_path = line.rstrip().replace('images', 'labels')
            #label_path = label_path.replace('JPEGImages', 'labels')
            #label_path = label_path.replace('.jpg', '.txt')
            #label_path = label_path.replace('.JPEG', '.txt')
            #label_files.append(label_path)
            if line.strip():
                label_files.append(line.rstrip())

    for label_file in label_files:
        with open(label_file) as f:
            for line in f:
                # Each label line is: class x_center y_center width height
                temp = line.strip().split()
                if len(temp) >= 5:
                    boxes.append(Box(0, 0, float(temp[3]), float(temp[4])))

    if plus:
        centroids = init_centroids(boxes, n_anchors)
    else:
        # replace=False prevents the same box from being picked twice
        centroid_indices = np.random.choice(len(boxes), n_anchors, replace=False)
        centroids = []
        for centroid_index in centroid_indices:
            centroids.append(boxes[centroid_index])

    # iterate k-means
    centroids, groups, old_loss = do_kmeans(n_anchors, boxes, centroids)
    iterations = 1
    while True:
        centroids, groups, loss = do_kmeans(n_anchors, boxes, centroids)
        iterations = iterations + 1
        print("loss = %f" % loss)
        if abs(old_loss - loss) < loss_convergence or iterations > iterations_num:
            break
        old_loss = loss

        for centroid in centroids:
            print(centroid.w * grid_size * 32, centroid.h * grid_size * 32)

    # print result, scaled to pixels (grid_size * 32 is the network input size)
    print("k-means result:\n")
    for centroid in centroids:
        print(centroid.w * grid_size * 32, centroid.h * grid_size * 32)


label_path = "train_txt.txt"
n_anchors = 9
loss_convergence = 1e-6
grid_size = 13
iterations_num = 1000
plus = 0
compute_centroids(label_path,n_anchors,loss_convergence,grid_size,iterations_num,plus)
   

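Explanation

You only need to change label_path at the end of the code to your own path.
label_path is a text file that lists the path of every annotation txt file of the training samples, one per line. Each annotation file uses the darknet label format (class x_center y_center width height, normalized to the image size); the script reads only the width and height.
n_anchors defaults to 9 and can be changed as needed.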
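If no such list file exists yet, it could be generated along these lines. This is only a sketch: the helper name `write_label_list` is hypothetical, and it assumes all label files sit in one directory with a `.txt` extension.

```python
import glob
import os


def write_label_list(label_dir, list_path):
    """Write the path of every .txt label file in label_dir to list_path,
    one path per line, and return the list of paths that were written."""
    label_files = sorted(glob.glob(os.path.join(label_dir, "*.txt")))
    with open(list_path, "w") as f:
        for path in label_files:
            f.write(path + "\n")
    return label_files
```

For example, `write_label_list("labels", "train_txt.txt")` would produce the list file the script expects, where `labels` stands for whatever directory holds your annotation files. Keep the list file itself outside that directory so it is not picked up as a label file.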

After the script runs, the terminal shows the 9 anchor box sizes. Use them to replace the default anchor box sizes in the .cfg files (such as yolov3-voc.cfg) under the /cfg directory of the YOLOv3 source.
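The script prints floating-point width and height pairs in no particular order, while the cfg file expects a single comma-separated line. The default line uses integers ordered from small to large, so a sketch like the following could bring the result into that form. The helper `format_anchors` is hypothetical, it takes pairs already scaled to pixels, and sorting by area and rounding to integers are assumptions made to mirror the default line.

```python
def format_anchors(sizes):
    """Format (width, height) pairs in pixels as a darknet `anchors` value.

    Pairs are sorted by area (smallest first) and rounded to integers,
    mirroring the layout of the default anchors line.
    """
    ordered = sorted(sizes, key=lambda wh: wh[0] * wh[1])
    return ",  ".join("%d,%d" % (round(w), round(h)) for w, h in ordered)


# Made-up k-means output, for illustration only.
example = [(115.7, 90.2), (10.4, 12.6), (33.1, 23.0)]
print("anchors = " + format_anchors(example))
```

The order matters because each [yolo] layer selects its anchors by index through `mask`, and the layer with `mask = 6,7,8` expects the three largest.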

Changes in darknet

In /cfg/yolov3-voc.cfg:

[convolutional]
size=1
stride=1
pad=1
filters=18 # (5 + class_num) * 3; only one class is detected here
activation=linear

[yolo]
mask = 6,7,8
# replace anchors with the anchor box sizes obtained from k-means
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326 
classes=1
num=9
jitter=.3
ignore_thresh = .5
truth_thresh = 1
random=1
   

Note that this file has three places like this, and all of them need the same change.
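Editing three places by hand is easy to get wrong, so the replacement could be scripted. The sketch below operates on the cfg text and rewrites every line that starts with `anchors`. `replace_anchors` and `yolo_filters` are hypothetical helper names, and the regular expression assumes each `anchors` entry sits on its own line, as in the excerpt above.

```python
import re


def yolo_filters(class_num, anchors_per_scale=3):
    """filters value for the [convolutional] layer before each [yolo] layer."""
    return (5 + class_num) * anchors_per_scale


def replace_anchors(cfg_text, anchors_value):
    """Replace the value of every `anchors = ...` line in cfg_text.

    Returns a tuple of the new text and the number of lines replaced.
    """
    pattern = re.compile(r"^anchors\s*=.*$", flags=re.MULTILINE)
    # A function is used as the replacement so the value is inserted literally.
    return pattern.subn(lambda match: "anchors = " + anchors_value, cfg_text)


sample_cfg = "[yolo]\nmask = 6,7,8\nanchors = 10,13,  16,30\nclasses=1\n"
new_cfg, count = replace_anchors(sample_cfg, "12,15,  20,40")
print(count, yolo_filters(1))
```

For the full yolov3-voc.cfg the returned count should be 3, which is a quick check that no [yolo] section was missed. `yolo_filters(1)` gives the 18 used in the excerpt above.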

Charlie
8.23
Hangzhou

