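Training YOLO2 requires a cfg file. In that network-definition file, the [region] layer has an anchors parameter. These are the 5 box priors (width/height pairs) that the paper obtains with k-means. The YOLO2 code shows how the values are used: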
1. They are parsed by parse_region in parse.c.
2. They are used when get_region_boxes is called, which in turn calls get_region_box:
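```c
/* box, logistic_activate and DOABS come from darknet's sources */
box get_region_box(float *x, float *biases, int n, int index, int i, int j, int w, int h)
{
    box b;
    /* center: cell offset (i, j) plus sigmoid of tx / ty, normalized by the grid size */
    b.x = (i + logistic_activate(x[index + 0])) / w;
    b.y = (j + logistic_activate(x[index + 1])) / h;
    /* size: anchor n (biases[2*n], biases[2*n+1]) scaled by exp(tw) / exp(th) */
    b.w = exp(x[index + 2]) * biases[2*n];
    b.h = exp(x[index + 3]) * biases[2*n+1];
    if(DOABS){
        /* anchors are given in grid cells, so normalize the size by the grid size too */
        b.w = exp(x[index + 2]) * biases[2*n]   / w;
        b.h = exp(x[index + 3]) * biases[2*n+1] / h;
    }
    return b;
}
```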
The code works as follows:
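x[index + 0] is the network's raw output t_x from the paper. Likewise, x[index + 1], x[index + 2] and x[index + 3] are t_y, t_w and t_h. The center offsets t_x and t_y pass through the logistic (sigmoid) function, so the predicted center stays inside its cell. t_w and t_h are exponentiated and used to scale the anchor.
i and j are the column and row of the current cell. Measured in cells, they give the offset from the image's top-left corner to the cell's top-left corner. They correspond to c_x and c_y in the paper and are simply the pixel coordinates on the feature map.
biases[2*n] is the width of the n-th anchor (p_w in the paper) and biases[2*n+1] is its height (p_h). The anchors values from the cfg are therefore stored in biases in the order w0, h0, w1, h1, ...
Dividing by the feature-map width w and height h expresses the box as fractions of the feature map. To get coordinates in the original image, you only multiply by the original image's width and height, as image.c does in int left = (b.x-b.w/2.)*im.w. This works because the feature map is proportional to the original image along each axis: the image is resized to the network input and then downsampled by the network stride. A position given as a fraction of the feature map is therefore the same fraction of the original image. Multiplying by the original width and height yields the predicted box in original-image coordinates.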
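The decoding can be made concrete with a small Python port of get_region_box plus the image.c-style conversion to pixels. It is a sketch that assumes DOABS is enabled, so anchors are in grid-cell units. It implements the paper's bx = σ(tx) + cx, by = σ(ty) + cy, bw = pw·e^tw and bh = ph·e^th, each divided by the grid size. The helper names logistic and to_pixels are ours, not darknet's. The anchor values are the ones produced later in this post, and the cell and predictions are made up for illustration.

```python
import math

# Anchors printed at the end of this post (grid-cell units, w0, h0, w1, h1, ...)
ANCHORS = [1.81218901641, 2.0756480568, 11.101054447, 9.88710144146,
           3.27746965391, 5.96042296557, 4.84211551027, 8.96603606529,
           9.99847701672, 6.52768518575]


def logistic(v):
    """Sigmoid, playing the role of darknet's logistic_activate."""
    return 1.0 / (1.0 + math.exp(-v))


def get_region_box(t, biases, n, i, j, w, h):
    """Python port of get_region_box with DOABS enabled.

    t: raw predictions (tx, ty, tw, th) for anchor n in cell (i, j)
    biases: flat anchor list w0, h0, w1, h1, ... in grid-cell units
    w, h: feature-map width and height
    Returns (x, y, bw, bh) as fractions of the image.
    """
    tx, ty, tw, th = t
    x = (i + logistic(tx)) / w                 # bx = sigmoid(tx) + cx, normalized
    y = (j + logistic(ty)) / h                 # by = sigmoid(ty) + cy, normalized
    bw = math.exp(tw) * biases[2 * n] / w      # bw = pw * exp(tw), normalized
    bh = math.exp(th) * biases[2 * n + 1] / h  # bh = ph * exp(th), normalized
    return x, y, bw, bh


def to_pixels(box, im_w, im_h):
    """Corner coordinates in the original image, as in image.c (no clamping to bounds)."""
    x, y, bw, bh = box
    left = int((x - bw / 2.0) * im_w)
    right = int((x + bw / 2.0) * im_w)
    top = int((y - bh / 2.0) * im_h)
    bot = int((y + bh / 2.0) * im_h)
    return left, right, top, bot


if __name__ == "__main__":
    # All-zero predictions for anchor 0 in the center cell of a 13x13 grid
    box = get_region_box((0.0, 0.0, 0.0, 0.0), ANCHORS, n=0, i=6, j=6, w=13, h=13)
    print(box)                       # center (0.5, 0.5), size = anchor 0 / 13
    print(to_pixels(box, 416, 416))  # (179, 236, 174, 241)
```

With all-zero predictions, the box sits at the center of cell (6, 6) and has exactly the anchor's size. This shows that the anchors act as the default box shapes that tw and th rescale.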
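The key question is how to obtain anchors for your own training data. You can design them by hand, but here we generate them with k-means, as the paper does. The anchor-generation script: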
```python
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import os
import random

import numpy as np
import sklearn.cluster as cluster
from tqdm import tqdm


def iou(x, centroids):
    """IoU between one (w, h) box and every centroid, both boxes sharing the same center."""
    dists = []
    for centroid in centroids:
        c_w, c_h = centroid
        w, h = x
        if c_w >= w and c_h >= h:
            dist = w * h / (c_w * c_h)
        elif c_w >= w and c_h <= h:
            dist = w * c_h / (w * h + (c_w - w) * c_h)
        elif c_w <= w and c_h >= h:
            dist = c_w * h / (w * h + c_w * (c_h - h))
        else:  # both w and h are larger than c_w and c_h
            dist = (c_w * c_h) / (w * h)
        dists.append(dist)
    return np.array(dists)


def avg_iou(x, centroids):
    n, d = x.shape
    sums = 0.
    for i in range(x.shape[0]):
        # iou() returns the IoU of x[i] with every centroid; keep the best match
        sums += max(iou(x[i], centroids))
    return sums / n


def write_anchors_to_file(centroids, distance, anchor_file):
    # Label w/h are normalized to [0, 1]; 416 / 32 = 13 rescales them to units of the
    # 13x13 output grid (416x416 input, stride 32), which is what the region layer expects
    anchors = centroids * 416 / 32
    anchors = [str(i) for i in anchors.ravel()]
    print(
        "\n",
        "Cluster Result:\n",
        "Clusters:", len(centroids), "\n",
        "Average IoU:", distance, "\n",
        "Anchors:\n",
        ", ".join(anchors)
    )
    with open(anchor_file, 'w') as f:
        f.write(", ".join(anchors))
        f.write('\n%f\n' % distance)


def k_means(x, n_clusters, eps):
    # pick n_clusters distinct boxes as the initial centroids
    init_index = random.sample(range(x.shape[0]), n_clusters)
    centroids = x[init_index]
    d = old_d = []
    iterations = 0
    diff = 1e10
    c, dim = centroids.shape
    while True:
        iterations += 1
        # distance = 1 - IoU, as in the paper
        d = np.array([1 - iou(i, centroids) for i in x])
        if len(old_d) > 0:
            diff = np.sum(np.abs(d - old_d))
        print('diff = %f' % diff)
        if diff < eps or iterations > 1000:
            print("Number of iterations took = %d" % iterations)
            print("Centroids = ", centroids)
            return centroids
        # assign each box to its nearest centroid
        belonging_centroids = np.argmin(d, axis=1)
        # recompute each centroid as the mean (w, h) of its members
        centroid_sums = np.zeros((c, dim), float)  # np.float was removed in NumPy 1.24
        for i in range(belonging_centroids.shape[0]):
            centroid_sums[belonging_centroids[i]] += x[i]
        for j in range(c):
            count = np.sum(belonging_centroids == j)
            if count > 0:  # keep an empty cluster's centroid instead of dividing by zero
                centroids[j] = centroid_sums[j] / count
        old_d = d.copy()


def get_file_content(fnm):
    with open(fnm) as f:
        return [line.strip() for line in f]


def main(args):
    print("Reading Data ...")
    file_list = []
    for f in args.file_list:
        file_list.extend(get_file_content(f))

    data = []
    for one_file in tqdm(file_list):
        # derive the label path from the image path, e.g. .../JPEGImages/x.jpg -> .../labels/x.txt
        one_file = one_file.replace('images', 'labels') \
                           .replace('JPEGImages', 'labels') \
                           .replace('.png', '.txt') \
                           .replace('.jpg', '.txt')
        for line in get_file_content(one_file):
            if not line:  # skip blank lines in label files
                continue
            clazz, xx, yy, w, h = line.split()
            data.append([float(w), float(h)])
    data = np.array(data)

    if args.engine.startswith("sklearn"):
        # note: sklearn clusters with Euclidean distance; use -m original for the 1 - IoU distance
        if args.engine == "sklearn":
            km = cluster.KMeans(n_clusters=args.num_clusters, tol=args.tol, verbose=True)
        elif args.engine == "sklearn-mini":
            km = cluster.MiniBatchKMeans(n_clusters=args.num_clusters, tol=args.tol, verbose=True)
        km.fit(data)
        result = km.cluster_centers_
        # distance = km.inertia_ / data.shape[0]
        distance = avg_iou(data, result)
    else:
        result = k_means(data, args.num_clusters, args.tol)
        distance = avg_iou(data, result)

    write_anchors_to_file(result, distance, args.output)


if "__main__" == __name__:
    parser = argparse.ArgumentParser()
    parser.add_argument('file_list', nargs='+', help='TrainList')
    parser.add_argument('--num_clusters', '-n', default=5, type=int, help='Number of Clusters')
    parser.add_argument('--output', '-o', default='../results/anchor.txt', type=str, help='Result Output File')
    parser.add_argument('--tol', '-t', default=0.005, type=float, help='Tolerate')
    parser.add_argument('--engine', '-m', default='sklearn', type=str,
                        choices=['original', 'sklearn', 'sklearn-mini'], help='Method to use')
    args = parser.parse_args()
    main(args)
```
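The four branches of iou() are one formula written out case by case. Only widths and heights are clustered, so both boxes are treated as sharing the same center. The intersection is then min(w, c_w) × min(h, c_h), and the union is w·h + c_w·c_h minus the intersection. k_means uses d = 1 - IoU as its distance, as in the paper. The sketch below computes the same values for all boxes and centroids at once with NumPy broadcasting. This is typically much faster than the per-box Python loop on a large dataset. The names iou_matrix and avg_iou_fast are hypothetical and are not part of the script.

```python
import numpy as np


def iou_matrix(boxes, centroids):
    """IoU of every (w, h) box with every (w, h) centroid, boxes sharing one center.

    boxes: (N, 2) array, centroids: (k, 2) array -> returns an (N, k) array.
    """
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    box_area = boxes[:, 0:1] * boxes[:, 1:2]                 # (N, 1)
    cen_area = (centroids[:, 0] * centroids[:, 1])[None, :]  # (1, k)
    return inter / (box_area + cen_area - inter)


def avg_iou_fast(boxes, centroids):
    """Mean over boxes of the best IoU with any centroid (what avg_iou computes)."""
    return iou_matrix(boxes, centroids).max(axis=1).mean()


if __name__ == "__main__":
    boxes = np.array([[2.0, 3.0]])
    cents = np.array([[1.0, 1.0], [4.0, 2.0], [2.0, 3.0]])
    # Same results as iou(): 1/6 (centroid smaller on both sides), 0.4 (partial), 1.0 (identical)
    print(iou_matrix(boxes, cents))
    print(1.0 - iou_matrix(boxes, cents))  # the k-means distance d = 1 - IoU
```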
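The result of one run is shown below. Average IoU is the metric the paper uses to judge the clustering. 5 is the number of clusters, i.e. anchors. You can choose another number and the average IoU will change with it. The paper picks k = 5 as a tradeoff between model complexity and recall.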
```
Cluster Result:
Clusters: 5
Average IoU: 0.633343079996
Anchors:
1.81218901641, 2.0756480568, 11.101054447, 9.88710144146, 3.27746965391, 5.96042296557, 4.84211551027, 8.96603606529, 9.99847701672, 6.52768518575
```
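The numbers under Anchors are in units of grid cells (the normalized label sizes times 416/32). They are laid out as w0, h0, w1, h1, ..., the same order the anchors line in [region] uses. Below is a minimal sketch, assuming a 416×416 input with stride 32. It splits the printed line into (w, h) pairs, shows each anchor's size in input pixels, and formats a line to paste into the cfg. Sorting by area and rounding to 2 decimals are cosmetic choices of this sketch. num in [region] should equal the number of pairs. The convolutional layer just before [region] needs filters = num × (classes + 5), which is 125 for the 20 Pascal VOC classes. The helper names are hypothetical.

```python
STRIDE = 32  # assumed: 416x416 input -> 13x13 grid


def parse_anchors(line):
    """Split 'w0, h0, w1, h1, ...' into (w, h) tuples in grid-cell units."""
    values = [float(v) for v in line.split(",")]
    if len(values) % 2:
        raise ValueError("anchors must come in (w, h) pairs")
    return list(zip(values[0::2], values[1::2]))


def cfg_anchor_line(pairs, ndigits=2):
    """Format pairs as a cfg 'anchors =' line, smallest area first."""
    pairs = sorted(pairs, key=lambda p: p[0] * p[1])
    flat = [f"{v:.{ndigits}f}" for p in pairs for v in p]
    return "anchors = " + ", ".join(flat)


def region_filters(num_anchors, num_classes, coords=4):
    """Output channels of the conv layer before [region]: num * (classes + coords + 1)."""
    return num_anchors * (num_classes + coords + 1)


PRINTED = ("1.81218901641, 2.0756480568, 11.101054447, 9.88710144146, "
           "3.27746965391, 5.96042296557, 4.84211551027, 8.96603606529, "
           "9.99847701672, 6.52768518575")

if __name__ == "__main__":
    pairs = parse_anchors(PRINTED)
    for w, h in pairs:
        print(f"{w * STRIDE:.1f} x {h * STRIDE:.1f} pixels at 416x416")
    print(cfg_anchor_line(pairs))
    print("num =", len(pairs), " filters =", region_filters(len(pairs), 20))
```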
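Tip:
The script finds each label file by string replacement on the image path. Your images (mine are in JPEGImages) and the labels folder must therefore sit in the same parent directory, otherwise the label files cannot be found. This follows the Pascal VOC directory layout, with one darknet-format label file (class x y w h per line) per image.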
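Before running the clustering, you can check the path mapping with a small sketch. It reproduces the script's replace chain and lists the images whose label file is missing. The helper names are hypothetical. Note that the replacement is plain, case-sensitive substring replacement. Any occurrence of "images" elsewhere in the path is also rewritten, and extensions such as .JPG or .jpeg are not converted.

```python
import os


def image_to_label_path(image_path):
    """Same replace chain as main(): images/JPEGImages -> labels, .png/.jpg -> .txt."""
    return (image_path.replace("images", "labels")
                      .replace("JPEGImages", "labels")
                      .replace(".png", ".txt")
                      .replace(".jpg", ".txt"))


def missing_labels(image_paths):
    """Return the image paths whose derived label file does not exist."""
    return [p for p in image_paths if not os.path.isfile(image_to_label_path(p))]


if __name__ == "__main__":
    print(image_to_label_path("VOCdevkit/VOC2007/JPEGImages/000001.jpg"))
    # -> VOCdevkit/VOC2007/labels/000001.txt
```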