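Preface
I found several Python reimplementations online, and keras-yolo3 was the best of them. I wanted to experience the process that produced the weight file released by the YOLOv3 author, so I decided to walk through it myself (my C++ is weak). The matrix operations in the loss computation made my head spin, and the data structure fed to the model was not clear to me either, so I am writing this post to record it and sort out my thinking. Corrections are welcome, thanks! (I am on Windows.)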
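1. Reproducing the network model
First download the original C/C++ code (darknet) and run it. The author only provides code for Linux, but a Windows port is available at: AlexeyAB/darknet
For the setup, see: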
https://blog.csdn.net/baidu_36669549/article/details/79798587
Once configured, build the project to produce darknet.exe, then run the following command to print the network model:
darknet.exe detector test data/coco.data yolov3.cfg yolov3.weights -i 0 -thresh 0.25 dog.jpg
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BF
1 conv 64 3 x 3 / 2 416 x 416 x 32 -> 208 x 208 x 64 1.595 BF
2 conv 32 1 x 1 / 1 208 x 208 x 64 -> 208 x 208 x 32 0.177 BF
3 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BF
4 Shortcut Layer: 1
5 conv 128 3 x 3 / 2 208 x 208 x 64 -> 104 x 104 x 128 1.595 BF
6 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF
7 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF
8 Shortcut Layer: 5
9 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF
10 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF
11 Shortcut Layer: 8
12 conv 256 3 x 3 / 2 104 x 104 x 128 -> 52 x 52 x 256 1.595 BF
13 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
14 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
15 Shortcut Layer: 12
16 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
17 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
18 Shortcut Layer: 15
19 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
20 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
21 Shortcut Layer: 18
22 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
23 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
24 Shortcut Layer: 21
25 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
26 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
27 Shortcut Layer: 24
28 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
29 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
30 Shortcut Layer: 27
31 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
32 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
33 Shortcut Layer: 30
34 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
35 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
36 Shortcut Layer: 33
37 conv 512 3 x 3 / 2 52 x 52 x 256 -> 26 x 26 x 512 1.595 BF
38 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
39 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
40 Shortcut Layer: 37
41 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
42 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
43 Shortcut Layer: 40
44 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
45 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
46 Shortcut Layer: 43
47 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
48 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
49 Shortcut Layer: 46
50 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
51 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
52 Shortcut Layer: 49
53 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
54 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
55 Shortcut Layer: 52
56 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
57 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
58 Shortcut Layer: 55
59 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
60 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
61 Shortcut Layer: 58
62 conv 1024 3 x 3 / 2 26 x 26 x 512 -> 13 x 13 x1024 1.595 BF
63 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
64 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
65 Shortcut Layer: 62
66 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
67 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
68 Shortcut Layer: 65
69 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
70 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
71 Shortcut Layer: 68
72 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
73 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
74 Shortcut Layer: 71
75 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
76 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
77 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
78 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
79 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
80 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
81 conv 255 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 255 0.088 BF
82 yolo
83 route 79
84 conv 256 1 x 1 / 1 13 x 13 x 512 -> 13 x 13 x 256 0.044 BF
85 upsample 2x 13 x 13 x 256 -> 26 x 26 x 256
86 route 85 61
87 conv 256 1 x 1 / 1 26 x 26 x 768 -> 26 x 26 x 256 0.266 BF
88 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
89 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
90 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
91 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
92 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
93 conv 255 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 255 0.177 BF
94 yolo
95 route 91
96 conv 128 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x 128 0.044 BF
97 upsample 2x 26 x 26 x 128 -> 52 x 52 x 128
98 route 97 36
99 conv 128 1 x 1 / 1 52 x 52 x 384 -> 52 x 52 x 128 0.266 BF
100 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
101 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
102 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
103 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
104 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
105 conv 255 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 255 0.353 BF
106 yolo
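The `BF` column and the 255 filters of the convolution in front of each `yolo` layer can be checked by hand. The sketch below assumes the usual convention of counting each multiply-accumulate as two operations; `conv_bflops` and `detection_filters` are hypothetical helper names made up for this example, not darknet functions.

```python
def conv_bflops(filters, size, in_channels, out_h, out_w):
    """Billions of floating-point operations for one convolution layer,
    counting each multiply-accumulate as two operations (assumed)."""
    return 2.0 * filters * size * size * in_channels * out_h * out_w / 1e9


def detection_filters(num_classes, anchors_per_scale=3):
    """Channels of the conv in front of a yolo layer: per anchor there are
    4 box values, 1 objectness score and one score per class."""
    return anchors_per_scale * (5 + num_classes)


if __name__ == "__main__":
    # Layer 0: 32 filters, 3x3 kernel, 3 input channels, 416x416 output.
    print(round(conv_bflops(32, 3, 3, 416, 416), 3))
    # Layer 81: 255 filters, 1x1 kernel, 1024 input channels, 13x13 output.
    print(round(conv_bflops(255, 1, 1024, 13, 13), 3))
    # COCO has 80 classes, which gives the 255 channels seen in the table.
    print(detection_filters(80))
```

With a single class (only people, as later in this post) the same formula gives 3 * (5 + 1) = 18 output channels per scale.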
References on the network structure and how it works:
https://www.cnblogs.com/makefile/p/YOLOv3.html
https://blog.csdn.net/chandanyan8568/article/details/81089083
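Shortcut Layer (a residual connection) and route (which jumps directly to the output of the given layer) are each followed by layer indices from the table above; when route lists two layers, their outputs are concatenated along the channel axis. upsample performs upsampling.
The corresponding code for the YOLO network body: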
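As a rough illustration of these three layer types, here is a shape-only sketch in plain Python. It tracks (height, width, channels) tuples instead of real tensors, and the function names are invented for this example.

```python
def shortcut(current, earlier):
    """Residual add: both shapes must match and the output keeps that shape."""
    if current != earlier:
        raise ValueError("shortcut needs identical shapes")
    return current


def route(*shapes):
    """Route: with one layer it forwards that output unchanged; with several
    it concatenates them along the channel axis."""
    h, w, _ = shapes[0]
    if any((s[0], s[1]) != (h, w) for s in shapes):
        raise ValueError("routed layers must share height and width")
    return (h, w, sum(s[2] for s in shapes))


def upsample(shape, stride=2):
    """Upsample: height and width grow by the stride, channels are unchanged."""
    h, w, c = shape
    return (h * stride, w * stride, c)


if __name__ == "__main__":
    layer_61 = (26, 26, 512)   # output of "61 Shortcut Layer: 58"
    layer_84 = (13, 13, 256)   # output of layer 84
    layer_85 = upsample(layer_84)
    print(layer_85)
    # "86 route 85 61" feeds layer 87, whose input is 26 x 26 x 768.
    print(route(layer_85, layer_61))
```

The same arithmetic explains the 52 x 52 x 384 input of layer 99: 128 channels from the upsampled branch plus 256 from layer 36.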
import tensorflow as tf
import tensorflow.contrib.slim as slim  # TensorFlow 1.x only

def yolo_body(images, num_classes=80):
    with tf.variable_scope('yolo'):
        with slim.arg_scope([slim.conv2d, slim.conv2d_transpose, slim.fully_connected], activation_fn=tf.nn.leaky_relu,
                            weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
                            weights_regularizer=slim.l2_regularizer(0.0005)):
            # Backbone: layers 0-74 of the darknet listing.
            net = slim.conv2d(images, 32, 3, scope='conv_1')
            first_layer = slim.conv2d(net, 64, 3, 2, scope='conv_2')
            net = slim.conv2d(first_layer, 32, 1, scope='conv_3')
            net = slim.conv2d(net, 64, 3, scope='conv_4')
            net = tf.nn.leaky_relu(tf.add(net, first_layer), alpha=0.2)
            second_layer = slim.conv2d(net, 128, 3, 2, scope='conv_5')
            for i in range(2):
                net = slim.conv2d(second_layer, 64, 1, scope='conv_%s' % (str(6 + i * 2)))
                net = slim.conv2d(net, 128, 3, scope='conv_%s' % (str(7 + i * 2)))
                second_layer = tf.nn.leaky_relu(tf.add(net, second_layer), alpha=0.2)
            third_layer = slim.conv2d(second_layer, 256, 3, 2, scope='conv_10')
            for i in range(8):
                net = slim.conv2d(third_layer, 128, 1, scope='conv_%s' % (str(11 + i * 2)))
                net = slim.conv2d(net, 256, 3, scope='conv_%s' % (str(12 + i * 2)))
                third_layer = tf.nn.leaky_relu(tf.add(net, third_layer), alpha=0.2)
            fourth_layer = slim.conv2d(third_layer, 512, 3, 2, scope='conv_27')
            for i in range(8):
                net = slim.conv2d(fourth_layer, 256, 1, scope='conv_%s' % (str(28 + i * 2)))
                net = slim.conv2d(net, 512, 3, scope='conv_%s' % (str(29 + i * 2)))
                fourth_layer = tf.nn.leaky_relu(tf.add(net, fourth_layer), alpha=0.2)
            fifth_layer = slim.conv2d(fourth_layer, 1024, 3, 2, scope='conv_44')
            for i in range(4):
                net = slim.conv2d(fifth_layer, 512, 1, scope='conv_%s' % (str(45 + i * 2)))
                net = slim.conv2d(net, 1024, 3, scope='conv_%s' % (str(46 + i * 2)))
                fifth_layer = tf.nn.leaky_relu(tf.add(net, fifth_layer), alpha=0.2)
            # First detection head, 13 x 13 (layers 75-81).
            net = slim.conv2d(fifth_layer, 512, 1, scope='conv_53')
            net = slim.conv2d(net, 1024, 3, scope='conv_54')
            net = slim.conv2d(net, 512, 1, scope='conv_55')
            net = slim.conv2d(net, 1024, 3, scope='conv_56')
            scale_one = slim.conv2d(net, 512, 1, scope='conv_57')
            net = slim.conv2d(scale_one, 1024, 3, scope='conv_58')
            # Layer 81 is a 1 x 1 conv (the original used kernel size 3 here). The
            # conv in front of each yolo layer is linear in yolov3.cfg, so no activation.
            detection_one = slim.conv2d(net, 3 * (5 + num_classes), 1, activation_fn=None, scope='conv_59')
            # Second detection head, 26 x 26 (layers 83-93).
            # Layer 84 is a 1 x 1 conv (the original used kernel size 3 here).
            scale_two = slim.conv2d(scale_one, 256, 1, scope='conv_60')
            # A transposed convolution stands in for darknet's upsample layer.
            scale_two = slim.conv2d_transpose(scale_two, 256, 3, 2, scope='conv2d_transpose1')
            net = tf.concat([scale_two, fourth_layer], axis=3)
            net = slim.conv2d(net, 256, 1, scope='conv_61')
            net = slim.conv2d(net, 512, 3, scope='conv_62')
            net = slim.conv2d(net, 256, 1, scope='conv_63')
            net = slim.conv2d(net, 512, 3, scope='conv_64')
            scale_two = slim.conv2d(net, 256, 1, scope='conv_65')
            net = slim.conv2d(scale_two, 512, 3, scope='conv_66')
            detection_two = slim.conv2d(net, 3 * (5 + num_classes), 1, activation_fn=None, scope='conv_67')
            # Third detection head, 52 x 52 (layers 95-105).
            scale_three = slim.conv2d(scale_two, 128, 1, scope='conv_68')
            scale_three = slim.conv2d_transpose(scale_three, 128, 3, 2, scope='conv2d_transpose2')
            net = tf.concat([scale_three, third_layer], axis=3)
            net = slim.conv2d(net, 128, 1, scope='conv_69')
            net = slim.conv2d(net, 256, 3, scope='conv_70')
            net = slim.conv2d(net, 128, 1, scope='conv_71')
            net = slim.conv2d(net, 256, 3, scope='conv_72')
            net = slim.conv2d(net, 128, 1, scope='conv_73')
            net = slim.conv2d(net, 256, 3, scope='conv_74')
            detection_three = slim.conv2d(net, 3 * (5 + num_classes), 1, activation_fn=None, scope='conv_75')
            return detection_one, detection_two, detection_three
With the network model in place, the loss still has to be computed. For now this is still mostly code from the keras project:
import numpy as np
import tensorflow as tf
from keras import backend as K
# yolo_head and box_iou are the helpers defined in keras-yolo3 (yolo3/model.py).

def yolo_loss(feats, num_classes, y_true, ignore_thresh=.5):
    # y_true = [Input(shape=(416 // {0: 32, 1: 16, 2: 8}[l], 416 // {0: 32, 1: 16, 2: 8}[l], \
    #                        9 // 3, num_classes + 5)) for l in range(3)]
    loss = 0
    m = K.shape(feats[0])[0]  # batch size, tensor
    mf = K.cast(m, K.dtype(feats[0]))
    grid_shapes = [K.cast(K.shape(feats[l])[1:3], K.dtype(y_true[0])) for l in range(3)]
    input_shape = K.cast(K.shape(feats[0])[1:3] * 32, K.dtype(y_true[0]))
    # 10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326
    # feats[0] is the 13 x 13 output, so it must get the three largest anchors,
    # matching anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] in preprocess_true_boxes.
    # (The original listed the small anchors first.)
    anchors = np.array([[[116, 90], [156, 198], [373, 326]],
                        [[30, 61], [62, 45], [59, 119]],
                        [[10, 13], [16, 30], [33, 23]]], dtype='float32')
    for i in range(3):
        object_mask = y_true[i][..., 4:5]
        true_class_probs = y_true[i][..., 5:]
        # Predicted box sizes and positions on the 13 x 13, 26 x 26 and 52 x 52 grids
        grid, raw_pred, pred_xy, pred_wh = yolo_head(feats[i], anchors[i], num_classes, calc_loss=True)
        pred_box = tf.concat([pred_xy, pred_wh], axis=-1)
        # Darknet raw box to calculate loss.
        raw_true_xy = y_true[i][..., :2]*grid_shapes[i][::-1] - grid
        raw_true_wh = K.log(y_true[i][..., 2:4] / anchors[i] * input_shape[::-1])
        raw_true_wh = K.switch(object_mask, raw_true_wh, K.zeros_like(raw_true_wh))  # avoid log(0)=-inf
        box_loss_scale = 2 - y_true[i][...,2:3] * y_true[i][...,3:4]
        # Find ignore mask, iterate over each of batch.
        ignore_mask = tf.TensorArray(K.dtype(y_true[0]), size=1, dynamic_size=True)
        object_mask_bool = K.cast(object_mask, 'bool')

        def loop_body(b, ignore_mask):
            true_box = tf.boolean_mask(y_true[i][b, ..., 0:4], object_mask_bool[b, ..., 0])
            iou = box_iou(pred_box[b], true_box)
            best_iou = tf.reduce_max(iou, axis=-1, keepdims=False)
            ignore_mask = ignore_mask.write(b, tf.cast(best_iou < ignore_thresh, true_box.dtype))
            return b + 1, ignore_mask

        _, ignore_mask = tf.while_loop(lambda b, *args: b < m, loop_body, [0, ignore_mask])
        ignore_mask = ignore_mask.stack()
        ignore_mask = K.expand_dims(ignore_mask, -1)
        # K.binary_crossentropy is helpful to avoid exp overflow.
        xy_loss = object_mask * box_loss_scale * K.binary_crossentropy(raw_true_xy, raw_pred[..., 0:2],
                                                                       from_logits=True)
        wh_loss = object_mask * box_loss_scale * 0.5 * K.square(raw_true_wh - raw_pred[..., 2:4])
        confidence_loss = object_mask * K.binary_crossentropy(object_mask, raw_pred[..., 4:5], from_logits=True) + \
                          (1 - object_mask) * K.binary_crossentropy(object_mask, raw_pred[..., 4:5],
                                                                    from_logits=True) * ignore_mask
        class_loss = object_mask * K.binary_crossentropy(true_class_probs, raw_pred[..., 5:], from_logits=True)
        xy_loss = K.sum(xy_loss) / mf
        wh_loss = K.sum(wh_loss) / mf
        confidence_loss = K.sum(confidence_loss) / mf
        class_loss = K.sum(class_loss) / mf
        loss += xy_loss + wh_loss + confidence_loss + class_loss
        loss = tf.Print(loss, [loss, xy_loss, wh_loss, confidence_loss, class_loss, K.sum(ignore_mask)],
                        message='loss: ')
    return loss
Once the loss is defined, an optimizer can be set up for training.
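To make the target encoding in yolo_loss (the raw_true_xy, raw_true_wh and box_loss_scale lines) more concrete, here is a scalar version for a single box in plain Python. It is only a sketch: it assumes a square input, and `encode_box` is a name invented for this example.

```python
import math


def encode_box(cx, cy, w, h, grid_size, anchor_wh, input_size=416):
    """Encode one normalized box (centre and size in [0, 1]) the way the
    raw_true_xy / raw_true_wh lines do, for the cell that holds the box."""
    # Grid cell that contains the box centre.
    col = int(math.floor(cx * grid_size))
    row = int(math.floor(cy * grid_size))
    # Offset of the centre inside that cell, each value in [0, 1).
    raw_xy = (cx * grid_size - col, cy * grid_size - row)
    # Log ratio between the box size in pixels and the anchor size.
    raw_wh = (math.log(w * input_size / anchor_wh[0]),
              math.log(h * input_size / anchor_wh[1]))
    # Small boxes get a weight close to 2, large boxes a weight close to 1.
    box_loss_scale = 2 - w * h
    return (row, col), raw_xy, raw_wh, box_loss_scale


if __name__ == "__main__":
    # A centred box with exactly the size of the (116, 90) anchor on the 13 x 13 grid.
    print(encode_box(0.5, 0.5, 116 / 416, 90 / 416, 13, (116, 90)))
```

A box that matches its anchor exactly gets width and height targets of about zero, so the network only has to learn deviations from the anchor shape.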
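2. Building the input data
Set up the COCO dataset; reference: https://blog.csdn.net/oYouHuo/article/details/81114875
After installing and testing it, process the dataset with the code below. I only detect people, so I listed a single category.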
from pycocotools.coco import COCO

dataType = 'train2017'
annFile = './annotations/instances_{}.json'.format(dataType)

def deal_data():
    coco = COCO(annFile)
    cat_ids = coco.getCatIds(catNms=['person'])
    img_ids = coco.getImgIds(catIds=cat_ids)
    with open('./deal_data.txt', 'w') as f:
        for img_id in img_ids:
            img = coco.loadImgs(img_id)[0]
            f.write('./images/{}/{}\t'.format(dataType, img['file_name']))
            annIds = coco.getAnnIds(imgIds=img['id'], catIds=cat_ids, iscrowd=None)
            anns = coco.loadAnns(annIds)
            for ann in anns:
                # COCO bbox is [x, y, width, height]; write left, top, right, bottom.
                f.write('{},{},{},{}'.format(ann['bbox'][0], ann['bbox'][1],
                                             ann['bbox'][0] + ann['bbox'][2],
                                             ann['bbox'][1] + ann['bbox'][3]))
                # The class index is the position of the category in cat_ids.
                for index in range(len(cat_ids)):
                    if ann['category_id'] == cat_ids[index]:
                        f.write(',{}'.format(index))
                        break
                f.write('\t')
            f.write('\n')

def main():
    deal_data()

if __name__ == '__main__':
    main()
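After the code above has run, you get the preprocessed dataset file, which looks like this (one image per line: the image path, then tab-separated boxes written as left,top,right,bottom,class_index):
Note: the box values in the screenshot above are meant to be left, top, right, bottom. I initially wrote them out incorrectly as x, y, width, height and did not bother to replace the screenshot, so your output will differ from it.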
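The difference between the two box conventions, and the layout of one annotation line, can be sketched in a few lines of plain Python. The helper names and the file name `example.jpg` are made up for illustration; only the line layout follows deal_data above.

```python
def coco_bbox_to_corners(bbox):
    """COCO stores [x, y, width, height]; return (left, top, right, bottom)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def format_line(image_path, boxes):
    """boxes: list of ((left, top, right, bottom), class_index) pairs."""
    parts = [image_path]
    for corners, class_index in boxes:
        parts.append(",".join(str(v) for v in corners) + "," + str(class_index))
    return "\t".join(parts)


def parse_line(line):
    """Split on whitespace, as the loader does with line.split(), so a
    trailing tab or newline does not matter."""
    fields = line.split()
    boxes = [[float(v) for v in field.split(",")] for field in fields[1:]]
    return fields[0], boxes


if __name__ == "__main__":
    corners = coco_bbox_to_corners([10.0, 20.0, 30.0, 40.0])
    line = format_line("./images/train2017/example.jpg", [(corners, 0)])
    print(repr(line))
    print(parse_line(line))
```

Writing x, y, width, height by mistake still produces a file that parses, which is why the error is easy to miss: the boxes simply end up with the wrong right and bottom edges.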
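After this initial processing, the data still has to be converted into the structure YOLO expects. There are three grid sizes (13, 26 and 52) and each grid cell has 3 outputs, one per anchor, so the corresponding shapes are (batch_size, 13, 13, 3, 5+classes_size), (batch_size, 26, 26, 3, 5+classes_size) and (batch_size, 52, 52, 3, 5+classes_size). The code below does this (the keras-yolo3 author's code is excellent, so I borrowed it directly):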
import numpy as np
from PIL import Image
from matplotlib.colors import rgb_to_hsv, hsv_to_rgb

def rand(a=0, b=1):
    return np.random.rand()*(b-a) + a

def get_random_data(annotation_line, input_shape, random=True, max_boxes=20, jitter=.3, hue=.1, sat=1.5, val=1.5, proc_img=True):
    '''random preprocessing for real-time data augmentation'''
    line = annotation_line.split()
    image = Image.open(line[0])
    iw, ih = image.size
    h, w = input_shape
    box = np.array([np.array(list(map(float,box.split(',')))) for box in line[1:]])
    box = np.floor(box)
    box = box.astype(np.int16)

    if not random:
        # resize image
        scale = min(w/iw, h/ih)
        nw = int(iw*scale)
        nh = int(ih*scale)
        dx = (w-nw)//2
        dy = (h-nh)//2
        image_data=0
        if proc_img:
            image = image.resize((nw,nh), Image.BICUBIC)
            new_image = Image.new('RGB', (w,h), (128,128,128))
            new_image.paste(image, (dx, dy))
            image_data = np.array(new_image)/255.

        # correct boxes
        box_data = np.zeros((max_boxes,5))
        if len(box)>0:
            np.random.shuffle(box)
            if len(box)>max_boxes: box = box[:max_boxes]
            box[:, [0,2]] = box[:, [0,2]]*scale + dx
            box[:, [1,3]] = box[:, [1,3]]*scale + dy
            box_data[:len(box)] = box

        return image_data, box_data

    # resize image
    new_ar = w/h * rand(1-jitter,1+jitter)/rand(1-jitter,1+jitter)
    scale = rand(.25, 2)
    if new_ar < 1:
        nh = int(scale*h)
        nw = int(nh*new_ar)
    else:
        nw = int(scale*w)
        nh = int(nw/new_ar)
    image = image.resize((nw,nh), Image.BICUBIC)

    # place image
    dx = int(rand(0, w-nw))
    dy = int(rand(0, h-nh))
    new_image = Image.new('RGB', (w,h), (128,128,128))
    new_image.paste(image, (dx, dy))
    image = new_image

    # flip image or not
    flip = rand()<.5
    if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT)

    # distort image
    hue = rand(-hue, hue)
    sat = rand(1, sat) if rand()<.5 else 1/rand(1, sat)
    val = rand(1, val) if rand()<.5 else 1/rand(1, val)
    x = rgb_to_hsv(np.array(image)/255.)
    x[..., 0] += hue
    x[..., 0][x[..., 0]>1] -= 1
    x[..., 0][x[..., 0]<0] += 1
    x[..., 1] *= sat
    x[..., 2] *= val
    x[x>1] = 1
    x[x<0] = 0
    image_data = hsv_to_rgb(x) # numpy array, 0 to 1

    # correct boxes
    box_data = np.zeros((max_boxes,5))
    if len(box)>0:
        np.random.shuffle(box)
        box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx
        box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy
        if flip: box[:, [0,2]] = w - box[:, [2,0]]
        box[:, 0:2][box[:, 0:2]<0] = 0
        box[:, 2][box[:, 2]>w] = w
        box[:, 3][box[:, 3]>h] = h
        box_w = box[:, 2] - box[:, 0]
        box_h = box[:, 3] - box[:, 1]
        box = box[np.logical_and(box_w>1, box_h>1)] # discard invalid box
        if len(box)>max_boxes: box = box[:max_boxes]
        box_data[:len(box)] = box

    return image_data, box_data

def preprocess_true_boxes(true_boxes, input_shape, anchors, num_classes):
    '''Preprocess true boxes to training input format

    Parameters
    ----------
    true_boxes: array, shape=(m, T, 5)
        Absolute x_min, y_min, x_max, y_max, class_id relative to input_shape.
    input_shape: array-like, hw, multiples of 32
    anchors: array, shape=(N, 2), wh
    num_classes: integer

    Returns
    -------
    y_true: list of array, shape like yolo_outputs, xywh are relative values

    '''
    assert (true_boxes[..., 4]<num_classes).all(), 'class id must be less than num_classes'
    num_layers = len(anchors)//3 # default setting
    anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]]

    true_boxes = np.array(true_boxes, dtype='float32')
    input_shape = np.array(input_shape, dtype='int32')
    # Convert corners to centre and size, then normalize by the input size.
    boxes_xy = (true_boxes[..., 0:2] + true_boxes[..., 2:4]) // 2
    boxes_wh = true_boxes[..., 2:4] - true_boxes[..., 0:2]
    true_boxes[..., 0:2] = boxes_xy/input_shape[::-1]
    true_boxes[..., 2:4] = boxes_wh/input_shape[::-1]

    m = true_boxes.shape[0]
    grid_shapes = [input_shape//{0:32, 1:16, 2:8}[l] for l in range(num_layers)]
    y_true = [np.zeros((m,grid_shapes[l][0], grid_shapes[l][1], len(anchor_mask[l]), 5+num_classes),
                       dtype='float32') for l in range(num_layers)]

    # Expand dim to apply broadcasting.
    anchors = np.expand_dims(anchors, 0)
    anchor_maxes = anchors / 2.
    anchor_mins = -anchor_maxes
    valid_mask = boxes_wh[..., 0] > 0

    for b in range(m):
        # Discard zero rows.
        wh = boxes_wh[b, valid_mask[b]]
        if len(wh) == 0:
            continue
        # Expand dim to apply broadcasting.
        wh = np.expand_dims(wh, -2)
        box_maxes = wh / 2.
        box_mins = -box_maxes

        # IoU between each box and each anchor, both centred at the origin.
        intersect_mins = np.maximum(box_mins, anchor_mins)
        intersect_maxes = np.minimum(box_maxes, anchor_maxes)
        intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.)
        intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
        box_area = wh[..., 0] * wh[..., 1]
        anchor_area = anchors[..., 0] * anchors[..., 1]
        iou = intersect_area / (box_area + anchor_area - intersect_area)

        # Find best anchor for each true box
        best_anchor = np.argmax(iou, axis=-1)

        for t, n in enumerate(best_anchor):
            for l in range(num_layers):
                if n in anchor_mask[l]:
                    i = np.floor(true_boxes[b,t,0]*grid_shapes[l][1]).astype('int32')
                    j = np.floor(true_boxes[b,t,1]*grid_shapes[l][0]).astype('int32')
                    k = anchor_mask[l].index(n)
                    c = true_boxes[b,t, 4].astype('int32')
                    y_true[l][b, j, i, k, 0:4] = true_boxes[b,t, 0:4]
                    y_true[l][b, j, i, k, 4] = 1
                    y_true[l][b, j, i, k, 5+c] = 1

    return y_true

def data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes):
    '''data generator for fit_generator'''
    n = len(annotation_lines)
    # Reshuffle the lines at the start of every epoch.
    i = 0
    while True:
        image_data = []
        box_data = []
        for b in range(batch_size):
            if i == 0:
                np.random.shuffle(annotation_lines)
            image, box = get_random_data(annotation_lines[i], input_shape, random=False)
            image_data.append(image)
            box_data.append(box)
            i = (i+1) % n
        image_data = np.array(image_data)
        box_data = np.array(box_data)
        y_true = preprocess_true_boxes(box_data, input_shape, anchors, num_classes)
        yield [image_data, *y_true], np.zeros(batch_size)

def get_anchors():
    with open('./yolo_anchors.txt', 'r') as f:
        anchors = f.readline()
    anchors = [float(x) for x in anchors.split(',')]
    return np.array(anchors).reshape(-1, 2)

def main():
    is_training = True
    with open('./deal_data.txt', 'r') as f:
        lines = f.readlines()
    np.random.seed(10101)
    np.random.shuffle(lines)
    val_split = 0.1
    num_val = int(len(lines)*val_split)
    num_train = len(lines) - num_val
    if is_training:
        lines = lines[:num_train]
    else:
        lines = lines[num_train:]
    anchors = get_anchors()
    # The generator never ends, so this loop runs until interrupted.
    for data in data_generator(lines, 10, (416, 416), anchors=anchors, num_classes=1):
        print('aaa')

if __name__ == '__main__':
    main()
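data is one batch of data. Take a close look at get_random_data and preprocess_true_boxes; they are written very nicely. The anchors are the nine box sizes obtained earlier by clustering, and they are closely tied to the sizes of the objects to be detected.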
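The core of preprocess_true_boxes is deciding, for each ground-truth box, which anchor and which grid cell are responsible for it. Below is a plain-Python sketch of that decision for a single box. It assumes the nine default YOLOv3 anchors and a 416 x 416 input; `wh_iou`, `assign` and `y_true_shapes` are names invented for this example.

```python
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45), (59, 119),
           (116, 90), (156, 198), (373, 326)]
ANCHOR_MASK = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]   # 13x13, 26x26, 52x52
GRID_SIZES = [13, 26, 52]                          # for a 416x416 input


def wh_iou(box_wh, anchor_wh):
    """IoU of two boxes that share the same centre, so only sizes matter."""
    inter = min(box_wh[0], anchor_wh[0]) * min(box_wh[1], anchor_wh[1])
    union = box_wh[0] * box_wh[1] + anchor_wh[0] * anchor_wh[1] - inter
    return inter / union


def assign(box, input_size=416):
    """box = (left, top, right, bottom) in pixels of the network input.
    Returns (scale index, grid row, grid column, anchor slot)."""
    left, top, right, bottom = box
    wh = (right - left, bottom - top)
    # The anchor whose shape overlaps the box best wins.
    best = max(range(len(ANCHORS)), key=lambda n: wh_iou(wh, ANCHORS[n]))
    # The winning anchor determines which of the three outputs holds the box.
    scale = next(l for l, mask in enumerate(ANCHOR_MASK) if best in mask)
    # Normalized centre, with the same floor division as the original code.
    cx = ((left + right) // 2) / input_size
    cy = ((top + bottom) // 2) / input_size
    grid = GRID_SIZES[scale]
    return scale, int(cy * grid), int(cx * grid), ANCHOR_MASK[scale].index(best)


def y_true_shapes(batch_size, num_classes):
    """Shapes of the three label arrays for one batch."""
    return [(batch_size, g, g, 3, 5 + num_classes) for g in GRID_SIZES]


if __name__ == "__main__":
    print(assign((100, 100, 216, 190)))   # a large box
    print(assign((200, 200, 210, 213)))   # a small box
    print(y_true_shapes(10, 1))
```

Each box therefore sets exactly one cell in exactly one of the three arrays; every other position stays zero, and the loss treats those positions as background.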
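Also note that get_random_data uses max_boxes=20, so if an image in your data can contain more objects than that, raise this value. How the data is processed also affects training, so write your own processing if your case calls for it. I have not read the C++ source; take a look if you are interested.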
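One way to pick a sensible max_boxes is to count the boxes on each line of the annotation file first. The two helpers below are a small sketch with made-up names; they only rely on the line layout described earlier (image path followed by whitespace-separated boxes).

```python
def max_boxes_per_image(lines):
    """Largest number of box entries found on any annotation line."""
    return max((len(line.split()) - 1 for line in lines if line.strip()), default=0)


def images_over_limit(lines, limit=20):
    """Number of images that would lose boxes with the given max_boxes."""
    return sum(1 for line in lines if len(line.split()) - 1 > limit)


# Typical use (assumes ./deal_data.txt from the earlier step exists):
#     with open('./deal_data.txt') as f:
#         lines = f.readlines()
#     print(max_boxes_per_image(lines), images_over_limit(lines, limit=20))
```

If only a handful of images exceed the limit, keeping max_boxes=20 and accepting a few dropped boxes may be a reasonable trade-off, since the label arrays do not grow with max_boxes but the padded box_data does.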