Reproducing YOLOv3 in TensorFlow (based on keras-yolo3)


CrazyStoneZw


Category: TensorFlow

Original link: https://blog.csdn.net/oYouHuo/article/details/82151787


Preface

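I tried several Python reimplementations found online, and keras-yolo3 was the best of them. I wanted to go through the process that produced the weight files released by the YOLOv3 author myself (and my C++ is weak), so I decided to rebuild it step by step. The matrix operations in the loss computation gave me a headache, and the structure of the data fed to the model was not clear to me either. I am writing this post to record the process and sort out my thinking. Corrections are welcome, thanks! (I am on Windows.)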

1. Reproducing the network model

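First, download and build the original darknet code, which is written in C. The YOLOv3 author only provides Linux code, but a Windows port is available: AlexeyAB/darknet
For the setup, see
https://blog.csdn.net/baidu_36669549/article/details/79798587
Once it is configured, build darknet.exe and run the following command. darknet prints the network structure before it runs detection on the test image: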

darknet.exe detector test data/coco.data yolov3.cfg yolov3.weights -i 0 -thresh 0.25 dog.jpg

layer     filters    size              input                output
   0 conv     32  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  32 0.299 BF
   1 conv     64  3 x 3 / 2   416 x 416 x  32   ->   208 x 208 x  64 1.595 BF
   2 conv     32  1 x 1 / 1   208 x 208 x  64   ->   208 x 208 x  32 0.177 BF
   3 conv     64  3 x 3 / 1   208 x 208 x  32   ->   208 x 208 x  64 1.595 BF
   4 Shortcut Layer: 1
   5 conv    128  3 x 3 / 2   208 x 208 x  64   ->   104 x 104 x 128 1.595 BF
   6 conv     64  1 x 1 / 1   104 x 104 x 128   ->   104 x 104 x  64 0.177 BF
   7 conv    128  3 x 3 / 1   104 x 104 x  64   ->   104 x 104 x 128 1.595 BF
   8 Shortcut Layer: 5
   9 conv     64  1 x 1 / 1   104 x 104 x 128   ->   104 x 104 x  64 0.177 BF
  10 conv    128  3 x 3 / 1   104 x 104 x  64   ->   104 x 104 x 128 1.595 BF
  11 Shortcut Layer: 8
  12 conv    256  3 x 3 / 2   104 x 104 x 128   ->    52 x  52 x 256 1.595 BF
  13 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128 0.177 BF
  14 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
  15 Shortcut Layer: 12
  16 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128 0.177 BF
  17 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
  18 Shortcut Layer: 15
  19 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128 0.177 BF
  20 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
  21 Shortcut Layer: 18
  22 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128 0.177 BF
  23 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
  24 Shortcut Layer: 21
  25 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128 0.177 BF
  26 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
  27 Shortcut Layer: 24
  28 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128 0.177 BF
  29 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
  30 Shortcut Layer: 27
  31 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128 0.177 BF
  32 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
  33 Shortcut Layer: 30
  34 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128 0.177 BF
  35 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
  36 Shortcut Layer: 33
  37 conv    512  3 x 3 / 2    52 x  52 x 256   ->    26 x  26 x 512 1.595 BF
  38 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256 0.177 BF
  39 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  40 Shortcut Layer: 37
  41 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256 0.177 BF
  42 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  43 Shortcut Layer: 40
  44 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256 0.177 BF
  45 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  46 Shortcut Layer: 43
  47 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256 0.177 BF
  48 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  49 Shortcut Layer: 46
  50 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256 0.177 BF
  51 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  52 Shortcut Layer: 49
  53 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256 0.177 BF
  54 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  55 Shortcut Layer: 52
  56 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256 0.177 BF
  57 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  58 Shortcut Layer: 55
  59 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256 0.177 BF
  60 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  61 Shortcut Layer: 58
  62 conv   1024  3 x 3 / 2    26 x  26 x 512   ->    13 x  13 x1024 1.595 BF
  63 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512 0.177 BF
  64 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024 1.595 BF
  65 Shortcut Layer: 62
  66 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512 0.177 BF
  67 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024 1.595 BF
  68 Shortcut Layer: 65
  69 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512 0.177 BF
  70 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024 1.595 BF
  71 Shortcut Layer: 68
  72 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512 0.177 BF
  73 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024 1.595 BF
  74 Shortcut Layer: 71
  75 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512 0.177 BF
  76 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024 1.595 BF
  77 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512 0.177 BF
  78 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024 1.595 BF
  79 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512 0.177 BF
  80 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024 1.595 BF
  81 conv    255  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 255 0.088 BF
  82 yolo
  83 route  79
  84 conv    256  1 x 1 / 1    13 x  13 x 512   ->    13 x  13 x 256 0.044 BF
  85 upsample            2x    13 x  13 x 256   ->    26 x  26 x 256
  86 route  85 61
  87 conv    256  1 x 1 / 1    26 x  26 x 768   ->    26 x  26 x 256 0.266 BF
  88 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  89 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256 0.177 BF
  90 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  91 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256 0.177 BF
  92 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  93 conv    255  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 255 0.177 BF
  94 yolo
  95 route  91
  96 conv    128  1 x 1 / 1    26 x  26 x 256   ->    26 x  26 x 128 0.044 BF
  97 upsample            2x    26 x  26 x 128   ->    52 x  52 x 128
  98 route  97 36
  99 conv    128  1 x 1 / 1    52 x  52 x 384   ->    52 x  52 x 128 0.266 BF
 100 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
 101 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128 0.177 BF
 102 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
 103 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128 0.177 BF
 104 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
 105 conv    255  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 255 0.353 BF
 106 yolo
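The shapes in this printout follow from simple arithmetic. The three yolo layers sit on grids of 416/32, 416/16 and 416/8 cells. Each cell predicts 3 anchors x (4 box values + 1 objectness + 80 class scores) = 255 channels. The small sketch below reproduces the three detection shapes. yolo_output_shapes is a hypothetical helper, and it assumes the input size is a multiple of 32:

```python
# Hypothetical helper for illustration: shapes of the three YOLOv3 detection outputs.
def yolo_output_shapes(input_size=416, num_classes=80, anchors_per_scale=3):
    # Assumes input_size is a multiple of 32; the yolo layers sit at strides 32, 16, 8
    strides = (32, 16, 8)
    # Each anchor predicts 4 box values + 1 objectness score + num_classes class scores
    channels = anchors_per_scale * (5 + num_classes)
    return [(input_size // s, input_size // s, channels) for s in strides]


print(yolo_output_shapes())               # [(13, 13, 255), (26, 26, 255), (52, 52, 255)]
print(yolo_output_shapes(num_classes=1))  # [(13, 13, 18), (26, 26, 18), (52, 52, 18)]
```

With num_classes=1, which is the person-only setup used in part 2, each detection output has 18 channels instead of 255.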

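References on the network structure and how it works:
https://www.cnblogs.com/makefile/p/YOLOv3.html
https://blog.csdn.net/chandanyan8568/article/details/81089083
In the printout, "Shortcut Layer" is a residual connection: the output is added to that of the layer whose index follows it. "route" concatenates the outputs of the listed layers along the channel axis; with a single index, it simply forwards that layer's output. "upsample" is 2x upsampling.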
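These three layer types are simple tensor operations. Below is a NumPy sketch using NHWC arrays, which is the layout the TensorFlow code in this post uses; darknet itself stores NCHW. The helper names are hypothetical, and the example reproduces the shapes of layers 84 to 86 from the printout:

```python
import numpy as np


# Hypothetical helpers for illustration, operating on NHWC arrays.
def shortcut(x, earlier):
    # Shortcut Layer: element-wise sum with an earlier layer's output (same shape)
    return x + earlier


def route(*tensors):
    # route: concatenate the listed layers along the channel axis
    return np.concatenate(tensors, axis=-1)


def upsample2x(x):
    # upsample 2x: nearest neighbour, every pixel is repeated along H and W
    return x.repeat(2, axis=1).repeat(2, axis=2)


# Layers 84-86: a 13x13x256 map is upsampled, then routed with layer 61 (26x26x512)
layer84 = np.zeros((1, 13, 13, 256), dtype=np.float32)
layer61 = np.zeros((1, 26, 26, 512), dtype=np.float32)
layer86 = route(upsample2x(layer84), layer61)
print(layer86.shape)  # (1, 26, 26, 768)
```

The 768 channels match the input of layer 87 in the printout.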
The corresponding TensorFlow code for the network:

import tensorflow as tf
import tensorflow.contrib.slim as slim  # TF-Slim, TensorFlow 1.x


def leaky(x):
    # darknet's leaky ReLU slope is 0.1; tf.nn.leaky_relu defaults to alpha=0.2
    return tf.nn.leaky_relu(x, alpha=0.1)


def upsample(x):
    # darknet's "upsample 2x" is a nearest-neighbour resize without weights
    return tf.image.resize_nearest_neighbor(x, tf.shape(x)[1:3] * 2)


def yolo_body(images, num_classes=80):
    # darknet also applies batch normalization after every conv except the three
    # detection convs; like the original version of this code, that is omitted here.
    out_filters = 3 * (5 + num_classes)
    with tf.variable_scope('yolo'):
        with slim.arg_scope([slim.conv2d], activation_fn=leaky,
                            weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
                            weights_regularizer=slim.l2_regularizer(0.0005)):
            net = slim.conv2d(images, 32, 3, scope='conv_1')
            first_layer = slim.conv2d(net, 64, 3, 2, scope='conv_2')
            net = slim.conv2d(first_layer, 32, 1, scope='conv_3')
            net = slim.conv2d(net, 64, 3, scope='conv_4')
            # darknet shortcut layers are linear: a plain sum, no activation
            net = tf.add(net, first_layer)
            second_layer = slim.conv2d(net, 128, 3, 2, scope='conv_5')
            for i in range(2):
                net = slim.conv2d(second_layer, 64, 1, scope='conv_%d' % (6 + i * 2))
                net = slim.conv2d(net, 128, 3, scope='conv_%d' % (7 + i * 2))
                second_layer = tf.add(net, second_layer)
            third_layer = slim.conv2d(second_layer, 256, 3, 2, scope='conv_10')
            for i in range(8):
                net = slim.conv2d(third_layer, 128, 1, scope='conv_%d' % (11 + i * 2))
                net = slim.conv2d(net, 256, 3, scope='conv_%d' % (12 + i * 2))
                third_layer = tf.add(net, third_layer)  # ends as layer 36: 52x52x256
            fourth_layer = slim.conv2d(third_layer, 512, 3, 2, scope='conv_27')
            for i in range(8):
                net = slim.conv2d(fourth_layer, 256, 1, scope='conv_%d' % (28 + i * 2))
                net = slim.conv2d(net, 512, 3, scope='conv_%d' % (29 + i * 2))
                fourth_layer = tf.add(net, fourth_layer)  # ends as layer 61: 26x26x512
            fifth_layer = slim.conv2d(fourth_layer, 1024, 3, 2, scope='conv_44')
            for i in range(4):
                net = slim.conv2d(fifth_layer, 512, 1, scope='conv_%d' % (45 + i * 2))
                net = slim.conv2d(net, 1024, 3, scope='conv_%d' % (46 + i * 2))
                fifth_layer = tf.add(net, fifth_layer)
            net = slim.conv2d(fifth_layer, 512, 1, scope='conv_53')
            net = slim.conv2d(net, 1024, 3, scope='conv_54')
            net = slim.conv2d(net, 512, 1, scope='conv_55')
            net = slim.conv2d(net, 1024, 3, scope='conv_56')
            scale_one = slim.conv2d(net, 512, 1, scope='conv_57')  # layer 79
            net = slim.conv2d(scale_one, 1024, 3, scope='conv_58')
            # Detection convs (layers 81, 93, 105) are 1x1 with a linear output
            detection_one = slim.conv2d(net, out_filters, 1, activation_fn=None, scope='conv_59')

            # Layers 83-86: route 79, 1x1 conv, upsample, route with layer 61
            scale_two = slim.conv2d(scale_one, 256, 1, scope='conv_60')
            scale_two = upsample(scale_two)
            net = tf.concat([scale_two, fourth_layer], axis=3)
            net = slim.conv2d(net, 256, 1, scope='conv_61')
            net = slim.conv2d(net, 512, 3, scope='conv_62')
            net = slim.conv2d(net, 256, 1, scope='conv_63')
            net = slim.conv2d(net, 512, 3, scope='conv_64')
            scale_two = slim.conv2d(net, 256, 1, scope='conv_65')  # layer 91
            net = slim.conv2d(scale_two, 512, 3, scope='conv_66')
            detection_two = slim.conv2d(net, out_filters, 1, activation_fn=None, scope='conv_67')

            # Layers 95-98: route 91, 1x1 conv, upsample, route with layer 36
            scale_three = slim.conv2d(scale_two, 128, 1, scope='conv_68')
            scale_three = upsample(scale_three)
            net = tf.concat([scale_three, third_layer], axis=3)
            net = slim.conv2d(net, 128, 1, scope='conv_69')
            net = slim.conv2d(net, 256, 3, scope='conv_70')
            net = slim.conv2d(net, 128, 1, scope='conv_71')
            net = slim.conv2d(net, 256, 3, scope='conv_72')
            net = slim.conv2d(net, 128, 1, scope='conv_73')
            net = slim.conv2d(net, 256, 3, scope='conv_74')
            detection_three = slim.conv2d(net, out_filters, 1, activation_fn=None, scope='conv_75')

    return detection_one, detection_two, detection_three

Once the model is defined, we still need to compute the loss. For now, this is mostly the code from the keras-yolo3 project.
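The loss compares the network's raw outputs with ground truth encoded the same way. For one box, raw_true_xy is the position of the box centre inside its grid cell, and raw_true_wh is the log of the box size relative to its anchor. yolo_head in keras-yolo3 applies the inverse: the sigmoid of the xy outputs plus the cell index, and the exp of the wh outputs times the anchor. The NumPy sketch below handles a single box; encode_box and decode_box are hypothetical helpers, not part of keras-yolo3:

```python
import numpy as np


# Hypothetical helper: encode one box the way yolo_loss builds raw_true_xy / raw_true_wh.
def encode_box(box_xywh, anchor_wh, grid_hw, input_hw):
    # box_xywh: centre x, centre y, width, height, all relative to the input (0..1)
    x, y, w, h = box_xywh
    grid_h, grid_w = grid_hw
    input_h, input_w = input_hw
    # The cell that contains the centre (same floor() as preprocess_true_boxes)
    i, j = int(np.floor(x * grid_w)), int(np.floor(y * grid_h))
    # raw_true_xy: offset of the centre inside that cell, in [0, 1)
    txy = (x * grid_w - i, y * grid_h - j)
    # raw_true_wh: log of the box size (in input pixels) divided by the anchor size
    twh = (np.log(w * input_w / anchor_wh[0]), np.log(h * input_h / anchor_wh[1]))
    return (j, i), txy, twh


# Hypothetical helper: the inverse mapping. yolo_head does the same, except that it
# first applies a sigmoid to the raw xy outputs, because the network predicts logits.
def decode_box(cell_ji, txy, twh, anchor_wh, grid_hw, input_hw):
    j, i = cell_ji
    x = (i + txy[0]) / grid_hw[1]
    y = (j + txy[1]) / grid_hw[0]
    w = anchor_wh[0] * np.exp(twh[0]) / input_hw[1]
    h = anchor_wh[1] * np.exp(twh[1]) / input_hw[0]
    return x, y, w, h


cell, txy, twh = encode_box((0.5, 0.25, 0.2, 0.3), (116, 90), (13, 13), (416, 416))
print(cell, txy)  # (3, 6) (0.5, 0.25)
```

Because xy is a bounded offset, the loss treats it as a binary cross-entropy target on logits. wh is unbounded, so it uses a squared error instead.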

import numpy as np
import tensorflow as tf
from keras import backend as K
# yolo_head and box_iou are used unchanged from keras-yolo3 (yolo3/model.py)
# and must be imported or copied from there.


def yolo_loss(feats, num_classes, y_true, ignore_thresh=.5):
    # feats: the three outputs of yolo_body (13x13, 26x26, 52x52 for a 416x416 input)
    # y_true = [Input(shape=(416 // {0: 32, 1: 16, 2: 8}[l], 416 // {0: 32, 1: 16, 2: 8}[l], \
    #                        9 // 3, num_classes + 5)) for l in range(3)]

    loss = 0
    m = K.shape(feats[0])[0]  # batch size, tensor
    mf = K.cast(m, K.dtype(feats[0]))
    grid_shapes = [K.cast(K.shape(feats[l])[1:3], K.dtype(y_true[0])) for l in range(3)]
    input_shape = K.cast(K.shape(feats[0])[1:3] * 32, K.dtype(y_true[0]))
    # feats[0] is the 13x13 output, so it gets the three LARGEST anchors; this matches
    # anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] in preprocess_true_boxes.
    # The values must be the same anchors used to build y_true (yolo_anchors.txt).
    anchors = np.array([[[116, 90], [156, 198], [373, 326]],
                        [[30, 61], [62, 45], [59, 119]],
                        [[10, 13], [16, 30], [33, 23]]], dtype='float32')
    for i in range(3):
        object_mask = y_true[i][..., 4:5]
        true_class_probs = y_true[i][..., 5:]
        # Decoded boxes predicted on the 13x13, 26x26 and 52x52 grids
        grid, raw_pred, pred_xy, pred_wh = yolo_head(feats[i], anchors[i], num_classes,
                                                     input_shape, calc_loss=True)
        pred_box = tf.concat([pred_xy, pred_wh], axis=-1)

        # Darknet raw box to calculate loss.
        raw_true_xy = y_true[i][..., :2] * grid_shapes[i][::-1] - grid
        raw_true_wh = K.log(y_true[i][..., 2:4] / anchors[i] * input_shape[::-1])
        raw_true_wh = K.switch(object_mask, raw_true_wh, K.zeros_like(raw_true_wh))  # avoid log(0)=-inf
        box_loss_scale = 2 - y_true[i][..., 2:3] * y_true[i][..., 3:4]

        # Find ignore mask, iterate over each image of the batch.
        ignore_mask = tf.TensorArray(K.dtype(y_true[0]), size=1, dynamic_size=True)
        object_mask_bool = K.cast(object_mask, 'bool')

        def loop_body(b, ignore_mask):
            true_box = tf.boolean_mask(y_true[i][b, ..., 0:4], object_mask_bool[b, ..., 0])
            iou = box_iou(pred_box[b], true_box)
            best_iou = tf.reduce_max(iou, axis=-1, keepdims=False)
            ignore_mask = ignore_mask.write(b, tf.cast(best_iou < ignore_thresh, true_box.dtype))
            return b + 1, ignore_mask

        _, ignore_mask = tf.while_loop(lambda b, *args: b < m, loop_body, [0, ignore_mask])
        ignore_mask = ignore_mask.stack()
        ignore_mask = K.expand_dims(ignore_mask, -1)

        # K.binary_crossentropy is helpful to avoid exp overflow.
        xy_loss = object_mask * box_loss_scale * K.binary_crossentropy(raw_true_xy, raw_pred[..., 0:2],
                                                                       from_logits=True)
        wh_loss = object_mask * box_loss_scale * 0.5 * K.square(raw_true_wh - raw_pred[..., 2:4])
        confidence_loss = object_mask * K.binary_crossentropy(object_mask, raw_pred[..., 4:5], from_logits=True) + \
                          (1 - object_mask) * K.binary_crossentropy(object_mask, raw_pred[..., 4:5],
                                                                    from_logits=True) * ignore_mask
        class_loss = object_mask * K.binary_crossentropy(true_class_probs, raw_pred[..., 5:], from_logits=True)

        xy_loss = K.sum(xy_loss) / mf
        wh_loss = K.sum(wh_loss) / mf
        confidence_loss = K.sum(confidence_loss) / mf
        class_loss = K.sum(class_loss) / mf
        loss += xy_loss + wh_loss + confidence_loss + class_loss
        loss = tf.Print(loss, [loss, xy_loss, wh_loss, confidence_loss, class_loss, K.sum(ignore_mask)],
                        message='loss: ')
    return loss
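The while_loop in yolo_loss builds the ignore mask. If a prediction's best IoU with any ground-truth box in the same image reaches ignore_thresh, that prediction is not penalised in the no-object part of confidence_loss. Below is a NumPy sketch of the same logic for one image, with boxes flattened to (N, 4) arrays of centre x, y, w, h. xywh_iou and ignore_mask are hypothetical stand-ins for keras-yolo3's box_iou and the loop body:

```python
import numpy as np


# Hypothetical stand-in for box_iou: IoU of every predicted box with every true box.
def xywh_iou(pred, true):
    # pred: (N, 4), true: (M, 4), both as centre x, y, w, h -> result (N, M)
    p = pred[:, None, :]  # (N, 1, 4) for broadcasting
    t = true[None, :, :]  # (1, M, 4)
    p_min, p_max = p[..., :2] - p[..., 2:4] / 2, p[..., :2] + p[..., 2:4] / 2
    t_min, t_max = t[..., :2] - t[..., 2:4] / 2, t[..., :2] + t[..., 2:4] / 2
    inter_wh = np.maximum(np.minimum(p_max, t_max) - np.maximum(p_min, t_min), 0.0)
    inter = inter_wh[..., 0] * inter_wh[..., 1]
    union = p[..., 2] * p[..., 3] + t[..., 2] * t[..., 3] - inter
    return inter / union


# Hypothetical helper: 1 = keep the background confidence loss, 0 = ignore it.
def ignore_mask(pred_boxes, true_boxes, ignore_thresh=0.5):
    if len(true_boxes) == 0:
        # No objects in this image: every prediction is treated as background
        return np.ones(len(pred_boxes), dtype=np.float32)
    best_iou = xywh_iou(pred_boxes, true_boxes).max(axis=-1)
    return (best_iou < ignore_thresh).astype(np.float32)


pred = np.array([[0.5, 0.5, 0.2, 0.2], [0.1, 0.1, 0.1, 0.1]])
true = np.array([[0.5, 0.5, 0.2, 0.2]])
print(ignore_mask(pred, true))  # [0. 1.]
```

The first prediction overlaps the true box almost perfectly, so it is not pushed toward "no object", even if it is not in the cell and anchor that y_true assigned to that box.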

Once the loss is defined, you can define an optimizer and train the model.

2. Building the input data

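Install the COCO dataset and its Python API (pycocotools); see: https://blog.csdn.net/oYouHuo/article/details/81114875
After installing and testing it, process the dataset with the following code. Since I only detect people, there is only one class: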

from pycocotools.coco import COCO
dataType = 'train2017'
annFile = './annotations/instances_{}.json'.format(dataType)


def deal_data():
    coco = COCO(annFile)
    cat_ids = coco.getCatIds(catNms=['person'])
    img_ids = coco.getImgIds(catIds=cat_ids)
    with open('./deal_data.txt', 'w') as f:
        for img_id in img_ids:
            img = coco.loadImgs(img_id)[0]
            f.write('./images/{}/{}\t'.format(dataType, img['file_name']))
            annIds = coco.getAnnIds(imgIds=img['id'], catIds=cat_ids, iscrowd=None)
            anns = coco.loadAnns(annIds)
            for ann in anns:
                f.write('{},{},{},{}'.format(ann['bbox'][0], ann['bbox'][1], ann['bbox'][0] + ann['bbox'][2], ann['bbox'][1] + ann['bbox'][3]))
                for index in range(len(cat_ids)):
                    if ann['category_id'] == cat_ids[index]:
                        f.write(',{}'.format(index))
                        break
                f.write('\t')
            f.write('\n')


def main():
    deal_data()


if __name__ == '__main__':
    main()

Running the code above produces the preprocessed annotation file deal_data.txt, which looks like this:
[Figure: sample lines of deal_data.txt]
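Note: each box in the figure above should be left, top, right, bottom. At first I wrongly wrote x, y, width, height, and I did not bother replacing the screenshot, so your output will differ from it.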
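The mistake described in this note is just a missing conversion. COCO stores each bbox as [x, y, width, height], where (x, y) is the top-left corner. The sketch below builds one annotation line in the left, top, right, bottom format that deal_data.txt needs. coco_bbox_to_corners and format_line are hypothetical helpers, not part of pycocotools:

```python
# Hypothetical helper: COCO [x, y, width, height] -> left, top, right, bottom.
def coco_bbox_to_corners(bbox):
    x, y, w, h = bbox
    return x, y, x + w, y + h


# Hypothetical helper: one deal_data.txt line = image path, then one
# "left,top,right,bottom,class" field per box, separated by tabs.
def format_line(image_path, bboxes, class_ids):
    fields = [image_path]
    for bbox, cls in zip(bboxes, class_ids):
        fields.append('{},{},{},{},{}'.format(*coco_bbox_to_corners(bbox), cls))
    return '\t'.join(fields)


print(format_line('a.jpg', [[10.0, 20.0, 30.0, 40.0]], [0]))  # a.jpg<TAB>10.0,20.0,40.0,60.0,0
```

deal_data additionally writes a trailing tab, which is harmless because get_random_data splits each line on whitespace.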
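After this initial processing, the data still has to be converted into the target structure YOLO expects. There are three scales (13, 26 and 52) with 3 anchors each, so the targets have shapes (batch_size, 13, 13, 3, 5+classes_size), (batch_size, 26, 26, 3, 5+classes_size) and (batch_size, 52, 52, 3, 5+classes_size). The following code does this (the keras-yolo3 author's code is excellent, so I simply reused it):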

import numpy as np
from PIL import Image
from matplotlib.colors import rgb_to_hsv, hsv_to_rgb


def rand(a=0, b=1):
    return np.random.rand()*(b-a) + a


def get_random_data(annotation_line, input_shape, random=True, max_boxes=20, jitter=.3, hue=.1, sat=1.5, val=1.5, proc_img=True):
    '''random preprocessing for real-time data augmentation'''
    line = annotation_line.split()
    image = Image.open(line[0])
    iw, ih = image.size
    h, w = input_shape
    box = np.array([np.array(list(map(float,box.split(',')))) for box in line[1:]])
    box = np.floor(box)
    box = box.astype(np.int16)

    if not random:
        # resize image
        scale = min(w/iw, h/ih)
        nw = int(iw*scale)
        nh = int(ih*scale)
        dx = (w-nw)//2
        dy = (h-nh)//2
        image_data=0
        if proc_img:
            image = image.resize((nw,nh), Image.BICUBIC)
            new_image = Image.new('RGB', (w,h), (128,128,128))
            new_image.paste(image, (dx, dy))
            image_data = np.array(new_image)/255.

        # correct boxes
        box_data = np.zeros((max_boxes,5))
        if len(box)>0:
            np.random.shuffle(box)
            if len(box)>max_boxes: box = box[:max_boxes]
            box[:, [0,2]] = box[:, [0,2]]*scale + dx
            box[:, [1,3]] = box[:, [1,3]]*scale + dy
            box_data[:len(box)] = box

        return image_data, box_data

    # resize image
    new_ar = w/h * rand(1-jitter,1+jitter)/rand(1-jitter,1+jitter)
    scale = rand(.25, 2)
    if new_ar < 1:
        nh = int(scale*h)
        nw = int(nh*new_ar)
    else:
        nw = int(scale*w)
        nh = int(nw/new_ar)
    image = image.resize((nw,nh), Image.BICUBIC)

    # place image
    dx = int(rand(0, w-nw))
    dy = int(rand(0, h-nh))
    new_image = Image.new('RGB', (w,h), (128,128,128))
    new_image.paste(image, (dx, dy))
    image = new_image

    # flip image or not
    flip = rand()<.5
    if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT)

    # distort image
    hue = rand(-hue, hue)
    sat = rand(1, sat) if rand()<.5 else 1/rand(1, sat)
    val = rand(1, val) if rand()<.5 else 1/rand(1, val)
    x = rgb_to_hsv(np.array(image)/255.)
    x[..., 0] += hue
    x[..., 0][x[..., 0]>1] -= 1
    x[..., 0][x[..., 0]<0] += 1
    x[..., 1] *= sat
    x[..., 2] *= val
    x[x>1] = 1
    x[x<0] = 0
    image_data = hsv_to_rgb(x) # numpy array, 0 to 1

    # correct boxes
    box_data = np.zeros((max_boxes,5))
    if len(box)>0:
        np.random.shuffle(box)
        box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx
        box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy
        if flip: box[:, [0,2]] = w - box[:, [2,0]]
        box[:, 0:2][box[:, 0:2]<0] = 0
        box[:, 2][box[:, 2]>w] = w
        box[:, 3][box[:, 3]>h] = h
        box_w = box[:, 2] - box[:, 0]
        box_h = box[:, 3] - box[:, 1]
        box = box[np.logical_and(box_w>1, box_h>1)] # discard invalid box
        if len(box)>max_boxes: box = box[:max_boxes]
        box_data[:len(box)] = box

    return image_data, box_data


def preprocess_true_boxes(true_boxes, input_shape, anchors, num_classes):
    '''Preprocess true boxes to training input format

    Parameters
    ----------
    true_boxes: array, shape=(m, T, 5)
        Absolute x_min, y_min, x_max, y_max, class_id relative to input_shape.
    input_shape: array-like, hw, multiples of 32
    anchors: array, shape=(N, 2), wh
    num_classes: integer

    Returns
    -------
    y_true: list of array, shape like yolo_outputs, xywh are reletive value

    '''
    assert (true_boxes[..., 4]<num_classes).all(), 'class id must be less than num_classes'
    num_layers = len(anchors)//3 # default setting
    anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]]

    true_boxes = np.array(true_boxes, dtype='float32')
    input_shape = np.array(input_shape, dtype='int32')

    boxes_xy = (true_boxes[..., 0:2] + true_boxes[..., 2:4]) // 2
    boxes_wh = true_boxes[..., 2:4] - true_boxes[..., 0:2]
    true_boxes[..., 0:2] = boxes_xy/input_shape[::-1]
    true_boxes[..., 2:4] = boxes_wh/input_shape[::-1]

    m = true_boxes.shape[0]
    grid_shapes = [input_shape//{0:32, 1:16, 2:8}[l] for l in range(num_layers)]
    y_true = [np.zeros((m,grid_shapes[l][0], grid_shapes[l][1], len(anchor_mask[l]), 5+num_classes),
        dtype='float32') for l in range(num_layers)]

    # Expand dim to apply broadcasting.
    anchors = np.expand_dims(anchors, 0)
    anchor_maxes = anchors / 2.
    anchor_mins = -anchor_maxes
    valid_mask = boxes_wh[..., 0] > 0

    for b in range(m):
        # Discard zero rows.
        wh = boxes_wh[b, valid_mask[b]]
        if len(wh) == 0:
            continue
        # Expand dim to apply broadcasting.
        wh = np.expand_dims(wh, -2)
        box_maxes = wh / 2.
        box_mins = -box_maxes

        intersect_mins = np.maximum(box_mins, anchor_mins)
        intersect_maxes = np.minimum(box_maxes, anchor_maxes)
        intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.)
        intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
        box_area = wh[..., 0] * wh[..., 1]
        anchor_area = anchors[..., 0] * anchors[..., 1]
        iou = intersect_area / (box_area + anchor_area - intersect_area)

        # Find best anchor for each true box
        best_anchor = np.argmax(iou, axis=-1)

        for t, n in enumerate(best_anchor):
            for l in range(num_layers):
                if n in anchor_mask[l]:
                    i = np.floor(true_boxes[b,t,0]*grid_shapes[l][1]).astype('int32')
                    j = np.floor(true_boxes[b,t,1]*grid_shapes[l][0]).astype('int32')
                    k = anchor_mask[l].index(n)
                    c = true_boxes[b,t, 4].astype('int32')
                    y_true[l][b, j, i, k, 0:4] = true_boxes[b,t, 0:4]
                    y_true[l][b, j, i, k, 4] = 1
                    y_true[l][b, j, i, k, 5+c] = 1

    return y_true


# '''data generator for fit_generator'''
def data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes):
    n = len(annotation_lines)
    # Reshuffle the annotation lines whenever the index wraps to 0, i.e. once per epoch
    i = 0
    while True:
        image_data = []
        box_data = []
        for b in range(batch_size):
            if i == 0:
                np.random.shuffle(annotation_lines)
            image, box = get_random_data(annotation_lines[i], input_shape, random=False)
            image_data.append(image)
            box_data.append(box)
            i = (i+1) % n
        image_data = np.array(image_data)
        box_data = np.array(box_data)
        y_true = preprocess_true_boxes(box_data, input_shape, anchors, num_classes)
        yield [image_data, *y_true], np.zeros(batch_size)


def get_anchors():
    with open('./yolo_anchors.txt', 'r') as f:
        anchors = f.readline()
    anchors = [float(x) for x in anchors.split(',')]
    return np.array(anchors).reshape(-1, 2)


def main():
    is_training = True
    with open('./deal_data.txt', 'r') as f:
        lines = f.readlines()
    np.random.seed(10101)
    np.random.shuffle(lines)
    val_split = 0.1
    num_val = int(len(lines)*val_split)
    num_train = len(lines) - num_val
    if is_training:
        lines = lines[:num_train]
    else:
        lines = lines[num_train:]
    anchors = get_anchors()
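    for data, _ in data_generator(lines, 10, (416, 416), anchors=anchors, num_classes=1):
        # data = [image_data, y_true_13, y_true_26, y_true_52]: show the shapes of one batch
        print([x.shape for x in data])
        break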


if __name__ == '__main__':
    main()
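To make the anchor matching in preprocess_true_boxes concrete, here is a simplified single-box version. It assumes the nine standard COCO anchors that appear in yolo_loss; your yolo_anchors.txt may differ. Unlike preprocess_true_boxes, it computes the centre with / 2 rather than // 2. assign_box is a hypothetical helper:

```python
import numpy as np

# Assumption: the nine YOLOv3 COCO anchors (w, h), sorted small to large
ANCHORS = np.array([[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
                    [59, 119], [116, 90], [156, 198], [373, 326]], dtype=np.float32)
ANCHOR_MASK = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]  # layer 0 = 13x13, 1 = 26x26, 2 = 52x52
STRIDES = [32, 16, 8]


# Hypothetical helper: where one box (left, top, right, bottom, input pixels) lands in y_true.
def assign_box(box, input_hw=(416, 416)):
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    # IoU with each anchor, both centred at the origin, so only width and height matter
    inter = np.minimum(w, ANCHORS[:, 0]) * np.minimum(h, ANCHORS[:, 1])
    iou = inter / (w * h + ANCHORS[:, 0] * ANCHORS[:, 1] - inter)
    n = int(np.argmax(iou))  # best anchor over all nine
    for layer, mask in enumerate(ANCHOR_MASK):
        if n in mask:
            grid_h, grid_w = input_hw[0] // STRIDES[layer], input_hw[1] // STRIDES[layer]
            cx, cy = (x1 + x2) / 2 / input_hw[1], (y1 + y2) / 2 / input_hw[0]
            # (layer, row j, column i, anchor slot k), as in y_true[l][b, j, i, k]
            return layer, int(np.floor(cy * grid_h)), int(np.floor(cx * grid_w)), mask.index(n)


print(assign_box((193, 74, 223, 134)))  # (1, 6, 13, 0)
```

A 30x60 box best matches anchor [30, 61], which belongs to the 26x26 layer. Each box is written to exactly one layer, cell and anchor slot. This is also why the anchor order in yolo_loss must agree with anchor_mask.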

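Each data yielded by data_generator is one batch. Pay particular attention to get_random_data and preprocess_true_boxes; they are very nicely done. anchors are the nine box sizes obtained beforehand by clustering the ground-truth box sizes, and they are closely tied to the sizes of the objects you want to detect.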
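The anchors in yolo_anchors.txt come from clustering the (width, height) of the ground-truth boxes. Below is a minimal k-means sketch that uses 1 - IoU as the distance, the approach described by the YOLO authors. keras-yolo3 ships its own clustering script, which may differ in details such as how the centres are updated. wh_iou and kmeans_anchors are hypothetical helpers:

```python
import numpy as np


# Hypothetical helper: IoU between (w, h) pairs, all boxes aligned at the same corner.
def wh_iou(boxes, centroids):
    # boxes: (N, 2), centroids: (K, 2) -> (N, K)
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)


# Hypothetical helper: k-means with distance 1 - IoU; returns k anchors sorted by area.
def kmeans_anchors(boxes_wh, k=9, iters=100, seed=0):
    boxes_wh = np.asarray(boxes_wh, dtype=np.float64)
    rng = np.random.default_rng(seed)
    centroids = boxes_wh[rng.choice(len(boxes_wh), k, replace=False)]
    for _ in range(iters):
        nearest = np.argmax(wh_iou(boxes_wh, centroids), axis=1)  # max IoU = min distance
        # Mean of each cluster; an empty cluster keeps its previous centre
        new = np.array([boxes_wh[nearest == c].mean(axis=0) if np.any(nearest == c)
                        else centroids[c] for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids[np.argsort(centroids[:, 0] * centroids[:, 1])]


boxes = [[10, 10], [11, 9], [9, 11], [100, 200], [110, 190], [90, 210]]
print(kmeans_anchors(boxes, k=2))  # [[ 10.  10.] [100. 200.]]
```

In practice you would pass the right - left and bottom - top of every box in deal_data.txt with k=9. You would then write the result as one comma-separated line, which is the format get_anchors reads.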
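Also, get_random_data defaults to max_boxes=20. If a single image in your data can contain more targets than that, increase it. How the data is processed also affects training, so feel free to write your own processing where needed. I have not read the darknet source code; take a look if you are interested.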
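To choose max_boxes, you can count the boxes on each line of deal_data.txt. The sketch below relies on the same whitespace-separated format that get_random_data parses. max_boxes_per_image is a hypothetical helper:

```python
# Hypothetical helper: the largest number of boxes on any annotation line.
def max_boxes_per_image(lines):
    # Each line: image path followed by one "left,top,right,bottom,class" field per box
    return max((len(line.split()) - 1 for line in lines), default=0)


sample = ['a.jpg\t1,2,3,4,0\t5,6,7,8,0\t\n', 'b.jpg\t1,2,3,4,0\t\n']
print(max_boxes_per_image(sample))  # 2
```

An open file object for deal_data.txt can be passed directly, since iterating over a file yields its lines. If max_boxes is smaller than this count, get_random_data silently keeps only the first max_boxes boxes after shuffling.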