Darknet - How to improve object detection?



1. Before training

  • Set the flag random=1 in your .cfg file: it increases precision by training YOLO at different resolutions.

  • Increase the network resolution in your .cfg file (height=608, width=608, or any other multiple of 32): it increases precision.

  • Check that every object you want to detect is labeled in your dataset: no object in your dataset should be left without a label. Most training problems are caused by wrong labels in the dataset (labels obtained with a conversion script, marked with a third-party tool, …). Always check your dataset with: https://github.com/AlexeyAB/Yolo_mark

  • My loss is very high and mAP is very low; is training wrong? Run training with the -show_imgs flag at the end of the training command. Do you see correct bounding boxes around the objects (in windows or in the files aug_...jpg)? If not, your training dataset is wrong.

  • For each object you want to detect, there must be at least one similar object in the training dataset with about the same shape, side of the object, relative size, angle of rotation, tilt and illumination. So it is desirable that your training dataset include images with objects at different scales, rotations and lightings, seen from different sides and on different backgrounds. You should preferably have 2000 or more different images for each class, and you should train for 2000*classes iterations or more.

  • It is desirable that your training dataset include images with unlabeled objects that you do not want to detect: negative samples without bounding boxes (empty .txt files). Use as many images of negative samples as there are images with objects.

  • What is the best way to mark objects: label only the visible part of the object, label the visible and the overlapped part, or label a little more than the entire object (with a small gap)? Mark them however you like, i.e. the way you want them to be detected.

  • For training with a large number of objects in each image, add the parameter max=200 or a higher value in the last [yolo] layer or [region] layer of your cfg file. (The global maximum number of objects that YOLOv3 can detect is 0.0615234375*(width*height), where width and height are the parameters from the [net] section of the cfg file.)

  • For training on small objects (smaller than 16x16 after the image is resized to 416x416), set layers = -1, 11 instead of layers = -1, 36 (darknet/cfg/yolov3.cfg - L720), and set stride=4 instead of stride=2 (darknet/cfg/yolov3.cfg - L717).

  • For training on both small and large objects, use modified models:
    Full-model: 5 yolo layers: AlexeyAB/darknet/master/cfg/yolov3_5l.cfg
    Tiny-model: 3 yolo layers: AlexeyAB/darknet/master/cfg/yolov3-tiny_3l.cfg
    Spatial-full-model: 3 yolo layers: AlexeyAB/darknet/master/cfg/yolov3-spp.cfg

  • If you train the model to distinguish left and right objects as separate classes (left/right hand, left/right turn on road signs, …), disable flip data augmentation by adding flip=0 here: darknet/cfg/yolov3.cfg - L17

  • General rule: your training dataset should include the range of relative object sizes that you want to detect:

train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width
train_network_height * train_obj_height / train_image_height ~= detection_network_height * detection_obj_height / detection_image_height

I.e. for each object in the test dataset, there must be at least one object in the training dataset with the same class_id and about the same relative size:

object width in percent from Training dataset ~= object width in percent from Test dataset

That is, if the training set contains only objects that occupy 80-90% of the image, the trained network will not be able to detect objects that occupy 1-10% of the image.

  • To speed up training (at the cost of detection accuracy), do fine-tuning instead of transfer learning: set the parameter stopbackward=1 here: darknet/cfg/yolov3.cfg - L548, then run the command ./darknet partial cfg/yolov3.cfg yolov3.weights yolov3.conv.81 81, which creates the file yolov3.conv.81. Then train using the weights file yolov3.conv.81 instead of darknet53.conv.74.

  • Each model of an object, each side, illumination and scale, and each 30 degrees of turn and inclination angle are different objects from the internal perspective of the neural network. So the more different objects you want to detect, the more complex the network model you should use.

  • To make the detected bounding boxes more accurate, you can add 3 parameters, ignore_thresh = .9 iou_normalizer=0.5 iou_loss=giou, to each [yolo] layer and train. This increases mAP@0.9 but decreases mAP@0.5.

  • Only if you are an expert in neural detection networks: recalculate the anchors of your dataset for the width and height from your cfg file with darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416, then set the same 9 anchors in each of the 3 [yolo] layers of your cfg file. You should, however, change the anchor indexes in mask= for each [yolo] layer, so that the 1st [yolo] layer has the anchors larger than 60x60, the 2nd those larger than 30x30, and the 3rd the remaining ones. You should also change filters=(classes + 5)*<number of mask> before each [yolo] layer. If many of the calculated anchors do not fit under the appropriate layers, just try using all the default anchors.
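The labeling bullets above attribute most training problems to wrong labels. As a rough complement to checking with Yolo_mark, here is a minimal Python sketch that scans label files. It assumes the usual Darknet YOLO text format (one "<class_id> <x_center> <y_center> <width> <height>" line per object, coordinates relative to the image size); the function names and the EPS tolerance are illustrative, not part of darknet.

```python
from pathlib import Path

EPS = 1e-6  # assumed tolerance for rounding in label files written by tools


def check_label_lines(lines, num_classes):
    """Return the problems found in the lines of one YOLO label file.

    Assumes the Darknet YOLO format: "<class_id> <x_center> <y_center> <width> <height>",
    with coordinates relative to the image. An empty file (negative sample) is valid.
    """
    problems = []
    for n, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines describe no object
        parts = line.split()
        if len(parts) != 5:
            problems.append(f"line {n}: expected 5 fields, got {len(parts)}")
            continue
        try:
            class_id = int(parts[0])
            x, y, w, h = (float(p) for p in parts[1:])
        except ValueError:
            problems.append(f"line {n}: cannot parse fields")
            continue
        if not 0 <= class_id < num_classes:
            problems.append(f"line {n}: class_id {class_id} out of range 0..{num_classes - 1}")
        if not (0 < w <= 1 + EPS and 0 < h <= 1 + EPS):
            problems.append(f"line {n}: width/height not in (0, 1]")
        elif (x - w / 2 < -EPS or x + w / 2 > 1 + EPS
              or y - h / 2 < -EPS or y + h / 2 > 1 + EPS):
            # The box edges should stay inside the image.
            problems.append(f"line {n}: box extends outside the image")
    return problems


def check_label_dir(label_dir, num_classes):
    """Map each *.txt file in label_dir to its problems (only files with problems).

    Point it at a folder holding only label files: list files such as train.txt
    would otherwise be checked as labels too.
    """
    report = {}
    for path in sorted(Path(label_dir).glob("*.txt")):
        problems = check_label_lines(path.read_text().splitlines(), num_classes)
        if problems:
            report[path.name] = problems
    return report
```

For example, check_label_dir("data/obj", num_classes=2) would list only the files that need a closer look in Yolo_mark; the folder name is hypothetical.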
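The dataset-size advice above (2000 or more images per class, at least 2000*classes iterations, about as many negative samples as images with objects) can be checked by counting label files. A minimal Python sketch, assuming one YOLO .txt label file per image where an empty file marks a negative sample; dataset_stats and its arguments are hypothetical names.

```python
from collections import Counter


def dataset_stats(label_files, num_classes, images_per_class=2000):
    """Summarize a YOLO dataset given {label_file_name: file_text}.

    Assumptions: an empty (or whitespace-only) file is a negative sample, and a
    class is counted once per image however many instances the image contains.
    """
    images_with_class = Counter()
    negatives = 0
    for text in label_files.values():
        class_ids = {int(line.split()[0]) for line in text.splitlines() if line.strip()}
        if not class_ids:
            negatives += 1
        images_with_class.update(class_ids)
    positives = len(label_files) - negatives
    return {
        "positive_images": positives,
        "negative_images": negatives,
        # The text suggests roughly as many negatives as positives (ratio near 1).
        "negative_ratio": negatives / positives if positives else None,
        # Classes with fewer images than the recommended minimum.
        "underrepresented": sorted(
            c for c in range(num_classes) if images_with_class[c] < images_per_class
        ),
        # Rule of thumb from the text: train for at least 2000 * classes iterations.
        "min_iterations": 2000 * num_classes,
    }
```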
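The constant in the max= bullet can be derived by hand. In the standard YOLOv3 layout each of the 3 [yolo] layers predicts 3 boxes per grid cell, on grids with strides 32, 16 and 8, so the total is 3 * (W*H/32² + W*H/16² + W*H/8²) = (63/1024) * W*H, and 63/1024 = 0.0615234375. The sketch below assumes exactly that layout (3 anchors per layer, strides 32/16/8); cfg files with other heads will differ.

```python
def max_yolo_predictions(width, height, strides=(32, 16, 8), anchors_per_layer=3):
    """Total number of boxes predicted by all [yolo] layers for one image.

    Assumes each [yolo] layer uses a (width // stride) x (height // stride) grid and
    predicts anchors_per_layer boxes per cell, as in the stock YOLOv3 cfg.
    """
    if width % 32 or height % 32:
        raise ValueError("width and height must be multiples of 32")
    return sum(anchors_per_layer * (width // s) * (height // s) for s in strides)


for size in (416, 608):
    # Compare the exact count with the 0.0615234375 * (width * height) formula.
    print(size, max_yolo_predictions(size, size), 0.0615234375 * size * size)
# prints:
# 416 10647 10647.0
# 608 22743 22743.0
```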
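The small-objects bullet and the general rule about relative sizes both rest on the same quantity, network_width * obj_width / image_width, i.e. the object's width in pixels once the image is resized to the network input. A minimal Python sketch (function names are illustrative) computes it, flags objects under 16x16 at 416x416, and compares a training object with a detection-time object. Two readings are assumptions: "smaller than 16x16" is taken as both sides below 16 px, and the "~=" of the general rule is given a hypothetical tolerance, since the text specifies none.

```python
def size_at_network(obj_w, obj_h, image_w, image_h, net_w=416, net_h=416):
    """Object size in network-input pixels after resizing the image to net_w x net_h."""
    return net_w * obj_w / image_w, net_h * obj_h / image_h


def is_small_object(obj_w, obj_h, image_w, image_h, net_w=416, net_h=416, limit=16):
    """True if the object is smaller than limit x limit after resizing.

    "Smaller than 16x16" is read here as both sides below 16 px (an assumption).
    """
    w, h = size_at_network(obj_w, obj_h, image_w, image_h, net_w, net_h)
    return w < limit and h < limit


def relative_sizes_match(train, detect, rel_tol=0.5):
    """Compare (obj_w, obj_h, image_w, image_h, net_w, net_h) tuples, training vs detection.

    rel_tol is a hypothetical tolerance for the "~=" of the general rule.
    """
    tw, th = size_at_network(*train)
    dw, dh = size_at_network(*detect)
    return abs(tw - dw) <= rel_tol * max(tw, dw) and abs(th - dh) <= rel_tol * max(th, dh)
```

For instance, a 40x30 px object in a 1920x1080 frame becomes roughly 8.7x11.6 px at 416x416, so it counts as small; and an object that filled 80% of the training images will not match one that fills 5% at detection time.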
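Why flip=0 matters for left/right classes: a horizontal flip mirrors the image, so in normalized YOLO coordinates the box center x becomes 1 - x while the class_id stays the same. A box labeled "left hand" then surrounds what now looks like a right hand. The Python sketch below shows only that label transformation, assuming the standard normalized YOLO box format; it is an illustration, not darknet's augmentation code, and the class id is hypothetical.

```python
def hflip_yolo_box(class_id, x, y, w, h):
    """Mirror a normalized YOLO box horizontally.

    Only the x center changes; the class label is kept as-is, which is exactly
    what goes wrong when the classes encode left vs right.
    """
    return class_id, 1.0 - x, y, w, h


LEFT_HAND = 0  # hypothetical class id
print(hflip_yolo_box(LEFT_HAND, 0.25, 0.5, 0.1, 0.2))
# prints: (0, 0.75, 0.5, 0.1, 0.2)
```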
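The iou_loss=giou option refers to the Generalized IoU: GIoU = IoU - (area(C) - area(A ∪ B)) / area(C), where C is the smallest box enclosing both boxes A and B. Unlike plain IoU, it still varies when the boxes do not overlap. mAP@0.5 and mAP@0.9 count a detection as correct when its IoU with the ground truth is at least 0.5 or 0.9, which is why tighter boxes raise the latter. A minimal Python sketch of the metric for corner-format boxes (x1, y1, x2, y2); it does not show how darknet weights the loss with iou_normalizer.

```python
def iou_and_giou(a, b):
    """Return (IoU, GIoU) for two boxes (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    # Intersection rectangle (zero if the boxes do not overlap).
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest box C enclosing both A and B.
    area_c = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    giou = iou - (area_c - union) / area_c
    return iou, giou


print(iou_and_giou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlapping: IoU 1/7, GIoU -5/63
print(iou_and_giou((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint: IoU 0, GIoU -1/3
```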
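The anchor bullet combines three steps: clustering box sizes into 9 anchors, splitting them between the 3 [yolo] layers with mask=, and setting filters=(classes + 5)*<number of mask>. The Python sketch below illustrates them under stated assumptions: the clustering is the k-means on (width, height) with a 1 - IoU distance described for YOLOv2, not necessarily what darknet's calc_anchors does internally; box sizes are expected in network-input pixels (relative width * 416, etc.); the 60x60 and 30x30 thresholds are compared against anchor area, which is one possible reading; all function names are hypothetical.

```python
import random


def wh_iou(a, b):
    """IoU of two (w, h) boxes that share the same center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(sizes, k=9, iters=100, seed=0):
    """Cluster (w, h) sizes with distance 1 - IoU; return the anchors sorted by area."""
    rng = random.Random(seed)
    centers = rng.sample(sizes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for s in sizes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            best = max(range(k), key=lambda i: wh_iou(s, centers[i]))
            clusters[best].append(s)
        new_centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda wh: wh[0] * wh[1])


def split_masks(anchors, big=60 * 60, medium=30 * 30):
    """mask= indexes for the 1st, 2nd and 3rd [yolo] layers (area thresholds assumed)."""
    first = [i for i, (w, h) in enumerate(anchors) if w * h > big]
    second = [i for i, (w, h) in enumerate(anchors) if medium < w * h <= big]
    third = [i for i, (w, h) in enumerate(anchors) if w * h <= medium]
    return first, second, third


def yolo_filters(classes, num_mask):
    """filters= of the [convolutional] layer right before a [yolo] layer."""
    return (classes + 5) * num_mask


# anchors= from the stock yolov3.cfg
DEFAULT_YOLOV3_ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
                          (59, 119), (116, 90), (156, 198), (373, 326)]
print(split_masks(DEFAULT_YOLOV3_ANCHORS))
# prints: ([5, 6, 7, 8], [3, 4], [0, 1, 2])
```

Under this area reading, even the stock anchors split 4/2/3 instead of 3/3/3, so each layer's filters= would follow its own mask count (for example yolo_filters(classes, 4) for the first layer). This is the kind of mismatch the last sentence of the bullet warns about.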

2. After training

  • Increase the network resolution in your .cfg file (height=608 and width=608), (height=832 and width=832), or any other multiple of 32: this increases precision and makes it possible to detect small objects.

      # Testing
      # Training
      # batch=64
      # subdivisions=16

It is not necessary to train the network again: just use the .weights file already trained for 416x416 resolution.

But to get even greater accuracy, you should train at a higher resolution, 608x608 or 832x832. Note: if an Out of memory error occurs, increase subdivisions in the .cfg file to 16, 32 or 64.
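Both resolution bullets require height and width to be multiples of 32 because the coarsest [yolo] grid is the input size divided by 32. A small Python sketch, assuming the standard YOLOv3 strides 32, 16 and 8, rounds a requested size up to the next multiple of 32 and lists the resulting grid sizes; the helper names are illustrative.

```python
def round_up_to_32(size):
    """Smallest multiple of 32 that is >= size."""
    return -(-size // 32) * 32


def yolo_grids(width, height, strides=(32, 16, 8)):
    """(columns, rows) of the three YOLOv3 [yolo] grids for a given input size."""
    if width % 32 or height % 32:
        raise ValueError("width and height must be multiples of 32")
    return [(width // s, height // s) for s in strides]


for size in (416, 608, 832):
    print(size, yolo_grids(size, size))
print(round_up_to_32(600))
# prints:
# 416 [(13, 13), (26, 26), (52, 52)]
# 608 [(19, 19), (38, 38), (76, 76)]
# 832 [(26, 26), (52, 52), (104, 104)]
# 608
```

Finer grids at 608 or 832 give each cell fewer image pixels to cover, which is one way to see why a higher resolution helps with small objects.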
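On the Out of memory note: in darknet, batch= images are processed per iteration but split into subdivisions= chunks, so each forward/backward pass holds batch / subdivisions images in memory. Raising subdivisions lowers that number without changing batch. A Python sketch of the arithmetic; the function name is illustrative, and rejecting non-divisors is a choice of this sketch to keep the numbers exact.

```python
def images_per_pass(batch=64, subdivisions=16):
    """Images held in memory per forward/backward pass (darknet's mini-batch)."""
    if subdivisions <= 0 or batch % subdivisions:
        raise ValueError("subdivisions must be a positive divisor of batch")
    return batch // subdivisions


for subdivisions in (16, 32, 64):
    print(subdivisions, images_per_pass(64, subdivisions))
# prints:
# 16 4
# 32 2
# 64 1
```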


