# Darknet - How to improve object detection?

https://github.com/AlexeyAB/darknet

## 1. Before training

• set flag `random=1` in your .cfg-file - it will increase precision by training YOLO at different resolutions.

• increase the network resolution in your .cfg-file (`height=608`, `width=608`, or any value that is a multiple of 32) - it will increase precision.
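Concretely, the two settings above live in different sections of the .cfg file. A sketch based on the default yolov3.cfg layout:

```ini
[net]
# network input resolution - any multiple of 32; larger is more precise
width=608
height=608

# ... later in the file, in every [yolo] section ...

[yolo]
# resize the network to a random resolution every few iterations
random=1
```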

• check that every object you want to detect is actually labeled in your dataset - no object in your dataset should be left without a label. Most training issues come from wrong labels in the dataset (labels produced by a conversion script, marked with a third-party tool, …). Always check your dataset by using: https://github.com/AlexeyAB/Yolo_mark
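As a quick sanity check before training, something like the following sketch can list images that have no label file at all. The function name, directory layout, and extensions here are assumptions for illustration - the one fact it relies on is that Darknet expects a `<name>.txt` label file next to each image:

```python
import os

def find_unlabeled(image_dir, image_exts=(".jpg", ".jpeg", ".png")):
    """Return image files in image_dir that have no matching .txt label file."""
    missing = []
    for name in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(name)
        label = os.path.join(image_dir, stem + ".txt")
        if ext.lower() in image_exts and not os.path.isfile(label):
            missing.append(name)
    return missing
```

Note that an empty `.txt` file is fine (it is a negative sample); a missing one usually means the image was never annotated.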


• my Loss is very high and mAP is very low - is training wrong? Run training with the `-show_imgs` flag at the end of the training command; do you see correct bounding boxes for the objects (in windows or in the files aug_...jpg)? If not, your training dataset is wrong.

• for each object you want to detect, there must be at least 1 similar object in the training dataset with about the same: shape, side of the object, relative size, angle of rotation, tilt, and illumination. It is desirable that your training dataset include images of objects at different scales, rotations, and lightings, from different sides, and on different backgrounds - you should preferably have 2000 different images for each class or more, and you should train for 2000*classes iterations or more.
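The rule of thumb above reduces to simple arithmetic. A hypothetical helper (not part of darknet) just to make the numbers explicit:

```python
def training_minimums(num_classes, images_per_class=2000, iters_per_class=2000):
    """Guide's rule of thumb: ~2000 images per class and
    at least 2000 * classes training iterations."""
    return num_classes * images_per_class, num_classes * iters_per_class

# e.g. a 3-class detector: at least 6000 images and 6000 iterations
images, iterations = training_minimums(3)
```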

• it is desirable that your training dataset include images with non-labeled objects that you do not want to detect - negative samples without a bounding box (empty .txt files). Use as many negative-sample images as there are images with objects.
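A negative sample is just an image whose label file exists but is empty. A minimal helper sketch (the function name is made up for illustration):

```python
import os

def register_negative(image_path):
    """Create the empty YOLO label file that marks image_path as a negative sample."""
    label_path = os.path.splitext(image_path)[0] + ".txt"
    open(label_path, "w").close()  # empty file: no bounding boxes
    return label_path
```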

• What is the best way to mark objects: label only the visible part of the object, label the visible and occluded parts, or label a little more than the entire object (with a small gap)? Mark them however you want them to be detected.

• for training with a large number of objects in each image, add the parameter `max=200` or a higher value to the last [yolo]-layer or [region]-layer in your cfg-file (the global maximum number of objects that can be detected by YoloV3 is 0.0615234375*(width*height), where width and height are parameters from the [net] section of the cfg-file).
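That formula is easy to evaluate (0.0615234375 is exactly 63/1024). A small illustrative function:

```python
def yolov3_max_detections(net_width, net_height):
    """Global maximum number of objects YoloV3 can detect, per the guide:
    0.0615234375 * width * height, with width/height from the [net] section."""
    return int(0.0615234375 * net_width * net_height)

# e.g. at the default 416x416 input this caps out at 10647 objects
cap_416 = yolov3_max_detections(416, 416)
```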

• for training on small objects (smaller than 16x16 after the image is resized to 416x416) - set `layers = -1, 11` instead of darknet/cfg/yolov3.cfg - L720 - `layers = -1, 36`, and set `stride=4` instead of darknet/cfg/yolov3.cfg - L717 - `stride=2`.
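A sketch of what those two edits look like in the cfg file (the line numbers refer to the stock upstream yolov3.cfg):

```ini
# yolov3.cfg, near lines 717-720, modified for small objects:
[upsample]
stride=4          # was: stride=2

[route]
layers = -1, 11   # was: layers = -1, 36
```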

• for training on both small and large objects, use modified models:
Full-model: 5 yolo layers: AlexeyAB/darknet/master/cfg/yolov3_5l.cfg
Tiny-model: 3 yolo layers: AlexeyAB/darknet/master/cfg/yolov3-tiny_3l.cfg
Spatial-full-model: 3 yolo layers: AlexeyAB/darknet/master/cfg/yolov3-spp.cfg

• If you train the model to distinguish left and right objects as separate classes (left/right hand, left/right turn on road signs, …), then disable flip data augmentation by adding `flip=0` here: darknet/cfg/yolov3.cfg - L17

• General rule - your training dataset should include the same range of relative object sizes that you want to detect:

train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width
train_network_height * train_obj_height / train_image_height ~= detection_network_height * detection_obj_height / detection_image_height


I.e. for each object in the test dataset there must be at least 1 object in the training dataset with the same class_id and about the same relative size:

object width in percent from Training dataset ~= object width in percent from Test dataset


That is, if the training set contained only objects that occupy 80-90% of the image, then the trained network will not be able to detect objects that occupy 1-10% of the image.
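The two proportions above can be checked numerically. A hypothetical helper that computes an object's size as the network sees it:

```python
def network_relative_size(obj_w, obj_h, img_w, img_h, net_w, net_h):
    """Object size in network-input pixels: the object's fraction of the
    image, scaled to the network resolution from the [net] section."""
    return net_w * obj_w / img_w, net_h * obj_h / img_h
```

For train and test to match, this value at training time should be roughly equal to the value at detection time - e.g. an object spanning 10% of its training image matches objects spanning about 10% of theirs at detection time, regardless of absolute pixel sizes.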

• to speed up training (at the cost of detection accuracy), do fine-tuning instead of transfer learning: set the parameter `stopbackward=1` here: darknet/cfg/yolov3.cfg - L548, then run this command: `./darknet partial cfg/yolov3.cfg yolov3.weights yolov3.conv.81 81` - the file yolov3.conv.81 will be created; then train using the weights file yolov3.conv.81 instead of darknet53.conv.74.
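The stopbackward edit itself is a one-line cfg change. A sketch, assuming the stock yolov3.cfg layout where line 548 falls at the end of the backbone:

```ini
# In the [convolutional] section around line 548 of yolov3.cfg, add:
stopbackward=1
# gradients stop here during backpropagation, so all earlier
# (backbone) layers keep their pretrained weights frozen
```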

• each model of object, side, illumination, scale, and each 30 degrees of turn and inclination angle is a different object from the internal perspective of the neural network. So the more different objects you want to detect, the more complex a network model should be used.


• to make the detected bounding boxes more accurate, you can add 3 parameters `ignore_thresh = .9 iou_normalizer=0.5 iou_loss=giou` to each [yolo] layer and train; this will increase mAP@0.9 but decrease mAP@0.5.
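Applied to a cfg file, the three parameters sit together in each [yolo] section:

```ini
[yolo]
# tighter boxes: raises mAP@0.9 at the cost of mAP@0.5
ignore_thresh = .9
iou_normalizer=0.5
iou_loss=giou
```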

• Only if you are an expert in neural detection networks - recalculate anchors for your dataset for the width and height from the cfg-file: `darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416`, then set the same 9 anchors in each of the 3 [yolo]-layers in your cfg-file. But you should change the anchor mask indexes (`mask=`) for each [yolo]-layer, so that the 1st [yolo]-layer has anchors larger than 60x60, the 2nd larger than 30x30, and the 3rd the remaining ones. You should also change `filters=(classes + 5)*<number of mask>` before each [yolo]-layer. If many of the calculated anchors do not fit under the appropriate layers, just try using all the default anchors.
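The filters arithmetic before each [yolo] layer can be sanity-checked. A small illustrative function:

```python
def yolo_filters(classes, masks_in_layer):
    """filters= for the [convolutional] layer preceding a [yolo] layer:
    (x, y, w, h, objectness + one score per class) for every anchor mask."""
    return (classes + 5) * masks_in_layer
```

For COCO's 80 classes with 3 anchor masks per layer, this reproduces the stock value filters=255; a 1-class detector needs filters=18.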

## 2. After training

• Increase the network resolution by setting it in your .cfg-file (height=608 and width=608, or height=832 and width=832, or any value that is a multiple of 32) - this increases precision and makes it possible to detect small objects.
```ini
[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=16
width=416
height=416
......
```


It is not necessary to train the network again - just use the .weights-file already trained for 416x416 resolution.

But to get even greater accuracy you should train at a higher resolution, 608x608 or 832x832. Note: if an Out of memory error occurs, increase subdivisions in the .cfg-file to 16, 32, or 64.
