# Darknet - How to improve object detection?

https://github.com/AlexeyAB/darknet

## 1. Before training

• set flag `random=1` in your .cfg-file - it will increase precision by training YOLO at different resolutions.

• increase the network resolution in your .cfg-file (`height=608`, `width=608`, or any value that is a multiple of 32) - it will increase precision.
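Concretely, the two settings above live in different sections of the .cfg file. A sketch based on the default yolov3.cfg layout:

```ini
[net]
# network input resolution - any multiple of 32; larger is more precise
width=608
height=608

# ... later in the file, in every [yolo] section ...

[yolo]
# resize the network to a random resolution every few iterations
random=1
```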

• check that every object you want to detect is actually labeled in your dataset - no object in your dataset should be left without a label. Most training issues come from wrong labels in the dataset (labels produced by a conversion script, marked with a third-party tool, …). Always check your dataset by using: https://github.com/AlexeyAB/Yolo_mark
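As a quick sanity check before training, something like the following sketch can list images that have no label file at all. The function name, directory layout, and extensions here are assumptions for illustration - the one fact it relies on is that Darknet expects a `<name>.txt` label file next to each image:

```python
import os

def find_unlabeled(image_dir, image_exts=(".jpg", ".jpeg", ".png")):
    """Return image files in image_dir that have no matching .txt label file."""
    missing = []
    for name in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(name)
        label = os.path.join(image_dir, stem + ".txt")
        if ext.lower() in image_exts and not os.path.isfile(label):
            missing.append(name)
    return missing
```

Note that an empty `.txt` file is fine (it is a negative sample); a missing one usually means the image was never annotated.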


• my Loss is very high and mAP is very low - is training wrong? Run training with the `-show_imgs` flag at the end of the training command; do you see correct bounding boxes for the objects (in windows or in the files aug_...jpg)? If not, your training dataset is wrong.

• for each object you want to detect, there must be at least 1 similar object in the training dataset with about the same: shape, side of the object, relative size, angle of rotation, tilt, and illumination. It is desirable that your training dataset include images of objects at different scales, rotations, and lightings, from different sides, and on different backgrounds - you should preferably have 2000 different images for each class or more, and you should train for 2000*classes iterations or more.
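The rule of thumb above reduces to simple arithmetic. A hypothetical helper (not part of darknet) just to make the numbers explicit:

```python
def training_minimums(num_classes, images_per_class=2000, iters_per_class=2000):
    """Guide's rule of thumb: ~2000 images per class and
    at least 2000 * classes training iterations."""
    return num_classes * images_per_class, num_classes * iters_per_class

# e.g. a 3-class detector: at least 6000 images and 6000 iterations
images, iterations = training_minimums(3)
```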

• it is desirable that your training dataset include images with non-labeled objects that you do not want to detect - negative samples without a bounding box (empty .txt files). Use as many negative-sample images as there are images with objects.
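A negative sample is just an image whose label file exists but is empty. A minimal helper sketch (the function name is made up for illustration):

```python
import os

def register_negative(image_path):
    """Create the empty YOLO label file that marks image_path as a negative sample."""
    label_path = os.path.splitext(image_path)[0] + ".txt"
    open(label_path, "w").close()  # empty file: no bounding boxes
    return label_path
```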

• What is the best way to mark objects: label only the visible part of the object, label the visible and occluded parts, or label a little more than the entire object (with a small gap)? Mark them however you want them to be detected.

• for training with a large number of objects in each image, add the parameter `max=200` or a higher value to the last [yolo]-layer or [region]-layer in your cfg-file (the global maximum number of objects that can be detected by YoloV3 is 0.0615234375*(width*height), where width and height are parameters from the [net] section of the cfg-file).
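That formula is easy to evaluate (0.0615234375 is exactly 63/1024). A small illustrative function:

```python
def yolov3_max_detections(net_width, net_height):
    """Global maximum number of objects YoloV3 can detect, per the guide:
    0.0615234375 * width * height, with width/height from the [net] section."""
    return int(0.0615234375 * net_width * net_height)

# e.g. at the default 416x416 input this caps out at 10647 objects
cap_416 = yolov3_max_detections(416, 416)
```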

• for training on small objects (smaller than 16x16 after the image is resized to 416x416) - set `layers = -1, 11` instead of darknet/cfg/yolov3.cfg - L720 - `layers = -1, 36`, and set `stride=4` instead of darknet/cfg/yolov3.cfg - L717 - `stride=2`.
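A sketch of what those two edits look like in the cfg file (the line numbers refer to the stock upstream yolov3.cfg):

```ini
# yolov3.cfg, near lines 717-720, modified for small objects:
[upsample]
stride=4          # was: stride=2

[route]
layers = -1, 11   # was: layers = -1, 36
```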

• for training on both small and large objects, use modified models:
Full-model: 5 yolo layers: AlexeyAB/darknet/master/cfg/yolov3_5l.cfg
Tiny-model: 3 yolo layers: AlexeyAB/darknet/master/cfg/yolov3-tiny_3l.cfg
Spatial-full-model: 3 yolo layers: AlexeyAB/darknet/master/cfg/yolov3-spp.cfg

• If you train the model to distinguish left and right objects as separate classes (left/right hand, left/right turn on road signs, …), then disable flip data augmentation by adding `flip=0` here: darknet/cfg/yolov3.cfg - L17

• General rule - your training dataset should include the same range of relative object sizes that you want to detect:

train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width
train_network_height * train_obj_height / train_image_height ~= detection_network_height * detection_obj_height / detection_image_height


I.e. for each object in the test dataset there must be at least 1 object in the training dataset with the same class_id and about the same relative size:

object width in percent from Training dataset ~= object width in percent from Test dataset


That is, if the training set contained only objects that occupy 80-90% of the image, then the trained network will not be able to detect objects that occupy 1-10% of the image.
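The two proportions above can be checked numerically. A hypothetical helper that computes an object's size as the network sees it:

```python
def network_relative_size(obj_w, obj_h, img_w, img_h, net_w, net_h):
    """Object size in network-input pixels: the object's fraction of the
    image, scaled to the network resolution from the [net] section."""
    return net_w * obj_w / img_w, net_h * obj_h / img_h
```

For train and test to match, this value at training time should be roughly equal to the value at detection time - e.g. an object spanning 10% of its training image matches objects spanning about 10% of theirs at detection time, regardless of absolute pixel sizes.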

• to speed up training (at the cost of detection accuracy), do fine-tuning instead of transfer learning: set the parameter `stopbackward=1` here: darknet/cfg/yolov3.cfg - L548, then run this command: `./darknet partial cfg/yolov3.cfg yolov3.weights yolov3.conv.81 81` - the file yolov3.conv.81 will be created; then train using the weights file yolov3.conv.81 instead of darknet53.conv.74.
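The stopbackward edit itself is a one-line cfg change. A sketch, assuming the stock yolov3.cfg layout where line 548 falls at the end of the backbone:

```ini
# In the [convolutional] section around line 548 of yolov3.cfg, add:
stopbackward=1
# gradients stop here during backpropagation, so all earlier
# (backbone) layers keep their pretrained weights frozen
```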

• each model of object, side, illumination, scale, and each 30 degrees of turn and inclination angle is a different object from the internal perspective of the neural network. So the more different objects you want to detect, the more complex a network model should be used.


• to make the detected bounding boxes more accurate, you can add 3 parameters `ignore_thresh = .9 iou_normalizer=0.5 iou_loss=giou` to each [yolo] layer and train; this will increase mAP@0.9 but decrease mAP@0.5.
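Applied to a cfg file, the three parameters sit together in each [yolo] section:

```ini
[yolo]
# tighter boxes: raises mAP@0.9 at the cost of mAP@0.5
ignore_thresh = .9
iou_normalizer=0.5
iou_loss=giou
```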

• Only if you are an expert in neural detection networks - recalculate anchors for your dataset for the width and height from the cfg-file: `darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416`, then set the same 9 anchors in each of the 3 [yolo]-layers in your cfg-file. But you should change the anchor mask indexes (`mask=`) for each [yolo]-layer, so that the 1st [yolo]-layer has anchors larger than 60x60, the 2nd larger than 30x30, and the 3rd the remaining ones. You should also change `filters=(classes + 5)*<number of mask>` before each [yolo]-layer. If many of the calculated anchors do not fit under the appropriate layers, just try using all the default anchors.
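The filters arithmetic before each [yolo] layer can be sanity-checked. A small illustrative function:

```python
def yolo_filters(classes, masks_in_layer):
    """filters= for the [convolutional] layer preceding a [yolo] layer:
    (x, y, w, h, objectness + one score per class) for every anchor mask."""
    return (classes + 5) * masks_in_layer
```

For COCO's 80 classes with 3 anchor masks per layer, this reproduces the stock value filters=255; a 1-class detector needs filters=18.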

## 2. After training

• Increase the network resolution by setting it in your .cfg-file (height=608 and width=608, or height=832 and width=832, or any value that is a multiple of 32) - this increases precision and makes it possible to detect small objects.
```ini
[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=16
width=416
height=416
......
```


It is not necessary to train the network again - just use the .weights-file already trained for 416x416 resolution.

But to get even greater accuracy you should train at a higher resolution, 608x608 or 832x832. Note: if an Out of memory error occurs, increase subdivisions in the .cfg-file to 16, 32, or 64.
