Training your own YOLOv3 network on the COCO dataset

Article link: https://blog.csdn.net/tugouxp/article/details/120020431

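The full name of the COCO dataset is MS COCO (Microsoft Common Objects in Context). It originated as the Microsoft COCO dataset, whose annotation Microsoft funded in 2014, and it is now one of the most important and authoritative benchmarks in object recognition and detection.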

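Once the environment is set up, test inference with the pretrained weights:

Now we train our own network on the COCO dataset:

Getting the COCO data

To download the COCO images and labels, darknet ships scripts/get_coco_dataset.sh, which does exactly this.

Run the following commands: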

$ cp scripts/get_coco_dataset.sh data
$ cd data
$ bash get_coco_dataset.sh

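Looking at the script, what it does is quite simple. The training and validation image archives (train2014.zip and val2014.zip) are very large; if wget is too slow, you can paste the links from the script into a download manager and fetch them separately, which does not change the result.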

#!/bin/bash

# Clone COCO API
git clone https://github.com/pdollar/coco
cd coco

mkdir images
cd images

# Download Images
wget -c https://pjreddie.com/media/files/train2014.zip
wget -c https://pjreddie.com/media/files/val2014.zip

# Unzip
unzip -q train2014.zip
unzip -q val2014.zip

cd ..

# Download COCO Metadata
wget -c https://pjreddie.com/media/files/instances_train-val2014.zip
wget -c https://pjreddie.com/media/files/coco/5k.part
wget -c https://pjreddie.com/media/files/coco/trainvalno5k.part
wget -c https://pjreddie.com/media/files/coco/labels.tgz
tar xzf labels.tgz
unzip -q instances_train-val2014.zip

# Set Up Image Lists
paste <(awk "{print \"$PWD\"}" <5k.part) 5k.part | tr -d '\t' > 5k.txt
paste <(awk "{print \"$PWD\"}" <trainvalno5k.part) trainvalno5k.part | tr -d '\t' > trainvalno5k.txt
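
The last two lines of the script concatenate $PWD with every line of the .part files, without adding a separator. The same step can be sketched in Python; this is only an illustration, and the function name build_image_list is hypothetical, not part of darknet:

```python
from pathlib import Path


def build_image_list(part_path, out_path, root=None):
    """Prefix every line of part_path with root and write the result to out_path.

    root defaults to the current working directory, like $PWD in the script.
    """
    root = str(Path.cwd() if root is None else root)
    with open(part_path) as src, open(out_path, "w") as dst:
        for line in src:
            entry = line.rstrip("\n")
            # Plain concatenation, as in the paste | tr -d '\t' pipeline.
            dst.write(root + entry + "\n")


# Hypothetical usage, run from the coco directory:
# build_image_list("5k.part", "5k.txt")
# build_image_list("trainvalno5k.part", "trainvalno5k.txt")
```

Because the paths are absolute, the generated lists have to be rebuilt if the dataset directory is moved.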

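When the download finishes, edit the data configuration file and the network configuration file. The data configuration file must point at the newly downloaded dataset, and the network configuration file tells darknet to train rather than run inference. YOLOv3 recognizes 80 object classes (classes= 80). The changes are shown in the git diff below: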

caozilong@caozilong-Vostro-3268:/media/caozilong/64bce557-96fc-4340-a2a7-973b0be41336/darknet$ git diff
diff --git a/cfg/coco.data b/cfg/coco.data
index 3003841..6391e06 100644
--- a/cfg/coco.data
+++ b/cfg/coco.data
@@ -1,8 +1,8 @@
 classes= 80
-train  = /home/pjreddie/data/coco/trainvalno5k.txt
+train  = /media/caozilong/64bce557-96fc-4340-a2a7-973b0be41336/darknet/data/coco/trainvalno5k.txt
 valid  = coco_testdev
 #valid = data/coco_val_5k.list
 names = data/coco.names
-backup = /home/pjreddie/backup/
+backup = /media/caozilong/64bce557-96fc-4340-a2a7-973b0be41336/darknet/backup/
 eval=coco
 
diff --git a/cfg/yolov3.cfg b/cfg/yolov3.cfg
index 938ffff..ae0f4dc 100644
--- a/cfg/yolov3.cfg
+++ b/cfg/yolov3.cfg
@@ -4,7 +4,7 @@
 # subdivisions=1
 # Training
 batch=64
-subdivisions=16
+subdivisions=8
 width=608
 height=608
 channels=3
caozilong@caozilong-Vostro-3268:/media/caozilong/64bce557-96fc-4340-a2a7-973b0be41336/darknet$

The backup directory is where weight files are saved during training; it is empty before training starts.
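
Before starting a long training run, it can help to confirm what coco.data actually contains. The following is a minimal sketch of a parser for its "key = value" lines; the function name parse_data_cfg is hypothetical, and /path/to/darknet is a placeholder, not the author's real path:

```python
def parse_data_cfg(text):
    """Parse 'key = value' lines of a darknet .data file into a dict."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out entries
        key, sep, value = line.partition("=")
        if sep:
            options[key.strip()] = value.strip()
    return options


# Sample modelled on cfg/coco.data; /path/to/darknet is a placeholder.
sample = """classes= 80
train  = /path/to/darknet/data/coco/trainvalno5k.txt
valid  = coco_testdev
#valid = data/coco_val_5k.list
names = data/coco.names
backup = /path/to/darknet/backup/
eval=coco
"""
opts = parse_data_cfg(sample)
print(opts["train"])
print(opts["backup"])
```

With the real file, one could then check that the train list exists and that the backup directory has been created.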

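Downloading the pretrained weights

We do not train from scratch; instead we start from a pretrained weight file:

wget https://pjreddie.com/media/files/darknet53.conv.74

Running the training

The training command is: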

./darknet detector train cfg/coco.data cfg/yolov3.cfg darknet53.conv.74

root@caozilong-Vostro-3268:/media/caozilong/64bce557-96fc-4340-a2a7-973b0be41336/darknet# ./darknet detector train cfg/coco.data cfg/yolov3.cfg darknet53.conv.74
yolov3
layer     filters    size              input                output
    0 conv     32  3 x 3 / 1   608 x 608 x   3   ->   608 x 608 x  32  0.639 BFLOPs
    1 conv     64  3 x 3 / 2   608 x 608 x  32   ->   304 x 304 x  64  3.407 BFLOPs
    2 conv     32  1 x 1 / 1   304 x 304 x  64   ->   304 x 304 x  32  0.379 BFLOPs
    3 conv     64  3 x 3 / 1   304 x 304 x  32   ->   304 x 304 x  64  3.407 BFLOPs
    4 res    1                 304 x 304 x  64   ->   304 x 304 x  64
    5 conv    128  3 x 3 / 2   304 x 304 x  64   ->   152 x 152 x 128  3.407 BFLOPs
    6 conv     64  1 x 1 / 1   152 x 152 x 128   ->   152 x 152 x  64  0.379 BFLOPs
    7 conv    128  3 x 3 / 1   152 x 152 x  64   ->   152 x 152 x 128  3.407 BFLOPs
    8 res    5                 152 x 152 x 128   ->   152 x 152 x 128
    9 conv     64  1 x 1 / 1   152 x 152 x 128   ->   152 x 152 x  64  0.379 BFLOPs
   10 conv    128  3 x 3 / 1   152 x 152 x  64   ->   152 x 152 x 128  3.407 BFLOPs
   11 res    8                 152 x 152 x 128   ->   152 x 152 x 128
   12 conv    256  3 x 3 / 2   152 x 152 x 128   ->    76 x  76 x 256  3.407 BFLOPs
   13 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128  0.379 BFLOPs
   14 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
   15 res   12                  76 x  76 x 256   ->    76 x  76 x 256
   16 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128  0.379 BFLOPs
   17 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
   18 res   15                  76 x  76 x 256   ->    76 x  76 x 256
   19 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128  0.379 BFLOPs
   20 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
   21 res   18                  76 x  76 x 256   ->    76 x  76 x 256
   22 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128  0.379 BFLOPs
   23 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
   24 res   21                  76 x  76 x 256   ->    76 x  76 x 256
   25 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128  0.379 BFLOPs
   26 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
   27 res   24                  76 x  76 x 256   ->    76 x  76 x 256
   28 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128  0.379 BFLOPs
   29 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
   30 res   27                  76 x  76 x 256   ->    76 x  76 x 256
   31 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128  0.379 BFLOPs
   32 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
   33 res   30                  76 x  76 x 256   ->    76 x  76 x 256
   34 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128  0.379 BFLOPs
   35 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
   36 res   33                  76 x  76 x 256   ->    76 x  76 x 256
   37 conv    512  3 x 3 / 2    76 x  76 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   38 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   39 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   40 res   37                  38 x  38 x 512   ->    38 x  38 x 512
   41 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   42 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   43 res   40                  38 x  38 x 512   ->    38 x  38 x 512
   44 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   45 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   46 res   43                  38 x  38 x 512   ->    38 x  38 x 512
   47 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   48 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   49 res   46                  38 x  38 x 512   ->    38 x  38 x 512
   50 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   51 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   52 res   49                  38 x  38 x 512   ->    38 x  38 x 512
   53 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   54 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   55 res   52                  38 x  38 x 512   ->    38 x  38 x 512
   56 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   57 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   58 res   55                  38 x  38 x 512   ->    38 x  38 x 512
   59 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   60 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   61 res   58                  38 x  38 x 512   ->    38 x  38 x 512
   62 conv   1024  3 x 3 / 2    38 x  38 x 512   ->    19 x  19 x1024  3.407 BFLOPs
   63 conv    512  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 512  0.379 BFLOPs
   64 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024  3.407 BFLOPs
   65 res   62                  19 x  19 x1024   ->    19 x  19 x1024
   66 conv    512  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 512  0.379 BFLOPs
   67 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024  3.407 BFLOPs
   68 res   65                  19 x  19 x1024   ->    19 x  19 x1024
   69 conv    512  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 512  0.379 BFLOPs
   70 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024  3.407 BFLOPs
   71 res   68                  19 x  19 x1024   ->    19 x  19 x1024
   72 conv    512  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 512  0.379 BFLOPs
   73 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024  3.407 BFLOPs
   74 res   71                  19 x  19 x1024   ->    19 x  19 x1024
   75 conv    512  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 512  0.379 BFLOPs
   76 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024  3.407 BFLOPs
   77 conv    512  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 512  0.379 BFLOPs
   78 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024  3.407 BFLOPs
   79 conv    512  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 512  0.379 BFLOPs
   80 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024  3.407 BFLOPs
   81 conv    255  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 255  0.189 BFLOPs
   82 yolo
   83 route  79
   84 conv    256  1 x 1 / 1    19 x  19 x 512   ->    19 x  19 x 256  0.095 BFLOPs
   85 upsample            2x    19 x  19 x 256   ->    38 x  38 x 256
   86 route  85 61
   87 conv    256  1 x 1 / 1    38 x  38 x 768   ->    38 x  38 x 256  0.568 BFLOPs
   88 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   89 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   90 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   91 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   92 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   93 conv    255  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 255  0.377 BFLOPs
   94 yolo
   95 route  91
   96 conv    128  1 x 1 / 1    38 x  38 x 256   ->    38 x  38 x 128  0.095 BFLOPs
   97 upsample            2x    38 x  38 x 128   ->    76 x  76 x 128
   98 route  97 36
   99 conv    128  1 x 1 / 1    76 x  76 x 384   ->    76 x  76 x 128  0.568 BFLOPs
  100 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
  101 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128  0.379 BFLOPs
  102 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
  103 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128  0.379 BFLOPs
  104 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
  105 conv    255  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 255  0.754 BFLOPs
  106 yolo
Loading weights from darknet53.conv.74...Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Resizing
512
Loaded: 0.421137 seconds
Region 82 Avg IOU: 0.201348, Class: 0.455102, Obj: 0.244645, No Obj: 0.390294, .5R: 0.062500, .75R: 0.000000,  count: 16
Region 94 Avg IOU: 0.120858, Class: 0.537060, Obj: 0.543120, No Obj: 0.518701, .5R: 0.038462, .75R: 0.000000,  count: 26
Region 106 Avg IOU: 0.165164, Class: 0.496659, Obj: 0.426623, No Obj: 0.418840, .5R: 0.020833, .75R: 0.000000,  count: 48

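On a platform without a GPU, this training takes a very long time. The number after Region is a layer index: Region 106 in the last line above, for example, is the output of layer 106, the last yolo layer in the layer listing.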
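
To follow these numbers over a long run, the Region lines can be extracted from a saved log. The sketch below assumes the log lines have exactly the format shown above; the function name and the dictionary keys are my own naming, not darknet's:

```python
import re

# Pattern for lines such as:
# Region 106 Avg IOU: 0.165164, Class: 0.496659, Obj: 0.426623, No Obj: 0.418840, .5R: 0.020833, .75R: 0.000000,  count: 48
REGION_RE = re.compile(
    r"Region (\d+) Avg IOU: ([\d.]+), Class: ([\d.]+), Obj: ([\d.]+), "
    r"No Obj: ([\d.]+), \.5R: ([\d.]+), \.75R: ([\d.]+),\s+count: (\d+)"
)


def parse_region_line(line):
    """Return the fields of one Region line as a dict, or None if it does not match."""
    m = REGION_RE.search(line)
    if m is None:
        return None  # other log lines, or non-numeric values, are skipped
    layer, iou, cls, obj, no_obj, r50, r75, count = m.groups()
    return {
        "layer": int(layer),
        "avg_iou": float(iou),
        "class": float(cls),
        "obj": float(obj),
        "no_obj": float(no_obj),
        "recall_50": float(r50),
        "recall_75": float(r75),
        "count": int(count),
    }


line = ("Region 106 Avg IOU: 0.165164, Class: 0.496659, Obj: 0.426623, "
        "No Obj: 0.418840, .5R: 0.020833, .75R: 0.000000,  count: 48")
record = parse_region_line(line)
print(record["layer"], record["avg_iou"], record["count"])
```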

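Data analysis

After the steps above, the coco dataset directory looks like this:

The important directories are images and labels. images/train2014 contains all the training-set images:

The training set is 13G in size and contains 82783 images. To look at one of them: eog COCO_train2014_000000387558.jpg

images/val2014 contains the validation-set images, 6.3G in size and 40504 images in total. To look at one of them: eog COCO_val2014_000000485307.jpg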
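
A quick way to check that the archives were unpacked completely is to count the image files. This is a small sketch, assuming all images carry a .jpg extension; count_images is a hypothetical helper name:

```python
from pathlib import Path


def count_images(directory, pattern="*.jpg"):
    """Return the number of regular files in directory that match pattern."""
    return sum(1 for p in Path(directory).glob(pattern) if p.is_file())


# Hypothetical usage, run from the coco directory:
# print(count_images("images/train2014"))
# print(count_images("images/val2014"))
```

The two counts can then be compared with the figures quoted above.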

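Object detection is trained here as a supervised learning task, so both the training set and the validation set need labels. labels/train2014 holds the training-set labels; let's take a look at its contents.

YOLO describes a box by its center coordinates, and each label line has the format: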
class_id x y w h

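Here class_id is the numeric class ID. The first column of a label line is the class, and the remaining four columns are the box coordinates normalized by the image width and height: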

x=x_center/width

y=y_center/height

w=(xmax-xmin)/width

h=(ymax-ymin)/height
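
The four formulas above can be written as one small function. This is a sketch only; corners_to_yolo is a hypothetical name, and the 100 x 50 image in the example is made up for illustration:

```python
def corners_to_yolo(xmin, ymin, xmax, ymax, width, height):
    """Convert pixel corner coordinates to normalized (x, y, w, h)."""
    x = (xmin + xmax) / 2.0 / width    # x_center / width
    y = (ymin + ymax) / 2.0 / height   # y_center / height
    w = (xmax - xmin) / width
    h = (ymax - ymin) / height
    return x, y, w, h


# Made-up example: a box from (20, 10) to (60, 30) in a 100 x 50 image.
print(corners_to_yolo(20, 10, 60, 30, 100, 50))
```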

Taking the image of the skating man as an example, its label file can be viewed with: vim COCO_train2014_000000387558.txt
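
The label file can also be read programmatically. The sketch below assumes every valid line has exactly five whitespace-separated fields; parse_yolo_labels is a hypothetical name, and the sample contains only the two label lines used in the calculation later in this article (the real file may contain more):

```python
def parse_yolo_labels(text):
    """Parse 'class_id x y w h' lines into (int, float, float, float, float) tuples."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # ignore blank or malformed lines
        class_id = int(parts[0])
        x, y, w, h = (float(v) for v in parts[1:])
        boxes.append((class_id, x, y, w, h))
    return boxes


sample = """0 0.381617 0.534906 0.270672 0.586706
36 0.308930 0.888165 0.169859 0.130729
"""
boxes = parse_yolo_labels(sample)
for box in boxes:
    print(box)
```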

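Using the formulas above, we can compute the pixel coordinates of the actual objects.

For example, take the object with class id 0. The image information is (the calculation below uses width = 640 and height = 425):

Then

x_center = x * width = 0.381617 * 640 = 244 pixels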

y_center = y * height = 0.534906 * 425 = 227 pixels

xmax-xmin = w * width = 0.270672 * 640 = 173 pixels

ymax-ymin = h * height = 0.586706 * 425 = 249 pixels

Solving these together:

xmax+xmin = 2 * 244 = 488 =>xmax = 330, xmin=158

ymax+ymin = 2 * 227 = 454 =>ymax=352, ymin=102

This gives the diagonal corner coordinates (158, 102) and (330, 352).

Similarly, for the object with class id 36:

x_center = x * width = 0.308930 * 640 = 198 pixels

y_center = y * height = 0.888165 * 425 = 377 pixels

xmax-xmin = w * width = 0.169859 * 640 = 109 pixels

ymax-ymin = h * height = 0.130729 * 425 = 56 pixels

Solving these together:

xmax+xmin = 2 * 198 = 396 =>xmax = 253, xmin=143

ymax+ymin = 2 * 377 = 754 =>ymax=405, ymin=349

This gives the diagonal corner coordinates (143, 349) and (253, 405).

We draw them with the following code:
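
The hand calculation above can be reproduced with a short function. This is a sketch under the same assumptions (a 640 x 425 image); yolo_to_corners is a hypothetical name:

```python
def yolo_to_corners(x, y, w, h, width, height):
    """Convert normalized (x, y, w, h) to integer pixel corners (xmin, ymin, xmax, ymax)."""
    x_center = x * width
    y_center = y * height
    box_w = w * width
    box_h = h * height
    xmin = round(x_center - box_w / 2)
    ymin = round(y_center - box_h / 2)
    xmax = round(x_center + box_w / 2)
    ymax = round(y_center + box_h / 2)
    return xmin, ymin, xmax, ymax


# The two objects from the label file, in the 640 x 425 image.
person = yolo_to_corners(0.381617, 0.534906, 0.270672, 0.586706, 640, 425)
object_36 = yolo_to_corners(0.308930, 0.888165, 0.169859, 0.130729, 640, 425)
print(person)
print(object_36)
```

The results can differ from the hand calculation by one pixel, because the hand calculation rounds the intermediate values while this function rounds only at the end.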

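import cv2

fname = './COCO_train2014_000000387558.jpg'
img = cv2.imread(fname)
if img is None:  # cv2.imread returns None when the file cannot be read
    raise FileNotFoundError(fname)

# Object with class id 0: green box (OpenCV colors are in BGR order)
pt1 = (158, 102)
pt2 = (330, 352)
cv2.rectangle(img, pt1, pt2, (0, 255, 0), 2)

# Object with class id 36: blue box
pt3 = (143, 349)
pt4 = (253, 405)
cv2.rectangle(img, pt3, pt4, (255, 0, 0), 2)

# Save the annotated image
cv2.imwrite('22.jpg', img)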

The result is shown below. It confirms that our reading of the coordinates in the label file and the calculation above are correct.