YOLOv3: An Incremental Improvement (study notes)


三分明月落


Article link: https://blog.csdn.net/qq_40755643/article/details/93381239


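YOLOv3 is a large improvement over v1 but only a modest one over v2. If anything here is unclear, see the notes linked below.

YOLOv1: YOLOv1 study notes

YOLOv2: YOLOv2 study notes

These notes mainly compare v2 and v3.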

Version	Batch Normalization	Anchor Boxes	Dimension Clusters	Direct location prediction	Fine-Grained Features	Backbone
YOLOv2	yes	5 per cell	k=5	yes	2 layers	Darknet-19
YOLOv3	yes	3 per cell	k=9	yes	3 layers	Darknet-53

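1. Multi-scale

YOLOv2 used a passthrough structure to capture fine-grained features, which is really multi-scale feature fusion. It adds a passthrough layer that connects an earlier 26*26 feature map to the final 13*13 feature map: the earlier, higher-resolution map is taken as input and concatenated onto the later, lower-resolution one.

Predicting on the 13*13 feature map is sufficient for large objects but not necessarily for small ones, and merging in the larger, earlier feature map helps detect small objects. Concretely, the passthrough layer turns the 26*26*512 feature map into a new 13*13*2048 feature map (the spatial size drops to 1/4 and the channel count grows 4x). This is concatenated with the following 13*13*1024 feature map to form a 13*13*3072 feature map, and the prediction convolution runs on that map.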
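The shape arithmetic of the passthrough step can be checked with a small sketch. This is not Darknet's reorg layer: `space_to_depth` and `passthrough_shape` are hypothetical helper names, the feature map is a plain nested list, and the channel ordering is an assumption that may differ from the real implementation.

```python
def space_to_depth(fmap, block=2):
    """Rearrange an H x W x C nested list into (H/block) x (W/block) x (C*block*block)."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(0, h, block):
        row = []
        for j in range(0, w, block):
            cell = []
            # Stack the channels of every pixel in the block x block window.
            for di in range(block):
                for dj in range(block):
                    cell.extend(fmap[i + di][j + dj])
            row.append(cell)
        out.append(row)
    return out


def passthrough_shape(h, w, c, block=2):
    """Shape of the feature map after the passthrough rearrangement."""
    return h // block, w // block, c * block * block


fine = passthrough_shape(26, 26, 512)   # shape of the rearranged 26*26*512 map
merged_channels = fine[2] + 1024        # channels after concatenating with the 13*13*1024 map
print(fine, merged_channels)
```

The spatial size halves in each dimension while the channel count is multiplied by 4, so no values are discarded; they are only moved from space into depth.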

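YOLOv3 instead uses feature maps at 3 different scales for object detection.

From the figure above:

After layer 79, the network passes through the few yellow convolutional layers at the bottom to produce the detection result at the first scale. The feature map used here is downsampled 32x, so a 416*416 input gives a 13*13 map. Because of the high downsampling factor it has a large receptive field and suits large objects.
For finer-grained detection, the layer-79 feature map is upsampled and fused (concatenation) with the layer-61 feature map, giving the finer-grained feature map at layer 91. After a few more convolutional layers this yields a 26*26 feature map, downsampled 16x relative to the input. It has a medium receptive field and suits medium-sized objects.
Finally, the layer-91 feature map is upsampled again and fused (concatenation) with the layer-36 feature map, yielding a 52*52 feature map downsampled 8x relative to the input. It has the smallest receptive field and suits small objects.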

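2. Dimension Clusters

YOLOv3 keeps YOLOv2's approach of using K-means clustering to obtain the prior box sizes, and clusters 9 prior box sizes in total.

K-means is run on the bounding boxes of the training set (the ground truth) to find suitable anchor boxes. Standard K-means uses Euclidean distance, so larger boxes produce more error than smaller ones and the clustering result can be skewed. The authors therefore use the IOU score (the intersection of two boxes divided by their union) as the criterion, which makes the error independent of box scale. The final distance function is:

d(box, centroid) = 1 - IOU(box, centroid)

On the COCO dataset the 9 prior boxes are: (10x13), (16x30), (33x23), (30x61), (62x45), (59x119), (116x90), (156x198), (373x326).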
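A minimal sketch of K-means with the 1 - IOU distance is given below, assuming boxes are (width, height) pairs compared as if they shared a corner. The names `iou_wh` and `kmeans_anchors`, the random initialisation and the sample boxes are illustrative assumptions, not the authors' code.

```python
import random


def iou_wh(a, b):
    """IOU of two (w, h) boxes aligned at the same corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) boxes with d = 1 - IOU and return k centroids sorted by area."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Assign each box to the centroid with the smallest 1 - IOU distance.
            nearest = min(range(k), key=lambda j: 1.0 - iou_wh(box, centroids[j]))
            clusters[nearest].append(box)
        updated = []
        for j, members in enumerate(clusters):
            if members:
                mean_w = sum(w for w, _ in members) / len(members)
                mean_h = sum(h for _, h in members) / len(members)
                updated.append((mean_w, mean_h))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids, key=lambda c: c[0] * c[1])


# Hypothetical ground-truth box sizes in pixels.
sample_boxes = [(10, 10), (12, 12), (11, 9), (100, 100), (110, 90), (95, 105)]
anchors = kmeans_anchors(sample_boxes, k=2)
print(anchors)
```

Running the same routine with k=9 on a real training set is how a list like the COCO one above would be obtained; the exact values depend on the data and on the initialisation.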

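3. Anchor Boxes

In YOLOv2, dimension clustering yields 5 prior box sizes, so each cell of the feature map gets 5 anchor boxes.

In YOLOv3, each downsampling scale gets 3 prior boxes, so each cell of each feature map gets 3 anchor boxes.

The smallest feature map, 13*13 (largest receptive field), uses the larger prior boxes (116x90), (156x198), (373x326) and suits larger objects.
The medium feature map, 26*26 (medium receptive field), uses the medium prior boxes (30x61), (62x45), (59x119) and suits medium-sized objects.
The largest feature map, 52*52 (smallest receptive field), uses the smaller prior boxes (10x13), (16x30), (33x23) and suits smaller objects.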
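The assignment above amounts to sorting the 9 prior boxes by area and giving each group of 3 to one grid. A small sketch, where `anchors_by_grid` is a hypothetical helper and the grid sizes assume a 416*416 input:

```python
COCO_ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
                (59, 119), (116, 90), (156, 198), (373, 326)]


def anchors_by_grid(anchors, grids=(52, 26, 13), per_scale=3):
    """Map each grid size to its prior boxes: smallest boxes go to the finest grid."""
    ordered = sorted(anchors, key=lambda wh: wh[0] * wh[1])
    return {g: ordered[i * per_scale:(i + 1) * per_scale]
            for i, g in enumerate(grids)}


assignment = anchors_by_grid(COCO_ANCHORS)
for grid, boxes in assignment.items():
    print(grid, boxes)
```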

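4. Direct location prediction

YOLOv3 keeps YOLOv2's method: it predicts the offset of the bounding box centre relative to the top-left corner of the corresponding cell.

To keep the centre of the bounding box inside the current cell, tx and ty are passed through a sigmoid, which constrains them to the range 0 to 1. These are the σ(tx) and σ(ty) in the figure above, and the constraint makes training more stable.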
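The decoding can be sketched as follows, using the formulas from the YOLOv2/YOLOv3 papers: bx = σ(tx) + cx, by = σ(ty) + cy, bw = pw * e^tw, bh = ph * e^th. The function name `decode_box`, the argument order, and the choice to return pixel units (by multiplying the centre by the stride) are assumptions made for this illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(tx, ty, tw, th, cx, cy, pw, ph, stride):
    """Turn raw outputs into a box centre and size in input-image pixels.

    (cx, cy) is the cell's column and row index, (pw, ph) the prior box size
    in pixels, and stride the downsampling factor of this feature map.
    """
    bx = (sigmoid(tx) + cx) * stride   # centre stays inside cell (cx, cy)
    by = (sigmoid(ty) + cy) * stride
    bw = pw * math.exp(tw)             # prior width scaled by e^tw
    bh = ph * math.exp(th)
    return bx, by, bw, bh


# All-zero outputs give the cell centre and the unscaled prior box.
print(decode_box(0, 0, 0, 0, cx=6, cy=6, pw=116, ph=90, stride=32))
```

Because σ(t) lies strictly between 0 and 1, the predicted centre can never leave its cell, whatever the raw values of tx and ty.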

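When labelling anchor boxes as positive or negative, each ground truth box selects only the single anchor box with the highest IOU as its positive sample. It does not also treat every anchor box whose IOU exceeds a threshold (typically 0.5) as positive, which is where it differs from SSD and the Faster R-CNN family. YOLOv3 assigns exactly one bounding box to each ground truth object; a prior box that is not assigned to a ground truth object incurs no coordinate or class prediction loss, only objectness loss.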
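A simplified sketch of the "one best anchor per ground truth" rule, comparing only widths and heights as if the boxes shared a corner. `best_anchor_index` is a hypothetical helper; real implementations also have to pick the grid cell and scale, which is omitted here.

```python
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]


def iou_wh(a, b):
    """IOU of two (w, h) boxes aligned at the same corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def best_anchor_index(gt_wh, anchors=ANCHORS):
    """Index of the single anchor with the highest IOU against the ground truth."""
    return max(range(len(anchors)), key=lambda i: iou_wh(gt_wh, anchors[i]))


print(best_anchor_index((120, 95)))  # a large object
print(best_anchor_index((12, 15)))   # a small object
```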

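5. Backbone

YOLOv2 uses Darknet-19 as its backbone, with 19 convolutional layers and 5 max pooling layers. YOLOv1 used a GoogLeNet-style network with 24 convolutional layers and 2 fully connected layers, so Darknet-19 performs fewer convolution operations overall, which is the key to its lower computational cost. An average pooling layer replaces the fully connected layers for the final prediction.

YOLOv3 uses Darknet-53 as its backbone (53 convolutional layers). It borrows from residual networks by adding shortcut connections between some layers.

6. Detection

Class prediction no longer uses softmax; it uses independent logistic outputs instead. Softmax implicitly assumes that each bounding box belongs to exactly one class, whereas independent logistic classifiers support multi-label objects (for example, a person carrying both the Woman and Person labels). Binary cross-entropy is used as the loss function.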
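The difference between the two output layers shows up on a multi-label example. The sketch below uses made-up logits for three hypothetical classes (Person, Woman, Car); the helper names are illustrative only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    m = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over independent per-class predictions."""
    losses = [-(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
              for p, t in zip(probs, targets)]
    return sum(losses) / len(losses)


logits = [3.0, 2.5, -4.0]                  # hypothetical scores: Person, Woman, Car
targets = [1, 1, 0]                        # the object is both Person and Woman

independent = [sigmoid(v) for v in logits]
competing = softmax(logits)
print(independent)   # Person and Woman can both be close to 1
print(competing)     # probabilities must sum to 1, so the two labels compete
print(binary_cross_entropy(independent, targets))
```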

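For an input image, YOLOv3 maps it to output tensors at 3 scales, which represent the probability of each kind of object at each position in the image.

For a 416*416 input image, 3 prior boxes are set in every cell of the feature map at each scale, giving 13*13*3 + 26*26*3 + 52*52*3 = 10647 predictions in total. Each prediction is a (4+1+80) = 85-dimensional vector containing the box coordinates (4 values), the box confidence (1 value) and the class probabilities (80 classes for the COCO dataset).

By comparison, YOLOv2 makes 13*13*5 = 845 predictions. YOLOv3 attempts more than 10 times as many boxes, and does so at different resolutions, so mAP and small-object detection improve somewhat.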
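The counts above can be reproduced with a few lines of arithmetic. The function names are hypothetical; the strides 32, 16 and 8 are the downsampling factors described in the multi-scale section.

```python
def grid_sizes(input_size, strides=(32, 16, 8)):
    """Side length of the feature map at each detection scale."""
    return [input_size // s for s in strides]


def num_predictions(input_size, anchors_per_cell=3, strides=(32, 16, 8)):
    """Total number of boxes predicted over all scales."""
    return sum(g * g * anchors_per_cell for g in grid_sizes(input_size, strides))


yolov3_total = num_predictions(416)
yolov2_total = 13 * 13 * 5
vector_length = 4 + 1 + 80       # box coordinates + confidence + COCO classes

print(grid_sizes(416), yolov3_total, yolov2_total, vector_length)
print(yolov3_total / yolov2_total)
```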

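7. Source code analysis

To be continued.

8. Training your own YOLOv3

I. Official version (Darknet, written in C)

This article uses yolov3-tiny.

Download the yolov3 project and the yolov3-tiny pretrained weights. Official site: https://pjreddie.com/darknet/yolo/

Then follow the steps from the official site.

Download the weights: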

wget https://pjreddie.com/media/files/yolov3-tiny.weights

Download the source code:

git clone https://github.com/pjreddie/darknet

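Edit the Makefile in the darknet project. If OpenCV is not installed and you do not need a camera, set OPENCV=0; otherwise make will fail.

# Set to 1 to train on the GPU, otherwise 0; the same applies to the flags below
GPU=1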
CUDNN=1
OPENCV=1
OPENMP=0
DEBUG=0

ARCH= -gencode arch=compute_30,code=sm_30 \
      -gencode arch=compute_35,code=sm_35 \
      -gencode arch=compute_50,code=[sm_50,compute_50] \
      -gencode arch=compute_52,code=[sm_52,compute_52]
#      -gencode arch=compute_20,code=[sm_20,sm_21] \ This one is deprecated?

Build the project:

cd darknet
make

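Prepare your own dataset. Create a VOCdevkit folder under the scripts folder, laid out as follows:

VOCdevkit
  VOC2019
    Annotations #all XML files go here
    ImageSets
      Main #train.txt and val.txt go here
    JPEGImages #all image files go here

The txt files in Main list your data file names: train.txt holds the file names of the training images and val.txt those of the validation images. The following code generates them: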

import os
import random

if __name__ == '__main__':
    source_folder = '/darknet/scripts/VOCdevkit/VOC2019/JPEGImages/'
    train_list = '/darknet/scripts/VOCdevkit/VOC2019/ImageSets/Main/train.txt'
    val_list = '/darknet/scripts/VOCdevkit/VOC2019/ImageSets/Main/val.txt'

    train_count = 0
    val_count = 0

    # Open with 'w' so that rerunning the script does not append duplicate entries.
    with open(train_list, 'w') as train_file, open(val_list, 'w') as val_file:
        for file_obj in sorted(os.listdir(source_folder)):
            # Keep only the base name; voc_label.py adds the path and extension later.
            file_name, _ = os.path.splitext(file_obj)

            # Roughly 70% of the images go to training, the rest to validation.
            if random.random() > 0.3:
                train_file.write(file_name + '\n')
                train_count += 1
            else:
                val_file.write(file_name + '\n')
                val_count += 1

    print(train_count)
    print(val_count)

Edit voc_label.py. Using the file names listed in the txt files under Main, it generates matching txt files that hold the image paths, and converts each XML annotation into a YOLO label file.

import xml.etree.ElementTree as ET
import os
from os import getcwd

sets=[('2019', 'train'), ('2019', 'val'), ] # must match your file names

classes = ["food","no_food"]                # must match the labels in your XML files

def convert(size, box):
    dw = 1./(size[0])
    dh = 1./(size[1])
    x = (box[0] + box[1])/2.0 - 1
    y = (box[2] + box[3])/2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x,y,w,h)

def convert_annotation(year, image_id):
    # Parse the VOC XML annotation and write one "class x y w h" line per object.
    tree = ET.parse('VOCdevkit/VOC%s/Annotations/%s.xml'%(year, image_id))
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    with open('VOCdevkit/VOC%s/labels/%s.txt'%(year, image_id), 'w') as out_file:
        for obj in root.iter('object'):
            difficult = obj.find('difficult').text
            cls = obj.find('name').text
            # Skip unknown classes and objects marked as difficult.
            if cls not in classes or int(difficult)==1:
                continue
            cls_id = classes.index(cls)
            xmlbox = obj.find('bndbox')
            b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
            bb = convert((w,h), b)
            out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')

wd = getcwd()

for year, image_set in sets:
    if not os.path.exists('VOCdevkit/VOC%s/labels/'%(year)):
        os.makedirs('VOCdevkit/VOC%s/labels/'%(year))
    image_ids = open('VOCdevkit/VOC%s/ImageSets/Main/%s.txt'%(year, image_set)).read().strip().split()
    list_file = open('%s_%s.txt'%(year, image_set), 'w')
    for image_id in image_ids:
        list_file.write('%s/VOCdevkit/VOC%s/JPEGImages/%s.jpg\n'%(wd, year, image_id))
        convert_annotation(year, image_id)
    list_file.close()

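os.system("cat 2019_train.txt 2019_val.txt > train.txt") # merges the generated lists; adjust as needed and keep the names consistent

Run voc_label.py. It generates 2019_train.txt, 2019_val.txt and train.txt.

Edit the parameters in cfg/voc.data:

# number of classes
classes= 2
# path to the training list
train  = /darknet/scripts/2019_train.txt
# path to the validation list
valid  = /darknet/scripts/2019_val.txt
# path to the class names file
names = data/voc.names
# directory where weights are saved
backup = backup

Edit data/voc.names, listing the label classes from your XML files: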

food
no_food

Edit cfg/yolov3-tiny.cfg, the model configuration file (this is the important part):

[net]
# Testing
#batch=1
#subdivisions=1
# Training
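# With a weak GPU, do not use the default batch=64, or you will run out of memory
batch=32
# After lowering batch, adjust subdivisions to match (each step loads batch/subdivisions images onto the GPU)
subdivisions=16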

width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 10000
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
# Set to 3*(classes+5), i.e. 3*(2+5)=21
filters=21
activation=linear



[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=2
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
# If memory is limited, set random=0 to turn off multi-scale training
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 8

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
# Set to 3*(classes+5), i.e. 3*(2+5)=21
filters=21
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=2
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
# If memory is limited, set random=0 to turn off multi-scale training
random=1
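The filters value of the convolutional layer before each [yolo] section follows directly from the number of classes, so it is easy to compute rather than remember. A tiny sketch, with `yolo_head_filters` as a hypothetical helper name:

```python
def yolo_head_filters(num_classes, anchors_per_scale=3):
    """Output channels of the conv layer feeding a [yolo] layer.

    Each anchor predicts 4 box values, 1 objectness score and one score per class.
    """
    return anchors_per_scale * (num_classes + 5)


print(yolo_head_filters(2))    # the food / no_food setup in this article
print(yolo_head_filters(80))   # COCO with 80 classes
```

If classes changes in the [yolo] sections, the filters value in the layer directly above each of them has to change with it.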

Load the pretrained model. Get the pre-trained weights yolov3-tiny.conv.15 using this command:

./darknet partial ./cfg/yolov3-tiny.cfg ./yolov3-tiny.weights ./yolov3-tiny.conv.15 15

A pretrained weight file now appears in the root directory:

X:~/darknet$ ./darknet partial ./cfg/yolov3-tiny.cfg ./yolov3-tiny.weights ./yolov3-tiny.conv.15 15
layer     filters    size              input                output
    0 conv     16  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  16  0.150 BFLOPs
    1 max          2 x 2 / 2   416 x 416 x  16   ->   208 x 208 x  16
    2 conv     32  3 x 3 / 1   208 x 208 x  16   ->   208 x 208 x  32  0.399 BFLOPs
    3 max          2 x 2 / 2   208 x 208 x  32   ->   104 x 104 x  32
    4 conv     64  3 x 3 / 1   104 x 104 x  32   ->   104 x 104 x  64  0.399 BFLOPs
    5 max          2 x 2 / 2   104 x 104 x  64   ->    52 x  52 x  64
    6 conv    128  3 x 3 / 1    52 x  52 x  64   ->    52 x  52 x 128  0.399 BFLOPs
    7 max          2 x 2 / 2    52 x  52 x 128   ->    26 x  26 x 128
    8 conv    256  3 x 3 / 1    26 x  26 x 128   ->    26 x  26 x 256  0.399 BFLOPs
    9 max          2 x 2 / 2    26 x  26 x 256   ->    13 x  13 x 256
   10 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   11 max          2 x 2 / 1    13 x  13 x 512   ->    13 x  13 x 512
   12 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024  1.595 BFLOPs
   13 conv    256  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 256  0.089 BFLOPs
   14 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   15 conv     21  1 x 1 / 1    13 x  13 x 512   ->    13 x  13 x  21  0.004 BFLOPs
   16 yolo
   17 route  13
   18 conv    128  1 x 1 / 1    13 x  13 x 256   ->    13 x  13 x 128  0.011 BFLOPs
   19 upsample            2x    13 x  13 x 128   ->    26 x  26 x 128
   20 route  19 8
   21 conv    256  3 x 3 / 1    26 x  26 x 384   ->    26 x  26 x 256  1.196 BFLOPs
   22 conv     21  1 x 1 / 1    26 x  26 x 256   ->    26 x  26 x  21  0.007 BFLOPs
   23 yolo
Loading weights from ./yolov3-tiny.weights...Done!
Saving weights to ./yolov3-tiny.conv.15

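Start training.

sudo ./darknet detector train cfg/voc.data cfg/yolov3-tiny.cfg yolov3-tiny.conv.15

Test.

./darknet detector test cfg/voc.data cfg/yolov3-tiny.cfg backup/yolov3-tiny_final.weights data/hand.jpg

Result.

II. PyTorch version

The version used here is https://github.com/ultralytics/yolov3

Prepare the dataset: put Annotations (XML) and JPEGImages (jpg) under the data directory, create the folders ImageSets and labels, then copy JPEGImages and rename the copy to images.

Run makeTxt.py in the root directory to split the data into a training set (train), a test set (test) and a validation set (val); the ratios can be set in the code. Each txt file lists image file names: train.txt holds the training images and val.txt the validation images. The code is as follows: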

import os
import random

trainval_percent = 0.2  # fraction of all samples held out of training (test + val)
train_percent = 0.8     # fraction of the held-out samples written to test.txt; the rest go to val.txt
xmlfilepath = 'data/Annotations'
txtsavepath = 'data/ImageSets'
total_xml = os.listdir(xmlfilepath)

num = len(total_xml)
indices = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(indices, tv)
test_ids = set(random.sample(trainval, tr))
trainval = set(trainval)

with open(os.path.join(txtsavepath, 'trainval.txt'), 'w') as ftrainval, \
     open(os.path.join(txtsavepath, 'test.txt'), 'w') as ftest, \
     open(os.path.join(txtsavepath, 'train.txt'), 'w') as ftrain, \
     open(os.path.join(txtsavepath, 'val.txt'), 'w') as fval:
    for i in indices:
        # Strip the .xml extension to get the bare sample name.
        name = os.path.splitext(total_xml[i])[0] + '\n'
        if i in trainval:
            ftrainval.write(name)
            if i in test_ids:
                ftest.write(name)
            else:
                fval.write(name)
        else:
            ftrain.write(name)

Run voc_label.py in the root directory. It writes the contents of labels and creates train.txt, test.txt and val.txt under the data directory. Unlike the earlier train.txt, these files hold the full image path rather than just the file name.

import xml.etree.ElementTree as ET
import os
from os import getcwd

sets = ['train', 'test', 'val']

classes = ["food","no_food"]  # We only detect whether the hand is holding something, so there are just these two classes


def convert(size, box):
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)


def convert_annotation(image_id):
    in_file = open('data/Annotations/%s.xml' % (image_id))
    out_file = open('data/labels/%s.txt' % (image_id), 'w')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        if cls not in classes or int(difficult) == 1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text),
             float(xmlbox.find('ymax').text))
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')


wd = getcwd()
print(wd)
for image_set in sets:
    if not os.path.exists('data/labels/'):
        os.makedirs('data/labels/')
    image_ids = open('data/ImageSets/%s.txt' % (image_set)).read().strip().split()
    list_file = open('data/%s.txt' % (image_set), 'w')
    for image_id in image_ids:
        list_file.write('data/images/%s.jpg\n' % (image_id))
        convert_annotation(image_id)
    list_file.close()

The generated files now appear in the data directory.

Edit the parameters in data/voc.data:

classes= 2
train=data/train.txt
valid=data/test.txt
names=data/voc.names
backup=backup/
eval=coco

Edit data/voc.names, listing the label classes from your XML files:

food
no_food

Download the model and fine-tune. Download the official code from https://github.com/pjreddie/darknet, run the script below, and move the resulting yolov3-tiny.conv.15 into the weights directory:

wget https://pjreddie.com/media/files/yolov3-tiny.weights
./darknet partial ./cfg/yolov3-tiny.cfg ./yolov3-tiny.weights ./yolov3-tiny.conv.15 15

Start training.

python train.py --data-cfg data/voc.data --cfg cfg/yolov3-tiny.cfg

100 epochs completed in 0.375 hours.
               Class    Images   Targets         P         R       mAP        F1
Computing mAP: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:01<00:00, 12.26it/s]
                 all       128       128     0.964     0.992     0.991     0.977
                food       128        64         1     0.984     0.984     0.992
             no_food       128        64     0.928         1     0.997     0.962

Test.

python detect.py --data-cfg data/voc.data --cfg cfg/yolov3-tiny.cfg --weights weights/best.pt

Result.

References:

https://pjreddie.com/media/files/papers/YOLOv3.pdf

https://www.jianshu.com/p/d13ae1055302
