Building Your Own Image Detection Project with PaddleDetection (Part 1)

Article link: https://blog.csdn.net/weixin_43134049/article/details/106242818

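Preface

I built this project by following the blog post by 富土康一号质检员 (link), so it helps to read the two side by side while you use PaddleDetection.
The dataset is different, though: I used one that I spent two days annotating myself. Because a custom dataset differs from the official one, I describe the problems I ran into, and how I solved them, in more detail.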

First, the results:
Single-image detection:
[image]
Detection through a webcam:

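From the official PaddleDetection introduction:
PaddleDetection, released by PaddlePaddle, is an end-to-end object detection development kit that aims to help developers complete the whole workflow faster and better, from training a detection model through accuracy and speed optimization to deployment. With its modular design, PaddleDetection implements many mainstream object detection algorithms, provides a rich set of modules for data augmentation, network components, and loss functions, and integrates model compression and cross-platform high-performance deployment. Projects already shipped with PaddleDetection cover industrial quality inspection, remote sensing image detection, unmanned inspection, and other fields.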

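PaddleDetection on GitHub: download link
Because GitHub downloads can be painfully slow in China, here is also a site that helps with GitHub downloads: just paste in the URL of the project page you want (link).
(That's right, you win again! Don't forget to give me a like, so that once my CSDN account levels up I can change my blog wallpaper.)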

[image]
My development environment:
Python version: Python 3.7
IDE: PyCharm
Operating system: Windows 7 x64
Paddle version: PaddlePaddle 1.7

Note: during the model prediction step, PyCharm showed the following deprecation warning:
DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
So the prediction step will probably fail for anyone on Python 3.8.
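The warning above is raised by code that imports abstract base classes directly from `collections`. As a rough illustration (this is my own sketch, not code taken from PaddleDetection), the import style that avoids the warning looks like this. For reference, the old aliases were finally removed in Python 3.10.

```python
# Abstract base classes have lived in collections.abc since Python 3.3.
# Importing them from collections is the deprecated form the warning refers to.
try:
    from collections.abc import Mapping, Sequence
except ImportError:  # fallback for very old interpreters only
    from collections import Mapping, Sequence


def is_sequence(value):
    """Return True for list-like values, excluding plain strings."""
    return isinstance(value, Sequence) and not isinstance(value, str)


print(is_sequence([1, 2, 3]))          # True
print(is_sequence("abc"))              # False
print(isinstance({"a": 1}, Mapping))   # True
```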

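One more thing: if you already have PaddlePaddle installed, remember to update it to the latest version. If you'd rather not update, check the requirements listed in the PaddleDetection GitHub repository (see the image); if your version meets them, you don't need to update.
[image]
OK, let's begin!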

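I. Environment Setup

1. If you haven't installed a Python environment yet: Anaconda welcomes you
(Requirements: Python 2 must be 2.7.15+, Python 3 must be 3.5.1+/3.6/3.7, pip/pip3 must be 9.0.1+, Python and pip must both be 64-bit, and the operating system must be 64-bit.)

I still recommend opening the project in an IDE such as PyCharm, because jumping between modules is easy there, which makes runtime errors easier to track down. Once everything is set up you can call it from Jupyter or similar.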

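2. Install the paddle module for Python: see the official installation guide (link)

3. Install the modules PaddleDetection needs and configure the environment variable

(Without the environment variable, the modules inside PaddleDetection can't find the other modules they import. You could instead handle this with the os module inside each file, but there are quite a few files.)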
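For the in-file alternative mentioned above, a minimal sketch might look like the following. It assumes the script is started from the PaddleDetection root folder, and `add_project_root` is a helper name I made up for illustration.

```python
import os
import sys


def add_project_root(root):
    """Put the project root at the front of sys.path (only once)."""
    root = os.path.abspath(root)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# Assumption: the current working directory is the PaddleDetection root.
PROJECT_ROOT = add_project_root(os.getcwd())
print(PROJECT_ROOT in sys.path)  # True
```

Every script that imports from ppdet would need these lines before the import, which is why setting PYTHONPATH once is less work.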

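3.1 On Linux,
run the following commands (the % and ! prefixes are Jupyter notebook syntax).
(Make sure you cd into the right folder first; see: how to batch-install packages with pip from a txt file in Python,
and also: how to add PYTHONPATH on Linux.)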

%cd PaddleDetection
!pip install -r requirements.txt

%env PYTHONPATH=.:$PYTHONPATH
%env CUDA_VISIBLE_DEVICES=0

3.2 A few more words about Windows, which is my environment:

pip install -r requirements.txt

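If this fails, open the requirements.txt file you downloaded from GitHub and install the packages listed in it by hand.
[image]
Next, set the environment variable on Windows.
I recommend setting it permanently:
as shown below, right-click My Computer and choose Properties,
open Advanced system settings,
then open Environment Variables,
and under System variables find PYTHONPATH (create it if it doesn't exist).
[image]
Then edit it and add your project's path. (One odd thing: close your IDE before doing this, otherwise the change gets reverted.)

Then run
python ppdet/modeling/tests/test_architectures.py
If it prints
Ran 12 tests in 6.945s
OK (skipped=2)
as shown below, PaddleDetection is set up correctly.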

(On Linux, run
export PYTHONPATH=$(pwd):$PYTHONPATH
python ppdet/modeling/tests/test_architectures.py)
[image]
That more or less completes the environment setup. Next comes sample preparation.

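II. Preparing the Sample Data

In this step we use labelImg to annotate the images we prepared (that is, the set of images containing the objects we want to detect). In my case I prepared nearly 1,400 images (annotating them was exhausting...), each containing the two things I wanted to detect: a tin of cooling balm (qingliangyou) and a hand.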

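1. Download the annotation tool labelImg

Download link
I recommend version 1.8.0, which runs straight away without installation (version 1.8.1 has a bug in use).
[image]
(If you are on Linux, want the latest version, or want to modify the tool's internals to make annotation more accurate, you can also download the source and run it after a series of commands. For how to use the source, see the Installation section further down its GitHub page: github_labelimg.)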

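Update, 2022-01-19: when labelImg saves an annotation, the image path it writes into the xml is an absolute path by default. If you later move the dataset to a different location for training, and the training program finds the images by parsing the <path>your path</path> element, it will fail!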
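One possible way around this is to rewrite each `<path>` element so that it only holds the image file name. The sketch below is my own workaround, not part of labelImg or PaddleDetection; `strip_absolute_path` is a made-up helper name, and whether a bare file name is enough depends on how your training code resolves paths.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def strip_absolute_path(xml_file):
    """Replace the absolute <path> value with the bare image file name."""
    tree = ET.parse(xml_file)
    node = tree.getroot().find("path")
    if node is None or not node.text:
        return None
    # labelImg on Windows writes backslashes, so normalise before splitting.
    node.text = node.text.replace("\\", "/").split("/")[-1]
    tree.write(xml_file, encoding="utf-8")
    return node.text


# Self-contained demo on a temporary annotation file.
demo = os.path.join(tempfile.mkdtemp(), "a.xml")
with open(demo, "w", encoding="utf-8") as f:
    f.write("<annotation><filename>a.jpg</filename>"
            "<path>E:\\data\\JPEGImages\\a.jpg</path></annotation>")

print(strip_absolute_path(demo))  # a.jpg
```

To process a whole dataset you would call the function on every xml file in the Annotations folder, ideally on a backup copy first.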

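Once installed, it looks like this when opened:
[image]

2. Use labelImg to generate an xml file for each image in your set.

For how to use it, see this blogger's post (link),
or the dataset preparation part of 富土康一号质检员's post (link). I won't repeat it here.

We end up with two folders (the jpg images and their xml files are stored as follows):
Annotations stores the xml files
JPEGImages stores the corresponding jpg images
[image]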

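3. Process the xml files with code

Note: if you're not comfortable with file paths in Python, read my post on reading paths in Python before this part.
This matters because PaddleDetection uses a lot of path-joining functions. If a slash points the wrong way, or the first letter of a file name combines with the backslash in front of it to form an escape sequence, you get an error,
just like the people (me included, at first) commenting at the bottom of 富土康一号质检员's post:
[image]
The advice given there, to switch to an absolute path, isn't really right, because an absolute path that happens to contain one of Python's escape sequences still fails (how do I know? It's a long and painful story).
When I have time I'll try to write an escape-handling module and ask the person maintaining the PaddleDetection code on GitHub to add it...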
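To make the escape problem concrete, here is a small demonstration of my own (the file names are just examples). Escape processing applies to string literals written in source code, so the usual fixes are a raw string, forward slashes, or `os.path.join`.

```python
import os

# "\v" in a normal string literal is the vertical-tab escape,
# so this is NOT the path it appears to be.
broken = "dataset\val.txt"

raw = r"dataset\val.txt"                     # raw string keeps the backslash
forward = "dataset/val.txt"                  # forward slashes also work on Windows
joined = os.path.join("dataset", "val.txt")  # let Python choose the separator

print(len(broken), len(raw))  # 14 15
print("\v" in broken)         # True
print(repr(broken))
```

The length difference shows that the backslash and the v collapsed into a single character in the first string.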

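1.
First, put the two folders from above (Annotations with the xml files and JPEGImages with the matching jpg images) into a folder you create under dataset in PaddleDetection. (You can also take my shortcut, since the model I train later is the one set up for fruit: delete everything inside dataset/fruit and put the two folders there, as in the path at the top of the image above.)

(From here on I assume your folders are inside fruit.)

2.
Next, create a py file in the fruit folder to generate two files, train.txt and val.txt.
(The script extracts the xml file names and splits them into two txt files, because training needs a training set and a validation set. The split here is 7:3, which you can change in the code below.)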

```python
import os

train_percent = 0.7  # share of samples used for training; the rest go to validation
xml_dir = "E:/paddledetection/fruit/Annotations"  # path of the Annotations folder
save_dir = "E:/paddledetection/fruit"  # folder where train.txt and val.txt are written

# Keep only the .xml files so stray files do not end up in the lists.
total_xml = sorted(f for f in os.listdir(xml_dir) if f.endswith(".xml"))
num_train = int(len(total_xml) * train_percent)

with open(os.path.join(save_dir, "train.txt"), "w") as ftrain, \
        open(os.path.join(save_dir, "val.txt"), "w") as fval:
    for i, xml_name in enumerate(total_xml):
        name = os.path.splitext(xml_name)[0] + "\n"  # file name without .xml
        if i < num_train:
            ftrain.write(name)
        else:
            fval.write(name)
```

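After processing, the files look like this: each line is simply an xml file name without the .xml suffix.
[image]
3.
On top of this, create another py file in the same fruit folder. (The prototype of this program comes from 富土康一号质检员's post. To avoid path lookup errors, I removed or commented out the code that searches for train.txt and val.txt and wrote their paths in directly, and I prefixed the names of the generated files with "m_".
Just run this program in the same folder as train.txt and val.txt and it generates the corresponding files.)
(What it does: it reads the train.txt and val.txt created earlier, pairs each file name with its image and xml file, and stores the pairs in txt files.)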

import os.path as osp
import random

devkit_dir = './'


def get_dir(devkit_dir, type):
    return osp.join(devkit_dir, type)


def read_pairs(list_path, annotation_dir, img_dir, added):
    # Read one list file and return (image path, xml path) pairs.
    pairs = []
    with open(list_path) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            name_prefix = line.strip().split()[0]
            if name_prefix in added:  # guard against duplicates: skip names already seen
                continue
            added.add(name_prefix)
            # Join with "/" on purpose so the output never contains a backslash.
            ann_path = annotation_dir + "/" + name_prefix + '.xml'
            img_path = img_dir + "/" + name_prefix + '.jpg'
            # Uncomment to fail early when a file is missing:
            # assert osp.isfile(ann_path), 'file %s not found.' % ann_path
            # assert osp.isfile(img_path), 'file %s not found.' % img_path
            pairs.append((img_path, ann_path))
    return pairs


def walk_dir(devkit_dir):
    annotation_dir = get_dir(devkit_dir, 'Annotations')
    img_dir = get_dir(devkit_dir, 'JPEGImages')
    added = set()
    # Paths of train.txt and val.txt. If they are not in the same folder as this
    # script, write the full path and use "\\" (or "/") as the separator,
    # because "\v" in a string literal is treated as an escape sequence.
    trainval_list = read_pairs("train.txt", annotation_dir, img_dir, added)
    test_list = read_pairs("val.txt", annotation_dir, img_dir, added)
    return trainval_list, test_list


def prepare_filelist(devkit_dir, output_dir):
    trainval_list, test_list = walk_dir(devkit_dir)
    random.shuffle(trainval_list)
    with open(osp.join(output_dir, 'm_train.txt'), 'w') as ftrainval:
        for item in trainval_list:
            ftrainval.write(item[0] + ' ' + item[1] + '\n')

    with open(osp.join(output_dir, 'm_val.txt'), 'w') as ftest:
        for item in test_list:
            ftest.write(item[0] + ' ' + item[1] + '\n')


if __name__ == '__main__':
    prepare_filelist(devkit_dir, '.')

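The output files are named m_train.txt and m_val.txt, rather than train.txt and val.txt as in the original program, to avoid escape-sequence errors when the training program reads them later. As noted in my code comment above, the \v in \val.txt gets treated as an escape sequence. By the way, that \ is put there by Python's path-joining function, so you can't change it.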

The generated file has this format:
./JPEGImages/20200507173034.jpg ./Annotations/20200507173034.xml
as shown below:

[image]
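Before training, it can be worth checking that every path in the generated list actually exists. The following is a small sketch of my own for that check; `find_missing` is a made-up helper name, and the demo uses a temporary folder instead of the real fruit folder.

```python
import os
import tempfile


def find_missing(list_file, base_dir):
    """Return every path listed in list_file that does not exist under base_dir."""
    missing = []
    with open(list_file) as f:
        for line in f:
            # Each line holds "image_path xml_path", relative to base_dir.
            for rel_path in line.split()[:2]:
                if not os.path.isfile(os.path.join(base_dir, rel_path)):
                    missing.append(rel_path)
    return missing


# Demo in a throwaway folder: the image exists, the annotation does not.
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "JPEGImages"))
open(os.path.join(base, "JPEGImages", "a.jpg"), "w").close()
list_path = os.path.join(base, "m_train.txt")
with open(list_path, "w") as f:
    f.write("./JPEGImages/a.jpg ./Annotations/a.xml\n")

print(find_missing(list_path, base))  # ['./Annotations/a.xml']
```

For the real dataset you would pass the paths of m_train.txt and m_val.txt together with the fruit folder.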
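4.
Create label_list.txt to hold the classes you annotated in the xml files (keep it in the fruit folder as well).
As for the format: one class per line, plus one extra class.
For example, in the image below I only annotated "qingliangyou" and "three", and never annotated "guagua".

The extra guagua (haha would work just as well) is there because the network built for this fruit model returns 3 classes by default, which matches num_classes: 3 in the config file. (I worked this out from the return value outs in tools/infer.py after hitting an error, so don't blame me if I'm wrong. If you want to analyze it properly, look at that spot in tools/infer.py and follow the calls into the model settings.)
[image]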
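If you're not sure which class names ended up in your annotations, you can collect them from the xml files instead of typing them by hand. This is a sketch of my own, assuming the usual labelImg layout with an `<object><name>` element per box. It only writes the labels it finds, so the extra placeholder class described above would still have to be added manually.

```python
import glob
import os
import tempfile
import xml.etree.ElementTree as ET
from collections import Counter


def count_labels(annotation_dir):
    """Count how often each <object><name> occurs across all xml files."""
    counts = Counter()
    for xml_file in glob.glob(os.path.join(annotation_dir, "*.xml")):
        for obj in ET.parse(xml_file).findall("object"):
            counts[obj.find("name").text] += 1
    return counts


# Demo with one temporary annotation file containing two objects.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "a.xml"), "w") as f:
    f.write("<annotation>"
            "<object><name>qingliangyou</name></object>"
            "<object><name>three</name></object>"
            "</annotation>")

counts = count_labels(demo_dir)
print(sorted(counts.items()))  # [('qingliangyou', 1), ('three', 1)]

# Write one class name per line.
label_file = os.path.join(demo_dir, "label_list.txt")
with open(label_file, "w") as f:
    f.write("\n".join(sorted(counts)) + "\n")
```

A label with a suspiciously low count in the output is often a typo.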

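III. Editing the Model Configuration File

The model config file used here is yolov3_mobilenet_v1_fruit.yml in PaddleDetection's configs folder. You can of course use a different model, in which case you'd use its config file instead.
The settings include, for example,
maximum number of iterations: max_iters
source of the pretrained model: pretrain_weights
data path: dataset_dir
batch size: batch_size
Tuning these takes experience, so if you don't have any, leave them alone. Whether training on your own machine uses the GPU or the CPU does need to be set, though.

Here is my modified version. The main things to change are
dataset_dir:
anno_path:
and use_gpu:
(my computer is old, so I ran and debugged on the CPU)
Note that below, m_train.txt and m_val.txt use relative paths while label_list.txt uses an absolute path.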

architecture: YOLOv3
use_gpu: false
max_iters: 20000
log_smooth_window: 20
save_dir: output
snapshot_iter: 200
metric: VOC
map_type: 11point
pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1.tar
weights: output/yolov3_mobilenet_v1_fruit/best_model
num_classes: 3
finetune_exclude_pretrained_params: ['yolo_output']
use_fine_grained_loss: false

YOLOv3:
  backbone: MobileNet
  yolo_head: YOLOv3Head

MobileNet:
  norm_type: sync_bn
  norm_decay: 0.
  conv_group_scale: 1
  with_extra_blocks: false

YOLOv3Head:
  anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
  anchors: [[10, 13], [16, 30], [33, 23],
            [30, 61], [62, 45], [59, 119],
            [116, 90], [156, 198], [373, 326]]
  norm_decay: 0.
  yolo_loss: YOLOv3Loss
  nms:
    background_label: -1
    keep_top_k: 100
    nms_threshold: 0.45
    nms_top_k: 1000
    normalized: false
    score_threshold: 0.01

YOLOv3Loss:
  # batch_size here is only used for fine grained loss, not used
  # for training batch_size setting, training batch_size setting
  # is in configs/yolov3_reader.yml TrainReader.batch_size, batch
  # size here should be set as same value as TrainReader.batch_size
  batch_size: 8
  ignore_thresh: 0.7
  label_smooth: true

LearningRate:
  base_lr: 0.00001
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones:
    - 15000
    - 18000
  - !LinearWarmup
    start_factor: 0.
    steps: 100

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0005
    type: L2

_READER_: 'yolov3_reader.yml'
# will merge TrainReader into yolov3_reader.yml
TrainReader:
  inputs_def:
    image_shape: [3, 608, 608]
    fields: ['image', 'gt_bbox', 'gt_class', 'gt_score']
    num_max_boxes: 50
  dataset:
    !VOCDataSet
    dataset_dir: ../dataset/fruit
    anno_path: m_train.txt
    with_background: false
    use_default_label: false
  sample_transforms:
  - !DecodeImage
    to_rgb: true
    with_mixup: false
  - !NormalizeBox {}
  - !ExpandImage
    max_ratio: 4.0
    mean: [123.675, 116.28, 103.53]
    prob: 0.5
  - !RandomInterpImage
    max_size: 0
    target_size: 608
  - !RandomFlipImage
    is_normalized: true
    prob: 0.5
  - !NormalizeImage
    mean: [0.485, 0.456, 0.406]
    std: [0.229, 0.224, 0.225]
    is_scale: true
    is_channel_first: false
  - !PadBox
    num_max_boxes: 50
  - !BboxXYXY2XYWH {}
  batch_transforms:
  - !RandomShape
    sizes: [608]
  - !Permute
    channel_first: true
    to_bgr: false
  batch_size: 1
  shuffle: true
  mixup_epoch: -1

EvalReader:
  batch_size: 1
  inputs_def:
    image_shape: [3, 608, 608]
    fields: ['image', 'im_size', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult']
    num_max_boxes: 50
  dataset:
    !VOCDataSet
    dataset_dir: ../dataset/fruit
    anno_path: m_val.txt
    use_default_label: false
    with_background: false

TestReader:
  batch_size: 1
  dataset:
    !ImageFolder
    anno_path: E:/z_paddle detection5.18/dataset/fruit/label_list.txt
    use_default_label: false
    with_background: false

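IV. Moving the Prepared Data Folder and Starting Training

First, open a command line in the project folder and run the training command (PyCharm users can simply use its built-in terminal):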

python -u tools/train.py -c configs/yolov3_mobilenet_v1_fruit.yml --use_tb=True --tb_log_dir=tb_fruit_dir/scalar --eval

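If it runs successfully, it looks like this.
When the program prints lines in this format, it really is training. In theory, the lower the loss gets, the more accurate the model. By the way, the program saves a checkpoint at a fixed interval, set by snapshot_iter in the config (200 iterations in the config above).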

2020-05-21 19:42:23,476-INFO: iter: 0, lr: 0.000000, 'loss': '21803.730469', time: 0.990, eta: 5:30:01
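If you want to follow the loss outside the terminal, a line like the one above can be parsed with a regular expression. This is a sketch of my own, and it assumes every log line has exactly the format shown here; `parse_log_line` is a made-up helper name.

```python
import re

# Matches the iter, lr and loss fields of the log format shown above.
LOG_PATTERN = re.compile(r"iter: (\d+), lr: ([\d.]+), 'loss': '([\d.]+)'")


def parse_log_line(line):
    """Extract (iteration, learning rate, loss) from one training log line."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2)), float(match.group(3))


line = ("2020-05-21 19:42:23,476-INFO: iter: 0, lr: 0.000000, "
        "'loss': '21803.730469', time: 0.990, eta: 5:30:01")
print(parse_log_line(line))  # (0, 0.0, 21803.730469)
```

Lines that don't match the pattern return None, so they can simply be skipped when reading a whole log.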

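[image]
Now the key point: press Ctrl+C to stop the program!
This run was not training on your samples. The program actually downloaded the official samples (the fruit images and their xml files) and trained on those.
The only purpose of running it here is to download the pretrained network model. By default the program downloads model files to
C:\Users\Administrator\.cache\paddle\weights (it defaults to the system drive, which may or may not be C. If you can't find it, look closely at the download path printed on the command line, or as a last resort use Everything to search for the yolov3_mobilenet_v1 folder).

Put the contents of the fruit folder you prepared earlier into C:\Users\Administrator\.cache\paddle\dataset\fruit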
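The copy step can also be scripted. The sketch below is mine: it assumes the cache folder is `.cache/paddle/dataset/fruit` under the user's home directory (my reading of the path quoted above), and `copy_dataset` is a made-up helper. The demo copies between temporary folders so that nothing real is touched.

```python
import os
import shutil
import tempfile


def copy_dataset(src_dir, dst_dir):
    """Copy every file under src_dir into dst_dir, keeping sub-folders."""
    copied = 0
    for folder, _, files in os.walk(src_dir):
        rel = os.path.relpath(folder, src_dir)
        target = os.path.normpath(os.path.join(dst_dir, rel))
        os.makedirs(target, exist_ok=True)
        for name in files:
            shutil.copy2(os.path.join(folder, name), os.path.join(target, name))
            copied += 1
    return copied


# Assumed cache location; check the path printed by the training program.
cache_dir = os.path.join(os.path.expanduser("~"), ".cache", "paddle", "dataset", "fruit")
print(cache_dir)

# Demo on temporary folders.
src = tempfile.mkdtemp()
os.makedirs(os.path.join(src, "Annotations"))
with open(os.path.join(src, "Annotations", "a.xml"), "w") as f:
    f.write("<annotation/>")
dst = os.path.join(tempfile.mkdtemp(), "fruit")
copied = copy_dataset(src, dst)
print(copied)  # 1
```

For the real dataset you would call `copy_dataset` with your prepared fruit folder as the source and `cache_dir` as the destination.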

Then run the command below again. This time, training uses the sample data you prepared yourself.

python -u tools/train.py -c configs/yolov3_mobilenet_v1_fruit.yml --use_tb=True --tb_log_dir=tb_fruit_dir/scalar --eval

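V. Tracking Down Mislabeled Samples

Mistakes made while creating the samples cause errors during training, for example a misspelled label in an xml file (I actually did misspell one).
For instance, the error below comes from a label that was meant to be three but is missing an e:
[image]
To find the offending sample, edit ppdet\data\source\voc.py. (The commented-out lines in the code below print each label together with the name of the xml file being read; remove the comment markers to use them. For other kinds of errors in the xml content, see this blog post.)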

        with open(anno_path, 'r') as fr:
            print("entered line 107: annotation list opened successfully")
            ii = 0
            while True:
                print("entered line 109")
                line = fr.readline()
                if not line:
                    break
                img_file, xml_file = [os.path.join(image_dir, x) \
                        for x in line.strip().split()[:2]]
                if not os.path.isfile(xml_file):
                    continue
                tree = ET.parse(xml_file)
                if tree.find('id') is None:
                    im_id = np.array([ct])
                else:
                    im_id = np.array([int(tree.find('id').text)])

                objs = tree.findall('object')
                # filename = tree.find('filename').text    # newly added
                im_w = float(tree.find('size').find('width').text)
                im_h = float(tree.find('size').find('height').text)
                gt_bbox = np.zeros((len(objs), 4), dtype=np.float32)
                gt_class = np.zeros((len(objs), 1), dtype=np.int32)
                gt_score = np.ones((len(objs), 1), dtype=np.float32)
                is_crowd = np.zeros((len(objs), 1), dtype=np.int32)
                difficult = np.zeros((len(objs), 1), dtype=np.int32)
                for i, obj in enumerate(objs):
                    cname = obj.find('name').text
                    # print("cname: " + cname)
                    # print(ii)
                    # ii = ii + 1
                    # print(filename)  # newly added
                    gt_class[i][0] = cname2cid[cname]
                    _difficult = int(obj.find('difficult').text)
                    x1 = float(obj.find('bndbox').find('xmin').text)
                    y1 = float(obj.find('bndbox').find('ymin').text)
                    x2 = float(obj.find('bndbox').find('xmax').text)
                    y2 = float(obj.find('bndbox').find('ymax').text)
                    x1 = max(0, x1)
                    y1 = max(0, y1)
                    x2 = min(im_w - 1, x2)
                    y2 = min(im_h - 1, y2)
                    gt_bbox[i] = [x1, y1, x2, y2]
                    is_crowd[i][0] = 0
                    difficult[i][0] = _difficult
                voc_rec = {
                    'im_file': img_file,
                    'im_id': im_id,
                    'h': im_h,
                    'w': im_w,
                    'is_crowd': is_crowd,
                    'gt_class': gt_class,
                    'gt_score': gt_score,
                    'gt_bbox': gt_bbox,
                    'difficult': difficult
                }
                if len(objs) != 0:
                    records.append(voc_rec)
                # print("records, checkpoint 1")
                # print(records)

                ct += 1
                if self.sample_num > 0 and ct >= self.sample_num:
                    break
        # print("records, checkpoint 2")
        # print(records)
        assert len(records) > 0, 'not found any voc record in %s' % (
            self.anno_path)
        logger.info('{} samples in file {}'.format(ct, anno_path))
        self.roidbs, self.cname2cid = records, cname2cid
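If you'd rather not edit voc.py, a standalone scan of the Annotations folder can find the same kind of mistake. The sketch below is my own and not part of PaddleDetection; `find_unknown_labels` is a made-up helper that reports every label not in the set of known class names.

```python
import glob
import os
import tempfile
import xml.etree.ElementTree as ET


def find_unknown_labels(annotation_dir, known_labels):
    """Return (xml file name, label) pairs whose label is not in known_labels."""
    problems = []
    for xml_file in sorted(glob.glob(os.path.join(annotation_dir, "*.xml"))):
        for obj in ET.parse(xml_file).findall("object"):
            label = obj.find("name").text
            if label not in known_labels:
                problems.append((os.path.basename(xml_file), label))
    return problems


# Demo: one file has the misspelled label "thre".
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "bad.xml"), "w") as f:
    f.write("<annotation><object><name>thre</name></object></annotation>")
with open(os.path.join(demo_dir, "good.xml"), "w") as f:
    f.write("<annotation><object><name>three</name></object></annotation>")

known = {"qingliangyou", "three", "guagua"}
print(find_unknown_labels(demo_dir, known))  # [('bad.xml', 'thre')]
```

In practice the known labels would be read from label_list.txt, one name per line.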

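VI. Single-Image Prediction with the Trained Model

If training went well, the configured output\yolov3_mobilenet_v1_fruit folder will contain lots of files in groups of three, like the ones below.
[image]

Point to the best_model.pdparams file when running prediction. (The command below predicts a single image. To predict a whole folder, pass the folder path with --infer_dir instead of --infer_img.)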

python -u tools/infer.py -c configs/yolov3_mobilenet_v1_fruit.yml -o weights=inference_model\best_model --infer_img=demo/orange_71.jpg --output_dir=infer_output

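That completes the first stage of image prediction,
which gives the result shown at the start of the article.
[image]
This post follows up on an earlier one in which I set myself a goal:
(I've been trying to reach Legend in Hearthstone lately and being stuck at Diamond 5 is painful, so here's a goal: later on I'll build an app that climbs the Hearthstone ladder automatically, using image recognition and adversarial neural networks.)

Finally, thanks to 不会debug的学习委员 from the Baidu Deep Learning Bootcamp (NLP) for the advice after I hit a pile of errors.
Thanks also to the PaddlePaddle engineers for their hard work (for someone like me whose English isn't good, reading TensorFlow's foreign-language docs is hard to put into words...).

The part about video prediction with a webcam requires modifying
the code in deploy/python/infer.py. Because this fruit model was reused as-is, it's probably not deep enough, and my sample set is too small, so webcam detection is a bit awkward, and my modified program still has some strange bugs. In a while I'll train a different model and write a post on webcam prediction, and I'll put the code on GitHub then.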