Problems Encountered When Reproducing YOLOX as a Newcomer (Beginner's Guide)

tan90渡

Last modified: 2022-07-26 10:10:59


First published: 2022-07-14 11:08:28

Article link: https://blog.csdn.net/weixin_43722349/article/details/125714753


Reproducing YOLOX as a Newcomer (Beginner's Guide): A Step-by-Step Tutorial

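I. Reproduction Steps

Environment used in this post: CUDA 11.3, single GPU.
Setting up a deep learning environment from scratch: "Deep Learning Environment Setup from Scratch"
Beginner tutorials:
Detailed tutorial 1
Beginner tutorial 2
Datasets used in this post: COCO2017 and a self-made UAV dataset (annotated with labelme), both converted to VOC format.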

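II. Problems Encountered (in the Tutorials and During Reproduction)

1. Do not install the default package list directly into a fresh environment

What goes wrong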

pip install -r requirements.txt

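If no matching torch and torchvision are installed first, running the command above directly leads to many version incompatibilities.

Recommended installation
(1) Install torch and torchvision from the official website, choosing the build that matches your CUDA version.
PyTorch official website
(2) Install the remaining packages one at a time:

pip install <package> -i <mirror-url>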

pip install numpy -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install opencv_python -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install loguru -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install scikit-image -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install Pillow -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install thop -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install ninja -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install tabulate -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install tensorboard -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install tqdm -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install pycocotools==2.0.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install onnxruntime==1.8.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install onnx==1.8.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install onnx-simplifier==0.3.5 -i https://pypi.tuna.tsinghua.edu.cn/simple
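
After installing the packages one by one, it can help to confirm what actually ended up in the environment. The following is a minimal sketch and not part of the original workflow: `installed_versions` is a hypothetical helper that uses the standard library's `importlib.metadata` (available from Python 3.8; on an older interpreter such as the Python 3.6 shown in the traceback later in this post, the `importlib_metadata` backport would be needed instead). It reports the installed version of each distribution, or `None` if it is missing.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names taken from the pip commands above, plus torch/torchvision.
    wanted = ["torch", "torchvision", "numpy", "opencv-python", "loguru",
              "pycocotools", "onnx", "onnxruntime", "onnx-simplifier"]
    for name, version in installed_versions(wanted).items():
        print(f"{name:16s} {version or 'NOT INSTALLED'}")
```

Any line reporting NOT INSTALLED, or a torch/torchvision pair that does not match your CUDA version, is worth fixing before training.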

3. Type error when training on a custom dataset (labelme-to-voc)

invalid literal for int() with base 10

2022-07-08 15:42:22 | INFO     | yolox.core.trainer:261 - epoch: 10/300, iter: 320/325, mem: 4909Mb, iter_time: 0.203s, data_time: 0.004s, total_loss: 1.6, iou_loss: 0.9, l1_loss: 0.0, conf_loss: 0.4, cls_loss: 0.3, lr: 6.235e-04, size: 512, ETA: 5:28:02
2022-07-08 15:42:23 | INFO     | yolox.core.trainer:356 - Save weights to ./YOLOX_outputs/VOC0707
100%|##########| 325/325 [00:21<00:00, 14.80it/s]
2022-07-08 15:42:46 | INFO     | yolox.evaluators.voc_evaluator:160 - Evaluate in main process...
Writing 0 VOC results file
Eval IoU : 0.50
2022-07-08 15:42:48 | INFO     | yolox.core.trainer:196 - Training of experiment is done and the best AP is 0.00
2022-07-08 15:42:48 | ERROR    | yolox.core.launch:147 - An error has been caught in function '_distributed_worker', process 'SpawnProcess-1' (1800038), thread 'MainThread' (139871294735552):
Traceback (most recent call last):
...
  File "/home/sunxue/anaconda3/envs/py36-new/lib/python3.6/site-packages/yolox-0.3.0-py3.6-linux-x86_64.egg/yolox/evaluators/voc_eval.py", line 27, in parse_rec
    int(bbox.find("xmin").text),
        │    └ <method 'find' of 'xml.etree.ElementTree.Element' objects>
        └ <Element 'bndbox' at 0x7f352dfa1b88>

ValueError: invalid literal for int() with base 10: '922.5190839694656'

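The annotations converted from labelme store coordinates as floats, so the source code has to be modified to parse them.
File: yolox/evaluators/voc_eval.py, line 27
Change the four coordinate lines to: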

int(float(bbox.find("xmin").text)),
int(float(bbox.find("ymin").text)),
int(float(bbox.find("xmax").text)),
int(float(bbox.find("ymax").text)),
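
To see why the extra `float(...)` is needed, here is a small self-contained sketch of the same parsing step. `parse_bndbox` is a hypothetical helper written for illustration, not a function in YOLOX; it only mirrors the four patched lines above on a sample `<bndbox>` element that uses the float value from the error message.

```python
import xml.etree.ElementTree as ET


def parse_bndbox(bndbox):
    """Return [xmin, ymin, xmax, ymax] as ints, accepting '922' or '922.519'."""
    # int('922.519') raises ValueError; int(float('922.519')) truncates to 922.
    return [int(float(bndbox.find(tag).text))
            for tag in ("xmin", "ymin", "xmax", "ymax")]


sample = ET.fromstring(
    "<bndbox><xmin>922.5190839694656</xmin><ymin>10</ymin>"
    "<xmax>1000.9</xmax><ymax>20.0</ymax></bndbox>"
)
print(parse_bndbox(sample))  # prints [922, 10, 1000, 20]
```

Note that `int(float(...))` truncates toward zero rather than rounding, so a box edge at 1000.9 becomes 1000.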

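4. KeyError: 'airplane' (the name of a custom class)

YOLOX needs to be reinstalled. The traceback above shows YOLOX being loaded from an installed copy in site-packages, so changes made in the source tree (such as custom class names) presumably only take effect after reinstalling: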

python setup.py install

III. Running YOLOX from the Command Line

Test the environment

python tools/demo.py image -f exps/default/yolox_s.py -c ./yolox_s.pth --path assets/dog.jpg --conf 0.3 --nms 0.65 --tsize 640 --save_result --device gpu

Training command

python tools/train.py -f exps/example/yolox_voc/yolox_voc_s.py -d 1 -b 4 --fp16  -c yolox_s.pth

Resuming training from the previous run

python tools/train.py -f exps/example/yolox_voc/yolox_voc_s.py -d 1 -b 4 -c YOLOX_outputs/yolox_voc_s/latest_ckpt.pth --resume --start_epoch=100

Testing the trained model (on a single image)

python tools/demo.py image -f exps/example/yolox_voc/yolox_voc_s.py -c YOLOX_outputs/yolox_voc_s/latest_ckpt.pth --path ./assets/dog.jpg --conf 0.3 --nms 0.65 --tsize 640 --save_result --device gpu

Evaluating the trained model (on the validation set)

python tools/eval.py -f exps/example/yolox_voc/yolox_voc_s.py -c YOLOX_outputs/yolox_voc_s/best_ckpt.pth -d 1 -b 4 --conf 0.001 --fp16

IV. VOC Dataset Processing

1. Converting a labelme dataset to a VOC dataset (labelme2xml)

import os
import numpy as np
import codecs
import json
import glob
import cv2

# 1. Paths (raw strings so that backslashes are not treated as escape sequences)
labelme_path = r"F:\data\datasets\VOC\LabelmeData/"  # directory with the original labelme annotations
saved_path = r"F:\data\datasets\VOC\VOC2007/"  # output directory

# 2. Create the folders required by the VOC layout
dst_annotation_dir = os.path.join(saved_path, 'Annotations')
os.makedirs(dst_annotation_dir, exist_ok=True)
dst_image_dir = os.path.join(saved_path, "JPEGImages")
os.makedirs(dst_image_dir, exist_ok=True)
dst_main_dir = os.path.join(saved_path, "ImageSets", "Main")
os.makedirs(dst_main_dir, exist_ok=True)

# 3. Collect the files to process (base names without extension, on any OS)
org_json_files = sorted(glob.glob(os.path.join(labelme_path, '*.json')))
org_json_file_names = [os.path.splitext(os.path.basename(p))[0] for p in org_json_files]
org_img_files = sorted(glob.glob(os.path.join(labelme_path, '*.jpg')))
org_img_file_names = [os.path.splitext(os.path.basename(p))[0] for p in org_img_files]
# 4.labelme file to voc dataset
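for i, json_file_ in enumerate(org_json_files):
    with open(json_file_, "r", encoding="utf-8") as f:
        json_file = json.load(f)
    image_path = os.path.join(labelme_path, org_json_file_names[i] + '.jpg')
    img = cv2.imread(image_path)
    if img is None:  # cv2.imread returns None when the image is missing or unreadable
        raise FileNotFoundError(image_path)
    height, width, channels = img.shape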
    dst_image_path = os.path.join(dst_image_dir, "{:06d}.jpg".format(i))
    cv2.imwrite(dst_image_path, img)
    dst_annotation_path = os.path.join(dst_annotation_dir, '{:06d}.xml'.format(i))
    with codecs.open(dst_annotation_path, "w", "utf-8") as xml:
        xml.write('<annotation>\n')
        xml.write('\t<folder>' + 'Pin_detection' + '</folder>\n')
        xml.write('\t<filename>' + "{:06d}.jpg".format(i) + '</filename>\n')
        # xml.write('\t<source>\n')
        # xml.write('\t\t<database>The UAV autolanding</database>\n')
        # xml.write('\t\t<annotation>UAV AutoLanding</annotation>\n')
        # xml.write('\t\t<image>flickr</image>\n')
        # xml.write('\t\t<flickrid>NULL</flickrid>\n')
        # xml.write('\t</source>\n')
        # xml.write('\t<owner>\n')
        # xml.write('\t\t<flickrid>NULL</flickrid>\n')
        # xml.write('\t\t<name>ChaojieZhu</name>\n')
        # xml.write('\t</owner>\n')
        xml.write('\t<size>\n')
        xml.write('\t\t<width>' + str(width) + '</width>\n')
        xml.write('\t\t<height>' + str(height) + '</height>\n')
        xml.write('\t\t<depth>' + str(channels) + '</depth>\n')
        xml.write('\t</size>\n')
        xml.write('\t\t<segmented>0</segmented>\n')
        for multi in json_file["shapes"]:
            points = np.array(multi["points"])
            xmin = min(points[:, 0])
            xmax = max(points[:, 0])
            ymin = min(points[:, 1])
            ymax = max(points[:, 1])
            label = multi["label"]
            if xmax <= xmin:
                pass
            elif ymax <= ymin:
                pass
            else:
                xml.write('\t<object>\n')
                xml.write('\t\t<name>' + label + '</name>\n')
                xml.write('\t\t<pose>Unspecified</pose>\n')
                xml.write('\t\t<truncated>1</truncated>\n')
                xml.write('\t\t<difficult>0</difficult>\n')
                xml.write('\t\t<bndbox>\n')
                xml.write('\t\t\t<xmin>' + str(xmin) + '</xmin>\n')
                xml.write('\t\t\t<ymin>' + str(ymin) + '</ymin>\n')
                xml.write('\t\t\t<xmax>' + str(xmax) + '</xmax>\n')
                xml.write('\t\t\t<ymax>' + str(ymax) + '</ymax>\n')
                xml.write('\t\t</bndbox>\n')
                xml.write('\t</object>\n')
                print(json_file_, xmin, ymin, xmax, ymax, label)
        xml.write('</annotation>')
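
The script above writes `xmin`, `ymin`, `xmax`, `ymax` exactly as labelme stores them, which is where the float values in problem 3 come from. As an alternative to patching the evaluator, the coordinates could be converted to integers before they are written. The sketch below is one way to do that under stated assumptions: `points_to_voc_box` is a hypothetical helper, and flooring the minimum, ceiling the maximum, and clamping to `[0, width]` / `[0, height]` are illustrative choices rather than anything the original script does.

```python
import math


def points_to_voc_box(points, width, height):
    """Convert labelme (x, y) points to an integer (xmin, ymin, xmax, ymax) box.

    Returns None for degenerate boxes, matching the skip logic in the script above.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Floor the minimum and ceil the maximum so the box still covers the shape,
    # then clamp to the image bounds (an assumption made for this sketch).
    xmin = max(0, math.floor(min(xs)))
    ymin = max(0, math.floor(min(ys)))
    xmax = min(width, math.ceil(max(xs)))
    ymax = min(height, math.ceil(max(ys)))
    if xmax <= xmin or ymax <= ymin:
        return None
    return xmin, ymin, xmax, ymax


print(points_to_voc_box([[922.519, 10.2], [1000.9, 20.0]], 1920, 1080))
# prints (922, 10, 1001, 20)
```

With integer coordinates in the XML, the unpatched `int(...)` parsing would no longer fail on these files.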

2. Generating the train/val/test split files (xml2txt)

import os
import random

trainval_percent = 0.95  # fraction of all samples used for train+val (adjust as needed)
train_percent = 0.95  # fraction of train+val used for training (adjust as needed)
xmlfilepath = r'F:\data\datasets\VOC/uav_VOC2007\Annotations'
txtsavepath = r'F:\data\datasets\VOC/uav_VOC2007\ImageSets\Main'
os.makedirs(txtsavepath, exist_ok=True)
# Only consider .xml files so that stray files do not end up in the splits
total_xml = sorted(f for f in os.listdir(xmlfilepath) if f.endswith('.xml'))

num = len(total_xml)
indices = range(num)  # renamed from `list` to avoid shadowing the builtin
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval_list = random.sample(indices, tv)
train = set(random.sample(trainval_list, tr))
trainval = set(trainval_list)

# The `with` block closes all four files even if an error occurs
with open(os.path.join(txtsavepath, 'trainval.txt'), 'w') as ftrainval, \
        open(os.path.join(txtsavepath, 'test.txt'), 'w') as ftest, \
        open(os.path.join(txtsavepath, 'train.txt'), 'w') as ftrain, \
        open(os.path.join(txtsavepath, 'val.txt'), 'w') as fval:
    for i in indices:
        name = total_xml[i][:-4] + '\n'  # strip the ".xml" extension
        if i in trainval:
            ftrainval.write(name)
            if i in train:
                ftrain.write(name)
            else:
                fval.write(name)
        else:
            ftest.write(name)
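
The split above changes on every run because it draws from the global random state. If a repeatable split is wanted, for example to compare experiments on the same validation set, a seeded variant along the following lines could be used. This is a sketch only: `split_names` and its `seed` parameter are hypothetical and not part of the original script, and it returns lists instead of writing files so the split sizes can be checked first.

```python
import random


def split_names(names, trainval_percent=0.95, train_percent=0.95, seed=0):
    """Split sample names into train/val/test lists, reproducibly for a given seed."""
    rng = random.Random(seed)   # local RNG, leaves the global random state untouched
    shuffled = sorted(names)    # sort first so the result does not depend on input order
    rng.shuffle(shuffled)
    tv = int(len(shuffled) * trainval_percent)  # size of train+val
    tr = int(tv * train_percent)                # size of train within train+val
    train, val, test = shuffled[:tr], shuffled[tr:tv], shuffled[tv:]
    return sorted(train), sorted(val), sorted(test)


names = ["{:06d}".format(i) for i in range(100)]
train, val, test = split_names(names)
print(len(train), len(val), len(test))
```

Each returned list can then be written to `train.txt`, `val.txt`, and `test.txt` one name per line, with `trainval.txt` being the union of the first two.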

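V. Related Tips

1. Using TensorBoard

After the code has run, the visualization files look like this:
(screenshot: TensorBoard files)
Open Anaconda Prompt and run the following command:

tensorboard --logdir=<path to the directory containing these files>

This prints a URL; open it in a browser to view the TensorBoard visualizations.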

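2. Specifying which GPUs to use for training

import os
# Set this before CUDA is initialized (in practice, before importing torch).
# GPU IDs must be separated by ASCII commas.
os.environ['CUDA_VISIBLE_DEVICES'] = '0,3,5'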
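
The value of `CUDA_VISIBLE_DEVICES` is a plain comma-separated string, and a wrong separator is easy to miss. The sketch below builds the string from a list of integers instead of typing it by hand; `select_gpus` is a hypothetical helper introduced here for illustration.

```python
import os


def select_gpus(gpu_ids):
    """Expose only the given physical GPU indices to CUDA, e.g. [0, 3, 5]."""
    value = ",".join(str(i) for i in gpu_ids)  # ASCII commas, no spaces
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


print(select_gpus([0, 3, 5]))  # prints 0,3,5
```

Inside the process, the selected devices are renumbered from 0, so physical GPUs 0, 3, and 5 appear as devices 0, 1, and 2.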

VI. Experimental Results

1. Reproduction results for yolox_s on the COCO dataset (COCO format)
