SSD-Pytorch: Converting a Darknet Dataset to VOC, Training, and Detecting Images, the Full Pipeline

1. Environment

After finally getting everything to run end to end, I can say the version combinations are a real trap.

My versions: Python 3.6 + CUDA 10.2 + PyTorch 1.7.1 + NumPy 1.15.1 + RTX 2060

(Note: people suggest downgrading PyTorch to 1.2 or below, but there is no GPU build of PyTorch ≤ 1.2 for CUDA 10.2, and reinstalling CUDA was too much trouble, so I chose to fix the resulting problems later instead.)
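Once PyTorch is installed, a quick sanity check that the GPU build actually matches:

import torch
print(torch.__version__)          # expect 1.7.1
print(torch.version.cuda)         # expect 10.2
print(torch.cuda.is_available())  # expect True with the RTX 2060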

2. Download the Project

  • Set it up yourself
    I used the SSD-Pytorch git project. It requires access to GitHub, so git clone may fail; if it does, download the zip and extract it locally.
    Project: https://github.com/amdegroot/ssd.pytorch
git clone https://gitcode.net/mirrors/amdegroot/ssd.pytorch.git

My first download failed because the GitHub page would not load; it worked once I used a VPN:

git clone https://github.com/625135449/SSD-Pytorch

3. Prepare the Dataset

My dataset is in darknet's YOLO format, so here I convert it to a VOC dataset (you could convert to COCO instead).

3.1 Data structure

darknet format: images in an images folder, one txt label file per image in a labels folder (original figure omitted).
VOC format: all xml labels in Annotations, all images in JPEGImages, and train.txt, trainval.txt, val.txt, test.txt in ImageSets/Main (these contain only the image basenames; original figure omitted).
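For reference, a sketch of the two layouts (folder names from my helmet dataset, file names are placeholders):

# darknet (YOLO) layout: one txt per image, one normalized "class cx cy w h" line per box
Helmet/
├── images/        # 0001.jpg, 0002.jpg, ...
└── labels/        # 0001.txt, 0002.txt, ...

# VOC layout
VOC2021/
├── Annotations/       # one .xml per image
├── JPEGImages/        # all the .jpg images
└── ImageSets/Main/    # train.txt, trainval.txt, val.txt, test.txt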

3.2 Converting darknet txt labels to VOC xml files

Fill in the relevant paths and class names:

import os
import glob
from PIL import Image
from tqdm import tqdm

voc_annotations = '/home/ssd.pytorch/data/VOCdevkit/VOC2021/Annotations/'  # where the generated xml files go
yolo_txt = '/home/darknet/Helmet/labels/'  # darknet dataset label files
img_path = '/home/darknet/Helmet/images/'  # darknet dataset images
labels = ['no helmet', 'wear helmet']  # darknet class names, in class-index order

# image directory
src_img_dir = img_path
# txt label directory
src_txt_dir = yolo_txt
src_xml_dir = voc_annotations
img_Lists = glob.glob(src_img_dir + '/*.jpg')
img_basenames = []

for item in img_Lists:
    img_basenames.append(os.path.basename(item))

img_names = []
for item in img_basenames:
    temp1, temp2 = os.path.splitext(item)
    img_names.append(temp1)

for img in tqdm(img_names):
    im = Image.open(src_img_dir + '/' + img + '.jpg')
    width, height = im.size

    # read the label file: one "class cx cy w h" line per box
    gt = open(src_txt_dir + '/' + img + '.txt').read().splitlines()
    # print(gt)
    if gt:
        # write the xml header
        xml_file = open((src_xml_dir + '/' + img + '.xml'), 'w')
        xml_file.write('<annotation>\n')
        xml_file.write('    <folder>VOC2007</folder>\n')
        xml_file.write('    <filename>' + str(img) + '.jpg' + '</filename>\n')
        xml_file.write('    <size>\n')
        xml_file.write('        <width>' + str(width) + '</width>\n')
        xml_file.write('        <height>' + str(height) + '</height>\n')
        xml_file.write('        <depth>3</depth>\n')
        xml_file.write('    </size>\n')

        # write one <object> region per line of the txt file
        for img_each_label in gt:
            spt = img_each_label.split(' ')  # if your txt is comma-separated, use split(',') instead
            # print(f'spt:{spt}')
            xml_file.write('    <object>\n')
            xml_file.write('        <name>' + str(labels[int(spt[0])]) + '</name>\n')
            xml_file.write('        <pose>Unspecified</pose>\n')
            xml_file.write('        <truncated>0</truncated>\n')
            xml_file.write('        <difficult>0</difficult>\n')
            xml_file.write('        <bndbox>\n')

            # YOLO stores normalized center x/y and width/height; convert to pixel corners
            center_x = round(float(spt[1].strip()) * width)
            center_y = round(float(spt[2].strip()) * height)
            bbox_width = round(float(spt[3].strip()) * width)
            bbox_height = round(float(spt[4].strip()) * height)
            xmin = str(int(center_x - bbox_width / 2))
            ymin = str(int(center_y - bbox_height / 2))
            xmax = str(int(center_x + bbox_width / 2))
            ymax = str(int(center_y + bbox_height / 2))

            xml_file.write('            <xmin>' + xmin + '</xmin>\n')
            xml_file.write('            <ymin>' + ymin + '</ymin>\n')
            xml_file.write('            <xmax>' + xmax + '</xmax>\n')
            xml_file.write('            <ymax>' + ymax + '</ymax>\n')
            xml_file.write('        </bndbox>\n')
            xml_file.write('    </object>\n')

        xml_file.write('</annotation>')
        xml_file.close()  # flush the file; the original script never closed it
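To sanity-check the conversion, you can parse one of the generated files back and confirm every box lies inside the image (the file name below is a placeholder; pick any xml the script produced):

import xml.etree.ElementTree as ET

tree = ET.parse(src_xml_dir + '/0001.xml')  # placeholder name
root = tree.getroot()
w = int(root.find('size/width').text)
h = int(root.find('size/height').text)
for obj in root.findall('object'):
    box = obj.find('bndbox')
    xmin, ymin = int(box.find('xmin').text), int(box.find('ymin').text)
    xmax, ymax = int(box.find('xmax').text), int(box.find('ymax').text)
    print(obj.find('name').text, xmin, ymin, xmax, ymax)
    assert 0 <= xmin < xmax <= w and 0 <= ymin < ymax <= h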

3.3 Auto-generating test.txt, train.txt, trainval.txt, and val.txt

Fill in the relevant paths:

import os
import random

trainval_percent = 0.66  # 66% trainval; the remaining 34% becomes test
train_percent = 0.5      # half of trainval is train, the other half val
xmlfilepath = '/home/ssd.pytorch/data/VOCdevkit/VOC2021/Annotations'
txtsavepath = '/home/ssd.pytorch/data/VOCdevkit/VOC2021/ImageSets/Main'
total_xml = os.listdir(xmlfilepath)

num = len(total_xml)   # number of xml files
indices = range(num)
tv = int(num * trainval_percent)   # 66% of the total
tr = int(tv * train_percent)
trainval = random.sample(indices, tv)
train = random.sample(trainval, tr)

ftrainval = open('/home/ssd.pytorch/data/VOCdevkit/VOC2021/ImageSets/Main/trainval.txt', 'w')
ftest = open('/home/ssd.pytorch/data/VOCdevkit/VOC2021/ImageSets/Main/test.txt', 'w')
ftrain = open('/home/ssd.pytorch/data/VOCdevkit/VOC2021/ImageSets/Main/train.txt', 'w')
fval = open('/home/ssd.pytorch/data/VOCdevkit/VOC2021/ImageSets/Main/val.txt', 'w')

for i in indices:
    name = total_xml[i][:-4] + '\n'  # strip the .xml extension
    if i in trainval:
        ftrainval.write(name)
        if i in train:
            ftrain.write(name)
        else:
            fval.write(name)
    else:
        ftest.write(name)

ftrainval.close()
ftrain.close()
fval.close()
ftest.close()
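With these percentages the split comes out to roughly 33% train, 33% val, and 34% test. An optional check on the result:

# line counts of the split files; train + val + test should sum to num,
# and trainval should equal train + val
for split in ('trainval', 'train', 'val', 'test'):
    with open(txtsavepath + '/' + split + '.txt') as f:
        print(split, sum(1 for _ in f))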

4. Working in the ssd.pytorch Project

4.1 Create the dataset

  • I used the VOC dataset format.

  • If you have no dataset, you can download VOC and COCO with the scripts bundled with the code (under ./data/scripts).

  • If you have your own dataset:

    • Create a VOCdevkit folder under data.
    • Copy the VOC2021 dataset converted above into VOCdevkit, with the structure sketched below (if you use my fork, run darknet_to_voc.py and split_txt.py under ./data/VOCdevkit/VOC2021).
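Where the dataset ends up inside the project:

ssd.pytorch/
└── data/
    └── VOCdevkit/
        └── VOC2021/
            ├── Annotations/
            ├── ImageSets/
            │   └── Main/
            └── JPEGImages/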

4.2 Modify the configuration

The following uses my data as the example; a consolidated sketch of all the edits follows this list.

  • Set up the environment
    • Download the pretrained weights vgg16_reducedfc.pth and put them into ssd.pytorch/weights (create the weights folder if it does not exist); weights download link.
    • Install pillow, opencv-python, tqdm.
    • Install numpy: 1.15.1 is recommended; newer versions raise an error (see section 5).
    • Install pytorch: download the Torch build matching your CUDA version from the official site; this blogger lists the pairings: https://blog.csdn.net/llm765800916/article/details/118146146
      My install command:
pip install torch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2
  • The voc dict in ./data/config.py

    • HOME = os.path.expanduser("~"): set it to the absolute path where the ssd.pytorch project lives (mine becomes HOME = os.path.expanduser("/home/ssd.pytorch"))
    • 'num_classes': number of classes + 1 (background counts as a class); I have 2 classes, so 3
    • 'max_iter': the number of training iterations; I set 1000 for a quick test (tune it to your machine and needs)
  • ./data/coco.py

    • Line 11: change COCO_ROOT = osp.join(HOME, 'data/coco/') to COCO_ROOT = osp.join(HOME, 'data/')
  • ./data/voc0712.py

    • Line 20: change VOC_CLASSES to your own class names;
    • Line 93: change image_sets=[('2007', 'trainval'), ('2012', 'trainval')] to your dataset's year and split files (mine is VOC2021, using train.txt and trainval.txt under ImageSets/Main), giving image_sets=[('2021', 'train'), ('2021', 'trainval')]
    • Line 95: change dataset_name='VOC0712' to dataset_name='voc0712'
  • ./train.py

    • Line 32: batch_size defaults to 32; it is safer to lower it, e.g. to 8 (the batch size is the number of samples per training step; it affects convergence speed and quality and directly determines GPU memory use, so keep it small if your GPU memory is limited)
    • Line 194: iteration % 5000 == 0 controls how often a checkpoint is saved; choose the interval to match the max_iter you set in config.py (with max_iter = 1000, the default 5000 would never save a model)
  • ./ssd.py

    • Line 32: in self.cfg = (coco, voc)[num_classes == 21], change 21 to your class count, 3
    • Line 198: in def build_ssd(phase, size=300, num_classes=21), change 21 to your class count
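Putting section 4.2 together, a sketch of all the edits (line numbers are approximate, the values are for my 2-class helmet dataset, and anything not shown stays as shipped in the repo):

# data/config.py
HOME = os.path.expanduser("/home/ssd.pytorch")   # absolute path to the project
voc = {
    'num_classes': 3,   # 2 classes + 1 for background
    'max_iter': 1000,   # short test run
    # ... remaining keys unchanged ...
}

# data/coco.py, line ~11
COCO_ROOT = osp.join(HOME, 'data/')

# data/voc0712.py, lines ~20/93/95
VOC_CLASSES = ('no helmet', 'wear helmet')
# image_sets=[('2021', 'train'), ('2021', 'trainval')]
# dataset_name='voc0712'

# train.py, lines ~32/194
# parser.add_argument('--batch_size', default=8, type=int, ...)
# if iteration != 0 and iteration % 500 == 0:   # was % 5000, which never fires with max_iter=1000

# ssd.py, lines ~32/198
# self.cfg = (coco, voc)[num_classes == 3]
# def build_ssd(phase, size=300, num_classes=3):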

5. Fixing Errors and Warnings During Training

The line numbers below may be slightly off; look a few lines above or below to locate the code.

  • error
    loss_c[pos] = 0 # filter out pos boxes for now
    IndexError: The shape of the mask [8, 8732] at index 0 does not match the shape of the indexed tensor [69856, 1] at index 0

    solved
    Go to ./layers/modules/multibox_loss.py:

    • Swap lines 97 and 98:
      loss_c[pos] = 0 # filter out pos boxes for now
      loss_c = loss_c.view(num, -1)
      becomes:
      loss_c = loss_c.view(num, -1)
      loss_c[pos] = 0 # filter out pos boxes for now
      (reshaping first gives loss_c the [num, 8732] shape the pos mask expects)
    • Line 114:
      N = num_pos.data.sum()
      becomes:
      N = num_pos.data.sum().double()
      loss_l = loss_l.double()
      loss_c = loss_c.double()
  • error
    RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'
    solved
    Install the PyTorch build matching your CUDA:

pip install torch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2
  • error
    loc_loss += loss_l.data[0]
    IndexError: invalid index of a 0-dim tensor. Use tensor.item() in Python or tensor.item<T>() in C++ to convert a 0-dim tensor to a number

    solved
    In ./train.py, change every .data[0] from line 183 onward to .data

  • error
    StopIteration
    solved
    Go to ./train.py line 165 and change
    images, targets = next(batch_iterator)
    to:
    try:
        images, targets = next(batch_iterator)
    except StopIteration:
        batch_iterator = iter(data_loader)
        images, targets = next(batch_iterator)

  • warning
    VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray

    solved
    pip install numpy==1.15.1

  • warning

    • UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
      init.xavier_uniform(param)

      solved
      train.py line 218: change init.xavier_uniform to init.xavier_uniform_
    • UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
      targets = [Variable(ann.cuda(), volatile=True) for ann in targets]

      solved
      train.py lines 173 and 176: delete volatile=True, e.g. targets = [Variable(ann.cuda(), volatile=True)
      becomes: targets = [Variable(ann.cuda())
  • training loss becomes -nan
    (screenshot omitted)

solved: ./train.py line 42: parser.add_argument('--lr', '--learning-rate', default=1e-3, type=float, help='initial learning rate'); the learning rate defaults to 1e-3, and lowering it makes the nan go away.
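For example, a run combining a lower learning rate with the smaller batch size from section 4.2 (argument names as defined in the repo's train.py; 1e-4 is just an illustrative value):

python train.py --lr 1e-4 --batch_size 8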

6. Validation After Training

6.1 Configure eval.py

  • Line 38: point --trained_model at your trained model (a successful train.py run saves checkpoints into the weights folder; I took the one with the lowest loss):
    parser.add_argument('--trained_model', default='weights/ssd_VOC_500.pth'…)

  • Line 54: change args = parser.parse_args() to args, unknown = parser.parse_known_args()

  • Lines 69, 70, 71, 73: change annopath, imgpath, imgsetpath, and YEAR (the project author used VOC2007; I built VOC2021, so they all need updating)
    e.g. annopath = os.path.join(args.voc_root, 'VOC2007', 'Annotations', '%s.xml')
    becomes: annopath = os.path.join(args.voc_root, 'VOC2021', 'Annotations', '%s.xml')

  • Line 429: dataset = VOCDetection(args.voc_root, [('2007', set_type)]…)
    becomes: dataset = VOCDetection(args.voc_root, [('2021', set_type)]…)
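Collected in one place, a sketch of the eval.py edits (line numbers approximate; the elided arguments stay as in the repo):

# eval.py
# line ~38:  parser.add_argument('--trained_model', default='weights/ssd_VOC_500.pth', ...)
# line ~54:  args, unknown = parser.parse_known_args()
# line ~69:  annopath = os.path.join(args.voc_root, 'VOC2021', 'Annotations', '%s.xml')
# line ~70:  imgpath, and line ~71: imgsetpath, get the same 'VOC2007' -> 'VOC2021' change
# line ~73:  YEAR = '2021'
# line ~429: dataset = VOCDetection(args.voc_root, [('2021', set_type)], ...)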

Running it produces the mAP results:
(screenshot omitted)

6.3 Configure test.py

  • Line 17: point the trained model at your own checkpoint

  • Line 87: in testset = VOCDetection(args.voc_root, [('2007', 'test')]…), change 2007 to 2021

(screenshot omitted)
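In short (assuming test.py exposes the same --trained_model argument as eval.py; the checkpoint name is mine, line numbers approximate, elided arguments as in the repo):

# test.py
# line ~17: parser.add_argument('--trained_model', default='weights/ssd_VOC_500.pth', ...)
# line ~87: testset = VOCDetection(args.voc_root, [('2021', 'test')], ...)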

6.4 Detect images with visualization

Drop ./demo/live_img.py into the project to get images with detection boxes: https://github.com/625135449/SSD-Pytorch/blob/main/demo/live_img.py
(screenshot omitted)

Drop ./demo/live_score.py into the project to get detection boxes with confidence scores: https://github.com/625135449/SSD-Pytorch/blob/main/demo/live_score.py
(screenshot omitted)

6.5 Errors and warnings while running eval.py

  • error
    RuntimeError: Legacy autograd function with non-static forward method is deprecated. Please use new-style autograd function with static forward method.
    solved
    Reportedly this does not occur on PyTorch below 1.2, so you can downgrade; the fix below keeps the current version and is adapted from another blogger's solution (link).

    • Go to ./ssd.py line 98 (the commented-out lines are the original code):
        if self.phase == "test":
            # output = self.detect(
            #     loc.view(loc.size(0), -1, 4),                   # loc preds
            #     self.softmax(conf.view(conf.size(0), -1,
            #                  self.num_classes)),                # conf preds
            #     self.priors.type(type(x.data))                  # default boxes
            # )
            output = self.detect.forward(
                loc.view(loc.size(0), -1, 4),  # loc preds
                self.softmax(conf.view(conf.size(0), -1,
                                       self.num_classes)),  # conf preds
                self.priors.type(type(x.data))  # default boxes
            )
  • In ./layers/box_utils.py, replace the function def nms(boxes, scores, overlap=0.5, top_k=200) with the version below (the note after this section explains what the added blocks do):
def nms(boxes, scores, overlap=0.5, top_k=200):  # args: box coordinates, per-class scores, nms threshold, number of top boxes to consider
    '''(1) Build the keep tensor: zeros, one slot per candidate box
    (the boxes of this class whose confidence passed the threshold)'''
    keep = scores.new(scores.size(0)).zero_().long()

    if boxes.numel() == 0:
        return keep

    '''(2) Areas of the candidate boxes'''
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]
    area = torch.mul(x2 - x1, y2 - y1)

    '''(3) Indices of the top_k highest-scoring candidate boxes'''
    v, idx = scores.sort(0)  # ascending sort by confidence; idx holds the box indices in that order
    # I = I[v >= 0.01]
    '''indices of the top_k highest-scoring boxes: idx'''
    idx = idx[-top_k:]  # indices of the top-k largest vals
    xx1 = boxes.new()
    yy1 = boxes.new()
    xx2 = boxes.new()
    yy2 = boxes.new()
    w = boxes.new()
    h = boxes.new()
    '''(4) Write the indices of the boxes that survive nms into keep'''
    count = 0
    while idx.numel() > 0:
        '''#1. record the index of the highest-scoring remaining box'''
        i = idx[-1]  # index of current largest val
        # keep.append(i)
        keep[count] = i
        count += 1

        if idx.size(0) == 1:
            break
        '''#2. indices of the remaining boxes'''
        idx = idx[:-1]  # remove kept element from view
        '''#3. IoU between the remaining boxes and the highest-scoring one'''
        ##################################### added code ##########################################
        # otherwise: RuntimeError: index_select(): functions with out=... arguments don't support
        # automatic differentiation, but one of the arguments requires grad.
        idx = torch.autograd.Variable(idx, requires_grad=False)
        idx = idx.data
        x1 = torch.autograd.Variable(x1, requires_grad=False)
        x1 = x1.data
        y1 = torch.autograd.Variable(y1, requires_grad=False)
        y1 = y1.data
        x2 = torch.autograd.Variable(x2, requires_grad=False)
        x2 = x2.data
        y2 = torch.autograd.Variable(y2, requires_grad=False)
        y2 = y2.data
        ##################################### added code ##########################################
        torch.index_select(x1, 0, idx, out=xx1)
        torch.index_select(y1, 0, idx, out=yy1)
        torch.index_select(x2, 0, idx, out=xx2)
        torch.index_select(y2, 0, idx, out=yy2)
        # store element-wise max with next highest score
        xx1 = torch.clamp(xx1, min=x1[i])
        yy1 = torch.clamp(yy1, min=y1[i])
        xx2 = torch.clamp(xx2, max=x2[i])
        yy2 = torch.clamp(yy2, max=y2[i])
        w.resize_as_(xx2)
        h.resize_as_(yy2)
        w = xx2 - xx1
        h = yy2 - yy1
        # check sizes of xx1 and xx2.. after each iteration
        w = torch.clamp(w, min=0.0)
        h = torch.clamp(h, min=0.0)
        inter = w * h
        # IoU = i / (area(a) + area(b) - i)
        ##################################### added code ##########################################
        # same RuntimeError as above without this
        area = torch.autograd.Variable(area, requires_grad=False)
        area = area.data
        idx = torch.autograd.Variable(idx, requires_grad=False)
        idx = idx.data
        ##################################### added code ##########################################
        rem_areas = torch.index_select(area, 0, idx)  # load remaining areas
        union = (rem_areas - inter) + area[i]
        IoU = inter / union  # store result in iou
        # keep only elements with an IoU <= overlap
        '''#4. keep the indices of boxes whose IoU is below the nms threshold'''
        idx = idx[IoU.le(overlap)]
    return keep, count

  • warning
    UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
    self.priors = Variable(self.priorbox.forward(), volatile=True)
    solved
    ./ssd.py line 34: change self.priors = Variable(self.priorbox.forward(), volatile=True)
    to: self.priors = Variable(self.priorbox.forward())
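About the blocks marked "added code" in the nms fix: they only detach the tensors so that index_select(..., out=...) is not asked to track gradients. On PyTorch 0.4 and later, each Variable(..., requires_grad=False).data pair can be written more compactly, e.g.:

# equivalent, more compact form of the detach dance
idx = idx.detach()
x1, y1, x2, y2 = x1.detach(), y1.detach(), x2.detach(), y2.detach()
area = area.detach()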

I followed this blogger's walkthrough: https://blog.csdn.net/weixin_42447868/article/details/105675158#comments_19145022