Building Your Own Object Detection Network with MMDetection from Scratch

junlaii

Last modified: 2022-10-16 18:37:31

Tags: deep learning, pytorch, artificial intelligence

First published: 2022-10-12 17:24:22

Article link: https://blog.csdn.net/qq2474842866/article/details/127283506

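Goal: build an MMDetection object detection model on a server.

Updated 2022-10-12. The environment below is the one I tested, and it works; if you run into problems, check whether version compatibility has changed since then.

Environment
– ubuntu 18.04
– CUDA version: 11.3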
– torch 1.11.0+cu113
– torchvision 0.12.0+cu113
– mmcv-full 1.6.0
– mmdet 2.25.2

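1. Check the CUDA version (you can use the nvcc -V command)

Pin your versions carefully; otherwise mismatched versions will raise all kinds of "missing ..." errors partway through. Install a CUDA version that your GPU supports.
To switch CUDA versions, see the article "Changing the CUDA version", or download the runfile from the official website and install it manually.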

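2. Install mmcv-full. If you are unsure which version to use, check the official site for the mmcv build that matches your CUDA version.

Official site: link
Install command that looks up the matching prebuilt wheel: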

pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html

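Replace the placeholders with your own version information.
On my machine the command is (pinned to the mmcv-full 1.6.0 listed in the environment above):

pip install mmcv-full==1.6.0 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.11.0/index.html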
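
To avoid typos when filling in the placeholders, the command can be assembled from plain version strings. This is only a sketch: the helper names mmcv_find_links_url and mmcv_install_command are made up for illustration, and it assumes the index URL follows exactly the cu{...}/torch{...} pattern shown in the command above.

```python
def mmcv_find_links_url(cuda_version: str, torch_version: str) -> str:
    """Build the wheel index URL from strings such as '11.3' and '1.11.0'."""
    cu_tag = "cu" + cuda_version.replace(".", "")        # '11.3' -> 'cu113'
    torch_tag = "torch" + torch_version.split("+")[0]    # drop a '+cu113' suffix
    return f"https://download.openmmlab.com/mmcv/dist/{cu_tag}/{torch_tag}/index.html"


def mmcv_install_command(mmcv_version: str, cuda_version: str, torch_version: str) -> str:
    """Return the pip command with all placeholders filled in."""
    url = mmcv_find_links_url(cuda_version, torch_version)
    return f"pip install mmcv-full=={mmcv_version} -f {url}"


print(mmcv_install_command("1.6.0", "11.3", "1.11.0+cu113"))
```

Printing the command instead of running it keeps the final decision with you: compare it with the official install page before executing it.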

When the installation finishes, pip prints a success message.

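3. Install torch and torchvision

Run the following command. The +cu113 builds are hosted on the PyTorch wheel index rather than on PyPI, so the index URL has to be passed explicitly:

pip install torchvision==0.12.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113

Then install torch:

pip install torch==1.11.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113

Check that the versions are correct: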

pip list
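
Instead of scanning the pip list output by eye, the installed versions can be compared against the environment table. This is a minimal sketch using only the standard library (Python 3.8+); the EXPECTED dictionary simply repeats the versions from the environment section, and check_versions is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the environment list at the top of this post.
EXPECTED = {
    "torch": "1.11.0+cu113",
    "torchvision": "0.12.0+cu113",
    "mmcv-full": "1.6.0",
}


def check_versions(expected):
    """Return {package: (installed_or_None, expected, matches)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, wanted, installed == wanted)
    return report


for name, (installed, wanted, ok) in check_versions(EXPECTED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name:12s} installed={installed} expected={wanted} {status}")
```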


4. Install MMDetection

Go to your home directory and clone the project from GitHub:

git clone https://github.com/open-mmlab/mmdetection.git

Enter the project directory:

cd mmdetection

Install the dependencies:

pip install -r requirements/build.txt
pip install -v -e .


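5. Image annotation

Online annotation tool: link
Click Get Started.
Drag and drop your training images.
Click Object Detection to start annotating.
Add a custom label, then click Start project.
You can choose polygon annotation.

Export in COCO or VOC format.

In this post I export XML (VOC) files and use them in the following steps.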

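6. Set up the data directory

In the mmdetection directory:

mkdir -p ./data/VOCdevkit/VOC2007

Then create the following subdirectories inside VOC2007:
Annotations holds the annotated XML files.
JPEGImages holds the images.
ImageSets/Main holds four .txt split files (the config in step 7 reads trainval.txt and test.txt). Each file lists image names without the extension, one per line.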
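
The split files can be generated from the contents of Annotations rather than written by hand. The sketch below is one possible way to do it, not the procedure used in this post: the helper name make_voc_splits, the 20% ratios, and the file names train.txt and val.txt (the usual VOC names) are assumptions.

```python
import random
from pathlib import Path


def make_voc_splits(voc_root, test_ratio=0.2, val_ratio=0.2, seed=0):
    """Write train/val/trainval/test.txt under ImageSets/Main and return the splits."""
    voc_root = Path(voc_root)
    main_dir = voc_root / "ImageSets" / "Main"
    main_dir.mkdir(parents=True, exist_ok=True)

    # One id per annotation file: the file name without its extension.
    ids = sorted(p.stem for p in (voc_root / "Annotations").glob("*.xml"))
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible

    n_test = int(len(ids) * test_ratio)
    test, trainval = ids[:n_test], ids[n_test:]
    n_val = int(len(trainval) * val_ratio)
    val, train = trainval[:n_val], trainval[n_val:]

    splits = {"train": train, "val": val, "trainval": trainval, "test": test}
    for name, items in splits.items():
        text = "".join(f"{item}\n" for item in sorted(items))
        (main_dir / f"{name}.txt").write_text(text)
    return splits
```

Calling make_voc_splits("data/VOCdevkit/VOC2007") from the mmdetection root would write the four files next to Annotations and JPEGImages.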

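7. Modify the configuration files

Step 1: change the class labels

Edit mmdetection/mmdet/datasets/voc.py and replace the names in the CLASSES tuple with your own class names.

Step 2: edit mmdetection/mmdet/core/evaluation/class_names.py and change the return value of voc_classes() to the classes of your dataset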
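
The two edits have the same shape: a tuple of names in voc.py and a list of names in class_names.py. The snippet below is a standalone illustration, not the library source; 'helmet' and 'person' are hypothetical labels. It also shows a common pitfall with a single class: without the trailing comma Python produces a string, not a tuple.

```python
# Hypothetical class names; replace them with your own labels.
CLASSES = ('helmet', 'person')

# With a single class the trailing comma matters.
SINGLE_CLASS = ('helmet',)   # a tuple with one element
NOT_A_TUPLE = ('helmet')     # just the string 'helmet'


def voc_classes():
    """Same shape as the function edited in class_names.py: a list of names."""
    return list(CLASSES)


print(type(SINGLE_CLASS).__name__, type(NOT_A_TUPLE).__name__)  # tuple str
```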

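Step 3: modify the model config, using Faster R-CNN as the example

Locate the base model file: mmdetection/configs/_base_/models/faster_rcnn_r50_fpn.py

Search for num_classes and set it to the number of classes in your dataset.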
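
In MMDetection 2.x, num_classes counts only the foreground classes, so it should equal the length of the class tuple with no extra slot for background. The fragment below shows only the key that changes, written as a plain dict so it can be checked on its own; the class names are hypothetical.

```python
CLASSES = ('helmet', 'person')  # hypothetical labels, same as in voc.py

# Only the nested key that has to change is shown here; the real config
# file contains many more fields around it.
model = dict(
    roi_head=dict(
        bbox_head=dict(num_classes=len(CLASSES))))

print(model['roi_head']['bbox_head']['num_classes'])  # 2
```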

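Step 4: modify the dataset config (VOC format)

Path: mmdetection/configs/_base_/datasets/voc0712.py
Comment out the original ann_file and img_prefix entries and point them at your own dataset, for example
img_prefix=[data_root]

The complete configuration: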

# dataset settings
dataset_type = 'VOCDataset'
# points at the directory created in step 6
data_root = 'data/VOCdevkit/VOC2007/'
# data_root = 'data/VOCdevkit/'


img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1000, 600),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='RepeatDataset',
        times=3,
        dataset=dict(
            type=dataset_type,
            ann_file=[
                # adjusted to the custom dataset layout
                data_root + 'ImageSets/Main/trainval.txt',

                # original entries, commented out
                # data_root + 'VOC2007/ImageSets/Main/trainval.txt',
                # data_root + 'VOC2012/ImageSets/Main/trainval.txt'
            ],
            # adjusted in the same way
            img_prefix=[data_root],

            # original entry, commented out
            # img_prefix=[data_root + 'VOC2007/', data_root + 'VOC2012/'],
            pipeline=train_pipeline)),
    val=dict(
        type=dataset_type,
        
        # same change as above
        ann_file=data_root + 'ImageSets/Main/test.txt',
        img_prefix=data_root,
        
        # ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
        # img_prefix=data_root + 'VOC2007/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        
        # same change as above
        ann_file=data_root + 'ImageSets/Main/test.txt',
        img_prefix=data_root,
        
        # ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
        # img_prefix=data_root + 'VOC2007/',
        pipeline=test_pipeline))
evaluation = dict(interval=1, metric='mAP')
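
Before training, it can save time to confirm that every id in a split file has both an XML file and an image, since a missing file only surfaces as an error once the data loader reaches it. This is a small standalone check, assuming the directory layout from step 6 and .jpg images; find_missing_files is a hypothetical helper name.

```python
from pathlib import Path


def find_missing_files(voc_root, split="trainval", img_ext=".jpg"):
    """Return (ids without an XML file, ids without an image) for one split."""
    voc_root = Path(voc_root)
    split_file = voc_root / "ImageSets" / "Main" / f"{split}.txt"
    ids = [line.strip() for line in split_file.read_text().splitlines() if line.strip()]

    missing_xml = [i for i in ids
                   if not (voc_root / "Annotations" / f"{i}.xml").is_file()]
    missing_img = [i for i in ids
                   if not (voc_root / "JPEGImages" / f"{i}{img_ext}").is_file()]
    return missing_xml, missing_img
```

Two empty lists from find_missing_files("data/VOCdevkit/VOC2007", "trainval") mean that the split file, annotations, and images agree.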

Step 5: modify the training config, faster-rcnn-res50

Locate the training config: mmdetection/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py

Set the inherited files:

_base_ = [
    '../_base_/models/faster_rcnn_r50_fpn.py',  # base model config

    # for a VOC-format dataset, use the voc0712.py dataset config
    '../_base_/datasets/voc0712.py',            # dataset config
    # '../_base_/datasets/coco_detection.py',   # original dataset config

    '../_base_/schedules/schedule_1x.py',       # optimizer and schedule config
    '../_base_/default_runtime.py'              # runtime config
]

Step 6: if you used the annotation site recommended above, one more change is needed

Edit mmdetection/mmdet/datasets/xml_style.py and change line 114 to difficult = 0 #if difficult is None else int(difficult.text)
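
An alternative to patching the library is to fix the annotation files themselves. The sketch below assumes the problem is a non-numeric value inside the difficult tag, which would make int(difficult.text) fail; that cause is my assumption, so inspect one of your exported XML files first. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def normalize_difficult(xml_path):
    """Set <difficult> to '0' wherever its text is not an integer.

    Returns the number of <object> elements that were changed.
    """
    tree = ET.parse(str(xml_path))
    changed = 0
    for obj in tree.getroot().iter("object"):
        node = obj.find("difficult")
        if node is None:
            continue  # the original parser line already treats a missing tag as 0
        text = (node.text or "").strip()
        if not text.lstrip("-").isdigit():
            node.text = "0"
            changed += 1
    if changed:
        tree.write(str(xml_path), encoding="utf-8")
    return changed


def normalize_folder(annotation_dir):
    """Apply normalize_difficult to every .xml file in a folder."""
    return sum(normalize_difficult(p) for p in sorted(Path(annotation_dir).glob("*.xml")))
```

Run it on a copy of the Annotations folder first, because the files are rewritten in place.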


8. Train the model

From the root of the mmdetection directory, run:

python tools/train.py ./configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py

If you get a path error, open xml_style.py, change the image suffix jpg to uppercase JPG, and try again.
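
If only some of the images carry an uppercase extension, renaming the files may be simpler than editing the library. A minimal sketch, assuming the loader looks for a lowercase .jpg suffix (which is what the workaround above implies); lowercase_jpg_extensions is a hypothetical helper name.

```python
from pathlib import Path


def lowercase_jpg_extensions(image_dir):
    """Rename files such as IMG_1.JPG to IMG_1.jpg and return the new paths."""
    renamed = []
    for path in sorted(Path(image_dir).iterdir()):
        if path.is_file() and path.suffix == ".JPG":
            target = path.with_suffix(".jpg")
            path.rename(target)
            renamed.append(target)
    return renamed
```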

When the run succeeds, the training log starts printing to the console.

9. Convert VOC XML to COCO format for testing

Method 1

Place voc_to_coco_converter.py in the mmdetection directory.

In the environment you set up, run:

python ./voc_to_coco_converter.py

This generates train.json under convert/coco/ in the VOC2007 directory.

The code of voc_to_coco_converter.py is as follows:

import xml.etree.ElementTree as ET
import os
import json
from datetime import datetime
import sys
import argparse

# Initialize the COCO-style container
coco = dict()
coco['images'] = []
coco['type'] = 'instances'
coco['annotations'] = []
coco['categories'] = []

category_set = dict()
image_set = set()

category_item_id = -1  # incremented before use, so the first category id is 0
image_id = 0
annotation_id = 0

# Register a new category and return its id
def addCatItem(name):
    global category_item_id
    category_item = dict()
    category_item['supercategory'] = 'none'
    category_item_id += 1
    category_item['id'] = category_item_id
    category_item['name'] = name
    coco['categories'].append(category_item)
    category_set[name] = category_item_id
    return category_item_id

# Register an image entry and return its id
def addImgItem(file_name, size):
    global image_id
    if file_name is None:
        raise Exception('Could not find filename tag in xml file.')
    if size['width'] is None:
        raise Exception('Could not find width tag in xml file.')
    if size['height'] is None:
        raise Exception('Could not find height tag in xml file.')
    image_id += 1
    image_item = dict()
    image_item['id'] = image_id
    image_item['file_name'] = file_name
    image_item['width'] = size['width']
    image_item['height'] = size['height']
    image_item['license'] = None
    image_item['flickr_url'] = None
    image_item['coco_url'] = None
    image_item['date_captured'] = str(datetime.today())
    coco['images'].append(image_item)
    image_set.add(file_name)
    return image_id

# Register one annotation (bounding box) entry
def addAnnoItem(object_name, image_id, category_id, bbox):
    global annotation_id
    annotation_item = dict()
    annotation_item['segmentation'] = []
    seg = []
    # bbox[] is x,y,w,h
    # left_top
    seg.append(bbox[0])
    seg.append(bbox[1])
    # left_bottom
    seg.append(bbox[0])
    seg.append(bbox[1] + bbox[3])
    # right_bottom
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1] + bbox[3])
    # right_top
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1])

    annotation_item['segmentation'].append(seg)

    annotation_item['area'] = bbox[2] * bbox[3]
    annotation_item['iscrowd'] = 0
    annotation_item['ignore'] = 0
    annotation_item['image_id'] = image_id
    annotation_item['bbox'] = bbox
    annotation_item['category_id'] = category_id
    annotation_id += 1
    annotation_item['id'] = annotation_id
    coco['annotations'].append(annotation_item)

# Read the image ids listed in an ImageSets file, one per line
def read_image_ids(image_sets_file):
    ids = []
    with open(image_sets_file, 'r') as f:
        for line in f.readlines():
            ids.append(line.strip())
    return ids

# Convert the XML files
def parseXmlFilse(data_dir, json_save_path, split='train'):
    assert os.path.exists(data_dir), "data path:{} does not exist".format(data_dir)
    labelfile = split + ".txt"
    image_sets_file = os.path.join(data_dir, "ImageSets", "Main", labelfile)
    xml_files_list = []
    if os.path.isfile(image_sets_file):
        ids = read_image_ids(image_sets_file)
        xml_files_list = [os.path.join(data_dir, "Annotations", f"{i}.xml") for i in ids]
    elif os.path.isdir(data_dir):
        # Adjust the XML directory here if needed
        # xml_dir = os.path.join(data_dir,"labels/voc")
        xml_dir = data_dir
        xml_list = os.listdir(xml_dir)
        xml_files_list = [os.path.join(xml_dir, i) for i in xml_list]

    for xml_file in xml_files_list:
        if not xml_file.endswith('.xml'):
            continue

        tree = ET.parse(xml_file)
        root = tree.getroot()

        # Reset the size fields for this file
        size = dict()
        size['width'] = None
        size['height'] = None

        if root.tag != 'annotation':
            raise Exception('pascal voc xml root element should be annotation, rather than {}'.format(root.tag))

        # Read the image file name
        file_name = root.findtext('filename')
        assert file_name is not None, "filename is not in the file"

        # Read the image size {width, height, depth}
        size_info = root.find('size')
        assert size_info is not None, "size is not in the file"
        for subelem in size_info:
            size[subelem.tag] = int(subelem.text)

        if file_name is not None and size['width'] is not None and file_name not in image_set:
            # Add the image to coco['images'] and get its id
            current_image_id = addImgItem(file_name, size)
            print('add image with name: {}\tand\tsize: {}'.format(file_name, size))
        elif file_name in image_set:
            raise Exception('file_name duplicated')
        else:
            raise Exception("file name:{}\t size:{}".format(file_name, size))

        # Collect every object annotation in this image
        object_info = root.findall('object')
        if len(object_info) == 0:
            continue
        # Loop over the annotated objects
        for obj in object_info:
            # Read the class name
            object_name = obj.findtext('name')
            if object_name not in category_set:
                # Register a new category id
                current_category_id = addCatItem(object_name)
            else:
                current_category_id = category_set[object_name]

            # Initialize the box fields
            bndbox = dict()
            bndbox['xmin'] = None
            bndbox['xmax'] = None
            bndbox['ymin'] = None
            bndbox['ymax'] = None
            # Read the box: [xmin, ymin, xmax, ymax]
            bndbox_info = obj.findall('bndbox')
            for box in bndbox_info[0]:
                # int(float(...)) also accepts coordinates written as decimals
                bndbox[box.tag] = int(float(box.text))

            if bndbox['xmin'] is not None:
                if object_name is None:
                    raise Exception('xml structure broken at bndbox tag')
                if current_image_id is None:
                    raise Exception('xml structure broken at bndbox tag')
                if current_category_id is None:
                    raise Exception('xml structure broken at bndbox tag')
                bbox = []
                # x
                bbox.append(bndbox['xmin'])
                # y
                bbox.append(bndbox['ymin'])
                # w
                bbox.append(bndbox['xmax'] - bndbox['xmin'])
                # h
                bbox.append(bndbox['ymax'] - bndbox['ymin'])
                print('add annotation with object_name:{}\timage_id:{}\tcat_id:{}\tbbox:{}'.format(object_name,
                                                                                                   current_image_id,
                                                                                                   current_category_id,
                                                                                                   bbox))
                addAnnoItem(object_name, current_image_id, current_category_id, bbox)

    json_parent_dir = os.path.dirname(json_save_path)
    # dirname is empty when the output path has no directory part
    if json_parent_dir:
        os.makedirs(json_parent_dir, exist_ok=True)
    with open(json_save_path, 'w') as f:
        json.dump(coco, f)
    print("class nums:{}".format(len(coco['categories'])))
    print("image nums:{}".format(len(coco['images'])))
    print("bbox nums:{}".format(len(coco['annotations'])))


if __name__ == '__main__':
    """
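    Purpose:
        Convert VOC-format .xml annotation files into a COCO-format .json file.
    Arguments:
        voc_data_dir: one of two forms
            1. the path of a voc2012 folder; voc2012/ImageSets/Main/xx.txt is located automatically
            2. a folder that directly contains the .xml annotation files
        json_save_path: path of the output .json file
        split: used to find xx.txt (e.g. train.txt) for voc2012; ignored when form 2 is used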
    """
    parser = argparse.ArgumentParser()
    parser.add_argument('-d', '--voc-dir', type=str, default='data/label/voc', help='voc path')
    parser.add_argument('-s', '--save-path', type=str, default='./data/convert/coco/train.json', help='json save path')
    parser.add_argument('-t', '--type', type=str, default='train', help='only use in voc2012/2007')
    opt = parser.parse_args()
    if len(sys.argv) > 1:
        print(opt)
        parseXmlFilse(opt.voc_dir, opt.save_path, opt.type)
    else:
        voc_data_dir = './data/VOCdevkit/VOC2007/Annotations/'
        json_save_path = './data/VOCdevkit/VOC2007/convert/coco/train.json'
        #voc_data_dir = r'D:\dataset\VOC2012\VOCdevkit\VOC2012'
        #voc_data_dir = './data/labels/voc'
        #json_save_path = './data/convert/coco/train.json'
        split = 'train'
        parseXmlFilse(data_dir=voc_data_dir, json_save_path=json_save_path, split=split)
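
After the conversion it is worth checking that the JSON contains what you expect. The snippet below is a small sketch that reads the file written by the converter and counts annotations per category; it relies only on the images, annotations, and categories keys that the script above fills in, and summarize_coco is a hypothetical helper name.

```python
import json
from collections import Counter


def summarize_coco(json_path):
    """Return (number of images, {category name: annotation count})."""
    with open(json_path, "r", encoding="utf-8") as f:
        coco = json.load(f)
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
    return len(coco["images"]), dict(counts)
```

For example, summarize_coco("./data/VOCdevkit/VOC2007/convert/coco/train.json") should report the same class names that you configured in step 7.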

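Method 2

Prepare a classes.txt file in the VOC2007 directory.
It contains the labels you defined, one per line.

In voc0712.py, change the test entry to the corresponding paths: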

    test=dict(
        # tell MMDetection that this entry uses the COCO dataset type
        type='CocoDataset',

        # same change as above
        ann_file=data_root + 'annotations/coco.json',
        img_prefix='',
        
        # ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
        # img_prefix=data_root + 'VOC2007/',
        pipeline=test_pipeline)

Update the class names in CLASSES in ./mmdet/datasets/coco.py accordingly.

Run:

python tools/dataset_converters/images2coco.py data/VOCdevkit/VOC2007/JPEGImages data/VOCdevkit/VOC2007/classes.txt coco.json

Start testing:

python tools/test.py ./configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py ./work_dirs/faster_rcnn_r50_fpn_1x_coco/latest.pth \
    --show-dir results --format-only --eval-options "jsonfile_prefix=./results"

The detection results are saved as results.bbox.json in the root directory.
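
The results file can be loaded directly for further processing. This sketch assumes it follows the standard COCO detection-results layout, a JSON list of records with image_id, bbox as [x, y, w, h], score, and category_id; the 0.5 threshold and the name load_detections are arbitrary choices for illustration.

```python
import json


def load_detections(result_path, score_thr=0.5):
    """Group detections with score >= score_thr by image_id."""
    with open(result_path, "r", encoding="utf-8") as f:
        detections = json.load(f)
    by_image = {}
    for det in detections:
        if det["score"] >= score_thr:
            by_image.setdefault(det["image_id"], []).append(det)
    return by_image
```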

10. Visualization

Install seaborn, which has to be present before the plotting script is run:

pip install seaborn

Plot the mAP curve from the training log:

python tools/analysis_tools/analyze_logs.py plot_curve ./work_dirs/faster_rcnn_r50_fpn_1x_coco/20220501_151937.log.json \
    --keys mAP --legend mAP --out mAP_results.png

The plot is saved to mmdetection/mAP_results.png
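
If you only need the numbers, the log can be read without plotting. The sketch below assumes the .log.json file holds one JSON object per line and that validation records carry "mode": "val", an "epoch" field, and the metric key; check a few lines of your own log before relying on it. The name read_metric is hypothetical.

```python
import json


def read_metric(log_path, key="mAP", mode="val"):
    """Return [(epoch, value)] for every log record that has the given key."""
    points = []
    with open(log_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # Lines without a matching mode or key (e.g. a header line) are skipped.
            if record.get("mode") == mode and key in record:
                points.append((record["epoch"], record[key]))
    return points
```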

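With YOLO-family models you can also plot the loss (the same command works for any model whose log contains these keys). Change the command above to: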

python tools/analysis_tools/analyze_logs.py plot_curve  ./work_dirs/faster_rcnn_r50_fpn_1x_coco/20220501_151937.log.json  --keys loss_cls loss_bbox --legend loss_cls loss_bbox --out loss_result.png
