[Study Notes] The NanoDet Model

绯雨千叶

Last modified: 2023-09-22 11:03:18

First published: 2023-09-14 18:00:36

Article link: https://blog.csdn.net/qq_58664081/article/details/132629596

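This post records what I learned, and the pitfalls I ran into, while learning to use the NanoDet model. If you spot mistakes or problems, please point them out so we can discuss them.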

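I. Downloading the NanoDet Model

GitHub link: RangiLyu/nanodet: NanoDet-Plus ⚡ Super fast and lightweight anchor-free object detection model. 🔥 Only 980 KB (int8) / 1.8 MB (fp16) and runs at 97 FPS on a cellphone 🔥 (github.com)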

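II. Setting Up the Environment

Anyone who has found their way to the NanoDet model is probably not starting from zero, so I will keep this brief.

1. Anaconda

The usual virtual-environment setup. Installing it is recommended on both Ubuntu and Windows; Python 3.9 is fine.

Link: Anaconda | The World’s Most Popular Data Science Platform

conda create -n <your_env_name> python=<version>
conda activate <your_env_name>  # enter the environment
conda deactivate                # leave the environment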

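2. requirements

Install everything inside the virtual environment.

CUDA, cuDNN, PyTorch, and torchvision are the four big sources of errors; all four versions must be compatible with one another.
Choose versions that suit your own GPU rather than simply copying mine.
Space is limited here, so consult dedicated articles for how to pick and install matching versions.

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple/ <package_name>

Pitfalls:

(1) The numpy version should not be too new, or errors will occur. If you hit one, downgrade numpy.

(2) If installing pycocotools with pip hangs, try:

conda install -c esri pycocotools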
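
Since version mismatches are the main trap here, it can help to print what is actually installed before going further. Below is a minimal sketch, not part of NanoDet: the helper names `installed_version` and `report_versions` are my own, and the default package list is just an assumption about what you care about. The PyTorch fields are only queried when torch is importable.

```python
import importlib.util
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report_versions(packages=("torch", "torchvision", "numpy", "pycocotools")):
    """Collect installed versions so mismatches are easy to spot."""
    report = {name: installed_version(name) for name in packages}
    # torch.version.cuda and torch.backends.cudnn.version() describe the CUDA
    # and cuDNN builds PyTorch ships with; only queried if torch is present.
    if importlib.util.find_spec("torch") is not None:
        import torch
        report["cuda (torch build)"] = torch.version.cuda
        report["cudnn (torch build)"] = torch.backends.cudnn.version()
        report["gpu available"] = torch.cuda.is_available()
    return report
```

Calling `report_versions()` inside the activated environment and printing the resulting dict gives one place to compare the numbers against the compatibility tables you looked up.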

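3. The nanodet package

The repository root contains a setup.py file.

From the terminal of your Python IDE or an external terminal, activate the virtual environment and run the command below to install the nanodet package

python setup.py develop

The installation succeeded if the command finishes with a success message and no errors.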

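III. Annotating and Building the Dataset

1. Annotating the dataset

Common tools such as Labelbox, RectLabel, VGG Image Annotator (VIA), and LabelImg can all be used for image annotation.

The one I use most is makesense: Make Sense

After annotating and exporting, you have a folder of xml files.

2. Data preprocessing

If the dataset is too small, data augmentation can enlarge it a little. Below is augmentation code built around random rotation. It is not polished; in particular, watch out for xml files that contain no object box, which made it stop with an error in my runs: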

import os
import shutil
import xml.etree.ElementTree as ET

import numpy as np
from PIL import Image
import imgaug as ia
from imgaug import augmenters as iaa

ia.seed(1)  # fix the random seed so augmentation runs are reproducible
def read_xml_annotation(root, image_id):
    """Return every bounding box in an xml file as [xmin, ymin, xmax, ymax]."""
    tree = ET.parse(os.path.join(root, image_id))
    xmlroot = tree.getroot()
    bndboxlist = []

    for obj in xmlroot.findall('object'):  # iterate over every <object> node
        bndbox = obj.find('bndbox')  # the <bndbox> child holds the coordinates

        xmin = int(bndbox.find('xmin').text)
        xmax = int(bndbox.find('xmax').text)
        ymin = int(bndbox.find('ymin').text)
        ymax = int(bndbox.find('ymax').text)
        bndboxlist.append([xmin, ymin, xmax, ymax])

    # An xml file without any <object> simply yields an empty list.
    return bndboxlist


# (506.0000, 330.0000, 528.0000, 348.0000) -> (520.4747, 381.5080, 540.5596, 398.6603)
def change_xml_annotation(root, image_id, new_target):
    """Overwrite the first object's box in <image_id>.xml with new_target."""
    new_xmin = new_target[0]
    new_ymin = new_target[1]
    new_xmax = new_target[2]
    new_ymax = new_target[3]

    # Note: "root" is the folder path here, while "xmlroot" is the xml root node.
    tree = ET.parse(os.path.join(root, str(image_id) + '.xml'))
    xmlroot = tree.getroot()
    obj = xmlroot.find('object')
    bndbox = obj.find('bndbox')
    xmin = bndbox.find('xmin')
    xmin.text = str(new_xmin)
    ymin = bndbox.find('ymin')
    ymin.text = str(new_ymin)
    xmax = bndbox.find('xmax')
    xmax.text = str(new_xmax)
    ymax = bndbox.find('ymax')
    ymax.text = str(new_ymax)
    # Write back to the same file (assumed intent; this helper is not called below).
    tree.write(os.path.join(root, str(image_id) + '.xml'))


def change_xml_list_annotation(root, image_id, new_target, saveroot, aug_id, img_name):
    """Write a copy of <image_id>.xml whose boxes are replaced by new_target."""
    # Note: "root" is the folder path here, while "xmlroot" is the xml root node.
    tree = ET.parse(os.path.join(root, str(image_id) + '.xml'))
    elem = tree.find('filename')
    elem.text = (img_name + str("_%06d" % int(aug_id)) + '.png')  # image format
    xmlroot = tree.getroot()
    index = 0

    for obj in xmlroot.findall('object'):  # iterate over every <object> node
        bndbox = obj.find('bndbox')  # the <bndbox> child holds the coordinates

        new_xmin = new_target[index][0]
        new_ymin = new_target[index][1]
        new_xmax = new_target[index][2]
        new_ymax = new_target[index][3]

        xmin = bndbox.find('xmin')
        xmin.text = str(new_xmin)
        ymin = bndbox.find('ymin')
        ymin.text = str(new_ymin)
        xmax = bndbox.find('xmax')
        xmax.text = str(new_xmax)
        ymax = bndbox.find('ymax')
        ymax.text = str(new_ymax)

        index = index + 1

    tree.write(os.path.join(saveroot, img_name + str("_%06d" % int(aug_id)) + '.xml'))


def mkdir(path):
    # strip leading and trailing whitespace
    path = path.strip()
    # strip a trailing backslash
    path = path.rstrip("\\")
    # create the directory only if it does not exist yet
    if not os.path.exists(path):
        os.makedirs(path)
        print(path + ' created')
        return True
    else:
        # the directory already exists, so nothing is created
        print(path + ' already exists')
        return False


if __name__ == "__main__":

    IMG_DIR = "coco_plate_head_3.1/archive/images"           # folder with the original images
    XML_DIR = "coco_plate_head_3.1/archive/xml"              # folder with the original xml files
    AUG_XML_DIR = "coco_plate_head_3.1/train_xml_archive"    # output folder for the augmented xml files
    try:
        shutil.rmtree(AUG_XML_DIR)  # start from an empty output folder
    except FileNotFoundError:
        pass
    mkdir(AUG_XML_DIR)
    AUG_IMG_DIR = "coco_plate_head_3.1/train_img_archive"    # output folder for the augmented images
    try:
        shutil.rmtree(AUG_IMG_DIR)  # start from an empty output folder
    except FileNotFoundError:
        pass
    mkdir(AUG_IMG_DIR)

    AUGLOOP = 10  # number of augmented copies generated per image


    boxes_img_aug_list = []
    new_bndbox_list = []

    # augmentation pipeline
    seq = iaa.Sequential([
        iaa.Flipud(0.5),  # vertically flip 50% of all images
        iaa.Fliplr(0.5),  # horizontally flip (mirror) 50% of all images
        iaa.Multiply((1.2, 1.5)),  # change brightness, doesn't affect BBs
        iaa.GaussianBlur(sigma=(0, 3.0)),  # blur with a random sigma in [0, 3.0]
        iaa.Affine(
            translate_px={"x": 15, "y": 15},
            scale=(0.8, 0.95),
            rotate=(-30, 30)
        )  # translate by 15px on x/y, scale to 80-95%, rotate by -30..30 degrees; affects BBs
    ])

    for root, sub_folders, files in os.walk(XML_DIR):

        for name in files:
            if not name.endswith('.xml'):
                continue  # skip anything that is not an annotation file
            print(name)
            bndbox = read_xml_annotation(XML_DIR, name)
            shutil.copy(os.path.join(XML_DIR, name), AUG_XML_DIR)
            shutil.copy(os.path.join(IMG_DIR, name[:-4] + '.png'), AUG_IMG_DIR)

            for epoch in range(AUGLOOP):
                # freeze the random parameters so the image and its boxes get the same transform
                seq_det = seq.to_deterministic()
                # load the image
                img = Image.open(os.path.join(IMG_DIR, name[:-4] + '.png'))
                img = np.asarray(img)
                # augment the bounding-box coordinates
                for i in range(len(bndbox)):
                    bbs = ia.BoundingBoxesOnImage([
                        ia.BoundingBox(x1=bndbox[i][0], y1=bndbox[i][1], x2=bndbox[i][2], y2=bndbox[i][3]),
                    ], shape=img.shape)

                    bbs_aug = seq_det.augment_bounding_boxes([bbs])[0]
                    boxes_img_aug_list.append(bbs_aug)

                    # new_bndbox_list: [[x1, y1, x2, y2], ...], clamped to the image
                    n_x1 = int(max(1, min(img.shape[1], bbs_aug.bounding_boxes[0].x1)))
                    n_y1 = int(max(1, min(img.shape[0], bbs_aug.bounding_boxes[0].y1)))
                    n_x2 = int(max(1, min(img.shape[1], bbs_aug.bounding_boxes[0].x2)))
                    n_y2 = int(max(1, min(img.shape[0], bbs_aug.bounding_boxes[0].y2)))
                    if n_x1 == 1 and n_x1 == n_x2:
                        n_x2 += 1
                    if n_y1 == 1 and n_y2 == n_y1:
                        n_y2 += 1
                    if n_x1 >= n_x2 or n_y1 >= n_y2:
                        print('error', name)
                    new_bndbox_list.append([n_x1, n_y1, n_x2, n_y2])
                # save the augmented image (no boxes are drawn on it)
                image_aug = seq_det.augment_images([img])[0]
                path = os.path.join(AUG_IMG_DIR,
                                    name[:-4] + str( "_%06d" % (epoch + 1)) + '.png')
                Image.fromarray(image_aug).save(path)

                # save the augmented xml
                change_xml_list_annotation(XML_DIR, name[:-4], new_bndbox_list, AUG_XML_DIR,
                                           epoch + 1,name[:-4])
                print( name[:-4] + str( "_%06d" % (epoch + 1)) + '.png')
                new_bndbox_list = []
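
Because xml files without any object box are the weak spot of the script above, one option is to list them before running it and decide what to do with them. A minimal sketch, assuming Pascal VOC style xml as used above; the helper name `xml_files_without_objects` is my own.

```python
import os
import xml.etree.ElementTree as ET


def xml_files_without_objects(xml_dir):
    """Return the sorted names of .xml files in xml_dir that have no <object> node."""
    empty = []
    for name in sorted(os.listdir(xml_dir)):
        if not name.endswith(".xml"):
            continue  # ignore images and other files
        root = ET.parse(os.path.join(xml_dir, name)).getroot()
        if root.find("object") is None:
            empty.append(name)
    return empty
```

Running it on the folder assigned to `XML_DIR` and printing the result shows which annotation files to fix or set aside.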

We now have the augmented images and xml files. I suggest splitting them into a training set and a validation set at a ratio of 9:1 or 8:2.
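
The split itself can be scripted. Here is a minimal sketch of a reproducible 9:1 split over file stems (file names without extension); `split_train_val` and its parameters are hypothetical names of mine, not something NanoDet provides.

```python
import random


def split_train_val(stems, val_ratio=0.1, seed=0):
    """Shuffle file stems reproducibly and split them into (train, val)."""
    stems = sorted(stems)  # sort first so the result does not depend on listing order
    random.Random(seed).shuffle(stems)
    n_val = max(1, round(len(stems) * val_ratio)) if stems else 0
    return stems[n_val:], stems[:n_val]
```

For each stem you would then copy the matching .png and .xml into the train or val folder. Augmented copies of one source image look very similar, so splitting by source image before augmenting is probably the safer order if you want the validation score to mean much.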

Then convert the xml files into a single .json file

# Convert every xml file in a folder to json format

import xml.etree.ElementTree as ET
import os
import json

coco = dict()
coco['images'] = []
coco['type'] = 'instances'
coco['annotations'] = []
coco['categories'] = []

category_set = dict()
image_set = set()

category_item_id = 0
# image_id = 'ball-'
image_id = 0
id_num = 0
annotation_id = 0


def addCatItem(name):
    global category_item_id
    category_item = dict()
    category_item['supercategory'] = 'none'
    category_item_id += 1
    category_item['id'] = category_item_id
    category_item['name'] = name
    coco['categories'].append(category_item)
    category_set[name] = category_item_id
    return category_item_id


def addImgItem(file_name, size):
    global image_id, id_num
    if file_name is None:
        raise Exception('Could not find filename tag in xml file.')
    if size['width'] is None:
        raise Exception('Could not find width tag in xml file.')
    if size['height'] is None:
        raise Exception('Could not find height tag in xml file.')

    image_item = dict()
    # temp = str(id_num)
    temp = int(id_num)
    # image_item['id'] = image_id + temp
    image_item['id'] = temp
    id_num += 1
    image_item['file_name'] = file_name
    image_item['width'] = size['width']
    image_item['height'] = size['height']
    coco['images'].append(image_item)
    image_set.add(file_name)
    return image_item['id']


def addAnnoItem(object_name, image_id, category_id, bbox):
    global annotation_id
    annotation_item = dict()
    annotation_item['segmentation'] = []
    seg = []
    # bbox[] is x,y,w,h
    # left_top
    seg.append(bbox[0])
    seg.append(bbox[1])
    # left_bottom
    seg.append(bbox[0])
    seg.append(bbox[1] + bbox[3])
    # right_bottom
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1] + bbox[3])
    # right_top
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1])
    annotation_item['segmentation'].append(seg)
    annotation_item['area'] = bbox[2] * bbox[3]
    annotation_item['iscrowd'] = 0
    annotation_item['ignore'] = 0
    annotation_item['image_id'] = image_id
    annotation_item['bbox'] = bbox
    annotation_item['category_id'] = category_id
    annotation_id += 1
    annotation_item['id'] = annotation_id

    coco['annotations'].append(annotation_item)

def parseXmlFiles(xml_path):
    for f in os.listdir(xml_path):
        if not f.endswith('.xml'):
            continue

        bndbox = dict()
        size = dict()
        current_image_id = None
        current_category_id = None
        file_name = None
        size['width'] = None
        size['height'] = None
        size['depth'] = None

        xml_file = os.path.join(xml_path, f)
        print(xml_file)

        tree = ET.parse(xml_file)
        root = tree.getroot()
        if root.tag != 'annotation':
            raise Exception('pascal voc xml root element should be annotation, rather than {}'.format(root.tag))

        # elem is <folder>, <filename>, <size>, <object>
        for elem in root:
            current_parent = elem.tag
            current_sub = None
            object_name = None

            if elem.tag == 'folder':
                continue
            if elem.tag == 'filename':
                file_name = elem.text
                if file_name in image_set:
                    raise Exception('file_name duplicated')

            # add img item only after parse <size> tag
            elif current_image_id is None and file_name is not None and size['width'] is not None:
                if file_name not in image_set:
                    current_image_id = addImgItem(file_name, size)
                    print('add image with {} and {}'.format(file_name, size))
                else:
                    raise Exception('duplicated image: {}'.format(file_name))
                    # subelem is <width>, <height>, <depth>, <name>, <bndbox>
            for subelem in elem:
                bndbox['xmin'] = None
                bndbox['xmax'] = None
                bndbox['ymin'] = None
                bndbox['ymax'] = None

                current_sub = subelem.tag
                if current_parent == 'object' and subelem.tag == 'name':
                    object_name = subelem.text
                    if object_name not in category_set:
                        current_category_id = addCatItem(object_name)
                    else:
                        current_category_id = category_set[object_name]

                elif current_parent == 'size':
                    if size[subelem.tag] is not None:
                        raise Exception('xml structure broken at size tag.')
                    size[subelem.tag] = int(subelem.text)

                # option is <xmin>, <ymin>, <xmax>, <ymax>, when subelem is <bndbox>
                for option in subelem:
                    if current_sub == 'bndbox':
                        if bndbox[option.tag] is not None:
                            raise Exception('xml structure corrupted at bndbox tag.')
                        bndbox[option.tag] = int(option.text)
                # only after the <object> tag has been parsed
                if bndbox['xmin'] is not None:
                    if object_name is None:
                        raise Exception('xml structure broken at bndbox tag')
                    if current_image_id is None:
                        raise Exception('xml structure broken at bndbox tag')
                    if current_category_id is None:
                        raise Exception('xml structure broken at bndbox tag')
                    bbox = []
                    bbox.append(bndbox['xmin'])  # x
                    bbox.append(bndbox['ymin'])  # y
                    bbox.append(bndbox['xmax'] - bndbox['xmin'])  # w
                    bbox.append(bndbox['ymax'] - bndbox['ymin'])  # h
                    print('add annotation with {},{},{},{}'.format(object_name, current_image_id, current_category_id,
                                                                   bbox))
                    addAnnoItem(object_name, current_image_id, current_category_id, bbox)

if __name__ == '__main__':

    xml_path = "../coco_plate_head_3.1/train_xml"  # folder with the source xml files
    json_file = '../coco_plate_head_3.1/train.json'  # where the converted .json file is saved

    parseXmlFiles(xml_path)
    with open(json_file, 'w') as f:
        json.dump(coco, f)

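This gives you the annotation file in .json format. Take note of the order of the labels in the "categories" list at the end of the .json file, because the class names in the config must follow the same order.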
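
Rather than scrolling to the end of a large .json file, the label order can be read out with a few lines. A minimal sketch, assuming the file has the "categories" list of id/name entries that the converter above writes; the function names are my own.

```python
import json


def category_names_in_order(coco):
    """Return the category names of a COCO-style dict, ordered by category id."""
    categories = sorted(coco["categories"], key=lambda c: c["id"])
    return [c["name"] for c in categories]


def load_category_names(json_path):
    """Read a COCO-style annotation file and return its ordered category names."""
    with open(json_path) as f:
        return category_names_in_order(json.load(f))
```

Calling `load_category_names` on the train.json path used above prints the list to copy into the config.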

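IV. Training

1. Editing the .yml config file

Pick a model from the table in the README; the first one is enough. Open its config file and edit the settings listed below.

If you only want to train and leave the other important settings (the backbone and so on) untouched, fairly little needs to change.

Model save path:

Number of classes (change it to the number of your labels):

Locations of the training and validation sets (CocoDataset is the dataset type):

Training device: set the batch size according to your GPU memory and try to fill it. According to the official documentation a 1060 6G can use 80; I use a 3090, so I set it fairly large (preferably a multiple of 8).

Resuming from an earlier model: this is commented out by default. To continue training an earlier model, uncomment it and set the path to the model_last.ckpt of the previous run.

Total number of training epochs (more is not always better): once the validation metrics (mAP and the like) start to fall or stop improving, stop training; continuing past that point leads to overfitting.

Validation interval in epochs (validation runs every few epochs, which makes progress easy to watch)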
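
The "stop when the metric no longer improves" rule can be written down as a small check over the mAP values you note after each validation run. This is a minimal sketch of a patience-based rule, not something the training script does for you as far as I know; `should_stop`, `patience`, and `min_delta` are hypothetical names.

```python
def should_stop(map_history, patience=3, min_delta=0.0):
    """Return True if the best mAP has not improved in the last `patience` validations."""
    if len(map_history) <= patience:
        return False  # not enough validation runs to judge yet
    best_before = max(map_history[:-patience])
    best_recent = max(map_history[-patience:])
    return best_recent <= best_before + min_delta
```

With a validation interval of 10 epochs and `patience=3`, this means giving up after 30 epochs without a new best score.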

Class names (write them in the same order as in your json):
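
Two of these settings are easy to get out of sync: the number of classes and the class-name list, both of which must agree with the json. Below is a minimal sketch of a consistency check on an already loaded config. The key layout (`model.arch.head.num_classes` plus a top-level `class_names`) is my assumption based on how nanodet-m.yml is organized, so adjust it to your file; `check_config` is a hypothetical helper.

```python
def check_config(cfg, coco):
    """Return a list of problems found; an empty list means the config is consistent."""
    problems = []
    class_names = list(cfg["class_names"])
    # Assumed location of the class count inside the config (see lead-in).
    num_classes = cfg["model"]["arch"]["head"]["num_classes"]
    json_names = [c["name"] for c in sorted(coco["categories"], key=lambda c: c["id"])]
    if num_classes != len(class_names):
        problems.append(
            "num_classes is %d but class_names has %d entries" % (num_classes, len(class_names))
        )
    if class_names != json_names:
        problems.append("class_names %r differ from json order %r" % (class_names, json_names))
    return problems
```

If PyYAML is installed, `cfg` can come from `yaml.safe_load` on the .yml file and `coco` from `json.load` on the annotation file; printing the returned list before training catches the most common mistakes.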

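2. Training

Enter the following in a terminal to start training

python tools/train.py ./config/legacy_v0.x_configs/nanodet-m.yml

The final weight files are saved to the path set at the top of the config file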

V. Testing

The config path is the .yml config file from before; the model path is the trained weight file (.pth)

# test on an image
python demo/demo.py image --config CONFIG_PATH --model MODEL_PATH --path IMAGE_PATH

# test on a video
python demo/demo.py video --config CONFIG_PATH --model MODEL_PATH --path VIDEO_PATH

# test with a webcam
python demo/demo.py webcam --config CONFIG_PATH --model MODEL_PATH --path 0

That is roughly how it goes.