Safety Helmet Detection Based on Faster R-CNN (Trained on Colab)

Creep___

Category: Object Detection | Tags: deep learning, pytorch, object detection

Article link: https://blog.csdn.net/qq_43312130/article/details/123647987

Preface
Source Code and Dataset Downloads
Data Preprocessing

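Preface

During the winter break I studied the Faster R-CNN pipeline and read other people's reimplementations of it. I learned a lot, but still felt I lacked hands-on practice, so I found an open-source dataset. I hope this project serves as an entry point into object detection and deepens my understanding of Faster R-CNN.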

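Source Code and Dataset Downloads

Someone else's reimplementation of Faster R-CNN (source code)
 Blog post accompanying that source code
 Dataset address
 Source code that can be trained on Colab. Note: this is my version, modified on top of the source code above.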

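Data Preprocessing

Dataset directory structure

The source code processes data in VOC format, so download the safety helmet dataset and arrange it in VOC format.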

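Splitting into training, validation, and test sets

Steps

First, create a voc_classes.txt file in the VOC2028 folder that lists the names of the classes in the dataset. The order does not matter, but every name must exactly match the name field inside object in the xml files; for example, person must not be written as people here.
Then run voc_annotation.py to split the data into training, validation, and test sets.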
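
A mismatch between voc_classes.txt and the names in the xml files tends to surface only later, so a quick consistency check before running voc_annotation.py may save time. The following is a minimal sketch, assuming standard VOC-style xml files with object/name elements; the helper names collect_xml_class_names and check_classes are hypothetical and not part of the repo.

```python
import os
import xml.etree.ElementTree as ET


def collect_xml_class_names(annotations_dir):
    """Return the set of <object><name> values found in all xml files of a folder."""
    names = set()
    for file_name in os.listdir(annotations_dir):
        if not file_name.lower().endswith('.xml'):
            continue
        root = ET.parse(os.path.join(annotations_dir, file_name)).getroot()
        for obj in root.iter('object'):
            name = obj.findtext('name')
            if name is not None:
                names.add(name.strip())
    return names


def check_classes(annotations_dir, classes_path):
    """Return (missing, unused): names only in the xml files, names only in the txt file."""
    with open(classes_path, 'r', encoding='utf-8') as f:
        listed = {line.strip() for line in f if line.strip()}
    found = collect_xml_class_names(annotations_dir)
    return found - listed, listed - found
```

If missing is non-empty, some xml files use a class name that voc_classes.txt does not list (the person versus people case); if unused is non-empty, the txt file lists a name that never occurs in the annotations.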

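Additional notes

voc_annotation.py supports several modes:
0: does two things.

1. Generates four txt files in the ImageSets directory containing the image ids of the training, validation, and test sets, without file extensions. When I ran this on the safety helmet data, I recall that some images had the uppercase extension JPG, which caused errors later when reading the data, so remember to normalize all extensions to lowercase jpg.
2. Uses the txt files from the previous step to generate the corresponding txt files that contain the annotation information.
  The format is: img_path box1 box2 ... boxn, where each boxi is written as x1,y1,x2,y2,class_id (comma-separated, without spaces).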
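
To make the line format concrete, here is a small parsing sketch. It assumes the layout described above (whitespace between fields, commas inside each box, integer coordinates) and that image paths contain no spaces; the function name parse_annotation_line and the sample path are made up for illustration.

```python
def parse_annotation_line(line):
    """Split 'img_path x1,y1,x2,y2,class_id ...' into (img_path, list of boxes)."""
    parts = line.split()
    img_path = parts[0]
    boxes = []
    for item in parts[1:]:
        # Each box is five comma-separated integers: corners plus class id.
        x1, y1, x2, y2, class_id = (int(v) for v in item.split(','))
        boxes.append((x1, y1, x2, y2, class_id))
    return img_path, boxes


# Hypothetical sample line with two boxes.
path, boxes = parse_annotation_line('JPEGImages/000001.jpg 10,20,110,220,0 30,40,60,90,1')
print(path)   # JPEGImages/000001.jpg
print(boxes)  # [(10, 20, 110, 220, 0), (30, 40, 60, 90, 1)]
```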

For the first run, setting mode to 0 is enough.
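
The uppercase JPG problem mentioned in the notes above can be fixed in one pass before running the script. This is a rough sketch under the assumption that all images sit in a single folder; the function name lowercase_jpg_extensions is hypothetical, and it deliberately skips a file when a different lowercase file with the same stem already exists.

```python
import os


def lowercase_jpg_extensions(image_dir):
    """Rename files such as 'x.JPG' or 'x.Jpg' to 'x.jpg'; return the original names renamed."""
    renamed = []
    for file_name in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(file_name)
        if ext.lower() == '.jpg' and ext != '.jpg':
            src = os.path.join(image_dir, file_name)
            dst = os.path.join(image_dir, stem + '.jpg')
            if os.path.exists(dst) and not os.path.samefile(src, dst):
                # A different lowercase file already exists; skip instead of overwriting it.
                continue
            os.rename(src, dst)
            renamed.append(file_name)
    return renamed
```

Calling it on the JPEGImages folder once, before generating the txt files, keeps the image ids and the files on disk consistent.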

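Project root directory

Why these few lines are placed at the top of the script:

import pathlib

# Generating the txt files on the local machine (op = 0) and then uploading them to Drive is much faster
op = 0  # 0: local machine, 1: Colab
data_year = '2028'
if op == 0:
    # resolve() makes the root absolute even when the script is started with a relative path
    project_root = pathlib.Path(__file__).resolve().parent
else:
    project_root = pathlib.Path('/content/drive/MyDrive/faster-rcnn-pytorch-master')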

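On Colab, anything that involves a path must use an absolute path rather than a relative one, otherwise errors such as "no file or directory" appear. That is why the project root is obtained first, and every xxx_path used later is joined onto it.
I also found that generating the txt files is very slow when this code runs on Colab, probably because the program has to read the files from Drive one at a time. I therefore recommend generating the txt files on the local machine (op = 0) and then uploading them to Drive, which is much faster.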
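
The "join everything onto the root" rule can be wrapped in a tiny helper. This is only a sketch: the name in_project is hypothetical, and the relative path in the usage lines is the checkpoint path that appears later in this post.

```python
import pathlib


def in_project(project_root, relative_path):
    """Join a repo-relative path onto an absolute project root."""
    root = pathlib.Path(project_root)
    if not root.is_absolute():
        # A relative root would bring back the very problem this helper is meant to avoid.
        raise ValueError('project_root must be an absolute path: {}'.format(root))
    return root / relative_path


colab_root = '/content/drive/MyDrive/faster-rcnn-pytorch-master'
checkpoint_path = in_project(colab_root, 'exp/VOC2028/vgg16/checkpoint.pth')
print(checkpoint_path.name)  # checkpoint.pth
```

Failing early on a relative root is a design choice made here for illustration; the original script simply builds project_root once and joins paths onto it.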

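Data visualization

In my view, the first thing to do with any new dataset is to draw the annotated boxes on the original images and look at the size, number, and other characteristics of the targets; only then is it practical to tune parameters.

In utils/utils.py, draw_annotation draws the annotations onto the original images. It has two modes:

batch = True: annotate in batch
batch = False: annotate a single image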

import os

import cv2
from tqdm import tqdm


def draw_annotation(img_path, annot_path, save_path, classes_path, batch=False):
    """
    img_path: directory of the original images; in single-image mode (batch=False)
              it is the path of the one image to display. In batch mode the image
              paths are read from the annotation file instead.
    annot_path: path of the annotation txt file
    save_path: directory where the images with boxes drawn on them are saved (batch mode)
    classes_path: path of voc_classes.txt
    """
    font = cv2.FONT_HERSHEY_SIMPLEX
    # get_classes comes from the repo and returns (class_names, num_classes)
    class_names, num_classes = get_classes(classes_path)
    with open(annot_path, 'r', encoding='UTF-8') as f:
        annot = f.readlines()

    if batch:
        print('start draw ground truth in batch way')
        os.makedirs(save_path, exist_ok=True)
        num = 0
        for line in tqdm(annot):
            line = line.split()
            if not line:
                continue
            # line[0] is the image path, the rest are boxes "x1,y1,x2,y2,class_id"
            img = cv2.imread(line[0])
            if img is None:  # unreadable image, e.g. wrong extension case
                print('cannot read {}'.format(line[0]))
                continue
            img_name = os.path.basename(line[0])
            img_copy = img.copy()
            for box in line[1:]:
                box = box.split(',')
                x1, y1, x2, y2 = (int(v) for v in box[:4])
                cv2.rectangle(img_copy, (x1, y1), (x2, y2), color=(0, 255, 0), thickness=3)
                label = class_names[int(box[-1])]
                cv2.putText(img_copy, label, (x1, y1), font, 1.2, (0, 0, 255), 2)
            cv2.imwrite(os.path.join(save_path, img_name), img_copy)
            num += 1
        print('draw {} ground truth in batch way done!'.format(num))
    else:
        img_name = os.path.basename(img_path)
        img = cv2.imread(img_path)
        a = img.copy()
        for line in annot:
            line = line.split()
            if not line:
                continue
            annot_name = os.path.basename(line[0])
            # match the annotation line to the image by file name without extension
            if os.path.splitext(annot_name)[0] == os.path.splitext(img_name)[0]:
                for box in line[1:]:
                    box = box.split(',')
                    cv2.rectangle(a, (int(box[0]), int(box[1])),
                                  (int(box[2]), int(box[3])), color=(0, 0, 255))
        cv2.imshow('1', a)
        cv2.waitKey(0)

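I saved the output in a directory at the same level as the original data. Later, once the model is trained, I also draw the predicted boxes on the test-set images and save them, so both the ground truth and the predictions are visualized and easy to compare.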

# Run this function three times, changing the paths accordingly
# First run: training set
draw_annotation(img_path=r'D:\faster-rcnn-pytorch-master\VOCdevkit\VOC2028\JPEGImages',
                annot_path=r'D:\faster-rcnn-pytorch-master\VOCdevkit\VOC2028\train.txt',
                save_path=r'D:\faster-rcnn-pytorch-master\VOCdevkit\VOC2028\Trainval-Ground-truth',
                classes_path=r'D:\faster-rcnn-pytorch-master\VOCdevkit\VOC2028\voc_classes.txt',
                batch=True)

# Second run: validation set
draw_annotation(img_path=r'D:\faster-rcnn-pytorch-master\VOCdevkit\VOC2028\JPEGImages',
                annot_path=r'D:\faster-rcnn-pytorch-master\VOCdevkit\VOC2028\val.txt',
                save_path=r'D:\faster-rcnn-pytorch-master\VOCdevkit\VOC2028\Trainval-Ground-truth',
                classes_path=r'D:\faster-rcnn-pytorch-master\VOCdevkit\VOC2028\voc_classes.txt',
                batch=True)

# Third run: test set
draw_annotation(img_path=r'D:\faster-rcnn-pytorch-master\VOCdevkit\VOC2028\JPEGImages',
                annot_path=r'D:\faster-rcnn-pytorch-master\VOCdevkit\VOC2028\test.txt',
                save_path=r'D:\faster-rcnn-pytorch-master\VOCdevkit\VOC2028\Test-Ground-truth',
                classes_path=r'D:\faster-rcnn-pytorch-master\VOCdevkit\VOC2028\voc_classes.txt',
                batch=True)

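Training

Model

The source code implements both vgg16 and resnet50 as the backbone. Considering that resnet50 might be relatively heavy, I trained with vgg16 only.
First, in nets/vgg16.py, change where load_state_dict_from_url on line 3 is imported from: from torch.hub import load_state_dict_from_url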

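Parameter settings

The hyperparameters are set in utils/config.py.
Judging from the visualization results of the previous step, this dataset contains both large, sparse targets and small, dense ones. I therefore configured more anchor_size values than the original paper uses, and also reduced base_size a little.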
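
To see how these two settings interact, the sketch below computes the width and height of every anchor at one feature-map location using the common Faster R-CNN construction (area (base_size * scale)^2, height/width equal to the ratio). It is not necessarily the exact code in this repo; the names anchor_sizes, anchor_scales, and ratios follow the usual convention and are assumptions, and the extra scales in the usage lines are example values, not the author's actual settings.

```python
import math


def anchor_sizes(base_size=16, anchor_scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Return (width, height) of every anchor for one feature-map location.

    Each anchor has area (base_size * scale) ** 2 and height / width == ratio.
    """
    sizes = []
    for ratio in ratios:
        for scale in anchor_scales:
            side = base_size * scale
            height = side * math.sqrt(ratio)
            width = side * math.sqrt(1.0 / ratio)
            sizes.append((width, height))
    return sizes


paper_style = anchor_sizes()                              # 3 scales x 3 ratios
more_scales = anchor_sizes(anchor_scales=(2, 4, 8, 16, 32))  # example: add smaller scales
print(len(paper_style), len(more_scales))  # 9 15
```

With the defaults, the square anchors have sides of 128, 256, and 512 pixels, matching the scales in the original paper; adding smaller scales or lowering base_size produces anchors that fit the small, dense targets better.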

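Start training

I recommend downloading the model that the original author trained on VOC with vgg16 as the backbone and placing it in model_data. Loading this model before training on the safety helmet dataset gives much better results.
Create a new Colaboratory notebook and enter the following commands.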

from google.colab import drive
drive.mount('/content/drive')
!pip install yacs
!python drive/MyDrive/faster-rcnn-pytorch-master/train.py

Colab disconnects after roughly 12 hours of running. To resume training in the next session, use the following commands:

from google.colab import drive
drive.mount('/content/drive')
!pip install yacs
!python drive/MyDrive/faster-rcnn-pytorch-master/train.py --checkpoint exp/VOC2028/vgg16/checkpoint.pth

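Getting results

Saving the network parameters

After 17 epochs of training on Colab I could not wait to test the results, and I find testing more convenient on the local machine.
The checkpoint file is about 1.5 GB, because it stores not only the network parameters but also the state of the optimizer, lr_schedule, and so on, which makes resuming after an interruption easier. I assume everyone using Colab is short on GPUs (sob); loading a file this large with torch.load on my local laptop runs out of GPU memory. So the only option is to read the existing model_best.pth on Colab and save the network parameters alone as a separate pth file, which brings the file down to a little over 500 MB.
On Colab, the following commands save the network parameters to model_best.pth in the project root.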

from google.colab import drive
drive.mount('/content/drive')
import os
import torch

project_root = r"/content/drive/MyDrive/faster-rcnn-pytorch-master"
model_path = r"exp/VOC2028/vgg16/model_best.pth"
# Load the full checkpoint on the CPU and keep only the network weights ('state_dict')
pretrained_dict = torch.load(os.path.join(project_root, model_path), map_location=torch.device('cpu'))['state_dict']
# Save the weights alone to model_best.pth in the project root, in the legacy (non-zip) format
torch.save(pretrained_dict, os.path.join(project_root, "model_best.pth"), _use_new_zipfile_serialization=False)