Charley - Datawhale AI Summer Camp (Session 5), CV Task 1 Study Notes

Code Walkthrough

Preparing the libraries

First, install the required third-party libraries:

  1. opencv-python: provides cv2, used here for image processing and for saving frames as image files
  2. pandas: a third-party library for loading and manipulating tabular data (here, the JSON annotations)
  3. matplotlib: a third-party library for plotting and displaying images
  4. ultralytics: the third-party library that ships the YOLO models
!/opt/miniconda/bin/pip install opencv-python pandas matplotlib ultralytics
import os, sys
import cv2, glob, json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
!apt install zip unzip -y
!apt install unar -y

!wget "https://comp-public-prod.obs.cn-east-3.myhuaweicloud.com/dataset/2024/%E8%AE%AD%E7%BB%83%E9%9B%86%28%E6%9C%89%E6%A0%87%E6%B3%A8%E7%AC%AC%E4%B8%80%E6%89%B9%29.zip?AccessKeyId=583AINLNMLDRFK7CC1YM&Expires=1739168844&Signature=9iONBSJORCS8UNr2m/VZnc7yYno%3D" -O 训练集\(有标注第一批\).zip
!unar -q 训练集\(有标注第一批\).zip

!wget "https://comp-public-prod.obs.cn-east-3.myhuaweicloud.com/dataset/2024/%E6%B5%8B%E8%AF%95%E9%9B%86.zip?AccessKeyId=583AINLNMLDRFK7CC1YM&Expires=1739168909&Signature=CRsB54VqOtrzIdUHC3ay0l2ZGNw%3D" -O 测试集.zip
!unar -q 测试集.zip

Reading the JSON annotation files

json.load, from Python's built-in json module (standard library, not third-party), parses a JSON file into the corresponding Python object, in this case a list of dicts.

train_anno[0] is the first violation record in the file, and len(train_anno) shows that this annotation file contains 1688 violation records in total.

train_anno = json.load(open('训练集(有标注第一批)/标注/45.json', encoding='utf-8'))
train_anno[0], len(train_anno)

# ({'frame_id': 0,
#  'event_id': 1,
#  'category': '非机动车违停',
#  'bbox': [746, 494, 988, 786]},
# 1688)
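
Since the annotations are just a list of flat dicts, pandas can load them directly. A quick sketch for inspecting the label distribution (the value_counts call here is an addition for illustration, not part of the baseline):

anno_df = pd.read_json('训练集(有标注第一批)/标注/45.json')
# Count how many bounding boxes each violation category has
print(anno_df['category'].value_counts())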

Parsing the video files

cv2.VideoCapture opens a video and returns a capture object; calling cap.read() reads one frame at a time.

frame.shape gives the frame dimensions: a resolution of 1080x1920 with three color channels.

cv2.CAP_PROP_FRAME_COUNT is an OpenCV property constant; cap.get(cv2.CAP_PROP_FRAME_COUNT) returns the frame count of 45.mp4, which is 422.

Next, draw the annotated bounding box on the frame and display it with plt.imshow.

video_path = '训练集(有标注第一批)/视频/45.mp4'
cap = cv2.VideoCapture(video_path)

# Read just the first frame
ret, frame = cap.read()

frame.shape
# (1080, 1920, 3)

int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
# 422

bbox = [746, 494, 988, 786]

pt1 = (bbox[0], bbox[1])  # top-left corner (x_min, y_min)
pt2 = (bbox[2], bbox[3])  # bottom-right corner (x_max, y_max)

color = (0, 255, 0)  # green, in OpenCV's BGR order
thickness = 2  # line thickness

cv2.rectangle(frame, pt1, pt2, color, thickness)

# OpenCV returns BGR; convert to RGB before displaying with matplotlib
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
plt.imshow(frame)

YOLO configuration and dataset preparation

Create the directories and write the YAML config file

The YAML file stores:

  1. path: the dataset root directory
  2. train: the relative path of the training data (images and their labels)
  3. val: the relative path of the validation data (images and their labels)
  4. names: the mapping from class index to class name, matching the category field in the annotations
os.makedirs('yolo-dataset/train', exist_ok=True)
os.makedirs('yolo-dataset/val', exist_ok=True)

dir_path = os.path.abspath('./') + '/'

# Adjust the path below for your own environment
with open('yolo-dataset/yolo.yaml', 'w', encoding='utf-8') as up:
    up.write(f'''
path: {dir_path}yolo-dataset/
train: train/
val: val/

names:
    0: 非机动车违停
    1: 机动车违停
    2: 垃圾桶满溢
    3: 违法经营
''')
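
To verify the file was written correctly, it can be parsed back (a quick sketch; PyYAML is assumed to be available, as it is pulled in as an ultralytics dependency):

import yaml  # PyYAML

# Round-trip the config to confirm the paths and class names parse as intended
print(yaml.safe_load(open('yolo-dataset/yolo.yaml', encoding='utf-8')))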

List and sort the training file names; sorting both lists keeps each annotation file paired with its video in the zip calls below.

train_annos = glob.glob('训练集(有标注第一批)/标注/*.json')
train_videos = glob.glob('训练集(有标注第一批)/视频/*.mp4')
train_annos.sort(); train_videos.sort();

category_labels = ["非机动车违停", "机动车违停", "垃圾桶满溢", "违法经营"]

Export the training set into yolo-dataset/train

Each frame is saved as './yolo-dataset/train/*_idx.jpg'.

The corresponding violation records are saved in YOLO's label format to a same-named './yolo-dataset/train/*_idx.txt' file (a plain-text .txt, not .json, since that is what YOLO expects).

Each label line stores category_idx, x_center, y_center, width, height (a worked example follows the list):

  1. category_idx: the class index of the violation category
  2. x_center, y_center: the bounding-box center coordinates as fractions of the image width and height
  3. width, height: the bounding-box width and height as fractions of the image width and height
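
As a concrete check, take the sample bbox [746, 494, 988, 786] from 45.json on a 1920x1080 frame:

x_center = (746 + 988) / 2 / 1920 ≈ 0.4516
y_center = (494 + 786) / 2 / 1080 ≈ 0.5926
width    = (988 - 746) / 1920 ≈ 0.1260
height   = (786 - 494) / 1080 ≈ 0.2704
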
for anno_path, video_path in zip(train_annos[:5], train_videos[:5]):
    print(video_path)
    anno_df = pd.read_json(anno_path)
    cap = cv2.VideoCapture(video_path)
    frame_idx = 0 
    while True:
        ret, frame = cap.read()
        if not ret:
            break

        img_height, img_width = frame.shape[:2]
        
        frame_anno = anno_df[anno_df['frame_id'] == frame_idx]
        cv2.imwrite('./yolo-dataset/train/' + anno_path.split('/')[-1][:-5] + '_' + str(frame_idx) + '.jpg', frame)

        if len(frame_anno) != 0:
            with open('./yolo-dataset/train/' + anno_path.split('/')[-1][:-5] + '_' + str(frame_idx) + '.txt', 'w') as up:
                for category, bbox in zip(frame_anno['category'].values, frame_anno['bbox'].values):
                    category_idx = category_labels.index(category)
                    
                    x_min, y_min, x_max, y_max = bbox
                    x_center = (x_min + x_max) / 2 / img_width
                    y_center = (y_min + y_max) / 2 / img_height
                    width = (x_max - x_min) / img_width
                    height = (y_max - y_min) / img_height

                    if x_center > 1:
                        print(bbox)
                    up.write(f'{category_idx} {x_center} {y_center} {width} {height}\n')
        
        frame_idx += 1
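
A quick way to confirm the export is to count what landed in the folder (a sketch; the numbers depend on how many videos were processed, and .txt files exist only for frames that carry annotations):

print(len(glob.glob('./yolo-dataset/train/*.jpg')),  # exported frames
      len(glob.glob('./yolo-dataset/train/*.txt')))  # label files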

Using the same method, export the validation set into yolo-dataset/val (here, the last three annotated videos):

for anno_path, video_path in zip(train_annos[-3:], train_videos[-3:]):
    print(video_path)
    anno_df = pd.read_json(anno_path)
    cap = cv2.VideoCapture(video_path)
    frame_idx = 0 
    while True:
        ret, frame = cap.read()
        if not ret:
            break

        img_height, img_width = frame.shape[:2]
        
        frame_anno = anno_df[anno_df['frame_id'] == frame_idx]
        cv2.imwrite('./yolo-dataset/val/' + anno_path.split('/')[-1][:-5] + '_' + str(frame_idx) + '.jpg', frame)

        if len(frame_anno) != 0:
            with open('./yolo-dataset/val/' + anno_path.split('/')[-1][:-5] + '_' + str(frame_idx) + '.txt', 'w') as up:
                for category, bbox in zip(frame_anno['category'].values, frame_anno['bbox'].values):
                    category_idx = category_labels.index(category)
                    
                    x_min, y_min, x_max, y_max = bbox
                    x_center = (x_min + x_max) / 2 / img_width
                    y_center = (y_min + y_max) / 2 / img_height
                    width = (x_max - x_min) / img_width
                    height = (y_max - y_min) / img_height

                    up.write(f'{category_idx} {x_center} {y_center} {width} {height}\n')
        
        frame_idx += 1

Download the pretrained model weights

!wget http://mirror.coggle.club/yolo/yolov8n-v8.2.0.pt -O yolov8n.pt

Download the Arial font file that Ultralytics expects

!mkdir -p ~/.config/Ultralytics/
!wget http://mirror.coggle.club/yolo/Arial.ttf -O ~/.config/Ultralytics/Arial.ttf

Training the model

Load the downloaded weights, call the train method, and set the parameters to start training:

  1. epochs: the number of training epochs
  2. imgsz: the input image size (YOLO rounds it up to a multiple of 32, which is why the logs below report a size of 1088)
  3. batch: the number of images per training batch

After each epoch, YOLO automatically saves the model weights and training state to disk so the run can be reloaded or resumed later; see the sketch after the training code below.

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import warnings
warnings.filterwarnings('ignore')


from ultralytics import YOLO
model = YOLO("yolov8n.pt")
results = model.train(data="yolo-dataset/yolo.yaml", epochs=2, imgsz=1080, batch=16)
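
If a run is interrupted, it can be picked up from the last checkpoint instead of restarting (a minimal sketch, assuming the default runs/detect/train output directory that Ultralytics creates):

from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/last.pt")  # most recent checkpoint
model.train(resume=True)  # continue the interrupted run with its original settings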

Inference on the test set

After training, run the trained model over each test video and save the predictions into the result folder.

category_labels = ["非机动车违停", "机动车违停", "垃圾桶满溢", "违法经营"]

if not os.path.exists('result/'):
    os.mkdir('result')
from ultralytics import YOLO
model = YOLO("runs/detect/train/weights/best.pt")
import glob

for path in glob.glob('测试集/*.mp4'):
    submit_json = []
    results = model(path, conf=0.05, imgsz=1080,  verbose=False)
    for idx, result in enumerate(results):
        boxes = result.boxes  # Boxes object for bounding box outputs
        masks = result.masks  # Masks object for segmentation masks outputs
        keypoints = result.keypoints  # Keypoints object for pose outputs
        probs = result.probs  # Probs object for classification outputs
        obb = result.obb  # Oriented boxes object for OBB outputs

        if len(boxes.cls) == 0:
            continue
        
        xyxy = boxes.xyxy.data.cpu().numpy().round()  # corner format: x_min, y_min, x_max, y_max
        cls = boxes.cls.data.cpu().numpy().round()    # predicted class indices
        conf = boxes.conf.data.cpu().numpy()          # confidence scores
        for i, (ci, xy, confi) in enumerate(zip(cls, xyxy, conf)):
            submit_json.append(
                {
                    'frame_id': idx,
                    'event_id': i+1,
                    'category': category_labels[int(ci)],
                    'bbox': list([int(x) for x in xy]),
                    "confidence": float(confi)
                }
            )

    with open('./result/' + path.split('/')[-1][:-4] + '.json', 'w', encoding='utf-8') as up:
        json.dump(submit_json, up, indent=4, ensure_ascii=False)

Finally, remove any stray files and zip the result folder:

!\rm result/.ipynb_checkpoints/ -rf
!\rm -f result.zip
!zip -r result.zip result/

Training optimization

In baseline1, the training set was kept small and the epoch count low for convenience. To improve accuracy, simply enlarge the training set and train for more epochs, for example:

# The VM disk is limited and can only hold the full frame sets of roughly 30 videos, so only 20 are saved here
for anno_path, video_path in zip(train_annos[:20], train_videos[:20]):
# for anno_path, video_path in zip(train_annos[:5], train_videos[:5]):

results = model.train(data="yolo-dataset/yolo.yaml", epochs=5, imgsz=1080, batch=16)
# results = model.train(data="yolo-dataset/yolo.yaml", epochs=2, imgsz=1080, batch=16)

For comparison, after these changes the training losses are clearly lower, whether you look at individual epochs or at the final result, which shows that baseline1's training was far from sufficient.

    # Before the change
      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        1/2      6.62G      1.529       3.15      1.241         20       1088
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)
                   all        583       6590     0.0225      0.355     0.0576     0.0224


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        2/2      6.25G     0.8006      1.357     0.9097         29       1088
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)
                   all        583       6590     0.0398      0.547      0.175     0.0802
    # After the change
      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        1/5      6.33G     0.6679     0.7915     0.9846         80       1088
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)
                   all        583       6590       0.38       0.13       0.17     0.0605

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        2/5      6.01G     0.3587     0.2876      0.858        121       1088
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)
                   all        583       6590     0.0329      0.206     0.0353     0.0128


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        3/5      5.83G     0.2769     0.2256     0.8355         96       1088
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)
                   all        583       6590      0.396     0.0625     0.0389      0.012


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        4/5      6.01G       0.22     0.1847     0.8204         90       1088
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)
                   all        583       6590      0.382      0.172     0.0924     0.0284


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        5/5      5.91G      0.174     0.1513     0.8083         75       1088
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)
                   all        583       6590      0.189       0.18      0.121     0.0431
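
Rather than eyeballing the console logs, the loss curves can be plotted from the results.csv that Ultralytics writes into each run directory (a sketch; the column names below follow the standard ultralytics output and should be verified against your own file):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('runs/detect/train/results.csv')
df.columns = df.columns.str.strip()  # ultralytics pads the column names with spaces

for col in ['train/box_loss', 'train/cls_loss', 'train/dfl_loss']:
    plt.plot(df['epoch'], df[col], label=col)
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()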