YOLOv5: Improving Small-Object Detection on the VisDrone2019 Dataset

This article introduces the VisDrone2019 dataset, a large annotated benchmark for drone-based image analysis with high-resolution images and densely packed object instances. It then shows how YOLOv5 can borrow ideas from the EfficientDet architecture, in particular its BiFPN structure, and how adding an extra small-object detection layer improves performance on small targets.


VisDrone2019 is a drone aerial-photography dataset with image resolutions up to 2000x1500. The training set contains 6,471 images with 343,205 labels, an average of about 53 object instances per image, so the object density is very high and most objects are very small (under 32 pixels). The validation and test sets contain 548 and 1,610 images respectively. The dataset covers 10 classes: pedestrian, people, bicycle, car, van, truck, tricycle, awning-tricycle, bus, and motor. All models in this article are trained on the training set, validated on the validation set, and finally evaluated on the test set.
The VisDrone Dataset is a large-scale benchmark created by the AISKYEYE team at the Lab of Machine Learning and Data Mining, Tianjin University, China. It contains carefully annotated ground truth data for various computer vision tasks related to drone-based image and video analysis.

VisDrone is composed of 288 video clips with 261,908 frames and 10,209 static images, captured by various drone-mounted cameras. The dataset covers a wide range of aspects, including location (14 different cities across China), environment (urban and rural), objects (pedestrians, vehicles, bicycles, etc.), and density (sparse and crowded scenes). The dataset was collected using various drone platforms under different scenarios and weather and lighting conditions. These frames are manually annotated with over 2.6 million bounding boxes of targets such as pedestrians, cars, bicycles, and tricycles. Attributes like scene visibility, object class, and occlusion are also provided for better data utilization.
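The object-density and object-size figures quoted above can be spot-checked from the converted YOLO labels. The sketch below is my own addition rather than part of the original post; it assumes the VisDrone2019-DET-train split has already been downloaded and converted with the script shown further down, and it simply counts boxes per image and the share of boxes smaller than 32x32 pixels.

from pathlib import Path
from PIL import Image

root = Path('../datasets/VisDrone/VisDrone2019-DET-train')  # path used in VisDrone.yaml below
n_img = n_box = n_small = 0
for lbl in (root / 'labels').glob('*.txt'):
    img_w, img_h = Image.open((root / 'images' / lbl.name).with_suffix('.jpg')).size
    n_img += 1
    for line in lbl.read_text().splitlines():
        if not line.strip():
            continue
        _, _, _, w, h = line.split()  # class, x_center, y_center, width, height (all normalized)
        n_box += 1
        if float(w) * img_w < 32 and float(h) * img_h < 32:  # COCO-style "small object" threshold
            n_small += 1

print(f'{n_box} boxes in {n_img} images ({n_box / n_img:.1f} per image), '
      f'{100 * n_small / n_box:.1f}% smaller than 32x32 px')

The official Ultralytics dataset config (VisDrone.yaml), including the download and label-conversion script, is reproduced below.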

# Ultralytics YOLO 🚀, AGPL-3.0 license
# VisDrone2019-DET dataset https://github.com/VisDrone/VisDrone-Dataset by Tianjin University
# Example usage: yolo train data=VisDrone.yaml
# parent
# ├── ultralytics
# └── datasets
#     └── VisDrone  ← downloads here (2.3 GB)


# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/VisDrone  # dataset root dir
train: VisDrone2019-DET-train/images  # train images (relative to 'path')  6471 images
val: VisDrone2019-DET-val/images  # val images (relative to 'path')  548 images
test: VisDrone2019-DET-test-dev/images  # test images (optional)  1610 images

# Classes
names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor


# Download script/URL (optional) ---------------------------------------------------------------------------------------
download: |
  import os
  from pathlib import Path

  from ultralytics.utils.downloads import download

  def visdrone2yolo(dir):
      from PIL import Image
      from tqdm import tqdm

      def convert_box(size, box):
          # Convert VisDrone box to YOLO xywh box
          dw = 1. / size[0]
          dh = 1. / size[1]
          return (box[0] + box[2] / 2) * dw, (box[1] + box[3] / 2) * dh, box[2] * dw, box[3] * dh

      (dir / 'labels').mkdir(parents=True, exist_ok=True)  # make labels directory
      pbar = tqdm((dir / 'annotations').glob('*.txt'), desc=f'Converting {dir}')
      for f in pbar:
          img_size = Image.open((dir / 'images' / f.name).with_suffix('.jpg')).size
          lines = []
          with open(f, 'r') as file:  # read annotation.txt
              for row in [x.split(',') for x in file.read().strip().splitlines()]:
                  if row[4] == '0':  # VisDrone 'ignored regions' class 0
                      continue
                  cls = int(row[5]) - 1
                  box = convert_box(img_size, tuple(map(int, row[:4])))
                  lines.append(f"{cls} {' '.join(f'{x:.6f}' for x in box)}\n")
                  with open(str(f).replace(f'{os.sep}annotations{os.sep}', f'{os.sep}labels{os.sep}'), 'w') as fl:
                      fl.writelines(lines)  # write label.txt


  # Download
  dir = Path(yaml['path'])  # dataset root dir
  urls = ['https://github.com/ultralytics/yolov5/releases/download/v1.0/VisDrone2019-DET-train.zip',
          'https://github.com/ultralytics/yolov5/releases/download/v1.0/VisDrone2019-DET-val.zip',
          'https://github.com/ultralytics/yolov5/releases/download/v1.0/VisDrone2019-DET-test-dev.zip',
          'https://github.com/ultralytics/yolov5/releases/download/v1.0/VisDrone2019-DET-test-challenge.zip']
  download(urls, dir=dir, curl=True, threads=4)

  # Convert
  for d in 'VisDrone2019-DET-train', 'VisDrone2019-DET-val', 'VisDrone2019-DET-test-dev':
      visdrone2yolo(dir / d)  # convert VisDrone annotations to YOLO labels
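
For readers unfamiliar with the raw VisDrone annotation format: each line of an annotations/*.txt file is <bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>. The score field is 0 for ignored regions, which is why the script skips rows with row[4] == '0', and the 1-based object_category is shifted to the 0-based YOLO class with int(row[5]) - 1. convert_box() turns a top-left corner plus width/height in pixels into the normalized center-x, center-y, width, height that YOLO expects. A quick check with made-up numbers (my own example, not part of the yaml):

# Illustrative check of convert_box with made-up numbers
def convert_box(size, box):  # size = (img_w, img_h), box = (left, top, w, h) in pixels
    dw, dh = 1. / size[0], 1. / size[1]
    return (box[0] + box[2] / 2) * dw, (box[1] + box[3] / 2) * dh, box[2] * dw, box[3] * dh

# A 50x40 px box whose top-left corner is at (100, 200) in a 1920x1080 image:
print(convert_box((1920, 1080), (100, 200, 50, 40)))
# -> (0.0651..., 0.2037..., 0.0260..., 0.0370...)  normalized center-x, center-y, width, height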

YOLOv5


Method 1: Replacing the FPN+PAN neck with a BiFPN structure

Paper: "EfficientDet: Scalable and Efficient Object Detection"

EfficientDet is an architecture proposed by Mingxing Tan, Ruoming Pang, and Quoc V. Le at Google Brain. It combines EfficientNet (from the same team) with the newly proposed BiFPN and achieved new state-of-the-art (SOTA) detection results.

Contributions:
  • A new feature-fusion method: the repeated, weighted bi-directional feature pyramid network (BiFPN).
  • A compound scaling method (as in EfficientNet) that jointly scales resolution, depth, width, the feature-fusion network, and the box/class prediction networks.

BiFPN stands for Bidirectional Feature Pyramid Network, a weighted bi-directional (top-down plus bottom-up) feature pyramid. Each fusion node uses fast normalized fusion: the output is sum_i(w_i * I_i) / (epsilon + sum_j w_j), where the w_i are learnable weights, so the network learns how much each input feature level contributes.

In this article, the PANet neck of YOLOv5 is replaced with an EfficientDet-style BiFPN. Fusing shallow and deep features in both the top-down and bottom-up directions strengthens the flow of feature information between layers and noticeably improves YOLOv5's detection accuracy.
The BiFPN in the original paper fuses five feature levels, while the stock YOLOv5s neck only feeds three detection heads, so only three levels are fused here.

To combine YOLOv5 with BiFPN, the following places in the code need to be modified:

1. Modify common.py

# add to models/common.py (torch and nn are already imported there)
class Concat_BiFPN(nn.Module):
    # Weighted fusion of two same-shaped feature maps (BiFPN "fast normalized fusion")
    def __init__(self, c1):  # c1 kept for compatibility with the yaml args; not used here
        super().__init__()
        self.w = nn.Parameter(torch.ones(2, dtype=torch.float32), requires_grad=True)  # learnable fusion weights
        self.epsilon = 0.0001
        self.swish = nn.SiLU()  # Swish and SiLU are the same activation

    def forward(self, x):
        weight = self.w / (torch.sum(self.w, dim=0) + self.epsilon)  # normalize the weights
        # weighted element-wise sum of the two inputs, followed by Swish
        return self.swish(weight[0] * x[0] + weight[1] * x[1])
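
A quick shape check (my own snippet, not from the original post): the module expects a list of two feature maps with identical shapes and returns a single fused map of the same shape, which is why the yaml entries that use it must fuse layers at the same resolution and channel width.

import torch

m = Concat_BiFPN(64)                      # c1 is accepted but unused in the module above
p_top_down = torch.randn(1, 64, 40, 40)   # e.g. an upsampled deeper feature map
p_lateral = torch.randn(1, 64, 40, 40)    # lateral feature map with matching shape
print(m([p_top_down, p_lateral]).shape)   # torch.Size([1, 64, 40, 40])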

2. Modify yolo.py

The two feature maps are no longer concatenated along the channel dimension; they are fused by an element-wise (pairwise) addition, so the output channel count is no longer the sum of the inputs but equals the channel count of a single input. parse_model() in models/yolo.py therefore has to compute the output channels accordingly.
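
For reference, this is roughly where the change goes in parse_model() of models/yolo.py. It is an excerpt-style sketch based on the stock YOLOv5 parsing code, not a standalone script; adapt it to your own version:

# inside parse_model() in models/yolo.py (sketch)
elif m is Concat:
    c2 = sum(ch[x] for x in f)        # plain concat: output channels are the sum of the inputs
elif m is Concat_BiFPN:
    c2 = max(ch[x] for x in f)        # weighted element-wise add: output channels stay the same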

3. Modify the model configuration file

Add Concat_BiFPN to yolov5s.yaml in the YOLOv5 code at the fusion points dictated by your own network structure.

Results: with the BiFPN neck, mAP@0.5 is 0.348 and mAP@0.5:0.95 is 0.192.

Method 2: Adding a P2 small-object detection layer

The second approach adds an extra small-object detection layer so that YOLOv5 can pick up small targets; only the yaml file under models/ needs to be edited.
The original yaml file:

# parameters
nc: 80  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
 
# anchors
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32
 
# YOLOv5 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, C3, [1024, False]],  # 9
  ]
 
# YOLOv5 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13
 
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)
 
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)
 
   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)
 
   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]

The modified yaml file:

# parameters
nc: 10  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
 
# anchors
anchors:
  - [5,6, 8,14, 15,11]  # P2/4
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32
 
# YOLOv5 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, C3, [1024, False]],  # 9
  ]
 
# YOLOv5 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13
 
   [-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [512, False]],  # 17
 
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 2], 1, Concat, [1]],  # cat backbone P2
   [-1, 3, C3, [256, False]],  # 21 (P2/4-xsmall)
 
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 18], 1, Concat, [1]],  # cat head P3
   [-1, 3, C3, [256, False]],  # 24 (P3/8-small)
 
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 27 (P4/16-medium)
 
   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 30 (P5/32-large)
 
   [[21, 24, 27, 30], 1, Detect, [nc, anchors]],  # Detect(P2, P3, P4, P5)
  ]
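
To actually train the 4-head model on VisDrone, the modified yaml has to be saved under some name and passed to train.py. The snippet below is a sketch of my own; the file name yolov5s-visdrone-p2.yaml and the hyperparameters are illustrative values, not settings from the original post. Run it from inside a yolov5 repository checkout.

import train  # yolov5/train.py

train.run(
    data='VisDrone.yaml',                   # dataset config shown at the top of this post
    cfg='models/yolov5s-visdrone-p2.yaml',  # the 4-head model yaml above (hypothetical file name)
    weights='yolov5s.pt',                   # COCO-pretrained warm start
    imgsz=640,                              # example values
    epochs=100,
    batch_size=16,
)
# CLI equivalent:
# python train.py --data VisDrone.yaml --cfg models/yolov5s-visdrone-p2.yaml \
#        --weights yolov5s.pt --img 640 --epochs 100 --batch-size 16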

Method 3: Using the BiFPN structure together with the P2 small-object detection layer

The yaml file for the combined structure:

# YOLOv5 🚀 by Ultralytics, GPL-3.0 license
 
# Parameters: all of P2, P3, P4 and P5 are output, so this variant can detect smaller objects than the standard 3-head model
nc: 10  # number of classes (VisDrone)
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
anchors: 4  # AutoAnchor evolves 4 anchors per P output layer
 
# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]
 
# YOLOv5 v6.0 head with (P2, P3, P4, P5) outputs
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, BiFPN_Add2, [256,256]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13
 
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, BiFPN_Add2, [128,128]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)
 
   [-1, 1, Conv, [128, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 2], 1, BiFPN_Add2, [64,64]],  # cat backbone P2
   [-1, 1, C3, [128, False]],  # 21 (P2/4-xsmall)
 
   [-1, 1, Conv, [128, 3, 2]],
   [-1, 1, Conv, [256, 1, 1]],
   [[-1, 17, 4], 1, BiFPN_Add3, [128,128]],  # cat head P3
   [-1, 3, C3, [256, False]],  # 25 (P3/8-small)
 
   [-1, 1, Conv, [256, 3, 2]],
   [-1, 1, Conv, [512, 1, 1]],
   [[-1, 13, 6], 1, BiFPN_Add3, [256,256]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 29 (P4/16-medium)
 
   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, BiFPN_Add2, [256,256]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 32 (P5/32-large)
 
   [[21, 25, 29, 32], 1, Detect, [nc, anchors]],  # Detect(P2, P3, P4, P5)
  ]
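
Note that the yaml above uses BiFPN_Add2 and BiFPN_Add3, which are not defined earlier in this post. A minimal sketch of these modules, written by me to be consistent with the [c_in, c_out] arguments used in the yaml (weighted two- or three-input fusion followed by a 1x1 convolution), would go into models/common.py; they also need to be registered in parse_model() in the same way as Concat_BiFPN above.

import torch
import torch.nn as nn


class BiFPN_Add2(nn.Module):
    # Weighted fusion of two same-shaped feature maps, followed by a 1x1 conv
    def __init__(self, c1, c2):
        super().__init__()
        self.w = nn.Parameter(torch.ones(2, dtype=torch.float32), requires_grad=True)
        self.epsilon = 0.0001
        self.conv = nn.Conv2d(c1, c2, kernel_size=1, stride=1, padding=0)
        self.silu = nn.SiLU()

    def forward(self, x):
        w = self.w / (torch.sum(self.w, dim=0) + self.epsilon)  # fast normalized fusion weights
        return self.conv(self.silu(w[0] * x[0] + w[1] * x[1]))


class BiFPN_Add3(nn.Module):
    # Weighted fusion of three same-shaped feature maps, followed by a 1x1 conv
    def __init__(self, c1, c2):
        super().__init__()
        self.w = nn.Parameter(torch.ones(3, dtype=torch.float32), requires_grad=True)
        self.epsilon = 0.0001
        self.conv = nn.Conv2d(c1, c2, kernel_size=1, stride=1, padding=0)
        self.silu = nn.SiLU()

    def forward(self, x):
        w = self.w / (torch.sum(self.w, dim=0) + self.epsilon)
        return self.conv(self.silu(w[0] * x[0] + w[1] * x[1] + w[2] * x[2]))

As with Concat_BiFPN, the inputs fused by each node must share the same spatial size and channel count.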