mmdetection基于 PyTorch 的目标检测开源工具箱入门教程

O丶ne丨柒夜

已于 2023-08-24 09:56:31 修改

阅读量1.1k

点赞数 2

文章标签： pytorch 目标检测人工智能

于 2023-08-23 23:11:12 首次发布

本文链接：https://blog.csdn.net/m0_61634551/article/details/132461438

版权

安装环境

MMDetection 支持在 Linux，Windows 和 macOS 上运行。它需要 Python 3.7 以上，CUDA 9.2 以上和 PyTorch 1.8 及其以上。

1、安装依赖

步骤 0. 从官方网站下载并安装 Miniconda。

步骤 1. 创建并激活一个 conda 环境。

conda create --name openmmlab python=3.8 -y
conda activate openmmlab

步骤 2. 基于 PyTorch 官方说明安装 PyTorch。

在 GPU 平台上：

conda install pytorch torchvision -c pytorch

在 CPU 平台上：

conda install pytorch torchvision cpuonly -c pytorch

步骤 3. 使用 MIM 安装 MMEngine 和 MMCV。

pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"

注意： 在 MMCV-v2.x 中，mmcv-full 改名为 mmcv，如果你想安装不包含 CUDA 算子精简版，可以通过 mim install "mmcv-lite>=2.0.0rc1" 来安装。

步骤 4. 安装 MMDetection。

方案 a：如果你开发并直接运行 mmdet，从源码安装它：

git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
pip install -v -e .
# "-v" 指详细说明，或更多的输出
# "-e" 表示在可编辑模式下安装项目，因此对代码所做的任何本地修改都会生效，从而无需重新安装。

方案 b：如果你将 mmdet 作为依赖或第三方 Python 包，使用 MIM 安装：

mim install mmdet

2、验证安装

为了验证 MMDetection 是否安装正确，我们提供了一些示例代码来执行模型推理。

步骤 1. 我们需要下载配置文件和模型权重文件。

mim download mmdet --config rtmdet_tiny_8xb32-300e_coco --dest .

下载将需要几秒钟或更长时间，这取决于你的网络环境。完成后，你会在当前文件夹中发现两个文件 rtmdet_tiny_8xb32-300e_coco.py 和 rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth。

步骤 2. 推理验证。

方案 a：如果你通过源码安装的 MMDetection，那么直接运行以下命令进行验证：

python demo/image_demo.py demo/demo.jpg rtmdet_tiny_8xb32-300e_coco.py --weights rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth --device cpu
cuda:
python demo/image_demo.py demo/demo2(1).jpg rtmdet_tiny_8xb32-300e_coco.py --weights rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth --device cuda

python demo/image_demo.py demo/demo.jpg rtmdet_tiny_8xb32-300e_coco.py --weights rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth --device cpu

你会在当前文件夹中的 outputs/vis 文件夹中看到一个新的图像 demo.jpg，图像中包含有网络预测的检测框。

方案 b：如果你通过 MIM 安装的 MMDetection，那么可以打开你的 Python 解析器，复制并粘贴以下代码：

from mmdet.apis import init_detector, inference_detector

config_file = 'rtmdet_tiny_8xb32-300e_coco.py'
checkpoint_file = 'rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth'
model = init_detector(config_file, checkpoint_file, device='cpu')  # or device='cuda:0'
inference_detector(model, 'demo/demo.jpg')

你将会看到一个包含 DetDataSample 的列表，预测结果在 pred_instance 里，包含有检测框，类别和得分。

目标检测+覆盖

mim download mmdet --config mask-rcnn_r101_fpn_2x_coco --dest models

python demo/image_demo.py demo/demo.jpg configs\mask_rcnn\mask-rcnn_r101_fpn_2x_coco.py  --weights models/mask_rcnn_r101_fpn_2x_coco_bbox.pth --device cuda

3.mmdetection算法速览

albu_example - 示例使用Albumentations数据增强库
atss - Anchor-free一阶段目标检测模型
autoassign - 自动分配样本到计算资源的示例
boxinst - BoxInst目标实例分割模型
bytetrack - 基于ByteTrack的多目标跟踪
carafe - CARAFE通道注意力模块
cascade_rcnn - Cascade R-CNN级联RCNN目标检测模型
cascade_rpn - CascadeRPN用于Faster R-CNN的级联RPN
centernet - CenterNet中心点检测模型
centripetalnet - CentripetalNet边缘眼动检测模型
cityscapes - Cityscapes城市场景数据集
common - 通用配置和脚本
condinst - 基于条件CondiInst目标实例分割
conditional_detr - 基于DETR的条件目标检测
convnext - ConvNeXt图像分类模型
cornernet - CornerNet角点检测模型
crowddet - 群众密集场景检测模型CrowdDet
dab_detr - DAB-DETR对抗学习增强的DETR
dcn - 可变形卷积网络
dcnv2 - 可变形卷积网络v2
ddod - DDOD端到端目标检测
deepfashion - DeepFashion人体解析数据集
deepsort - DeepSORT深度学习多目标跟踪
deformable_detr - 基于可变形卷积的DETR
detectors - 通用目标检测器配置
detr - DETR (DEformable DEtection TRansformer)
dino - DINO自监督预训练模型
double_heads - Double-Heads双头目标检测
dsdl - DSdL场景文本检测
dyhead - DyHead动态头注意力
dynamic_rcnn - Dynamic R-CNN动态RCNN
efficientnet - EfficientNet图像分类网络
empirical_attention - Empirical Attention注意力机制
faster_rcnn - Faster R-CNN两阶段目标检测模型
fast_rcnn - Fast R-CNN较早的两阶段目标检测模型
fcos - FCOS全景分割目标检测
foveabox - FoveaBox凝视预测模块
fpg - Feature Pyramid Grids
free_anchor - FreeAnchor自由锚框检测
fsaf - Feature Selective Anchor-Free模块
gcnet - GCNet场景图卷积网络
gfl - Generalized Focal Loss
ghm - Gradient Harmonizing Mechanism
glip - Global Local Image Pyramid
gn - Group Normalization
gn+ws - Group Normalization + Weight Standardization
grid_rcnn - Grid R-CNN网格RCNN
groie - Gradient-weighted R-CNN Object IoU Estimation
guided_anchoring - Guided Anchoring定向锚框
hrnet - High-Resolution Network高分辨率网络
htc - Hybrid Task Cascade模块
instaboost - Instance Boostraping样本选择算法
lad - Lightweight ADetector轻量级检测器
ld - Localization Distillation知识蒸馏模块
legacy_1.x - 早期MMDetection 1.x版本配置
libra_rcnn - Libra R-CNN均衡RCNN
lvis - LVIS大词汇数据集
mask2former - Mask2Former 基于transformer的实例分割
maskformer - MaskFormer transformer based实例分割
masktrack_rcnn - MaskTrack R-CNN视频实例分割跟踪
mask_rcnn - Mask R-CNN实例分割模型
misc - 其他独立模块
ms_rcnn - Multi-Scale RCNN多尺度RCNN
nas_fcos - NAS-FCOS神经结构搜索FCOS
nas_fpn - NAS-FPN神经结构搜索特征金字塔
objects365 - Objects365数据集
ocsort - 一种基于检测的跟踪方法
openimages - OpenImages数据集
paa - Pooling-based Anchor Assignment
pafpn - Path Aggregation Network
panoptic_fpn - Panoptic FPN全景分割FPN
pascal_voc - PASCAL VOC数据集
pisa - Prime Sample Attention采样注意力
point_rend - PointRend点分割
pvt - Pyramid Vision Transformer金字塔视觉transformer
qdtrack - Quality Aware Network for Multiple Object Tracking
queryinst - QueryInst基于query的实例分割
regnet - RegNet网络结构
reid - 人员重识别模型
reppoints - RepPoints角点检测
res2net - Res2Net网络结构
resnest - ResNeSt网络结构
retinanet - RetinaNet单阶段目标检测模型
rpn - Region Proposal Network
rtmdet - Real-time Multi-scale Detector实时多尺度检测器
sabl - Side-Aware Boundary Localization
scnet - SCNet场景解析模型
scratch - 从零开始训练配置
seesaw_loss - Seesaw Loss
selfsup_pretrain - 自监督预训练模型
simple_copy_paste - Simple Copy-Paste数据增强方法
soft_teacher - Soft Teacher Semi-Supervised Object Detection
solo - Segmenting Objects by Locations单阶段实例分割
solov2 - SOLOv2
sort - SORT简单联合检测和跟踪算法
sparse_rcnn - Sparse R-CNN稀疏RCNN
ssd - SSD单阶段目标检测模型
strongsort - StrongSORT强化的SORT算法
strong_baselines - 一些强基准模型配置
swin - Swin Transformer
timm_example - 使用timm库的示例
tood - TOOD场景文本检测器
tridentnet - TridentNet三叉网络
vfnet - VarifocalNet变焦点网络
wider_face - WIDER FACE人脸数据集
yolact - YOLACT实时实例分割
yolo - YOLO系列目标检测模型
yolof - YOLOF快速Yolo模型
yolox - YOLOX优化的Yolo模型
base - 基础模块和脚本

目标检测

图片目标检测

视频检测

命令行

mim download mmdet --config faster-rcnn_r101_fpn_2x_coco --dest models
python demo/webcam_demo.py configs/faster_rcnn/faster_rcnn_r101_fpn_2x_coco.py models/faster_rcnn_r101_fpn_2x_coco_bbox.pth --file 漫步在闵行莘庄老街道.mp4

import argparse
import os
import cv2 as cv
import torch
import argparse
import numpy as np

from mmdet.apis import inference_detector, init_detector

file_path = __file__
dir_path = os.path.dirname(file_path)
output_video_path = os.path.join(dir_path, 'result.mp4')


def main():
    args = {"file": "Nan", "checkpoint": "Nan", "config": "Nan",
            "out": "Nan", "device": "Nan", "camera_id": "Nan", "score_thr": "Nan", }
    args = argparse.Namespace(**args)
    # 自定义输入    args.device = "cuda:0"
    args.file = '/home/sha/PycharmProjects/mmdetection/workdir_hurricane/videos_test/video5.MP4'
    args.checkpoint = '/home/sha/PycharmProjects/mmdetection/workdir_hurricane/traffic_dataset_fasterRCNN/latest.pth'
    args.config = '/home/sha/PycharmProjects/mmdetection/workdir_hurricane/faster_rcnn_r101_fpn_2x_coco.py'
    args.out = 'workdir_hurricane/result.mp4'
    args.device = 'cuda:0'
    args.camera_id = 0
    args.score_thr = 0.5

    print("*" * 50)
    print(args)
    print("*" * 50)

    if not args.file:
        print('No target file!')
        exit(0)

    device = torch.device(args.device)

    print('device:', args.device)

    model = init_detector(args.config, args.checkpoint, device=device)

    camera = cv.VideoCapture(args.file)

    camera_width = int(camera.get(cv.CAP_PROP_FRAME_WIDTH))
    camera_hight = int(camera.get(cv.CAP_PROP_FRAME_HEIGHT))

    print(camera_hight, camera_width)
    fps = camera.get(cv.CAP_PROP_FPS)

    video_writer = cv.VideoWriter(args.out, cv.VideoWriter_fourcc(*'mp4v'),
                                  fps, (camera_width, camera_hight))

    count = 0

    print('Press "Esc", "q" or "Q" to exit.')
    while True:
        torch.cuda.empty_cache()
        ret_val, img = camera.read()
        if ret_val:
            if count < 0:
                count += 1
                print("Write {} in result Successfuly!".format(count))
                continue
            result = inference_detector(model, img)
            print("*" * 50)
            print(result)
            print("*" * 50)
            result_int = result[1][0:3]
            result_int = result_int.astype(int)        
            left_top = (result_int[0][0], result_int[0][1])
            right_bottom = (result_int[0][2], result_int[0][3])
            
            cv.rectangle(img, left_top, right_bottom, (0, 0, 255), 2)
            cv.imshow("img", img)
            # cv.resizeWindow("img",300,300)
            cv.waitKey(0)
            cv.destroyWindow("img")

            ch = cv.waitKey(1)
            if ch == 27 or ch == ord('q') or ch == ord('Q'):
                break

            frame = model.show_result(img, result, score_thr=args.score_thr, wait_time=1, show=False)
            cv.imshow('frame', frame)
            if len(frame) >= 1:
                video_writer.write(frame)
                count += 1
                print("Write {} in result Successfuly!".format(count))

        else:
            print('Load fail!')
            break
    camera.release()
    video_writer.release()
    cv.destroyWindow()


if __name__ == '__main__':
    main()