Installing and Using DeepStream-Yolo

Latest recommended article published on 2024-12-12 11:06:16

THE@JOKER

Category: Jetson nano. Tags: deep learning, pytorch, autonomous driving

Article link: https://blog.csdn.net/W1995S/article/details/119902236

Recommended reading, the DeepStream SDK: https://developer.nvidia.com/deepstream-sdk

DeepStream-Yolo

git clone https://github.com/marcoslucianops/DeepStream-Yolo

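NVIDIA DeepStream SDK 5.1 configuration for YOLO models

Improvements in this repository

Darknet CFG parameter parser (no need to edit nvdsparsebbox_Yolo.cpp or other files for native models)
Support for the new_coords, beta_nms and scale_x_y parameters
Support for new models not supported in the official DeepStream SDK YOLO.
Support for layers not supported in the official DeepStream SDK YOLO.
Support for activations not supported in the official DeepStream SDK YOLO.
Support for convolutional groups
Support for INT8 calibration (not available for YOLOv5 models)
Support for non-square models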

TensorRT conversion

Native (tested models below)

YOLOv4x-Mish [cfg] [weights]
YOLOv4-CSP [cfg] [weights]
YOLOv4 [cfg] [weights]
YOLOv4-Tiny [cfg] [weights]
YOLOv3-SPP [cfg] [weights]
YOLOv3 [cfg] [weights]
YOLOv3-Tiny-PRN [cfg] [weights]
YOLOv3-Tiny [cfg] [weights]
YOLOv3-Lite [cfg] [weights]
YOLOv3-Nano [cfg] [weights]
YOLO-Fastest 1.1 [cfg] [weights]
YOLO-Fastest-XL 1.1 [cfg] [weights]
YOLOv2 [cfg] [weights]
YOLOv2-Tiny [cfg] [weights]

External

YOLOv5 5.0
YOLOv5 4.0
YOLOv5 3.X (3.0/3.1)

Requirements

NVIDIA DeepStream SDK 5.1
DeepStream-Yolo Native (for Darknet-based YOLO models)
DeepStream-Yolo External (for PyTorch-based YOLOv5 models)

Basic usage

git clone https://github.com/marcoslucianops/DeepStream-Yolo.git
cd DeepStream-Yolo/native

Download the cfg and weights files for your model and move them to the DeepStream-Yolo/native folder

Compile

x86 platform

CUDA_VER=11.1 make -C nvdsinfer_custom_impl_Yolo

Jetson platform

CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo
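
The two build commands differ only in CUDA_VER. As a minimal sketch, assuming the machine architecture is enough to tell the platforms apart (x86_64 for a desktop GPU, aarch64 for Jetson), the choice could be scripted as follows. The helper name build_command is hypothetical; the CUDA versions are the ones used in this article for DeepStream 5.1.

```python
import platform

# CUDA versions this article pairs with DeepStream SDK 5.1 on each platform.
CUDA_VER_BY_ARCH = {"x86_64": "11.1", "aarch64": "10.2"}


def build_command(arch=None, opencv=False):
    """Return (env_overrides, argv) for building nvdsinfer_custom_impl_Yolo.

    arch defaults to the architecture of the current machine.
    """
    arch = arch or platform.machine()
    if arch not in CUDA_VER_BY_ARCH:
        raise ValueError(f"unsupported architecture: {arch}")
    env = {"CUDA_VER": CUDA_VER_BY_ARCH[arch]}
    if opencv:
        env["OPENCV"] = "1"  # only needed for the INT8 calibration build
    return env, ["make", "-C", "nvdsinfer_custom_impl_Yolo"]
```

To actually run the build, the returned values could be passed to subprocess.run(argv, env={**os.environ, **env}, check=True) from inside DeepStream-Yolo/native.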

Edit config_infer_primary.txt for your model (example for YOLOv4)

[property]
...
# 0=RGB, 1=BGR, 2=GRAYSCALE
model-color-format=0
# CFG
custom-network-config=yolov4.cfg
# Weights
model-file=yolov4.weights
# Generated TensorRT model (will be created if it doesn't exist)
model-engine-file=model_b1_gpu0_fp32.engine
# Model labels file
labelfile-path=labels.txt
# Batch size
batch-size=1
# 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
# Number of classes in label file
num-detected-classes=80
...
[class-attrs-all]
# CONF_THRESH
pre-cluster-threshold=0.25
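
A mismatch between these keys (for example num-detected-classes not matching labels.txt) is easy to introduce when adapting the file to another model. The following is a rough sketch of a pre-flight check, not part of DeepStream or DeepStream-Yolo. It assumes the real file contains no literal "..." lines, that relative paths are resolved against the config file's folder, and that the engine file name carries the precision (fp32/int8/fp16) as in the example above. The function name check_infer_config is hypothetical.

```python
import configparser
from pathlib import Path

# network-mode values as documented in the config comment above.
PRECISION_BY_MODE = {0: "fp32", 1: "int8", 2: "fp16"}


def check_infer_config(config_path):
    """Return a list of human-readable problems found in a config_infer file."""
    config_path = Path(config_path)
    base = config_path.parent
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_path)
    prop = parser["property"]
    problems = []

    # The cfg, weights and labels files must exist.
    for key in ("custom-network-config", "model-file", "labelfile-path"):
        value = prop.get(key)
        if value is None or not (base / value).is_file():
            problems.append(f"{key}: {value!r} is missing")

    # num-detected-classes should equal the number of non-empty label lines.
    labels = prop.get("labelfile-path")
    if labels and (base / labels).is_file():
        lines = (base / labels).read_text().splitlines()
        n_labels = sum(1 for line in lines if line.strip())
        n_classes = prop.getint("num-detected-classes", fallback=None)
        if n_classes != n_labels:
            problems.append(
                f"num-detected-classes={n_classes} but {labels} has {n_labels} labels"
            )

    # By convention the engine file name mentions the precision in use.
    expected = PRECISION_BY_MODE.get(prop.getint("network-mode", fallback=0))
    engine = prop.get("model-engine-file", "")
    if expected and engine and expected not in engine:
        problems.append(f"model-engine-file={engine} does not mention {expected}")

    return problems
```

An empty list means none of these checks failed; it does not guarantee that the engine will build.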

Run

deepstream-app -c deepstream_app_config.txt

If you want to use the YOLOv2 or YOLOv2-Tiny models, change deepstream_app_config.txt before running

[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV2.txt

Note: config_infer_primary.txt uses cluster-mode=4 and applies NMS in code, with a threshold of 0.45 when beta_nms is not set in the cfg (when beta_nms is set, NMS = beta_nms). config_infer_primary_yoloV2.txt uses cluster-mode=2 and sets NMS through nms-iou-threshold=0.45.
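
To make the two thresholds concrete, here is a minimal sketch of greedy NMS in plain Python. It is an illustration of the general technique only, not the code DeepStream or DeepStream-Yolo runs: it is class-agnostic, takes boxes as (x1, y1, x2, y2) corners, and the function names are hypothetical. The defaults mirror the values quoted in this article (confidence 0.25, IoU 0.45).

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.45, conf_threshold=0.25):
    """Greedy NMS over a list of (box, score) pairs.

    Detections below conf_threshold are dropped first. The rest are visited
    from highest to lowest score, and a detection is kept only if it does not
    overlap an already kept one by more than iou_threshold.
    """
    candidates = sorted(
        (d for d in detections if d[1] >= conf_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept
```

A lower IoU threshold suppresses more overlapping boxes, and a lower confidence threshold lets more candidates reach the NMS step.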

INT8 calibration

Install OpenCV

sudo apt-get install libopencv-dev

Compile/recompile the nvdsinfer_custom_impl_Yolo library with OpenCV support

x86 platform

cd DeepStream-Yolo/native
CUDA_VER=11.1 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo

Jetson platform

cd DeepStream-Yolo/native
CUDA_VER=10.2 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo

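For the COCO dataset, download val2017, extract it, and move it to the DeepStream-Yolo/native folder

Select 1000 random images from the COCO dataset to run calibration

mkdir calibration
# ls already prints each path with the val2017/ prefix, so copy ${jpg} as-is
for jpg in $(ls -1 val2017/*.jpg | sort -R | head -1000); do \
    cp "${jpg}" calibration/; \
done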

Create the calibration.txt file with all selected images

realpath calibration/*jpg > calibration.txt
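
The selection and listing steps can also be done in one pass from Python, which avoids parsing ls output. This is a sketch under the same assumptions as the shell commands (JPEG files in val2017, output in calibration/ and calibration.txt). The function name build_calibration_set and its seed parameter are hypothetical additions for reproducibility.

```python
import random
import shutil
from pathlib import Path


def build_calibration_set(src_dir, dst_dir, list_file, count=1000, seed=None):
    """Copy up to `count` random .jpg files and list their absolute paths.

    Returns the number of images written to list_file.
    """
    images = sorted(Path(src_dir).glob("*.jpg"))
    rng = random.Random(seed)  # pass a seed to get the same subset each time
    chosen = rng.sample(images, min(count, len(images)))

    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    lines = []
    for image in chosen:
        target = dst / image.name
        shutil.copy2(image, target)
        lines.append(str(target.resolve()))  # same role as realpath

    # One absolute path per line, sorted like the shell glob would be.
    Path(list_file).write_text("".join(line + "\n" for line in sorted(lines)))
    return len(lines)
```

For the setup in this article the call would be build_calibration_set("val2017", "calibration", "calibration.txt").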

Set the environment variables

export INT8_CALIB_IMG_PATH=calibration.txt
export INT8_CALIB_BATCH_SIZE=1

Edit the config_infer_primary.txt file, changing

...
model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
...
network-mode=0
...

to

...
model-engine-file=model_b1_gpu0_int8.engine
int8-calib-file=calib.table
...
network-mode=1
...
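
The three-line change above can be applied mechanically. The sketch below rewrites the text of a config file line by line; it assumes the engine file name ends in _fp32.engine or _fp16.engine as in the example, and the function name switch_to_int8 is hypothetical. All other lines, including comments, are left untouched.

```python
import re


def switch_to_int8(config_text, calib_file="calib.table"):
    """Return config_text with the engine name, calibration table and
    network-mode switched to INT8."""
    out = []
    for line in config_text.splitlines():
        if line.startswith("model-engine-file="):
            # model_b1_gpu0_fp32.engine -> model_b1_gpu0_int8.engine
            line = re.sub(r"_fp(32|16)\.engine$", "_int8.engine", line)
        elif re.match(r"#?\s*int8-calib-file=", line):
            # Uncomment the key if needed and point it at the table.
            line = f"int8-calib-file={calib_file}"
        elif line.startswith("network-mode="):
            line = "network-mode=1"  # 1 = INT8
        out.append(line)
    return "\n".join(out) + "\n"
```

Reading the file with Path.read_text(), passing the text through this function and writing it back reproduces the manual edit.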

Run

deepstream-app -c deepstream_app_config.txt

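Note: NVIDIA recommends at least 500 images to get good accuracy. In this example, I used 1000 images to get better accuracy (more images = more accuracy). Higher INT8_CALIB_BATCH_SIZE values will increase accuracy and calibration speed; set it according to your GPU memory. This process can take a long time. Calibration is not available for YOLOv5 models.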

mAP/FPS comparison between models

valid = val2017 (COCO)
NMS = 0.45 (changed to beta_nms when used in Darknet cfg file) / 0.6 (YOLOv5 models)
pre-cluster-threshold = 0.001 (mAP eval) / 0.25 (FPS measurement)
batch-size = 1
FPS measurement display width = 1920
FPS measurement display height = 1080
NOTE: An NVIDIA GTX 1050 (4GB Mobile) was used for evaluation. maintain-aspect-ratio=1 was set in the config_infer file for YOLOv4 (with letter_box=1) and YOLOv5 models. For INT8 calibration, 1000 random images from val2017 (COCO) and INT8_CALIB_BATCH_SIZE=1 were used.

Extracting metadata

You can get metadata from DeepStream in Python and C++. For C++, you need to edit the deepstream-app or deepstream-test code. For Python, you need to install and edit deepstream_python_apps.

You need to work with NvDsObjectMeta (Python / C++), NvDsFrameMeta (Python / C++) and NvOSD_RectParams (Python / C++) to get the label, position, etc. of each bbox.

In the C++ deepstream-app application, your code needs to be in the analytics_done_buf_prob function. In the C++/Python deepstream-test applications, your code needs to be in the osd_sink_pad_buffer_probe/tiler_src_pad_buffer_probe functions.

Python is slightly slower than C (about 5-10%).
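
For the Python route, the probe body might look roughly like the sketch below, which follows the traversal pattern used in the deepstream_python_apps samples. It assumes the pyds bindings and GStreamer's Python bindings are installed; iter_glist and object_to_dict are hypothetical helper names, and printing each detection is only a placeholder for whatever you want to do with the metadata.

```python
def iter_glist(node, cast):
    """Yield cast(node.data) for each node of a pyds linked list."""
    while node is not None:
        try:
            item = cast(node.data)
        except StopIteration:
            break
        yield item
        try:
            node = node.next
        except StopIteration:
            break


def object_to_dict(frame_meta, obj_meta):
    """Collect the label and bbox of one detected object."""
    rect = obj_meta.rect_params  # NvOSD_RectParams
    return {
        "frame": frame_meta.frame_num,
        "label": obj_meta.obj_label,
        "class_id": obj_meta.class_id,
        "confidence": obj_meta.confidence,
        "left": rect.left,
        "top": rect.top,
        "width": rect.width,
        "height": rect.height,
    }


def osd_sink_pad_buffer_probe(pad, info, u_data):
    # Imported here so the helpers above work without DeepStream installed.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst
    import pyds

    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    frames = iter_glist(batch_meta.frame_meta_list, pyds.NvDsFrameMeta.cast)
    for frame_meta in frames:
        objects = iter_glist(frame_meta.obj_meta_list, pyds.NvDsObjectMeta.cast)
        for obj_meta in objects:
            print(object_to_dict(frame_meta, obj_meta))
    return Gst.PadProbeReturn.OK
```

In the sample apps the probe is attached to the sink pad of the on-screen display element with add_probe(Gst.PadProbeType.BUFFER, osd_sink_pad_buffer_probe, 0).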