Deploying YOLOv8 on Jetson Nano (Python)


八级玄仙


Article link: https://blog.csdn.net/qq_32636415/article/details/140977498


Preface

The Jetson Nano environment is shown by the following command:

sudo apt-cache show nvidia-jetpack

I. Running the YOLOv8 .pt model on the Nano

1. Environment setup

conda create -n yolo python=3.8

conda activate yolo

pip install ultralytics onnx lapx numpy==1.23.1 -i https://pypi.tuna.tsinghua.edu.cn/simple

# Install the Jetson GPU build of PyTorch

pip install torch-*.whl torchvision-*.whl

# torch-1.11.0a0+gitbc2c6ed-cp38-cp38-linux_aarch64.whl

# torchvision-0.12.0a0+9b5a3fe-cp38-cp38-linux_aarch64.whl

After installation, check the packages with pip list, then verify that PyTorch can see the GPU:

python -c "import torch;print(torch.cuda.is_available(), torch.__version__)"
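The check above can be extended into a small report. The following is a minimal sketch, not part of the original setup; the helper names `installed_version` and `environment_report` are hypothetical, and the package list simply mirrors the pip command used earlier.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages=("ultralytics", "onnx", "lapx", "numpy", "torch", "torchvision")):
    """Collect the interpreter version, package versions, and CUDA availability."""
    report = {"python": "%d.%d" % sys.version_info[:2]}
    for name in packages:
        report[name] = installed_version(name)
    try:
        import torch  # only importable after the Jetson wheel has been installed
        report["cuda_available"] = torch.cuda.is_available()
    except (ImportError, OSError):
        report["cuda_available"] = None  # torch missing or a shared library failed to load
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

On a correctly configured board, `cuda_available` would be expected to be True and numpy should report the pinned 1.23.1.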

2. Inference test

Run the following in a terminal; yolov8n.pt and bus.jpg must be in the current directory

yolo task=detect mode=predict model=yolov8n.pt source=bus.jpg show=True

If you get the error: OSError: libomp.so.5: cannot open shared object file: No such file or directory
run sudo apt-get install libomp5 to fix it
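To confirm whether the library can be found before or after installing the package, a quick probe with the standard ctypes module may help. This is only a sketch; the helper name `can_load` is hypothetical.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        return False


print("libomp.so.5 loadable:", can_load("libomp.so.5"))
```

If this prints False after installing libomp5, the library is probably in a directory the loader does not search.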

Result

3. Performance test

Memory/GPU usage

yolov8n.pt 1.71G

yolov8s.pt 1.77G

Detection speed

yolov8n.pt FPS: 5.35

yolov8s.pt FPS: <3

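The results for the m, l, and x models are as follows

Running the .pt model directly through YOLOv8 uses a lot of GPU memory, and detection is slow!

Source: https://i7y.org/en/yolov8-on-jetson-nano/

Test code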

import time
from ultralytics import YOLO
import cv2


def detect_objects(model_path, image_path, iterations=100, report_interval=20):
    # Load the model
    model = YOLO(model_path)

    # Load the image
    img = cv2.imread(image_path)

    # Initialize variables
    total_time = 0.0
    start_time = time.time()

    for i in range(iterations):
        # Perform the object detection
        results = model.predict(source=img, conf=0.5)  # conf is the confidence threshold

        # Measure the time taken for prediction
        end_time = time.time()
        elapsed_time = end_time - start_time
        start_time = end_time

        # Print the single iteration time
        # print(f"Iteration {i + 1}: Detection took {elapsed_time:.4f} seconds")

        total_time += elapsed_time

        # Print the results every 20 iterations
        if (i + 1) % report_interval == 0:
            avg_time = total_time / report_interval
            fps = 1 / avg_time
            print(f"Iteration {i + 1}: Average Time: {avg_time:.4f} seconds, FPS: {fps:.2f}")
            total_time = 0.0  # Reset total time for next interval

    # Final print after all iterations
    print("Finished running all iterations.")


# Define the paths to the model and the image
model_path = "yolov8s.pt"
image_path = "bus.jpg"

# Call the detection function
detect_objects(model_path, image_path, iterations=100, report_interval=20)
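The first reporting interval of the script above also includes model warm-up, which can pull the first FPS figure down. A possible variant, sketched here under the assumption that warm-up should be excluded, times any callable after a few untimed calls. The `benchmark` helper and the stand-in workload are hypothetical and only make the sketch runnable off the board.

```python
import time


def benchmark(fn, warmup=5, iterations=20):
    """Time fn() after a warm-up phase; return (average seconds, FPS)."""
    for _ in range(warmup):
        fn()  # warm-up calls are excluded from the timing
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    avg = (time.perf_counter() - start) / iterations
    return avg, (1.0 / avg if avg > 0 else float("inf"))


# Stand-in workload so the sketch runs anywhere; on the Nano, replace it with
# lambda: model.predict(source=img, conf=0.5)
avg, fps = benchmark(lambda: sum(i * i for i in range(10000)))
print(f"Average time: {avg:.6f} s, FPS: {fps:.2f}")
```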

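II. TensorRT Python Bindings

YOLOv8 requires Python 3.8 or later, but the TensorRT Python package that ships with the Jetson Nano is bound to Python 3.6. It therefore cannot be used from the YOLOv8 environment to accelerate the model, and a TensorRT build for Python 3.8 must be installed.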
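The mismatch can be observed from inside the conda environment. This is a minimal sketch with a hypothetical helper name, `tensorrt_status`; it only reports whether the current interpreter is new enough and whether a `tensorrt` module can be imported from it.

```python
import importlib
import sys


def tensorrt_status(min_python=(3, 8)):
    """Report whether this interpreter is new enough and whether tensorrt imports."""
    python_ok = sys.version_info[:2] >= min_python
    try:
        trt = importlib.import_module("tensorrt")
        trt_version = getattr(trt, "__version__", "unknown")
    except ImportError:
        trt_version = None  # no binding is available for this interpreter
    return python_ok, trt_version


python_ok, trt_version = tensorrt_status()
print("Python >= 3.8:", python_ok)
print("tensorrt binding:", trt_version or "not importable from this interpreter")
```

Before the wheel is built, the Python 3.8 environment would be expected to report that the binding is not importable.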

References:

Accelerated YOLOv8 deployment with TensorRT on Jetson NX (CSDN blog)

Jetson/L4T/TRT Customized Example - eLinux.org

https://github.com/NVIDIA/TensorRT/tree/release/8.2

Index of /pool/main/p/python3.8


1. Building python3.9

$ sudo apt install zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libreadline-dev libffi-dev libsqlite3-dev libbz2-dev

$ wget https://www.python.org/ftp/python/3.9.1/Python-3.9.1.tar.xz

$ tar xvf Python-3.9.1.tar.xz Python-3.9.1/

$ mkdir build-python-3.9.1

$ cd build-python-3.9.1/

$ ../Python-3.9.1/configure --enable-optimizations

$ make -j $(nproc)

$ sudo -H make altinstall

$ cd ../

2. Build cmake 3.13.5

$ sudo apt-get install -y protobuf-compiler libprotobuf-dev openssl libssl-dev libcurl4-openssl-dev

$ wget https://github.com/Kitware/CMake/releases/download/v3.13.5/cmake-3.13.5.tar.gz

$ tar xvf cmake-3.13.5.tar.gz

$ rm cmake-3.13.5.tar.gz

$ cd cmake-3.13.5/

$ ./bootstrap --system-curl

$ make -j$(nproc)

$ echo 'export PATH='${PWD}'/bin/:$PATH' >> ~/.bashrc

$ source ~/.bashrc

$ cd ../


Installation

Download pybind11

Create a directory for external sources and download pybind11 into it.

export EXT_PATH=~/external

mkdir -p $EXT_PATH && cd $EXT_PATH

git clone https://github.com/pybind/pybind11.git

Download Python headers

Add Main Headers

Get the source code from the official python sources

Download Python 3.8.19

Python Release Python 3.8.19 | Python.org

tar xvf Python-3.8.19.tar.xz Python-3.8.19

Add PyConfig.h

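Get the Python source code from the official site (Python Source Releases | Python.org) and download the release that matches your Python version. Copy the contents of the Include directory in the Python source tree into ~/external/python3.8/include (create the python3.8/include directory yourself), using the Python-3.8.19 tree extracted above:

mkdir -p ~/external/python3.8/include

cp -r Python-3.8.19/Include/* ~/external/python3.8/include/

pyconfig.h is not part of the source tree; it is taken from the libpython3.8-dev package. Place libpython3.8-dev_3.8.2-1ubuntu1_arm64.deb in ~/work/tool/ (the upstream Python 3.9 example uses libpython3.9-dev_3.9.2-1_arm64.deb).

Download locations:

http://ftp.us.debian.org/debian/pool/main/p/python3.9/

Index of /pool/main/p/python3.8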

ar x libpython3.8-dev_3.8.2-1ubuntu1_arm64.deb
tar -xvf data.tar.xz 
cp ./usr/include/aarch64-linux-gnu/python3.8/pyconfig.h ~/external/python3.8/include/
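Before starting the build, it may be worth confirming that both the main headers and pyconfig.h ended up in the include directory. The sketch below uses a hypothetical helper, `missing_headers`, and assumes the ~/external/python3.8/include path used in this guide.

```python
import os


def missing_headers(include_dir, required=("Python.h", "pyconfig.h")):
    """Return the required header files that are absent from include_dir."""
    include_dir = os.path.expanduser(include_dir)
    return [name for name in required
            if not os.path.isfile(os.path.join(include_dir, name))]


missing = missing_headers("~/external/python3.8/include")
print("Missing headers:", missing if missing else "none")
```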

Build Python bindings

TRT_OSSPATH=${PWD}/.. EXT_PATH=${PWD}/../.. TARGET=aarch64 PYTHON_MINOR_VERSION=9 bash build.sh (this is the upstream command; the method below was used instead)

Modify TensorRT/python/build.sh.

Find the following lines in build.sh:

# Original content
PYTHON_MAJOR_VERSION=${PYTHON_MAJOR_VERSION:-3}
PYTHON_MINOR_VERSION=${PYTHON_MINOR_VERSION:-8}
TARGET=${TARGET_ARCHITECTURE:-x86_64}
CUDA_ROOT=${CUDA_ROOT:-/usr/local/cuda}
ROOT_PATH=${TRT_OSSPATH:-/workspace/TensorRT}
EXT_PATH=${EXT_PATH:-/tmp/external}
WHEEL_OUTPUT_DIR=${ROOT_PATH}/python/build

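Change the default value of TARGET to aarch64.
Change ROOT_PATH to the absolute path of your TensorRT checkout.
Change EXT_PATH to the absolute path of the external directory you created.

# After modification: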
PYTHON_MAJOR_VERSION=${PYTHON_MAJOR_VERSION:-3}
PYTHON_MINOR_VERSION=${PYTHON_MINOR_VERSION:-8}
TARGET=${TARGET_ARCHITECTURE:-aarch64}
CUDA_ROOT=${CUDA_ROOT:-/usr/local/cuda}
ROOT_PATH=${TRT_OSSPATH:-/home/xxx/TensorRT}
EXT_PATH=${EXT_PATH:-/home/xxx/external}
WHEEL_OUTPUT_DIR=${ROOT_PATH}/python/build

Finally, run build.sh. Before running it, make sure setuptools is up to date.

pip install -U pip setuptools
bash ./build.sh
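Because each setting in the script is written as `${NAME:-default}`, the same values could presumably be supplied through environment variables instead of editing the file. Note that, in the lines quoted above, TARGET is read from TARGET_ARCHITECTURE rather than from TARGET. The following sketch only builds such an environment; the helper `build_env` is hypothetical and the /home/xxx paths are the placeholders used in this guide.

```python
import os


def build_env(trt_root, ext_path, python_minor=8, base=None):
    """Return an environment that overrides the defaults read by build.sh."""
    env = dict(os.environ if base is None else base)
    env.update({
        "TRT_OSSPATH": trt_root,            # read into ROOT_PATH
        "EXT_PATH": ext_path,               # pybind11 and Python headers
        "TARGET_ARCHITECTURE": "aarch64",   # read into TARGET
        "PYTHON_MAJOR_VERSION": "3",
        "PYTHON_MINOR_VERSION": str(python_minor),
    })
    return env


env = build_env("/home/xxx/TensorRT", "/home/xxx/external")
# On the board, the build could then be started from TensorRT/python with:
#   subprocess.run(["bash", "./build.sh"], env=env, check=True)
print({k: env[k] for k in ("TRT_OSSPATH", "EXT_PATH", "TARGET_ARCHITECTURE")})
```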

Install the python wheel

pip install build/dist/tensorrt-8.2.3.0-cp38-none-linux_aarch64.whl
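The cp38 tag in the wheel name is what ties the package to Python 3.8. As a small illustration (the helper names are hypothetical), the tag can be read from the file name and compared with the running interpreter:

```python
import sys


def wheel_python_tag(filename):
    """Extract the Python tag (for example 'cp38') from a wheel file name."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    # Wheel names end with: ...-{python tag}-{abi tag}-{platform tag}
    return stem.split("-")[-3]


def matches_interpreter(filename, version_info=sys.version_info):
    """Check whether a CPython wheel tag matches the given interpreter version."""
    return wheel_python_tag(filename) == "cp%d%d" % (version_info[0], version_info[1])


wheel = "tensorrt-8.2.3.0-cp38-none-linux_aarch64.whl"
print(wheel_python_tag(wheel), matches_interpreter(wheel))
```

After installation, python -c "import tensorrt; print(tensorrt.__version__)" inside the yolo environment is a simple way to confirm the binding loads.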

#-----------------------------------------------

$ git clone -b release/8.2 https://github.com/NVIDIA/TensorRT.git
$ cd TensorRT
$ git submodule update --init --recursive
$
$ cmake .. -DGPU_ARCHS="53" -DTRT_LIB_DIR=/usr/lib/aarch64-linux-gnu/ -DCMAKE_C_COMPILER=/usr/bin/gcc
$ make -j$(nproc)

Build TensorRT to generate trtexec

cd ~/external/TensorRT/build

cmake ..

Use: cmake -DCMAKE_CUDA_ARCHITECTURES=53 ..

(yolo8) xxx@miivii-tegra:~/external/TensorRT/build$ cmake -DCMAKE_CUDA_ARCHITECTURES=53 ..
Building for TensorRT version: 8.2.3, library version: 8
-- Targeting TRT Platform: aarch64
-- CUDA version set to 10.2.89
-- cuDNN version set to 8.2
-- Protobuf version set to 3.0.0
-- Setting up another Protobuf build for cross compilation targeting aarch64-Linux
-- Using libprotobuf /home/home58/suo58/external/TensorRT/build/third_party.protobuf_aarch64/lib/libprotobuf.a
-- ========================= Importing and creating target nvinfer ==========================
-- Looking for library nvinfer
-- Library that was found /usr/lib/aarch64-linux-gnu/libnvinfer.so
-- ==========================================================================================
-- ========================= Importing and creating target nvuffparser ==========================
-- Looking for library nvparsers
-- Library that was found /usr/lib/aarch64-linux-gnu/libnvparsers.so
-- ==========================================================================================
-- GPU_ARCHS is not defined. Generating CUDA code for default SMs: 53;60;61;70;75;72
-- Protobuf proto/trtcaffe.proto -> proto/trtcaffe.pb.cc proto/trtcaffe.pb.h
-- /home/home58/suo58/external/TensorRT/build/parsers/caffe
Generated: /home/xxx/external/TensorRT/build/parsers/onnx/third_party/onnx/onnx/onnx_onnx2trt_onnx-ml.proto
Generated: /home/home58/suo58/external/TensorRT/build/parsers/onnx/third_party/onnx/onnx/onnx-operators_onnx2trt_onnx-ml.proto
Generated: /home/home58/suo58/external/TensorRT/build/parsers/onnx/third_party/onnx/onnx/onnx-data_onnx2trt_onnx.proto
--
-- ******** Summary ********
-- CMake version : 3.20.4
-- CMake command : /home/xxx/miniforge3/envs/yolo8/lib/python3.8/site-packages/cmake/data/bin/cmake
-- System : Linux
-- C++ compiler : /usr/bin/g++
-- C++ compiler version : 7.5.0
-- CXX flags : -Wno-deprecated-declarations -DBUILD_SYSTEM=cmake_oss -Wall -Wno-deprecated-declarations -Wno-unused-function -Wnon-virtual-dtor
-- Build type : Release
-- Compile definitions : _PROTOBUF_INSTALL_DIR=/home/xxx/external/TensorRT/build/third_party.protobuf;SOURCE_LENGTH=37;ONNX_NAMESPACE=onnx2trt_onnx
-- CMAKE_PREFIX_PATH :
-- CMAKE_INSTALL_PREFIX : /home/xxx/external/TensorRT/build/..
-- CMAKE_MODULE_PATH :
--
-- ONNX version : 1.8.0
-- ONNX NAMESPACE : onnx2trt_onnx
-- ONNX_BUILD_TESTS : OFF
-- ONNX_BUILD_BENCHMARKS : OFF
-- ONNX_USE_LITE_PROTO : OFF
-- ONNXIFI_DUMMY_BACKEND : OFF
-- ONNXIFI_ENABLE_EXT : OFF
--
-- Protobuf compiler :
-- Protobuf includes :
-- Protobuf libraries :
-- BUILD_ONNX_PYTHON : OFF
-- Found CUDA headers at /usr/local/cuda-10.2/include
-- Found TensorRT headers at /home/xxx/external/TensorRT/include
-- Find TensorRT libs at /usr/lib/aarch64-linux-gnu/libnvinfer.so;/home/xxx/external/TensorRT/lib/libnvinfer_plugin.so
ONNX_INCLUDE_DIR
-- Adding new sample: sample_algorithm_selector
-- - Parsers Used: caffe
-- - InferPlugin Used: OFF
-- - Licensing: samples
ONNX_INCLUDE_DIR
-- Adding new sample: sample_char_rnn
-- - Parsers Used: uff;caffe;onnx
-- - InferPlugin Used: OFF
-- - Licensing: samples
ONNX_INCLUDE_DIR
-- Adding new sample: sample_dynamic_reshape
-- - Parsers Used: onnx
-- - InferPlugin Used: OFF
-- - Licensing: samples
ONNX_INCLUDE_DIR
-- Adding new sample: sample_fasterRCNN
-- - Parsers Used: caffe
-- - InferPlugin Used: ON
-- - Licensing: samples
ONNX_INCLUDE_DIR
-- Adding new sample: sample_googlenet
-- - Parsers Used: caffe
-- - InferPlugin Used: OFF
-- - Licensing: samples
ONNX_INCLUDE_DIR
-- Adding new sample: sample_int8
-- - Parsers Used: caffe
-- - InferPlugin Used: ON
-- - Licensing: samples
ONNX_INCLUDE_DIR
-- Adding new sample: sample_int8_api
-- - Parsers Used: onnx
-- - InferPlugin Used: OFF
-- - Licensing: samples
ONNX_INCLUDE_DIR
-- Adding new sample: sample_mnist
-- - Parsers Used: caffe
-- - InferPlugin Used: OFF
-- - Licensing: samples
ONNX_INCLUDE_DIR
-- Adding new sample: sample_mnist_api
-- - Parsers Used: caffe
-- - InferPlugin Used: OFF
-- - Licensing: samples
ONNX_INCLUDE_DIR
-- Adding new sample: sample_nmt
-- - Parsers Used: none
-- - InferPlugin Used: OFF
-- - Licensing: samples
ONNX_INCLUDE_DIR
-- Adding new sample: sample_onnx_mnist
-- - Parsers Used: onnx
-- - InferPlugin Used: OFF
-- - Licensing: samples
ONNX_INCLUDE_DIR
-- Adding new sample: sample_io_formats
-- - Parsers Used: caffe
-- - InferPlugin Used: OFF
-- - Licensing: samples
ONNX_INCLUDE_DIR
-- Adding new sample: sample_ssd
-- - Parsers Used: caffe
-- - InferPlugin Used: ON
-- - Licensing: samples
ONNX_INCLUDE_DIR
-- Adding new sample: sample_uff_fasterRCNN
-- - Parsers Used: uff
-- - InferPlugin Used: ON
-- - Licensing: samples
ONNX_INCLUDE_DIR
-- Adding new sample: sample_uff_maskRCNN
-- - Parsers Used: uff
-- - InferPlugin Used: ON
-- - Licensing: samples
ONNX_INCLUDE_DIR
-- Adding new sample: sample_uff_mnist
-- - Parsers Used: uff
-- - InferPlugin Used: OFF
-- - Licensing: samples
ONNX_INCLUDE_DIR
-- Adding new sample: sample_uff_plugin_v2_ext
-- - Parsers Used: uff
-- - InferPlugin Used: OFF
-- - Licensing: samples
ONNX_INCLUDE_DIR
-- Adding new sample: sample_uff_ssd
-- - Parsers Used: uff
-- - InferPlugin Used: ON
-- - Licensing: samples
ONNX_INCLUDE_DIR
-- Adding new sample: sample_onnx_mnist_coord_conv_ac
-- - Parsers Used: onnx
-- - InferPlugin Used: ON
-- - Licensing: samples
ONNX_INCLUDE_DIR
-- Adding new sample: trtexec
-- - Parsers Used: caffe;uff;onnx
-- - InferPlugin Used: OFF
-- - Licensing: samples
-- Configuring done
-- Generating done
-- Build files have been written to: /home/xxx/external/TensorRT/build

make -j4

make install
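The value 53 passed to CMAKE_CUDA_ARCHITECTURES (and to GPU_ARCHS) above corresponds to the Jetson Nano GPU's compute capability 5.3. As a sketch, the value could also be derived from PyTorch on the board; the helper names are hypothetical, and the detection returns None wherever no CUDA device is visible.

```python
def cuda_arch_flag(major, minor):
    """Turn a compute capability such as (5, 3) into the CMake value '53'."""
    return f"{major}{minor}"


def detect_cuda_arch():
    """Query the GPU through PyTorch if possible; return None otherwise."""
    try:
        import torch
        if torch.cuda.is_available():
            return cuda_arch_flag(*torch.cuda.get_device_capability(0))
    except Exception:
        pass  # torch missing or CUDA not usable in this environment
    return None


arch = detect_cuda_arch()
print("Detected:", arch, "| Jetson Nano (compute capability 5.3):", cuda_arch_flag(5, 3))
```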

III. YOLOv8 model acceleration

Reference: Deploying YOLOv8 on Jetson Nano (CSDN blog)

https://zhuanlan.zhihu.com/p/665546297

1. Model conversion: using the trtexec tool with the infer framework

# Model conversion tool

git clone https://github.com/shouxieai/infer.git

# YOLOv8 source code

git clone https://github.com/ultralytics/ultralytics.git

(1) Export the .pt model to ONNX

Create exportOnnx.py and place it in the ultralytics directory (on the board)

from ultralytics import YOLO
model = YOLO("../yolov8/yolov8n.pt")

success = model.export(imgsz=640,format="onnx", batch=1)

After running python exportOnnx.py, yolov8n.onnx is generated in the directory that contains yolov8n.pt.

(2) Optimize yolov8n.onnx to produce yolov8n.transd.onnx

Reference: Deploying YOLOv8 on Jetson Nano (CSDN blog)

Go to infer/workspace/ and run python v8trans.py yolov8n.onnx

The v8trans.py code is as follows:

import onnx
import onnx.helper as helper
import sys
import os

def main():

    if len(sys.argv) < 2:
        print("Usage:\n python v8trans.py yolov8n.onnx")
        return 1

    file = sys.argv[1]
    if not os.path.exists(file):
        print(f"Not exist path: {file}")
        return 1

    prefix, suffix = os.path.splitext(file)
    dst = prefix + ".transd" + suffix

    model = onnx.load(file)
    node  = model.graph.node[-1]

    old_output = node.output[0]
    node.output[0] = "pre_transpose"

    for specout in model.graph.output:
        if specout.name == old_output:
            shape0 = specout.type.tensor_type.shape.dim[0]
            shape1 = specout.type.tensor_type.shape.dim[1]
            shape2 = specout.type.tensor_type.shape.dim[2]
            new_out = helper.make_tensor_value_info(
                specout.name,
                specout.type.tensor_type.elem_type,
                [0, 0, 0]
            )
            new_out.type.tensor_type.shape.dim[0].CopyFrom(shape0)
            new_out.type.tensor_type.shape.dim[2].CopyFrom(shape1)
            new_out.type.tensor_type.shape.dim[1].CopyFrom(shape2)
            specout.CopyFrom(new_out)

    model.graph.node.append(
        helper.make_node("Transpose", ["pre_transpose"], [old_output], perm=[0, 2, 1])
    )

    print(f"Model save to {dst}")
    onnx.save(model, dst)
    return 0

if __name__ == "__main__":
    sys.exit(main())
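The script appends a Transpose node with perm=[0, 2, 1], so the last two axes of the output are swapped and each candidate box becomes one row. For a typical 80-class yolov8n export at 640x640 this would mean [1, 84, 8400] becoming [1, 8400, 84]; that shape is an assumption about the model, not something printed by the script. The effect can be illustrated on a toy nested list without any ONNX dependency:

```python
def transpose_021(batch):
    """Swap the last two axes of a nested list shaped [N][C][A] -> [N][A][C]."""
    return [[list(column) for column in zip(*sample)] for sample in batch]


def shape(nested):
    """Return the shape of a rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return dims


# Toy tensor: 1 image, 6 channels (4 box values + 2 class scores), 3 candidate boxes.
raw = [[[c * 10 + a for a in range(3)] for c in range(6)]]
out = transpose_021(raw)
print(shape(raw), "->", shape(out))   # [1, 6, 3] -> [1, 3, 6]
print(out[0][0])                      # all 6 values for candidate box 0
```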

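This generates yolov8n.transd.onnx.

(3) Engine generation

Run trtexec --onnx=yolov8n.transd.onnx --saveEngine=yolov8n.transd.engine

This generates yolov8n.transd.engine.

Direct conversion:

# Convert the .pt model to an ONNX model
yolo export model=yolov8n.pt format=onnx opset=12
# Convert the ONNX model to an engine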
trtexec --onnx=yolov8n.onnx --saveEngine=yolov8n.engine --fp16

(yolo8) xxx@miivii-tegra:~/work/yolov8$ trtexec --onnx=yolov8n.onnx --saveEngine=yolov8n.engine --fp16

&&&& RUNNING TensorRT.trtexec # trtexec --onnx=yolov8n.onnx --saveEngine=yolov8n.engine --fp16
[08/12/2024-09:51:36] [I] === Model Options ===
[08/12/2024-09:51:36] [I] Format: ONNX
[08/12/2024-09:51:36] [I] Model: yolov8n.onnx
[08/12/2024-09:51:36] [I] Output:
[08/12/2024-09:51:36] [I] === Build Options ===
[08/12/2024-09:51:36] [I] Max batch: 1
[08/12/2024-09:51:36] [I] Workspace: 16 MB
[08/12/2024-09:51:36] [I] minTiming: 1
[08/12/2024-09:51:36] [I] avgTiming: 8
[08/12/2024-09:51:36] [I] Precision: FP32+FP16
[08/12/2024-09:51:36] [I] Calibration:
[08/12/2024-09:51:36] [I] Safe mode: Disabled
[08/12/2024-09:51:36] [I] Save engine: yolov8n.engine
[08/12/2024-09:51:36] [I] Load engine:
[08/12/2024-09:51:36] [I] Builder Cache: Enabled
[08/12/2024-09:51:36] [I] NVTX verbosity: 0
[08/12/2024-09:51:36] [I] Inputs format: fp32:CHW
[08/12/2024-09:51:36] [I] Outputs format: fp32:CHW
[08/12/2024-09:51:36] [I] Input build shapes: model
[08/12/2024-09:51:36] [I] Input calibration shapes: model
[08/12/2024-09:51:36] [I] === System Options ===
[08/12/2024-09:51:36] [I] Device: 0
[08/12/2024-09:51:36] [I] DLACore:
[08/12/2024-09:51:36] [I] Plugins:
[08/12/2024-09:51:36] [I] === Inference Options ===
[08/12/2024-09:51:36] [I] Batch: 1
[08/12/2024-09:51:36] [I] Input inference shapes: model
[08/12/2024-09:51:36] [I] Iterations: 10
[08/12/2024-09:51:36] [I] Duration: 3s (+ 200ms warm up)
[08/12/2024-09:51:36] [I] Sleep time: 0ms
[08/12/2024-09:51:36] [I] Streams: 1
[08/12/2024-09:51:36] [I] ExposeDMA: Disabled
[08/12/2024-09:51:36] [I] Spin-wait: Disabled
[08/12/2024-09:51:36] [I] Multithreading: Disabled
[08/12/2024-09:51:36] [I] CUDA Graph: Disabled
[08/12/2024-09:51:36] [I] Skip inference: Disabled
[08/12/2024-09:51:36] [I] Inputs:
[08/12/2024-09:51:36] [I] === Reporting Options ===
[08/12/2024-09:51:36] [I] Verbose: Disabled
[08/12/2024-09:51:36] [I] Averages: 10 inferences
[08/12/2024-09:51:36] [I] Percentile: 99
[08/12/2024-09:51:36] [I] Dump output: Disabled
[08/12/2024-09:51:36] [I] Profile: Disabled
[08/12/2024-09:51:36] [I] Export timing to JSON file:
[08/12/2024-09:51:36] [I] Export output to JSON file:
[08/12/2024-09:51:36] [I] Export profile to JSON file:
[08/12/2024-09:51:36] [I]
----------------------------------------------------------------
Input filename:   yolov8n.onnx
ONNX IR version:  0.0.7
Opset version:    12
Producer name:    pytorch
Producer version: 1.11.0
Domain:
Model version:    0
Doc string:
----------------------------------------------------------------
[08/12/2024-09:51:38] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/12/2024-09:52:52] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[08/12/2024-09:59:08] [I] [TRT] Detected 1 inputs and 3 output network tensors.
[08/12/2024-09:59:08] [I] Starting inference threads
[08/12/2024-09:59:12] [I] Warmup completed 4 queries over 200 ms
[08/12/2024-09:59:12] [I] Timing trace has 60 queries over 3.11545 s
[08/12/2024-09:59:12] [I] Trace averages of 10 runs:
[08/12/2024-09:59:12] [I] Average on 10 runs - GPU latency: 51.1508 ms - Host latency: 51.9235 ms (end to end 51.9339 ms, enqueue 6.89342 ms)
[08/12/2024-09:59:12] [I] Average on 10 runs - GPU latency: 51.1141 ms - Host latency: 51.8855 ms (end to end 51.8961 ms, enqueue 6.94103 ms)
[08/12/2024-09:59:12] [I] Average on 10 runs - GPU latency: 51.1348 ms - Host latency: 51.9039 ms (end to end 51.9146 ms, enqueue 6.94259 ms)
[08/12/2024-09:59:12] [I] Average on 10 runs - GPU latency: 51.1422 ms - Host latency: 51.9132 ms (end to end 51.9238 ms, enqueue 6.89012 ms)
[08/12/2024-09:59:12] [I] Average on 10 runs - GPU latency: 51.1737 ms - Host latency: 51.9433 ms (end to end 51.9536 ms, enqueue 6.95898 ms)
[08/12/2024-09:59:12] [I] Average on 10 runs - GPU latency: 51.14 ms - Host latency: 51.9092 ms (end to end 51.9192 ms, enqueue 6.85737 ms)
[08/12/2024-09:59:12] [I] Host Latency
[08/12/2024-09:59:12] [I] min: 51.7911 ms (end to end 51.802 ms)
[08/12/2024-09:59:12] [I] max: 52.0718 ms (end to end 52.083 ms)
[08/12/2024-09:59:12] [I] mean: 51.9131 ms (end to end 51.9235 ms)
[08/12/2024-09:59:12] [I] median: 51.9051 ms (end to end 51.9152 ms)
[08/12/2024-09:59:12] [I] percentile: 52.0718 ms at 99% (end to end 52.083 ms at 99%)
[08/12/2024-09:59:12] [I] throughput: 19.2589 qps
[08/12/2024-09:59:12] [I] walltime: 3.11545 s
[08/12/2024-09:59:12] [I] Enqueue Time
[08/12/2024-09:59:12] [I] min: 6.57861 ms
[08/12/2024-09:59:12] [I] max: 7.72876 ms
[08/12/2024-09:59:12] [I] median: 6.8739 ms
[08/12/2024-09:59:12] [I] GPU Compute
[08/12/2024-09:59:12] [I] min: 51.0255 ms
[08/12/2024-09:59:12] [I] max: 51.2957 ms
[08/12/2024-09:59:12] [I] mean: 51.1426 ms
[08/12/2024-09:59:12] [I] median: 51.1315 ms
[08/12/2024-09:59:12] [I] percentile: 51.2957 ms at 99%
[08/12/2024-09:59:12] [I] total compute time: 3.06856 s
&&&& PASSED TensorRT.trtexec # trtexec --onnx=yolov8n.onnx --saveEngine=yolov8n.engine --fp16
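The figures of interest in this log are the throughput and the mean GPU compute time. A small parser, sketched below with a hypothetical function name, pulls them out of the text; the sample lines are copied from the log above. The ratio to the earlier 5.35 FPS figure is only indicative, since trtexec times engine execution alone, while the .pt measurement also includes preprocessing and postprocessing.

```python
import re


def parse_trtexec_summary(log_text):
    """Extract throughput (qps) and mean GPU compute latency (ms) from a trtexec log."""
    throughput = re.search(r"throughput:\s*([\d.]+)\s*qps", log_text)
    gpu_mean = re.search(r"GPU Compute.*?mean:\s*([\d.]+)\s*ms", log_text, re.S)
    return (float(throughput.group(1)) if throughput else None,
            float(gpu_mean.group(1)) if gpu_mean else None)


sample = """
[08/12/2024-09:59:12] [I] throughput: 19.2589 qps
[08/12/2024-09:59:12] [I] GPU Compute
[08/12/2024-09:59:12] [I] min: 51.0255 ms
[08/12/2024-09:59:12] [I] max: 51.2957 ms
[08/12/2024-09:59:12] [I] mean: 51.1426 ms
"""
qps, mean_ms = parse_trtexec_summary(sample)
print(f"throughput: {qps} qps, mean GPU compute: {mean_ms} ms")
print(f"ratio to the 5.35 FPS .pt measurement: {qps / 5.35:.1f}x")
```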

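trtexec parameters
trtexec is a command-line tool shipped with the NVIDIA TensorRT SDK that builds, runs, and benchmarks TensorRT engines. Some of its most important parameters are:

--uff: use a UFF model as input, followed by the model file path.
--onnx: use an ONNX model as input, followed by the model file path.
--model: path to the Caffe model (weights) file.
--deploy: path to the Caffe deploy (prototxt) file.
--output: names of the output tensors (required for UFF and Caffe models).
--batch: batch size used for inference; the default is 1.
--device: index of the device used for inference; the default is 0.
--workspace: maximum workspace size in MB; the default in this version is 16 MB (see "Workspace: 16 MB" in the log above).
--fp16: enable FP16 precision, which can improve inference performance and reduce memory usage.
--int8: enable INT8 precision, which can further improve inference performance and reduce memory usage.
--calib: path to the INT8 calibration cache file.
--useDLACore: select the DLA core to run on (the Jetson Nano has no DLA).
--allowGPUFallback: when DLA is used, allow layers that cannot run on the DLA to fall back to the GPU.
--iterations: minimum number of inference iterations to run (10 in the log above).
--avgRuns: number of consecutive iterations over which reported measurements are averaged (10 in the log above).
--verbose: print more detailed output.
--loadEngine: load a serialized TensorRT engine from the given file path.
--saveEngine: save the generated TensorRT engine to the given file path.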
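To keep conversions repeatable, the command line can be assembled in Python. This is a sketch with a hypothetical helper name; the 1024 MB workspace is an illustrative value, prompted by the log message that some tactics lacked workspace memory, and is not a setting from the original post.

```python
import shlex


def trtexec_command(onnx_path, engine_path, fp16=True, workspace_mb=None):
    """Build the argument list for a trtexec ONNX-to-engine conversion."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")
    if workspace_mb is not None:
        cmd.append(f"--workspace={workspace_mb}")  # size in MB
    return cmd


cmd = trtexec_command("yolov8n.onnx", "yolov8n.engine", workspace_mb=1024)
print(shlex.join(cmd))
# On the board this could be executed with subprocess.run(cmd, check=True).
```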

1.2 Model conversion: based on wang-xinyu/tensorrtx

cd tensorrtx/yolov8

mkdir build

cd build

cmake ..

make -j4

cmake .. reports an error; use the following instead:

cmake -DCMAKE_CUDA_ARCHITECTURES=53 ..

make reports an error

Check yolov8/build/CMakeFiles/CMakeError.log, which contains the following:

Performing C SOURCE FILE Test CMAKE_HAVE_LIBC_PTHREAD failed with the following output:
Change Dir: /home/xxx/work/yolov8/tensorrtx/yolov8/build/CMakeFiles/CMakeTmp

Run Build Command(s):/usr/bin/make -f Makefile cmTC_eb756/fast && /usr/bin/make -f CMakeFiles/cmTC_eb756.dir/build.make CMakeFiles/cmTC_eb756.dir/build
make[1]: Entering directory '/home/xxx/work/yolov8/tensorrtx/yolov8/build/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_eb756.dir/src.c.o
/usr/bin/cc -DCMAKE_HAVE_LIBC_PTHREAD -fPIC -o CMakeFiles/cmTC_eb756.dir/src.c.o -c /home/xxx/work/yolov8/tensorrtx/yolov8/build/CMakeFiles/CMakeTmp/src.c
Linking C executable cmTC_eb756
/home/xxx/miniforge3/envs/yolo8/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_link_script CMakeFiles/cmTC_eb756.dir/link.txt --verbose=1
/usr/bin/cc -fPIC CMakeFiles/cmTC_eb756.dir/src.c.o -o cmTC_eb756
CMakeFiles/cmTC_eb756.dir/src.c.o: In function `main':
src.c:(.text+0x48): undefined reference to `pthread_create'
src.c:(.text+0x50): undefined reference to `pthread_detach'
src.c:(.text+0x58): undefined reference to `pthread_cancel'
src.c:(.text+0x64): undefined reference to `pthread_join'
src.c:(.text+0x74): undefined reference to `pthread_atfork'
collect2: error: ld returned 1 exit status
CMakeFiles/cmTC_eb756.dir/build.make:98: recipe for target 'cmTC_eb756' failed
make[1]: *** [cmTC_eb756] Error 1
make[1]: Leaving directory '/home/xxx/work/yolov8/tensorrtx/yolov8/build/CMakeFiles/CMakeTmp'
Makefile:127: recipe for target 'cmTC_eb756/fast' failed
make: *** [cmTC_eb756/fast] Error 2

Source file was:
#include <pthread.h>

static void* test_func(void* data)
{
return data;
}

int main(void)
{
pthread_t thread;
pthread_create(&thread, NULL, test_func, NULL);
pthread_detach(thread);
pthread_cancel(thread);
pthread_join(thread, NULL);
pthread_atfork(NULL, NULL, NULL);
pthread_exit(NULL);

return 0;
}

Determining if the function pthread_create exists in the pthreads failed with the following output:
Change Dir: /home/xxx/work/yolov8/tensorrtx/yolov8/build/CMakeFiles/CMakeTmp

Run Build Command(s):/usr/bin/make -f Makefile cmTC_74e77/fast && /usr/bin/make -f CMakeFiles/cmTC_74e77.dir/build.make CMakeFiles/cmTC_74e77.dir/build
make[1]: Entering directory '/home/xxx/work/yolov8/tensorrtx/yolov8/build/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_74e77.dir/CheckFunctionExists.c.o
/usr/bin/cc -fPIC -DCHECK_FUNCTION_EXISTS=pthread_create -o CMakeFiles/cmTC_74e77.dir/CheckFunctionExists.c.o -c /home/xxx/miniforge3/envs/yolo8/lib/python3.8/site-packages/cmake/data/share/cmake-3.20/Modules/CheckFunctionExists.c
Linking C executable cmTC_74e77
/home/xxx/miniforge3/envs/yolo8/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_link_script CMakeFiles/cmTC_74e77.dir/link.txt --verbose=1
/usr/bin/cc -fPIC -DCHECK_FUNCTION_EXISTS=pthread_create CMakeFiles/cmTC_74e77.dir/CheckFunctionExists.c.o -o cmTC_74e77 -lpthreads
/usr/bin/ld: cannot find -lpthreads
collect2: error: ld returned 1 exit status
CMakeFiles/cmTC_74e77.dir/build.make:98: recipe for target 'cmTC_74e77' failed
make[1]: *** [cmTC_74e77] Error 1
make[1]: Leaving directory '/home/xxx/work/yolov8/tensorrtx/yolov8/build/CMakeFiles/CMakeTmp'
Makefile:127: recipe for target 'cmTC_74e77/fast' failed
make: *** [cmTC_74e77/fast] Error 2
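Both entries above are configure-time probes that CMake runs while looking for a threads library (first without any extra library, then with -lpthreads). Such failed probes are commonly recorded in CMakeError.log even when configuration succeeds, so they may not be the actual cause of the make failure; the make output itself is likely the better place to look. The sketch below, with a hypothetical function name, lists which probes a log records as failed, using the two header lines from the log above as sample input.

```python
import re


def failed_checks(log_text):
    """List the configure-time checks recorded as failed in a CMakeError.log."""
    patterns = (
        r"Performing C(?:XX)? SOURCE FILE Test (\S+) failed",
        r"Determining if the function (\S+) exists in the (\S+) failed",
    )
    found = []
    for line in log_text.splitlines():
        for pattern in patterns:
            match = re.search(pattern, line)
            if match:
                found.append(" in ".join(match.groups()))
    return found


sample = """Performing C SOURCE FILE Test CMAKE_HAVE_LIBC_PTHREAD failed with the following output:
Determining if the function pthread_create exists in the pthreads failed with the following output:
"""
print(failed_checks(sample))
```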