零基础配置jetson nano深度学习yolov5环境（三、配置YOLOV5检测导出环境以及使用tensorrt加速推理，加速有python和c++两个版本）

LaternZ

已于 2023-08-17 10:41:31 修改

阅读量376

点赞数

分类专栏：零基础配置jetson nano深度学习yolov5 文章标签：深度学习 YOLO python

于 2023-08-14 18:12:57 首次发布

本文链接：https://blog.csdn.net/qq_45874142/article/details/132271997

版权

零基础配置jetson nano深度学习yolov5 专栏收录该内容

4 篇文章 6 订阅

订阅专栏

零基础配置jetson nano深度学习yolov5环境（三、配置YOLOV5检测导出环境以及使用tensorrt加速推理，加速有python和c++两个版本）

前言

前面已经配置成功了torch以及cuda环境，接下来就配置yolo检测的环境，并用官方的detect检测pt模型，以及使用export导出tensorrt加速的engine模型，并进行检测。除了使用官方的python调用tensorrt检测以外，也用c++调用tensorrt加速。
先说一下结果：具体速度看下图。python调用启动速度很慢，容易崩，检测速度跟c++差不多，但是c++启动很快，占用内存也少。从我的感受来看，半精度下速度快，但c++貌似精度丢失更多。
因此为了加速，也为了精度高，选择c++版本。
建议通篇阅读再操作，这样才明白为什么要这么操作。
在这里插入图片描述

一、配置YOLOv5检测环境并进行检测

1、下载YOLOv5

建议下载6.0版本以上的，因为我用的v6.0不错。上github，搜索yolov5就出来了，选择分支为6.0的。后面下载tensorrt也要6.0版本以上。

2、配置环境

这个比较简单，直接运行detect.py。缺什么就补什么吧，注意提前在requirements.txt文件中注释掉不用的，你可以在电脑上打开查看detect.py文件都需要什么。请参考我的。主要是一个qt。
在这里插入图片描述
运行起来后，报错缺的是cv2和torch。于是安装cv2，百度一大堆，可以自己找。

pip install opencv-python
#可能需要安装依赖，缺什么就装什么，直接pip  install。一般可能不止缺一个，安装这个又会缺另一个，所以套娃式安装就行。

再就是提示没有torch模块，我在这里卡了很久。可以看这个专栏的上一节的最后的遇到的问题。大多数人应该不会遇到这个问题。

2、运行检测

官方的detect.py文件里面说的很清楚。
在这里插入图片描述
具体的参数见后面的参数配置，无非就用前三个。

二、配置YOLOV5导出engine模型

1.直接运行export.py，缺什么补什么

在这里插入图片描述
具体参数看后面，其实就使用两个参数：–weights，这个是原pt文件，再就是–include，这个是导出模型类型。

2.缺onnx

自己百度或者直接pip试试。

pip install onnx
#可能安装不上，需要指定版本安装。这是我的版本pip install onnx==1.9

3.再次导出engine模型

这个配置好基本没啥大问题，一般都可以导出来。include后面的只需要写一个engine就行，默认是FP32全精度，你也可以最后加上–half 导出FP16半精度模型。
这是我的一个流程。我没多少内存了，所以就主动断了。而且都非常慢，大概20分钟左右，安静等待。
在这里插入图片描述

4.检测engine模型

调用detect.py可以检测，就是启动速度有点慢，占用内存多。检测要退出，点击在视频窗口，再按下q就行。这样结果会保存到runs里面。

三、C++调用tensorrt加速

这个的好处是启动速度快，检测帧率或许也有提升，但不会降低。

1.了解目前主要的加速方法

目前加速有很多的方法，但都是先转onnx再说，然后再转engine。很多介绍也没有说的很明白的。大家这个可以看看手写AI的b站介绍。我觉得介绍的不错。
目前主要用的最多的是wangxinyu的转wts再转engine。
也有deepstream方式的，这个自己百度吧，时间长我忘记了。这个检测速度跟c++差不多的。但技多不压身，也可以搞一下。
也有官方的转onnx，再转engine或者trt模型的。yolo官方可直接导出，用tensorrt官方的trtexc也可以导出。
也有手写AI的调整onnx结构然后编译输出engine的，我看了一点，调整onnx不难，就是调整后不会了，也没有一个指导说明文件，总体操作也不简单。但听大佬说维护起来比较方便。

2.用wts转engine方法

我们这里用主流的wts转engine方法，大佬主页www.github.con/wang-xinyu
我们进去找tensorrtx项目下的yoloV5中的6.0分支，这个必须选。他的readme有详细介绍。

1. 下载到板子上

下载的保存位置随便，我建议下载到yolov5里面，其中yolov5trt6就是我下载好的。
在这里插入图片描述
下载的文件内容如下，我这里已经自行创建好build了

2. 在我们的yolo路径下，利用他的gen-wts文件导出wts

首先你要拷贝gen-wts.py返回到yolo的路径下，也就是有utils的文件夹路径下，因为会用到utils里面的库函数。然后就执行下面语句。这个导出过程需要等一会儿。

python gen_wts.py -w yolov5s.pt -o yolov5s.wts

3.在build里面编译运行

进入他的文件夹里面，也就是我们下载好的yolov5trt6，新创建一个build文件夹，文件名其实很随便。
在创建好后进入这个build文件夹里面，把刚刚生成的wts文件拷贝过来一份。
然后需要编译，编译之前需要确定一些参数，每个版本的readme都有介绍，这个版本说是在 yololayer.h中，所以在这个文件里面更改。一般更改一个FP16或者32，再就是置信度阈值，以及bs。
更改之后就保存，之后就在build路径下cmake …, 再make -j2，多线程卡的话，用make就行。
ps：没有cmake就下载一个cmake。这个自行百度一下，安装好后直接-V看看版本，如果没有那就需要配置环境变量。

4.编译成功后，利用wts转engine

就不要手动改里面任何文件的名字了。按照你生成的wts文件可以生成你的engine文件，这个过程有点费时间，静静等待。不会超过半个小时。

sudo ./yolov5 -s yolov5s.wts yolov5s.engine s

5.检测测试

检测后会保存到output文件夹下。但是仅仅能检测图片，而且检测标签没有置信度。

sudo ./yolov5 -d yolov5s.engine  图片路径

3.更改摄像头、RTSP视频检测和增加置信度

注意的是，更改视频检测更换的是yolov5.cpp文件。如下图。
在这里插入图片描述

1.更换思路

在更换之后需要重新创建一个新的build，以免破坏之前的build，需要用之前的build的话，只需要切换回来原来的yolov5.cpp就行了，当然你也可以直接替换，无非就是每次都要删掉build文件，重新cmake。创建新的build再进行cmake… 和make，如果内存大，可以多线程编译 make -j8。注意：大改动就需要删掉build里面的，小改动就不需要删了，直接最后make -j2就行。

2.增加置信度标签显示，增加视频检测（本地摄像头和远程RTSP流），增加双目摄像头分割显示单路，

（1）增加置信度

下面是我改的视频检测yolov5.cpp，在别人基础上加了置信度显示。置信度就是res[j].conf。我现在显示置信度小数有好几位，你也可以再修改显示前两位小数，我用的是FP32，置信度阈值是0.25，内存小的话，bs别设置太大，先试试1，查看一下内存利用率咋样，很高的话就不建议改了。
在这里插入图片描述

（2）视频检测

首先是摄像头和视频流检测，只改这两行就行了，别的不用管。
在这里插入图片描述
然后是双目摄像投的单目处理。
因为我的摄像头是双目，打开摄像头是两路一个窗口显示。所以我强行切分一半，选取了左侧的视频进行预测处理。你也可以手动或者用gpt改改。切分就是这个地方。

这里改完后，记得显示窗口也要改，ctrl+f查找frame，别漏掉。
在这里插入图片描述

当然如果是单目，上面的frame就改回去原来的，然后分割成左右的frame也要注释掉。

3.检测效果展：

在这里插入图片描述

//yolov5.cpp
#include <iostream>
#include <chrono>
#include <cmath>
#include "cuda_utils.h"
#include "logging.h"
#include "common.hpp"
#include "utils.h"
#include "calibrator.h"

#define USE_FP32  // set USE_INT8 or USE_FP16 or USE_FP32
#define DEVICE 0  // GPU id
#define NMS_THRESH 0.4
#define CONF_THRESH 0.25
#define BATCH_SIZE 1

// stuff we know about the network and the input/output blobs
static const int INPUT_H = Yolo::INPUT_H;
static const int INPUT_W = Yolo::INPUT_W;
static const int CLASS_NUM = Yolo::CLASS_NUM;
static const int OUTPUT_SIZE = Yolo::MAX_OUTPUT_BBOX_COUNT * sizeof(Yolo::Detection) / sizeof(float) + 1;  // we assume the yololayer outputs no more than MAX_OUTPUT_BBOX_COUNT boxes that conf >= 0.1
const char* INPUT_BLOB_NAME = "data";
const char* OUTPUT_BLOB_NAME = "prob";
static Logger gLogger;

char* my_classes[] = { "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
         "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
         "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
         "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard","surfboard",
         "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
         "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
         "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
         "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
         "hair drier", "toothbrush" };

static int get_width(int x, float gw, int divisor = 8) {
    //return math.ceil(x / divisor) * divisor
    if (int(x * gw) % divisor == 0) {
        return int(x * gw);
    }
    return (int(x * gw / divisor) + 1) * divisor;
}

static int get_depth(int x, float gd) {
    if (x == 1) {
        return 1;
    }
    else {
        return round(x * gd) > 1 ? round(x * gd) : 1;
    }
}

ICudaEngine* build_engine(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt, float& gd, float& gw, std::string& wts_name) {
    INetworkDefinition* network = builder->createNetworkV2(0U);

    // Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
    ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{ 3, INPUT_H, INPUT_W });
    assert(data);

    std::map<std::string, Weights> weightMap = loadWeights(wts_name);

    /* ------ yolov5 backbone------ */
    auto focus0 = focus(network, weightMap, *data, 3, get_width(64, gw), 3, "model.0");
    auto conv1 = convBlock(network, weightMap, *focus0->getOutput(0), get_width(128, gw), 3, 2, 1, "model.1");
    auto bottleneck_CSP2 = C3(network, weightMap, *conv1->getOutput(0), get_width(128, gw), get_width(128, gw), get_depth(3, gd), true, 1, 0.5, "model.2");
    auto conv3 = convBlock(network, weightMap, *bottleneck_CSP2->getOutput(0), get_width(256, gw), 3, 2, 1, "model.3");
    auto bottleneck_csp4 = C3(network, weightMap, *conv3->getOutput(0), get_width(256, gw), get_width(256, gw), get_depth(9, gd), true, 1, 0.5, "model.4");
    auto conv5 = convBlock(network, weightMap, *bottleneck_csp4->getOutput(0), get_width(512, gw), 3, 2, 1, "model.5");
    auto bottleneck_csp6 = C3(network, weightMap, *conv5->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(9, gd), true, 1, 0.5, "model.6");
    auto conv7 = convBlock(network, weightMap, *bottleneck_csp6->getOutput(0), get_width(1024, gw), 3, 2, 1, "model.7");
    auto spp8 = SPP(network, weightMap, *conv7->getOutput(0), get_width(1024, gw), get_width(1024, gw), 5, 9, 13, "model.8");

    /* ------ yolov5 head ------ */
    auto bottleneck_csp9 = C3(network, weightMap, *spp8->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.9");
    auto conv10 = convBlock(network, weightMap, *bottleneck_csp9->getOutput(0), get_width(512, gw), 1, 1, 1, "model.10");

    auto upsample11 = network->addResize(*conv10->getOutput(0));
    assert(upsample11);
    upsample11->setResizeMode(ResizeMode::kNEAREST);
    upsample11->setOutputDimensions(bottleneck_csp6->getOutput(0)->getDimensions());

    ITensor* inputTensors12[] = { upsample11->getOutput(0), bottleneck_csp6->getOutput(0) };
    auto cat12 = network->addConcatenation(inputTensors12, 2);
    auto bottleneck_csp13 = C3(network, weightMap, *cat12->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.13");
    auto conv14 = convBlock(network, weightMap, *bottleneck_csp13->getOutput(0), get_width(256, gw), 1, 1, 1, "model.14");

    auto upsample15 = network->addResize(*conv14->getOutput(0));
    assert(upsample15);
    upsample15->setResizeMode(ResizeMode::kNEAREST);
    upsample15->setOutputDimensions(bottleneck_csp4->getOutput(0)->getDimensions());

    ITensor* inputTensors16[] = { upsample15->getOutput(0), bottleneck_csp4->getOutput(0) };
    auto cat16 = network->addConcatenation(inputTensors16, 2);

    auto bottleneck_csp17 = C3(network, weightMap, *cat16->getOutput(0), get_width(512, gw), get_width(256, gw), get_depth(3, gd), false, 1, 0.5, "model.17");

    // yolo layer 0
    IConvolutionLayer* det0 = network->addConvolutionNd(*bottleneck_csp17->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.0.weight"], weightMap["model.24.m.0.bias"]);
    auto conv18 = convBlock(network, weightMap, *bottleneck_csp17->getOutput(0), get_width(256, gw), 3, 2, 1, "model.18");
    ITensor* inputTensors19[] = { conv18->getOutput(0), conv14->getOutput(0) };
    auto cat19 = network->addConcatenation(inputTensors19, 2);
    auto bottleneck_csp20 = C3(network, weightMap, *cat19->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.20");
    //yolo layer 1
    IConvolutionLayer* det1 = network->addConvolutionNd(*bottleneck_csp20->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.1.weight"], weightMap["model.24.m.1.bias"]);
    auto conv21 = convBlock(network, weightMap, *bottleneck_csp20->getOutput(0), get_width(512, gw), 3, 2, 1, "model.21");
    ITensor* inputTensors22[] = { conv21->getOutput(0), conv10->getOutput(0) };
    auto cat22 = network->addConcatenation(inputTensors22, 2);
    auto bottleneck_csp23 = C3(network, weightMap, *cat22->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.23");
    IConvolutionLayer* det2 = network->addConvolutionNd(*bottleneck_csp23->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.2.weight"], weightMap["model.24.m.2.bias"]);

    auto yolo = addYoLoLayer(network, weightMap, "model.24", std::vector<IConvolutionLayer*>{det0, det1, det2});
    yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*yolo->getOutput(0));

    // Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(16 * (1 << 20));  // 16MB
#if defined(USE_FP16)
    config->setFlag(BuilderFlag::kFP16);
#elif defined(USE_INT8)
    std::cout << "Your platform support int8: " << (builder->platformHasFastInt8() ? "true" : "false") << std::endl;
    assert(builder->platformHasFastInt8());
    config->setFlag(BuilderFlag::kINT8);
    Int8EntropyCalibrator2* calibrator = new Int8EntropyCalibrator2(1, INPUT_W, INPUT_H, "./coco_calib/", "int8calib.table", INPUT_BLOB_NAME);
    config->setInt8Calibrator(calibrator);
#endif

    std::cout << "Building engine, please wait for a while..." << std::endl;
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << "Build engine successfully!" << std::endl;

    // Don't need the network any more
    network->destroy();

    // Release host memory
    for (auto& mem : weightMap)
    {
        free((void*)(mem.second.values));
    }

    return engine;
}

ICudaEngine* build_engine_p6(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt, float& gd, float& gw, std::string& wts_name) {
    INetworkDefinition* network = builder->createNetworkV2(0U);

    // Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
    ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{ 3, INPUT_H, INPUT_W });
    assert(data);

    std::map<std::string, Weights> weightMap = loadWeights(wts_name);

    /* ------ yolov5 backbone------ */
    auto focus0 = focus(network, weightMap, *data, 3, get_width(64, gw), 3, "model.0");
    auto conv1 = convBlock(network, weightMap, *focus0->getOutput(0), get_width(128, gw), 3, 2, 1, "model.1");
    auto c3_2 = C3(network, weightMap, *conv1->getOutput(0), get_width(128, gw), get_width(128, gw), get_depth(3, gd), true, 1, 0.5, "model.2");
    auto conv3 = convBlock(network, weightMap, *c3_2->getOutput(0), get_width(256, gw), 3, 2, 1, "model.3");
    auto c3_4 = C3(network, weightMap, *conv3->getOutput(0), get_width(256, gw), get_width(256, gw), get_depth(9, gd), true, 1, 0.5, "model.4");
    auto conv5 = convBlock(network, weightMap, *c3_4->getOutput(0), get_width(512, gw), 3, 2, 1, "model.5");
    auto c3_6 = C3(network, weightMap, *conv5->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(9, gd), true, 1, 0.5, "model.6");
    auto conv7 = convBlock(network, weightMap, *c3_6->getOutput(0), get_width(768, gw), 3, 2, 1, "model.7");
    auto c3_8 = C3(network, weightMap, *conv7->getOutput(0), get_width(768, gw), get_width(768, gw), get_depth(3, gd), true, 1, 0.5, "model.8");
    auto conv9 = convBlock(network, weightMap, *c3_8->getOutput(0), get_width(1024, gw), 3, 2, 1, "model.9");
    auto spp10 = SPP(network, weightMap, *conv9->getOutput(0), get_width(1024, gw), get_width(1024, gw), 3, 5, 7, "model.10");
    auto c3_11 = C3(network, weightMap, *spp10->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.11");

    /* ------ yolov5 head ------ */
    auto conv12 = convBlock(network, weightMap, *c3_11->getOutput(0), get_width(768, gw), 1, 1, 1, "model.12");
    auto upsample13 = network->addResize(*conv12->getOutput(0));
    assert(upsample13);
    upsample13->setResizeMode(ResizeMode::kNEAREST);
    upsample13->setOutputDimensions(c3_8->getOutput(0)->getDimensions());
    ITensor* inputTensors14[] = { upsample13->getOutput(0), c3_8->getOutput(0) };
    auto cat14 = network->addConcatenation(inputTensors14, 2);
    auto c3_15 = C3(network, weightMap, *cat14->getOutput(0), get_width(1536, gw), get_width(768, gw), get_depth(3, gd), false, 1, 0.5, "model.15");

    auto conv16 = convBlock(network, weightMap, *c3_15->getOutput(0), get_width(512, gw), 1, 1, 1, "model.16");
    auto upsample17 = network->addResize(*conv16->getOutput(0));
    assert(upsample17);
    upsample17->setResizeMode(ResizeMode::kNEAREST);
    upsample17->setOutputDimensions(c3_6->getOutput(0)->getDimensions());
    ITensor* inputTensors18[] = { upsample17->getOutput(0), c3_6->getOutput(0) };
    auto cat18 = network->addConcatenation(inputTensors18, 2);
    auto c3_19 = C3(network, weightMap, *cat18->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.19");

    auto conv20 = convBlock(network, weightMap, *c3_19->getOutput(0), get_width(256, gw), 1, 1, 1, "model.20");
    auto upsample21 = network->addResize(*conv20->getOutput(0));
    assert(upsample21);
    upsample21->setResizeMode(ResizeMode::kNEAREST);
    upsample21->setOutputDimensions(c3_4->getOutput(0)->getDimensions());
    ITensor* inputTensors21[] = { upsample21->getOutput(0), c3_4->getOutput(0) };
    auto cat22 = network->addConcatenation(inputTensors21, 2);
    auto c3_23 = C3(network, weightMap, *cat22->getOutput(0), get_width(512, gw), get_width(256, gw), get_depth(3, gd), false, 1, 0.5, "model.23");

    auto conv24 = convBlock(network, weightMap, *c3_23->getOutput(0), get_width(256, gw), 3, 2, 1, "model.24");
    ITensor* inputTensors25[] = { conv24->getOutput(0), conv20->getOutput(0) };
    auto cat25 = network->addConcatenation(inputTensors25, 2);
    auto c3_26 = C3(network, weightMap, *cat25->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.26");

    auto conv27 = convBlock(network, weightMap, *c3_26->getOutput(0), get_width(512, gw), 3, 2, 1, "model.27");
    ITensor* inputTensors28[] = { conv27->getOutput(0), conv16->getOutput(0) };
    auto cat28 = network->addConcatenation(inputTensors28, 2);
    auto c3_29 = C3(network, weightMap, *cat28->getOutput(0), get_width(1536, gw), get_width(768, gw), get_depth(3, gd), false, 1, 0.5, "model.29");

    auto conv30 = convBlock(network, weightMap, *c3_29->getOutput(0), get_width(768, gw), 3, 2, 1, "model.30");
    ITensor* inputTensors31[] = { conv30->getOutput(0), conv12->getOutput(0) };
    auto cat31 = network->addConcatenation(inputTensors31, 2);
    auto c3_32 = C3(network, weightMap, *cat31->getOutput(0), get_width(2048, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.32");

    /* ------ detect ------ */
    IConvolutionLayer* det0 = network->addConvolutionNd(*c3_23->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.0.weight"], weightMap["model.33.m.0.bias"]);
    IConvolutionLayer* det1 = network->addConvolutionNd(*c3_26->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.1.weight"], weightMap["model.33.m.1.bias"]);
    IConvolutionLayer* det2 = network->addConvolutionNd(*c3_29->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.2.weight"], weightMap["model.33.m.2.bias"]);
    IConvolutionLayer* det3 = network->addConvolutionNd(*c3_32->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.3.weight"], weightMap["model.33.m.3.bias"]);

    auto yolo = addYoLoLayer(network, weightMap, "model.33", std::vector<IConvolutionLayer*>{det0, det1, det2, det3});
    yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*yolo->getOutput(0));

    // Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(16 * (1 << 20));  // 16MB
#if defined(USE_FP16)
    config->setFlag(BuilderFlag::kFP16);
#elif defined(USE_INT8)
    std::cout << "Your platform support int8: " << (builder->platformHasFastInt8() ? "true" : "false") << std::endl;
    assert(builder->platformHasFastInt8());
    config->setFlag(BuilderFlag::kINT8);
    Int8EntropyCalibrator2* calibrator = new Int8EntropyCalibrator2(1, INPUT_W, INPUT_H, "./coco_calib/", "int8calib.table", INPUT_BLOB_NAME);
    config->setInt8Calibrator(calibrator);
#endif

    std::cout << "Building engine, please wait for a while..." << std::endl;
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << "Build engine successfully!" << std::endl;

    // Don't need the network any more
    network->destroy();

    // Release host memory
    for (auto& mem : weightMap)
    {
        free((void*)(mem.second.values));
    }

    return engine;
}

void APIToModel(unsigned int maxBatchSize, IHostMemory** modelStream, float& gd, float& gw, std::string& wts_name) {
    // Create builder
    IBuilder* builder = createInferBuilder(gLogger);
    IBuilderConfig* config = builder->createBuilderConfig();

    // Create model to populate the network, then set the outputs and create an engine
    ICudaEngine* engine = build_engine(maxBatchSize, builder, config, DataType::kFLOAT, gd, gw, wts_name);
    assert(engine != nullptr);

    // Serialize the engine
    (*modelStream) = engine->serialize();

    // Close everything down
    engine->destroy();
    builder->destroy();
    config->destroy();
}

void doInference(IExecutionContext& context, cudaStream_t& stream, void** buffers, float* input, float* output, int batchSize) {
    // DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host
    CUDA_CHECK(cudaMemcpyAsync(buffers[0], input, batchSize * 3 * INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));
    context.enqueue(batchSize, buffers, stream, nullptr);
    CUDA_CHECK(cudaMemcpyAsync(output, buffers[1], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost, stream));
    cudaStreamSynchronize(stream);
}

bool parse_args(int argc, char** argv, std::string& engine) {
    if (argc < 3) return false;
    if (std::string(argv[1]) == "-v" && argc == 3) {
        engine = std::string(argv[2]);
    }
    else {
        return false;
    }
    return true;
}

int main(int argc, char** argv) {
    cudaSetDevice(DEVICE);

    //std::string wts_name = "";
    std::string engine_name = "";
    //float gd = 0.0f, gw = 0.0f;
    //std::string img_dir;

    if (!parse_args(argc, argv, engine_name)) {
        std::cerr << "arguments not right!" << std::endl;
        std::cerr << "./yolov5 -v [.engine] // run inference with camera" << std::endl;
        return -1;
    }

    std::ifstream file(engine_name, std::ios::binary);
    if (!file.good()) {
        std::cerr << " read " << engine_name << " error! " << std::endl;
        return -1;
    }
    char* trtModelStream{ nullptr };
    size_t size = 0;
    file.seekg(0, file.end);
    size = file.tellg();
    file.seekg(0, file.beg);
    trtModelStream = new char[size];
    assert(trtModelStream);
    file.read(trtModelStream, size);
    file.close();


    // prepare input data ---------------------------
    static float data[BATCH_SIZE * 3 * INPUT_H * INPUT_W];
    //for (int i = 0; i < 3 * INPUT_H * INPUT_W; i++)
    //    data[i] = 1.0;
    static float prob[BATCH_SIZE * OUTPUT_SIZE];
    IRuntime* runtime = createInferRuntime(gLogger);
    assert(runtime != nullptr);
    ICudaEngine* engine = runtime->deserializeCudaEngine(trtModelStream, size);
    assert(engine != nullptr);
    IExecutionContext* context = engine->createExecutionContext();
    assert(context != nullptr);
    delete[] trtModelStream;
    assert(engine->getNbBindings() == 2);
    void* buffers[2];
    // In order to bind the buffers, we need to know the names of the input and output tensors.
    // Note that indices are guaranteed to be less than IEngine::getNbBindings()
    const int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME);
    const int outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);
    assert(inputIndex == 0);
    assert(outputIndex == 1);
    // Create GPU buffers on device
    CUDA_CHECK(cudaMalloc(&buffers[inputIndex], BATCH_SIZE * 3 * INPUT_H * INPUT_W * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&buffers[outputIndex], BATCH_SIZE * OUTPUT_SIZE * sizeof(float)));
    // Create stream
    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));

    cv::VideoCapture capture(0);
    //cv::VideoCapture capture("rtsp:");
    //cv::VideoCapture capture("rtsp:");
    //int fourcc = cv::VideoWriter::fourcc('M','J','P','G');
    //capture.set(cv::CAP_PROP_FOURCC, fourcc);
    if (!capture.isOpened()) {
        std::cout << "Error opening video stream or file" << std::endl;
        return -1;
    }

    int key;
    int fcount = 0;
    while (1)
    {
        cv::Mat frame;
        capture >> frame;
        if (frame.empty())
        {
            std::cout << "Fail to read image from camera!" << std::endl;
            break;
        }

        fcount++;
        cv::Mat lframe = frame(cv::Rect(0, 0, frame.cols / 2, frame.rows));
        cv::Mat rframe = frame(cv::Rect(frame.cols / 2, 0, frame.cols / 2, frame.rows));

        //if (fcount < BATCH_SIZE && f + 1 != (int)file_names.size()) continue;
        for (int b = 0; b < fcount; b++) {
            //cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
            cv::Mat img = frame;
            if (img.empty()) continue;
            cv::Mat pr_img = preprocess_img(img, INPUT_W, INPUT_H); // letterbox BGR to RGB
            int i = 0;
            for (int row = 0; row < INPUT_H; ++row) {
                uchar* uc_pixel = pr_img.data + row * pr_img.step;
                for (int col = 0; col < INPUT_W; ++col) {
                    data[b * 3 * INPUT_H * INPUT_W + i] = (float)uc_pixel[2] / 255.0;
                    data[b * 3 * INPUT_H * INPUT_W + i + INPUT_H * INPUT_W] = (float)uc_pixel[1] / 255.0;
                    data[b * 3 * INPUT_H * INPUT_W + i + 2 * INPUT_H * INPUT_W] = (float)uc_pixel[0] / 255.0;
                    uc_pixel += 3;
                    ++i;
                }
            }
        }

        // Run inference
        auto start = std::chrono::system_clock::now();
        doInference(*context, stream, buffers, data, prob, BATCH_SIZE);
        auto end = std::chrono::system_clock::now();
        //std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms" << std::endl;
        int fps = 1000.0 / std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
        std::vector<std::vector<Yolo::Detection>> batch_res(fcount);
        for (int b = 0; b < fcount; b++) {
            auto& res = batch_res[b];
            nms(res, &prob[b * OUTPUT_SIZE], CONF_THRESH, NMS_THRESH);
        }
        for (int b = 0; b < fcount; b++) {
            auto& res = batch_res[b];
            //std::cout << res.size() << std::endl;
            //cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
            for (size_t j = 0; j < res.size(); j++) {
                cv::Rect r = get_rect(frame, res[j].bbox);
                cv::rectangle(frame, r, cv::Scalar(0x27, 0xC1, 0x36), 2);

                std::string label = my_classes[(int)res[j].class_id];

                float confidence = res[j].conf;
                std::string label_with_confidence = label + " " + std::to_string(confidence);

                cv::putText(frame, label_with_confidence, cv::Point(r.x, r.y - 1), cv::FONT_HERSHEY_PLAIN, 1.5, cv::Scalar(0, 255, 0), 2);
                //add FPS in the windows
				std::string jetson_fps = "FPS: " + std::to_string(fps);
				cv::putText(frame, jetson_fps, cv::Point(11,80), cv::FONT_HERSHEY_PLAIN, 3, cv::Scalar(0, 0, 255), 2, cv::LINE_AA);
            }
            //cv::imwrite("_" + file_names[f - fcount + 1 + b], img);
        }
        cv::imshow("video detection", frame);
        key = cv::waitKey(1);
        if (key == 'q') {
            break;
        }
        fcount = 0;
    }

    capture.release();
    // Release stream and buffers
    cudaStreamDestroy(stream);
    CUDA_CHECK(cudaFree(buffers[inputIndex]));
    CUDA_CHECK(cudaFree(buffers[outputIndex]));
    // Destroy the engine
    context->destroy();
    engine->destroy();
    runtime->destroy();

    return 0;
}

有个严重问题

每次开机一直卡在nvidia界面，无法进入系统。也重新拔插并清理了sd卡，仍然有问题。这个板子自己设计了启动系统的按键，可以自动启动，也可以手动按键启动，自动启动一次没成功过。只能手动启动，但是重复半个小时也没有启动，真的是离谱。我觉得应该就是那个sd卡插口有问题，我弹出来重新按回去，并往板子的方向紧按了一下才好了。
这国产替代板真的还有很多路要走啊，各种元器件拼凑堆叠肯定不行的，必须上下一体，全部用自己的底层写才更安全。我看了他的核心板，系统sd卡是插在核心板上的，这跟官方原版不一样啊，难免出问题啊。核心板温度高，程序运行多，确定不会影响SD卡？

更新

这个解决了，是供电问题，直接接电池和电源输入不太行，我加了一个开关，接电以后开关才会给板子供电，这样开机就没问题了。

总结

这部分介绍了yolov5环境的配置，完成pt模型的检测，并完成了engine的导出以及调用检测。同时也用c++调用tensorrt加速推理。

LaternZ

关注

0
点赞
踩
5

收藏

觉得还不错? 一键收藏
0
评论
零基础配置jetson nano深度学习yolov5环境（三、配置YOLOV5检测导出环境以及使用tensorrt加速推理，加速有python和c++两个版本）

前面已经配置成功了torch以及cuda环境，接下来就配置yolo检测的环境，并用官方的detect检测pt模型，以及使用export导出tensorrt加速的engine模型，并进行检测。除了使用官方的python调用tensorrt检测以外，也用c++调用tensorrt加速。python调用启动速度很慢，容易崩，检测速度跟c++差不多，但是c++启动很快，占用内存也少。
复制链接

扫一扫