TensorRT
冬日and暖阳
Github: https://github.com/pengfeidip
Email: pengfeidip@qq.com
python numpy error: AttributeError: module 'numpy' has no attribute 'bool'
A numpy error. (Original, 2023-01-11)
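The fix for the error above is mechanical: `np.bool` was only a deprecated alias for the builtin `bool`, and NumPy 1.24 removed it, so the alias just needs replacing. A minimal sketch:

```python
import numpy as np

# np.bool was a deprecated alias for the builtin bool and was removed
# in NumPy 1.24. Use the builtin bool instead (or np.bool_ if you need
# NumPy's own boolean scalar type).
arr = np.array([0, 1, 2])

mask = arr.astype(bool)      # was: arr.astype(np.bool)
flag = np.bool_(arr[2] > 1)  # NumPy boolean scalar type still exists

print(mask.tolist())  # [False, True, True]
print(bool(flag))     # True
```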
TensorRT multi-GPU parallelism
This round of multi-GPU acceleration with TensorRT left me with some lessons learned. Concepts: device: the GPU; host: the CPU; engine: the model optimized by TensorRT, which holds everything about the model (e.g. weights, input/output names and dimensions); context: created from an engine and used for the actual inference; one engine can own multiple contexts, so the same model weights can serve different tasks; stream: a CUDA stream, used to control asynchronous CUDA operations. Problems encountered: Problem 1: a context on GPU 0 ran inference fine, but one on GPU 1 failed with "cuda er… (Original, 2022-01-14)
Glossary of TensorRT terms
Terminology explained. Batch: a batch is a collection of inputs that can all be processed uniformly. Each instance in the batch has the same shape and flows through the network in exactly the same way. All instances can therefore be computed in parallel. Builder: TensorRT's… (Original, 2021-08-15)
Layers supported by TensorRT's caffe parser
TensorRT version: 7.2. According to the official documentation, the caffe parser can parse the following layers: BatchNormalization, BNLL, Clip, Concatenation, Convolution, Crop, Deconvolution, Dropout, ElementWise, ELU, InnerProduct, Input, LeakyReLU, LRN, Permute, Pooling, Power, Reduction, ReLU/TanH/Sigmoid, Reshape, SoftMax, Scale… (Original, 2021-05-24)
Notes on the TensorRT engine
Quoting the original: The Engine interface allows the application to execute inference. It supports synchronous and asynchronous execution, profiling, and enumeration and querying of the bindings for the engine inputs and outputs. A single engine can have multiple execution… (Original, 2021-05-06)
Profiling the per-layer time of a TensorRT engine
Reposted from: https://blog.csdn.net/hjxu2016/article/details/109258566. The core code is a Profiler struct inheriting from IProfiler: `typedef std::pair<std::string, float> Record; std::vector<Record> mProfile; // stores each layer's run time in the vector`… (Reposted, 2021-04-26)
How to use trtexec
TensorRT Command-Line Wrapper: trtexec. Table of contents: Description; Building trtexec; Using trtexec; Example 1: Simple MNIST model from Caffe; Example 2: Profiling a custom layer; Example 3: Running a network on DLA; Example 4: Running an ONNX model with ful… (Original, 2021-04-25)
TensorRT FP16 Half2Mode
TensorRT can use 16-bit instead of 32-bit arithmetic and tensors, but this alone may not deliver significant performance benefits. Half2Mode is an execution mode where internal tensors interleave 16-bits from adjacent pairs of images, and is the fas… (Original, 2021-02-28)
Serialization Error in nvinfer1::rt::CoreReadArchive::verifyHeader: 0 (Magic tag does not match)
While deserializing a model I hit the following message: Serialization Error in nvinfer1::rt::CoreReadArchive::verifyHeader: 0 (Magic tag does not match). Cause: the model had been serialized with TensorRT 5, but the project had since moved to TensorRT 7, so trying to load the TensorRT 5-serialized model with TensorRT 7 code produces this error. (Original, 2021-02-22)
Serialized TensorRT CUDA engines differ in size between builds
Symptom: serializing the CUDA engine produced by TensorRT yields files of different sizes. I generated the TensorRT engine by parsing a caffemodel and found that the same caffemodel produced engines of varying size, differing by a few KB up to a few hundred KB between runs, while the detection results stayed identical. The puzzle: the optimization process should involve no randomness, so where does the variation in file size come from? Cause: others have hit the same issue and NVIDIA answered it officially (access requires a VPN), see the screenshot. In short, when the CUDA engine is generated, TensorRT takes into account your GPU, OS/kernel, CPU, system load,… (Original, 2021-01-14)
What TensorRT's setMaxWorkspaceSize means
NVIDIA's introductory blog covers this in some detail: TensorRT allows you to increase GPU memory footprint during the engine building phase with the setMaxWorkspaceSize function. Increasing the limit may affect the number of applications that could share the GPU at the same time. Setti… (Original, 2020-12-28)
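As a concrete illustration, setting the workspace limit is a single call during engine building. A minimal C++ configuration fragment, assuming TensorRT 7's IBuilderConfig API (shown for reference, not compiled here):

```cpp
#include "NvInfer.h"

// Sketch: allow the builder up to 1 GiB of scratch GPU memory while it
// searches for kernel tactics. The workspace is only used during engine
// building, but if the limit is too small some layers may fail to find
// any implementation at all.
void configureWorkspace(nvinfer1::IBuilderConfig* config)
{
    config->setMaxWorkspaceSize(1ULL << 30);  // 1 GiB
}
```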
Open questions about the TensorRT API
Based on TensorRT 5 and TensorRT 7. Question 1: what do builder->setHalf2Mode() and builder->setInt8Mode() actually mean? The best answer found so far: "Enable INT8 mode. Setting this flag ensures that the builder auto-tuner will consider INT8 implementations." Question 2: what is the auto-tuner mentioned above? (Original, 2020-12-23)
Notes on INT8 quantization in TensorRT 7.0.0.0
This post mainly translates and summarizes the README.md of the official INT8 quantization sample. 1. Translation. Description: the sampleINT8 sample performs INT8 calibration and inference. Specifically, it demonstrates how to run inference in INT8; INT8 inference is only available on GPUs with compute capability 6.1 or 7.x. After the network has been calibrated, the calibration output is cached to avoid repeating the process. You can then reproduce your own experiments in any deep learning framework to validate your results on ImageNet networks. How does this sample work? The INT8 engine and the FP32 engine… (Original, 2020-12-23)
TensorRT 7.2: overview of the network-building API
Based on the official API docs: the operations that nvinfer1::INetworkDefinition supports when building a network structure (version 7.2.1.6), called on an instantiated network object. Operation | Interface | Deprecation notes: network->addInput; network->addConvolution (superseded by addConvolutionNd and will be removed in TensorRT 9.0); network->a… (Original, 2020-12-19)
Using the RPROI plugin in TensorRT
NvPluginFasterRCNN Plugin. Table of contents: Description; Structure; Parameters; Additional resources; License; Changelog; Known issues. Description: the NvPluginFasterRCNN plugin performs object detection for the Faster R-CNN model. This plugin is included in Tensor… (Original, 2020-12-18)
How TensorRT's FP16/INT8 optimization works
https://blog.csdn.net/meng825/article/details/103968626 (Reposted, 2020-12-04)
Using TensorRT with multiple GPUs
From the developer guide. Q: How do I use TensorRT on multiple GPUs? A: Each ICudaEngine object is bound to a specific GPU when it is instantiated, either by the builder or on deserialization. To select the GPU, use cudaSetDevice() before calling the b… (Original, 2020-12-03)
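The answer above can be sketched in code. A hedged C++ fragment assuming the TensorRT 7 API and a machine with at least two GPUs (for illustration only, not compiled here; `buildOnDevice` is a hypothetical helper name):

```cpp
#include <cuda_runtime.h>
#include "NvInfer.h"

// Sketch: bind an engine to GPU `deviceId`. cudaSetDevice() must run
// before deserializeCudaEngine() (or createInferBuilder()), because the
// ICudaEngine is bound to whichever GPU is current at that moment.
nvinfer1::ICudaEngine* buildOnDevice(int deviceId,
                                     nvinfer1::IRuntime* runtime,
                                     const void* blob, size_t size)
{
    cudaSetDevice(deviceId);  // select the GPU first
    return runtime->deserializeCudaEngine(blob, size, nullptr);
}
```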
Getting each layer's name when TensorRT parses a caffemodel
`IBuilder* builder = createInferBuilder(gLogger); INetworkDefinition* network = builder->createNetwork(); for (int i = 0; i < network->getNbInputs(); ++i) { std::string tName = network->getInput(i)->getName(); std::cout << tNam`… (Original, 2020-12-03)
A TensorRT builder may only be used on one thread
2.4. Thread Safety: The TensorRT builder may only be used by one thread at a time. If you need to run multiple builds simultaneously, you will need to create multiple builders. Reference: https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html (Original, 2020-11-23)
Setting up TensorRT on Windows
Following the official docs: https://docs.nvidia.com/deeplearning/sdk/tensorrt-install-guide/index.html#installing. Download TensorRT and unpack it to D:\Environment: https://developer.nvidia.com/nvidia-tensorrt-5x-download. Add D:\Environment\TensorRT-5.1.2.2\lib to the environment variables. Copy the DLLs under lib to C:\Program Files\NVI… (Reposted, 2020-05-11)