TensorRT loop sample code

Official documentation: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#define-loops

The official documentation is not organized sequentially, its content is somewhat scattered, and it provides no complete runnable sample.

A loop can contain multiple IIteratorLayer, IRecurrenceLayer, and ILoopOutputLayer layers, and at most two ITripLimitLayer layers (at most one of each kind).

Below are two runnable code samples, one for each kind of ITripLimitLayer (TripLimit::kCOUNT and TripLimit::kWHILE). They are simple, runnable snippets to help you get started:

trt.TripLimit.COUNT

loop.add_trip_limit(trip_limit.get_output(0), trt.TripLimit.COUNT)

trt.TripLimit.WHILE

This builds a counter equivalent to for (i = 0; i < 3; i++):

i_init = network.add_constant(shape=(), weights=trt.Weights(np.array([0], dtype=np.dtype("i"))))
i_one = network.add_constant(shape=(), weights=trt.Weights(np.array([1], dtype=np.dtype("i"))))
i_stop = network.add_constant(shape=(), weights=trt.Weights(np.array([num_iterations], dtype=np.dtype("i"))))
iRec = loop.add_recurrence(i_init.get_output(0))
iContinue = network.add_elementwise(iRec.get_output(0), i_stop.get_output(0), op=trt.ElementWiseOperation.LESS)
loop.add_trip_limit(iContinue.get_output(0), trt.TripLimit.WHILE)
iNext = network.add_elementwise(iRec.get_output(0), i_one.get_output(0), op=trt.ElementWiseOperation.SUM)
iRec.set_input(1, iNext.get_output(0))
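
For intuition, the recurrence wired above behaves like the following plain-Python sketch (no TensorRT involved; `run_counter_loop` is just an illustrative name):

```python
def run_counter_loop(num_iterations):
    # Mirrors the TensorRT graph: i_init = 0, trip limit kWHILE on the
    # LESS comparison, and iNext = iRec + 1 fed back via set_input(1, ...).
    i = 0                      # i_init
    while i < num_iterations:  # iContinue = (iRec < i_stop)
        i = i + 1              # iNext = iRec + i_one
    return i
```

On the device, TensorRT evaluates the kWHILE condition before each iteration, so the loop body runs exactly `num_iterations` times here.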

Complete runnable sample

import numpy as np
import tensorrt as trt
from tensorrt import INetworkDefinition
from trt_inference import TRTInference


logger = trt.Logger(trt.Logger.WARNING)
# class MyLogger(trt.ILogger):
#     def __init__(self):
#        trt.ILogger.__init__(self)

#     def log(self, severity, msg):
#         pass # Your custom logging implementation here
# logger = MyLogger()

builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))


num_iterations = 3
trip_limit = network.add_constant(shape=(), weights=trt.Weights(np.array([num_iterations], dtype=np.dtype("i"))))
accumulator_value = network.add_input("input1", dtype=trt.float32, shape=(2, 3))
accumulator_added_value = network.add_input("input2", dtype=trt.float32, shape=(2, 3))
loop = network.add_loop()
# set the ITripLimit layer to stop after `num_iterations` iterations
loop.add_trip_limit(trip_limit.get_output(0), trt.TripLimit.COUNT)
# initialize the IRecurrenceLayer with an initial value
rec = loop.add_recurrence(accumulator_value)
# eltwise inputs are `accumulator_added_value` and the IRecurrenceLayer output
eltwise = network.add_elementwise(accumulator_added_value, rec.get_output(0), op=trt.ElementWiseOperation.SUM)
# wire the IRecurrenceLayer to the output of eltwise.
# The IRecurrenceLayer output is `accumulator_value` on the first iteration, and the eltwise output on every later iteration
rec.set_input(1, eltwise.get_output(0))
# mark the IRecurrenceLayer output as the loop output
loop_out = loop.add_loop_output(rec.get_output(0), trt.LoopOutput.LAST_VALUE)
# mark the loop output as the network output
network.mark_output(loop_out.get_output(0))


inputs = {}
outputs = {}
expected = {}

inputs[accumulator_value.name] = np.array(
    [
        [2.7, -4.9, 23.34],
        [8.9, 10.3, -19.8],
    ])
inputs[accumulator_added_value.name] = np.array(
    [
        [1.1, 2.2, 3.3],
        [-5.7, 1.3, 4.6],
    ])

outputs[loop_out.get_output(0).name] = eltwise.get_input(0).shape
expected[loop_out.get_output(0).name] = inputs[accumulator_value.name] + inputs[accumulator_added_value.name] * num_iterations
print("Expected:", expected)

builder_config = builder.create_builder_config()
builder_config.set_flag(trt.BuilderFlag.VERSION_COMPATIBLE)
builder_config.set_flag(trt.BuilderFlag.EXCLUDE_LEAN_RUNTIME)
plan = builder.build_serialized_network(network, builder_config)

# v10_runtime = trt.Runtime(logger)
# v8_shim_runtime = v10_runtime.load_runtime('/home/mark.yj/TensorRT-8.6.1.6/bin/trtexec')
# engine = v10_runtime.deserialize_cuda_engine(plan)
trtInfer = TRTInference(plan)
r = trtInfer.infer(inputs, outputs)
print("Prediction:", r)
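
The loop semantics can also be sanity-checked without a GPU by unrolling the accumulation in plain NumPy (this mirrors the graph above; it does not call TensorRT):

```python
import numpy as np

def unroll_accumulator_loop(init, added, num_iterations):
    """Plain-NumPy unrolling of the recurrence rec <- rec + added."""
    rec = init
    for _ in range(num_iterations):
        rec = rec + added
    return rec

init = np.array([[2.7, -4.9, 23.34], [8.9, 10.3, -19.8]])
added = np.array([[1.1, 2.2, 3.3], [-5.7, 1.3, 4.6]])
# After 3 iterations the LAST_VALUE output equals init + added * 3.
print(unroll_accumulator_loop(init, added, 3))
```

This is exactly the `expected` value computed above, which is why `trt.LoopOutput.LAST_VALUE` is the right output mode here: we only want the final recurrence value, not the per-iteration history.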

Here is a basic Python example of running inference with an optimized TensorRT engine:

```python
import tensorrt as trt
import pycuda.autoinit
import pycuda.driver as cuda
import numpy as np

# Load the serialized engine from file
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with open("model.engine", "rb") as f:
    engine_data = f.read()

# Deserialize the engine
runtime = trt.Runtime(TRT_LOGGER)
engine = runtime.deserialize_cuda_engine(engine_data)

# Allocate input and output memory buffers
input_shape = (1, 3, 224, 224)
output_shape = (1, 1000)
input_host = cuda.pagelocked_empty(np.prod(input_shape), dtype=np.float32)
output_host = cuda.pagelocked_empty(np.prod(output_shape), dtype=np.float32)
input_device = cuda.mem_alloc(input_host.nbytes)
output_device = cuda.mem_alloc(output_host.nbytes)

# Create a CUDA stream for device memory operations
stream = cuda.Stream()

# Create an execution context from the deserialized engine
context = engine.create_execution_context()

# Copy input data to device memory
cuda.memcpy_htod_async(input_device, input_host, stream)

# Execute the inference engine
context.execute_async_v2(bindings=[int(input_device), int(output_device)], stream_handle=stream.handle)

# Copy output data from device memory to host memory
cuda.memcpy_dtoh_async(output_host, output_device, stream)

# Synchronize the stream to ensure the computation is complete
stream.synchronize()

# Print the output tensor
print(output_host)
```

In this example we load a serialized TensorRT engine from a file and use it to create an execution context. We then use PyCUDA to allocate input and output buffers and a CUDA stream to copy the input data from host to device memory. Next we run inference and copy the output from device memory back to the host, and finally print the output tensor to inspect the result. Note that this is only a basic example; adapt and extend it to your specific requirements.
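
The pagelocked host buffers above are flat 1-D arrays, so input data has to be staged into them and the output reshaped afterwards. That pattern can be sketched without a GPU using ordinary NumPy arrays in place of the pagelocked ones (`input_shape` as assumed above):

```python
import numpy as np

input_shape = (1, 3, 224, 224)
# Stand-in for cuda.pagelocked_empty: a flat float32 array of the same size.
input_host = np.empty(int(np.prod(input_shape)), dtype=np.float32)

# Stage a (hypothetical) preprocessed image into the flat host buffer.
img = np.random.rand(*input_shape).astype(np.float32)
np.copyto(input_host, img.ravel())

# After inference, the flat output is reshaped back to its logical shape.
restored = input_host.reshape(input_shape)
print(np.array_equal(restored, img))
```

Because `ravel` and `reshape` both use C (row-major) order by default, the round trip is lossless; the same holds for the real pagelocked buffers.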
