Running TensorRT Under Multithreading


项哥


Category: python

Article link: https://blog.csdn.net/liufang1991/article/details/118084965


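Background: TensorRT and multithreading

On the main thread, TensorRT is much faster than TensorFlow: a 30-60x speedup.
The pycuda multithreading demo starts one thread per GPU.
Our production environment uses Thrift RPC, where every connection is served by its own threading.Thread, so inference has to run under multithreading.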

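Solution

Guarding inference with a threading.Lock, so that only one thread runs it at a time, gets TensorRT working under multithreading. One inference takes about 4.5 ms on average, which is 2-3x slower than the single-threaded case but still much faster than TensorFlow.

The code is as follows. threading.Semaphore(1) can also be used as the lock.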

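import threading

lock = threading.Lock()

def predict(self):
	# Only one thread at a time may run inference.
	with lock:
		self.cfx.push()  # make the CUDA context current on this thread
		....
		self.cfx.pop()   # release the context before leaving the lock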
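The locking pattern above can be exercised without a GPU. The following is a minimal sketch, not the author's production code: `FakeContext`, `Predictor`, and `run` are hypothetical names introduced for illustration, and `FakeContext` only counts `push`/`pop` calls in place of a real pycuda context. It shows that either a `threading.Lock` or a `threading.Semaphore(1)` keeps at most one thread between `push()` and `pop()`, and it adds a `try`/`finally` so the context is popped even if inference raises.

```python
import threading


class FakeContext:
    """Hypothetical stand-in for a pycuda context; it only records push/pop calls."""

    def __init__(self):
        self.depth = 0      # pushes currently outstanding
        self.max_depth = 0  # highest overlap ever observed

    def push(self):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def pop(self):
        self.depth -= 1


class Predictor:
    def __init__(self, guard):
        self.cfx = FakeContext()
        self.guard = guard  # threading.Lock() or threading.Semaphore(1)

    def predict(self, x):
        with self.guard:
            self.cfx.push()
            try:
                return x * 2  # placeholder for the real inference call
            finally:
                self.cfx.pop()  # always pop, even if inference raises


def run(guard, n_threads=8, calls_per_thread=100):
    """Call predict() from several threads and return the predictor and all results."""
    predictor = Predictor(guard)
    results = []

    def worker(tid):
        for i in range(calls_per_thread):
            results.append(predictor.predict(tid * calls_per_thread + i))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return predictor, results
```

Calling `run(threading.Lock())` or `run(threading.Semaphore(1))` and checking `predictor.cfx.max_depth` confirms that the guarded section was never entered by two threads at once.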

Multithreaded code reference
If multiprocessing is an option, it will perform better.
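One way the multiprocessing suggestion could look is sketched below, under stated assumptions: `load_engine`, `init_worker`, and `predict` are hypothetical names, and `load_engine` returns a trivial function where a real version would create a CUDA context and deserialize the TensorRT engine. Each worker process loads its own engine once through the pool initializer, so no lock is needed inside a process.

```python
import multiprocessing as mp
import os

_engine = None  # per-process stand-in for a deserialized engine and its context


def load_engine():
    """Hypothetical placeholder for the real engine setup (CUDA context,
    engine deserialization, buffer allocation)."""
    return lambda xs: [2.0 * x for x in xs]


def init_worker():
    # Runs once in every worker process, so each process owns its own engine.
    global _engine
    _engine = load_engine()


def predict(xs):
    # A worker process handles one request at a time, so no lock is required.
    return os.getpid(), _engine(xs)


if __name__ == "__main__":
    with mp.Pool(processes=2, initializer=init_worker) as pool:
        for pid, out in pool.map(predict, [[1.0], [2.0], [3.0]]):
            print(pid, out)
```

Whether this beats the lock-based version depends on GPU memory (every process holds its own copy of the engine) and on the cost of passing inputs and outputs between processes, so it is worth measuring on the actual model.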

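Notes

Every other multithreading approach I tried either failed outright or performed very poorly.
With the approach listed in the developer guide, GPU memory was not released, and execution under multithreading was slow.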

with engine.create_execution_context() as context:
	# Transfer input data to the GPU.
	cuda.memcpy_htod_async(d_input, h_input, stream)
	# Run inference.
	context.execute_async_v2(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
	# Transfer predictions back from the GPU.
	cuda.memcpy_dtoh_async(h_output, d_output, stream)
	# Synchronize the stream.
	stream.synchronize()
	# Return the host output.
	return h_output
