TensorRT can use 16-bit instead of 32-bit arithmetic and tensors, but this alone may not deliver significant performance benefits. Half2Mode is an execution mode in which internal tensors interleave 16-bit values from adjacent pairs of images; it is the fastest mode of operation for batch sizes greater than one.
To use Half2Mode, two additional steps are required. First, create an input network with 16-bit weights by supplying the DataType::kHALF parameter to the parser. For example:
const IBlobNameToTensor* blobNameToTensor =
    parser->parse(locateFile(deployFile).c_str(),
                  locateFile(modelFile).c_str(),
                  *network,
                  DataType::kHALF);
Second, configure the builder to use Half2Mode:
builder->setHalf2Mode(true);
Reference:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt_210/tensorrt-user-guide/