TensorRT can use 16-bit instead of 32-bit arithmetic and tensors, but this alone may not deliver significant performance benefits. Half2Mode is an execution mode in which internal tensors interleave 16-bit values from adjacent pairs of images; it is the fastest mode of operation for batch sizes greater than one.
To use Half2Mode, two additional steps are required. First, create an input network with 16-bit weights by supplying the DataType::kHALF parameter to the parser. For example:
const IBlobNameToTensor* blobNameToTensor =
    parser->parse(locateFile(deployFile).c_str(),
                  locateFile(modelFile).c_str(),
                  *network,
                  DataType::kHALF);
Second, configure the builder to use Half2Mode:
builder->setHalf2Mode(true);
Reference:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt_210/tensorrt-user-guide/