[trtexec] GPU latency/Host latency/End-to-End Host Latency

[06/29/2022-09:11:15] [I] Starting inference
[06/29/2022-09:11:19] [I] Warmup completed 1 queries over 200 ms
[06/29/2022-09:11:19] [I] Timing trace has 284 queries over 3.02829 s
[06/29/2022-09:11:19] [I]
[06/29/2022-09:11:19] [I] === Trace details ===
[06/29/2022-09:11:19] [I] Trace averages of 10 runs:
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 11.7504 ms - Host latency: 12.3357 ms (end to end 23.004 ms, enqueue 1.67042 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.6535 ms - Host latency: 11.2494 ms (end to end 21.0531 ms, enqueue 2.76558 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.6565 ms - Host latency: 11.2509 ms (end to end 18.8047 ms, enqueue 3.31723 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.5039 ms - Host latency: 11.103 ms (end to end 20.4064 ms, enqueue 3.31278 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.5855 ms - Host latency: 11.1831 ms (end to end 19.6155 ms, enqueue 3.30339 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.5316 ms - Host latency: 11.1292 ms (end to end 19.9902 ms, enqueue 3.31148 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.4856 ms - Host latency: 11.0847 ms (end to end 20.2623 ms, enqueue 3.30531 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.5329 ms - Host latency: 11.126 ms (end to end 19.0728 ms, enqueue 3.30277 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.4821 ms - Host latency: 11.0815 ms (end to end 20.6302 ms, enqueue 3.3278 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.5074 ms - Host latency: 11.1038 ms (end to end 19.837 ms, enqueue 3.31754 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.6043 ms - Host latency: 11.1984 ms (end to end 19.5715 ms, enqueue 3.30488 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.5132 ms - Host latency: 11.1122 ms (end to end 20.2935 ms, enqueue 3.31482 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.5712 ms - Host latency: 11.1672 ms (end to end 19.7753 ms, enqueue 3.32629 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.5256 ms - Host latency: 11.1198 ms (end to end 19.3723 ms, enqueue 3.28785 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.4888 ms - Host latency: 11.0905 ms (end to end 20.6929 ms, enqueue 3.30397 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.5379 ms - Host latency: 11.1367 ms (end to end 19.6342 ms, enqueue 3.29697 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.5494 ms - Host latency: 11.1469 ms (end to end 20.2805 ms, enqueue 3.32018 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.5082 ms - Host latency: 11.1064 ms (end to end 19.3606 ms, enqueue 3.31279 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.532 ms - Host latency: 11.1318 ms (end to end 19.9801 ms, enqueue 3.30342 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.5677 ms - Host latency: 11.1641 ms (end to end 19.6043 ms, enqueue 3.30513 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.4653 ms - Host latency: 11.0666 ms (end to end 20.6561 ms, enqueue 3.31731 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.5572 ms - Host latency: 11.1566 ms (end to end 20.5119 ms, enqueue 3.31047 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.4537 ms - Host latency: 11.0541 ms (end to end 20.6229 ms, enqueue 3.30085 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.5344 ms - Host latency: 11.1318 ms (end to end 20.1909 ms, enqueue 3.30251 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.5301 ms - Host latency: 11.1286 ms (end to end 20.3403 ms, enqueue 3.31208 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.6022 ms - Host latency: 11.2004 ms (end to end 20.1109 ms, enqueue 3.31121 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.482 ms - Host latency: 11.0801 ms (end to end 20.3935 ms, enqueue 3.30435 ms)
[06/29/2022-09:11:19] [I] Average on 10 runs - GPU latency: 10.5625 ms - Host latency: 11.158 ms (end to end 18.9621 ms, enqueue 3.31709 ms)
[06/29/2022-09:11:19] [I]
[06/29/2022-09:11:19] [I] === Performance summary ===
[06/29/2022-09:11:19] [I] Throughput: 93.7823 qps
[06/29/2022-09:11:19] [I] Latency: min = 10.9785 ms, max = 13.0938 ms, mean = 11.1774 ms, median = 11.088 ms, percentile(99%) = 13.0858 ms
[06/29/2022-09:11:19] [I] End-to-End Host Latency: min = 11.1913 ms, max = 24.8253 ms, mean = 20.1003 ms, median = 20.6526 ms, percentile(99%) = 24.7872 ms
[06/29/2022-09:11:19] [I] Enqueue Time: min = 0.694763 ms, max = 3.48291 ms, mean = 3.23341 ms, median = 3.30945 ms, percentile(99%) = 3.41162 ms
[06/29/2022-09:11:19] [I] H2D Latency: min = 0.569611 ms, max = 0.611679 ms, mean = 0.589204 ms, median = 0.590134 ms, percentile(99%) = 0.59375 ms
[06/29/2022-09:11:19] [I] GPU Compute Time: min = 10.3752 ms, max = 12.5153 ms, mean = 10.5801 ms, median = 10.4904 ms, percentile(99%) = 12.501 ms
[06/29/2022-09:11:19] [I] D2H Latency: min = 0.00195312 ms, max = 0.013916 ms, mean = 0.00804332 ms, median = 0.00947571 ms, percentile(99%) = 0.0137939 ms
[06/29/2022-09:11:19] [I] Total Host Walltime: 3.02829 s
[06/29/2022-09:11:19] [I] Total GPU Compute Time: 3.00475 s
[06/29/2022-09:11:19] [W] * GPU compute time is unstable, with coefficient of variance = 2.95502%.
[06/29/2022-09:11:19] [W]   If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[06/29/2022-09:11:19] [I] Explanations of the performance metrics are printed in the verbose logs.
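As a quick sanity check, the headline numbers in the summary are internally consistent and can be reproduced from one another. A minimal sketch, with the values copied from the log above:

```python
# Values copied from the trtexec performance summary above.
queries = 284          # "Timing trace has 284 queries"
walltime_s = 3.02829   # Total Host Walltime
mean_gpu_ms = 10.5801  # GPU Compute Time, mean

# Throughput = number of queries / Total Host Walltime.
throughput_qps = queries / walltime_s
assert abs(throughput_qps - 93.7823) < 1e-3  # matches "Throughput: 93.7823 qps"

# Total GPU Compute Time ~= mean GPU Compute Time * number of queries.
total_gpu_s = mean_gpu_ms * queries / 1000.0
assert abs(total_gpu_s - 3.00475) < 1e-4  # matches "Total GPU Compute Time: 3.00475 s"
```

Note that Total GPU Compute Time (3.00475 s) is close to Total Host Walltime (3.02829 s), so in this run the GPU is well utilized.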

Explanation of each metric in the output:

[V] === Explanations of the performance metrics ===
[V] Total Host Walltime: the host walltime from when the first query (after warmups) is enqueued to when the last query is completed.
[V] GPU Compute Time: the GPU latency to execute the kernels for a query.
[V] Total GPU Compute Time: the summation of the GPU Compute Time of all the queries. If this is significantly shorter than Total Host Walltime, the GPU may be under-utilized because of host-side overheads or data transfers.
[V] Throughput: the observed throughput computed by dividing the number of queries by the Total Host Walltime. If this is significantly lower than the reciprocal of GPU Compute Time, the GPU may be under-utilized because of host-side overheads or data transfers.
[V] Enqueue Time: the host latency to enqueue a query. If this is longer than GPU Compute Time, the GPU may be under-utilized.
[V] H2D Latency: the latency for host-to-device data transfers for input tensors of a single query.
[V] D2H Latency: the latency for device-to-host data transfers for output tensors of a single query.
[V] Latency: the summation of H2D Latency, GPU Compute Time, and D2H Latency. This is the latency to infer a single query.
[V] End-to-End Host Latency: the duration from when the H2D of a query is called to when the D2H of the same query is completed, which includes the latency to wait for the completion of the previous query. This is the latency of a query if multiple queries are enqueued consecutively.

How to use these metrics:

Latency = H2D Latency + GPU Compute Time + D2H Latency.
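Plugging the mean values from the performance summary above into this identity confirms that it holds (up to rounding in the printed log):

```python
# Mean values from the performance summary above (all in ms).
h2d_ms = 0.589204    # H2D Latency, mean
gpu_ms = 10.5801     # GPU Compute Time, mean
d2h_ms = 0.00804332  # D2H Latency, mean

# Latency = H2D Latency + GPU Compute Time + D2H Latency.
latency_ms = h2d_ms + gpu_ms + d2h_ms
# 0.589204 + 10.5801 + 0.00804332 = 11.1773..., matching the reported
# mean Latency of 11.1774 ms.
assert abs(latency_ms - 11.1774) < 1e-3
```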

In practice, the Latency metric is the one to watch. The "End-to-End Host Latency" metric has since been deprecated and no longer appears in the latest documentation (https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#trtexec-benchmark).
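When post-processing many such runs, the summary lines can be scraped from a saved trtexec log with a small script. A sketch; the regular expressions assume the exact line format shown above:

```python
import re

# Matches summary lines such as:
# "[I] Latency: min = 10.9785 ms, max = 13.0938 ms, mean = 11.1774 ms, ..."
METRIC_RE = re.compile(
    r"\[I\] (Latency|End-to-End Host Latency|Enqueue Time|H2D Latency|"
    r"GPU Compute Time|D2H Latency): (.+)"
)
STAT_RE = re.compile(r"(min|max|mean|median|percentile\(99%\)) = ([\d.]+) ms")

def parse_trtexec_summary(log_text: str) -> dict:
    """Return {metric: {stat: value_in_ms}} for each summary line found."""
    metrics = {}
    for line in log_text.splitlines():
        m = METRIC_RE.search(line)
        if m:
            name, rest = m.groups()
            metrics[name] = {stat: float(v) for stat, v in STAT_RE.findall(rest)}
    return metrics

sample = ("[06/29/2022-09:11:19] [I] Latency: min = 10.9785 ms, "
          "max = 13.0938 ms, mean = 11.1774 ms, median = 11.088 ms, "
          "percentile(99%) = 13.0858 ms")
stats = parse_trtexec_summary(sample)
# stats["Latency"]["mean"] is 11.1774
```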


References:
https://forums.developer.nvidia.com/t/tensorrt-trtexec-exe-profiling-tool-gpu-vs-host-latency/168552
https://www.jianshu.com/p/a00d393d2b42
https://github.com/NVIDIA/TensorRT/issues/2054
