CUDA:使用nvprof工具计时

CUDA在运行程序时加上nvprof会对程序进行性能分析,这种性能分析最重要的就是统计不同函数的运行时间(占比)。

-bash-4.1$ nvprof ./a
./a Starting...
==19114== NVPROF is profiling process 19114, command: ./a
Vector size 32
Execution configure <<<1, 32>>>
Arrays match.

==19114== Profiling application: ./a
==19114== Profiling result:
Time(%)      Time     Calls       Avg       Min       Max  Name
 38.04%  3.1040us         3  1.0340us     896ns  1.2800us  [CUDA memcpy HtoD]
 33.73%  2.7520us         1  2.7520us  2.7520us  2.7520us  sumArraysOnGPU(float*, float*, float*, int)
 28.24%  2.3040us         1  2.3040us  2.3040us  2.3040us  [CUDA memcpy DtoH]

==19114== API calls:
Time(%)      Time     Calls       Avg       Min       Max  Name
 63.31%  302.79ms         3  100.93ms  9.2340us  302.77ms  cudaMalloc
 35.96%  171.98ms         1  171.98ms  171.98ms  171.98ms  cudaDeviceReset
  0.53%  2.5342ms       332  7.6330us     327ns  285.83us  cuDeviceGetAttribute
  0.06%  268.30us         4  67.076us  66.332us  67.675us  cuDeviceTotalMem
  0.05%  253.84us         4  63.460us  61.246us  68.736us  cuDeviceGetName
  0.05%  230.18us         3  76.727us  10.958us  201.12us  cudaFree
  0.02%  87.188us         4  21.797us  10.783us  32.713us  cudaMemcpy
  0.01%  42.401us         1  42.401us  42.401us  42.401us  cudaLaunch
  0.00%  20.420us         1  20.420us  20.420us  20.420us  cudaSetDevice
  0.00%  6.4060us         4  1.6010us     515ns  4.4500us  cudaSetupArgument
  0.00%  5.1900us         8     648ns     410ns  1.0300us  cuDeviceGet
  0.00%  4.6600us         2  2.3300us     850ns  3.8100us  cuDeviceGetCount
  0.00%  3.0990us         1  3.0990us  3.0990us  3.0990us  cudaConfigureCall
-bash-4.1$ 


评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值