Fixing Nsight Compute CLI reporting n/a for a test program's metrics
Environment
GPU: GeForce GTX 1660 Ti
CUDA version: 10.1
NVIDIA driver version: 455.45.01
OS: Ubuntu 18.04
Problem description
I profiled a CUDA program with the Nsight Compute CLI by running the following in a terminal:
sudo /usr/local/cuda-10.1/NsightCompute-2019.1/nv-nsight-cu-cli --f ./test
The output was:
==PROF== Connected to process 4835
==PROF== Profiling "matrix_add_2D" - 1: 0%....50%....100% - 3 passes
Success!
==PROF== Disconnected from process 4835
[4835] test@127.0.0.1
matrix_add_2D, 2020-Dec-08 22:44:46, Context 1, Stream 7
Section: GPU Speed Of Light
---------------------------------------------------------------------- --------------- ------------------------------
Memory Frequency (!) n/a
SOL FB (!) n/a
Elapsed Cycles (!) n/a
SM Frequency (!) n/a
Memory [%] (!) n/a
Duration (!) n/a
SOL L2 (!) n/a
SOL TEX (!) n/a
SM Active Cycles (!) n/a
SM [%] (!) n/a
---------------------------------------------------------------------- --------------- ------------------------------
Section: Compute Workload Analysis
---------------------------------------------------------------------- --------------- ------------------------------
Executed Ipc Active (!) n/a
Executed Ipc Elapsed (!) n/a
Issue Slots Max (!) n/a
Issued Ipc Active (!) n/a
Issue Slots Busy (!) n/a
SM Busy (!) n/a
---------------------------------------------------------------------- --------------- ------------------------------
Section: Memory Workload Analysis
---------------------------------------------------------------------- --------------- ------------------------------
Memory Throughput (!) n/a
Mem Busy (!) n/a
Max Bandwidth (!) n/a
L2 Hit Rate (!) n/a
Mem Pipes Busy (!) n/a
L1 Hit Rate (!) n/a
---------------------------------------------------------------------- --------------- ------------------------------
Section: Scheduler Statistics
---------------------------------------------------------------------- --------------- ------------------------------
Active Warps Per Scheduler (!) n/a
Eligible Warps Per Scheduler (!) n/a
No Eligible (!) n/a
Instructions Per Active Issue Slot (!) n/a
Issued Warp Per Scheduler (!) n/a
One or More Eligible (!) n/a
---------------------------------------------------------------------- --------------- ------------------------------
Section: Warp State Statistics
---------------------------------------------------------------------- --------------- ------------------------------
Avg. Not Predicated Off Threads Per Warp (!) n/a
Avg. Active Threads Per Warp (!) n/a
Warp Cycles Per Executed Instruction (!) n/a
Warp Cycles Per Issued Instruction (!) n/a
Warp Cycles Per Issue Active (!) n/a
---------------------------------------------------------------------- --------------- ------------------------------
Section: Instruction Statistics
---------------------------------------------------------------------- --------------- ------------------------------
Avg. Executed Instructions Per Scheduler (!) n/a
Executed Instructions (!) n/a
Avg. Issued Instructions Per Scheduler (!) n/a
Issued Instructions (!) n/a
---------------------------------------------------------------------- --------------- ------------------------------
Section: Launch Statistics
---------------------------------------------------------------------- --------------- ------------------------------
Block Size 1,024
Grid Size 1,024
Registers Per Thread register/thread 16
Shared Memory Configuration Size Kbyte 49.15
Dynamic Shared Memory Per Block byte/block 0
Static Shared Memory Per Block byte/block 0
Threads thread 1,048,576
Waves Per SM 42.67
---------------------------------------------------------------------- --------------- ------------------------------
Section: Occupancy
---------------------------------------------------------------------- --------------- ------------------------------
Block Limit SM block 16
Block Limit Registers register 4
Block Limit Shared Mem byte nan
Block Limit Warps warp 1
Achieved Active Warps Per SM (!) n/a
Achieved Occupancy (!) n/a
Theoretical Active Warps per SM warp/cycle 32
Theoretical Occupancy % 100
---------------------------------------------------------------------- --------------- ------------------------------
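Almost every measured metric is reported as n/a, so no real values can be obtained. Only the Launch Statistics section and the theoretical values in the Occupancy section are filled in.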
Solution
Upgrade CUDA from 10.1 to 10.1 Update 2. After the upgrade, the metrics that previously showed n/a are reported correctly.
References
https://developer.nvidia.com/blog/using-nsight-compute-to-inspect-your-kernels/