Using rocprof
- Create an input.txt file, roughly as follows [1]:
# Metrics to collect
pmc: Wavefronts VALUInsts SALUInsts SFetchInsts
# Hardware counters to collect
pmc : TCC_HIT[0], TCC_MISS[0]
# Supported range formats: "3:9", "3:", "3"
range: 1 : 4
# GPUs to profile
gpu: 0 1 2 3
# Names of the kernels to profile; delete this line to profile all kernels
kernel: simple Pass1 simpleConvolutionPass2
- Run
rocprof -i input.txt ./a.out
- The results are written to input.csv
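Once input.csv exists, the per-dispatch rows can be aggregated per kernel. The following is a minimal sketch, not part of rocprof: it assumes the CSV has a header row containing the `KernelName` field plus one numeric column per counter requested in input.txt (for example `Wavefronts` and `VALUInsts`). The helper name `summarize_kernels` is hypothetical.

```python
import csv
from collections import defaultdict


def summarize_kernels(csv_path, counters):
    """Sum the given counter columns for each kernel name.

    Assumes a header row with a 'KernelName' column and one numeric
    column per entry in `counters` (an assumption about the CSV layout).
    """
    totals = defaultdict(lambda: {c: 0.0 for c in counters})
    with open(csv_path, newline="") as f:
        # DictReader handles quoted kernel names that contain commas
        for row in csv.DictReader(f):
            name = row["KernelName"]
            for c in counters:
                totals[name][c] += float(row[c])
    # Convert to plain dicts so the result is easy to print or compare
    return {name: dict(vals) for name, vals in totals.items()}
```

For example, `summarize_kernels("input.csv", ["Wavefronts", "VALUInsts"])` would return one entry per kernel with the summed counter values, provided those columns are present in the file.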
- View the parameters
- Default fields
Index - kernels dispatch order index
KernelName - the dispatched kernel name
gpu-id - GPU id the kernel was submitted to
queue-id - the ROCm queue unique id the kernel was submitted to
queue-index - the ROCm queue write index for the submitted AQL packet
tid - system application thread id which submitted the kernel
grd - the kernel’s grid size
wgr - the kernel’s work group size
lds - the kernel’s LDS memory size
scr - the kernel’s scratch memory size
vgpr - the kernel’s VGPR size
sgpr - the kernel’s SGPR size
fbar - the kernel’s barriers limitation
sig - the kernel’s completion signal
- rocprof --list-basic
- rocprof --list-derived
- ALUStalledByLDS: percentage of time the ALU is stalled by LDS reads and writes
- LDSBankConflict: percentage of time the LDS is stalled by bank conflicts
- L2CacheHit: L2 cache hit rate
- SALUBusy
- VALUBusy
- VALUUtilization
- GPUBusy
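A derived metric such as L2CacheHit is computed from basic counters like the TCC_HIT and TCC_MISS values requested in input.txt. As a rough illustration, the sketch below uses the conventional definition, hits divided by hits plus misses, expressed as a percentage. This is an assumption: the exact expression rocprof uses may differ, for instance by summing over several TCC instances. The function name `l2_hit_rate` is hypothetical.

```python
def l2_hit_rate(tcc_hit, tcc_miss):
    """Return the L2 hit rate in percent from raw hit and miss counts.

    Uses the conventional hits / (hits + misses) definition; this is an
    assumption and not necessarily rocprof's exact expression.
    """
    total = tcc_hit + tcc_miss
    if total == 0:
        # No L2 accesses were recorded, so report 0 rather than divide by zero
        return 0.0
    return 100.0 * tcc_hit / total
```

With counts taken from the TCC_HIT and TCC_MISS columns of a single dispatch, `l2_hit_rate(hits, misses)` gives a value that can be compared against the L2CacheHit column as a sanity check.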
- References
[1] https://zhuanlan.zhihu.com/p/545296023