nvprof --query-metrics

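This post walks through the output of the --query-metrics option of the GPU profiling tool nvprof, focusing on key metrics such as ipc, cache hit rates, and transaction counts, to help optimize GPU compute efficiency.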
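
Once a metric name is known from the list, it can be passed to nvprof through its --metrics option as a comma-separated list. The sketch below only assembles such a command line; the application path ./my_app and the helper name build_nvprof_command are hypothetical, and nothing is actually executed.

```python
import shlex


def build_nvprof_command(app, metrics, app_args=()):
    """Return the argv list for profiling `app` with the given nvprof metrics.

    `app` and `app_args` are placeholders for the program being profiled.
    """
    if not metrics:
        raise ValueError("at least one metric name is required")
    # nvprof expects the metric names as one comma-separated argument.
    return ["nvprof", "--metrics", ",".join(metrics), app, *app_args]


# Hypothetical application path; replace with a real CUDA binary.
cmd = build_nvprof_command("./my_app", ["ipc", "achieved_occupancy"])
print(shlex.join(cmd))
```

The list form can be handed to subprocess.run on a machine where nvprof is installed.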
Available Metrics:

Name Description

Device 0 (GeForce GTX 970M):

sm_efficiency: The percentage of time at least one warp is active on a specific multiprocessor

 

achieved_occupancy: Ratio of the average active warps per active cycle to the maximum number of warps supported on a multiprocessor
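
As a rough illustration of the ratio described above, the following sketch computes achieved occupancy from two raw counts. The input names and sample values are made up for illustration, and the limit of 64 resident warps per multiprocessor is an assumption for this GPU generation, not something stated in the listing.

```python
def achieved_occupancy(active_warp_cycles, active_cycles, max_warps_per_sm):
    """Average active warps per active cycle, divided by the warp limit.

    active_warp_cycles: sum over active cycles of the number of active warps
    active_cycles:      number of cycles with at least one active warp
    max_warps_per_sm:   hardware limit on resident warps (assumed value)
    """
    if active_cycles <= 0 or max_warps_per_sm <= 0:
        raise ValueError("active_cycles and max_warps_per_sm must be positive")
    avg_active_warps = active_warp_cycles / active_cycles
    return avg_active_warps / max_warps_per_sm


# Illustrative numbers only: 32 warps active on average, limit assumed to be 64.
occupancy = achieved_occupancy(3_200_000, 100_000, max_warps_per_sm=64)
print(f"achieved occupancy: {occupancy:.2f}")
```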

 

ipc: Instructions executed per cycle

 

issued_ipc: Instructions issued per cycle

 

inst_per_warp: Average number of instructions executed by each warp

 

branch_efficiency: Ratio of non-divergent branches to total branches
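
The ratio above can be sketched from two counts: the total number of branches and the number that diverged. The function name and the sample numbers are hypothetical; how the profiler collects these counts is not covered here.

```python
def branch_efficiency(total_branches, divergent_branches):
    """Fraction of branches in which all threads of the warp took the same path."""
    if total_branches <= 0:
        raise ValueError("total_branches must be positive")
    if not 0 <= divergent_branches <= total_branches:
        raise ValueError("divergent_branches must be between 0 and total_branches")
    return (total_branches - divergent_branches) / total_branches


# Illustrative numbers only: 50 of 1000 branches diverged.
efficiency = branch_efficiency(1000, 50)
print(f"branch efficiency: {efficiency:.0%}")
```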

 

warp_execution_efficiency: Ratio of the average active threads per warp to the maximum number of threads per warp supported on a multiprocessor
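
To make the ratio above concrete, this sketch averages the number of active threads over a few executed warp instructions and divides by the warp size. The per-instruction counts are invented sample data, and a warp size of 32 threads is assumed.

```python
def warp_execution_efficiency(active_threads_per_instruction, warp_size=32):
    """Average active threads per warp instruction, relative to the warp size."""
    if not active_threads_per_instruction:
        raise ValueError("need at least one sample")
    average = sum(active_threads_per_instruction) / len(active_threads_per_instruction)
    return average / warp_size


# Invented samples: two full warps, then divergence leaves 16 and 8 lanes active.
samples = [32, 32, 16, 8]
efficiency = warp_execution_efficiency(samples)
print(f"warp execution efficiency: {efficiency:.4f}")
```

A value well below 1.0 suggests that divergence or partially filled warps leave lanes idle.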

 

warp_nonpred_execution_efficiency: Ratio of the average active threads per warp executing non-predicated instructions to the maximum number of threads per warp supported on a multiprocessor

 

inst_replay_overhead: Average number of replays for each instruction executed
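
The relationship between ipc, issued_ipc, and inst_replay_overhead can be sketched from three counts. This assumes that a replay is any issue beyond the first for an executed instruction, so that replays equal issued minus executed; that reading, the function name, and the sample counts for a single multiprocessor are assumptions for illustration.

```python
def instruction_metrics(inst_executed, inst_issued, cycles):
    """Return (ipc, issued_ipc, replay_overhead) from raw counts.

    Assumes replays = inst_issued - inst_executed (an interpretation,
    not taken from the metric listing).
    """
    if inst_executed <= 0 or cycles <= 0:
        raise ValueError("inst_executed and cycles must be positive")
    if inst_issued < inst_executed:
        raise ValueError("inst_issued cannot be smaller than inst_executed")
    ipc = inst_executed / cycles
    issued_ipc = inst_issued / cycles
    replay_overhead = (inst_issued - inst_executed) / inst_executed
    return ipc, issued_ipc, replay_overhead


# Illustrative counts only.
ipc, issued_ipc, replay_overhead = instruction_metrics(
    inst_executed=1_000_000, inst_issued=1_250_000, cycles=500_000
)
print(ipc, issued_ipc, replay_overhead)
```

Under this reading, a gap between issued_ipc and ipc is the same signal as a nonzero replay overhead.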

 

issue_slot_utilization: Percentage of issue slots that issued at least one instruction, averaged across all cycles

 

shared_load_transactions_per_request: Average number of shared memory load transactions performed for each shared memory load

 

shared_store_transactions_per_request: Average number of shared memory store transactions performed for each shared memory store

 

local_load_transactions_per_request: Average number of local memory load transactions performed for each local memory load

 

local_store_transactions_per_request: Average number of local memory store transactions performed for each local memory store

 

gld_transactions_per_request: Average number of global memory load transactions performed for each global memory load

 

gst_transactions_per_request: Average number of global memory store transactions performed for each global memory store
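
One way to build intuition for the transactions-per-request metrics is to count how many memory segments a single warp-wide access touches. The sketch below is a simplified model, not the profiler's method: it assumes 32-byte transaction segments, 4-byte elements, and a 32-thread warp, all of which depend on the architecture and cache configuration.

```python
def warp_addresses(base, stride_elems, elem_bytes=4, warp_size=32):
    """Byte address accessed by each lane for a strided access pattern."""
    return [base + lane * stride_elems * elem_bytes for lane in range(warp_size)]


def transactions_for_request(addresses, segment_bytes=32):
    """Number of distinct segments touched by one warp-wide request.

    segment_bytes=32 is an assumed transaction granularity.
    """
    return len({addr // segment_bytes for addr in addresses})


# Unit stride: 32 lanes * 4 bytes = 128 contiguous bytes.
coalesced = transactions_for_request(warp_addresses(0, stride_elems=1))
# Stride of 2 elements: half of every segment fetched is unused.
strided = transactions_for_request(warp_addresses(0, stride_elems=2))
# Stride of 8 elements: every lane lands in its own segment.
scattered = transactions_for_request(warp_addresses(0, stride_elems=8))
print(coalesced, strided, scattered)
```

In this model, a higher transaction count for the same request means more memory traffic for the same amount of useful data.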

 

shared_store_transactions: Number of shared memory store transactions

 

shared_load_transactions: Number of shared memory load transactions

 

local_load_transactions: Number of local memory load transactions

 

local_store_transactions: Number of local memory store transactions

 

gld_transactions: Number of global memory load transactions

 

gst_transactions: Number of global memory store transactions

 

dram_read_transactions: Device memory read transactions

 

dram_write_transactions: Device memory write transactions

 

global_hit_rate: Hit rate for global loads in unified l1/tex cache

 

local_hit_rate: Hit rate for local loads and stores

 

gld_requested_throughput: Requested global memory load throughput

 

gst_requested_throughput: Requested global memory store throughput

 

gld_throughput: Global memory load throughput
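
Comparing the requested throughput with the throughput actually delivered gives a rough measure of how much of the fetched data the kernel asked for. The sketch below computes that ratio from two hypothetical readings; the values and the function name are illustrative, and nvprof reports a related gld_efficiency metric of its own, which is not reproduced here.

```python
def load_efficiency(requested_throughput, actual_throughput):
    """Ratio of requested to actual global load throughput (same units for both)."""
    if actual_throughput <= 0:
        raise ValueError("actual_throughput must be positive")
    if requested_throughput < 0:
        raise ValueError("requested_throughput must be non-negative")
    return requested_throughput / actual_throughput


# Hypothetical readings in GB/s: the kernel asked for 10, the hardware moved 40.
ratio = load_efficiency(10.0, 40.0)
print(f"requested / actual load throughput: {ratio:.0%}")
```

A low ratio usually points at uncoalesced or strided accesses that pull in more data than the kernel uses.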

 

gst_throughpu