Available Metrics:
Name Description
Device 0 (GeForce GTX 970M):
sm_efficiency: The percentage of time at least one warp is active on a specific multiprocessor
achieved_occupancy: Ratio of the average active warps per active cycle to the maximum number of warps supported on a multiprocessor
ipc: Instructions executed per cycle
issued_ipc: Instructions issued per cycle
inst_per_warp: Average number of instructions executed by each warp
branch_efficiency: Ratio of non-divergent branches to total branches
warp_execution_efficiency: Ratio of the average active threads per warp to the maximum number of threads per warp supported on a multiprocessor
warp_nonpred_execution_efficiency: Ratio of the average active threads per warp executing non-predicated instructions to the maximum number of threads per warp supported on a multiprocessor
inst_replay_overhead: Average number of replays for each instruction executed
issue_slot_utilization: Percentage of issue slots that issued at least one instruction, averaged across all cycles
shared_load_transactions_per_request: Average number of shared memory load transactions performed for each shared memory load
shared_store_transactions_per_request: Average number of shared memory store transactions performed for each shared memory store
local_load_transactions_per_request: Average number of local memory load transactions performed for each local memory load
local_store_transactions_per_request: Average number of local memory store transactions performed for each local memory store
gld_transactions_per_request: Average number of global memory load transactions performed for each global memory load.
gst_transactions_per_request: Average number of global memory store transactions performed for each global memory store
shared_store_transactions: Number of shared memory store transactions
shared_load_transactions: Number of shared memory load transactions
local_load_transactions: Number of local memory load transactions
local_store_transactions: Number of local memory store transactions
gld_transactions: Number of global memory load transactions
gst_transactions: Number of global memory store transactions
dram_read_transactions: Device memory read transactions
dram_write_transactions: Device memory write transactions
global_hit_rate: Hit rate for global loads in unified l1/tex cache
local_hit_rate: Hit rate for local loads and stores
gld_requested_throughput: Requested global memory load throughput
gst_requested_throughput: Requested global memory store throughput
gld_throughput: Global memory load throughput
gst_throughpu
nvprof --query-metrics
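This article uses a concrete example to walk through the --query-metrics option of nvprof, the GPU profiling tool. It focuses on key metrics such as ipc, cache hit rates, and transaction counts, which help in optimizing GPU compute efficiency.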
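When comparing metrics across devices or searching for a specific one, it can help to load the listing into a lookup table. The following is a minimal sketch, assuming the output has the shape shown above: a "Device N (...):" header line followed by "name: description" lines. The function name parse_metrics and the sample variable are hypothetical and are not part of nvprof.

```python
def parse_metrics(listing: str) -> dict[str, dict[str, str]]:
    """Group 'name: description' lines under their 'Device N (...):' header.

    Assumes the listing layout shown in this article; lines that do not
    match (the 'Available Metrics:' banner, the column header, or a
    truncated entry with no description) are skipped.
    """
    devices: dict[str, dict[str, str]] = {}
    current = None
    for raw in listing.splitlines():
        line = raw.strip()
        if line.startswith("Device ") and line.endswith(":"):
            # Start a new device section, dropping the trailing colon.
            current = line[:-1]
            devices[current] = {}
        elif current is not None and ": " in line:
            # Split only on the first ': ' so descriptions may contain colons.
            name, description = line.split(": ", 1)
            devices[current][name.strip()] = description.strip()
    return devices


# A few lines copied from the listing above, used as sample input.
sample = """Available Metrics:
Name Description
Device 0 (GeForce GTX 970M):
ipc: Instructions executed per cycle
issued_ipc: Instructions issued per cycle
global_hit_rate: Hit rate for global loads in unified l1/tex cache
gst_throughpu
"""

metrics = parse_metrics(sample)
device = "Device 0 (GeForce GTX 970M)"
print(sorted(metrics[device]))
print(metrics[device]["ipc"])
```

In practice, the output of nvprof --query-metrics could be redirected to a file and its contents passed in place of sample. Entries that were cut off, such as the last line of the sample, are ignored rather than guessed.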