nvprof --query-metrics

ShaderJoy

Article link: https://blog.csdn.net/panda1234lee/article/details/83513088

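This post lists the metrics reported by the --query-metrics option of nvprof, the GPU profiling tool, including ipc, cache hit rates, and transaction counts, as a reference for optimizing GPU compute efficiency.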
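Each entry in the nvprof --query-metrics output is a metric name, a colon, and a description, grouped under a device header. As a minimal sketch, assuming that layout holds for the whole listing, the text can be turned into a dictionary for lookup. The function name `parse_query_metrics` is hypothetical and not part of nvprof.

```python
def parse_query_metrics(text):
    """Map each device header to a {metric name: description} dict."""
    devices = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()  # also drops non-breaking-space spacer lines
        if not line:
            continue
        # A device header looks like "Device 0 (GeForce GTX 970M):"
        if line.startswith("Device ") and line.endswith(":"):
            current = devices.setdefault(line[:-1], {})
            continue
        name, sep, desc = line.partition(":")
        # Skip headers such as "Available Metrics:" and "Name Description"
        if current is None or not sep or not desc.strip():
            continue
        current[name.strip()] = desc.strip()
    return devices


sample = """Available Metrics:
Name Description
Device 0 (GeForce GTX 970M):
sm_efficiency: The percentage of time at least one warp is active on a specific multiprocessor
ipc: Instructions executed per cycle
"""

metrics = parse_query_metrics(sample)
for name, desc in metrics["Device 0 (GeForce GTX 970M)"].items():
    print(f"{name} -> {desc}")
```

Splitting on the first colon only keeps any colons inside a description intact.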

Available Metrics:

Name Description

Device 0 (GeForce GTX 970M):

sm_efficiency: The percentage of time at least one warp is active on a specific multiprocessor

 

achieved_occupancy: Ratio of the average active warps per active cycle to the maximum number of warps supported on a multiprocessor
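The achieved_occupancy definition is a ratio of two averages, which is easier to see with numbers. The sketch below is only an illustration: the counter names and the input values are hypothetical, and the limit of 64 warps per multiprocessor is an assumption that should be checked against the device being profiled.

```python
def achieved_occupancy(active_warp_cycles, active_cycles, max_warps_per_sm):
    """Average active warps per active cycle, divided by the SM warp limit."""
    if active_cycles <= 0 or max_warps_per_sm <= 0:
        raise ValueError("active_cycles and max_warps_per_sm must be positive")
    avg_active_warps = active_warp_cycles / active_cycles
    return avg_active_warps / max_warps_per_sm


# Hypothetical counters: 3,200,000 warp-cycles accumulated over 100,000 active
# cycles gives 32 active warps on average; 64 is the assumed per-SM limit.
occupancy = achieved_occupancy(3_200_000, 100_000, 64)
print(f"achieved occupancy: {occupancy:.2f}")
```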

 

ipc: Instructions executed per cycle

 

issued_ipc: Instructions issued per cycle

 

inst_per_warp: Average number of instructions executed by each warp

 

branch_efficiency: Ratio of non-divergent branches to total branches

 

warp_execution_efficiency: Ratio of the average active threads per warp to the maximum number of threads per warp supported on a multiprocessor

 

warp_nonpred_execution_efficiency: Ratio of the average active threads per warp executing non-predicated instructions to the maximum number of threads per warp supported on a multiprocessor
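Both warp efficiency metrics compare the average number of active threads per warp with the warp width. A small sketch of that arithmetic follows, assuming a warp size of 32 and a made-up trace of active-thread counts per executed instruction. The function name and the trace are hypothetical and are not produced by nvprof.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warp_execution_efficiency(active_threads_per_instruction):
    """Average active threads per executed instruction over the warp size."""
    if not active_threads_per_instruction:
        raise ValueError("need at least one executed instruction")
    average_active = sum(active_threads_per_instruction) / len(
        active_threads_per_instruction
    )
    return average_active / WARP_SIZE


# Hypothetical trace: 10 instructions run with all 32 threads active, then
# 10 instructions run inside a divergent branch with only 16 threads active.
trace = [32] * 10 + [16] * 10
efficiency = warp_execution_efficiency(trace)
print(f"warp execution efficiency: {efficiency:.0%}")
```

In this reading, divergence lowers the figure because the instructions on each side of a branch run with only part of the warp active.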

 

inst_replay_overhead: Average number of replays for each instruction executed
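The listing does not say how replays are counted. One common reading, which is an assumption here rather than something the nvprof output states, is that a replay is any instruction issue beyond the instructions actually executed, so the overhead links the counts behind issued_ipc and ipc. The sketch below uses hypothetical counter values.

```python
def inst_replay_overhead(inst_issued, inst_executed):
    """Replays per executed instruction, assuming replays = issued - executed."""
    if inst_executed <= 0:
        raise ValueError("inst_executed must be positive")
    return (inst_issued - inst_executed) / inst_executed


# Hypothetical counters: 1,250,000 issues for 1,000,000 executed instructions.
overhead = inst_replay_overhead(1_250_000, 1_000_000)
print(f"replay overhead: {overhead:.2f} replays per instruction")
```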

 

issue_slot_utilization: Percentage of issue slots that issued at least one instruction, averaged across all cycles

 

shared_load_transactions_per_request: Average number of shared memory load transactions performed for each shared memory load

 

shared_store_transactions_per_request: Average number of shared memory store transactions performed for each shared memory store

 

local_load_transactions_per_request: Average number of local memory load transactions performed for each local memory load

 

local_store_transactions_per_request: Average number of local memory store transactions performed for each local memory store

 

gld_transactions_per_request: Average number of global memory load transactions performed for each global memory load.

 

gst_transactions_per_request: Average number of global memory store transactions performed for each global memory store
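All six *_transactions_per_request metrics are the same kind of average: a transaction count divided by the number of load or store requests. A minimal sketch with hypothetical counts is shown below. The function name is illustrative and the numbers are not measurements from the GTX 970M.

```python
def transactions_per_request(transactions, requests):
    """Average number of memory transactions needed to serve one request."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    return transactions / requests


# Hypothetical counts for two kernels issuing the same 1,000 global loads.
kernel_a = transactions_per_request(4_000, 1_000)
kernel_b = transactions_per_request(32_000, 1_000)
print(f"kernel A: {kernel_a:.1f} transactions per request")
print(f"kernel B: {kernel_b:.1f} transactions per request")
```

A lower value generally means each request touches fewer memory segments, so in this made-up comparison the access pattern of kernel B would be the first thing to look at.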

 

shared_store_transactions: Number of shared memory store transactions

 

shared_load_transactions: Number of shared memory load transactions

 

local_load_transactions: Number of local memory load transactions

 

local_store_transactions: Number of local memory store transactions

 

gld_transactions: Number of global memory load transactions

 

gst_transactions: Number of global memory store transactions

 

dram_read_transactions: Device memory read transactions

 

dram_write_transactions: Device memory write transactions

 

global_hit_rate: Hit rate for global loads in unified l1/tex cache

 

local_hit_rate: Hit rate for local loads and stores

 

gld_requested_throughput: Requested global memory load throughput

 

gst_requested_throughput: Requested global memory store throughput

 

gld_throughput: Global memory load throughput

 

gst_throughpu