Occupancy

Definitions

Active Warps

A warp is considered active from the moment its threads begin executing until all of its threads have finished.

Occupancy

occupancy = active warps per SM / maximum warps per SM
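
As a quick illustration of the ratio above, here is a minimal sketch in Python. The function name and the figure of 64 maximum warps per SM are assumptions chosen for the example; the real limit depends on the device's compute capability.

```python
def occupancy(active_warps_per_sm: float, max_warps_per_sm: int) -> float:
    """Ratio of active warps on an SM to the maximum the SM supports."""
    if max_warps_per_sm <= 0:
        raise ValueError("max_warps_per_sm must be positive")
    return active_warps_per_sm / max_warps_per_sm


# Hypothetical SM supporting at most 64 resident warps, 32 of them active.
print(occupancy(32, 64))  # 0.5
```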

Theoretical Occupancy

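Theoretical occupancy is the highest occupancy a kernel can reach in principle, given the device's compute capability, the launch configuration (how threads are organized into blocks), and the kernel's resource usage.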

Achieved Occupancy

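On every clock cycle, a counter on each warp scheduler records the number of active warps in that cycle. Dividing the accumulated total by the number of cycles gives the average number of active warps, from which the achieved occupancy is computed.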
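
The averaging step can be sketched as follows. This is a simplified model, not the profiler's implementation: it assumes the per-scheduler counts have already been summed into one active-warp count per cycle for the SM, and the sample values and the limit of 64 warps per SM are made up for illustration.

```python
def achieved_occupancy(active_warps_per_cycle, max_warps_per_sm):
    """Average the per-cycle active-warp counts, then divide by the SM limit."""
    if not active_warps_per_cycle:
        raise ValueError("need at least one sampled cycle")
    average_active = sum(active_warps_per_cycle) / len(active_warps_per_cycle)
    return average_active / max_warps_per_sm


# Hypothetical trace of four cycles on an SM that supports 64 warps.
samples = [48, 48, 32, 16]   # the active-warp count falls off toward the end
print(achieved_occupancy(samples, 64))  # 0.5625
```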

Warp Issue Efficiency

Across all clock cycles, the ratio of cycles with at least one eligible (ready to issue) warp to cycles with no eligible warp.
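
A small sketch of how such a split could be computed from a per-cycle trace of eligible-warp counts. The trace and the function name are hypothetical; a profiler reports these figures directly.

```python
def warp_issue_split(eligible_warps_per_cycle):
    """Count cycles with and without an eligible warp.

    Returns (cycles_with_eligible, cycles_without_eligible, no_eligible_fraction).
    """
    total = len(eligible_warps_per_cycle)
    if total == 0:
        raise ValueError("need at least one sampled cycle")
    without = sum(1 for n in eligible_warps_per_cycle if n == 0)
    return total - without, without, without / total


# Hypothetical trace: number of eligible warps seen in each of eight cycles.
print(warp_issue_split([0, 2, 1, 0, 3, 1, 1, 0]))  # (5, 3, 0.375)
```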

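Performance Analysis

How occupancy affects performance

  • Occupancy that is too low leaves the scheduler without enough eligible warps to choose from, so some instruction latency cannot be hidden.
  • When there are already enough eligible warps, or instruction latency is short enough, pushing occupancy higher reduces the resources available to each thread (for example, register spilling forces the use of local memory) and can lower performance.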

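Possible causes of low theoretical occupancy

  • The number of active blocks per SM has an upper limit. When each block contains too few warps, the block limit is reached long before the warp limit:

    device-limited active blocks * warps per block << device-limited active warps
  • Threads that use too many registers or too much shared memory reduce the number of active warps an SM can hold.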
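
Both causes can be seen in a rough calculator like the one below. It is a sketch under stated assumptions, not the official method: the default per-SM limits (64 warps, 32 blocks, 65536 registers, 48 KiB shared memory) are hypothetical values, and hardware allocation granularity is ignored. For real numbers, use the CUDA Occupancy Calculator or the runtime call `cudaOccupancyMaxActiveBlocksPerMultiprocessor`.

```python
def theoretical_occupancy(threads_per_block, regs_per_thread, smem_per_block,
                          max_warps_per_sm=64, max_blocks_per_sm=32,
                          regs_per_sm=65536, smem_per_sm=49152, warp_size=32):
    """Return (occupancy, limiting_factor) for one SM.

    All default limits are illustrative assumptions; allocation
    granularity of registers and shared memory is not modeled.
    """
    warps_per_block = -(-threads_per_block // warp_size)  # ceiling division
    regs_per_block = regs_per_thread * warps_per_block * warp_size
    unlimited = float("inf")  # a resource the kernel does not use cannot limit it
    limits = {
        "warps": max_warps_per_sm // warps_per_block,
        "blocks": max_blocks_per_sm,
        "registers": regs_per_sm // regs_per_block if regs_per_block else unlimited,
        "shared_memory": smem_per_sm // smem_per_block if smem_per_block else unlimited,
    }
    limiter = min(limits, key=limits.get)   # resource allowing the fewest blocks
    active_blocks = limits[limiter]
    return active_blocks * warps_per_block / max_warps_per_sm, limiter


# One warp per block: the block limit caps occupancy at 32 * 1 / 64.
print(theoretical_occupancy(32, 32, 0))    # (0.5, 'blocks')
# 64 registers per thread with 256-thread blocks: only 4 blocks fit.
print(theoretical_occupancy(256, 64, 0))   # (0.5, 'registers')
```

In the first call the block limit is the constraint, which is the inequality from the first bullet; in the second, register pressure is.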

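Possible causes of low achieved occupancy

  • Warps within the same block run for unequal amounts of time, so the number of active warps drops toward the end of the computation. Resources are allocated per block, so even after some warps in a block finish, their resources cannot be reclaimed, and no new block can be scheduled onto the SM.
  • Too few blocks are launched. According to the Nsight user manual, theoretical occupancy does not take the number of launched blocks into account.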
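
Both effects can be estimated with simple arithmetic. The sketch below uses made-up numbers (80 SMs, 64 warps per SM, warp run times in arbitrary cycles) and hypothetical function names; it only bounds or approximates what a profiler would measure.

```python
def occupancy_bound_from_grid(blocks_launched, warps_per_block, num_sms,
                              max_warps_per_sm, theoretical_blocks_per_sm):
    """Upper bound on device-wide occupancy when the grid is small."""
    resident_blocks = min(blocks_launched, num_sms * theoretical_blocks_per_sm)
    return resident_blocks * warps_per_block / (num_sms * max_warps_per_sm)


def active_fraction_in_block(warp_durations):
    """Share of a block's held warp slots that are actually active.

    The block keeps its resources until its slowest warp finishes.
    """
    held = len(warp_durations) * max(warp_durations)
    return sum(warp_durations) / held


# 40 blocks of 8 warps on a hypothetical 80-SM device: theoretical
# occupancy may be 100%, but at most 40 * 8 warps ever exist.
print(occupancy_bound_from_grid(40, 8, 80, 64, 8))   # 0.0625
# Three warps finish after 10 cycles, one runs for 40.
print(active_fraction_in_block([10, 10, 10, 40]))    # 0.4375
```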

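Performance Optimization

Low occupancy does not necessarily mean low performance. Check Warp Issue Efficiency first: only when the share of cycles with no eligible warp is too high is it worth trying to raise occupancy. Specific steps include: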

- If the theoretical occupancy is low, try to optimize the execution configuration of the kernel launch, using the Occupancy table to identify which factor(s) are limiting occupancy. If you are register limited, do not rule out experimenting with launch bounds to increase occupancy, even if this results in some register spilling.
- If the achieved occupancy is well below the theoretical occupancy, check the Instruction Statistics experiment for highly unbalanced workloads or tail effects. Potential strategies include splitting the kernel grid at a finer granularity, distributing work across the blocks in a more balanced way, and avoiding gathering the final result on a single block, warp, or thread.
- If the Pipe Utilization experiment shows a particular pipeline is already fully utilized, increasing active warps is unlikely to result in more eligible warps, because all additional active warps will stall trying to access the oversubscribed pipeline. In this case, try to reduce the load on this pipeline or investigate whether the expected peak performance for the target hardware has already been reached.
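
The order of these checks can be written down as a small triage helper. This is only a sketch: every threshold below is an arbitrary placeholder rather than a value from Nsight, and the inputs are assumed to come from a profiler run.

```python
def suggest_next_step(no_eligible_fraction, theoretical, achieved, pipe_saturated,
                      stall_threshold=0.5, low_theoretical=0.5, gap_threshold=0.2):
    """Map profiler readings to one of the suggestions above.

    All three thresholds are hypothetical and should be tuned per workload.
    """
    if no_eligible_fraction < stall_threshold:
        # The scheduler usually has something to issue.
        return "occupancy is probably not the bottleneck"
    if pipe_saturated:
        # More warps would only queue on the same pipeline.
        return "reduce load on the saturated pipeline"
    if theoretical < low_theoretical:
        return "tune the launch configuration or resource usage"
    if theoretical - achieved > gap_threshold:
        return "look for unbalanced work or tail effects"
    return "no occupancy-specific suggestion"


print(suggest_next_step(0.7, theoretical=0.25, achieved=0.24, pipe_saturated=False))
```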
