Occupancy is defined as the ratio of active warps on an SM to the maximum number of active warps supported by the SM.
That is: Occupancy = active warps / maximum number of active warps on this SM
Low occupancy results in poor instruction issue efficiency, because there are not enough eligible warps to hide latency between dependent instructions.
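Note that "active warps" here means the warps resident on the SM, which the scheduler can switch to at any moment; it does not mean only the warps that are issuing instructions on the processor right now!
With more active warps, the warps that are stalled (on memory access or other long-latency operations) can be hidden behind the ones that are ready, so the processor stays busy;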
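The latency-hiding argument above can be sketched with a toy model. Everything in it is an assumption made for illustration: a single scheduler that issues at most one instruction per cycle, and a hypothetical `stall_cycles` value that every warp waits after each instruction. Real SMs are considerably more complex.

```python
def issue_utilization(active_warps: int, stall_cycles: int) -> float:
    """Fraction of cycles in which a toy scheduler can issue an instruction.

    Toy model (assumption, not real hardware): each warp issues one
    instruction in one cycle, then waits `stall_cycles` cycles before its
    next dependent instruction is ready. The scheduler issues at most one
    instruction per cycle and switches between warps at no cost.
    """
    period = 1 + stall_cycles  # cycles between two issues of the same warp
    return min(1.0, active_warps / period)


if __name__ == "__main__":
    # With a hypothetical 15-cycle stall, 16 warps are enough to keep
    # the scheduler busy every cycle; fewer warps leave idle cycles.
    for warps in (4, 8, 16, 32):
        print(warps, issue_utilization(warps, stall_cycles=15))
```

In this model, adding warps beyond the point where utilization reaches 1.0 gives no further benefit, which matches the intuition that occupancy only needs to be high enough to cover the latency.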
Each SM model has theoretical upper limits; these are the factors that constrain Theoretical Occupancy:
1. Warps per SM;
2. Blocks per SM; (multiple blocks can execute concurrently on the same SM)
3. Registers per SM;
4. Shared memory per SM;
5. Registers & Shared memory used per Block;
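The limiting factors listed above can be combined into a rough estimate of Theoretical Occupancy. The following is a minimal sketch, not NVIDIA's occupancy calculator: the `SMLimits` class, the function name, and the limit values in `EXAMPLE_SM` are hypothetical, and register and shared memory allocation granularity is ignored.

```python
from dataclasses import dataclass

WARP_SIZE = 32  # threads per warp


@dataclass
class SMLimits:
    """Per-SM resource limits (hypothetical container for this sketch)."""
    max_warps: int         # limit 1: warps per SM
    max_blocks: int        # limit 2: blocks per SM
    registers: int         # limit 3: registers per SM
    shared_mem_bytes: int  # limit 4: shared memory per SM


def theoretical_occupancy(limits, threads_per_block, regs_per_thread,
                          smem_per_block):
    """Estimate occupancy from the tightest of the per-SM limits.

    Simplification (assumption): allocation granularity is ignored.
    """
    warps_per_block = -(-threads_per_block // WARP_SIZE)  # ceiling division
    regs_per_block = regs_per_thread * WARP_SIZE * warps_per_block

    # How many blocks fit on one SM under each limit (factor 5 feeds 3 and 4).
    by_warps = limits.max_warps // warps_per_block
    by_blocks = limits.max_blocks
    by_regs = (limits.registers // regs_per_block
               if regs_per_block else limits.max_blocks)
    by_smem = (limits.shared_mem_bytes // smem_per_block
               if smem_per_block else limits.max_blocks)

    resident_blocks = min(by_warps, by_blocks, by_regs, by_smem)
    return resident_blocks * warps_per_block / limits.max_warps


# Illustrative limits only; look up the real values for a given GPU.
EXAMPLE_SM = SMLimits(max_warps=64, max_blocks=32,
                      registers=65536, shared_mem_bytes=49152)

if __name__ == "__main__":
    print(theoretical_occupancy(EXAMPLE_SM, 256, 32, 0))      # register use fits
    print(theoretical_occupancy(EXAMPLE_SM, 256, 64, 0))      # register-limited
    print(theoretical_occupancy(EXAMPLE_SM, 256, 32, 16384))  # shared-memory-limited
```

Under these assumed limits, doubling the registers per thread from 32 to 64 halves the number of resident blocks, which shows how per-block resource use (factor 5) interacts with the per-SM totals (factors 3 and 4).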
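Achieved Occupancy: the average of Occupancy over time and across all SMs;
Reasons it falls below Theoretical Occupancy:
1. Unbalanced execution time within a block: some warps finish early, others finish late; (tail effect)
2. Unbalanced execution time across blocks: some blocks finish early, others finish late;
3. Too few blocks launched;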
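The effect of imbalance and of launching too little work can be illustrated with a small time-averaging sketch. It is a simplified model under stated assumptions: a single SM, all warps resident from time 0, and hypothetical per-warp run times; the function name is made up for this example.

```python
def achieved_occupancy(warp_durations, max_warps):
    """Time-averaged occupancy of one SM in a toy model.

    Assumptions: every warp starts at time 0, runs for its given duration,
    and no new warp replaces a finished one. The kernel ends when the
    slowest warp ends.
    """
    if not warp_durations:
        return 0.0
    kernel_time = max(warp_durations)
    # Average number of active warps = total warp-time / kernel time.
    avg_active_warps = sum(warp_durations) / kernel_time
    return avg_active_warps / max_warps


if __name__ == "__main__":
    # Balanced: all 4 warp slots busy for the whole kernel.
    print(achieved_occupancy([10, 10, 10, 10], max_warps=4))
    # Unbalanced: one slow warp keeps the kernel alive (tail effect).
    print(achieved_occupancy([10, 10, 10, 40], max_warps=4))
    # Too little work launched: half the warp slots are never used.
    print(achieved_occupancy([10, 10], max_warps=4))
```

In the unbalanced case the SM spends most of the kernel running a single warp, so the averaged occupancy drops well below the theoretical value even though all four slots were filled at launch.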