- Definition:
Occupancy is the ratio of the number of active warps per multiprocessor to the maximum number of possible active warps.
Put another way, it is the percentage of the hardware's ability to process warps that is actively in use.
- Determining factors
- Register availability: registers are allocated to each block as a whole. If a thread block uses too many registers, fewer blocks can stay resident on the multiprocessor, which lowers occupancy.
- Registers per thread: this limit can be fixed at compile time. The maximum number of registers per thread can be set manually at compilation time, per file using the -maxrregcount option or per kernel using the __launch_bounds__ qualifier.
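- Example: with 8192 registers per multiprocessor and a maximum of 768 resident threads, 100% occupancy requires each thread to use at most 10 registers (8192 / 768 ≈ 10.7, rounded down). This simple estimate of how register count affects occupancy does not take register allocation granularity into account. Suppose each thread uses 12 registers. With 128 threads per block, floor(8192 / (128 × 12)) = 5 blocks are resident, so occupancy = 5 × 128 / 768 ≈ 83.3%. With 256 threads per block, floor(8192 / (256 × 12)) = 2 blocks are resident, so occupancy = 2 × 256 / 768 ≈ 66.7%.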
- The number of registers available, the maximum number of simultaneous threads resident on each multiprocessor, and the register allocation granularity vary over different compute capabilities.
- With the CUDA Driver API, a CUDA application process can potentially create more than one context for a given GPU. If multiple CUDA application processes access the same GPU concurrently, this almost always implies multiple contexts, since a context is tied to a particular host process unless CUDA Multi-Process Service is in use
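
The register-limited occupancy arithmetic from the example above (8192 registers, 768 resident threads, 12 registers per thread) can be sketched in a few lines of C++. This is a simplified model, not NVIDIA's occupancy calculator: the function names `residentBlocks` and `registerLimitedOccupancy` are hypothetical, and the model assumes registers are the only limiting resource, ignoring register allocation granularity, shared memory, and the per-multiprocessor limit on resident blocks.

```cpp
#include <cassert>

// Number of thread blocks that fit on one multiprocessor when registers are
// the only limit. Integer division rounds down because a block is either
// fully resident or not resident at all.
// Assumption: no register allocation granularity (no rounding per warp/block).
int residentBlocks(int regsPerSM, int threadsPerBlock, int regsPerThread) {
    assert(regsPerSM > 0 && threadsPerBlock > 0 && regsPerThread > 0);
    return regsPerSM / (threadsPerBlock * regsPerThread);
}

// Register-limited occupancy: resident threads divided by the maximum number
// of resident threads per multiprocessor.
double registerLimitedOccupancy(int regsPerSM, int maxThreadsPerSM,
                                int threadsPerBlock, int regsPerThread) {
    assert(maxThreadsPerSM > 0);
    int blocks = residentBlocks(regsPerSM, threadsPerBlock, regsPerThread);
    int threads = blocks * threadsPerBlock;
    if (threads > maxThreadsPerSM) {
        // Cap at the hardware thread limit, still counting whole blocks only.
        threads = (maxThreadsPerSM / threadsPerBlock) * threadsPerBlock;
    }
    return static_cast<double>(threads) / maxThreadsPerSM;
}
```

Under these assumptions, `registerLimitedOccupancy(8192, 768, 128, 12)` yields 640 / 768 (about 83.3%) and `registerLimitedOccupancy(8192, 768, 256, 12)` yields 512 / 768 (about 66.7%), matching the two cases in the example. For real kernels, the CUDA runtime's occupancy API or the profiler gives figures that account for the limits this sketch leaves out.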