GPU Occupancy

Latest recommended article published on 2024-07-19 10:20:45

SpeechScience

Category: linux. Tags: GPU

Article link: https://blog.csdn.net/Inger_H/article/details/90202618

Occupancy is the ratio of the number of active warps per multiprocessor to the maximum number of possible active warps.

In other words, it is the percentage of the hardware's ability to process warps that is actively in use.
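
The definition can be sketched as a short calculation. This is only an illustration: the function name `occupancy` and the figures in the usage line (5 resident blocks of 128 threads, 24 warp slots per multiprocessor) are assumptions chosen for the example, and the caller is assumed to pass a block count that actually fits on the multiprocessor.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def occupancy(resident_blocks, threads_per_block, max_warps_per_sm):
    """Active warps on one multiprocessor divided by the maximum possible."""
    # A partially filled warp still occupies a whole warp slot, so round up.
    warps_per_block = -(-threads_per_block // WARP_SIZE)
    active_warps = resident_blocks * warps_per_block
    return active_warps / max_warps_per_sm


# Hypothetical figures: 5 resident blocks of 128 threads, 24 warp slots.
print(f"{occupancy(5, 128, 24):.1%}")  # prints 83.3%
```

Here 5 blocks of 4 warps each give 20 active warps out of 24 possible.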

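Register availability: registers are allocated to each thread block as a whole, so if a thread block uses too many registers, fewer blocks can stay resident on the multiprocessor, which lowers occupancy.
Registers per thread: this limit can be fixed at compile time. The maximum number of registers per thread can be set manually at compilation time per-file using the -maxrregcount option or per-kernel using the __launch_bounds__ qualifier.
Example: take a multiprocessor with 8192 registers and at most 768 resident threads. For 100% occupancy, each thread can use at most 10 registers (8192 / 768 ≈ 10.7). This estimate of how register count affects occupancy, however, ignores register allocation granularity. Suppose each thread uses 12 registers. With 128 threads per block, occupancy = floor(8192 / (128 × 12)) × 128 / 768 = 5 × 128 / 768 ≈ 83.3%. With 256 threads per block, occupancy = floor(8192 / (256 × 12)) × 256 / 768 = 2 × 256 / 768 ≈ 66.7%.
The number of registers available, the maximum number of simultaneous threads resident on each multiprocessor, and the register allocation granularity vary over different compute capabilities.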
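
The register example above can be reproduced with a few lines of Python. This is a rough sketch under stated assumptions, not a replacement for NVIDIA's occupancy tools: the function name `register_limited_occupancy` is made up here, registers are assumed to be handed out in whole blocks with no further rounding, and other limits (shared memory, maximum blocks per multiprocessor) are ignored.

```python
def register_limited_occupancy(regs_per_sm, max_threads_per_sm,
                               regs_per_thread, threads_per_block):
    """Estimate occupancy when registers and thread slots are the only limits."""
    # Assumption: a block either gets all of its registers or is not resident.
    regs_per_block = regs_per_thread * threads_per_block
    blocks_by_registers = regs_per_sm // regs_per_block
    blocks_by_threads = max_threads_per_sm // threads_per_block
    resident_blocks = min(blocks_by_registers, blocks_by_threads)
    return resident_blocks * threads_per_block / max_threads_per_sm


# Figures from the example: 8192 registers, 768 threads, 12 registers per thread.
for block_size in (128, 256):
    value = register_limited_occupancy(8192, 768, 12, block_size)
    print(f"{block_size} threads per block: {value:.1%}")
```

With 10 registers per thread the same function returns 1.0 for both block sizes, which matches the 100% case in the example.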

With the CUDA Driver API, a CUDA application process can potentially create more than one context for a given GPU. If multiple CUDA application processes access the same GPU concurrently, this almost always implies multiple contexts, since a context is tied to a particular host process unless CUDA Multi-Process Service is in use.