TURING STREAMING MULTIPROCESSOR (SM): Shared Memory Architecture

妖怪哪里走

Article link: https://blog.csdn.net/royalfizz/article/details/93305012

Turing’s SM also introduces a new unified architecture for shared memory, L1, and texture caching. Because these share one pool of on-chip storage, the L1 cache can draw on the pooled resources, doubling its hit bandwidth per TPC (Texture Processing Cluster) compared with Pascal, and it can be reconfigured to grow larger when shared memory allocations do not use the full shared memory capacity. The Turing L1 can be as large as 64 KB, combined with a 32 KB per-SM shared memory allocation, or it can shrink to 32 KB, allowing 64 KB to be used for shared memory; either way the unified pool totals 96 KB per SM. Turing’s L2 cache capacity has also been increased over Pascal.
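
The split described above is simple arithmetic over a single 96 KB pool (64 KB + 32 KB). The Python sketch below is a hypothetical illustration only: the function name `split_unified_pool` and the rule "pick the smallest shared-memory carveout that fits the request and give the rest to L1" are assumptions, not NVIDIA's documented driver policy. In real CUDA code the preferred split is requested per kernel, for example via the `cudaFuncAttributePreferredSharedMemoryCarveout` attribute of `cudaFuncSetAttribute`, and the driver treats that value as a hint.

```python
# Minimal model of the Turing SM's unified L1 / shared-memory pool.
# Sizes come from the text: 64 KB L1 + 32 KB shared, or 32 KB L1 + 64 KB shared.
UNIFIED_POOL_KB = 96
SHARED_CONFIGS_KB = (32, 64)  # the two shared-memory carveouts described in the text


def split_unified_pool(shared_kb_needed: int) -> tuple[int, int]:
    """Return (shared_kb, l1_kb) for the smallest carveout that fits the request.

    ASSUMPTION: the selection rule here (smallest fitting carveout, remainder
    goes to L1) is illustrative, not the actual driver heuristic.
    """
    if shared_kb_needed < 0:
        raise ValueError("shared memory request cannot be negative")
    for shared_kb in SHARED_CONFIGS_KB:
        if shared_kb_needed <= shared_kb:
            # Whatever shared memory does not claim is left for the L1 cache.
            return shared_kb, UNIFIED_POOL_KB - shared_kb
    raise ValueError(
        f"{shared_kb_needed} KB exceeds the 64 KB shared-memory maximum per SM"
    )


if __name__ == "__main__":
    # Hypothetical kernels: shared memory needed per SM, in KB.
    for need in (0, 16, 32, 48, 64):
        shared, l1 = split_unified_pool(need)
        print(f"need {need:>2} KB shared -> shared {shared} KB, L1 {l1} KB")
```

Run as a script, it prints the split chosen for several hypothetical per-SM requests. A kernel that needs 48 KB of shared memory, for instance, lands in the 64 KB shared / 32 KB L1 configuration, while a kernel that uses little or no shared memory leaves 64 KB for L1.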

Figure 6 shows how the new combined L1 data cache and shared memory subsystem of the Turing SM significantly improves performance while also simplifying programming and reducing the tuning needed to reach peak or near-peak application performance. Combining the L1 data cache with shared memory reduces latency and provides higher bandwidth than the L1 cache implementation used in Pascal GPUs.