The Evolution of Nvidia GPU Memory Architecture


__DARK__


Article link: https://blog.csdn.net/dark5669/article/details/53739856


After going through quite a few papers and Nvidia's white papers, I finally got this figured out.

From Fermi to Pascal, the cache hierarchy changed as follows.

1. Fermi

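The L1 data cache shares 64 KB of configurable on-chip memory with shared memory, typically split 16 KB/48 KB or 48 KB/16 KB; this memory is readable and writable.
There are also a dedicated texture cache for graphics rendering and a constant cache for constants; both are read-only.
All of these L1-level caches are private to each SM. To avoid cache coherence problems, write requests are not cached in the L1 data cache either.
In short, the L1 level effectively behaves as read-only.
The L2 cache is shared by all SMs and is readable and writable.
When data in L2 is written while a copy still sits in L1, the L1 copy is invalidated, which maintains cache coherency.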
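
The Fermi policy described above can be sketched as a toy model. This is only an illustration in Python, not a description of the real hardware: the class name `ToySM`, the dictionary-based caches, and the single-SM setup are all assumptions made for the example, and capacity, line size, and replacement are ignored.

```python
class ToySM:
    """Toy model of one SM's L1 in front of a shared L2 (hypothetical names)."""

    def __init__(self, l2):
        self.l1 = {}   # address -> value; filled by loads only
        self.l2 = l2   # shared, read/write cache modeled as a dict

    def load(self, addr):
        # A load hits in L1 if the line is present; otherwise the line is
        # fetched from L2 and kept in L1 for later loads.
        if addr in self.l1:
            return self.l1[addr], "L1 hit"
        value = self.l2[addr]
        self.l1[addr] = value
        return value, "L1 miss"

    def store(self, addr, value):
        # A store is never cached in L1: it updates L2 and invalidates
        # any stale copy of the line held in L1.
        self.l2[addr] = value
        self.l1.pop(addr, None)


l2 = {0x100: 1}
sm = ToySM(l2)
first = sm.load(0x100)    # miss: the line is brought into L1
second = sm.load(0x100)   # hit: served from L1
sm.store(0x100, 2)        # written to L2, L1 copy invalidated
third = sm.load(0x100)    # miss again, but the new value is observed
print(first, second, third)
```

Because stores never leave a line in L1, the second load after the store has to go back to L2, which is how the stale value is avoided in this model.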

2. Kepler

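The cache hierarchy is largely inherited from Fermi. The new feature relative to Fermi is a 48 KB read-only data cache,
dedicated to caching read-only data.

Everything else is the same as above.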

3. Maxwell

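This time Nvidia made a bigger change: caching of writes at the L1 level was dropped entirely, and the L1 data cache was unified with the texture cache and the other dedicated caches;
how the unified cache is used depends on the workload.

Global loads are cached in L2 only.
Local loads are cached in L2 only.
Original text from the manual: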

    Maxwell combines the functionality of the L1 and texture caches into a single unit.

    As with Kepler, global loads in Maxwell are cached in L2 only, unless using the LDG
read-only data cache mechanism introduced in Kepler.

    In a manner similar to Kepler GK110B, GM204 retains this behavior by default but also
allows applications to opt-in to caching of global loads in its unified L1/Texture cache.

    The opt-in mechanism is the same as with GK110B: pass the -Xptxas -dlcm=ca flag to
nvcc at compile time.

    Local loads also are cached in L2 only, which could increase the cost of register spilling
if L1 local load hit rates were high with Kepler. 

    The balance of occupancy versus spilling should therefore be reevaluated to ensure best performance. 
Especially given the improvements to arithmetic latencies, code built for Maxwell may benefit from
somewhat lower occupancy (due to increased registers per thread) in exchange for lower spilling.

    The unified L1/texture cache acts as a coalescing buffer for memory accesses, gathering
up the data requested by the threads of a warp prior to delivery of that data to the warp.
This function previously was served by the separate L1 cache in Fermi and Kepler.
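
The occupancy-versus-spilling trade-off in the quoted passage can be made concrete with a little arithmetic. The sketch below assumes 65,536 32-bit registers and at most 2,048 resident threads per SM, which are the figures Nvidia publishes for Maxwell. It deliberately ignores register allocation granularity, shared memory, and block-size limits, so the results are an upper bound rather than what an occupancy calculator would report. The function name is made up for this example.

```python
REGISTERS_PER_SM = 65536    # assumed: 64K 32-bit registers per Maxwell SM
MAX_THREADS_PER_SM = 2048   # assumed: 64 warps of 32 threads
WARP_SIZE = 32


def register_limited_occupancy(regs_per_thread):
    """Upper bound on occupancy when registers are the only limit."""
    # Number of whole warps whose registers fit in the register file.
    warps = REGISTERS_PER_SM // (regs_per_thread * WARP_SIZE)
    threads = min(warps * WARP_SIZE, MAX_THREADS_PER_SM)
    return threads / MAX_THREADS_PER_SM


for regs in (32, 48, 64, 128):
    print(regs, register_limited_occupancy(regs))
```

Under these assumptions, going from 32 to 64 registers per thread halves the bound. That is the kind of trade the guide suggests re-evaluating when the extra registers remove spills that would otherwise be served from L2.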

4. Pascal

Apart from a larger L2 cache, the cache architecture is inherited from the previous generation, Maxwell.
