Cache line & false sharing (notes)

风竹夜

Category: c/c++

Article link: https://blog.csdn.net/GW569453350game/article/details/79806453

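CPU caches are built from SRAM (informally, cross-coupled transistors that latch each bit), so they are fast but small. Main memory is built from DRAM (a transistor plus a capacitor per bit), so it is large but slow.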

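For example, one plausible set of order-of-magnitude latencies (without prefetching; the cycle counts correspond to a clock of roughly 2 GHz):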

L1 cache: 1ns to 2ns (2-4 cycles)
L2 cache: 3ns to 5ns (6-10 cycles)
L3 cache: 12ns to 20ns (24-40 cycles)
RAM: 60ns (120 cycles)

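However, when the CPU reads memory it does not fetch a block sized to the variable being accessed. It fetches in units of a cache line (typically 64 bytes), caching one chunk (or several chunks) of data at a time.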
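To make the "fetch by cache line, not by variable" idea concrete, here is a minimal sketch that maps a byte address to the cache line it falls in and counts how many lines an object spans. The 64-byte line size and the helper names `cache_line_of` and `lines_touched` are assumptions for illustration, not anything a real CPU exposes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed line size: 64 bytes is common on x86, but not universal.
constexpr std::size_t kCacheLineSize = 64;

// Index of the cache line that contains the byte at `addr`.
inline std::uintptr_t cache_line_of(std::uintptr_t addr) {
    return addr / kCacheLineSize;
}

// Number of cache lines touched by `size` bytes starting at `addr`.
inline std::size_t lines_touched(std::uintptr_t addr, std::size_t size) {
    if (size == 0) return 0;
    assert(size - 1 <= UINTPTR_MAX - addr);  // the byte range must not wrap around
    return cache_line_of(addr + size - 1) - cache_line_of(addr) + 1;
}
```

Under these assumptions, a 4-byte int at address 0 touches one line, while an 8-byte value starting at address 60 straddles two lines, so reading it pulls in both.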

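Therefore, to make full use of the data in each cache line and raise the cache hit rate, it may be necessary to add padding after a data structure so that it is aligned to 64 bytes (see the C++11 alignas keyword). Padding also helps avoid the false sharing problem in multithreaded code: adjacent variables sit in the same cache line, but different CPUs each load that line into their own local cache, so when one CPU writes to its variable, the other CPU's copy of the line is invalidated.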
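As a rough sketch of the padding idea, the code below contrasts two counters packed side by side with two counters that are each aligned to their own line via alignas. The 64-byte figure and the names `PackedCounters`, `PaddedCounter`, `PaddedCounters` and `run_two_writers` are assumptions made for this example. Both layouts produce the same result; the padded one just avoids the two threads invalidating each other's line.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>

constexpr std::size_t kCacheLineSize = 64;  // assumed line size

// Two counters packed together: they will usually share one cache line.
struct PackedCounters {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// alignas raises the alignment to 64, and sizeof is rounded up to match,
// so each PaddedCounter occupies a full line by itself.
struct alignas(kCacheLineSize) PaddedCounter {
    std::atomic<long> value{0};
};

struct PaddedCounters {
    PaddedCounter a;
    PaddedCounter b;
};

static_assert(sizeof(PaddedCounter) == kCacheLineSize, "one counter per line");
static_assert(alignof(PaddedCounter) == kCacheLineSize, "line-aligned");

// Two threads, each incrementing only its own counter `iters` times.
// Returns the sum of both counters after the threads finish.
inline long run_two_writers(std::atomic<long>& x, std::atomic<long>& y, long iters) {
    assert(&x != &y);  // each thread must own a distinct counter
    auto work = [iters](std::atomic<long>& c) {
        for (long i = 0; i < iters; ++i) c.fetch_add(1, std::memory_order_relaxed);
    };
    std::thread t1(work, std::ref(x));
    std::thread t2(work, std::ref(y));
    t1.join();
    t2.join();
    return x.load() + y.load();
}
```

Timing `run_two_writers` on the packed and the padded layout is one way to observe the effect, though the size of the gap depends on the machine.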

Note: each CPU core typically has private L1/L2 caches, and the cores share an L3 cache.

To ensure data consistency across multiple caches, multiprocessor-capable Intel® processors follow the MESI (Modified/Exclusive/Shared/Invalid) protocol. On first load of a cache line, the processor will mark the cache line as ‘Exclusive’ access. As long as the cache line is marked exclusive, subsequent loads are free to use the existing data in cache. If the processor sees the same cache line loaded by another processor on the bus, it marks the cache line with ‘Shared’ access. If the processor stores a cache line marked as ‘S’, the cache line is marked as ‘Modified’ and all other processors are sent an ‘Invalid’ cache line message. If the processor sees the same cache line which is now marked ‘M’ being accessed by another processor, the processor stores the cache line back to memory and marks its cache line as ‘Shared’. The other processor that is accessing the same cache line incurs a cache miss.
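The transitions described above can be sketched as a toy state machine for a single cache line. This is a simplified model for illustration only: the class `LineModel`, its methods and the write-back counter are hypothetical names, and real hardware has more cases (for example, bus transactions and evictions) than this covers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

enum class Mesi { Invalid = 0, Shared, Exclusive, Modified };

// Toy model of one cache line as seen by N processors.
template <std::size_t N>
class LineModel {
public:
    // Processor `cpu` reads the line. Returns true if this was a cache miss.
    bool load(std::size_t cpu) {
        assert(cpu < N);
        if (state_[cpu] != Mesi::Invalid) return false;  // M, E and S are all readable
        bool others_have_copy = false;
        for (std::size_t i = 0; i < N; ++i) {
            if (i == cpu || state_[i] == Mesi::Invalid) continue;
            // A Modified owner writes the line back before sharing it.
            if (state_[i] == Mesi::Modified) ++writebacks_;
            state_[i] = Mesi::Shared;
            others_have_copy = true;
        }
        state_[cpu] = others_have_copy ? Mesi::Shared : Mesi::Exclusive;
        return true;
    }

    // Processor `cpu` writes the line. Returns true if it did not hold a valid copy.
    bool store(std::size_t cpu) {
        assert(cpu < N);
        const bool miss = (state_[cpu] == Mesi::Invalid);
        for (std::size_t i = 0; i < N; ++i) {
            if (i == cpu) continue;
            if (state_[i] == Mesi::Modified) ++writebacks_;
            state_[i] = Mesi::Invalid;  // every other copy is invalidated
        }
        state_[cpu] = Mesi::Modified;
        return miss;
    }

    Mesi state(std::size_t cpu) const { return state_[cpu]; }
    std::size_t writebacks() const { return writebacks_; }

private:
    std::array<Mesi, N> state_{};  // value-initialized, so every entry starts Invalid
    std::size_t writebacks_ = 0;
};
```

With two processors, the sequence load(0), load(1), store(0), load(1) walks through Exclusive, Shared, Modified/Invalid and back to Shared, with the last load counted as a miss. That final miss is exactly what false sharing triggers over and over.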

see link:
https://stackoverflow.com/questions/8469427/how-and-when-to-align-to-cache-line-size
https://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads

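In addition, in some situations, using locks to keep data mutually exclusive between threads degrades program performance, especially when many threads access the protected variable frequently (see lock-free programming). In such cases a spinlock can be used to improve multithreaded performance.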
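For reference, a minimal test-and-set spinlock can be sketched with std::atomic_flag. The class name `SpinLock` and the helper `count_with_spinlock` are made up for this example, and yielding while waiting is just one possible back-off choice, not a tuned implementation.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Minimal test-and-set spinlock (sketch). Usable with std::lock_guard.
class SpinLock {
public:
    void lock() {
        // test_and_set returns the previous value: true means someone else holds the lock.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // simple back-off instead of a pure busy loop
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical helper: `threads` threads each add 1 to a shared counter `iters` times.
inline long count_with_spinlock(int threads, long iters) {
    assert(threads > 0 && iters >= 0);
    SpinLock lock;
    long counter = 0;  // protected by `lock`
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &counter, iters] {
            for (long i = 0; i < iters; ++i) {
                std::lock_guard<SpinLock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

A spinlock like this tends to pay off only when the critical section is very short. If the lock is held for long, the spinning threads burn CPU time that a blocking mutex would have given back.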

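Spinlocks have a fairness problem: can a thread that has just released the spinlock immediately reacquire it ahead of the threads already waiting? See this link for a performance analysis of various spinlocks: https://geidav.wordpress.com/2016/03/23/test-and-set-spinlocks/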
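One well-known way to get fairness is a ticket lock, where threads are served strictly in arrival order. This is a different technique from the plain test-and-set lock, shown here only as a sketch. The names `TicketLock` and `count_with_ticket_lock` are assumptions for this example.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Ticket lock (sketch): each thread takes a ticket and waits for its number.
class TicketLock {
public:
    void lock() {
        // Take the next ticket; unsigned wrap-around is harmless here.
        const unsigned my_ticket = next_.fetch_add(1, std::memory_order_relaxed);
        while (serving_.load(std::memory_order_acquire) != my_ticket) {
            std::this_thread::yield();
        }
    }
    void unlock() {
        // Only the lock holder writes serving_, so load-then-store is safe.
        serving_.store(serving_.load(std::memory_order_relaxed) + 1,
                       std::memory_order_release);
    }

private:
    std::atomic<unsigned> next_{0};     // next ticket to hand out
    std::atomic<unsigned> serving_{0};  // ticket currently allowed in
};

// Hypothetical helper: `threads` threads each add 1 to a shared counter `iters` times.
inline long count_with_ticket_lock(int threads, long iters) {
    assert(threads > 0 && iters >= 0);
    TicketLock lock;
    long counter = 0;  // protected by `lock`
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &counter, iters] {
            for (long i = 0; i < iters; ++i) {
                std::lock_guard<TicketLock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

A thread that releases the lock and asks for it again gets a new, later ticket, so it cannot jump ahead of threads that are already waiting.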

For lock-free algorithms, see this link:
http://www.1024cores.net/home/lock-free-algorithms/introduction

To be continued…

links:
https://software.intel.com/zh-cn/articles/avoiding-and-identifying-false-sharing-among-threads
http://igoro.com/archive/gallery-of-processor-cache-effects/
http://danluu.com/3c-conflict/
https://www.akkadia.org/drepper/cpumemory.pdf or https://lwn.net/Articles/250967/
https://en.wikipedia.org/wiki/False_sharing
https://stackoverflow.com/questions/3928995/how-do-cache-lines-work
https://stackoverflow.com/questions/9826274/how-many-bytes-the-cache-controller-fetches-a-time-from-main-memory-to-l2-cache
