Cache line & false sharing (notes)

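CPU caches are built from SRAM (loosely speaking, cross-coupled transistors), so they are fast but small. Main memory is built from DRAM (a transistor plus a capacitor), so it is large but slow.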

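For example, one plausible set of order-of-magnitude access times is listed below (without prefetching; the cycle counts correspond to a 2 GHz clock):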

L1 cache: 1ns to 2ns (2-4 cycles)
L2 cache: 3ns to 5ns (6-10 cycles)
L3 cache: 12ns to 20ns (24-40 cycles)
RAM: 60ns (120 cycles)

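However, when the CPU reads memory, it does not fetch a block sized to the variable being read. It fetches in units of a cache line (64 bytes), caching one chunk (or several chunks) of data at a time.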
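As a rough illustration of the granularity described above, the sketch below maps byte addresses to cache line indices. It assumes the 64-byte line size quoted in the text (real hardware may differ), and the helper names cache_line_of, lines_touched and same_cache_line are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, as quoted in the text. Real hardware may differ.
constexpr std::size_t kCacheLineSize = 64;

// Index of the cache line that contains a given byte address (hypothetical helper).
constexpr std::uintptr_t cache_line_of(std::uintptr_t addr) {
    return addr / kCacheLineSize;
}

// Number of distinct cache lines covered by `size` bytes starting at `addr`.
constexpr std::size_t lines_touched(std::uintptr_t addr, std::size_t size) {
    return size == 0
        ? 0
        : static_cast<std::size_t>(cache_line_of(addr + size - 1) - cache_line_of(addr)) + 1;
}

// True when the first bytes of two objects fall in the same cache line.
inline bool same_cache_line(const void* a, const void* b) {
    return cache_line_of(reinterpret_cast<std::uintptr_t>(a)) ==
           cache_line_of(reinterpret_cast<std::uintptr_t>(b));
}
```

Under this model, reading a 4-byte int at address 0 still brings in one whole 64-byte line, while a 4-byte read starting at address 62 straddles two lines.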

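So, to make full use of the data in each cache line and raise the cache hit rate, you may need to add padding after a data structure so that it is aligned to 64 bytes (see the C++11 alignas keyword), and you need to avoid false sharing between threads. False sharing occurs when adjacent variables sit in the same cache line but are loaded by different CPUs into their own local caches; when one CPU performs a write, the other CPU's copy of the line is invalidated.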
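A minimal sketch of the padding idea, assuming 64-byte cache lines as in the text. The type names PackedCounters, PaddedCounter and PaddedCounters and the helper hammer are hypothetical, chosen only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>

// Assumption: 64-byte cache lines, as quoted in the text.
constexpr std::size_t kCacheLineSize = 64;

// Two counters packed side by side: they will usually land in the same cache line.
struct PackedCounters {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// alignas rounds both the alignment and the size of the type up to one cache line.
struct alignas(kCacheLineSize) PaddedCounter {
    std::atomic<long> value{0};
};

// Each counter now occupies a cache line of its own.
struct PaddedCounters {
    PaddedCounter a;
    PaddedCounter b;
};

static_assert(sizeof(PackedCounters) <= kCacheLineSize, "packed counters fit in one line");
static_assert(sizeof(PaddedCounter) == kCacheLineSize, "one counter per line");
static_assert(sizeof(PaddedCounters) == 2 * kCacheLineSize, "two lines in total");

// Increment x from one thread and y from another, n times each.
inline void hammer(std::atomic<long>& x, std::atomic<long>& y, long n) {
    auto work = [n](std::atomic<long>& c) {
        for (long i = 0; i < n; ++i) c.fetch_add(1, std::memory_order_relaxed);
    };
    std::thread t1(work, std::ref(x));
    std::thread t2(work, std::ref(y));
    t1.join();
    t2.join();
}
```

Calling hammer(packed.a, packed.b, n) and hammer(padded.a.value, padded.b.value, n) produces the same counts. Timing the two calls (for example with std::chrono) would be expected to favour the padded layout on most multi-core machines, but the size of the gap depends on the hardware and is not something this sketch measures.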

Note: on a typical multi-core CPU, each core has private L1/L2 caches, and all cores share the L3 cache.

To ensure data consistency across multiple caches, multiprocessor-capable Intel® processors follow the MESI (Modified/Exclusive/Shared/Invalid) protocol. On first load of a cache line, the processor will mark the cache line as ‘Exclusive’ access. As long as the cache line is marked exclusive, subsequent loads are free to use the existing data in cache. If the processor sees the same cache line loaded by another processor on the bus, it marks the cache line with ‘Shared’ access. If the processor stores a cache line marked as ‘S’, the cache line is marked as ‘Modified’ and all other processors are sent an ‘Invalid’ cache line message. If the processor sees the same cache line which is now marked ‘M’ being accessed by another processor, the processor stores the cache line back to memory and marks its cache line as ‘Shared’. The other processor that is accessing the same cache line incurs a cache miss.
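The transitions described in the paragraph above can be sketched as a toy state machine. This is a simplified model of a single cache line, not how any real processor implements MESI; the names Mesi and LineModel and the miss and write-back counters are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

enum class Mesi { Invalid, Shared, Exclusive, Modified };

// Hypothetical toy model: the state of ONE cache line in each of N cores' caches.
template <std::size_t N>
struct LineModel {
    std::array<Mesi, N> state{};  // value-initialised, so every entry starts Invalid
    int misses = 0;               // loads or stores that found the line Invalid
    int writebacks = 0;           // Modified lines flushed back to memory

    void load(std::size_t core) {
        if (state[core] != Mesi::Invalid) return;  // hit: reuse the cached data
        ++misses;
        bool others_have = false;
        for (std::size_t i = 0; i < N; ++i) {
            if (i == core || state[i] == Mesi::Invalid) continue;
            if (state[i] == Mesi::Modified) ++writebacks;  // owner stores the line back
            state[i] = Mesi::Shared;
            others_have = true;
        }
        state[core] = others_have ? Mesi::Shared : Mesi::Exclusive;
    }

    void store(std::size_t core) {
        if (state[core] == Mesi::Invalid) ++misses;  // must fetch the line first
        for (std::size_t i = 0; i < N; ++i) {
            if (i == core) continue;
            if (state[i] == Mesi::Modified) ++writebacks;
            state[i] = Mesi::Invalid;  // every other copy is invalidated
        }
        state[core] = Mesi::Modified;
    }
};
```

With two cores, the sequence load(0), load(1), store(0), load(1) walks through Exclusive, Shared, Modified/Invalid and back to Shared, which is the pattern that repeats on every write when two threads falsely share a line.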

See these links:
https://stackoverflow.com/questions/8469427/how-and-when-to-align-to-cache-line-size
https://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads

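In addition, in some situations, using locks to enforce mutual exclusion between threads degrades performance, especially when many threads frequently access the protected variable (cf. lock-free programming). In such cases, a spinlock can be used to improve multithreaded performance, provided the critical section is short.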
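A minimal test-and-set spinlock, sketched with std::atomic_flag from the C++11 standard library. The class name SpinLock and the helper count_with_spinlock are hypothetical; this is one simple way to build such a lock, not a tuned implementation.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical minimal spinlock: busy-waits instead of putting the thread to sleep.
class SpinLock {
public:
    void lock() {
        // test_and_set returns the previous value; keep spinning while it was already set.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // busy-wait
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Increment a plain (non-atomic) counter from several threads under the lock.
inline long count_with_spinlock(int threads, int iters) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;  // short critical section
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Because waiting threads burn CPU time while spinning, a lock like this only tends to pay off when the protected section is a handful of instructions, as it is here.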

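Spinlocks have a fairness problem: a thread that has just released the spinlock may immediately reacquire it ahead of threads that have been waiting longer. See this link for a performance analysis of various spinlocks: https://geidav.wordpress.com/2016/03/23/test-and-set-spinlocks/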
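One common way to make a spinlock fair is a ticket lock, which serves threads in arrival order. The sketch below is an illustration of that general technique and is not taken from the linked article; the names TicketLock and count_with_ticket_lock are hypothetical, and the yield inside the wait loop is a choice made here so the example behaves reasonably on machines with few cores.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical ticket lock: each thread takes a ticket and waits for its number.
class TicketLock {
public:
    void lock() {
        const unsigned my_ticket = next_.fetch_add(1, std::memory_order_relaxed);
        while (serving_.load(std::memory_order_acquire) != my_ticket) {
            std::this_thread::yield();  // let the current holder run if cores are scarce
        }
    }
    void unlock() {
        // Only the lock holder writes serving_, so a relaxed load followed by a store is enough.
        serving_.store(serving_.load(std::memory_order_relaxed) + 1,
                       std::memory_order_release);
    }

private:
    std::atomic<unsigned> next_{0};     // next ticket to hand out
    std::atomic<unsigned> serving_{0};  // ticket currently allowed to enter
};

// Increment a plain counter from several threads under the ticket lock.
inline long count_with_ticket_lock(int threads, int iters) {
    TicketLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

A thread that releases the lock and asks for it again receives a later ticket than every thread already waiting, so it cannot jump the queue. The price is that all waiters poll the same serving_ variable, which generates cache line traffic of the kind discussed earlier.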

For lock-free algorithms, see this link:
http://www.1024cores.net/home/lock-free-algorithms/introduction

To be continued…

links:
https://software.intel.com/zh-cn/articles/avoiding-and-identifying-false-sharing-among-threads
http://igoro.com/archive/gallery-of-processor-cache-effects/
http://danluu.com/3c-conflict/
https://www.akkadia.org/drepper/cpumemory.pdf or https://lwn.net/Articles/250967/
https://en.wikipedia.org/wiki/False_sharing
https://stackoverflow.com/questions/3928995/how-do-cache-lines-work
https://stackoverflow.com/questions/9826274/how-many-bytes-the-cache-controller-fetches-a-time-from-main-memory-to-l2-cache
