False Sharing: Impact and Solutions

Reposted 2011-01-19 10:25:00

What is false sharing:

False sharing is a well-known performance issue on SMP systems, where each processor has a local cache. It occurs when threads on different processors modify variables that reside on the same cache line, as illustrated in Figure 1. This circumstance is called "false sharing" because each thread is not actually sharing access to the same variable. Access to the same variable, or true sharing, would require programmatic synchronization constructs to ensure ordered data access.
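The situation described above can be reproduced with a small sketch. The names `PackedCounters`, `PaddedCounters`, `hammer`, and `kCacheLineSize` are made up for illustration, and the 64-byte line size is an assumption about the target CPU. Each thread touches only its own counter, so there is no true sharing; the only difference between the two structs is whether the counters can land on the same cache line.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Assumed line size; 64 bytes is typical for current x86 CPUs.
constexpr std::size_t kCacheLineSize = 64;

// Two counters laid out back to back, so they very likely share a cache line.
struct PackedCounters {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// The same two counters, each aligned to the start of its own cache line.
struct PaddedCounters {
    alignas(kCacheLineSize) std::atomic<long> a{0};
    alignas(kCacheLineSize) std::atomic<long> b{0};
};

// Runs two threads; thread 1 increments only c.a, thread 2 only c.b.
template <typename Counters>
void hammer(Counters& c, long iterations) {
    assert(iterations >= 0);
    std::thread t1([&c, iterations] {
        for (long i = 0; i < iterations; ++i)
            c.a.fetch_add(1, std::memory_order_relaxed);
    });
    std::thread t2([&c, iterations] {
        for (long i = 0; i < iterations; ++i)
            c.b.fetch_add(1, std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
}
```

Both layouts produce the same counts. On a multi-core machine, timing `hammer` on a `PackedCounters` versus a `PaddedCounters` with a large iteration count would typically show the padded version running faster, though the size of the gap depends on the hardware.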

 

Where to look for false sharing:

  • objects nearby in the same array, as in Example 1 above;
  • fields nearby in the same object, as in Example 4 of [3] where the head and tail pointers into the message queue had to be kept apart;
  • objects allocated close together in time (C++, Java) or by the same thread (C#, Java), as in Example 4 of [3] where the underlying list nodes had to be kept apart to eliminate contention when threads used adjacent or head/tail nodes;
  • static or global objects that the linker decided to lay out close together in memory;
  • objects that become close in memory dynamically, as when during compacting garbage collection two objects can become adjacent in memory because intervening objects became garbage and were collected; or
  • objects that for some other reason accidentally end up close together in memory.

How to fix it:

First, we can reduce the number of writes to the cache line. For example, writer threads can write intermediate results to a scratch variable most of the time, then update the variable in the popular cache line only occasionally as needed.
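A minimal sketch of this first approach, under stated assumptions: `accumulate` and its `flush_every` parameter are hypothetical names, and the shared variable is modeled as a `std::atomic<long>`. The running sum is kept in a local scratch variable, and the popular cache line is written only once per `flush_every` elements, plus once at the end.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical worker: adds every element of `data` into `shared_total`.
// Intermediate results go to a local scratch variable; the shared
// variable is written only once every `flush_every` elements.
void accumulate(const std::vector<int>& data,
                std::atomic<long>& shared_total,
                std::size_t flush_every) {
    assert(flush_every > 0);
    long scratch = 0;          // thread-private, not on the contended line
    std::size_t pending = 0;   // elements added to scratch since last flush
    for (int v : data) {
        scratch += v;
        if (++pending == flush_every) {
            shared_total.fetch_add(scratch, std::memory_order_relaxed);
            scratch = 0;
            pending = 0;
        }
    }
    if (pending != 0) {        // publish whatever is left over
        shared_total.fetch_add(scratch, std::memory_order_relaxed);
    }
}
```

A larger `flush_every` means fewer writes to the shared line, at the cost of other threads seeing the total less often.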

 

 

Second, we can separate the variables so that they aren't on the same cache line. Typically the easiest way to do this is to ensure an object has a cache line to itself that it doesn't share with any other data. To achieve that, you need to do two things:

 

  • Ensure that no other object can precede your data in the same cache line by aligning it to begin at the start of the cache line or adding sufficient padding bytes before the object.
  • Ensure that no other object can follow your data in the same cache line by adding sufficient padding bytes after the object to fill up the line.

You can wrap a user-defined type T in a small wrapper and reuse it.

C++ example:

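// C++11 (alignas replaces the C++0x draft [[ align(...) ]] attribute,
// which was never standardized).
// CACHE_LINE_SIZE must be defined elsewhere as a compile-time constant.
template<typename T>
struct cache_line_storage {
    // Start data on a cache-line boundary so nothing precedes it in the line.
    alignas(CACHE_LINE_SIZE) T data;
    // Fill the rest of the line so nothing follows data in the same line.
    char pad[ CACHE_LINE_SIZE > sizeof(T)
                  ? CACHE_LINE_SIZE - sizeof(T)
                  : 1 ];
};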
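As a usage sketch for the wrapper above: a table of per-thread counters in which thread i writes only its own slot. This is an illustration, not part of the original article. `padded`, `counter_table`, `starts_cache_line`, and `kCacheLineSize` are hypothetical names, and the 64-byte line size is an assumption. The `static_assert`s spell out the layout the wrapper is meant to provide for a small T.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed line size; the original leaves CACHE_LINE_SIZE undefined.
constexpr std::size_t kCacheLineSize = 64;

// Same shape as the wrapper above, written with C++11 alignas.
template <typename T>
struct padded {
    alignas(kCacheLineSize) T data;
    char pad[kCacheLineSize > sizeof(T) ? kCacheLineSize - sizeof(T) : 1];
};

// Layout the wrapper is meant to provide when T is smaller than a line.
static_assert(alignof(padded<long>) == kCacheLineSize,
              "object starts on a line boundary");
static_assert(sizeof(padded<long>) == kCacheLineSize,
              "object fills exactly one line");

// Hypothetical per-thread counter table: thread i writes only slots[i].data,
// so no two threads write to the same cache line.
template <std::size_t N>
struct counter_table {
    padded<long> slots[N];

    // Sums all slots; intended to be called after the writers have finished.
    long total() const {
        long sum = 0;
        for (std::size_t i = 0; i < N; ++i) sum += slots[i].data;
        return sum;
    }
};

// True when p sits at the start of an (assumed 64-byte) cache line.
inline bool starts_cache_line(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kCacheLineSize == 0;
}
```

The trade-off is memory: each `long` now occupies a full line, which is usually acceptable for a handful of hot counters but not for large arrays.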



C# example:

// C#: Note works for value types only
[StructLayout(LayoutKind.Explicit, Size=2*CACHE_LINE_SIZE)]
public struct CacheLineStorage<T>
    where T : struct
{
    [FieldOffset(CACHE_LINE_SIZE)] public T data;
}



Reference types in C# and Java are a bit more complicated:

For Java and .NET full-fledged objects (reference types), the solution is basically the same as for .NET value types, but more intrusive: You need to add the before-and-after padding internally inside the object itself because there is no portable way to add external padding directly adjacent to an object.



References:

http://www.drdobbs.com/high-performance-computing/217500206;jsessionid=GE4YIOKZDGNOLQE1GHPCKH4ATMY32JVN?pgno=4

http://www.drdobbs.com/223100705;jsessionid=GE4YIOKZDGNOLQE1GHPCKH4ATMY32JVN?queryText=false+share



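Notes:

Cache lines

CPU caches are generally split into a level 1 (L1) and a level 2 (L2) cache. At run time the CPU first tries to read data from L1; on a miss it tries L2, and if that also misses it reads from main memory. The number of clock cycles needed differs enormously depending on whether the data finally comes from L1, L2, or main memory, so cache capacity and speed directly affect CPU performance. The L1 cache is built into the CPU and runs at the CPU's clock speed, which effectively improves CPU efficiency. The larger the L1 cache, the higher the CPU's efficiency tends to be.

The L1 cache is further split into a data cache and an instruction cache, both made up of cache lines. On older x86 CPUs a cache line was typically 32 bytes (current x86 CPUs typically use 64-byte lines). Early CPUs had only about 512 cache lines, i.e. roughly 16 KB of L1 cache, whereas today's CPUs generally have 32 KB or more.

When the CPU needs to read a variable, the whole line-sized block of memory containing that variable (32 bytes with the line size assumed here) is loaded into a cache line along with it. For performance-critical programs it is therefore very important to make good use of cache lines. On today's common dual-core and multi-core CPUs, however, this same property has to be handled with care.

In general, programmers writing ordinary applications do not need to think about cache lines at all. But for special applications such as video surveillance, especially when trying to get the full performance benefit of a dual-core CPU, these issues must be taken seriously: avoid the performance bottleneck of thread synchronization while still exploiting multiple cores and threads.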
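The grouping of memory into line-sized blocks is plain integer arithmetic, sketched below. `line_index` and `same_line` are hypothetical helper names, and the addresses in the comments are made-up values chosen to show how the result changes with the line size.

```cpp
#include <cassert>
#include <cstdint>

// Index of the cache line that contains byte address `addr`,
// for a cache with `line_size`-byte lines.
inline std::uint64_t line_index(std::uint64_t addr, std::uint64_t line_size) {
    assert(line_size != 0);
    return addr / line_size;
}

// True when both addresses fall into the same line, i.e. touching one
// of them brings the other into the cache as well.
inline bool same_line(std::uint64_t a, std::uint64_t b,
                      std::uint64_t line_size) {
    return line_index(a, line_size) == line_index(b, line_size);
}

// With 32-byte lines:
//   0x1004 and 0x101C -> both in line 128 (bytes 0x1000..0x101F)
//   0x101C and 0x1020 -> lines 128 and 129, so different lines
// With 64-byte lines:
//   0x101C and 0x1020 -> both in line 64 (bytes 0x1000..0x103F)
```

Two variables that are safely apart with 32-byte lines can therefore still collide on a CPU with 64-byte lines, which is why the line size should be treated as a property of the target machine.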








