
False sharing: its impact and how to fix it


What is false sharing:

False sharing is a well-known performance issue on SMP systems, where each processor has a local cache. It occurs when threads on different processors modify variables that reside on the same cache line, as illustrated in Figure 1. This circumstance is called "false sharing" because each thread is not actually sharing access to the same variable. Access to the same variable, or true sharing, would require programmatic synchronization constructs to ensure ordered data access.

 

Where to look for false sharing:

  • objects nearby in the same array, as in Example 1 above;
  • fields nearby in the same object, as in Example 4 of [3] where the head and tail pointers into the message queue had to be kept apart;
  • objects allocated close together in time (C++, Java) or by the same thread (C#, Java), as in Example 4 of [3] where the underlying list nodes had to be kept apart to eliminate contention when threads used adjacent or head/tail nodes;
  • static or global objects that the linker decided to lay out close together in memory;
  • objects that become close in memory dynamically, as when during compacting garbage collection two objects can become adjacent in memory because intervening objects became garbage and were collected; or
  • objects that for some other reason accidentally end up close together in memory.

How to fix it:

First, we can reduce the number of writes to the cache line. For example, writer threads can write intermediate results to a scratch variable most of the time, then update the variable in the popular cache line only occasionally as needed.


Second, we can separate the variables so that they aren't on the same cache line. Typically the easiest way to do this is to ensure an object has a cache line to itself that it doesn't share with any other data. To achieve that, you need to do two things:

 

  • Ensure that no other object can precede your data in the same cache line by aligning it to begin at the start of the cache line or adding sufficient padding bytes before the object.
  • Ensure that no other object can follow your data in the same cache line by adding sufficient padding bytes after the object to fill up the line.

You can wrap the user's custom type T in a wrapper and reuse it.

A C++ example:

// C++ (C++11 alignas; the article predates it and used the
// proposed C++0x [[ align(...) ]] attribute syntax, which was
// never standardized)
template<typename T>
struct cache_line_storage {
   alignas(CACHE_LINE_SIZE) T data;
   char pad[ CACHE_LINE_SIZE > sizeof(T)
             ? CACHE_LINE_SIZE - sizeof(T)
             : 1 ];
};



A C# example:

// C#: Note -- works for value types only
[StructLayout(LayoutKind.Explicit, Size=2*CACHE_LINE_SIZE)]
public struct CacheLineStorage<T>
   where T : struct
{
   [FieldOffset(CACHE_LINE_SIZE)] public T data;
}



C# and Java reference types are a bit more complicated:

For Java and .NET full-fledged objects (reference types), the solution is basically the same as for .NET value types, but more intrusive: You need to add the before-and-after padding internally inside the object itself because there is no portable way to add external padding directly adjacent to an object.



References:

http://www.drdobbs.com/high-performance-computing/217500206;jsessionid=GE4YIOKZDGNOLQE1GHPCKH4ATMY32JVN?pgno=4

http://www.drdobbs.com/223100705;jsessionid=GE4YIOKZDGNOLQE1GHPCKH4ATMY32JVN?queryText=false+share



Notes:

Cache lines

A CPU's cache is generally split into an L1 and an L2 cache. At runtime the CPU reads data from the L1 cache first; on a miss it falls back to the L2 cache, and if that also misses it finally reads from main memory. The difference in clock cycles between a read satisfied by L1, by L2, and by main memory is enormous, so cache capacity and speed directly affect CPU performance. The L1 cache is built into the CPU itself and runs at the CPU's own speed, which effectively improves execution efficiency; the larger the L1 cache, the higher the CPU's efficiency tends to be.

The L1 cache is further divided into a data cache and an instruction cache, both made up of cache lines. On the x86 CPUs of that era a cache line was typically 32 bytes (modern x86 CPUs use 64-byte lines); early CPUs had only about 512 cache lines, i.e. roughly a 16 KB L1 cache, whereas current CPUs generally have 32 KB or more.

When the CPU needs to read a variable, the whole 32-byte-aligned block of memory containing it is loaded into a cache line together. For performance-critical programs it is therefore important to exploit cache lines fully; but on today's popular dual-core and multi-core CPUs, this very property must be handled with care.

Overall, for ordinary applications programmers need not think about cache lines at all. But for special applications such as video surveillance, and especially if you want to fully exploit a dual-core CPU, you must take these issues seriously: avoid the performance bottleneck of multithreaded synchronization while still reaping the benefits of multiple cores and threads.








