Java Memory Model FAQ

1. What is a memory model, anyway?

Source: http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html

In multiprocessor systems, processors generally have one or more levels of memory cache, which improves performance both by speeding access to data (because the data is closer to the processor) and by reducing traffic on the shared memory bus (because many memory operations can be satisfied by local caches). Caches can improve performance tremendously, but they present a host of new challenges. For example, what happens when two processors examine the same memory location at the same time? Under what conditions will they see the same value?

At the processor level, a memory model defines necessary and sufficient conditions for knowing that writes to memory by other processors are visible to the current processor, and that writes by the current processor are visible to other processors. Some processors exhibit a strong memory model, where all processors see exactly the same value for any given memory location at all times. Other processors exhibit a weaker memory model, where special instructions, called memory barriers, are required to flush or invalidate the local processor cache, so that the current processor can see writes made by other processors or make its own writes visible to them. These memory barriers are usually performed as part of lock and unlock actions; they are invisible to programmers in a high-level language.

It can sometimes be easier to write programs under a strong memory model, because the need for memory barriers is reduced. However, even on some of the strongest memory models, memory barriers are often necessary, and their placement is frequently counterintuitive. Recent trends in processor design have favored weaker memory models, because the relaxations they allow in cache consistency provide greater scalability across multiple processors and larger amounts of memory.

The question of when a write becomes visible to another thread is compounded by the compiler's reordering of code. For example, the compiler might decide that it is more efficient to move a write operation later in the program; as long as this code motion does not change the program's semantics, it is free to do so. If the compiler defers an operation, other threads will not see its result until it is actually performed; this mirrors the effect of caching.

Moreover, writes to memory can also be moved earlier in the program; in that case, other threads might see a write before it actually "occurs" in program order. All of this flexibility is by design: by giving the compiler, runtime, or hardware the freedom to execute operations in the optimal order, within the bounds of the memory model, we can achieve higher performance.

A simple example of this can be seen in the following code:

class Reordering {
  int x = 0, y = 0;
  public void writer() {
    x = 1;
    y = 2;
  }

  public void reader() {
    int r1 = y;
    int r2 = x;
  }
}

Let's say that this code is executed in two threads concurrently, and the read of y sees the value 2. Because that write came after the write to x, the programmer might assume that the read of x must then see the value 1. However, the writes may have been reordered: the write to y could happen first, the reads of both variables could follow, and the write to x could happen last. The result would be that r1 has the value 2, but r2 has the value 0.


The Java Memory Model describes what behaviors are legal in multithreaded code and how threads may interact through memory. It describes the relationship between variables in a program and the low-level details of storing and retrieving them to and from memory or registers in a real computer system, and it does so in a way that can be implemented correctly across a wide variety of hardware and compiler optimizations.

Java includes several language constructs, including volatile, final, and synchronized, which are intended to help the programmer describe a program's concurrency requirements to the compiler. The Java Memory Model defines the behavior of volatile and synchronized and, more importantly, ensures that a correctly synchronized Java program runs correctly on all processor architectures.
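As a rough illustration (a hypothetical class, not from the original FAQ), the three constructs might appear together like this:

class SharedState {
  final int id;              // set once in the constructor; safely published
  volatile boolean ready;    // every read sees the most recent write
  private int counter;       // protected by the monitor below

  SharedState(int id) { this.id = id; }

  synchronized void increment() {  // mutual exclusion plus visibility
    counter++;
  }

  synchronized int current() {
    return counter;
  }
}

How each of these constructs behaves under the memory model is covered in detail in the sections below.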

2. Do other languages, like C++, have a memory model?

Most other programming languages, such as C and C++, were not designed with direct support for multithreading. The protection these languages offer against the kinds of reordering that takes place in compilers and processor architectures depends heavily on the guarantees provided by the threading library used (such as pthreads), the compiler used, and the platform on which the code runs.

3. What is JSR 133 about?

Since 1997, several serious flaws have been discovered in the Java Memory Model as defined in Chapter 17 of the Java Language Specification. These flaws allowed for confusing behaviors (such as final fields being observed to change their value) and undermined the compiler's ability to perform common optimizations.

The Java Memory Model was an ambitious undertaking; it was the first time a programming language specification attempted to incorporate a memory model that could provide consistent semantics for concurrency across a variety of processor architectures. Unfortunately, defining a memory model that is both consistent and intuitive proved far harder than expected. JSR 133 defines a new memory model for the Java language that fixes the flaws of the earlier one. In order to do this, the semantics of final and volatile needed to change.

The full semantics are available at http://www.cs.umd.edu/users/pugh/java/memoryModel, but the formal semantics are not for the timid. It is surprising, and sobering, to discover how complicated seemingly simple concepts like synchronization really are. Fortunately, you need not understand the details of the formal semantics; the goal of JSR 133 was to create a set of formal semantics that provides an intuitive framework for how volatile, synchronized, and final work.

The goals of JSR 133 include:

  • Preserving existing safety guarantees, like type safety, and strengthening others. For example, variable values may not be created "out of thin air": each value for a variable observed by some thread must be a value that could reasonably have been placed there by some thread.
  • The semantics of correctly synchronized programs should be as simple and intuitive as possible.
  • The semantics of incompletely or incorrectly synchronized programs should be defined so that potential security hazards are minimized.
  • Programmers should be able to reason confidently about how multithreaded programs interact with memory.
  • It should be possible to design correct, high-performance JVM implementations across a wide range of popular hardware architectures.
  • A new guarantee of initialization safety should be provided: if an object is properly constructed (meaning that references to it do not escape during construction), then all threads that see a reference to that object will also see the values set for its final fields in the constructor, without the need for synchronization.
  • There should be minimal impact on existing code.

4. What is meant by reordering?

There are a number of cases in which accesses to program variables (object instance fields, class static fields, and array elements) may appear to execute in a different order than the one specified by the program. The compiler is free to take liberties with the ordering of instructions in the name of optimization. Processors may execute instructions out of order under certain circumstances. Data may move between registers, processor caches, and main memory in a different order than the program specifies.

For example, if a thread writes to field a and then to field b, and the value of b does not depend on the value of a, then the compiler is free to reorder these operations, and the cache is free to flush b to main memory before a. There are a number of potential sources of reordering, such as the compiler, the JIT, and the cache.

The compiler, runtime, and hardware are expected to conspire to create the illusion of as-if-serial semantics, which means that in a single-threaded program, the effects of reordering must not be observable. However, reordering comes into play in incorrectly synchronized multithreaded programs, where one thread can observe the effects of other threads, and may be able to detect that variable accesses become visible to other threads in a different order than the one executed or specified in the program.

Most of the time, one thread doesn't care what another is doing. But when it does, that's what synchronization is for.

5. What was wrong with the old memory model?

There were several serious problems with the old memory model. It was difficult to understand, and therefore widely violated. For example, in many cases the old model did not allow the kinds of reordering that took place in every JVM. This confusion about the implications of the old model is what compelled the creation of JSR-133.

One widely held belief, for example, was that if final fields were used, then synchronization between threads was unnecessary to guarantee that other threads would see the value of the field. While this is a reasonable assumption and a sensible behavior, and indeed how we would want things to work, under the old memory model it was simply not true. Nothing in the old model treated final fields differently from any other field, which meant that synchronization was the only way to ensure that all threads would see the value of a final field written in the constructor. As a result, it was possible for a thread to see the default value of the field, and then at some later time see its constructed value. This means that immutable objects like String could appear to change their value, which is a disturbing prospect indeed.

The old memory model also allowed volatile writes to be reordered with non-volatile reads and writes, which was not consistent with most developers' intuitions about volatile and therefore caused confusion.

Finally, as we shall see, programmers' intuitions about what can happen when their programs are incorrectly synchronized are often mistaken. One of the goals of JSR-133 is to call attention to this fact.

6. What is meant by "incorrectly synchronized"?

Incorrectly synchronized code can mean different things to different people. When we talk about incorrectly synchronized code in the context of the Java Memory Model, we mean any code where:

  1. there is a write of a variable by one thread,
  2. there is a read of the same variable by another thread, and
  3. the write and read are not ordered by synchronization.

When these rules are violated, we say we have a data race on that variable. A program with a data race is an incorrectly synchronized program.
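For concreteness, here is a minimal sketch (a hypothetical class, not from the original FAQ) that contains a data race under this definition:

class DataRaceExample {
  int shared = 0;  // accessed by both threads with no synchronization

  void demo() throws InterruptedException {
    Thread writer = new Thread(() -> shared = 1);                  // rule 1: a write by one thread
    Thread reader = new Thread(() -> System.out.println(shared));  // rule 2: a read by another
    // rule 3: nothing orders the write and the read, so this is a data race;
    // the reader may legally print either 0 or 1.
    writer.start();
    reader.start();
    writer.join();
    reader.join();
  }
}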

7. What does synchronization do?

Synchronization has several aspects. The best understood is mutual exclusion: only one thread can hold a monitor at a time, so synchronizing on a monitor means that once one thread enters a synchronized block protected by that monitor, no other thread can enter a block protected by the same monitor until the first thread exits its block.

But there is more to synchronization than mutual exclusion. Synchronization ensures that memory writes made by a thread before or during a synchronized block are made visible, in a predictable manner, to other threads that synchronize on the same monitor. When we exit a synchronized block, we release the monitor, which has the effect of flushing the cache to main memory, so that writes made by this thread become visible to other threads. Before we can enter a synchronized block, we acquire the monitor, which has the effect of invalidating the local processor cache so that variables are reloaded from main memory; we are then able to see all of the writes made visible by the previous release.
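A minimal sketch of this release/acquire pairing (hypothetical code, not from the original FAQ) might look like this:

class SyncVisibility {
  private final Object lock = new Object();
  private int data;            // only touched while holding lock
  private boolean published;

  void produce() {
    synchronized (lock) {      // acquire
      data = 42;
      published = true;
    }                          // release: the writes above become visible
  }

  int consume() {
    synchronized (lock) {      // acquire on the SAME monitor
      return published ? data : -1;  // guaranteed to see data == 42 if published is true
    }
  }
}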

Discussing this in terms of caches may make it sound as if these issues affect only multiprocessor machines, but the reordering effects can easily be seen on a single processor. It is not possible, for example, for the compiler to move your code before an acquire or after a release. When we say that acquires and releases act on caches, we are using shorthand for a range of possible effects.

The new memory model semantics create a partial ordering over memory operations (read field, write field, lock, unlock) and other thread operations (start and join), in which some operations are said to happen before others. When one operation happens before another, the first is guaranteed to be ordered before, and visible to, the second. The rules of this ordering are as follows (a small start/join sketch follows the list):

  • Each action in a thread happens before every action in that thread that comes later in program order.
  • An unlock on a monitor happens before every subsequent lock on that same monitor.
  • A write to a volatile field happens before every subsequent read of that same volatile field.
  • A call to start() on a thread happens before any action in the started thread.
  • All actions in a thread happen before any other thread successfully returns from a join() on that thread. (That is, if another thread waits for this thread via join(), then everything this thread did happens before everything the joining thread does after join() returns.)
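For example (a hypothetical sketch, not from the original FAQ), the last two rules are what make this hand-off safe without any locks or volatile fields:

class StartJoinExample {
  static int result;  // neither volatile nor lock-protected

  public static void main(String[] args) throws InterruptedException {
    Thread worker = new Thread(() -> result = compute());
    // start() happens-before every action in worker...
    worker.start();
    worker.join();
    // ...and every action in worker happens-before join() returning,
    // so this read is guaranteed to see the worker's write.
    System.out.println(result);
  }

  static int compute() { return 42; }
}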

This means that any memory operations which were visible to a thread before it exits a synchronized block are visible to any other thread after it enters a synchronized block protected by the same monitor, since all those memory operations happen before the release, and the release happens before the acquire.

Another implication is that the following pattern, which some people use to try to force a memory barrier, does not work:

synchronized (new Object()) {}

This is actually a no-op, and your compiler can remove it entirely, because the compiler knows that no other thread will ever synchronize on the same monitor. You have to set up a happens-before relationship for one thread to see the results of another.

Important note: it is important that both threads synchronize on the same monitor in order to set up the happens-before relationship properly. It is not the case that everything visible to thread A when it synchronizes on object X becomes visible to thread B after it synchronizes on object Y. The release and the acquire have to "match" (that is, be performed on the same monitor) to have the right semantics; otherwise, the code has a data race.
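As a concrete (hypothetical) illustration of this note, the following sketch is still incorrectly synchronized even though every access happens inside a synchronized block:

class MismatchedLocks {
  private final Object lockA = new Object();
  private final Object lockB = new Object();
  private int value;

  void writeWithA() {
    synchronized (lockA) {   // release on lockA...
      value = 7;
    }
  }

  int readWithB() {
    synchronized (lockB) {   // ...does not match an acquire on lockB,
      return value;          // so this read still races with writeWithA()
    }
  }
}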

8. How can final fields appear to change their values?

One of the best examples of how final fields' values can appear to change involves one particular implementation of the String class.

A String can be implemented as an object with three fields: a character array, an offset into that array, and a length. The rationale for implementing String this way, instead of having only the character array, is that it lets multiple String and StringBuffer objects share the same character array and avoid additional object allocation and copying. So, for example, String.substring() can be implemented by creating a new String that shares the original's character array and merely differs in its length and offset fields. For a String, these fields are all final.

String s1 = "/usr/tmp";
String s2 = s1.substring(4); 

The string s2 will have an offset of 4 and a length of 4. But under the old memory model, it was possible for another thread to see the offset with its default value of 0, and then later see the correct value of 4; it would appear as if the string "/usr" changed to "/tmp".

The original Java Memory Model allowed this behavior, and several JVMs exhibited it. The new Java Memory Model makes it illegal.

9. How do final fields work under the new JMM?

The values for an object's final fields are set in its constructor. Assuming the object is constructed "correctly", then once it is constructed, the values assigned to its final fields in the constructor are visible to all other threads without synchronization. In addition, the visible values for any other object or array referenced by those final fields will be at least as up to date as the final fields themselves.

What does it mean for an object to be properly constructed? It simply means that no reference to the object being constructed is allowed to "escape" during construction (see Safe Construction Techniques for examples). In other words, do not place a reference to the object being constructed anywhere where another thread might be able to see it: do not assign it to a static field, do not register it as a listener with any other object, and so on. These tasks should be done after the constructor completes, not inside it.

class FinalFieldExample {
  final int x;
  int y;
  static FinalFieldExample f;
  public FinalFieldExample() {
    x = 3;
    y = 4;
  }

  static void writer() {
    f = new FinalFieldExample();
  }

  static void reader() {
    if (f != null) {
      int i = f.x;
      int j = f.y;
    }
  }
}

The class above shows how final fields should be used. A thread executing reader is guaranteed to see the value 3 for f.x, because x is final. It is not guaranteed to see the value 4 for f.y, because y is not final. If FinalFieldExample's constructor looked like this:

public FinalFieldExample() { // bad!
  x = 3;
  y = 4;
  // bad construction - allowing this to escape
  global.obj = this;
}

then threads that read the reference to this from global.obj are not guaranteed to see the value 3 for x.

Being able to see the correctly constructed value for a field is nice, but if the field itself is a reference, then you also want your code to see the up-to-date values for the object (or array) it points to. If your field is final, this is also guaranteed. So you can have a final pointer to an array and not have to worry about other threads seeing the correct value for the array reference but stale values for its contents. Again, "correct" here means "up to date as of the end of the object's constructor", not "the latest value available".
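A small sketch of this guarantee for a final reference to an array (hypothetical code, not from the original FAQ):

class FinalArrayExample {
  final int[] table;           // final reference to a mutable array
  static FinalArrayExample instance;

  FinalArrayExample() {
    table = new int[4];
    table[0] = 99;             // written before the constructor finishes
  }

  static void writer() {
    instance = new FinalArrayExample();
  }

  static void reader() {
    FinalArrayExample e = instance;
    if (e != null) {
      // Guaranteed to see both the array reference and table[0] == 99,
      // because both were set before the end of the constructor.
      System.out.println(e.table[0]);
    }
  }
}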

Having said all of this, if, after one thread constructs an immutable object (that is, an object containing only final fields), you want to ensure that it is seen correctly by all other threads, you still typically need to use synchronization: there is no other way to ensure, for example, that the reference to the immutable object itself will be seen by a second thread. The guarantees a program gets from final fields should be tempered with a deep and careful understanding of how concurrency is managed in your code.

There is no defined behavior if you use JNI to change final fields.

10. What does volatile do?

Volatile fields are special fields used for communicating state between threads. Each read of a volatile field will see the last write to that field by any thread; in effect, the programmer designates them as fields for which it is never acceptable to see a "stale" value as a result of caching or reordering. The compiler and runtime are prohibited from allocating them in registers. They must also ensure that after a volatile field is written, it is flushed out of the cache to main memory, so that it immediately becomes visible to other threads. Similarly, before a volatile field is read, the cache must be invalidated so that the value seen is the one in main memory, not the one in the local processor cache. There are also additional restrictions on reordering accesses to volatile variables.

Under the old memory model, accesses to volatile variables could not be reordered with each other, but they could be reordered with accesses to non-volatile variables. This undermined the usefulness of volatile fields as a means of signaling conditions from one thread to another.

Under the new memory model, volatile variables still cannot be reordered with each other. The difference is that it is now no longer so easy to reorder normal field accesses around them. Writing to a volatile field has the same memory effect as a monitor release, and reading a volatile field has the same memory effect as a monitor acquire. In effect, because the new model places stricter constraints on reordering volatile field accesses with respect to other field accesses, volatile or not, anything that was visible to thread A when it writes to volatile field f becomes visible to thread B when thread B reads f.

Here is a simple example of how volatile fields can be used:

class VolatileExample {
  int x = 0;
  volatile boolean v = false;
  public void writer() {
    x = 42;
    v = true;
  }

  public void reader() {
    if (v == true) {
      //uses x - guaranteed to see 42.
    }
  }
}

Assume that one thread is calling writer and another is calling reader. The write to v in writer releases the write to x to memory, and the read of v in reader acquires that value from memory. Thus, if the reader sees the value true for v, it is also guaranteed to see the write of 42 that happened before it. This would not have been true under the old memory model: if v were not volatile, the compiler could reorder the writes in writer, and reader's read of x might see 0.

Effectively, the semantics of volatile have been strengthened substantially, almost to the level of synchronization. For visibility purposes, each read or write of a volatile field acts like "half" a synchronization.

Important note: it is important that both threads access the same volatile variable in order to set up the happens-before relationship properly. It is not the case that everything visible to thread A when it writes volatile field f becomes visible to thread B after it reads volatile field g. The release and the acquire have to "match" (that is, be performed on the same volatile field) to have the right semantics.

11. Does the new memory model fix the "double-checked locking" problem?

The (in)famous double-checked locking idiom (also called the multithreaded singleton pattern) is a trick designed to support lazy initialization while avoiding the overhead of synchronization. In very early JVMs, synchronization was slow, and developers were eager to remove it, perhaps too eager. The double-checked locking idiom looks like this:

// double-checked-locking - don't do this!

private static Something instance = null;

public Something getInstance() {
  if (instance == null) {
    synchronized (this) {
      if (instance == null)
        instance = new Something();
    }
  }
  return instance;
}

This looks awfully clever: the synchronization is avoided on the common code path. There's only one problem with it: it doesn't work. Why not? The most obvious reason is that the writes which initialize instance and the write to the instance field itself can be reordered by the compiler or the cache, which would have the effect of returning what appears to be a partially constructed Something; in other words, we could read an uninitialized object. There are many other reasons why this code, and the many attempted algorithmic corrections to it, are wrong, and there was no way to fix it under the old Java memory model. More in-depth information can be found in Double-checked locking: Clever, but broken and The "Double-Checked Locking is broken" declaration.

Many people assumed that using the volatile keyword would eliminate the problems with the double-checked locking pattern. In JVMs prior to 1.5, volatile would not ensure that it worked (your mileage may vary). Under the new memory model, making the instance field volatile does "fix" the problems with double-checked locking, because there is then a happens-before relationship between the initialization of the Something by the constructing thread and the return of its value by the thread that reads it.
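The FAQ states this guarantee but does not show the code; under the new model, the volatile variant would look roughly like the following sketch:

class Something {}  // stub so the sketch is self-contained

class Singleton {
  private static volatile Something instance = null;

  public static Something getInstance() {
    if (instance == null) {                 // first check, without the lock
      synchronized (Singleton.class) {
        if (instance == null) {             // second check, under the lock
          // The volatile write happens-after construction completes, so any
          // thread that reads the volatile field sees a fully built object.
          instance = new Something();
        }
      }
    }
    return instance;
  }
}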

However, for fans of double-checked locking (and we really hope there are none left), the news is still not good. The whole point of double-checked locking was to avoid the performance overhead of synchronization. Not only has brief synchronization become a lot cheaper since the Java 1.0 days, but under the new memory model the cost of using volatile rises, almost to the level of the cost of synchronization. So there is still no good reason to use double-checked locking. (Redacted: volatiles are cheap on most platforms.)

Instead, use the Initialization On Demand Holder idiom, which is thread-safe and much easier to understand:

private static class LazySomethingHolder {
  public static Something something = new Something();
}

public static Something getInstance() {
  return LazySomethingHolder.something;
}

This code is guaranteed to be correct because of the initialization guarantees for static fields: if a field is set in a static initializer, it is guaranteed to be made visible, correctly, to any thread that accesses the class.

12. What do I need to do if I'm writing a VM?

See http://gee.cs.oswego.edu/dl/jmm/cookbook.html.

13. Why should I care about the Java memory model?

Concurrency bugs are very difficult to debug. They often don't appear in testing, waiting instead until your program runs under heavy load, and they are hard to reproduce and trap. You are much better off spending the extra effort ahead of time to ensure that your program is properly synchronized. This is not easy, but it's a lot easier than debugging a badly synchronized application.

 

