
The JSR-133 Cookbook for Compiler Writers

by Doug Lea, with help from members of the JMM mailing list.

dl@cs.oswego.edu.

Preface: Over the 10+ years since this was initially written, many processor and language memory model specifications and issues have become clearer and better understood. And many have not. While this guide is maintained to remain accurate, it is incomplete about some of these evolving details. For more extensive coverage, see especially the work of Peter Sewell and the Cambridge Relaxed Memory Concurrency Group.

This is an unofficial guide to implementing the new Java Memory Model (JMM) specified by JSR-133. It provides at most brief background about why various rules exist, instead concentrating on their consequences for compilers and JVMs with respect to instruction reorderings, multiprocessor barrier instructions, and atomic operations. It includes a set of recommended recipes for complying with JSR-133. This guide is "unofficial" because it includes interpretations of particular processor properties and specifications. We cannot guarantee that the interpretations are correct. Also, processor specifications and implementations may change over time.

Reorderings

For a compiler writer, the JMM mainly consists of rules disallowing reorderings of certain instructions that access fields (where "fields" include array elements) as well as monitors (locks).

Volatiles and Monitors

The main JMM rules for volatiles and monitors can be viewed as a matrix with cells indicating that you cannot reorder instructions associated with particular sequences of bytecodes. This table is not itself the JMM specification; it is just a useful way of viewing its main consequences for compilers and runtime systems.

Can Reorder                   2nd operation
1st operation                 Normal Load      Volatile Load     Volatile Store
                              Normal Store     MonitorEnter      MonitorExit

Normal Load
Normal Store                                                     No

Volatile Load
MonitorEnter                  No               No                No

Volatile Store
MonitorExit                                    No                No
Where:

  • Normal Loads are getfield, getstatic, array load of non-volatile fields.
  • Normal Stores are putfield, putstatic, array store of non-volatile fields.
  • Volatile Loads are getfield, getstatic of volatile fields that are accessible by multiple threads.
  • Volatile Stores are putfield, putstatic of volatile fields that are accessible by multiple threads.
  • MonitorEnters (including entry to synchronized methods) are for lock objects accessible by multiple threads.
  • MonitorExits (including exit from synchronized methods) are for lock objects accessible by multiple threads.
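
As a small illustration of these categories (the class and field names here are invented for this sketch, and we assume the object and lock are reachable from multiple threads):

class Accesses {
  int plain;             // getfield/putfield of plain are Normal Load / Normal Store
  static int sPlain;     // getstatic/putstatic of sPlain are Normal Load / Normal Store
  volatile int vol;      // getfield/putfield of vol are Volatile Load / Volatile Store

  synchronized void copy() {  // MonitorEnter on entry, MonitorExit on exit
    plain = vol;              // a Volatile Load followed by a Normal Store
  }
}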

The cells for Normal Loads are the same as for Normal Stores, those for Volatile Loads are the same as MonitorEnter, and those for Volatile Stores are the same as MonitorExit, so they are collapsed together here (but are expanded out as needed in subsequent tables). We consider here only variables that are readable and writable as an atomic unit -- that is, no bit fields, unaligned accesses, or accesses larger than word sizes available on a platform.

Any number of other operations might be present between the indicated 1st and 2nd operations in the table. So, for example, the "No" in cell [Normal Store, Volatile Store] says that a non-volatile store cannot be reordered with ANY subsequent volatile store; at least any that can make a difference in multithreaded program semantics.
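
As a concrete sketch of that cell (the class and field names below are invented for illustration), the normal store to data may not be moved below the volatile store to ready, no matter what operations appear between them:

class Publication {
  int data;                // normal field
  volatile boolean ready;  // volatile field read by other threads

  void writer() {
    data = 42;             // Normal Store
    // ... any number of other operations ...
    ready = true;          // Volatile Store; the store to data cannot be reordered below it
  }

  void reader() {
    if (ready) {           // Volatile Load
      int r = data;        // Normal Load; observes 42 once ready is seen to be true
    }
  }
}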

The JSR-133 specification is worded such that the rules for both volatiles and monitors apply only to those that may be accessed by multiple threads. If a compiler can somehow (usually only with great effort) prove that a lock is only accessible from a single thread, it may be eliminated. Similarly, a volatile field provably accessible from only a single thread acts as a normal field. More fine-grained analyses and optimizations are also possible, for example, those relying on provable inaccessibility from multiple threads only during certain intervals.

Blank cells in the table mean that the reordering is allowed if the accesses aren't otherwise dependent with respect to basic Java semantics (as specified in the JLS). For example, even though the table doesn't say so, you can't reorder a load with a subsequent store to the same location. But you can reorder a load and store to two distinct locations, and may wish to do so in the course of various compiler transformations and optimizations. This includes cases that aren't usually thought of as reorderings; for example, reusing a computed value based on a loaded field rather than reloading and recomputing the value acts as a reordering. However, the JMM spec permits transformations that eliminate avoidable dependencies, and in turn allow reorderings.

In all cases, permitted reorderings must maintain minimal Java safety properties even when accesses are incorrectly synchronized by programmers: All observed field values must be either the default zero/null "pre-construction" values, or those written by some thread. This usually entails zeroing all heap memory holding objects before it is used in constructors and never reordering other loads with the zeroing stores. A good way to do this is to zero out reclaimed memory within the garbage collector. See the JSR-133 spec for rules dealing with other corner cases surrounding safety guarantees.

The rules and properties described here are for accesses to Java-level fields. In practice, these will additionally interact with accesses to internal bookkeeping fields and data, for example object headers, GC tables, and dynamically generated code.

Final Fields

Loads and Stores of final fields act as "normal" accesses with respect to locks and volatiles, but impose two additional reordering rules:

1.    A store of a final field (inside a constructor) and, if the field is a reference, any store that this final can reference, cannot be reordered with a subsequent store (outside that constructor) of the reference to the object holding that field into a variable accessible to other threads. For example, you cannot reorder
      x.finalField = v; ... ; sharedRef = x;
This comes into play for example when inlining constructors, where "..." spans the logical end of the constructor. You cannot move stores of finals within constructors down below a store outside of the constructor that might make the object visible to other threads. (As seen below, this may also require issuing a barrier.) Similarly, you cannot reorder either of the first two with the third assignment in:
      v.afield = 1; x.finalField = v; ... ; sharedRef = x;

2.    The initial load (i.e., the very first encounter by a thread) of a final field cannot be reordered with the initial load of the reference to the object containing the final field. This comes into play in:
      x = sharedRef; ... ; i = x.finalField;
A compiler would never reorder these since they are dependent, but there can be consequences of this rule on some processors.

These rules imply that reliable use of final fields by Java programmers requires that the load of a shared reference to an object with a final field itself be synchronized, volatile, or final, or derived from such a load, thus ultimately ordering the initializing stores in constructors with subsequent uses outside constructors.
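
A minimal sketch of the resulting idiom (essentially the FinalFieldExample pattern used in the JSR-133 materials; the names and values here are illustrative):

class FinalFieldExample {
  final int f;
  FinalFieldExample() { f = 42; }      // store of a final field inside the constructor

  static FinalFieldExample shared;     // reference visible to other threads

  static void writer() {
    shared = new FinalFieldExample();  // publishing store; by rule 1 it cannot be
                                       // reordered with the constructor's store to f
  }

  static void reader() {
    FinalFieldExample x = shared;      // initial load of the shared reference
    if (x != null) {
      int i = x.f;                     // initial load of the final field (rule 2);
                                       // sees 42 under the final-field rules
    }
  }
}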

Memory Barriers

Compilers and processors must both obey reordering rules. No particular effort is required to ensure that uniprocessors maintain proper ordering, since they all guarantee "as-if-sequential" consistency. But on multiprocessors, guaranteeing conformance often requires emitting barrier instructions. Even if a compiler optimizes away a field access (for example, because a loaded value is not used), barriers must still be generated as if the access were still present. (Although see below about independently optimizing away barriers.)

Memory barriers are only indirectly related to higher-level notions described in memory models such as "acquire" and "release". And memory barriers are not themselves "synchronization barriers". And memory barriers are unrelated to the kinds of "write barriers" used in some garbage collectors. Memory barrier instructions directly control only the interaction of a CPU with its cache, with its write-buffer that holds stores waiting to be flushed to memory, and/or its buffer of waiting loads or speculatively executed instructions. These effects may lead to further interaction among caches, main memory and other processors. But there is nothing in the JMM that mandates any particular form of communication across processors, so long as stores eventually become globally performed (i.e., visible across all processors) and loads retrieve them when they are visible.

Categories

Nearly all processors support at least a coarse-grained barrier instruction, often just called a Fence, that guarantees that all loads and stores initiated before the fence will be strictly ordered before any load or store initiated after the fence. This is usually among the most time-consuming instructions on any given processor (often nearly as expensive as, or even more expensive than, atomic instructions). Most processors additionally support more fine-grained barriers.

A property of memory barriers that takes some getting used to is that they apply BETWEEN memory accesses. Despite the names given for barrier instructions on some processors, the right/best barrier to use depends on the kinds of accesses it separates. Here's a common categorization of barrier types that maps pretty well to specific instructions (sometimes no-ops) on existing processors:

LoadLoad Barriers

The sequence: Load1; LoadLoad; Load2
ensures that Load1's data are loaded before data accessed by Load2 and all subsequent load instructions are loaded. In general, explicit LoadLoad barriers are needed on processors that perform speculative loads and/or out-of-order processing in which waiting load instructions can bypass waiting stores. On processors that guarantee to always preserve load ordering, the barriers amount to no-ops.

StoreStore Barriers

The sequence: Store1; StoreStore; Store2
ensures that Store1's data are visible to other processors (i.e., flushed to memory) before the data associated with Store2 and all subsequent store instructions. In general, StoreStore barriers are needed on processors that do not otherwise guarantee strict ordering of flushes from write buffers and/or caches to other processors or main memory.

LoadStore Barriers

The sequence: Load1; LoadStore; Store2
ensures that Load1's data are loaded before all data associated with Store2 and subsequent store instructions are flushed. LoadStore barriers are needed only on those out-of-order processors in which waiting store instructions can bypass loads.

StoreLoad Barriers

The sequence: Store1; StoreLoad; Load2
ensures that Store1's data are made visible to other processors (i.e., flushed to main memory) before data accessed by Load2 and all subsequent load instructions are loaded. StoreLoad barriers protect against a subsequent load incorrectly using Store1's data value rather than that from a more recent store to the same location performed by a different processor. Because of this, on the processors discussed below, a StoreLoad is strictly necessary only for separating stores from subsequent loads of the same location(s) as were stored before the barrier. StoreLoad barriers are needed on nearly all recent multiprocessors, and are usually the most expensive kind. Part of the reason they are expensive is that they must disable mechanisms that ordinarily bypass cache to satisfy loads from write-buffers. This might be implemented by letting the buffer fully flush, among other possible stalls.

On all processors discussed below, it turns out that instructions that perform StoreLoad also obtain the other three barrier effects, so StoreLoad can serve as a general-purpose (but usually expensive) Fence. (This is an empirical fact, not a necessity.) The opposite doesn't hold though. It is NOT usually the case that issuing any combination of other barriers gives the equivalent of a StoreLoad.
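
As a hedged illustration (names invented) of why a volatile store followed by a volatile load needs this expensive barrier, consider the classic store/load hazard below. Under JSR-133 the outcome r1 == 0 && r2 == 0 must be impossible for volatile x and y, and it is the StoreLoad between each thread's store and its subsequent load that prevents it:

class StoreLoadExample {
  volatile int x = 0, y = 0;
  int r1, r2;

  void thread1() {
    x = 1;        // volatile store
                  //    StoreLoad
    r1 = y;       // volatile load
  }

  void thread2() {
    y = 1;        // volatile store
                  //    StoreLoad
    r2 = x;       // volatile load
  }
  // Without the StoreLoad, each store could sit in a write buffer while the
  // following load executes, allowing both threads to read 0.
}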

The following table shows how these barriers correspond to JSR-133 ordering rules.

Required barriers             2nd operation
1st operation                 Normal Load   Normal Store   Volatile Load    Volatile Store
                                                           MonitorEnter     MonitorExit

Normal Load                                                                 LoadStore

Normal Store                                                                StoreStore

Volatile Load
MonitorEnter                  LoadLoad      LoadStore      LoadLoad         LoadStore

Volatile Store
MonitorExit                                                StoreLoad        StoreStore

Plus the special final-field rule requiring a StoreStore barrier in
      x.finalField = v; StoreStore; sharedRef = x;

Here's an example showing placements.

Java                                      Instructions

class X {
  int a, b;
  volatile int v, u;
  void f() {
    int i, j;

    i = a;                                load a
    j = b;                                load b
    i = v;                                load v
                                             LoadLoad
    j = u;                                load u
                                             LoadStore
    a = i;                                store a
    b = j;                                store b
                                             StoreStore
    v = i;                                store v
                                             StoreStore
    u = j;                                store u
                                             StoreLoad
    i = u;                                load u
                                             LoadLoad
                                             LoadStore
    j = b;                                load b
    a = i;                                store a
  }
}

Data Dependency and Barriers

The need for LoadLoad and LoadStore barriers on some processors interacts with their ordering guarantees for dependent instructions. On some (most) processors, a load or store that is dependent on the value of a previous load is ordered by the processor without need for an explicit barrier. This commonly arises in two kinds of cases, indirection:
      Load x; Load x.field
and control:
      Load x; if (predicate(x)) Load or Store y;

Processors that do NOT respect indirection ordering in particular require barriers for final field access for references initially obtained through shared references:
      x = sharedRef; ... ; LoadLoad; i = x.finalField;

Conversely, as discussed below, processors that DO respect data dependencies provide several opportunities to optimize away LoadLoad and LoadStore barrier instructions that would otherwise need to be issued. (However, dependency does NOT automatically remove the need for StoreLoad barriers on any processor.)
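
For example, in the following hypothetical reader (class and field names invented), the load of the payload is address-dependent on the volatile load of the reference; on processors that respect indirection ordering, the LoadLoad indicated in the comments can be elided, while on those that do not it must be emitted as a real barrier:

class DependentReads {
  static class Node {
    final int payload;
    Node(int p) { payload = p; }
  }

  volatile Node ref;       // written by some other thread

  int read() {
    Node n = ref;          // load ref (volatile load)
                           //    LoadLoad -- a no-op where dependent loads are ordered,
                           //    a real barrier where indirection ordering is not respected
    return (n == null) ? -1 : n.payload;  // load n.payload, dependent on the load of ref
  }
}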

Interactions with Atomic Instructions

The kinds of barriers needed on different processors further interact with the implementation of MonitorEnter and MonitorExit. Locking and/or unlocking usually entail the use of atomic conditional update operations CompareAndSwap (CAS) or LoadLinked/StoreConditional (LL/SC) that have the semantics of performing a volatile load followed by a volatile store. While CAS or LL/SC minimally suffice, some processors also support other atomic instructions (for example, an unconditional exchange) that can sometimes be used instead of or in conjunction with atomic conditional updates.

On all processors, atomic operations protect against read-after-write problems for the locations being read/updated. (Otherwise standard loop-until-success constructions wouldn't work in the desired way.) But processors differ in whether atomic instructions provide more general barrier properties than the implicit StoreLoad for their target locations. On some processors these instructions also intrinsically perform barriers that would otherwise be needed for MonitorEnter/Exit; on others some or all of these barriers must be specifically issued.
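
A minimal sketch, using java.util.concurrent.atomic.AtomicInteger as the CAS primitive (the SpinLock class and its methods are invented for illustration; a real JVM lock is considerably more involved), of how MonitorEnter/Exit typically reduce to an atomic conditional update plus the barriers discussed here:

import java.util.concurrent.atomic.AtomicInteger;

class SpinLock {
  private final AtomicInteger lockWord = new AtomicInteger(0);  // 0 = free, 1 = held

  void lock() {                               // plays the role of MonitorEnter
    while (!lockWord.compareAndSet(0, 1)) {   // CAS: volatile load + volatile store semantics
      // spin until the lock word appears free
    }
    // On processors whose CAS does not already imply them, the barriers
    // required after MonitorEnter would be emitted here.
  }

  void unlock() {                             // plays the role of MonitorExit
    // On processors that need them, the barriers required before
    // MonitorExit would be emitted before this store.
    lockWord.set(0);                          // volatile store releases the lock
  }
}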

Volatiles and Monitors have to be separated to disentangle these effects, giving:

Required Barriers

2nd operation

 
