Synchronization performance optimization in JDK 6 (from the official documentation)

Java SE 6 includes several new features and enhancements to improve performance in many areas of the platform. Improvements include: synchronization performance optimizations, compiler performance optimizations, the new Parallel Compaction Collector, better ergonomics for the Concurrent Low Pause Collector, and faster application start-up.


2.1   Runtime performance optimizations

2.1.1 Biased locking



Biased Locking is a class of optimizations that improves uncontended synchronization performance by eliminating atomic operations associated with the Java language’s synchronization primitives. These optimizations rely on the property that not only are most monitors uncontended, they are locked by at most one thread during their lifetime.

An object is "biased" toward the thread which first acquires its monitor via a monitorenter bytecode or synchronized method invocation; subsequent monitor-related operations can be performed by that thread without using atomic operations resulting in much better performance, particularly on multiprocessor machines. 

Locking attempts by threads other than the one toward which the object is "biased" will cause a relatively expensive operation whereby the bias is revoked. The benefit of the elimination of atomic operations must exceed the penalty of revocation for this optimization to be profitable. 

Applications with substantial amounts of uncontended synchronization may attain significant speedups while others with certain patterns of locking may see slowdowns. 

Biased Locking is enabled by default in Java SE 6 and later. To disable Biased Locking, add -XX:-UseBiasedLocking to the command line.
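As an illustration (a hand-written sketch, not anything from the white paper; the class and iteration count are made up), the pattern biased locking rewards is a monitor that is, in practice, only ever acquired by one thread:

```java
// Sketch of the pattern biased locking targets: a thread-safe object
// whose monitor is only ever locked by a single thread.
public class BiasedLockingDemo {

    static class Counter {
        private long count = 0;
        // Each call compiles to a monitorenter/monitorexit pair on 'this'.
        synchronized void increment() { count++; }
        synchronized long get() { return count; }
    }

    public static void main(String[] args) {
        Counter c = new Counter();
        // The monitor becomes biased toward this thread on first acquisition;
        // with biased locking enabled, later acquisitions by the same thread
        // avoid the atomic compare-and-swap the unbiased fast path would need.
        for (int i = 0; i < 1_000_000; i++) {
            c.increment();
        }
        System.out.println(c.get()); // prints 1000000
    }
}
```

If a second thread ever locked the same Counter, the bias would have to be revoked, which is the relatively expensive operation the text describes.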


For more on Biased Locking, please refer to the ACM OOPSLA 2006 paper by Kenneth Russell and David Detlefs: "Eliminating Synchronization-Related Atomic Operations with Biased Locking and Bulk Rebiasing".


2.1.2  Lock coarsening


There are some patterns of locking where a lock is released and then reacquired within a piece of code with no observable operations occurring in between. The lock coarsening optimization implemented in HotSpot eliminates the unlock and relock operations in those situations. It reduces the amount of synchronization work by enlarging an existing synchronized region. Doing this around a loop could cause a lock to be held for long periods of time, so the technique is only used on non-looping control flow.
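A minimal sketch of the pattern this targets (the class and method names are illustrative): StringBuffer's methods are synchronized, so each call below locks and unlocks the same monitor with no observable work between the calls.

```java
// Sketch: four back-to-back synchronized calls on the same StringBuffer.
// As written, each call is a separate lock/unlock of sb's monitor; after
// coarsening, the JIT can conceptually fuse them into a single
// enter/exit pair around all four calls.
public class CoarseningPattern {
    public static String build(StringBuffer sb) {
        sb.append("a");       // lock, append, unlock
        sb.append("b");       // lock, append, unlock
        sb.append("c");       // lock, append, unlock
        return sb.toString(); // lock, copy, unlock
    }
}
```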


This feature is on by default. To disable it, please add the following option to the command line: -XX:-EliminateLocks

2.1.3  Adaptive spinning


Adaptive spinning is an optimization technique where a two-phase spin-then-block strategy is used by threads attempting a contended synchronized enter operation. This technique enables threads to avoid undesirable effects that impact performance such as context switching and repopulation of Translation Lookaside Buffers (TLBs). It is "adaptive" because the duration of the spin is determined by policy decisions based on factors such as the rate of success and/or failure of recent spin attempts on the same monitor and the state of the current lock owner.
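The two-phase strategy can be sketched in plain Java. This is an illustrative user-level lock, not HotSpot's actual monitor implementation, and the fixed SPIN_LIMIT stands in for the adaptive policy, which in the real JVM is tuned per monitor from the outcomes of recent spin attempts.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Sketch of a two-phase spin-then-block lock: a thread first spins a
// bounded number of times; if the lock is still held, it parks until
// the owner unparks it on release.
public class SpinThenBlockLock {
    private static final int SPIN_LIMIT = 1000; // stand-in for the adaptive policy
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private final Queue<Thread> waiters = new ConcurrentLinkedQueue<>();

    public void lock() {
        // Phase 1: optimistic spinning, cheap if the lock is held briefly.
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (locked.compareAndSet(false, true)) return;
            Thread.onSpinWait(); // spin-loop hint (Java 9+)
        }
        // Phase 2: enqueue and block, avoiding context switches' cost only
        // when spinning has already failed.
        Thread current = Thread.currentThread();
        waiters.add(current);
        while (!locked.compareAndSet(false, true)) {
            LockSupport.park(this); // may wake spuriously; the loop re-checks
        }
        waiters.remove(current);
    }

    public void unlock() {
        locked.set(false);
        Thread next = waiters.peek();
        if (next != null) LockSupport.unpark(next); // wake one blocked waiter
    }
}
```

A real adaptive implementation would adjust the spin budget per lock (or per lock site) based on how often spinning recently succeeded, instead of using a constant.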

Reposted from: the Oracle Java SE 6 performance white paper

======================================

Reposted from: Synchronization optimizations

Whenever mutable variables are shared between threads, you must use synchronization to ensure that updates made by one thread are visible to other threads on a timely basis. The primary means of synchronization is the use of synchronized blocks, which provide both mutual exclusion and visibility guarantees. (Other forms of synchronization include volatile variables, the Lock objects in java.util.concurrent.locks, and atomic variables.) When two threads want to access a shared mutable variable, not only must both threads use synchronization, but if they are using synchronized blocks, those synchronized blocks must use the same lock object.
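As a minimal sketch (the class and field names are illustrative), both methods below synchronize on the same lock object, so the writer's update is guaranteed visible to the reader:

```java
// Sketch: mutual exclusion and visibility require both threads to
// synchronize on the *same* lock object.
public class SharedValue {
    private final Object lock = new Object(); // one lock guards 'value'
    private int value = 0;

    public void set(int v) {
        synchronized (lock) { // writer: this update happens-before a
            value = v;        // later acquisition of the same lock
        }
    }

    public int get() {
        synchronized (lock) { // reader: same lock, so the write is visible
            return value;
        }
    }
}
```

Had set() and get() synchronized on two different objects, the Java Memory Model would provide no visibility guarantee between them.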

In practice, locking falls into two categories: mostly contended and mostly uncontended. Locks that are mostly contended are the "hot" locks in an application, such as the locks that guard the shared work queue for a thread pool. The data protected by these locks is needed constantly by many threads, and you can expect that when you go to acquire this lock, you may have to wait until someone else is done with it. Mostly uncontended locks are those that guard data that is not accessed so frequently, so that most of the time when a thread goes to acquire the lock, no other thread is holding that lock. Most locks are not frequently contended, so improving the performance of uncontended locking can improve overall application performance substantially.

The JVM has separate code paths for contended ("slow path") and uncontended ("fast path") lock acquisition. A lot of effort has already gone into optimizing the fast path; Mustang further improves both the fast path and the slow path and adds a number of optimizations that can eliminate some locking entirely.

Lock elision

The Java Memory Model says that one thread exiting a synchronized block happens-before another thread enters a synchronized block protected by that same lock; this means that whatever memory operations are visible to thread A when it exits a synchronized block protected by lock M are visible to thread B when it enters a synchronized block protected by M, as shown in Figure 1. For synchronized blocks that use different locks, we cannot assume anything about their ordering -- it is as if there was no synchronization at all.

Figure 1. Synchronization and visibility in the Java Memory Model

It stands to reason, then, that if a thread enters a synchronized block protected by a lock that no other thread will ever synchronize on, then that synchronization has no effect and can therefore be removed by the optimizer. (The Java Language Specification explicitly allows this optimization.) This scenario may sound unlikely, but there are cases when this is clear to the compiler. Listing 1 shows a simplified example of a thread-local lock object:

Listing 1. Using a thread-local object as a lock
synchronized (new Object()) {
     doSomething();
}

Because the reference to the lock object disappears before any other thread could possibly use it, the compiler can tell that the above synchronization can be removed because it is impossible for two threads to synchronize using the same lock. While no one would ever directly use the idiom in Listing 1, this code is very similar to the case where the lock associated with a synchronized block can be proven to be a thread-local variable. "Thread-local" doesn't necessarily mean that it is implemented with the ThreadLocal class; it could be any variable that the compiler can prove is never accessed by another thread. Objects referenced by local variables and that never escape from their defining scope meet this test -- if an object is confined to the stack of some thread, no other thread can ever see a reference to that object. (The only way an object can be shared across threads is if a reference to it is published to the heap.)

As luck would have it, escape analysis, which we discussed last month, provides the compiler with exactly the information it needs to optimize away synchronized blocks that use thread-local lock objects. If the compiler can prove (using escape analysis) that an object is never published to the heap, it must be a thread-local object and therefore any synchronized blocks that use that object as a lock will have no effect under the Java Memory Model (JMM) and can be eliminated. This optimization is called lock elision and is another of the JVM optimizations slated for Mustang.

The use of locking with thread-local objects happens more often than you might think. There are many classes, such as StringBuffer and java.util.Random, which are thread-safe because they might be used from multiple threads, but they are often used in a thread-local manner.

Consider the code in Listing 2, which uses a Vector in the course of building up a string value. The getStoogeNames() method creates a Vector, adds several strings to it, and then calls toString() to convert it into a string. Each of the calls to one of the Vector methods -- three calls to add() and one to toString() -- requires acquiring and releasing the lock on the Vector. While all of the lock acquisitions will be uncontended and therefore fast, the compiler can actually eliminate the synchronization entirely using lock elision.

Listing 2. Candidate for lock elision
public String getStoogeNames() {
      Vector v = new Vector();
      v.add("Moe");
      v.add("Larry");
      v.add("Curly");
      return v.toString();
}

Because no reference to the Vector ever escapes the getStoogeNames() method, it is necessarily thread-local and therefore any synchronized block that uses it as a lock will have no effect under the JMM. The compiler can inline the calls to the add() and toString() methods, and then it will recognize that it is acquiring and releasing a lock on a thread-local object and can optimize away all four lock-unlock operations.

I've stated in the past that trying to avoid synchronization is generally a bad idea. The arrival of optimizations such as lock elision adds yet one more reason to avoid trying to eliminate synchronization -- the compiler can do it automatically where it is safe, and leave it in place otherwise.

Adaptive locking

In addition to escape analysis and lock elision, Mustang has some other optimizations for locking performance as well. When two threads contend for a lock, one of them will get the lock and the other will have to block until the lock is available. There are two obvious techniques for implementing blocking: having the operating system suspend the thread until it is awakened later, or using spin locks. A spin lock basically amounts to the following code:

while (lockStillInUse)
     ;

While spin locks are CPU-intensive and appear inefficient, they can be more efficient than suspending the thread and subsequently waking it up if the lock in question is held for a very short time. There is significant overhead to suspending and rescheduling a thread, which involves work by the JVM, operating system, and hardware. This presents a problem for JVM implementers: if locks are only held for a short time, then spinning is more efficient; if locks are held for a long time, then suspension is more efficient. Without information on what the distribution of lock times will be for a given application, most JVMs are conservative and simply suspend a thread when it fails to acquire a lock.

However, it is possible to do better. For each lock, the JVM can adaptively choose between spinning and suspension based on the behavior of past acquires. It can try spinning and, if that succeeds after a short period of time, keep using that approach. If, on the other hand, a certain amount of spinning fails to acquire the lock, it can decide that this is a lock that is generally held for "a long time" and recompile the method to use only suspension. This decision can be made on a per-lock or per-lock-site basis.

The result of this adaptive approach is better overall performance. After the JVM has had some time to acquire some profiling information on lock usage patterns, it can use spinning for the locks that are generally held for a short time and suspension for the locks that are generally held for a long time. An optimization like this would not be possible in a statically compiled environment because the information about lock usage patterns is not available at static compilation time.

Lock coarsening

Another optimization that can be used to reduce the cost of locking is lock coarsening. Lock coarsening is the process of merging adjacent synchronized blocks that use the same lock object. If the compiler cannot eliminate the locking using lock elision, it may be able to reduce the overhead by using lock coarsening.

The code in Listing 3 is not necessarily a candidate for lock elision (although after inlining addStoogeNames(), the JVM might still be able to elide the locking), but it could still benefit from lock coarsening. The three calls to add() alternate between acquiring the lock on the Vector, doing something, and releasing the lock. The compiler can observe that there is a sequence of adjacent blocks that operate on the same lock, and merge them into a single block.

Listing 3. Candidate for lock coarsening
public void addStooges(Vector v) {
      v.add("Moe");
      v.add("Larry");
      v.add("Curly");
}

Not only does this reduce the locking overhead, but by merging the blocks, it turns the contents of the three method calls into one big synchronized block, giving the optimizer a much larger basic block to work with. So lock coarsening can also enable other optimizations that have nothing to do with locking.

Even if there are other statements between the synchronized blocks or method calls, the compiler can still perform lock coarsening. The compiler is allowed to move statements into a synchronized block -- just not out of it. So if there were intervening statements between the calls to add(), the compiler could still turn the whole of addStooges() into one big synchronized block that uses the Vector as the lock object.
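For illustration (these are hand-written equivalents, not actual compiler output, and the intervening statement is made up), this is the transformation the compiler is permitted to make:

```java
import java.util.Vector;

public class CoarseningWithIntervening {
    // As written: each add() separately locks and unlocks v; the local
    // assignment in between is not synchronized at all.
    public static void addStooges(Vector<String> v) {
        v.add("Moe");
        String fourth = "Shemp"; // intervening, non-synchronized statement
        v.add("Larry");
        v.add("Curly");
        v.add(fourth);
    }

    // After coarsening (conceptually): one lock-unlock pair; the intervening
    // statement has been moved *into* the merged region, which the JMM
    // permits -- statements may move into a synchronized block, not out.
    public static void addStoogesCoarsened(Vector<String> v) {
        synchronized (v) {
            v.add("Moe");
            String fourth = "Shemp";
            v.add("Larry");
            v.add("Curly");
            v.add(fourth);
        }
    }
}
```

Both methods produce the same result; only the number of lock acquisitions differs.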

Lock coarsening may entail a trade-off between performance and responsiveness. By combining the three lock-unlock pairs into a single lock-unlock pair, performance is improved by reducing instruction count and reducing the amount of synchronization traffic on the memory bus. The cost is that the duration for which the lock is held may be lengthened, increasing the amount of time other threads might be locked out and also increasing the probability of lock contention. However, the lock is held for a relatively short period of time in either case, and the compiler can apply heuristics based on the length of the code protected by synchronization to make a reasonable trade-off here. (And depending on what other optimizations are enabled by the larger basic block, the duration for which the lock is held may not even be lengthened under coarsening.)

Conclusion

Uncontended locking performance has improved with nearly every JVM version. Mustang continues this trend by improving both the raw performance of uncontended and contended locking and introducing optimizations that can eliminate many lock operations.


