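JVM Optimization of synchronized: Lock Inflation
Preface
- Usually we just pass an object to synchronized(…) to acquire a lock, which is very simple. The usual complaint about locking is that it is slow. In early versions of Java, synchronized went straight to the operating system's heavyweight mutex (the monitor), which was slow. Lock inflation is the optimization for this problem: the JVM manages the lock itself first and falls back to the OS-level heavyweight lock only when that is not enough (the lock is upgraded from unlocked to a biased lock, then to a lightweight lock, and finally to a heavyweight lock).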
Bytecode generated for synchronized
- Sample synchronized code
    public class Demo01 {
        public static void main(String[] args) {
            synchronized ("lockObj") {
                System.out.println("Demo01.main");
            }
        }
    }
- Run javap -c .\Demo01.class to disassemble the class file into bytecode
- Bytecode of the main method
    public static void main(java.lang.String[]);
      Code:
         0: ldc           #2   // String lockObj
         2: dup
         3: astore_1
         4: monitorenter
         5: getstatic     #3   // Field java/lang/System.out:Ljava/io/PrintStream;
         8: ldc           #4   // String Demo01.main
        10: invokevirtual #5   // Method java/io/PrintStream.println:(Ljava/lang/String;)V
        13: aload_1
        14: monitorexit
        15: goto          23
        18: astore_2
        19: aload_1
        20: monitorexit
        21: aload_2
        22: athrow
        23: return
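- Lock instructions
  - monitorenter: acquires the lock
  - monitorexit: releases the lock. It appears twice in the listing: offset 14 is the normal exit, and offset 20 is the exception path, which releases the lock before athrow rethrows the exception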
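The two monitorexit instructions can be observed from plain Java. The following is a minimal sketch: the class name MonitorExitDemo and its members are made up for illustration, and it relies only on the standard Thread.holdsLock method to check that the lock is held inside the block and released once an exception has propagated out of it.

```java
final class MonitorExitDemo {
    private static final Object LOCK = new Object();

    // Records whether the current thread held LOCK while inside the block.
    static boolean heldInside;

    // Leaves the synchronized block through an exception and reports
    // whether the current thread still holds LOCK afterwards.
    static boolean heldAfterException() {
        try {
            synchronized (LOCK) {
                heldInside = Thread.holdsLock(LOCK);
                throw new IllegalStateException("leave the block abnormally");
            }
        } catch (IllegalStateException e) {
            // By the time we get here, the exception-path monitorexit has run.
            return Thread.holdsLock(LOCK);
        }
    }

    public static void main(String[] args) {
        boolean after = heldAfterException();
        System.out.println("held inside block: " + heldInside);
        System.out.println("held after exception: " + after);
    }
}
```

The second println reporting false is the visible effect of the monitorexit on the exception path.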
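Lock flag bits and bias information
- How lock inflation works: the JVM modifies the lock flag bits, the bias information, and related fields in the object header to mark which lock state the object is in
- The lock state lives in the object header and takes one of the following values:
  - Unlocked: the default state of a newly created object
  - Biased lock: only a Thread ID comparison is needed; suited to a single thread re-acquiring the lock
  - Lightweight lock: CAS with spinning; fast, but it suffers from idle CPU spinning, threads that may go a long time without getting the lock, frequent CPU cache synchronization, and so on
  - Heavyweight lock: requires an OS-level mutex (mutex/monitor); slow
  - GC mark: used by markSweep to mark an object as invalid
- By now it should be clear why synchronized(…) accepts only an object and not a primitive type: a primitive value is not an object, has no object header, and therefore has nowhere to keep lock information.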
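To make the states above concrete, here is a small sketch that classifies a raw mark word value by its low bits. It uses the constants quoted from markOop.hpp in the appendix (0, 1, 2, 3 for the two lock bits and 5 for the three-bit biased pattern). The class MarkWordBits and its State enum are hypothetical names for illustration; the sketch ignores the rest of the mark word, including the transient INFLATING value of 0.

```java
final class MarkWordBits {
    enum State { LIGHTWEIGHT_LOCKED, UNLOCKED, BIASED, HEAVYWEIGHT_LOCKED, GC_MARKED }

    // Classifies a mark word by its lowest bits only.
    static State decode(long markWord) {
        // The biased pattern uses three bits (101), so check it first:
        // its two lowest bits (01) are the same as the unlocked state.
        if ((markWord & 0b111) == 0b101) {
            return State.BIASED;
        }
        switch ((int) (markWord & 0b11)) {
            case 0b00: return State.LIGHTWEIGHT_LOCKED; // locked_value = 0
            case 0b01: return State.UNLOCKED;           // unlocked_value = 1
            case 0b10: return State.HEAVYWEIGHT_LOCKED; // monitor_value = 2
            default:   return State.GC_MARKED;          // marked_value = 3
        }
    }

    public static void main(String[] args) {
        long[] samples = {0x1L, 0x5L, 0x7f0L, 0x2L, 0x3L};
        for (long sample : samples) {
            System.out.println(Long.toBinaryString(sample) + " -> " + decode(sample));
        }
    }
}
```

Checking the three-bit pattern before the two-bit one matters because a biased mark word and an unlocked one share the same two lowest bits.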
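Lock inflation flow
Tip: it helps to read this alongside a diagram of the MarkWord layout
- A synchronized block is executed and the JVM applies its optimizations.
- -> A single thread A acquires the lock and uses CAS to update the object's lock information (the biased thread id); [unlocked] becomes [biased lock]
  - The Thread ID in the lock object's MarkWord is set to this thread
  - The bias bit in the lock object's MarkWord is set to 1
- -> Thread A acquires the lock again; because the lock is biased toward A, this is very fast
- -> After running for a while, thread A leaves the synchronized block
- -> Thread B arrives to acquire the lock. The biased thread id recorded in the lock is A's, so B's CAS fails. The original biased thread A is then paused (at a safepoint) and its state is checked. Because A has already left the synchronized block, A's bias is released (that is, the lock object is reset to the unlocked state).
- -> A moment later, thread B holds the biased lock
- -> -> Thread C arrives to compete for the lock and checks the biased thread; its CAS obviously fails. If thread B is still alive and has not left the synchronized block, the [biased lock] held by B is upgraded to a [lightweight lock]
  - A LockRecord is created on the Thread's stack
  - The lock object's MarkWord is copied into the LockRecord on the Thread's stack
  - CAS replaces the lock object's MarkWord with a LockRecord pointer (that is, a pointer to this Thread's LockRecord)
- -> -> Thread C keeps spinning on CAS to gain ownership (that is, to replace the LockRecord pointer in the MarkWord); if it succeeds, it holds the lightweight lock (CAS spinning wastes CPU cycles)
- -> -> -> If thread C spins on CAS more than n times (early versions used -XX:PreBlockSpin=10; newer ones use an adaptive spin count), the [lightweight lock] is upgraded to a [heavyweight lock]
  - The pointer in the lock object's MarkWord now points to the heavyweight monitor
  - The Thread enters the monitor's EntryList queue and waits for the operating system to schedule it before it gets the lock
- For the inflation flow, the blog dreamtobe has a diagram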
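The last two steps of the flow (spin on CAS for a bounded number of attempts, then queue up and block) can be sketched in plain Java. This is an illustrative toy, not how HotSpot implements it: the class SpinThenParkLock, the SPIN_LIMIT value of 10, and the waiter queue are all assumptions made for the example, and the lock is not reentrant.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

final class SpinThenParkLock {
    private static final int SPIN_LIMIT = 10; // hypothetical fixed spin budget

    private final AtomicReference<Thread> owner = new AtomicReference<>();
    private final ConcurrentLinkedQueue<Thread> waiters = new ConcurrentLinkedQueue<>();

    void lock() {
        Thread me = Thread.currentThread();
        // "Lightweight" phase: a bounded number of CAS attempts.
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (owner.compareAndSet(null, me)) {
                return;
            }
        }
        // "Heavyweight" phase: enqueue first, then block until the CAS succeeds.
        waiters.add(me);
        while (!owner.compareAndSet(null, me)) {
            LockSupport.park(this);
        }
        waiters.remove(me);
    }

    void unlock() {
        if (!owner.compareAndSet(Thread.currentThread(), null)) {
            throw new IllegalMonitorStateException("current thread does not hold the lock");
        }
        Thread next = waiters.peek();
        if (next != null) {
            LockSupport.unpark(next); // wake one queued thread so it can retry
        }
    }

    // Runs several threads that increment a shared counter under the lock.
    static int contendedCount(int threads, int perThread) {
        final SpinThenParkLock lock = new SpinThenParkLock();
        final int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(contendedCount(4, 10_000));
    }
}
```

A waiter adds itself to the queue before its blocking CAS attempt, so an unlock that happens in between still finds it and unparks it; that ordering is what avoids a lost wakeup in this sketch.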
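Why inflate locks at all? Is a lighter lock always better?
- First, escalating step by step from a lower-level lock to a higher-level one lets the JVM try the lighter lock first, which is usually more efficient.
- However, lighter locks have problems of their own:
  - Biased lock: its lock flag bits are 01 (the same as the unlocked state), and the bias information tells the two apart. It only suits the case where the same thread re-acquires the lock (where it is very efficient). When other threads come for the lock and contention is heavy, the bias has to be revoked and re-established again and again, which is slow. So once several threads compete at the same time, the lock has to be upgraded to stay efficient.
  - Lightweight lock: its lock flag bits are 00, and it is implemented with CAS spinning, which is also fairly efficient. But spinning burns CPU, so to avoid the waste the JVM stops spinning after a certain (adaptive) number of attempts. At that point the lock has to become a heavyweight lock.
- So lighter locks (biased and lightweight) do not always perform better; the right level depends on the contention pattern
- JVM parameters
  - Whether to use biased locking: -XX:+UseBiasedLocking, enabled by default in JDK 8 (JDK 15 disabled it by default and deprecated it under JEP 374)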
Appendix: OpenJDK 8 source code
- Object lock states: hotspot/src/share/vm/oops/markOop.hpp
    enum { locked_value        = 0, // lightweight lock (00)
           unlocked_value      = 1, // unlocked (01)
           monitor_value       = 2, // heavyweight lock (10)
           marked_value        = 3, // GC mark (11)
           biased_lock_pattern = 5  // biased lock (101)
    };
- CAS operations: hotspot/src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp (Windows is analogous)
    // CAS for int
    inline jint Atomic::cmpxchg (jint exchange_value, volatile jint* dest, jint compare_value) {
      int mp = os::is_MP();
      __asm__ volatile (LOCK_IF_MP(%4) "cmpxchgl %1,(%3)"
                        : "=a" (exchange_value)
                        : "r" (exchange_value), "a" (compare_value), "r" (dest), "r" (mp)
                        : "cc", "memory");
      return exchange_value;
    }

    // CAS for long
    inline jlong Atomic::cmpxchg (jlong exchange_value, volatile jlong* dest, jlong compare_value) {
      bool mp = os::is_MP();
      __asm__ __volatile__ (LOCK_IF_MP(%4) "cmpxchgq %1,(%3)"
                            : "=a" (exchange_value)
                            : "r" (exchange_value), "a" (compare_value), "r" (dest), "r" (mp)
                            : "cc", "memory");
      return exchange_value;
    }
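From the Java side, the same compare-and-exchange primitive is reachable through java.util.concurrent.atomic. The sketch below shows the usual read, compute, CAS, retry loop with AtomicInteger.compareAndSet; the class CasDemo and the method incrementWithCas are names invented for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CasDemo {
    // Increments the value with an explicit CAS retry loop and returns the new value.
    static int incrementWithCas(AtomicInteger value) {
        for (;;) {
            int current = value.get();
            int next = current + 1;
            // Succeeds only if nobody changed the value since we read it.
            if (value.compareAndSet(current, next)) {
                return next;
            }
            // Otherwise another thread won the race: re-read and try again.
        }
    }

    public static void main(String[] args) {
        AtomicInteger value = new AtomicInteger(41);
        System.out.println(incrementWithCas(value));   // 42
        System.out.println(value.compareAndSet(0, 7)); // false: the expected value 0 does not match
        System.out.println(value.get());               // 42
    }
}
```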
- Biased lock: hotspot/src/share/vm/runtime/synchronizer.cpp
    // -----------------------------------------------------------------------------
    //  Fast Monitor Enter/Exit
    // This the fast monitor enter. The interpreter and compiler use
    // some assembly copies of this code. Make sure update those code
    // if the following function is changed. The implementation is
    // extremely sensitive to race condition. Be careful.
    void ObjectSynchronizer::fast_enter(Handle obj, BasicLock* lock, bool attempt_rebias, TRAPS) {
      if (UseBiasedLocking) { // JVM flag: -XX:+UseBiasedLocking (whether biased locking is used)
        if (!SafepointSynchronize::is_at_safepoint()) {
          // try to acquire the biased lock
          BiasedLocking::Condition cond = BiasedLocking::revoke_and_rebias(obj, attempt_rebias, THREAD);
          if (cond == BiasedLocking::BIAS_REVOKED_AND_REBIASED) {
            return;
          }
        } else {
          assert(!attempt_rebias, "can not rebias toward VM thread");
          BiasedLocking::revoke_at_safepoint(obj);
        }
        assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now");
      }

      slow_enter (obj, lock, THREAD) ;
    }
- Lightweight lock: hotspot/src/share/vm/runtime/synchronizer.cpp
    // -----------------------------------------------------------------------------
    // Interpreter/Compiler Slow Case
    // This routine is used to handle interpreter/compiler slow case
    // We don't need to use fast path here, because it must have been
    // failed in the interpreter/compiler code.
    void ObjectSynchronizer::slow_enter(Handle obj, BasicLock* lock, TRAPS) {
      markOop mark = obj->mark(); // read the object's mark word
      assert(!mark->has_bias_pattern(), "should not see bias pattern here");

      if (mark->is_neutral()) { // is the object unlocked?
        // Anticipate successful CAS -- the ST of the displaced mark must
        // be visible <= the ST performed by the CAS.
        lock->set_displaced_header(mark); // save the mark word in the thread's lock record
        // CAS the object's mark word so that it points to this lock record
        if (mark == (markOop) Atomic::cmpxchg_ptr(lock, obj()->mark_addr(), mark)) {
          TEVENT (slow_enter: release stacklock) ;
          return ;
        }
        // Fall through to inflate() ...
      } else if (mark->has_locker() && THREAD->is_lock_owned((address)mark->locker())) {
        // already stack-locked by the current thread (re-entry): carry on with the synchronized code
        assert(lock != mark->locker(), "must not re-lock the same lock");
        assert(lock != (BasicLock*)obj->mark(), "don't relock with same BasicLock");
        lock->set_displaced_header(NULL);
        return;
      }

    #if 0
      // The following optimization isn't particularly useful.
      if (mark->has_monitor() && mark->monitor()->is_entered(THREAD)) {
        lock->set_displaced_header (NULL) ;
        return ;
      }
    #endif

      // The object header will never be displaced to this lock,
      // so it does not matter what the value is, except that it
      // must be non-zero to avoid looking like a re-entrant lock,
      // and must not look locked either.
      lock->set_displaced_header(markOopDesc::unused_mark());
      ObjectSynchronizer::inflate(THREAD, obj())->enter(THREAD);
    }
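The displaced-header idea in slow_enter can be modeled in a few lines of Java. This is only a rough analogy: the class StackLockModel, its nested Header and LockRecord types, and the tryLock and unlock methods are all hypothetical stand-ins, an AtomicReference plays the role of the mark word, and re-entry and inflation are left out.

```java
import java.util.concurrent.atomic.AtomicReference;

final class StackLockModel {
    // Stand-in for a neutral (unlocked) mark word, which also carries the hash code.
    static final class Header {
        final int hash;
        Header(int hash) { this.hash = hash; }
    }

    // Stand-in for the BasicLock that lives in a thread's stack frame.
    static final class LockRecord {
        Header displacedHeader;
    }

    // Holds either a Header (unlocked) or a LockRecord (lightweight locked).
    final AtomicReference<Object> mark;

    StackLockModel(Header header) {
        this.mark = new AtomicReference<Object>(header);
    }

    // Mirrors the neutral branch of slow_enter: save the header, then CAS in the record.
    boolean tryLock(LockRecord record) {
        Object current = mark.get();
        if (!(current instanceof Header)) {
            return false; // already locked; the real code would go on to inflate
        }
        record.displacedHeader = (Header) current;
        return mark.compareAndSet(current, record);
    }

    // Restores the saved header, but only if this record still owns the mark word.
    boolean unlock(LockRecord record) {
        return mark.compareAndSet(record, record.displacedHeader);
    }

    public static void main(String[] args) {
        Header header = new Header(0x1234);
        StackLockModel object = new StackLockModel(header);
        LockRecord first = new LockRecord();
        LockRecord second = new LockRecord();
        System.out.println(object.tryLock(first));       // true
        System.out.println(object.tryLock(second));      // false
        System.out.println(object.unlock(first));        // true
        System.out.println(object.mark.get() == header); // true
    }
}
```

Saving the header before the CAS follows the order in the quoted source, where set_displaced_header runs before cmpxchg_ptr so the saved value is in place by the time the lock record becomes visible.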
- Lock inflation function
    ObjectMonitor * ATTR ObjectSynchronizer::inflate (Thread * Self, oop object) {
      // Inflate mutates the heap ...
      // Relaxing assertion for bug 6320749.
      assert (Universe::verify_in_progress() ||
              !SafepointSynchronize::is_at_safepoint(), "invariant") ;

      for (;;) { // retry loop
          const markOop mark = object->mark() ;
          assert (!mark->has_bias_pattern(), "invariant") ;

          // The mark can be in one of the following states:
          // *  Inflated     - just return
          // *  Stack-locked - coerce it to inflated
          // *  INFLATING    - busy wait for conversion to complete
          // *  Neutral      - aggressively inflate the object.
          // *  BIASED       - Illegal.  We should never see this

          // CASE: inflated
          if (mark->has_monitor()) { // already a heavyweight lock?
              ObjectMonitor * inf = mark->monitor() ; // if so, fetch the monitor and return it
              assert (inf->header()->is_neutral(), "invariant");
              assert (inf->object() == object, "invariant") ;
              assert (ObjectSynchronizer::verify_objmon_isinpool(inf), "monitor is invalid");
              return inf ;
          }

          // CASE: inflation in progress - inflating over a stack-lock.
          // Some other thread is converting from stack-locked to inflated.
          // Only that thread can complete inflation -- other threads must wait.
          // The INFLATING value is transient.
          // Currently, we spin/yield/park and poll the markword, waiting for inflation to finish.
          // We could always eliminate polling by parking the thread on some auxiliary list.
          // Inflation is in progress (another thread is inflating): wait for it to finish
          if (mark == markOopDesc::INFLATING()) {
             TEVENT (Inflate: spin while INFLATING) ;
             ReadStableMark(object) ;
             continue ; // keep retrying
          }

          // CASE: stack-locked
          // Could be stack-locked either by this thread or by some other thread.
          //
          // Note that we allocate the objectmonitor speculatively, _before_ attempting
          // to install INFLATING into the mark word.  We originally installed INFLATING,
          // allocated the objectmonitor, and then finally STed the address of the
          // objectmonitor into the mark.  This was correct, but artificially lengthened
          // the interval in which INFLATED appeared in the mark, thus increasing
          // the odds of inflation contention.
          //
          // We now use per-thread private objectmonitor free lists.
          // These list are reprovisioned from the global free list outside the
          // critical INFLATING...ST interval.  A thread can transfer
          // multiple objectmonitors en-mass from the global free list to its local free list.
          // This reduces coherency traffic and lock contention on the global free list.
          // Using such local free lists, it doesn't matter if the omAlloc() call appears
          // before or after the CAS(INFLATING) operation.
          // See the comments in omAlloc().

          if (mark->has_locker()) { // is it a lightweight lock?
              ObjectMonitor * m = omAlloc (Self) ;
              // Optimistically prepare the objectmonitor - anticipate successful CAS
              // We do this before the CAS in order to minimize the length of time
              // in which INFLATING appears in the mark.
              m->Recycle();
              m->_Responsible  = NULL ;
              m->OwnerIsThread = 0 ;
              m->_recursions   = 0 ;
              m->_SpinDuration = ObjectMonitor::Knob_SpinLimit ;   // Consider: maintain by type/class

              // Use CAS to mark the object as INFLATING
              markOop cmp = (markOop) Atomic::cmpxchg_ptr (markOopDesc::INFLATING(), object->mark_addr(), mark) ;
              if (cmp != mark) {
                 omRelease (Self, m, true) ; // CAS failed (another thread changed the mark word first): retry
                 continue ;       // Interference -- just retry
              }

              // We've successfully installed INFLATING (0) into the mark-word.
              // This is the only case where 0 will appear in a mark-work.
              // Only the singular thread that successfully swings the mark-word
              // to 0 can perform (or more precisely, complete) inflation.
              //
              // Why do we CAS a 0 into the mark-word instead of just CASing the
              // mark-word from the stack-locked value directly to the new inflated state?
              // Consider what happens when a thread unlocks a stack-locked object.
              // It attempts to use CAS to swing the displaced header value from the
              // on-stack basiclock back into the object header.  Recall also that the
              // header value (hashcode, etc) can reside in (a) the object header, or
              // (b) a displaced header associated with the stack-lock, or (c) a displaced
              // header in an objectMonitor.  The inflate() routine must copy the header
              // value from the basiclock on the owner's stack to the objectMonitor, all
              // the while preserving the hashCode stability invariants.  If the owner
              // decides to release the lock while the value is 0, the unlock will fail
              // and control will eventually pass from slow_exit() to inflate.  The owner
              // will then spin, waiting for the 0 value to disappear.   Put another way,
              // the 0 causes the owner to stall if the owner happens to try to
              // drop the lock (restoring the header from the basiclock to the object)
              // while inflation is in-progress.  This protocol avoids races that might
              // would otherwise permit hashCode values to change or "flicker" for an object.
              // Critically, while object->mark is 0 mark->displaced_mark_helper() is stable.
              // 0 serves as a "BUSY" inflate-in-progress indicator.

              // fetch the displaced mark from the owner's stack.
              // The owner can't die or unwind past the lock while our INFLATING
              // object is in the mark.  Furthermore the owner can't complete
              // an unlock on the object, either.
              markOop dmw = mark->displaced_mark_helper() ;
              assert (dmw->is_neutral(), "invariant") ;

              // Setup monitor fields to proper values -- prepare the monitor
              m->set_header(dmw) ;

              // Optimization: if the mark->locker stack address is associated
              // with this thread we could simply set m->_owner = Self and
              // m->OwnerIsThread = 1. Note that a thread can inflate an object
              // that it has stack-locked -- as might happen in wait() -- directly
              // with CAS.  That is, we can avoid the xchg-NULL .... ST idiom.
              m->set_owner(mark->locker());
              m->set_object(object);
              // TODO-FIXME: assert BasicLock->dhw != 0.

              // Must preserve store ordering. The monitor state must
              // be stable at the time of publishing the monitor address.
              guarantee (object->mark() == markOopDesc::INFLATING(), "invariant") ;
              object->release_set_mark(markOopDesc::encode(m));

              // Hopefully the performance counters are allocated on distinct cache lines
              // to avoid false sharing on MP systems ...
              if (ObjectMonitor::_sync_Inflations != NULL) ObjectMonitor::_sync_Inflations->inc() ;
              TEVENT(Inflate: overwrite stacklock) ;
              if (TraceMonitorInflation) {
                if (object->is_instance()) {
                  ResourceMark rm;
                  tty->print_cr("Inflating object " INTPTR_FORMAT " , mark " INTPTR_FORMAT " , type %s",
                    (void *) object, (intptr_t) object->mark(),
                    object->klass()->external_name());
                }
              }
              return m ;
          }

          // CASE: neutral
          // TODO-FIXME: for entry we currently inflate and then try to CAS _owner.
          // If we know we're inflating for entry it's better to inflate by swinging a
          // pre-locked objectMonitor pointer into the object header.   A successful
          // CAS inflates the object *and* confers ownership to the inflating thread.
          // In the current implementation we use a 2-step mechanism where we CAS()
          // to inflate and then CAS() again to try to swing _owner from NULL to Self.
          // An inflateTry() method that we could call from fast_enter() and slow_enter()
          // would be useful.

          // Unlocked state
          assert (mark->is_neutral(), "invariant");
          ObjectMonitor * m = omAlloc (Self) ;
          // prepare m for installation - set monitor to initial state
          m->Recycle();
          m->set_header(mark);
          m->set_owner(NULL);
          m->set_object(object);
          m->OwnerIsThread = 1 ;
          m->_recursions   = 0 ;
          m->_Responsible  = NULL ;
          m->_SpinDuration = ObjectMonitor::Knob_SpinLimit ;       // consider: keep metastats by type/class

          if (Atomic::cmpxchg_ptr (markOopDesc::encode(m), object->mark_addr(), mark) != mark) {
              m->set_object (NULL) ;
              m->set_owner  (NULL) ;
              m->OwnerIsThread = 0 ;
              m->Recycle() ;
              omRelease (Self, m, true) ;
              m = NULL ;
              continue ;
              // interference - the markword changed - just retry.
              // The state-transitions are one-way, so there's no chance of
              // live-lock -- "Inflated" is an absorbing state.
          }

          // Hopefully the performance counters are allocated on distinct
          // cache lines to avoid false sharing on MP systems ...
          if (ObjectMonitor::_sync_Inflations != NULL) ObjectMonitor::_sync_Inflations->inc() ;
          TEVENT(Inflate: overwrite neutral) ;
          if (TraceMonitorInflation) {
            if (object->is_instance()) {
              ResourceMark rm;
              tty->print_cr("Inflating object " INTPTR_FORMAT " , mark " INTPTR_FORMAT " , type %s",
                (void *) object, (intptr_t) object->mark(),
                object->klass()->external_name());
            }
          }
          return m ;
      }
    }
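- About adaptive spinning
  - See _SpinDuration in hotspot/src/share/vm/runtime/objectMonitor.cpp. Note that in the slow_enter source quoted above, a failed CAS falls through to inflate() directly; the spin budget governed by _SpinDuration is spent inside ObjectMonitor before a thread blocks.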
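As a rough intuition for what "adaptive" means here, the sketch below grows a spin budget after a spin that obtained the lock and shrinks it after one that did not. It is not the HotSpot algorithm: the class AdaptiveSpinBudget, its methods, and the numbers used in main are all hypothetical values chosen for illustration.

```java
final class AdaptiveSpinBudget {
    private final int limit;   // upper bound on the spin budget (hypothetical)
    private final int bonus;   // added after a spin that obtained the lock (hypothetical)
    private final int penalty; // subtracted after a spin that gave up (hypothetical)
    private int duration;

    AdaptiveSpinBudget(int initial, int limit, int bonus, int penalty) {
        this.duration = initial;
        this.limit = limit;
        this.bonus = bonus;
        this.penalty = penalty;
    }

    int duration() {
        return duration;
    }

    // Spinning paid off: allow a little more spinning next time, up to the limit.
    void onSpinSuccess() {
        duration = Math.min(limit, duration + bonus);
    }

    // Spinning was wasted: allow less next time, never going below zero.
    void onSpinFailure() {
        duration = Math.max(0, duration - penalty);
    }

    public static void main(String[] args) {
        AdaptiveSpinBudget budget = new AdaptiveSpinBudget(1000, 5000, 100, 200);
        budget.onSpinSuccess();
        System.out.println(budget.duration()); // 1100
        budget.onSpinFailure();
        System.out.println(budget.duration()); // 900
    }
}
```

With a budget like this, locks that are usually released quickly keep a generous spin allowance, while locks that are held for long stretches drift toward blocking almost immediately.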