JVM Optimizations for synchronized: Lock Inflation

蒋含竹

Last modified: 2023-08-14 16:44:41

Categories: Java, JVM. Tags: java, synchronized, lock inflation, lock flag bits, jvm

First published: 2019-12-27 21:37:55

Article link: https://blog.csdn.net/alionsss/article/details/103738483

Preface

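Passing an object to synchronized(…) is usually all it takes to acquire a lock, which makes it very simple to use. The common complaint about locks is that they are slow. In earlier versions of Java, synchronized went straight to the operating system's heavyweight mutex (the monitor), which was expensive. Lock inflation is the optimization for this problem: the JVM manages the lock itself first and falls back to the OS-level heavyweight lock only when that is not enough (the lock is upgraded from unlocked to biased, then to lightweight, and finally to heavyweight).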

Bytecode generated for synchronized

Example code using synchronized

public class Demo01 {

    public static void main(String[] args) {
        synchronized ("lockObj") {
            System.out.println("Demo01.main");
        }
    }

}

Run javap -c .\Demo01.class to disassemble the class file into bytecode

Bytecode of the main method

public static void main(java.lang.String[]);
    Code:
       0: ldc           #2                  // String lockObj
       2: dup
       3: astore_1
       4: monitorenter
       5: getstatic     #3                  // Field java/lang/System.out:Ljava/io/PrintStream;
       8: ldc           #4                  // String Demo01.main
      10: invokevirtual #5                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      13: aload_1
      14: monitorexit
      15: goto          23
      18: astore_2
      19: aload_1
      20: monitorexit
      21: aload_2
      22: athrow
      23: return

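Lock instructions
- monitorenter: acquires the lock
- monitorexit: releases the lock. It appears twice in the listing: offset 14 is the normal exit path, and offset 20 runs when the block throws, so the lock is released either way.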
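The exception-path monitorexit can be observed from plain Java. The following is a minimal sketch, not part of the original article; the class and method names (MonitorExitDemo, throwInsideLock, lockReleasedAfterException) are made up for illustration. It throws from inside a synchronized block and then checks with Thread.holdsLock that the current thread no longer owns the monitor.

```java
class MonitorExitDemo {
    private static final Object LOCK = new Object();

    // Throws from inside the synchronized block. The exception path
    // (the second monitorexit in the listing) must still release LOCK.
    static void throwInsideLock() {
        synchronized (LOCK) {
            throw new IllegalStateException("boom");
        }
    }

    // Returns true if the current thread no longer holds LOCK once the
    // exception has propagated out of the synchronized block.
    static boolean lockReleasedAfterException() {
        try {
            throwInsideLock();
        } catch (IllegalStateException expected) {
            // The exception is expected; we only care about the lock state.
        }
        return !Thread.holdsLock(LOCK);
    }

    public static void main(String[] args) {
        System.out.println("released after exception: " + lockReleasedAfterException());
    }
}
```

Thread.holdsLock is a standard java.lang.Thread method; everything else here is illustrative scaffolding.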

Lock flag bits and bias information

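How lock inflation works: the JVM modifies the lock flag bits, the bias information, and related fields in the object header to mark the object's current lock state.
The lock state is stored in the object header and is one of the following:
- Unlocked: the default state of a newly created object
- Biased lock: only the Thread ID needs to be compared; suited to a single thread re-entering the lock
- Lightweight lock: acquired by CAS spinning; fast, but it has problems such as wasted CPU cycles while spinning, threads that may fail to get the lock for a long time, and frequent synchronization of CPU caches
- Heavyweight lock: must call the OS-level mutex (mutex/monitor); slow
- GC mark: used by markSweep to mark an object; this value is not valid at any other time
By now you should see why synchronized(…) accepts only objects and not primitive types: a primitive is not an object, has no object header, and so has nowhere to keep lock information.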
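To make the flag bits concrete, here is a small sketch that classifies a raw mark word value by its low bits. It assumes the constants from the markOop.hpp enum quoted in the appendix (0, 1, 2, 3 in the low two bits, and the three-bit pattern 5 for a biased lock). The class MarkWordBits and its describe method are hypothetical helpers; plain Java has no API for reading a real object's mark word, so the input is just a long.

```java
class MarkWordBits {
    // Values assumed from the markOop.hpp enum in OpenJDK 8.
    static final long LOCKED_VALUE = 0;        // lightweight (stack) lock, low 2 bits
    static final long UNLOCKED_VALUE = 1;      // unlocked, low 2 bits
    static final long MONITOR_VALUE = 2;       // heavyweight lock, low 2 bits
    static final long MARKED_VALUE = 3;        // GC mark, low 2 bits
    static final long BIASED_LOCK_PATTERN = 5; // biased lock, low 3 bits (bias bit + 01)

    // Classifies a mark word by its low bits. The biased pattern is checked
    // first because it shares the 01 lock bits with the unlocked state.
    static String describe(long markWord) {
        if ((markWord & 0b111) == BIASED_LOCK_PATTERN) return "biased";
        long lockBits = markWord & 0b11;
        if (lockBits == LOCKED_VALUE) return "lightweight";
        if (lockBits == UNLOCKED_VALUE) return "unlocked";
        if (lockBits == MONITOR_VALUE) return "heavyweight";
        return "gc-marked"; // the only remaining value is MARKED_VALUE
    }

    public static void main(String[] args) {
        System.out.println(describe(0x1L));
        System.out.println(describe(0x5L));
    }
}
```

The all-zero word is special in the inflate() source (it is the transient INFLATING marker); this sketch does not distinguish it from a lightweight lock.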

The lock inflation flow

Tip: read this section alongside a diagram of the MarkWord layout

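When a synchronized block runs, the JVM applies the following optimizations.
-> A single thread A acquires the lock and uses CAS to write its thread id into the object's lock information; [unlocked] becomes [biased]
- The Thread ID in the lock object's MarkWord is set to this thread
- The bias bit in the lock object's MarkWord is set to 1
-> Thread A acquires the lock again; because the lock is biased toward A, this is very fast
-> After running for a while, thread A leaves the synchronized block
-> Thread B arrives to acquire the lock. The lock still records thread A as its biased thread, so B's CAS fails. The JVM then pauses the original biased thread A (at a safepoint) and checks its state. Because A has already left the synchronized block, the bias is revoked (the lock object is reset to the unlocked state).
-> Shortly afterwards, thread B holds the biased lock
-> -> Thread C arrives to compete for the lock and checks the biased thread; its CAS obviously fails. If thread B is still alive (it has not left the synchronized block), the [biased lock] held by B is upgraded to a [lightweight lock]
- A LockRecord is created on the thread's stack
- The lock object's MarkWord is copied into the LockRecord on the thread's stack
- A CAS replaces the lock object's MarkWord with a pointer to the LockRecord (that is, it now points at this thread's LockRecord)
-> -> Thread C keeps spinning on CAS to gain ownership (that is, to replace the LockRecord pointer in the MarkWord); if it succeeds, it holds the lightweight lock (CAS spinning burns CPU cycles)
-> -> -> If thread C's CAS spinning exceeds n attempts (earlier versions used -XX:PreBlockSpin=10; newer ones use an adaptive spin count), the [lightweight lock] is upgraded to a [heavyweight lock]
- The pointer in the lock object's MarkWord now points to the heavyweight monitor
- The thread enters the monitor's EntryList queue and waits to be scheduled before it gets the lock
The blog dreamtobe has a diagram of this inflation flow.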
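The flow described above can be summarized as a small state machine. This is a toy model of the article's description only, not of HotSpot's actual code; the enum names and the events (FIRST_ACQUIRE, CONTENDED_WHILE_HELD, and so on) are invented for this sketch.

```java
class LockStateModel {
    enum State { UNLOCKED, BIASED, LIGHTWEIGHT, HEAVYWEIGHT }

    enum Event {
        FIRST_ACQUIRE,            // a thread takes the lock for the first time
        SAME_THREAD_REENTER,      // the biased thread acquires again
        BIAS_REVOKED_OWNER_GONE,  // another thread arrives after the owner left the block
        CONTENDED_WHILE_HELD,     // another thread arrives while the owner is still inside
        SPIN_LIMIT_EXCEEDED       // the contender gave up spinning
    }

    // Returns the next state according to the flow described in the text.
    // Events that do not apply to the current state leave it unchanged.
    static State next(State s, Event e) {
        switch (s) {
            case UNLOCKED:
                return e == Event.FIRST_ACQUIRE ? State.BIASED : s;
            case BIASED:
                if (e == Event.BIAS_REVOKED_OWNER_GONE) return State.UNLOCKED;
                if (e == Event.CONTENDED_WHILE_HELD) return State.LIGHTWEIGHT;
                return s;
            case LIGHTWEIGHT:
                return e == Event.SPIN_LIMIT_EXCEEDED ? State.HEAVYWEIGHT : s;
            default:
                return s; // HEAVYWEIGHT stays inflated in this model
        }
    }

    // Folds a sequence of events over the initial UNLOCKED state.
    static State run(Event... events) {
        State s = State.UNLOCKED;
        for (Event e : events) s = next(s, e);
        return s;
    }

    public static void main(String[] args) {
        System.out.println(run(Event.FIRST_ACQUIRE, Event.CONTENDED_WHILE_HELD,
                Event.SPIN_LIMIT_EXCEEDED));
    }
}
```

The model deliberately ignores details such as rebiasing and lock release; it only encodes which transitions the walkthrough mentions.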

Why inflate locks at all? Is a lighter lock always better?

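First, upgrading step by step from lower-level to higher-level locks lets the JVM try the lighter lock first, which is generally more efficient.
However, the lighter locks have problems of their own:
- Biased lock: the lock flag bits are 01 (the same as the unlocked state), and the bias bit tells the two apart. It only suits the case where the same thread re-enters the lock, where it is very efficient. When other threads come for the lock and contention is heavy, the JVM has to revoke the bias and re-bias repeatedly, which is slow. So when several threads compete at the same time, the lock needs to be upgraded to stay efficient.
- Lightweight lock: the lock flag bits are 00. It is implemented with CAS spinning and is also fairly efficient. But CAS spinning wastes CPU time, so to limit the waste the JVM stops spinning once a (adaptive) spin count is reached. At that point the lock has to become a heavyweight lock.
So lighter locks (biased, lightweight) are not automatically faster; the right level depends on the workload.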
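The spin-then-block trade-off can be sketched in user-level Java. The class below is an illustration under stated assumptions, not the JVM's implementation: SpinThenBlockLock, its spinLimit parameter, and the demo method are all hypothetical. It tries a bounded number of CAS attempts and, if they all fail, falls back to blocking with wait/notifyAll, which stands in for the heavyweight path.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class SpinThenBlockLock {
    private final AtomicBoolean held = new AtomicBoolean(false);
    private final int spinLimit; // hypothetical fixed budget; the JVM's is adaptive

    SpinThenBlockLock(int spinLimit) { this.spinLimit = spinLimit; }

    void lock() {
        // Phase 1: bounded CAS spinning (cheap if the lock frees up quickly).
        for (int i = 0; i < spinLimit; i++) {
            if (held.compareAndSet(false, true)) return;
        }
        // Phase 2: stop burning CPU and block until an unlock wakes us.
        boolean interrupted = false;
        synchronized (this) {
            while (!held.compareAndSet(false, true)) {
                try {
                    wait();
                } catch (InterruptedException e) {
                    interrupted = true; // remember and keep waiting for the lock
                }
            }
        }
        if (interrupted) Thread.currentThread().interrupt();
    }

    void unlock() {
        held.set(false);
        synchronized (this) { notifyAll(); } // wake any blocked waiters
    }

    // Runs several threads that increment a shared counter under the lock.
    static int demo(int threads, int perThread) {
        SpinThenBlockLock lock = new SpinThenBlockLock(100);
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock.lock();
                    try { counter[0]++; } finally { lock.unlock(); }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return counter[0];
    }
}
```

A larger spinLimit favors short critical sections; a smaller one hands waiting threads to the blocking path sooner, which is the same tension the two bullets above describe.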
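JVM parameters
- Whether to use biased locking: -XX:+UseBiasedLocking (enabled by default in JDK 8; JEP 374 disabled it by default and deprecated it in JDK 15)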

Appendix: OpenJDK 8 source

Object lock states: hotspot/src/share/vm/oops/markOop.hpp

 enum { locked_value             = 0, // lightweight lock (stack-locked)
        unlocked_value           = 1, // unlocked
        monitor_value            = 2, // heavyweight lock
        marked_value             = 3, // GC mark
        biased_lock_pattern      = 5  // biased lock
 };

CAS operations: hotspot/src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp (Windows is analogous)

// CAS for int
inline jint     Atomic::cmpxchg    (jint     exchange_value, volatile jint*     dest, jint     compare_value) {
  int mp = os::is_MP();
  __asm__ volatile (LOCK_IF_MP(%4) "cmpxchgl %1,(%3)"
                    : "=a" (exchange_value)
                    : "r" (exchange_value), "a" (compare_value), "r" (dest), "r" (mp)
                    : "cc", "memory");
  return exchange_value;
}

// CAS for long
inline jlong    Atomic::cmpxchg    (jlong    exchange_value, volatile jlong*    dest, jlong    compare_value) {
  bool mp = os::is_MP();
  __asm__ __volatile__ (LOCK_IF_MP(%4) "cmpxchgq %1,(%3)"
                        : "=a" (exchange_value)
                        : "r" (exchange_value), "a" (compare_value), "r" (dest), "r" (mp)
                        : "cc", "memory");
  return exchange_value;
}
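On the Java side, the same compare-and-swap primitive is exposed through the java.util.concurrent.atomic classes. The sketch below is an illustration rather than anything from the HotSpot source; the class CasDemo and its methods are made-up names. It shows the usual retry loop and a CAS that fails because the expected value does not match.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasDemo {
    // Lock-free increment: read, compute, then publish only if nobody
    // changed the value in between; otherwise retry.
    static int casIncrement(AtomicInteger value) {
        while (true) {
            int expected = value.get();
            int updated = expected + 1;
            if (value.compareAndSet(expected, updated)) return updated;
        }
    }

    // A CAS with the wrong expected value fails and leaves the value alone.
    static boolean failedCasLeavesValue() {
        AtomicInteger v = new AtomicInteger(5);
        boolean swapped = v.compareAndSet(4, 9);
        return !swapped && v.get() == 5;
    }

    // Several threads increment concurrently; no update may be lost.
    static int demo(int threads, int perThread) {
        AtomicInteger value = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) casIncrement(value);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return value.get();
    }
}
```

Note that compareAndSet returns a boolean, whereas the cmpxchg functions above return the value that was found at the destination.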

Biased locking: hotspot/src/share/vm/runtime/synchronizer.cpp

// -----------------------------------------------------------------------------
//  Fast Monitor Enter/Exit
// This the fast monitor enter. The interpreter and compiler use
// some assembly copies of this code. Make sure update those code
// if the following function is changed. The implementation is
// extremely sensitive to race condition. Be careful.

void ObjectSynchronizer::fast_enter(Handle obj, BasicLock* lock, bool attempt_rebias, TRAPS) {
 if (UseBiasedLocking) { // JVM flag: -XX:+UseBiasedLocking (whether biased locking is enabled)
    if (!SafepointSynchronize::is_at_safepoint()) {
      // Try to acquire the biased lock
      BiasedLocking::Condition cond = BiasedLocking::revoke_and_rebias(obj, attempt_rebias, THREAD);
      if (cond == BiasedLocking::BIAS_REVOKED_AND_REBIASED) {
        return;
      }
    } else {
      assert(!attempt_rebias, "can not rebias toward VM thread");
      BiasedLocking::revoke_at_safepoint(obj);
    }
    assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now");
 }

 slow_enter (obj, lock, THREAD) ;
}

Lightweight locking: hotspot/src/share/vm/runtime/synchronizer.cpp

// -----------------------------------------------------------------------------
// Interpreter/Compiler Slow Case
// This routine is used to handle interpreter/compiler slow case
// We don't need to use fast path here, because it must have been
// failed in the interpreter/compiler code.
void ObjectSynchronizer::slow_enter(Handle obj, BasicLock* lock, TRAPS) {
  markOop mark = obj->mark(); // read the object's mark word
  assert(!mark->has_bias_pattern(), "should not see bias pattern here");

  if (mark->is_neutral()) { // is the object unlocked?
    // Anticipate successful CAS -- the ST of the displaced mark must
    // be visible <= the ST performed by the CAS.
    lock->set_displaced_header(mark); // save the mark word in this thread's lock record
    // CAS the object's mark word so that it points at this thread's lock record
    if (mark == (markOop) Atomic::cmpxchg_ptr(lock, obj()->mark_addr(), mark)) {
      TEVENT (slow_enter: release stacklock) ;
      return ;
    }
    // Fall through to inflate() ...
  } else
  if (mark->has_locker() && THREAD->is_lock_owned((address)mark->locker())) {
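    // Already stack-locked and the mark word points into the current thread's stack:
    // this is a re-entrant acquire, so continue into the synchronized code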
    assert(lock != mark->locker(), "must not re-lock the same lock");
    assert(lock != (BasicLock*)obj->mark(), "don't relock with same BasicLock");
    lock->set_displaced_header(NULL);
    return;
  }

#if 0
  // The following optimization isn't particularly useful.
  if (mark->has_monitor() && mark->monitor()->is_entered(THREAD)) {
    lock->set_displaced_header (NULL) ;
    return ;
  }
#endif

  // The object header will never be displaced to this lock,
  // so it does not matter what the value is, except that it
  // must be non-zero to avoid looking like a re-entrant lock,
  // and must not look locked either.
  lock->set_displaced_header(markOopDesc::unused_mark());
  ObjectSynchronizer::inflate(THREAD, obj())->enter(THREAD);
}
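The three outcomes of slow_enter (stack-lock acquired, re-entrant acquire, fall through to inflation) can be mimicked in a few lines of Java. This is a deliberately simplified model under assumptions of my own: the mark word is an AtomicReference, NEUTRAL is a placeholder for the unlocked header, and LockRecord, Result, and slowEnter are hypothetical names. It skips biased locking and the monitor itself.

```java
import java.util.concurrent.atomic.AtomicReference;

class SlowEnterModel {
    enum Result { STACK_LOCKED, RECURSIVE, NEEDS_INFLATE }

    // Stand-in for the BasicLock that lives on the owning thread's stack.
    static final class LockRecord {
        final Thread owner;
        Object displacedHeader;
        LockRecord(Thread owner) { this.owner = owner; }
    }

    // Placeholder for an unlocked (neutral) mark word.
    static final Object NEUTRAL = new Object();

    static Result slowEnter(AtomicReference<Object> mark, LockRecord lock) {
        Object current = mark.get();
        if (current == NEUTRAL) {
            // Save the header first, then try to swing the mark to our record.
            lock.displacedHeader = current;
            if (mark.compareAndSet(current, lock)) return Result.STACK_LOCKED;
            // CAS lost the race: fall through to inflation.
        } else if (current instanceof LockRecord
                && ((LockRecord) current).owner == lock.owner) {
            // The same thread already holds the stack lock: re-entrant acquire,
            // flagged by a null displaced header as in the original code.
            lock.displacedHeader = null;
            return Result.RECURSIVE;
        }
        return Result.NEEDS_INFLATE;
    }

    public static void main(String[] args) {
        AtomicReference<Object> mark = new AtomicReference<>(NEUTRAL);
        Thread me = Thread.currentThread();
        System.out.println(slowEnter(mark, new LockRecord(me)));
        System.out.println(slowEnter(mark, new LockRecord(me)));
        System.out.println(slowEnter(mark, new LockRecord(new Thread(() -> { }))));
    }
}
```

In this model a contending thread gets NEEDS_INFLATE immediately, which mirrors how the C++ code reaches inflate() after a single failed CAS.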

The lock inflation function

ObjectMonitor * ATTR ObjectSynchronizer::inflate (Thread * Self, oop object) {
  // Inflate mutates the heap ...
  // Relaxing assertion for bug 6320749.
  assert (Universe::verify_in_progress() ||
          !SafepointSynchronize::is_at_safepoint(), "invariant") ;

  for (;;) { // spin (retry loop)
      const markOop mark = object->mark() ;
      assert (!mark->has_bias_pattern(), "invariant") ;

      // The mark can be in one of the following states:
      // *  Inflated     - just return
      // *  Stack-locked - coerce it to inflated
      // *  INFLATING    - busy wait for conversion to complete
      // *  Neutral      - aggressively inflate the object.
      // *  BIASED       - Illegal.  We should never see this

      // CASE: inflated
      if (mark->has_monitor()) { // already a heavyweight lock?
          ObjectMonitor * inf = mark->monitor() ; // if so, fetch the monitor and return it
          assert (inf->header()->is_neutral(), "invariant");
          assert (inf->object() == object, "invariant") ;
          assert (ObjectSynchronizer::verify_objmon_isinpool(inf), "monitor is invalid");
          return inf ;
      }

      // CASE: inflation in progress - inflating over a stack-lock.
      // Some other thread is converting from stack-locked to inflated.
      // Only that thread can complete inflation -- other threads must wait.
      // The INFLATING value is transient.
      // Currently, we spin/yield/park and poll the markword, waiting for inflation to finish.
      // We could always eliminate polling by parking the thread on some auxiliary list.
      // If inflation is in progress (being performed by another thread), wait for it to finish
      if (mark == markOopDesc::INFLATING()) {
         TEVENT (Inflate: spin while INFLATING) ;
         ReadStableMark(object) ;
         continue ; // keep spinning
      }

      // CASE: stack-locked
      // Could be stack-locked either by this thread or by some other thread.
      //
      // Note that we allocate the objectmonitor speculatively, _before_ attempting
      // to install INFLATING into the mark word.  We originally installed INFLATING,
      // allocated the objectmonitor, and then finally STed the address of the
      // objectmonitor into the mark.  This was correct, but artificially lengthened
      // the interval in which INFLATED appeared in the mark, thus increasing
      // the odds of inflation contention.
      //
      // We now use per-thread private objectmonitor free lists.
      // These list are reprovisioned from the global free list outside the
      // critical INFLATING...ST interval.  A thread can transfer
      // multiple objectmonitors en-mass from the global free list to its local free list.
      // This reduces coherency traffic and lock contention on the global free list.
      // Using such local free lists, it doesn't matter if the omAlloc() call appears
      // before or after the CAS(INFLATING) operation.
      // See the comments in omAlloc().

      if (mark->has_locker()) { // is it a lightweight lock?
          ObjectMonitor * m = omAlloc (Self) ;
          // Optimistically prepare the objectmonitor - anticipate successful CAS
          // We do this before the CAS in order to minimize the length of time
          // in which INFLATING appears in the mark.
          m->Recycle();
          m->_Responsible  = NULL ;
          m->OwnerIsThread = 0 ;
          m->_recursions   = 0 ;
          m->_SpinDuration = ObjectMonitor::Knob_SpinLimit ;   // Consider: maintain by type/class
          
          // CAS the mark word to INFLATING to flag that inflation is in progress
          markOop cmp = (markOop) Atomic::cmpxchg_ptr (markOopDesc::INFLATING(), object->mark_addr(), mark) ;
          if (cmp != mark) {
             omRelease (Self, m, true) ;
             // Failed (the mark word changed, e.g. another thread began inflating); retry the loop
             continue ;       // Interference -- just retry
          }

          // We've successfully installed INFLATING (0) into the mark-word.
          // This is the only case where 0 will appear in a mark-work.
          // Only the singular thread that successfully swings the mark-word
          // to 0 can perform (or more precisely, complete) inflation.
          //
          // Why do we CAS a 0 into the mark-word instead of just CASing the
          // mark-word from the stack-locked value directly to the new inflated state?
          // Consider what happens when a thread unlocks a stack-locked object.
          // It attempts to use CAS to swing the displaced header value from the
          // on-stack basiclock back into the object header.  Recall also that the
          // header value (hashcode, etc) can reside in (a) the object header, or
          // (b) a displaced header associated with the stack-lock, or (c) a displaced
          // header in an objectMonitor.  The inflate() routine must copy the header
          // value from the basiclock on the owner's stack to the objectMonitor, all
          // the while preserving the hashCode stability invariants.  If the owner
          // decides to release the lock while the value is 0, the unlock will fail
          // and control will eventually pass from slow_exit() to inflate.  The owner
          // will then spin, waiting for the 0 value to disappear.   Put another way,
          // the 0 causes the owner to stall if the owner happens to try to
          // drop the lock (restoring the header from the basiclock to the object)
          // while inflation is in-progress.  This protocol avoids races that might
          // would otherwise permit hashCode values to change or "flicker" for an object.
          // Critically, while object->mark is 0 mark->displaced_mark_helper() is stable.
          // 0 serves as a "BUSY" inflate-in-progress indicator.


          // fetch the displaced mark from the owner's stack.
          // The owner can't die or unwind past the lock while our INFLATING
          // object is in the mark.  Furthermore the owner can't complete
          // an unlock on the object, either.
          markOop dmw = mark->displaced_mark_helper() ;
          assert (dmw->is_neutral(), "invariant") ;

          // Setup monitor fields to proper values -- prepare the monitor
          m->set_header(dmw) ;

          // Optimization: if the mark->locker stack address is associated
          // with this thread we could simply set m->_owner = Self and
          // m->OwnerIsThread = 1. Note that a thread can inflate an object
          // that it has stack-locked -- as might happen in wait() -- directly
          // with CAS.  That is, we can avoid the xchg-NULL .... ST idiom.
          m->set_owner(mark->locker());
          m->set_object(object);
          // TODO-FIXME: assert BasicLock->dhw != 0.

          // Must preserve store ordering. The monitor state must
          // be stable at the time of publishing the monitor address.
          guarantee (object->mark() == markOopDesc::INFLATING(), "invariant") ;
          object->release_set_mark(markOopDesc::encode(m));

          // Hopefully the performance counters are allocated on distinct cache lines
          // to avoid false sharing on MP systems ...
          if (ObjectMonitor::_sync_Inflations != NULL) ObjectMonitor::_sync_Inflations->inc() ;
          TEVENT(Inflate: overwrite stacklock) ;
          if (TraceMonitorInflation) {
            if (object->is_instance()) {
              ResourceMark rm;
              tty->print_cr("Inflating object " INTPTR_FORMAT " , mark " INTPTR_FORMAT " , type %s",
                (void *) object, (intptr_t) object->mark(),
                object->klass()->external_name());
            }
          }
          return m ;
      }

      // CASE: neutral
      // TODO-FIXME: for entry we currently inflate and then try to CAS _owner.
      // If we know we're inflating for entry it's better to inflate by swinging a
      // pre-locked objectMonitor pointer into the object header.   A successful
      // CAS inflates the object *and* confers ownership to the inflating thread.
      // In the current implementation we use a 2-step mechanism where we CAS()
      // to inflate and then CAS() again to try to swing _owner from NULL to Self.
      // An inflateTry() method that we could call from fast_enter() and slow_enter()
      // would be useful.
		
      // Neutral (unlocked) state
      assert (mark->is_neutral(), "invariant");
      ObjectMonitor * m = omAlloc (Self) ;
      // prepare m for installation - set monitor to initial state
      m->Recycle();
      m->set_header(mark);
      m->set_owner(NULL);
      m->set_object(object);
      m->OwnerIsThread = 1 ;
      m->_recursions   = 0 ;
      m->_Responsible  = NULL ;
      m->_SpinDuration = ObjectMonitor::Knob_SpinLimit ;       // consider: keep metastats by type/class

      if (Atomic::cmpxchg_ptr (markOopDesc::encode(m), object->mark_addr(), mark) != mark) {
          m->set_object (NULL) ;
          m->set_owner  (NULL) ;
          m->OwnerIsThread = 0 ;
          m->Recycle() ;
          omRelease (Self, m, true) ;
          m = NULL ;
          continue ;
          // interference - the markword changed - just retry.
          // The state-transitions are one-way, so there's no chance of
          // live-lock -- "Inflated" is an absorbing state.
      }

      // Hopefully the performance counters are allocated on distinct
      // cache lines to avoid false sharing on MP systems ...
      if (ObjectMonitor::_sync_Inflations != NULL) ObjectMonitor::_sync_Inflations->inc() ;
      TEVENT(Inflate: overwrite neutral) ;
      if (TraceMonitorInflation) {
        if (object->is_instance()) {
          ResourceMark rm;
          tty->print_cr("Inflating object " INTPTR_FORMAT " , mark " INTPTR_FORMAT " , type %s",
            (void *) object, (intptr_t) object->mark(),
            object->klass()->external_name());
        }
      }
      return m ;
  }
}

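On adaptive spinning
- See _SpinDuration in hotspot/src/share/vm/runtime/objectMonitor.cpp. Note that in the slow_enter code above a failed CAS falls through to inflate() rather than looping, and that the spin budget (_SpinDuration) is a field of ObjectMonitor.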
