jvm blog

https://gist.github.com/pandening/8b941997851ff6bec29d83a9af821602 

 

First, let's explore how GC starts working, or rather, in what form GC actually runs. When a Java application starts, a JVM process is created; internally this happens in create_vm, which does a large amount of setup work and then launches the Java application's main method on the main thread. One spot in create_vm is worth a closer look; the following snippet is from create_vm in thread.cpp:

  // Create the VMThread
  { TraceTime timer("Start VMThread", TRACETIME_LOG(Info, startuptime));

  VMThread::create();
    Thread* vmthread = VMThread::vm_thread();

    if (!os::create_thread(vmthread, os::vm_thread)) {
      vm_exit_during_initialization("Cannot create VM thread. "
                                    "Out of system resources.");
    }

    // Wait for the VM thread to become ready, and VMThread::run to initialize
    // Monitors can have spurious returns, must always check another state flag
    {
      MutexLocker ml(Notify_lock);
      os::start_thread(vmthread);
      while (vmthread->active_handles() == NULL) {
        Notify_lock->wait();
      }
    }
  }

VMThread is a special JVM thread used to execute operations such as GC. (A Java-level Thread corresponds to a JavaThread inside the JVM; that mapping is a topic for later.) In the snippet above, the first call to look at is VMThread::create(), implemented in vmThread.cpp:

void VMThread::create() {
  assert(vm_thread() == NULL, "we can only allocate one VMThread");
  _vm_thread = new VMThread();

  // Create VM operation queue
  _vm_queue = new VMOperationQueue();
  guarantee(_vm_queue != NULL, "just checking");

  _terminate_lock = new Monitor(Mutex::safepoint, "VMThread::_terminate_lock", true,
                                Monitor::_safepoint_check_never);

  if (UsePerfData) {
    // jvmstat performance counters
    Thread* THREAD = Thread::current();
    _perf_accumulated_vm_operation_time =
                 PerfDataManager::create_counter(SUN_THREADS, "vmOperationTime",
                                                 PerfData::U_Ticks, CHECK);
  }
}

Besides new-ing a VMThread instance, create() also builds a VMOperationQueue for it. VMThread has an important member called _vm_queue; here is its definition:

  static VMOperationQueue* _vm_queue;           // Queue (w/ policy) of VM operations

Going by the comment, this queue can be understood as the VMThread's task queue, except that the tasks it holds must be VM operations and nothing else. So what is a VM operation? There is a base class VM_Operation, and one of its subclasses, VM_GC_Operation, is dedicated to GC work; for example, when an object allocation fails, a VM_CollectForAllocation operation is created to run a GC. _vm_queue stores these operations, and the VMThread keeps checking the queue for work to execute. The model resembles a degenerate thread pool: the pool has exactly one worker, the VMThread, and _vm_queue is its task queue.
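To make that model concrete, here is a minimal, self-contained sketch of a single-worker operation queue in portable C++ (std::mutex / std::condition_variable). It only illustrates the pattern; the names ToyVMThread and ToyVMOperation are invented and this is not HotSpot's VMThread/VMOperationQueue code:

#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// Toy stand-in for VM_Operation: just something with a doit().
struct ToyVMOperation {
  std::function<void()> doit;
};

class ToyVMThread {
 public:
  ToyVMThread() : worker_([this] { loop(); }) {}

  // Other threads hand operations to the single worker (cf. VMThread::execute).
  void execute(ToyVMOperation op) {
    {
      std::lock_guard<std::mutex> lk(mu_);
      queue_.push_back(std::move(op));
    }
    cv_.notify_one();
  }

  void shutdown() {
    {
      std::lock_guard<std::mutex> lk(mu_);
      stopping_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }

 private:
  // The worker loop: sleep while the queue is empty, otherwise pop and run.
  void loop() {
    for (;;) {
      ToyVMOperation op;
      {
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [this] { return stopping_ || !queue_.empty(); });
        if (stopping_ && queue_.empty()) return;
        op = std::move(queue_.front());
        queue_.pop_front();
      }
      op.doit();  // cf. VMThread calling evaluate_operation -> VM_Operation::evaluate -> doit
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<ToyVMOperation> queue_;
  bool stopping_ = false;
  std::thread worker_;
};

A caller thread would then submit work much as mem_allocate_work submits a VM operation, e.g. vm.execute({[] { /* run a collection */ }});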
Once the VMThread object is created, create_vm waits for the VMThread to actually start; the sign that it is up and running is that vmthread->active_handles() is no longer NULL, which happens once VMThread::run has begun executing. A note on threading: a JVM thread is implemented by binding an OS thread to it, so every JVM thread needs an OS thread created for it, and how that OS thread is created differs per platform (on macOS, for example, the BSD code path is used). Looking back at the snippet in create_vm, os::create_thread(vmthread, os::vm_thread) creates the OS thread; inside create_thread in os_bsd.cpp the new OS thread is created and bound to the freshly built VMThread. On virtually every OS, creating a thread requires an entry function so the OS knows what code to run once the thread exists; the relevant fragment in create_thread is:

  pthread_t tid;
    int ret = pthread_create(&tid, &attr, (void* (*)(void*)) thread_native_entry, thread);

thread_native_entry is the entry point just mentioned; inside it you can see VMThread's run method being invoked, at which point create_vm can continue.
Now back to VMThread::run. It performs some thread initialization, such as setting the thread name and priority, and then calls a key method, loop, which is essentially the VMThread repeatedly pulling operations from its own task queue _vm_queue and executing them. The key steps of loop are examined below.

  • (1) Fetch the next operation via remove_next; if the queue currently has nothing to execute, remove_next returns NULL. Here is its implementation:
VM_Operation* VMOperationQueue::remove_next() {
  // Assuming VMOperation queue is two-level priority queue. If there are
  // more than two priorities, we need a different scheduling algorithm.
  assert(SafepointPriority == 0 && MediumPriority == 1 && nof_priorities == 2,
         "current algorithm does not work");

  // simple counter based scheduling to prevent starvation of lower priority
  // queue. -- see 4390175
  int high_prio, low_prio;
  if (_queue_counter++ < 10) {
      high_prio = SafepointPriority;
      low_prio  = MediumPriority;
  } else {
      _queue_counter = 0;
      high_prio = MediumPriority;
      low_prio  = SafepointPriority;
  }

  return queue_remove_front(queue_empty(high_prio) ? low_prio : high_prio);
}

VM_Operation* VMOperationQueue::queue_remove_front(int prio) {
  if (queue_empty(prio)) return NULL;
  assert(_queue_length[prio] >= 0, "sanity check");
  _queue_length[prio]--;
  VM_Operation* r = _queue[prio]->next();
  assert(r != _queue[prio], "cannot remove base element");
  unlink(r);
  return r;
}
  • (2) If the queue turns out to be empty, the VMThread does not spin busily; it puts itself into a wait state.

[screenshot of the VMThread::loop wait-on-empty-queue code omitted]

  • (3) After waiting in step (2), the VMThread is eventually woken up (notified) by a thread that has enqueued an operation, and loop continues. Some operations must be executed at a safepoint, a full GC for example; this is achieved by bracketing the work with SafepointSynchronize::begin() and SafepointSynchronize::end(), like this:
SafepointSynchronize::begin();

// safepoint code

SafepointSynchronize::end();

Either way, the next step is to execute the operation taken from the queue, so evaluate_operation(_cur_vm_operation) is what we should look at next. Inside evaluate_operation, evaluate() is called:

void VM_Operation::evaluate() {
  ResourceMark rm;
  outputStream* debugstream;
  bool enabled = log_is_enabled(Debug, vmoperation);
  if (enabled) {
    debugstream = Log(vmoperation)::debug_stream();
    debugstream->print("begin ");
    print_on_error(debugstream);
    debugstream->cr();
  }
  doit();
  if (enabled) {
    debugstream->print("end ");
    print_on_error(debugstream);
    debugstream->cr();
  }
}

The key call is doit(), which contains the actual work of the operation. Every operation type has its own doit; even among GC operations there are several variants. For example, the doit of VM_GenCollectForAllocation, mentioned above, does the following:

void VM_GenCollectForAllocation::doit() {
  SvcGCMarker sgcm(SvcGCMarker::MINOR);

  GenCollectedHeap* gch = GenCollectedHeap::heap();
  GCCauseSetter gccs(gch, _gc_cause);
  _result = gch->satisfy_failed_allocation(_word_size, _tlab);
  assert(gch->is_in_reserved_or_null(_result), "result not in heap");

  if (_result == NULL && GCLocker::is_active_and_needs_gc()) {
    set_gc_locked();
  }
}

gch->satisfy_failed_allocation exists precisely to deal with a failed allocation; its comment in the header reads:

  // Callback from VM_GenCollectForAllocation operation.
  // This function does everything necessary/possible to satisfy an
  // allocation request that failed in the youngest generation that should
  // have handled it (including collection, expansion, etc.)
  HeapWord* satisfy_failed_allocation(size_t size, bool is_tlab);

This function is called back while VM_GenCollectForAllocation is being executed, i.e. from its doit. It does whatever it can (garbage collection, heap expansion, and so on) to satisfy an allocation request. Of course, before this callback runs, an allocation has already been attempted and has already failed, so the function must try everything it reasonably can to free up or acquire space for that failed request. The CollectorPolicy class implements the collection policy, i.e. when to collect and what kind of collection to run, and is well worth reading. Let's look at what satisfy_failed_allocation actually does.

[screenshot of the GCLocker branch of satisfy_failed_allocation omitted]

As the figure above showed, if the GC locker is active, meaning another thread has already triggered a GC, the strategy here is simply to expand the heap to satisfy the allocation.

[screenshot of the incremental-collection branch omitted]

Look at the if condition: if an incremental collection is safe, an incremental collection is performed. "Incremental" here means escalating from light to heavy: first a minor GC, then a full GC, and finally a full GC that also clears soft references. The screenshot above corresponds to the first case, a minor GC followed by a retry of the allocation; if the retry succeeds we stop, otherwise a full GC that clears soft references is needed. Since soft references came up, a short digression: Java has four reference strengths, strong > soft > weak > phantom. A strongly referenced object is never collected while it is still referenced. A softly referenced object is different: if the JVM has tried to GC its way out of memory pressure and still cannot satisfy an allocation, softly referenced objects are reclaimed. Code using soft references therefore must not depend on the referent staying alive, which is exactly why they suit caching scenarios. A weak reference is weaker still; its referent survives only until the next GC. A phantom reference is weakest of all, so weak that you can never retrieve the referent through it; its only use is to let you learn that the object has been collected. The code below shows the two full-GC cases:

[screenshot of the full GC that keeps soft references omitted]

[screenshot of the full GC that clears all soft references omitted]

** To summarize the response to an allocation failure: first check whether another thread has already triggered a GC; if so, do not start another GC but try to expand the heap instead. Otherwise, check whether an incremental collection is possible; if it is, run a minor GC, otherwise run a full GC that does not clear soft references. Then check whether the allocation can now be satisfied; if yes, stop here, otherwise run a thorough full GC that also reclaims soft references. **
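That escalation can be captured in a small piece of C++. This is only a rough sketch of the decision flow described above; every function name in it (try_allocate, do_minor_gc, and so on) is a placeholder, not a real HotSpot signature:

#include <cstddef>

typedef char* HeapWord;

// Hypothetical placeholders for the behavior described above; not HotSpot functions.
bool another_thread_already_triggered_gc();
bool incremental_collection_is_safe();
void do_minor_gc();
void do_full_gc(bool clear_all_soft_refs);
HeapWord try_allocate(std::size_t size, bool is_tlab);
HeapWord expand_heap_and_allocate(std::size_t size, bool is_tlab);

HeapWord satisfy_failed_allocation_sketch(std::size_t size, bool is_tlab) {
  if (another_thread_already_triggered_gc()) {
    // Another GC is pending: do not start a second one, just try to grow the heap.
    return expand_heap_and_allocate(size, is_tlab);
  }

  if (incremental_collection_is_safe()) {
    do_minor_gc();                              // lightest option first
  } else {
    do_full_gc(/* clear_all_soft_refs = */ false);
  }
  HeapWord result = try_allocate(size, is_tlab);
  if (result != nullptr) {
    return result;
  }

  // Last resort: a full GC that also clears soft references, then retry.
  do_full_gc(/* clear_all_soft_refs = */ true);
  return try_allocate(size, is_tlab);
}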

One detail was glossed over in step (3): the VMThread drains every operation currently in the queue before moving on to the code that follows. Finally, let's see where operations actually get put into the VMThread's queue. Taking VM_GenCollectForAllocation again, mem_allocate_work in collectorPolicy.cpp contains code like this:

    VM_GenCollectForAllocation op(size, is_tlab, gc_count_before);
    VMThread::execute(&op);

Back in VMThread, the execute method shows the following detail:

[screenshot of VMThread::execute enqueueing the operation omitted]

That gives a rough picture of how a GC task gets run; the actual GC details still need to be worked through.

pandening commented on 11 Nov 2018:

This comment analyzes the execution flow of GenCollectedHeap::do_collection; as the name suggests, this is the function that actually performs a collection. Here is its declaration (a declaration merely announces that such a function exists, whereas the definition supplies its implementation):

  // Helper function for two callbacks below.
  // Considers collection of the first max_level+1 generations.
  void do_collection(bool           full,
                     bool           clear_all_soft_refs,
                     size_t         size,
                     bool           is_tlab,
                     GenerationType max_generation);

The parameter full says whether this is a full GC; clear_all_soft_refs says whether soft references should be reclaimed. size deserves a little more explanation: in some cases GC is triggered by an allocation failure, and size is the size of that failed allocation request. is_tlab says whether the request was a TLAB request (thread-local allocation buffers let threads avoid contending on the heap when allocating), and max_generation is the oldest generation to collect, either YoungGen or OldGen. Let's walk through the function.

  • (1) First, some basic checks: we must be at a safepoint, the caller must be the GC/VM thread, and no other thread may already have triggered a GC; only then does the rest of the code run.

[screenshot of the entry checks in do_collection omitted]

  • (2) Next, the policy for this cycle is derived: whether to clear soft references, and whether to collect the young and/or old generation. complete indicates whether the whole heap is collected; old_collects_young indicates whether the old-generation (full) collection will also take care of the young generation, i.e. whether a separate young collection is needed. The JVM flag ScavengeBeforeFullGC controls whether a separate young GC is run before the full collection; when it is set, the young generation is scavenged on its own first rather than being swept up by the old-generation collection. do_young_collection then says whether that separate young collection should actually run, i.e. whether the young generation wants collecting. (A rough sketch of these decisions appears after this list.)

[screenshot of the policy-flag computation omitted]

  • (3) Now that we know which generations to collect, we collect them. If do_young_collection is true a young GC is performed; collect_generation is the entry point that actually collects a given generation, and it is analyzed further below. The old generation is then checked in the same way and collected if needed.

[screenshot of the young-generation collection call omitted]

[screenshot of the old-generation collection call omitted]

  • (4) After collection, the generation sizes are recomputed, because the heap may have been expanded during the GC. This is done by compute_new_size, which adjusts the various generation pointers so that the generations keep working correctly after resizing.
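As referenced in step (2), here is a minimal sketch of how those policy flags relate to each other. It paraphrases the logic described above with invented helper names; only ScavengeBeforeFullGC corresponds to a real JVM flag, and this is not the actual body of do_collection:

#include <cstddef>

enum GenerationType { YoungGen, OldGen };

// Hypothetical helpers; ScavengeBeforeFullGC is modeled as a plain bool here.
bool young_gen_should_collect(bool full, std::size_t size, bool is_tlab);
bool policy_says_clear_all_soft_refs();
extern bool ScavengeBeforeFullGC;

struct CollectionPlan {
  bool complete;               // does this cycle cover the whole heap?
  bool clear_soft_refs;
  bool do_young_collection;    // run a separate young (minor) collection?
  bool collect_old;
};

CollectionPlan plan_collection_sketch(bool full, bool clear_all_soft_refs,
                                      std::size_t size, bool is_tlab,
                                      GenerationType max_generation) {
  CollectionPlan plan;
  plan.collect_old = (max_generation == OldGen);
  plan.complete = full && plan.collect_old;
  plan.clear_soft_refs = clear_all_soft_refs || policy_says_clear_all_soft_refs();

  // When the whole heap is collected and ScavengeBeforeFullGC is off, the old
  // (full) collection sweeps up the young generation too; otherwise the young
  // generation is scavenged separately, if it wants collecting at all.
  bool old_collects_young = plan.complete && !ScavengeBeforeFullGC;
  plan.do_young_collection =
      !old_collects_young && young_gen_should_collect(full, size, is_tlab);
  return plan;
}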

Now let's look at some of the key functions that appeared in the steps above, starting with should_collect, which decides whether a given generation needs to be collected:

  // Returns "true" iff collect() should subsequently be called on this
  // this generation. See comment below.
  // This is a generic implementation which can be overridden.
  //
  // Note: in the current (1.4) implementation, when genCollectedHeap's
  // incremental_collection_will_fail flag is set, all allocations are
  // slow path (the only fast-path place to allocate is DefNew, which
  // will be full if the flag is set).
  // Thus, older generations which collect younger generations should
  // test this flag and collect if it is set.
  virtual bool should_collect(bool   full,
                              size_t word_size,
                              bool   is_tlab) {
    return (full || should_allocate(word_size, is_tlab));
  }

For a full GC, every generation should be collected. Otherwise should_allocate is consulted, i.e. whether this generation is one that should have handled an allocation of that size. For DefNew, the young generation of the Serial GC, it is implemented as:

  // Allocation support
  virtual bool should_allocate(size_t word_size, bool is_tlab) {
    assert(UseTLAB || !is_tlab, "Should not allocate tlab");

    size_t overflow_limit    = (size_t)1 << (BitsPerSize_t - LogHeapWordSize);

    const bool non_zero      = word_size > 0;
    const bool overflows     = word_size >= overflow_limit;
    const bool check_too_big = _pretenure_size_threshold_words > 0;
    const bool not_too_big   = word_size < _pretenure_size_threshold_words;
    const bool size_ok       = is_tlab || !check_too_big || not_too_big;

    bool result = !overflows &&
                  non_zero   &&
                  size_ok;

    return result;
  }

The next important function is collect_generation, which collects a given generation; the key fragment is below:

// Do collection work
  {
    // Note on ref discovery: For what appear to be historical reasons,
    // GCH enables and disabled (by enqueing) refs discovery.
    // In the future this should be moved into the generation's
    // collect method so that ref discovery and enqueueing concerns
    // are local to a generation. The collect method could return
    // an appropriate indication in the case that notification on
    // the ref lock was needed. This will make the treatment of
    // weak refs more uniform (and indeed remove such concerns
    // from GCH). XXX

    HandleMark hm;  // Discard invalid handles created during gc
    save_marks();   // save marks for all gens
    // We want to discover references, but not process them yet.
    // This mode is disabled in process_discovered_references if the
    // generation does some collection work, or in
    // enqueue_discovered_references if the generation returns
    // without doing any work.
    ReferenceProcessor* rp = gen->ref_processor();
    // If the discovery of ("weak") refs in this generation is
    // atomic wrt other collectors in this configuration, we
    // are guaranteed to have empty discovered ref lists.
    if (rp->discovery_is_atomic()) {
      rp->enable_discovery();
      rp->setup_policy(clear_soft_refs);
    } else {
      // collect() below will enable discovery as appropriate
    }
    gen->collect(full, clear_soft_refs, size, is_tlab);
    if (!rp->enqueuing_is_done()) {
      rp->enqueue_discovered_references();
    } else {
      rp->set_enqueuing_is_done(false);
    }
    rp->verify_no_references_recorded();
  }

Next comes the gen->collect call, which is where the actual collection work happens; below we analyze the DefNew implementation of collect.

  • (1) DefNew is the young generation of the Serial GC. It first decides whether this collection should instead be left to the old generation, using collection_attempt_is_safe, i.e. whether collecting this generation is "safe". Safe here means: after DefNew is collected, can the old generation fully absorb the objects promoted by this minor GC? If not, DefNew considers its own GC unsafe and defers to the old generation, which is also the most appropriate choice: the old-generation GC is a much bigger affair and includes compaction, which avoids the situation where a large object promoted from the young generation cannot find contiguous space. Old-generation collectors almost always include a compaction phase, which is one reason an old-generation GC typically costs on the order of ten times a young GC. The young generation uses a copying algorithm, which is very fast, helped by the fact that young objects tend to die young, unlike the long-lived objects in the old generation; that difference is exactly the point of generational collection.

[screenshot of the collection_attempt_is_safe check in DefNewGeneration::collect omitted]

collection_attempt_is_safe is implemented as follows:

bool DefNewGeneration::collection_attempt_is_safe() {
  if (!to()->is_empty()) {
    log_trace(gc)(":: to is not empty ::");
    return false;
  }
  if (_old_gen == NULL) {
    GenCollectedHeap* gch = GenCollectedHeap::heap();
    _old_gen = gch->old_gen();
  }
  return _old_gen->promotion_attempt_is_safe(used());
}

Normally one of the two survivor spaces in the young generation is always empty, but under some circumstances both can end up non-empty. After a young GC, DefNew copies the live objects from Eden and From into To, promotes the objects that qualify into the old generation, and then swaps the roles of the two survivor spaces, so in the normal case one survivor is always empty. That, however, assumes the young GC completed cleanly; when a promotion failure occurs, From and To are not cleaned up (more on this later). What is certain is that if To is not empty, the previous young GC did not go smoothly, and DefNew decides it is not worth risking another minor GC that may achieve little, since that minor GC might itself have to be followed by a full GC; based on its own history, DefNew tells the old generation that a more thorough collection is warranted. If no promotion failure has happened, the old generation then gets to judge whether this minor GC may proceed, via _old_gen->promotion_attempt_is_safe. Its implementation:

bool TenuredGeneration::promotion_attempt_is_safe(size_t max_promotion_in_bytes) const {
  size_t available = max_contiguous_available();
  size_t av_promo  = (size_t)gc_stats()->avg_promoted()->padded_average();
  bool   res = (available >= av_promo) || (available >= max_promotion_in_bytes);

  log_trace(gc)("Tenured: promo attempt is%s safe: available(" SIZE_FORMAT ") %s av_promo(" SIZE_FORMAT "), max_promo(" SIZE_FORMAT ")",
    res? "":" not", available, res? ">=":"<", av_promo, max_promotion_in_bytes);

  return res;
}

The old generation also consults historical data: if its largest contiguous free block is at least the historical average promotion size, or at least the space currently used by the young generation, it deems this minor GC safe and no full GC is needed. There is some gambling involved: if after some minor GC the objects that qualify for promotion turn out to be far larger than the average promotion size, and the old generation's contiguous free space cannot hold them, the "promotion failed" scenario described above occurs and a full GC has to follow. (For example, with 10 MB of contiguous old-generation space and a 4 MB average promotion the check passes, yet a collection that suddenly tries to promote 12 MB will still fail.)

  • (2) The next key step is to determine which objects are live and move them to the right place, either the To space or the old generation.

[screenshot of the root-scanning and evacuation setup in DefNewGeneration::collect omitted]

FastEvacuateFollowersClosure drives a recursive process (the Closure suffix indicates a callback): liveness checking and copying proceed recursively, first finding the root objects, then marking the objects reachable from them and moving them into the appropriate space. gch->young_process_roots is the function that relocates the root-reachable objects:

void GenCollectedHeap::young_process_roots(StrongRootsScope* scope,
                                           OopsInGenClosure* root_closure,
                                           OopsInGenClosure* old_gen_closure,
                                           CLDClosure* cld_closure) {
  MarkingCodeBlobClosure mark_code_closure(root_closure, CodeBlobToOopClosure::FixRelocations);

  process_roots(scope, SO_ScavengeCodeCache, root_closure, root_closure,
                cld_closure, cld_closure, &mark_code_closure);
  process_string_table_roots(scope, root_closure);

  if (!_process_strong_tasks->is_task_claimed(GCH_PS_younger_gens)) {
    root_closure->reset_generation();
  }

  // When collection is parallel, all threads get to cooperate to do
  // old generation scanning.
  old_gen_closure->set_generation(_old_gen);
  rem_set()->younger_refs_iterate(_old_gen, old_gen_closure, scope->n_threads());
  old_gen_closure->reset_generation();

  _process_strong_tasks->all_tasks_completed(scope->n_threads());
}

The key function here is process_roots, which invokes the various closures that were set up, FastScanClosure for example; the actual callback work happens in the closure's do_oop_work:

// NOTE! Any changes made here should also be made
// in ScanClosure::do_oop_work()
template <class T> inline void FastScanClosure::do_oop_work(T* p) {
  T heap_oop = oopDesc::load_heap_oop(p);
  // Should we copy the obj?
  if (!oopDesc::is_null(heap_oop)) {
    oop obj = oopDesc::decode_heap_oop_not_null(heap_oop);
    if ((HeapWord*)obj < _boundary) {
      assert(!_g->to()->is_in_reserved(obj), "Scanning field twice?");
      oop new_obj = obj->is_forwarded() ? obj->forwardee()
                                        : _g->copy_to_survivor_space(obj);
      oopDesc::encode_store_heap_oop_not_null(p, new_obj);
      if (is_scanning_a_klass()) {
        do_klass_barrier();
      } else if (_gc_barrier) {
        // Now call parent closure
        do_barrier(p);
      }
    }
  }
}

If the object has already been copied (forwarded), it is not copied again; otherwise copy_to_survivor_space copies it into the To space. Its implementation:

oop DefNewGeneration::copy_to_survivor_space(oop old) {
  assert(is_in_reserved(old) && !old->is_forwarded(),
         "shouldn't be scavenging this oop");
  size_t s = old->size();
  oop obj = NULL;

  // Try allocating obj in to-space (unless too old)
  if (old->age() < tenuring_threshold()) {
    obj = (oop) to()->allocate_aligned(s);
  }

  // Otherwise try allocating obj tenured
  if (obj == NULL) {
    obj = _old_gen->promote(old, s);
    if (obj == NULL) {
      handle_promotion_failure(old);
      return old;
    }
  } else {
    // Prefetch beyond obj
    const intx interval = PrefetchCopyIntervalInBytes;
    Prefetch::write(obj, interval);

    // Copy obj
    Copy::aligned_disjoint_words((HeapWord*)old, (HeapWord*)obj, s);

    // Increment age if obj still in new generation
    obj->incr_age();
    age_table()->add(obj, s);
  }

  // Done, insert forward pointer to obj in this header
  old->forward_to(obj);

  return obj;
}

The flow is roughly: first check whether the object has reached the tenuring age threshold; if it has, it is copied into the old generation, otherwise it is copied into the To space. One detail: if an object has not reached the tenuring threshold but cannot be copied into To, promotion to the old generation is attempted anyway, i.e. the object is promoted early. Promotion can fail, in which case handle_promotion_failure deals with it. If the object is successfully copied into To, its age is incremented. Finally, the old copy's header is updated with a forwarding pointer to the new location. Let's first look at promote, which moves an object into the old generation:

// Ignores "ref" and calls allocate().
oop Generation::promote(oop obj, size_t obj_size) {
  assert(obj_size == (size_t)obj->size(), "bad obj_size passed in");

#ifndef PRODUCT
  if (GenCollectedHeap::heap()->promotion_should_fail()) {
    return NULL;
  }
#endif  // #ifndef PRODUCT

  HeapWord* result = allocate(obj_size, false);
  if (result != NULL) {
    Copy::aligned_disjoint_words((HeapWord*)obj, result, obj_size);
    return oop(result);
  } else {
    GenCollectedHeap* gch = GenCollectedHeap::heap();
    return gch->handle_failed_promotion(this, obj, obj_size);
  }
}

This function is simple: allocate tries to reserve enough old-generation memory to hold the object; if that succeeds, the object is copied there, otherwise handle_failed_promotion takes over. When promotion fails, handle_failed_promotion runs before handle_promotion_failure; both sound like promotion-failure handling, so let's look at handle_failed_promotion first:

oop GenCollectedHeap::handle_failed_promotion(Generation* old_gen,
                                              oop obj,
                                              size_t obj_size) {
  guarantee(old_gen == _old_gen, "We only get here with an old generation");
  assert(obj_size == (size_t)obj->size(), "bad obj_size passed in");
  HeapWord* result = NULL;

  result = old_gen->expand_and_allocate(obj_size, false);

  if (result != NULL) {
    Copy::aligned_disjoint_words((HeapWord*)obj, result, obj_size);
  }
  return oop(result);
}

As you can see, the old generation tries to expand its own space so that more young objects can be promoted, but in many configurations the heap cannot grow, in which case this does nothing useful. handle_promotion_failure is then called, which amounts to the old generation explicitly telling the young generation that this object cannot be placed there. Let's see how handle_promotion_failure responds:

void DefNewGeneration::handle_promotion_failure(oop old) {
  log_debug(gc, promotion)("Promotion failure size = %d) ", old->size());

  _promotion_failed = true;
  _promotion_failed_info.register_copy_failure(old->size());
  _preserved_marks_set.get()->push_if_necessary(old, old->mark());
  // forward to self
  old->forward_to(old);

  _promo_failure_scan_stack.push(old);

  if (!_promo_failure_drain_in_progress) {
    // prevent recursion in copy_to_survivor_space()
    _promo_failure_drain_in_progress = true;
    drain_promo_failure_scan_stack();
    _promo_failure_drain_in_progress = false;
  }
}

DefNew stays fairly optimistic here: since the old generation cannot take the object, it simply stays in the young generation; perhaps the next old-generation GC will make room for it. At this point _promotion_failed is also set to true, a flag that matters later: after a promotion failure, From may contain objects that failed to promote yet are not garbage, leaving both From and To non-empty, which is exactly the awkward situation described earlier.

Next comes the recursive mark-and-copy phase, evacuate_followers.do_void(). It is quite involved; here is a brief look:

void DefNewGeneration::FastEvacuateFollowersClosure::do_void() {
  do {
    _gch->oop_since_save_marks_iterate(GenCollectedHeap::YoungGen, _scan_cur_or_nonheap, _scan_older);
  } while (!_gch->no_allocs_since_save_marks());
  guarantee(_young_gen->promo_failure_scan_is_complete(), "Failed to finish scan");
}

It repeatedly calls oop_since_save_marks_iterate to do the recursive traversal; the termination condition is decided by no_allocs_since_save_marks, implemented as follows:

bool GenCollectedHeap::no_allocs_since_save_marks() {
  return _young_gen->no_allocs_since_save_marks() &&
         _old_gen->no_allocs_since_save_marks();
}

As the name suggests, it checks that no further allocations have happened since the marks were saved; DefNew's version, for example:

bool DefNewGeneration::no_allocs_since_save_marks() {
  assert(eden()->saved_mark_at_top(), "Violated spec - alloc in eden");
  assert(from()->saved_mark_at_top(), "Violated spec - alloc in from");
  return to()->saved_mark_at_top();
}

top() points at the start of the free space in To. As described above, the root objects are first marked and copied into To or the old generation; the objects already sitting in To are live, and the objects they reference must be traversed recursively and copied as well. saved_mark_at_top checks whether objects are still being copied into To: if they are, the GC is not finished and oop_since_save_marks_iterate runs again, otherwise the loop can stop. Here is oop_since_save_marks_iterate:

#define ContigSpace_OOP_SINCE_SAVE_MARKS_DEFN(OopClosureType, nv_suffix)  \
                                                                          \
void ContiguousSpace::                                                    \
oop_since_save_marks_iterate##nv_suffix(OopClosureType* blk) {            \
  HeapWord* t;                                                            \
  HeapWord* p = saved_mark_word();                                        \
  assert(p != NULL, "expected saved mark");                               \
                                                                          \
  const intx interval = PrefetchScanIntervalInBytes;                      \
  do {                                                                    \
    t = top();                                                            \
    while (p < t) {                                                       \
      Prefetch::write(p, interval);                                       \
      debug_only(HeapWord* prev = p);                                     \
      oop m = oop(p);                                                     \
      p += m->oop_iterate_size(blk);                                      \
    }                                                                     \
  } while (t < top());                                                    \
                                                                          \
  set_saved_mark_word(p);                                                 \
}

ALL_SINCE_SAVE_MARKS_CLOSURES(ContigSpace_OOP_SINCE_SAVE_MARKS_DEFN)

Going deeper gets rather involved and is not analyzed further here, but one point deserves attention: when DefNew copies live objects into To, liveness of objects in Eden and From is not judged only by reachability within the young generation; the old generation may hold cross-generational references into the young generation, and objects reachable that way must also be moved to To or promoted.
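HotSpot finds those old-to-young references through a remembered set (a card table) rather than by scanning the whole old generation; the rem_set()->younger_refs_iterate call in young_process_roots above is that scan. The following is a minimal, generic card-table sketch to illustrate the idea only; the class and names are invented and do not mirror HotSpot's CardTableRS:

#include <cstdint>
#include <vector>

// Generic card-table sketch: the old generation is divided into fixed-size
// "cards"; a card is dirtied whenever a reference field inside it is written.
// During a young GC, only dirty cards are scanned for old-to-young pointers.
class ToyCardTable {
 public:
  static const std::size_t kCardShift = 9;   // 512-byte cards

  ToyCardTable(uintptr_t old_gen_start, std::size_t old_gen_bytes)
      : start_(old_gen_start),
        cards_((old_gen_bytes >> kCardShift) + 1, kClean) {}

  // Write barrier: conceptually called after every reference store into the old gen.
  void post_write_barrier(uintptr_t field_addr) {
    cards_[index_for(field_addr)] = kDirty;
  }

  // Young GC: visit only the dirty cards and let the caller scan the objects
  // covering them for pointers into the young generation.
  template <typename ScanCardFn>
  void scan_dirty_cards(ScanCardFn scan_card) {
    for (std::size_t i = 0; i < cards_.size(); i++) {
      if (cards_[i] == kDirty) {
        scan_card(start_ + (i << kCardShift), std::size_t(1) << kCardShift);
        cards_[i] = kClean;   // re-dirtied later if old-to-young refs remain
      }
    }
  }

 private:
  static const uint8_t kClean = 0;
  static const uint8_t kDirty = 1;

  std::size_t index_for(uintptr_t addr) const {
    return (addr - start_) >> kCardShift;
  }

  uintptr_t start_;
  std::vector<uint8_t> cards_;
};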

  • (3) Next, the references discovered during the GC are processed, e.g. deciding whether to clear soft references and handling weak references.

[screenshot of the reference-processing code omitted]

  • (4) At this point the collection itself is essentially done and some cleanup follows. If a promotion failure occurred during the minor GC, special handling is needed: the objects that failed to promote (and were forwarded to themselves) must have their forwarding pointers removed and their saved mark words restored, otherwise a later minor GC would mistakenly believe they had already been copied even though they were never actually moved, and such objects could then stay stuck in the young generation no matter how many GCs occur.

[screenshot of the promotion-failure cleanup path omitted]

[screenshot of the post-GC bookkeeping omitted]

In any case the young generation has now been collected, and the From and To survivor spaces must swap roles; swap_spaces implements this:

void DefNewGeneration::swap_spaces() {
  ContiguousSpace* s = from();
  _from_space        = to();
  _to_space          = s;
  eden()->set_next_compaction_space(from());
  // The to-space is normally empty before a compaction so need
  // not be considered.  The exception is during promotion
  // failure handling when to-space can contain live objects.
  from()->set_next_compaction_space(NULL);

  if (UsePerfData) {
    CSpaceCounters* c = _from_counters;
    _from_counters = _to_counters;
    _to_counters = c;
  }
}

The function is simple; it just swaps From and To. One more note: if no promotion failure occurred, Eden and From are cleared after the minor GC, because every live object in the young generation (Eden + From) has been safely moved to To or the old generation. A promotion failure, however, means some live objects are still sitting where they were, so those spaces cannot be cleared.

pandening commented on 11 Nov 2018:

DefNew's collection is a minor GC using a copying algorithm; it is the young-generation half of the Serial GC (-XX:+UseSerialGC). Next, the old-generation half, Serial Old. TenuredGeneration is the heap implementation behind Serial Old. It is worth restating when an old GC can happen: the DefNew analysis mentioned the "allocation guarantee", i.e. just before a minor GC the young generation asks the old generation whether the minor GC may proceed, the criterion being that the old generation's contiguous free space exceeds the young generation's used size or its historical average promotion size; if that holds the minor GC goes ahead, otherwise a major GC is performed. Below, TenuredGeneration is used to analyze how an old GC is implemented.
TenuredGeneration collects with a mark-compact algorithm, whose core steps are marking, compaction and cleanup; TenuredGeneration::collect is the entry point:

void TenuredGeneration::collect(bool   full,
                                bool   clear_all_soft_refs,
                                size_t size,
                                bool   is_tlab) {
  GenCollectedHeap* gch = GenCollectedHeap::heap();

  // Temporarily expand the span of our ref processor, so
  // refs discovery is over the entire heap, not just this generation
  ReferenceProcessorSpanMutator
    x(ref_processor(), gch->reserved_region());

  STWGCTimer* gc_timer = GenMarkSweep::gc_timer();
  gc_timer->register_gc_start();

  SerialOldTracer* gc_tracer = GenMarkSweep::gc_tracer();
  gc_tracer->report_gc_start(gch->gc_cause(), gc_timer->gc_start());

  gch->pre_full_gc_dump(gc_timer);

  GenMarkSweep::invoke_at_safepoint(ref_processor(), clear_all_soft_refs);

  gch->post_full_gc_dump(gc_timer);

  gc_timer->register_gc_end();

  gc_tracer->report_gc_end(gc_timer->gc_end(), gc_timer->time_partitions());
}

The call to focus on is GenMarkSweep::invoke_at_safepoint, the core of TenuredGeneration's collection; it performs the actual work by calling the following four functions:

  // Mark live objects
  static void mark_sweep_phase1(bool clear_all_softrefs);
  // Calculate new addresses
  static void mark_sweep_phase2();
  // Update pointers
  static void mark_sweep_phase3();
  // Move objects to new positions
  static void mark_sweep_phase4();

Each phase is analyzed in turn below.

  • (1) First, the marking phase: mark_sweep_phase1.

[screenshot of mark_sweep_phase1 omitted]

Let's look at full_process_roots:

void GenCollectedHeap::full_process_roots(StrongRootsScope* scope,
                                          bool is_adjust_phase,
                                          ScanningOption so,
                                          bool only_strong_roots,
                                          OopsInGenClosure* root_closure,
                                          CLDClosure* cld_closure) {
  MarkingCodeBlobClosure mark_code_closure(root_closure, is_adjust_phase);
  OopsInGenClosure* weak_roots = only_strong_roots ? NULL : root_closure;
  CLDClosure* weak_cld_closure = only_strong_roots ? NULL : cld_closure;

  process_roots(scope, so, root_closure, weak_roots, cld_closure, weak_cld_closure, &mark_code_closure);
  if (is_adjust_phase) {
    // We never treat the string table as roots during marking
    // for the full gc, so we only need to process it during
    // the adjust phase.
    process_string_table_roots(scope, root_closure);
  }

  _process_strong_tasks->all_tasks_completed(scope->n_threads());
}

process_roots is the function to pay attention to: it scans all the places that can yield GC roots (thread stacks, class loader data, JNI handles, synchronization structures, the code cache, and so on); the relevant fragment is below:

[screenshot of the process_roots root-scanning code omitted]

Here strong_roots is the follow_root_closure mentioned above; it is responsible for marking live objects, so let's look at its do_oop implementation:

void MarkSweep::FollowRootClosure::do_oop(oop* p)       {
  follow_root(p);
}
template <class T> inline void MarkSweep::follow_root(T* p) {
  assert(!Universe::heap()->is_in_reserved(p),
         "roots shouldn't be things within the heap");
  T heap_oop = oopDesc::load_heap_oop(p);
  if (!oopDesc::is_null(heap_oop)) {
    oop obj = oopDesc::decode_heap_oop_not_null(heap_oop);
    if (!obj->mark()->is_marked() &&
        !is_archive_object(obj)) {
      mark_object(obj);
      follow_object(obj);
    }
  }
  follow_stack();
}

Whether an object has been marked is recorded in its header; if an object is not yet marked, mark_object marks it. Its implementation:

inline void MarkSweep::mark_object(oop obj) {
#if INCLUDE_ALL_GCS
  if (G1StringDedup::is_enabled()) {
    // We must enqueue the object before it is marked
    // as we otherwise can't read the object's age.
    G1StringDedup::enqueue_from_mark(obj);
  }
#endif
  // some marks may contain information we need to preserve so we store them away
  // and overwrite the mark.  We'll restore it at the end of markSweep.
  markOop mark = obj->mark();
  obj->set_mark(markOopDesc::prototype()->set_marked());

  if (mark->must_be_preserved(obj)) {
    preserve_mark(obj, mark);
  }
}

The object is marked by calling the oop's set_mark method. If the header contains information that must be preserved and restored after the GC finishes, preserve_mark is called to stash it away (mark here is the object's old mark word). Here is preserve_mark:

// We preserve the mark which should be replaced at the end and the location
// that it will go.  Note that the object that this markOop belongs to isn't
// currently at that address but it will be after phase4
void MarkSweep::preserve_mark(oop obj, markOop mark) {
  // We try to store preserved marks in the to space of the new generation since
  // this is storage which should be available.  Most of the time this should be
  // sufficient space for the marks we need to preserve but if it isn't we fall
  // back to using Stacks to keep track of the overflow.
  if (_preserved_count < _preserved_count_max) {
    _preserved_marks[_preserved_count++].init(obj, mark);
  } else {
    _preserved_mark_stack.push(mark);
    _preserved_oop_stack.push(obj);
  }
}

This function is simple: if _preserved_marks has reached its capacity, the header information goes onto the _preserved_mark_stack and _preserved_oop_stack stacks instead; otherwise it is stored in _preserved_marks. With marking covered, let's look at follow_object:

inline void MarkSweep::follow_object(oop obj) {
  assert(obj->is_gc_marked(), "should be marked");
  if (obj->is_objArray()) {
    // Handle object arrays explicitly to allow them to
    // be split into chunks if needed.
    MarkSweep::follow_array((objArrayOop)obj);
  } else {
    obj->oop_iterate(&mark_and_push_closure);
  }
}

As the name suggests, follow_object processes the references held by obj, again recursively. As the snippet shows, if the object is an object array it is handled by follow_array, otherwise via the object's oop_iterate; arrays are handled separately (and split into chunks) because a very large array processed together with ordinary objects could stall the processing of those ordinary objects. follow_array ultimately still ends up following the array elements through follow_object; here it is:

inline void MarkSweep::follow_array(objArrayOop array) {
  MarkSweep::follow_klass(array->klass());
  // Don't push empty arrays to avoid unnecessary work.
  if (array->length() > 0) {
    MarkSweep::push_objarray(array, 0);
  }
}

If the array length is greater than zero, push_objarray is used to queue it:

void MarkSweep::push_objarray(oop obj, size_t index) {
  ObjArrayTask task(obj, index);
  assert(task.is_valid(), "bad ObjArrayTask");
  _objarray_stack.push(task);
}

push_objarray pushes the array, wrapped as a task, onto the _objarray_stack; follow_stack later processes the tasks on that stack:

void MarkSweep::follow_stack() {
  do {
    while (!_marking_stack.is_empty()) {
      oop obj = _marking_stack.pop();
      assert (obj->is_gc_marked(), "p must be marked");
      follow_object(obj);
    }
    // Process ObjArrays one at a time to avoid marking stack bloat.
    if (!_objarray_stack.is_empty()) {
      ObjArrayTask task = _objarray_stack.pop();
      follow_array_chunk(objArrayOop(task.obj()), task.index());
    }
  } while (!_marking_stack.is_empty() || !_objarray_stack.is_empty());
}

Objects popped from _marking_stack are handed back to follow_object, while array chunks popped from _objarray_stack are processed one ObjArrayTask at a time via follow_array_chunk; the marking details get rather involved, so we stop here.
Once marking is done, the discovered references are processed:

[screenshot of the reference-processing call in mark_sweep_phase1 omitted]

This processing is the same as during a minor GC. Some cleanup work follows:

[screenshot of the class/code/string/symbol unloading calls omitted]

SystemDictionary::do_unloading unloads classes that are no longer used; CodeCache::do_unloading unloads compiled methods that are no longer used (compiled methods live in the code cache); Klass::clean_weak_klass_links cleans up weak class links; StringTable::unlink(&is_alive) removes interned strings that are no longer referenced; SymbolTable::unlink() removes unused symbols from the symbol table.

  • (2) Next is mark_sweep_phase2, which computes a new address for every live object.

First look at prepare_for_compaction:

void GenCollectedHeap::prepare_for_compaction() {
  // Start by compacting into same gen.
  CompactPoint cp(_old_gen);
  _old_gen->prepare_for_compaction(&cp);
  _young_gen->prepare_for_compaction(&cp);
}

Starting with the old generation's prepare_for_compaction:

void Generation::prepare_for_compaction(CompactPoint* cp) {
  // Generic implementation, can be specialized
  CompactibleSpace* space = first_compaction_space();
  while (space != NULL) {
    space->prepare_for_compaction(cp);
    space = space->next_compaction_space();
  }
}

This loops over the generation's compaction spaces, handing each to the space-level prepare_for_compaction:

void ContiguousSpace::prepare_for_compaction(CompactPoint* cp) {
  scan_and_forward(this, cp);
}

scan_and_forward is aptly named: scan and forward, where forwarding means deciding where an object will move. Phase (2) only computes each object's new address; nothing is moved yet, the actual moving happens later (in phase (4), per the declarations above). The core of phase (2) is the fragment below:

 while (cur_obj < scan_limit) {
    assert(!space->scanned_block_is_obj(cur_obj) ||
           oop(cur_obj)->mark()->is_marked() || oop(cur_obj)->mark()->is_unlocked() ||
           oop(cur_obj)->mark()->has_bias_pattern(),
           "these are the only valid states during a mark sweep");
    if (space->scanned_block_is_obj(cur_obj) && oop(cur_obj)->is_gc_marked()) {
      // prefetch beyond cur_obj
      Prefetch::write(cur_obj, interval);
      size_t size = space->scanned_block_size(cur_obj);
      compact_top = cp->space->forward(oop(cur_obj), size, cp, compact_top);
      cur_obj += size;
      end_of_live = cur_obj;
    } else {
      // run over all the contiguous dead objects
      HeapWord* end = cur_obj;
      do {
        // prefetch beyond end
        Prefetch::write(end, interval);
        end += space->scanned_block_size(end);
      } while (end < scan_limit && (!space->scanned_block_is_obj(end) || !oop(end)->is_gc_marked()));

      // see if we might want to pretend this object is alive so that
      // we don't have to compact quite as often.
      if (cur_obj == compact_top && dead_spacer.insert_deadspace(cur_obj, end)) {
        oop obj = oop(cur_obj);
        compact_top = cp->space->forward(obj, obj->size(), cp, compact_top);
        end_of_live = end;
      } else {
        // otherwise, it really is a free region.

        // cur_obj is a pointer to a dead object. Use this dead memory to store a pointer to the next live object.
        *(HeapWord**)cur_obj = end;

        // see if this is the first dead region.
        if (first_dead == NULL) {
          first_dead = cur_obj;
        }
      }

      // move on to the next object
      cur_obj = end;
    }
  }

The code is long but does essentially one thing: find the live objects and compute a new address for each of them. Computing the new address is done by CompactibleSpace::forward:

[screenshot of CompactibleSpace::forward omitted]

CompactibleSpace::forward first finds a suitable destination for the live object (the current compaction point), then checks whether that destination is the same as the object's current location; if so, no move is needed, otherwise a forwarding pointer to the new location is recorded. The actual move happens later, in phase (4).
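Since the original screenshot is gone, here is a minimal sketch of what one "forward" step of a sliding compactor looks like conceptually; ToyObject, forwardee and forward_sketch are invented names and this is not HotSpot's CompactibleSpace::forward:

#include <cstddef>
#include <cstdint>

// Toy object header: real HotSpot packs the forwarding pointer into the mark word.
struct ToyObject {
  std::size_t size_in_words;
  ToyObject*  forwardee;   // where this object will live after compaction
};

// Advance the compaction point and record the destination in the header.
// Returns the updated compaction point.
uintptr_t forward_sketch(ToyObject* obj, uintptr_t compact_top) {
  ToyObject* destination = reinterpret_cast<ToyObject*>(compact_top);
  if (destination != obj) {
    obj->forwardee = destination;      // object will slide down to compact_top
  } else {
    obj->forwardee = nullptr;          // already in place, no move needed
  }
  return compact_top + obj->size_in_words * sizeof(uintptr_t);
}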

That covers live objects. For dead objects, the scan first skips ahead to the next live object, i.e. it identifies a contiguous run of dead objects, and then decides whether that dead run can be treated as if it were "live". The conditions are fairly strict: the run must start exactly at compact_top, the current start of free space as far as forwarding is concerned, and it must pass the dead_spacer.insert_deadspace check:

 bool insert_deadspace(HeapWord* dead_start, HeapWord* dead_end) {
    if (!_active) {
      return false;
    }

    size_t dead_length = pointer_delta(dead_end, dead_start);
    if (_allowed_deadspace_words >= dead_length) {
      _allowed_deadspace_words -= dead_length;
      CollectedHeap::fill_with_object(dead_start, dead_length);
      oop obj = oop(dead_start);
      obj->set_mark(obj->mark()->set_marked());

      assert(dead_length == (size_t)obj->size(), "bad filler object size");
      log_develop_trace(gc, compaction)("Inserting object to dead space: " PTR_FORMAT ", " PTR_FORMAT ", " SIZE_FORMAT "b",
          p2i(dead_start), p2i(dead_end), dead_length * HeapWordSize);

      return true;
    } else {
      _active = false;
      return false;
    }
  }

};

_allowed_deadspace_words is the budget of space that dead objects are allowed to keep occupying; that space is wasted, so it must not grow too large. Why treat dead objects as "live" at all? Because copying objects has a cost: if a run of dead objects happens to mean nothing needs moving, and the wasted space stays within an acceptable limit, why not? That is exactly what insert_deadspace does; the dead run is overwritten with a single filler object of the same total length.

  • (3) Phase (2) computed the new address of every live object; phase (3) updates all references so that they point at those new addresses (the objects themselves are moved in phase (4)).

That pointer adjustment is done by AdjustPointerClosure; here is its do_oop method:

void MarkSweep::AdjustPointerClosure::do_oop(oop* p)       { do_oop_nv(p); }
void MarkSweep::AdjustPointerClosure::do_oop_nv(T* p)      { adjust_pointer(p); }

And then adjust_pointer itself:

template <class T> inline void MarkSweep::adjust_pointer(T* p) {
  T heap_oop = oopDesc::load_heap_oop(p);
  if (!oopDesc::is_null(heap_oop)) {
    oop obj     = oopDesc::decode_heap_oop_not_null(heap_oop);
    assert(Universe::heap()->is_in(obj), "should be in heap");

    oop new_obj = oop(obj->mark()->decode_pointer());
    assert(is_archive_object(obj) ||                  // no forwarding of archive objects
           new_obj != NULL ||                         // is forwarding ptr?
           obj->mark() == markOopDesc::prototype() || // not gc marked?
           (UseBiasedLocking && obj->mark()->has_bias_pattern()),
           // not gc marked?
           "should be forwarded");
    if (new_obj != NULL) {
      if (!is_archive_object(obj)) {
        assert(Universe::heap()->is_in_reserved(new_obj),
              "should be in object space");
        oopDesc::encode_store_heap_oop_not_null(p, new_obj);
      }
    }
  }
}

adjust_pointer rewrites the reference slot p to point at the object's forwarded location new_obj; in implementation terms that is simply storing new_obj into *p:

static inline void encode_store_heap_oop_not_null(oop* p, oop v) {
  *p = v;
}

  • (4) Finally, move every live object to its new address.

GenCompactClosure walks the old and young generations and does the actual compaction; generation_iterate decides, based on its argument, whether to start with the old or the young generation:

void GenCollectedHeap::generation_iterate(GenClosure* cl,
                                          bool old_to_young) {
  if (old_to_young) {
    cl->do_generation(_old_gen);
    cl->do_generation(_young_gen);
  } else {
    cl->do_generation(_young_gen);
    cl->do_generation(_old_gen);
  }
}

Here is GenCompactClosure's do_generation, which simply calls Generation::compact:

  void do_generation(Generation* gen) {
    gen->compact();
  }
void Generation::compact() {
  CompactibleSpace* sp = first_compaction_space();
  while (sp != NULL) {
    sp->compact();
    sp = sp->next_compaction_space();
  }
}

Following CompactibleSpace::compact, CompactibleSpace::scan_and_compact is where the compaction is actually carried out. Its behavior, sketched in code after this list, is roughly:

(1) If the space contains no live objects at all, it can be skipped entirely.
(2) Find the first GC-marked object and start moving live objects; unmarked objects encountered during the walk are skipped. "Moving" means copying the object to the address computed earlier.
(3) Finally, check whether the space has become empty; if no live objects remain, the whole region is cleared.
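A compact, self-contained illustration of that sliding-compaction idea (ToySlot and compact_sketch are invented; this is not HotSpot's scan_and_compact):

#include <cstddef>
#include <vector>

// Minimal sliding-compaction illustration. Assumes the "forward" phase already
// assigned each live slot a destination index in address order (destination <= i),
// which is what a sliding compactor guarantees; dead slots are simply skipped,
// so the live slots end up densely packed at the start of the space.
struct ToySlot {
  bool live;
  int  destination;   // index chosen by the forward phase; ignored if !live
  int  payload;
};

void compact_sketch(std::vector<ToySlot>& space) {
  int last_used = -1;
  for (std::size_t i = 0; i < space.size(); i++) {
    if (!space[i].live) {
      continue;                           // unmarked object: skip it
    }
    int dest = space[i].destination;
    if (dest != static_cast<int>(i)) {
      space[dest] = space[i];             // copy to the precomputed address
    }
    last_used = dest;
  }
  // Everything past last_used is now free; a real collector would reset the
  // space's top() here and clear the region if nothing survived.
  (void)last_used;
}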

pandening commented on 12 Nov 2018:

The JVM manages memory for us, which is genuinely valuable: we never have to worry about allocated memory not being freed at the right time. This comment explores how the JVM handles allocation requests, since garbage collection happens precisely because too much memory has been requested and worthless objects must be cleaned up or compacted away to make room for new allocations. Starting from a concrete allocation question, we trace the JVM's allocation path through the source code.

How does the JVM allocate heap space for an object?

In Java we create a new object with the new keyword, which corresponds to the new bytecode in the VM, though this post does not start from the bytecode. instanceOopDesc is the VM-side representation of a Java object instance, so creating a new object means creating a new instanceOopDesc inside the JVM; InstanceKlass::allocate_instance does exactly that, so let's start there.

instanceOop InstanceKlass::allocate_instance(TRAPS) {
  bool has_finalizer_flag = has_finalizer(); // Query before possible GC
  int size = size_helper();  // Query before forming handle.

  KlassHandle h_k(THREAD, this);

  instanceOop i;

  i = (instanceOop)CollectedHeap::obj_allocate(h_k, size, CHECK_NULL);
  if (has_finalizer_flag && !RegisterFinalizersAtInit) {
    i = register_finalizer(i, CHECK_NULL);
  }
  return i;
}

The function has roughly three steps: fetch the instance size, use CollectedHeap::obj_allocate to allocate that much space on the heap, and finally, if the class has a finalizer, register it via register_finalizer. We care mainly about the allocation part, CollectedHeap::obj_allocate:

oop CollectedHeap::obj_allocate(KlassHandle klass, int size, TRAPS) {
  debug_only(check_for_valid_allocation_state());
  assert(!Universe::heap()->is_gc_active(), "Allocation during gc not allowed");
  assert(size >= 0, "int won't convert to size_t");
  HeapWord* obj = common_mem_allocate_init(klass, size, CHECK_NULL);
  post_allocation_setup_obj(klass, obj, size);
  NOT_PRODUCT(Universe::heap()->check_for_bad_heap_word_value(obj, size));
  return (oop)obj;
}

After some sanity checks, it calls common_mem_allocate_init to allocate the memory:

HeapWord* CollectedHeap::common_mem_allocate_init(KlassHandle klass, size_t size, TRAPS) {
  HeapWord* obj = common_mem_allocate_noinit(klass, size, CHECK_NULL);
  init_obj(obj, size);
  return obj;
}

Two steps again: common_mem_allocate_noinit allocates the memory and init_obj initializes it. Staying with allocation, here is common_mem_allocate_noinit:

[screenshot of common_mem_allocate_noinit omitted]

Allocation splits into two cases: with TLABs enabled (Thread-Local Allocation Buffers), allocate_from_tlab is used; otherwise Universe::heap()->mem_allocate is used. First the TLAB path:

HeapWord* CollectedHeap::allocate_from_tlab(KlassHandle klass, Thread* thread, size_t size) {
  assert(UseTLAB, "should use UseTLAB");

  HeapWord* obj = thread->tlab().allocate(size);
  if (obj != NULL) {
    return obj;
  }
  // Otherwise...
  return allocate_from_tlab_slow(klass, thread, size);
}

Again two cases: first thread->tlab().allocate tries to satisfy the request, and if it cannot, allocate_from_tlab_slow takes over. Starting with thread->tlab().allocate:

inline HeapWord* ThreadLocalAllocBuffer::allocate(size_t size) {
  invariants();
  HeapWord* obj = top();
  if (pointer_delta(end(), obj) >= size) {
    // successful thread-local allocation
#ifdef ASSERT
    // Skip mangling the space corresponding to the object header to
    // ensure that the returned space is not considered parsable by
    // any concurrent GC thread.
    size_t hdr_size = oopDesc::header_size();
    Copy::fill_to_words(obj + hdr_size, size - hdr_size, badHeapWordVal);
#endif // ASSERT
    // This addition is safe because we know that top is
    // at least size below end, so the add can't wrap.
    set_top(obj + size);

    invariants();
    return obj;
  }
  return NULL;
}

This is straightforward bump-pointer allocation: top points at the start of the TLAB's free memory; if the remaining space can satisfy the request, size words are allocated and top is bumped accordingly, otherwise the TLAB cannot satisfy the allocation.

If thread->tlab().allocate fails, allocate_from_tlab_slow steps in. First, if the JVM considers the TLAB's remaining free space too large to throw away, the TLAB is retained and this particular object is allocated directly in the shared heap instead:

  // Retain tlab and allocate object in shared space if
  // the amount free in the tlab is too large to discard.
  if (thread->tlab().free() > thread->tlab().refill_waste_limit()) {
    thread->tlab().record_slow_allocation(size);
    return NULL;
  }

[screenshot of the remainder of allocate_from_tlab_slow omitted]

Otherwise the TLAB either has no usable free space left or its remaining space is small enough to waste, so a new TLAB is requested. First its size must be computed, which is the job of thread->tlab().compute_size:

[screenshot of ThreadLocalAllocBuffer::compute_size omitted]

The requested object size is first aligned to aligned_obj_size, then the currently available space available_size is computed (typically bounded by the free space in Eden); new_tlab_size is the size finally chosen for the new TLAB. If new_tlab_size is still too small to hold the requested object, the TLAB refill is abandoned.
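A rough sketch of that sizing decision, with invented names; alignment and waste handling are simplified and do not mirror ThreadLocalAllocBuffer::compute_size exactly:

#include <algorithm>
#include <cstddef>

// All values are in heap words. Hypothetical inputs:
//   obj_words       - size of the object that triggered the refill
//   desired_words   - the thread's current desired TLAB size
//   available_words - free space the young generation can hand out right now
//   reserve_words   - small reserve for the TLAB's own bookkeeping
// Returns 0 if no new TLAB should be allocated.
std::size_t compute_new_tlab_size_sketch(std::size_t obj_words,
                                         std::size_t desired_words,
                                         std::size_t available_words,
                                         std::size_t reserve_words) {
  std::size_t aligned_obj = (obj_words + 7) & ~std::size_t(7);   // toy alignment
  // Never ask for more than what is available, and never for less than the
  // object itself plus the reserve.
  std::size_t new_size = std::min(desired_words + aligned_obj, available_words);
  if (new_size < aligned_obj + reserve_words) {
    return 0;   // give up: the new TLAB could not even hold this object
  }
  return new_size;
}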
Back in allocate_from_tlab_slow: if the computed TLAB size is 0, NULL is returned to tell the caller that TLAB allocation is impossible; otherwise Universe::heap()->allocate_new_tlab is called to acquire a new TLAB, which for the generational heap is GenCollectedHeap::allocate_new_tlab:

HeapWord* GenCollectedHeap::allocate_new_tlab(size_t size) {
  bool gc_overhead_limit_was_exceeded;
  return gen_policy()->mem_allocate_work(size /* size */,
                                         true /* is_tlab */,
                                         &gc_overhead_limit_was_exceeded);
}

This lands in mem_allocate_work, which is fairly involved; its key steps are analyzed below.

  • (1) First, check whether the allocation may be done in the young generation; if so, try it there. If that succeeds, this allocation journey ends here, otherwise we keep going.
    if (young->should_allocate(size, is_tlab)) {
      result = young->par_allocate(size, is_tlab);
      if (result != NULL) {
        assert(gch->is_in_reserved(result), "result not in heap");
        return result;
      }
    }

DefNew's should_allocate looks like this:

  // Allocation support
  virtual bool should_allocate(size_t word_size, bool is_tlab) {
    assert(UseTLAB || !is_tlab, "Should not allocate tlab");

    size_t overflow_limit    = (size_t)1 << (BitsPerSize_t - LogHeapWordSize);

    const bool non_zero      = word_size > 0;
    const bool overflows     = word_size >= overflow_limit;
    const bool check_too_big = _pretenure_size_threshold_words > 0;
    const bool not_too_big   = word_size < _pretenure_size_threshold_words;
    const bool size_ok       = is_tlab || !check_too_big || not_too_big;

    bool result = !overflows &&
                  non_zero   &&
                  size_ok;

    return result;
  }

If the requested size is within acceptable bounds, the allocation may be done in DefNew, otherwise not. If the young generation is eligible, young->par_allocate performs the allocation:

HeapWord* DefNewGeneration::par_allocate(size_t word_size,
                                         bool is_tlab) {
  HeapWord* res = eden()->par_allocate(word_size);
  if (CMSEdenChunksRecordAlways && _old_gen != NULL) {
    _old_gen->sample_eden_chunk();
  }
  return res;
}

eden()->par_allocate is the key call, and it bottoms out in par_allocate_impl:

// This version is lock-free.
inline HeapWord* ContiguousSpace::par_allocate_impl(size_t size) {
  do {
    HeapWord* obj = top();
    if (pointer_delta(end(), obj) >= size) {
      HeapWord* new_top = obj + size;
      HeapWord* result = (HeapWord*)Atomic::cmpxchg_ptr(new_top, top_addr(), obj);
      // result can be one of two:
      //  the old top value: the exchange succeeded
      //  otherwise: the new value of the top is returned.
      if (result == obj) {
        assert(is_aligned(obj) && is_aligned(new_top), "checking alignment");
        return obj;
      }
    } else {
      return NULL;
    }
  } while (true);
}

This function is easy to follow: it loops, using CAS to claim memory. top points at the start of the free space, and allocating simply means advancing top by size; if the request is larger than Eden's remaining free space, NULL is returned immediately to signal failure.
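The same lock-free bump-pointer pattern can be written in portable C++ with std::atomic; this is only an illustration of the technique (ToyEden is invented), not HotSpot's Atomic::cmpxchg-based code:

#include <atomic>
#include <cstddef>

// Lock-free bump-pointer allocator over a fixed region [start, end).
class ToyEden {
 public:
  ToyEden(char* start, char* end) : top_(start), end_(end) {}

  // Returns a pointer to 'bytes' of memory, or nullptr if the region is full.
  void* par_allocate(std::size_t bytes) {
    char* old_top = top_.load(std::memory_order_relaxed);
    do {
      if (static_cast<std::size_t>(end_ - old_top) < bytes) {
        return nullptr;                       // not enough space left
      }
      // Try to move top forward; on failure old_top is reloaded and we retry.
    } while (!top_.compare_exchange_weak(old_top, old_top + bytes,
                                         std::memory_order_relaxed));
    return old_top;                           // the claimed block starts here
  }

 private:
  std::atomic<char*> top_;
  char* end_;
};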

[screenshot of the mem_allocate_work retry path omitted]

If the young generation cannot satisfy the request, attempt_allocation is used to try the generations in turn:

HeapWord* GenCollectedHeap::attempt_allocation(size_t size,
                                               bool is_tlab,
                                               bool first_only) {
  HeapWord* res = NULL;
  if (_young_gen->should_allocate(size, is_tlab)) {
    res = _young_gen->allocate(size, is_tlab);
    if (res != NULL || first_only) {
      return res;
    }
  }
  if (_old_gen->should_allocate(size, is_tlab)) {
    res = _old_gen->allocate(size, is_tlab);
  }
  return res;
}

Each generation's should_allocate again decides whether it may serve the request: the young generation is tried first, then the old generation. Young-generation allocation was analyzed above, so let's look at how the old generation decides whether it may allocate, and how it actually does so:

  // Returns "true" iff this generation should be used to allocate an
  // object of the given size.  Young generations might
  // wish to exclude very large objects, for example, since, if allocated
  // often, they would greatly increase the frequency of young-gen
  // collection.
  virtual bool should_allocate(size_t word_size, bool is_tlab) {
    bool result = false;
    size_t overflow_limit = (size_t)1 << (BitsPerSize_t - LogHeapWordSize);
    if (!is_tlab || supports_tlab_allocation()) {
      result = (word_size > 0) && (word_size < overflow_limit);
    }
    return result;
  }

The old generation does not support TLAB allocation; only DefNew does. supports_tlab_allocation reports whether a generation supports TLABs, and apart from DefNew none do. For a non-TLAB request, the generation accepts it as long as the requested size is greater than zero and below the overflow limit.

Next, how allocation is actually performed in the old generation (for Serial Old):

inline HeapWord* OffsetTableContigSpace::allocate(size_t size) {
  HeapWord* res = ContiguousSpace::allocate(size);
  if (res != NULL) {
    _offsets.alloc_block(res, size);
  }
  return res;
}

And then ContiguousSpace::allocate_impl:

// This version requires locking.
inline HeapWord* ContiguousSpace::allocate_impl(size_t size) {
  assert(Heap_lock->owned_by_self() ||
         (SafepointSynchronize::is_at_safepoint() && Thread::current()->is_VM_thread()),
         "not locked");
  HeapWord* obj = top();
  if (pointer_delta(end(), obj) >= size) {
    HeapWord* new_top = obj + size;
    set_top(new_top);
    assert(is_aligned(obj) && is_aligned(new_top), "checking alignment");
    return obj;
  } else {
    return NULL;
  }
}

Clear and simple again: top is the start of the free space, and allocating means moving top forward (this version requires the heap lock, unlike the CAS version above).
Back in mem_allocate_work: if attempt_allocation still fails, the fallback logic kicks in. If is_active_and_needs_gc() is true, some other thread has already requested a GC; the heap is then expanded if expansion is still possible, otherwise this thread has to fall back to the GC path.
If gch->is_maximal_no_gc() is false, the heap can still be expanded, so expand_heap_and_allocate expands it and retries the allocation. Its implementation:

HeapWord* GenCollectorPolicy::expand_heap_and_allocate(size_t size,
                                                       bool   is_tlab) {
  GenCollectedHeap *gch = GenCollectedHeap::heap();
  HeapWord* result = NULL;
  Generation *old = gch->old_gen();
  if (old->should_allocate(size, is_tlab)) {
    result = old->expand_and_allocate(size, is_tlab);
  }
  if (result == NULL) {
    Generation *young = gch->young_gen();
    if (young->should_allocate(size, is_tlab)) {
      result = young->expand_and_allocate(size, is_tlab);
    }
  }
  assert(result == NULL || gch->is_in_reserved(result), "result not in heap");
  return result;
}

Note the detail that expansion goes old generation first, then young, while allocation goes young first, then old. expand_and_allocate is what actually expands a generation and allocates from it; first, Serial Old's version:

HeapWord*
TenuredGeneration::expand_and_allocate(size_t word_size,
                                       bool is_tlab,
                                       bool parallel) {
  assert(!is_tlab, "TenuredGeneration does not support TLAB allocation");
  if (parallel) {
    MutexLocker x(ParGCRareEvent_lock);
    HeapWord* result = NULL;
    size_t byte_size = word_size * HeapWordSize;
    while (true) {
      expand(byte_size, _min_heap_delta_bytes);
      if (GCExpandToAllocateDelayMillis > 0) {
        os::sleep(Thread::current(), GCExpandToAllocateDelayMillis, false);
      }
      result = _the_space->par_allocate(word_size);
      if ( result != NULL) {
        return result;
      } else {
        // If there's not enough expansion space available, give up.
        if (_virtual_space.uncommitted_size() < byte_size) {
          return NULL;
        }
        // else try again
      }
    }
  } else {
    expand(word_size*HeapWordSize, _min_heap_delta_bytes);
    return _the_space->allocate(word_size);
  }
}

parallel says whether this is the multi-threaded variant; Serial Old is single-threaded, so the else branch is the one that matters. expand grows the generation and allocate then allocates from it. CardGeneration::expand is where expand ultimately lands:

[screenshot of CardGeneration::expand omitted]

Expansion is attempted with progressively smaller sizes until one succeeds; grow_by does the actual growing, which we will not dig into for now.
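The retry-with-smaller-sizes idea looks roughly like the sketch below; the halving step and the helper grow_by_sketch are illustrative only, not CardGeneration's exact logic:

#include <cstddef>

// Hypothetical: tries to commit 'bytes' more memory, false if that fails.
bool grow_by_sketch(std::size_t bytes);

// Try to expand by 'requested' bytes, settling for less if necessary but
// never less than 'min_bytes'. Returns true if any expansion happened.
bool expand_sketch(std::size_t requested, std::size_t min_bytes) {
  std::size_t bytes = requested;
  while (bytes >= min_bytes) {
    if (grow_by_sketch(bytes)) {
      return true;          // got at least this much extra space
    }
    bytes /= 2;             // could not get that much; ask for less
  }
  return false;             // even the minimum expansion failed
}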
Now DefNew's expand_and_allocate:

HeapWord* DefNewGeneration::expand_and_allocate(size_t size,
                                                bool   is_tlab,
                                                bool   parallel) {
  // We don't attempt to expand the young generation (but perhaps we should.)
  return allocate(size, is_tlab);
}
HeapWord* DefNewGeneration::allocate(size_t word_size, bool is_tlab) {
  // This is the slow-path allocation for the DefNewGeneration.
  // Most allocations are fast-path in compiled code.
  // We try to allocate from the eden.  If that works, we are happy.
  // Note that since DefNewGeneration supports lock-free allocation, we
  // have to use it here, as well.
  HeapWord* result = eden()->par_allocate(word_size);
  if (result != NULL) {
    if (CMSEdenChunksRecordAlways && _old_gen != NULL) {
      _old_gen->sample_eden_chunk();
    }
  } else {
    // If the eden is full and the last collection bailed out, we are running
    // out of heap space, and we try to allocate the from-space, too.
    // allocate_from_space can't be inlined because that would introduce a
    // circular dependency at compile time.
    result = allocate_from_space(word_size);
  }
  return result;
}

eden()->par_allocate tries to allocate from Eden's free space; its implementation was analyzed above. If Eden allocation fails, allocation from the From space is attempted via allocate_from_space:

[screenshot of DefNewGeneration::allocate_from_space omitted]

Whether allocating from From is allowed requires some checks: should_allocate_from_space makes that call, and if some thread has already triggered a GC, allocating from From is also permitted. Here is what should_allocate_from_space looks at:

[screenshot of should_allocate_from_space omitted]

According to the screenshot, if a full collection is coming, collection_attempt_is_safe returned false, and Eden is not empty, then allocating from From is allowed. What does collection_attempt_is_safe do?

bool DefNewGeneration::collection_attempt_is_safe() {
  if (!to()->is_empty()) {
    log_trace(gc)(":: to is not empty ::");
    return false;
  }
  if (_old_gen == NULL) {
    GenCollectedHeap* gch = GenCollectedHeap::heap();
    _old_gen = gch->old_gen();
  }
  return _old_gen->promotion_attempt_is_safe(used());
}

If To is not empty, a promotion failure has occurred and the function returns false; it also returns false when _old_gen->promotion_attempt_is_safe does:

bool TenuredGeneration::promotion_attempt_is_safe(size_t max_promotion_in_bytes) const {
  size_t available = max_contiguous_available();
  size_t av_promo  = (size_t)gc_stats()->avg_promoted()->padded_average();
  bool   res = (available >= av_promo) || (available >= max_promotion_in_bytes);

  log_trace(gc)("Tenured: promo attempt is%s safe: available(" SIZE_FORMAT ") %s av_promo(" SIZE_FORMAT "), max_promo(" SIZE_FORMAT ")",
    res? "":" not", available, res? ">=":"<", av_promo, max_promotion_in_bytes);

  return res;
}

It returns true if the old generation's contiguous free space exceeds the young generation's used size or the historical average promotion size, and false otherwise.

Back in GenCollectorPolicy::mem_allocate_work: if even after trying to expand the heap the allocation still cannot be satisfied, the only option left is to submit a VM_GenCollectForAllocation GC operation and retry the allocation once it completes.

Now back up to CollectedHeap::common_mem_allocate_noinit: if the TLAB branch cannot complete the allocation, the work is handed to Universe::heap()->mem_allocate. Analyzing that path completes the picture of object-instance allocation:

HeapWord* GenCollectedHeap::mem_allocate(size_t size,
                                         bool* gc_overhead_limit_was_exceeded) {
  return gen_policy()->mem_allocate_work(size,
                                         false /* is_tlab */,
                                         gc_overhead_limit_was_exceeded);
}

mem_allocate_work was already analyzed above; the only difference from the TLAB branch is that is_tlab is false, so we will not repeat the analysis.
One detail not yet mentioned: the JVM flag -XX:PretenureSizeThreshold sets a size threshold above which objects are allocated directly in the old generation. The check lives in the young generation; DefNew's should_allocate uses _pretenure_size_threshold_words to decide whether an object should be placed in DefNew at all.

The JVM's object-allocation flow can be summarized as follows (a pseudocode sketch follows the list):

  • (1) Allocate from the thread's TLAB first; only if TLAB allocation fails is a direct heap allocation attempted.
  • (2) On a TLAB miss, the JVM may try to allocate a fresh TLAB and then retry the allocation inside it.
  • (3) Whether it is refilling a TLAB or allocating the object directly, the request ultimately goes to the heap, so the two cases are not distinguished below.
  • (4) The heap first tries Eden in the young generation; if that fails, the From space may be tried (subject to the checks above), and then the old generation.
  • (5) If the request still cannot be satisfied, the heap is expanded, provided expansion is allowed.
  • (6) Expansion proceeds from the old generation to the young generation, while allocation proceeds from the young generation to the old; after expanding, allocation is retried, possibly including the From space again (again subject to its checks).
  • (7) If allocation still fails, a GC is triggered to reclaim space and the allocation is retried.
  • (8) Failing all of that: OutOfMemoryError.
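The list above, expressed as a single hedged pseudocode sketch; every function here is a placeholder for the behavior described in this post, not a real HotSpot signature:

#include <cstddef>

typedef char* HeapWord;

// Hypothetical placeholders for the steps described above.
HeapWord allocate_from_tlab_fast(std::size_t size);
HeapWord refill_tlab_and_allocate(std::size_t size);
HeapWord allocate_in_eden(std::size_t size);
HeapWord allocate_in_from_space_if_allowed(std::size_t size);
HeapWord allocate_in_old_gen(std::size_t size);
bool     heap_expansion_allowed();
HeapWord expand_heap_and_retry(std::size_t size);
HeapWord trigger_gc_and_retry(std::size_t size);
void     throw_out_of_memory_error();

HeapWord allocate_object_sketch(std::size_t size) {
  HeapWord obj;
  if ((obj = allocate_from_tlab_fast(size)) != nullptr) return obj;    // (1)
  if ((obj = refill_tlab_and_allocate(size)) != nullptr) return obj;   // (2)(3)

  if ((obj = allocate_in_eden(size)) != nullptr) return obj;           // (4)
  if ((obj = allocate_in_from_space_if_allowed(size)) != nullptr) return obj;
  if ((obj = allocate_in_old_gen(size)) != nullptr) return obj;

  if (heap_expansion_allowed() &&
      (obj = expand_heap_and_retry(size)) != nullptr) return obj;      // (5)(6)

  if ((obj = trigger_gc_and_retry(size)) != nullptr) return obj;       // (7)
  throw_out_of_memory_error();                                         // (8)
  return nullptr;
}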

pandening commented on 12 Nov 2018:

This comment tries to explore systematically when GC happens and what each kind of GC actually does. GC splits into minor GC and major GC; below we look at when and how each runs, i.e. the triggering conditions and the underlying mechanics.
Earlier comments covered how the JVM supports GC: during create_vm a VMThread is created with a task queue; the VMThread takes operations from that queue and executes them, waiting when the queue is empty until another thread notifies it, then resuming. The queue holds VM_Operation tasks of many kinds; VM_GC_Operation represents GC work, so that is the one to focus on. VM_GC_Operation itself has many subclasses, shown in the figure below:

[screenshot of the VM_GC_Operation class hierarchy omitted]

VM_CollectForAllocation represents a failed allocation; it has three subclasses: VM_GenCollectForAllocation, VM_ParallelGCFailedAllocation and VM_G1OperationWithAllocRequest. Operations whose names contain Full denote a full GC: VM_GenCollectFull and VM_G1CollectFull; VM_ParallelGCSystemGC has no Full in its name but is a full GC as well. Let's see when these VM_GC_Operations are triggered.

VM_GenCollectForAllocation

mem_allocate_work in collectorPolicy.cpp is where a VM_GenCollectForAllocation gets triggered. That function requests memory and was analyzed earlier: in short, it first tries the young generation, then the old generation, then tries again after expanding the heap, and only if all of that fails does it submit a VM_GenCollectForAllocation.

[screenshot of mem_allocate_work submitting the VM_GenCollectForAllocation omitted]

VMThread::execute puts the VM_GenCollectForAllocation into the VMThread's queue, and the VMThread then runs its doit. Here it is:

void VM_GenCollectForAllocation::doit() {
  SvcGCMarker sgcm(SvcGCMarker::MINOR);

  GenCollectedHeap* gch = GenCollectedHeap::heap();
  GCCauseSetter gccs(gch, _gc_cause);
  _result = gch->satisfy_failed_allocation(_word_size, _tlab);
  assert(gch->is_in_reserved_or_null(_result), "result not in heap");

  if (_result == NULL && GCLocker::is_active_and_needs_gc()) {
    set_gc_locked();
  }
}

satisfy_failed_allocation was covered earlier; to recap what it does:

  • (1) First check whether another thread has already triggered a GC; if so, this GC does not proceed, but before bailing out we see whether the heap can be expanded, since an expansion might satisfy the allocation; if expansion is not possible either, we simply wait for the other thread's GC to finish.
  • (2) Check whether an incremental collection is possible; if so, run a minor GC, otherwise run a full GC that does not clear soft references, and retry the allocation afterwards.
  • (3) If the allocation still cannot be satisfied after (2), run a thorough full GC that also clears soft references.

In short, VM_GenCollectForAllocation runs when an allocation has failed; it may trigger a minor GC and then a full GC, starting with the minor GC and escalating to a full GC if that does not help.

VM_ParallelGCFailedAllocation

VM_GenCollectForAllocation serves the generational serial heap whose young generation is DefNew. VM_ParallelGCFailedAllocation serves ParallelScavengeHeap, the heap used by UseParallelGC and UseParallelOldGC. These are "throughput" collectors: unlike CMS, which focuses on response time, they aim at overall throughput and let you set a GC time goal that the JVM tries to meet by resizing the heap automatically.
ParallelScavengeHeap's mem_allocate is where VM_ParallelGCFailedAllocation gets triggered:

[screenshot of ParallelScavengeHeap::mem_allocate omitted]

mem_allocate first allocates from the young generation; if that cannot be satisfied it tries the old generation, and if that fails as well it submits a VM_ParallelGCFailedAllocation. Here is its doit:

void VM_ParallelGCFailedAllocation::doit() {
  SvcGCMarker sgcm(SvcGCMarker::MINOR);

  ParallelScavengeHeap* heap = ParallelScavengeHeap::heap();

  GCCauseSetter gccs(heap, _gc_cause);
  _result = heap->failed_mem_allocate(_word_size);

  if (_result == NULL && GCLocker::is_active_and_needs_gc()) {
    set_gc_locked();
  }
}

ParallelScavengeHeap::failed_mem_allocate takes over from there; let's look at how it is implemented:

// Failed allocation policy. Must be called from the VM thread, and
// only at a safepoint! Note that this method has policy for allocation
// flow, and NOT collection policy. So we do not check for gc collection
// time over limit here, that is the responsibility of the heap specific
// collection methods. This method decides where to attempt allocations,
// and when to attempt collections, but no collection specific policy.
HeapWord* ParallelScavengeHeap::failed_mem_allocate(size_t size) {
  assert(SafepointSynchronize::is_at_safepoint(), "should be at safepoint");
  assert(Thread::current() == (Thread*)VMThread::vm_thread(), "should be in vm thread");
  assert(!is_gc_active(), "not reentrant");
  assert(!Heap_lock->owned_by_self(), "this thread should not own the Heap_lock");

  // We assume that allocation in eden will fail unless we collect.

  // First level allocation failure, scavenge and allocate in young gen.
  GCCauseSetter gccs(this, GCCause::_allocation_failure);
  const bool invoked_full_gc = PSScavenge::invoke();
  HeapWord* result = young_gen()->allocate(size);

  // Second level allocation failure.
  //   Mark sweep and allocate in young generation.
  if (result == NULL && !invoked_full_gc) {
    do_full_collection(false);
    result = young_gen()->allocate(size);
  }

  death_march_check(result, size);

  // Third level allocation failure.
  //   After mark sweep and young generation allocation failure,
  //   allocate in old generation.
  if (result == NULL) {
    result = old_gen()->allocate(size);
  }

  // Fourth level allocation failure. We're running out of memory.
  //   More complete mark sweep and allocate in young generation.
  if (result == NULL) {
    do_full_collection(true);
    result = young_gen()->allocate(size);
  }

  // Fifth level allocation failure.
  //   After more complete mark sweep, allocate in old generation.
  if (result == NULL) {
    result = old_gen()->allocate(size);
  }

  return result;
}

This function handles the allocation failure in the following steps:

  • (1) Run PSScavenge::invoke() to do the "scavenge" work. This is usually a Minor GC but may turn into a Full GC; the function returns true if a Full GC was performed, false otherwise. Afterwards, retry the allocation in the young generation; if that fails, go to step (2).
  • (2) If PSScavenge::invoke() only did a Minor GC, do a Full GC now, without clearing soft references; then retry the young-generation allocation, and if it still fails go to step (3).
  • (3) Since the young generation cannot satisfy the request, try the old generation; if that succeeds we are done, otherwise go to step (4).
  • (4) If, even after a Full GC, neither generation can satisfy the request, the only option left is to clear soft references as well, i.e. run a Full GC that clears soft references, and then retry the young-generation allocation; if that fails go to step (5).
  • (5) As a last attempt, try the old generation once more; if that also fails, hand the failure back to the caller (OOM).

Now let's look at PSScavenge::invoke() itself:

(screenshot of PSScavenge::invoke omitted)

PSScavenge::invoke_no_policy() is called first to perform a Minor GC. During that Minor GC some objects may reach the tenuring threshold, and the old generation may not have enough room to accept every object that should be promoted; when that happens we get a Promotion Fail. One characteristic of the Parallel Scavenge collectors is that they automatically resize the generations to meet the configured goals; that logic is fairly involved and can be found inside PSScavenge::invoke_no_policy(). The Minor GC itself is roughly the same as DefNew's, except that ParallelScavengeHeap uses multiple GC threads, which makes the code considerably more complex; the flow is still the same: scan from the GC roots, walk the live objects, and copy them out of the collected spaces.
Back in PSScavenge::invoke(), need_full_gc decides whether a Full GC is needed. We just tried to do a Minor GC, but it may not actually have run: if the current thread finds that another thread is already collecting, it simply backs out. need_full_gc therefore has two parts. The first is the result of PSScavenge::invoke_no_policy(), i.e. whether the Minor GC actually ran; if it did not, a Full GC is warranted. If it did run, the decision falls to policy->should_full_GC, where the policy is PSAdaptiveSizePolicy:

// If the remaining free space in the old generation is less that
// that expected to be needed by the next collection, do a full
// collection now.
bool PSAdaptiveSizePolicy::should_full_GC(size_t old_free_in_bytes) {

  // A similar test is done in the scavenge's should_attempt_scavenge().  If
  // this is changed, decide if that test should also be changed.
  bool result = padded_average_promoted_in_bytes() > (float) old_free_in_bytes;
  log_trace(gc, ergo)("%s after scavenge average_promoted " SIZE_FORMAT " padded_average_promoted " SIZE_FORMAT " free in old gen " SIZE_FORMAT,
                      result ? "Full" : "No full",
                      (size_t) average_promoted_in_bytes(),
                      (size_t) padded_average_promoted_in_bytes(),
                      old_free_in_bytes);
  return result;
}

The condition is simple: if the amount of data expected to be promoted from the young generation exceeds the free space in the old generation, a Full GC is warranted. As for the Full GC itself, UseParallelOldGC determines which old-generation collector is used: with -XX:+UseParallelOldGC the combination is Parallel Scavenge + Parallel Old, and with -XX:+UseParallelGC it is Parallel Scavenge + Serial Old. Assuming -XX:+UseParallelGC here, the Full GC goes through PSMarkSweep::invoke_no_policy(clear_all_softrefs); the Serial Old collection itself was analyzed in an earlier post and is not repeated here.

Back in ParallelScavengeHeap::failed_mem_allocate: after PSScavenge::invoke() we have done either a Minor GC or a Full GC, so some garbage has been reclaimed, and young_gen()->allocate(size) retries the young-generation allocation. If that fails and a Full GC has not yet been run, do_full_collection(false) runs one while keeping soft references alive; allocation is then retried in the young and old generations. If the request still cannot be satisfied, do_full_collection(true) runs a Full GC that also clears soft references, and the young and old generations are tried one last time; if even that fails, the failure is handed back to the caller (OOM).

VM_G1OperationWithAllocRequest

VM_G1OperationWithAllocRequest has two subclasses, VM_G1CollectForAllocation and VM_G1IncCollectionPause. They belong to G1 and are not analyzed here; G1 will be covered separately later.

VM_GenCollectFull

VM_GenCollectFull backs external GC requests such as System.gc(); it can be found in GenCollectedHeap::collect_locked:

void GenCollectedHeap::collect_locked(GCCause::Cause cause, GenerationType max_generation) {
  // Read the GC count while holding the Heap_lock
  unsigned int gc_count_before      = total_collections();
  unsigned int full_gc_count_before = total_full_collections();
  {
    MutexUnlocker mu(Heap_lock);  // give up heap lock, execute gets it back
    VM_GenCollectFull op(gc_count_before, full_gc_count_before,
                         cause, max_generation);
    VMThread::execute(&op);
  }
}

GenCollectedHeap::collect is the starting point for this operation, and it exists to serve requests like System.gc(), for example:

JVM_ENTRY_NO_ENV(void, JVM_GC(void))
  JVMWrapper("JVM_GC");
  if (!DisableExplicitGC) {
    Universe::heap()->collect(GCCause::_java_lang_system_gc);
  }
JVM_END

This is exactly what a System.gc() request looks like; it calls Universe::heap()->collect. Universe::heap() returns the JVM's top-level heap manager, of which there are currently three, one per family of collectors: GenCollectedHeap for -XX:+UseSerialGC and -XX:+UseConcMarkSweepGC (including -XX:+UseParNewGC), ParallelScavengeHeap for -XX:+UseParallelGC and -XX:+UseParallelOldGC, and G1CollectedHeap for -XX:+UseG1GC. These mappings are established during create_vm; heap initialization is analyzed in a later post.
Similar to VM_GenCollectFull there are VM_G1IncCollectionPause (G1) and VM_ParallelGCSystemGC (Parallel GC); those two operations are not analyzed here.

Conclusion

The reason for a Minor GC is fairly simple: an allocation failure, i.e. there is not enough memory, so a Minor GC is needed. However, an allocation failure does not necessarily lead to a Minor GC; in some cases a Full GC is performed directly. The JVM contains many functions that decide, based on collected statistics, whether collecting a particular generation is worthwhile. For example, when an eden allocation fails, the old generation must judge whether it is safe for the young generation to run a Minor GC: during a Minor GC, objects that have reached the tenuring age are promoted into the old generation, and objects that cannot be moved into the to space (because it is full or lacks enough contiguous room) are also copied to the old generation early. These transfers are a burden, and a risk, for the old generation; it may simply not have enough room for everything promoted by this Minor GC, in which case the Minor GC reports a Promotion Fail and a Full GC is needed to reclaim unused objects, possibly including soft references that are still in use. There are further conditions that trigger a Full GC which this post does not cover, mainly because G1 and CMS are not yet well understood here; they are comparatively complex collectors and will take significant time to study and describe.

GC happens for two reasons: the JVM initiating it itself, and it being forced from outside. The forced kind is a call like System.gc(); the JVM-initiated kind happens during allocation. Where possible, avoid forcing GC from the outside, because it disrupts the JVM's own GC planning; the JVM can be trusted to do this well enough that we should not have to worry about GC, which is one of Java's main advantages over languages like C/C++.


JVM Argument Parsing and Heap Initialization

During create_vm the JVM arguments we supplied are parsed and turned into policies. For example, with -XX:+UseSerialGC the JVM uses the Serial GC to manage the heap and initializes the young and old generations accordingly; different flags produce different GC policies. The JVM has a huge number of flags, they can interact with each other, and some can lead to very puzzling behavior, so a flag you do not understand well should not be set casually. This post starts from argument parsing and then analyzes heap initialization, which is really an analysis of how the JVM uses the flags we set.

JVM argument parsing

Arguments::parse(args) is the entry point for argument parsing; the call can be found in create_vm in thread.cpp. There are far too many flags to cover them all, so this post only walks through the parsing flow using a few flags as examples. parse_vm_init_args is the interesting function: flags like -XX:+UseSerialGC start being parsed there, with the actual work done in parse_each_vm_init_arg, so we go straight to that function. Let's see how -Xms, -Xmx and flags like -XX:+UseConcMarkSweepGC are parsed.

(screenshot of the -Xms/-Xmx parsing code omitted)
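
Since the screenshot is not reproduced here, the following standalone sketch shows the general idea of turning an argument such as -Xms512m into a byte count; it only mimics the behavior, none of it is the actual HotSpot parser.

#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Parse the numeric part of "-Xms512m" / "-Xmx2g" style arguments into bytes.
// Purely illustrative; the real parsing lives in arguments.cpp.
static bool parse_memory_size(const char* s, uint64_t* out) {
    char* end = nullptr;
    uint64_t value = std::strtoull(s, &end, 10);
    if (end == s) return false;                   // no digits at all
    uint64_t multiplier = 1;
    switch (*end) {
        case 'k': case 'K': multiplier = 1024ULL; break;
        case 'm': case 'M': multiplier = 1024ULL * 1024; break;
        case 'g': case 'G': multiplier = 1024ULL * 1024 * 1024; break;
        case '\0': break;
        default: return false;
    }
    *out = value * multiplier;
    return true;
}

int main() {
    uint64_t initial_heap = 0, max_heap = 0;
    // Pretend these came from the command line.
    parse_memory_size("512m", &initial_heap);     // -Xms512m -> InitialHeapSize
    parse_memory_size("2g",   &max_heap);         // -Xmx2g   -> MaxHeapSize
    std::printf("InitialHeapSize=%llu MaxHeapSize=%llu\n",
                (unsigned long long)initial_heap, (unsigned long long)max_heap);
    return 0;
}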

-Xms sets the minimum heap size and -Xmx (or -XX:MaxHeapSize) sets the maximum; if -Xms equals -Xmx the heap cannot grow, otherwise it can be expanded dynamically. The parsed values are stored in the corresponding globals: -Xms goes into InitialHeapSize and -Xmx into MaxHeapSize. Those are numeric arguments; next, a flag-type argument. If we pass -XX:+UseConcMarkSweepGC, how does the JVM recognize it?

(screenshot of the -XX flag parsing entry point omitted)

Parsing happens in the same function; the snippet above is the entry point for -XX:+UseConcMarkSweepGC style flags, and parse_argument does the actual work. Let's see how parse_argument recognizes such a flag and writes it into the corresponding global variable; parse_argument is called from process_argument.

(screenshot of parse_argument omitted)

The highlighted part is the key to the parsing: the "+%" / "-%" patterns match the '+' or '-' in -XX:+UseSerialGC. Here is a concrete example of parsing this at startup:

(screenshot of a startup argument being parsed in the debugger omitted)
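
As a stand-in for the missing screenshots, here is a small sketch of how a boolean flag like -XX:+UseSerialGC can be split into a sign and a name and applied to a table of flags; the flag table and helper names are assumptions for illustration only, not HotSpot's parse_argument.

#include <cstdio>
#include <map>
#include <string>

// A toy flag table standing in for the JVM's global boolean flags.
static std::map<std::string, bool> g_bool_flags = {
    {"UseSerialGC", false},
    {"UseConcMarkSweepGC", false},
};

// Match the "+Name" / "-Name" form that follows "-XX:"; this mirrors the
// "+%..." / "-%..." matching described above in spirit only.
static bool parse_bool_flag(const char* arg) {
    if (arg[0] != '+' && arg[0] != '-') return false;
    bool value = (arg[0] == '+');
    auto it = g_bool_flags.find(arg + 1);
    if (it == g_bool_flags.end()) return false;   // unknown flag
    it->second = value;                           // the set_bool_flag equivalent
    return true;
}

int main() {
    parse_bool_flag("+UseSerialGC");              // from "-XX:+UseSerialGC"
    std::printf("UseSerialGC=%d\n", (int)g_bool_flags["UseSerialGC"]);
    return 0;
}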

set_bool_flag then sets the global variable behind the parsed flag to true; let's see how it does that:

static bool set_bool_flag(const char* name, bool value, Flag::Flags origin) {
  if (CommandLineFlags::boolAtPut(name, &value, origin) == Flag::SUCCESS) {
    return true;
  } else {
    return false;
  }
}

Flag::Error CommandLineFlags::boolAtPut(const char* name, size_t len, bool* value, Flag::Flags origin) {
  Flag* result = Flag::find_flag(name, len);
  return boolAtPut(result, value, origin);
}

find_flag looks up the flag's metadata; the debugger view below shows the flag we set being found, after which boolAtPut writes the global variable:

(debugger screenshot of find_flag locating the flag omitted)

Flag::Error CommandLineFlags::boolAtPut(Flag* flag, bool* value, Flag::Flags origin) {
  const char* name;
  if (flag == NULL) return Flag::INVALID_FLAG;
  if (!flag->is_bool()) return Flag::WRONG_FORMAT;
  name = flag->_name;
  Flag::Error check = apply_constraint_and_check_range_bool(name, *value, !CommandLineFlagConstraintList::validated_after_ergo());
  if (check != Flag::SUCCESS) return check;
  bool old_value = flag->get_bool();
  trace_flag_changed<EventBooleanFlagChanged, bool>(name, old_value, *value, origin);
  check = flag->set_bool(*value);
  *value = old_value;
  flag->set_origin(origin);
  return check;
}

After setting JVM flags we do not necessarily know whether the combination is valid or self-contradictory, but the JVM must be able to detect such conflicts and report them promptly. check_vm_args_consistency performs this validation; for example, the GC-related settings are validated by check_gc_consistency:

(screenshot of check_gc_consistency omitted)

That is it for argument parsing, but one more question is worth answering: why does the JVM still start when we configure nothing at all? Arguments::apply_ergo() handles that; it applies ergonomic defaults, such as choosing a GC, which is the job of select_gc:

void Arguments::select_gc() {
  if (!gc_selected()) {
    select_gc_ergonomically();
    if (!gc_selected()) {
      vm_exit_during_initialization("Garbage collector not selected (default collector explicitly disabled)", NULL);
    }
  }
}

gc_selected first checks whether a GC has already been chosen; the check is straightforward:

bool Arguments::gc_selected() {
#if INCLUDE_ALL_GCS
  return UseSerialGC || UseParallelGC || UseParallelOldGC || UseConcMarkSweepGC || UseG1GC;
#else
  return UseSerialGC;
#endif // INCLUDE_ALL_GCS
}

If none was set, select_gc_ergonomically picks a suitable one; in Java 9 it looks like this:

void Arguments::select_gc_ergonomically() {
  if (os::is_server_class_machine()) {
    if (!UseAutoGCSelectPolicy) {
       FLAG_SET_ERGO_IF_DEFAULT(bool, UseG1GC, true);
    } else {
      if (should_auto_select_low_pause_collector()) {
        FLAG_SET_ERGO_IF_DEFAULT(bool, UseConcMarkSweepGC, true);
        FLAG_SET_ERGO_IF_DEFAULT(bool, UseParNewGC, true);
      } else {
        FLAG_SET_ERGO_IF_DEFAULT(bool, UseParallelGC, true);
      }
    }
  } else {
    FLAG_SET_ERGO_IF_DEFAULT(bool, UseSerialGC, true);
  }
}

The choice depends on the JVM's mode. In client mode the default is Serial GC, which is also the best fit for that mode. In server mode, if UseAutoGCSelectPolicy is not set, G1 is chosen by default (which is why G1 is the default collector in Java 9); if UseAutoGCSelectPolicy is set, the choice depends on should_auto_select_low_pause_collector:

bool Arguments::should_auto_select_low_pause_collector() {
  if (UseAutoGCSelectPolicy &&
      !FLAG_IS_DEFAULT(MaxGCPauseMillis) &&
      (MaxGCPauseMillis <= AutoGCSelectPauseMillis)) {
    return true;
  }
  return false;
}

If should_auto_select_low_pause_collector returns true, CMS is chosen; otherwise UseParallelGC is used. The former favors response time, the latter throughput.

JVM heap initialization

Once the arguments are parsed, heap initialization can use them; different flags lead to different heap implementations and GC policies. initialize_heap is the function that initializes the heap; the following steps summarize what it does.

  • (1) Create the heap

The first step is to create the heap itself; which kind of heap is created depends on the GC flags, and create_heap does the work:

(screenshot of Universe::create_heap omitted)

The heap type follows from the chosen GC. The JVM offers four families: the parallel collectors (UseParallelGC), which collect with multiple threads; G1 (UseG1GC); CMS; and the serial collector (UseSerialGC). Universe::create_heap_with_policy creates the matching heap; its two template parameters are the heap type Heap and the heap-management policy Policy. For UseSerialGC, for example, the heap is GenCollectedHeap and the policy is MarkSweepPolicy. The HotSpot heap is a classic generational design, split broadly into a young and an old generation. Objects in different generations tend to have different characteristics, though objects with different characteristics can end up together; the criteria include an object's GC age and its size. Objects are allocated in eden first, and objects that survive enough Minor GCs are promoted to the old generation. Promotion can fail, in which case objects that should have been promoted stay in the young generation. During a Minor GC, if the survivors of eden + from cannot all be copied into the to space, the overflow is moved straight to the old generation, which is called premature promotion; sufficiently large objects are allocated directly in the old generation as well. The rest of this post uses UseSerialGC to follow heap creation.

First, the implementation of create_heap_with_policy:

template <class Heap, class Policy>
CollectedHeap* Universe::create_heap_with_policy() {
  Policy* policy = new Policy();
  policy->initialize_all();
  return new Heap(policy);
}

For UseSerialGC, Policy is MarkSweepPolicy and Heap is GenCollectedHeap; let's look at policy initialization and heap initialization in turn.

Policy initialization

initialize_all is the function to focus on; it is implemented in the base class GenCollectorPolicy:

  virtual void initialize_all() {
    CollectorPolicy::initialize_all();
    initialize_generations();
  }

CollectorPolicy::initialize_all() is implemented in CollectorPolicy as follows:

  virtual void initialize_all() {
    initialize_alignments();
    initialize_flags();
    initialize_size_info();
  }

initialize_alignments derives the alignment values from the OS page size; the heap sizes we configured are later rounded to these alignments, which is why the actual heap size in the JVM is not exactly the value we set but an aligned one.
initialize_flags then adjusts some global flag values based on the JVM arguments. This is a "fix-up" pass: the values were already set during parsing, but some of them, such as the heap sizes, have to be aligned and written back. For example, the snippet below:

(screenshot of the heap-size alignment fix-up in initialize_flags omitted)
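
The screenshot is missing, so here is a hedged sketch of what this kind of fix-up amounts to: rounding the user-supplied sizes up to the heap alignment before writing them back. align_up below is a generic helper written for this example, not HotSpot's align_size_up.

#include <cstddef>
#include <cstdio>

// Round size up to the next multiple of alignment (alignment must be a power of two).
static std::size_t align_up(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

int main() {
    std::size_t heap_alignment = 2 * 1024 * 1024;          // e.g. 2 MB, platform dependent
    std::size_t initial_heap   = 500 * 1024 * 1024 + 123;  // a value that is not aligned
    std::size_t aligned        = align_up(initial_heap, heap_alignment);
    // If the aligned value differs from what the user asked for, the flag is rewritten.
    std::printf("requested=%zu aligned=%zu\n", initial_heap, aligned);
    return 0;
}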

In the original snippet, _min_heap_byte_size is the minimum heap size and align_size_up aligns a size upwards; aligned_initial_heap_size is the aligned initial heap size, and if it differs from InitialHeapSize, InitialHeapSize is rewritten. MaxHeapSize is handled the same way. initialize_size_info is more involved: it determines the young and old generation sizes, e.g. the young generation's initial and maximum sizes. The details:

// Determine maximum size of the young generation.

  if (FLAG_IS_DEFAULT(MaxNewSize)) {
    _max_young_size = scale_by_NewRatio_aligned(_max_heap_byte_size);
    // Bound the maximum size by NewSize below (since it historically
    // would have been NewSize and because the NewRatio calculation could
    // yield a size that is too small) and bound it by MaxNewSize above.
    // Ergonomics plays here by previously calculating the desired
    // NewSize and MaxNewSize.
    _max_young_size = MIN2(MAX2(_max_young_size, _initial_young_size), MaxNewSize);
  }

This code determines _max_young_size, the maximum size of the young generation. If the young generation size was set explicitly with -Xmn, this branch is skipped; otherwise scale_by_NewRatio_aligned computes it:

size_t GenCollectorPolicy::scale_by_NewRatio_aligned(size_t base_size) {
  return align_size_down_bounded(base_size / (NewRatio + 1), _gen_alignment);
}

-XX:NewRatio controls the young generation's share of the heap; it defaults to 2, so young_gen_size = heap_size / (NewRatio + 1). For example, with a 3 GB heap and the default NewRatio of 2, the young generation gets roughly 1 GB. Continuing with the code:

  if (_max_heap_byte_size == _initial_heap_byte_size) {
    // The maximum and initial heap sizes are the same so the generation's
    // initial size must be the same as it maximum size. Use NewSize as the
    // size if set on command line.
    _max_young_size = FLAG_IS_CMDLINE(NewSize) ? NewSize : _max_young_size;
    _initial_young_size = _max_young_size;

    // Also update the minimum size if min == initial == max.
    if (_max_heap_byte_size == _min_heap_byte_size) {
      _min_young_size = _max_young_size;
    }
  }

If the heap is not expandable, i.e. -Xms equals -Xmx, this code runs: _max_young_size is taken from NewSize if it was set on the command line (-Xmn), and _initial_young_size is then set to _max_young_size, so the young generation cannot grow either. As an aside, DefNew never expands itself; when eden cannot satisfy a request it tries the from space instead. If the heap is expandable, the following code runs:

  if (FLAG_IS_CMDLINE(NewSize)) {
    // If NewSize is set on the command line, we should use it as
    // the initial size, but make sure it is within the heap bounds.
    _initial_young_size =
      MIN2(_max_young_size, bound_minus_alignment(NewSize, _initial_heap_byte_size));
    _min_young_size = bound_minus_alignment(_initial_young_size, _min_heap_byte_size);
  } else {
    // For the case where NewSize is not set on the command line, use
    // NewRatio to size the initial generation size. Use the current
    // NewSize as the floor, because if NewRatio is overly large, the resulting
    // size can be too small.
    _initial_young_size =
      MIN2(_max_young_size, MAX2(scale_by_NewRatio_aligned(_initial_heap_byte_size), NewSize));
  }

At this point the young generation's _min_young_size, _initial_young_size and _max_young_size are all determined; the same three values are then computed for the old generation, which is not repeated here.
After CollectorPolicy::initialize_all(), initialize_generations runs; it prepares for creating and initializing the generations. For MarkSweepPolicy it looks like this:

void MarkSweepPolicy::initialize_generations() {
  _young_gen_spec = new GenerationSpec(Generation::DefNew, _initial_young_size, _max_young_size, _gen_alignment);
  _old_gen_spec   = new GenerationSpec(Generation::MarkSweepCompact, _initial_old_size, _max_old_size, _gen_alignment);
}

The young generation is DefNew and the old generation is MarkSweepCompact; the generation sizes computed above are stored in the GenerationSpec objects and used later to create and initialize the actual generations.

Heap initialization

Taking GenCollectedHeap as the example: after create_heap, initialize_heap calls the new heap's initialize function, so we look at GenCollectedHeap::initialize;

(screenshot of GenCollectedHeap::initialize omitted)

The two highlighted lines initialize the young and the old generation; gen_policy()->young_gen_spec() returns the GenerationSpec set up above, and init creates and initializes the concrete generation for that spec;

(screenshot of GenerationSpec::init omitted)

For +UseSerialGC, the young generation is DefNew and the old generation is MarkSweepCompact. The DefNewGeneration::DefNewGeneration constructor builds the young generation; the figure below highlighted a few key points:

(screenshot of the DefNewGeneration constructor omitted)


Memory Allocation under UseConcMarkSweepGC

-XX:+UseConcMarkSweepGC, commonly known as CMS, is a heap-management scheme aimed at shorter GC pauses. It uses GenCollectedHeap as the heap manager; the young generation is ParNew and the old generation is ConcurrentMarkSweepGeneration. The young generation is collected with a multi-threaded copying algorithm over eden + from + to. The old generation is collected periodically by CMS; CMSInitiatingOccupancyFraction controls when a CMS GC should start, and the effective default threshold works out to about 92%, i.e. a CMS GC is started once the old generation is about 92% full. That default is computed:

void ConcurrentMarkSweepGeneration::init_initiating_occupancy(intx io, uintx tr) {
  assert(io <= 100 && tr <= 100, "Check the arguments");
  if (io >= 0) {
    _initiating_occupancy = (double)io / 100.0;
  } else {
    _initiating_occupancy = ((100 - MinHeapFreeRatio) +
                             (double)(tr * MinHeapFreeRatio) / 100.0)
                            / 100.0;
  }
}

The io parameter is CMSInitiatingOccupancyFraction and tr is CMSTriggerRatio. If CMSInitiatingOccupancyFraction was set (io >= 0), _initiating_occupancy is simply io / 100.0; otherwise the else branch computes it. With the defaults CMSTriggerRatio = 80 and MinHeapFreeRatio = 40, that gives ((100 - 40) + 80 * 40 / 100) / 100 = (60 + 32) / 100 = 0.92. CMS GC comes in two flavors: background GC, where the CMS thread keeps checking whether a CMS collection is needed, and foreground GC, which is triggered by an allocation failure or a promotion failure. A foreground GC stops the world for its entire duration and reuses the Serial Old approach, i.e. mark-sweep-compact over the whole heap. The details of CMS GC are discussed elsewhere; this post focuses on how objects are allocated under UseConcMarkSweepGC.

Under UseConcMarkSweepGC, objects are still allocated in eden first. The young generation is ParNew, a subclass of DefNew whose GC is a multi-threaded version of DefNew's, so allocation in ParNew should look much like DefNew's. Let's walk through the full allocation path under UseConcMarkSweepGC.

Allocation under UseSerialGC was covered earlier, so for CMS we start directly from CollectedHeap::common_mem_allocate_noinit, which also came up there. This function first tries allocate_from_tlab; if the TLAB cannot satisfy the request, a new TLAB is requested, and requesting a TLAB is, from the heap's point of view, just another allocation, so the rest of the path is the same. If allocation via the TLAB fails entirely, Universe::heap()->mem_allocate allocates directly from the heap; at that point a lock is needed, because the heap is shared by all threads, unlike the thread-private TLAB, so ideally the TLAB path succeeds most of the time. GenCollectorPolicy::mem_allocate_work performs the heap allocation; let's look at its implementation.

    // First allocation attempt is lock-free.
    Generation *young = gch->young_gen();
    assert(young->supports_inline_contig_alloc(),
      "Otherwise, must do alloc within heap lock");
    if (young->should_allocate(size, is_tlab)) {
      result = young->par_allocate(size, is_tlab);
      if (result != NULL) {
        assert(gch->is_in_reserved(result), "result not in heap");
        return result;
      }
    }

young->should_allocate decides whether the allocation should happen in the young generation at all; large objects go straight to the old generation. If the object is not large, young->par_allocate is used; this is DefNew's implementation, which ParNew inherits;

HeapWord* DefNewGeneration::par_allocate(size_t word_size,
                                         bool is_tlab) {
  HeapWord* res = eden()->par_allocate(word_size);
  if (CMSEdenChunksRecordAlways && _old_gen != NULL) {
    _old_gen->sample_eden_chunk();
  }
  return res;
}

The allocation goes to eden, implemented via ContiguousSpace::par_allocate_impl; that code was analyzed before and is not repeated here. Because the young generation is collected with a copying algorithm, it never fragments, so pointer-bump allocation can be used: a single top pointer marks the start of the free area, and allocating size bytes just moves top forward by size (a minimal sketch of this follows the code below). If eden cannot satisfy the request, the from space is tried next: gch->attempt_allocation first tries eden, then (subject to a check) the from space, and then the old generation:

HeapWord* GenCollectedHeap::attempt_allocation(size_t size,
                                               bool is_tlab,
                                               bool first_only) {
  HeapWord* res = NULL;
  if (_young_gen->should_allocate(size, is_tlab)) {
    res = _young_gen->allocate(size, is_tlab);
    if (res != NULL || first_only) {
      return res;
    }
  }
  if (_old_gen->should_allocate(size, is_tlab)) {
    res = _old_gen->allocate(size, is_tlab);
  }
  return res;
}
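
As an aside, the pointer-bump allocation mentioned above can be sketched in a few lines: a single top pointer is advanced atomically, which is why eden allocation can stay lock-free. This is a generic sketch, not ContiguousSpace::par_allocate_impl.

#include <atomic>
#include <cstddef>
#include <cstdio>

// A toy contiguous space: [start, end) with an atomically bumped top pointer.
struct BumpSpace {
    char* start;
    char* end;
    std::atomic<char*> top;

    explicit BumpSpace(std::size_t bytes)
        : start(new char[bytes]), end(start + bytes), top(start) {}
    ~BumpSpace() { delete[] start; }

    // Lock-free allocation: repeatedly try to move top forward by size.
    void* par_allocate(std::size_t size) {
        char* old_top = top.load(std::memory_order_relaxed);
        do {
            if (old_top + size > end) return nullptr;        // space exhausted
        } while (!top.compare_exchange_weak(old_top, old_top + size,
                                            std::memory_order_acq_rel));
        return old_top;                                       // [old_top, old_top+size) is ours
    }
};

int main() {
    BumpSpace eden(1024);
    void* a = eden.par_allocate(100);
    void* b = eden.par_allocate(100);
    std::printf("a=%p b=%p\n", a, b);
    return 0;
}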

_young_gen->allocate retries eden and then falls back to the from space:

HeapWord* DefNewGeneration::allocate(size_t word_size, bool is_tlab) {
  // This is the slow-path allocation for the DefNewGeneration.
  // Most allocations are fast-path in compiled code.
  // We try to allocate from the eden.  If that works, we are happy.
  // Note that since DefNewGeneration supports lock-free allocation, we
  // have to use it here, as well.
  HeapWord* result = eden()->par_allocate(word_size);
  if (result != NULL) {
    if (CMSEdenChunksRecordAlways && _old_gen != NULL) {
      _old_gen->sample_eden_chunk();
    }
  } else {
    // If the eden is full and the last collection bailed out, we are running
    // out of heap space, and we try to allocate the from-space, too.
    // allocate_from_space can't be inlined because that would introduce a
    // circular dependency at compile time.
    result = allocate_from_space(word_size);
  }
  return result;
}

eden()->par_allocate allocates from eden; if that fails, allocate_from_space tries the from space;

// The last collection bailed out, we are running out of heap space,
// so we try to allocate the from-space, too.
HeapWord* DefNewGeneration::allocate_from_space(size_t size) {
  bool should_try_alloc = should_allocate_from_space() || GCLocker::is_active_and_needs_gc();

  // If the Heap_lock is not locked by this thread, this will be called
  // again later with the Heap_lock held.
  bool do_alloc = should_try_alloc && (Heap_lock->owned_by_self()
                                       || (SafepointSynchronize::is_at_safepoint()
                                           && Thread::current()->is_VM_thread()));
  HeapWord* result = NULL;
  if (do_alloc) {
    result = from()->allocate(size);
  }
  return result;
}

Of course, allocation in the from space is only permitted under certain conditions; if they do not hold, the from space cannot be used. should_allocate_from_space makes that call; in addition, if the GC locker is active and a GC is pending, from-space allocation is allowed as well. Here is should_allocate_from_space:

  bool should_allocate_from_space() const {
    return _should_allocate_from_space;
  }

  void clear_should_allocate_from_space() {
    _should_allocate_from_space = false;
  }
  void set_should_allocate_from_space() {
    _should_allocate_from_space = true;
  }

Very simple: it just returns _should_allocate_from_space, so the real logic is wherever that flag gets set:

(screenshot of the code that sets _should_allocate_from_space omitted)

The conditions are fairly strict: collection_attempt_is_safe must be true and eden must already be full. collection_attempt_is_safe is implemented as follows:

bool DefNewGeneration::collection_attempt_is_safe() {
  if (!to()->is_empty()) {
    log_trace(gc)(":: to is not empty ::");
    return false;
  }
  if (_old_gen == NULL) {
    GenCollectedHeap* gch = GenCollectedHeap::heap();
    _old_gen = gch->old_gen();
  }
  return _old_gen->promotion_attempt_is_safe(used());
}

If the to space is not empty, from-space allocation is refused outright; a non-empty to space means a Promotion Fail has occurred. Otherwise, whether promotion would be safe is checked via _old_gen->promotion_attempt_is_safe:

bool ConcurrentMarkSweepGeneration::promotion_attempt_is_safe(size_t max_promotion_in_bytes) const {
  size_t available = max_available();
  size_t av_promo  = (size_t)gc_stats()->avg_promoted()->padded_average();
  bool   res = (available >= av_promo) || (available >= max_promotion_in_bytes);
  return res;
}

available is the free space in the old generation, av_promo is the (padded) average amount promoted from the young generation, and max_promotion_in_bytes is the young generation's current usage (eden + from). Promotion is considered safe if the old generation's free space is at least the average promoted size, or at least the young generation's usage; otherwise it is unsafe.

To summarize: from-space allocation is permitted if the GC locker is active and a GC is pending, or if eden is full and the old generation judges promotion to be safe; otherwise the allocation has to go to the old generation.
If the young generation (eden and from) cannot satisfy the request, the old generation is next. _old_gen->should_allocate first decides whether old-generation allocation is allowed, and if so _old_gen->allocate performs it; here is should_allocate:

  // Returns "true" iff this generation should be used to allocate an
  // object of the given size.  Young generations might
  // wish to exclude very large objects, for example, since, if allocated
  // often, they would greatly increase the frequency of young-gen
  // collection.
  virtual bool should_allocate(size_t word_size, bool is_tlab) {
    bool result = false;
    size_t overflow_limit = (size_t)1 << (BitsPerSize_t - LogHeapWordSize);
    if (!is_tlab || supports_tlab_allocation()) {
      result = (word_size > 0) && (word_size < overflow_limit);
    }
    return result;
  }

If that check passes, _old_gen->allocate allocates from the old generation:

HeapWord* ConcurrentMarkSweepGeneration::allocate(size_t size, bool tlab) {
  CMSSynchronousYieldRequest yr;
  MutexLockerEx x(freelistLock(), Mutex::_no_safepoint_check_flag);
  return have_lock_and_allocate(size, tlab);
}

HeapWord* ConcurrentMarkSweepGeneration::have_lock_and_allocate(size_t size,
                                                                bool   tlab /* ignored */) {
  assert_lock_strong(freelistLock());
  size_t adjustedSize = CompactibleFreeListSpace::adjustObjectSize(size);
  HeapWord* res = cmsSpace()->allocate(adjustedSize);
  // Allocate the object live (grey) if the background collector has
  // started marking. This is necessary because the marker may
  // have passed this address and consequently this object will
  // not otherwise be greyed and would be incorrectly swept up.
  // Note that if this object contains references, the writing
  // of those references will dirty the card containing this object
  // allowing the object to be blackened (and its references scanned)
  // either during a preclean phase or at the final checkpoint.
  if (res != NULL) {
    // We may block here with an uninitialized object with
    // its mark-bit or P-bits not yet set. Such objects need
    // to be safely navigable by block_start().
    assert(oop(res)->klass_or_null() == NULL, "Object should be uninitialized here.");
    assert(!((FreeChunk*)res)->is_free(), "Error, block will look free but show wrong size");
    collector()->direct_allocated(res, adjustedSize);
    _direct_allocated_words += adjustedSize;
    // allocation counters
    NOT_PRODUCT(
      _numObjectsAllocated++;
      _numWordsAllocated += (int)adjustedSize;
    )
  }
  return res;
}

CompactibleFreeListSpace is responsible for allocation in the CMS old generation. Unlike DefNew or ParNew, the CMS old generation can become fragmented, so pointer-bump allocation cannot be used; instead CMS manages the old generation with a free-list allocator. Here is CompactibleFreeListSpace::allocate:

HeapWord* CompactibleFreeListSpace::allocate(size_t size) {
  HeapWord* res = NULL;
  res = allocate_adaptive_freelists(size);
  if (res != NULL) {
    FreeChunk* fc = (FreeChunk*)res;
    fc->markNotFree();
    // Verify that the block offset table shows this to
    // be a single block, but not one which is unallocated.
    _bt.verify_single_block(res, size);
    _bt.verify_not_unallocated(res, size);
  }
  return res;
}

allocate_adaptive_freelists tries very hard to find a suitable block; the flow is quite involved, and the implementation is strongly reminiscent of the memory-pool allocators found in C++ STL implementations, which are worth studying alongside it.

(screenshot omitted)

As an aside: if even the old generation cannot satisfy the request, the heap is expanded via expand_heap_and_allocate and the allocation is retried; if that still fails, a GC has to run, and a VM_GenCollectForAllocation is queued on the VMThread. Whether that ends up as a Minor GC or a Full GC is decided there, as analyzed earlier. Now for the details of CMS's free-list allocation, i.e. allocate_adaptive_freelists:

HeapWord* CompactibleFreeListSpace::allocate_adaptive_freelists(size_t size) {
  assert_lock_strong(freelistLock());
  HeapWord* res = NULL;
  assert(size == adjustObjectSize(size),
         "use adjustObjectSize() before calling into allocate()");

  // Strategy
  //   if small
  //     exact size from small object indexed list if small
  //     small or large linear allocation block (linAB) as appropriate
  //     take from lists of greater sized chunks
  //   else
  //     dictionary
  //     small or large linear allocation block if it has the space
  // Try allocating exact size from indexTable first
  if (size < IndexSetSize) {
    res = (HeapWord*) getChunkFromIndexedFreeList(size);
    if(res != NULL) {
      assert(res != (HeapWord*)_indexedFreeList[size].head(),
        "Not removed from free list");
      // no block offset table adjustment is necessary on blocks in
      // the indexed lists.

    // Try allocating from the small LinAB
    } else if (size < _smallLinearAllocBlock._allocation_size_limit &&
        (res = getChunkFromSmallLinearAllocBlock(size)) != NULL) {
        // if successful, the above also adjusts block offset table
        // Note that this call will refill the LinAB to
        // satisfy the request.  This is different that
        // evm.
        // Don't record chunk off a LinAB?  smallSplitBirth(size);
    } else {
      // Raid the exact free lists larger than size, even if they are not
      // overpopulated.
      res = (HeapWord*) getChunkFromGreater(size);
    }
  } else {
    // Big objects get allocated directly from the dictionary.
    res = (HeapWord*) getChunkFromDictionaryExact(size);
    if (res == NULL) {
      // Try hard not to fail since an allocation failure will likely
      // trigger a synchronous GC.  Try to get the space from the
      // allocation blocks.
      res = getChunkFromSmallLinearAllocBlockRemainder(size);
    }
  }

  return res;
}

The free-list allocation strategy used by CMS is complex, and that complexity buys fast allocation; the details are left for a later write-up, but a toy sketch of the general idea follows.
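
To make the idea a bit more tangible, here is a toy segregated free-list in the spirit of the strategy comment above (exact-size lists for small chunks, a fallback "dictionary" for larger ones). It is a deliberately simplified sketch and has nothing to do with the real CompactibleFreeListSpace.

#include <cstddef>
#include <cstdio>
#include <map>
#include <vector>

// Toy segregated free lists: one list per small size, plus a "dictionary"
// keyed by size for larger chunks. Chunks are plain heap blocks here.
class ToyFreeListSpace {
    static constexpr std::size_t kSmallLimit = 256;
    std::map<std::size_t, std::vector<void*>> small_lists_;   // exact-size lists
    std::multimap<std::size_t, void*> dictionary_;            // larger chunks

public:
    void add_free_chunk(void* p, std::size_t size) {
        if (size < kSmallLimit) small_lists_[size].push_back(p);
        else                    dictionary_.insert({size, p});
    }

    // Small requests: exact-size list first, then fall back to the dictionary.
    // Large requests: best-fit search in the dictionary.
    void* allocate(std::size_t size) {
        if (size < kSmallLimit) {
            auto it = small_lists_.find(size);
            if (it != small_lists_.end() && !it->second.empty()) {
                void* p = it->second.back();
                it->second.pop_back();
                return p;
            }
        }
        auto it = dictionary_.lower_bound(size);   // smallest chunk that fits
        if (it == dictionary_.end()) return nullptr;
        void* p = it->second;
        dictionary_.erase(it);                     // (splitting the remainder is omitted)
        return p;
    }
};

int main() {
    ToyFreeListSpace space;
    space.add_free_chunk(new char[64], 64);        // chunks intentionally leaked in this toy
    space.add_free_chunk(new char[4096], 4096);
    std::printf("64-byte alloc: %p\n", space.allocate(64));
    std::printf("1000-byte alloc: %p\n", space.allocate(1000));
    return 0;
}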


GC Flow under UseConcMarkSweepGC

Compared with Serial GC, CMS is far more complex, because it was the first collector whose GC threads run concurrently with the application threads. Running GC concurrently with mutators is genuinely hard: while the collector is working, the application keeps creating garbage and changing reference relationships, and objects the collector has already written off as garbage can become reachable again; CMS has to handle all of this correctly.

CMS GC splits into foreground GC and background GC. A foreground GC is an on-demand full collection, for example the Full GC that follows a failed Minor GC; it uses the same algorithm as Serial Old (single-threaded mark-sweep-compact). If a foreground GC is triggered while a background GC is in progress, a "Concurrent mode failure" occurs. The background GC is the CMS old GC proper; it only collects the old generation (ConcurrentMarkSweepGeneration) and runs periodically: the ConcurrentMarkSweepThread keeps checking whether a background GC is needed, typically by comparing the old generation's occupancy against the CMS trigger threshold, which defaults to about 92% and can be set via CMSInitiatingOccupancyFraction. It is advisable to also set -XX:+UseCMSInitiatingOccupancyOnly; otherwise CMS also uses its collected statistics for the decision, which makes the behavior harder to reason about.

UseConcMarkSweepGC still uses GenCollectedHeap as the heap manager, so the overall GC policy is the same as Serial GC's and is not repeated here. The rest of this post analyzes how CMS old GC works and how background and foreground GC cooperate. The CMS cycle is complex; here are the states a CMS old GC can move through:

  // CMS abstract state machine
  // initial_state: Idling
  // next_state(Idling)            = {Marking}
  // next_state(Marking)           = {Precleaning, Sweeping}
  // next_state(Precleaning)       = {AbortablePreclean, FinalMarking}
  // next_state(AbortablePreclean) = {FinalMarking}
  // next_state(FinalMarking)      = {Sweeping}
  // next_state(Sweeping)          = {Resizing}
  // next_state(Resizing)          = {Resetting}
  // next_state(Resetting)         = {Idling}
  // The numeric values below are chosen so that:
  // . _collectorState <= Idling ==  post-sweep && pre-mark
  // . _collectorState in (Idling, Sweeping) == {initial,final}marking ||
  //                                            precleaning || abortablePrecleanb
 public:
  enum CollectorState {
    Resizing            = 0,
    Resetting           = 1,
    Idling              = 2,
    InitialMarking      = 3,
    Marking             = 4,
    Precleaning         = 5,
    AbortablePreclean   = 6,
    FinalMarking        = 7,
    Sweeping            = 8
  };

Idling is the initial state and also means that no background GC is in progress, so a foreground GC started at that point cannot hit a "Concurrent mode failure". In short, a CMS old GC goes through initial mark (STW), concurrent mark, final remark (STW) and sweeping. Most of the cycle is spent marking; that is because CMS tries to keep the application pauses as short as possible, so several phases run concurrently with the mutators, which in turn produces "floating garbage" and makes the overall implementation hard to follow. The sections below walk through the key phases, what each one does, why it exists, and how it can show up at runtime.

CMSCollector::collect_in_background performs the background GC and CMSCollector::collect performs the foreground GC; the analysis below starts from these two functions.

InitialMarking (initial mark)

The initial mark is a stop-the-world phase; when CMS finds _collectorState equal to InitialMarking it performs the initial marking work. Here is the entry point:

      case InitialMarking:
        {
          ReleaseForegroundGC x(this);
          stats().record_cms_begin();
          VM_CMS_Initial_Mark initial_mark_op(this);
          VMThread::execute(&initial_mark_op);
        }
        // The collector state may be any legal state at this point
        // since the background collector may have yielded to the
        // foreground collector.
        break;

VM_CMS_Initial_Mark::doit is scheduled on the VMThread; let's see what it does.

void VM_CMS_Initial_Mark::doit() {
  HS_PRIVATE_CMS_INITMARK_BEGIN();
  GCIdMark gc_id_mark(_gc_id);

  _collector->_gc_timer_cm->register_gc_pause_start("Initial Mark");

  GenCollectedHeap* gch = GenCollectedHeap::heap();
  GCCauseSetter gccs(gch, GCCause::_cms_initial_mark);

  VM_CMS_Operation::verify_before_gc();

  IsGCActiveMark x; // stop-world GC active
  _collector->do_CMS_operation(CMSCollector::CMS_op_checkpointRootsInitial, gch->gc_cause());

  VM_CMS_Operation::verify_after_gc();

  _collector->_gc_timer_cm->register_gc_pause_end();

  HS_PRIVATE_CMS_INITMARK_END();
}

_collector->do_CMS_operation is called with CMSCollector::CMS_op_checkpointRootsInitial, which tells us the initial mark comes next; CMSCollector::do_CMS_operation looks like this:

void CMSCollector::do_CMS_operation(CMS_op_type op, GCCause::Cause gc_cause) {
  GCTraceCPUTime tcpu;
  TraceCollectorStats tcs(counters());

  switch (op) {
    case CMS_op_checkpointRootsInitial: {
      GCTraceTime(Info, gc) t("Pause Initial Mark", NULL, GCCause::_no_gc, true);
      SvcGCMarker sgcm(SvcGCMarker::OTHER);
      checkpointRootsInitial();
      break;
    }
    case CMS_op_checkpointRootsFinal: {
      GCTraceTime(Info, gc) t("Pause Remark", NULL, GCCause::_no_gc, true);
      SvcGCMarker sgcm(SvcGCMarker::OTHER);
      checkpointRootsFinal();
      break;
    }
    default:
      fatal("No such CMS_op");
  }
}

The same function is also called in the FinalMarking phase, with CMS_op_checkpointRootsFinal; both CMS_op_checkpointRootsInitial and CMS_op_checkpointRootsFinal are stop-the-world. For now, the CMS_op_checkpointRootsInitial path calls checkpointRootsInitial:

// Checkpoint the roots into this generation from outside
// this generation. [Note this initial checkpoint need only
// be approximate -- we'll do a catch up phase subsequently.]
void CMSCollector::checkpointRootsInitial() {
  assert(_collectorState == InitialMarking, "Wrong collector state");
  check_correct_thread_executing();
  TraceCMSMemoryManagerStats tms(_collectorState,GenCollectedHeap::heap()->gc_cause());

  save_heap_summary();
  report_heap_summary(GCWhen::BeforeGC);

  ReferenceProcessor* rp = ref_processor();
  assert(_restart_addr == NULL, "Control point invariant");
  {
    // acquire locks for subsequent manipulations
    MutexLockerEx x(bitMapLock(),
                    Mutex::_no_safepoint_check_flag);
    checkpointRootsInitialWork();
    // enable ("weak") refs discovery
    rp->enable_discovery();
    _collectorState = Marking;
  }
}

checkpointRootsInitialWork is the call to focus on; CMSParallelInitialMarkEnabled defaults to true, so the following code runs:

      // The parallel version.
      WorkGang* workers = gch->workers();
      assert(workers != NULL, "Need parallel worker threads.");
      uint n_workers = workers->active_workers();

      StrongRootsScope srs(n_workers);

      CMSParInitialMarkTask tsk(this, &srs, n_workers);
      initialize_sequential_subtasks_for_young_gen_rescan(n_workers);
      // If the total workers is greater than 1, then multiple workers
      // may be used at some time and the initialization has been set
      // such that the single threaded path cannot be used.
      if (workers->total_workers() > 1) {
        workers->run_task(&tsk);
      } else {
        tsk.work(0);
      }

CMSParInitialMarkTask is the actual task and CMSParInitialMarkTask::work does the initial marking; the code below shows what the initial mark has to cover:

void CMSParInitialMarkTask::work(uint worker_id) {
  elapsedTimer _timer;
  ResourceMark rm;
  HandleMark   hm;
  // ---------- scan from roots --------------
  _timer.start();
  GenCollectedHeap* gch = GenCollectedHeap::heap();
  ParMarkRefsIntoClosure par_mri_cl(_collector->_span, &(_collector->_markBitMap));
  // ---------- young gen roots --------------
  {
    work_on_young_gen_roots(&par_mri_cl);
    _timer.stop();
    log_trace(gc, task)("Finished young gen initial mark scan work in %dth thread: %3.3f sec",
                        worker_id, _timer.seconds());
  }
  // ---------- remaining roots --------------
  _timer.reset();
  _timer.start();
  CLDToOopClosure cld_closure(&par_mri_cl, true);
  gch->cms_process_roots(_strong_roots_scope,
                         false,     // yg was scanned above
                         GenCollectedHeap::ScanningOption(_collector->CMSCollector::roots_scanning_options()),
                         _collector->should_unload_classes(),
                         &par_mri_cl,
                         &cld_closure);
  assert(_collector->should_unload_classes()
         || (_collector->CMSCollector::roots_scanning_options() & GenCollectedHeap::SO_AllCodeCache),
         "if we didn't scan the code cache, we have to be ready to drop nmethods with expired weak oops");
  _timer.stop();
  log_trace(gc, task)("Finished remaining root initial mark scan work in %dth thread: %3.3f sec",
                      worker_id, _timer.seconds());
}

The InitialMarking phase scans the old generation using the GC roots and young-generation objects as roots, marking the old-generation objects reachable from them. In the implementation, CMS marks live objects with a "tri-color" scheme: white means not yet marked, grey means the object itself is marked but its references have not been scanned, and black means both the object and its references have been processed. The full algorithm is quite involved and is not analyzed further here.
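
To illustrate the tri-color idea in isolation, here is a tiny mark phase over a toy object graph: white = unvisited, grey = on the mark stack, black = fully scanned. This is only a sketch of the general algorithm, not CMS's bitmap-based implementation.

#include <cstdio>
#include <vector>

enum Color { WHITE, GREY, BLACK };

struct Obj {
    Color color = WHITE;
    std::vector<Obj*> refs;   // outgoing references
};

// Classic tri-color marking driven by a mark stack.
void mark_from_roots(const std::vector<Obj*>& roots) {
    std::vector<Obj*> stack;
    for (Obj* r : roots) {
        if (r && r->color == WHITE) { r->color = GREY; stack.push_back(r); }
    }
    while (!stack.empty()) {
        Obj* o = stack.back();
        stack.pop_back();
        for (Obj* child : o->refs) {
            if (child && child->color == WHITE) {  // greying a newly reached object
                child->color = GREY;
                stack.push_back(child);
            }
        }
        o->color = BLACK;                          // all outgoing refs scanned
    }
}

int main() {
    Obj a, b, c, garbage;
    a.refs = {&b};
    b.refs = {&c};
    mark_from_roots({&a});
    std::printf("a=%d b=%d c=%d garbage=%d (2=BLACK, 0=WHITE)\n",
                a.color, b.color, c.color, garbage.color);
    return 0;
}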

Marking (concurrent mark)

This phase is the concurrent mark, where "concurrent" means the mutator threads and the GC threads run at the same time. Because of that concurrency, objects may be promoted from the young generation while marking is in progress, large objects may be allocated directly in the old generation, survivors that cannot fit into the to space during a Minor GC may be promoted early, and, more subtly, reference relationships may change. All such objects must be re-marked; otherwise they would wrongly be considered unreachable and be swept, causing serious runtime errors.

      case Marking:
        // initial marking in checkpointRootsInitialWork has been completed
        if (markFromRoots()) { // we were successful
          assert(_collectorState == Precleaning, "Collector state should "
            "have changed");
        } else {
          assert(_foregroundGCIsActive, "Internal state inconsistency");
        }
        break;

markFromRoots is responsible for the whole concurrent marking phase; let's look at its main flow;

bool CMSCollector::markFromRoots() {
  // we might be tempted to assert that:
  // assert(!SafepointSynchronize::is_at_safepoint(),
  //        "inconsistent argument?");
  // However that wouldn't be right, because it's possible that
  // a safepoint is indeed in progress as a young generation
  // stop-the-world GC happens even as we mark in this generation.
  assert(_collectorState == Marking, "inconsistent state?");
  check_correct_thread_executing();
  verify_overflow_empty();

  // Weak ref discovery note: We may be discovering weak
  // refs in this generation concurrent (but interleaved) with
  // weak ref discovery by the young generation collector.

  CMSTokenSyncWithLocks ts(true, bitMapLock());
  GCTraceCPUTime tcpu;
  CMSPhaseAccounting pa(this, "Concurrent Mark");
  bool res = markFromRootsWork();
  if (res) {
    _collectorState = Precleaning;
  } else { // We failed and a foreground collection wants to take over
    assert(_foregroundGCIsActive, "internal state inconsistency");
    assert(_restart_addr == NULL,  "foreground will restart from scratch");
    log_debug(gc)("bailing out to foreground collection");
  }
  verify_overflow_empty();
  return res;
}

Inside markFromRoots, the call to markFromRootsWork does the real work; if it succeeds, the state moves to Precleaning and the GC thread proceeds with that phase. Here is markFromRootsWork:

bool CMSCollector::markFromRootsWork() {
  // iterate over marked bits in bit map, doing a full scan and mark
  // from these roots using the following algorithm:
  // . if oop is to the right of the current scan pointer,
  //   mark corresponding bit (we'll process it later)
  // . else (oop is to left of current scan pointer)
  //   push oop on marking stack
  // . drain the marking stack

  // Note that when we do a marking step we need to hold the
  // bit map lock -- recall that direct allocation (by mutators)
  // and promotion (by the young generation collector) is also
  // marking the bit map. [the so-called allocate live policy.]
  // Because the implementation of bit map marking is not
  // robust wrt simultaneous marking of bits in the same word,
  // we need to make sure that there is no such interference
  // between concurrent such updates.

  // already have locks
  assert_lock_strong(bitMapLock());

  verify_work_stacks_empty();
  verify_overflow_empty();
  bool result = false;
  if (CMSConcurrentMTEnabled && ConcGCThreads > 0) {
    result = do_marking_mt();
  } else {
    result = do_marking_st();
  }
  return result;
}

If CMSConcurrentMTEnabled is set and ConcGCThreads is greater than 0, the multi-threaded do_marking_mt runs; otherwise the single-threaded do_marking_st runs. For simplicity, only the single-threaded version is analyzed here:

bool CMSCollector::do_marking_st() {
  ResourceMark rm;
  HandleMark   hm;

  // Temporarily make refs discovery single threaded (non-MT)
  ReferenceProcessorMTDiscoveryMutator rp_mut_discovery(ref_processor(), false);
  MarkFromRootsClosure markFromRootsClosure(this, _span, &_markBitMap,
    &_markStack, CMSYield);
  // the last argument to iterate indicates whether the iteration
  // should be incremental with periodic yields.
  _markBitMap.iterate(&markFromRootsClosure);
  // If _restart_addr is non-NULL, a marking stack overflow
  // occurred; we need to do a fresh iteration from the
  // indicated restart address.
  while (_restart_addr != NULL) {
    if (_foregroundGCIsActive) {
      // We may be running into repeated stack overflows, having
      // reached the limit of the stack size, while making very
      // slow forward progress. It may be best to bail out and
      // let the foreground collector do its job.
      // Clear _restart_addr, so that foreground GC
      // works from scratch. This avoids the headache of
      // a "rescan" which would otherwise be needed because
      // of the dirty mod union table & card table.
      _restart_addr = NULL;
      return false;  // indicating failure to complete marking
    }
    // Deal with stack overflow:
    // we restart marking from _restart_addr
    HeapWord* ra = _restart_addr;
    markFromRootsClosure.reset(ra);
    _restart_addr = NULL;
    _markBitMap.iterate(&markFromRootsClosure, ra, _span.end());
  }
  return true;
}

markFromRootsClosure is a closure object; its do_bit function is invoked by BitMap::iterate, as can be seen in CMSCollector::do_marking_st. First, BitMap::iterate:

// Note that if the closure itself modifies the bitmap
// then modifications in and to the left of the _bit_ being
// currently sampled will not be seen. Note also that the
// interval [leftOffset, rightOffset) is right open.
bool BitMap::iterate(BitMapClosure* blk, idx_t leftOffset, idx_t rightOffset) {
  verify_range(leftOffset, rightOffset);

  idx_t startIndex = word_index(leftOffset);
  idx_t endIndex   = MIN2(word_index(rightOffset) + 1, size_in_words());
  for (idx_t index = startIndex, offset = leftOffset;
       offset < rightOffset && index < endIndex;
       offset = (++index) << LogBitsPerWord) {
    idx_t rest = map(index) >> (offset & (BitsPerWord - 1));
    for (; offset < rightOffset && rest != 0; offset++) {
      if (rest & 1) {
        if (!blk->do_bit(offset)) return false;
        //  resample at each closure application
        // (see, for instance, CMS bug 4525989)
        rest = map(index) >> (offset & (BitsPerWord -1));
      }
      rest = rest >> 1;
    }
  }
  return true;
}

It repeatedly calls the BitMapClosure's do_bit; here that closure is MarkFromRootsClosure. Its do_bit:

bool MarkFromRootsClosure::do_bit(size_t offset) {
  if (_skipBits > 0) {
    _skipBits--;
    return true;
  }
  // convert offset into a HeapWord*
  HeapWord* addr = _bitMap->startWord() + offset;
  assert(_bitMap->endWord() && addr < _bitMap->endWord(),
         "address out of range");
  assert(_bitMap->isMarked(addr), "tautology");
  if (_bitMap->isMarked(addr+1)) {
    // this is an allocated but not yet initialized object
    assert(_skipBits == 0, "tautology");
    _skipBits = 2;  // skip next two marked bits ("Printezis-marks")
    oop p = oop(addr);
    if (p->klass_or_null_acquire() == NULL) {
      DEBUG_ONLY(if (!_verifying) {)
        // We re-dirty the cards on which this object lies and increase
        // the _threshold so that we'll come back to scan this object
        // during the preclean or remark phase. (CMSCleanOnEnter)
        if (CMSCleanOnEnter) {
          size_t sz = _collector->block_size_using_printezis_bits(addr);
          HeapWord* end_card_addr   = (HeapWord*)round_to(
                                         (intptr_t)(addr+sz), CardTableModRefBS::card_size);
          MemRegion redirty_range = MemRegion(addr, end_card_addr);
          assert(!redirty_range.is_empty(), "Arithmetical tautology");
          // Bump _threshold to end_card_addr; note that
          // _threshold cannot possibly exceed end_card_addr, anyhow.
          // This prevents future clearing of the card as the scan proceeds
          // to the right.
          assert(_threshold <= end_card_addr,
                 "Because we are just scanning into this object");
          if (_threshold < end_card_addr) {
            _threshold = end_card_addr;
          }
          if (p->klass_or_null_acquire() != NULL) {
            // Redirty the range of cards...
            _mut->mark_range(redirty_range);
          } // ...else the setting of klass will dirty the card anyway.
        }
      DEBUG_ONLY(})
      return true;
    }
  }
  scanOopsInOop(addr);
  return true;
}

The interesting part is MarkFromRootsClosure::scanOopsInOop:

void MarkFromRootsClosure::scanOopsInOop(HeapWord* ptr) {
  assert(_bitMap->isMarked(ptr), "expected bit to be set");
  assert(_markStack->isEmpty(),
         "should drain stack to limit stack usage");
  // convert ptr to an oop preparatory to scanning
  oop obj = oop(ptr);
  // Ignore mark word in verification below, since we
  // may be running concurrent with mutators.
  assert(obj->is_oop(true), "should be an oop");
  assert(_finger <= ptr, "_finger runneth ahead");
  // advance the finger to right end of this object
  _finger = ptr + obj->size();
  assert(_finger > ptr, "we just incremented it above");
  // On large heaps, it may take us some time to get through
  // the marking phase. During
  // this time it's possible that a lot of mutations have
  // accumulated in the card table and the mod union table --
  // these mutation records are redundant until we have
  // actually traced into the corresponding card.
  // Here, we check whether advancing the finger would make
  // us cross into a new card, and if so clear corresponding
  // cards in the MUT (preclean them in the card-table in the
  // future).

  DEBUG_ONLY(if (!_verifying) {)
    // The clean-on-enter optimization is disabled by default,
    // until we fix 6178663.
    if (CMSCleanOnEnter && (_finger > _threshold)) {
      // [_threshold, _finger) represents the interval
      // of cards to be cleared  in MUT (or precleaned in card table).
      // The set of cards to be cleared is all those that overlap
      // with the interval [_threshold, _finger); note that
      // _threshold is always kept card-aligned but _finger isn't
      // always card-aligned.
      HeapWord* old_threshold = _threshold;
      assert(old_threshold == (HeapWord*)round_to(
              (intptr_t)old_threshold, CardTableModRefBS::card_size),
             "_threshold should always be card-aligned");
      _threshold = (HeapWord*)round_to(
                     (intptr_t)_finger, CardTableModRefBS::card_size);
      MemRegion mr(old_threshold, _threshold);
      assert(!mr.is_empty(), "Control point invariant");
      assert(_span.contains(mr), "Should clear within span");
      _mut->clear_range(mr);
    }
  DEBUG_ONLY(})
  // Note: the finger doesn't advance while we drain
  // the stack below.
  PushOrMarkClosure pushOrMarkClosure(_collector,
                                      _span, _bitMap, _markStack,
                                      _finger, this);
  bool res = _markStack->push(obj);
  assert(res, "Empty non-zero size stack should have space for single push");
  while (!_markStack->isEmpty()) {
    oop new_oop = _markStack->pop();
    // Skip verifying header mark word below because we are
    // running concurrent with mutators.
    assert(new_oop->is_oop(true), "Oops! expected to pop an oop");
    // now scan this oop's oops
    new_oop->oop_iterate(&pushOrMarkClosure);
    do_yield_check();
  }
  assert(_markStack->isEmpty(), "tautology, emphasizing post-condition");
}

The oop_iterate call is where object marking happens; the actual work is done by PushOrMarkClosure's do_oop closure:

void PushOrMarkClosure::do_oop(oop obj) {
  // Ignore mark word because we are running concurrent with mutators.
  assert(obj->is_oop_or_null(true), "Expected an oop or NULL at " PTR_FORMAT, p2i(obj));
  HeapWord* addr = (HeapWord*)obj;
  if (_span.contains(addr) && !_bitMap->isMarked(addr)) {
    // Oop lies in _span and isn't yet grey or black
    _bitMap->mark(addr);            // now grey
    if (addr < _finger) {
      // the bit map iteration has already either passed, or
      // sampled, this bit in the bit map; we'll need to
      // use the marking stack to scan this oop's oops.
      bool simulate_overflow = false;
      NOT_PRODUCT(
        if (CMSMarkStackOverflowALot &&
            _collector->simulate_overflow()) {
          // simulate a stack overflow
          simulate_overflow = true;
        }
      )
      if (simulate_overflow || !_markStack->push(obj)) { // stack overflow
        log_trace(gc)("CMS marking stack overflow (benign) at " SIZE_FORMAT, _markStack->capacity());
        assert(simulate_overflow || _markStack->isFull(), "Else push should have succeeded");
        handle_stack_overflow(addr);
      }
    }
    // anything including and to the right of _finger
    // will be scanned as we iterate over the remainder of the
    // bit map
    do_yield_check();
  }
}

do_oop marks the object and pushes it onto _markStack; the while loop in MarkFromRootsClosure::scanOopsInOop then pops an object and continues traversing and marking, so the whole process is effectively a recursion. The concurrent marking phase therefore takes the objects marked during the initial mark as roots and recursively marks everything reachable from them. Because mutators keep running during this phase, the situation gets complicated, which is exactly why CMS needs multiple marking passes: with a single pass, some live objects would be missed and then wrongly reclaimed.

Precleaning (preclean)

After Marking, _collectorState becomes Precleaning; the entry point for this phase is:

   case Precleaning:
        // marking from roots in markFromRoots has been completed
        preclean();
        assert(_collectorState == AbortablePreclean ||
               _collectorState == FinalMarking,
               "Collector state should have changed");
        break;

The preclean function carries out the Precleaning phase;

void CMSCollector::preclean() {
  check_correct_thread_executing();
  assert(Thread::current()->is_ConcurrentGC_thread(), "Wrong thread");
  verify_work_stacks_empty();
  verify_overflow_empty();
  _abort_preclean = false;
  if (CMSPrecleaningEnabled) {
    if (!CMSEdenChunksRecordAlways) {
      _eden_chunk_index = 0;
    }
    size_t used = get_eden_used();
    size_t capacity = get_eden_capacity();
    // Don't start sampling unless we will get sufficiently
    // many samples.
    if (used < (((capacity / CMSScheduleRemarkSamplingRatio) / 100)
                * CMSScheduleRemarkEdenPenetration)) {
      _start_sampling = true;
    } else {
      _start_sampling = false;
    }
    GCTraceCPUTime tcpu;
    CMSPhaseAccounting pa(this, "Concurrent Preclean");
    preclean_work(CMSPrecleanRefLists1, CMSPrecleanSurvivors1);
  }
  CMSTokenSync x(true); // is cms thread
  if (CMSPrecleaningEnabled) {
    sample_eden();
    _collectorState = AbortablePreclean;
  } else {
    _collectorState = FinalMarking;
  }
  verify_work_stacks_empty();
  verify_overflow_empty();
}

CMSPrecleaningEnabled controls whether the Precleaning phase runs at all; it defaults to true, and the default should normally be kept. preclean_work does the actual precleaning. The phase has to cover the following (a card-table sketch follows the list):

  • (1) Old-generation objects that became referenced from the young generation during the concurrent mark must be marked so they are not swept;
  • (2) Old-generation objects whose references changed within the old generation during the concurrent mark must be marked as well.
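
Both items rely on card marking: every reference store dirties the card covering the modified object, so precleaning only has to rescan dirty cards. A minimal, generic sketch of that bookkeeping (not HotSpot's card table / mod union table):

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Toy card table: the heap is divided into 512-byte cards, and a write into a
// card marks it dirty so the collector can rescan just that card later.
class ToyCardTable {
    static constexpr std::size_t kCardSize = 512;
    uintptr_t heap_base_;
    std::vector<uint8_t> cards_;   // 1 = dirty, 0 = clean

public:
    ToyCardTable(uintptr_t heap_base, std::size_t heap_bytes)
        : heap_base_(heap_base), cards_(heap_bytes / kCardSize + 1, 0) {}

    // Called from the write barrier after every reference store.
    void dirty(uintptr_t addr) { cards_[(addr - heap_base_) / kCardSize] = 1; }

    // Precleaning / remark walks only the dirty cards and cleans them.
    void scan_and_clean() {
        for (std::size_t i = 0; i < cards_.size(); ++i) {
            if (cards_[i]) {
                std::printf("rescan card %zu\n", i);
                cards_[i] = 0;
            }
        }
    }
};

int main() {
    ToyCardTable ct(0x100000, 64 * 1024);
    ct.dirty(0x100000 + 5000);     // simulate a reference store into the heap
    ct.scan_and_clean();
    return 0;
}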

AbortablePreclean

AbortablePreclean exists purely in service of CMS's overriding goal of shortening STW time. It does the same kind of work as Precleaning, but in a loop that keeps going until certain conditions are met, at which point it exits and the STW final remark runs. The point of AbortablePreclean (and of Precleaning) is to reduce, as much as possible, the amount of work left for the final remark, which is what keeps the STW pause short.

abortable_preclean is responsible for the AbortablePreclean phase;

// Try and schedule the remark such that young gen
// occupancy is CMSScheduleRemarkEdenPenetration %.
void CMSCollector::abortable_preclean() {
  check_correct_thread_executing();
  assert(CMSPrecleaningEnabled,  "Inconsistent control state");
  assert(_collectorState == AbortablePreclean, "Inconsistent control state");

  // If Eden's current occupancy is below this threshold,
  // immediately schedule the remark; else preclean
  // past the next scavenge in an effort to
  // schedule the pause as described above. By choosing
  // CMSScheduleRemarkEdenSizeThreshold >= max eden size
  // we will never do an actual abortable preclean cycle.
  if (get_eden_used() > CMSScheduleRemarkEdenSizeThreshold) {
    GCTraceCPUTime tcpu;
    CMSPhaseAccounting pa(this, "Concurrent Abortable Preclean");
    // We need more smarts in the abortable preclean
    // loop below to deal with cases where allocation
    // in young gen is very very slow, and our precleaning
    // is running a losing race against a horde of
    // mutators intent on flooding us with CMS updates
    // (dirty cards).
    // One, admittedly dumb, strategy is to give up
    // after a certain number of abortable precleaning loops
    // or after a certain maximum time. We want to make
    // this smarter in the next iteration.
    // XXX FIX ME!!! YSR
    size_t loops = 0, workdone = 0, cumworkdone = 0, waited = 0;
    while (!(should_abort_preclean() ||
             ConcurrentMarkSweepThread::cmst()->should_terminate())) {
      workdone = preclean_work(CMSPrecleanRefLists2, CMSPrecleanSurvivors2);
      cumworkdone += workdone;
      loops++;
      // Voluntarily terminate abortable preclean phase if we have
      // been at it for too long.
      if ((CMSMaxAbortablePrecleanLoops != 0) &&
          loops >= CMSMaxAbortablePrecleanLoops) {
        log_debug(gc)(" CMS: abort preclean due to loops ");
        break;
      }
      if (pa.wallclock_millis() > CMSMaxAbortablePrecleanTime) {
        log_debug(gc)(" CMS: abort preclean due to time ");
        break;
      }
      // If we are doing little work each iteration, we should
      // take a short break.
      if (workdone < CMSAbortablePrecleanMinWorkPerIteration) {
        // Sleep for some time, waiting for work to accumulate
        stopTimer();
        cmsThread()->wait_on_cms_lock(CMSAbortablePrecleanWaitMillis);
        startTimer();
        waited++;
      }
    }
    log_trace(gc)(" [" SIZE_FORMAT " iterations, " SIZE_FORMAT " waits, " SIZE_FORMAT " cards)] ",
                               loops, waited, cumworkdone);
  }
  CMSTokenSync x(true); // is cms thread
  if (_collectorState != Idling) {
    assert(_collectorState == AbortablePreclean,
           "Spontaneous state transition?");
    _collectorState = FinalMarking;
  } // Else, a foreground collection completed this CMS cycle.
  return;
}

CMSScheduleRemarkEdenSizeThreshold defaults to 2 MB; only when eden's usage exceeds this value does the rest of the phase run. The while loop that follows does the same work as Precleaning, since it calls the same preclean_work function; the conditions under which the loop keeps running or exits are worth spelling out:

  • (1) CMSMaxAbortablePrecleanLoops caps the number of iterations; the default is 0, meaning no limit.
  • (2) CMSMaxAbortablePrecleanTime caps the total time spent in the loop; the default is 5000 ms.
  • (3) If an iteration accomplishes less than CMSAbortablePrecleanMinWorkPerIteration, the loop sleeps for CMSAbortablePrecleanWaitMillis before continuing, waiting for work to accumulate; both default to 100.
  • (4) The loop also exits when should_abort_preclean returns true:

inline bool CMSCollector::should_abort_preclean() const {
  // We are in the midst of an "abortable preclean" and either
  // scavenge is done or foreground GC wants to take over collection
  return _collectorState == AbortablePreclean &&
         (_abort_preclean || _foregroundGCIsActive ||
          GenCollectedHeap::heap()->incremental_collection_will_fail(true /* consult_young */));
}

_foregroundGCIsActive means a foreground (Serial Old style) GC is in progress, and incremental_collection_will_fail means a Promotion Fail has already happened, so an incremental collection is pointless and the JVM would rather go straight to a Full GC; in either of these cases should_abort_preclean returns true;

  • (5) Finally, the loop exits when ConcurrentMarkSweepThread::cmst()->should_terminate() returns true, i.e. the ConcurrentMarkSweepThread has been asked to terminate.

FinalMarking (final remark)

FinalMarking is the remark phase and requires a stop-the-world (STW) pause. Before walking through the code, let us guess what the phase has to do. The remark must finish marking: once it completes, every object left unmarked is garbage and will have its memory reclaimed in a later phase. The initial mark phase already marked old-generation objects directly reachable from GC roots and from the young generation; the two preclean phases are corrective passes that record the changes made while GC threads and mutator threads ran concurrently. Because the final remark scans the entire young generation under STW to find reachable old-generation objects, the more live objects the young generation holds, the longer the pause; that is why the AbortablePreclean phase tries hard to let a young GC happen first, so the final remark has fewer young-generation objects to scan. Since the concurrent marking phase runs GC threads and mutator threads in parallel, the following can happen:

  • (1) During the concurrent phases, a young-generation object started (or stopped) referencing an old-generation object.
  • (2) During the concurrent phases, a GC root started (or stopped) referencing an old-generation object.
  • (3) During the concurrent phases, references inside the old generation changed; these changes are recorded as dirty cards, so it is enough to rescan the dirty cards (a minimal sketch of the card-table idea follows this list).
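
Case (3) relies on the card table, which is what "DirtyCard" refers to throughout this article. Below is a minimal, self-contained sketch of the idea rather than HotSpot's actual card-table code: the old generation is divided into 512-byte cards, a post-write barrier dirties the card containing any updated reference field, and the preclean/remark phases only rescan the dirty cards instead of the whole generation.

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

const size_t  kCardShift = 9;     // 512-byte cards (HotSpot uses the same card size)
const uint8_t kCleanCard = 0xff;
const uint8_t kDirtyCard = 0x00;

struct CardTable {
  uintptr_t            heap_base;
  std::vector<uint8_t> cards;

  CardTable(uintptr_t base, size_t heap_bytes)
      : heap_base(base), cards((heap_bytes >> kCardShift) + 1, kCleanCard) {}

  // Post-write barrier: record that a reference field inside the old gen changed.
  void dirty(const void* field) {
    size_t idx = (reinterpret_cast<uintptr_t>(field) - heap_base) >> kCardShift;
    cards[idx] = kDirtyCard;
  }

  // What preclean/remark conceptually does: rescan only the dirty cards.
  template <typename Fn>
  void scan_dirty(Fn rescan_card) {
    for (size_t i = 0; i < cards.size(); ++i) {
      if (cards[i] == kDirtyCard) {
        cards[i] = kCleanCard;                     // reset, then rescan the card
        rescan_card(heap_base + (i << kCardShift));
      }
    }
  }
};

int main() {
  static uint8_t fake_old_gen[64 * 1024];          // stands in for the old generation
  CardTable ct(reinterpret_cast<uintptr_t>(fake_old_gen), sizeof(fake_old_gen));

  ct.dirty(&fake_old_gen[100]);                    // a mutator updated a reference here
  ct.dirty(&fake_old_gen[40000]);

  ct.scan_dirty([](uintptr_t card_start) {
    std::printf("rescan card starting at 0x%zx\n", static_cast<size_t>(card_start));
  });
}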

The final remark phase has to account for all of these cases. Let us now look at the work it actually performs:

        {
          ReleaseForegroundGC x(this);

          VM_CMS_Final_Remark final_remark_op(this);
          VMThread::execute(&final_remark_op);
        }
        assert(_foregroundGCShouldWait, "block post-condition");
        break;

A VM_CMS_Final_Remark operation is submitted to the VMThread for execution, so we can look straight at VM_CMS_Final_Remark::doit to see what actually gets done:

void VM_CMS_Final_Remark::doit() {
  if (lost_race()) {
    // Nothing to do.
    return;
  }
  HS_PRIVATE_CMS_REMARK_BEGIN();
  GCIdMark gc_id_mark(_gc_id);

  _collector->_gc_timer_cm->register_gc_pause_start("Final Mark");

  GenCollectedHeap* gch = GenCollectedHeap::heap();
  GCCauseSetter gccs(gch, GCCause::_cms_final_remark);

  VM_CMS_Operation::verify_before_gc();

  IsGCActiveMark x; // stop-world GC active
  _collector->do_CMS_operation(CMSCollector::CMS_op_checkpointRootsFinal, gch->gc_cause());

  VM_CMS_Operation::verify_after_gc();

  _collector->save_heap_summary();
  _collector->_gc_timer_cm->register_gc_pause_end();

  HS_PRIVATE_CMS_REMARK_END();
}

Just like initial marking, this goes through do_CMS_operation, but with the operation type CMSCollector::CMS_op_checkpointRootsFinal. Here is the part of the work that do_CMS_operation ends up executing for CMS_op_checkpointRootsFinal:

void CMSCollector::checkpointRootsFinal() {
  assert(_collectorState == FinalMarking, "incorrect state transition?");
  check_correct_thread_executing();
  // world is stopped at this checkpoint
  assert(SafepointSynchronize::is_at_safepoint(),
         "world should be stopped");
  TraceCMSMemoryManagerStats tms(_collectorState,GenCollectedHeap::heap()->gc_cause());

  verify_work_stacks_empty();
  verify_overflow_empty();

  log_debug(gc)("YG occupancy: " SIZE_FORMAT " K (" SIZE_FORMAT " K)",
                _young_gen->used() / K, _young_gen->capacity() / K);
  {
    if (CMSScavengeBeforeRemark) {
      GenCollectedHeap* gch = GenCollectedHeap::heap();
      // Temporarily set flag to false, GCH->do_collection will
      // expect it to be false and set to true
      FlagSetting fl(gch->_is_gc_active, false);

      gch->do_collection(true,                      // full (i.e. force, see below)
                         false,                     // !clear_all_soft_refs
                         0,                         // size
                         false,                     // is_tlab
                         GenCollectedHeap::YoungGen // type
        );
    }
    FreelistLocker x(this);
    MutexLockerEx y(bitMapLock(),
                    Mutex::_no_safepoint_check_flag);
    checkpointRootsFinalWork();
  }
  verify_work_stacks_empty();
  verify_overflow_empty();
}

If CMSScavengeBeforeRemark is set, a young GC is performed before the final remark. The reason was explained earlier: the final remark is STW, and if the young generation holds many live objects there is a lot to scan and the pause grows, so running a young GC first reclaims dead young-generation objects and reduces the amount of scanning during the remark. CMSScavengeBeforeRemark defaults to false, and it is best not to set it lightly: because of the preclean phases, a young GC may already have happened, and forcing another one would be unnecessary. Let CMS work at its own pace, and only intervene when its behaviour clearly fails to meet expectations.

Sweeping

As its name suggests, this phase reclaims the garbage objects, and it runs concurrently; in the whole periodic CMS cycle only initial mark and final remark pause the world, every other phase can run concurrently with mutators. The sweep function does the cleanup, and inside it the key function sweepWork is called. Here is sweepWork:

void CMSCollector::sweepWork(ConcurrentMarkSweepGeneration* old_gen) {
  // We iterate over the space(s) underlying this generation,
  // checking the mark bit map to see if the bits corresponding
  // to specific blocks are marked or not. Blocks that are
  // marked are live and are not swept up. All remaining blocks
  // are swept up, with coalescing on-the-fly as we sweep up
  // contiguous free and/or garbage blocks:
  // We need to ensure that the sweeper synchronizes with allocators
  // and stop-the-world collectors. In particular, the following
  // locks are used:
  // . CMS token: if this is held, a stop the world collection cannot occur
  // . freelistLock: if this is held no allocation can occur from this
  //                 generation by another thread
  // . bitMapLock: if this is held, no other thread can access or update
  //

  // Note that we need to hold the freelistLock if we use
  // block iterate below; else the iterator might go awry if
  // a mutator (or promotion) causes block contents to change
  // (for instance if the allocator divvies up a block).
  // If we hold the free list lock, for all practical purposes
  // young generation GC's can't occur (they'll usually need to
  // promote), so we might as well prevent all young generation
  // GC's while we do a sweeping step. For the same reason, we might
  // as well take the bit map lock for the entire duration

  // check that we hold the requisite locks
  assert(have_cms_token(), "Should hold cms token");
  assert(ConcurrentMarkSweepThread::cms_thread_has_cms_token(), "Should possess CMS token to sweep");
  assert_lock_strong(old_gen->freelistLock());
  assert_lock_strong(bitMapLock());

  assert(!_inter_sweep_timer.is_active(), "Was switched off in an outer context");
  assert(_intra_sweep_timer.is_active(),  "Was switched on  in an outer context");
  old_gen->cmsSpace()->beginSweepFLCensus((float)(_inter_sweep_timer.seconds()),
                                          _inter_sweep_estimate.padded_average(),
                                          _intra_sweep_estimate.padded_average());
  old_gen->setNearLargestChunk();

  {
    SweepClosure sweepClosure(this, old_gen, &_markBitMap, CMSYield);
    old_gen->cmsSpace()->blk_iterate_careful(&sweepClosure);
    // We need to free-up/coalesce garbage/blocks from a
    // co-terminal free run. This is done in the SweepClosure
    // destructor; so, do not remove this scope, else the
    // end-of-sweep-census below will be off by a little bit.
  }
  old_gen->cmsSpace()->sweep_completed();
  old_gen->cmsSpace()->endSweepFLCensus(sweep_count());
  if (should_unload_classes()) {                // unloaded classes this cycle,
    _concurrent_cycles_since_last_unload = 0;   // ... reset count
  } else {                                      // did not unload classes,
    _concurrent_cycles_since_last_unload++;     // ... increment count
  }
}

It is worth re-stating that CMS only reclaims the CMS generation, i.e. the old generation; with every collector other than ConcMarkSweepGC, an old-generation collection is effectively a full GC (G1 is not covered here). The concrete sweep algorithm will not be analysed further.
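
Its core idea, however, is simple enough to sketch. The following is a simplified, hypothetical model of what the comments in sweepWork describe, not the real SweepClosure: walk the space block by block, keep the blocks whose mark bit says they are live, and coalesce adjacent garbage/free blocks on the fly into chunks that go back onto the free list.

#include <cstddef>
#include <cstdio>
#include <vector>

// Toy representation of the blocks that blk_iterate_careful would walk,
// together with the liveness information the mark bitmap provides for each block.
struct Block     { size_t start; size_t size; bool marked; };
struct FreeChunk { size_t start; size_t size; };

// Keep marked (live) blocks; coalesce runs of unmarked blocks into free chunks.
std::vector<FreeChunk> sweep(const std::vector<Block>& blocks) {
  std::vector<FreeChunk> free_list;
  bool in_free_run = false;
  FreeChunk run{0, 0};
  for (const Block& b : blocks) {
    if (b.marked) {                                     // live, never swept up
      if (in_free_run) { free_list.push_back(run); in_free_run = false; }
    } else {                                            // garbage or already free
      if (!in_free_run) { run = {b.start, 0}; in_free_run = true; }
      run.size += b.size;                               // coalesce on the fly
    }
  }
  if (in_free_run) free_list.push_back(run);            // co-terminal free run
  return free_list;
}

int main() {
  std::vector<Block> old_gen = {
      {0, 64, true}, {64, 32, false}, {96, 16, false}, {112, 128, true}, {240, 48, false}};
  for (const FreeChunk& c : sweep(old_gen))
    std::printf("free chunk: start=%zu size=%zu\n", c.start, c.size);
}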

foreground gc

Everything above belongs to the periodic CMS cycle, i.e. the background gc, a "passive" collection started by the CMS thread when it sees old-generation occupancy cross its threshold. A foreground gc, by contrast, is triggered on demand: typically a minor GC has hit a promotion failure, or the old generation simply does not have enough space. The exact triggers depend on the GC policy of GenCollectedHeap, which was analysed in an earlier article. Below is a brief look at how the foreground gc behaves.

The entry point for a foreground gc is ConcurrentMarkSweepGeneration::collect:

void ConcurrentMarkSweepGeneration::collect(bool   full,
                                            bool   clear_all_soft_refs,
                                            size_t size,
                                            bool   tlab)
{
  collector()->collect(full, clear_all_soft_refs, size, tlab);
}

void CMSCollector::collect(bool   full,
                           bool   clear_all_soft_refs,
                           size_t size,
                           bool   tlab)
{
  // The following "if" branch is present for defensive reasons.
  // In the current uses of this interface, it can be replaced with:
  // assert(!GCLocker.is_active(), "Can't be called otherwise");
  // But I am not placing that assert here to allow future
  // generality in invoking this interface.
  if (GCLocker::is_active()) {
    // A consistency test for GCLocker
    assert(GCLocker::needs_gc(), "Should have been set already");
    // Skip this foreground collection, instead
    // expanding the heap if necessary.
    // Need the free list locks for the call to free() in compute_new_size()
    compute_new_size();
    return;
  }
  acquire_control_and_collect(full, clear_all_soft_refs);
}

acquire_control_and_collect does the actual foreground gc work, and the name says it all: first acquire control of the heap. When a foreground gc is requested, a background gc may still be running; the two kinds of gc cannot run at the same time, and the foreground gc clearly has higher priority, so the background gc has to give up its cycle and let the foreground gc collect the old-generation garbage. Since the foreground gc also collects the young generation along the way, it amounts to a full GC. Let us look at the flow of acquire_control_and_collect:

{
    MutexLockerEx x(CGC_lock, Mutex::_no_safepoint_check_flag);
    if (_foregroundGCShouldWait) {
      // We are going to be waiting for action for the CMS thread;
      // it had better not be gone (for instance at shutdown)!
      assert(ConcurrentMarkSweepThread::cmst() != NULL && !ConcurrentMarkSweepThread::cmst()->has_terminated(),
             "CMS thread must be running");
      // Wait here until the background collector gives us the go-ahead
      ConcurrentMarkSweepThread::clear_CMS_flag(
        ConcurrentMarkSweepThread::CMS_vm_has_token);  // release token
      // Get a possibly blocked CMS thread going:
      //   Note that we set _foregroundGCIsActive true above,
      //   without protection of the CGC_lock.
      CGC_lock->notify();
      assert(!ConcurrentMarkSweepThread::vm_thread_wants_cms_token(),
             "Possible deadlock");
      while (_foregroundGCShouldWait) {
        // wait for notification
        CGC_lock->wait(Mutex::_no_safepoint_check_flag);
        // Possibility of delay/starvation here, since CMS token does
        // not know to give priority to VM thread? Actually, i think
        // there wouldn't be any delay/starvation, but the proof of
        // that "fact" (?) appears non-trivial. XXX 20011219YSR
      }
      ConcurrentMarkSweepThread::set_CMS_flag(
        ConcurrentMarkSweepThread::CMS_vm_has_token);
    }
  }

This snippet waits for the background gc to hand control of the heap over to the foreground gc. In collect_in_background (the background gc), the collector checks whether a foreground gc is in progress (_foregroundGCIsActive == true) before starting; if so, this background cycle exits immediately. Otherwise, after each completed phase it checks again whether a foreground gc is waiting:

    {
      // Check if the FG collector wants us to yield.
      CMSTokenSync x(true); // is cms thread
      if (waitForForegroundGC()) {
        // We yielded to a foreground GC, nothing more to be
        // done this round.
        assert(_foregroundGCShouldWait == false, "We set it to false in "
               "waitForForegroundGC()");
        log_debug(gc, state)("CMS Thread " INTPTR_FORMAT " exiting collection CMS state %d",
                             p2i(Thread::current()), _collectorState);
        return;
      } else {
        // The background collector can run but check to see if the
        // foreground collector has done a collection while the
        // background collector was waiting to get the CGC_lock
        // above.  If yes, break so that _foregroundGCShouldWait
        // is cleared before returning.
        if (_collectorState == Idling) {
          break;
        }
      }
    }

waitForForegroundGC is the function that checks for a pending foreground gc and yields to it:

bool CMSCollector::waitForForegroundGC() {
  bool res = false;
  assert(ConcurrentMarkSweepThread::cms_thread_has_cms_token(),
         "CMS thread should have CMS token");
  // Block the foreground collector until the
  // background collectors decides whether to
  // yield.
  MutexLockerEx x(CGC_lock, Mutex::_no_safepoint_check_flag);
  _foregroundGCShouldWait = true;
  if (_foregroundGCIsActive) {
    // The background collector yields to the
    // foreground collector and returns a value
    // indicating that it has yielded.  The foreground
    // collector can proceed.
    res = true;
    _foregroundGCShouldWait = false;
    ConcurrentMarkSweepThread::clear_CMS_flag(
      ConcurrentMarkSweepThread::CMS_cms_has_token);
    ConcurrentMarkSweepThread::set_CMS_flag(
      ConcurrentMarkSweepThread::CMS_cms_wants_token);
    // Get a possibly blocked foreground thread going
    CGC_lock->notify();
    log_debug(gc, state)("CMS Thread " INTPTR_FORMAT " waiting at CMS state %d",
                         p2i(Thread::current()), _collectorState);
    while (_foregroundGCIsActive) {
      CGC_lock->wait(Mutex::_no_safepoint_check_flag);
    }
    ConcurrentMarkSweepThread::set_CMS_flag(
      ConcurrentMarkSweepThread::CMS_cms_has_token);
    ConcurrentMarkSweepThread::clear_CMS_flag(
      ConcurrentMarkSweepThread::CMS_cms_wants_token);
  }
  log_debug(gc, state)("CMS Thread " INTPTR_FORMAT " continuing at CMS state %d",
                       p2i(Thread::current()), _collectorState);
  return res;
}

If a foreground gc is running (or waiting) at this point, the current background cycle is given up; otherwise any foreground gc that arrives later is told to wait, and the same check is repeated once the current CMS phase completes.
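
The whole cooperation boils down to two flags, _foregroundGCShouldWait and _foregroundGCIsActive, protected by CGC_lock. The following is a simplified, hypothetical model of that handshake written with standard C++ threading primitives; it is not HotSpot code, and the real version additionally juggles the CMS token flags and safepoint checks.

#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

struct CmsHandshake {
  std::mutex              cgc_lock;                // stands in for CGC_lock
  std::condition_variable cgc_cv;
  bool foregroundGCIsActive   = false;             // a foreground collection is pending/running
  bool foregroundGCShouldWait = true;              // background still owns the current cycle

  // Background (CMS) thread: called between phases. Returns true if it yielded.
  bool wait_for_foreground_gc() {
    std::unique_lock<std::mutex> x(cgc_lock);
    foregroundGCShouldWait = true;
    if (foregroundGCIsActive) {
      foregroundGCShouldWait = false;              // let the foreground collector proceed
      cgc_cv.notify_all();
      cgc_cv.wait(x, [this] { return !foregroundGCIsActive; });
      return true;                                 // abandon this background cycle
    }
    return false;
  }

  // Background thread: called when the cycle ends normally, so a waiting
  // foreground collection is not blocked forever.
  void background_cycle_done() {
    { std::lock_guard<std::mutex> x(cgc_lock); foregroundGCShouldWait = false; }
    cgc_cv.notify_all();
  }

  // VM thread: a foreground (full) collection was requested.
  void foreground_collect() {
    {
      std::unique_lock<std::mutex> x(cgc_lock);
      foregroundGCIsActive = true;
      cgc_cv.notify_all();
      cgc_cv.wait(x, [this] { return !foregroundGCShouldWait; });   // wait for the go-ahead
    }
    std::puts("foreground: mark-sweep-compact");
    { std::lock_guard<std::mutex> x(cgc_lock); foregroundGCIsActive = false; }
    cgc_cv.notify_all();
  }
};

int main() {
  CmsHandshake h;
  std::thread background([&] {
    for (int phase = 0; phase < 3; ++phase) {
      std::puts("background: finished a concurrent phase");
      if (h.wait_for_foreground_gc()) { std::puts("background: yielded"); return; }
    }
    h.background_cycle_done();
  });
  std::thread foreground([&] { h.foreground_collect(); });
  background.join();
  foreground.join();
}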

In the foreground gc, once control of the heap has been acquired, the following snippet is executed:

  if (first_state > Idling) {
    report_concurrent_mode_interruption();
  }
void CMSCollector::report_concurrent_mode_interruption() {
  if (is_external_interruption()) {
    log_debug(gc)("Concurrent mode interrupted");
  } else {
    log_debug(gc)("Concurrent mode failure");
    _gc_tracer_cm->report_concurrent_mode_failure();
  }
}

bool CMSCollector::is_external_interruption() {
  GCCause::Cause cause = GenCollectedHeap::heap()->gc_cause();
  return GCCause::is_user_requested_gc(cause) ||
         GCCause::is_serviceability_requested_gc(cause);
}

When reading CMS GC logs you will occasionally see "Concurrent mode interrupted" or "Concurrent mode failure"; these appear precisely because a foreground gc found the background gc already at work. If the collection was user-requested (something like System.gc()), the log says "Concurrent mode interrupted"; otherwise it says "Concurrent mode failure".

After that, CMSCollector::do_compaction_work performs a mark-sweep-compact collection; the real work happens in GenMarkSweep::invoke_at_safepoint, which was already covered in the earlier Serial Old analysis, so it is not repeated here.

Summary

CMS GC as a whole is genuinely complex: mutator threads and GC threads run concurrently, foreground gc and background gc have to cooperate, and a large number of flags are involved. A carelessly set flag can easily make the JVM behave badly, so do not use a flag unless you understand exactly how it behaves.

Why does the CMS old-generation GC have so many steps? Mainly to reduce STW time, which is why both marking and sweeping are designed to run concurrently. initMark and FinalMark do pause the world, but the marking done in initMark is very limited (GC roots -> cms gen, young gen -> cms gen), and thanks to the two preclean phases and the dirty cards, the number of objects FinalMark has to scan is greatly reduced. If in practice every FinalMark turns out to be very long, set CMSScavengeBeforeRemark so a young GC runs before the final remark and there is less to scan. Because the mark and preclean phases let mutator threads run concurrently with GC threads, the following can occur:

  • (1) young gen -> old gen
  • (2) GCRoot -> old gen
  • (3) old gen internal ref changed

FinalMark exists to deal with these cases: it scans the young generation to mark the young gen -> old gen references, and any old-generation reference that changed during the concurrent phases has been recorded as a dirty card, so rescanning the dirty cards during FinalMark is enough.

A final word on foreground gc versus background gc: you really do not want a foreground gc to happen. A foreground gc means the JVM has concluded that allocation can no longer be satisfied and a thorough cleanup, i.e. a full GC, is required; it runs single-threaded and does mark-sweep-compact, so you can imagine the speed. If foreground gc happens frequently, analyse why; studying GenCollectedHeap::do_collection is a good way to understand the collection policy. Of course, different collectors use different heap types: Serial and CMS use GenCollectedHeap while the others do not, as discussed in earlier articles.
