JVM G1 Source Code Analysis: Fast TLAB Allocation

Preface

Before introducing TLABs, let's start with a question:
When creating an object, the JVM must claim a block of memory of the requested size on the heap. If many threads request memory concurrently, a CAS-based optimistic locking scheme can guarantee that no two threads claim the same block. But memory allocation is an extremely frequent operation while the JVM runs, so this approach is bound to degrade performance.
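The contended bump-the-pointer scheme described above can be sketched as a CAS loop. This is a simplified illustration with made-up names (SharedEden, allocate), not HotSpot's actual code:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Simplified sketch of shared bump-pointer allocation in Eden:
// every thread races on the same top pointer with CAS, so each
// allocation costs at least one atomic operation under contention.
struct SharedEden {
  std::atomic<uintptr_t> top;
  uintptr_t end;

  SharedEden(uintptr_t start, uintptr_t limit) : top(start), end(limit) {}

  // Returns the start address of the allocated block, or 0 on failure.
  uintptr_t allocate(size_t bytes) {
    uintptr_t old_top = top.load(std::memory_order_relaxed);
    for (;;) {
      uintptr_t new_top = old_top + bytes;
      if (new_top > end) return 0;  // Eden exhausted
      // CAS guarantees no two threads receive the same block, but this
      // retry loop is exactly the overhead TLAB is designed to avoid.
      if (top.compare_exchange_weak(old_top, new_top)) return old_top;
    }
  }
};
```

With a TLAB, each thread runs this CAS only when claiming a whole buffer, not per object.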

What is a TLAB

TLAB stands for Thread Local Allocation Buffer, a block of memory private to a thread. With the VM flag -XX:+UseTLAB set, each thread also claims a block of memory of a configured size during thread initialization, reserved for that thread's exclusive use. Every thread thus owns its own buffer and serves allocations from it, so there is no contention and allocation becomes much faster. When the buffer runs out of room, the thread claims a fresh one from the Eden space and continues; that claim still requires an atomic operation.

The purpose of a TLAB is to let each Java application thread allocate space for new objects through its own private allocation pointer, amortizing the synchronization cost of updating the shared allocation pointer of the GC heap (the Eden space).

A TLAB only gives each thread a private allocation pointer; the memory underneath, where the objects live, is still accessible to all threads. Other threads just cannot allocate in that region. When a TLAB fills up (the allocation pointer top hits the allocation limit end), a new TLAB is claimed, and the objects in the old TLAB stay where they are with no further bookkeeping: objects cannot tell whether they were once allocated from a TLAB; all that matters is that they were allocated in Eden.

// ThreadLocalAllocBuffer: a descriptor for thread-local storage used by
// the threads for allocation.
//            It is thread-private at any time, but maybe multiplexed over
//            time across multiple threads. The park()/unpark() pair is
//            used to make it available for such multiplexing.
class ThreadLocalAllocBuffer: public CHeapObj<mtThread> {
  friend class VMStructs;
private:
  HeapWord* _start;                    // address of TLAB
  HeapWord* _top;                      // address after last allocation
  HeapWord* _pf_top;                   // allocation prefetch watermark
  HeapWord* _end;                      // allocation end (excluding alignment_reserve)
  size_t    _desired_size;             // desired size   (including alignment_reserve)
  size_t    _refill_waste_limit;       // hold onto tlab if free() is larger than this
  size_t    _allocated_before_last_gc; // total bytes allocated up until the last gc

  static size_t   _max_size;           // maximum size of any TLAB
  static unsigned _target_refills;     // expected number of refills between GCs

  unsigned  _number_of_refills;
  unsigned  _fast_refill_waste;
  unsigned  _slow_refill_waste;
  unsigned  _gc_waste;
  unsigned  _slow_allocations;
  // ... (remaining members omitted)
};

ThreadLocalAllocBuffer inherits from CHeapObj. Its important fields are:

_start: HeapWord pointer; start address of the TLAB region
_top: HeapWord pointer; the address after the last allocation, i.e. everything before this address has already been handed out
_pf_top: HeapWord pointer; the allocation prefetch watermark
_end: HeapWord pointer; end address of the TLAB region, excluding the alignment reserve
_desired_size: the desired size, including the alignment reserve, in heap words (8 bytes each on a 64-bit VM)
_refill_waste_limit: a threshold; if free() returns more than this, the TLAB is kept, otherwise it is discarded and a new one is created
_allocated_before_last_gc: total bytes allocated up to the last GC
_max_size: maximum size of any TLAB; a static field
_target_refills: expected number of refills between GCs; a static field
_number_of_refills: number of refills performed, i.e. how many times a new TLAB was allocated
_fast_refill_waste: memory wasted by refills on the fast allocation path
_slow_refill_waste: memory wasted by refills on the slow allocation path
_gc_waste: memory wasted by refills forced by GC
_slow_allocations: number of slow allocations; allocating inside a TLAB is the fast path, while allocating from the shared heap, which needs synchronization, is the slow path
_allocation_fraction: an AdaptiveWeightedAverage instance used to adaptively size the next TLAB
_global_stats: statistics about TLAB object allocation
Note that adding or subtracting 1 to a HeapWord pointer moves the address by one heap word, i.e. 8 bytes on a 64-bit VM. Every size in the TLAB code (return values, fields, parameters) is expressed in heap words, precisely so it composes with HeapWord pointers.

In essence a TLAB is just three pointers: start, top and end. Each thread carves a chunk out of Eden, say 100KB, as its own TLAB. start and end are placeholders that mark out the slice of Eden managed by this TLAB, reserving it so no other thread can allocate there. top is the allocation pointer inside the slice: it starts at the same position as start and advances with each allocation, and when allocating the next object would bump into end, a TLAB refill is triggered. The refill process is explained later.

_desired_size is the size of the TLAB.

_refill_waste_limit is the maximum space allowed to be wasted. Suppose it is 5KB; in plain terms:
1. The current TLAB has 96KB allocated and 4KB free, and new needs a 6KB object. The TLAB clearly has no room, so we can simply claim a new TLAB and hand the old one back to Eden's management, wasting only 4KB, which is within _refill_waste_limit.
2. The current TLAB has 90KB allocated and 10KB free, and new needs an 11KB object. Again the TLAB has no room, but now we must not simply abandon it; instead the 11KB request is served directly from the Eden space.
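The two cases above reduce to a single comparison against the waste limit. A minimal sketch, with illustrative names and KB units rather than heap words:

```cpp
#include <cstddef>

// Hypothetical illustration of the refill decision (sizes in KB).
// If the remaining free space is within the waste limit, the old TLAB
// is retired and a new one is requested; otherwise the single
// oversized request goes directly to Eden and the TLAB is kept.
enum class Path { NewTlab, AllocateInEden };

Path on_tlab_full(size_t free_kb, size_t refill_waste_limit_kb) {
  return (free_kb <= refill_waste_limit_kb) ? Path::NewTlab
                                            : Path::AllocateInEden;
}
```

Case 1 above is on_tlab_full(4, 5), which retires the TLAB; case 2 is on_tlab_full(10, 5), which allocates in Eden.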

How to size each thread's TLAB

First, the initial TLAB size should be a function of how many threads allocate objects within each GC cycle. That thread count is not necessarily stable: it may be high during one period and lower during the next. So an EMA (exponentially weighted moving average) of the number of allocating threads per GC cycle is collected to estimate the expected count.

Next, in the ideal case all memory used for object allocation within one GC cycle sits inside the threads' TLABs. By JVM design, the memory available for allocation within one GC cycle is essentially the Eden space. Ideally, GC should happen only when Eden fills up, with no other triggers; that is the most efficient situation. And if Eden is consumed entirely through TLAB allocation, i.e. Eden ends up fully covered by the threads' TLABs, allocation is as fast as it can be.

Then, the number of allocating threads and the amount they allocate differ from one GC cycle to the next. Handing a thread a huge block at once wastes memory; a block that is too small forces frequent TLAB claims from Eden, hurting efficiency. The size itself is hard to control directly, but we can cap how many times a thread may claim a TLAB from Eden within one GC cycle, which is much easier to control.

Finally, the memory each thread allocates is not necessarily stable across GC cycles, so the initial size alone cannot guide later TLAB sizes. Since a thread's allocation volume correlates with its history, we can extrapolate from past allocations: each thread also keeps an EMA of the memory it allocated per GC cycle, which guides the desired TLAB size for the next one.

Putting this together, we arrive at an approximate TLAB sizing formula:

initial TLAB size per thread = Eden size / (max TLAB claims per thread per GC cycle * EMA of the number of allocating threads in the current GC)

TLAB size recomputed after a GC = Eden size / (max TLAB claims per thread per GC cycle * EMA of the number of allocating threads in the current GC)
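The formula can be sketched as one helper. All names here are illustrative, not HotSpot identifiers; the real code additionally aligns the result and works in heap words:

```cpp
#include <algorithm>
#include <cstddef>

// Illustrative computation of the per-thread TLAB size:
//   eden_size / (refills_per_gc * allocating_threads_ema),
// clamped to [min_sz, max_sz] the way HotSpot clamps init_sz.
size_t tlab_size(size_t eden_size, unsigned refills_per_gc,
                 unsigned threads_ema, size_t min_sz, size_t max_sz) {
  size_t sz = eden_size /
              (static_cast<size_t>(refills_per_gc) * threads_ema);
  return std::min(std::max(sz, min_sz), max_sz);
}
```

For example, a 100MB Eden, 50 refills per cycle and 10 allocating threads yield roughly 100MB / 500 = 200KB per TLAB before clamping.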

Default TLAB configuration

#define define_pd_global(type, name, value) const type pd_##name = value;

define_pd_global(bool, UseTLAB,                      false);
define_pd_global(bool, ResizeTLAB,                   false);


// develop flags are settable / visible only during development and are constant in the PRODUCT version
// product flags are always settable / visible
// notproduct flags are settable / visible only during development and are not declared in the PRODUCT version

// A flag must be declared with one of the following types:
// bool, intx, uintx, ccstr.
// The type "ccstr" is an alias for "const char*" and is used
// only in this file, because the macrology requires single-token type names.

// Note: Diagnostic options not meant for VM tuning or for product modes.
// They are to be used for VM quality assurance or field diagnosis
// of VM bugs.  They are hidden so that users will not be encouraged to
// try them as if they were VM ordinary execution options.  However, they
// are available in the product version of the VM.  Under instruction
// from support engineers, VM customers can turn them on to collect
// diagnostic information about VM problems.  To use a VM diagnostic
// option, you must first specify +UnlockDiagnosticVMOptions.
// (This master switch also affects the behavior of -Xprintflags.)
//
// experimental flags are in support of features that are not
//    part of the officially supported product, but are available
//    for experimenting with. They could, for example, be performance
//    features that may not have undergone full or rigorous QA, but which may
//    help performance in some cases and released for experimentation
//    by the community of users and developers. This flag also allows one to
//    be able to build a fully supported product that nonetheless also
//    ships with some unsupported, lightly tested, experimental features.
//    Like the UnlockDiagnosticVMOptions flag above, there is a corresponding
//    UnlockExperimentalVMOptions flag, which allows the control and
//    modification of the experimental flags.
//
// Nota bene: neither diagnostic nor experimental options should be used casually,
//    and they are not supported on production loads, except under explicit
//    direction from support engineers.
//
// manageable flags are writeable external product flags.
//    They are dynamically writeable through the JDK management interface
//    (com.sun.management.HotSpotDiagnosticMXBean API) and also through JConsole.
//    These flags are external exported interface (see CCC).  The list of
//    manageable flags can be queried programmatically through the management
//    interface.
//
//    A flag can be made as "manageable" only if
//    - the flag is defined in a CCC as an external exported interface.
//    - the VM implementation supports dynamic setting of the flag.
//      This implies that the VM must *always* query the flag variable
//      and not reuse state related to the flag state at any given time.
//    - you want the flag to be queried programmatically by the customers.
//
// product_rw flags are writeable internal product flags.
//    They are like "manageable" flags but for internal/private use.
//    The list of product_rw flags are internal/private flags which
//    may be changed/removed in a future release.  It can be set
//    through the management interface to get/set value
//    when the name of flag is supplied.
//
//    A flag can be made as "product_rw" only if
//    - the VM implementation supports dynamic setting of the flag.
//      This implies that the VM must *always* query the flag variable
//      and not reuse state related to the flag state at any given time.
//
// Note that when there is a need to support develop flags to be writeable,
// it can be done in the same way as product_rw.

#define RUNTIME_FLAGS(develop, develop_pd, product, product_pd, diagnostic, experimental, notproduct, manageable, product_rw, lp64_product) \
                                                                            \
  product_pd(bool, UseTLAB, "Use thread-local object allocation")           \
                                                                            \
  product_pd(bool, ResizeTLAB,                                              \
          "Dynamically resize TLAB size for threads")                       \
                                                                            \
  product(bool, ZeroTLAB, false,                                            \
          "Zero out the newly created TLAB")                                \
                                                                            \
  product(bool, FastTLABRefill, true,                                       \
          "Use fast TLAB refill code")                                      \
                                                                            \
  product(bool, PrintTLAB, false,                                           \
          "Print various TLAB related information")                         \
                                                                            \
  product(bool, TLABStats, true,                                            \
          "Provide more detailed and expensive TLAB statistics "            \
          "(with PrintTLAB)")                                               \
                                                                            \
  product(uintx, TLABSize, 0,                                               \
          "Starting TLAB size (in bytes); zero means set ergonomically")    \
                                                                            \
  product(uintx, MinTLABSize, 2*K,                                          \
          "Minimum allowed TLAB size (in bytes)")                           \
                                                                            \
  product(uintx, TLABAllocationWeight, 35,                                  \
          "Allocation averaging weight")                                    \
                                                                            \
  product(uintx, TLABWasteTargetPercent, 1,                                 \
          "Percentage of Eden that can be wasted")                          \
                                                                            \
  product(uintx, TLABRefillWasteFraction,    64,                            \
          "Maximum TLAB waste at a refill (internal fragmentation)")        \
                                                                            \
  product(uintx, TLABWasteIncrement,    4,                                  \
          "Increment allowed waste at slow allocation")                     \

In the platform-independent defaults shown above, UseTLAB and ResizeTLAB are false (platform-specific headers typically override these pd defaults to true); ZeroTLAB and PrintTLAB default to false; TLABStats and FastTLABRefill default to true; TLABSize defaults to 0 and MinTLABSize to 2*K.

TLAB initialization


When a thread is initialized, if the JVM has TLABs enabled (they are enabled by default and can be turned off with -XX:-UseTLAB), the TLAB is initialized, and when an object allocation occurs, TLAB memory is claimed according to the desired size. Likewise, after a GC has scanned objects, the thread's first subsequent allocation attempt re-claims TLAB memory. For now we focus only on initialization; the flow is shown in Figure 08:

When Java code starts a thread with Thread.start(), the new JavaThread eventually executes the following code:

\hotspot-jdk8u\src\share\vm\runtime\thread.cpp

// The first routine called by a new Java thread
void JavaThread::run() {
  // initialize thread-local alloc buffer related fields
  this->initialize_tlab();

  // used to test validitity of stack trace backs
  this->record_base_of_stack_pointer();

  // Record real stack base and size.
  this->record_stack_base_and_size();

  // Initialize thread local storage; set before calling MutexLocker
  this->initialize_thread_local_storage();

  this->create_stack_guard_pages();

  this->cache_global_variables();

  // Thread is now sufficient initialized to be handled by the safepoint code as being
  // in the VM. Change thread state from _thread_new to _thread_in_vm
  ThreadStateTransition::transition_and_fence(this, _thread_new, _thread_in_vm);

  assert(JavaThread::current() == this, "sanity check");
  assert(!Thread::current()->owns_locks(), "sanity check");

  DTRACE_THREAD_PROBE(start, this);

  // This operation might block. We call that after all safepoint checks for a new thread has
  // been completed.
  this->set_active_handles(JNIHandleBlock::allocate_block());

  if (JvmtiExport::should_post_thread_life()) {
    JvmtiExport::post_thread_start(this);
  }

  JFR_ONLY(Jfr::on_thread_start(this);)

  // We call another function to do the rest so we are sure that the stack addresses used
  // from there will be lower than the stack base just computed
  thread_main_inner();

  // Note, thread is no longer valid at this point!
}

The first thing JavaThread::run() does is call this->initialize_tlab() to initialize the TLAB. initialize_tlab is implemented as follows:

  ThreadLocalAllocBuffer _tlab;                 // Thread-local eden
  
#define TLAB_FIELD_OFFSET(name) \
  static ByteSize tlab_##name##_offset()   { return byte_offset_of(Thread, _tlab) + ThreadLocalAllocBuffer::name##_offset(); }

  TLAB_FIELD_OFFSET(start)
  TLAB_FIELD_OFFSET(end)
  TLAB_FIELD_OFFSET(top)
  TLAB_FIELD_OFFSET(pf_top)
  TLAB_FIELD_OFFSET(size)                   // desired_size
  TLAB_FIELD_OFFSET(refill_waste_limit)
  TLAB_FIELD_OFFSET(number_of_refills)
  TLAB_FIELD_OFFSET(fast_refill_waste)
  TLAB_FIELD_OFFSET(slow_allocations)

#undef TLAB_FIELD_OFFSET

  // Thread-Local Allocation Buffer (TLAB) support
  ThreadLocalAllocBuffer& tlab()                 { return _tlab; }
  void initialize_tlab() {
    if (UseTLAB) {
      tlab().initialize();
    }
  }

The _tlab above is a member of the Thread class, embedded directly in the Thread object (Thread is the parent class of JavaThread). The TLAB_FIELD_OFFSET macro defines offset accessors; TLAB_FIELD_OFFSET(start) expands to the equivalent of:

static ByteSize tlab_start_offset()
{ 
    return byte_offset_of(Thread, _tlab) + ThreadLocalAllocBuffer::start_offset(); 
}

tlab() simply returns the ThreadLocalAllocBuffer object, whose initialize() sets up the TLAB:

void ThreadLocalAllocBuffer::initialize() {
  initialize(NULL,                    // start
             NULL,                    // top
             NULL);                   // end

  set_desired_size(initial_desired_size());

  // Following check is needed because at startup the main
  // thread is initialized before the heap is.  The initialization for
  // this thread is redone in startup_initialization below.
  if (Universe::heap() != NULL) {
    size_t capacity   = Universe::heap()->tlab_capacity(myThread()) / HeapWordSize;
    // Keep alloc_frac as float and not double to avoid the double to float conversion
    float alloc_frac = desired_size() * target_refills() / (float) capacity;
    _allocation_fraction.sample(alloc_frac);
  }

  set_refill_waste_limit(initial_refill_waste_limit());

  initialize_statistics();
}

void ThreadLocalAllocBuffer::initialize(HeapWord* start,
                                        HeapWord* top,
                                        HeapWord* end) {
  set_start(start);
  set_top(top);
  set_pf_top(top);
  set_end(end);
  invariants();
}

  void set_start(HeapWord* start)                { _start = start; }
  void set_end(HeapWord* end)                    { _end = end; }
  void set_top(HeapWord* top)                    { _top = top; }
  void set_pf_top(HeapWord* pf_top)              { _pf_top = pf_top; }

  void invariants() const { assert(top() >= start() && top() <= end(), "invalid tlab"); }

1. Set _start, _end, _top and _pf_top to NULL;
2. Set the TLAB's _desired_size, computed by initial_desired_size();
3. Set the TLAB's _refill_waste_limit, computed by initial_refill_waste_limit();
4. Initialize the statistics fields: _number_of_refills, _fast_refill_waste, _slow_refill_waste, _gc_waste and _slow_allocations.

How _desired_size is computed

size_t ThreadLocalAllocBuffer::initial_desired_size() {
  size_t init_sz = 0;

  if (TLABSize > 0) {
    init_sz = TLABSize / HeapWordSize;
  } else if (global_stats() != NULL) {
    // Initial size is a function of the average number of allocating threads.
    unsigned nof_threads = global_stats()->allocating_threads_avg();

    init_sz  = (Universe::heap()->tlab_capacity(myThread()) / HeapWordSize) /
                      (nof_threads * target_refills());
    init_sz = align_object_size(init_sz);
  }
  init_sz = MIN2(MAX2(init_sz, min_size()), max_size());
  return init_sz;
}

TLABSize is given a default of 256 * K in the arguments module (for some collector configurations), and it can also be set explicitly with a JVM flag; either way the value is compared against an upper bound max_size, and the smaller of the two wins. max_size is computed as follows:

static const size_t min_size()
{
    return align_object_size(MinTLABSize / HeapWordSize) + alignment_reserve(); 
}
static const size_t max_size()
{ 
    assert(_max_size != 0, "max_size not set up"); 
    return _max_size; 
}
size_t CollectedHeap::max_tlab_size() const {
  // TLABs can't be bigger than we can fill with a int[Integer.MAX_VALUE].
  // This restriction could be removed by enabling filling with multiple arrays.
  // If we compute that the reasonable way as
  //    header_size + ((sizeof(jint) * max_jint) / HeapWordSize)
  // we'll overflow on the multiply, so we do the divide first.
  // We actually lose a little by dividing first,
  // but that just makes the TLAB  somewhat smaller than the biggest array,
  // which is fine, since we'll be able to fill that.
  size_t max_int_size = typeArrayOopDesc::header_size(T_INT) +
              sizeof(jint) *
              ((juint) max_jint / (size_t) HeapWordSize);
  return align_size_down(max_int_size, MinObjAlignment);
}

jint Universe::initialize_heap() {

  if (UseParallelGC) {
#if INCLUDE_ALL_GCS
    Universe::_collectedHeap = new ParallelScavengeHeap();
#else  // INCLUDE_ALL_GCS
    fatal("UseParallelGC not supported in this VM.");
#endif // INCLUDE_ALL_GCS

  } else if (UseG1GC) {
#if INCLUDE_ALL_GCS
    G1CollectorPolicyExt* g1p = new G1CollectorPolicyExt();
    g1p->initialize_all();
    G1CollectedHeap* g1h = new G1CollectedHeap(g1p);
    Universe::_collectedHeap = g1h;
#else  // INCLUDE_ALL_GCS
    fatal("UseG1GC not supported in java kernel vm.");
#endif // INCLUDE_ALL_GCS

  } else {
    GenCollectorPolicy *gc_policy;

    if (UseSerialGC) {
      gc_policy = new MarkSweepPolicy();
    } else if (UseConcMarkSweepGC) {
#if INCLUDE_ALL_GCS
      if (UseAdaptiveSizePolicy) {
        gc_policy = new ASConcurrentMarkSweepPolicy();
      } else {
        gc_policy = new ConcurrentMarkSweepPolicy();
      }
#else  // INCLUDE_ALL_GCS
    fatal("UseConcMarkSweepGC not supported in this VM.");
#endif // INCLUDE_ALL_GCS
    } else { // default old generation
      gc_policy = new MarkSweepPolicy();
    }
    gc_policy->initialize_all();

    Universe::_collectedHeap = new GenCollectedHeap(gc_policy);
  }

  ThreadLocalAllocBuffer::set_max_size(Universe::heap()->max_tlab_size());

  jint status = Universe::heap()->initialize();
  if (status != JNI_OK) {
    return status;
  }

#ifdef _LP64
  if (UseCompressedOops) {
    // Subtract a page because something can get allocated at heap base.
    // This also makes implicit null checking work, because the
    // memory+1 page below heap_base needs to cause a signal.
    // See needs_explicit_null_check.
    // Only set the heap base for compressed oops because it indicates
    // compressed oops for pstack code.
    if (((uint64_t)Universe::heap()->reserved_region().end() > OopEncodingHeapMax)) {
      // Can't reserve heap below 32Gb.
      // keep the Universe::narrow_oop_base() set in Universe::reserve_heap()
      Universe::set_narrow_oop_shift(LogMinObjAlignmentInBytes);
#ifdef AIX
      // There is no protected page before the heap. This assures all oops
      // are decoded so that NULL is preserved, so this page will not be accessed.
      Universe::set_narrow_oop_use_implicit_null_checks(false);
#endif
    } else {
      Universe::set_narrow_oop_base(0);
#ifdef _WIN64
      if (!Universe::narrow_oop_use_implicit_null_checks()) {
        // Don't need guard page for implicit checks in indexed addressing
        // mode with zero based Compressed Oops.
        Universe::set_narrow_oop_use_implicit_null_checks(true);
      }
#endif //  _WIN64
      if((uint64_t)Universe::heap()->reserved_region().end() > UnscaledOopHeapMax) {
        // Can't reserve heap below 4Gb.
        Universe::set_narrow_oop_shift(LogMinObjAlignmentInBytes);
      } else {
        Universe::set_narrow_oop_shift(0);
      }
    }

    Universe::set_narrow_ptrs_base(Universe::narrow_oop_base());

    if (PrintCompressedOopsMode || (PrintMiscellaneous && Verbose)) {
      Universe::print_compressed_oops_mode(tty);
    }
  }
  // Universe::narrow_oop_base() is one page below the heap.
  assert((intptr_t)Universe::narrow_oop_base() <= (intptr_t)(Universe::heap()->base() -
         os::vm_page_size()) ||
         Universe::narrow_oop_base() == NULL, "invalid value");
  assert(Universe::narrow_oop_shift() == LogMinObjAlignmentInBytes ||
         Universe::narrow_oop_shift() == 0, "invalid value");
#endif

  // We will never reach the CATCH below since Exceptions::_throw will cause
  // the VM to exit if an exception is thrown during initialization

  if (UseTLAB) {
    assert(Universe::heap()->supports_tlab_allocation(),
           "Should support thread-local allocation buffers");
    ThreadLocalAllocBuffer::startup_initialization();
  }
  return JNI_OK;
}

In the code above, when Universe::initialize_heap() is called, it invokes ThreadLocalAllocBuffer::set_max_size(Universe::heap()->max_tlab_size()) to compute and record the TLAB max_size; and if UseTLAB is enabled, it calls ThreadLocalAllocBuffer::startup_initialization() to finish global TLAB initialization.

void ThreadLocalAllocBuffer::startup_initialization() {

  // Assuming each thread's active tlab is, on average,
  // 1/2 full at a GC
  _target_refills = 100 / (2 * TLABWasteTargetPercent);
  _target_refills = MAX2(_target_refills, (unsigned)1U);

  _global_stats = new GlobalTLABStats();

  // During jvm startup, the main thread is initialized
  // before the heap is initialized.  So reinitialize it now.
  guarantee(Thread::current()->is_Java_thread(), "tlab initialization thread not Java thread");
  Thread::current()->tlab().initialize();

  if (PrintTLAB && Verbose) {
    gclog_or_tty->print("TLAB min: " SIZE_FORMAT " initial: " SIZE_FORMAT " max: " SIZE_FORMAT "\n",
                        min_size(), Thread::current()->tlab().initial_desired_size(), max_size());
  }
}

How _refill_waste_limit is computed

size_t initial_refill_waste_limit()
{
    return desired_size() / TLABRefillWasteFraction;
}

The logic is trivial; TLABRefillWasteFraction defaults to 64.
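A worked example: with a desired size of 1MB and the default fraction of 64, the limit comes out to 16KB (a sketch in bytes; the real code works in heap words):

```cpp
#include <cstddef>

// initial_refill_waste_limit() = desired_size / TLABRefillWasteFraction.
// With the default fraction of 64, a 1MB TLAB tolerates up to 16KB of
// waste at refill time.
size_t initial_refill_waste_limit(size_t desired_size,
                                  size_t fraction = 64) {
  return desired_size / fraction;
}
```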

Memory allocation

Suppose new creates an object that needs 1KB; let's walk through how it is allocated, step by step.

instanceOop InstanceKlass::allocate_instance(TRAPS) {
  bool has_finalizer_flag = has_finalizer(); // Query before possible GC
  int size = size_helper();  // Query before forming handle.

  KlassHandle h_k(THREAD, this);

  instanceOop i;

  i = (instanceOop)CollectedHeap::obj_allocate(h_k, size, CHECK_NULL);
  if (has_finalizer_flag && !RegisterFinalizersAtInit) {
    i = register_finalizer(i, CHECK_NULL);
  }
  return i;
}

The entry point for object allocation is InstanceKlass::allocate_instance(), which allocates heap memory through CollectedHeap::obj_allocate():

oop CollectedHeap::obj_allocate(KlassHandle klass, int size, TRAPS) {
  debug_only(check_for_valid_allocation_state());
  assert(!Universe::heap()->is_gc_active(), "Allocation during gc not allowed");
  assert(size >= 0, "int won't convert to size_t");
  HeapWord* obj = common_mem_allocate_init(klass, size, CHECK_NULL);
  post_allocation_setup_obj(klass, obj, size);
  NOT_PRODUCT(Universe::heap()->check_for_bad_heap_word_value(obj, size));
  return (oop)obj;
}

HeapWord* CollectedHeap::common_mem_allocate_init(KlassHandle klass, size_t size, TRAPS) {
  HeapWord* obj = common_mem_allocate_noinit(klass, size, CHECK_NULL);
  init_obj(obj, size);
  return obj;
}

HeapWord* CollectedHeap::common_mem_allocate_noinit(KlassHandle klass, size_t size, TRAPS) {

  // Clear unhandled oops for memory allocation.  Memory allocation might
  // not take out a lock if from tlab, so clear here.
  CHECK_UNHANDLED_OOPS_ONLY(THREAD->clear_unhandled_oops();)

  if (HAS_PENDING_EXCEPTION) {
    NOT_PRODUCT(guarantee(false, "Should not allocate with exception pending"));
    return NULL;  // caller does a CHECK_0 too
  }

  HeapWord* result = NULL;
  if (UseTLAB) {
    result = allocate_from_tlab(klass, THREAD, size);
    if (result != NULL) {
      assert(!HAS_PENDING_EXCEPTION,
             "Unexpected exception, will result in uninitialized storage");
      return result;
    }
  }
  bool gc_overhead_limit_was_exceeded = false;
  result = Universe::heap()->mem_allocate(size,
                                          &gc_overhead_limit_was_exceeded);
  if (result != NULL) {
    NOT_PRODUCT(Universe::heap()->
      check_for_non_bad_heap_word_value(result, size));
    assert(!HAS_PENDING_EXCEPTION,
           "Unexpected exception, will result in uninitialized storage");
    THREAD->incr_allocated_bytes(size * HeapWordSize);

    return result;
  }


  if (!gc_overhead_limit_was_exceeded) {
    // -XX:+HeapDumpOnOutOfMemoryError and -XX:OnOutOfMemoryError support
    report_java_out_of_memory("Java heap space");

    if (JvmtiExport::should_post_resource_exhausted()) {
      JvmtiExport::post_resource_exhausted(
        JVMTI_RESOURCE_EXHAUSTED_OOM_ERROR | JVMTI_RESOURCE_EXHAUSTED_JAVA_HEAP,
        "Java heap space");
    }

    THROW_OOP_0(Universe::out_of_memory_error_java_heap());
  } else {
    // -XX:+HeapDumpOnOutOfMemoryError and -XX:OnOutOfMemoryError support
    report_java_out_of_memory("GC overhead limit exceeded");

    if (JvmtiExport::should_post_resource_exhausted()) {
      JvmtiExport::post_resource_exhausted(
        JVMTI_RESOURCE_EXHAUSTED_OOM_ERROR | JVMTI_RESOURCE_EXHAUSTED_JAVA_HEAP,
        "GC overhead limit exceeded");
    }

    THROW_OOP_0(Universe::out_of_memory_error_gc_overhead_limit());
  }
}

If TLABs are enabled, allocate_from_tlab is tried first; if it fails (or TLABs are disabled), Universe::heap()->mem_allocate() allocates from the shared heap. allocate_from_tlab is implemented as follows:

HeapWord* CollectedHeap::allocate_from_tlab(KlassHandle klass, Thread* thread, size_t size) {
  assert(UseTLAB, "should use UseTLAB");

  HeapWord* obj = thread->tlab().allocate(size);
  if (obj != NULL) {
    return obj;
  }
  // Otherwise...
  return allocate_from_tlab_slow(klass, thread, size);
}

Allocating straight out of the TLAB's already-claimed buffer is bump-the-pointer allocation, and it is very simple: the TLAB keeps a top pointer marking the current allocation position, and if the remaining space (end - top) is at least the requested object size (objSize), top is simply advanced to top + objSize. This lives in thread->tlab().allocate(size). Handling an allocation that does not fit is somewhat more involved; that code lives in allocate_from_tlab_slow().

inline HeapWord* ThreadLocalAllocBuffer::allocate(size_t size) {
  invariants();
  HeapWord* obj = top();
  if (pointer_delta(end(), obj) >= size) {
    // successful thread-local allocation
#ifdef ASSERT
    // Skip mangling the space corresponding to the object header to
    // ensure that the returned space is not considered parsable by
    // any concurrent GC thread.
    size_t hdr_size = oopDesc::header_size();
    Copy::fill_to_words(obj + hdr_size, size - hdr_size, badHeapWordVal);
#endif // ASSERT
    // This addition is safe because we know that top is
    // at least size below end, so the add can't wrap.
    set_top(obj + size);

    invariants();
    return obj;
  }
  return NULL;
}

If the TLAB is too small it cannot hold many objects, so new TLABs have to be claimed again and again; if it is too large, memory fragmentation becomes a problem. Suppose the TLAB is 1MB and Eden is 200MB, with 40 threads each holding one TLAB, and GC striking while the TLABs are in use. If object allocation within a TLAB follows a uniform distribution, then at GC time the memory actually used inside the TLABs totals 40 × 1 × 0.5 = 20MB (roughly 10% of Eden). That would mean GC fires while Eden still has plenty of room, which is not what we want. The intuitive fix is to enlarge the TLAB or add threads, making allocation more efficient, but possibly making GC take longer.

The JVM therefore provides the TLABSize parameter; if we set it, the JVM uses that value to initialize the TLAB size. But hand-tuning is inelegant. TLABSize actually defaults to 0, meaning the JVM infers a suitable value itself. The inference uses TLABWasteTargetPercent, the percentage of Eden a TLAB may occupy (default 1%), giving TLABSize = Eden × 2 × 1% / thread count (the factor of 2 comes from assuming memory use follows a uniform distribution). In G1 it is computed by the following code:

size_t ThreadLocalAllocBuffer::initial_desired_size() {
  size_t init_sz = 0;

  if (TLABSize > 0) {
    init_sz = TLABSize / HeapWordSize;
  } else if (global_stats() != NULL) {
    // Initial size is a function of the average number of allocating threads.
    unsigned nof_threads = global_stats()->allocating_threads_avg();

    init_sz  = (Universe::heap()->tlab_capacity(myThread()) / HeapWordSize) /
                      (nof_threads * target_refills());
    init_sz = align_object_size(init_sz);
  }
  init_sz = MIN2(MAX2(init_sz, min_size()), max_size());
  return init_sz;
}

其中,tlab_capacity在G1CollectedHeap中实现,代码如下所示:

size_t G1CollectedHeap::tlab_capacity(Thread* ignored) const {
  return (_g1_policy->young_list_target_length() - young_list()->survivor_length()) * HeapRegion::GrainBytes;
}

Simply put, tlab_capacity is the whole usable Eden area. Note also that this heuristic is only an approximation: in reality threads do not allocate independently (the distribution is not perfectly uniform), and different kinds of threads use memory differently; scheduling or monitoring threads, for instance, allocate almost no new objects.

The target_refills() function returns the static _target_refills of ThreadLocalAllocBuffer, initialized in startup_initialization() to MAX2(100 / (2 * TLABWasteTargetPercent), (unsigned)1U). A TLAB defaults to 1% of the Eden space, and -XX:TLABWasteTargetPercent sets the percentage of Eden a TLAB may occupy. With the default TLABWasteTargetPercent of 1, _target_refills = 50.
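That computation can be sketched as follows (illustrative helper name; it mirrors the two lines of startup_initialization() shown earlier):

```cpp
#include <algorithm>

// _target_refills = MAX2(100 / (2 * TLABWasteTargetPercent), 1U).
// With the default percent of 1 this is 50; a large percent bottoms
// out at 1 refill per GC cycle.
unsigned target_refills(unsigned waste_target_percent) {
  return std::max(100u / (2u * waste_target_percent), 1u);
}
```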

When allocating a Java object we always want it to land in a TLAB. What happens once the TLAB is full? As noted above, a TLAB is just a piece of Eden, and in G1 a free piece of a HeapRegion. So when a TLAB fills up, no extra work is needed: that piece of space is simply left behind, a fresh piece of Eden/heap region is claimed as the new TLAB, and the object is then allocated inside it. Two small questions remain.

1. How do we decide the TLAB is full? With the earlier 1MB TLAB, does "full" mean 800KB used, 900KB, or 950KB? The answer is a trade-off between accommodating the largest possible allocation and limiting fragmentation. Internally the VM maintains a refill_waste threshold: when a request does not fit, if the TLAB's remaining free space exceeds refill_waste, the TLAB is kept and the object is allocated in the heap; if the remaining space is within refill_waste, the current TLAB is retired and a new TLAB serves the allocation. The threshold is tuned with TLABRefillWasteFraction, which expresses the fraction of the TLAB allowed to be wasted this way. Its default of 64 means roughly 1/64 of the TLAB acts as refill_waste. In our example refill_waste starts at 16KB: once used memory plus the pending request exceeds 1M - 16K = 1024 - 16 = 1008KB, a brand-new TLAB is allocated; otherwise the old TLAB is kept in use.

2. How is the TLAB adjusted? If the request exceeds the TLAB's remaining space, the object is allocated directly in Eden/a HeapRegion. But is 1/64 the right ratio? It might be too small: if, say, objects of around 20KB are common and 16KB remains, every allocation would fall into the slow Eden/heap-region path. So the JVM also provides TLABWasteIncrement (default 4 words) to dynamically grow refill_waste. By default both the TLAB size and refill_waste are adjusted continuously at runtime to keep the system near its optimum. The adjustment is bounded: MinTLABSize (default 2KB) sets the floor, and in G1, since humongous objects never live in young regions (anything at or above half a HeapRegion counts as humongous), a TLAB can never exceed half of HeapRegionSize.
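The dynamic growth of the waste limit can be sketched like this (an illustrative helper mirroring what record_slow_allocation() does to _refill_waste_limit; the name is made up):

```cpp
#include <cstddef>

// Sketch of how each slow (outside-TLAB) allocation raises the waste
// limit by TLABWasteIncrement (default 4 words). A TLAB that keeps
// rejecting requests thus eventually satisfies
// free() <= refill_waste_limit and gets retired at the next refill.
size_t grow_waste_limit(size_t current_limit_words,
                        size_t increment_words = 4) {
  return current_limit_words + increment_words;
}
```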

To disable automatic TLAB resizing, use -XX:-ResizeTLAB and pin the size with -XX:TLABSize; -XX:+PrintTLAB traces TLAB usage. Manually tuning TLAB parameters is generally discouraged; the VM defaults are recommended. Moving on to the TLAB slow path, its main steps are:
1. If the TLAB's remaining space is small, it will rarely satisfy an allocation, so it is best retired: the free space is filled with a dummy object, and a new TLAB is claimed to serve the allocation.
2. If the remaining space is not small, it can still satisfy many allocations, so retiring the TLAB would waste too much memory; instead the object is allocated in the heap, bypassing the TLAB, and we can return directly.

The TLAB slow-path allocation code is shown below:

HeapWord* CollectedHeap::allocate_from_tlab_slow(KlassHandle klass, Thread* thread, size_t size) {

  // Decide whether the TLAB's unallocated remainder may be thrown away. It is kept if
  // the free space exceeds the threshold refill_waste_limit, which is derived from
  // desired_size and the TLABRefillWasteFraction parameter
  if (thread->tlab().free() > thread->tlab().refill_waste_limit()) {
    // Cannot discard it; bump the refill_waste threshold by TLABWasteIncrement
    thread->tlab().record_slow_allocation(size);
    // Return NULL, meaning the object is allocated in Eden / a HeapRegion instead
    return NULL;
  }

  // The TLAB's remaining space is small, so allocate a new TLAB. The old one needs no
  // handling: it belongs to Eden, and GC reclaims that space normally
  size_t new_tlab_size = thread->tlab().compute_size(size);
  // Retire the old TLAB before allocating, so the heap stays parsable
  thread->tlab().clear_before_allocation();

  if (new_tlab_size == 0) {
    return NULL;
  }

  // Allocate a new TLAB...
  HeapWord* obj = Universe::heap()->allocate_new_tlab(new_tlab_size);
  if (obj == NULL) {
    return NULL;
  }
  // Should the new memory be zeroed?
  if (ZeroTLAB) {
    // ..and clear it.
    Copy::zero_to_words(obj, new_tlab_size);
  } else {
    // ...and zap just allocated object.
#ifdef ASSERT
    // Skip mangling the space corresponding to the object header to
    // ensure that the returned space is not considered parsable by
    // any concurrent GC thread.
    size_t hdr_size = oopDesc::header_size();
    Copy::fill_to_words(obj + hdr_size, new_tlab_size - hdr_size, badHeapWordVal);
#endif // ASSERT
  }
  // Allocate the object and set the TLAB's start, top, end, etc.
  thread->tlab().fill(obj, obj + size, new_tlab_size);
  return obj;
}

Why does the old TLAB need a cleanup pass at all? A TLAB holds objects that are already allocated, so what is there to clean, and why? The "cleanup" here means allocating one object (usually an int[]) over the still-unallocated space. Why allocate a garbage object? The code says it is for heap parsability (keeping the heap parsable). What does that mean, and why is it needed? Let's keep digging.

void ThreadLocalAllocBuffer::clear_before_allocation() {
  _slow_refill_waste += (unsigned)remaining();
  make_parsable(true);   // also retire the TLAB
}

 size_t remaining() const
{ 
    return end() == NULL ? 0 : pointer_delta(hard_end(), top()); 
}

// Fills the current tlab with a dummy filler array to create
// an illusion of a contiguous Eden and optionally retires the tlab.
// Waste accounting should be done in caller as appropriate; see,
// for example, clear_before_allocation().
void ThreadLocalAllocBuffer::make_parsable(bool retire) {
  if (end() != NULL) {
    invariants();

    if (retire) {
      myThread()->incr_allocated_bytes(used_bytes());
    }

    CollectedHeap::fill_with_object(top(), hard_end(), retire);

    if (retire || ZeroTLAB) {  // "Reset" the TLAB
      set_start(NULL);
      set_top(NULL);
      set_pf_top(NULL);
      set_end(NULL);
    }
  }
  assert(!(retire || ZeroTLAB)  ||
         (start() == NULL && end() == NULL && top() == NULL),
         "TLAB must be reset");
}

When the memory manager (GC) performs operations that linearly scan objects in the heap, such as inspecting HeapRegion objects or concurrent marking, it needs to know which parts of the heap hold objects and which are empty. Over an object, the scan can jump ahead by the object's size; over an empty gap it would have to crawl word by word, which is very slow. By filling each gap with a dummy (filler) object, the GC can treat every address uniformly and traverse the heap quickly, as in the following sketch:

HeapWord* cur = heap_start;
while (cur < heap_used) {
  object o = (object)cur;
  do_object(o);
  cur = cur + o->size();
}
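To see why the fillers pay off, the linear scan above can be simulated in Java: each slot records the size of the object starting there, and a filler is just another object, so the scanner jumps object by object instead of probing word by word. This is purely illustrative code, not HotSpot's:

```java
// Simulated heap: sizeAt[i] holds the size (in slots) of the object
// starting at slot i. A dummy filler is indistinguishable from a live object.
public class HeapScan {
    public static int countObjects(int[] sizeAt, int used) {
        int cur = 0, count = 0;
        while (cur < used) {
            count++;             // "do_object(o)"
            cur += sizeAt[cur];  // skip the whole object in one step
        }
        return count;
    }

    public static void main(String[] args) {
        int[] heap = new int[16];
        heap[0] = 4;   // live object occupying 4 slots
        heap[4] = 2;   // live object occupying 2 slots
        heap[6] = 10;  // dummy int[] filling the unused TLAB tail
        System.out.println(countObjects(heap, 16)); // 3
    }
}
```

Without the filler at slot 6, the scanner would have no size to jump by and would have to examine the remaining 10 slots one at a time.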

We will verify this again when we look at young-generation garbage collection. Next, let us see how a new TLAB buffer is requested:

HeapWord* G1CollectedHeap::allocate_new_tlab(size_t word_size) {
  assert_heap_not_locked_and_not_at_safepoint();
  assert(!isHumongous(word_size), "we do not allow humongous TLABs");

  uint dummy_gc_count_before;
  uint dummy_gclocker_retry_count = 0;
  return attempt_allocation(word_size, &dummy_gc_count_before, &dummy_gclocker_retry_count);
}

The request ultimately lands in G1CollectedHeap, where the allocation is mostly done by attempt_allocation, in two stages: fast lock-free allocation and slow allocation.

inline HeapWord* G1CollectedHeap::attempt_allocation(size_t word_size,
                                                     uint* gc_count_before_ret,
                                                     uint* gclocker_retry_count_ret) {
  assert_heap_not_locked_and_not_at_safepoint();
  assert(!isHumongous(word_size), "attempt_allocation() should not "
         "be called for humongous allocation requests");

  AllocationContext_t context = AllocationContext::current();
  HeapWord* result = _allocator->mutator_alloc_region(context)->attempt_allocation(word_size,
                                                                                   false /* bot_updates */);
  if (result == NULL) {
    result = attempt_allocation_slow(word_size,
                                     context,
                                     gc_count_before_ret,
                                     gclocker_retry_count_ret);
  }
  assert_heap_not_locked();
  if (result != NULL) {
    dirty_young_block(result, word_size);
  }
  return result;
}

Fast lock-free allocation grabs a block of memory from the currently active heap region with CAS; on success the block becomes the TLAB's space. CAS lets threads allocate in parallel, but it can of course fail, in which case the slow path is taken. The fast path looks like this:

  // First-level allocation: Should be called without holding a
  // lock. It will try to allocate lock-free out of the active region,
  // or return NULL if it was unable to.
inline HeapWord* G1AllocRegion::attempt_allocation(size_t word_size,
                                                   bool bot_updates) {
  assert(bot_updates == _bot_updates, ar_ext_msg(this, "pre-condition"));

  HeapRegion* alloc_region = _alloc_region;
  assert(alloc_region != NULL, ar_ext_msg(this, "not initialized properly"));

  HeapWord* result = par_allocate(alloc_region, word_size, bot_updates);
  if (result != NULL) {
    trace("alloc", word_size, result);
    return result;
  }
  trace("alloc failed", word_size);
  return NULL;
}

inline HeapWord* G1AllocRegion::par_allocate(HeapRegion* alloc_region,
                                             size_t word_size,
                                             bool bot_updates) {
  assert(alloc_region != NULL, err_msg("pre-condition"));
  assert(!alloc_region->is_empty(), err_msg("pre-condition"));

  if (!bot_updates) {
    return alloc_region->par_allocate_no_bot_updates(word_size);
  } else {
    return alloc_region->par_allocate(word_size);
  }
}

inline HeapWord* HeapRegion::par_allocate_no_bot_updates(size_t word_size) {
  assert(is_young(), "we can only skip BOT updates on young regions");
  return par_allocate_impl(word_size, end());
}

// This version is lock-free.
inline HeapWord* G1OffsetTableContigSpace::par_allocate_impl(size_t size,
                                                    HeapWord* const end_value) {
  do {
    HeapWord* obj = top();
    if (pointer_delta(end_value, obj) >= size) {
      HeapWord* new_top = obj + size;
      HeapWord* result = (HeapWord*)Atomic::cmpxchg_ptr(new_top, top_addr(), obj);
      // result can be one of two:
      //  the old top value: the exchange succeeded
      //  otherwise: the new value of the top is returned.
      if (result == obj) {
        assert(is_aligned(obj) && is_aligned(new_top), "checking alignment");
        return obj;
      }
    } else {
      return NULL;
    }
  } while (true);
}
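The same lock-free bump-the-pointer scheme can be mimicked in plain Java, with an AtomicLong standing in for top and -1 playing the role of NULL. This is an illustrative sketch, not HotSpot code:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of lock-free bump-pointer allocation, mirroring par_allocate_impl:
// read top, compute the new top, CAS; retry on contention, -1 when full.
public class BumpPointerAllocator {
    private final AtomicLong top;
    private final long end;

    public BumpPointerAllocator(long start, long end) {
        this.top = new AtomicLong(start);
        this.end = end;
    }

    /** Returns the start of the allocated block, or -1 if the region is full. */
    public long allocate(long size) {
        while (true) {
            long obj = top.get();
            if (end - obj >= size) {
                long newTop = obj + size;
                // compareAndSet plays the role of Atomic::cmpxchg_ptr
                if (top.compareAndSet(obj, newTop)) {
                    return obj;
                }
                // CAS lost to another thread: loop and retry
            } else {
                return -1; // not enough room; the caller falls back to the slow path
            }
        }
    }

    public static void main(String[] args) {
        BumpPointerAllocator region = new BumpPointerAllocator(0, 1024);
        System.out.println(region.allocate(512)); // 0
        System.out.println(region.allocate(512)); // 512
        System.out.println(region.allocate(1));   // -1: region exhausted
    }
}
```

As in the C++ version, a failed CAS simply means another thread advanced top first; the loop re-reads top and tries again until it wins or the region runs out.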

If the fast path fails, the slow path is taken. It may lock the heap, expand the young generation, or run a garbage collection before allocating again:

  • First, try to allocate while holding the heap lock; on success return (done in attempt_allocation_locked).
  • Otherwise, check whether the young generation can be expanded; if so, expand it, allocate the TLAB, and return on success (done in attempt_allocation_force).
  • Otherwise, check whether a garbage collection can run; if so, collect and then allocate, returning on success (done in do_collection_pause).
  • Otherwise, if the number of attempts has reached the threshold (default 2), fail.
  • If retries remain, try the fast lock-free path once more and return on success.
  • Otherwise retry from the top, until the allocation succeeds or the threshold is hit.

So the slow path either eventually succeeds or gives up after the retry threshold and returns NULL. The code is as follows:

HeapWord* G1CollectedHeap::attempt_allocation_slow(size_t word_size,
                                                   AllocationContext_t context,
                                                   uint* gc_count_before_ret,
                                                   uint* gclocker_retry_count_ret) {
  // Make sure you read the note in attempt_allocation_humongous().

  assert_heap_not_locked_and_not_at_safepoint();
  assert(!isHumongous(word_size), "attempt_allocation_slow() should not "
         "be called for humongous allocation requests");

  // We should only get here after the first-level allocation attempt
  // (attempt_allocation()) failed to allocate.

  // We will loop until a) we manage to successfully perform the
  // allocation or b) we successfully schedule a collection which
  // fails to perform the allocation. b) is the only case when we'll
  // return NULL.
  HeapWord* result = NULL;
  for (int try_count = 1; /* we'll return */; try_count += 1) {
    bool should_try_gc;
    uint gc_count_before;

    {
      // Allocate while holding the Heap_lock
      MutexLockerEx x(Heap_lock);
      result = _allocator->mutator_alloc_region(context)->attempt_allocation_locked(word_size,
                                                                                    false /* bot_updates */);
      if (result != NULL) {
        return result;
      }

      // If we reach here, attempt_allocation_locked() above failed to
      // allocate a new region. So the mutator alloc region should be NULL.
      assert(_allocator->mutator_alloc_region(context)->get() == NULL, "only way to get here");

      if (GC_locker::is_active_and_needs_gc()) {
        if (g1_policy()->can_expand_young_list()) {
          // No need for an ergo verbose message here,
          // can_expand_young_list() does this when it returns true.
          result = _allocator->mutator_alloc_region(context)->attempt_allocation_force(word_size,
                                                                                       false /* bot_updates */);
          if (result != NULL) {
            return result;
          }
        }
        should_try_gc = false;
      } else {
        // The GCLocker may not be active but the GCLocker initiated
        // GC may not yet have been performed (GCLocker::needs_gc()
        // returns true). In this case we do not try this GC and
        // wait until the GCLocker initiated GC is performed, and
        // then retry the allocation.
        if (GC_locker::needs_gc()) {
          should_try_gc = false;
        } else {
          // Read the GC count while still holding the Heap_lock.
          gc_count_before = total_collections();
          should_try_gc = true;
        }
      }
    }

    if (should_try_gc) {
      // The GCLocker is not in a critical section, so a garbage collection can run
      bool succeeded;
      result = do_collection_pause(word_size, gc_count_before, &succeeded,
                                   GCCause::_g1_inc_collection_pause);
      if (result != NULL) {
        assert(succeeded, "only way to get back a non-NULL result");
        return result;
      }

      if (succeeded) {
        // A collection was scheduled; return for now
        // If we get here we successfully scheduled a collection which
        // failed to allocate. No point in trying to allocate
        // further. We'll just return NULL.
        MutexLockerEx x(Heap_lock);
        *gc_count_before_ret = total_collections();
        return NULL;
      }
    } else {
      // A JNI critical section is active; check the retry-count threshold
      if (*gclocker_retry_count_ret > GCLockerRetryAllocationCount) {
        MutexLockerEx x(Heap_lock);
        *gc_count_before_ret = total_collections();
        return NULL;
      }
      // The GCLocker is either active or the GCLocker initiated
      // GC has not yet been performed. Stall until it is and
      // then retry the allocation.
      GC_locker::stall_until_clear();
      (*gclocker_retry_count_ret) += 1;
    }

    // We can reach here if we were unsuccessful in scheduling a
    // collection (because another thread beat us to it) or if we were
    // stalled due to the GC locker. In either can we should retry the
    // allocation attempt in case another thread successfully
    // performed a collection and reclaimed enough space. We do the
    // first attempt (without holding the Heap_lock) here and the
    // follow-on attempt will be at the start of the next loop
    // iteration (after taking the Heap_lock).
    // Another thread may be allocating, or the GCLocker may be contended;
    // retry the lock-free fast path once more before taking the lock again
    result = _allocator->mutator_alloc_region(context)->attempt_allocation(word_size,
                                                                           false /* bot_updates */);
    if (result != NULL) {
      return result;
    }

    // Give a warning if we seem to be looping forever.
    if ((QueuedAllocationWarningCount > 0) &&
        (try_count % QueuedAllocationWarningCount == 0)) {
      warning("G1CollectedHeap::attempt_allocation_slow() "
              "retries %d times", try_count);
    }
  }

  ShouldNotReachHere();
  return NULL;
}

The GCLocker here is related to JNI. In short, Java code can interact with native code, and while a thread is inside a JNI critical section, garbage collection is blocked. This topic is fairly self-contained; see other articles for the details of GCLocker.

Log Output and Interpretation

Let's start from a small Java example:

import java.util.LinkedList;

public class Test {
  private static final LinkedList<String> strings = new LinkedList<>();

  public static void main(String[] args) throws Exception {
    while (true) {
      for (int i = 0; i < 100; i++) {
        for (int j = 0; j < 10; j++) {
          strings.add(new String("String " + j));
        }
      }
      Thread.sleep(100);
    }
  }
}

Run it with the following JVM options:

-Xmx128M -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps 
  -XX:+PrintTLAB -XX:+UnlockExperimentalVMOptions -XX:G1LogLevel=finest

This produces output like the following:

garbage-first heap   total 131072K, used 37569K [0x00000000f8000000, 0x00000000f8100400, 0x0000000100000000)
  region size 1024K, 24 young (24576K), 0 survivors (0K)
TLAB: gc thread: 0x0000000059ade800 [id: 16540] desired_size: 491KB slow allocs: 8  refill waste: 7864B
  alloc: 0.99999    24576KB refills: 50 waste  0.0% gc: 0B slow: 816B fast: 0B

With multiple threads there is one such line per thread plus a summary, omitted here for brevity. Let us go through each field of the TLAB line:

  • desired_size is the desired TLAB size, computed via the TLABSize calculation described earlier. On the first allocation the number of threads is unknown, so it is initialized as 1; here desired_size = 24576/50 = 491.5KB, rounded in the log.
  • slow allocs is the number of slow-path allocations: 8 allocations went straight to the heap without using the TLAB.
  • refill waste is the current threshold for retiring a TLAB.
  • alloc is the fraction of heap-region allocation performed by this thread.
  • refills is the number of refills, 50 here: between the previous GC and this one, 50 TLABs were retired, and each retire fills the unused remainder with a dummy object.
  • waste has three components:
  • gc: TLAB space still unused when GC occurs.
  • slow: space wasted in old TLABs when new ones are created; here the 50 new TLABs wasted 816 bytes.
  • fast: space wasted on TLAB retire (creating a new TLAB) in C1-compiled code.
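The desired_size figure can be cross-checked against the numbers in the log (assuming, as the log shows, a single thread, a 24576KB eden, and 50 target refills):

```java
// Cross-check of the desired_size field reported by -XX:+PrintTLAB.
public class TlabSizeCheck {
    public static void main(String[] args) {
        double edenKB = 24576;   // eden size from the log
        int targetRefills = 50;  // refills reported between GCs
        double desiredKB = edenKB / targetRefills;
        System.out.println(desiredKB); // 491.52, shown as ~491KB in the log
    }
}
```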
