1. Introduction
When a developer instantiates an object in Java, that object is allocated in the Java heap (setting aside JIT optimizations such as scalar replacement).
To keep the JVM efficient, allocation should minimize the critical section and avoid global locks. In typical G1 workloads, many mutator threads allocate concurrently; to reduce lock contention, G1 uses the TLAB (Thread Local Allocation Buffer) mechanism.
G1 supports fast TLAB-based allocation; when fast TLAB allocation fails, it falls back to the slow path outside the TLAB.
2. Allocation Flow
The entry point for heap-space allocation is in instanceKlass.cpp:
instanceOop InstanceKlass::allocate_instance(TRAPS) {
  bool has_finalizer_flag = has_finalizer(); // Query before possible GC
  int size = size_helper();  // Query before forming handle.

  instanceOop i;

  i = (instanceOop)Universe::heap()->obj_allocate(this, size, CHECK_NULL);
  if (has_finalizer_flag && !RegisterFinalizersAtInit) {
    i = register_finalizer(i, CHECK_NULL);
  }
  return i;
}
- Check whether the class overrides finalize; if so, register a finalizer
- Call CollectedHeap::obj_allocate to allocate
collectedHeap.cpp
oop CollectedHeap::obj_allocate(Klass* klass, int size, TRAPS) {
  ObjAllocator allocator(klass, size, THREAD);
  return allocator.allocate();
}
- Call ObjAllocator::allocate to perform the allocation
memAllocator.cpp
HeapWord* MemAllocator::mem_allocate(Allocation& allocation) const {
  if (UseTLAB) {
    HeapWord* result = allocate_inside_tlab(allocation);
    if (result != NULL) {
      return result;
    }
  }

  return allocate_outside_tlab(allocation);
}
- If the UseTLAB option is enabled, try TLAB allocation first; the JVM flags -XX:+UseTLAB and -XX:-UseTLAB enable or disable TLABs (enabled by default)
- If TLAB allocation fails, allocate outside the TLAB
The overall flow of object heap-space allocation is shown in the figure below:
3. TLAB Allocation
Eden is a region that all threads can access. To speed up allocation, the TLAB mechanism gives each thread a thread-private buffer, reducing locking.
TLABs live inside Eden; every TLAB is visible to other threads, but only the owning thread may allocate inside its own TLAB.
Note that carving a new TLAB out of Eden still requires synchronization for thread safety; only allocation inside an existing TLAB is lock-free.
memAllocator.cpp
HeapWord* MemAllocator::allocate_inside_tlab(Allocation& allocation) const {
  assert(UseTLAB, "should use UseTLAB");

  // Try allocating from an existing TLAB.
  HeapWord* mem = _thread->tlab().allocate(_word_size);
  if (mem != NULL) {
    return mem;
  }

  // Try refilling the TLAB and allocating the object in it.
  return allocate_inside_tlab_slow(allocation);
}
- Allocating from an existing TLAB is straightforward: call the current thread's tlab().allocate
- If allocation in the existing TLAB fails, call allocate_inside_tlab_slow to run the TLAB slow path
threadLocalAllocBuffer.inline.hpp
inline HeapWord* ThreadLocalAllocBuffer::allocate(size_t size) {
  invariants();
  HeapWord* obj = top();
  if (pointer_delta(end(), obj) >= size) {
    // successful thread-local allocation
#ifdef ASSERT
    // Skip mangling the space corresponding to the object header to
    // ensure that the returned space is not considered parsable by
    // any concurrent GC thread.
    size_t hdr_size = oopDesc::header_size();
    Copy::fill_to_words(obj + hdr_size, size - hdr_size, badHeapWordVal);
#endif // ASSERT
    // This addition is safe because we know that top is
    // at least size below end, so the add can't wrap.
    set_top(obj + size);

    invariants();
    return obj;
  }
  return NULL;
}
- On success, the TLAB's top is advanced to the old top plus size
The TLAB slow-path logic lives mainly in allocate_inside_tlab_slow:
HeapWord* MemAllocator::allocate_inside_tlab_slow(Allocation& allocation) const {
  HeapWord* mem = NULL;
  ThreadLocalAllocBuffer& tlab = _thread->tlab();

  if (JvmtiExport::should_post_sampled_object_alloc()) {
    // Try to allocate the sampled object from TLAB, it is possible a sample
    // point was put and the TLAB still has space.
    tlab.set_back_allocation_end();
    mem = tlab.allocate(_word_size);
    if (mem != NULL) {
      allocation._tlab_end_reset_for_sample = true;
      return mem;
    }
  }

  // Retain tlab and allocate object in shared space if
  // the amount free in the tlab is too large to discard.
  if (tlab.free() > tlab.refill_waste_limit()) {
    tlab.record_slow_allocation(_word_size);
    return NULL;
  }

  // Discard tlab and allocate a new one.
  // To minimize fragmentation, the last TLAB may be smaller than the rest.
  size_t new_tlab_size = tlab.compute_size(_word_size);

  tlab.retire_before_allocation();
  if (new_tlab_size == 0) {
    return NULL;
  }

  // Allocate a new TLAB requesting new_tlab_size. Any size
  // between minimal and new_tlab_size is accepted.
  size_t min_tlab_size = ThreadLocalAllocBuffer::compute_min_size(_word_size);
  mem = _heap->allocate_new_tlab(min_tlab_size, new_tlab_size, &allocation._allocated_tlab_size);
  if (mem == NULL) {
    assert(allocation._allocated_tlab_size == 0,
           "Allocation failed, but actual size was updated. min: " SIZE_FORMAT
           ", desired: " SIZE_FORMAT ", actual: " SIZE_FORMAT,
           min_tlab_size, new_tlab_size, allocation._allocated_tlab_size);
    return NULL;
  }
  assert(allocation._allocated_tlab_size != 0, "Allocation succeeded but actual size not updated. mem at: "
         PTR_FORMAT " min: " SIZE_FORMAT ", desired: " SIZE_FORMAT,
         p2i(mem), min_tlab_size, new_tlab_size);

  if (ZeroTLAB) {
    // ..and clear it.
    Copy::zero_to_words(mem, allocation._allocated_tlab_size);
  } else {
    // ...and zap just allocated object.
#ifdef ASSERT
    // Skip mangling the space corresponding to the object header to
    // ensure that the returned space is not considered parsable by
    // any concurrent GC thread.
    size_t hdr_size = oopDesc::header_size();
    Copy::fill_to_words(mem + hdr_size, allocation._allocated_tlab_size - hdr_size, badHeapWordVal);
#endif // ASSERT
  }

  tlab.fill(mem, mem + _word_size, allocation._allocated_tlab_size);
  return mem;
}
- Check whether the TLAB's free space exceeds a threshold. The threshold is controlled by the JVM flag -XX:TLABRefillWasteFraction, which defaults to 64, i.e. 1/64 of the TLAB size. If the free space is above the threshold, the TLAB is kept and allocation falls through to the slow path outside the TLAB; if it is below, the TLAB is discarded and a new one is requested
- Compute the new TLAB size dynamically, and fill the TLAB about to be discarded with a dummy object
- Request a new TLAB
- Allocate the object in the new TLAB and return it
The TLAB size is computed as follows:
inline size_t ThreadLocalAllocBuffer::compute_size(size_t obj_size) {
  // Compute the size for the new TLAB.
  // The "last" tlab may be smaller to reduce fragmentation.
  // unsafe_max_tlab_alloc is just a hint.
  const size_t available_size = Universe::heap()->unsafe_max_tlab_alloc(thread()) / HeapWordSize;
  size_t new_tlab_size = MIN3(available_size, desired_size() + align_object_size(obj_size), max_size());

  // Make sure there's enough room for object and filler int[].
  if (new_tlab_size < compute_min_size(obj_size)) {
    // If there isn't enough room for the allocation, return failure.
    log_trace(gc, tlab)("ThreadLocalAllocBuffer::compute_size(" SIZE_FORMAT ") returns failure",
                        obj_size);
    return 0;
  }
  log_trace(gc, tlab)("ThreadLocalAllocBuffer::compute_size(" SIZE_FORMAT ") returns " SIZE_FORMAT,
                      obj_size, new_tlab_size);
  return new_tlab_size;
}
A fixed TLAB size can be set with the JVM flag -XX:TLABSize. If it is not set, the size is computed dynamically, influenced by the flag TLABWasteTargetPercent (default 1%) and the current number of allocating threads; it comes out to roughly Eden size * 2 * 1%, divided among the threads.
The dummy-object filling logic is as follows:
void ThreadLocalAllocBuffer::retire(ThreadLocalAllocStats* stats) {
  if (stats != NULL) {
    accumulate_and_reset_statistics(stats);
  }

  if (end() != NULL) {
    invariants();
    thread()->incr_allocated_bytes(used_bytes());
    insert_filler();
    initialize(NULL, NULL, NULL);
  }
}

void ThreadLocalAllocBuffer::insert_filler() {
  assert(end() != NULL, "Must not be retired");
  Universe::heap()->fill_with_dummy_object(top(), hard_end(), true);
}
The dummy object is filled in so that a walker traversing the HeapRegion can step over the retired space object by object instead of scanning it word by word; the dummy object is an int[].
The logic for allocating a new TLAB is in g1CollectedHeap.hpp and g1CollectedHeap.cpp:
static size_t humongous_threshold_for(size_t region_size) {
  return (region_size / 2);
}

HeapWord* G1CollectedHeap::allocate_new_tlab(size_t min_size,
                                             size_t requested_size,
                                             size_t* actual_size) {
  assert_heap_not_locked_and_not_at_safepoint();
  assert(!is_humongous(requested_size), "we do not allow humongous TLABs");
  return attempt_allocation(min_size, requested_size, actual_size);
}
- First check whether the request is humongous; if not, allocate
- A request with size > region_size / 2 is considered humongous
4. Slow-Path Allocation
When TLAB allocation fails, allocation enters the slow path, which allocates outside the TLAB in a young region (YHR) or, for humongous objects, a humongous region (HHR); the humongous path must first acquire the heap lock.
memAllocator.cpp
HeapWord* MemAllocator::allocate_outside_tlab(Allocation& allocation) const {
  allocation._allocated_outside_tlab = true;
  HeapWord* mem = _heap->mem_allocate(_word_size, &allocation._overhead_limit_exceeded);
  if (mem == NULL) {
    return mem;
  }

  NOT_PRODUCT(_heap->check_for_non_bad_heap_word_value(mem, _word_size));
  size_t size_in_bytes = _word_size * HeapWordSize;
  _thread->incr_allocated_bytes(size_in_bytes);
  return mem;
}
- Call mem_allocate; on success, update the thread's allocated-bytes counter
g1CollectedHeap.cpp
HeapWord* G1CollectedHeap::mem_allocate(size_t word_size,
                                        bool* gc_overhead_limit_was_exceeded) {
  assert_heap_not_locked_and_not_at_safepoint();

  if (is_humongous(word_size)) {
    return attempt_allocation_humongous(word_size);
  }
  size_t dummy = 0;
  return attempt_allocation(word_size, word_size, &dummy);
}
- Humongous objects go to attempt_allocation_humongous; everything else goes to attempt_allocation
The humongous allocation logic is shown below; the code is long, so only the key parts are kept:
HeapWord* G1CollectedHeap::attempt_allocation_humongous(size_t word_size) {
  if (g1_policy()->need_to_start_conc_mark("concurrent humongous allocation",
                                           word_size)) {
    collect(GCCause::_g1_humongous_allocation);
  }

  HeapWord* result = NULL;
  for (uint try_count = 1, gclocker_retry_count = 0; /* we'll return */; try_count += 1) {
    bool should_try_gc;
    uint gc_count_before;

    {
      MutexLockerEx x(Heap_lock);

      result = humongous_obj_allocate(word_size);
      if (result != NULL) {
        size_t size_in_regions = humongous_obj_size_in_regions(word_size);
        g1_policy()->add_bytes_allocated_in_old_since_last_gc(size_in_regions * HeapRegion::GrainBytes);
        return result;
      }

      should_try_gc = !GCLocker::needs_gc();
      gc_count_before = total_collections();
    }

    if (should_try_gc) {
      bool succeeded;
      result = do_collection_pause(word_size, gc_count_before, &succeeded,
                                   GCCause::_g1_humongous_allocation);
      if (result != NULL) {
        assert(succeeded, "only way to get back a non-NULL result");
        log_trace(gc, alloc)("%s: Successfully scheduled collection returning " PTR_FORMAT,
                             Thread::current()->name(), p2i(result));
        return result;
      }

      if (succeeded) {
        log_trace(gc, alloc)("%s: Successfully scheduled collection failing to allocate "
                             SIZE_FORMAT " words", Thread::current()->name(), word_size);
        return NULL;
      }
      log_trace(gc, alloc)("%s: Unsuccessfully scheduled collection allocating " SIZE_FORMAT "",
                           Thread::current()->name(), word_size);
    } else {
      // Failed to schedule a collection.
      if (gclocker_retry_count > GCLockerRetryAllocationCount) {
        log_warning(gc, alloc)("%s: Retried waiting for GCLocker too often allocating "
                               SIZE_FORMAT " words", Thread::current()->name(), word_size);
        return NULL;
      }
      GCLocker::stall_until_clear();
      gclocker_retry_count += 1;
    }
  }
}
- Since a humongous allocation can grow heap occupancy rapidly, first check whether a concurrent-mark cycle needs to be started before allocating
- Take the Heap_lock
- Attempt the allocation; return on success
- If a GC is needed, trigger it; after the GC, go back to the locking step and retry the allocation until it succeeds, or until the GCLocker retry count exceeds GCLockerRetryAllocationCount
The core of attempt_allocation bottoms out in heapRegion.inline.hpp:
inline HeapWord* G1ContiguousSpace::par_allocate_impl(size_t min_word_size,
                                                      size_t desired_word_size,
                                                      size_t* actual_size) {
  do {
    HeapWord* obj = top();
    size_t available = pointer_delta(end(), obj);
    size_t want_to_allocate = MIN2(available, desired_word_size);
    if (want_to_allocate >= min_word_size) {
      HeapWord* new_top = obj + want_to_allocate;
      HeapWord* result = Atomic::cmpxchg(new_top, top_addr(), obj);
      // result can be one of two:
      //  the old top value: the exchange succeeded
      //  otherwise: the new value of the top is returned.
      if (result == obj) {
        assert(is_aligned(obj) && is_aligned(new_top), "checking alignment");
        *actual_size = want_to_allocate;
        return obj;
      }
    } else {
      return NULL;
    }
  } while (true);
}
- Allocation uses a CAS on top: if the CAS fails because another thread bumped top first, the loop retries; when the remaining space is smaller than the minimum request, it returns NULL so the caller can pick another region or trigger a GC.
5. Summary
In summary, G1 allocates an object's heap space first in the TLAB; when TLAB allocation fails, it allocates directly in a heap region (HR); if that still fails, a GC is triggered.
6. References
JDK 12 source code [https://hg.openjdk.java.net/jdk/jdk12]