结论:
- 新创建的ArrayList内部存储是一个空数组
- 首次添加元素扩容为默认容量
DEFAULT_CAPACITY=10
- 日常扩容是当前容量的1.5倍
- 扩容时使用
System.arraycopy
复制数组,native 方法,效率很不错PS: JDK版本1.8.0_66
故事的开始
近期面试被问到一个问题:ArrayList是如何扩容的?在这深入了解一下其内部实现。
首先要知道什么时候会触发 扩容 —– 不难想到,在添加元素时需要考虑容量是否足够,来看一下 ArrayList
的源码:
/**
* Appends the specified element to the end of this list.
*
* @param e element to be appended to this list
* @return <tt>true</tt> (as specified by {@link Collection#add})
*/
public boolean add(E e) {
ensureCapacityInternal(size + 1); // Increments modCount!!
elementData[size++] = e;
return true;
}
看上去 ensureCapacityInternal(size + 1)
应该是验证空间是否足够,并且扩容了。且继续看:
private void ensureCapacityInternal(int minCapacity) {
if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
}
ensureExplicitCapacity(minCapacity);
}
minCapacity
是指本次扩容需求的最小空间。如果是新创建的ArrayList,会取 DEFAULT_CAPACITY(10)
和 minCapacity(1)
的最大值,即新创建的ArrayList首次增加元素时会直接需求 10 个存储空间。
我有一句话不知当讲不讲
elementData
是什么?elementData
是ArryList实际存储元素的地方,是用一个数组实现d的。新创建一个ArrayList时,elementData
是一个 空数组 ,当添加第一个元素时, elementData会被按照 DEFAULT_CAPACITY 扩容./** * The array buffer into which the elements of the ArrayList are stored. * The capacity of the ArrayList is the length of this array buffer. Any * empty ArrayList with elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA * will be expanded to DEFAULT_CAPACITY when the first element is added. */ transient Object[] elementData; // non-private to simplify nested class access /** * Shared empty array instance used for default sized empty instances. We * distinguish this from EMPTY_ELEMENTDATA to know how much to inflate when * first element is added. */ private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {}; /** * Default initial capacity. */ private static final int DEFAULT_CAPACITY = 10;
size
是实际存入的对象数量, 与容量capacity
不同。
渐入佳境
确定扩充容量后,进行下一步 ensureExplicitCapacity(minCapacity)
:
private void ensureExplicitCapacity(int minCapacity) {
modCount++;
// overflow-conscious code
if (minCapacity - elementData.length > 0)
grow(minCapacity);
}
modCount
用来统计ArrayList变更的次数。
当需求的容量 minCapacity
大于内部存储数组的长度 elementData.length
时,需要进行扩容。
/**
* The maximum size of array to allocate.
* Some VMs reserve some header words in an array.
* Attempts to allocate larger arrays may result in
* OutOfMemoryError: Requested array size exceeds VM limit
*/
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
/**
* Increases the capacity to ensure that it can hold at least the
* number of elements specified by the minimum capacity argument.
*
* @param minCapacity the desired minimum capacity
*/
private void grow(int minCapacity) {
// overflow-conscious code
int oldCapacity = elementData.length;
int newCapacity = oldCapacity + (oldCapacity >> 1);
if (newCapacity - minCapacity < 0)
newCapacity = minCapacity;
if (newCapacity - MAX_ARRAY_SIZE > 0)
newCapacity = hugeCapacity(minCapacity);
// minCapacity is usually close to size, so this is a win:
elementData = Arrays.copyOf(elementData, newCapacity);
}
可以看到:
- 扩容的核心算法
oldCapacity + (oldCapacity >> 1)
:原始容量右移(除以2) + 原始容量,相当于扩容了 1.5 倍。 - 如果扩容后的容量小于需求的容量
minCapacity
,就按照需求的容量来扩容; - 如果扩容后超出了最大容量
MAX_ARRAY_SIZE=Integer.MAX_VALUE - 8
,会使用特殊的方法hugeCapacity(minCapacity)
重新计算扩充后的容量大小。
超大容量处理
具体逻辑看下面的代码,逻辑比较简单:
private static int hugeCapacity(int minCapacity) {
if (minCapacity < 0) // overflow
throw new OutOfMemoryError();
return (minCapacity > MAX_ARRAY_SIZE) ?
Integer.MAX_VALUE :
MAX_ARRAY_SIZE;
}
再进一步
扩容时是怎么进行搬运原数据的?
核心搬运方法: Arrays.copyOf(elementData, newCapacity)
,注释太长就不贴了:
public static <T,U> T[] copyOf(U[] original, int newLength, Class<? extends T[]> newType) {
@SuppressWarnings("unchecked")
T[] copy = ((Object)newType == (Object)Object[].class)
? (T[]) new Object[newLength]
: (T[]) Array.newInstance(newType.getComponentType(), newLength);
System.arraycopy(original, 0, copy, 0,
Math.min(original.length, newLength));
return copy;
}
这里在创建数组的时候用了一些优化技巧:
创建数组Array.newInstance
内部实现是一个 native 方法,应该是类似 c 语言中的内存申请。
/**
* Creates a new array with the specified component type and
* length....
*/
public static Object newInstance(Class<?> componentType, int length)
throws NegativeArraySizeException {
return newArray(componentType, length);
}
private static native Object newArray(Class<?> componentType, int length)
throws NegativeArraySizeException;
复制数组System.arraycopy
这是一个 native 方法, 在网上看到一篇文章阐述其内原理,搬运过来。 原文
public static native void arraycopy(Object src, int srcPos,
Object dest, int destPos,
int length);
深入熔岩之心
找到对应的openjdk6-src/hotspot/src/share/vm/prims/jvm.cpp,这里有JVM_ArrayCopy的入口:
JVM_ENTRY(void, JVM_ArrayCopy(JNIEnv *env, jclass ignored, jobject src, jint src_pos,
jobject dst, jint dst_pos, jint length))
JVMWrapper("JVM_ArrayCopy");
// Check if we have null pointers
if (src == NULL || dst == NULL) {
THROW(vmSymbols::java_lang_NullPointerException());
}
arrayOop s = arrayOop(JNIHandles::resolve_non_null(src));
arrayOop d = arrayOop(JNIHandles::resolve_non_null(dst));
assert(s->is_oop(), "JVM_ArrayCopy: src not an oop");
assert(d->is_oop(), "JVM_ArrayCopy: dst not an oop");
// Do copy
Klass::cast(s->klass())->copy_array(s, src_pos, d, dst_pos, length, thread);
JVM_END
前面的语句都是判断,知道最后的copy_array(s, src_pos, d, dst_pos, length, thread)是真正的copy,进一步看这里,在openjdk6-src/hotspot/src/share/vm/oops/typeArrayKlass.cpp中:
void typeArrayKlass::copy_array(arrayOop s, int src_pos, arrayOop d, int dst_pos, int length, TRAPS) {
assert(s->is_typeArray(), "must be type array");
// Check destination
if (!d->is_typeArray() || element_type() != typeArrayKlass::cast(d->klass())->element_type()) {
THROW(vmSymbols::java_lang_ArrayStoreException());
}
// Check is all offsets and lengths are non negative
if (src_pos < 0 || dst_pos < 0 || length < 0) {
THROW(vmSymbols::java_lang_ArrayIndexOutOfBoundsException());
}
// Check if the ranges are valid
if ( (((unsigned int) length + (unsigned int) src_pos) > (unsigned int) s->length())
|| (((unsigned int) length + (unsigned int) dst_pos) > (unsigned int) d->length()) ) {
THROW(vmSymbols::java_lang_ArrayIndexOutOfBoundsException());
}
// Check zero copy
if (length == 0)
return;
// This is an attempt to make the copy_array fast.
int l2es = log2_element_size();
int ihs = array_header_in_bytes() / wordSize;
char* src = (char*) ((oop*)s + ihs) + ((size_t)src_pos << l2es);
char* dst = (char*) ((oop*)d + ihs) + ((size_t)dst_pos << l2es);
Copy::conjoint_memory_atomic(src, dst, (size_t)length << l2es);//还是在这里处理copy
}
这个函数之前的仍然是一堆判断,直到最后一句才是真实的拷贝语句。
在openjdk6-src/hotspot/src/share/vm/utilities/copy.cpp中找到对应的函数:
// Copy bytes; larger units are filled atomically if everything is aligned.
void Copy::conjoint_memory_atomic(void* from, void* to, size_t size) {
address src = (address) from;
address dst = (address) to;
uintptr_t bits = (uintptr_t) src | (uintptr_t) dst | (uintptr_t) size;
// (Note: We could improve performance by ignoring the low bits of size,
// and putting a short cleanup loop after each bulk copy loop.
// There are plenty of other ways to make this faster also,
// and it's a slippery slope. For now, let's keep this code simple
// since the simplicity helps clarify the atomicity semantics of
// this operation. There are also CPU-specific assembly versions
// which may or may not want to include such optimizations.)
if (bits % sizeof(jlong) == 0) {
Copy::conjoint_jlongs_atomic((jlong*) src, (jlong*) dst, size / sizeof(jlong));
} else if (bits % sizeof(jint) == 0) {
Copy::conjoint_jints_atomic((jint*) src, (jint*) dst, size / sizeof(jint));
} else if (bits % sizeof(jshort) == 0) {
Copy::conjoint_jshorts_atomic((jshort*) src, (jshort*) dst, size / sizeof(jshort));
} else {
// Not aligned, so no need to be atomic.
Copy::conjoint_jbytes((void*) src, (void*) dst, size);
}
}
上面的代码展示了选择哪个copy函数,我们选择conjoint_jints_atomic,在openjdk6-src/hotspot/src/share/vm/utilities/copy.hpp进一步查看:
// jints, conjoint, atomic on each jint
static void conjoint_jints_atomic(jint* from, jint* to, size_t count) {
assert_params_ok(from, to, LogBytesPerInt);
pd_conjoint_jints_atomic(from, to, count);
}
继续向下查看,在openjdk6-src/hotspot/src/cpu/zero/vm/copy_zero.hpp中:
static void pd_conjoint_jints_atomic(jint* from, jint* to, size_t count) {
_Copy_conjoint_jints_atomic(from, to, count);
}
继续向下查看,在openjdk6-src/hotspot/src/os_cpu/linux_zero/vm/os_linux_zero.cpp中:
void _Copy_conjoint_jints_atomic(jint* from, jint* to, size_t count) {
if (from > to) {
jint *end = from + count;
while (from < end)
*(to++) = *(from++);
}
else if (from < to) {
jint *end = from;
from += count - 1;
to += count - 1;
while (from >= end)
*(to--) = *(from--);
}
}
可以看到,直接就是内存块赋值的逻辑了,这样避免很多引用来回倒腾的时间,必然就变快了。