《partner4java 讲述 java基础》之第一步：list 、set 、map 粗浅性能对比分析

最新推荐文章于 2024-12-28 00:03:01 发布

原创最新推荐文章于 2024-12-28 00:03:01 发布 · 1.2w 阅读

62 ·

CC 4.0 BY-SA版权

j2se 专栏收录该内容

36 篇文章

订阅专栏

本文深入对比分析了Java中List、Map等常用集合类在不同场景下的性能表现，并结合源码解析了性能差异的原因。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

不知道有多少同学和我一样，工作五年了还没有仔细看过list、set的源码，一直停留在老师教导的：“LinkedList插入性能比ArrayList好，LinkedList顺序遍历性能比ArrayList好”的世界里。可是真是如此么？本文很“肤浅”的对比和分析了几种常用的集合，“高手”可以就此打住不必往下阅读了。。。

本文分开介绍了List、Map、Set：

（测试环境：win7、jdk、4G、i3；文章示例为了节省篇幅，只会列出测试大体形式和遍历次数）

第一部分：List

1.add(E e)

==============================性能测试=================================

（1）表格统计的差异在于new的方式不同：

		for (int i = 0; i < 10000; i++) {
			List<Integer> list = new LinkedList<Integer>();
			for (int j = 0; j < 10000; j++) {
				list.add(j);
			}
			list = null;
		}

new方式	消耗时间（毫秒）
new LinkedList<Integer>()	3420
new ArrayList<Integer>()	3048
new ArrayList<Integer>(10000)	2280
new Vector<Integer>()	4045
new Vector<Integer>(10000)	3859

（2）我们变化一下集合的长度和遍历次数：

		for (int i = 0; i < 1000000; i++) {
			List<Integer> list = new LinkedList<Integer>();
			for (int j = 0; j < 100; j++) {
				list.add(j);
			}
			list = null;
		}

new方式	消耗时间（毫秒）
new LinkedList<Integer>()	2286
new ArrayList<Integer>()	2832
new ArrayList<Integer>(10000)	1714
new Vector<Integer>()	3937
new Vector<Integer>(10000)	3184

*************************************************源码分析*************************************************

似乎并没有看到期待的差异，若非“冒充下大牛”纠结一下区别：

1、若集合长度比较小（小于百位），LinkedList比ArrayList的插入性能稍胜一筹。

2、但是当长度达到五位数时，ArrayList的插入性能反而会比LinkedList好一些。

3、若你能预知ArrayList的长度，可以完胜LinkedList。（原因后面会说）

4、Vector的作用不在一个纬度上，不过也没有想象中的那么差。

LinkedList.add:

    先调用了自身的add：
    public boolean add(E e) {
	addBefore(e, header);
        return true;
    }
    然后又调用了addBefore，表示插入在谁之前（有点绕，因为是双项链）：
    private Entry<E> addBefore(E e, Entry<E> entry) {
	Entry<E> newEntry = new Entry<E>(e, entry, entry.previous);
	newEntry.previous.next = newEntry;
	newEntry.next.previous = newEntry;
	size++;
	modCount++;
	return newEntry;
    }
    用于维护指定前后关系：
    private static class Entry<E> {
	E element;
	Entry<E> next;
	Entry<E> previous;

	Entry(E element, Entry<E> next, Entry<E> previous) {
	    this.element = element;
	    this.next = next;
	    this.previous = previous;
	}
    }

也就是说LinkedList的上下级维护关系是借助了指针，若任意位置插入，则是先查找到before前的entry，然后重指原有位置前后元素的指针。

ArrayList.add：

首先判断下是否超过了最大长度数组长度，若超过会重新扩展拷贝（这也就说明了，为什么当我们在newArrayList时指定合适的长度会提升性能）

    相比LinkedList来说就简单了狠多：
    public boolean add(E e) {
	ensureCapacity(size + 1);  // Increments modCount!!
	elementData[size++] = e;
	return true;
    }
    //首先判断下是否超过了最大长度数组长度，若超过会重新扩展拷贝（这也就说明了，为什么当我们在newArrayList时指定合适的长度会提升性能）
    public void ensureCapacity(int minCapacity) {
	modCount++;
	int oldCapacity = elementData.length;
	if (minCapacity > oldCapacity) {
	    Object oldData[] = elementData;
	    int newCapacity = (oldCapacity * 3)/2 + 1;
    	    if (newCapacity < minCapacity)
		newCapacity = minCapacity;
            // minCapacity is usually close to size, so this is a win:
            elementData = Arrays.copyOf(elementData, newCapacity);
	}
    }
    //默认长度10
    public ArrayList() {
	this(10);
    }

通过上面的源码可以理解出为什么当集合的长度会左右LinkedList和ArrayList的插入性能PK。影响ArrayList性能损耗的一大元凶就是数组的不断增长拷贝，LinkedList由于自身的特性当数据插入时需要借助特殊的内部类“Entry”来维护插入数据的上下级链。

2.add(int index, E element)

通过上面的分析出现了一个想法，如果是

随机插入位置呢？

==============================性能测试=================================

(1)

for (int i = 0; i < 100; i++) {
	List<Integer> list = new LinkedList<Integer>();
	for (int j = 0; j < 10000; j++) {
		list.add(list.size() / 2, j);
	}
	list = null;
}

new方式	消耗时间
new LinkedList<Integer>()	15782
new ArrayList<Integer>()	3513
new ArrayList<Integer>(10000)	3490

(2)我们变化一下集合的长度和遍历次数：

		for (int i = 0; i < 100000; i++) {
			List<Integer> list = new LinkedList<Integer>();
			for (int j = 0; j < 100; j++) {
				list.add(list.size() / 2, j);
			}
			list = null;
		}

new方式	消耗时间
new LinkedList<Integer>()	880
new ArrayList<Integer>()	1262
new ArrayList<Integer>(10000)	2308

*************************************************源码分析*************************************************

从纯插入性能上来说，随机插入LikedList要比ArrayList性能好：（至于为什么当LinkedList集合长度为10000时性能变化会如此之大，我们后面的get会有介绍）

LikedList.add(int index, E element) :

大体分为两个步骤

    public void add(int index, E element) {
        addBefore(element, (index==size ? header : entry(index)));
    } 
    //第一步获取到addBefore需要的插入位置（entry），第二步如上
    private Entry<E> entry(int index) {
        if (index < 0 || index >= size)
            throw new IndexOutOfBoundsException("Index: "+index+
                                                ", Size: "+size);
        Entry<E> e = header;
        if (index < (size >> 1)) {
            for (int i = 0; i <= index; i++)
                e = e.next;
        } else {
            for (int i = size; i > index; i--)
                e = e.previous;
        }
        return e;
    }

所以从代码上看， LikedList. add(int index, E element) 没有理由在集合长度为10000时会有这么大的差异，除非entry(int index)存在问题（我们后面可能要说的hash也正是为了避免遍历）。

ArrayList.add(int index, E element) :

大体分为三步

第二步暴漏了为什么现在初始指定了集合长度反而不一定是好事。

    public void add(int index, E element) {
	if (index > size || index < 0)
	    throw new IndexOutOfBoundsException(
		"Index: "+index+", Size: "+size);
	//第一步：扩充长度
	ensureCapacity(size+1);  // Increments modCount!!
	//第二步：挪动数据（这一步决定了为什么现在初始指定了集合长度反而不一定是好事）
	System.arraycopy(elementData, index, elementData, index + 1,
			 size - index);
	//第三步：插入
	elementData[index] = element;
	size++;
    }

3.get(int index)

==============================性能测试=================================

		List<Integer> list = new LinkedList<Integer>();
		for (int j = 0; j < 10000; j++) {
			list.add(j);
		}

		for (int i = 0; i < 100; i++) {
			for (int j = 0; j < 10000; j++) {
				list.get(j);
			}
		}

new方式	消耗时间
new LinkedList<Integer>()	8349
new ArrayList<Integer>(10000)	15

*************************************************源码分析*************************************************

通过这个表首先能够知道如果直接或间接（如模板语言）调用for循环遍历，LikedList的性能要差很多。

LinkedList.get(int index)：

    public E get(int index) {
        return entry(index).element;
    }
    //性能损耗在这里，每次都需要线性搜索（遍历链定位位置）
    private Entry<E> entry(int index) {
        if (index < 0 || index >= size)
            throw new IndexOutOfBoundsException("Index: "+index+
                                                ", Size: "+size);
        Entry<E> e = header;
        if (index < (size >> 1)) {
            for (int i = 0; i <= index; i++)
                e = e.next;
        } else {
            for (int i = size; i > index; i--)
                e = e.previous;
        }
        return e;
    }

ArrayList.get(int index)：

    public E get(int index) {
	//检查是否越界
	RangeCheck(index);
	//直接从数组中取出对应index的数据
	return (E) elementData[index];
    }

4.迭代器

==============================性能测试=================================

		List<Integer> list = new LinkedList<Integer>();
		for (int j = 0; j < 100; j++) {
			list.add(j);
		}

		for (int i = 0; i < 1000000; i++) {
			Iterator<Integer> it = list.iterator();
			while (it.hasNext()) {
				it.next();
			}
			it = null;
		}

new方式	消耗时间
new LinkedList<Integer>()	1335
new ArrayList<Integer>();	3940

*************************************************源码分析*************************************************

LinkedList.iterator():

    private class ListItr implements ListIterator<E> {
	//header为LikedList的成员变量Entry
	private Entry<E> lastReturned = header;
	private Entry<E> next;
	private int nextIndex;
	private int expectedModCount = modCount;

	//这里我们会默认传入0，
	ListItr(int index) {
	    if (index < 0 || index > size)
		throw new IndexOutOfBoundsException("Index: "+index+
							    ", Size: "+size);

 	    //下面的if和else很有意思，为了实现“所有操作都是按照双重链接列表的需要执行的。在列表中编索引的操作将从开头或结尾遍历列表（从靠近指定索引的一端）。”
	    //size为外部LikedList的成员变量
	    if (index < (size >> 1)) {
		next = header.next;
		for (nextIndex=0; nextIndex<index; nextIndex++)
		    next = next.next;
	    } else {
		next = header;
		for (nextIndex=size; nextIndex>index; nextIndex--)
		    next = next.previous;
	    }
	}

	//判断是否已经超过长度
	public boolean hasNext() {
	    return nextIndex != size;
	}

	//便宜至下一个元素并返回element
	public E next() {
	    checkForComodification();
	    if (nextIndex == size)
		throw new NoSuchElementException();

	    lastReturned = next;
	    next = next.next;
	    nextIndex++;
	    return lastReturned.element;
	}

	...

	final void checkForComodification() {
	    if (modCount != expectedModCount)
		throw new ConcurrentModificationException();
	}
    }

也就是说我们的LinkedList的Iterator性能损耗主要在变换指针，所以对比for遍历方式来说性能提升了很多。

ArrayList.iterator():

    public Iterator<E> iterator() {
	return new Itr();
    }

    private class Itr implements Iterator<E> {
	/**
	 * Index of element to be returned by subsequent call to next.
	 */
	int cursor = 0;

	/**
	 * Index of element returned by most recent call to next or
	 * previous.  Reset to -1 if this element is deleted by a call
	 * to remove.
	 */
	int lastRet = -1;

	/**
	 * The modCount value that the iterator believes that the backing
	 * List should have.  If this expectation is violated, the iterator
	 * has detected concurrent modification.
	 */
	int expectedModCount = modCount;

	public boolean hasNext() {
            return cursor != size();
	}

	public E next() {
            checkForComodification();
	    try {
		//主要内容在这里，又绕回了ArrayList的获取方式
		E next = get(cursor);
		lastRet = cursor++;
		return next;
	    } catch (IndexOutOfBoundsException e) {
		checkForComodification();
		throw new NoSuchElementException();
	    }
	}

    }

ArrayList“很懒”的没有做什么，主要借助了父类AbstractList，获取方式采用了模板方法设计模式（Template Method Pattern）。

总结：无论你遍历ArrayList还是LinkedList都可以尽量采用迭代器。

通过add和get的分析，我们最常用的场景就是单页数据获取，然后利用jstl或velocity等遍历展示：

1、如果能确定遍历采用的for循环建议使用ArrayList并采用带参的构造器指定集合长度（也就是每页的个数）；

2、若遍历采用迭代器建议采用LinkedList，因为我们还会有时对dao获取的数据采用缓存策略。

（jstl用的迭代器，可看源码org.apache.taglibs.standard.tag.common.core.ForEachSupport.toForEachIterator方法）

第二部分：Map

1.put(K key, V value)

==============================性能测试=================================

表格统计的差异在于new的方式不同：

		for (int i = 0; i < 100000; i++) {
			Map<Integer, String> map = new TreeMap<Integer, String>();
			for (int j = 0; j < 100; j++) {
				map.put(j, "s" + j);
			}
			map = null;
		}

new方式	运行时间
new TreeMap<Integer, String>()	2945
new HashMap<Integer, String>()	2029
new HashMap<Integer, String>(100)	1845

*************************************************源码分析*************************************************

TreeMap.put(K key, V value)：

首先确认一点TreeMap采用的“红黑树”二叉查找方式。（具体算法请自行google）

该映射根据其键的自然顺序进行排序，或者根据创建映射时提供的 Comparator 进行排序，具体取决于使用的构造方法。此实现为 containsKey、get、put 和 remove 操作提供受保证的 log(n) 时间开销。

插入大体分为四步：

1、判断是否为初次；

2、获取指定Comparator，若没有指定获取key的父类的实现方式。

3、利用Comparator确定数据存放位置；

4、树的旋转。

    public V put(K key, V value) {
        Entry<K,V> t = root;
	//判断是否存在根节点，或者集合是否存在了数据
        if (t == null) {
            root = new Entry<K,V>(key, value, null);
            size = 1;
            modCount++;
            return null;
        }
        int cmp;
        Entry<K,V> parent;
        // split comparator and comparable paths
	//这点是获取构造器传入的Comparator
        Comparator<? super K> cpr = comparator;
        if (cpr != null) {
            do {
                parent = t;
                cmp = cpr.compare(key, t.key);
                if (cmp < 0)
                    t = t.left;
                else if (cmp > 0)
                    t = t.right;
                else
                    return t.setValue(value);
            } while (t != null);
        }
	//若没有指定Comparator，查找父类实现
        else {
            if (key == null)
                throw new NullPointerException();
            Comparable<? super K> k = (Comparable<? super K>) key;
	    //遍历放入确切位置
            do {
                parent = t;
                cmp = k.compareTo(t.key);
                if (cmp < 0)
                    t = t.left;
                else if (cmp > 0)
                    t = t.right;
                else
                    return t.setValue(value);
            } while (t != null);
        }
        Entry<K,V> e = new Entry<K,V>(key, value, parent);
        if (cmp < 0)
            parent.left = e;
        else
            parent.right = e;
	//树的旋转
        fixAfterInsertion(e);
        size++;
        modCount++;
        return null;
    }

HashMap.put(K key, V value)：

1、HashMap 的实例有两个参数影响其性能：初始容量和加载因子。容量是哈希表中桶的数量，初始容量只是哈希表在创建时的容量。加载因子是哈希表在其容量自动增加之前可以达到多满的一种尺度。当哈希表中的条目数超出了加载因子与当前容量的乘积时，则要对该哈希表进行 rehash 操作（即重建内部数据结构），从而哈希表将具有大约两倍的桶数；

2、key的hashcode至关重要。

    //构造一个带指定初始容量和加载因子的空 HashMap
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        // Find a power of 2 >= initialCapacity
        int capacity = 1;
	//取大于等于初始容量最近的2的倍数
        while (capacity < initialCapacity)
            capacity <<= 1;

        this.loadFactor = loadFactor;
        threshold = (int)(capacity * loadFactor);
        table = new Entry[capacity];
        init();
    }
    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
	//计算key的hash值
        int hash = hash(key.hashCode());
	//计算出在第几个“水桶”，计算方式是&与了table.length-1，这应该也是水桶数据是2的倍数的原因
        int i = indexFor(hash, table.length);
	//通过这步可以理解为什么看到好多文档要求key的hashCode“均匀分布”便于均匀落入各个水桶
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

2.get

==============================性能测试=================================

（1）表格统计的差异在于new的方式不同：

		Map<Integer, String> map = new TreeMap<Integer, String>();
		for (int j = 0; j < 100; j++) {
			map.put(j, "s" + j);
		}

		for (int i = 0; i < 100000; i++) {
			for (int j = 0; j < 100; j++) {
				map.get(j);
			}
		}

new方式	运行时间
new TreeMap<Integer, String>()	542
new HashMap<Integer, String>(100)	280

（2）变化下集合的大小（有时我们使用Map存放上万条乃至几十万条数据用于小数据本地快速缓存）：

		Map<Integer, String> map = new HashMap<Integer, String>(300000);
		for (int j = 0; j < 300000; j++) {
			map.put(j, "sasd");
		}
		beginTime = System.currentTimeMillis();
		for (int i = 0; i < 100; i++) {
			for (int j = 0; j < 300000; j++) {
				map.get(j);
			}
		}

new方式	运行时间
new TreeMap<Integer, String>()	5100
new HashMap<Integer, String>(300000)	1400

（3）当hashcode为极端情况下：

Map<HashKeyTest, String> map = new TreeMap<HashKeyTest, String>();
for (int j = 0; j < 10000; j++) {
	map.put(new HashKeyTest(j), "sasd");
}
beginTime = System.currentTimeMillis();
for (int i = 0; i < 10; i++) {
	for (int j = 0; j < 10000; j++) {
		map.get(new HashKeyTest(j));
	}
}

public class HashKeyTest implements Comparable<HashKeyTest> {

	private int value;
	...
	
	@Override
	public boolean equals(Object obj) {
		if (this == obj)
			return true;
		if (obj == null)
			return false;
		if (getClass() != obj.getClass())
			return false;
		HashKeyTest other = (HashKeyTest) obj;
		if (value != other.value)
			return false;
		return true;
	}
	//模拟hashCode的极端情况，返回相同的值
	@Override
	public int hashCode() {
		return 345345435;
	}
	//用于TreeMap的二叉树对比排序
	@Override
	public int compareTo(HashKeyTest o) {
		int thisVal = this.value;
		int anotherVal = o.getValue();
		return (thisVal < anotherVal ? -1 : (thisVal == anotherVal ? 0 : 1));
	}

}

new方式	消耗时间
new TreeMap<HashKeyTest, String>()	18
new HashMap<HashKeyTest, String>()	3480

*************************************************源码分析*************************************************

TreeMap.get(Object key):

基于红黑树查找

    public V get(Object key) {
        Entry<K,V> p = getEntry(key);
        return (p==null ? null : p.value);
    }
    final Entry<K,V> getEntry(Object key) {
        // Offload comparator-based version for sake of performance
        if (comparator != null)
            return getEntryUsingComparator(key);
        if (key == null)
            throw new NullPointerException();
	Comparable<? super K> k = (Comparable<? super K>) key;
        Entry<K,V> p = root;
        while (p != null) {
            int cmp = k.compareTo(p.key);
            if (cmp < 0)
                p = p.left;
            else if (cmp > 0)
                p = p.right;
            else
                return p;
        }
        return null;
    }

HashMap.get(Object key)：

代码中很简明的可以看出，如果hashCode不均为出现的问题

    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        int hash = hash(key.hashCode());
	//若hashCode相同，会导致查找方式就是这里的不停的遍历
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }

总结：

1、key如果是自定义对象，一定要有效地重载hashCode()，可参考《覆盖equals时总要覆盖hashCode》；

2、尽量保证hashCode“均为分布”，以便于均匀填充“水桶”；

3、当我们确定Map存放的数据比较多且hashCode分配严重不均匀，切记不要使用HashMap，建议使用TreeMap；

4、正常情况下，也就是我们常把表id作为key时，建议使用HashMap。

Set留给同学自己学习吧。（HashSet其实就是使用的HashMap作为容器）