Default Initial Capacities of Java Container Classes

Yesterday I attended a company training on Java performance. It mentioned that several container classes have constructor parameters with default values controlling the initial capacity, and recommended predicting the capacity you need as accurately as possible and creating the container at that size, instead of relying on the defaults; this reduces both wasted memory and the overhead of growing the container. The training mainly covered StringBuilder and HashMap, so let's look at the JDK source to see what actually happens. The code below is from JDK 1.8, which differs considerably from 1.6.

  1. StringBuilder
    Let's start with StringBuilder. Its default constructor looks like this:
    /**
     * Constructs a string builder with no characters in it and an
     * initial capacity of 16 characters.
     */
    public StringBuilder() {
        super(16);
    }

The comment says it clearly: the default capacity is 16. If a String or a CharSequence is passed in, the capacity is its length plus 16:

    public StringBuilder(String str) {
        super(str.length() + 16);
        append(str);
    }
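Both constructors are easy to verify from user code, since StringBuilder exposes its current capacity via `capacity()`. A quick check, with the values following directly from the JDK 1.8 constructors above:

```java
public class SbCapacityDemo {
    public static void main(String[] args) {
        // Default constructor: capacity 16
        System.out.println(new StringBuilder().capacity());        // 16
        // Seeded with a String: capacity = str.length() + 16
        System.out.println(new StringBuilder("hello").capacity()); // 5 + 16 = 21
    }
}
```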

Then there is the question of expansion: what happens when the existing capacity cannot hold a newly appended string? This is implemented in the base class AbstractStringBuilder:

    public AbstractStringBuilder append(String str) {
        if (str == null)
            return appendNull();
        int len = str.length();
        ensureCapacityInternal(count + len);
        str.getChars(0, len, value, count);
        count += len;
        return this;
    }

Here value and count are fields of AbstractStringBuilder: count records the current length of the stored string and is updated only when the content is modified, so it is not the capacity; value is the array that actually stores the characters, and StringBuilder's capacity is value's length. The code above is then straightforward: first make sure the capacity can hold both the old and the new content, growing if necessary, then copy the argument's characters after the existing ones.
Now for the core of this section: how does ensureCapacityInternal actually grow the buffer? Tracing through the calls, the expansion itself happens in expandCapacity:

    /**
     * This implements the expansion semantics of ensureCapacity with no
     * size check or synchronization.
     */
    void expandCapacity(int minimumCapacity) {
        int newCapacity = value.length * 2 + 2;
        if (newCapacity - minimumCapacity < 0)
            newCapacity = minimumCapacity;
        if (newCapacity < 0) {
            if (minimumCapacity < 0) // overflow
                throw new OutOfMemoryError();
            newCapacity = Integer.MAX_VALUE;
        }
        value = Arrays.copyOf(value, newCapacity);
    }

It first proposes a new capacity of twice the current one plus 2; if that still is not enough for this expansion, it uses the requested minimum instead, capped at Integer.MAX_VALUE. In other words, a StringBuilder at least doubles its capacity when it grows. It then allocates a new array and copies the existing characters into it, so expansion costs extra allocation, copying, and GC pressure. Predicting the final length of the string you are building and allocating it in one go can therefore noticeably improve performance.
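The doubling-plus-two rule is also observable through `capacity()`. A small sketch; the exact numbers assume the JDK 1.8 formula shown above:

```java
public class SbGrowthDemo {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder(); // capacity 16
        sb.append("0123456789abcdef");          // exactly 16 chars: fits, no growth
        System.out.println(sb.capacity());      // 16
        sb.append('x');                         // 17th char triggers expandCapacity
        System.out.println(sb.capacity());      // 16 * 2 + 2 = 34
    }
}
```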

  2. HashMap
    HashMap is a bit more involved than StringBuilder: besides the capacity there is also a load factor. The official documentation describes their relationship as follows:

Iteration over collection views requires time proportional to the “capacity” of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it’s very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

Read this alongside HashMap's fields:

    // the bucket array; each slot holds a chain (or tree) of Nodes
    transient Node<K,V>[] table;
    // the number of key-value mappings stored
    transient int size;
    // the resize threshold (capacity * load factor)
    int threshold;
    // the load factor
    final float loadFactor;
    // structural modification count (used by fail-fast iterators)
    transient int modCount;

    // default initial capacity: 16
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
    // default load factor: 0.75
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

HashMap has three main constructor overloads:

    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }
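Note that the two-argument constructor stashes `tableSizeFor(initialCapacity)` in threshold: the requested capacity is rounded up to the next power of two. A standalone sketch of the JDK 1.8 tableSizeFor logic, extracted here so it can be run on its own:

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // JDK 1.8's HashMap.tableSizeFor: the smallest power of two >= cap.
    // The shifts smear the highest set bit of (cap - 1) into all lower bits,
    // so n + 1 is the next power of two.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(16));   // 16
        System.out.println(tableSizeFor(17));   // 32
        System.out.println(tableSizeFor(1000)); // 1024
    }
}
```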

We can see that in the default case only loadFactor is set and no table is allocated; the array of default initial capacity is created lazily on the first put. Each put then goes through the following logic (some code omitted):

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // table not yet allocated (default constructor case): initialize it first
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // compute the bucket index from the hash and bucket count; if the slot is empty, place a new Node there
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        // hash collision: the slot is already occupied
        else {
            Node<K,V> e; K k;
            // the existing key matches (same hash and equals): remember the node
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                ...
            }
            // overwrite the existing value
            if (e != null) { 
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        // if the number of entries exceeds the threshold, reallocate
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

The function that actually allocates space is resize():

    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        // the table has already been allocated
        if (oldCap > 0) {
            // already at the maximum capacity: stop growing
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // already initialized, and doubling the capacity stays under the limit
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // default constructor: use the default capacity and threshold
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            // rehash: move the entries from the old table into the new one
            ...
        }
        return newTab;
    }

To sum up: except for the first allocation and the already-at-maximum case, resize() simply doubles the table, and all existing keys have to be redistributed into the new table, so growing is fairly expensive. While a resize is in progress the old and new tables coexist, so peak memory for the table briefly reaches about three times the old table's size (the old array plus a twice-as-large new one). Choosing a sensible capacity and loadFactor up front can therefore improve performance considerably.
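One practical consequence: to hold n entries without a single rehash, the initial capacity must satisfy n <= capacity * loadFactor. A sketch of pre-sizing along those lines (the same idea as Guava's `Maps.newHashMapWithExpectedSize`; `capacityFor` is a hypothetical helper written here for illustration, not a JDK API):

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMapDemo {
    // To hold `expected` entries without rehashing we need
    // expected <= capacity * loadFactor, i.e. capacity >= expected / 0.75.
    static int capacityFor(int expected) {
        return (int) (expected / 0.75f) + 1;
    }

    public static void main(String[] args) {
        int expected = 1000;
        // 1000 / 0.75 + 1 = 1334, which tableSizeFor rounds up to 2048;
        // the threshold becomes 2048 * 0.75 = 1536 >= 1000, so no resize occurs
        // after the initial allocation.
        Map<Integer, Integer> m = new HashMap<>(capacityFor(expected));
        for (int i = 0; i < expected; i++) {
            m.put(i, i);
        }
        System.out.println(m.size()); // 1000
    }
}
```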

Essentially every auto-growing container backed by an array (rather than a linked structure) goes through a similar process. Here are the default initial capacities and growth strategies of some common containers:

Container: default initial capacity / growth strategy
- ArrayList: 10 / 1.5x
- ArrayDeque: 16 / 2x
- BitSet: 64 bits / 2x, or as needed
- HashMap: 16 / 2x
- HashSet: same as HashMap (backed by a HashMap whose values are a shared dummy Object; TreeSet is backed by TreeMap, a red-black tree with no capacity to tune)
- Hashtable: 11 / 2n+1
- WeakHashMap: same as HashMap
- PriorityQueue: 11 / doubles while small, then grows by 50%
- StringBuilder: 16 / 2n+2, or more if needed
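The ArrayList entry above (grow by 1.5x) comes from JDK 1.8's `grow()`, which computes `oldCapacity + (oldCapacity >> 1)`. Sketching that formula on its own shows the sequence of capacities a default-constructed ArrayList passes through as it fills up:

```java
public class ArrayListGrowthDemo {
    // JDK 1.8 ArrayList.grow(): newCapacity = oldCapacity + oldCapacity / 2
    static int grow(int oldCapacity) {
        return oldCapacity + (oldCapacity >> 1);
    }

    public static void main(String[] args) {
        int cap = 10; // default initial capacity
        StringBuilder seq = new StringBuilder().append(cap);
        while (cap < 100) {
            cap = grow(cap);
            seq.append(" -> ").append(cap);
        }
        System.out.println(seq); // 10 -> 15 -> 22 -> 33 -> 49 -> 73 -> 109
    }
}
```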
