Java Basics: capacity and loadFactor in HashMap Explained

Article link: https://blog.csdn.net/zhao_xinhu/article/details/85321448

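The previous HashMap post gave a brief analysis of the storage structure and the put process. This post looks at HashMap's capacity and load factor (loadFactor) and what each of them does for the map. It assumes you already understand how a HashMap stores its entries.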

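The HashMap declaration I write most often in day-to-day development is: Map map = new HashMap(); I don't know whether the same is true for you. If you instead write one of the following two forms: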

// Form 1: initialCapacity is an int
Map map = new HashMap(initialCapacity);

// Form 2: initialCapacity is an int, loadFactor is a float
Map map = new HashMap(initialCapacity, loadFactor);

then I trust you understand capacity and load factor (leaving aside whether you use them well, you at least know what the two are for).
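One detail worth knowing if you pass initialCapacity yourself: as far as I know, the JDK 8 constructor (not shown in this post) rounds the requested value up to the next power of two before the table is allocated. The sketch below only imitates that rounding. The class name CapacityRounding and the helper roundUpToPowerOfTwo are my own names for illustration, not JDK methods.

```java
public class CapacityRounding {
    // Same value as HashMap's MAXIMUM_CAPACITY (1 << 30), copied here for the sketch.
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Hypothetical helper: smallest power of two >= requested,
    // capped at MAXIMUM_CAPACITY and never smaller than 1.
    static int roundUpToPowerOfTwo(int requested) {
        if (requested <= 1) {
            return 1;
        }
        if (requested >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        // highestOneBit(requested - 1) is the largest power of two below
        // 'requested' (or equal to requested - 1); doubling it gives the result.
        return Integer.highestOneBit(requested - 1) << 1;
    }

    public static void main(String[] args) {
        int[] requests = {1, 10, 16, 17, 100};
        for (int r : requests) {
            System.out.println(r + " -> " + roundUpToPowerOfTwo(r));
        }
    }
}
```

Under this assumption, asking for 10 gives a table of 16 slots and asking for 17 gives 32, so a value just above a power of two roughly doubles the allocation.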

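At bottom, a HashMap is just a container that holds key-value pairs. It is a dynamic container, meaning its capacity is not fixed. But if the capacity is not fixed, how large is it? Is it unbounded? You may or may not have wondered about this. Answering it involves two things: the load factor and the capacity.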

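Let's go straight to the source. Since a map has a capacity, it must have a definite size at any moment, so let's see what the source says that size is. The usual steps for using a map are: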

// Initialize
Map map = new HashMap();
// Put a value
map.put("key", "value");
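For anyone who wants to run these two steps, here is a minimal self-contained version. It also records what put hands back, which matters when reading the source later. The class name PutReturnDemo and the helper demo are hypothetical names chosen for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

public class PutReturnDemo {
    // Runs two puts on the same key and returns what each call handed back,
    // followed by the value finally stored and the map's size.
    static String[] demo() {
        Map<String, String> map = new HashMap<>();
        String first = map.put("key", "v1");   // no previous mapping for "key"
        String second = map.put("key", "v2");  // replaces "v1"
        return new String[] {first, second, map.get("key"), String.valueOf(map.size())};
    }

    public static void main(String[] args) {
        for (String s : demo()) {
            System.out.println(s);
        }
    }
}
```

The first put returns null because there was no earlier mapping, the second returns the replaced value "v1", and the size stays at 1 because the key was already present.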

1. Map initialization

/**
     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
     */
   
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

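    // This is the default constructor. It only sets loadFactor (the load factor)
    // and does nothing else; no table is allocated yet. Set it aside for now and
    // just remember that construction does very little work.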

2. map.put is the key part, so let's look at the put method

/**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    /**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */

    // The real put logic (the most important part)
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);

                // Note the return here: the method ends at this point and the code below does not run
                return oldValue;
            }
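            // The if above handles the case where an entry with an equal key was found, so the old value is overwritten
            // The details of the code above are not explained here; see my previous post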
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
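The expression tab[i = (n - 1) & hash] in putVal is what picks the bucket. Because the table length n is always a power of two, masking with n - 1 gives the same result as a non-negative modulo. The sketch below checks that on a few values. The class name BucketIndex and the helper indexFor are hypothetical, and the sample integers stand in for hashes that putVal would normally receive from hash(key).

```java
public class BucketIndex {
    // Index computation as written in putVal: (n - 1) & hash,
    // where n is the table length (assumed to be a power of two).
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int[] hashes = {0, 5, 16, 21, -1, "key".hashCode()};
        for (int h : hashes) {
            int masked = indexFor(h, n);
            int mod = Math.floorMod(h, n);   // non-negative remainder, for comparison
            System.out.println("hash=" + h + " index=" + masked + " floorMod=" + mod);
        }
    }
}
```

The mask also works for negative hashes, which a plain % would map to a negative index.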

Below, the map's expansion code is pasted again so that we can go through it in detail:

        // If size > threshold here, this is at least the second resize() call; the first ran at the top of putVal while table was still null
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;

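        // This code runs when no entry with a key equal to the new key was found in the map.
        // Let's go through it:

        // ++modCount;
        // This line counts structural modifications to the map (fail-fast iterators rely on it)

        // if (++size > threshold)  {
        //     resize();
        // }
        // These three lines are where capacity and load factor come into play.
        // So what is size at this point, and what is threshold? The constructor did not set threshold.
        // size is incremented right here in putVal, while threshold is assigned inside resize(), so let's look at resize().
        // We look at resize() in two parts.
        // Part one: controlling capacity (the only part analyzed here)
        // The top of putVal already called resize() once (trace the execution yourself), so let's see what the initial capacity is.
        // threshold = newThr; is the line that does the assignment.
        // The if/else branches above that line decide which value threshold gets.
        // On the first call, oldCap (the length of the map's table array) is 0,
        // because table is still null.
        // With oldCap equal to 0 (and oldThr also 0 after the no-arg constructor), these two lines run:
        // newCap = DEFAULT_INITIAL_CAPACITY; the default capacity is 16
        // newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        // newThr, the limit that triggers the next resize, is default capacity * default load factor = 16 * 0.75 = 12
        // In other words, once the map's size exceeds 12 (on the 13th insertion), resize() runs again to expand.

        // That is how the map is set up on first use.
        // On each later expansion, capacity and threshold both double: 16, 32, 64, ... and 12, 24, 48, ...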


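        // The full resize() source follows; part one (capacity) and part two (building the new table) are marked inside it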
    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;

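        // Everything above is part one: controlling capacity (the part analyzed here)

        // Everything below is part two: building the new table; the existing bins are simply moved into the new array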
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }
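The lo/hi loop in part two is easier to follow with numbers. When the table doubles, an entry either stays at index j or moves to j + oldCap, and the single bit (e.hash & oldCap) decides which. The sketch below applies that rule to a few sample hashes and compares it with recomputing the index from scratch. The class name SplitOnResize and the helper newIndexBySplit are hypothetical names for illustration.

```java
public class SplitOnResize {
    // New index of an entry after the table doubles from oldCap to 2 * oldCap,
    // using the lo/hi rule from resize(): the bit (hash & oldCap) decides.
    static int newIndexBySplit(int hash, int oldCap) {
        int j = hash & (oldCap - 1);                    // index in the old table
        return (hash & oldCap) == 0 ? j : j + oldCap;   // "lo" list stays, "hi" list moves
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21, 37, 53};   // all of these share bucket 5 when the capacity is 16
        for (int h : hashes) {
            int direct = h & (2 * oldCap - 1);   // index recomputed against the new table
            System.out.println("hash=" + h
                    + " split=" + newIndexBySplit(h, oldCap)
                    + " direct=" + direct);
        }
    }
}
```

Hashes 5 and 37 stay in bucket 5, while 21 and 53 move to bucket 21, and in each case the split rule agrees with the recomputed index. This is why resize() can redistribute a bin without rehashing any key.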

So now you can see how the map manages to be dynamic: it does have a capacity after all.

To briefly summarize the map's characteristics and how it works:

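1. The map is an array combined with linked lists (long bins are converted to trees, as the TreeNode branches in the source above show); the array's default length is 16, which is the default capacity.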

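2. On put, if an equal key already exists, the new value replaces the old one and the old value is returned; otherwise the entry is inserted into the bucket's linked list, null is returned, and size is incremented.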

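3. Expansion: the first allocation uses the default capacity (16) and the default threshold (16 * 0.75 = 12). Once size exceeds 12, the map expands, and both capacity and threshold double: 32 and 24, then 64 and 48, and so on. There is an upper bound: the maximum capacity is MAXIMUM_CAPACITY = 1 << 30, and once the table reaches it, threshold is set to Integer.MAX_VALUE so that no further resize is triggered.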
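The numbers in point 3 can be replayed without touching HashMap's internals. The sketch below simulates only the size and threshold bookkeeping from putVal and resize() for a default-constructed map that receives distinct keys. It stores no entries and ignores the MAXIMUM_CAPACITY cap, so it is only meant for small counts. The class name ThresholdGrowth and the helper capacityAfter are hypothetical.

```java
public class ThresholdGrowth {
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Returns the simulated table capacity after 'inserts' insertions of
    // distinct keys into a map created with the no-arg constructor.
    static int capacityAfter(int inserts) {
        int capacity = 0;
        int threshold = 0;
        int size = 0;
        for (int i = 0; i < inserts; i++) {
            if (capacity == 0) {   // first put: table is still null, resize() allocates it
                capacity = DEFAULT_INITIAL_CAPACITY;
                threshold = (int) (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
            }
            if (++size > threshold) {   // same test as at the end of putVal
                capacity <<= 1;         // resize() doubles the capacity
                threshold <<= 1;        // and doubles the threshold
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        int[] counts = {12, 13, 24, 25, 48, 49};
        for (int n : counts) {
            System.out.println(n + " entries -> capacity " + capacityAfter(n));
        }
    }
}
```

In this simulation the 12th entry still fits in 16 slots, and it is the 13th, 25th, and 49th insertions that double the table to 32, 64, and 128.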

One last question to leave you with: people online keep asking why the map's default load factor is 0.75.