Reading the HashMap Source Code, Part 01

Preface

This article walks through the internal structure of HashMap in JDK 1.8 and how HashMap behaves when put is called under various conditions.
It also collects part of the HashMap source comments, with notes, attached at the end of the article.

1. The internal structure of HashMap

(1) HashMap's member fields

Let's walk through HashMap's member fields:

    /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     */
    transient Node<K,V>[] table;

table: an array of Node elements, initialized on first use and resized when necessary. When allocated, its length is always a power of two. (Length zero is also tolerated in some operations to allow bootstrapping mechanics that are currently not needed.)
Combined with the description of Node&lt;K,V&gt; below, we can see that table is an array of buckets: each array slot holds the head of a singly linked list (or, after treeification, a red-black tree) of nodes, and each node points to the next node in the same bucket.
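To make the bucket layout concrete, here is a minimal sketch of the same idea (the class and field names are my own, not HashMap's internals): an array whose slots hold chain heads, with the slot chosen from the key's hash.

    import java.util.Objects;

    // Toy bucket table for illustration only: no resizing, no treeification, no duplicate-key handling.
    class SimpleBucketTable<K, V> {
        static class Node<K, V> {
            final K key; V value; Node<K, V> next;
            Node(K key, V value, Node<K, V> next) { this.key = key; this.value = value; this.next = next; }
        }

        @SuppressWarnings("unchecked")
        final Node<K, V>[] table = (Node<K, V>[]) new Node[16];   // length is a power of two

        void put(K key, V value) {
            int i = (table.length - 1) & Objects.hashCode(key);   // bucket index derived from the hash
            table[i] = new Node<>(key, value, table[i]);          // link the new node into that bucket's chain
        }
    }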

    /**
     * Holds cached entrySet(). Note that AbstractMap fields are used
     * for keySet() and values().
     */
    transient Set<Map.Entry<K,V>> entrySet;

entrySet: a Set whose elements are Map.Entry&lt;K,V&gt; instances; an Entry&lt;K,V&gt; can be viewed as one key-value pair. This field simply caches the set returned by entrySet().
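As a quick usage reminder, this cached set is what entrySet() returns, and iterating it is the usual way to visit every key-value pair (the entries it yields are backed by the Node objects described next):

    import java.util.HashMap;
    import java.util.Map;

    public class EntrySetDemo {
        public static void main(String[] args) {
            Map<String, Integer> map = new HashMap<>();
            map.put("a", 1);
            map.put("b", 2);

            // Each element of entrySet() is a Map.Entry view of one mapping.
            for (Map.Entry<String, Integer> e : map.entrySet()) {
                System.out.println(e.getKey() + " = " + e.getValue());
            }
        }
    }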

    /**
     * The number of key-value mappings contained in this map.
     */
    transient int size;

size: the number of key-value mappings contained in the map.

    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    transient int modCount;

modCount: the number of times this HashMap has been structurally modified. It is what makes iterators over the map's collection views fail-fast.
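A small demonstration of what modCount is for: a structural modification made while iterating (here via Map.remove rather than the iterator's own remove) changes modCount, and the iterator detects the mismatch on its next step. The keys below are arbitrary; only the behavior matters.

    import java.util.ConcurrentModificationException;
    import java.util.HashMap;
    import java.util.Map;

    public class FailFastDemo {
        public static void main(String[] args) {
            Map<String, Integer> map = new HashMap<>();
            map.put("a", 1);
            map.put("b", 2);
            map.put("c", 3);

            try {
                for (String key : map.keySet()) {
                    if ("b".equals(key)) {
                        map.remove(key);   // structural modification: modCount changes under the iterator
                    }
                }
            } catch (ConcurrentModificationException e) {
                // The iterator compares its expected modCount with the map's on each next(),
                // so the modification is reported promptly instead of corrupting the iteration.
                System.out.println("fail-fast: " + e);
            }
        }
    }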

    /**
     * The next size value at which to resize (capacity * load factor).
     *
     * @serial
     */
    // (The javadoc description is true upon serialization.
    // Additionally, if the table array has not been allocated, this
    // field holds the initial array capacity, or zero signifying
    // DEFAULT_INITIAL_CAPACITY.)
    int threshold;

threshold: the size at which the next resize happens, equal to capacity * load factor.

    /**
     * The load factor for the hash table.
     *
     * @serial
     */
    final float loadFactor;

loadFactor: the load factor of the hash table.


(2) HashMap's static nested classes Node&lt;K,V&gt; and TreeNode&lt;K,V&gt;

First, the static nested class Node&lt;K,V&gt;:

    /**
     * Basic hash bin node, used for most entries.  (See below for
     * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
     */
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

A basic hash bin node, used for most entries. (See the TreeNode subclass below, and LinkedHashMap for its Entry subclass.)
Since TreeNode has come up, let's take a look at it:

    /**
     * Entry for Tree bins. Extends LinkedHashMap.Entry (which in turn
     * extends Node) so can be used as extension of either regular or
     * linked node.
     */
    static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
        TreeNode<K,V> parent;  // red-black tree links
        TreeNode<K,V> left;
        TreeNode<K,V> right;
        TreeNode<K,V> prev;    // needed to unlink next upon deletion
        boolean red;
        TreeNode(int hash, K key, V val, Node<K,V> next) {
            super(hash, key, val, next);
        }

        /**
         * Returns root of tree containing this node.
         */
        final TreeNode<K,V> root() {
            for (TreeNode<K,V> r = this, p;;) {
                if ((p = r.parent) == null)
                    return r;
                r = p;
            }
        }

        /**
         * Ensures that the given root is the first node of its bin.
         */
        static <K,V> void moveRootToFront(Node<K,V>[] tab, TreeNode<K,V> root) {
            int n;
            if (root != null && tab != null && (n = tab.length) > 0) {
                int index = (n - 1) & root.hash;
                TreeNode<K,V> first = (TreeNode<K,V>)tab[index];
                if (root != first) {
                    Node<K,V> rn;
                    tab[index] = root;
                    TreeNode<K,V> rp = root.prev;
                    if ((rn = root.next) != null)
                        ((TreeNode<K,V>)rn).prev = rp;
                    if (rp != null)
                        rp.next = rn;
                    if (first != null)
                        first.prev = root;
                    root.next = first;
                    root.prev = null;
                }
                assert checkInvariants(root);
            }
        }

        /**
         * Finds the node starting at root p with the given hash and key.
         * The kc argument caches comparableClassFor(key) upon first use
         * comparing keys.
         */
        final TreeNode<K,V> find(int h, Object k, Class<?> kc) {
            TreeNode<K,V> p = this;
            do {
                int ph, dir; K pk;
                TreeNode<K,V> pl = p.left, pr = p.right, q;
                if ((ph = p.hash) > h)
                    p = pl;
                else if (ph < h)
                    p = pr;
                else if ((pk = p.key) == k || (k != null && k.equals(pk)))
                    return p;
                else if (pl == null)
                    p = pr;
                else if (pr == null)
                    p = pl;
                else if ((kc != null ||
                          (kc = comparableClassFor(k)) != null) &&
                         (dir = compareComparables(kc, k, pk)) != 0)
                    p = (dir < 0) ? pl : pr;
                else if ((q = pr.find(h, k, kc)) != null)
                    return q;
                else
                    p = pl;
            } while (p != null);
            return null;
        }

        /**
         * Calls find for root node.
         */
        final TreeNode<K,V> getTreeNode(int h, Object k) {
            return ((parent != null) ? root() : this).find(h, k, null);
        }
		
		......

    }

An entry for tree bins. It extends LinkedHashMap.Entry (which in turn extends Node), so it can be used as an extension of either a regular or a linked node.
As the fields show, TreeNode carries red-black tree links (parent, left, right, plus a color flag), so a treeified bin has the structure and properties of a red-black tree.


(3) HashMap's static constants

A few constants in HashMap are also worth introducing:

    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

The default initial capacity: 2^4 = 16.

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

The maximum capacity: 2^30, used if a higher value is implicitly specified by either of the constructors that take arguments.

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

The default load factor: 0.75, used when the constructor does not specify one.

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    static final int TREEIFY_THRESHOLD = 8;

The bin count threshold for using a tree rather than a list for a bin.
A bin is converted to a tree when an element is added to a bin that already contains at least this many nodes. The value must be greater than 2, and it should be at least 8 to mesh with the assumptions made in tree removal about converting back to a plain bin when it shrinks.

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

The bin count threshold for untreeifying a (split) bin during a resize operation. It should be less than TREEIFY_THRESHOLD, and at most 6 to mesh with the shrinkage detection performed on removal.

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

The smallest table capacity for which bins may be treeified. (Otherwise, if a bin contains too many nodes, the table is resized instead.) It should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts between the resizing and treeification thresholds.
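Putting these constants together: treeifyBin() only converts a long chain into a tree when the table itself is large enough; for a small table it resizes instead, which splits the bin. The helper below is my own sketch of that decision, not code taken from the JDK:

    // Sketch of the decision made in treeifyBin() (simplified; not the JDK's code).
    static boolean shouldTreeify(int binCount, int tableCapacity) {
        final int TREEIFY_THRESHOLD = 8;
        final int MIN_TREEIFY_CAPACITY = 64;
        // Convert the chain to a red-black tree only when the table is already reasonably
        // large; otherwise HashMap prefers to resize, which redistributes the bin's nodes.
        return binCount >= TREEIFY_THRESHOLD && tableCapacity >= MIN_TREEIFY_CAPACITY;
    }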


2. HashMap initialization

Map<String,Object> map = new HashMap<>();

This executes the following constructor in HashMap:

    /**
     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

At this point only the default load factor of 0.75 is recorded; all other fields keep their defaults, and the table itself (with the default initial capacity of 16) is not allocated until the first put, as the resize() walkthrough below will show.
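For comparison, the other public constructors let you specify the capacity and load factor up front; the values below are just examples:

    import java.util.HashMap;
    import java.util.Map;

    public class ConstructorDemo {
        public static void main(String[] args) {
            // Defaults: load factor 0.75; the table is allocated lazily with capacity 16 on the first put.
            Map<String, Object> a = new HashMap<>();

            // Initial capacity hint of 100; HashMap rounds it up to the next power of two (128).
            Map<String, Object> b = new HashMap<>(100);

            // Both the initial capacity and the load factor given explicitly.
            Map<String, Object> c = new HashMap<>(100, 0.5f);

            System.out.println(a.size() + " " + b.size() + " " + c.size());   // all start empty: 0 0 0
        }
    }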

3. The put method

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }
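The return contract described in the Javadoc is easy to observe before diving into the implementation: the first put of a key returns null, and putting the same key again returns the value that was replaced.

    import java.util.HashMap;
    import java.util.Map;

    public class PutReturnDemo {
        public static void main(String[] args) {
            Map<String, Integer> map = new HashMap<>();
            System.out.println(map.put("k", 1));   // null: no previous mapping for "k"
            System.out.println(map.put("k", 2));   // 1: the old value is replaced and returned
            System.out.println(map.get("k"));      // 2
        }
    }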

Step into the putVal method:

    /**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

putVal first declares four local variables:

Node<K,V>[] tab; Node<K,V> p; int n, i;

Then comes the first check:

        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;

When table is null or its length is 0, resize() is called; the result is assigned to tab, and tab's length is assigned to n.
Let's see what resize() does:

    /**
     * Initializes or doubles table size.  If null, allocates in
     * accord with initial capacity target held in field threshold.
     * Otherwise, because we are using power-of-two expansion, the
     * elements from each bin must either stay at same index, or move
     * with a power of two offset in the new table.
     *
     * @return the table
     */
    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

resize() initializes the table or doubles its size. If the table is null, it is allocated according to the initial capacity target held in the threshold field. Otherwise, because power-of-two expansion is used, the elements in each bin must either stay at the same index or move by a power-of-two offset in the new table.
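That last sentence is the heart of how resize() redistributes a bin: when the capacity doubles, the single extra hash bit exposed by the larger mask decides whether a node stays at its old index (the "lo" list) or moves exactly oldCap slots higher (the "hi" list). A small sketch of that index rule, using a made-up hash value:

    public class ResizeIndexDemo {
        // Where a node lands after the table grows from oldCap to 2 * oldCap:
        // bit (hash & oldCap) clear -> same index; set -> old index + oldCap.
        static int newIndex(int hash, int oldCap, int oldIndex) {
            return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
        }

        public static void main(String[] args) {
            int oldCap = 16;
            int hash = 0b1_0110;                    // hypothetical hash value (22)
            int oldIndex = hash & (oldCap - 1);     // 6
            System.out.println(newIndex(hash, oldCap, oldIndex));   // 22, i.e. 6 + oldCap, because bit 16 of the hash is set
        }
    }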

(1) The first call to put

Debugging in IDEA shows what resize() does the first time put is called:
[Figure: IDEA debug view of resize() during the first put]
From the debug session we can tell that on the first put, HashMap initializes the table using the default initial capacity of 16 and the default load factor of 0.75, producing a table with capacity 16 and threshold 12 (16 × 0.75 = 12), which resize() then returns.

Back in putVal, the second check follows:

        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);

The second check uses a bitwise AND, i = (n - 1) &amp; hash, and uses the result as the index into the tab array. In our debug session the result happened to be 14, i.e., the 15th array slot; in other words, this check asks whether the bucket at that index is still unoccupied. If it is, newNode is called.
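Before looking at newNode, it is worth seeing where an index like 14 comes from. The snippet below reproduces JDK 8's hash spreading (hash() XORs the high 16 bits into the low 16) together with the index expression; the key itself is made up, and only the shape of the computation matters:

    public class BucketIndexDemo {
        // Same spreading as HashMap.hash() in JDK 8.
        static int hash(Object key) {
            int h;
            return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
        }

        public static void main(String[] args) {
            int n = 16;                              // current table capacity
            Object key = "someKey";                  // hypothetical key
            int index = (n - 1) & hash(key);         // same expression as in putVal
            System.out.println("bucket index = " + index);   // always within [0, 15] for n = 16
        }
    }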

    /*
     * The following package-protected methods are designed to be
     * overridden by LinkedHashMap, but not by any other subclass.
     * Nearly all other internal methods are also package-protected
     * but are declared final, so can be used by LinkedHashMap, view
     * classes, and HashSet.
     */

    // Create a regular (non-tree) node
    Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {
        return new Node<>(hash, key, value, next);
    }

A new Node object is constructed from the key and value being put, and it is stored at tab[14], the bucket that was just found to be empty.
Back in putVal:
modCount is incremented; then size is incremented and compared against the threshold, and if the threshold is exceeded the table is resized. Finally afterNodeInsertion is called, whose implementation in HashMap is empty:

void afterNodeInsertion(boolean evict) { }

Finally putVal returns null, and put passes that result back to the caller.

(2) The second put, and puts before the threshold is reached

When the second key-value pair is put, HashMap again computes a bucket index from the new key's hash and checks whether that slot is empty; if it is, a new Node is created and placed there. Continuing to put entries this way, by the time the 12th key-value pair is inserted the debugger shows:
[Figure: IDEA debug view of the table after the 12th put]
Notice that the occupied positions are not filled in ascending order; several indices in between are skipped. That is because each slot is chosen by (n - 1) &amp; hash of the key rather than by insertion order; we will keep this observation in mind.
Back to our putVal method:

(3) When the number of stored mappings reaches the threshold

Let's see what happens when the number of key-value pairs stored in the HashMap reaches the threshold.
After the new key-value pair has been placed into its bucket, size exceeds the threshold and resize() is called once more:
[Figure: IDEA debug view of resize() triggered by crossing the threshold]
In this resize() call, both the new capacity and the new threshold are twice the old values (with the defaults, 16 becomes 32 and 12 becomes 24):
[Figure: IDEA debug view of the doubled capacity and threshold]
The next article will continue reading the HashMap source code.

Appendix: the HashMap class-level Javadoc, with notes

The class-level comment on HashMap in JDK 8 tells us the following:
Hash table based implementation of the Map interface. This
implementation provides all of the optional map operations, and permits
null values and the null key. (The HashMap
class is roughly equivalent to Hashtable, except that it is
unsynchronized and permits nulls.) This class makes no guarantees as to
the order of the map; in particular, it does not guarantee that the order
will remain constant over time.

In short: HashMap is roughly equivalent to Hashtable, except that it is unsynchronized and permits null keys and values, and it makes no guarantee about iteration order or that the order stays stable over time.

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

In short: assuming the hash function disperses keys well across the buckets, get and put run in constant time, while iterating the collection views takes time proportional to the capacity (number of buckets) plus the size (number of mappings). So if iteration performance matters, do not set the initial capacity too high or the load factor too low.

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

In short: the capacity is the number of buckets and the load factor measures how full the table may get before it grows; once the number of entries exceeds load factor × capacity, the table is rehashed (its internal structure rebuilt) into roughly twice as many buckets.

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

In short: the default load factor of 0.75 is a good time/space tradeoff; higher values save space but make lookups (and therefore most operations, including get and put) more expensive. Pick the initial capacity with the expected number of entries and the load factor in mind so that rehashing is minimized; if the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash will ever occur.
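A minimal pre-sizing sketch of that last rule (the helper name is mine, not a JDK API): choosing a capacity hint of expected entries divided by the load factor, rounded up, keeps the threshold from ever being crossed while those entries are inserted, so no rehash happens.

    import java.util.HashMap;
    import java.util.Map;

    public class PresizeDemo {
        // Hypothetical helper: a capacity hint large enough that expectedSize entries never trigger a resize.
        static <K, V> Map<K, V> newPresizedMap(int expectedSize) {
            int capacityHint = (int) Math.ceil(expectedSize / 0.75);   // expected entries / load factor
            return new HashMap<>(capacityHint);
        }

        public static void main(String[] args) {
            Map<String, Integer> m = newPresizedMap(100);
            for (int i = 0; i < 100; i++) {
                m.put("k" + i, i);          // all inserts proceed without a rehash
            }
            System.out.println(m.size());   // 100
        }
    }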

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same {@code hashCode()} is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are {@link Comparable}, this class may use comparison order among keys to help break ties.

In short: when many mappings are to be stored, creating the map with a sufficiently large capacity is more efficient than letting it grow by repeated rehashing. Many keys sharing the same hashCode() will always degrade performance, but when the keys are Comparable the class can use their comparison order to break ties.
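To illustrate that last point, here is a small sketch (the BadKey class is made up for the demonstration): every key reports the same hashCode, so all entries collide into a single bin, but because the key type is Comparable the treeified bin can still be searched by comparison order rather than by a linear scan.

    import java.util.HashMap;
    import java.util.Map;

    public class CollidingKeyDemo {
        static final class BadKey implements Comparable<BadKey> {
            final int id;
            BadKey(int id) { this.id = id; }
            @Override public int hashCode() { return 42; }   // every key collides into the same bin
            @Override public boolean equals(Object o) {
                return o instanceof BadKey && ((BadKey) o).id == id;
            }
            @Override public int compareTo(BadKey other) { return Integer.compare(id, other.id); }
        }

        public static void main(String[] args) {
            Map<BadKey, Integer> map = new HashMap<>();
            for (int i = 0; i < 10_000; i++) {
                map.put(new BadKey(i), i);
            }
            System.out.println(map.get(new BadKey(1234)));   // lookups stay fast despite total collision
        }
    }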

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map.

In short: HashMap is not synchronized. If multiple threads access it concurrently and at least one modifies it structurally (adding or removing mappings, as opposed to merely changing the value stored for an existing key), external synchronization is required, typically on some object that naturally encapsulates the map.

If no such object exists, the map should be “wrapped” using the
{@link Collections#synchronizedMap Collections.synchronizedMap}
method. This is best done at creation time, to prevent accidental
unsynchronized access to the map:


Map m = Collections.synchronizedMap(new HashMap(…));

In short: if no such object exists, wrap the map with Collections.synchronizedMap, ideally at creation time, to prevent accidental unsynchronized access.

The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a {@link ConcurrentModificationException}. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.

This class is a member of the Java Collections Framework.
