HashMap Source Code Walkthrough

Rcfee

Article link: https://blog.csdn.net/xiazaizhuanyong1231/article/details/113345881

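JDK 1.8

Summary

The elements of a bucket are initially stored as a linked list, whose lookup cost is O(n); a tree structure improves lookup to O(log n).

A bin is converted to a red-black tree when its linked list reaches 8 nodes and the table length is at least 64; when the number of nodes in a tree bin drops to 6, it is converted back to a plain bin (linked list).

HashMap is essentially an array. Each array slot is called a bucket (bin; also hash bucket or storage bucket), and the number of buckets is the map's capacity. Each bucket holds a singly linked list (or a red-black tree), with one node linked to the next.

Accessing an array element by index is O(1), while traversing a linked list is O(n); so as long as the lists stay as short as possible, HashMap lookups are close to O(1).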

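In JDK 1.7, a new Entry was inserted at the head of the bucket's linked list (head insertion), reportedly because the designers considered recently inserted entries more likely to be looked up. JDK 1.8 instead appends new nodes at the tail of the list, as the putVal code further down shows.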

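In resize(), the do loop copies the old table into the new one. I thought something as simple as the following would be enough, so why is so much code written? I can't work it out and would appreciate an explanation from anyone who knows.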

// if the index is unchanged after resizing
if ((e.hash & oldCap) == 0) {
    newTab[j] = oldTab[j];
// if it changes, add the old table length
}else{
    newTab[j + oldCap] =  oldTab[j];
}
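
One possible answer to the question above, as a minimal sketch: nodes that share a bucket in the old table need not share a bucket in the new one, because they can differ in the bit hash & oldCap, so a bucket cannot be moved as a whole. The class name BucketSplitDemo and the sample hashes are assumptions chosen for illustration; only the index formula hash & (capacity - 1) comes from the HashMap source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class BucketSplitDemo {
    // Groups hashes by bucket index, using the same formula as HashMap: hash & (capacity - 1).
    static Map<Integer, List<Integer>> group(int[] hashes, int capacity) {
        Map<Integer, List<Integer>> buckets = new TreeMap<>();
        for (int h : hashes) {
            buckets.computeIfAbsent(h & (capacity - 1), k -> new ArrayList<>()).add(h);
        }
        return buckets;
    }

    public static void main(String[] args) {
        int[] hashes = {3, 19, 35, 51};        // all four share bucket 3 when the capacity is 16
        System.out.println(group(hashes, 16)); // a single bucket
        System.out.println(group(hashes, 32)); // split across bucket 3 and bucket 19
    }
}
```

With the shortcut newTab[j] = oldTab[j], hashes 19 and 51 would stay in bucket 3 of the new table, where a lookup using hash & 31 would look in bucket 19 and never find them.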

Translation of the source code comments

This map usually acts as a binned (bucketed) hash table, but
when bins get too large, they are transformed into bins of
TreeNodes, each structured similarly to those in
java.util.TreeMap. Most methods try to use normal bins, but
relay to TreeNode methods when applicable (simply by checking
instanceof a node).  Bins of TreeNodes may be traversed and
used like any others, but additionally support faster lookup
when overpopulated. However, since the vast majority of bins in
normal use are not overpopulated, checking for existence of
tree bins may be delayed in the course of table methods.

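Explanation: in most cases the map is a binned (bucketed) hash table, but when a bin holds too many elements it is transformed into a bin of TreeNodes, structured much like java.util.TreeMap (a red-black tree).
Most methods work on normal bins (linked lists) and hand off to TreeNode methods when applicable, which they detect simply by checking whether a node is an instance of TreeNode. Tree bins can be traversed and used like any other bin, but they also support faster lookup when overpopulated. Because the vast majority of bins are not overpopulated in normal use, the check for tree bins may be deferred in the table methods.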

Tree bins (i.e., bins whose elements are all TreeNodes) are
ordered primarily by hashCode, but in the case of ties, if two
elements are of the same "class C implements Comparable<C>",
type then their compareTo method is used for ordering. (We
conservatively check generic types via reflection to validate
this -- see method comparableClassFor).  The added complexity
of tree bins is worthwhile in providing worst-case O(log n)
operations when keys either have distinct hashes or are
orderable, Thus, performance degrades gracefully under
accidental or malicious usages in which hashCode() methods
return values that are poorly distributed, as well as those in
which many keys share a hashCode, so long as they are also
Comparable. (If neither of these apply, we may waste about a
factor of two in time and space compared to taking no
precautions. But the only known cases stem from poor user
programming practices that are already so slow that this makes
little difference.)

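Explanation: nodes in a tree bin are ordered primarily by hashCode; when hashes tie and both elements are of the same class C that implements Comparable<C>, their compareTo method is used for ordering
(to be conservative, generic types are checked via reflection to validate this; see the method comparableClassFor). The extra complexity of tree bins is worthwhile because it gives worst-case O(log n) operations when keys either have distinct hashes or are orderable. Performance therefore degrades gracefully when hashCode() methods, by accident or by malice, return poorly distributed values, and also when many keys share a hashCode, so long as the keys are Comparable. (If neither applies, about a factor of two in time and space may be wasted compared with taking no precautions, but the only known cases come from poor programming practices that are already so slow that this makes little difference.)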

Because TreeNodes are about twice the size of regular nodes, we
use them only when bins contain enough nodes to warrant use
(see TREEIFY_THRESHOLD). And when they become too small (due to
removal or resizing) they are converted back to plain bins.  In
usages with well-distributed user hashCodes, tree bins are
rarely used.  Ideally, under random hashCodes, the frequency of
nodes in bins follows a Poisson distribution
(http://en.wikipedia.org/wiki/Poisson_distribution) with a
parameter of about 0.5 on average for the default resizing
threshold of 0.75, although with a large variance because of
resizing granularity. Ignoring variance, the expected
occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
factorial(k)). The first values are:
0:    0.60653066
1:    0.30326533
2:    0.07581633
3:    0.01263606
4:    0.00157952
5:    0.00015795
6:    0.00001316
7:    0.00000094
8:    0.00000006
more: less than 1 in ten million

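TreeNodes are about twice the size of regular nodes, so bins are converted to TreeNodes only when they contain enough nodes (see TREEIFY_THRESHOLD). When a tree bin becomes too small (through removal or resizing), it is converted back to a plain linked-list bin. With well-distributed hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the number of nodes per bin follows a Poisson distribution.
With the default load factor of 0.75, the parameter is about 0.5 on average, although the variance is large because of resizing granularity. Ignoring variance, the expected frequency of a list of size k is exp(-0.5) * pow(0.5, k) / factorial(k). The probability that a bin holds exactly 8 nodes is 0.00000006, and the probability of more than 8 is below one in ten million, so it almost never happens.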
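
The table above can be reproduced directly from the formula in the comment. The following is a small sketch; the class and method names are made up for this example.

```java
final class PoissonBinSizes {
    // Probability that a bin holds exactly k nodes, using the formula quoted in the
    // HashMap source comment: exp(-0.5) * pow(0.5, k) / factorial(k).
    static double probability(int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-0.5) * Math.pow(0.5, k) / factorial;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, probability(k));
        }
    }
}
```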

The root of a tree bin is normally its first node.  However,
sometimes (currently only upon Iterator.remove), the root might
be elsewhere, but can be recovered following parent links
(method TreeNode.root()).

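The root of a tree bin is normally its first node. Sometimes, however (currently only upon Iterator.remove), the root may be elsewhere, but it can be recovered by following parent links (method TreeNode.root()).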

All applicable internal methods accept a hash code as an
argument (as normally supplied from a public method), allowing
them to call each other without recomputing user hashCodes.
Most internal methods also accept a "tab" argument, that is
normally the current table, but may be a new or old one when
resizing or converting.

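All applicable internal methods accept a hash code as an argument (normally supplied by a public method), which lets them call one another without recomputing user hashCodes.
Most internal methods also accept a "tab" argument, which is normally the current table but may be a new or old one during resizing or conversion.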

When bin lists are treeified, split, or untreeified, we keep
them in the same relative access/traversal order (i.e., field
Node.next) to better preserve locality, and to slightly
simplify handling of splits and traversals that invoke
iterator.remove. When using comparators on insertion, to keep a
total ordering (or as close as is required here) across
rebalancings, we compare classes and identityHashCodes as
tie-breakers.

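When bin lists are treeified, split, or untreeified, they are kept in the same relative access/traversal order (that is, the field Node.next) to better preserve locality and to slightly simplify the handling of splits and traversals that invoke iterator.remove. When comparators are used on insertion, classes and identityHashCodes are compared as tie-breakers to keep a total ordering (or as close to one as is required here) across rebalancings.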

The use and transitions among plain vs tree modes is
complicated by the existence of subclass LinkedHashMap. See
below for hook methods defined to be invoked upon insertion,
removal and access that allow LinkedHashMap internals to
otherwise remain independent of these mechanics. (This also
requires that a map instance be passed to some utility methods
that may create new nodes.)

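The use of, and transitions between, plain and tree modes are complicated by the existence of the subclass LinkedHashMap.
See further down in the source for the hook methods that are invoked upon insertion, removal and access; they allow the LinkedHashMap internals to otherwise remain independent of these mechanics. (This also requires that a map instance be passed to some utility methods that may create new nodes.)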

HashMap method source code

The put operation

public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

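The hash method computes the hash of the key. A null key hashes to 0, so it always lands in bucket 0; and because key comparison treats null as equal to null, the map can hold at most one entry with a null key.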

static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

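The key part is: (h = key.hashCode()) ^ (h >>> 16)

h >>> 16 shifts the high 16 bits of the key's hashCode into the low 16 bits; XORing that with the original hashCode makes the result more evenly spread. XOR outputs 0 where the corresponding bits are equal and 1 where they differ.

The bucket index is computed as hash & (length - 1), and length is smaller than 2^16 in the vast majority of cases, so only the low 16 bits of the hashCode (or even fewer) would take part. If the high 16 bits also took part, the resulting indices would be spread more evenly.

Without this step the high 16 bits would go unused. The hash(Object key) method brings them in by XORing hashCode() with its own high 16 bits: (h >>> 16) extracts the high 16 bits, which are then XORed with hashCode().

Why ^ rather than & or |?
Because & biases the result bits toward 0 and | biases them toward 1, so neither gives an even distribution; ^ keeps 0 and 1 equally likely.

That is why hash(Object key) exists.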
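
To make the effect concrete, here is a minimal sketch comparing bucket indices with and without the spreading step. The class HashSpreadDemo and the two sample hash codes are assumptions picked for illustration; spread repeats the expression from hash(Object key).

```java
final class HashSpreadDemo {
    // Same expression as in HashMap.hash(Object), applied to an int hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table whose length n is a power of two.
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int a = 0x10000; // a and b differ only in their high 16 bits
        int b = 0x20000;
        // Without spreading, both hash codes map to the same bucket.
        System.out.println(index(a, n) + " " + index(b, n));
        // With spreading, the high bits influence the index and the buckets differ.
        System.out.println(index(spread(a), n) + " " + index(spread(b), n));
    }
}
```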

get(Object key)

public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

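get first computes the hash of the key and then compares hash values in the Node<K,V> entries; if the hashes are equal it compares the keys, and if those are equal too it returns the node's value. The getNode method is as follows: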

final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // if the table is non-empty and the target bucket holds a node
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            // always check first node: first test whether the head node is the one being looked for;
            // compare the hash first, then the key
            if (first.hash == hash && 
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                // if the bucket is a red-black tree, search the tree
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                // otherwise walk the remaining nodes of the list looking for the key
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

containsKey also calls getNode and checks whether the result is null:

public boolean containsKey(Object key) {
        return getNode(hash(key), key) != null;
    }
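
A practical consequence of the two methods above, sketched with a made-up class name: get returns null both when the key is missing and when the key is mapped to null, whereas containsKey tells the two cases apart because it tests the node, not the value.

```java
import java.util.HashMap;
import java.util.Map;

final class NullValueLookupDemo {
    static Map<String, String> sample() {
        Map<String, String> map = new HashMap<>();
        map.put("present", null); // HashMap permits null values
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = sample();
        System.out.println(map.get("present"));         // null: the key exists, its value is null
        System.out.println(map.get("absent"));          // null: the key does not exist
        System.out.println(map.containsKey("present")); // true
        System.out.println(map.containsKey("absent"));  // false
    }
}
```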

Next, the put() method that adds elements:

public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

putVal()

Moving on to putVal: its Javadoc explains the 4th and 5th parameters:

* @param onlyIfAbsent if true, don't change existing value // if true, an existing value is not overwritten (put normally overwrites, so it passes false);
* @param evict if false, the table is in creation mode. // if false, the table is in creation mode; the table here is HashMap's member field transient Node<K,V>[] table;
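
The onlyIfAbsent flag is visible from the public API: in JDK 8, put passes false and putIfAbsent passes true. A minimal sketch of the difference follows; the class name OnlyIfAbsentDemo is an assumption for this example.

```java
import java.util.HashMap;
import java.util.Map;

final class OnlyIfAbsentDemo {
    static Map<String, Integer> run() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("a", 2);         // onlyIfAbsent = false: the value is overwritten, a=2
        map.putIfAbsent("a", 3); // onlyIfAbsent = true: the existing non-null value is kept
        map.put("b", null);
        map.putIfAbsent("b", 4); // the old value is null, so it is replaced even with onlyIfAbsent = true
        return map;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The last call corresponds to the condition !onlyIfAbsent || oldValue == null in putVal.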

   /**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
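        // tab: the bucket array; p: the node at the target bucket; n: array length; i: bucket index
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // if the table has not been allocated yet, or has length 0 (no data has been stored so far)
        if ((tab = table) == null || (n = tab.length) == 0)
            // allocate it via resize()
            n = (tab = resize()).length;
        // if the bucket at index (n - 1) & hash is empty
        if ((p = tab[i = (n - 1) & hash]) == null)
            // create a new node and place it directly in that bucket
            tab[i] = newNode(hash, key, value, null);
        // the bucket already holds at least one node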
        else {
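            // e: the existing node whose key matches (if any); k: the key of the node under examination
            Node<K,V> e; K k;
            // if the hashes are equal and (the keys are the same reference, or the key is non-null and equals() is true)
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                // remember the head node for the value update further down
                e = p;
            // if this bucket holds a red-black tree
            else if (p instanceof TreeNode)
                // insert using the tree logic; e is the existing node for this key, or null if a new node was added
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            // the head node does not match and the bucket is not a tree: walk the linked list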
            else {
                for (int binCount = 0; ; ++binCount) {
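                    // reached the end of the list without finding the key (e is null here)
                    if ((e = p.next) == null) {
                        // append the new node at the tail
                        p.next = newNode(hash, key, value, null);
                        // binCount is 7 or more here only if the bin already held 8 nodes before this append
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            // try to treeify; treeifyBin also checks whether the table length is below 64:
                            // if it is, the table is resized instead; otherwise the bin becomes a red-black tree
                            treeifyBin(tab, hash);
                        break;
                    }
                    // a node with an equal key was found: leave the loop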
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // a node with the same key was found in the steps so far
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                // if overwriting is allowed, or the old value is null, replace the old value with the new one
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                // return the previous value
                return oldValue;
            }
        }

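        // modCount counts structural modifications; iterators compare it with the value they
        // recorded at creation and throw ConcurrentModificationException if it has changed
        ++modCount;
        // a new mapping was added (no existing key matched), so the stored element count is incremented
        // if it exceeds the threshold, the table is resized
        // both the initial allocation and every resize use a power-of-two capacity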
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
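
The role of modCount in putVal can be observed from outside. The sketch below (class and method names are assumptions) relies on two facts visible in the code above: adding a new key reaches ++modCount, while overwriting an existing key returns earlier and leaves modCount untouched.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

final class FailFastDemo {
    private static Map<String, Integer> twoEntries() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        return map;
    }

    // Adds new keys while iterating; returns true if the iterator detects the change.
    static boolean structuralChangeDuringIteration() {
        Map<String, Integer> map = twoEntries();
        try {
            for (String key : map.keySet()) {
                map.put(key + "-copy", 0); // new mapping: modCount is incremented
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // Overwrites existing keys while iterating; returns true if the iterator complains.
    static boolean valueUpdateDuringIteration() {
        Map<String, Integer> map = twoEntries();
        try {
            for (String key : map.keySet()) {
                map.put(key, 0); // existing key: putVal returns before ++modCount
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(structuralChangeDuringIteration());
        System.out.println(valueUpdateDuringIteration());
    }
}
```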

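Next, let's look at how the value is actually stored. The main methods involved are:

1. Initializing the table: resize();

2. No node at the index, create a new node: newNode(hash, key, value, null);

3. If the bucket is a red-black tree, use the tree insert: ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);

4. If the treeify condition is met, convert to a red-black tree: treeifyBin(tab, hash);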

resize()

    /**
     * Initializes or doubles table size.  If null, allocates in
     * accord with initial capacity target held in field threshold.
     * Otherwise, because we are using power-of-two expansion, the
     * elements from each bin must either stay at same index, or move
     * with a power of two offset in the new table.
     *
     * @return the table
     */
    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
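        int oldThr = threshold; // threshold: once size exceeds this value, the table must be resized
        int newCap, newThr = 0;
        // the table has already been allocated
        if (oldCap > 0) {
            // MAXIMUM_CAPACITY is 1 << 30, that is, 2 to the power of 30
            // the table cannot grow beyond this capacity
            if (oldCap >= MAXIMUM_CAPACITY) {
                // set the threshold to the int maximum 0x7fffffff (2^31 - 1) so that no further resize is triggered
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // double the capacity; if the new capacity is below the maximum and the old capacity is >= the default (1 << 4 = 16)
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                // double the threshold as well
                newThr = oldThr << 1; // double threshold
        }
        // table not yet allocated, but a constructor with an initial capacity stored that capacity in threshold
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        // table not yet allocated and no initial capacity given: use the default capacity; threshold = default capacity * load factor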
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
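            // if the new capacity and capacity * loadFactor are both below MAXIMUM_CAPACITY, use that product as the threshold; otherwise use Integer.MAX_VALUE
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        // store the new threshold in the field
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
            // allocate the new bucket array with the new capacity
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        // assign the new array to the member field table
        table = newTab;
        // then migrate the entries of the old table into the new one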
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    // next == null means the bucket holds a single node; put it straight into the new table at index e.hash & (newCap - 1)
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e; // note 3
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) { // note 4
                                if (loTail == null)  // note 5
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)   // note 6
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {    // note 7
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {    // note 8
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

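Note 4: in JDK 1.7, resize() used head insertion, so the order of a list was reversed after resizing; for example, 3->5->7 became 7->5->3. Under concurrent use this could also produce a circular linked list: if thread A finished resizing and the list was already reversed, thread B, which had paused while about to process a node, could end up linking the nodes into a cycle.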
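
The reversal caused by head insertion can be shown with a simplified sketch. This is not the JDK 1.7 code: the Node class and the method transferWithHeadInsertion are hypothetical stand-ins that only mimic the idea of moving each node to the head of the new list.

```java
final class HeadInsertionDemo {
    // Minimal singly linked node; a hypothetical stand-in for the JDK's internal entry class.
    static final class Node {
        final int key;
        Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    // Moves every node to the head of a new list, which reverses the original order.
    static Node transferWithHeadInsertion(Node head) {
        Node newHead = null;
        Node e = head;
        while (e != null) {
            Node next = e.next;
            e.next = newHead; // link in front of the nodes moved so far
            newHead = e;
            e = next;
        }
        return newHead;
    }

    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) sb.append("->");
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node list = new Node(3, new Node(5, new Node(7, null)));
        System.out.println(render(transferWithHeadInsertion(list))); // 7->5->3
    }
}
```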

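JDK 1.8 changed this method substantially. The index of a node in the new table is computed as follows:

At note 3: e.hash & (newCap - 1), where newCap = oldCap << 1 = oldTab.length * 2. The index formula is therefore hash & (oldTab.length * 2 - 1), that is, hash & (oldTab.length + oldTab.length - 1). This mask has exactly one more bit than the old mask oldTab.length - 1, so the new index is either the old index or the old index plus the old table length.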

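So what does (e.hash & oldCap) == 0 at note 4 mean? The explanation is as follows:

Suppose oldCap = 16 and newCap = 32. The index of a hash before and after resizing is then computed like this:

Write the low six bits of the hash as abcdef. The old mask oldCap - 1 is 01111 and the new mask newCap - 1 is 11111, so the low 4 bits of the index (cdef) are always the same in the old table and the new one. The only difference is the 5th bit, b: if b is 0 the new index equals the old index; if b is 1 the new index is larger than the old one by 10000 (binary), which is exactly the old table length.

So whether the new index needs the old table length added depends only on whether the 5th bit of the hash is 1. In bit arithmetic, that is hash & 10000 (binary), which can only yield 10000 or 00000. Note 4 performs exactly this test on bit b: if it is 0 the node keeps its old index; otherwise the old table length is added.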
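
The rule just described can be checked by brute force. A short sketch, with a made-up class name, that compares the directly computed new index with "old index plus 0 or oldCap" for a range of hashes:

```java
final class SplitBitCheck {
    // Returns true if, for every hash in [0, limit), the new index equals the old index
    // plus either 0 or oldCap, as selected by the bit (hash & oldCap).
    static boolean holdsFor(int oldCap, int limit) {
        int newCap = oldCap << 1;
        for (int hash = 0; hash < limit; hash++) {
            int oldIndex = hash & (oldCap - 1);
            int newIndex = hash & (newCap - 1);
            int expected = (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
            if (newIndex != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(holdsFor(16, 1 << 16));
    }
}
```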

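Notes 5 and 6 follow the same logic with different variables, which separate the nodes whose new index stays the same from those that need the old table length added. Suppose the original list is 3->5->7 and all indices stay the same:

First iteration: e=3, next = e.next = 5; loTail is null, so loHead=e=3 and loTail=e=3;

Second iteration: after the first pass of the do..while, e=next=5 and next=e.next=7; loTail is not null, so loTail.next=e=5 and loTail=5;

Third iteration, likewise: e=7, next=null (7 is the last node of the list, so next is null), loTail.next=7, loTail=7;

So the resulting list keeps the original order and is never reversed.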

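The code at notes 7 and 8 does the following:

If loTail is not null, some nodes of the list keep their index in the new table, so loHead is placed at the same index of the new table and loTail.next is set to null.

Likewise, if hiTail is not null, some nodes of the list belong at the old index plus the old table length, so hiHead is placed at that index of the new table and hiTail.next is set to null.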
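
The lo/hi split can be tried out in isolation. The sketch below mirrors the loop of resize() on a hypothetical minimal Node class that carries only a hash; it is an illustration of the technique, not the JDK code itself.

```java
final class LoHiSplitDemo {
    // Hypothetical minimal node holding only a hash and a next pointer.
    static final class Node {
        final int hash;
        Node next;
        Node(int hash, Node next) { this.hash = hash; this.next = next; }
    }

    // Splits one bucket's list into {lo, hi}, keeping the relative order within each part.
    static Node[] split(Node head, int oldCap) {
        Node loHead = null, loTail = null;
        Node hiHead = null, hiTail = null;
        Node e = head;
        while (e != null) {
            Node next = e.next;
            if ((e.hash & oldCap) == 0) {      // index unchanged
                if (loTail == null) loHead = e; else loTail.next = e;
                loTail = e;
            } else {                           // index + oldCap
                if (hiTail == null) hiHead = e; else hiTail.next = e;
                hiTail = e;
            }
            e = next;
        }
        if (loTail != null) loTail.next = null; // cut any stale link to a hi node
        if (hiTail != null) hiTail.next = null; // cut any stale link to a lo node
        return new Node[] {loHead, hiHead};
    }

    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) sb.append("->");
            sb.append(e.hash);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node list = new Node(3, new Node(19, new Node(35, new Node(51, null))));
        Node[] parts = split(list, 16);
        System.out.println("lo: " + render(parts[0])); // stays at index 3
        System.out.println("hi: " + render(parts[1])); // moves to index 3 + 16
    }
}
```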

newNode simply creates a new Node:

    Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {
        return new Node<>(hash, key, value, next);
    }

For background on TreeNode, see the article "HashMap中TreeNode解读" (TreeNode in HashMap explained).

HashMap's resizing mechanism

tableSizeFor

    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
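    }

"|" is bitwise OR: a result bit is 1 if either input bit is 1, and 0 otherwise. Suppose the value passed in is 8; the analysis is as follows:

int n = cap - 1; // the value is 7, binary 0000 0111

n |= n >>> 1; means n = n | (n >>> 1). Here n >>> 1 is 0000 0011, so the result is still 0000 0111; the remaining shifts leave it unchanged, and n + 1 gives 8.

In the end, the computation sets every bit below the highest 1 bit of cap - 1 to 1 (for n > 0 the lowest bit is then 1, so the decimal value is odd), and the final + 1 turns the result into a power of two: the smallest power of two greater than or equal to the value passed in.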
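
To see the rounding behaviour for a few inputs, the method can be copied into a small standalone class. TableSizeForDemo is a made-up wrapper; the method body and MAXIMUM_CAPACITY = 1 << 30 are taken from the source shown in this article.

```java
final class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of the JDK 8 method: rounds cap up to the next power of two.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] caps = {0, 1, 7, 8, 9, 17, 1000};
        for (int cap : caps) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

Subtracting 1 first is what keeps an exact power of two such as 8 from being doubled to 16.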

treeifyBin

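treeifyBin converts a linked list into a red-black tree. The method shows that, besides the list length reaching 8, the table length must also be greater than or equal to MIN_TREEIFY_CAPACITY before a bin is treeified; MIN_TREEIFY_CAPACITY is 64.

tab[index = (n - 1) & hash] takes the largest valid index of the table, n - 1, and ANDs it with the hash. A bit of an AND result is 1 only when both input bits are 1, so the result can never exceed the largest index, which also rules out an out-of-bounds array access. In general, for a non-negative value a, a & b is always between 0 and a.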
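
The bound on (n - 1) & hash also holds for negative hashes. As a small sketch under the assumption that n is a power of two (the class name and sample values are invented here), the masked index matches Math.floorMod(hash, n) and always lies in [0, n):

```java
final class IndexMaskDemo {
    // Bucket index as computed in HashMap for a power-of-two table length n.
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 64;
        int[] hashes = {0, 1, 63, 64, 12345, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int h : hashes) {
            // For a power-of-two n the mask yields the same value as Math.floorMod(h, n).
            System.out.println(h + " -> " + index(h, n) + " (floorMod: " + Math.floorMod(h, n) + ")");
        }
    }
}
```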

    /**
     * Replaces all linked nodes in bin at index for given hash unless
     * table is too small, in which case resizes instead.
     */
    final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        // if the table is null or its length is below 64, resize instead of treeifying
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            resize();
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            TreeNode<K,V> hd = null, tl = null;
            do {
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    hd = p;
                else {
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;
            } while ((e = e.next) != null);
            if ((tab[index] = hd) != null)
                hd.treeify(tab);
        }
    }
