Analyzing HashMap from the Source Code


搬砖工-->攻城狮

Category columns: Daily Work, Collections

Article link: https://blog.csdn.net/procedure_monkey/article/details/86085667


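Preface
Java collections matter a great deal in Java development, and HashMap is the centerpiece among them, so understanding how HashMap is implemented is important.
Descriptions of HashMap's storage structure are easy to find with a quick search, so I won't repeat them here. Instead, this article starts from the HashMap source code (the code quoted here matches the JDK 8 implementation) and walks through how HashMap actually stores its data.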

Initialization: new HashMap()
Before you can use a HashMap, you first have to new one:

Map<String,String> map = new HashMap<String,String>();

This is the most common way to create a HashMap. So what does it quietly do behind our backs when we new it? To find out, we need to look at its constructor.

public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

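The code shows that all it does is assign a value to the field loadFactor. So what are loadFactor and DEFAULT_LOAD_FACTOR? Let's keep following the source.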

 final float loadFactor;
 static final float DEFAULT_LOAD_FACTOR = 0.75f;

So default initialization of a HashMap does just one thing: it sets the instance field loadFactor to 0.75f.

Once the HashMap has been created, the next step is to put values into it, that is, to call put.

map.put("keyValue","values");

First insertion: put("XX", "XXX")

Next, let's look at the source of put.

 public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }
 final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

The source performs a whole series of checks. To make it easier to read, I've extracted only the code that actually runs on the first put:

public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }
 final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else { }
        ++modCount;
        if (++size > threshold)
           ;
        return null;
    }

After extraction, the first-put code is much shorter. Let's go through it line by line.

Node<K,V>[] tab; Node<K,V> p; int n, i;

This declares a Node array variable, a Node variable, and two int variables. What exactly is Node? The main part of the Node class is shown below.

static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
   }

Now back to the putVal source:

 transient Node<K,V>[] table;
 if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;

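This assigns the instance field table to the local variable tab (no new array is created here). HashMap does not assign table during construction, so on the first put table is null; the condition holds and n = (tab = resize()).length; is executed.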

2.1. First-time table initialization: resize()
The source of resize() is as follows:

final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

As before, I've extracted just the code that runs when the table is initialized for the first time, to make the analysis easier.

 int threshold;
 final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) { }
        else if (oldThr > 0) { }
        else {    
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) { }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {};
        return newTab;
    }

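The key is to keep in mind that this is the first call. On the first call table is null, so oldCap = 0 and oldThr = threshold = 0 (threshold is an instance field of HashMap; as an int that has not been assigned yet, it defaults to 0). Execution falls through to: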

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
static final float DEFAULT_LOAD_FACTOR = 0.75f;
///
 newCap = DEFAULT_INITIAL_CAPACITY;
 newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);

The assignment uses two static constants, which I've listed above.
newCap = 1 << 4, that is, 10000 in binary = 16 in decimal.
newThr = (int)(0.75f × 16) = 12.
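The arithmetic above can be checked with a few lines of Java. This is only a sketch: the class name DefaultSizing and the method defaultThreshold are made up for illustration, while the two constants carry the same values as the HashMap constants quoted above.

```java
class DefaultSizing {
    // Same values as the HashMap constants quoted above.
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Hypothetical helper: the threshold computed in the default branch of resize().
    static int defaultThreshold() {
        return (int) (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(DEFAULT_INITIAL_CAPACITY)); // 10000
        System.out.println(DEFAULT_INITIAL_CAPACITY);                         // 16
        System.out.println(defaultThreshold());                               // 12
    }
}
```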
Execution then continues:

 threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {};
        return newTab;

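Here the instance field threshold (it is a field, not a constant) is assigned for the first time: threshold = newThr = (int)(0.75f × 16) = 12. Then a Node array of length newCap = 16 is created and returned. Now we go back to putVal.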

n = (tab = resize()).length;

At this point tab is a Node array of length 16 and n = 16. That completes the initial allocation of the table, and execution continues:

if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);

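This is a good place to explain how the array index is computed during put. In the source the formula is (n - 1) & hash: the array length minus one, bitwise ANDed with the hash of the key. Take put("来来来", "leanHash") as an example. The key is the string "来来来", whose hashCode() is 26283717. The hash(key) call in put XORs the high 16 bits of that value into the low 16 bits, giving 26283860, which is 0001 1001 0001 0000 1111 0101 0100 in binary. With n = 16, the index i = (n - 1) & hash is:
0001 1001 0001 0000 1111 0101 0100 26283860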
0000 0000 0000 0000 0000 0000 1111 15
&
0000 0000 0000 0000 0000 0000 0100 4
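As you can see, the resulting index depends only on the lowest four bits of the hash. In other words, no matter what the hash of the key is, the computed index is always between 0 and 15.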
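The index computation can be reproduced outside of HashMap. The sketch below assumes the JDK 8 form of HashMap.hash(Object), which XORs the high 16 bits of hashCode() into the low 16 bits before the mask is applied; the class name HashIndexDemo and the method names spread and indexFor are made up for illustration.

```java
class HashIndexDemo {
    // Mirrors HashMap.hash(Object) in JDK 8: fold the high 16 bits into the low 16 bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table whose length n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        String key = "\u6765\u6765\u6765"; // the string "来来来", escaped to avoid encoding issues
        int hash = spread(key);
        System.out.println(key.hashCode());                // 26283717
        System.out.println(hash);                          // 26283860
        System.out.println(Integer.toBinaryString(hash));  // 1100100010000111101010100
        System.out.println(indexFor(hash, 16));            // 4
    }
}
```

Because n - 1 is 1111 in binary when n = 16, only the lowest four bits of the spread hash survive the AND, which is why the index always lands in 0 to 15.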

Back to the code: why is the check if ((p = tab[i = (n - 1) & hash]) == null) needed? Because indexes computed this way can repeat, which is a hash collision. For example:
0001 1001 0001 0001 1111 0101 0100
0001 1001 0001 0010 1111 0101 0100
0001 1001 0001 0100 1111 0111 0100
ANDing each of these three hashes with
0000 0000 0000 0000 0000 0000 1111
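gives 4 in every case.
How does HashMap deal with this? That comes later. Right now we're looking at the first put, where no collision can occur, so tab[i] = newNode(hash, key, value, null) is executed. Since the instance field table and the local variable tab refer to the same Node array, this also sets table[i].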

The source of newNode is:

 Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {
        return new Node<>(hash, key, value, next);
    }

It simply creates a new Node instance.
Back in putVal, execution continues to:

 ++modCount;
        if (++size > threshold)
            ;
        return null;

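modCount is incremented by 1, and so is size. threshold was assigned when the table was initialized: threshold = newThr = (int)(0.75f × 16) = 12. Why is this comparison here? It decides when the table needs to grow. Resizing is covered below, so I won't go into it here. That completes the first put.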

From the second put until the first resize
Again, here is just the code that runs in this case:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
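        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0) { } // false now, but it still assigns tab and n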
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode){}
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                  e.value = value;
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        return null;
    }

Let's go through it step by step:

if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
++modCount;
if (++size > threshold)
   resize();
   return null;

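The array index for the key is computed. If nothing has been stored at that index yet, that is, the slot is null, a node created by newNode(hash, key, value, null) is placed there directly, and modCount and size are each incremented by 1. (newNode was covered above, so I won't repeat it.)
If the slot already holds data, we have a hash collision. There are two cases: either the key being put has been put before, or the key is new but maps to the same array index as an existing key. Let's first look at how an already-present key is handled: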

 if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
  if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                  e.value = value;
                return oldValue;
            }

The code shows that when the key already exists, the new value overwrites the old one and the old value is returned.
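This behavior is easy to observe through the public API. The following is a minimal sketch using java.util.HashMap directly; the class name PutReturnDemo and the helper putTwice are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class PutReturnDemo {
    // Hypothetical helper: puts the same key twice and reports what each put returned,
    // followed by the value finally stored and the map size.
    static Object[] putTwice(String key, String first, String second) {
        Map<String, String> map = new HashMap<>();
        String r1 = map.put(key, first);   // key is new: put returns null
        String r2 = map.put(key, second);  // key exists: put returns the old value
        return new Object[] { r1, r2, map.get(key), map.size() };
    }

    public static void main(String[] args) {
        Object[] r = putTwice("keyValue", "values", "newValues");
        System.out.println(r[0]); // null
        System.out.println(r[1]); // values
        System.out.println(r[2]); // newValues
        System.out.println(r[3]); // 1
    }
}
```

The size stays at 1 because an overwrite returns from inside the if (e != null) branch, before ++size is ever reached.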
Now the case where the key is new but its index collides:

for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
 if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                return oldValue;
            }
++modCount;
if (++size > threshold)
   resize();
   return null;

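It's worth restating HashMap's storage structure here: put simply, HashMap stores linked lists inside an array. Now for the code above. When there is a collision and the key differs from the key of the first node, HashMap takes the node at that position and checks its next field. If next is null, it creates a Node with newNode(hash, key, value, null) and links it there, forming a linked list. If next is not null, it checks whether the key in next equals the current key; if so, the value is overwritten and the old value is returned. Otherwise it keeps walking down the list until it finds a null next and appends the new Node there.
As for counting: when a key repeats and its value is overwritten, the counter (size) stays the same; when a new key is added, the counter is incremented by 1.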
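To make the "linked lists inside an array" picture concrete, here is a deliberately simplified sketch of the same put logic. It is not the JDK code: the class ChainedMapSketch and its nested Node are hypothetical, the table is fixed at 16 slots, and resizing, treeification, and modCount are all left out.

```java
class ChainedMapSketch<K, V> {
    // Simplified stand-in for HashMap.Node; not the JDK class.
    static final class Node<K, V> {
        final int hash; final K key; V value; Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16]; // fixed length: no resize here
    private int size;

    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    V put(K key, V value) {
        int hash = hash(key);
        int i = (table.length - 1) & hash;
        Node<K, V> p = table[i];
        if (p == null) {                       // empty slot: store the node directly
            table[i] = new Node<>(hash, key, value, null);
            size++;
            return null;
        }
        while (true) {
            if (p.hash == hash && (p.key == key || (key != null && key.equals(p.key)))) {
                V old = p.value;               // same key: overwrite, size unchanged
                p.value = value;
                return old;
            }
            if (p.next == null) {              // end of chain: append a new node
                p.next = new Node<>(hash, key, value, null);
                size++;
                return null;
            }
            p = p.next;
        }
    }

    V get(Object key) {
        int hash = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & hash]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || (key != null && key.equals(e.key)))) return e.value;
        }
        return null;
    }

    int size() { return size; }

    public static void main(String[] args) {
        ChainedMapSketch<String, String> m = new ChainedMapSketch<>();
        m.put("Aa", "first");   // "Aa" and "BB" both have hashCode 2112, so they share a bucket
        m.put("BB", "second");
        System.out.println(m.put("Aa", "third")); // first
        System.out.println(m.get("BB"));          // second
        System.out.println(m.size());             // 2
    }
}
```

The strings "Aa" and "BB" are used because they are a well-known pair with equal hashCode() values, which forces the chain-walking branch to run.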
Why keep a count? Because of resizing. So what is resizing, and why is it needed?

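Resizing: when the number of entries put into the HashMap exceeds 0.75 times the length of the node array, HashMap allocates a new node array twice as long as the old one and moves the entries from the old array into it. This process is called resizing.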

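Why is resizing needed?
Once more, HashMap stores data as linked lists inside an array. The benefit is that the array index (derived from the hash) quickly locates the list containing a key, and from there the value for that key. Suppose the node array has length 16 and, without resizing, we put 100 key-value pairs into it. Many collisions are inevitable, so the lists in the array grow longer and longer and lookups get slower and slower. Resizing solves this problem; it trades memory for speed.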
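The effect can be estimated with a small simulation. This sketch assumes idealized hash values 0 to 99 (for small Integer keys the spread hash equals the value itself) and only counts how many of them land in each bucket; the class name ChainLengthDemo and the method longestChain are made up for illustration.

```java
class ChainLengthDemo {
    // Longest chain when the hashes 0..count-1 are distributed over `capacity` buckets.
    // `capacity` must be a power of two, as in HashMap.
    static int longestChain(int count, int capacity) {
        int[] bucketSizes = new int[capacity];
        int longest = 0;
        for (int hash = 0; hash < count; hash++) {
            int i = (capacity - 1) & hash;
            longest = Math.max(longest, ++bucketSizes[i]);
        }
        return longest;
    }

    public static void main(String[] args) {
        System.out.println(longestChain(100, 16));  // 7: a lookup may walk up to 7 nodes
        System.out.println(longestChain(100, 256)); // 1: every key has its own bucket
    }
}
```

256 is the table length a default HashMap ends up with after 100 insertions, because the thresholds 12, 24, 48, and 96 are each exceeded along the way.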

Resizing: resize()
The complete source of resize() is shown above; what follows is the code that runs when the call is not the initial allocation:

final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) 
            newCap = oldThr;
        else { }
        if (newThr == 0) { }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

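Let's go through it piece by piece, taking the second call to resize() as the example (the first real expansion, from 16 to 32):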

final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table; // Node array of length 16
        int oldCap = 16;
        int oldThr = 12;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) { }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        threshold = newThr;
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode) { }
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

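In this code, once the checks confirm that the table must grow, a new node array of length 32 is created and the entries from the old array of length 16 are moved into it.
The line newCap = oldCap << 1 shows that each resize doubles the array: a left shift by one (<< 1) is equivalent to multiplying by 2.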
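As a quick illustration of the doubling, the sketch below replays the two shifts, newCap = oldCap << 1 and newThr = oldThr << 1, starting from the default 16 and 12. It assumes the capacity stays far below MAXIMUM_CAPACITY, so the special cases in resize() never apply; the class name GrowthSequence and the method after are made up for illustration.

```java
class GrowthSequence {
    // Returns {capacity, threshold} after `resizes` doublings, starting from the defaults 16 / 12.
    static int[] after(int resizes) {
        int cap = 16, thr = 12;
        for (int k = 0; k < resizes; k++) {
            cap <<= 1; // newCap = oldCap << 1
            thr <<= 1; // newThr = oldThr << 1
        }
        return new int[] { cap, thr };
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 3; k++) {
            int[] r = after(k);
            System.out.println(r[0] + " / " + r[1]);
        }
        // prints:
        // 16 / 12
        // 32 / 24
        // 64 / 48
        // 128 / 96
    }
}
```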
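How is the data in the old node array moved to the new one?
if (e.next == null) newTab[e.hash & (newCap - 1)] = e; If the node's next is null, the slot holds a single node rather than a list, so its new index is computed directly and the node is placed there.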

When the slot holds a linked list:

Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
do {
    next = e.next;
    if ((e.hash & oldCap) == 0) {
        if (loTail == null)
           loHead = e;
        else
          loTail.next = e;
     loTail = e;
   }
   else {
      if (hiTail == null)
         hiHead = e;
      else
        hiTail.next = e;
      hiTail = e;
   }
} while ((e = next) != null);
if (loTail != null) {
   loTail.next = null;
   newTab[j] = loHead;
}
if (hiTail != null) {
   hiTail.next = null;
   newTab[j + oldCap] = hiHead;
}

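The code looks simple, but the index computation can be confusing at first. Let's see how the code does it. j is the index of the current bucket in the old array. The code first ANDs the hash with the old array length. If the result is 0, the node keeps the same index in the new node array as in the old one; otherwise its index in the new array is the old index plus the old array length. Let's verify this.
First, pick two hash values that collide: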
hash1：0001 1001 0001 0000 1111 0100 0100
hash2：0001 1001 0001 0000 1111 0101 0100
With a node array of length 16, their indexes are computed as hash & (length - 1):

hash1
0001 1001 0001 0000 1111 0100 0100 hash1
0000 0000 0000 0000 0000 0000 1111 (length-1)=15
&
0000 0000 0000 0000 0000 0000 0100 4

hash2
0001 1001 0001 0000 1111 0101 0100 hash2
0000 0000 0000 0000 0000 0000 1111 (length-1)=15
&
0000 0000 0000 0000 0000 0000 0100 4

Both hashes give index 4.

After the array is resized to length 32, let's compute the positions of the two hashes again:
hash1
0001 1001 0001 0000 1111 0100 0100 hash1
0000 0000 0000 0000 0000 0001 1111 (length-1)=31
&
0000 0000 0000 0000 0000 0000 0100 4

hash2
0001 1001 0001 0000 1111 0101 0100 hash2
0000 0000 0000 0000 0000 0001 1111 (length-1)=31
&
0000 0000 0000 0000 0000 0001 0100 20
So after resizing, hash1 is still at position 4, while hash2 has moved to position 20.
Next, compute hash & length:
hash1
0001 1001 0001 0000 1111 0100 0100 hash1
0000 0000 0000 0000 0000 0001 0000 (length)=16
&
0000 0000 0000 0000 0000 0000 0000 0

hash2
0001 1001 0001 0000 1111 0101 0100 hash2
0000 0000 0000 0000 0000 0001 0000 (length)=16
&
0000 0000 0000 0000 0000 0001 0000 16 (non-zero)

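The logic in the code is: if hash & oldLength (16) equals 0, the position stays the same after resizing to 32 [hash1]; otherwise the position in the new array is the old position plus the old array length, that is, 4 + 16 = 20 [hash2].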
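The same check can be run mechanically. The sketch below compares the split rule used in resize() with a direct recomputation of hash & (newCap - 1), first for the two example hashes and then for a range of values. The class name SplitCheck and both method names are made up for illustration.

```java
class SplitCheck {
    // New index according to the lo/hi split in resize(): stay at j, or move to j + oldCap.
    static int newIndexBySplit(int hash, int oldCap) {
        int j = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    // New index recomputed from scratch against the doubled table.
    static int newIndexByMask(int hash, int oldCap) {
        return hash & ((oldCap << 1) - 1);
    }

    public static void main(String[] args) {
        int hash1 = 0b0001_1001_0001_0000_1111_0100_0100;
        int hash2 = 0b0001_1001_0001_0000_1111_0101_0100;
        System.out.println(newIndexBySplit(hash1, 16)); // 4
        System.out.println(newIndexBySplit(hash2, 16)); // 20

        // The two computations agree for every hash tried here.
        for (int hash = 0; hash < (1 << 16); hash++) {
            if (newIndexBySplit(hash, 16) != newIndexByMask(hash, 16)) {
                throw new IllegalStateException("mismatch at " + hash);
            }
        }
        System.out.println("split rule matches the mask");
    }
}
```

The reason they agree is that doubling the table adds exactly one bit to the mask, and that bit has the value oldCap. So the only question for each node is whether that one bit of its hash is set, which lets resize() split a chain without recomputing any index.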

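Summary
1: new HashMap() does not allocate the node array; it only sets the parameters the array will need. The node array is allocated on the first put. With the default constructor, that array has length 16.
2: When the number of key-value pairs exceeds 0.75 times the length of the node array, HashMap resizes, doubling the array length.
3: A practical tip: try to avoid resizes when using HashMap. The constructor that takes an initial capacity lets us choose the initial size.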
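As an illustration of the last point, here is one common way to pick the constructor argument when the number of entries is known in advance. This is a sketch, not a rule from the source: the helper initialCapacityFor is hypothetical, and it assumes the default load factor of 0.75. In the JDK 8 source the requested capacity is rounded up to a power of two, so the real table may be larger than the value passed in.

```java
import java.util.HashMap;
import java.util.Map;

class PresizeDemo {
    // Hypothetical helper: a capacity large enough that `expected` entries
    // do not exceed capacity * 0.75, so no resize is triggered while filling the map.
    static int initialCapacityFor(int expected) {
        return (int) (expected / 0.75f) + 1;
    }

    public static void main(String[] args) {
        int expected = 100;
        int capacity = initialCapacityFor(expected);
        System.out.println(capacity); // 134

        Map<Integer, String> map = new HashMap<>(capacity);
        for (int i = 0; i < expected; i++) {
            map.put(i, "value" + i);
        }
        System.out.println(map.size()); // 100
    }
}
```

With a requested capacity of 134 the table is allocated once, on the first put, instead of growing through 16, 32, 64, 128, and 256.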