HashMap (Part 1): HashMap Source Code Analysis

Elson_6

Last modified: 2022-10-24 15:17:29

Category: Data Structures and Algorithms. Article tags: data structures

First published: 2019-09-30 11:09:40

Article link: https://blog.csdn.net/Love667767/article/details/80791773

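1. Overview

Versions: JDK 1.7 & JDK 1.8

1.1 Key concepts of HashMap

Before JDK 1.8, HashMap was implemented as an array plus linked lists. When a new entry maps to a bucket that is already occupied (a hash collision), it is inserted at the head of that bucket's list (head insertion).
If the entries are unevenly distributed and many of them land in the same bucket, that bucket's list grows long. In that case the HashMap degenerates into a singly linked list, and since traversing a list takes O(n) time, the map loses its advantage of fast lookup.
To address this, JDK 1.8 introduced red-black trees (lookup takes O(log n) time), so the JDK 1.8 HashMap is implemented as an array plus linked lists plus red-black trees.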

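1.2 Key HashMap variables

Variable	Term	Description
capacity (table.length)	Capacity	Number of buckets in the table; 16 by default and always a power of two.
size	Size	Number of key-value mappings currently stored in the HashMap.
threshold	Threshold	Number of stored entries at which the table must be resized. `( threshold = capacity * loadFactor )`
loadFactor	Load factor	Load factor of the HashMap; 0.75 by default.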
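
As a quick check of the threshold formula, here is a minimal sketch that computes the resize threshold for two capacities. The class and method names are made up for illustration; only the formula itself (table length times load factor) comes from the HashMap source.

```java
final class ThresholdDemo {
    /** threshold = capacity * loadFactor, truncated to an int (hypothetical helper). */
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // With the defaults (16 buckets, load factor 0.75) the threshold is 12.
        System.out.println(threshold(16, 0.75f)); // 12
        // Doubling the number of buckets doubles the threshold.
        System.out.println(threshold(32, 0.75f)); // 24
    }
}
```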

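1.3 HashMap data structure

The figure below shows the HashMap data structures in JDK 1.7 and JDK 1.8.
[Figure: HashMap data structures in JDK 1.7 and JDK 1.8]

2. Relationships within the Map family

[Figure: class diagram of the Map family]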

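3. HashMap source code analysis

Next we walk through the JDK 1.7 and JDK 1.8 versions of HashMap and look at how they differ.

3.1 JDK 1.7

For JDK 1.7 we focus on the following methods:

hash(): computes the hash value of a key.
put(K key, V value): stores an entry.
resize(): grows the table.
get(Object key): looks up a value.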

3.1.1 hash(Object key)

final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }
    h ^= k.hashCode();

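    // Several shift-and-XOR steps mix the high bits into the low bits so that entries spread as evenly as possible across the buckets.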
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

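// AND the value returned by hash() with (table length - 1) to get the index of the bucket the entry goes into.
// The table length is a power of two (2^n), so the low n bits of (length - 1) are all 1 in binary.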
static int indexFor(int h, int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
    return h & (length-1);
}

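Summary:

hash() applies several shift-and-XOR steps so that both the high and the low bits of hashCode() influence the result. The later hash & (length-1) in indexFor() keeps only the low bits, so this mixing helps entries spread as evenly as possible across the buckets. See hash() in JDK 1.8 for details.

Reference: "A Brief Analysis of the hash() Function and the tableSizeFor Function"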
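
To make the role of the power-of-two length concrete, here is a small sketch built around the indexFor() expression. The bucketsUsed() helper and the class name are assumptions made for illustration. It shows that the mask behaves like a non-negative modulo when the length is a power of two, and that a non-power-of-two length would leave most buckets unreachable.

```java
import java.util.HashSet;
import java.util.Set;

final class IndexMaskDemo {
    /** Same expression as indexFor() in the JDK 1.7 source. */
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    /** Counts how many distinct buckets the hashes 0..999 reach (hypothetical helper). */
    static int bucketsUsed(int length) {
        Set<Integer> used = new HashSet<>();
        for (int h = 0; h < 1000; h++) {
            used.add(indexFor(h, length));
        }
        return used.size();
    }

    public static void main(String[] args) {
        // For a power-of-two length the mask equals a non-negative modulo, even for negative hashes.
        System.out.println(indexFor(-7, 16) == Math.floorMod(-7, 16)); // true
        // length 16: the mask is 1111, so every bucket is reachable.
        System.out.println(bucketsUsed(16)); // 16
        // length 10: the mask is 1001, so only indexes 0, 1, 8 and 9 are reachable.
        System.out.println(bucketsUsed(10)); // 4
    }
}
```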

3.1.2 put(K key, V value)

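Related methods:

inflateTable(threshold): initializes the table array.
putForNullKey(V value): both key and value may be null; an entry whose key is null is stored in table[0].
addEntry(int hash, K key, V value, int bucketIndex): checks whether a resize is needed, then adds the Entry.
createEntry(int hash, K key, V value, int bucketIndex): adds the Entry using head insertion.

The put(K key, V value) flow is analyzed below: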

public V put(K key, V value) {
    /** 1. If the table is still empty, allocate it (16 buckets by default). */
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }

    /** 2. A null key goes into the list at table[0]. */
    if (key == null) {
        return putForNullKey(value);
    }

    /** 3. Hash the key and compute the bucket index i for this entry. */
    int hash = hash(key);
    int i = indexFor(hash, table.length);

    // 4. Walk the list in bucket i:
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        // 4.1 If the key is already present, replace its value and return the old value.
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    // Modification count, used for fail-fast iteration (like modCount in ArrayList).
    modCount++;

    /** 4.2 If the key is not present, add it via addEntry. */
    addEntry(hash, key, value, i);
    return null;
}

/**
 * Allocates the table; the default initial capacity is 16 (2^4).
 * roundUpToPowerOf2() returns the smallest power of two that is greater than or equal to its argument (3 -> 4, 5 -> 8), because the bucket count must always be a power of two.
 */
private void inflateTable(int toSize) {
    // Find a power of 2 >= toSize
    int capacity = roundUpToPowerOf2(toSize);
    // Threshold: at most capacity * loadFactor entries (capacity * 0.75 by default) before the table is doubled.
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    // Allocate the bucket array.
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity);
}

/**
 * Integer.highestOneBit((number - 1) << 1) plays the same role as tableSizeFor() in the JDK 1.8 HashMap.
 */
private static int roundUpToPowerOf2(int number) {
    // assert number >= 0 : "number must be non-negative";
    return number >= MAXIMUM_CAPACITY
    	 ? MAXIMUM_CAPACITY 
    	 : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
}

/**
 * Stores the entry whose key is null in table[0]; the value may also be null.
 */
private V putForNullKey(V value) {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        // The key is null; the value may be null as well.
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0);
    return null;
}

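/**
 * Stores the entry; before storing, checks whether the table must be resized.
 */
void addEntry(int hash, K key, V value, int bucketIndex) {
    /** Resize only if the entry count has reached the threshold AND the target bucket is already occupied. */
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // Double the table length and migrate the existing entries.
        resize(2 * table.length);
        // Recompute the hash of the key being inserted.
        hash = (null != key) ? hash(key) : 0;
        // Recompute the bucket index in the new table.
        bucketIndex = indexFor(hash, table.length);
    }
    /** In either case, create the entry and put it at the head of table[bucketIndex] (head insertion). */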
    createEntry(hash, key, value, bucketIndex);
}

/**
 * Head insertion
 * Step 1: read the entry e currently stored at index bucketIndex;
 * Step 2: create the new entry with e as its successor (next = e) and store it in table[bucketIndex].
 */
void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    // size is incremented for every entry added.
    size++;
}

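Summary:

put(K key, V value) adds new entries using head insertion.
put(K key, V value) accepts a null key and null values.
put(K key, V value) locates the bucket in table[] through hash() and indexFor().
The resize check runs before the new entry is added: the insertion that brings size up to the threshold does not trigger a resize. A later insertion does, provided its target bucket is already occupied.
With JDK 1.7 head insertion, concurrent resizes can turn the list in a bucket into a circular list. For the cause, see the article on HashMap problems under high concurrency.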
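
Head insertion is easy to see in isolation. The sketch below is a stripped-down stand-in, not the JDK class: the Entry type keeps only a key and a next pointer, and the chain() helper is hypothetical. It assumes that keys A, B and C all map to bucket 1.

```java
final class HeadInsertDemo {
    /** Minimal stand-in for the JDK 1.7 Entry; hash and value are left out. */
    static final class Entry {
        final String key;
        final Entry next;
        Entry(String key, Entry next) { this.key = key; this.next = next; }
    }

    /** Mirrors createEntry(): the new entry becomes the head and points at the old head. */
    static void createEntry(Entry[] table, String key, int bucketIndex) {
        Entry e = table[bucketIndex];
        table[bucketIndex] = new Entry(key, e);
    }

    /** Renders one bucket's list as "X -> Y -> Z" (hypothetical helper). */
    static String chain(Entry[] table, int bucketIndex) {
        StringBuilder sb = new StringBuilder();
        for (Entry e = table[bucketIndex]; e != null; e = e.next) {
            if (sb.length() > 0) sb.append(" -> ");
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Entry[] table = new Entry[4];
        // Pretend A, B and C all hash to bucket 1.
        createEntry(table, "A", 1);
        createEntry(table, "B", 1);
        createEntry(table, "C", 1);
        System.out.println(chain(table, 1)); // C -> B -> A
    }
}
```

The most recently added key sits at the head of the bucket, which is the property the concurrency problem mentioned above depends on.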

Flow chart of the put operation:
[Figure: flow chart of put in JDK 1.7]

3.1.3 resize()

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry[] newTable = new Entry[newCapacity];
    // Migrate the entries to the new table.
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

/**
 * Transfers all entries from current table to newTable.
 */
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
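    // Iterate over every bucket of the old table.
    for (Entry<K,V> e : table) {
        // Walk every entry in the bucket's list.
        while(null != e) {
            Entry<K,V> next = e.next;
            // Recompute the hash only if the hash seed changed.
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            /**
             * Head insertion
             * Step 1: link the entry in front of whatever newTable[i] already holds.
             * Step 2: store the entry in newTable[i], making it the new head of the list.
             */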
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}

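Summary:

A resize allocates a new Entry[] twice as long as the old one.
It iterates over the old table and calls indexFor() again to find each entry's position in newTable[].
Each entry is added to newTable[] by head insertion, so entries that stay together in one bucket end up in reverse order.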

3.1.4 get(Object key)

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }
    // 1. Hash the key.
    int hash = (key == null) ? 0 : hash(key);
    // 2. Use indexFor() to find the bucket in table[].
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        // 3. Walk the list at table[i] and compare hash and key; return the Entry on a match.
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

Summary:

Hash the key and use indexFor() to find the bucket in table[].
Walk the list at table[i] and compare both the hash and the key; if both match, return that Entry.
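
The need to compare the key as well as the hash shows up with keys whose hashCode() values collide. The sketch below uses the real java.util.HashMap with the strings "Aa" and "BB", which have the same hashCode. The class and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class CollidingKeysDemo {
    /** Builds a map whose two keys share one hashCode and therefore one bucket. */
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        Map<String, Integer> map = build();
        // equals() on the key tells the two entries apart inside the shared bucket.
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
    }
}
```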

That concludes the analysis of the JDK 1.7 HashMap. Next, we look at JDK 1.8.

3.2 JDK 1.8

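We focus on the following methods:

hash(): computes the hash value of a key.
put(K key, V value): stores an entry.
resize(): grows the table.
get(Object key): looks up a value.
tableSizeFor(int cap): returns the smallest power of two that is greater than or equal to cap.
treeifyBin(Node<K,V>[] tab, int hash): converts the linked list in a bucket into a red-black tree.

3.2.1 hash(Object key)

Similar to JDK 1.7; see the article "A Brief Analysis of the hash Function and the tableSizeFor Function".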

/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower. 
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
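
To see what the XOR with h >>> 16 buys, the sketch below copies the hash() expression into a standalone class (the class name and the choice of keys are assumptions for illustration). It compares the bucket index with and without spreading for two Integer keys that differ only in their high bits.

```java
final class SpreadDemo {
    /** Same expression as hash() in the JDK 1.8 source. */
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // table length
        Integer a = 0x10000, b = 0x20000; // Integer.hashCode() is the int value itself
        // Without spreading, only the low 4 bits count, so both keys map to bucket 0.
        System.out.println((a.hashCode() & (n - 1)) + " " + (b.hashCode() & (n - 1))); // 0 0
        // After XORing in the high 16 bits they map to different buckets.
        System.out.println((hash(a) & (n - 1)) + " " + (hash(b) & (n - 1))); // 1 2
    }
}
```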

3.2.2 put(K key, V value)

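Related methods:

resize(): grows the table.
newNode(): wraps the key and value in a Node object (called Entry in JDK 1.7).
treeifyBin(): called when a bucket's list grows beyond 8 nodes; it converts the list into a red-black tree if the table has at least 64 buckets, and resizes the table otherwise.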

public V put(K key, V value) {
    // Spread the key's hashCode() with hash().
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // 1. If table[] is empty, allocate it (default initial capacity 16).
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length; // resize() also performs the initial allocation
    // 2. Compute the key's index in tab[]; if tab[index] is empty, add the Node directly.
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // 3. The first node already holds this key; its value is overwritten below.
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // 4. The bucket holds a red-black tree.
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // 5. The bucket holds a linked list.
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // The list already held 8 nodes before this append: treeify the bucket (or resize if the table is still small).
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Found the key in the list; e now refers to the matching Node.
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // 6. If the number of entries now exceeds the threshold, resize.
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {
    return new Node<>(hash, key, value, next);
}

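Summary:

Node and TreeNode:

Node is an ordinary linked-list node.
TreeNode is a red-black tree node.

Execution flow of put:

Hash the key.
Check whether the table has been initialized; if not, allocate the array.
Compute the key's index in table[]; all later steps operate on table[index].
If table[index] == null, add the new Node directly at table[index].
If table[index] != null, the node there may be an ordinary list Node or a TreeNode. If the key was added before and sits in the first node of table[index], its value can be assigned directly whichever type that node is (both node types behave the same at the head of a bucket).
If table[index] != null and the key is not in the first node, two things matter: the node type, and whether the key was added before.
1. The red-black tree case is checked first: if the key is absent, a new TreeNode is added; if it is present, the existing TreeNode is returned. For details, see "HashMap Source Analysis: putTreeVal (Red-Black Tree Part)".
2. If the first node of table[index] is not a TreeNode, it is an ordinary list Node, so the list at table[index] is traversed to look for the key.
  a. If a node with the same key is found, that Node is returned.
  b. If the end of the list is reached without a match, a new Node is appended at the tail.
If an existing Node was found, its value is updated and the old value is returned.
After a new Node has been added, the table is resized if the number of entries exceeds the threshold. Unlike JDK 1.7, the check runs after the insertion and does not depend on whether the target bucket was occupied.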
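
The return values described in the flow above can be observed through the public API. The sketch below uses the real java.util.HashMap; the class name and the run() helper are made up for the example. As far as the JDK 8 source goes, putIfAbsent() takes the same putVal() path with onlyIfAbsent set to true, which is why it leaves an existing non-null value alone.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PutSemanticsDemo {
    /** Collects the return values of a few put-style calls (hypothetical helper). */
    static List<Integer> run() {
        Map<String, Integer> map = new HashMap<>();
        List<Integer> results = new ArrayList<>();
        results.add(map.put("a", 1));         // null: no previous mapping
        results.add(map.put("a", 2));         // 1: old value returned, value overwritten
        results.add(map.putIfAbsent("a", 3)); // 2: existing value returned and kept
        results.add(map.get("a"));            // 2
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [null, 1, 2, 2]
    }
}
```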

Flow chart of the put operation:
[Figure: flow chart of put in JDK 1.8]

3.2.3 resize()

/**
 * Initializes or doubles table size.  
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
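    // Capacity of the current table (0 if the table has not been allocated yet).
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    // Threshold of the current table.
    int oldThr = threshold;
    int newCap, newThr = 0;
    // oldCap > 0: the table already exists, so this is a real resize rather than the initial allocation.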
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) { // MAXIMUM_CAPACITY = 1 << 30
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Double the capacity.
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {  // zero initial threshold signifies using defaults
        // On first use the capacity defaults to DEFAULT_INITIAL_CAPACITY = 1 << 4.
        newCap = DEFAULT_INITIAL_CAPACITY;
        // Threshold = (1 << 4) * 0.75 = 12
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // Migrate the entries stored in the old table.
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // Stays at the original index.
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // Moves to original index + oldCap.
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Put the low list into the bucket at the original index.
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Put the high list into the bucket at original index + oldCap.
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
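
The lo/hi split in resize() rests on one bit of the hash. Here is a minimal sketch of that rule, with newIndex() as a hypothetical helper that is not part of the JDK. It checks that testing (hash & oldCap) gives the same result as recomputing hash & (newCap - 1).

```java
final class SplitIndexDemo {
    /** New index after doubling, computed the way resize() splits a bucket (hypothetical helper). */
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);      // index in the old table
        if ((hash & oldCap) == 0) {
            return j;                     // "lo" list: stays at j
        }
        return j + oldCap;                // "hi" list: moves to j + oldCap
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // 5 (00101) and 21 (10101) share old bucket 5; bit 4 decides where each one goes.
        System.out.println(newIndex(5, oldCap));  // 5
        System.out.println(newIndex(21, oldCap)); // 21
        // The shortcut agrees with recomputing hash & (newCap - 1) from scratch.
        boolean same = true;
        for (int h = -1000; h <= 1000; h++) {
            same &= newIndex(h, oldCap) == (h & (2 * oldCap - 1));
        }
        System.out.println(same); // true
    }
}
```

Because each node only ever stays put or moves by exactly oldCap, the JDK 1.8 resize can keep the relative order of nodes and does not need to rehash keys.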

3.2.4 get(Object key)

/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 */
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

/**
 * Implements Map.get and related methods
 */
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // (n - 1) & hash gives the index of the bucket that would hold the key.
    if ((tab = table) != null && (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) != null) {
        // If the first node is the match, return it directly.
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // Otherwise search the rest of the bucket, which is either a linked list or a red-black tree.
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

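Summary:

Compute the hash of the key and, from it, the key's index in table[].
Check whether the first node at table[index] is the one being looked for. This is done first because the remaining nodes are searched differently in a red-black tree and in a linked list.
Check whether the first node at table[index] is a TreeNode.
1. If it is a TreeNode, search the red-black tree.
2. If it is an ordinary Node, traverse the linked list.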

3.2.5 tableSizeFor(int cap)

Reference: "A Brief Analysis of the hash Function and the tableSizeFor Function"

public HashMap(int initialCapacity, float loadFactor) {
    // ... omitted ...
    this.loadFactor = loadFactor;
    // Smallest power of two that is greater than or equal to initialCapacity.
    this.threshold = tableSizeFor(initialCapacity);
}
    
/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

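When a HashMap is constructed with an explicit initialCapacity, the capacity must still be a power of two, so this method finds the smallest power of two that is greater than or equal to initialCapacity (if initialCapacity is already a power of two, it is returned unchanged). The result is kept in threshold until the first resize() allocates the table.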
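
A few worked values make the bit-smearing easier to follow. The sketch below copies tableSizeFor() and the JDK 1.7 roundUpToPowerOf2() from the listings in this article into a standalone class (the class name is an assumption) and checks that the two agree.

```java
final class PowerOfTwoDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Copied from the JDK 1.8 listing. */
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    /** Copied from the JDK 1.7 listing. */
    static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        // 17 - 1 = 16 = 10000b; OR-ing in the shifted copies fills the low bits to 11111b = 31; 31 + 1 = 32.
        System.out.println(tableSizeFor(17)); // 32
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(5));  // 8
        System.out.println(tableSizeFor(3));  // 4
        System.out.println(tableSizeFor(1));  // 1
        // The JDK 1.7 helper rounds the same way.
        System.out.println(roundUpToPowerOf2(17)); // 32
    }
}
```

Subtracting 1 before smearing is what keeps an exact power of two such as 16 from being rounded up to 32.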

3.2.6 `treeifyBin(Node<K,V>[] tab, int hash)`

/**
 * Replaces every linked-list node in the bucket with a red-black tree node.
 * Replaces all linked nodes in bin at index for given hash unless
 * table is too small, in which case resizes instead.
 */
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
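    // If the table is null or its length is below the minimum capacity for treeification (MIN_TREEIFY_CAPACITY, 64 by default), allocate/resize instead.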
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
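        // The table is large enough, so treeify the bucket.
        // e starts at the head of the list in the bucket at this index.
        TreeNode<K,V> hd = null, tl = null; // head and tail of the new TreeNode list
        do {
            // Create a TreeNode with the same contents as the list node e.
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null) // the first TreeNode becomes the head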
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        // Point the bucket at the new head, then build the red-black tree; from now on the bucket holds a tree instead of a list.
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}
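
Putting putVal() and treeifyBin() together, the outcome of appending a node to a list bucket depends on two numbers: the list length before the append and the table length. The sketch below models only that decision; the Action enum, the afterAppend() helper and the class name are hypothetical, and the two constants mirror the JDK 1.8 values.

```java
final class TreeifyDecisionDemo {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { NONE, RESIZE, TREEIFY }

    /**
     * Decides what follows the append of one node to a list bucket (hypothetical helper).
     * nodesBefore is the list length before the append.
     */
    static Action afterAppend(int nodesBefore, int tableLength) {
        if (nodesBefore < TREEIFY_THRESHOLD) {
            return Action.NONE;      // putVal() does not call treeifyBin()
        }
        if (tableLength < MIN_TREEIFY_CAPACITY) {
            return Action.RESIZE;    // table too small: treeifyBin() resizes instead
        }
        return Action.TREEIFY;       // the list becomes a red-black tree
    }

    public static void main(String[] args) {
        System.out.println(afterAppend(7, 64)); // NONE
        System.out.println(afterAppend(8, 16)); // RESIZE
        System.out.println(afterAppend(8, 64)); // TREEIFY
    }
}
```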

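4. Summary

What data structure underlies HashMap, and what did JDK 1.8 optimize?
Data structure:
- JDK 1.7: array + linked list
- JDK 1.8: array + linked list + red-black tree
Insertion into a bucket's list:
- JDK 1.7: head insertion
- JDK 1.8: tail insertion
What is HashMap's default initial capacity, and why?
1. The initial capacity is 16 (DEFAULT_INITIAL_CAPACITY = 1 << 4).
2. The capacity is always a power of two, mainly to serve the hash algorithm that maps a key to an index.
3. The index is computed as index = hash(key) & (length - 1), so the resulting index depends entirely on the low bits of hash(key). Reference: "A Brief Analysis of the hash() Function and the tableSizeFor Function"
Does HashMap accept null keys and null values?

Yes. The entry whose key is null is stored at table[0].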
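
A short sketch with the real java.util.HashMap shows both cases; the class and method names are made up for the example. One practical consequence is that get() returning null is ambiguous, so containsKey() is needed to tell a null value from a missing key.

```java
import java.util.HashMap;
import java.util.Map;

final class NullKeyDemo {
    /** Builds a map with one null key and one null value. */
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for null key");
        map.put("k", null);
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.get(null));              // value for null key
        System.out.println(map.get("k"));               // null
        System.out.println(map.containsKey("k"));       // true
        System.out.println(map.get("missing"));         // null
        System.out.println(map.containsKey("missing")); // false
    }
}
```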
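Why are several HashMap fields declared transient (excluded from default serialization)?

Fields such as table are declared transient because a key's hashCode can depend on the JVM: Object.hashCode(), for example, may return different values in different virtual machines. If the bucket array were serialized as is, entries could sit in the wrong buckets after deserialization on another JVM. HashMap therefore implements its own writeObject/readObject, which write the key-value pairs and re-insert them on reading so that each entry is hashed again.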
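
Despite the transient fields, a HashMap itself can be serialized. The sketch below round-trips one through the standard object streams; roundTrip() and the class name are assumptions made for illustration, and checked exceptions are wrapped so that the helper stays easy to call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

final class SerializationDemo {
    /** Serializes the map to a byte array and reads it back (hypothetical helper). */
    @SuppressWarnings("unchecked")
    static <K, V> Map<K, V> roundTrip(HashMap<K, V> map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(map);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Map<K, V>) in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        HashMap<String, Integer> original = new HashMap<>();
        original.put("a", 1);
        original.put(null, 2);
        Map<String, Integer> copy = roundTrip(original);
        System.out.println(copy.equals(original)); // true
        System.out.println(copy != original);      // true: a separate instance
    }
}
```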
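Why can HashMap fall into an infinite loop under high concurrency, and how does it arise?

In JDK 1.7, HashMap relinks entries by head insertion during a resize. When two threads resize concurrently, a bucket's list can become circular; the loop is confined to the list of a single table[index].
Steps that produce the loop (assuming all three entries land in the same bucket of the new table):
Suppose the list at table[i] is: A -> B -> C
Thread1: starts transfer(), reads e = A and next = B, and is then suspended.
Thread2: completes its own transfer(); head insertion reverses the list to C -> B -> A.
Thread1 then resumes with its stale e and next and head-inserts A, then B, then A again (because B.next now points to A).
Thread1: A -> B -> A
As a result, nodes A and B form a ring, and a later traversal of that bucket never terminates.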
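
The interleaving can be replayed deterministically in a single thread. The sketch below is not the JDK code: Entry is a minimal stand-in, the two "threads" are simply two phases of one method, and all three entries are assumed to map to the same bucket of the new table. It runs the head-insertion loop of transfer() in the order described above and checks the result for a cycle.

```java
final class TransferCycleDemo {
    /** Minimal stand-in for the JDK 1.7 Entry. */
    static final class Entry {
        final String key;
        Entry next;
        Entry(String key) { this.key = key; }
    }

    /** Replays one interleaving of two transfer() loops; returns the head of thread 1's list. */
    static Entry replay() {
        Entry a = new Entry("A"), b = new Entry("B"), c = new Entry("C");
        a.next = b;
        b.next = c;                      // shared bucket: A -> B -> C

        // Thread 1 enters the loop, reads e and next, and is suspended.
        Entry e1 = a;
        Entry next1 = e1.next;           // B

        // Thread 2 runs its whole transfer: head insertion reverses the list.
        Entry head2 = null;
        for (Entry e = a; e != null; ) {
            Entry next = e.next;
            e.next = head2;
            head2 = e;
            e = next;
        }                                // thread 2's bucket: C -> B -> A

        // Thread 1 resumes with its stale e1/next1 and its own, still empty, bucket.
        Entry head1 = null;
        e1.next = head1;                 // finish the interrupted iteration for A
        head1 = e1;
        e1 = next1;
        while (e1 != null) {
            Entry next = e1.next;        // for B this is now A, because thread 2 relinked it
            e1.next = head1;
            head1 = e1;
            e1 = next;
        }
        return head1;
    }

    /** Floyd's cycle detection. */
    static boolean hasCycle(Entry head) {
        Entry slow = head, fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Entry head = replay();
        System.out.println(head.key + " -> " + head.next.key + " -> " + head.next.next.key); // A -> B -> A
        System.out.println(hasCycle(head)); // true
    }
}
```

A lookup for a key that is not in this bucket would follow next pointers forever, which is the infinite loop seen in practice. In JDK 1.8 the resize keeps the original order, which avoids this particular cycle, but HashMap is still not safe for concurrent use.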

The code that can lead to the infinite loop is shown in the figure below.