HashMap Source Code Analysis

cblstc

Published 2020-08-13 11:45:09

Article link: https://blog.csdn.net/cblstc/article/details/107976413

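Preface

A long-overdue HashMap summary. It goes without saying how important HashMap is; it is practically a guaranteed interview question. I have read the HashMap source on and off before, but my memory is not what it used to be, so I am writing it down for future review. Rereading the source this time, my first impression is that the design flows naturally and matches how a programmer thinks, to the point that even without reading the source you could roughly describe how some of the operations work. Enough preamble, let's get into it.

Because the HashMap implementation differs between JDK versions, this article uses the 1.7 HashMap as the example to analyze how it works.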

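HashMap Analysis

Data Structure

Array + linked list

The data lives in an array whose elements are of type Entry; each Entry is itself a node of a linked list.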

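Constructor

The HashMap constructor takes an initial capacity and a load factor. If you do not pass them, they default to 16 and 0.75.

Initial capacity: put simply, the length of the array. In 1.7 the array is not allocated in the constructor; the requested capacity is parked in threshold until the first put, when inflateTable rounds it up to a power of two.

Load factor: I think of it as a fill ratio. With a capacity of 16 and a load factor of 0.75, the threshold is 16 * 0.75 = 12; once the map holds 12 elements it is 75% full, and going beyond that triggers a resize.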

public HashMap() {
    // Default capacity 16, load factor 0.75
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        // Larger than the maximum capacity is not allowed, so clamp to the maximum
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

    this.loadFactor = loadFactor;
    threshold = initialCapacity; // threshold temporarily holds the initial capacity until the table is allocated
    init();
}
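
To make the capacity and threshold arithmetic concrete, here is a minimal sketch of what happens when the table is finally allocated. The class name CapacityDemo and the helper roundUpToPowerOfTwo are my own illustration, not the JDK's code; the only things taken from the listings are MAXIMUM_CAPACITY and the threshold formula that also appears in resize.

```java
class CapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Hypothetical helper: smallest power of two >= n, clamped to MAXIMUM_CAPACITY.
    static int roundUpToPowerOfTwo(int n) {
        if (n >= MAXIMUM_CAPACITY) return MAXIMUM_CAPACITY;
        if (n <= 1) return 1;
        // (n - 1) << 1 always has its highest bit at the next power of two >= n.
        return Integer.highestOneBit((n - 1) << 1);
    }

    // Same formula as the threshold update in resize().
    static int threshold(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        int capacity = roundUpToPowerOfTwo(10); // a requested capacity of 10
        System.out.println("capacity=" + capacity
                + " threshold=" + threshold(capacity, 0.75f));
    }
}
```

So asking for a capacity of 10 still yields a 16-slot array with a threshold of 12, which is why the requested capacity and the real array length can differ.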

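Hash Algorithm

HashMap takes the key's hashCode and runs it through one more hash step, commonly called the "perturbation function" (that is the name used online). Its purpose is to mix the high bits into the low bits, so that entries do not spread unevenly once the hash is reduced to an array index. The exact algorithm differs between JDK versions.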

final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
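
A small experiment shows why the extra mixing matters. The sketch below copies only the two shift-and-XOR lines from the listing and assumes hashSeed is 0 and the key is non-null; the class and method names (SpreadDemo, spread17, index) are made up for the demo.

```java
class SpreadDemo {
    // The shift/XOR steps from the 1.7 listing, assuming hashSeed == 0.
    static int spread17(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Same masking as indexFor.
    static int index(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        // Two hash codes that differ only in their high bits.
        int a = 0x10000, b = 0x20000;
        System.out.println("raw:    " + index(a, 16) + " vs " + index(b, 16));
        System.out.println("spread: " + index(spread17(a), 16)
                + " vs " + index(spread17(b), 16));
    }
}
```

Without the perturbation both values fall into the same bucket of a 16-slot table, because the mask only looks at the low 4 bits; after spreading they land in different buckets.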

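Locating the Array Index

The array index is computed from the hash and the array length, and the length must be a power of two. The formula is hash & (length - 1), which for a power-of-two length gives the same result as hash % length when the hash is non-negative, i.e. the hash modulo the array length. Why write it with &? Because the bit operation is cheap, and unlike % it never produces a negative index. Here is how I understand it:

Since the array length is 2^n, its binary form is 10000…0000, and 2^n - 1 is 01111…11111.

So (length - 1) acts as a "low-bit mask" (again, the term used online): hash & (length - 1) throws away the high bits and keeps the low n bits. For example, with a length of 16 the mask is 1111, so only the lowest 4 bits of the hash select the bucket.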

static int indexFor(int h, int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
    return h & (length-1);
}
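
The mask-versus-modulo relationship is easy to check directly. This is only an illustrative sketch (the class name IndexDemo is mine); it compares indexFor with Java's % operator and with Math.floorMod for a positive and a negative hash.

```java
class IndexDemo {
    // Same body as indexFor in the listing above.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // must be a power of two
        for (int h : new int[] {37, -37}) {
            System.out.println("h=" + h
                    + " mask=" + indexFor(h, length)
                    + " %=" + (h % length)
                    + " floorMod=" + Math.floorMod(h, length));
        }
    }
}
```

For non-negative hashes all three agree. For negative hashes the mask matches Math.floorMod, while % would give a negative number that cannot be used as an array index.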

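Getting an Element

get takes a key. If key == null, it walks the linked list at array index 0 until it finds the entry whose key is null. If key != null, it first computes the key's hash, uses the hash to find the array index, and then walks the linked list in that bucket until it finds a matching entry.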

// Get an element
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

// Special lookup when key == null
private V getForNullKey() {
    if (size == 0) {
        return null;
    }
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null)
            return e.value;
    }
    return null;
}

// Look up by key
final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }

    int hash = (key == null) ? 0 : hash(key); // compute the hash
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        // Found the bucket; walk its linked list.
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            // Found it, return the entry
            return e;
    }
    return null;
}

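Adding an Element

To add an element with key == null, put walks the linked list at array index 0. If an entry with a null key already exists, its value is overwritten; if not, a new entry is created whose next pointer is the current head of the list, and it becomes the new head. This is called head insertion. If key != null, it again computes the hash and finds the array index, then walks the list: an existing key is overwritten, otherwise the new entry is head-inserted.

If the element count has reached the threshold and the target bucket is already occupied, the array is resized, to twice its current length. I cover resizing later, together with HashMap's concurrency safety.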

// Add an element
public V put(K key, V value) {
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    if (key == null) // so a null key is allowed
        return putForNullKey(value);
    int hash = hash(key); // compute the hash
    int i = indexFor(hash, table.length); // use the hash to find the bucket, i.e. the array index
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        // Walk the linked list in that bucket
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            // The key already exists: overwrite the value and return the old one.
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    // Reaching here means this is a new entry
    modCount++; // modification count + 1
    addEntry(hash, key, value, i); // add the new entry
    return null;
}

// Add when key == null; the entry always lives in bucket 0 of the array
private V putForNullKey(V value) {
   
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        // An entry with key == null already exists: overwrite it
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0);
    return null;
}

// Add an entry
void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // Element count has reached the threshold and this bucket is occupied: grow to 2x the current array
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}

// Create an entry
void createEntry(int hash, K key, V value, int bucketIndex) {
    // Insert at the head of the list
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
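
The put and get flow above can be boiled down to a toy map. This is a simplified sketch under my own assumptions, not the JDK code: MiniMap and its Node are hypothetical names, keys are Strings, the table is fixed at 4 buckets, there is no resize, and the perturbation step is skipped so the bucket of each key is easy to predict.

```java
import java.util.Objects;

class MiniMap {
    static final class Node {
        final int hash;
        final String key;
        int value;
        Node next;
        Node(int hash, String key, int value, Node next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    Node[] table = new Node[4]; // fixed power-of-two length, no resize in this sketch
    int size;

    // Simplification: null hashes to 0 (bucket 0); no perturbation function.
    static int hash(String key) {
        return key == null ? 0 : key.hashCode();
    }

    Integer put(String key, int value) {
        int h = hash(key);
        int i = h & (table.length - 1);
        for (Node e = table[i]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                int old = e.value; // key exists: overwrite and return the old value
                e.value = value;
                return old;
            }
        }
        table[i] = new Node(h, key, value, table[i]); // head insertion
        size++;
        return null;
    }

    Integer get(String key) {
        int h = hash(key);
        for (Node e = table[h & (table.length - 1)]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    public static void main(String[] args) {
        MiniMap m = new MiniMap();
        m.put("a", 1); // "a".hashCode() is 97, bucket 1
        m.put("e", 2); // "e".hashCode() is 101, also bucket 1
        System.out.println("head of bucket 1: " + m.table[1].key);
    }
}
```

Because "a" and "e" collide in a 4-slot table, the later key "e" ends up at the head of the chain with "a" behind it, which is exactly the head-insertion order described above.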

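Removing an Element

Removal computes the key's hash, finds the array index, and walks the list to find the entry to delete, then points the previous entry's next at the deleted entry's next. It is textbook linked-list removal.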

// Remove the mapping and return the removed value
public V remove(Object key) {
    Entry<K,V> e = removeEntryForKey(key);
    return (e == null ? null : e.value);
}

// Remove the entry for a key
final Entry<K,V> removeEntryForKey(Object key) {
    if (size == 0) {
        return null;
    }
    int hash = (key == null) ? 0 : hash(key); // compute the hash
    int i = indexFor(hash, table.length); // find the bucket in table
    Entry<K,V> prev = table[i];
    Entry<K,V> e = prev;

    while (e != null) {
        Entry<K,V> next = e.next;
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k)))) {
            // Found the entry
            modCount++; // modification count + 1
            size--; // element count - 1
            if (prev == e)
                // The first entry of the list is the one to delete: unlink it directly
                table[i] = next;
            else
                // A->B->C, remove B, leaving A->C
                prev.next = next;
            e.recordRemoval(this);
            return e;
        }
        prev = e;
        e = next;
    }

    return e;
}
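
Stripped of the hashing, the unlink step is ordinary singly linked list surgery. The sketch below is my own illustration with int keys; UnlinkDemo, remove and render are hypothetical names and not part of HashMap.

```java
class UnlinkDemo {
    static final class Node {
        final int key;
        Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    // Removes the first node with the given key and returns the (possibly new) head.
    static Node remove(Node head, int key) {
        Node prev = null;
        for (Node e = head; e != null; prev = e, e = e.next) {
            if (e.key == key) {
                if (prev == null) return e.next; // removing the head
                prev.next = e.next;              // A->B->C becomes A->C
                return head;
            }
        }
        return head; // key not found, list unchanged
    }

    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) sb.append("->");
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node list = new Node(1, new Node(2, new Node(3, null)));
        System.out.println(render(remove(list, 2)));
    }
}
```

The two branches mirror the two cases in removeEntryForKey: removing the head replaces the bucket's first node, and removing anything else relinks the predecessor.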

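Resizing and Concurrency Safety

During put, when the element count reaches the threshold (and the target bucket is already occupied), HashMap resizes, normally to twice the length of the original array. The main flow: create a new array, iterate over the buckets of the old array and walk each bucket's linked list, work out each entry's position in the new array (the hash itself is only recomputed when rehash is true; otherwise the stored e.hash is reused), and insert the entry there.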

// @param newCapacity the array capacity after resizing
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable, initHashSeedAsNeeded(newCapacity)); // move the entries into the new array
    table = newTable;
    // New threshold is newCapacity * loadFactor, capped at MAXIMUM_CAPACITY + 1
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

// Transfer method
// @param newTable the array after resizing
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length; // capacity of the new array
    
    /*
     * Walk every entry of the old array and find its slot in the new array (rehashing only if requested)
     */
    for (Entry<K,V> e : table) {
        while(null != e) {
            Entry<K,V> next = e.next;
            // Recompute the hash (only when rehash is true)
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            // Find the entry's slot in the new array
            int i = indexFor(e.hash, newCapacity);
            // Build the new list by head insertion.
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}
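
One side effect of this loop is worth seeing in isolation: head insertion reverses the relative order of entries that stay together. The sketch below is a simplified model with my own names (TransferDemo, Node, render); each node's hash is just its key, and the rehash branch is left out.

```java
class TransferDemo {
    static final class Node {
        final int hash;
        Node next;
        Node(int hash, Node next) { this.hash = hash; this.next = next; }
    }

    // Same loop shape as transfer() above, minus the optional rehash.
    static Node[] transfer(Node[] oldTable, int newCapacity) {
        Node[] newTable = new Node[newCapacity];
        for (Node e : oldTable) {
            while (e != null) {
                Node next = e.next;
                int i = e.hash & (newCapacity - 1);
                e.next = newTable[i]; // head insertion into the new bucket
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) sb.append("->");
            sb.append(e.hash);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node[] old = new Node[2];
        old[1] = new Node(3, new Node(5, new Node(7, null))); // bucket 1: 3->5->7
        Node[] grown = transfer(old, 4);
        System.out.println("bucket 1: " + render(grown[1]));
        System.out.println("bucket 3: " + render(grown[3]));
    }
}
```

Growing from 2 to 4 buckets sends 5 to bucket 1 and both 3 and 7 to bucket 3, where 3 used to come before 7 and now comes after it.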

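Note: under concurrency, the 1.7 HashMap can end up with a circular linked list, after which get loops forever. The cause is that several threads run the resize at the same time and relink the same Entry nodes, each into its own new array.

Suppose the old array is oldTable[2] (length 2) and oldTable[1] holds the chain 3->7->5.

When the resize runs, it creates a new array newTable[4]; from the hashes, newTable[1]=5 and newTable[3]=7->3 (the resize uses head insertion, so the order is reversed). In a single-threaded setting this is fine.

Now take two threads running the resize at the same time. Thread A is suspended inside transfer right after reading e=3 and next=7, and thread B completes its whole resize, so the nodes are now linked as 7->3 with 3.next=null. Thread A then resumes with its stale references: it puts 3 into its own newTable[3], moves on to 7 (whose next is now 3) and head-inserts it to get 7->3, then processes 3 a second time and head-inserts it again, setting 3.next=7. The bucket is now 3->7->3->7->..., a circular list, and that is the problem: any lookup that walks this bucket without finding its key never terminates.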
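
The interleaving can be replayed deterministically in a single thread, which makes the cycle easy to verify. This is a sketch of one possible schedule, not a real multithreaded test: ResizeRaceDemo and its helpers are hypothetical, each node's hash is its key, and the chain is 3->7->5 so that the two nodes sharing a new bucket are adjacent when "thread A" pauses.

```java
class ResizeRaceDemo {
    static final class Node {
        final int key;
        Node next;
        Node(int key) { this.key = key; }
    }

    // Head-insert transfer of one chain, like the 1.7 loop (hash == key here).
    static void transfer(Node e, Node[] newTable) {
        while (e != null) {
            Node next = e.next;
            int i = e.key & (newTable.length - 1);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }

    // Floyd's tortoise-and-hare cycle check.
    static boolean hasCycle(Node head) {
        Node slow = head, fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) return true;
        }
        return false;
    }

    static Node[] simulate() {
        Node n3 = new Node(3), n7 = new Node(7), n5 = new Node(5);
        n3.next = n7;
        n7.next = n5; // old bucket: 3->7->5

        // "Thread A" reads e and next, then is suspended.
        Node e = n3;
        Node next = e.next; // 7

        // "Thread B" runs its whole transfer: now 7->3 and 3.next == null.
        transfer(n3, new Node[4]);

        // "Thread A" resumes with stale e/next and finishes its iteration.
        Node[] tableA = new Node[4];
        int i = e.key & (tableA.length - 1);
        e.next = tableA[i];
        tableA[i] = e;
        transfer(next, tableA); // processes 7, then 3 again
        return tableA;
    }

    public static void main(String[] args) {
        System.out.println("cycle in bucket 3: " + hasCycle(simulate()[3]));
    }
}
```

In this schedule thread A's table ends up with 3 and 7 pointing at each other and never sees 5 at all, so the same race can lose entries as well as create the loop.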

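Differences Between 1.8 and 1.7

Data Structure

Array + linked list + red-black tree. When a chain's length reaches the threshold of 8 (TREEIFY_THRESHOLD), it is converted to a red-black tree, provided the table has at least 64 buckets (MIN_TREEIFY_CAPACITY); below that size the table is resized instead. When a tree bin shrinks to 6 nodes or fewer (UNTREEIFY_THRESHOLD, which is a node count rather than a tree depth), it is converted back to a linked list.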

Hash Algorithm

The 1.8 hash does far fewer operations: it simply XORs the high 16 bits of hashCode into the low 16 bits.

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
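
The same experiment as with the 1.7 function shows that one XOR is already enough to separate keys that differ only in their high bits. The hash method below is the 1.8 one-liner from the listing; the class name Spread18Demo and the index helper are mine, and the demo relies on Integer.hashCode() returning the int value itself.

```java
class Spread18Demo {
    // The 1.8 hash from the listing above.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    static int index(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        Integer a = 0x10000, b = 0x20000; // identical low 16 bits
        System.out.println("raw:    " + index(a.hashCode(), 16) + " vs " + index(b.hashCode(), 16));
        System.out.println("spread: " + index(hash(a), 16) + " vs " + index(hash(b), 16));
    }
}
```

Both raw hash codes map to bucket 0 of a 16-slot table, while the spread values map to buckets 1 and 2.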

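Getting an Element

Because 1.8 introduces red-black trees, a lookup first checks whether the bucket is a red-black tree. If it is, the tree's lookup method is used; otherwise the bucket is searched as a linked list.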

if (first instanceof TreeNode)
    // The bucket is a red-black tree
    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
do {
    // Otherwise it is a linked list
    if (e.hash == hash &&
        ((k = e.key) == key || (key != null && key.equals(k))))
        return e;
} while ((e = e.next) != null);

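Inserting an Element

For insertion, if the bucket is a linked list, the new node is appended at the tail (tail insertion), and if the list length reaches the threshold of 8 the list is converted to a red-black tree; if the bucket is already a red-black tree, the tree's insertion is used.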

if (p.hash == hash &&
    ((k = p.key) == key || (key != null && key.equals(k))))
    e = p;
else if (p instanceof TreeNode)
    // Red-black tree bucket: insert the tree way
    e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
    for (int binCount = 0; ; ++binCount) {
        // Walk the linked list
        if ((e = p.next) == null) {
            p.next = newNode(hash, key, value, null); // tail insertion
            if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                // List length reached the threshold of 8: convert to a red-black tree
                treeifyBin(tab, hash);
            break;
        }
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            break;
        p = e;
    }
}

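Removing an Element

If the bucket is a red-black tree, the node is looked up in the tree and removed the tree way. If it is a linked list, the node is looked up in the list and, if found, removed the list way: A->B->C with B deleted becomes A->C.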

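Summary

This article walked through the JDK 1.7 HashMap source and compared the broad differences between the 1.7 and 1.8 HashMap. I have not covered red-black trees, frankly because I do not know them well enough yet; I will dig into them when I have the energy. Analyses of other JDK collection classes will follow.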
