HashMap Source Code Analysis

78KgMiao

Category: Java. Tags: hashmap, java

Article link: https://blog.csdn.net/weixin_44105483/article/details/109986828

Java operators

& bitwise AND:

1&1=1
1&0=0
0&1=0
0&0=0

The result is 1 only when both bits are 1.

| bitwise OR:

1|1=1
1|0=1
0|1=1
0|0=0

The result is 1 when at least one bit is 1.

^ bitwise XOR:

1^1=0
1^0=1
0^1=1
0^0=0

The result is 0 when the two bits are equal and 1 when they differ.

>> signed right shift

16>>2=16/2/2=4
32>>2=32/2/2=8
-32>>3=-32/2/2/2=-4

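All bits move to the right and the vacated high bits are filled with a copy of the sign bit (0 for non-negative values, 1 for negative values), so the sign is preserved. For negative odd values the result rounds toward negative infinity, so it is not always the same as dividing by 2: -3>>1 is -2, while -3/2 is -1.

>>>: unsigned right shift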

16>>>2=16/2/2=4
32>>>2=32/2/2=8
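-2>>>1=2147483647

The difference from >> only shows for negative numbers: >>> always fills the vacated high bits with 0, whereas >> fills them with 1 when the value is negative.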
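The operator rules above can be checked with a short sketch. The class name `BitOpsDemo` and its wrapper methods are made up for illustration; only the operators themselves are part of the language.

```java
class BitOpsDemo {
    // Thin wrappers so each operator can be called and checked by name.
    static int and(int a, int b) { return a & b; }
    static int or(int a, int b) { return a | b; }
    static int xor(int a, int b) { return a ^ b; }
    static int signedShift(int value, int bits) { return value >> bits; }
    static int unsignedShift(int value, int bits) { return value >>> bits; }

    public static void main(String[] args) {
        System.out.println(and(0b1100, 0b1010));  // 8  (0b1000)
        System.out.println(or(0b1100, 0b1010));   // 14 (0b1110)
        System.out.println(xor(0b1100, 0b1010));  // 6  (0b0110)
        System.out.println(signedShift(-32, 3));  // -4: the sign bit is copied in
        System.out.println(signedShift(-3, 1));   // -2: rounds down, unlike -3 / 2 == -1
        System.out.println(unsignedShift(-2, 1)); // 2147483647: zeros are shifted in
    }
}
```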

HashMap data structure

Default capacity

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

Maximum capacity

static final int MAXIMUM_CAPACITY = 1 << 30;

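Resize threshold

int threshold;// If a capacity is passed to the constructor, this field first holds the smallest power of two >= that capacity (cap=100 gives threshold=128); once the table is allocated it holds capacity * loadFactor.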

Default load factor

static final float DEFAULT_LOAD_FACTOR = 0.75f;

Load factor

final float loadFactor;

Threshold for converting Node to TreeNode

static final int TREEIFY_THRESHOLD = 8;

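When the linked list in a single bucket grows beyond 8 nodes, its Node entries are converted to TreeNode entries, provided the table length is at least MIN_TREEIFY_CAPACITY (64); with a smaller table, the map is resized instead. Nodes in the same bucket share an index, not necessarily the same full hash.

Threshold for converting TreeNode back to Node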

static final int UNTREEIFY_THRESHOLD = 6;

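The reverse of the above: for example, when a resize splits a tree bin and one half is left with 6 nodes or fewer, that half is converted back to a plain linked list.

Data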

Node

    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
        // accessor methods (getKey, getValue, setValue, equals, hashCode, toString) omitted
    }

TreeNode: red-black tree

    static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
        TreeNode<K,V> parent;  // red-black tree links
        TreeNode<K,V> left;
        TreeNode<K,V> right;
        TreeNode<K,V> prev;    // needed to unlink next upon deletion
        boolean red;
        TreeNode(int hash, K key, V val, Node<K,V> next) {
            super(hash, key, val, next);
        }
    }

Storage: Node entries are kept in an array

transient Node<K,V>[] table;
transient Set<Map.Entry<K,V>> entrySet;// holds no data of its own; it is a view that works on table through inner methods

How the hash is computed

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

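The code above is how HashMap derives its hash value: it takes hashCode() and XORs the high 16 bits into the low 16 bits. The bucket index only uses the low bits of the hash, so this step lets the high bits influence the index as well and reduces collisions.

Below is the hashCode() method as overridden by the String class.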

    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }

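Suppose we use a String as the key. The largest possible hash code is 2147483647 (Integer.MAX_VALUE).
HashMap then computes: Integer.MAX_VALUE ^ (Integer.MAX_VALUE >>> 16)=2147450880
In binary this is 0x7FFFFFFF becoming 0x7FFF8000: only the low 16 bits are changed, the high 16 bits stay as they were.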
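To see why mixing in the high bits matters, here is a minimal sketch. `HashSpreadDemo`, `spread`, and `indexFor` are names invented for this example; `spread` repeats the expression from `hash(Object)` on a raw int, and the two sample hash codes are chosen to differ only in their high 16 bits.

```java
class HashSpreadDemo {
    // Same expression as in HashMap.hash(Object), applied to a raw int hash code.
    static int spread(int h) { return h ^ (h >>> 16); }

    // Bucket index for a table whose length n is a power of two.
    static int indexFor(int hash, int n) { return (n - 1) & hash; }

    public static void main(String[] args) {
        int a = 0x10000; // a and b differ only in the high 16 bits
        int b = 0x20000;
        // Without spreading, both land in bucket 0 of a 16-slot table.
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));                 // 0 0
        // With spreading, the high bits reach the index and the keys separate.
        System.out.println(indexFor(spread(a), 16) + " " + indexFor(spread(b), 16)); // 1 2
        System.out.println(spread(Integer.MAX_VALUE));                               // 2147450880
    }
}
```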

How get retrieves a value

    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {// compute the bucket index from the hash and the table length, then read that bucket
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))// the first node matches when the hashes are equal and the keys are the same reference or equals() returns true; this is why hashCode() and equals() must be overridden together
                return first;
            if ((e = first.next) != null) {// no match: walk the remaining nodes in this bucket
                if (first instanceof TreeNode)// a tree bin is searched through the TreeNode method
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&// check whether this node's key matches the lookup key
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);// stop when there is no next node; the key is absent and null is returned
            }
        }
        return null;
    }

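The key expression in the code above is tab[(n - 1) & hash]: read the array element whose index is (array length - 1) & hash.
Because & yields 1 only where both bits are 1, (n - 1) & hash is always between 0 and n - 1, so the index can never go out of bounds. Since n is a power of two, the result equals hash modulo n.

In summary: lookups are very fast because the index is computed directly from the hash and the table length. If several keys collide in the same bucket, the nodes in that bucket are walked through their next field (at most 8 of them). Beyond 8 nodes the bucket is converted to TreeNode entries (once the table has at least 64 slots) and is searched as a red-black tree in O(log n) time.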
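The bounds claim can be checked directly. The sketch below assumes a table length of 16 and an arbitrary set of sample hashes, including negative ones; `BucketIndexDemo` and `indexFor` are illustrative names, not JDK API.

```java
class BucketIndexDemo {
    // Same index expression as in getNode and putVal.
    static int indexFor(int hash, int n) { return (n - 1) & hash; }

    public static void main(String[] args) {
        int n = 16; // table length, always a power of two in HashMap
        int[] hashes = {0, 15, 16, 17, 12345, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int h : hashes) {
            // For a power-of-two n the mask gives the same result as a non-negative modulo.
            System.out.println(h + " -> " + indexFor(h, n)
                    + " (floorMod: " + Math.floorMod(h, n) + ")");
        }
    }
}
```

The plain % operator would return a negative remainder for negative hashes, which is one reason a mask is the more convenient choice here.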

Why is a value looked up this way? The way put stores values, covered next, explains it.

How put stores a value

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)// compute the target index from the hash and the table length
            tab[i] = newNode(hash, key, value, null);// the bucket is empty: store a new Node there
        else {// the bucket already holds a node: either the same key or a collision
            Node<K,V> e; K k;
            if (p.hash == hash &&// same hash, and same reference or equals() returns true
                ((k = p.key) == key || (key != null && key.equals(k))))// this is why hashCode() and equals() must be overridden together
                e = p;
            else if (p instanceof TreeNode)// the bucket is a tree bin
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {// a collision in a linked-list bucket
                for (int binCount = 0; ; ++binCount) {// walk the chain
                    if ((e = p.next) == null) {// reached the tail of the chain
                        p.next = newNode(hash, key, value, null);// append the new node as p.next
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st; the chain already held 8 nodes before this append
                            treeifyBin(tab, hash);// convert the bucket to a tree (or resize if the table is still small)
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))// found the same key
                        break;
                    p = e;// move on to the next node
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)// with putIfAbsent() (onlyIfAbsent = true) a non-null old value is not replaced
                    e.value = value;// replace the value
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)// the size exceeds the threshold (capacity * load factor): resize
            resize();
        afterNodeInsertion(evict);
        return null;
    }

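Calling put stores one key-value pair.
(n - 1) & hash ANDs the key's hash with capacity - 1 to compute the index where the key will be stored.
Then the bucket at that index is checked:
- If it is empty: nothing is stored there yet, so a new Node is created and placed at that index.
- If it is not empty: the index is already occupied. The existing key is compared with the key being stored.
  - If they are the same (equal hash, and same reference or equals() returns true): it is the same key, and the value is replaced with the new one.
  - If they differ: this is a collision at that index. The nodes in the bucket are walked to look for the same key.
    - If the same key is found, its value is replaced with the new one.
    - If the same key is not found, a new Node is appended by pointing the last node's next field at it. Then the chain length is checked against TREEIFY_THRESHOLD - 1.
      - If it has been reached: treeifyBin converts the nodes of this bucket to TreeNode entries.
      - If it has not been reached: the bucket stays a linked list.

In summary: a collision does not take up another slot in the array; the new node is linked through the next field of a node that is already in the bucket.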
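The behavior described above can be observed from the outside with a key type that forces collisions. This is only a sketch: `CollidingKey` is a hypothetical class whose hashCode is deliberately constant, so every instance maps to the same bucket.

```java
import java.util.HashMap;
import java.util.Map;

class PutCollisionDemo {
    // Hypothetical key: constant hashCode, equality decided by name.
    static final class CollidingKey {
        final String name;
        CollidingKey(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).name.equals(name);
        }
    }

    static Map<CollidingKey, Integer> build() {
        Map<CollidingKey, Integer> map = new HashMap<>();
        map.put(new CollidingKey("a"), 1);
        map.put(new CollidingKey("b"), 2); // same hash, different key: chained in the same bucket
        return map;
    }

    public static void main(String[] args) {
        Map<CollidingKey, Integer> map = build();
        System.out.println(map.size());                                 // 2
        System.out.println(map.put(new CollidingKey("a"), 10));         // 1: old value returned, value replaced
        System.out.println(map.putIfAbsent(new CollidingKey("b"), 20)); // 2: existing value kept
        System.out.println(map.get(new CollidingKey("b")));             // 2
    }
}
```

Both keys stay retrievable even though they share a hash, because equals() distinguishes them inside the bucket.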

How resize grows the table

Computing the initial threshold

    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

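The code above is used at construction time: when a custom capacity is given, it computes the value stored in threshold. By default threshold=0.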
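To try the rounding on a few inputs, the method can be copied into a standalone class. `TableSizeForDemo` is a made-up wrapper; the method body and the constant are taken from the listings in this article.

```java
class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copied from the listing above: smears the highest set bit of (cap - 1)
    // into every lower position, then adds 1 to reach a power of two.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] caps = {0, 1, 3, 5, 16, 17, 100};
        for (int cap : caps) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
        // 0 -> 1, 1 -> 1, 3 -> 4, 5 -> 8, 16 -> 16, 17 -> 32, 100 -> 128
    }
}
```

Subtracting 1 first is what keeps an exact power of two such as 16 from being rounded up to 32.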

Constructor: the initial capacity and the load factor can both be customized.

    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

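Map built with the no-arg constructor: loadFactor=0.75, threshold=0
Map built with custom arguments:
loadFactor = loadFactor
threshold = tableSizeFor(initialCapacity), for example 0 -> 1; 3 -> 4; 5 -> 8; 100 -> 128. This is the smallest power of two greater than or equal to the requested capacity.

Resizing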

    final Node<K,V>[] resize() {
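        Node<K,V>[] oldTab = table;// the old array
        int oldCap = (oldTab == null) ? 0 : oldTab.length;// old capacity: 0 if the table is not allocated yet, otherwise the array length
        int oldThr = threshold;// the old threshold
        int newCap, newThr = 0;// new capacity and new threshold (newThr starts at 0)
        if (oldCap > 0) {// old capacity > 0: the table has already been allocated
            if (oldCap >= MAXIMUM_CAPACITY) {// already at the maximum capacity
                threshold = Integer.MAX_VALUE;// set the threshold to Integer.MAX_VALUE so no further resize is triggered
                return oldTab;// return the old array unchanged
            } else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)// new capacity = old capacity * 2; taken if it is below the maximum and the old capacity is at least the default 16
                newThr = oldThr << 1; // double threshold: new threshold = old threshold * 2
        } else if (oldThr > 0) { // an initial capacity was given at construction; oldThr holds the power of two computed by tableSizeFor
            newCap = oldThr;// new capacity = old threshold
        } else {// built with the no-arg constructor: new capacity = default capacity, new threshold = default load factor * default capacity
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {// newThr was not set above (for example with a custom initial capacity): threshold = capacity * load factor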
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];// create the array with the new capacity
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {// walk the old array
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {// the bucket is not empty
                    oldTab[j] = null;// clear the old slot to help garbage collection
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;// a single node in the bucket: recompute its index from the hash and store it
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order; the bucket holds a chain, so split it into two lists
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
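                            next = e.next;// remember the next node
                            if ((e.hash & oldCap) == 0) {// the bit at oldCap is 0: the node keeps index j (lo list); otherwise it moves to j + oldCap (hi list)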
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            } else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

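In summary: whether or not an initial capacity is given, the capacity is always a power of two, and threshold = capacity * load factor.
Every resize doubles the capacity.
On every resize each node is redistributed according to its hash: it either stays at index j or moves to j + oldCap, depending on whether (hash & oldCap) is 0.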
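The lo/hi split in resize() gives the same index as masking the hash with the new capacity. The sketch below checks that for a few hand-picked hashes; `ResizeSplitDemo` and `splitIndex` are illustrative names, and the capacities 16 and 32 are assumed for the example.

```java
class ResizeSplitDemo {
    // Index after doubling, decided the way the resize() loop decides it.
    static int splitIndex(int hash, int oldIndex, int oldCap) {
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap << 1;
        int[] hashes = {5, 21, 37, 53}; // all four map to index 5 while the capacity is 16
        for (int h : hashes) {
            int oldIndex = h & (oldCap - 1);
            int viaSplit = splitIndex(h, oldIndex, oldCap);
            int viaMask = h & (newCap - 1);
            System.out.println(h + ": " + oldIndex + " -> " + viaSplit
                    + " (mask gives " + viaMask + ")");
        }
        // 5 and 37 stay at index 5; 21 and 53 move to index 21 (5 + 16).
    }
}
```

Testing a single bit is enough because doubling the capacity adds exactly one bit to the mask.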

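Why is the default load factor 0.75? It is a trade-off between space overhead and collision rate: a higher value saves memory but makes buckets more crowded and lookups slower.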

Iteration

    abstract class HashIterator {
        Node<K,V> next;        // next entry to return
        Node<K,V> current;     // current entry
        int expectedModCount;  // for fast-fail
        int index;             // current slot

        HashIterator() {// set up the iterator
            expectedModCount = modCount;// used for fail-fast: if modCount changes during iteration, an exception is thrown
            Node<K,V>[] t = table;
            current = next = null;// nothing has been returned yet
            index = 0;
            if (t != null && size > 0) { // advance to first entry
                // walk the array until the first non-empty bucket is found
                do {} while (index < t.length && (next = t[index++]) == null);
            }
        }
        // whether there is a next element is decided by next being non-null
        public final boolean hasNext() {
            return next != null;
        }
        // return the next node
        final Node<K,V> nextNode() {
            Node<K,V>[] t;
            Node<K,V> e = next;
            if (modCount != expectedModCount)// the map was structurally modified: throw
                throw new ConcurrentModificationException();
            if (e == null)// no element left: throw
                throw new NoSuchElementException();
            // this line looks for the following node, which may be the next node in the current bucket's chain or the head of a later bucket
            if ((next = (current = e).next) == null && (t = table) != null) {
                // only when node.next == null, that is when the current bucket's chain is exhausted, is the next non-empty bucket searched for
                do {} while (index < t.length && (next = t[index++]) == null);
            }
            return e;// return the current node
        }

        public final void remove() {
            Node<K,V> p = current;
            if (p == null)
                throw new IllegalStateException();
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            current = null;
            K key = p.key;
            removeNode(hash(key), key, null, false, false);
            expectedModCount = modCount;
        }
    }

    final class KeyIterator extends HashIterator
        implements Iterator<K> {
        public final K next() { return nextNode().key; }
    }

    final class ValueIterator extends HashIterator
        implements Iterator<V> {
        public final V next() { return nextNode().value; }
    }

    final class EntryIterator extends HashIterator
        implements Iterator<Map.Entry<K,V>> {
        public final Map.Entry<K,V> next() { return nextNode(); }
    }

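The code above shows that key iteration, value iteration, and entry iteration are all subclasses of HashIterator, so the core logic lives in HashIterator.
The main method is nextNode, which is worth reading a few times: it returns the current node and, at the same time, looks for the node that follows it.
If no following node is found, there are no more elements, and hasNext() returns false.
Calling nextNode() again after that throws NoSuchElementException.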
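The three exception paths in HashIterator can be triggered through the public API. The following is a small sketch; `IteratorDemo` and its helper methods are invented for illustration and use a three-entry sample map.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

class IteratorDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Adding a key through the map during iteration changes modCount,
    // so the following next() call throws.
    static boolean failsFast() {
        Map<String, Integer> map = sample();
        Iterator<String> it = map.keySet().iterator();
        it.next();
        map.put("d", 4);
        try { it.next(); return false; }
        catch (ConcurrentModificationException e) { return true; }
    }

    // Removing through the iterator resets expectedModCount, so iteration continues.
    static int removeAllViaIterator() {
        Map<String, Integer> map = sample();
        Iterator<String> it = map.keySet().iterator();
        while (it.hasNext()) { it.next(); it.remove(); }
        return map.size();
    }

    // Calling next() after hasNext() has returned false throws NoSuchElementException.
    static boolean exhaustedThrows() {
        Iterator<String> it = sample().keySet().iterator();
        while (it.hasNext()) it.next();
        try { it.next(); return false; }
        catch (NoSuchElementException e) { return true; }
    }

    public static void main(String[] args) {
        System.out.println(failsFast());            // true
        System.out.println(removeAllViaIterator()); // 0
        System.out.println(exhaustedThrows());      // true
    }
}
```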

Readers interested in TreeNode can read the source code.
