Java Data Structures: HashMap Source Code Analysis

无知的村民

Last modified 2023-03-28 20:15:10


Column: Java Data Structures · Tags: java, data structures, hash algorithms

First published 2023-03-26 21:48:00

Article link: https://blog.csdn.net/xie397361457/article/details/129785292


Introduction to HashMap

1. Internal Structure

transient Node<K,V>[] table;

static class Node<K,V> implements Map.Entry<K,V> {
	final int hash;
	final K key;
	V value;
	Node<K,V> next;

	Node(int hash, K key, V value, Node<K,V> next) {
		this.hash = hash;
		this.key = key;
		this.value = value;
		this.next = next;
	}
}

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
        TreeNode<K,V> parent;  // red-black tree links
        TreeNode<K,V> left;
        TreeNode<K,V> right;
        TreeNode<K,V> prev;    // needed to unlink next upon deletion
        boolean red;
        TreeNode(int hash, K key, V val, Node<K,V> next) {
            super(hash, key, val, next);
        }
}

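As the source code above shows, HashMap is essentially an array + linked list + red-black tree structure, i.e. a hash table: table is an array of buckets, and each non-empty bucket holds either a linked list of Node objects (chained through next) or a red-black tree of TreeNode objects.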

As shown in the figure:
[Figure: HashMap's array + linked list + red-black tree structure]

2. Constructor

    /**
     * Constructs an empty {@code HashMap} with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        // Until the table is allocated, threshold holds the initial capacity (a power of two)
        this.threshold = tableSizeFor(initialCapacity);
    }

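The constructor mainly validates and stores the load factor loadFactor and the next resize size threshold. The table array itself is not allocated here; it is created lazily on the first put, and at that point resize() uses threshold as the initial capacity.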

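So why does loadFactor default to 0.75?

Because:

Data is first stored in the array. When a hash collision occurs, a linked list grows from that array slot, and once the list grows long enough it is converted into a red-black tree.
With a load factor of 1.0, the table is resized only once the number of entries exceeds the capacity, so hash collisions become frequent, the lists and red-black trees grow large, and lookups become noticeably slower.
With a load factor of 0.5, there are fewer hash collisions and the lists and trees stay small, but the table is resized when it is only half full, so the bucket array is roughly twice as large for the same number of entries: what used to fit in 1 MB now needs 2 MB.
The default load factor of 0.75 is a compromise between lookup efficiency and space utilization.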
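The trade-off can be made concrete with a little arithmetic: HashMap resizes once size exceeds threshold = capacity × loadFactor. The sketch below is a hypothetical helper (LoadFactorDemo and resizeThreshold are illustrative names, not JDK API) that computes this value for HashMap's default initial capacity of 16.

```java
// Hypothetical helper: mirrors the rule "resize when size > capacity * loadFactor"
public class LoadFactorDemo {
    // Largest number of entries a table of this capacity holds before the next resize
    static int resizeThreshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int capacity = 16; // HashMap's default initial capacity
        for (float lf : new float[] {0.5f, 0.75f, 1.0f}) {
            System.out.println("loadFactor=" + lf + " -> resize after "
                    + resizeThreshold(capacity, lf) + " entries");
        }
    }
}
```

With the default 0.75 the threshold is 12, so the 13th insertion into a default-sized map triggers a resize; 0.5 would resize after 8 entries and 1.0 only after 16.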

The initial threshold is computed by tableSizeFor, shown below:

    /**
     * Returns a power of two size for the given target capacity.
     */
    static final int tableSizeFor(int cap) {
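        // Integer.numberOfLeadingZeros(x) returns 32 minus the number of significant bits of x.
        // e.g. cap = 6: cap - 1 = 5 = 0b101 has 3 significant bits, so the result is 32 - 3 = 29,
        // and n = -1 >>> 29 = 0b111 = 7.
        // In general, n is an all-ones mask covering the highest set bit of cap - 1.
        int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
        // Return n + 1 (the smallest power of two >= cap), capped at MAXIMUM_CAPACITY;
        // n < 0 only when cap <= 1, in which case the result is 1.
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;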
    }
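To see what tableSizeFor returns for a few inputs, its body can be copied into a standalone class. TableSizeForDemo is a hypothetical name, and MAXIMUM_CAPACITY is redeclared here with HashMap's value 1 << 30.

```java
public class TableSizeForDemo {
    // Same value as HashMap.MAXIMUM_CAPACITY
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of the tableSizeFor logic above: smallest power of two >= cap, clamped to [1, 2^30]
    static int tableSizeFor(int cap) {
        int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        for (int cap : new int[] {0, 1, 5, 6, 16, 17, 100}) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

For example, 5 and 6 both round up to 8, 16 stays 16, 17 becomes 32, and 100 becomes 128, so new HashMap<>(100) starts with a 128-slot table on the first put.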

3. Getting Data

    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(key)) == null ? null : e.value;
    }
    
    final Node<K,V> getNode(Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n, hash; K k;
        // The array index is (tab.length - 1) & hash(key)
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & (hash = hash(key))]) != null) {
            // Check whether the first node is the one being looked up
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                // Is this bin a red-black tree (TreeNode) or a linked list?
                if (first instanceof TreeNode)
                    // Delegate to the tree's own lookup
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    // Walk the rest of the linked list
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

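The overall get process:

Compute the array index as (tab.length - 1) & hash(key) and check whether the bucket at that index of tab holds any data.
Check whether the first node is the one being looked up.
If not, check whether the first node is a TreeNode: if so, search the red-black tree with getTreeNode; otherwise, traverse the linked list to search the remaining nodes.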
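The index formula can be tried in isolation. The hash(key) used by get is not shown in this article; in JDK 8 and later it XORs the high 16 bits of key.hashCode() into the low 16 bits, so that high bits still influence the index when the table is small. The sketch below copies that formula into a hypothetical HashIndexDemo class.

```java
public class HashIndexDemo {
    // Same formula as HashMap.hash(Object): spread the high 16 bits into the low 16 bits
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index as computed in getNode/putVal; n must be a power of two
    static int indexFor(Object key, int n) {
        return (n - 1) & hash(key);
    }

    public static void main(String[] args) {
        int n = 16;
        // Integer.hashCode() returns the value itself
        System.out.println(indexFor(17, n));    // 1: 17 = 0b1_0001, low 4 bits are 0001
        System.out.println(indexFor(65536, n)); // 1: the spread moves bit 16 down to bit 0
        System.out.println((n - 1) & 65536);    // 0: without the spread the high bit is ignored
        System.out.println(indexFor(null, n));  // 0: a null key always goes to bucket 0
    }
}
```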

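Time complexity:

Computing the hash from the key and locating the first node of the bucket takes O(1).
If the bucket has no further nodes, the lookup is O(1).
If the remaining nodes form a linked list, the list must be traversed, which takes O(n) in the chain length.
If they form a red-black tree, the lookup takes O(log n).
With a hashCode that disperses keys well and the default load factor, chains stay short, so get is O(1) on average.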
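The worst case can be provoked on purpose with a key type whose hashCode is constant, so every entry lands in the same bucket. BadKey is a hypothetical class written for illustration; it also implements Comparable, which lets a treeified bin order keys by compareTo. Lookups still return the right values; only the cost changes: a linked-list bin would need O(n) comparisons, while HashMap should convert this bin into a red-black tree once the table has at least 64 slots, bringing lookups down to O(log n).

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // Hypothetical key type: every instance has the same hashCode, so all keys collide
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) { this.id = id; }

        @Override public int hashCode() { return 42; }

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        // Gives a treeified bin a total order over the colliding keys
        @Override public int compareTo(BadKey other) { return Integer.compare(id, other.id); }
    }

    // Puts count colliding keys, mapping key i to i * 10
    static Map<BadKey, Integer> build(int count) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new BadKey(i), i * 10);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = build(1000);
        System.out.println(map.size());                 // 1000
        System.out.println(map.get(new BadKey(777)));   // 7770
    }
}
```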

4. Putting Data

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }
    
    /**
     * Implements Map.put and related methods.
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            // Table not allocated yet: initialize it through resize()
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            // The bucket at index i is empty: create a new node there
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                // The first node already holds this key
                e = p;
            else if (p instanceof TreeNode)
                // Tree bin: insert through the red-black tree
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // List bin: walk the chain
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
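                        // Key point: appending to a chain that already holds TREEIFY_THRESHOLD (8) nodes
                        // calls treeifyBin (which resizes instead if the table has fewer than 64 slots)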
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            // size now exceeds threshold: grow the table
            resize();
        afterNodeInsertion(evict);
        return null;
    }
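The return value and the onlyIfAbsent flag of putVal can be observed through the public Map API. In the JDK 8+ source, put calls putVal with onlyIfAbsent = false, while HashMap.putIfAbsent passes true, so an existing non-null value is kept. PutReturnDemo and demo() are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

public class PutReturnDemo {
    // Returns {first put, second put, putIfAbsent, final value} for the key "a"
    static Integer[] demo() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("a", 1);          // no previous mapping: returns null
        Integer second = map.put("a", 2);         // replaces 1 and returns it
        Integer third = map.putIfAbsent("a", 3);  // key present: keeps 2 and returns it
        return new Integer[] {first, second, third, map.get("a")};
    }

    public static void main(String[] args) {
        Integer[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]); // null 1 2 2
    }
}
```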

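The overall put process:

If the table holds no data yet (it has not been allocated), initialize its capacity with resize().
Compute the array index as (tab.length - 1) & hash(key) and check whether the bucket at that index of tab holds any data; if it is empty, create a new node there.
Check whether the first node is the node to update; if so, update its value.
Otherwise, if the bin is a red-black tree, insert with the tree's putTreeVal.
If it is not a tree, traverse the linked list looking for the node to update; update it if found, and if the traversal ends without a match, append a new node to the tail of the list.
Finally, if a new node was added and size now exceeds threshold, call resize().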
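The put steps listed above can be condensed into a simplified sketch. SimpleChainedMap is a hypothetical class written for illustration, not the JDK implementation: it keeps the array + linked list layout, the same hash spreading, (n - 1) & hash indexing, lazy initialization and resize-when-over-threshold rule, but it leaves out red-black trees, inserts new nodes at the head of the chain instead of the tail, and rehashes nodes one by one when it doubles the table.

```java
import java.util.Objects;

// Hypothetical, simplified map: array + singly linked lists, no red-black trees
public class SimpleChainedMap<K, V> {
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    private Node<K, V>[] table;
    private int size;
    private int threshold;
    private final float loadFactor = 0.75f;

    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public V put(K key, V value) {
        if (table == null) resize();                  // lazy initialization on first put
        int hash = hash(key);
        int i = (table.length - 1) & hash;            // bucket index
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                V old = e.value;                      // key already present: update it
                e.value = value;
                return old;
            }
        }
        table[i] = new Node<>(hash, key, value, table[i]); // new key: prepend to the chain
        if (++size > threshold) resize();             // grow once size exceeds threshold
        return null;
    }

    public V get(Object key) {
        if (table == null) return null;
        int hash = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & hash]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    public int size() { return size; }

    public int capacity() { return table == null ? 0 : table.length; }

    @SuppressWarnings("unchecked")
    private void resize() {
        int newCap = (table == null) ? 16 : table.length * 2; // stays a power of two
        Node<K, V>[] newTab = (Node<K, V>[]) new Node[newCap];
        if (table != null) {
            for (Node<K, V> head : table) {
                for (Node<K, V> e = head; e != null; ) {
                    Node<K, V> next = e.next;
                    int j = (newCap - 1) & e.hash;    // index in the doubled table
                    e.next = newTab[j];
                    newTab[j] = e;
                    e = next;
                }
            }
        }
        table = newTab;
        threshold = (int) (newCap * loadFactor);
    }
}
```

As with the default HashMap settings, the first 12 insertions fit into 16 slots and the 13th one doubles the table to 32.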

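Key point: when a new node is appended to a chain that already holds TREEIFY_THRESHOLD (8) nodes, treeifyBin is called. It turns the bin into a red-black tree of TreeNode only if the table length is at least MIN_TREEIFY_CAPACITY (64); for a smaller table it resizes instead.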

5. Resizing Mechanism

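Why does HashMap double its capacity on each resize?

HashMap's capacity is always a power of two, and this is closely tied to how the index is computed: (n - 1) & hash. The & operator is a bitwise AND, an operation the CPU executes directly and very efficiently; a bit of the result is 1 only when the corresponding bits of both operands are 1. When the capacity n is a power of two, say 2^k, n - 1 in binary is k ones (0b11...1), so (n - 1) & hash keeps exactly the low k bits of the hash. For a non-negative hash this equals hash % n, every index from 0 to n - 1 can be reached, and the elements are spread evenly across all buckets, which reduces hash collisions. If n - 1 had a 0 bit, the buckets whose index has a 1 in that position could never be used. Doubling the capacity on each resize keeps it a power of two, so this property holds after every resize.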
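The reasoning above can be checked directly. The sketch below (PowerOfTwoDemo and its methods are hypothetical names) compares the bucket indexes reachable with a power-of-two capacity (16) against a non-power-of-two capacity (15), and checks that masking matches % for non-negative hashes. It also shows a second benefit of doubling: in the JDK 8+ resize() source, each node of old bucket j either stays at j or moves to j + oldCap depending on whether hash & oldCap is 0, so the new index follows from a single bit test; splitIndex reproduces that rule.

```java
import java.util.Set;
import java.util.TreeSet;

public class PowerOfTwoDemo {
    // Every bucket index that (n - 1) & h produces for hashes h in [0, samples)
    static Set<Integer> reachableIndexes(int n, int samples) {
        Set<Integer> indexes = new TreeSet<>();
        for (int h = 0; h < samples; h++) {
            indexes.add((n - 1) & h);
        }
        return indexes;
    }

    // For n a power of two and h >= 0, masking gives the same result as the remainder
    static boolean maskEqualsModulo(int h, int n) {
        return ((n - 1) & h) == h % n;
    }

    // New index after the table doubles from oldCap to 2 * oldCap:
    // unchanged if (hash & oldCap) == 0, otherwise moved up by exactly oldCap
    static int splitIndex(int hash, int oldCap) {
        int oldIndex = (oldCap - 1) & hash;
        return ((hash & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        System.out.println(reachableIndexes(16, 1000)); // mask 0b1111: all 16 buckets
        System.out.println(reachableIndexes(15, 1000)); // mask 0b1110: only even buckets
        System.out.println(maskEqualsModulo(123_456_789, 16));
        System.out.println(splitIndex(21, 16) + " " + ((32 - 1) & 21));
    }
}
```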