An Analysis of HashMap's Underlying Implementation

lxxxxxt

Categories: Java, Interview Question Summaries

Article link: https://blog.csdn.net/qq_36281031/article/details/105066006

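HashMap implements the Map interface, resolves hash collisions by separate chaining, and is not thread-safe.
Data structure: an array of buckets (table) in which each bucket holds a linked list of nodes, converted to a red-black tree once the list grows long enough (JDK 1.8).
For the constants, member variables, and constructors of HashMap, see:
HashMap: Data Structure, Constants, Member Variables, and Constructors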

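1 Source code analysis of the underlying implementation

1.1 Creation

Map<String, Person> personMap = new HashMap<>(); allocates the HashMap object on the heap.

Constructors: there are four. They only initialize a few fields (load factor, threshold); the underlying storage (the member variable Node<K,V>[] table) is not allocated until the first key-value pair is inserted.

/** Constructor 1
Sets the default load factor */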
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

/** Constructor 2
Delegates to constructor 3 */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

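/** Constructor 3
Validates the arguments, stores the load factor, and keeps the rounded-up initial capacity in threshold until the table is allocated */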
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
	// Returns the smallest power of two that is greater than or equal to cap
	static final int tableSizeFor(int cap) {
	    int n = cap - 1;
	    n |= n >>> 1;
	    n |= n >>> 2;
	    n |= n >>> 4;
	    n |= n >>> 8;
	    n |= n >>> 16;
	    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
	}

/** Constructor 4
Copies the mappings of another Map into this map */
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}
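
To make the rounding in tableSizeFor concrete, the sketch below copies the JDK 8 logic quoted above into a standalone class so it can be run in isolation. The class name TableSizeForDemo is only for illustration; the method body is the one shown in this article.

```java
class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of the JDK 8 logic quoted in the article: smear the highest set bit
    // of (cap - 1) into every lower position, then add 1.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        for (int cap : new int[] {0, 1, 10, 16, 17, 1000}) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

Subtracting 1 first is what keeps an exact power of two unchanged: 16 stays 16, while 17 rounds up to 32.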

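The first constructor is the most common; it only sets the default load factor, static final float DEFAULT_LOAD_FACTOR = 0.75f.
If the approximate number of entries is known in advance, pass the capacity explicitly with the second or third constructor to avoid the cost of repeated resizing.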
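
One way to pick the capacity argument is sketched below. The helper capacityFor is hypothetical (it is not a JDK method); it assumes the default load factor of 0.75 and chooses the smallest capacity whose threshold still covers the expected number of entries.

```java
import java.util.HashMap;
import java.util.Map;

class PresizedMapDemo {
    static final float LOAD_FACTOR = 0.75f; // same value as HashMap's DEFAULT_LOAD_FACTOR

    // Hypothetical helper (not part of the JDK): smallest initial capacity whose
    // threshold, capacity * load factor, is at least expectedSize.
    static int capacityFor(int expectedSize) {
        return (int) Math.ceil(expectedSize / (double) LOAD_FACTOR);
    }

    static Map<Integer, String> build(int expectedSize) {
        // HashMap rounds the requested capacity up to a power of two internally.
        Map<Integer, String> map = new HashMap<>(capacityFor(expectedSize));
        for (int i = 0; i < expectedSize; i++) {
            map.put(i, "value-" + i);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(100));
        System.out.println(build(100).size());
    }
}
```

Passing the expected size itself as the capacity would not be enough: a table of 128 slots has a threshold of 96, so the 97th insertion would still trigger a resize.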

The member variable transient Node<K,V>[] table is null until the first insertion; transient means the field is skipped by default serialization.

	static class Node<K,V> implements Map.Entry<K,V> {
       final int hash; // hash of the key, compared against other nodes' hashes when locating an entry
       final K key;    // key
       V value;        // value
       // reference to the next node in the same bucket
       Node<K,V> next;
       Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }
        // overrides hashCode()
        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }
        // overrides equals()
        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
}

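1.2 Inserting a key-value pair

put() delegates to putVal(), which performs the following steps:
- If the bucket array table is empty, initialize it through resize() (default capacity 16)
- Look up whether the key is already present; if it is, decide whether to replace the old value with the new one (controlled by onlyIfAbsent)
- If the key is not present, append the new node to the tail of the bucket's linked list, and convert the list to a red-black tree if it has grown long enough
- If the number of key-value pairs now exceeds the threshold, resize the table

Source of put():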

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Initialize the bucket array; allocation of table is deferred until the first insertion
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the bucket is empty, store a reference to the new node directly in the bucket
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // If the hash and key match the first node of the bucket, point e at that node (same key: the value may be replaced below)
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
            
        // If the bucket holds a TreeNode, use the red-black tree insertion method
        else if (p instanceof TreeNode)  
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Walk the list, counting nodes along the way
            for (int binCount = 0; ; ++binCount) {
                // The key is not in the list: append the new node at the tail
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // If the list already held at least TREEIFY_THRESHOLD nodes before the append, treeify the bucket
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                
                // The list already contains the key: stop walking
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        
        // e != null means the key was already present in the HashMap
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            // onlyIfAbsent: when true, an existing value is overwritten only if it is null
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;   // member variable counting structural modifications
    // Resize once the number of key-value pairs exceeds the threshold
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
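
The return values and the onlyIfAbsent flag are easy to observe from the public API. The sketch below is only an illustration; it relies on putIfAbsent, which in HashMap calls putVal with onlyIfAbsent set to true. The class name PutReturnDemo is made up for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PutReturnDemo {
    // Collects what each call returns so the sequence can be inspected at once.
    static List<Integer> run() {
        Map<String, Integer> m = new HashMap<>();
        List<Integer> returned = new ArrayList<>();
        returned.add(m.put("a", 1));          // no previous mapping: null
        returned.add(m.put("a", 2));          // existing key: old value 1, value replaced
        returned.add(m.putIfAbsent("a", 3));  // onlyIfAbsent: returns 2, value kept
        returned.add(m.get("a"));             // still 2
        m.put("b", null);
        returned.add(m.putIfAbsent("b", 9));  // old value is null, so 9 is stored
        returned.add(m.get("b"));             // 9
        return returned;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [null, 1, 2, 2, null, 9]
    }
}
```

The last two calls correspond to the oldValue == null branch in putVal: a key mapped to null is treated as absent by putIfAbsent.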

Here put() calls putVal(), and hash(key) is the result of applying HashMap's hash function to the key.
```
public V put(K key, V value) {   
	return putVal(hash(key), key, value, false, true);
}

static final int hash(Object key) {
      int h;
      // key.hashCode(): returns the key's hash code
      // ^ : bitwise XOR
      // >>>: unsigned right shift by 16; vacated high bits are filled with 0 regardless of sign
      return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
  }
```
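Why is the hash recomputed with bit operations?
The bucket index is computed as (n - 1) & hash. Because the bucket array length n in HashMap is always a power of two, (n - 1) & hash is equivalent to taking hash modulo n, so the index can never fall outside the array. Example: hash = 185, n = 16, remainder = 9. The hash here is derived from the key's hashCode:
- When n is small, only the low bits of hash take part in the index computation (the low 4 bits for n = 16), and the high bits have no effect. The result therefore depends only on the low-order information. To work around this, the high bits can be XORed into the low bits; for an 8-bit hash that would be hash ^ (hash >>> 4). This adds the randomness of the high bits to the low bits, so the high bits take part in the index computation indirectly. In Java, hashCode() returns a 32-bit int whose upper 16 bits are the high half and lower 16 bits the low half, hence the right shift by 16: (h = key.hashCode()) ^ (h >>> 16).
- Recomputing the hash also mixes the bits further. A poorly written hashCode() override may distribute badly and cause many collisions; the shift and XOR make the hash less regular and can improve how keys spread over the buckets.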
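
The effect of folding the high bits into the low bits can be seen with a few hand-picked values. The sketch below reuses the expression from hash() under the made-up name spread; the three raw hashes differ only above bit 15, which is an artificial worst case chosen for illustration.

```java
class HashSpreadDemo {
    // Same mixing step as HashMap.hash(), applied to a raw hashCode value.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n must be a power of two).
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int[] rawHashes = {0x10000, 0x20000, 0x30000}; // differ only in the high 16 bits
        for (int h : rawHashes) {
            System.out.println(Integer.toHexString(h)
                    + " raw index=" + indexFor(h, n)
                    + " spread index=" + indexFor(spread(h), n));
        }
    }
}
```

Without the mixing step all three values land in bucket 0; with it they land in buckets 1, 2, and 3.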
Inserting the first element triggers resize(), which allocates the bucket array. In putVal():
```
if ((tab = table) == null || (n = tab.length) == 0)
	n = (tab = resize()).length;
```
If no capacity was specified, resize() executes newCap = DEFAULT_INITIAL_CAPACITY;
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; so the Node array is created with a default length of 16.

The following code then runs:

if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);

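// which can be unfolded into
i = (n - 1) & hash; // hash is passed in and n is the table length; & yields the index i
p = tab[i];         // read the element at index i
if(p == null){      // if the slot is empty, build a Node from key and value and store it at index i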
	tab[i] = newNode(hash, key, value, null);
}

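hash is the value of hash(key) and n is the array length, currently 16. Whatever the value of hash, the index i produced by (n - 1) & hash always lies between 0 and n - 1, which is exactly the range of valid indices of the underlying array. The slot at index i is read, and if it is null a new Node is created and stored there.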
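
The range guarantee also holds for negative hashes, which is where the mask differs from Java's % operator. The sketch below compares the mask with % and with Math.floorMod for a few sample values; the class and method names are illustrative only.

```java
class BucketIndexDemo {
    // Index computation used by HashMap; n must be a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int[] hashes = {185, -7, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int h : hashes) {
            System.out.println(h
                    + ": mask=" + indexFor(h, n)
                    + " remainder=" + (h % n)            // can be negative
                    + " floorMod=" + Math.floorMod(h, n)); // always in 0..n-1
        }
    }
}
```

For a power-of-two n the mask agrees with Math.floorMod, whereas % yields -7 for a hash of -7 and would not be a valid array index.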

[Figure: memory state after the first insertion]

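When two keys collide on the same bucket, the keys are not equal, and the bucket holds fewer than 8 nodes (the treeify threshold defaults to 8), the following code runs:
```
p.next = newNode(hash, key, value, null);
```
A new Node is created and the next reference of the current tail node is pointed at it, so a bucket that previously held a single node becomes a singly linked list.
When a node is appended to a bucket that already holds TREEIFY_THRESHOLD nodes (static final int TREEIFY_THRESHOLD = 8;), the list is handed to treeifyBin() for conversion to a red-black tree.
```
if (binCount >= TREEIFY_THRESHOLD - 1) 
      treeifyBin(tab, hash); // convert the linked list to a red-black tree
```
Red-black trees were added in JDK 1.8 to blunt hash-collision attacks and keep lookups efficient in crowded buckets. Note that treeifyBin() only builds a tree when the table has at least 64 slots (MIN_TREEIFY_CAPACITY); for a smaller table it resizes instead.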
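
A crowded bucket can be provoked on purpose with a key type whose hashCode is constant. The sketch below uses a hypothetical BadKey class for that; it only shows that the map stays correct under total collision, since whether a given bucket is currently a list or a tree is not observable through the public API.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeysDemo {
    // Hypothetical key type: hashCode is deliberately constant, so every
    // instance maps to the same bucket. Comparable lets a treeified bucket
    // order keys that share a hash.
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    static Map<BadKey, Integer> build(int count) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = build(10_000);
        System.out.println(map.size());
        System.out.println(map.get(new BadKey(9_999)));
    }
}
```

With every key in one bucket, a pure linked list would make each lookup linear in the number of entries; the tree form is what keeps such a map usable.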

1.3 Looking up a key-value pair

First locate the bucket for the key, then search the linked list or red-black tree in it. Source of get():

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // locate the bucket that the key maps to
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // for a TreeNode, use the red-black tree lookup
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                
            // otherwise search the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

first = tab[(n - 1) & hash] is the head node of the bucket the key maps to.
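
Because get() returns null both when getNode() finds nothing and when the node's value is null, the two cases look the same to the caller. The sketch below shows one way to tell them apart with containsKey(); the class and method names are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

class GetNullDemo {
    static Map<String, String> sample() {
        Map<String, String> m = new HashMap<>();
        m.put("k1", null); // key present, value null
        return m;
    }

    // Distinguishes "mapped to null" from "not mapped at all".
    static String describe(Map<String, String> m, String key) {
        if (m.containsKey(key)) {
            return key + " -> " + m.get(key);
        }
        return key + " is absent";
    }

    public static void main(String[] args) {
        Map<String, String> m = sample();
        System.out.println(m.get("k1"));       // null
        System.out.println(m.get("k2"));       // null as well
        System.out.println(describe(m, "k1")); // k1 -> null
        System.out.println(describe(m, "k2")); // k2 is absent
    }
}
```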

1.4 Traversal

A HashMap is usually traversed in one of the following ways:

for(Object key : map.keySet()) {
    // do something
}
// The compiler turns the loop above into iterator calls, equivalent to:
Set keys = map.keySet();
Iterator ite = keys.iterator();
while (ite.hasNext()) {
    Object key = ite.next();
    // do something
}

for(Map.Entry entry : map.entrySet()) {
    // do something
}

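Iterating over the same HashMap several times yields the same order each time, as long as the map is not modified in between. That order generally differs from the insertion order.
To iterate over all keys, the KeySet object is obtained first and then traversed through its iterator, KeyIterator. KeyIterator extends HashIterator, which holds the core logic. HashIterator is simple: on construction it scans the bucket array for the first non-empty bucket, then walks the list that bucket points to. When the list is exhausted it looks for the next non-empty bucket and continues; when none is left, iteration ends.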
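
The ordering behaviour can be checked with a small typed example. The sketch below iterates with entrySet() and contrasts HashMap with LinkedHashMap, which is the standard choice when insertion order matters; the helper names fill and keysOf are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IterationOrderDemo {
    static void fill(Map<String, Integer> m) {
        m.put("banana", 1);
        m.put("apple", 2);
        m.put("cherry", 3);
    }

    // Collects the keys in the order the entry-set iterator yields them.
    static List<String> keysOf(Map<String, Integer> m) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Integer> e : m.entrySet()) {
            keys.add(e.getKey());
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Integer> hash = new HashMap<>();
        Map<String, Integer> linked = new LinkedHashMap<>();
        fill(hash);
        fill(linked);
        System.out.println(keysOf(hash));   // bucket order, stable between passes
        System.out.println(keysOf(hash));   // same as the line above
        System.out.println(keysOf(linked)); // insertion order
    }
}
```

The HashMap order depends on the keys' hashes and the table size, so it should not be relied on across JDK versions or after a resize.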

Source:

public Set<K> keySet() {
    Set<K> ks = keySet;
    if (ks == null) {
        ks = new KeySet();
        keySet = ks;
    }
    return ks;
}

/**
 * Key set view
 */
final class KeySet extends AbstractSet<K> {
    public final int size()                 { return size; }
    public final void clear()               { HashMap.this.clear(); }
    public final Iterator<K> iterator()     { return new KeyIterator(); }
    public final boolean contains(Object o) { return containsKey(o); }
    public final boolean remove(Object key) {
        return removeNode(hash(key), key, null, false, true) != null;
    }
    // some code omitted
}

/**
 * Key iterator
 */
final class KeyIterator extends HashIterator implements Iterator<K> {
    public final K next() { return nextNode().key; }
}

abstract class HashIterator {
    Node<K,V> next;        // next entry to return
    Node<K,V> current;     // current entry
    int expectedModCount;  // for fast-fail
    int index;             // current slot

    HashIterator() {
        expectedModCount = modCount;
        Node<K,V>[] t = table;
        current = next = null;
        index = 0;
        if (t != null && size > 0) { // advance to first entry 
            // find the first non-empty bucket
            do {} while (index < t.length && (next = t[index++]) == null);
        }
    }

    public final boolean hasNext() {
        return next != null;
    }

    final Node<K,V> nextNode() {
        Node<K,V>[] t;
        Node<K,V> e = next;
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        if (e == null)
            throw new NoSuchElementException();
        if ((next = (current = e).next) == null && (t = table) != null) {
            // find the next non-empty bucket
            do {} while (index < t.length && (next = t[index++]) == null);
        }
        return e;
    }
    // some code omitted
}
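
The modCount check in nextNode() is what makes HashMap iterators fail-fast. The sketch below triggers it on purpose by removing through the map while a for-each loop is running, then shows removal through the iterator itself, which keeps expectedModCount in sync. Class and method names are made up for the example.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        m.put("c", 3);
        m.put("d", 4);
        return m;
    }

    // Returns true if removing via the map during iteration was detected.
    static boolean removeDuringForEach() {
        Map<String, Integer> m = sample();
        try {
            for (String key : m.keySet()) {
                m.remove(key); // bumps modCount behind the iterator's back
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // Removes even values through the iterator and returns the remaining size.
    static int removeEvenValuesSafely() {
        Map<String, Integer> m = sample();
        Iterator<Map.Entry<String, Integer>> it = m.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 == 0) {
                it.remove(); // iterator updates its own expectedModCount
            }
        }
        return m.size();
    }

    public static void main(String[] args) {
        System.out.println(removeDuringForEach());
        System.out.println(removeEvenValuesSafely());
    }
}
```

The check is a best-effort bug detector, not a synchronization mechanism, so it should not be used to coordinate threads.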

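How exactly is a list converted to a red-black tree?

The first node of the list becomes the initial root; the remaining nodes are then visited in list order and each is inserted as a left or right child according to its hash, with the tree rebalanced after each insertion.
How is a bucket that has been converted to a red-black tree traversed? TreeNode still maintains the next reference it inherits from Node, so HashIterator walks a tree bucket through next exactly as it walks a list.

1.5 Removal

Step one locates the bucket, step two walks the list to find the node whose key is equal, and step three unlinks the node.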

public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        // 1. locate the bucket
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        // if the key equals that of the bucket's first node, point node at it
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {  
            // for a TreeNode, use the red-black tree lookup to find the node to remove
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                // 2. walk the list to find the node to remove
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        
        // 3. unlink the node and repair the list or red-black tree
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}
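
The matchValue parameter of removeNode() surfaces in the public API as the two-argument remove(key, value), which removes the entry only if the current value matches. The sketch below shows both overloads side by side; the class name RemoveDemo is illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RemoveDemo {
    // Collects what each call returns so the sequence can be inspected at once.
    static List<Object> run() {
        Map<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        List<Object> results = new ArrayList<>();
        results.add(m.remove("a"));     // returns the removed value: 1
        results.add(m.remove("a"));     // key already gone: null
        results.add(m.remove("b", 99)); // value does not match: false, entry kept
        results.add(m.remove("b", 2));  // value matches: true, entry removed
        results.add(m.isEmpty());       // true
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [1, null, false, true, true]
    }
}
```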

2 Resizing

This post is already long, so resizing is covered in a separate article:
HashMap Underlying Implementation: Resizing

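3 The transient modifier on table

transient literally means short-lived; in Java, a field marked with this keyword is skipped by the default serialization mechanism.
This raises a question: the bucket array table is the central data structure of HashMap, so how can a map be restored if table is not serialized?

HashMap does not rely on default serialization. It implements readObject/writeObject to define what is written. What a HashMap stores is key-value pairs, so serializing the pairs is enough to rebuild the map later. Serializing table directly and restoring it in one step may look simpler, but it has two problems:

- table is rarely full, so serializing the unused slots wastes space.
- The same key-value pair may belong to a different bucket under a different JVM, so deserializing table under another JVM could produce a broken map.
The first problem is straightforward; the second needs some explanation. The first step of get/put/remove is to find the bucket for the key from its hash. If the key class does not override hashCode, the hash ultimately comes from Object.hashCode. That method is native, and different JVMs may implement it differently and return different values. The same key can therefore produce different hashes on different platforms, and continuing to operate on a table that was laid out for the old hashes would go wrong.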
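
The custom serialization can be exercised with a plain round trip through a byte array. The sketch below is an illustration, not a measurement: it serializes a map that was given a large initial capacity but holds only two entries, and checks that the restored map is equal and that the byte count stays far below the number of table slots. All class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

class SerializationRoundTripDemo {
    static byte[] serialize(HashMap<String, Integer> map) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(map); // invokes HashMap's private writeObject
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    @SuppressWarnings("unchecked")
    static HashMap<String, Integer> deserialize(byte[] data) {
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (HashMap<String, Integer>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Large table (65536 slots once allocated) holding only two entries.
    static HashMap<String, Integer> sparseMap() {
        HashMap<String, Integer> map = new HashMap<>(1 << 16);
        map.put("one", 1);
        map.put("two", 2);
        return map;
    }

    public static void main(String[] args) {
        HashMap<String, Integer> original = sparseMap();
        byte[] data = serialize(original);
        HashMap<String, Integer> copy = deserialize(data);
        System.out.println("bytes written: " + data.length);
        System.out.println("equal after round trip: " + copy.equals(original));
    }
}
```

On deserialization the entries are put back one by one, so each key is hashed again in the receiving JVM and placed in whatever bucket is correct there.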

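4 Why HashMap is not thread-safe

Concurrent writes can corrupt the map: in JDK 1.7 a concurrent resize could link a bucket's list into a cycle, and in JDK 1.8 concurrent put() calls can still overwrite each other's entries. See:
Why HashMap Is Not Thread-Safe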
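
When several threads must update one map, the usual alternatives are ConcurrentHashMap or Collections.synchronizedMap. The sketch below counts from several threads with ConcurrentHashMap.merge, which is atomic per key; the helper countWith and its parameters are made up for the example, and a plain HashMap is deliberately not passed in because its behaviour under this load is undefined.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentCountDemo {
    // Starts `threads` workers that each perform `perThread` increments spread
    // over 16 keys, then returns the sum of all counters. The map passed in
    // must be safe for concurrent use.
    static int countWith(Map<Integer, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.merge(i % 16, 1, Integer::sum); // atomic in ConcurrentHashMap
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        int total = 0;
        for (int v : map.values()) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countWith(new ConcurrentHashMap<>(), 8, 100_000)); // 800000
    }
}
```

With a thread-safe map no increment is lost, so the total always equals threads times perThread.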

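[References]
HashMap Underlying Implementation Principles (Part 1)
HashMap Underlying Implementation Principles (Part 2)
A Detailed Analysis of the HashMap Source Code (JDK 1.8)
HashMap: Data Structure, Constants, Member Variables, and Constructors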
