HashMap

右眸Remnant

Last modified: 2022-04-20 11:51:22

Tags: linked list, data structures, java

First published: 2022-03-10 12:09:15

Article link: https://blog.csdn.net/qq_45888932/article/details/123396438

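Conclusions up front:

A bin is treeified when its linked list grows beyond 8 nodes and the table capacity is at least 64; if the capacity is below 64, the table is resized to twice its size instead. Independently of that, put doubles the table once the element count exceeds the threshold (capacity × load factor). In JDK 1.7 the resize additionally required the target bucket to be non-empty. The default load factor is 0.75.

Since JDK 1.8, HashMap uses red-black trees: when the list in a bin grows beyond 8 nodes it is converted to a red-black tree, and when a tree bin shrinks to 6 nodes or fewer (for example when it is split during a resize) it is converted back to a linked list.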

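Overview

Map stores key-value pairs. Its implementations include HashMap, TreeMap, Hashtable, LinkedHashMap, and others.

Among them:

1. HashMap: keys are unique and unordered. It is hash-based, backed by an array of linked lists (plus red-black trees since JDK 1.8). A single null key is allowed and is stored at index 0 of the array. Not thread-safe.

2. Hashtable: thread-safe because its methods are synchronized, which makes it comparatively slow.

3. LinkedHashMap: a subclass of HashMap. HashMap is unordered, meaning iteration order can differ from insertion order; LinkedHashMap threads its entries on a linked list so that iteration follows insertion order.

4. TreeMap: backed by a red-black tree, sorted by key, and accepts a custom comparator.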
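The ordering differences in the list above can be observed directly. The following is a minimal sketch; the class name MapOrderDemo and the sample keys are made up for illustration. HashMap's iteration order is unspecified, so only the LinkedHashMap and TreeMap orders are predictable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MapOrderDemo {
    /** Inserts the same four keys into the given map and returns them in iteration order. */
    static List<String> keysInIterationOrder(Map<String, Integer> map) {
        String[] keys = {"pear", "apple", "zebra", "mango"};
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], i);
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // HashMap: order depends on the hashes, not on insertion order
        System.out.println("HashMap:       " + keysInIterationOrder(new HashMap<>()));
        // LinkedHashMap: insertion order
        System.out.println("LinkedHashMap: " + keysInIterationOrder(new LinkedHashMap<>()));
        // TreeMap: sorted by key
        System.out.println("TreeMap:       " + keysInIterationOrder(new TreeMap<>()));

        // HashMap accepts one null key; a second put overwrites the first
        Map<String, Integer> withNull = new HashMap<>();
        withNull.put(null, 1);
        withNull.put(null, 2);
        System.out.println("null key -> " + withNull.get(null) + ", size=" + withNull.size());
    }
}
```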

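1. Member variables in HashMap:

    // Default initial capacity 16; the capacity must be a power of two
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

    static final int MAXIMUM_CAPACITY = 1 << 30;  // maximum capacity

    static final float DEFAULT_LOAD_FACTOR = 0.75f;  // default load factor

    // A bin is converted to a red-black tree when an element is added to a bin that already holds at least 8 nodes
    static final int TREEIFY_THRESHOLD = 8;

    // A tree bin that shrinks to 6 nodes or fewer when split during a resize is converted back to a list
    static final int UNTREEIFY_THRESHOLD = 6;

    // Minimum table capacity for treeification: a bin is treeified only if capacity >= 64, otherwise the table is resized
    static final int MIN_TREEIFY_CAPACITY = 64;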

1.1. Why is the default load factor 0.75, and why does JDK 1.8 switch from a list to a red-black tree at length 8

The HashMap source contains the following comment:

* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins.  In
* usages with well-distributed user hashCodes, tree bins are
* rarely used.  Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0:    0.60653066
* 1:    0.30326533
* 2:    0.07581633
* 3:    0.01263606
* 4:    0.00157952
* 5:    0.00015795
* 6:    0.00001316
* 7:    0.00000094
* 8:    0.00000006
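
Tree nodes take about twice the space of regular nodes, so while a bin is small a linked list uses less memory than a red-black tree.

With a well-distributed hash, the number of nodes per bin follows a Poisson distribution, and the probability of any bin reaching 8 nodes is extremely low (about 0.00000006 in the table above).

Still, to guard against poor hashing (for example when user settings or hashCode implementations break the ideal case), bins that grow beyond 8 nodes are converted to red-black trees, which turns an O(n) lookup into O(log n).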
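The table in the source comment can be reproduced from the formula it quotes, exp(-0.5) * pow(0.5, k) / factorial(k). This is a small sketch; the class and method names are hypothetical.

```java
final class PoissonBins {
    /** Probability that a bin holds exactly k nodes under a Poisson distribution with mean lambda. */
    static double probability(int k, double lambda) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        // lambda = 0.5 is the average quoted in the HashMap comment for load factor 0.75
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, probability(k, 0.5));
        }
    }
}
```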

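Another explanation that is sometimes given is based on average search length (ASL).

Sequential search: starting from one end of the list, scan the nodes in order and compare each node's key with the target k.

With every position equally likely, the average search length is ASL = (1 + 2 + ... + n)/n = (n + 1)/2.

The reasoning:

A red-black tree has an average search length of about log2(n); at 8 nodes that is log2(8) = 3. A linked list has an average search length of about n/2; at 8 nodes that is 8/2 = 4, so converting to a tree pays off. At 6 nodes or fewer the list costs 6/2 = 3, which is already fast, while building and maintaining a tree is not free.

Why 6 and 8:

The gap between the two thresholds prevents a bin whose length hovers around 8 under frequent inserts and removals from repeatedly switching between list and tree.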

2. HashMap structure

// The body of a HashMap is an array of Node
transient Node<K,V>[] table;

// Since JDK 1.8 each entry is a Node, which is still a Map.Entry
static class Node<K,V> implements Map.Entry<K,V>

// Fields of Node
final int hash;
final K key;
V value;
Node<K,V> next;  // next pointer used to build the linked list

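2.1 Other member variables

// Number of key-value mappings actually stored in the map
transient int size;

// Incremented on every structural modification of the map (for example put of a new key, or remove)
// Used by the fail-fast mechanism to detect concurrent modification
transient int modCount;

// Threshold = capacity * load factor
int threshold;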

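3. Member methods of HashMap

In JDK 1.8 the table array is not allocated until the first put (lazy initialization).

3.1 Constructors

Default no-arg constructor: initial capacity 16, load factor 0.75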

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
}

Constructor taking an initial capacity and a load factor:

public HashMap(int initialCapacity, float loadFactor) {
	if (initialCapacity < 0)
		throw new IllegalArgumentException("Illegal initial capacity: " +
										   initialCapacity);
	if (initialCapacity > MAXIMUM_CAPACITY)
		initialCapacity = MAXIMUM_CAPACITY;
	if (loadFactor <= 0 || Float.isNaN(loadFactor))
		throw new IllegalArgumentException("Illegal load factor: " +
										   loadFactor);
	this.loadFactor = loadFactor;
	// Round the requested capacity up to a power of two; it is parked in threshold until the table is allocated
	this.threshold = tableSizeFor(initialCapacity);
}

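// Returns the smallest power of two that is >= cap
static final int tableSizeFor(int cap) {
	// Subtract 1 first so that a cap that is already a power of two (e.g. 16) is not doubled
	int n = cap - 1;
	// Unsigned right shifts combined with OR smear the highest set bit into every lower bit, e.g. 00001111;
	// the final n + 1 then yields 00010000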
	n |= n >>> 1;
	n |= n >>> 2;
	n |= n >>> 4;
	n |= n >>> 8;
	n |= n >>> 16;
	return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
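
To see what tableSizeFor returns for a few inputs, the method can be copied into a standalone class. This is a sketch for experimentation; the wrapper class TableSizeForDemo is hypothetical, and the method body is the JDK 1.8 version shown above.

```java
final class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Same logic as HashMap.tableSizeFor in JDK 1.8: smallest power of two >= cap. */
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 16, 17, 1000};
        for (int cap : samples) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

A requested capacity of 17 is rounded up to 32, while 16 stays 16 because of the initial cap - 1.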

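3.2 The put operation

3.2.1 How put executes

For a given key-value pair, the hash function is applied to the key, and the array index is computed from that hash and the array capacity.

1. If the slot at that index is empty, the new node is stored there directly.

2. If the slot is not empty, there is a hash collision, and the nodes in the bin are compared with the key using equals. If the key is not found, a new node is appended at the tail of the list (JDK 1.8 uses tail insertion); otherwise the existing node's value is overwritten.

3. After an insertion, if the list has grown beyond 8 nodes and the array capacity is at least 64, the bin is converted to a red-black tree; if the capacity is below 64, the table is doubled instead.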

3.2.2 The hash function

static final int hash(Object key) {
	int h;
	return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

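Why the high bits are XORed in: as the comment above hash() explains, the table index is obtained with a power-of-two bit mask. If only the low bits of the hash code were used, collisions would be much more likely, and the high bits would play no role at all.

So the hash code is shifted right by 16 bits and XORed with itself, which lets the high bits take part as well. Because of the bit mask, the resulting index depends entirely on this computed h, as the index calculation shows:

Index calculation: the array capacity is a power of two, so after subtracting one the low bits are all 1s, and the result of the & depends entirely on the hash.

i = (n - 1) & hash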
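
The effect of the high-bit XOR can be checked with two hash codes that differ only above bit 15. The sketch below uses hypothetical names (HashSpreadDemo, spread, indexFor) and a table size of 16; without spreading, both hash codes land in bucket 0, and with spreading they are separated.

```java
final class HashSpreadDemo {
    /** The spreading step from HashMap.hash(): fold the high 16 bits into the low 16 bits. */
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    /** Bucket index for a table of size n, where n is a power of two. */
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int h1 = 0x10000; // only bit 16 set
        int h2 = 0x20000; // only bit 17 set

        // Raw hash codes: the mask 0b1111 ignores the high bits, so both collide in bucket 0
        System.out.println("raw:    " + indexFor(h1, n) + " and " + indexFor(h2, n));
        // After spreading, the high bits influence the low bits and the buckets differ
        System.out.println("spread: " + indexFor(spread(h1), n) + " and " + indexFor(spread(h2), n));
    }
}
```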

3.2.3 put() source

The core is the putVal method

 public V put(K key, V value) {
     return putVal(hash(key), key, value, false, true);
}

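putVal calls resize(), which is mainly used to adjust the size of the map. Here is the lazy-initialization part first:

// Lazy initialization: default capacity 16, threshold = 16 * 0.75 = 12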
newCap = DEFAULT_INITIAL_CAPACITY;
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);

// onlyIfAbsent: if true, don't change existing value
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
			   boolean evict) {
	Node<K,V>[] tab; Node<K,V> p; int n, i;
	// JDK 1.8 initializes lazily: the table is allocated on the first put (see resize())
	if ((tab = table) == null || (n = tab.length) == 0)
		n = (tab = resize()).length;
	// Compute the key's index; if the slot is empty, create a Node and store it there
	if ((p = tab[i = (n - 1) & hash]) == null)  
		tab[i] = newNode(hash, key, value, null);
	// The slot is occupied: hash collision
	else {
		Node<K,V> e; K k;
		// Case 1: the head node already holds this key; remember it so its value is overwritten below
		if (p.hash == hash &&
			((k = p.key) == key || (key != null && key.equals(k))))
			e = p;
		// Case 2: the bin is a red-black tree; insert through the tree
		else if (p instanceof TreeNode)
			e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
		// Case 3: linked list; if the key is absent, append it with tail insertion
		else {
			// Walk the list
			for (int binCount = 0; ; ++binCount) {
				// Reached the tail: append the new node
				if ((e = p.next) == null) {
					p.next = newNode(hash, key, value, null);
					// TREEIFY_THRESHOLD = 8: the bin already held 8 nodes before this append, so try to treeify
					// (treeifyBin resizes instead if the table capacity is below 64)
					if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
						treeifyBin(tab, hash);
					break;
				}
				// Found an equal key while walking the list: stop
				if (e.hash == hash &&
					((k = e.key) == key || (key != null && key.equals(k))))
					break;
				p = e;
			}
		}
		// e != null means the key already exists: overwrite the old value with the new one
		if (e != null) { // existing mapping for key
			V oldValue = e.value;
			if (!onlyIfAbsent || oldValue == null)
				e.value = value;
			afterNodeAccess(e);
			return oldValue;
		}
	}
	// Structural modification: bump modCount for the fail-fast check
	++modCount;
	// Resize once size exceeds the threshold
	if (++size > threshold)
		resize();
	afterNodeInsertion(evict);
	return null;
}

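If the map previously contained a mapping for the key, the old value is replaced.
That is, when the key already exists, its old value is overwritten. A HashSet, by contrast, keeps the existing element when an equal one is added, and add returns false.

Summary: the put execution flow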
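
The flow can be condensed into a much smaller map. The following is a simplified sketch, not the JDK implementation: TinyChainedMap and all of its members are hypothetical names, the initial capacity of 4 is chosen only to make resizing easy to observe, and treeification is left out entirely.

```java
import java.util.Objects;

final class TinyChainedMap<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value) {
            this.hash = hash;
            this.key = key;
            this.value = value;
        }
    }

    @SuppressWarnings("unchecked")
    private Node<K, V>[] table = (Node<K, V>[]) new Node[4];
    private int size;

    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    /** Returns the previous value for the key, or null if the key was absent. */
    V put(K key, V value) {
        int hash = hash(key);
        int i = (table.length - 1) & hash;
        Node<K, V> p = table[i];
        if (p == null) {
            table[i] = new Node<>(hash, key, value);      // step 1: empty slot
        } else {
            while (true) {
                if (p.hash == hash && Objects.equals(p.key, key)) {
                    V old = p.value;                      // step 2a: key exists, overwrite
                    p.value = value;
                    return old;
                }
                if (p.next == null) {
                    p.next = new Node<>(hash, key, value); // step 2b: tail insertion
                    break;
                }
                p = p.next;
            }
        }
        if (++size > table.length * 0.75) {
            resize();                                     // step 3: grow past the threshold
        }
        return null;
    }

    V get(Object key) {
        int hash = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & hash]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    /** Doubles the table and re-links every node at the tail of its new bucket. */
    private void resize() {
        Node<K, V>[] old = table;
        @SuppressWarnings("unchecked")
        Node<K, V>[] bigger = (Node<K, V>[]) new Node[old.length * 2];
        for (Node<K, V> head : old) {
            for (Node<K, V> e = head; e != null; ) {
                Node<K, V> following = e.next;
                e.next = null;
                int i = (bigger.length - 1) & e.hash;
                if (bigger[i] == null) {
                    bigger[i] = e;
                } else {
                    Node<K, V> tail = bigger[i];
                    while (tail.next != null) tail = tail.next;
                    tail.next = e;
                }
                e = following;
            }
        }
        table = bigger;
    }

    int size() { return size; }

    int capacity() { return table.length; }

    public static void main(String[] args) {
        TinyChainedMap<String, Integer> map = new TinyChainedMap<>();
        System.out.println(map.put("a", 1)); // key absent
        System.out.println(map.put("a", 2)); // key present, old value returned
        System.out.println(map.get("a") + ", size=" + map.size());
    }
}
```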

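3.2.4 The resize() method

Initializes or doubles table size.  If null, allocates in accord with initial capacity target held in field threshold. Otherwise, because we are using power-of-two expansion, the elements from each bin must either stay at same index, or move with a power of two offset in the new table.

In other words, resize() either initializes the table or doubles it. If the table is null, it is allocated using the initial capacity described above. Because the capacity always doubles, every element either stays at its old index or moves to oldIndex + oldCapacity.

(After doubling, exactly one more bit of the hash takes part in the index; whether that bit is 0 or 1 decides which of the two positions the element gets.)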
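
The "stay or move by oldCap" rule can be verified numerically. In this sketch the class and method names are hypothetical; newIndex applies the (hash & oldCap) test that resize() uses, and the second method compares it with recomputing the index from scratch against the doubled table.

```java
final class ResizeSplitDemo {
    /** Index in the doubled table, derived only from the old index and the one extra hash bit. */
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    /** True if the shortcut above agrees with masking against the new capacity directly. */
    static boolean matchesDirectComputation(int hash, int oldCap) {
        return newIndex(hash, oldCap) == (hash & (2 * oldCap - 1));
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // 5  = 0b00101: bit 4 (value 16) is 0, so the node stays at index 5
        // 21 = 0b10101: bit 4 is 1, so the node moves from index 5 to 5 + 16
        System.out.println(newIndex(5, oldCap));
        System.out.println(newIndex(21, oldCap));
    }
}
```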

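resize() is long, so it is shown in two parts below; the code is contiguous.

The first part handles initialization and computes the new capacity and threshold: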

Node<K,V>[] oldTab = table;
// Read the old table's capacity and threshold
int oldCap = (oldTab == null) ? 0 : oldTab.length;
int oldThr = threshold;
int newCap, newThr = 0;
if (oldCap > 0) {
	// The old capacity has already reached the maximum: set the threshold to Integer.MAX_VALUE and stop resizing
	if (oldCap >= MAXIMUM_CAPACITY) {
		threshold = Integer.MAX_VALUE;
		return oldTab;
	}
	// Otherwise double the capacity if possible
	else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
			 oldCap >= DEFAULT_INITIAL_CAPACITY)
		newThr = oldThr << 1; // double threshold
}
else if (oldThr > 0) // initial capacity was placed in threshold
	newCap = oldThr;
// Lazy initialization with defaults takes the branch below
else {               // zero initial threshold signifies using defaults
	newCap = DEFAULT_INITIAL_CAPACITY;
	newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}
// newThr is still 0 if the capacity came from threshold, or if the doubling branch above did not set it; recompute the threshold
if (newThr == 0) {
	float ft = (float)newCap * loadFactor;
	newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
			  (int)ft : Integer.MAX_VALUE);
}
threshold = newThr;
@SuppressWarnings({"rawtypes","unchecked"})

The second part moves the elements into the new table:

// Allocate the new table and start moving elements
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;
if (oldTab != null) {
	// Walk the old array and move each bin
	for (int j = 0; j < oldCap; ++j) {
		Node<K,V> e;
		if ((e = oldTab[j]) != null) {
			oldTab[j] = null;
			if (e.next == null)
				// A bin with a single node: compute its index in the new table and move it directly
				newTab[e.hash & (newCap - 1)] = e;
			else if (e instanceof TreeNode)
				((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
			else { // preserve order: split the list between the two possible indices in the new array
				Node<K,V> loHead = null, loTail = null;
				Node<K,V> hiHead = null, hiTail = null;
				Node<K,V> next;
				// JDK 1.8 walks the list once and separates it into two sub-lists (extra bit 0 / 1)
				do {
					next = e.next;
					// The extra bit is 0: the node keeps its index
					if ((e.hash & oldCap) == 0) {
						if (loTail == null)
							loHead = e;
						else
							loTail.next = e;
						loTail = e;
					}
					// The extra bit is 1: new index = old index + old capacity
					else {
						if (hiTail == null)
							hiHead = e;
						else
							hiTail.next = e;
						hiTail = e;
					}
				} while ((e = next) != null);
				// Attach the two sub-lists to the new array
				if (loTail != null) {
					loTail.next = null;
					newTab[j] = loHead;
				}
				if (hiTail != null) {
					hiTail.next = null;
					newTab[j + oldCap] = hiHead;
				}
			}
		}
	}
}
return newTab;

I followed this author's post while reading the source:

HashMap夺命14问，你能坚持到第几问？_wenwenaier的博客-CSDN博客

3.2.5 remove()

final Node<K,V> removeNode(int hash, Object key, Object value,
						   boolean matchValue, boolean movable) {
	Node<K,V>[] tab; Node<K,V> p; int n, index;
	
	// If the table is non-empty, locate the head node p of the bin for this key
	if ((tab = table) != null && (n = tab.length) > 0 &&
		(p = tab[index = (n - 1) & hash]) != null) {
		Node<K,V> node = null, e; K k; V v;
		
		// The head node matches the key
		if (p.hash == hash &&
			((k = p.key) == key || (key != null && key.equals(k))))
			node = p;
		else if ((e = p.next) != null) {
			// Tree bin: look the node up in the tree
			if (p instanceof TreeNode)
				node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
			// Plain linked list: walk the nodes
			else {
				do {
					if (e.hash == hash &&
						((k = e.key) == key ||
						 (key != null && key.equals(k)))) {
						node = e;
						break;
					}
					p = e;
				} while ((e = e.next) != null);
			}
		}
		if (node != null && (!matchValue || (v = node.value) == value ||
							 (value != null && value.equals(v)))) {
			// Remove a tree node
			if (node instanceof TreeNode)
				((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
			// Remove the head node of the bin
			else if (node == p)
				tab[index] = node.next;
			// Remove a node from inside the list
			else
				p.next = node.next;
			++modCount;
			--size;
			afterNodeRemoval(node);
			return node;
		}
	}
	return null;
}

3.2.6 get()

final Node<K,V> getNode(int hash, Object key) {
	Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
	if ((tab = table) != null && (n = tab.length) > 0 &&
		(first = tab[(n - 1) & hash]) != null) {
		// Return immediately if the first node matches
		if (first.hash == hash && // always check first node
			((k = first.key) == key || (key != null && key.equals(k))))
			return first;
		// Otherwise search the tree if this is a tree bin, or walk the linked list
		if ((e = first.next) != null) {
			if (first instanceof TreeNode)
				return ((TreeNode<K,V>)first).getTreeNode(hash, key);
			do {
				if (e.hash == hash &&
					((k = e.key) == key || (key != null && key.equals(k))))
					return e;
			} while ((e = e.next) != null);
		}
	}
	return null;
}

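4. rehash in JDK 1.7

The rehash flag used during a JDK 1.7 resize is normally false; when it is true, the hashes are recomputed. Its value depends on how the capacity compares with the alternative-hashing threshold, which can be set (for example in IDEA's VM options) with -Djdk.map.althashing.threshold=xxx. Rehashing is triggered once the capacity is greater than or equal to that threshold, whose default is Integer.MAX_VALUE.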

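5. The fail-fast mechanism

Fail-fast is an error-detection mechanism in the Java collections. If a collection is structurally modified while it is being iterated, the iterator may fail fast by throwing ConcurrentModificationException. Fail-fast does not guarantee that an exception is thrown under unsynchronized modification; it only makes a best effort, so it should be used only to detect bugs.

HashMap has a modCount field that is incremented whenever put adds a new mapping or remove deletes one:

This field is used to make iterators on Collection-views of the HashMap fail-fast.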

 final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        // omitted
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

final Node<K,V> removeNode(int hash, Object key, Object value,
                               boolean matchValue, boolean movable) {
        Node<K,V>[] tab; Node<K,V> p; int n, index;
        // omitted

            if (node != null && (!matchValue || (v = node.value) == value ||
                                 (value != null && value.equals(v)))) {
                if (node instanceof TreeNode)
                    ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
                else if (node == p)
                    tab[index] = node.next;
                else
                    p.next = node.next;
                ++modCount;
                --size;
                afterNodeRemoval(node);
                return node;
            }
        }

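How it works:

Iteration calls next(), which calls nextNode(). If modCount no longer equals expectedModCount, the map was structurally modified during the traversal by a put() or remove() call, whether from another thread or from the iterating code itself. The iterator then throws ConcurrentModificationException and stops; this is the fail-fast mechanism.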

final Node<K,V> nextNode() {
       Node<K,V>[] t;
       Node<K,V> e = next;
       if (modCount != expectedModCount)
           throw new ConcurrentModificationException();
       if (e == null)
           throw new NoSuchElementException();
       if ((next = (current = e).next) == null && (t = table) != null) {
           do {} while (index < t.length && (next = t[index++]) == null);
       }
        return e;
}
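
The check in nextNode() can be triggered from a single thread. The sketch below (FailFastDemo and its method names are hypothetical) adds a new key during iteration, which is a structural modification, then overwrites an existing key's value, which is not, and finally removes entries through the iterator, which keeps expectedModCount in sync.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class FailFastDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    /** Adds a new key while iterating; returns true if the iterator failed fast. */
    static boolean structuralChangeThrows() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put("new-" + key, 0); // new mapping: modCount changes
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    /** Overwrites values of existing keys while iterating; returns true if an exception was thrown. */
    static boolean valueOverwriteThrows() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put(key, 42); // existing key: not a structural modification
            }
            return false;
        } catch (ConcurrentModificationException unexpected) {
            return true;
        }
    }

    /** Removes odd values through the iterator and returns the remaining size. */
    static int removeViaIterator() {
        Map<String, Integer> map = sample();
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 == 1) {
                it.remove(); // the iterator updates its own expectedModCount
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println("new key during iteration throws: " + structuralChangeThrows());
        System.out.println("value overwrite throws:          " + valueOverwriteThrows());
        System.out.println("size after Iterator.remove:      " + removeViaIterator());
    }
}
```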

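6. HashMap is not thread-safe in JDK 1.7

Suppose two threads share a HashMap and both trigger a resize, each moving nodes with a pair of pointers (e and next). Thread 2 is suspended and thread 1 completes its resize, but thread 2's pointers still refer to nodes that thread 1 has already relinked. When thread 2 resumes, it moves those nodes into its own new table, and because the transfer loop runs as while (e != null), following e.next can end up producing a circular linked list.

HashMap is not thread-safe in either 1.7 or 1.8. In 1.7, however, the transfer uses head insertion inside a while loop, so a context switch between threads can leave a cycle in the list and cause an infinite loop. 1.8 fixes this particular problem by preserving the original list order during resize.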
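
The interleaving can be replayed deterministically in a single thread. This is a simplified sketch under stated assumptions: all names are hypothetical, only one bucket with two nodes is modeled, both nodes are assumed to stay in the same bucket after the resize, and the "threads" are just two phases of straight-line code. It shows that head insertion reverses a bucket, and that a transfer resumed with stale e/next pointers leaves A and B pointing at each other.

```java
final class HeadInsertCycleDemo {
    static final class Node {
        final String key;
        Node next;

        Node(String key) { this.key = key; }
    }

    /** Builds keys[0] -> keys[1] -> ... -> null. */
    static Node chain(String... keys) {
        Node head = null, tail = null;
        for (String k : keys) {
            Node n = new Node(k);
            if (head == null) head = n; else tail.next = n;
            tail = n;
        }
        return head;
    }

    /** JDK 1.7-style transfer of one bucket: each node is pushed onto the head of the new bucket. */
    static Node transferHeadInsert(Node e) {
        Node newHead = null;
        while (e != null) {
            Node next = e.next;
            e.next = newHead;
            newHead = e;
            e = next;
        }
        return newHead;
    }

    /** Concatenates the keys of an acyclic list. */
    static String keys(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) sb.append(n.key);
        return sb.toString();
    }

    /** Floyd's tortoise-and-hare cycle check. */
    static boolean hasCycle(Node head) {
        Node slow = head, fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) return true;
        }
        return false;
    }

    static boolean interleavedTransferCreatesCycle() {
        Node a = chain("A", "B");
        // "Thread 2" reads e and next, then is suspended
        Node e = a;
        Node next = e.next; // B
        // "Thread 1" completes its transfer: the bucket is now B -> A
        transferHeadInsert(a);
        // "Thread 2" resumes with its stale pointers and its own empty new bucket
        Node newHead = null;
        while (e != null) {
            e.next = newHead;
            newHead = e;
            e = next;
            next = (e == null) ? null : e.next;
        }
        return hasCycle(newHead);
    }

    public static void main(String[] args) {
        System.out.println("head insertion reverses order: " + keys(transferHeadInsert(chain("A", "B", "C"))));
        System.out.println("interleaving creates a cycle:  " + interleavedTransferCreatesCycle());
    }
}
```

Any later get() or put() that walks such a bucket never reaches a null next pointer, which is the infinite loop described above.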