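How HashMap Works Under the Hood
HashMap is a collection that stores key-value pairs. Each pair is also called an Entry, and the entries are spread across an array, which is the backbone of HashMap.
Every slot of that array starts out as null.
In other words, there is an empty array with a default size of 16 (in JDK 1.8 it is allocated lazily, on the first put). When put is called, a hash function computes the key's hash value, and the entry is stored in the matching slot of the array.
In JDK 1.8, HashMap is implemented as an array plus linked lists, and a list is converted to a red-black tree once it grows beyond 8 nodes (provided the table has at least 64 slots; a smaller table is resized instead).
The default initial capacity is 16, and each resize doubles the capacity.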
/**
* The default initial capacity - MUST be a power of two.
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
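Insertion, deletion and lookup in a red-black tree all take O(log n) time in the worst case. So when hashCode() returns poorly distributed values, whether by accident or through malicious use, performance degrades gracefully. However, a TreeNode is about twice the size of a regular Node, so a tree is only used once a bucket holds enough nodes to make it worthwhile. So why is that threshold 8?
The implementation notes in the HashMap source explain it: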
Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The first values are:
0: 0.60653066
1: 0.30326533
2: 0.07581633
3: 0.01263606
4: 0.00157952
5: 0.00015795
6: 0.00001316
7: 0.00000094
8: 0.00000006
more: less than 1 in ten million
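Ideally, with random hash codes, the number of nodes per bucket follows a Poisson distribution, and the notes list the probability of each bucket length k. The table shows that a bucket reaching length 8 is extremely unlikely (about 6 in 100 million), so the authors most likely chose 8 as the threshold on the basis of these statistics.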
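The table quoted above can be reproduced directly from the formula in the notes. The following is a small sketch, not JDK code; the class and method names are made up for illustration.

```java
/** Illustrative helper (hypothetical name): recomputes the bin-size table from the notes. */
class BinSizeProbability {
    /** Poisson probability that a bin holds exactly k nodes, with mean 0.5. */
    static double probability(int k) {
        double lambda = 0.5;
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        // Prints one line per bin size, in the same "k: probability" layout as the notes.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, probability(k));
        }
    }
}
```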
Basic properties of HashMap
/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;
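Resizing a HashMap is relatively expensive, so if you can estimate how many entries the map will hold, it is best to pass a suitable initial capacity to the constructor.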
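One way to apply this advice is to derive the initial capacity from the expected number of entries and the default load factor of 0.75. This is a sketch under that assumption; `PresizedMap`, `capacityFor` and `newMap` are hypothetical names, not part of the JDK.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative helper (hypothetical name) for pre-sizing a HashMap. */
class PresizedMap {
    /** Smallest capacity whose threshold (capacity * 0.75) is at least `expected`. */
    static int capacityFor(int expected) {
        return (int) Math.ceil(expected / 0.75);
    }

    /** Creates a HashMap that can take `expected` entries without growing its table. */
    static <K, V> Map<K, V> newMap(int expected) {
        return new HashMap<>(capacityFor(expected));
    }

    public static void main(String[] args) {
        Map<String, Integer> map = newMap(100);
        for (int i = 0; i < 100; i++) {
            map.put("key" + i, i);
        }
        System.out.println(capacityFor(100) + " " + map.size());
    }
}
```

HashMap rounds the requested capacity up to the next power of two, so asking for 134 yields a table of 256 slots.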
Computing the hash
/**
* Computes key.hashCode() and spreads (XORs) higher bits of hash
* to lower. Because the table uses power-of-two masking, sets of
* hashes that vary only in bits above the current mask will
* always collide. (Among known examples are sets of Float keys
* holding consecutive whole numbers in small tables.) So we
* apply a transform that spreads the impact of higher bits
* downward. There is a tradeoff between speed, utility, and
* quality of bit-spreading. Because many common sets of hashes
* are already reasonably distributed (so don't benefit from
* spreading), and because we use trees to handle large sets of
* collisions in bins, we just XOR some shifted bits in the
* cheapest possible way to reduce systematic lossage, as well as
* to incorporate impact of the highest bits that would otherwise
* never be used in index calculations because of table bounds.
*/
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
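To see why the upper bits are folded into the lower ones, it helps to compare bucket indexes with and without the spreading step. The sketch below copies the XOR step from hash() and the `(n - 1) & hash` index expression used in putVal; the class and method names are illustrative only.

```java
/** Illustrative demo (hypothetical name) of the spreading step and index masking. */
class HashSpreadDemo {
    /** Same transform as hash(): XOR the high 16 bits into the low 16 bits. */
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    /** Bucket index for a table whose length n is a power of two. */
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int a = 0x10000;
        int b = 0x20000; // differs from a only above the low 16 bits
        // Without spreading, both land in bucket 0 of a 16-slot table.
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));
        // With spreading, the high bits influence the index and the keys separate.
        System.out.println(indexFor(spread(a), 16) + " " + indexFor(spread(b), 16));
    }
}
```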
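HashMap's storage structure
HashMap stores its key-value pairs in an array of entries (in JDK 1.8 the element type is Node, which implements Map.Entry). Each pair forms one entry, and an entry is in fact a node of a singly linked list: it has a next pointer to the following entry, which is how hash collisions are resolved.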
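An array occupies one contiguous block of memory. Access by index is fast (and a sorted array can be binary-searched), but inserting and deleting are awkward: when either happens in the middle of the array, every element after that position has to be shifted.
A linked list is the opposite: locating an element is slow, but inserting and deleting are easy. A search always starts at the first node and stops as soon as the target is found. Whether the list is singly or doubly linked, each node holds only its data and pointers, so if the target is not at the current node you follow the pointer to the next one and look again.
HashMap combines these two structures: the defined hash algorithm computes a hash value for each key, and the entry is stored at the array position that the hash maps to.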
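It works just like the Xinhua Dictionary many of us used as children. The characters "王", "旺" and "望" are all looked up under the pinyin wang, so the relation between a character and its pinyin plays the role of the hash function. To find "王" in the body of the dictionary, you first find the page where wang starts; every character pronounced wang sits in that section, and you then go through them one by one until you find the one you need (the linked list).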
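The array-plus-linked-list idea can be sketched in a few dozen lines. This is a teaching sketch, not the JDK implementation: `ChainedMap` is a hypothetical class with a fixed 16-slot table, no resizing and no tree bins, and it prepends new nodes purely for brevity.

```java
import java.util.Objects;

/** Hypothetical teaching sketch of separate chaining; not the JDK implementation. */
class ChainedMap<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Fixed table of 16 buckets; this sketch never resizes.
    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    private int indexFor(Object key) {
        int h = (key == null) ? 0 : key.hashCode();
        return (table.length - 1) & (h ^ (h >>> 16));
    }

    /** Stores the pair and returns the previous value, or null if the key was absent. */
    V put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        // Key not found in the bucket: link a new node in front of the existing chain.
        table[i] = new Node<>(key, value, table[i]);
        return null;
    }

    /** Walks the bucket's chain and returns the value for the key, or null. */
    V get(Object key) {
        for (Node<K, V> n = table[indexFor(key)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedMap<String, Integer> map = new ChainedMap<>();
        // "Aa" and "BB" have the same hashCode, so they share one bucket.
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.get("Aa") + " " + map.get("BB"));
    }
}
```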
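So what is the load factor about? This array-plus-linked-list structure starts from an empty array of length 16. Once a certain fraction of it has been filled and the resize condition is met, the table is expanded.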
/**
* The next size value at which to resize (capacity * load factor).
*
* @serial
*/
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;
threshold is the size at which the next resize happens (capacity * load factor). Once the number of entries exceeds it, the resize method is called.
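A quick calculation makes the numbers concrete. The sketch below only reproduces the capacity * load factor arithmetic; the class and method names are illustrative and not taken from the JDK.

```java
/** Illustrative arithmetic (hypothetical name) for the resize threshold. */
class ThresholdDemo {
    /** Threshold for a given capacity and load factor, truncated to an int. */
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Capacity doubles on each resize, and the threshold doubles with it.
        for (int capacity = 16; capacity <= 128; capacity <<= 1) {
            System.out.println(capacity + " -> " + threshold(capacity, 0.75f));
        }
    }
}
```

With the defaults the threshold is 12, so the table is resized while inserting the 13th entry.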
The resize method
/**
* Initializes or doubles table size. If null, allocates in
* accord with initial capacity target held in field threshold.
* Otherwise, because we are using power-of-two expansion, the
* elements from each bin must either stay at same index, or move
* with a power of two offset in the new table.
*
* @return the table
*/
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else { // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
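The lo/hi split in resize() relies on one bit of the hash: because the capacity doubles, a node either stays at index j or moves to j + oldCap, depending on whether `hash & oldCap` is zero. The following sketch checks that rule against the plain `hash & (newCap - 1)` computation; the class and method names are illustrative.

```java
/** Illustrative check (hypothetical name) of the "stay or move by oldCap" rule. */
class ResizeSplitDemo {
    /** New bucket index after doubling, computed the way resize() splits a bin. */
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // All four hashes share bucket 5 in a 16-slot table.
        int[] hashes = {5, 21, 37, 53};
        for (int h : hashes) {
            System.out.println(h + ": " + (h & (oldCap - 1)) + " -> " + newIndex(h, oldCap));
        }
    }
}
```

Since only that one bit is tested, no hash has to be recomputed during a resize, and the relative order of nodes within each half is preserved.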
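In JDK 1.7, a new entry that collided with existing ones was inserted at the head of the bucket's list, so the array slot always held the most recently inserted entry (the reasoning being that recently inserted entries are used more often). JDK 1.8 appends the new node to the tail of the list instead, as the putVal method further down shows.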
/**
* Associates the specified value with the specified key in this map.
* If the map previously contained a mapping for the key, the old
* value is replaced.
*
* @param key key with which the specified value is to be associated
* @param value value to be associated with the specified key
* @return the previous value associated with <tt>key</tt>, or
* <tt>null</tt> if there was no mapping for <tt>key</tt>.
* (A <tt>null</tt> return can also indicate that the map
* previously associated <tt>null</tt> with <tt>key</tt>.)
*/
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
The putVal method
/**
* Implements Map.put and related methods
*
* @param hash hash for key
* @param key the key
* @param value the value to put
* @param onlyIfAbsent if true, don't change existing value
* @param evict if false, the table is in creation mode.
* @return previous value, or null if none
*/
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
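The return value and the onlyIfAbsent flag are easiest to see from the caller's side. The sketch below uses only the public put, putIfAbsent and get methods of java.util.HashMap; the class name and the `results` helper are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative demo (hypothetical name) of what put and putIfAbsent return. */
class PutSemanticsDemo {
    /** Collects the return values of a short sequence of map operations. */
    static List<Object> results() {
        Map<String, Integer> map = new HashMap<>();
        List<Object> out = new ArrayList<>();
        out.add(map.put("a", 1));         // no previous mapping: null
        out.add(map.put("a", 2));         // replaces the value, returns the old one
        out.add(map.putIfAbsent("a", 3)); // key present: value kept, current value returned
        out.add(map.get("a"));            // still the value from the second put
        return out;
    }

    public static void main(String[] args) {
        System.out.println(results());
    }
}
```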
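Finally: HashMap is not thread-safe. One way to get thread safety is Hashtable, whose methods are all declared synchronized to keep threads in step, which is why HashMap is comparatively faster. To use a HashMap from multiple threads, wrap it with Collections.synchronizedMap() to obtain a thread-safe collection; the wrapper simply synchronizes every operation on the underlying HashMap for us. (ConcurrentHashMap is another standard option for concurrent use.)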
import java.util.*;
public class SynchronizedMapExample {
    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        Map<String, Object> map1 = Collections.synchronizedMap(map);
    }
}
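As shown, you pass a map to the utility method, it wraps the map internally, and what comes back is still a map, except that the returned one is a thread-safe collection.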
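A small experiment illustrates what the wrapper buys you. This is a sketch, not a benchmark: `SynchronizedMapDemo` and `fillConcurrently` are hypothetical names, and the thread and key counts are arbitrary.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Illustrative demo (hypothetical name): several threads writing to one shared map. */
class SynchronizedMapDemo {
    /** Starts `threads` writers, each adding `perThread` distinct keys; returns the final size. */
    static int fillConcurrently(Map<Integer, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread; // each thread gets its own key range
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        Map<Integer, Integer> safe = Collections.synchronizedMap(new HashMap<>());
        // With the synchronized wrapper, no insert is lost.
        System.out.println(fillConcurrently(safe, 4, 10_000));
    }
}
```

All access has to go through the returned wrapper, not the original map, and iterating over the wrapper still requires synchronizing on it manually.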