A Brief Look at the Internals of HashMap in JDK 1.8

着凉的皮皮虾

Article link: https://blog.csdn.net/weixin_37641413/article/details/97618705

How HashMap works under the hood

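HashMap is a collection that stores key-value pairs. Each pair is an entry (a Node implementing Map.Entry in JDK 1.8), and the entries are spread across an array, which is the backbone of the HashMap.
Every slot of that array starts out as null.
By default the array has 16 slots, and it is allocated lazily on the first put. When put is called, HashMap computes a hash from the key and stores the entry at the corresponding array position.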

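In JDK 1.8, HashMap is implemented as an array plus linked lists, and a list is converted to a red-black tree once it grows beyond 8 nodes (TREEIFY_THRESHOLD), provided the table has at least 64 slots; a smaller table is resized instead.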

The default initial capacity is 16, and each resize doubles the capacity.

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

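Insertion, deletion, and lookup in a red-black tree all take O(log n) time in the worst case, so when hashCode() returns poorly distributed values, whether by accident or through malicious use, performance degrades gracefully. However, a TreeNode is about twice the size of a regular Node, so a bin is only converted to a tree once it holds enough nodes to justify the cost. Why is that threshold 8?
The implementation notes in the HashMap source code explain: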
Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The first values are:
0: 0.60653066
1: 0.30326533
2: 0.07581633
3: 0.01263606
4: 0.00157952
5: 0.00015795
6: 0.00001316
7: 0.00000094
8: 0.00000006
more: less than 1 in ten million
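Ideally, with random hash codes, the number of nodes per bin follows a Poisson distribution, and the comment lists the expected frequency of each bin length k. The table shows that a bin reaches 8 nodes with a probability of roughly 6 in 100 million, so the authors presumably chose 8 as the threshold on statistical grounds.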
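The frequency table can be reproduced directly from the formula in the comment. The following is a minimal sketch, not JDK code; the class name PoissonBins and the method probability are made up for illustration, and the parameter 0.5 is taken from the quoted comment.

```java
import java.util.Locale;

public class PoissonBins {
    // Poisson probability: P(k) = exp(-lambda) * lambda^k / k!
    static double probability(double lambda, int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        // lambda = 0.5 is the average quoted in the HashMap implementation notes
        for (int k = 0; k <= 8; k++) {
            System.out.println(String.format(Locale.ROOT, "%d: %.8f", k, probability(0.5, k)));
        }
    }
}
```

Running it should print values that line up with the table above, which is a quick way to convince yourself that the table is nothing more than the Poisson formula evaluated at k = 0 to 8.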

Basic properties of HashMap

Key constants

/**
 * The default initial capacity - MUST be a power of two.
 * (This is the initial table size.)
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * The load factor used when none specified in constructor.
 * (This is the load factor.)
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

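Resizing a HashMap is relatively expensive. If you can estimate how many entries the map will hold, pass a suitable initial capacity to the constructor instead of relying on the default.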
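As a rough illustration of presizing, the sketch below derives an initial capacity from an expected entry count. The helper capacityFor and the class name PresizedMap are hypothetical, and the calculation assumes the default load factor of 0.75; HashMap itself then rounds the requested capacity up to the next power of two.

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMap {
    // Smallest requested capacity whose threshold (capacity * 0.75)
    // is at least `expected`, so filling the map triggers no resize.
    static int capacityFor(int expected) {
        return (int) Math.ceil(expected / 0.75);
    }

    public static void main(String[] args) {
        int expected = 1000;
        Map<String, Integer> map = new HashMap<>(capacityFor(expected));
        for (int i = 0; i < expected; i++) {
            map.put("key" + i, i);
        }
        System.out.println("requested capacity: " + capacityFor(expected));
        System.out.println("entries stored: " + map.size());
    }
}
```

Passing the expected count itself as the capacity is a common mistake: a map created with new HashMap<>(16) resizes when the 13th entry arrives, not the 17th.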

Computing the hash

/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower.  Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
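To see why the high bits are folded into the low bits, consider hash codes that differ only above bit 16. The sketch below copies the XOR-shift expression from hash() and the index expression (n - 1) & hash from putVal; the class name HashSpreadDemo and the sample values are chosen purely for illustration.

```java
public class HashSpreadDemo {
    // Same transform as HashMap.hash(), applied to a raw hashCode
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n must be a power of two)
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int[] rawHashes = {0x10000, 0x20000, 0x30000};
        for (int h : rawHashes) {
            System.out.println("hashCode 0x" + Integer.toHexString(h)
                    + " -> index without spreading: " + index(h, n)
                    + ", with spreading: " + index(spread(h), n));
        }
    }
}
```

Without spreading, all three sample values fall into slot 0 of a 16-slot table, because the mask only looks at the lowest four bits. With spreading they land in different slots.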

HashMap's storage structure

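HashMap stores its key-value pairs in a Node array (an Entry array before JDK 1.8). Each pair forms one node, and a node is really an element of a singly linked list: its next pointer links to the following node in the same slot, which is how hash collisions are resolved.
An array occupies one contiguous block of memory, so it must be allocated as a whole, but accessing an element by index takes constant time. Arrays are fast to read and slow to insert into or delete from, because inserting or deleting in the middle means shifting every element after that position.
A linked list is the opposite: lookup is slow, insertion and deletion are easy. A search always starts at the first node and stops when it finds a match; whether the list is singly or doubly linked, each node holds only data and pointers, so if the current node does not match, the search follows the pointer to the next one.
HashMap combines the two: an array in which each slot holds a linked list. The hash function determines the slot, and the entry is stored there.
It works like the Xinhua Dictionary many of us used as children. The characters "王", "旺", and "望" are all looked up under the pinyin wang, so the mapping from character to pinyin plays the role of the hash function. To find "王", you first find the page for wang, where all characters pronounced wang sit together, and then go through them one by one until you find the one you want (the linked list).
What about the load factor? The array starts with a length of 16, and as it fills up, HashMap resizes it once the number of entries exceeds a threshold derived from the load factor.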

/**
 * The next size value at which to resize (capacity * load factor).
 *
 * @serial
 */
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;

threshold is the size at which the next resize happens (capacity * load factor). Once the number of entries exceeds it, the resize method is called.
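The arithmetic is easy to check by hand. The sketch below only mirrors the capacity * load factor formula from the comment; it does not inspect a real HashMap, and the names ThresholdDemo and threshold are illustrative.

```java
public class ThresholdDemo {
    // Mirrors the formula in the field comment: capacity * load factor
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        float loadFactor = 0.75f; // DEFAULT_LOAD_FACTOR
        // Capacity doubles on every resize, and the threshold follows it
        for (int capacity = 16; capacity <= 128; capacity <<= 1) {
            System.out.println("capacity " + capacity
                    + " -> resize once size exceeds " + threshold(capacity, loadFactor));
        }
    }
}
```

With the defaults, a 16-slot table has a threshold of 12, so the insertion that brings the size to 13 is the one that triggers the first resize.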

The resize method

/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
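The "preserve order" branch relies on one bit test: (e.hash & oldCap) decides whether a node stays at index j or moves to j + oldCap. The sketch below isolates that rule and checks it against the plain index computation for the doubled table. The class ResizeSplitDemo and the method newIndex are illustrative names, and the sample hashes are arbitrary.

```java
public class ResizeSplitDemo {
    // Index after doubling, computed the way resize() does it:
    // same index if the oldCap bit is 0, otherwise index + oldCap.
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap << 1;
        int[] hashes = {5, 21, 37, 53}; // all share index 5 in a 16-slot table
        for (int hash : hashes) {
            int direct = hash & (newCap - 1); // index computed from scratch
            System.out.println("hash " + hash
                    + ": old index " + (hash & (oldCap - 1))
                    + ", new index " + newIndex(hash, oldCap)
                    + ", direct computation " + direct);
        }
    }
}
```

Because the new mask has exactly one more bit than the old one, a node can only stay put or move by oldCap, so resize() never needs to recompute the full index while splitting a bin.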

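In JDK 1.7, a new entry was inserted at the head of the bucket's list, so the array slot always held the most recently inserted entry (the reasoning being that recent entries are accessed more often). JDK 1.8 changed this: as putVal below shows, a new node is appended to the tail of the list (p.next = newNode(...)).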

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

The putVal method

/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
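The return value and the onlyIfAbsent flag are easiest to understand from the caller's side. The sketch below uses only the public put, putIfAbsent, and get methods; the class PutDemo and the method putResults are made-up names for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PutDemo {
    // Collects what each call returns so the behaviour can be inspected
    static List<Integer> putResults() {
        Map<String, Integer> map = new HashMap<>();
        List<Integer> results = new ArrayList<>();
        results.add(map.put("a", 1));         // no previous mapping
        results.add(map.put("a", 2));         // replaces the value, returns the old one
        results.add(map.putIfAbsent("a", 3)); // onlyIfAbsent = true: value is left alone
        results.add(map.get("a"));            // value currently stored
        return results;
    }

    public static void main(String[] args) {
        System.out.println(putResults());
    }
}
```

The first call takes the "++size > threshold" path at the end of putVal, while the second and third hit the "existing mapping for key" branch and return early without changing size or modCount.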

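Finally: HashMap is not thread-safe. One thread-safe alternative is Hashtable, whose methods are declared with the synchronized keyword; because of that locking, HashMap is the faster of the two. To use a HashMap from multiple threads, wrap it with Collections.synchronizedMap(), which returns a wrapper that performs every operation on the underlying HashMap inside a synchronized block.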

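import java.util.*;

public class SynchronizedMapExample {
    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        // map1 is a synchronized view backed by map
        Map<String, Object> map1 = Collections.synchronizedMap(map);
    }
}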

In other words, you hand a map to the utility method, it wraps the map, and it hands back another map; the returned map is the thread-safe one.
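One caveat is worth a sketch: the wrapper locks individual calls such as put and get, but iterating over its views still requires the caller to synchronize on the returned map, as the Collections.synchronizedMap documentation states. The class SyncMapIteration and the method fillConcurrently below are hypothetical names, and the thread and entry counts are arbitrary.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SyncMapIteration {
    // Several threads insert disjoint keys, then the entries are counted
    static int fillConcurrently(int threads, int perThread) {
        Map<Integer, Integer> map = Collections.synchronizedMap(new HashMap<>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, i); // each put is locked by the wrapper
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        int count = 0;
        // Iteration is not locked by the wrapper: hold the map's monitor manually
        synchronized (map) {
            for (Integer ignored : map.keySet()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("entries: " + fillConcurrently(4, 1000));
    }
}
```

Replacing the wrapper with a bare HashMap in this sketch removes the guarantee that every insertion survives, which is the practical meaning of "not thread-safe".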

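These notes used to live in a private notebook. I recently decided I should stop working in isolation, so I am posting them on my blog. If you read this and spot any problems or mistakes, corrections are welcome.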
