浅谈JDK1.8-HashMap底层原理

HashMap的底层原理

HashMap是一个用于存储Key-Value键值对的集合,每一个键值对也叫Entry,这些个Entry分散存储在一个数组当中,这个数组就是HashMap的主干
HashMap数组每一个元素的初始值都是NULL
就是有一个初始大小为16的空数组,在HashMap进行put的时候,通过哈希函数计算出key的哈希值,然后存储到数组的相应位置上。

在JDK1.8中,HashMap的实现是基于数组+链表/红黑树(链表元素超过8)

其初始大小是16,扩容时容量翻倍

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

红黑树的插入,删除和遍历的最坏时间复杂度都是log(n),因此,意外的情况或者恶意使用下导致hashCode()方法的返回值很差时,性能的下降将会是优雅的,但由于TreeNodes的大小是常规Nodes的两倍,所以只有桶中包含足够多的元素以供使用时,我们才会使用树,那么这个树为什么是8呢?
在官方文档中有一段描述:
Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The first values are:
0: 0.60653066
1: 0.30326533
2: 0.07581633
3: 0.01263606
4: 0.00157952
5: 0.00015795
6: 0.00001316
7: 0.00000094
8: 0.00000006
more: less than 1 in ten million
理想情况下,在随机哈希代码下,桶中的节点频率遵循泊松分布,文中给出了桶长度K的频率表。由频率表可以看出,桶的长度超过8的概率非常小,作者应该是根据概率统计而选择了8作为阈值。

HashMap的基本属性

基本属性

/**
 * The default initial capacity - MUST be a power of two.初始大小
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * The load factor used when none specified in constructor.负载因子
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

HashMap中的扩容是一项比较耗时的任务,如果能估算Map的容量,最好给它一个默认的初始值。

计算hash

/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower.  Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

HashMap的数据存储结构

HashMap采用Entry数组来存储key-value键值对,每一个键值对组成了一个Entry实体,Entry实体实际上是一个单向的链表结构,它具有Next指针,可以连接下一个Entry实体,以此来解决Hash冲突的问题。
数组的存储区间是连续的,占用内存严重,故空间复杂度很大,但数组的二分查找的时间复杂度小,数组的特点是:查询快,插入和删除困难(因为插入或者删除发生在数组中间的时候,需要移动此位置后的所有元素的位置)
链表特点是:寻址困难,插入和删除容易(搜索的时候总是从第一个开始,啥时候找到了啥时候停止,不管是双向链表还是单向链表,存储的除了数据之外就是指针,这个地方找不到,移动指针到下一个地方再找)
在这里插入图片描述
像这种数据结构,就是HashMap的数据结构了,经过定义的Hash算法之后,将相应的hash值算出来的数据存储到相应的位置。
就跟大家小时候用过的新华字典是一个道理的东西,”王“,”旺“,”望“三个字用拼音查都是wang,这种汉字与拼音之间的关系就相当于hash函数,要想在字典详情页找到”王“,就得先找到wang所在的页数,然后所有的以wang为拼音的字都在那一块儿,我们在一个个找出自己需要的(链表)。
那么这个负载因子是怎么回事呢,就是说这种数组+链表的数据结构初始的数组长度是16的空数组,如果这个空数组有一部分满了,在符合扩容条件下的时候,就会进行扩容

/**
 * The next size value at which to resize (capacity * load factor).
 *
 * @serial
 */
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;

threshold就是要调整的容量大小的下一个大小值(容量*负载因子),如果大小达到这个程度,就会调用扩容方法

扩容方法

/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

在hashMap中,数组中存储的元素总是最后插入的元素(设计者考虑的是最后插入的元素使用频率高)

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

putVal方法

/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

最后:HashMap不是线程安全的,如果想达到线程安全的作用,则要使用Hashtable,Hashtable的实现方法里面都添加了synchronized关键字来确保线程同步,因此相对而言HashMap性能会高一些,在多线程环境下如果使用HashMap则需要Collections.synchronizedMap()方法来获取一个线程安全的集合,这个方法其实就是帮我们在操作HashMap时自动添加了synchronized来实现线程同步。

public static void main(String[] args) {
    Map<String, Object> map = new HashMap<>();
    Map<String, Object> map1 = Collections.synchronizedMap(map);
}

类似于这种,调用工具类的方法扔进去一个map,在方法里面搓了一顿还是返回一个map,不过返回的这个map是线程安全的集合。

这些总结,以前都是写在某笔记上的,最近想的是不能老是一个人闭门造车,因此拿出来贴在博客上,所以如果大家看到了,发现有什么问题或者错误的地方,欢迎指正。

  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值