HashMap

Latest recommended article published on 2022-03-09 15:20:28

十万大山深处

Category: java. Tags: java, hashmap

Article link: https://blog.csdn.net/shanhaikeping/article/details/116737890

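Background knowledge

Hash: hashing maps an input of arbitrary length to an output of fixed length according to some rule. That mapping rule is the hash algorithm.

Characteristics of a good hash algorithm:

The original data cannot be derived from the hash value;
A small change in the input produces a different hash value, while identical inputs always produce the same hash value;
It is efficient to compute;
The probability of collisions is low.

hashCode(): a method of the Object class. The official documentation defines it as follows: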

Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by HashMap.

The general contract of hashCode is:

Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)

Returns:

a hash code value for this object.

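In plain language:

hashCode returns a hash code value for an object. Hash tables such as HashMap rely on this value.

The hashCode contract has three rules:

Within a single run of a Java application, the same object must always return the same hashCode, as long as the information used by equals is not modified;
If two objects are equal according to equals, they must return the same hashCode;
If two objects are not equal, their hashCodes are not required to differ; that is, two different objects may have the same hashCode.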
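The three rules can be illustrated with a small sketch. The `Point` class and the demo class below are hypothetical names made up for this example; the only facts relied on are that `Objects.hash` combines the given fields and that the strings "Aa" and "BB" happen to share a hash code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class HashCodeContractDemo {
    public static void main(String[] args) {
        Map<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "a");
        // A distinct but equal instance finds the same entry (rule 2).
        System.out.println(map.get(new Point(1, 2)));           // a
        // Unequal objects may still share a hash code (rule 3).
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
    }
}

// Hypothetical value class used only for illustration.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point other = (Point) o;
        return x == other.x && y == other.y;
    }

    // Uses exactly the fields that equals() compares, so equal points
    // always produce the same hash code.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```

If `hashCode` were left at the `Object` default while `equals` was overridden, the lookup with a second `Point(1, 2)` instance would most likely miss, because the two instances would usually land in different buckets.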

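Data structure

Array + linked list + red-black tree

Array: a hash computation determines the array index (bucket) where the current key is stored;
Linked list: when keys collide on the same array index, the entries are stored as nodes of a linked list at that position;
Red-black tree: when a linked list grows too long, lookups slow down, so the list nodes are converted to a red-black tree to speed them up.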
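To make the array-plus-linked-list part concrete, here is a toy sketch. `TinyChainedMap` is a made-up class, not the JDK implementation: it has a fixed table of 16 buckets, never resizes, never converts a list to a tree, and prepends new nodes where the JDK appends them.

```java
import java.util.Objects;

// Toy illustration only: fixed-size table with chaining, no resizing, no trees.
class TinyChainedMap<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Returns the previous value for the key, or null if there was none.
    V put(K key, V value) {
        int h = hash(key);
        int i = (table.length - 1) & h;
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // Collision or empty bucket: prepend a new node to the chain.
        table[i] = new Node<>(h, key, value, table[i]);
        return null;
    }

    V get(Object key) {
        int h = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & h]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    public static void main(String[] args) {
        TinyChainedMap<String, Integer> m = new TinyChainedMap<>();
        m.put("Aa", 1);
        m.put("BB", 2); // same hashCode as "Aa", so both share one bucket
        System.out.println(m.get("Aa") + " " + m.get("BB")); // 1 2
    }
}
```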

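Related methods

Initialization – HashMap()

    public HashMap() {
        // Set the load factor to its default value; all other fields keep their defaults
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

As the constructor shows, creating a HashMap object does not allocate the array. The array is created on the first insertion (see the put method), which avoids wasting space on maps that are never filled.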

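Computing the bucket position – hash()

For a given key, insertion, lookup and deletion all need to compute the bucket index. The straightforward way is to take the key's hash value modulo the array length n, i.e. i = hash % n. The modulo operation is relatively slow, however. When n is a power of two, it is equivalent (for non-negative hash values) to i = (n - 1) & hash, which is cheaper and never yields a negative index.

When the array length is 16, the computation looks like this: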

  11001100 11001100 11001100 11001100
& 00000000 00000000 00000000 00001111
--------------------------------------
  00000000 00000000 00000000 00001100

When the array length is not a power of two, for example 10, the computation becomes:

  11001100 11001100 11001100 11001100
& 00000000 00000000 00000000 00001001
--------------------------------------
  00000000 00000000 00000000 00001000

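An index can still be computed, but any index with a 1 in bit 1 or bit 2 (indices 2 through 7) can never be produced; only 0, 1, 8 and 9 are reachable. In other words, when the array length is not a power of two, the keys are concentrated in a few buckets, which defeats the goal of distributing entries evenly. When the length is a power of two, the low bits of length - 1 are all 1, so the index is simply the low bits of the hash.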
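The two calculations above can be checked with a short sketch. `IndexDemo`, `indexFor` and `reachable` are names invented for this example; `Math.floorMod` is used as the reference for a non-negative remainder.

```java
import java.util.TreeSet;

class IndexDemo {
    // Same masking expression HashMap uses: (n - 1) & hash.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    // Collects every index the mask can produce for hashes 0..1023.
    static TreeSet<Integer> reachable(int n) {
        TreeSet<Integer> seen = new TreeSet<>();
        for (int h = 0; h < 1024; h++) {
            seen.add(indexFor(h, n));
        }
        return seen;
    }

    public static void main(String[] args) {
        int hash = 0xCCCCCCCC; // the bit pattern used in the text (negative as an int)
        System.out.println(indexFor(hash, 16));      // 12
        System.out.println(Math.floorMod(hash, 16)); // 12, same as the mask
        System.out.println(hash % 16);               // -4, plain % can go negative
        System.out.println(reachable(16).size());    // 16, every bucket is usable
        System.out.println(reachable(10));           // [0, 1, 8, 9]
    }
}
```

With a length of 10, six of the ten buckets stay empty no matter how good the hash codes are, which is why HashMap keeps its capacity at a power of two.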

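Making the array length a power of two improves the distribution to some extent, but because the index only uses the low bits of the hashCode, keys whose hashCodes differ only in the higher bits would still collide heavily. HashMap therefore provides the hash function to improve the distribution further.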

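    static final int hash(Object key) {
        int h;
        // Get the key's hashCode,
        // then XOR its high 16 bits into its low 16 bits
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

  11001010 01011100 10101011 10101100
^ 00000000 00000000 11001010 01011100    // shifted right by 16 bits
---------------------------------------
  11001010 01011100 01100001 11110000    // original high 16 bits + mixed low 16 bits

XORing the high half into the low half mixes the high and low bits of the hashCode, so the high bits also influence the bucket index and the distribution becomes more random.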
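A small sketch shows the effect. `SpreadDemo` and `spread` are names chosen for this example; `spread` repeats the XOR-shift expression from hash() on a raw int. The two sample values are assumptions picked so that they differ only in their high 16 bits.

```java
class SpreadDemo {
    // Same mixing step as HashMap.hash(), applied to a raw int.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;
        int a = 0x00010000;
        int b = 0x00020000; // differs from a only in the high 16 bits

        // Without mixing, both hash codes fall into bucket 0.
        System.out.println(((n - 1) & a) + " " + ((n - 1) & b));                 // 0 0
        // With mixing, the high bits reach the index and the keys separate.
        System.out.println(((n - 1) & spread(a)) + " " + ((n - 1) & spread(b))); // 1 2

        // The worked example from the text: 0xCA5CABAC becomes 0xCA5C61F0.
        System.out.println(Integer.toHexString(spread(0xCA5CABAC)));             // ca5c61f0
    }
}
```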

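Inserting data – putVal()

HashMap is a container for key-value pairs; putVal() performs the insertion. The source code is as follows:

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; // the bucket array
        Node<K,V> p;     // the node at the target bucket
        int n, i;        // n is the array length, i is the bucket index
        // If the array has not been created yet (or is empty), create it
        if ((tab = table) == null || (n = tab.length) == 0)
            // resize() allocates the array (length 16 by default)
            n = (tab = resize()).length;
        // Compute the bucket index for the key; if that bucket is empty,
        if ((p = tab[i = (n - 1) & hash]) == null)
            // create a new node and place it there
            tab[i] = newNode(hash, key, value, null);
        // Otherwise the bucket is occupied (same key or a hash collision)
        else {
            Node<K,V> e; K k;
            // Does the first node in the bucket have the same key as the one being inserted?
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                // Remember it; this is an update of an existing mapping
                e = p;
            // Is the bucket a red-black tree?
            else if (p instanceof TreeNode)
                // Insert into the red-black tree (or find the existing key in it)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            // Otherwise the bucket is a linked list
            else {
                // Walk the list
                for (int binCount = 0; ; ++binCount) {
                    // Reached the tail of the list
                    if ((e = p.next) == null) {
                        // Create a new node and append it at the tail
                        p.next = newNode(hash, key, value, null);
                        // Has the list reached the treeify threshold?
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // Found a node with an equal key: this is an update
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // The key already exists: update its value
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    // Overwrite the old value
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        // Has the size exceeded the resize threshold?
        if (++size > threshold)
            // Grow the array
            resize();
        afterNodeInsertion(evict);
        return null;
    }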
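The return value and the onlyIfAbsent flag are visible through the public API: in the JDK, put() calls putVal with onlyIfAbsent = false and putIfAbsent() calls it with onlyIfAbsent = true. The sketch below exercises both paths; `PutDemo` and `demo` are names made up for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PutDemo {
    // Returns the values observed at each step, in order.
    static List<Integer> demo() {
        Map<String, Integer> map = new HashMap<>();
        List<Integer> seen = new ArrayList<>();
        seen.add(map.put("k", 1));         // null: no previous mapping
        seen.add(map.put("k", 2));         // 1: old value returned, then overwritten
        seen.add(map.putIfAbsent("k", 3)); // 2: key present, value left unchanged
        seen.add(map.get("k"));            // 2
        map.put("n", null);
        seen.add(map.putIfAbsent("n", 9)); // null: old value was null, so 9 is stored
        seen.add(map.get("n"));            // 9
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [null, 1, 2, 2, null, 9]
    }
}
```

The last two steps correspond to the `oldValue == null` check in putVal: an existing mapping to null is overwritten even when onlyIfAbsent is true.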

Resizing – resize()

During insertion, both the initial allocation of the array (when it is still empty) and the growth that happens once the threshold is exceeded go through the resize() method. The source code and its analysis are as follows:

    final Node<K,V>[] resize() {
        // The current array
        Node<K,V>[] oldTab = table;
        // The current array length
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        // The current resize threshold
        int oldThr = threshold;
        int newCap, newThr = 0;
        // The array already exists: compute the grown capacity
        if (oldCap > 0) {
            // Has the array already reached the maximum length?
            if (oldCap >= MAXIMUM_CAPACITY) {
                // It has: stop growing and return the array unchanged
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // Double the capacity of the array
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        // The array does not exist yet, but a constructor stored the
        // initial capacity in threshold: use it as the new capacity
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        // The array does not exist and no initial capacity was given
        else {               // zero initial threshold signifies using defaults
            // Use the default capacity and the default threshold
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        // Compute the new resize threshold if it has not been set above
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        // Store the new threshold
        threshold = newThr;
        // Create the new array with the new capacity
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        // Install the new array
        table = newTab;
        // The old array exists, so this is a real resize, not the initial allocation
        if (oldTab != null) {
            // Redistribute the nodes into their positions in the new array
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    // The bucket holds a single node
                    if (e.next == null)
                        // Recompute its index in the new array
                        newTab[e.hash & (newCap - 1)] = e;
                    // The bucket is a tree
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    // The bucket is a linked list
                    else { // preserve order
                        // "lo" list: nodes whose (hash & oldCap) bit is 0
                        Node<K,V> loHead = null, loTail = null;
                        // "hi" list: nodes whose (hash & oldCap) bit is 1
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        // Walk the list
                        do {
                            next = e.next;
                            // Bit is 0: append to the lo list (index stays the same)
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            // Bit is 1: append to the hi list (index moves up by oldCap)
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        // Put the lo list at the same index j
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        // Put the hi list at index j + oldCap
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }
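The capacity and threshold arithmetic in the first half of resize() can be sketched on its own. The simulation below is a simplification under stated assumptions: it models only a map created with the default constructor (capacity 16, load factor 0.75), it ignores MAXIMUM_CAPACITY, and `ThresholdDemo` and `capacityAfter` are names invented for this example.

```java
class ThresholdDemo {
    // Simulates how the capacity grows while distinct keys are inserted
    // into a default-constructed map. Illustration only, not JDK code.
    static int capacityAfter(int inserts) {
        int cap = 0;
        int thr = 0;
        int size = 0;
        for (int i = 0; i < inserts; i++) {
            if (cap == 0) {
                // Lazy allocation on the first insertion: 16 buckets, threshold 12.
                cap = 16;
                thr = (int) (0.75f * 16);
            }
            if (++size > thr) {
                // Same doubling as "newCap = oldCap << 1; newThr = oldThr << 1".
                cap <<= 1;
                thr <<= 1;
            }
        }
        return cap;
    }

    public static void main(String[] args) {
        System.out.println(capacityAfter(12));  // 16
        System.out.println(capacityAfter(13));  // 32
        System.out.println(capacityAfter(100)); // 256
    }
}
```

The 13th insertion pushes the size past the threshold of 12, so the table doubles to 32 at that point; the same step repeats at 25, 49 and 97 entries.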

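Analysis:

## Index in the original array (length 16):
  11001100 11001100 11001100 11011100
& 00000000 00000000 00000000 00001111
--------------------------------------
  00000000 00000000 00000000 00001100      ## equals 12

## After resizing (length 32):
  11001100 11001100 11001100 11011100
& 00000000 00000000 00000000 00011111
--------------------------------------
  00000000 00000000 00000000 00011100     ## equals 28 = 12 + 16

As this shows, after resizing a key can only end up in one of two positions: either its original index, or its original index plus the old array length. For example, when the old length is 16, the new index is j or j + 16, depending on whether the bit of the hash that corresponds to oldCap (the bit newly covered by the mask, tested with hash & oldCap) is 0 or 1. If it is 0, the node stays at its original position; if it is 1, it moves to the new position.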
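The claim can be checked by comparing the shortcut against a full recomputation of the index. `ResizeIndexDemo` and `newIndex` are names made up for this sketch; the first hash value is the one from the calculation above.

```java
class ResizeIndexDemo {
    // Shortcut used during resize: test the single bit (hash & oldCap)
    // instead of recomputing the index from scratch.
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1); // index in the old table
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {0xCCCCCCDC, 0xCCCCCCCC, 12, 28, 44};
        for (int h : hashes) {
            int oldIndex = h & (oldCap - 1);
            int full = h & (2 * oldCap - 1); // full recomputation for the doubled table
            System.out.println(oldIndex + " -> " + newIndex(h, oldCap) + " (full: " + full + ")");
        }
        // Prints:
        // 12 -> 28 (full: 28)
        // 12 -> 12 (full: 12)
        // 12 -> 12 (full: 12)
        // 12 -> 28 (full: 28)
        // 12 -> 12 (full: 12)
    }
}
```

All five hashes share bucket 12 in the old table, and each one ends up in bucket 12 or bucket 28 of the new table, matching the full recomputation.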

When the bucket holds tree nodes, the split() method is called. Its source code is as follows:

    final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
        TreeNode<K,V> b = this;
        // Relink into lo and hi lists, preserving order
        TreeNode<K,V> loHead = null, loTail = null;
        TreeNode<K,V> hiHead = null, hiTail = null;
        int lc = 0, hc = 0;
        // Walk the tree nodes through their next pointers, split them into
        // a lo group and a hi group, and build one linked list for each
        for (TreeNode<K,V> e = b, next; e != null; e = next) {
            next = (TreeNode<K,V>)e.next;
            e.next = null;
            if ((e.hash & bit) == 0) {
                if ((e.prev = loTail) == null)
                    loHead = e;
                else
                    loTail.next = e;
                loTail = e;
                ++lc;
            }
            else {
                if ((e.prev = hiTail) == null)
                    hiHead = e;
                else
                    hiTail.next = e;
                hiTail = e;
                ++hc;
            }
        }
        // Put the lo list at the original index and decide whether it stays a tree
        if (loHead != null) {
            // The list is at or below the untreeify threshold: turn it back into a plain linked list
            if (lc <= UNTREEIFY_THRESHOLD)
                tab[index] = loHead.untreeify(map);
            else {
                tab[index] = loHead;
                // Rebuild the tree only if some nodes moved to the hi list;
                // otherwise the existing tree is still intact
                if (hiHead != null) // (else is already treeified)
                    loHead.treeify(tab);
            }
        }
        // Same handling for the hi list, placed at index + bit
        if (hiHead != null) {
            if (hc <= UNTREEIFY_THRESHOLD)
                tab[index + bit] = hiHead.untreeify(map);
            else {
                tab[index + bit] = hiHead;
                if (loHead != null)
                    hiHead.treeify(tab);
            }
        }
    }
