Data Structures: HashMap

Latest recommended article published 2023-05-30 15:39:04

songzi1228



Category: Java: Data Structures

Article link: https://blog.csdn.net/songzi1228/article/details/89449626


Related articles: Interview essentials: HashMap source code analysis (JDK 8)

JDK 1.8 HashMap source code analysis

Based on teacher Jack's open course at Gupao Academy: https://www.bilibili.com/video/av75970633

I. Alibaba Interview Questions

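1. How does HashMap work, and what is its internal data structure?

The underlying structure is a hash table: an array of buckets, each holding a linked list. When a bucket's list grows too long, it is converted into a red-black tree, so lookups in that bucket take O(log n) time instead of O(n).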

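2. Describe what happens in HashMap's put method.

a. Compute the hash of the key, then compute the bucket index from it.
b. If there is no collision, put the new node directly into the bucket.
c. If there is a collision, append the new node to the end of the bucket's linked list.
d. If the list length exceeds the threshold (TREEIFY_THRESHOLD == 8), convert the list into a red-black tree, provided the table capacity is at least MIN_TREEIFY_CAPACITY (64); otherwise the table is resized instead.
e. If the key already exists, replace the old value.
f. If the number of entries exceeds the threshold (capacity * load factor), resize.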
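The put steps above can be traced in a minimal sketch. `MiniHashMap` is a hypothetical teaching class, not the JDK implementation: it uses separate chaining only (step d, treeification, is left out), and its resize re-links nodes with head insertion instead of JDK 8's order-preserving split. The comments mark which of steps a to f each part corresponds to.

```java
import java.util.Objects;

// Hypothetical teaching class (not java.util.HashMap): separate chaining only, no tree bins.
class MiniHashMap<K, V> {
    static final class Node<K, V> {
        final int hash;   // cached spread hash, reused on resize
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private static final int DEFAULT_CAPACITY = 16;
    private static final float LOAD_FACTOR = 0.75f;

    private Node<K, V>[] table;   // allocated lazily by the first put
    private int size;
    private int threshold;

    // Same spreading expression as JDK 8's HashMap.hash(Object)
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public V put(K key, V value) {
        if (table == null) {
            resize();                               // lazy initialization
        }
        int hash = hash(key);                       // step a: hash the key...
        int i = (table.length - 1) & hash;          // ...and compute the index
        Node<K, V> p = table[i];
        if (p == null) {                            // step b: empty bucket
            table[i] = new Node<>(hash, key, value, null);
        } else {
            while (true) {
                if (p.hash == hash && Objects.equals(p.key, key)) {
                    V old = p.value;                // step e: key exists, replace the value
                    p.value = value;
                    return old;
                }
                if (p.next == null) {
                    break;
                }
                p = p.next;
            }
            p.next = new Node<>(hash, key, value, null); // step c: append at the tail
        }
        if (++size > threshold) {                   // step f: too many entries
            resize();
        }
        return null;
    }

    public int size() {
        return size;
    }

    public int capacity() {
        return (table == null) ? 0 : table.length;
    }

    @SuppressWarnings("unchecked")
    private void resize() {
        Node<K, V>[] oldTab = table;
        int newCap = (oldTab == null) ? DEFAULT_CAPACITY : oldTab.length << 1;
        Node<K, V>[] newTab = (Node<K, V>[]) new Node[newCap];
        if (oldTab != null) {
            for (int j = 0; j < oldTab.length; j++) {
                Node<K, V> e = oldTab[j];
                while (e != null) {
                    Node<K, V> next = e.next;
                    int idx = e.hash & (newCap - 1);  // stored hash, new mask
                    e.next = newTab[idx];             // head insertion: simpler, but reverses order
                    newTab[idx] = e;
                    e = next;
                }
            }
        }
        table = newTab;
        threshold = (int) (newCap * LOAD_FACTOR);
    }
}
```

Because each node keeps its hash, resize() never calls hashCode() again; it only masks the stored hash with the new table length.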

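3. How is HashMap's hash function implemented? What other ways of implementing hash functions are there?

a. The high 16 bits are kept unchanged, and the low 16 bits are XORed with the high 16 bits: (h = key.hashCode()) ^ (h >>> 16).
b. (n - 1) & hash --> gives the bucket index (n is the table length).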
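The two steps can be checked on a real key. `IndexDemo` is a hypothetical helper; its hash() uses the same expression as JDK 8's HashMap.hash(Object), and indexFor() assumes n is a power of two.

```java
class IndexDemo {
    // Same expression as JDK 8's HashMap.hash(Object)
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table of length n; assumes n is a power of two
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int h = hash("guan");
        System.out.println("guan".hashCode()); // 3184027
        System.out.println(h);                 // 3184043 (3184027 ^ 48)
        System.out.println(indexFor(h, 16));   // 11
    }
}
```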

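4. How does HashMap resolve collisions? Describe the resize process. If a value was in the old array and has been moved to the new array, its position may have changed; how is its position in the new array determined?

Collisions: the new node is appended to the end of the bucket's linked list (separate chaining).
Resizing: the capacity is doubled, and each node's index is recomputed from its stored hash (JDK 8 caches the hash in each Node, so hashCode() is not called again).
The node can end up in only one of two places: its original index, or original index + old capacity. Which one depends on whether (hash & oldCap) is 0.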
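A small check of the "two places" rule, as a sketch with hypothetical method names: newIndex() recomputes the index from scratch with the doubled mask, and newIndexByBit() uses the JDK 8 shortcut of testing only the oldCap bit. The sample hashes all fall into bucket 5 of a 16-slot table.

```java
class ResizeSplitDemo {
    // Index after doubling, recomputed from scratch with the new mask
    static int newIndex(int hash, int oldCap) {
        return hash & ((oldCap << 1) - 1);
    }

    // JDK 8 style: look only at the bit worth oldCap
    static int newIndexByBit(int hash, int oldCap) {
        int j = hash & (oldCap - 1);                 // index in the old table
        return ((hash & oldCap) == 0) ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21, 37, 53};              // all in bucket 5 when the table has 16 slots
        for (int h : hashes) {
            System.out.println(h + " -> " + newIndex(h, oldCap) + " / " + newIndexByBit(h, oldCap));
        }
        // prints 5 -> 5 / 5, 21 -> 21 / 21, 37 -> 5 / 5, 53 -> 21 / 21
    }
}
```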

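5. Setting HashMap aside, what are the ways to resolve hash collisions?

Open addressing (for example, linear probing) and separate chaining.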
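To contrast with HashMap's chaining, here is a minimal open-addressing sketch using linear probing. `LinearProbingMap` is hypothetical: fixed capacity, String keys, int values, and no deletion or resizing, so it only illustrates how a collision moves a key to the next free slot.

```java
class LinearProbingMap {
    private final String[] keys;
    private final int[] values;

    LinearProbingMap(int capacity) {
        keys = new String[capacity];
        values = new int[capacity];
    }

    // Home slot; floorMod keeps the index non-negative for negative hash codes
    private int slot(String key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    void put(String key, int value) {
        int i = slot(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null || keys[i].equals(key)) {
                keys[i] = key;
                values[i] = value;
                return;
            }
            i = (i + 1) % keys.length;   // collision: try the next slot
        }
        throw new IllegalStateException("table is full");
    }

    Integer get(String key) {
        int i = slot(key);
        // An empty slot ends the probe sequence: the key is absent
        for (int probes = 0; probes < keys.length && keys[i] != null; probes++) {
            if (keys[i].equals(key)) {
                return values[i];
            }
            i = (i + 1) % keys.length;
        }
        return null;
    }
}
```

With 16 slots, "1" (hashCode 49) and "A" (hashCode 65) both start at slot 1, so whichever is inserted second is stored in slot 2.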

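6. If one Entry chain in a HashMap becomes too long, lookups can degrade to O(n). How can this be optimized?

Convert the linked list into a red-black tree; JDK 1.8 already does this.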

II. Source Code Warm-up

2.1 Basic usage of HashMap:

import java.util.HashMap;

public class Test1 {
    public static void main(String[] args) {
        HashMap<Integer, String> hashMap = new HashMap<>();
        hashMap.put(1,"张三");
        hashMap.put(2,"李四");
        hashMap.put(3,"王五");
        hashMap.put(4,"赵六");
        System.out.println(hashMap.get(1));
    }
}

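2.2 A real-world case

Android's Looper, which owns a thread's MessageQueue, stores the per-thread Looper in a ThreadLocal. ThreadLocal can loosely be thought of as a simplified HashMap: each thread keeps a ThreadLocalMap that maps ThreadLocal objects to values.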
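A short sketch of the per-thread behavior that makes ThreadLocal feel like a map keyed by thread. `ThreadLocalDemo` is a hypothetical class that uses only the standard ThreadLocal API (withInitial, set, get); internally, ThreadLocalMap resolves collisions with open addressing (linear probing) rather than HashMap's chaining.

```java
class ThreadLocalDemo {
    private static final ThreadLocal<String> NAME = ThreadLocal.withInitial(() -> "unset");

    static void setHere(String value) {
        NAME.set(value);
    }

    static String getHere() {
        return NAME.get();
    }

    // Runs in a fresh thread: optionally set(), then get(); returns what that thread saw
    static String seenInNewThread(String valueToSet) {
        String[] seen = new String[1];
        Thread t = new Thread(() -> {
            if (valueToSet != null) {
                NAME.set(valueToSet);
            }
            seen[0] = NAME.get();
        });
        t.start();
        try {
            t.join();   // join() also makes seen[0] visible to the calling thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        setHere("main");
        System.out.println(seenInNewThread(null));     // unset: the new thread has its own value
        System.out.println(seenInNewThread("worker")); // worker
        System.out.println(getHere());                 // main: unaffected by the other threads
    }
}
```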

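2.3 Step by step, from simple to complex

What if HashMap's source code is hard to follow at first?

First, remember that it is a data structure: a container.

As a container, it implements an interface from the Java collections framework: HashMap implements Map (not Collection).

Every data structure has a way of storing its data and a storage layout.

ArrayList and LinkedList are simpler data structures than HashMap. Do you know how they store their data and how they are laid out?

We can study these two simpler structures first and, once we understand them, move on to the structure of HashMap.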

2.4 ArrayList

ArrayList stores its data in an array:

Here is ArrayList's add method:

public boolean add(E e) {
        ensureCapacityInternal(size + 1);  // Increments modCount!!
        elementData[size++] = e;
        return true;
    }

//transient: keyword that excludes the field from default serialization
transient Object[] elementData;
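The growth hidden inside ensureCapacityInternal() can be sketched as follows. `MiniArrayList` is hypothetical; it assumes the JDK 8 policy for a list created with the no-argument constructor (the first add allocates 10 slots, later growth is oldCapacity + (oldCapacity >> 1), roughly 1.5x) and copies with Arrays.copyOf.

```java
import java.util.Arrays;

// Hypothetical simplified ArrayList: array storage plus roughly 1.5x growth.
class MiniArrayList<E> {
    private Object[] elementData = new Object[0];
    private int size;

    public boolean add(E e) {
        ensureCapacity(size + 1);
        elementData[size++] = e;
        return true;
    }

    @SuppressWarnings("unchecked")
    public E get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        return (E) elementData[index];   // O(1): direct array access
    }

    public int size() {
        return size;
    }

    public int capacity() {
        return elementData.length;
    }

    private void ensureCapacity(int minCapacity) {
        if (minCapacity <= elementData.length) {
            return;
        }
        int oldCapacity = elementData.length;
        int newCapacity = Math.max(10, oldCapacity + (oldCapacity >> 1)); // 10 first, then x1.5
        if (newCapacity < minCapacity) {
            newCapacity = minCapacity;
        }
        elementData = Arrays.copyOf(elementData, newCapacity);           // copy into a bigger array
    }
}
```

With this policy the capacity goes 10, 15, 22, ... as elements are added, and each growth costs one array copy.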

2.5 LinkedList

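Figure: a doubly linked list and a singly linked list. JDK's LinkedList is the doubly linked kind: each node references both its previous node (prev) and its next node (next).

In code: the inner node class (Node) and the add method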

private static class Node<E> {
        E item;
        Node<E> next;
        Node<E> prev;

        Node(Node<E> prev, E element, Node<E> next) {
            this.item = element;
            this.next = next;
            this.prev = prev;
        }
    }

public boolean add(E e) {
        linkLast(e);
        return true;
    }

void linkLast(E e) {
        final Node<E> l = last;
        final Node<E> newNode = new Node<>(l, e, null);
        last = newNode;
        if (l == null)
            first = newNode;
        else
            l.next = newNode;
        size++;
        modCount++;
    }

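III. Source Code Analysis

Now that we know the data structures of ArrayList and LinkedList, let's look at HashMap's. It combines the strengths of both: like ArrayList, it uses an array for fast indexed access to buckets, and like LinkedList, it chains nodes together inside each bucket.

Let's dissect HashMap's source code with this structure in mind: an array (table) whose slots each hold a chain of nodes.

3.1 The linked-list part of the structure: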

static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        ...
}

3.2 The array part of the structure:

transient Node<K,V>[] table;

public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,boolean evict) {
    Node<K,V>[] tab; 
    Node<K,V> p; 
    int n, i;
    ...
    if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
    ...
}
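Putting table and next together, a lookup is: index into the array with (n - 1) & hash, then walk the chain. The sketch below is a hypothetical simplification of that walk (String keys and values, no tree bins). "Aa" and "BB" are used because they have the same hashCode (2112), so both land in bucket 0 of a 16-slot table and form a two-node chain.

```java
import java.util.Objects;

class BucketLookupDemo {
    // Stripped-down version of HashMap.Node with String keys and values
    static final class Node {
        final int hash;
        final String key;
        String value;
        Node next;

        Node(int hash, String key, String value, Node next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Array step: pick the bucket; list step: follow next until the key matches
    static String get(Node[] tab, String key) {
        int h = hash(key);
        for (Node e = tab[(tab.length - 1) & h]; e != null; e = e.next) {
            // compare the cached hash first (cheap), then equals()
            if (e.hash == h && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    // "Aa" and "BB" share hashCode 2112, so both go into bucket 0 of a 16-slot table
    static Node[] buildTable() {
        Node[] tab = new Node[16];
        Node second = new Node(hash("BB"), "BB", "second", null);
        tab[(tab.length - 1) & hash("Aa")] = new Node(hash("Aa"), "Aa", "first", second);
        return tab;
    }

    public static void main(String[] args) {
        Node[] tab = buildTable();
        System.out.println(get(tab, "BB")); // second: found on the second node of bucket 0
        System.out.println(get(tab, "CC")); // null: "CC" also maps to bucket 0, but no key matches
    }
}
```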

3.3 Initial capacity and maximum capacity:

    /**
     * Default capacity: 2 to the 4th power
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * Maximum capacity: 2 to the 30th power
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

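<< : multiplies by a power of 2 (as long as the result does not overflow)
>> : divides by a power of 2, rounding down (toward negative infinity)
Shifting by n bits multiplies or divides by 2 to the nth power.

So 1 << 4 = 1 * 2^4 = 16

1 << 30 = 2^30, the largest power of two that fits in a positive int; 1 << 31 would set the sign bit and produce a negative number.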
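A quick check of the shift rules and of why MAXIMUM_CAPACITY stops at 1 << 30. `ShiftDemo` is just a throwaway class for printing the values.

```java
class ShiftDemo {
    public static void main(String[] args) {
        System.out.println(1 << 4);    // 16, DEFAULT_INITIAL_CAPACITY
        System.out.println(1 << 30);   // 1073741824, MAXIMUM_CAPACITY
        System.out.println(1 << 31);   // -2147483648: the 1 bit lands in the sign bit
        System.out.println(100 >> 2);  // 25 = 100 / 4
        System.out.println(-7 >> 1);   // -4: arithmetic shift rounds toward negative infinity
        System.out.println(-7 / 2);    // -3: integer division truncates toward zero
    }
}
```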

3.4 Resizing:

Load factor: when the number of entries exceeds capacity * load factor, the table is resized.

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
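Worked numbers for the default load factor: with capacity 16 the threshold is (int) (16 * 0.75) = 12, and since putVal resizes when ++size > threshold, the 13th entry triggers the first resize. `ThresholdDemo` is a hypothetical helper that prints the threshold for the first few capacities (JDK 8 itself doubles the old threshold on each resize, which gives the same values here).

```java
class ThresholdDemo {
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Number of entries allowed before the next resize
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        for (int cap = 16; cap <= 128; cap <<= 1) {
            System.out.println("capacity " + cap + ": resize once size > "
                    + threshold(cap, DEFAULT_LOAD_FACTOR));
        }
        // capacity 16: 12, 32: 24, 64: 48, 128: 96
    }
}
```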

3.5 Treeify threshold for each bin

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    static final int TREEIFY_THRESHOLD = 8;
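TREEIFY_THRESHOLD is not the only condition. Below is a simplified model of what JDK 8 does right after appending a node to a list bin: putVal calls treeifyBin once the list is longer than 8 nodes, and treeifyBin resizes the table instead of treeifying while the table is shorter than MIN_TREEIFY_CAPACITY (64). The class and the method name afterAppend are hypothetical.

```java
class TreeifyDecision {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    // Simplified model of JDK 8 after a node is appended to a list bin
    static String afterAppend(int listLengthAfterAppend, int tableLength) {
        if (listLengthAfterAppend <= TREEIFY_THRESHOLD) {
            return "keep list";
        }
        // treeifyBin: a small table is grown first, which also shortens the chains
        return (tableLength < MIN_TREEIFY_CAPACITY) ? "resize table" : "treeify bin";
    }

    public static void main(String[] args) {
        System.out.println(afterAppend(8, 64));  // keep list
        System.out.println(afterAppend(9, 16));  // resize table
        System.out.println(afterAppend(9, 64));  // treeify bin
    }
}
```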

3.6 The put method

public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
}

The first argument here is hash(key), a hash function. Every object has a hashCode() method that returns its hash value, for example:

public class Test1 {
    public static void main(String[] args) {
        System.out.println("guan".hashCode());
        System.out.println("song".hashCode());
        System.out.println("1".hashCode());
        System.out.println("2".hashCode());
    }
}

Output:
3184027
3536149
49
50

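As you can see, hash values range over the whole int range, far beyond the table length, so they cannot be used directly as array indexes. We need a way to map them into the range of valid indexes.

How? With a modulo operation. For example, if the array length is 16 (indexes 0 - 15), num % 16 ----> 0-15 gives a valid index.

So does HashMap use a modulo operation? Yes, in effect: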

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,boolean evict) { 
    if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
}

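Here (n - 1) & hash is equivalent to hash % n, provided n is a power of two and the hash is non-negative (for a negative hash it equals Math.floorMod(hash, n), whereas % would give a negative result).

& is the bitwise AND operator: a result bit is 1 only if both input bits are 1. For example: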

System.out.println( 7 & 9);
        /*
         *  7二进制 0111
         *  9二进制 1001
         * -----------
         *        0001   ==1
         * */
Here is an example:

public class Test1 {
    public static void main(String[] args) {
        System.out.println(3184027 & 15);
        System.out.println(3184000 & 15);
        System.out.println(3184001 & 15);
        System.out.println(3184002 & 15);
        System.out.println(3184003 & 15);
    }
}

Output:
11
0
1
2
3

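The results are the same as taking the value modulo 16. Why use AND instead of %? Bitwise AND works directly on the binary bits and is cheaper for the CPU than division.

That is why the array length must be a power of two: only then does n - 1 have the binary form 0...01...1 (all low bits set), so the AND keeps exactly the low bits of the hash, every index from 0 to n - 1 can be reached, and the result reflects only the hash value.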
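Both claims can be tested: for a power-of-two n, (n - 1) & h equals Math.floorMod(h, n) for every int h, and for any other n some buckets can never be reached. `MaskVsModDemo` is a hypothetical class; with n = 10, n - 1 is binary 1001, so only indexes 0, 1, 8 and 9 are reachable.

```java
import java.util.TreeSet;

class MaskVsModDemo {
    // true if (n - 1) & h == Math.floorMod(h, n) for every h in [from, to]
    static boolean maskMatchesMod(int n, int from, int to) {
        for (int h = from; h <= to; h++) {
            if (((n - 1) & h) != Math.floorMod(h, n)) {
                return false;
            }
        }
        return true;
    }

    // Bucket indexes reachable through (n - 1) & h for h in [0, samples)
    static TreeSet<Integer> reachableByMask(int n, int samples) {
        TreeSet<Integer> seen = new TreeSet<>();
        for (int h = 0; h < samples; h++) {
            seen.add((n - 1) & h);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(maskMatchesMod(16, -100_000, 100_000)); // true
        System.out.println(maskMatchesMod(10, 0, 100));            // false
        System.out.println(reachableByMask(16, 1_000).size());     // 16
        System.out.println(reachableByMask(10, 1_000));            // [0, 1, 8, 9]
    }
}
```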

3.7 The hash(key) method

Comment from the JDK 8 HashMap source:

 Computes key.hashCode() and spreads (XORs) higher bits of hash
 to lower.  Because the table uses power-of-two masking, sets of
 hashes that vary only in bits above the current mask will
 always collide.

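 In other words: compute key.hashCode() and XOR its high bits into its low bits. Because the table index
 is obtained by masking with a power of two, hashes that differ only in bits above the mask would otherwise always collide.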

static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

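When a bucket's list grows beyond a certain length, it is converted into a red-black tree, and tree nodes take about twice the memory of ordinary nodes, so this should be avoided where possible. The goal is to spread entries evenly over all buckets so that no single chain becomes too long. XORing the high 16 bits of hashCode into the low 16 bits lets the high bits influence the index even in a small table, which reduces the chance of hash collisions.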
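The effect of the XOR can be seen with hash codes that differ only in their high 16 bits. `SpreadDemo` is hypothetical and the hash codes are synthetic (i << 16); spread() uses the same expression as HashMap.hash().

```java
import java.util.HashSet;
import java.util.Set;

class SpreadDemo {
    // Same XOR as HashMap.hash(): fold the high 16 bits into the low 16 bits
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Synthetic hash codes that differ only above bit 15: 0, 1 << 16, 2 << 16, ...
    static int[] highBitsOnlyCodes(int count) {
        int[] codes = new int[count];
        for (int i = 0; i < count; i++) {
            codes[i] = i << 16;
        }
        return codes;
    }

    // Number of distinct buckets used in a table of length n (a power of two)
    static int bucketsUsed(int[] hashCodes, int n, boolean spreadBits) {
        Set<Integer> used = new HashSet<>();
        for (int h : hashCodes) {
            used.add((n - 1) & (spreadBits ? spread(h) : h));
        }
        return used.size();
    }

    public static void main(String[] args) {
        int[] codes = highBitsOnlyCodes(16);
        System.out.println(bucketsUsed(codes, 16, false)); // 1: every code lands in bucket 0
        System.out.println(bucketsUsed(codes, 16, true));  // 16: the XOR brings the high bits into the index
    }
}
```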

3.8 Initializing the table

transient Node<K,V>[] table;

//The next size value at which to resize (capacity * load factor).
int threshold;

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,boolean evict) {
    if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
}

//  Instead of allocating Node[] table = new Node[defaultInitCapacity] up front,
//  resize() ---->  initializes Node[] lazily, on the first put

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    ...
    else{
         newCap = DEFAULT_INITIAL_CAPACITY;
         newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    ...
    threshold = newThr;
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
   

    return newTab;
}

3.9 Resizing

The resize() method has two jobs: initializing the table and growing it.

else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold

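Both the capacity and the threshold are doubled.

When resizing, each non-empty bucket falls into one of three cases:

1. The bucket holds a single node with no chain --> place it directly at e.hash & (newCap - 1)
2. The bucket holds a red-black tree --> split the tree into a low part and a high part; a part with at most UNTREEIFY_THRESHOLD (6) nodes is converted back into a linked list
3. The bucket holds a linked list (fewer than 8 nodes) --> split it into a "low" list and a "high" list according to (e.hash & oldCap), preserving order; the low list stays at index j and the high list moves to j + oldCap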

                    if (e.next == null) // only a single node in this bucket
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode) // the bucket holds a red-black tree
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order; the bucket holds a linked list
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }

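IV. Other points:

Comparing ArrayList and LinkedList

Is there a structure that combines the strengths of both?
Yes: the hash table, which is what HashMap implements.

How do the array and the linked lists work together?

What is int hash, and what is it for?

What is the principle behind HashMap?

How does HashMap's put method work?

How does HashMap's get method work?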
