HashMap in Detail

happens-before

Article link: https://blog.csdn.net/qq_38119372/article/details/79540323

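Overview of HashMap
HashMap's data structure and how it resolves hash collisions
HashMap source code analysis: storing, reading, resizing
Why HashMap is not thread-safe
Thread-safe alternatives to HashMap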

Overview of HashMap

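HashMap is an unsynchronized implementation of the Map interface based on a hash table. It provides all of the optional map operations and permits null values and the null key. The class makes no guarantees about the order of the map; in particular, it does not guarantee that the order will remain constant over time.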
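
The null handling described above can be checked with a few lines. This is a minimal sketch; the class name NullKeyDemo and the sample strings are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class NullKeyDemo {
    // Builds a small map that uses null both as a key and as a value.
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for the null key"); // a single null key is allowed
        map.put("k", null);                      // null values are allowed too
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.get(null));        // the value stored under the null key
        System.out.println(map.containsKey("k")); // the key exists even though its value is null
        System.out.println(map.get("k"));
    }
}
```

Because get returns null both for a missing key and for a key mapped to null, containsKey is the reliable way to tell the two cases apart.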

HashMap's data structure and how it resolves hash collisions

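HashMap is essentially a "chained hash table", that is, a combination of an array and linked lists. At the bottom, a HashMap is an array, and each slot of that array holds a linked list. Creating a HashMap sets up this backing array (in JDK 8 the allocation is deferred until the first insertion). The relevant declarations in JDK 8 are: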

transient Node<K,V>[] table;
static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
}

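HashMap derives a hash value from the key's hashCode, so keys with the same hashCode always get the same hash value, and keys with different hashCodes can still map to the same array index. When many objects are stored, several keys will therefore end up in the same bucket; this is a hash collision. HashMap resolves collisions by chaining the colliding entries in a linked list. Before JDK 1.8, HashMap was implemented as an array plus linked lists. Even with a well-chosen hash function, elements are rarely distributed perfectly evenly. When a large number of elements fall into the same bucket, that bucket holds one long list and the HashMap degenerates into a singly linked list: with n elements, a lookup takes O(n) and the advantage of hashing is lost. To address this, JDK 1.8 introduced red-black trees, whose lookup time is O(log n), for buckets that grow too long.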
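
To make collisions concrete, here is a small sketch. BadKey is a hypothetical key type, invented for this example, whose hashCode is constant so that every instance lands in the same bucket. The strings "Aa" and "BB" are a well-known real pair with the same hashCode.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Hypothetical key type: constant hashCode, so all instances collide.
    static final class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    // Stores n colliding keys; equals() keeps them distinct inside the bucket.
    static Map<BadKey, Integer> fill(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        // Two different strings with an identical hashCode.
        System.out.println("Aa".hashCode() == "BB".hashCode());
        Map<BadKey, Integer> map = fill(100);
        // All 100 entries survive, but every lookup has to search one bucket.
        System.out.println(map.size() + " " + map.get(new BadKey(7)));
    }
}
```

The map stays correct under collisions; only lookup speed suffers, which is the problem the red-black tree change targets.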

HashMap source code analysis: storing, reading, resizing

Storing:

public V put(K key, V value) {
    // HashMap permits a null key and null values.
    // When key is null, putForNullKey stores the value in the first slot of the array.
    if (key == null)
        return putForNullKey(value);
    // Recompute the hash from the key's hashCode.
    int hash = hash(key.hashCode());
    // Find the index in the table that corresponds to this hash.
    int i = indexFor(hash, table.length);
    // If the Entry at index i is not null, walk the chain looking for an equal key.
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    // No existing mapping for this key was found in bucket i.
    modCount++;
    // Add the key and value at index i.
    addEntry(hash, key, value, i);
    return null;
}

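As the source above shows (it comes from a pre-1.8 JDK, where the bucket nodes are called Entry), put first recomputes a hash from the key's hashCode and uses it to find the element's position (index) in the array. If an entry with an equal key is already in that bucket, its value is replaced and the old value is returned. Otherwise, if the position already holds other elements, the elements at that position are stored as a linked list, with the newest at the head and the oldest at the tail. If the position is empty, the element is placed directly in that slot.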
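
The put listing calls hash and indexFor without showing them. The following is a sketch of what such helpers might look like, not the JDK source quoted above. It assumes the table length is always a power of two, and the spreading step uses the JDK 8 style h ^ (h >>> 16); the older release uses a different bit-mixing function. The names spread and IndexSketch are made up here.

```java
class IndexSketch {
    // Folds the high 16 bits into the low 16 bits so the mask below can see them.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Assumes length is a power of two; the mask then acts as a non-negative modulo.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int h = 0x10000; // only a high bit is set
        System.out.println(indexFor(h, 16));         // without spreading, the high bit is ignored
        System.out.println(indexFor(spread(h), 16)); // with spreading, it affects the index
        System.out.println(indexFor(-1, 16));        // negative hashes still give a valid index
    }
}
```

Masking with length - 1 only looks at the low bits, which is why some form of bit mixing is applied before the index is taken.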

void addEntry(int hash, K key, V value, int bucketIndex) {
    // Fetch the Entry currently at bucketIndex.
    Entry<K,V> e = table[bucketIndex];
    // Put the new Entry at bucketIndex and let it point to the previous Entry.
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    // If the number of key-value pairs has reached the threshold,
    if (size++ >= threshold)
        // double the length of the table.
        resize(2 * table.length);
}

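The method above is short, but it embodies one design decision: the new Entry is always placed at index bucketIndex of the table. If that slot already holds an Entry, the new Entry points to the existing one, forming an Entry chain. If the slot is empty, that is, the variable e in the code above is null, the new Entry points to null and no chain is formed.
As long as there are no hash collisions and no chains, lookups in a HashMap are fast, because get() goes straight to the element. Once a bucket holds a chain of Entry objects rather than a single one, the chain has to be traversed in order until the wanted Entry is found. If that Entry sits at the very end of the chain (it was the first one put into the bucket), the loop runs all the way to the end before finding it.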

Reading:

public V get(Object key) {
    // The null key is handled separately.
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    // Walk the chain in the bucket this hash maps to.
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}

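With the hashing used for storage as background, this code is easy to follow. To get an element, HashMap first computes the hash from the key's hashCode, locates the corresponding slot in the array, and then uses the key's equals method to find the wanted element in the linked list at that slot. In short, HashMap treats each key-value pair as a single unit, an Entry object, and keeps all pairs in an Entry[] array. To store an Entry, it uses the hash to choose the slot in the array and then equals to decide whether an entry in that slot's list already has the same key. To retrieve an Entry, it again uses the hash to find the slot and equals to pick the Entry out of that slot's list.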
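
The put and get flow summarized above can be condensed into a toy implementation. This is a simplified sketch, not the JDK code: ChainedMap is an invented name, the table is fixed at 16 buckets, there is no resizing, and null keys are simply hashed to 0.

```java
import java.util.Objects;

class ChainedMap<K, V> {
    private static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next;
        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16]; // fixed size, never resized
    private int size;

    private static int hash(Object key) {
        int h = (key == null) ? 0 : key.hashCode();
        return h ^ (h >>> 16);
    }

    V put(K key, V value) {
        int hash = hash(key);
        int i = hash & (table.length - 1);
        // Same key already in the bucket: replace the value, return the old one.
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // Otherwise insert at the head; the new entry points to the previous head.
        table[i] = new Entry<>(hash, key, value, table[i]);
        size++;
        return null;
    }

    V get(Object key) {
        int hash = hash(key);
        for (Entry<K, V> e = table[hash & (table.length - 1)]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    int size() { return size; }

    public static void main(String[] args) {
        ChainedMap<String, Integer> map = new ChainedMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // same hashCode as "Aa", so it joins the same chain
        System.out.println(map.get("Aa") + " " + map.get("BB") + " " + map.size());
    }
}
```

The hash selects the bucket and equals selects the entry within it, which is the division of labour the paragraph describes.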

Resizing:

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    // The table is already at its maximum size: raise the threshold so resize is not triggered again.
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    // Allocate the larger table and move every existing entry into it.
    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable);
    table = newTable;
    threshold = (int)(newCapacity * loadFactor);
}

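A HashMap is created with a load factor, which defaults to 0.75. As the last line of resize shows, the threshold that triggers the next resize is the capacity multiplied by the load factor. The default is a compromise between time and space. A larger load factor reduces the memory used by the hash table (the Entry array) but increases lookup time, and lookup is the most frequent operation (both get() and put() perform one). A smaller load factor speeds up lookups but increases the memory used by the hash table.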
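
The relation between capacity, load factor, and threshold is plain arithmetic. The sketch below reuses the expression from resize; the class and method names are made up, and the starting capacity of 16 is the default initial capacity of HashMap.

```java
class ThresholdDemo {
    // Same expression as the last line of resize.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // With the default load factor, each doubling of the capacity doubles the threshold.
        for (int capacity = 16; capacity <= 128; capacity *= 2) {
            System.out.println(capacity + " -> " + threshold(capacity, 0.75f));
        }
    }
}
```

With a capacity of 16 and a load factor of 0.75 the threshold is 12, so the table is doubled to 32 around the time the thirteenth entry arrives. With a load factor of 1.0 the table would only grow once it holds as many entries as it has buckets.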

Why HashMap is not thread-safe

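It is well known that HashMap is not thread-safe. But why, and in which situations does concurrent access actually go wrong?
1. A put on a HashMap calls addEntry. Suppose threads A and B call addEntry for the same array slot at the same time. Both read the current head node. A then writes its new head node, and B writes its own new head node afterwards. B's write overwrites A's, so A's insertion is lost.
2. When key-value pairs are removed, several threads working on the same array slot likewise each read the head node currently stored there, do their own computation, and then write the result back to that slot. By the time a thread writes back, another thread may already have modified the slot, and that modification is overwritten.
3. When adding a key-value pair pushes the total count past the threshold, addEntry calls resize. This operation creates an array with the new capacity, recomputes the position of every key-value pair in the old array, writes them into the new array, and then points table at the new array. If several threads detect at the same time that the count has exceeded the threshold, they all call resize, each building and rehashing its own new array and assigning it to the map's table field. Only the array assigned by the last thread survives; the others are lost. In addition, if some threads have already completed the assignment while others are just starting, the latecomers use the already replaced table as their source array, which causes further problems.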
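
A real race is hard to reproduce on demand, so the sketch below replays the interleaving from point 1 step by step on a single thread. It is an illustration only: Node and racyInsert are invented names, and the one-slot array stands in for a single bucket of the table.

```java
class LostUpdateDemo {
    static final class Node {
        final String key;
        final Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    static int length(Node head) {
        int n = 0;
        for (Node e = head; e != null; e = e.next) n++;
        return n;
    }

    // Replays the unlucky schedule: both "threads" read the head before either writes.
    static Node racyInsert() {
        Node[] table = new Node[1];
        table[0] = new Node("old", null);

        Node headSeenByA = table[0]; // thread A reads the current head
        Node headSeenByB = table[0]; // thread B reads the same head

        table[0] = new Node("A", headSeenByA); // A publishes its node
        table[0] = new Node("B", headSeenByB); // B overwrites the slot, unaware of A

        return table[0];
    }

    public static void main(String[] args) {
        Node head = racyInsert();
        // Three insertions happened, but the chain is only B -> old.
        System.out.println(head.key + " -> " + head.next.key + ", length " + length(head));
    }
}
```

Nothing in addEntry makes the read of the head and the write of the new head a single atomic step, which is exactly the gap this schedule exploits.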

Thread-safe alternatives to HashMap

1. Hashtable: the Hashtable source uses synchronized to guarantee thread safety
Map<String, String> hashtable = new Hashtable<>();
2. SynchronizedMap
Map<String, String> synchronizedHashMap = Collections.synchronizedMap(new HashMap<String, String>());
3. ConcurrentHashMap
Map<String, String> concurrentHashMap = new ConcurrentHashMap<>();
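
Before comparing speed, it is worth confirming that these maps do not lose updates. The following sketch assumes Java 8 or later, where Map.merge is available and is performed atomically by all three classes; SafeCounterDemo and countConcurrently are names invented for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SafeCounterDemo {
    // Increments one shared counter from several threads and returns the final value.
    static int countConcurrently(Map<String, Integer> map, int threads, int perThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.merge("hits", 1, Integer::sum); // atomic read-modify-write on these maps
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return map.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(new ConcurrentHashMap<>(), 4, 1000));
    }
}
```

Note that a separate get followed by a put is not atomic even on a thread-safe map, so a counter written that way could still lose increments; merge avoids this by doing the update in one call.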
Example (source: https://yemengying.com/2016/05/07/threadsafe-hashmap/):

import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TestHashMap
{
    public final static int THREAD_POOL_SIZE = 5;
    public static Map<String, Integer> crunchifyHashTableObject = null;
    public static Map<String, Integer> crunchifySynchronizedMapObject = null;
    public static Map<String, Integer> crunchifyConcurrentHashMapObject = null;
    public static HashMap<String, Integer> dhcHashMap = null;

    public static void main(String[] args) throws InterruptedException
    {
        // Test with Hashtable Object
        crunchifyHashTableObject = new Hashtable<>();
        crunchifyPerformTest(crunchifyHashTableObject);
        // Test with synchronizedMap Object
        crunchifySynchronizedMapObject = Collections.synchronizedMap(new HashMap<String, Integer>());
        crunchifyPerformTest(crunchifySynchronizedMapObject);
        // Test with ConcurrentHashMap Object
        crunchifyConcurrentHashMapObject = new ConcurrentHashMap<>();
        crunchifyPerformTest(crunchifyConcurrentHashMapObject);
    }

    public static void crunchifyPerformTest(final Map<String, Integer> crunchifyThreads)
            throws InterruptedException
    {
        System.out.println("Test started for: " + crunchifyThreads.getClass());
        // Accumulates the elapsed time of all five runs.
        long averageTime = 0;
        for(int i = 0; i < 5; i++)
        {
            long startTime = System.nanoTime();
            ExecutorService crunchifyExServer = Executors.newFixedThreadPool(THREAD_POOL_SIZE);
            for(int j = 0; j < THREAD_POOL_SIZE; j++)
            {
                crunchifyExServer.execute(new Runnable()
                {
                    @SuppressWarnings("unused")
                    @Override
                    public void run()
                    {
                        for(int i = 0; i < 500000; i++)
                        {
                            Integer crunchifyRandomNumber = (int) Math.ceil(Math.random() * 550000);
                            // Retrieve value. We are not using it anywhere
                            Integer crunchifyValue = crunchifyThreads.get(String.valueOf(crunchifyRandomNumber));
                            // Put value
                            crunchifyThreads.put(String.valueOf(crunchifyRandomNumber),
                                                 crunchifyRandomNumber);
                        }
                    }
                });
            }
            // Make sure executor stops
            crunchifyExServer.shutdown();
            // Blocks until all tasks have completed execution after a shutdown request
            crunchifyExServer.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
            long endTime = System.nanoTime();
            long totalTime = (endTime - startTime) / 1000000L;
            averageTime += totalTime;
            System.out.println("2500K entried added/retrieved in " + totalTime + " ms");
        }
        System.out.println("For " + crunchifyThreads.getClass() + " the average time is "
                + averageTime / 5 + " ms\n");
    }
}

Output:

Test started for: class java.util.Hashtable
2500K entried added/retrieved in 3371 ms
2500K entried added/retrieved in 2740 ms
2500K entried added/retrieved in 2847 ms
2500K entried added/retrieved in 2698 ms
2500K entried added/retrieved in 2683 ms
For class java.util.Hashtable the average time is 2867 ms

Test started for: class java.util.Collections$SynchronizedMap
2500K entried added/retrieved in 3265 ms
2500K entried added/retrieved in 2705 ms
2500K entried added/retrieved in 2662 ms
2500K entried added/retrieved in 2591 ms
2500K entried added/retrieved in 2680 ms
For class java.util.Collections$SynchronizedMap the average time is 2780 ms

Test started for: class java.util.concurrent.ConcurrentHashMap
2500K entried added/retrieved in 1614 ms
2500K entried added/retrieved in 925 ms
2500K entried added/retrieved in 806 ms
2500K entried added/retrieved in 834 ms
2500K entried added/retrieved in 1360 ms
For class java.util.concurrent.ConcurrentHashMap the average time is 1107 ms