定制化高效使用Map的一些经验技巧

最新推荐文章于 2023-05-17 15:02:52 发布

飞飞好奇

最新推荐文章于 2023-05-17 15:02:52 发布

阅读量1.4k

点赞数 1

分类专栏： Java 文章标签：数据结构 Map HashMap CurrentHashMap

Java 专栏收录该内容

21 篇文章 1 订阅

订阅专栏

文章出处

http://wenshao.iteye.com/blog/886621

Map是一种非常用的数据结构，在一些底层框架或者效率十分关键的地方也是十分常用的。我写这篇文章的意图就是把我关于高效使用map的一些经验技巧写下来，当然其中我的一些观点可能不对，如果有朋友发现有错误的地方，欢迎指正。

在Java中Map是什么呢？先说HashMap，java.util.HashMap这个类，就是一个链表数组，最简单的理解就是把key的hashCode % len得到所在链表的位置，然后在链表上挨个查找。

这个代码摘抄自JDK 6的java.util.HashMap，为了方便说明问题，有所删减。其中一些关键点，我都已经注释说明

    Java代码   
    
  
 public class HashMap<K,V> {  
     // Entry是一个很标准的链表结构  
     static class Entry<K,V> {  
         final K key;  
         V value;  
         Entry<K,V> next;  
         final int hash;  
     }  
   
     transient Entry[] table; // table就是一个链表数组  
     transient int size;  
       
    public HashMap(int initialCapacity) {  
         // 注意，这里是性能的一个很关键地方，如果自行编写HashMap时，table.length不是2的n次方，性能会很容易很糟糕。  
         int capacity = 1;  
         while (capacity < initialCapacity)  
             capacity <<= 1;  
   
         table = new Entry[capacity];  
     }  
       
     // get方法所做的事情包括hash、indexFor、链表遍历、链表中每个Entry.key的相等比较  
     public V get(Object key) {  
         int hash = hash(key.hashCode()); //   
         int index = indexFor(hash, table.length);   
         for (Entry<K,V> e = table[index]; e != null; e = e.next) { // 链表遍历  
             Object k;  
             if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {  
                 return e.value;  
             }  
         }  
         return null;  
     }  
       
     public V put(K key, V value) {  
         int hash = hash(key.hashCode());  
         int index = indexFor(hash, table.length);  
         for (Entry<K,V> e = table[index]; e != null; e = e.next) {  
             Object k;  
             if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {  
                 V oldValue = e.value;  
                 e.value = value;  
                 return oldValue;  
             }  
         }  
   
         addEntry(hash, key, value, index);  
         return null;  
     }  
       
     void addEntry(int hash, K key, V value, int bucketIndex) {  
         Entry<K,V> e = table[bucketIndex];  
         table[bucketIndex] = new Entry<K,V>(hash, key, value, e);  
         if (size++ >= threshold) {  
             resize(2 * table.length); // 多个线程并发访问时，resize会调用transfer方法，而transfer方法会在某种情况下产生死循环。  
         }  
     }  
   
     static int hash(int h) {  
         h ^= (h >>> 20) ^ (h >>> 12);  
         return h ^ (h >>> 7) ^ (h >>> 4);  
     }  
   
     static int indexFor(int h, int length) {  
         return h & (length-1); // 这个比取模运算%速度快。  
     }  
 }  

通过上面的代码和注释，我们基本能够了解HashMap是干啥的，他是一个很简单很常用的数据结构，本身HashMap的性能很好，但是某些场景下，我们还是对HashMap做定制化的处理，使得其更高效。

例如Key是int的场景，专门编写IntHashMap能够获得更高效的性能，中间能够减少Integer对象的产生，减轻GC负担，同时，hash函数和遍历时的equals也能省下不少动作。一个性能比较数据如下：

测试的代码：

    Java代码   
    
  
 final int COUNT = 1000 * 10;  
 final int loopCount = 10000;  
 HashMap<Object, Object> map = new HashMap<Object, Object>(); // IntHashMap测试时相对应是IntHashMap  
   
 for (int i = 0; i < loopCount; ++i) {  
     for (int j = 0; j < COUNT; ++j) {  
         if (map.get(j) == null) {  
             map.put(j, value);  
         }  
     }  
 }  

结果：

Map类型	耗时	YoungGC	FullGC
IntHashmap<Object>	5,437	0	0
Hashmap<Integer, Object>	13,312	251	0

从结果来看，使用原生类型int替代Integer作为key，性能翻倍。

HashMap的性能是很好的，但不是线程安全的，最恶劣的并发问题就是table的resize时产生死循环。为了线程安全，我们通常需要使用ConcurrentHashMap，ConcurrentHashMap缺省能够支持16个并发写，而且不会产生令人十分讨厌的ConcurrentModificationException。可是ConcurrentHashMap的性能并不好，如上面的测试场景，测试性能数据如下：

Map类型	耗时	YoungGC	FullGC
IntHashmap<Object>	5,437	0	0
Hashmap<Integer, Object>	13,312	251	0
ConcurrentIntHashmap<Object>	21,452	0	0
ConcurrentHashmap<Integer, Object>	37,624	1409	0

通过测试数据看，ConcurrentHashMap的get性能要比HashMap性能要差很多，3倍多的差距。

有没有办法做到线程安全，但是get性能接近HashMap呢？答案是肯定的！ConcurrentHashMap其实就是一个Segment数组，每个Segment的功能类似一个HashMap，Segment是线程安全的，一个ConcurrentHashMap缺省包含16个Segment，所以支持16个并发写入，但大多数场景我们并不需要并发写入，需要的是线程安全并且高效查找。那么思路就很明确了，把ConcurentHashMap的Segement拿出来，修改成一个HashMap，性能如何？测试数据补上：

Map类型	耗时	YoungGC	FullGC	说明
IntHashmap<Object>	5,437	0	0	参考HashMap修改而成
Hashmap<Integer, Object>	13,312	251	0
ConcurrentIntHashmap<Object>	21,452	0	0	参考ConcurrentHashMap修改而成
ConcurrentHashmap<Integer, Object>	37,624	1409	0
FastConcurrentIntHashmap<Object>	5,703	0	0	参考ConcurrentHashMap.Segment修改而成
FastConcurrentHashmap<Integer, Object>	12,499	225	0	参考ConcurrentHashMap.Segment修改而成

从数据来看，FastConcurrentIntHashmap的性能非常好，接近IntHashMap了，FastConcurrentHashmap<Integer, Object>的性能则比HashMap速度还快一点点，可能是ConcurrentHashMap.Segement的实现更高效吧。

总结一下，我们可以参考HashMap编写IntHashMap，参考ConcurrentHashMap编写ConcurrentIntHashMap，参考ConcurrentHashMap.Segment编写专门针对读取优化的FastConcurrentHashMap，从而在特别场景下获得更快的性能。

同理，我们也可以参考HashMap和ConcurrentHashMap编写相应的CharArrayHashMap和CharArrayConcurrentHashMap，在特别场景下，能够获得比HashMap<String, Object>以及ConcurrentHashMap<String, Object>更好的性能。