[Java] The Design Essence of Java HashMap

devnn

Original article: https://blog.csdn.net/devnn/article/details/82085894


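HashMap is one of the collection classes Java developers use most. In this post I use the JDK 7 source code to summarize HashMap and take stock of the essence of its design. Before reading the source, let's meet two renowned authors of the HashMap source code.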

Josh Bloch
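Founder of the Java Collections Framework. Joshua Bloch led the design and implementation of many Java platform features, including the JDK 5.0 language enhancements and the award-winning Java Collections Framework. In June 2004 he left Sun and became Chief Java Architect at Google. He also won the prestigious Jolt Award for his book Effective Java.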

Doug Lea

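A professor of computer science at the State University of New York at Oswego, where he specializes in concurrent programming and the design of concurrent data structures. He served on the Executive Committee of the JCP (Java Community Process) and chaired JSR 166 (Java Specification Request 166), which added concurrency utilities to Java (see Java concurrency). He designed the util.concurrent package.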

Impressive, aren't they? Even more impressive, perhaps, is the father of Java, James Gosling. Every Java engineer has heard of him, so I'll skip the introduction here.

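Below is the beginning of the HashMap source in JDK 7. Four authors contributed to the class, and the first two have just been introduced. Several important fields will look familiar; if not, the comments explain them, so I won't go through them in detail.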

 * <p>This class is a member of the
 * <a href="{@docRoot}/../technotes/guides/collections/index.html">
 * Java Collections Framework</a>.
 *
 * @param <K> the type of keys maintained by this map
 * @param <V> the type of mapped values
 *
 * @author  Doug Lea
 * @author  Josh Bloch
 * @author  Arthur van Hoff
 * @author  Neal Gafter
 * @see     Object#hashCode()
 * @see     Collection
 * @see     Map
 * @see     TreeMap
 * @see     Hashtable
 * @since   1.2
 */

public class HashMap<K,V>
    extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable
{

  /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 16;

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The table, resized as necessary. Length MUST Always be a power of two.
     */
    transient Entry<K,V>[] table;

    /**
     * The number of key-value mappings contained in this map.
     */
    transient int size;

    /**
     * The next size value at which to resize (capacity * load factor).
     * @serial
     */
    int threshold;

    /**
     * The load factor for the hash table.
     *
     * @serial
     */
    final float loadFactor;

Now let's take stock of what makes HashMap clever.

Essence 1: The chained hash structure

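As you know, to find the value for a key quickly in Java you need an array, with elements stored by hashing (positions are located and looked up by hash value). But a hash structure is prone to collisions: what if two hashes are equal? HashMap adds linked lists on top of the array to solve this neatly. Each array slot holds the head of a linked list, and colliding entries are chained in that list. This is the chained hash structure. The overall structure of HashMap is shown in the figure below. It combines the fast access of an array with linked lists that resolve hash collisions.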

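A word of caution, though. This design still has a weakness: if a programmer overrides hashCode() on the key class badly, so that many elements share the same hash value, one array slot accumulates a very long singly linked list. A long list slows lookups, and in the extreme case HashMap degenerates into a linked list, where a lookup has to walk every element. JDK 8 improved on this: once a bucket's list grows beyond the treeify threshold of 8 entries (and the table has at least 64 slots), it is converted into a red-black tree, which further optimizes the structure. The JDK 8 source is more complex, so this article does not use it as the reference, but the overall principle is much the same.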
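To make the warning concrete, here is a minimal sketch of a badly written key. The names CollisionDemo, BadKey and fill are hypothetical and exist only for this illustration. equals() is correct but hashCode() returns a constant, so every key lands in the same bucket. The map still gives correct answers; what suffers is lookup speed.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Hypothetical key type: equals() is fine, hashCode() is deliberately poor.
    static final class BadKey {
        private final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override
        public int hashCode() {
            return 42; // every instance collides with every other instance
        }
    }

    // Fills a map with n keys that all share one hash value.
    static Map<BadKey, Integer> fill(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = fill(1000);
        // Still correct: equals() tells the colliding keys apart.
        System.out.println("size = " + map.size());
        System.out.println("value for key 999 = " + map.get(new BadKey(999)));
    }
}
```

Correctness depends on equals(), while performance depends on hashCode() spreading keys across buckets.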

Let's look at HashMap's constructors:

 /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        // Find a power of 2 >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;

        this.loadFactor = loadFactor;
        threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        table = new Entry[capacity];
        useAltHashing = sun.misc.VM.isBooted() &&
                (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
        init();
    }

    /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and the default load factor (0.75).
     *
     * @param  initialCapacity the initial capacity.
     * @throws IllegalArgumentException if the initial capacity is negative.
     */
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    /**
     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    public HashMap() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
    }

    /**
     * Constructs a new <tt>HashMap</tt> with the same mappings as the
     * specified <tt>Map</tt>.  The <tt>HashMap</tt> is created with
     * default load factor (0.75) and an initial capacity sufficient to
     * hold the mappings in the specified <tt>Map</tt>.
     *
     * @param   m the map whose mappings are to be placed in this map
     * @throws  NullPointerException if the specified map is null
     */
    public HashMap(Map<? extends K, ? extends V> m) {
        this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
                      DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);
        putAllForCreate(m);
    }

    // internal utilities

    /**
     * Initialization hook for subclasses. This method is called
     * in all constructors and pseudo-constructors (clone, readObject)
     * after HashMap has been initialized but before any entries have
     * been inserted.  (In the absence of this method, readObject would
     * require explicit knowledge of subclasses.)
     */
    void init() {
}

As you can see, whichever constructor you call, it ends up calling this one:

  public HashMap(int initialCapacity, float loadFactor) {

So let's focus on this final constructor:

 /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {

        // The three if-checks below validate the capacity and the load factor.

        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " + loadFactor);

        // Note: the while loop below is one of the design highlights covered later; ignore it for now.

        // Find a power of 2 >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;

        this.loadFactor = loadFactor;

        // threshold is the element count at which the map resizes. The check happens when an element is added.

        threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);

        // Key point: allocate an array of length capacity.
        table = new Entry[capacity];

        // Whether to use alternative hashing, which handles String keys specially to reduce hash collisions.
        useAltHashing = sun.misc.VM.isBooted() &&(capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
        init();
    }

}

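Note that the constructor creates an array of type Entry. This array stores the elements, and an element's position in the array is computed from the hash of its key (covered later). What happens when key hashes collide? As mentioned above, each array slot is designed as a singly linked list. Let's look at the structure of Entry to check whether it really is a singly linked list.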

  static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }

        public final K getKey() {
            return key;
        }

        public final V getValue() {
            return value;
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (!(o instanceof Map.Entry))
                return false;
            Map.Entry e = (Map.Entry)o;
            Object k1 = getKey();
            Object k2 = e.getKey();
            if (k1 == k2 || (k1 != null && k1.equals(k2))) {
                Object v1 = getValue();
                Object v2 = e.getValue();
                if (v1 == v2 || (v1 != null && v1.equals(v2)))
                    return true;
            }
            return false;
        }

        public final int hashCode() {
            return (key==null   ? 0 : key.hashCode()) ^
                   (value==null ? 0 : value.hashCode());
        }

        public final String toString() {
            return getKey() + "=" + getValue();
        }

        /**
         * This method is invoked whenever the value in an entry is
         * overwritten by an invocation of put(k,v) for a key k that's already
         * in the HashMap.
         */
        void recordAccess(HashMap<K,V> m) {
        }

        /**
         * This method is invoked whenever the entry is
         * removed from the table.
         */
        void recordRemoval(HashMap<K,V> m) {
        }
}

The next field of type Entry holds the next node. Let's look at HashMap's put method for further confirmation.

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {

        // A null key is handled separately and stored at array index 0; see putForNullKey below.
        if (key == null)
            return putForNullKey(value);

        // Recompute the hash; this is another highlight covered later, so skip it for now.
        int hash = hash(key);

        // Use the recomputed hash to find array index i; details come later.
        int i = indexFor(hash, table.length);

        // Walk the singly linked list looking for an entry with an equal key; if found, replace its value.
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;

        // No entry with an equal key was found, so add a new entry at the head of the list. The source of addEntry follows below.
        addEntry(hash, key, value, i);
        return null;
    }

    /**
     * Offloaded version of put for null keys
     */
    private V putForNullKey(V value) {

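        // Easy to follow: walk the list at index 0 and, if an entry with a null key exists, replace its value; otherwise add one.
        // The key check is needed because non-null keys can also land at index 0, but a null key is always stored at index 0.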

        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(0, null, value, 0);
        return null;
}

As you can see, put allows an element with a null key and stores it at array index 0. Next, let's look at the source of addEntry.
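The null-key behavior is easy to observe through the public API. The sketch below uses only java.util.HashMap; the class name NullKeyDemo and the helper build are hypothetical names chosen for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

class NullKeyDemo {
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");   // stored under the null key
        map.put(null, "second");  // same key again: the value is replaced
        map.put("k", null);       // null values are allowed as well
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println("size = " + map.size());
        System.out.println("get(null) = " + map.get(null));
        // get() returns null both for a missing key and for a key mapped to null,
        // so containsKey() is needed to tell the two cases apart.
        System.out.println("get(\"k\") = " + map.get("k")
                + ", containsKey(\"k\") = " + map.containsKey("k"));
        System.out.println("get(\"missing\") = " + map.get("missing")
                + ", containsKey(\"missing\") = " + map.containsKey("missing"));
    }
}
```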

     /**
     * Adds a new entry with the specified key, value and hash code to
     * the specified bucket.  It is the responsibility of this
     * method to resize the table if appropriate.
     *
     * Subclass overrides this to alter the behavior of put method.
     */
    void addEntry(int hash, K key, V value, int bucketIndex) {
        // When size reaches the threshold (and the target bucket is not empty), resize to twice the current capacity.
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }
        // Create the new entry.
        createEntry(hash, key, value, bucketIndex);
    }

    /**
     * Like addEntry except that this version is used when creating entries
     * as part of Map construction or "pseudo-construction" (cloning,
     * deserialization).  This version needn't worry about resizing the table.
     *
     * Subclass overrides this to alter the behavior of HashMap(Map),
     * clone, and readObject.
     */
    void createEntry(int hash, K key, V value, int bucketIndex) {
        // Classic head insertion into a singly linked list: save the current head, then make the new entry (pointing at it) the new head.
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash,  key, value, e);
        size++;
    }

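When an element is added, the size is checked against the threshold, and the table grows once the threshold is reached. Resizing allocates a new array twice as long and moves the existing entries into it, recomputing each entry's index for the new length. By now the overall structure of HashMap should be clear. Interested readers can read the lookup code below; I won't explain it line by line.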
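The pieces seen so far (bucket array, head insertion, doubling on resize) fit together roughly as in the sketch below. This is a simplified illustration under stated assumptions, not the JDK implementation: TinyChainedMap, Node, allocate and capacity are hypothetical names, the supplemental hash function is left out, and the resize check is reduced to "size has reached 3/4 of the array length".

```java
// A minimal chained hash table sketch (not the JDK code).
class TinyChainedMap<K, V> {
    // One node of a bucket's singly linked list.
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private Node<K, V>[] table = allocate(4); // length stays a power of two
    private int size;

    @SuppressWarnings("unchecked")
    private static <K, V> Node<K, V>[] allocate(int length) {
        return (Node<K, V>[]) new Node[length];
    }

    private static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public V put(K key, V value) {
        int hash = (key == null) ? 0 : key.hashCode(); // simplification: no bit mixing
        int i = indexFor(hash, table.length);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || (key != null && key.equals(e.key)))) {
                V old = e.value;
                e.value = value; // existing key: replace the value
                return old;
            }
        }
        if (size >= table.length * 3 / 4) { // simplified threshold (load factor 0.75)
            resize(table.length * 2);
            i = indexFor(hash, table.length);
        }
        table[i] = new Node<>(hash, key, value, table[i]); // head insertion
        size++;
        return null;
    }

    public V get(Object key) {
        int hash = (key == null) ? 0 : key.hashCode();
        for (Node<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || (key != null && key.equals(e.key)))) {
                return e.value;
            }
        }
        return null;
    }

    // Moves every node into a larger array, recomputing its index for the new length.
    private void resize(int newLength) {
        Node<K, V>[] bigger = allocate(newLength);
        for (Node<K, V> head : table) {
            for (Node<K, V> e = head; e != null; ) {
                Node<K, V> next = e.next;
                int i = indexFor(e.hash, newLength);
                e.next = bigger[i];
                bigger[i] = e;
                e = next;
            }
        }
        table = bigger;
    }

    public int size() {
        return size;
    }

    int capacity() {
        return table.length;
    }

    public static void main(String[] args) {
        TinyChainedMap<Integer, String> map = new TinyChainedMap<>();
        for (int i = 0; i < 10; i++) {
            map.put(i, "v" + i);
        }
        System.out.println("size = " + map.size() + ", capacity = " + map.capacity());
        System.out.println("get(7) = " + map.get(7));
    }
}
```

Because the length doubles, an entry either keeps its index or moves up by exactly the old length, depending on one extra bit of its hash.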

Source code for looking up an element in HashMap:

  /**
     * Returns the value to which the specified key is mapped,
     * or {@code null} if this map contains no mapping for the key.
     *
     * <p>More formally, if this map contains a mapping from a key
     * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
     * key.equals(k))}, then this method returns {@code v}; otherwise
     * it returns {@code null}.  (There can be at most one such mapping.)
     *
     * <p>A return value of {@code null} does not <i>necessarily</i>
     * indicate that the map contains no mapping for the key; it's also
     * possible that the map explicitly maps the key to {@code null}.
     * The {@link #containsKey containsKey} operation may be used to
     * distinguish these two cases.
     *
     * @see #put(Object, Object)
     */
    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

    /**
     * Offloaded version of get() to look up null keys.  Null keys map
     * to index 0.  This null case is split out into separate methods
     * for the sake of performance in the two most commonly used
     * operations (get and put), but incorporated with conditionals in
     * others.
     */
    private V getForNullKey() {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null)
                return e.value;
        }
        return null;
    }

    /**
     * Returns <tt>true</tt> if this map contains a mapping for the
     * specified key.
     *
     * @param   key   The key whose presence in this map is to be tested
     * @return <tt>true</tt> if this map contains a mapping for the specified
     * key.
     */
    public boolean containsKey(Object key) {
        return getEntry(key) != null;
    }

    /**
     * Returns the entry associated with the specified key in the
     * HashMap.  Returns null if the HashMap contains no mapping
     * for the key.
     */
    final Entry<K,V> getEntry(Object key) {
        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
}

You should now have a picture of HashMap's overall structure: a chained hash table. Next, I'll go through the finer details of HashMap.

Essence 2: Why the array capacity must be a power of two

In the constructor, HashMap recomputes the capacity from the initialCapacity argument instead of using initialCapacity directly as the array length. Recall the constructor:

        // Note: the while loop below is essence 2 of this article.
        // Find a power of 2 >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;

        this.loadFactor = loadFactor;

        // threshold is the element count at which the map resizes. The check happens when an element is added.

        threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);

        // Key point: allocate the array with the recomputed capacity.
        table = new Entry[capacity];

Focus on this while loop:

        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;

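capacity starts at 1. While capacity is smaller than the requested capacity, it is shifted left by one bit, which doubles it. When the loop ends, capacity is the smallest power of two that is greater than or equal to initialCapacity (not merely the nearest one: 17 becomes 32). Powers of two have a very regular binary form: exactly one bit is 1 and all other bits are 0. For example: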

2 in binary: 0000 0000 0000 0000 0000 0000 0000 0010

4 in binary: 0000 0000 0000 0000 0000 0000 0000 0100

8 in binary: 0000 0000 0000 0000 0000 0000 0000 1000

16 in binary: 0000 0000 0000 0000 0000 0000 0001 0000

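The computed capacity is the array length, length. In binary it has a single 1 bit with only 0s below it. Even better, length-1 has all 1s in those low bits and 0s above them. For example: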

1 in binary: 0000 0000 0000 0000 0000 0000 0000 0001

3 in binary: 0000 0000 0000 0000 0000 0000 0000 0011

7 in binary: 0000 0000 0000 0000 0000 0000 0000 0111

15 in binary: 0000 0000 0000 0000 0000 0000 0000 1111

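That is the purpose of recomputing the capacity: it guarantees the capacity is a power of two, which the array index calculation described next relies on.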
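The rounding loop can be tried in isolation. The sketch below copies the loop from the JDK 7 constructor into a hypothetical helper, roundUpToPowerOfTwo, inside a hypothetical class PowerOfTwoDemo, and prints the resulting capacity together with the length-1 mask.

```java
class PowerOfTwoDemo {
    // Same loop as the JDK 7 constructor: the smallest power of two >= initialCapacity.
    // Like the constructor, callers are assumed to clamp the argument to 1 << 30 first;
    // a larger argument would overflow the shift.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        for (int requested : new int[] {0, 1, 10, 16, 17, 1000}) {
            int capacity = roundUpToPowerOfTwo(requested);
            System.out.println(requested + " -> capacity " + capacity
                    + ", length-1 in binary = " + Integer.toBinaryString(capacity - 1));
        }
    }
}
```

A request of 16 stays at 16, while 17 jumps to 32: the result is never smaller than the request.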

Essence 3: From hash value to array index

In put, HashMap recomputes the hash of the key (source above) to get a new value, hash, and then calls indexFor to get the element's index in the array:

        // Recompute the hash; this is essence 3 of this article.
        int hash = hash(key);
        // Use the recomputed hash to find array index i.
        int i = indexFor(hash, table.length);

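Why compute it this way? There is a more direct way to turn a key's hash into an array index: the modulo operation. The key's hashCode() method returns an int, which can be far larger than the array length (or negative), so why not just take the remainder? That is, index = key.hashCode() % table.length, with negative results adjusted, always falls within the array bounds. But HashMap does not do this. Let's look at how hash(key) and indexFor are implemented in the source: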

 /**
     * Retrieve object hash code and applies a supplemental hash function to the
     * result hash, which defends against poor quality hash functions.  This is
     * critical because HashMap uses power-of-two length hash tables, that
     * otherwise encounter collisions for hashCodes that do not differ
     * in lower bits. Note: Null keys always map to hash 0, thus index 0.
     */
    final int hash(Object k) {
        int h = 0;
        if (useAltHashing) {
            if (k instanceof String) {
                return sun.misc.Hashing.stringHash32((String) k);
            }
            h = hashSeed;
        }

        h ^= k.hashCode();

        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);
}

hash(key) computes a new hash value using several bit operations. They have a single purpose: reducing collisions between key hashes.

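For example, suppose that after h ^= k.hashCode() the value of h is 0x7FFFFFFF, whose binary form is all 1s except for the sign bit. Applying the bit operations above, h ^= (h >>> 20) ^ (h >>> 12) gives 0x7FF807FF, and h ^ (h >>> 7) ^ (h >>> 4) then returns 0x78F8778F.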

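The returned value is no longer a run of consecutive 1s but a mix of 0s and 1s. This is the essence of hash(key): it folds the high bits of the hash into the low bits, so a regular pattern such as a run of 1s in the low bits gets broken up. Why that matters is explained further below.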
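The worked example can be reproduced step by step. The sketch below applies the two mixing lines from the JDK 7 hash(Object) method to a raw int. The class SupplementalHashDemo and the helpers mixStep1, mixStep2, spread and bits are hypothetical names, and the hashSeed and alternative String hashing branch are left out.

```java
class SupplementalHashDemo {
    // First mixing line of JDK 7's hash(): h ^= (h >>> 20) ^ (h >>> 12)
    static int mixStep1(int h) {
        return h ^ (h >>> 20) ^ (h >>> 12);
    }

    // Second mixing line: return h ^ (h >>> 7) ^ (h >>> 4)
    static int mixStep2(int h) {
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int spread(int h) {
        return mixStep2(mixStep1(h));
    }

    // Formats an int as 32 binary digits, padded with leading zeros.
    static String bits(int h) {
        return String.format("%32s", Integer.toBinaryString(h)).replace(' ', '0');
    }

    public static void main(String[] args) {
        int h = 0x7FFFFFFF;
        System.out.println("input : " + bits(h));
        System.out.println("step 1: " + bits(mixStep1(h)));
        System.out.println("result: " + bits(spread(h)));
    }
}
```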

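indexFor is easy to understand: it ANDs the hash with length-1. The constructor has already forced the array length to be a power of two, as covered in essence 2, so the low bits of length-1 are all 1s and the high bits are 0s. What does ANDing length-1 with the hash give? The low bits of the hash, a value between 0 and length-1, so the index can never go out of bounds. Why a bitwise operation rather than modulo? Because a bitwise AND is cheaper than the integer division behind modulo, and it never yields a negative index the way % does for a negative hash.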
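A small experiment shows both why the mask works and why it needs a power-of-two length. In this sketch, indexFor is copied from the JDK 7 source, while IndexForDemo and reachableIndices are hypothetical names for the illustration. For a power-of-two length the mask agrees with Math.floorMod; for a length such as 10 the mask is 1001 in binary, so most slots could never be used.

```java
import java.util.Set;
import java.util.TreeSet;

class IndexForDemo {
    // Copied from JDK 7: keep only the low bits selected by length-1.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Collects every index that indexFor can produce for a given array length.
    static Set<Integer> reachableIndices(int length) {
        Set<Integer> seen = new TreeSet<>();
        for (int h = 0; h < length * 4; h++) {
            seen.add(indexFor(h, length));
        }
        return seen;
    }

    public static void main(String[] args) {
        for (int h : new int[] {17, -1, Integer.MIN_VALUE, Integer.MAX_VALUE}) {
            System.out.println("h = " + h
                    + ", h & 15 = " + indexFor(h, 16)
                    + ", floorMod(h, 16) = " + Math.floorMod(h, 16)
                    + ", h % 16 = " + (h % 16));
        }
        System.out.println("length 16 reaches " + reachableIndices(16));
        System.out.println("length 10 reaches " + reachableIndices(10));
    }
}
```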

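So what is hash(Object k) for? Why scramble the bits of the key's hashCode? Because if many keys have hashes with identical low bits and different high bits, taking only the low bits yields the same result, so they all get the same array index. One slot of the array then ends up with an overly long linked list, which slows HashMap lookups.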
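To see this effect, take three hash codes that differ only in their high bits. The values 0x10000, 0x20000 and 0x30000 are made up for this illustration, as are the names SpreadCollisionDemo and spread; the bit mixing itself is the pair of lines from the JDK 7 hash method, without the hashSeed branch. With a table of length 16, the raw values all map to index 0, while the mixed values land in different slots.

```java
class SpreadCollisionDemo {
    // The two mixing lines from JDK 7's hash(Object), applied to a raw int.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        // Hypothetical hashCode() results: same low 16 bits, different high bits.
        int[] hashCodes = {0x10000, 0x20000, 0x30000};
        for (int h : hashCodes) {
            System.out.println("hashCode 0x" + Integer.toHexString(h)
                    + ": raw index = " + indexFor(h, length)
                    + ", index after mixing = " + indexFor(spread(h), length));
        }
    }
}
```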

That covers the essence of HashMap. I hope it helped; if you have questions, leave a comment below and I'll reply as soon as I can.
