HashMap Source Code Analysis (JDK 1.8)

1. Basic structure of the class

HashMap is the most representative and most widely used application of separate chaining for resolving hash collisions. In JDK 1.7 and earlier it used linked lists exclusively; 1.8 is more elaborate: building on the linked list, a bin that reaches a certain length is converted into a red-black tree. Let's first read the API documentation:

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.
Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:
     Map m = Collections.synchronizedMap(new HashMap(...));
The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
This class is a member of the Java Collections Framework.
  
  
  
public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {

The gist of the Javadoc: HashMap implements the Map interface with all of its optional operations and permits null keys and values; apart from being unsynchronized and permitting nulls, it is roughly equivalent to Hashtable, and it makes no guarantee about ordering. get and put run in constant time, assuming the hash function disperses elements evenly among the buckets. Iterating over the collection views takes time proportional to the capacity plus the number of mappings, so if iteration performance matters, do not set the initial capacity too high or the load factor too low. A HashMap has two tuning parameters: the initial capacity, which is the number of buckets when the hash table is created, and the load factor, a measure of how full the table may get before it grows. When the number of entries exceeds the product of the load factor and the current capacity, the table is rehashed (rebuilt) into roughly twice as many buckets. The default load factor of 0.75 is a good trade-off between time and space: higher values reduce the space overhead but increase the lookup cost, which affects most operations, including get and put. When choosing an initial capacity, take the expected number of entries and the load factor into account so as to minimize rehash operations; if the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash will ever occur. If many mappings are to be stored, creating the map with a sufficiently large capacity is more efficient than letting it grow by repeated rehashing. Many keys sharing the same hashCode() will slow any hash table down; when such keys are Comparable, their comparison order is used to break ties. HashMap is not synchronized: if multiple threads access it concurrently and at least one of them modifies it structurally, it must be synchronized externally (a structural modification is any operation that adds or deletes mappings; merely changing the value of an existing key is not one). This is typically done by synchronizing on some object that naturally encapsulates the map; if no such object exists, wrap the map with Collections.synchronizedMap, preferably at creation time to prevent accidental unsynchronized access:

Map m = Collections.synchronizedMap(new HashMap(...));

The iterators returned by all of the class's collection-view methods are fail-fast: if the map is structurally modified after the iterator is created, by any means other than the iterator's own remove method, the iterator throws ConcurrentModificationException, failing quickly and cleanly rather than risking arbitrary, non-deterministic behavior at some undetermined future time. Note that fail-fast behavior cannot be guaranteed: in the presence of unsynchronized concurrent modification no hard guarantees are possible, so the exception is thrown on a best-effort basis. It would be wrong to write a program whose correctness depends on this exception; the fail-fast behavior should be used only to detect bugs.

2. Fields and constants

2.1 Constants

    /*
     * Implementation notes.
     *
     * This map usually acts as a binned (bucketed) hash table, but
     * when bins get too large, they are transformed into bins of
     * TreeNodes, each structured similarly to those in
     * java.util.TreeMap. Most methods try to use normal bins, but
     * relay to TreeNode methods when applicable (simply by checking
     * instanceof a node).  Bins of TreeNodes may be traversed and
     * used like any others, but additionally support faster lookup
     * when overpopulated. However, since the vast majority of bins in
     * normal use are not overpopulated, checking for existence of
     * tree bins may be delayed in the course of table methods.
     *
     * Tree bins (i.e., bins whose elements are all TreeNodes) are
     * ordered primarily by hashCode, but in the case of ties, if two
     * elements are of the same "class C implements Comparable<C>",
     * type then their compareTo method is used for ordering. (We
     * conservatively check generic types via reflection to validate
     * this -- see method comparableClassFor).  The added complexity
     * of tree bins is worthwhile in providing worst-case O(log n)
     * operations when keys either have distinct hashes or are
     * orderable. Thus, performance degrades gracefully under
     * accidental or malicious usages in which hashCode() methods
     * return values that are poorly distributed, as well as those in
     * which many keys share a hashCode, so long as they are also
     * Comparable. (If neither of these apply, we may waste about a
     * factor of two in time and space compared to taking no
     * precautions. But the only known cases stem from poor user
     * programming practices that are already so slow that this makes
     * little difference.)
     *
     * Because TreeNodes are about twice the size of regular nodes, we
     * use them only when bins contain enough nodes to warrant use
     * (see TREEIFY_THRESHOLD). And when they become too small (due to
     * removal or resizing) they are converted back to plain bins.  In
     * usages with well-distributed user hashCodes, tree bins are
     * rarely used.  Ideally, under random hashCodes, the frequency of
     * nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average for the default resizing
     * threshold of 0.75, although with a large variance because of
     * resizing granularity. Ignoring variance, the expected
     * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
     * factorial(k)). The first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million
     *
     * The root of a tree bin is normally its first node.  However,
     * sometimes (currently only upon Iterator.remove), the root might
     * be elsewhere, but can be recovered following parent links
     * (method TreeNode.root()).
     *
     * All applicable internal methods accept a hash code as an
     * argument (as normally supplied from a public method), allowing
     * them to call each other without recomputing user hashCodes.
     * Most internal methods also accept a "tab" argument, that is
     * normally the current table, but may be a new or old one when
     * resizing or converting.
     *
     * When bin lists are treeified, split, or untreeified, we keep
     * them in the same relative access/traversal order (i.e., field
     * Node.next) to better preserve locality, and to slightly
     * simplify handling of splits and traversals that invoke
     * iterator.remove. When using comparators on insertion, to keep a
     * total ordering (or as close as is required here) across
     * rebalancings, we compare classes and identityHashCodes as
     * tie-breakers.
     *
     * The use and transitions among plain vs tree modes is
     * complicated by the existence of subclass LinkedHashMap. See
     * below for hook methods defined to be invoked upon insertion,
     * removal and access that allow LinkedHashMap internals to
     * otherwise remain independent of these mechanics. (This also
     * requires that a map instance be passed to some utility methods
     * that may create new nodes.)
     *
     * The concurrent-programming-like SSA-based coding style helps
     * avoid aliasing errors amid all of the twisty pointer operations.
     */

    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

The gist of these implementation notes:

This map is normally a hash table made up of bins (buckets). When a bin grows too large, it is converted into a bin of TreeNodes, structured much like java.util.TreeMap. Most methods operate on ordinary bins and relay to the TreeNode methods when applicable (detected simply by an instanceof check on the node). Tree bins support the same operations as ordinary ones but additionally offer faster lookup when overpopulated. Since under normal use the vast majority of bins are never overpopulated, checks for tree bins can be deferred in the table methods.

In a treeified bin every element is a TreeNode, ordered primarily by hashCode. On a tie, if two elements are of the same "class C implements Comparable<C>", their compareTo method is used for ordering (the generic type is conservatively validated via reflection; see comparableClassFor). The added complexity of tree bins is worthwhile because it provides worst-case O(log n) operations when keys either have distinct hashes or are orderable. Performance therefore degrades gracefully under accidental or malicious usage where hashCode() returns poorly distributed values, and also when many keys share a hashCode, as long as those keys are also Comparable. (If neither applies, we may waste about a factor of two in time and space compared with taking no precautions, but the only known cases stem from poor coding practices that are already so slow that it makes little difference.)

Because TreeNodes are about twice the size of regular nodes, they are used only when a bin contains enough nodes to warrant them (see TREEIFY_THRESHOLD), and when a treeified bin becomes too small (through removal or resizing) it is converted back to a plain linked-list bin. With well-distributed hashCodes, tree bins are rarely needed. Under random hashCodes, the number of nodes per bin follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average at the default resizing threshold of 0.75, albeit with large variance because of resizing granularity. Ignoring variance, the expected number of occurrences of list size k is:

exp(-0.5) * pow(0.5, k) / factorial(k), which gives the following values:

Bin size k: probability
0: 0.60653066
1: 0.30326533
2: 0.07581633
3: 0.01263606
4: 0.00157952
5: 0.00015795
6: 0.00001316
7: 0.00000094
8: 0.00000006

Larger sizes occur with probability less than 1 in ten million.
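
As a sanity check, the table can be reproduced with a few lines of Java (a quick sketch, not part of the JDK source; the class name is mine):

public class PoissonCheck {
    public static void main(String[] args) {
        double lambda = 0.5;           // average bin load at the default 0.75 threshold
        double p = Math.exp(-lambda);  // probability of bin size k = 0
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, p);
            p = p * lambda / (k + 1);  // next Poisson term: multiply by lambda/(k+1)
        }
    }
}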

The root of a tree bin is normally its first node. Sometimes (currently only upon Iterator.remove) the root may be elsewhere, but it can be recovered by following the parent links (method TreeNode.root()).

All applicable internal methods accept a hash code as an argument (normally supplied by a public method; every Object has one), allowing them to call each other without recomputing the key's hashCode. Most internal methods also accept a "tab" argument, normally the current table, but possibly a new or old one when resizing or converting.

When bin lists are treeified, split, or untreeified, they are kept in the same relative access/traversal order (the Node.next field) to better preserve locality and to slightly simplify the handling of splits and traversals that invoke iterator.remove. When comparators are used on insertion, classes and identityHashCodes serve as final tie-breakers to keep a total ordering (or as close as required) across rebalancings.

The use of, and transitions between, plain and tree modes are complicated by the existence of the subclass LinkedHashMap. See below for the hook methods that are invoked upon insertion, removal, and access; they allow LinkedHashMap's internals to remain otherwise independent of these mechanics. (This also requires that a map instance be passed to some utility methods that may create new nodes.) Finally, the concurrent-programming-like SSA-based coding style helps avoid aliasing errors amid all the twisty pointer operations.

2.2 Member variables

    /* ---------------- Fields -------------- */

    /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     */
    transient Node<K,V>[] table;

    /**
     * Holds cached entrySet(). Note that AbstractMap fields are used
     * for keySet() and values().
     */
    transient Set<Map.Entry<K,V>> entrySet;

    /**
     * The number of key-value mappings contained in this map.
     */
    transient int size;

    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    transient int modCount;

    /**
     * The next size value at which to resize (capacity * load factor).
     *
     * @serial
     */
    // (The javadoc description is true upon serialization.
    // Additionally, if the table array has not been allocated, this
    // field holds the initial array capacity, or zero signifying
    // DEFAULT_INITIAL_CAPACITY.)
    int threshold;

    /**
     * The load factor for the hash table.
     *
     * @serial
     */
    final float loadFactor;

2.3 Inner classes

2.3.1 Node

The Node class implements the Map.Entry interface, which defines methods such as getKey, getValue, setValue, equals, and hashCode. Node is the basic node of an ordinary bin; TreeNode is a subclass of it, and in LinkedHashMap the Entry class is likewise a subclass of Node.

/**
 * Basic hash bin node, used for most entries.  (See below for
 * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
      	//if key and value are the same object, XOR-ing their hash codes yields 0
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

Node is the basic element of the linked list in a bucket. Its main fields are the two generic members key and value, and since it forms a linked list it also maintains a next pointer to the following element. Note its hashCode method: Objects.hashCode(key) ^ Objects.hashCode(value), which means that if key and value are the same object, the Node's hashCode is 0. It also implements equals, which returns true when both the keys and the values are equal (by == or by equals).
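
A quick illustration of that XOR property (hypothetical demo code; AbstractMap.SimpleEntry implements the same Map.Entry hashCode contract that Node follows):

import java.util.AbstractMap;
import java.util.Objects;

public class NodeHashDemo {
    public static void main(String[] args) {
        String same = "self";
        // identical hash codes cancel out under XOR
        System.out.println(Objects.hashCode(same) ^ Objects.hashCode(same)); // 0
        AbstractMap.SimpleEntry<String, String> entry =
                new AbstractMap.SimpleEntry<>(same, same);
        System.out.println(entry.hashCode()); // 0 as well
    }
}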

2.3.2 TreeNode

TreeNode is the basic node of a bin after treeification. Note the inheritance chain: TreeNode extends LinkedHashMap.Entry, which in turn extends Node.

2.3.3 The view inner classes KeySet, Values, and EntrySet

  1. KeySet

    HashMap provides a view of all of its keys. KeySet extends AbstractSet; the underlying elements are still the Nodes (or TreeNodes) of the hash table above, and the view merely exposes their keys.

        /**
         * Returns a {@link Set} view of the keys contained in this map.
         * The set is backed by the map, so changes to the map are
         * reflected in the set, and vice-versa.  If the map is modified
         * while an iteration over the set is in progress (except through
         * the iterator's own <tt>remove</tt> operation), the results of
         * the iteration are undefined.  The set supports element removal,
         * which removes the corresponding mapping from the map, via the
         * <tt>Iterator.remove</tt>, <tt>Set.remove</tt>,
         * <tt>removeAll</tt>, <tt>retainAll</tt>, and <tt>clear</tt>
         * operations.  It does not support the <tt>add</tt> or <tt>addAll</tt>
         * operations.
         *
         * @return a set view of the keys contained in this map
         */
        public Set<K> keySet() {
            Set<K> ks = keySet;
            if (ks == null) {
                ks = new KeySet();
                keySet = ks;
            }
            return ks;
        }
    
        final class KeySet extends AbstractSet<K> {
            public final int size()                 { return size; }
            public final void clear()               { HashMap.this.clear(); }
            public final Iterator<K> iterator()     { return new KeyIterator(); }
            public final boolean contains(Object o) { return containsKey(o); }
            public final boolean remove(Object key) {
                return removeNode(hash(key), key, null, false, true) != null;
            }
            public final Spliterator<K> spliterator() {
                return new KeySpliterator<>(HashMap.this, 0, -1, 0, 0);
            }
            public final void forEach(Consumer<? super K> action) {
                Node<K,V>[] tab;
                if (action == null)
                    throw new NullPointerException();
                if (size > 0 && (tab = table) != null) {
                    int mc = modCount;
                    for (int i = 0; i < tab.length; ++i) {
                        for (Node<K,V> e = tab[i]; e != null; e = e.next)
                            action.accept(e.key);
                    }
                    if (modCount != mc)
                        throw new ConcurrentModificationException();
                }
            }
        }

    The gist of the comment: this is a view of all keys in the map. Note that it is only a view: any change to the map is reflected in the view and, likewise, any operation on the view is reflected back on the map. Elements can be removed through the view, which removes the corresponding mapping from the map, via Iterator.remove, Set.remove, removeAll, retainAll, and clear; the add and addAll operations are not supported.

    As the source shows, the supported operations are just wrappers around the map's own fields and methods. The most basic use of KeySet is to obtain an iterator via keySet() and iterate over the keys. The source also reveals that KeySet.contains and EntrySet.contains both operate on the same underlying table; the only difference is that forEach's accept is passed just the key:

     action.accept(e.key);

    This is the main difference between KeySet and Values/EntrySet. It follows that we can traverse a HashMap directly through its EntrySet rather than, as many people assume, iterating over the keys and calling get for each one (see the sketch after this list).

  2. Values

    Values works the same way as KeySet; only what forEach's accept receives differs (the value instead of the key).

  3. EntrySet

    EntrySet works on the same principle as KeySet and Values:

    /**
     * Returns a {@link Set} view of the mappings contained in this map.
     * The set is backed by the map, so changes to the map are
     * reflected in the set, and vice-versa.  If the map is modified
     * while an iteration over the set is in progress (except through
     * the iterator's own <tt>remove</tt> operation, or through the
     * <tt>setValue</tt> operation on a map entry returned by the
     * iterator) the results of the iteration are undefined.  The set
     * supports element removal, which removes the corresponding
     * mapping from the map, via the <tt>Iterator.remove</tt>,
     * <tt>Set.remove</tt>, <tt>removeAll</tt>, <tt>retainAll</tt> and
     * <tt>clear</tt> operations.  It does not support the
     * <tt>add</tt> or <tt>addAll</tt> operations.
     *
     * @return a set view of the mappings contained in this map
     */
    public Set<Map.Entry<K,V>> entrySet() {
        Set<Map.Entry<K,V>> es;
        return (es = entrySet) == null ? (entrySet = new EntrySet()) : es;
    }
    
    final class EntrySet extends AbstractSet<Map.Entry<K,V>> {
        public final int size()                 { return size; }
        public final void clear()               { HashMap.this.clear(); }
        public final Iterator<Map.Entry<K,V>> iterator() {
            return new EntryIterator();
        }
        public final boolean contains(Object o) {
            if (!(o instanceof Map.Entry))
                return false;
            Map.Entry<?,?> e = (Map.Entry<?,?>) o;
            Object key = e.getKey();
            Node<K,V> candidate = getNode(hash(key), key);
            return candidate != null && candidate.equals(e);
        }
        public final boolean remove(Object o) {
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>) o;
                Object key = e.getKey();
                Object value = e.getValue();
                return removeNode(hash(key), key, value, true, true) != null;
            }
            return false;
        }
        public final Spliterator<Map.Entry<K,V>> spliterator() {
            return new EntrySpliterator<>(HashMap.this, 0, -1, 0, 0);
        }
        public final void forEach(Consumer<? super Map.Entry<K,V>> action) {
            Node<K,V>[] tab;
            if (action == null)
                throw new NullPointerException();
            if (size > 0 && (tab = table) != null) {
                int mc = modCount;
                for (int i = 0; i < tab.length; ++i) {
                    for (Node<K,V> e = tab[i]; e != null; e = e.next)
                        action.accept(e);
                }
                if (modCount != mc)
                    throw new ConcurrentModificationException();
            }
        }
    }

    Removal by entry is supported. In its forEach method:

    action.accept(e);

    here the whole entry object is passed to accept.
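
To tie the three views together, here is a sketch of the traversal advice from the KeySet discussion (hypothetical demo code): iterating entrySet visits each Node once, while iterating keySet and calling get repeats the hash lookup for every key.

import java.util.HashMap;
import java.util.Map;

public class TraversalDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        // preferred: one pass over the table, each Node visited once
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }

        // works, but every get(key) repeats the hash + bucket lookup
        for (String key : map.keySet()) {
            System.out.println(key + " -> " + map.get(key));
        }
    }
}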

2.3.4 The iterator HashIterator and the parallel spliterator HashMapSpliterator

All the remaining inner classes of HashMap relate to iteration. In single-threaded code we use an Iterator; as described earlier, HashMap exposes its internal table as three views (KeySet, Values, and EntrySet), which are served by KeyIterator, ValueIterator, and EntryIterator respectively. Given how widely HashMap is used, traversal may be worth parallelizing, so HashMap also provides the parallel-capable spliterators KeySpliterator, ValueSpliterator, and EntrySpliterator.

3. Fundamentals

3.1 The basic structure of HashMap

  • bucket: an index position (slot) in HashMap's internal array, in effect one element of the table array. For each Node, the bucket is derived from the key's hash code reduced modulo the table size.
  • bin: when several keys hash to the same bucket, the colliding elements are chained together by a linked list or a red-black tree; that list/tree is called a bin.

(Diagram omitted: it sketched the array-of-buckets layout; its tree shapes and bucket counts were illustrative only.) A HashMap is internally an array of Nodes combined with linked lists / red-black trees. **When a linked list reaches a length of 8 and the table capacity is at least 64, and no resize is triggered at that point, the bin is converted from a linked list into a red-black tree.** Pay close attention to this condition: reaching length 8 alone does not convert the bin; while the table capacity is below 64, a resize is triggered instead. Consequently, before the capacity reaches 64, bins longer than 8 can occur.

3.2 Bit operations

3.2.1 Resizing

When size crosses threshold, HashMap grows its table. The new capacity is computed with a shift:

newCap = oldCap << 1

so the table length is always a power of two. Starting from the default capacity of 16, each resize doubles the capacity. Note that HashMap provides no shrinking mechanism: the table can only grow, never shrink.

3.2.2 Computing the bucket

The other important piece is how HashMap computes the bucket. Ordinarily we would use the remainder operator %, but as anyone familiar with how computers work knows, bitwise operations are the fastest. When b is a power of two:

//when b is a power of two
a % b = a & ( b - 1 )

Since HashMap starts at a length of 16 and always grows by left shifts, it always satisfies this rule, so the bucket is computed as:

first = tab[(n - 1) & hash]

first is the node at the head of the computed bucket; (n - 1) & hash produces the index directly. To see it with concrete values, take the hashes 189 and 205 with n = 16: 189 = 1011 1101 and 205 = 1100 1101 in binary, and n - 1 = 15 = 0000 1111, so the mask keeps only the low four bits, 1101 = 13 in both cases.
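
A small sketch verifying both the equivalence and the example above (hypothetical demo code):

public class BucketIndexDemo {
    public static void main(String[] args) {
        int n = 16; // table length, a power of two
        for (int hash : new int[]{189, 205}) {
            int byMod = hash % n;        // classic remainder
            int byMask = hash & (n - 1); // HashMap's masking trick
            System.out.printf("hash=%d  mod=%d  mask=%d%n", hash, byMod, byMask);
        }
        // both keys land at index 13
    }
}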

3.2.3 split

We know HashMap resizes; how are the existing elements redistributed afterwards? If growth followed no pattern (say the capacity grew by 1, from 8 to 9), there would be no shortcut: every node would need its index recomputed with a fresh modulo, which would clearly be inefficient. But HashMap always doubles its capacity, and that regularity can be exploited: the nodes of each bucket split into exactly two groups. The low group keeps its original bucket, while the high group moves to the original index plus the old capacity (oldsize + index). The trick that separates high from low is remarkably elegant:

if ((e.hash & bit) == 0)

where bit is the old capacity (oldCap).

Continuing the earlier example: after resizing to 32 the mask becomes n - 1 = 31 = 0001 1111, exactly one bit wider than before. Masking now keeps five bits: 205 = 1100 1101 gives 0 1101 = 13, unchanged, while 189 = 1011 1101 gives 1 1101 = 29 = 13 + 16, the high position. The two hashes react differently only in the newly exposed bit, which is what sends them to different indexes, so only that new bit needs to be examined. With size 16, e.hash & 16 is either 0 or 16.

The code is written generically: whatever the capacity, a result of 0 means the node keeps its low index, and anything else moves it to the high position. HashMap deserves its reputation as master-level code; the efficiency of resizing, indexing, and splitting was designed in from the very start.
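
The same split check as a runnable sketch (hypothetical demo code):

public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        for (int hash : new int[]{189, 205}) {
            int oldIndex = hash & (oldCap - 1);
            // the bit HashMap tests during resize: 0 keeps the node in the low list
            int newIndex = (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
            System.out.printf("hash=%d  old index=%d  new index=%d%n",
                              hash, oldIndex, newIndex);
        }
        // 189 moves to 29 (13 + 16); 205 stays at 13
    }
}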

3.2.4 Why HashMap's DEFAULT_INITIAL_CAPACITY is 16

This is a frequent interview question and, at bottom, a test of computer fundamentals. As the three subsections above show, HashMap gains much of its performance from bit operations in resizing, indexing, and splitting, which requires the initial length to be a power of two; otherwise the very first resize would break both the split and the bucket computation. The length must therefore come from 2, 4, 8, 16, 32, and so on. But why 16 specifically? There is no authoritative source; it appears to be an empirical value: too small and the map resizes almost immediately, too large and space is wasted. 16 sits comfortably in between.

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

As the comment states, this number must be a power of two.

4. Constructors

4.1 HashMap()

The no-argument constructor is the one used most often:

/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

It only sets the load factor to the default 0.75; the default capacity of 16 takes effect later, when the table is first allocated.

4.2 HashMap(int initialCapacity)

HashMap also provides a constructor that takes an initial capacity:

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and the default load factor (0.75).
 *
 * @param  initialCapacity the initial capacity.
 * @throws IllegalArgumentException if the initial capacity is negative.
 */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

It uses the default load factor of 0.75.

4.3 HashMap(int initialCapacity, float loadFactor)

Both the initial capacity and the load factor can be specified:

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param  initialCapacity the initial capacity
 * @param  loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

The constructor validates initialCapacity: a negative value throws IllegalArgumentException, and values above MAXIMUM_CAPACITY are clamped to it. loadFactor must be positive and must not be NaN. Finally, threshold is initialized to tableSizeFor(initialCapacity).
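
A quick sketch of this validation behavior (hypothetical demo code):

import java.util.HashMap;

public class CtorValidationDemo {
    public static void main(String[] args) {
        try {
            new HashMap<String, String>(-1);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Illegal initial capacity: -1
        }
        try {
            new HashMap<String, String>(16, Float.NaN);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Illegal load factor: NaN
        }
    }
}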

4.4 HashMap(Map<? extends K, ? extends V> m)

A HashMap can also be created directly from another Map:

/**
 * Constructs a new <tt>HashMap</tt> with the same mappings as the
 * specified <tt>Map</tt>.  The <tt>HashMap</tt> is created with
 * default load factor (0.75) and an initial capacity sufficient to
 * hold the mappings in the specified <tt>Map</tt>.
 *
 * @param   m the map whose mappings are to be placed in this map
 * @throws  NullPointerException if the specified map is null
 */
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

This constructor delegates to putMapEntries, which inserts all of the source map's entries.

/**
 * Implements Map.putAll and Map constructor.
 *
 * @param m the map
 * @param evict false when initially constructing this map, else
 * true (relayed to method afterNodeInsertion).
 */
final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    int s = m.size();
    //when the source map is non-empty
    if (s > 0) {
       //if the table has not been allocated yet
        if (table == null) { // pre-size
            //compute the capacity needed for s mappings at the current load factor
            float ft = ((float)s / loadFactor) + 1.0F;
            int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                     (int)ft : MAXIMUM_CAPACITY);
            if (t > threshold)
                threshold = tableSizeFor(t);
            //and derive the threshold from it
        }
        //if the table exists and s exceeds the current threshold, resize
        else if (s > threshold)
            resize();
        //insert every entry
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}

putMapEntries is also the internal implementation of putAll; in other words, calling putAll is equivalent to constructing a new map with the copy constructor above.

5. Key methods

5.1 tableSizeFor

/**
 * Returns a power of two size for the given target capacity.
 * In effect, it returns the smallest power of two greater than or equal to cap.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    //propagate the highest set bit one position down: the top two bits are now 1
    n |= n >>> 1;
    //propagate the top two set bits two positions down: the top four bits are now 1
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
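
As a walkthrough, take cap = 13: n = 12 = 1100 in binary; the OR-shifts smear the highest set bit downward, giving n = 1111 = 15; then n + 1 = 16. A sketch that checks a few inputs (the tableSizeFor body is copied from the source above; the class around it is hypothetical):

public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // verbatim copy of JDK 8 HashMap.tableSizeFor
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        for (int cap : new int[]{1, 13, 16, 17, 100}) {
            System.out.println(cap + " -> " + tableSizeFor(cap)); // 1, 16, 16, 32, 128
        }
    }
}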

5.2 get

/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 *
 * <p>More formally, if this map contains a mapping from a key
 * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
 * key.equals(k))}, then this method returns {@code v}; otherwise
 * it returns {@code null}.  (There can be at most one such mapping.)
 *
 * <p>A return value of {@code null} does not <i>necessarily</i>
 * indicate that the map contains no mapping for the key; it's also
 * possible that the map explicitly maps the key to {@code null}.
 * The {@link #containsKey containsKey} operation may be used to
 * distinguish these two cases.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

get方法中隐藏的有两个方法,getNode和hash。

/**
 * Implements Map.get and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @return the node, or null if none
 */
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
      	//check the first node in the bucket; if its key matches, return it directly
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
          	// tree node
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
              //linked list
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
  	//not found
    return null;
}

The core of get is searching the linked list or the red-black tree by hash. If the bin is a tree, lookup proceeds through the red-black tree; because it is an ordered tree, this is significantly faster than a full scan of a linked list.
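
A quick illustration of the Javadoc's warning that a null return is ambiguous (hypothetical demo code):

import java.util.HashMap;
import java.util.Map;

public class NullValueDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("k", null); // HashMap permits null values (and one null key)

        System.out.println(map.get("k"));               // null
        System.out.println(map.get("missing"));         // also null
        System.out.println(map.containsKey("k"));       // true
        System.out.println(map.containsKey("missing")); // false
    }
}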

5.3 hash

    /**
     * Computes key.hashCode() and spreads (XORs) higher bits of hash
     * to lower.  Because the table uses power-of-two masking, sets of
     * hashes that vary only in bits above the current mask will
     * always collide. (Among known examples are sets of Float keys
     * holding consecutive whole numbers in small tables.)  So we
     * apply a transform that spreads the impact of higher bits
     * downward. There is a tradeoff between speed, utility, and
     * quality of bit-spreading. Because many common sets of hashes
     * are already reasonably distributed (so don't benefit from
     * spreading), and because we use trees to handle large sets of
     * collisions in bins, we just XOR some shifted bits in the
     * cheapest possible way to reduce systematic lossage, as well as
     * to incorporate impact of the highest bits that would otherwise
     * never be used in index calculations because of table bounds.
     */
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

The gist: compute key.hashCode() and XOR its high 16 bits into the low 16. Because the table length is a power of two, the bucket computation shown earlier uses only the low bits of the hash; bits above the mask never participate at all. Spreading the high bits downward is a cheap compromise that lets them influence the index, distributing entries more evenly across the buckets.
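
A sketch of why the spread matters (hypothetical demo code): two hash codes that differ only above the mask would always collide without it.

public class HashSpreadDemo {
    static int spread(int h) {
        return h ^ (h >>> 16); // HashMap.hash() without the null check
    }

    public static void main(String[] args) {
        int n = 16;                     // table length
        int h1 = 0x10000, h2 = 0x20000; // differ only in the high bits

        System.out.println((h1 & (n - 1)) + " vs " + (h2 & (n - 1))); // 0 vs 0: collision
        System.out.println((spread(h1) & (n - 1)) + " vs "
                         + (spread(h2) & (n - 1)));                   // 1 vs 2: no collision
    }
}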

5.4 put

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

put is implemented on top of putVal:

/**
 * Implements Map.put and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
  	//if the table is null or empty, resize (allocates the initial table)
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
  	//if the bucket for this hash is empty, place a new node there
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
      	//if the first node of the bucket already matches the key, remember it in e
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
      	//if the bucket head is a TreeNode, insert or find via the red-black tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
          // otherwise it is a list bin; walk the chain
            for (int binCount = 0; ; ++binCount) {
              	//reached the tail without a match: append a new node
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                  	//if the list length reaches the threshold, treeify, then break
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
              	//found a node with the same key before the tail: stop searching
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
              	//move on to the next node
                p = e;
            }
        }
      	// an existing node for this key was found in the list or tree
        if (e != null) { // existing mapping for key
          	//onlyIfAbsent decides whether the existing value is replaced
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
          	// access hook for subclasses (LinkedHashMap) to post-process
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
  	//size now exceeds the resize threshold
    if (++size > threshold)
        resize();
  	// post-insertion hook (LinkedHashMap may evict its eldest entry here)
    afterNodeInsertion(evict);
    return null;
}
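
The onlyIfAbsent flag is exactly what distinguishes put from putIfAbsent, as a quick sketch shows (hypothetical demo code):

import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();

        System.out.println(map.put("a", 1));         // null: no previous mapping
        System.out.println(map.put("a", 2));         // 1: old value returned and replaced
        System.out.println(map.putIfAbsent("a", 3)); // 2: existing value kept
        System.out.println(map.get("a"));            // 2
    }
}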

5.5 resize

resize grows HashMap's table. It is a crucial method: whenever size crosses threshold, resize is called. Note that when a HashMap is constructed with an explicit capacity, the table is not allocated at that point; it stays null, and only the threshold is computed from the requested capacity. The table itself is allocated here.

/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // old capacity > 0: this is a real expansion, not the initial allocation
    if (oldCap > 0) {
      	//already at the maximum: cannot grow any further
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
      	//doubling stays below the maximum and the old capacity is at least the default 16
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
          	//the new threshold is double the old one
            newThr = oldThr << 1; // double threshold
    }
  	// old capacity is 0 but the threshold is positive: an initial capacity was passed to the
  	// constructor and this is the first insertion; oldThr is tableSizeFor(initialCapacity),
  	// the nearest power of two at or above the requested capacity (the user-supplied value
  	// itself need not be a power of two)
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
  	// otherwise the default constructor was used and no initial capacity was given
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;//16
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);//16*0.75=12
    }
  	// newThr is still 0 in three cases:
  	// 1. the table existed but the doubled capacity reached MAXIMUM_CAPACITY
  	// 2. the old capacity was below the default 16
  	// 3. an initial capacity was given and this is the first insertion
  	// in all of them, recompute the threshold from the new capacity
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
          	//the bucket is non-empty
            if ((e = oldTab[j]) != null) {
              	//clear the old bucket slot
                oldTab[j] = null;
              	// a single node: place it directly into the new table
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
              	// a tree bin: split it across the new table via TreeNode.split
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                      	//the bit of e.hash at the oldCap position is 0 (for oldCap=32 that is the bit worth 32): the node stays at the same (low) index
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                      	//opposite case: the node moves to the high position, oldIndex + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                  	//the low list, if non-empty, goes to the same index in the new table
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                  	//the high list goes to oldIndex + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

5.6 treeifyBin

/**
 * Replaces all linked nodes in bin at index for given hash unless
 * table is too small, in which case resizes instead.
 */
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
  	//if the table is null or shorter than MIN_TREEIFY_CAPACITY (64), just resize instead
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
          	//replace this list node with a tree node
            TreeNode<K,V> p = replacementTreeNode(e, null);
          	//first iteration
            if (tl == null)
              	//set the head node
                hd = p;
            else {
              	//not the first iteration: p's prev points at the current tail tl
                p.prev = tl;
              	//and the tail's next points at p
                tl.next = p;
            }
          	//p becomes the new tail
            tl = p;
        } while ((e = e.next) != null);
      	//the loop above rebuilds the identical chain, just out of TreeNodes
        if ((tab[index] = hd) != null)
          	//if the head is non-null, treeify turns the chain into a red-black tree
            hd.treeify(tab);
    }
}

// For treeifyBin
TreeNode<K,V> replacementTreeNode(Node<K,V> p, Node<K,V> next) {
   return new TreeNode<>(p.hash, p.key, p.value, next);
}

The list is first converted, in list order, into a chain of TreeNodes: still the same linked list, just with the element type replaced. Only then is TreeNode's treeify called. The resulting structure therefore has both linked-list and red-black-tree properties: the list links are used for splitting and traversal, the tree for lookup.

5.7 remove

Every remove variant ultimately calls the method below:

/**
 * Implements Map.remove and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to match if matchValue, else ignored
 * @param matchValue if true only remove if value is equal
 * @param movable if false do not move other nodes while removing
 * @return the node, or null if none
 */
final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
      	// the bucket's head node is the one we are looking for
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {
          	//a tree bin: look the node up through the red-black tree
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
              	//a list bin: walk the chain
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
      	//a node was found by one of the paths above and the value-matching policy holds
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
          	//a tree node: delete it through the red-black tree
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
          	//if it is the bucket's head, point the bucket at the next node
            else if (node == p)
                tab[index] = node.next;
          	//otherwise unlink it: the predecessor's next skips to node.next
            else
                p.next = node.next;
          	//bump the modification count
            ++modCount;
            --size;
          	//removal hook (used by LinkedHashMap)
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}

6. Summary

  • 1. The basic structure of HashMap. See section 3.1: an array plus linked lists / red-black trees. Note that a tree bin still carries a linked-list layer underneath. If this comes up in an interview, start from TreeNode's inheritance chain: TreeNode extends LinkedHashMap.Entry, which extends Node, which implements Map.Entry, each level adding fields, so a TreeNode is roughly twice the size of a Node. A TreeNode therefore still holds the original list links, which is exactly what makes split so cheap during resizing.

  • 2. Why HashMap's initial capacity is 16. A heavyweight interview question (famously asked at Alibaba), explained in detail in part 3. To recap: HashMap uses bit operations extensively for performance, in resizing, splitting, and index computation, so the actual table length must be a power of two. When a capacity is specified, the table is not created right away; it is allocated during resize, based on the threshold, and tableSizeFor only ever yields the next power of two at or above the requested cap. The candidates are therefore 2, 4, 8, 16, 32, and so on: too large wastes space, too small causes constant resizing, and 16 is the compromise. It is worth being able to walk through tableSizeFor here as well.

  • 3. Why the treeification threshold is 8. Another common interview question. As the implementation notes explain, with a reasonably dispersing hash function the number of collisions in a bin follows a Poisson distribution; applying its formula to HashMap's parameters gives

exp(-0.5) * pow(0.5, k) / factorial(k)

and at k = 8 the probability is already below one in ten million, so 8 is generally considered the right point to convert to a tree.

  • 4. Which bit operations HashMap uses, and what they are for. Summarized in part 3: first, resizing by left shift, which doubles the capacity; second, bucket indexing via hashcode & (size - 1); third, splitting on resize via hashcode & oldsize, where 0 means the node keeps its low index and any other value moves it to index + oldsize; fourth, the hash method itself mixes high and low bits: hashcode ^ (hashcode >>> 16). See the corresponding sections above for details.

  • 5. The conditions for treeification. A bin does not become a red-black tree merely because its list grows past 8: the table capacity must also be at least 64, otherwise a resize is triggered instead. See above for details.
