Java Collections - HashMap Internal Structure

Latest recommended article published on 2024-07-30 00:45:36

weixin_34302561



Article tags: java, data structures and algorithms, python

Original link: https://my.oschina.net/u/232911/blog/2250339


First, let's look at the inheritance hierarchy of the Map interface.

Notes

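Map is the top-level interface, and the abstract class AbstractMap implements it. TreeMap, HashMap and ConcurrentHashMap all extend AbstractMap, and each provides different functionality. ConcurrentHashMap also implements the ConcurrentMap interface. ConcurrentMap extends Map and, as its name suggests, adds operations for concurrent use, such as atomic putIfAbsent, remove(key, value) and replace.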
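The sketch below checks this hierarchy with standard reflection calls (Class.getSuperclass and Class.isAssignableFrom). The class name MapHierarchyDemo and its helper are illustrative choices, not part of the JDK.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Uses reflection to confirm the Map hierarchy described above.
class MapHierarchyDemo {
    // True if the direct superclass of c is AbstractMap.
    static boolean extendsAbstractMap(Class<?> c) {
        return c.getSuperclass() == AbstractMap.class;
    }

    public static void main(String[] args) {
        Class<?>[] impls = {HashMap.class, TreeMap.class, ConcurrentHashMap.class};
        for (Class<?> c : impls) {
            System.out.println(c.getSimpleName() + " extends AbstractMap: " + extendsAbstractMap(c));
        }
        // ConcurrentMap is a sub-interface of Map...
        System.out.println("Map is assignable from ConcurrentMap: "
                + Map.class.isAssignableFrom(ConcurrentMap.class));
        // ...and ConcurrentHashMap is the implementation that adds it.
        System.out.println("ConcurrentHashMap implements ConcurrentMap: "
                + ConcurrentMap.class.isAssignableFrom(ConcurrentHashMap.class));
    }
}
```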

Overview

Next, we analyze the HashMap source code to understand its internal structure. The main topics are:

The Map interface
Map.Entry
HashMap internal structure
The get operation
The put operation
resize
The hash perturbation function

The Map interface

First, what is a Map? Map is an interface. The API documentation defines it as:

An object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value.

In other words, a map is an object that holds key-value pairs. It cannot contain duplicate keys, and each key maps to at most one value.
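Here is a small illustration of the "no duplicate keys" rule, using a plain HashMap<String, Integer>. Putting an existing key replaces its value instead of adding a second entry, and put returns the value it replaced. The class name DuplicateKeyDemo is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// A Map keeps at most one value per key: re-putting a key replaces the value.
class DuplicateKeyDemo {
    static Map<String, Integer> build() {
        Map<String, Integer> m = new HashMap<>();
        m.put("a", 1);                // returns null: "a" was absent
        Integer old = m.put("a", 2);  // returns 1, the previous value for "a"
        System.out.println("previous value: " + old);
        return m;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = build();
        System.out.println(m.size());   // 1: still a single key
        System.out.println(m.get("a")); // 2: the newer value won
    }
}
```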

The main methods of the Map interface:

public interface Map<K,V> {
    // Query Operations

    int size();
    boolean isEmpty();
    boolean containsKey(Object key);
    boolean containsValue(Object value);
    V get(Object key);


    // Modification Operations

    V put(K key, V value);
    V remove(Object key);


    // Bulk Operations

    void putAll(Map<? extends K, ? extends V> m);
    void clear();


    // Views

    Set<K> keySet();
    Collection<V> values();
    Set<Map.Entry<K, V>> entrySet();

    interface Entry<K,V> {

        K getKey();

        V getValue();

        V setValue(V value);

        boolean equals(Object o);

        int hashCode();

        // ...
    }

    // Comparison and hashing

    boolean equals(Object o);
    int hashCode();


    // Defaultable methods

    // ...
}

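The comments make the grouping clear. The interface has operations for adding, retrieving and removing entries (Query Operations, Modification Operations, Bulk Operations, Views), plus several default methods introduced in Java 8 (not shown here). The view methods expose the map's contents: keySet() returns a Set of all keys, values() returns a Collection of all values, and entrySet() returns a Set of all key-value pairs.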
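A minimal sketch of the three view methods follows; the class name MapViewsDemo is hypothetical. It also shows that the views are backed by the map, as the Map Javadoc specifies: removing a key from keySet() removes the mapping itself. HashMap's iteration order is unspecified, so the printed order may vary.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Demonstrates keySet(), values() and entrySet(), and that views write through to the map.
class MapViewsDemo {
    static Map<String, Integer> demo() {
        Map<String, Integer> m = new HashMap<>();
        m.put("one", 1);
        m.put("two", 2);
        m.put("three", 3);

        Set<String> keys = m.keySet();                           // all keys
        Collection<Integer> vals = m.values();                   // all values
        Set<Map.Entry<String, Integer>> entries = m.entrySet();  // all key-value pairs

        // Iteration order of a HashMap is unspecified; print without relying on it.
        for (Map.Entry<String, Integer> e : entries) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        System.out.println(keys.size() + " keys, " + vals.size() + " values");

        keys.remove("two"); // removes the mapping from m itself
        return m;
    }

    public static void main(String[] args) {
        System.out.println(demo().containsKey("two")); // false
    }
}
```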

Map.Entry

The Map interface contains a nested interface, Entry<K, V>. This interface is central: it is what we usually mean by a "key-value pair".

Its methods are simple:

interface Entry<K,V> {

    K getKey();

    V getValue();

    V setValue(V value);

    boolean equals(Object o);

    int hashCode();

    // ...
}

It gets the key, gets the value, sets the value, and defines equals and hashCode.
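Entries obtained from entrySet() are live. In HashMap they are the Node objects themselves, so setValue writes straight into the map. The sketch below uses a hypothetical class, EntrySetValueDemo, to double every value in place while iterating. setValue is not a structural modification, so it does not trigger a ConcurrentModificationException.

```java
import java.util.HashMap;
import java.util.Map;

// Updates every value through Map.Entry.setValue while iterating entrySet().
class EntrySetValueDemo {
    static Map<String, Integer> doubleAll(Map<String, Integer> m) {
        for (Map.Entry<String, Integer> e : m.entrySet()) {
            Integer previous = e.setValue(e.getValue() * 2); // setValue returns the old value
            System.out.println(e.getKey() + ": " + previous + " -> " + e.getValue());
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put("x", 1);
        m.put("y", 5);
        System.out.println(doubleAll(m)); // iteration order unspecified
    }
}
```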

HashMap internal structure

Definition

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable

HashMap extends AbstractMap and implements the Map interface.

Here is the definition of AbstractMap:

public abstract class AbstractMap<K,V> implements Map<K,V> {

AbstractMap is an abstract class that also implements the Map interface.

This looks odd. Since AbstractMap already implements Map, why does HashMap implement Map again?

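After checking many sources, the common explanation is that the clause is simply redundant on the author's part: HashMap does not need to declare implements Map<K, V> again. The question below raises the same issue for the analogous case of LinkedHashSet and HashSet.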

https://stackoverflow.com/questions/2165204/why-does-linkedhashsete-extend-hashsete-and-implement-sete
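You can observe the redundancy with reflection. Class.getInterfaces() returns only the interfaces a class declares directly, so HashMap lists Map even though it would be a Map through AbstractMap anyway. TreeMap, by contrast, does not repeat the clause: it declares NavigableMap instead. The class name RedundantImplementsDemo is illustrative.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Checks which classes list Map among their *directly* declared interfaces.
class RedundantImplementsDemo {
    static boolean declaresMapDirectly(Class<?> c) {
        return Arrays.asList(c.getInterfaces()).contains(Map.class);
    }

    public static void main(String[] args) {
        // Directly declared interfaces of HashMap (expected: Map, Cloneable, Serializable)
        System.out.println(Arrays.toString(HashMap.class.getInterfaces()));
        System.out.println(declaresMapDirectly(HashMap.class));     // true
        System.out.println(declaresMapDirectly(AbstractMap.class)); // true
        System.out.println(declaresMapDirectly(TreeMap.class));     // false: declares NavigableMap
        // Without the extra clause, HashMap would still be a Map via AbstractMap:
        System.out.println(Map.class.isAssignableFrom(AbstractMap.class)); // true
    }
}
```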

Now let's look at the main fields defined in HashMap:

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {
    // ...

    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

    /**
     * Basic hash bin node, used for most entries.  (See below for
     * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
     */

    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

    // ...

    /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     */
    transient Node<K,V>[] table;

    /**
     * Holds cached entrySet(). Note that AbstractMap fields are used
     * for keySet() and values().
     */
    transient Set<Map.Entry<K,V>> entrySet;

    /**
     * The number of key-value mappings contained in this map.
     */
    transient int size;

    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    transient int modCount;

    /**
     * The next size value at which to resize (capacity * load factor).
     *
     * @serial
     */
    // (The javadoc description is true upon serialization.
    // Additionally, if the table array has not been allocated, this
    // field holds the initial array capacity, or zero signifying
    // DEFAULT_INITIAL_CAPACITY.)
    int threshold;

    /**
     * The load factor for the hash table.
     *
     * @serial
     */
    final float loadFactor;

    // ...

}

The original source comments are kept. Reading them is mostly enough to understand what each field does.

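DEFAULT_INITIAL_CAPACITY is the default initial capacity. A map created with the no-argument constructor gets a table of 1<<4 (16) slots when the table is first allocated.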

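DEFAULT_LOAD_FACTOR is the default load factor, 0.75. It is very important and is used later when resizing: the resize threshold is capacity * load factor.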
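Here is the arithmetic behind the defaults, as a sketch. The resize threshold is capacity * load factor, so a default map holds 12 mappings before its first resize; the 13th insertion makes size > threshold. The constants are copied from the source above. ThresholdDemo and threshold() are hypothetical names.

```java
// threshold = capacity * loadFactor, truncated to an int.
class ThresholdDemo {
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // 16, same value as in HashMap
    static final float DEFAULT_LOAD_FACTOR = 0.75f;      // same value as in HashMap

    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR)); // 12
        System.out.println(threshold(32, DEFAULT_LOAD_FACTOR)); // 24 after the first doubling
    }
}
```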

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);

    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;

    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);

    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

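HashMap provides four constructors, which let you specify the initial capacity and the load factor. In general, though, it is best to leave them alone, since poorly chosen values can hurt performance. Note that the requested initialCapacity is not used as-is. tableSizeFor rounds it up to a power of two, and the result is temporarily stored in threshold until the table is allocated, as the comment on the threshold field explains.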
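The constructor stores tableSizeFor(initialCapacity) in threshold, but tableSizeFor itself is not shown above. Below is a re-implementation modeled on the bit-smearing version in the JDK 8 source; newer JDKs may compute the same result differently. It rounds a requested capacity up to the next power of two, capped at MAXIMUM_CAPACITY. The class name TableSizeForSketch is hypothetical.

```java
// Rounds a requested capacity up to the next power of two (at most 1 << 30).
class TableSizeForSketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int tableSizeFor(int cap) {
        int n = cap - 1;   // subtract 1 so exact powers of two map to themselves
        n |= n >>> 1;      // copy the highest set bit into every lower position...
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;     // ...so n becomes a run of 1 bits (2^k - 1)
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        for (int cap : new int[] {0, 1, 10, 16, 17, 1000}) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

For example, it maps 10 to 16, 17 to 32 and 1000 to 1024. So new HashMap<>(10) ends up with a 16-slot table on its first put, because resize() takes the capacity from the stored threshold.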

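MAXIMUM_CAPACITY is the maximum capacity, 1 << 30. Shifting 1 left by 30 bits gives, in binary, 0100 0000 0000 0000 0000 0000 0000 0000. That is 2 to the 30th power, or 1073741824 in decimal. The comment says it MUST be a power of two. Shifting one more bit (1 << 31) sets the sign bit and produces a negative number.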
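Here is a quick check of the shift arithmetic. It uses a hypothetical helper, isPositivePowerOfTwo, built on the standard n & (n - 1) trick.

```java
// 1 << 30 is the largest positive power of two an int can hold; 1 << 31 overflows to negative.
class ShiftLimitDemo {
    // A positive power of two has exactly one set bit, so n & (n - 1) clears it to 0.
    static boolean isPositivePowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    public static void main(String[] args) {
        int max = 1 << 30;
        System.out.println(max);                            // 1073741824
        System.out.println(Integer.toBinaryString(max));    // 1 followed by 30 zeros
        System.out.println(1 << 31);                        // -2147483648 (Integer.MIN_VALUE)
        System.out.println(isPositivePowerOfTwo(max));      // true
        System.out.println(isPositivePowerOfTwo(1 << 31));  // false: negative
    }
}
```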

TREEIFY_THRESHOLD, UNTREEIFY_THRESHOLD and MIN_TREEIFY_CAPACITY are parameters for converting bins into red-black trees, which come up later.
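The way the three constants interact can be captured in a simplified model. This is not JDK code, only a sketch derived from the Javadoc comments above. A bin is treeified when a node is added to a bin that already holds at least TREEIFY_THRESHOLD nodes, but only if the table has at least MIN_TREEIFY_CAPACITY slots; otherwise the table is resized. A tree bin split during resize reverts to a list when it has at most UNTREEIFY_THRESHOLD nodes. The names TreeifyRulesSketch, Action, onInsert and untreeifyAfterSplit are hypothetical.

```java
// Simplified model of the treeify / untreeify rules (not the JDK implementation).
class TreeifyRulesSketch {
    static final int TREEIFY_THRESHOLD = 8;
    static final int UNTREEIFY_THRESHOLD = 6;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { KEEP_LIST, RESIZE_TABLE, TREEIFY }

    // Decision when adding a node to a list bin that already holds binSize nodes.
    static Action onInsert(int binSize, int tableCapacity) {
        if (binSize < TREEIFY_THRESHOLD) return Action.KEEP_LIST;
        // Long chain in a small table: grow the table instead of building a tree.
        if (tableCapacity < MIN_TREEIFY_CAPACITY) return Action.RESIZE_TABLE;
        return Action.TREEIFY;
    }

    // During resize, a split tree bin with few nodes goes back to a plain list.
    static boolean untreeifyAfterSplit(int nodesInSplitBin) {
        return nodesInSplitBin <= UNTREEIFY_THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(onInsert(7, 64));  // KEEP_LIST
        System.out.println(onInsert(8, 16));  // RESIZE_TABLE
        System.out.println(onInsert(8, 64));  // TREEIFY
        System.out.println(untreeifyAfterSplit(6)); // true
    }
}
```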

Structure

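Next come two key declarations: static class Node<K,V> implements Map.Entry<K,V> and transient Node<K,V>[] table. Together they are the essence of HashMap. HashMap is an array of Node objects, not a two-dimensional array. Each slot (bucket) holds the head of a linked list of Nodes, or, since Java 8, a red-black tree. Node is the concrete implementation of Map.Entry.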

class Node<K,V>

It defines four fields: the hash value, the generic key, the generic value, and a reference to the next Node.

Node<K,V>[] table

The table, initialized on first use, and resized as necessary. When allocated, length is always a power of two.

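The table is initialized on first use. It is resized when necessary, that is, when the number of mappings exceeds threshold = capacity * load factor. Whenever it is allocated, its length is always a power of two. (Why it must be a power of two rather than some other value is explained later.)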
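A power-of-two length is convenient because of how the bucket index is computed. The index is (n - 1) & hash, and when n is a power of two this mask gives the same result as the non-negative remainder Math.floorMod(hash, n). It needs only a single AND and no special handling for negative hashes. With any other length, some buckets can never be selected. The helper indexFor below is hypothetical.

```java
// Bucket index by masking; correct as a modulo only when n is a power of two.
class IndexMaskDemo {
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        for (int hash : new int[] {5, 21, 37, -7, 123456789}) {
            System.out.println(hash + " -> " + indexFor(hash, n)
                    + " (floorMod: " + Math.floorMod(hash, n) + ")"); // the two always agree
        }
        // With length 10, n - 1 = 9 = 0b1001, so only buckets 0, 1, 8 and 9 are reachable.
        System.out.println(indexFor(6, 10)); // 0, since 0b0110 & 0b1001 == 0
    }
}
```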

Figure: diagram of HashMap's internal structure (image sourced from the Internet)

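Roughly, the map looks like this. When an object is put, its position in the array is computed from its hash value; the hash is first spread by the perturbation function, covered later. If that slot is empty, the new node is stored there directly. Otherwise, the new node is appended to the end of the existing chain (the last node's next points to it), forming a linked list. Once a list grows beyond a certain length, it is converted into a red-black tree. This was a change in Java 8 to improve lookup performance. So HashMap is essentially a combination of an array, linked lists and red-black trees.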
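To make the "array of buckets plus linked lists" picture concrete, here is a teaching sketch of a chained hash table. It is not the JDK implementation. It reuses HashMap's hash spreading, (n - 1) & hash indexing, tail insertion and 0.75 load factor, but it omits red-black trees, the lo/hi resize split and most other details. SimpleChainedMap and its methods are hypothetical names.

```java
import java.util.Objects;

// Teaching sketch: an array of buckets, each bucket a singly linked list of nodes.
class SimpleChainedMap<K, V> {
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    private Node<K, V>[] table;
    private int size;
    private final float loadFactor = 0.75f;

    @SuppressWarnings("unchecked")
    SimpleChainedMap() {
        table = (Node<K, V>[]) new Node[16];
    }

    // Same spreading step as HashMap.hash(): fold the high 16 bits into the low 16.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    V get(Object key) {
        int h = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & h]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    V put(K key, V value) {
        int h = hash(key);
        int i = (table.length - 1) & h;
        Node<K, V> e = table[i];
        if (e == null) {
            table[i] = new Node<>(h, key, value, null); // empty bucket: store directly
        } else {
            while (true) {
                if (e.hash == h && Objects.equals(e.key, key)) { // same key: replace value
                    V old = e.value;
                    e.value = value;
                    return old;
                }
                if (e.next == null) break;
                e = e.next;
            }
            e.next = new Node<>(h, key, value, null); // append at the tail of the chain
        }
        if (++size > table.length * loadFactor) resize();
        return null;
    }

    @SuppressWarnings("unchecked")
    private void resize() {
        Node<K, V>[] old = table;
        table = (Node<K, V>[]) new Node[old.length * 2];
        for (Node<K, V> head : old) {
            for (Node<K, V> e = head; e != null; ) {
                Node<K, V> next = e.next;
                int j = (table.length - 1) & e.hash;
                e.next = table[j]; // simple head insertion; bucket order is not preserved
                table[j] = e;
                e = next;
            }
        }
    }

    int size() { return size; }
    int capacity() { return table.length; }

    public static void main(String[] args) {
        SimpleChainedMap<String, Integer> m = new SimpleChainedMap<>();
        for (int i = 0; i < 20; i++) m.put("k" + i, i);
        System.out.println(m.size() + " entries, capacity " + m.capacity()); // 20 entries, capacity 32
        System.out.println(m.get("k7"));      // 7
        System.out.println(m.put("k7", 70));  // 7 (previous value)
    }
}
```

With 20 insertions, the table starts at 16 slots and crosses the threshold of 12 on the 13th insert, so it doubles to 32. Twenty entries are still below the new threshold of 24, so the capacity stays at 32.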

Basic operations

Get

HashMap's get method:

transient Node<K,V>[] table;

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && 
        (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) != null) {
        // always check first node
        if (first.hash == hash && 
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash && 
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

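First, the hash is computed from the key. hash() XORs the key's hashCode with its own upper 16 bits (h ^ (h >>> 16)). This is the "perturbation function": it mixes the high bits into the low bits to reduce the chance that different keys land in the same slot (covered in detail in a separate article). The slot is then located with first = tab[(n - 1) & hash]. Because n is a power of two, this is equivalent to taking the non-negative remainder of hash divided by n. If the slot is empty, null is returned. Otherwise, the first node is checked. A node matches when its hash equals the computed hash and its key is the same reference or is equal by equals(). If the first node does not match, the search continues along the linked list. If the bin has been converted to a red-black tree, a tree lookup is used instead. The matching node's value is returned, or null if nothing matches.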
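Here is a worked example of why the high bits are mixed in, assuming a 16-slot table. Two hash codes that differ only above bit 15 collide under (n - 1) & hash alone. After the same h ^ (h >>> 16) step that hash() applies, they land in different buckets. The names HashSpreadDemo, spread and bucket are hypothetical.

```java
// Shows how folding in the high 16 bits separates hash codes that differ only in high bits.
class HashSpreadDemo {
    // Same formula as HashMap.hash(Object), applied to a raw hashCode value.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    static int bucket(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int h1 = 0x10000, h2 = 0x20000; // differ only in bits 16 and 17
        System.out.println(bucket(h1, n) + " " + bucket(h2, n));                 // 0 0 (collision)
        System.out.println(bucket(spread(h1), n) + " " + bucket(spread(h2), n)); // 1 2 (separated)
    }
}
```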

Put

HashMap's put method:

transient Node<K,V>[] table;

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {
    return new Node<>(hash, key, value, next);
}

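First, if table is null or empty, resize() is called (explained below). In other words, the table is actually allocated on the first put. Next, the slot index is computed from the hash, in the same way as in get. If the slot is empty, a new node (newNode) is created and placed there. Otherwise, the first node is checked for the same key. If it is not the same, the method checks whether the bin is a red-black tree and, if so, inserts through the tree. If not, it walks the linked list:

- If the key is not found by the end of the list, a new node is appended to the tail. If the bin already held TREEIFY_THRESHOLD nodes, treeifyBin is called; it resizes the table instead if the table is smaller than MIN_TREEIFY_CAPACITY.
- If a node with the same key is found, its value is replaced and the old value is returned. The value is not replaced when onlyIfAbsent is set and the old value is non-null.

Finally, when a new mapping was added, modCount is incremented. If size (the number of mappings, not the array length) now exceeds threshold, the table is resized.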
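The onlyIfAbsent parameter of putVal is what distinguishes put from putIfAbsent; in the JDK 8 source, HashMap.putIfAbsent calls putVal with onlyIfAbsent set to true. The sketch below uses a hypothetical class, PutIfAbsentDemo, to show the observable behavior. put always overwrites and returns the old value. putIfAbsent keeps an existing non-null value but does replace a null one, which matches the !onlyIfAbsent || oldValue == null check.

```java
import java.util.HashMap;
import java.util.Map;

// put() always overwrites; putIfAbsent() writes only if the key is absent or mapped to null.
class PutIfAbsentDemo {
    static Map<String, String> demo() {
        Map<String, String> m = new HashMap<>();
        System.out.println(m.put("lang", "Java"));            // null: new mapping
        System.out.println(m.put("lang", "Kotlin"));          // Java: old value returned, value replaced
        System.out.println(m.putIfAbsent("lang", "Scala"));   // Kotlin: existing non-null value kept
        m.put("empty", null);
        System.out.println(m.putIfAbsent("empty", "filled")); // null: a null value counts as absent
        return m;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // iteration order unspecified
    }
}
```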

Resize

The resize method:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

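resize() first looks at the old capacity. If the table already exists, the new capacity is twice the old one (newCap = oldCap << 1). In the common case, the threshold is doubled as well (newThr = oldThr << 1). For example, a 16-slot table with threshold 12 becomes a 32-slot table with threshold 24, and the next resize gives 64 and 48. If the table has not been allocated yet, the capacity comes from the value the constructor stored in threshold, or from the defaults (16 and 12). Once the new capacity is known, a new empty array of that size is created and the old buckets are moved over one by one. The hashes are not recomputed:

- A single node goes straight to e.hash & (newCap - 1).
- A red-black tree bin is split with TreeNode.split.
- A linked list is split, preserving order, into a "lo" list that stays at index j and a "hi" list that moves to j + oldCap, depending on whether hash & oldCap is 0.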
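You can check the lo/hi split on a few numbers. When the capacity doubles from oldCap to 2 * oldCap, the new index differs from the old one only in the bit hash & oldCap. A node from bucket j therefore lands either at j or at j + oldCap. The sketch below, a hypothetical class called ResizeSplitDemo, compares that rule against computing hash & (newCap - 1) from scratch, using oldCap = 16.

```java
// The resize split rule: new index is j (lo) or j + oldCap (hi), decided by one bit.
class ResizeSplitDemo {
    static int newIndexBySplit(int hash, int oldCap) {
        int j = hash & (oldCap - 1);                   // bucket in the old table
        return (hash & oldCap) == 0 ? j : j + oldCap;  // "lo" list stays, "hi" list moves
    }

    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        for (int hash : new int[] {5, 21, 37, 53}) {   // all four share old bucket 5
            int direct = hash & (newCap - 1);          // index computed from scratch
            System.out.println(hash + ": old " + (hash & (oldCap - 1))
                    + ", new " + newIndexBySplit(hash, oldCap) + ", direct " + direct);
        }
    }
}
```

Here 5 and 37 stay in bucket 5, while 21 and 53 move to bucket 21. This is why the original code can split each list in a single pass without calling hash() again.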