The Underlying Implementation of HashMap in Java

Latest recommended article published on 2024-06-27 20:37:39

Mr丶杨先森

Category column: 学习 (Study); article tags: java, hashmap

Article link: https://blog.csdn.net/Y_xiansheng123/article/details/107429527

The underlying implementation of HashMap

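While preparing for campus recruitment I ran into this question, so here is a study summary. HashMap is implemented differently in JDK 7 and JDK 8, as detailed below: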

JDK 1.7: the underlying structure is an array + linked lists

[Figure: hashmap1.7]

Main fields:

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 16;

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
 * The table, resized as necessary. Length MUST Always be a power of two.
 */
transient Entry<K,V>[] table;

/**
 * The number of key-value mappings contained in this map.
 */
transient int size;

/**
 * The next size value at which to resize (capacity * load factor).
 * @serial
 */
int threshold;

/**
 * The load factor for the hash table.
 *
 * @serial
 */
final float loadFactor;

/**
 * The number of times this HashMap has been structurally modified
 * Structural modifications are those that change the number of mappings in
 * the HashMap or otherwise modify its internal structure (e.g.,
 * rehash).  This field is used to make iterators on Collection-views of
 * the HashMap fail-fast.  (See ConcurrentModificationException).
 */
transient int modCount;

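① DEFAULT_INITIAL_CAPACITY: the default initial capacity, 16.

② MAXIMUM_CAPACITY: the maximum capacity HashMap supports, 2^30.

③ DEFAULT_LOAD_FACTOR: the default load factor, 0.75.

④ Entry[] table: the array that stores the data.

⑤ int size: the number of key-value pairs stored in the map.

⑥ int threshold: the size at which the table is resized, equal to capacity * load factor.

⑦ float loadFactor: the load factor; it can be specified explicitly at construction time.

⑧ int modCount: the number of structural modifications (changes to the number of mappings, or rehashes); iterators use it to fail fast.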

Default initialization:

public HashMap() {
        // no-arg constructor: passes the default capacity and load factor
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
    }
public HashMap(int initialCapacity, float loadFactor) { // constructor with arguments
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    init();
}

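The default capacity is 16 and the default load factor is 0.75. As entries are added, once the count reaches 16 * 0.75 = 12 the 16-slot table has to be resized. Resizing involves rehashing and copying the data, so it is expensive.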

It is therefore usually recommended to estimate the size of a HashMap up front where possible, to reduce the cost of resizing.
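As a rough illustration of pre-sizing, the sketch below derives an initial capacity from an expected entry count. The class name PresizeSketch and the helper capacityFor are hypothetical names introduced here, and the formula is only one common rule of thumb that assumes the default load factor of 0.75.

```java
import java.util.HashMap;
import java.util.Map;

final class PresizeSketch {
    // Hypothetical helper: an initialCapacity large enough that `expected`
    // entries stay below capacity * 0.75 (the default load factor).
    static int capacityFor(int expected) {
        return (int) (expected / 0.75f) + 1;
    }

    public static void main(String[] args) {
        int expected = 1000;
        // HashMap rounds the requested capacity up to a power of two internally.
        Map<String, Integer> map = new HashMap<>(capacityFor(expected));
        for (int i = 0; i < expected; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size()); // 1000
    }
}
```

With the capacity chosen this way, filling the map with the expected number of entries should not trigger a resize along the way.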

As the code shows, the data is actually stored in:

transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

Entry, the element type of this array, is defined as follows:

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }

        public final K getKey() {
            return key;
        }

        public final V getValue() {
            return value;
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }
    }

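Entry is a nested class of HashMap. Its fields are easy to read:

key is the key passed in on write.
value is, naturally, the value.
As mentioned at the start, HashMap consists of an array plus linked lists; next is the reference that forms the linked list.
hash caches the hash computed for the key (the key's hashCode() after HashMap's extra bit mixing in hash()).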

Implementation of put:

public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue; // the key already exists: the value was overwritten above; return the old value
            }
        }

        modCount++; // counts structural modifications (used for fail-fast iteration)
        addEntry(hash, key, value, i);
        return null;
    }

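Check whether the table still needs to be initialized.
If key is null, delegate to putForNullKey(), which stores the entry in bucket 0.
Compute the hash from the key.
Use that hash to locate the bucket.
If the bucket holds a chain, walk it and compare each entry's hash and key with the incoming key; on a match, overwrite the value and return the old one.
If the bucket is empty or no entry matches, nothing has been stored for this key yet; add a new Entry at this position.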
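The bucket-location step can be sketched in isolation. The block below assumes indexFor is the usual power-of-two mask, h & (length - 1); the class name IndexForSketch is hypothetical.

```java
final class IndexForSketch {
    // Assumed to match indexFor(hash, table.length) as called in put() above;
    // the mask only works when length is a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        for (int h : new int[] {17, 33, -1, 1 << 20}) {
            // For power-of-two lengths the mask equals a non-negative modulo.
            System.out.println(h + " -> " + indexFor(h, length)
                    + " (floorMod: " + Math.floorMod(h, length) + ")");
        }
    }
}
```

This is why the capacity must always be a power of two: the mask keeps exactly the low bits of the hash and never produces a negative index.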

The addEntry() method:

void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length); // double the capacity
            hash = (null != key) ? hash(key) : 0; // recompute the hash if key is not null
            bucketIndex = indexFor(hash, table.length); // relocate the bucket index
        }

        createEntry(hash, key, value, bucketIndex);
    }
    
    void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }

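When addEntry writes an Entry it first checks whether a resize is needed; if so, the table is doubled and the current key is rehashed and relocated. createEntry() then passes the current head of the bucket to the new Entry as its next reference, so if the bucket already holds an entry, a linked list forms at that position with the new Entry at its head.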
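The head insertion done by createEntry() can be reproduced with a stripped-down bucket. Everything in this sketch (HeadInsertSketch, Node, add, keys) is a hypothetical stand-in, not JDK code; it only mirrors the "old head becomes next" step.

```java
import java.util.ArrayList;
import java.util.List;

final class HeadInsertSketch {
    // Simplified stand-in for HashMap.Entry: only a key and a next reference.
    static final class Node {
        final String key;
        final Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    private Node head; // plays the role of table[bucketIndex]

    // Mirrors createEntry(): the old head becomes the new node's next.
    void add(String key) {
        head = new Node(key, head);
    }

    // Walks the chain from the head, the same way put() and getEntry() do.
    List<String> keys() {
        List<String> out = new ArrayList<>();
        for (Node e = head; e != null; e = e.next) {
            out.add(e.key);
        }
        return out;
    }

    public static void main(String[] args) {
        HeadInsertSketch bucket = new HeadInsertSketch();
        bucket.add("a");
        bucket.add("b");
        bucket.add("c");
        System.out.println(bucket.keys()); // [c, b, a]
    }
}
```

The most recently added key is found first when the chain is walked, which is the JDK 1.7 behavior this section describes.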

Implementation of get():

public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }
    
final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }

        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
}

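As with put, first compute the hash from the key and locate the bucket.
Walk the entries in that bucket; a bucket with a single entry is just a chain of length one.
An entry matches when its hash equals the computed hash and its key is the same reference as, or equals(), the requested key.
The value of the first matching entry is returned.
If nothing matches, return null.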
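One consequence of the last step is that a null result from get() is ambiguous: the key may be absent, or it may be mapped to null. The sketch below uses only the public java.util.HashMap API; the class name GetNullSketch and the helper lookup are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class GetNullSketch {
    // Hypothetical helper: tells "no mapping" apart from "mapped to null".
    static String lookup(Map<String, String> map, String key) {
        String value = map.get(key);
        if (value != null) {
            return value;
        }
        return map.containsKey(key) ? "null value" : "absent";
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("present", null);
        map.put("k", "v");
        System.out.println(lookup(map, "k"));       // v
        System.out.println(lookup(map, "present")); // null value
        System.out.println(lookup(map, "missing")); // absent
    }
}
```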

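Summary: ① A HashMap is backed by an Entry array; the array's length is called the capacity in hash-table terms. In the source shown above the array is allocated on the first put (inflateTable()), whereas early JDK 1.7 builds allocated it in the constructor. Each slot that can hold elements is called a bucket; every bucket has its own index, and the index lets the map find the elements in a bucket quickly.

② Each bucket stores one element, an Entry object, but every Entry carries a reference to the next element, so a bucket can grow into a chain of Entry objects.
Newly added elements become the head of the chain.

③ Resizing: when the number of elements reaches the threshold (array length * load factor, 16 * 0.75 = 12 by default) and the target bucket is already occupied, the array is doubled. Each element's index is then recomputed and the entries are moved into the new array (transfer()). This is fairly expensive, so presetting the capacity improves performance.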

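④ The put process:

map.put(k1,v1);

First, hashCode() of k1's class is called to compute a hash; HashMap then runs it through its own hash() and indexFor() functions to obtain the position in the Entry array (bucketIndex).

Then there are three cases:

case1: if that position is empty, new Entry(h,k,v,e) is placed there.

If the position is not empty, it already holds one or more entries (as a linked list), and k1's hash is compared with the hash of each existing entry:

case2: if k1's hash differs from the hash of every existing entry, k1-v1 is added.

case3: if k1's hash equals the hash of some existing entry, equals() of k1's class is called to compare the keys.

If equals() returns false, k1-v1 is added (七上八下, "seven up, eight down"; explained in the ps at the end); if it returns true, v1 replaces the existing value and no new entry is created.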
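The cases above can be observed through put()'s return value. This sketch relies on the fact that the strings "Aa" and "BB" have the same hashCode; the class name PutCasesSketch and the helper putThree are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class PutCasesSketch {
    // Performs three puts and collects what each call returns.
    static Integer[] putThree(Map<String, Integer> map) {
        Integer first = map.put("Aa", 1);  // empty bucket: added
        Integer second = map.put("BB", 2); // same hash, equals() false: added
        Integer third = map.put("Aa", 3);  // same hash, equals() true: value replaced
        return new Integer[] {first, second, third};
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are distinct strings with the same hashCode (2112).
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true

        Map<String, Integer> map = new HashMap<>();
        Integer[] returned = putThree(map);
        System.out.println(returned[0] + ", " + returned[1] + ", " + returned[2]); // null, null, 1
        System.out.println(map.size());    // 2
        System.out.println(map.get("Aa")); // 3
    }
}
```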

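⑤ The get process:

map.get(k1);

As with put, compute the hash from the key and locate the bucket, then use the hash and k1's equals() method to look for an equal key in that bucket; if one exists, return its value, otherwise return null.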

JDK 1.8: the underlying structure is an array + linked lists + red-black trees

In JDK 1.7, when hash collisions are severe the chain in a bucket keeps growing, so lookups get slower and slower; the time complexity is O(N).

Key fields:

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

static final int TREEIFY_THRESHOLD = 8;

/**
* The bin count threshold for untreeifying a (split) bin during a
* resize operation. Should be less than TREEIFY_THRESHOLD, and at
* most 6 to mesh with shrinkage detection under removal.
*/
static final int UNTREEIFY_THRESHOLD = 6;

/**
* The smallest table capacity for which bins may be treeified.
* (Otherwise the table is resized if too many nodes in a bin.)
* Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
* between resizing and treeification thresholds.
*/
static final int MIN_TREEIFY_CAPACITY = 64;

transient Node<K,V>[] table; // was Entry in JDK 1.7

/**
 * Holds cached entrySet(). Note that AbstractMap fields are used
 * for keySet() and values().
 */
transient Set<Map.Entry<K,V>> entrySet;

/**
 * The number of key-value mappings contained in this map.
 */
transient int size;

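TREEIFY_THRESHOLD: adding to a bucket whose chain already holds this many nodes converts the chain to a red-black tree; default 8.
UNTREEIFY_THRESHOLD: when a tree bin is split during a resize and a half holds at most this many nodes, it is converted back to a linked list; default 6.
MIN_TREEIFY_CAPACITY: the smallest table capacity at which buckets may be treeified; default 64.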

When a bucket has enough nodes to become a red-black tree but the table capacity is below MIN_TREEIFY_CAPACITY, the map resizes instead. MIN_TREEIFY_CAPACITY should be at least 4 times TREEIFY_THRESHOLD.
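How the two constants interact can be condensed into a small decision function. This is a simplified sketch rather than JDK code: TreeifyDecisionSketch, Action and afterAppend are hypothetical names, and the logic only paraphrases the checks in putVal() and treeifyBin() quoted later in this article.

```java
final class TreeifyDecisionSketch {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { NONE, RESIZE, TREEIFY }

    // Hypothetical helper: what happens after a node is appended to a chain,
    // given the chain length after the append and the current table length.
    static Action afterAppend(int chainLength, int tableLength) {
        if (chainLength <= TREEIFY_THRESHOLD) {
            return Action.NONE;    // chain still short enough
        }
        if (tableLength < MIN_TREEIFY_CAPACITY) {
            return Action.RESIZE;  // small table: spread entries out instead
        }
        return Action.TREEIFY;     // large table: convert the chain to a tree
    }

    public static void main(String[] args) {
        System.out.println(afterAppend(8, 64)); // NONE
        System.out.println(afterAppend(9, 16)); // RESIZE
        System.out.println(afterAppend(9, 64)); // TREEIFY
    }
}
```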

The put process:

/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,boolean evict) {
    Node<K,V>[] tab; 
    Node<K,V> p; 
    int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)							   //①
        n = (tab = resize()).length;											   
    if ((p = tab[i = (n - 1) & hash]) == null)									   //②
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))				   //③
            e = p;
        else if (p instanceof TreeNode)											   //④
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {								   //⑤
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st		   //⑥
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))        //⑦
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key							   //⑧
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold) 													   //⑨
        resize();
    afterNodeInsertion(evict);
    return null;
}

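① Check whether the table is empty; if so it must be initialized (resize() handles initialization).
② Use the key's hash to locate the bucket and check whether it is empty; an empty bucket means no hash collision, so a new node is created at that position directly.
③ If the bucket is occupied (a hash collision), compare the hash and key of its first node with the incoming key; if they match, assign the node to e, and step ⑧ handles the assignment and return in one place.
④ If the bucket is a red-black tree, insert the data the red-black-tree way.
⑤ If it is a linked list, wrap the key and value in a new node and append it to the end of the chain.
⑥ Then check whether the chain has grown past the preset threshold (TREEIFY_THRESHOLD, 8); if so, treeifyBin() is called, which either resizes the table or converts the chain to a red-black tree.
⑦ If a node with the same key is found during traversal, exit the loop immediately.
⑧ e != null means the key already exists, so its value is overwritten.
⑨ Finally, check whether a resize is needed.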
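The onlyIfAbsent flag checked in the overwrite step is what separates put() from putIfAbsent() in the public API. The sketch below uses only java.util.HashMap; the class name OnlyIfAbsentSketch and the helper demo are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class OnlyIfAbsentSketch {
    // Runs a few writes and returns the resulting map.
    static Map<String, Integer> demo() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.putIfAbsent("a", 2); // key exists with a non-null value: kept as 1
        map.put("n", null);
        map.putIfAbsent("n", 5); // existing value is null: replaced by 5
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = demo();
        System.out.println(map.get("a"));    // 1
        System.out.println(map.get("n"));    // 5
        System.out.println(map.put("a", 3)); // 1, plain put overwrites
        System.out.println(map.get("a"));    // 3
    }
}
```

The second putIfAbsent call corresponds to the `oldValue == null` branch of the condition `!onlyIfAbsent || oldValue == null` in putVal().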

The resize process:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
 
// If the table capacity is < MIN_TREEIFY_CAPACITY, resize the table; otherwise convert the bin to a red-black tree
final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            resize();
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            TreeNode<K,V> hd = null, tl = null;
            do {
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    hd = p;
                else {
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;
            } while ((e = e.next) != null);
            if ((tab[index] = hd) != null)
                hd.treeify(tab);
        }
    }

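When an insertion grows one chain past 8 nodes and the capacity has not reached 64, HashMap resizes first; if the capacity has already reached 64, the chain becomes a tree and its nodes change from Node to TreeNode. Conversely, after mappings have been removed, if the next resize() finds that a tree bin holds no more than 6 nodes, the tree is converted back to a linked list.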
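The lo/hi split in resize() above relies on a property of power-of-two tables: after doubling, a node either stays at index j or moves to j + oldCap, and the single bit hash & oldCap decides which. A minimal check of that property, with hypothetical names SplitSketch and newIndex:

```java
final class SplitSketch {
    // Index after doubling, decided the way resize() decides it:
    // (hash & oldCap) == 0 selects the "lo" list (same index j),
    // otherwise the "hi" list (index j + oldCap).
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // All four hashes sit in bucket 5 of a 16-slot table.
        for (int hash : new int[] {5, 21, 37, 53}) {
            // Same result as recomputing hash & (newCap - 1) from scratch.
            System.out.println(hash + ": " + newIndex(hash, oldCap)
                    + " == " + (hash & (2 * oldCap - 1)));
        }
    }
}
```

Because only one bit has to be tested, the JDK 1.8 resize does not need to recompute a full index for every node, and each half keeps its original relative order.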

The get process:

public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

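First hash the key and locate the bucket.
If the bucket is empty, return null directly.
Otherwise check whether the key of the bucket's first node (which may head a linked list or a red-black tree) is the requested key; if so, return its value directly.
If the first node does not match, check whether the bucket is a red-black tree or a linked list.
For a red-black tree, look the value up the tree way.
Otherwise traverse the linked list and return the matching value.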
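The article does not quote hash(key) for JDK 1.8. To my knowledge it XORs the high 16 bits of hashCode() into the low 16 bits before the (n - 1) & hash masking seen in getNode(). The sketch below mirrors that under this assumption; SpreadSketch, spread and index are hypothetical names.

```java
final class SpreadSketch {
    // Assumed to mirror JDK 8's bit mixing: fold the high 16 bits into the low 16.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Same masking expression as in getNode(): (n - 1) & hash.
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int a = 0x10000; // a and b differ only in their high bits
        int b = 0x20000;
        // Without mixing, both land in bucket 0 of a 16-slot table.
        System.out.println(index(a, 16) + " " + index(b, 16));                 // 0 0
        // With mixing, the high bits influence the bucket index.
        System.out.println(index(spread(a), 16) + " " + index(spread(b), 16)); // 1 2
    }
}
```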

These two core methods (get/put) show that 1.8 optimizes long chains: once a chain is converted to a red-black tree, lookup in that bucket improves to O(log n).
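To see the worst case this optimization targets, the sketch below forces every key into one bucket by giving all keys the same hashCode. The Key class and the fill helper are hypothetical; the tree conversion itself cannot be observed through the public API, so the example only shows that lookups stay correct. Making the key Comparable is an assumption that helps a tree bin order keys with equal hashes.

```java
import java.util.HashMap;
import java.util.Map;

final class CollidingKeysSketch {
    // Hypothetical key type: every instance has the same hashCode,
    // so all keys land in a single bucket.
    static final class Key implements Comparable<Key> {
        final int id;
        Key(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }
        @Override public int compareTo(Key other) {
            return Integer.compare(id, other.id);
        }
    }

    // Inserts n colliding keys, each mapped to its own id.
    static Map<Key, Integer> fill(int n) {
        Map<Key, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new Key(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Key, Integer> map = fill(10_000);
        System.out.println(map.size());              // 10000
        System.out.println(map.get(new Key(9_999))); // 9999
    }
}
```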

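Summary: differences from JDK 1.7

1. new HashMap(): no array of length 16 is created up front; it is created on the first call to map.put(). (Early JDK 1.7 builds allocated it in the constructor; the 1.7 source quoted above already defers it to inflateTable().)

2. The array elements are of type Node; in JDK 7 the type was called Entry.

3. When a linked list forms, the new key-value pair goes at the tail of the list (七上八下, "seven up, eight down"; explained in the ps below).

4. When the chain at an array index grows longer than 8 and the array length is at least 64, all key-value pairs at that index are stored in a red-black tree and the nodes change from Node to TreeNode; when a tree bin shrinks to 6 nodes or fewer, it is converted back to a linked list.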

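ps: finally, about 七上八下 ("seven up, eight down"): in JDK 7 a newly added element points to the old element and takes over the index position, whereas in JDK 8 the old element points to the new element and stays where it is. The phrase is simply a mnemonic for this.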
