A Deep Dive into HashMap and HashSet

Nancy_G

Published 2020-11-16 10:20:28


Article link: https://blog.csdn.net/sinat_41653656/article/details/109555938


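1. A Deep Dive into HashMap

1.1 The Map Interface

A Map works with keys and values: each key maps to exactly one value, and the Map stores and retrieves values by key. Keys cannot repeat, so assigning a value to an existing key overwrites the previous value. Arrays, ArrayList, and LinkedList can be viewed as special cases of a Map in which the key is the index and the value is the element.
The Map interface is defined in Java 8 as follows: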

public interface Map<K,V> {
    // Query Operations
    int size();
    boolean isEmpty();
    boolean containsKey(Object key);
    boolean containsValue(Object value);
    V get(Object key);

    // Modification Operations
    V put(K key, V value);
    V remove(Object key);

    // Bulk Operations
    void putAll(Map<? extends K, ? extends V> m);
    void clear();

    // Views
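    Set<K> keySet(); // the set of keys in the Map
    Collection<V> values(); // the collection of all values in the Map
    Set<Map.Entry<K, V>> entrySet(); // all key-value pairs in the Map

    interface Entry<K,V> { // nested interface representing one key-value pair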
        K getKey();
        V getValue();
        V setValue(V value);
        boolean equals(Object o);
        int hashCode();
        public static <K extends Comparable<? super K>, V> Comparator<Map.Entry<K,V>> comparingByKey() {
            return (Comparator<Map.Entry<K, V>> & Serializable)
                (c1, c2) -> c1.getKey().compareTo(c2.getKey());
        }
        public static <K, V extends Comparable<? super V>> Comparator<Map.Entry<K,V>> comparingByValue() {
            return (Comparator<Map.Entry<K, V>> & Serializable)
                (c1, c2) -> c1.getValue().compareTo(c2.getValue());
        }

        public static <K, V> Comparator<Map.Entry<K, V>> comparingByKey(Comparator<? super K> cmp) {
            Objects.requireNonNull(cmp);
            return (Comparator<Map.Entry<K, V>> & Serializable)
                (c1, c2) -> cmp.compare(c1.getKey(), c2.getKey());
        }
        public static <K, V> Comparator<Map.Entry<K, V>> comparingByValue(Comparator<? super V> cmp) {
            Objects.requireNonNull(cmp);
            return (Comparator<Map.Entry<K, V>> & Serializable)
                (c1, c2) -> cmp.compare(c1.getValue(), c2.getValue());
        }
    }

    // Comparison and hashing
    boolean equals(Object o);
    int hashCode();

    // Defaultable methods
    default V getOrDefault(Object key, V defaultValue) {
        V v;
        return (((v = get(key)) != null) || containsKey(key))
            ? v : defaultValue;
    }

    default void forEach(BiConsumer<? super K, ? super V> action) {
        Objects.requireNonNull(action);
        for (Map.Entry<K, V> entry : entrySet()) {
            K k;
            V v;
            try {
                k = entry.getKey();
                v = entry.getValue();
            } catch(IllegalStateException ise) {
                // this usually means the entry is no longer in the map.
                throw new ConcurrentModificationException(ise);
            }
            action.accept(k, v);
        }
    }

    default void replaceAll(BiFunction<? super K, ? super V, ? extends V> function) {
        Objects.requireNonNull(function);
        for (Map.Entry<K, V> entry : entrySet()) {
            K k;
            V v;
            try {
                k = entry.getKey();
                v = entry.getValue();
            } catch(IllegalStateException ise) {
                // this usually means the entry is no longer in the map.
                throw new ConcurrentModificationException(ise);
            }

            // ise thrown from function is not a cme.
            v = function.apply(k, v);

            try {
                entry.setValue(v);
            } catch(IllegalStateException ise) {
                // this usually means the entry is no longer in the map.
                throw new ConcurrentModificationException(ise);
            }
        }
    }
    default V putIfAbsent(K key, V value) {
        V v = get(key);
        if (v == null) {
            v = put(key, value);
        }
        return v;
    }
    default boolean remove(Object key, Object value) {
        Object curValue = get(key);
        if (!Objects.equals(curValue, value) ||
            (curValue == null && !containsKey(key))) {
            return false;
        }
        remove(key);
        return true;
    }
    default boolean replace(K key, V oldValue, V newValue) {
        Object curValue = get(key);
        if (!Objects.equals(curValue, oldValue) ||
            (curValue == null && !containsKey(key))) {
            return false;
        }
        put(key, newValue);
        return true;
    }
    default V replace(K key, V value) {
        V curValue;
        if (((curValue = get(key)) != null) || containsKey(key)) {
            curValue = put(key, value);
        }
        return curValue;
    }

    default V computeIfAbsent(K key,
            Function<? super K, ? extends V> mappingFunction) {
        Objects.requireNonNull(mappingFunction);
        V v;
        if ((v = get(key)) == null) {
            V newValue;
            if ((newValue = mappingFunction.apply(key)) != null) {
                put(key, newValue);
                return newValue;
            }
        }
        return v;
    }
    default V computeIfPresent(K key,
            BiFunction<? super K, ? super V, ? extends V> remappingFunction) {
        Objects.requireNonNull(remappingFunction);
        V oldValue;
        if ((oldValue = get(key)) != null) {
            V newValue = remappingFunction.apply(key, oldValue);
            if (newValue != null) {
                put(key, newValue);
                return newValue;
            } else {
                remove(key);
                return null;
            }
        } else {
            return null;
        }
    }
    default V compute(K key,
            BiFunction<? super K, ? super V, ? extends V> remappingFunction) {
        Objects.requireNonNull(remappingFunction);
        V oldValue = get(key);

        V newValue = remappingFunction.apply(key, oldValue);
        if (newValue == null) {
            // delete mapping
            if (oldValue != null || containsKey(key)) {
                // something to remove
                remove(key);
                return null;
            } else {
                // nothing to do. Leave things as they were.
                return null;
            }
        } else {
            // add or replace old mapping
            put(key, newValue);
            return newValue;
        }
    }
    default V merge(K key, V value,
            BiFunction<? super V, ? super V, ? extends V> remappingFunction) {
        Objects.requireNonNull(remappingFunction);
        Objects.requireNonNull(value);
        V oldValue = get(key);
        V newValue = (oldValue == null) ? value :
                   remappingFunction.apply(oldValue, value);
        if(newValue == null) {
            remove(key);
        } else {
            put(key, newValue);
        }
        return newValue;
    }
}

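Set is an interface that models the basic mathematical notion of a set: a collection with no duplicate elements. Its declaration in Java 8, with the body omitted, is: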

public interface Set<E> extends Collection<E>{
}

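It extends Collection without defining any new methods, but it requires every implementation to honor the Set contract: no duplicate elements.
Because the keys of a Map are unique, keySet() returns a Set. keySet(), values(), and entrySet() share one property: each returns a view rather than a copy, so a modification made through the returned object changes the Map itself. For example: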

map.keySet().clear();

removes every key-value pair from the map.
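The write-through behavior of these views can be checked with a small program. This is only an illustrative sketch: the class name MapViewDemo and the helper removeThroughKeySet are made up for this example and are not part of the JDK.

```java
import java.util.HashMap;
import java.util.Map;

public class MapViewDemo {
    // Removes one key through the keySet() view and returns the map's new size.
    // (Hypothetical helper, used only for this illustration.)
    static int removeThroughKeySet(Map<String, Integer> map, String key) {
        map.keySet().remove(key); // the view writes through to the backing map
        return map.size();
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        System.out.println(removeThroughKeySet(map, "a")); // 2
        map.values().remove(2);            // removes the entry whose value is 2
        System.out.println(map);           // {c=3}
        map.keySet().clear();              // removes everything that is left
        System.out.println(map.isEmpty()); // true
    }
}
```

Note that these views support removal but not insertion: calling add on the key set of a HashMap throws UnsupportedOperationException.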

1.2 HashMap

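HashMap is a collection that stores its elements in a hash table and resolves collisions by separate chaining. In JDK 1.7 a HashMap consists of an array plus linked lists. In JDK 1.8 it consists of an array, linked lists, and red-black trees: the added trees make the structure more complex but make lookups in heavily collided buckets faster. The rest of this section describes the JDK 1.8 implementation.

HashMap declares the following constant fields: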

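// used to verify version compatibility during serialization and deserialization
private static final long serialVersionUID = 362498820763181265L;
// default initial capacity of 16 (the capacity must be a power of two)
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
// maximum capacity; a larger value passed to a constructor is clamped to this value
static final int MAXIMUM_CAPACITY = 1 << 30;
// default load factor
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// a bucket is converted to a red-black tree when a node is added to a bucket
// that already holds at least this many nodes (new in JDK 1.8)
static final int TREEIFY_THRESHOLD = 8;
// a tree bucket is converted back to a linked list when a resize leaves it
// with this many nodes or fewer (new in JDK 1.8)
static final int UNTREEIFY_THRESHOLD = 6;
/** (new in JDK 1.8)
 * Buckets may be treeified only when the table capacity is at least this value; otherwise
 * an overfull bucket triggers a resize instead. To avoid conflicts between resizing and
 * treeification, this value should be at least 4 * TREEIFY_THRESHOLD.
 */
static final int MIN_TREEIFY_CAPACITY = 64;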

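Note: the last three constants were added in JDK 1.8 and control the conversion between linked lists and red-black trees.

HashMap provides the following constructors:

public HashMap() // default constructor

public HashMap(int initialCapacity)

public HashMap(int initialCapacity, float loadFactor)

// builds a map from an existing Map, copying all of its key-value pairs
public HashMap(Map<? extends K, ? extends V> m)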

1.3 How HashMap Is Implemented

1.3.1 Internal Structure

HashMap has the following main instance fields:

   /**
     * The table, initialized on first use, and resized as necessary. When allocated, length is always a power of two.
     */
    transient Node<K,V>[] table;

    /**
     * Holds cached entrySet(). Note that AbstractMap fields are used for keySet() and values().
     */
    transient Set<Map.Entry<K,V>> entrySet;

    /**
     * The number of key-value mappings contained in this map.
     */
    transient int size;

    /**
     * The number of times this HashMap has been structurally modified
     */
    transient int modCount;

	/**
     * The next size value at which to resize (capacity * load factor).
     */
    int threshold;

    /**
     * The load factor for the hash table.
     */
    final float loadFactor;

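table is an array of Node. Each array slot points to a singly linked list (or, in JDK 1.8, possibly a red-black tree), and each node represents one key-value pair. Node is a nested class; its instance fields and constructor are:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash; // hash of the key, cached so that comparisons are faster
    final K key;
    V value;
    Node<K,V> next; // next node in the bucket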

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
}

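table starts out empty and grows as key-value pairs are added. When an insertion makes the number of entries exceed threshold (capacity * loadFactor), the array is doubled in length. Note that threshold is the entry count that triggers a resize, not the maximum length of the array. A Java array cannot grow in place, so the map allocates a larger array and moves the entries into it, much like swapping a small bucket that can no longer hold the water for a bigger one.

Because red-black trees make the JDK 1.8 code more involved, we first look at the JDK 1.7 resize code, which is easier to follow, and then at the JDK 1.8 version.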

//newCapacity is the size of the new array
    void resize(int newCapacity) {
        Entry[] oldTable = table; // the Entry array before resizing
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) { // already at the maximum size (2^30)
            threshold = Integer.MAX_VALUE; // raise the threshold to 2^31-1 so that no further resize is triggered
            return;
        }

        Entry[] newTable = new Entry[newCapacity]; // allocate the new Entry array
        transfer(newTable, initHashSeedAsNeeded(newCapacity)); // move the entries into the new array
        table = newTable;
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1); // update the threshold
    }
    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) { // iterate over the buckets
            while(null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity); // recompute the entry's index in the new array
                e.next = newTable[i]; // head insertion: the entry points at the current head
                newTable[i] = e; // the entry becomes the new head of the bucket
                e = next; // continue with the next entry in the old chain
            }
        }
    }

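As the code shows, JDK 1.7 first allocates a larger array, then recomputes the index of every existing entry and re-links it into the new array. Entries that land in the same slot are linked by head insertion, so each moved entry is placed at the front of its new chain, and a chain may therefore end up in reverse order after a resize.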
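The reversal caused by head insertion is easy to reproduce outside HashMap. The sketch below is a simplified stand-in, not JDK code: the Entry class here is a hypothetical minimal node, and transferByHeadInsertion mimics only the inner while loop of transfer() for entries that all land in the same new slot.

```java
public class HeadInsertDemo {
    // Minimal singly linked node (hypothetical, for illustration only).
    static final class Entry {
        final String key;
        Entry next;
        Entry(String key, Entry next) {
            this.key = key;
            this.next = next;
        }
    }

    // Moves every node of the old chain to a new chain by head insertion.
    static Entry transferByHeadInsertion(Entry head) {
        Entry newHead = null;
        Entry e = head;
        while (e != null) {
            Entry next = e.next; // remember the rest of the old chain
            e.next = newHead;    // link the node in front of the new chain
            newHead = e;         // the node becomes the new head
            e = next;
        }
        return newHead;
    }

    // Renders a chain as "A->B->C".
    static String render(Entry head) {
        StringBuilder sb = new StringBuilder();
        for (Entry e = head; e != null; e = e.next) {
            if (sb.length() > 0) sb.append("->");
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Entry chain = new Entry("A", new Entry("B", new Entry("C", null)));
        System.out.println(render(chain));                          // A->B->C
        System.out.println(render(transferByHeadInsertion(chain))); // C->B->A
    }
}
```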

Now consider the JDK 1.8 resize code:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length; // a null table counts as capacity 0
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) { // the table has already been allocated
        if (oldCap >= MAXIMUM_CAPACITY) { // already at the maximum capacity (2^30)
            threshold = Integer.MAX_VALUE; // raise the threshold to 2^31-1 so that no further resize is triggered
            return oldTab;
        }
        // double the capacity; if the doubled capacity is below 2^30 and the old capacity is at least 16, double the threshold too
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // table not yet allocated; the initial capacity was stored in threshold
        newCap = oldThr;
    else {               // zero threshold and no table: use the defaults
        newCap = DEFAULT_INITIAL_CAPACITY; // 16
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); // 16 * 0.75 = 12
    }
    // compute the new threshold if it was not set above
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // move every bucket into the new table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null; // clear the old slot to help garbage collection
                // a bucket holding a single node: compute its new index and place it directly
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode) // red-black tree bucket: split the tree
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null; // list that stays at the original index
                    Node<K,V> hiHead = null, hiTail = null; // list that moves to original index + oldCap
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // test the hash bit selected by oldCap: 0 keeps the node at index j, 1 moves it to index j + oldCap
                        // stays at the original index
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // moves to original index + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // store the "low" list at the original index
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // store the "high" list at original index + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

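The method has two parts: it first computes the new capacity newCap and the new threshold newThr, and then remaps the elements of the old table into the new one.

Because the table length is always a power of two and a resize doubles it, each element either stays at its original index or moves by exactly oldCap positions. JDK 1.8 exploits this: instead of recomputing every index as the JDK 1.7 implementation does, it only tests the one additional hash bit that the larger mask exposes. If that bit is 0 the index is unchanged; if it is 1 the index becomes "original index + oldCap". One more difference: when JDK 1.7 migrates a chain and several entries land in the same new slot, their order is reversed, whereas JDK 1.8 preserves the order.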
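The single-bit test can be verified with plain integer arithmetic. The following sketch uses hypothetical helper names (indexFor, predictedIndex) and hand-picked hash values; it compares the index predicted from (hash & oldCap) with the index obtained by masking against the doubled capacity.

```java
public class ResizeIndexDemo {
    // Bucket index for a table whose capacity is a power of two.
    static int indexFor(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    // Predicts the index after doubling, using only the bit selected by oldCap.
    static int predictedIndex(int hash, int oldCap) {
        int oldIndex = indexFor(hash, oldCap);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21, 37, 53}; // all four share index 5 in a 16-slot table
        for (int h : hashes) {
            System.out.println(h + ": " + indexFor(h, oldCap)
                    + " -> " + indexFor(h, oldCap * 2)
                    + " (predicted " + predictedIndex(h, oldCap) + ")");
        }
        // 5:  5 -> 5  (predicted 5)
        // 21: 5 -> 21 (predicted 21)
        // 37: 5 -> 5  (predicted 5)
        // 53: 5 -> 21 (predicted 21)
    }
}
```

Hashes 5 and 37 have the oldCap bit (16) cleared and stay in the "low" list; 21 and 53 have it set and move to index 5 + 16 = 21.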

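Two entries end up in the same bucket in the following cases:

The two keys are equal (so their hash values are necessarily equal); here the new value simply overwrites the old one.
The two keys differ, but the hash function maps them to the same hash value.
The two keys and their hash values differ, but the hash values are equal modulo the array length.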

1.3.2 Constructors

/**
* Constructs an empty <tt>HashMap</tt> with the default initial capacity
* (16) and the default load factor (0.75).
*/
public HashMap() {
	this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial capacity
* and the default load factor (0.75).
*/
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial capacity
* and the specified load factor.
*/
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

/**
* Constructs a new <tt>HashMap</tt> with the same mappings as the
* specified <tt>Map</tt>.  The <tt>HashMap</tt> is created with
* default load factor (0.75) and an initial capacity sufficient to
* hold the mappings in the specified <tt>Map</tt>.
*/
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}
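The two-argument constructor stores tableSizeFor(initialCapacity) in threshold, and resize() later uses that value as the table length. tableSizeFor is a package-private helper that rounds its argument up to a power of two. Since it cannot be called from outside java.util, the sketch below re-implements the same bit-smearing idea in a standalone class (TableSizeDemo is a made-up name) so that the rounding can be observed.

```java
public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Rounds cap up to the nearest power of two, capped at MAXIMUM_CAPACITY.
    // Standalone copy of the bit-smearing approach, written for illustration.
    static int tableSizeFor(int cap) {
        int n = cap - 1;   // so that an exact power of two is not doubled
        n |= n >>> 1;      // copy the highest set bit into every lower position
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] requested = {0, 1, 10, 16, 17, 1000};
        for (int cap : requested) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
        // 0 -> 1, 1 -> 1, 10 -> 16, 16 -> 16, 17 -> 32, 1000 -> 1024
    }
}
```

So new HashMap<>(10) ends up with a 16-slot table once the first entry is inserted, and new HashMap<>(17) with a 32-slot table.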

1.3.3 Storing Key-Value Pairs

The following code shows how HashMap stores a key-value pair:

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

/**
* @param onlyIfAbsent if true, don't change an existing value
* @param evict if false, the table is in creation mode
*/
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0) // the table is null or empty: allocate it
        n = (tab = resize()).length;
    // derive the bucket index i from the key's hash
    if ((p = tab[i = (n - 1) & hash]) == null)
        // the bucket is empty: create a node and place it directly
        tab[i] = newNode(hash, key, value, null);
    else { // tab[i] != null
        Node<K,V> e; K k;
        // the first node of the bucket has the same key: remember it so that its value is overwritten below
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // the bucket is a red-black tree (TreeNode): insert into the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // otherwise the bucket is a linked list: walk it
        else {
            for (int binCount = 0; ; ++binCount) {
                // reached the tail without finding the key: append a new Node,
                // then treeify if the list has reached the treeify threshold
                if ((e = p.next) == null) {
                    // create the node and link it at the tail
                    p.next = newNode(hash, key, value, null);
                    // the list already held TREEIFY_THRESHOLD (8) nodes before this insert: convert it
                    // (treeifyBin resizes instead while the table has fewer than 64 slots)
                    if (binCount >= TREEIFY_THRESHOLD - 1) // binCount >= 7
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

The hash value that put passes to putVal is computed by the hash method. HashMap's hash function is:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

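i = (table.length - 1) & hash; // performed later, inside putVal(), to pick the bucket

The computation has three steps:

①. Take the hash code: key.hashCode()

②. Mix in the high bits: h ^ (h >>> 16)

③. Reduce to an index: (n - 1) & hash

The value returned by hashCode() varies from object to object, but for any given object, as long as hashCode() returns the same value, hash(Object key) always yields the same hash.

To spread elements evenly over the array, the obvious approach is to take the hash modulo the array length (hash % length). The modulo operation is comparatively expensive, however, so how can it be optimized?

HashMap's approach is neat: it computes the slot as hash & (table.length - 1). The length of the underlying array is always a power of two, which is a deliberate speed optimization. When length is a power of two, hash & (length - 1) yields the remainder of hash divided by length (for a non-negative hash this equals hash % length), and & is cheaper than %. For example, for non-negative n, n % 32 == n & (32 - 1).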
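Both claims, that the mask behaves like a modulo and that mixing in the high bits matters, can be checked with a short program. The hash method below copies the one-line function quoted above; indexFor and the class name HashIndexDemo are names chosen for this sketch. Math.floorMod is used as the reference because Java's % operator returns negative results for negative operands.

```java
public class HashIndexDemo {
    // Same spreading step as the hash function quoted above.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table whose length is a power of two.
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        // The mask agrees with the mathematical modulo, even for negative hashes.
        int negative = -7;
        System.out.println(indexFor(negative, 16));      // 9
        System.out.println(Math.floorMod(negative, 16)); // 9
        System.out.println(negative % 16);               // -7, unusable as an index

        // Integer hash codes 65536 and 131072 differ only above bit 15.
        // Without spreading, both would fall into bucket 0 of a 16-slot table.
        System.out.println(indexFor(Integer.valueOf(65536).hashCode(), 16));  // 0
        System.out.println(indexFor(Integer.valueOf(131072).hashCode(), 16)); // 0
        System.out.println(indexFor(hash(65536), 16));   // 1
        System.out.println(indexFor(hash(131072), 16));  // 2
    }
}
```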

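Returning to putVal, it proceeds as follows:
①. Check whether the table is null or empty; if so, call resize() to allocate it.
②. Compute the array index i from the key's hash. If table[i] == null, create a new node there and go to ⑥; otherwise go to ③.
③. Check whether the first node in table[i] has the same key. If so, overwrite its value; otherwise go to ④. "Same" here means an equal hash and either the same reference or equals() returning true.
④. Check whether table[i] is a TreeNode, that is, whether the bucket is a red-black tree. If so, insert the key-value pair into the tree; otherwise go to ⑤.
⑤. Walk the linked list in table[i]. If the key is found along the way, overwrite its value. Otherwise append a new node at the tail, and if the list held at least 8 nodes before the insert, convert it to a red-black tree (provided the table has at least 64 slots; otherwise the table is resized instead).
⑥. After a successful insert, check whether the number of key-value pairs, size, exceeds threshold; if so, resize.
⑦. If the key was not present, null is returned; if it was, the previous value is returned (and the new value replaces it).
Pay particular attention to these two lines:

if (++size > threshold) // threshold exceeded: resize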
    resize();

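This is a common interview point. HashMap in JDK 1.8 consists of an array, linked lists, and red-black trees. Colliding entries are chained in a linked list, and once a list grows past 8 nodes it is converted to a red-black tree. JDK 1.7 extends the chain by head insertion, which makes insertion cheap but reverses the chain on resize and, when several threads resize concurrently, can produce a circular list and an infinite loop. JDK 1.8 inserts at the tail instead, which preserves order and avoids that circular-list loop; HashMap is still not thread-safe, though.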
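The return value of put described in the steps above (null for a new key, the previous value for an existing key) is visible from ordinary client code. A small sketch follows; PutReturnDemo and countNewKeys are names invented for the example, and it assumes that no null values are stored, since a null return would otherwise be ambiguous.

```java
import java.util.HashMap;
import java.util.Map;

public class PutReturnDemo {
    // Counts how many of the given keys were new to the map, relying on
    // put returning null when there was no previous mapping.
    // Assumes the map never holds null values.
    static int countNewKeys(Map<String, Integer> map, String... keys) {
        int created = 0;
        for (String key : keys) {
            if (map.put(key, key.length()) == null) {
                created++;
            }
        }
        return created;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1));         // null: the key was absent
        System.out.println(map.put("a", 2));         // 1: the old value is returned and replaced
        System.out.println(map.putIfAbsent("a", 3)); // 2: the existing value is kept
        System.out.println(map.get("a"));            // 2

        System.out.println(countNewKeys(new HashMap<>(), "x", "y", "x")); // 2
    }
}
```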

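1.3.4 Lookup

The lookup computes the index from the key's hash to locate the bucket and checks the first node. If it matches, it is returned; otherwise the rest of the bucket is searched. If the bucket is a tree, it is searched as a tree; otherwise it is a linked list and is traversed node by node. In every other case null is returned.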

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // check the first node
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

containsKey follows the same logic as get:

public boolean containsKey(Object key) {
    return getNode(hash(key), key) != null;
}

HashMap makes key-based operations convenient and efficient, but an operation based on a value has to traverse the whole table. containsValue is implemented as:

public boolean containsValue(Object value) {
    Node<K,V>[] tab; V v;
    if ((tab = table) != null && size > 0) {
        // iterate over the buckets
        for (int i = 0; i < tab.length; ++i) {
            // iterate over every node in the bucket
            for (Node<K,V> e = tab[i]; e != null; e = e.next) {
                if ((v = e.value) == value ||
                    (value != null && value.equals(v)))
                    return true;
            }
        }
    }
    return false;
}
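One practical consequence of the get and containsKey code above: get returns null both when the key is absent and when the key is mapped to null, so only containsKey can tell the two cases apart. A minimal sketch, with NullValueDemo and describe as made-up names:

```java
import java.util.HashMap;
import java.util.Map;

public class NullValueDemo {
    // Distinguishes "absent" from "present with a null value".
    static String describe(Map<String, Integer> map, String key) {
        if (!map.containsKey(key)) {
            return "absent";
        }
        return map.get(key) == null ? "present with null value" : "present";
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("zero", 0);
        map.put("unknown", null); // null values are allowed
        map.put(null, 42);        // so is a single null key

        System.out.println(map.get("unknown"));       // null
        System.out.println(map.get("missing"));       // null
        System.out.println(describe(map, "unknown")); // present with null value
        System.out.println(describe(map, "missing")); // absent
        System.out.println(map.get(null));            // 42
        System.out.println(map.containsValue(null));  // true, found by a full scan
    }
}
```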

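1.3.5 Removing a Key-Value Pair by Key

To remove an element, HashMap first locates the bucket. If the bucket is a linked list, the list is traversed and the matching node is unlinked. If it is a red-black tree, the node is found by a tree search, removed, and the tree is rebalanced. Note that a tree bucket that becomes too small is converted back to a linked list (during a resize, when it is left with 6 or fewer nodes).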

public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    // (n - 1) & hash locates the bucket
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        // the first node of the bucket matches the key: point node at it
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        // otherwise, if the bucket has more nodes
        else if ((e = p.next) != null) {
            if (p instanceof TreeNode) // the bucket is a red-black tree
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key); // find the tree node to remove
            else { // the bucket is a linked list
                do { // walk the list to find the node to remove
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // remove the node; removeTreeNode also rebalances the red-black tree
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}

1.3.6 Iterating over Elements

First, build a HashMap:

HashMap<String, Integer> map = new HashMap<>();
map.put("a", 10);
map.put("b", 20);
map.put("c", 30);

①. Get the key set and the value collection separately.

for(String key : map.keySet()){
	System.out.println(key);
}
for(Integer value : map.values()){
	System.out.println(value);
}

②. Get the key set, iterate over it, and look up the value for each key.

Set<String> keySet = map.keySet();
for(String str : keySet){ // the enhanced for loop is implemented with an Iterator under the hood
	System.out.println(str+": "+map.get(str));
}

③. Get the entry set and iterate over the entries.

Set<Map.Entry<String, Integer>> entrySet = map.entrySet();
for(Map.Entry<String, Integer> entry: entrySet){
	System.out.println(entry.getKey()+": "+entry.getValue());
}

④. Iterate explicitly with an Iterator.

Iterator<Map.Entry<String, Integer>> ite = map.entrySet().iterator();
while(ite.hasNext()){
	Map.Entry<String, Integer> mapEntry = ite.next();
	System.out.println(mapEntry.getKey()+": "+mapEntry.getValue());
}

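The third method generally performs best.
The first method is appropriate when only the keys or only the values are needed.
The second method is inefficient, because every key costs an extra lookup, and is not recommended.
The fourth method is also efficient, and its key advantage is that elements can be removed during iteration.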
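Removal during iteration, the advantage claimed for the fourth method, goes through Iterator.remove(). Calling map.remove() directly inside a loop over the same map is a structural modification behind the iterator's back and typically results in a ConcurrentModificationException. The sketch below shows the safe form; the class and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class IterateRemoveDemo {
    // Removes every entry whose value is below min, while iterating.
    // Assumes the map holds no null values.
    static void removeValuesBelow(Map<String, Integer> map, int min) {
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> entry = it.next();
            if (entry.getValue() < min) {
                it.remove(); // removes the entry last returned by next()
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 10);
        map.put("b", 20);
        map.put("c", 30);

        removeValuesBelow(map, 20);
        System.out.println(map.containsKey("a")); // false
        System.out.println(map.size());           // 2
    }
}
```

Since Java 8 the same effect can be had with map.entrySet().removeIf(e -> e.getValue() < 20), which uses the iterator internally.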

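1.4 HashMap Summary

①. In JDK 1.8, HashMap consists of an array, linked lists, and red-black trees. A list that grows past 8 nodes is converted to a red-black tree, and a tree that shrinks to 6 or fewer nodes is converted back to a list. Compared with earlier JDK versions, the added red-black tree greatly speeds up lookups when the map is large and collisions are frequent.
②. Both null keys and null values are allowed. Keys are unique: putting an existing key overwrites its value. Values may repeat.
③. It is not thread-safe.
④. It is unordered: iteration does not follow insertion order.
⑤. Storing and retrieving a value by key is efficient, O(1) on average.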

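2. A Deep Dive into HashSet

We mentioned the Set interface above: the Map methods keySet and entrySet both return a Set. This section covers HashSet, an important implementation of Set. As with HashMap, the name combines two words, Hash and Set. Set is the interface, which can be implemented in several ways, each with its own characteristics; HashSet's implementation is based on hashing. We first look at how HashSet is used, then at how it is implemented, and finally summarize its characteristics.

2.1 The Set Interface

Set is a container interface with no duplicate elements and no guaranteed order. It extends Collection without defining any new methods, although it tightens the specification of some of them. The full definition of the Set interface is: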

public interface Set<E> extends Collection<E> {
    // Query Operations
    int size();
    boolean isEmpty();
    boolean contains(Object o);
    Iterator<E> iterator();
    Object[] toArray();
    <T> T[] toArray(T[] a);

    // Modification Operations
    boolean add(E e);
    boolean remove(Object o);

    // Bulk Operations
    boolean containsAll(Collection<?> c);
    boolean addAll(Collection<? extends E> c);
    boolean retainAll(Collection<?> c);
    boolean removeAll(Collection<?> c);
    void clear();

    // Comparison and hashing
    boolean equals(Object o);
    int hashCode();
    @Override
    default Spliterator<E> spliterator() {
        return Spliterators.spliterator(this, Spliterator.DISTINCT);
    }
}

2.2 HashSet

HashSet's constructors parallel those of HashMap:

public HashSet() 
public HashSet(int initialCapacity) 
public HashSet(int initialCapacity, float loadFactor)
HashSet(int initialCapacity, float loadFactor, boolean dummy) 
public HashSet(Collection<? extends E> c)

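initialCapacity and loadFactor have the same meaning as in HashMap.

Like HashMap, HashSet requires its elements to override hashCode and equals consistently: two objects that are equal according to equals must return the same hashCode. Keep this in mind for user-defined classes.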
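For a user-defined class, this requirement means overriding both methods together. The following sketch uses a hypothetical Point class; with equals and hashCode overridden as shown, two Point objects with the same coordinates count as one element. Without the overrides, the inherited identity-based versions would treat them as distinct.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class PointSetDemo {
    // Hypothetical value class used only for this illustration.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            // Equal points must produce equal hash codes.
            return Objects.hash(x, y);
        }
    }

    // Returns the number of distinct points according to equals/hashCode.
    static int distinctCount(Point... points) {
        Set<Point> set = new HashSet<>();
        for (Point p : points) {
            set.add(p);
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount(new Point(1, 2), new Point(1, 2), new Point(3, 4))); // 2
    }
}
```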

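HashSet has many uses, for example:
(1) Deduplication: when the order of the deduplicated elements does not matter, HashSet makes it easy.
(2) Holding a set of special values, so that code can test membership to decide whether special handling applies.
(3) Set operations.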
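Deduplication and set operations, two of the uses just listed, might look like the following in practice. The helper names (distinct, intersection, union) are chosen for this sketch; the JDK methods doing the work are the HashSet copy constructor, retainAll, and addAll. Each helper copies its first argument so that the inputs are left unchanged.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashSetUsageDemo {
    // Deduplication: the copy constructor drops repeated elements.
    static Set<String> distinct(List<String> items) {
        return new HashSet<>(items);
    }

    // Intersection: keep only the elements that are also in b.
    static <T> Set<T> intersection(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Union: add every element of b.
    static <T> Set<T> union(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.addAll(b);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(distinct(Arrays.asList("a", "b", "a")).size()); // 2

        Set<Integer> a = new HashSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> b = new HashSet<>(Arrays.asList(2, 3, 4));
        System.out.println(intersection(a, b).size()); // 2 (the elements 2 and 3)
        System.out.println(union(a, b).size());        // 4 (the elements 1, 2, 3, 4)
    }
}
```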

2.3 How HashSet Is Implemented

2.3.1 Internal Structure of HashSet

HashSet is implemented on top of HashMap; it holds a HashMap in an instance field:

private transient HashMap<E,Object> map;

A Map has keys and values; a HashSet in effect has only keys, and every value is the same fixed object, defined as:

private static final Object PRESENT = new Object();

2.3.2 HashSet Constructors

HashSet has five constructors:

/**
* Constructs a new, empty set; the backing <tt>HashMap</tt> instance 
* has default initial capacity (16) and load factor (0.75).
*/
public HashSet() {
    map = new HashMap<>();
}
/**
* Constructs a new, empty set; the backing <tt>HashMap</tt> instance has the specified initial capacity and 
* default load factor (0.75).
*/
public HashSet(int initialCapacity) {
    map = new HashMap<>(initialCapacity);
}  
/**
* Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
* the specified initial capacity and the specified load factor.
*/
public HashSet(int initialCapacity, float loadFactor) {
    map = new HashMap<>(initialCapacity, loadFactor);
}
/**
* Constructs a new, empty linked hash set.  (This package private constructor is only used
* by LinkedHashSet.) The backing HashMap instance is a LinkedHashMap with the specified initial
* capacity and the specified load factor.
*/
HashSet(int initialCapacity, float loadFactor, boolean dummy) { // package-private, not part of the public API
    map = new LinkedHashMap<>(initialCapacity, loadFactor);
}
/**
 * Constructs a new set containing the elements in the specified
 * collection.  The <tt>HashMap</tt> is created with default load factor
 * (0.75) and an initial capacity sufficient to contain the elements in
 * the specified collection.    
*/  
public HashSet(Collection<? extends E> c) {
    map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
    addAll(c);
}

As shown, the HashSet constructors mostly delegate to the HashMap constructors.
The constructor that takes a Collection differs slightly:

public HashSet(Collection<? extends E> c) {
    map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
    addAll(c);
}

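c.size()/.75f computes the initialCapacity; 0.75f is the default loadFactor.
In addition, HashSet has one more parameterized constructor than HashMap. It is package-private and exists to support LinkedHashSet. It takes the initial capacity and load factor of the backing map; the dummy parameter only distinguishes its signature from the two-argument constructor and signals that a LinkedHashMap is created: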

HashSet(int initialCapacity, float loadFactor, boolean dummy) {
    map = new LinkedHashMap<>(initialCapacity, loadFactor);
}

This constructor calls the LinkedHashMap constructor:

public LinkedHashMap(int initialCapacity, float loadFactor) {
    super(initialCapacity, loadFactor); // calls the HashMap constructor
    accessOrder = false;
}
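Because that constructor backs the set with a LinkedHashMap whose accessOrder is false, a LinkedHashSet iterates in insertion order, whereas a plain HashSet gives no ordering guarantee. A brief sketch of the observable difference (the class name and the insertionOrder helper are made up for the example):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class LinkedHashSetDemo {
    // Deduplicates while keeping the order in which elements first appeared.
    static List<String> insertionOrder(String... items) {
        return new ArrayList<>(new LinkedHashSet<>(Arrays.asList(items)));
    }

    public static void main(String[] args) {
        // The repeated "a" is dropped; the remaining order is that of first insertion.
        System.out.println(insertionOrder("c", "a", "b", "a")); // [c, a, b]
    }
}
```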

2.3.3 Adding an Element

The add method is:

public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}

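It simply calls the map's put method with the element e as the key and the fixed PRESENT object as the value. A null return from put means the key was not present before, so the element was added. A HashMap keeps only one copy of each key, so adding an element that is already present leaves the HashSet unchanged.

2.3.4 Lookup

The code that checks whether an element is present is: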

public boolean contains(Object o) {
    return map.containsKey(o);
}

It checks whether the map contains the corresponding key.

2.3.5 Removing an Element

public boolean remove(Object o) {
    return map.remove(o)==PRESENT;
}

It calls the map's remove method; a return value of PRESENT means the key was present and has been removed.
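The boolean results of add and remove follow directly from the delegation shown above, and they are useful in their own right. As an illustration (firstDuplicate is a hypothetical helper), the false returned by add for an element that is already present can be used to detect the first repeated item in a list:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashSetOpsDemo {
    // Returns the first element that occurs a second time, or null if none does.
    static String firstDuplicate(List<String> items) {
        Set<String> seen = new HashSet<>();
        for (String item : items) {
            if (!seen.add(item)) { // add returns false when the element was already present
                return item;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        System.out.println(set.add("a"));      // true: newly added
        System.out.println(set.add("a"));      // false: already present
        System.out.println(set.contains("a")); // true
        System.out.println(set.remove("a"));   // true: was present, now removed
        System.out.println(set.remove("a"));   // false: nothing to remove

        System.out.println(firstDuplicate(Arrays.asList("a", "b", "a", "c"))); // a
    }
}
```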

2.3.6 Iterating over Elements

The iterator of HashSet is implemented as:

public Iterator<E> iterator() {
    return map.keySet().iterator();
}

It returns the iterator of the map's keySet.

In practice it is used as follows.
First, build a HashSet:

HashSet<String> set = new HashSet<>();
set.add("a");
set.add("b");
set.add("c");

①. Iterate with an Iterator

Iterator<String> ite = set.iterator();
while(ite.hasNext()){
	System.out.println(ite.next());
}

②. Iterate with an enhanced for loop

for(String key: set){
	System.out.println(key);
}

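2.4 HashSet Summary

①. It is implemented on top of HashMap.
②. It contains no duplicate elements.
③. Adding, removing, and membership tests are efficient, O(1) on average.
④. It is unordered.
⑤. It is not thread-safe.
⑥. It may contain null.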
