Java Basics: HashSet and HashMap Explained

zhao_xinhu

Category: Java. Tags: java, set, map, source code analysis

Article link: https://blog.csdn.net/zhao_xinhu/article/details/82740652

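Two of the most important classes in the Java collections framework are HashSet and HashMap. Why do they matter so much? A Set stores a collection with no duplicate elements, which is very convenient: we never have to worry about the same object appearing twice. The downside is that traversal is a little less convenient, since a Set has no index-based access and we have to go through an iterator or a for-each loop. A Map stores key-value pairs, much like the fields of an object and their values. But how many of us actually know how these two work underneath? Today we will go through the source code of both.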

Since we want to understand how they are used, let's start from initialization (Set set = new HashSet()) followed by add(new Object()):

    //HashSet keeps a HashMap internally
    private transient HashMap<E,Object> map;

    // Dummy value to associate with an Object in the backing Map
    private static final Object PRESENT = new Object();

    /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * default initial capacity (16) and load factor (0.75).
     */
    public HashSet() {
        map = new HashMap<>();
    }

    /**
     * Adds the specified element to this set if it is not already present.
     * More formally, adds the specified element <tt>e</tt> to this set if
     * this set contains no element <tt>e2</tt> such that
     * <tt>(e==null&nbsp;?&nbsp;e2==null&nbsp;:&nbsp;e.equals(e2))</tt>.
     * If this set already contains the element, the call leaves the set
     * unchanged and returns <tt>false</tt>.
     *
     * @param e element to be added to this set
     * @return <tt>true</tt> if this set did not already contain the specified
     * element
     */
    //add simply calls put on the backing map (e is the key, PRESENT = new Object() is the value)
    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

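As we can see, creating the Set also creates the backing map, and adding an element calls put on that map. Because a map stores key-value pairs with unique keys, HashSet relies on exactly this property to keep its elements unique. So the real question is how HashMap's put method guarantees that keys are unique.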
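To make the point concrete before reading the HashMap source, here is a minimal sketch of the behavior described above. The class name HashSetAddDemo and the helper addTwice are made up for this illustration; only HashSet.add and size are real JDK calls.

```java
import java.util.HashSet;
import java.util.Set;

final class HashSetAddDemo {
    // Hypothetical helper: adds the same element twice and reports what add() returned each time.
    static boolean[] addTwice(Set<String> set, String element) {
        boolean first = set.add(element);   // no previous mapping, so map.put returned null
        boolean second = set.add(element);  // key already present, so map.put returned PRESENT
        return new boolean[] { first, second };
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        boolean[] results = addTwice(set, "a");
        // prints: true false size=1
        System.out.println(results[0] + " " + results[1] + " size=" + set.size());
    }
}
```

The second add leaves the set unchanged and returns false, which is exactly the map.put(e, PRESENT)==null check shown earlier.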

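Below are some of HashMap's fields together with its no-argument constructor. Apart from loadFactor being assigned, everything keeps its default value: table is null, entrySet is null, and so on.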

    /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     */
    //the Node array (the buckets) maintained by the map
    transient Node<K,V>[] table;

    /**
     * Holds cached entrySet(). Note that AbstractMap fields are used
     * for keySet() and values().
     */
    //cached Set view of the map's entries (key-value pairs)
    transient Set<Map.Entry<K,V>> entrySet;

    /**
     * The number of key-value mappings contained in this map.
     */
    //number of key-value mappings in the map
    transient int size;

    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    //number of structural modifications made to this map
    transient int modCount;

    /**
     * The next size value at which to resize (capacity * load factor).
     *
     * @serial
     */
    // (The javadoc description is true upon serialization.
    // Additionally, if the table array has not been allocated, this
    // field holds the initial array capacity, or zero signifying
    // DEFAULT_INITIAL_CAPACITY.)
    int threshold;

    /**
     * The load factor for the hash table.
     *
     * @serial
     */
    final float loadFactor;   


     /**
     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    //the constructor only sets loadFactor; everything else keeps its default value
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

After initialization comes the put method. We follow the call from set.add() and focus on how the key is kept unique.

    /**
     * Adds the specified element to this set if it is not already present.
     * More formally, adds the specified element <tt>e</tt> to this set if
     * this set contains no element <tt>e2</tt> such that
     * <tt>(e==null&nbsp;?&nbsp;e2==null&nbsp;:&nbsp;e.equals(e2))</tt>.
     * If this set already contains the element, the call leaves the set
     * unchanged and returns <tt>false</tt>.
     *
     * @param e element to be added to this set
     * @return <tt>true</tt> if this set did not already contain the specified
     * element
     */
    //add in HashSet: clearly it just calls map.put
    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

    //===== the code above is from HashSet; the methods below are from HashMap, pasted together =====

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    //the method executed when a key-value pair is inserted into the map
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    /**
     * Computes key.hashCode() and spreads (XORs) higher bits of hash
     * to lower.  Because the table uses power-of-two masking, sets of
     * hashes that vary only in bits above the current mask will
     * always collide. (Among known examples are sets of Float keys
     * holding consecutive whole numbers in small tables.)  So we
     * apply a transform that spreads the impact of higher bits
     * downward. There is a tradeoff between speed, utility, and
     * quality of bit-spreading. Because many common sets of hashes
     * are already reasonably distributed (so don't benefit from
     * spreading), and because we use trees to handle large sets of
     * collisions in bins, we just XOR some shifted bits in the
     * cheapest possible way to reduce systematic lossage, as well as
     * to incorporate impact of the highest bits that would otherwise
     * never be used in index calculations because of table bounds.
     */

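    //calls key.hashCode() and XORs the result with itself unsigned-right-shifted by 16 bits
    /**
     * I once really wanted to know the exact value, but knowing it turns out not to matter much.
     * All we need to know is that the same object (with an unchanged hashCode()) always yields the same int here.
     */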
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    /**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    //this is the key method we want to look at
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
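The hash method in the listing above can be tried in isolation. The sketch below copies the same one-line transform into a standalone class (HashSpreadDemo and spread are names invented here) and shows why the upper bits are folded in: without it, two hashes that differ only above bit 15 land in the same slot of a 16-element table.

```java
final class HashSpreadDemo {
    // Same transform as HashMap.hash: XOR the upper 16 bits into the lower 16 bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the int value itself, so these keys make the bits easy to follow.
        int a = 0x10000;
        int b = 0x20000;
        int mask = 16 - 1; // table length 16
        // prints: 0 0   (raw hash codes collide in slot 0)
        System.out.println((a & mask) + " " + (b & mask));
        // prints: 1 2   (after spreading they fall into different slots)
        System.out.println((spread(a) & mask) + " " + (spread(b) & mask));
    }
}
```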

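Before analyzing putVal, I found a diagram of the map's structure online, which makes things clearer:

[Figure: structure diagram of HashMap]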

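As the diagram shows, a HashMap is built from an array combined with singly linked lists (the source above also converts long lists into trees, see TreeNode). The array is our table, and its element type is Node.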

Node structure: it clearly holds a hash, a key, a value, and the crucial next reference pointing to the following node, which is what forms the singly linked list.

    /**
     * Basic hash bin node, used for most entries.  (See below for
     * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
     */
    //this is the structure of a Node; next points to the following node, forming a singly linked list
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

Now let's go through putVal statement by statement:

First insertion: initializing the table array

    //variable declarations, nothing to see here
    Node<K,V>[] tab; Node<K,V> p; int n, i;

    //first check whether table is null; on the first insertion it is null
    if ((tab = table) == null || (n = tab.length) == 0)

        //initialize table with the default length of 16: table = (Node<K,V>[])new Node[16]
        //readers who are interested can read through resize for the details
        n = (tab = resize()).length;

Check whether the corresponding slot of the table array is null; if it is, insert the node as the first entry of that slot's linked list

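Here n is 16. Whatever the int hash is, ANDing it with (16 - 1) always yields a value from 0 to 15. Why 15? Because the table we just initialized has length 16 (indices 0-15), so the index must never exceed 15. If this is unclear, look back at the structure diagram of the map.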

Example: suppose the hash of the key we insert is 0000 1000 (8). ANDing it with 15 (0000 1111) gives 0000 1000, that is, index 8.

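Is tab[8] == null? On the first insertion it certainly is. So we create a Node whose next is null and assign it to tab[8]. Some may ask: shouldn't we start from index 0? The map does not require filling from index 0; any slot from 0 to 15 may be used, and which one is decided by the hash.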

    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);

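After this if statement runs, the table holds its first key-value pair: slot 8 of tab contains a singly linked list with exactly one node. (We are using a set as the example, so the key is our element and the value is just the shared new Object(). The same holds for the examples below and will not be repeated.)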
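The index arithmetic from this step is easy to check by hand. A small sketch, with BucketIndexDemo and indexFor as hypothetical names that do not exist in the JDK:

```java
final class BucketIndexDemo {
    // Slot computation used by putVal; it relies on the table length n being a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        System.out.println(indexFor(0b0000_1000, 16)); // prints 8, the example from the text
        System.out.println(indexFor(0b0001_1000, 16)); // prints 8, a different hash in the same slot
        System.out.println(indexFor(-1, 16));          // prints 15, negative hashes stay in range too
    }
}
```

Because only the low four bits survive the mask, different hashes can share a slot, which is the situation the else branch of putVal has to handle.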

Second insertion of a key-value pair

Next comes the else branch. Suppose we insert a second key-value pair and it also lands in slot 8 of the table:

    else {
            Node<K,V> e; K k;

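            //from the if condition above, p is the node in slot 8 of the table (the Node we inserted first)
            //this condition compares the key being inserted with the key of p (the first node of this slot's list):
            //the hashes must be equal, and the keys must be the same reference or equal according to equals
            //if they match, p is assigned to e; otherwise we move on to the else branches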
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;

            //this branch can be skipped here; interested readers can study the difference between TreeNode and Node
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);

            //now the case where the first node's key does not match the new key
            else {

                //this loop walks the list until it finds an equal key or reaches the node whose next is null (the last node of the list in slot 8)
                for (int binCount = 0; ; ++binCount) {
                    //if the end of the list is reached, append the new node there
                    if ((e = p.next) == null) {

                        //p is the node whose next is null: point its next at the newly created node,
                        //extending the singly linked list
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    //otherwise check whether this node's key equals the new key (same hash, and same reference or equals)
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }

To summarize the code above:

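First we check whether the first node of the list has the same key as the new node, meaning the hashes are equal and the keys are either the same reference or equal according to equals. If so, that node is assigned to e. If not, we walk the list: if some node has an equal key, that node is assigned to e; if the loop reaches the end without a match, the new node is appended at the tail of the list.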

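In other words, if the list already contains an equal key, the existing node is picked out; if not, the new node is linked at the end of the list.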
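The summary above can be sketched as a stripped-down map. This is not the JDK implementation: TinyChainedMap is a made-up class with a fixed 16-slot table, no resize and no tree bins, and it uses Objects.equals for the key comparison instead of the exact expression in putVal. It only mirrors the find-or-append logic.

```java
import java.util.Objects;

final class TinyChainedMap<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;
        Node(int hash, K key, V value) { this.hash = hash; this.key = key; this.value = value; }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16]; // fixed size, never resized
    private int size;

    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Returns the previous value for key, or null if the key was not present.
    public V put(K key, V value) {
        int hash = hash(key);
        int i = (table.length - 1) & hash;
        Node<K, V> p = table[i];
        if (p == null) {                       // empty slot: the new node becomes the head
            table[i] = new Node<>(hash, key, value);
            size++;
            return null;
        }
        while (true) {
            if (p.hash == hash && Objects.equals(p.key, key)) {
                V old = p.value;               // equal key found: overwrite, do not add a node
                p.value = value;
                return old;
            }
            if (p.next == null) {              // reached the tail without a match: append
                p.next = new Node<>(hash, key, value);
                size++;
                return null;
            }
            p = p.next;
        }
    }

    public int size() { return size; }

    public static void main(String[] args) {
        TinyChainedMap<String, Integer> map = new TinyChainedMap<>();
        // "Aa" and "BB" have the same hashCode (2112), so they share a slot and form a chain.
        System.out.println(map.put("Aa", 1)); // prints null
        System.out.println(map.put("BB", 2)); // prints null
        System.out.println(map.put("Aa", 3)); // prints 1
        System.out.println(map.size());       // prints 2
    }
}
```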

Take the node whose key equals the new key (this may be null if no such node exists) and replace its old value

    //e is the node with the duplicate key found above, or null if there is none
    //if e is not null, the new value overwrites the old value and the old value is returned
    if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }

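The code above means: if e is not null, a mapping with an equal key already exists, so the new value replaces the old one and the old value is returned. For HashSet that returned value is PRESENT, which is not null, so add returns false.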
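The return value described here is observable from outside. A short sketch against the real java.util.HashMap follows; PutReturnDemo and putTwice are names invented for the example, and the remark about putIfAbsent reflects my reading of the same JDK 8 source, where it calls putVal with onlyIfAbsent set to true.

```java
import java.util.HashMap;
import java.util.Map;

final class PutReturnDemo {
    // Hypothetical helper: returns what put() hands back for two inserts of the same key.
    static Integer[] putTwice(Map<String, Integer> map, String key) {
        Integer first = map.put(key, 1);   // null: there was no previous mapping
        Integer second = map.put(key, 2);  // 1: the old value that was just replaced
        return new Integer[] { first, second };
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        Integer[] r = putTwice(map, "k");
        System.out.println(r[0] + " " + r[1] + " " + map.get("k")); // prints: null 1 2
        // With onlyIfAbsent = true the existing non-null value is kept and returned.
        System.out.println(map.putIfAbsent("k", 3) + " " + map.get("k")); // prints: 2 2
    }
}
```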

        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;

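The remaining lines matter less for our question: modCount is incremented, the table is resized once size exceeds threshold, and null is returned because no duplicate key was found.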
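For the size > threshold check, the field comment quoted earlier gives the rule: threshold is capacity * load factor. A tiny arithmetic sketch with the default values (ThresholdDemo and threshold are hypothetical names, not JDK API):

```java
final class ThresholdDemo {
    // threshold = capacity * load factor, truncated to an int.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // prints 12
        System.out.println(threshold(32, 0.75f)); // prints 24
    }
}
```

With the default capacity of 16 and load factor of 0.75, ++size > threshold first becomes true when the 13th distinct key is inserted, and that is when resize() is called.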
