Java Basics: HashSet and HashMap Explained

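As we all know, HashSet and HashMap are two important classes in the Java collections framework. Why are they so important?

A Set holds a collection with no duplicate objects, which makes programming very convenient: we never have to worry about a set containing the same object twice. The drawback is that traversal is somewhat less convenient, because elements have no index and a HashSet does not guarantee any iteration order.

A Map stores key-value pairs, much like the fields and field values of an object.

But do most of us actually know how they work underneath? Today we will analyze Set and Map by reading their source code.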

Since we are looking at usage, let's start from construction (Set set = new HashSet()) and then call add(new Object()). Here is the relevant HashSet source:

    // HashSet maintains a HashMap internally?!
    private transient HashMap<E,Object> map;

    // Dummy value to associate with an Object in the backing Map
    private static final Object PRESENT = new Object();

    /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * default initial capacity (16) and load factor (0.75).
     */
    public HashSet() {
        map = new HashMap<>();
    }

    /**
     * Adds the specified element to this set if it is not already present.
     * More formally, adds the specified element <tt>e</tt> to this set if
     * this set contains no element <tt>e2</tt> such that
     * <tt>(e==null&nbsp;?&nbsp;e2==null&nbsp;:&nbsp;e.equals(e2))</tt>.
     * If this set already contains the element, the call leaves the set
     * unchanged and returns <tt>false</tt>.
     *
     * @param e element to be added to this set
     * @return <tt>true</tt> if this set did not already contain the specified
     * element
     */
    // add just calls put on the backing map (e as the key, the single shared PRESENT object as the value)
    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

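We can see that constructing a HashSet also constructs a HashMap, and add simply calls put on that map. Because the keys of a map are unique, HashSet relies on this to keep its elements unique: put returns null only when the key was not already present, so add returns true only for a new element. So the real question is how HashMap's put method guarantees that keys are unique.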
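
To make the delegation concrete, here is a minimal sketch of a set built the same way. MiniHashSet is a hypothetical class written only for this illustration. It is not part of the JDK, and it relies only on the documented contract of java.util.HashMap.put: put returns the previous value for the key, or null if there was none.

```java
import java.util.HashMap;

// Hypothetical class for illustration: like HashSet, it stores elements
// as HashMap keys and uses one shared dummy object as every value.
class MiniHashSet<E> {
    private static final Object PRESENT = new Object(); // shared dummy value
    private final HashMap<E, Object> map = new HashMap<>();

    // put returns the previous value for the key, or null if the key was absent.
    // PRESENT is never null, so "== null" means "newly added".
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    public int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MiniHashSet<String> set = new MiniHashSet<>();
        System.out.println(set.add("a")); // true: "a" was not present yet
        System.out.println(set.add("a")); // false: put returned the old value PRESENT
        System.out.println(set.size());   // 1
    }
}
```

A non-null dummy value matters here. If the value were null, put would return null for a duplicate too, and add could no longer tell the two cases apart.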

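Below are the fields involved when a HashMap is constructed. Apart from loadFactor, which the constructor assigns, every field keeps its default value: table is null, entrySet is null, and so on.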

    /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     */
    // the Node array (the buckets) maintained by the map
    transient Node<K,V>[] table;

    /**
     * Holds cached entrySet(). Note that AbstractMap fields are used
     * for keySet() and values().
     */
    // cached view returned by entrySet()
    transient Set<Map.Entry<K,V>> entrySet;

    /**
     * The number of key-value mappings contained in this map.
     */
    // number of key-value mappings in the map
    transient int size;

    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    // number of structural modifications made to this map
    transient int modCount;

    /**
     * The next size value at which to resize (capacity * load factor).
     *
     * @serial
     */
    // (The javadoc description is true upon serialization.
    // Additionally, if the table array has not been allocated, this
    // field holds the initial array capacity, or zero signifying
    // DEFAULT_INITIAL_CAPACITY.)
    int threshold;

    /**
     * The load factor for the hash table.
     *
     * @serial
     */
    final float loadFactor;   


    /**
     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    // the constructor only sets loadFactor; every other field keeps its default
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

After construction comes put. Starting from set.add(), let's focus on how the key is kept unique.

    /**
     * Adds the specified element to this set if it is not already present.
     * More formally, adds the specified element <tt>e</tt> to this set if
     * this set contains no element <tt>e2</tt> such that
     * <tt>(e==null&nbsp;?&nbsp;e2==null&nbsp;:&nbsp;e.equals(e2))</tt>.
     * If this set already contains the element, the call leaves the set
     * unchanged and returns <tt>false</tt>.
     *
     * @param e element to be added to this set
     * @return <tt>true</tt> if this set did not already contain the specified
     * element
     */
    // HashSet's add: clearly it just calls map.put
    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

    //===== Above is HashSet code; below are HashMap methods, pasted together =====

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    // the method executed when a key-value pair is inserted into the map
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    /**
     * Computes key.hashCode() and spreads (XORs) higher bits of hash
     * to lower.  Because the table uses power-of-two masking, sets of
     * hashes that vary only in bits above the current mask will
     * always collide. (Among known examples are sets of Float keys
     * holding consecutive whole numbers in small tables.)  So we
     * apply a transform that spreads the impact of higher bits
     * downward. There is a tradeoff between speed, utility, and
     * quality of bit-spreading. Because many common sets of hashes
     * are already reasonably distributed (so don't benefit from
     * spreading), and because we use trees to handle large sets of
     * collisions in bins, we just XOR some shifted bits in the
     * cheapest possible way to reduce systematic lossage, as well as
     * to incorporate impact of the highest bits that would otherwise
     * never be used in index calculations because of table bounds.
     */

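    // calls key.hashCode() and XORs the result with itself shifted right (unsigned) by 16 bits
    /**
     * I once wanted to know what this value actually is, but knowing it doesn't help much.
     * All we need to know is that the same key (and, by the hashCode contract, any equal key)
     * always produces the same int here; a null key always hashes to 0.
     */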
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    /**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    // this is the method we really need to study
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
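
The bit spreading done by hash() is easier to see with numbers. The sketch below copies the expression from hash() into a hypothetical HashSpreadDemo class. It compares bucket indices for a 16-slot table with and without spreading. The hashCode values are made up and differ only above bit 15, which is the situation the javadoc above describes.

```java
// Hypothetical demo class: reproduces the spreading step of HashMap.hash(Object).
class HashSpreadDemo {
    // Same expression as HashMap.hash: XOR the high 16 bits into the low 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table whose length n is a power of two.
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int[] hashCodes = {0x10000, 0x20000, 0x30000}; // differ only above bit 15
        for (int h : hashCodes) {
            System.out.printf("h=0x%05X raw index=%d spread index=%d%n",
                    h, index(h, n), index(spread(h), n));
        }
        // prints:
        // h=0x10000 raw index=0 spread index=1
        // h=0x20000 raw index=0 spread index=2
        // h=0x30000 raw index=0 spread index=3
    }
}
```

Without spreading, all three hash codes fall into slot 0, because the mask keeps only the low 4 bits. After the XOR, the high bits influence the index and the keys spread out.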

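Before analyzing putVal, I found a diagram of the map's structure online, which makes things clearer:

[Figure: HashMap structure, a table array whose slots each hold a singly linked list of Node objects]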

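As the diagram shows, the map's structure combines an array with singly linked lists. The array is our table, and its elements are Node objects. (Since JDK 8, a list that grows too long may be converted into a red-black tree of TreeNode objects; that is the TreeNode branch in putVal.)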

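The Node structure: each node stores hash, key, and value, plus the crucial next reference that links it to the following node. These next links are what form the singly linked list.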

    /**
     * Basic hash bin node, used for most entries.  (See below for
     * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
     */
    // this is the Node structure; next points to the following node, forming a singly linked list
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }
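
Node also implements the Map.Entry rules for hashCode and equals. One way to check this through the public API is to take an entry from a real HashMap and compare it with an AbstractMap.SimpleEntry that holds the same key and value. In the OpenJDK 8 source shown here, the entry-set iterator returns the Node objects themselves. EntryContractDemo is a hypothetical class name.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical demo class: checks that HashMap entries follow the
// hashCode/equals rules implemented by Node.
class EntryContractDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        Map.Entry<String, Integer> entry = map.entrySet().iterator().next();

        // Node.hashCode(): Objects.hashCode(key) ^ Objects.hashCode(value)
        int expected = Objects.hashCode("a") ^ Objects.hashCode(1);
        System.out.println(entry.hashCode() == expected); // true

        // Node.equals(): any Map.Entry with an equal key and an equal value matches
        Map.Entry<String, Integer> other = new AbstractMap.SimpleEntry<>("a", 1);
        System.out.println(entry.equals(other)); // true
        System.out.println(other.equals(entry)); // true
    }
}
```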

Now let's go through putVal statement by statement and see what it does:

First insertion: initializing the table array

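    // local variable declarations, nothing to see here
    Node<K,V>[] tab; Node<K,V> p; int n, i;

    // first check whether table is null; on the very first key-value insertion it is
    if ((tab = table) == null || (n = tab.length) == 0)

        // allocate table; the default length is 16, i.e. table = (Node<K,V>[])new Node[16]
        // if you're interested, read resize() for the detailed steps
        n = (tab = resize()).length;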

Check whether the corresponding slot in table is null; if it is, the new node becomes the first node of that slot's list

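Here n is 16. Whatever value the int hash has, hash & (16 - 1) always lies between 0 and 15. Why 15? Because we just allocated table with length 16 (indices 0-15), and the index must stay within that range. This works because the table length is always a power of two, so n - 1 is a mask of all 1 bits (15 = 0000 1111). If this is unclear, look back at the map structure diagram.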

Example: suppose the hash of the key we insert is 0000 1000. ANDing it with 15 (0000 1111) gives 0000 1000, i.e. index 8.

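Is tab[8] == null? On the first insertion it certainly is, so a new Node whose next is null is created and stored in tab[8]. Someone may ask: shouldn't filling start at index 0? No. The map does not fill slots in order starting from 0; the index is determined by the hash, so any index from 0 to 15 is possible.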
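
The index arithmetic can be checked directly. BucketIndexDemo below is a hypothetical helper. It assumes, as the table javadoc states, that the table length n is a power of two. That assumption is what makes (n - 1) & hash behave like a non-negative remainder (Math.floorMod). Integer keys are convenient for experiments because Integer.hashCode() returns the value itself, and for small values hash() leaves it unchanged (h >>> 16 is 0). So the Integer keys 8 and 24 would both land in slot 8 of a 16-slot table, which is one concrete way to get the "second insertion at index 8" scenario discussed next.

```java
// Hypothetical helper: the bucket-index computation from putVal, i = (n - 1) & hash.
class BucketIndexDemo {
    static int index(int hash, int n) {
        return (n - 1) & hash; // only valid when n is a power of two
    }

    public static void main(String[] args) {
        int n = 16;
        System.out.println(index(0b0000_1000, n)); // 8, the example from the text
        System.out.println(index(24, n));          // 8: 24 = 0001 1000, low 4 bits are 1000
        System.out.println(index(-1, n));          // 15: the result is never negative
        // For a power-of-two n, the mask matches Math.floorMod for every int.
        for (int h : new int[] {8, 24, -1, 123_456_789, Integer.MIN_VALUE}) {
            System.out.println(index(h, n) == Math.floorMod(h, n)); // true
        }
    }
}
```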

    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);

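After this if statement, our table holds a key-value pair: the slot at index 8 contains a singly linked list with exactly one node. (Since we are using HashSet as the example, the key is the element we added, and the value is just the single shared PRESENT object. The same applies to all the following examples, so I won't repeat it.)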

Second insertion of a key-value pair

Next, the else branch. Suppose the key-value pair inserted second also lands at index 8 of table:

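    else {
        Node<K,V> e; K k;

        // from the earlier if condition, p is the node at table index 8 (the node from the first insertion)
        // this condition compares the new key with the key of p (the first node of this slot's list):
        // the hashes must be equal AND the keys must be the same reference or equal()
        // if they match, p is assigned to e; otherwise we move on to the else branches
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;

        // this branch can be ignored here: it handles slots already converted to a red-black tree;
        // if interested, look into the difference between TreeNode and Node
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);

        // the case we care about: the first node's key does not match the new key
        else {

            // this loop walks toward the node whose next is null (the last node of the list at index 8),
            // stopping early if a node with a matching key is found
            for (int binCount = 0; ; ++binCount) {
                // reached the end of the list without a match: link the new node at the tail
                if ((e = p.next) == null) {

                    // point the last node's next at the newly created node,
                    // extending the singly linked list
                    p.next = newNode(hash, key, value, null);
                    // a list that has grown too long is handed to treeifyBin
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // otherwise check whether this node's key matches the new key
                // (equal hash AND same reference or equal())
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }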

To summarize the code above:

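First, we check whether the first node of this list has the same key as the new node. For that, the hashes must be equal, and the keys must either be the same reference or be equal according to equals(); equal hashes alone are not enough. If they match, that node is assigned to the variable e. Otherwise we walk the list: if a node with a matching key turns up, it is assigned to e, and if the loop reaches the end without a match, the new node is linked at the tail of the list.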

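In other words: if the list already holds a node with the same key, that existing node is taken out; if not, the new node is linked at the end of the list.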
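
The matching rule (equal hash first, then == or equals()) explains why a key class must keep equals and hashCode consistent. The sketch below uses two hypothetical key classes with deliberately contrived implementations. SameHash forces every instance into the same bucket. BrokenKey claims that all instances are equal but gives them different hash codes.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo: HashSet treats two keys as duplicates only when their
// hashes match AND equals() returns true, mirroring the condition in putVal.
class KeyMatchDemo {
    // Every instance has the same hashCode; equals compares the name.
    static final class SameHash {
        final String name;
        SameHash(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof SameHash && ((SameHash) o).name.equals(name);
        }
    }

    // Deliberately broken: equals always returns true, but hashCode differs,
    // violating the equals/hashCode contract.
    static final class BrokenKey {
        final int h;
        BrokenKey(int h) { this.h = h; }
        @Override public int hashCode() { return h; }
        @Override public boolean equals(Object o) { return o instanceof BrokenKey; }
    }

    public static void main(String[] args) {
        Set<SameHash> a = new HashSet<>();
        a.add(new SameHash("x"));
        a.add(new SameHash("y"));                         // same bucket, not equal: appended to the list
        System.out.println(a.add(new SameHash("x")));     // false: same hash and equal
        System.out.println(a.size());                     // 2

        Set<BrokenKey> b = new HashSet<>();
        b.add(new BrokenKey(1));
        System.out.println(b.add(new BrokenKey(2)));      // true: hashes differ, equals is never reached
        System.out.println(b.size());                     // 2
    }
}
```

BrokenKey ends up "duplicated" because putVal compares p.hash == hash before it ever calls equals(). That is why a class that overrides equals() should also override hashCode() consistently.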

Taking the node whose key matches (it may be null when no such node exists) and replacing the old value

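    // e is the node with the duplicate key found above, or null if there was none
    // if e is not null, the new value overwrites oldValue and oldValue is returned
    // (put passes onlyIfAbsent = false, so it always overwrites)
    if (e != null) { // existing mapping for key
        V oldValue = e.value;
        if (!onlyIfAbsent || oldValue == null)
            e.value = value;
        afterNodeAccess(e);
        return oldValue;
    }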

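In short, if e is not null, a key equal to the new one already exists, so the new value replaces the old one and the old value is returned. For HashSet, this is exactly what makes add return false for a duplicate element: the old value is the non-null PRESENT object, so map.put(e, PRESENT)==null evaluates to false.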
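
Putting the pieces together, here is a simplified sketch of the insertion path. MiniHashMap is a hypothetical teaching class, not the JDK implementation. It reuses the hash() expression and the key-matching condition from putVal, but it uses a fixed 16-slot table. It also leaves out resizing, treeification, onlyIfAbsent, and the afterNodeAccess/afterNodeInsertion hooks.

```java
// Hypothetical teaching class, not the JDK implementation: a fixed 16-slot
// table of singly linked lists that mirrors putVal's find-or-append logic.
class MiniHashMap<K, V> {
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Length must stay a power of two for the (n - 1) & hash mask to work.
    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    // Same expression as HashMap.hash(Object).
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Returns the previous value for key, or null if the key was absent.
    V put(K key, V value) {
        int hash = hash(key);
        int i = (table.length - 1) & hash;
        Node<K, V> p = table[i];
        if (p == null) { // empty slot: the new node becomes the head of the list
            table[i] = new Node<>(hash, key, value, null);
            return null;
        }
        while (true) {
            K k;
            // same matching rule as putVal: equal hash, then == or equals()
            if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k)))) {
                V oldValue = p.value; // existing key: overwrite and return the old value
                p.value = value;
                return oldValue;
            }
            if (p.next == null) { // reached the tail without a match: append
                p.next = new Node<>(hash, key, value, null);
                return null;
            }
            p = p.next;
        }
    }

    V get(Object key) {
        int hash = hash(key);
        for (Node<K, V> p = table[(table.length - 1) & hash]; p != null; p = p.next) {
            Object k = p.key;
            if (p.hash == hash && (k == key || (key != null && key.equals(k)))) {
                return p.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        MiniHashMap<Integer, String> map = new MiniHashMap<>();
        System.out.println(map.put(8, "a"));  // null: slot 8 was empty
        System.out.println(map.put(24, "b")); // null: also slot 8, appended to the list
        System.out.println(map.put(8, "c"));  // a: existing key, value replaced
        System.out.println(map.get(8) + " " + map.get(24)); // c b
    }
}
```

With Integer keys 8 and 24 both mapping to slot 8, the second put appends to the list. Putting 8 again returns the old value instead of adding a node.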

        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;

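The remaining lines are bookkeeping. modCount is incremented, and the table is resized if the new size exceeds threshold. Finally, since there was no duplicate key, null is returned, which is what makes HashSet.add return true.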
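
The threshold check is plain arithmetic. For a map built with the no-arg constructor, the first put allocates a 16-slot table with threshold 16 * 0.75 = 12, and each later resize doubles both values. The hypothetical ThresholdDemo below only prints that progression; it does not inspect a real HashMap.

```java
// Hypothetical demo class: capacity/threshold progression for a HashMap
// created with the default constructor (arithmetic only).
class ThresholdDemo {
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // threshold = capacity * loadFactor
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int capacity = DEFAULT_INITIAL_CAPACITY;
        for (int step = 0; step < 4; step++) {
            int thr = threshold(capacity, DEFAULT_LOAD_FACTOR);
            // putVal calls resize() once ++size > threshold, i.e. on the (thr + 1)-th distinct key
            System.out.printf("capacity=%d threshold=%d resize on insertion #%d%n",
                    capacity, thr, thr + 1);
            capacity <<= 1; // each resize doubles the capacity
        }
        // prints:
        // capacity=16 threshold=12 resize on insertion #13
        // capacity=32 threshold=24 resize on insertion #25
        // capacity=64 threshold=48 resize on insertion #49
        // capacity=128 threshold=96 resize on insertion #97
    }
}
```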
