Java 8 - WeakHashMap Source Code

纽西兰牛小扒

Article link: https://blog.csdn.net/u013124587/article/details/53048848

I. Overview

The WeakHashMap source contains a long introductory comment. Rather than paste it here, I will summarize it:

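WeakHashMap is a Map implementation built on weak keys that supports both null values and the null key. Once a key is no longer in ordinary use, its entry is automatically removed from the WeakHashMap. After an entry has been removed, it can no longer be looked up in the container; this should not come as a surprise, because it is exactly what the class is designed to do. The container's behavior depends on the garbage collector, and since we do not know when a GC will run, we cannot predict when entries whose keys are no longer in use will be cleared.
One more point to note: the value objects in a WeakHashMap are held by ordinary strong references, so make sure that a value does not strongly refer to its own key, directly or indirectly, because that would prevent the key from being removed automatically.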
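
The support for the null key and null values mentioned above can be checked with a small sketch. The class name NullKeyDemo and the method describe are made up for this illustration; only java.util.WeakHashMap itself comes from the JDK. The non-null key is a string literal, which stays strongly reachable, so the result does not depend on GC timing.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical demo class, not part of the JDK.
final class NullKeyDemo {

    // Stores a null key and a null value, then reads both back.
    static String describe() {
        Map<String, String> map = new WeakHashMap<>();
        map.put(null, "value for null key"); // the null key is accepted
        String key = "k";                    // literal: stays strongly reachable
        map.put(key, null);                  // a null value is accepted too
        return map.get(null) + "|" + map.containsKey(key) + "|" + map.get(key);
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Because get returns null both for a missing key and for a key mapped to null, containsKey is the way to tell the two cases apart.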

Before going further, it helps to review the kinds of references in Java:

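Strong reference: an object that is strongly reachable is never reclaimed by the JVM. Even when memory runs out, the JVM throws an OutOfMemoryError rather than touch it. Strong references are the most common kind; the references we normally use are all strong, for example String s = new String("Hello").
Soft reference (SoftReference): as long as memory is sufficient, the garbage collector leaves a softly reachable object alone; once memory runs short, the object is reclaimed.
Weak reference (WeakReference): the weakest protection of the three. An object that is reachable only through weak references is reclaimed the next time the garbage collector finds it, whether or not memory is sufficient.
Phantom reference (PhantomReference): unlike the other three, it is only meaningful when used together with a ReferenceQueue.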
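
As a rough illustration of the four kinds, the sketch below creates one reference of each type to the same object. The class name ReferenceKindsDemo is hypothetical; the reference classes are the real ones from java.lang.ref. It only checks behavior that does not depend on when the GC runs: while a strong reference exists, soft and weak references still return the object, and PhantomReference.get() always returns null.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

// Hypothetical demo class, not part of the JDK.
final class ReferenceKindsDemo {

    // Returns { soft sees target, weak sees target, phantom get() is null }.
    static boolean[] check() {
        Object target = new Object();                          // strong reference
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        SoftReference<Object> soft = new SoftReference<>(target);
        WeakReference<Object> weak = new WeakReference<>(target);
        // A phantom reference can only be created with a queue.
        PhantomReference<Object> phantom = new PhantomReference<>(target, queue);

        boolean phantomIsNull = phantom.get() == null;          // always null by design
        boolean softSees = soft.get() == target;                // target is still strongly held
        boolean weakSees = weak.get() == target;
        return new boolean[] { softSees, weakSees, phantomIsNull };
    }

    public static void main(String[] args) {
        boolean[] r = check();
        System.out.println("soft=" + r[0] + " weak=" + r[1] + " phantomNull=" + r[2]);
    }
}
```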

For more detail, see Section V first.

II. Fields

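// Default capacity of the hash table
private static final int DEFAULT_INITIAL_CAPACITY = 16;

// Maximum capacity of the hash table
private static final int MAXIMUM_CAPACITY = 1 << 30;

// Default load factor
private static final float DEFAULT_LOAD_FACTOR = 0.75f;

// The hash table itself
Entry<K,V>[] table;

// Number of key-value pairs in the hash table
private int size;

// Resize threshold (computed from the current capacity and the load factor)
private int threshold;

// Load factor
private final float loadFactor;

// Reference queue, used to clear entries whose weak keys have been reclaimed
private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

// Sentinel object that stands in for the null key
private static final Object NULL_KEY = new Object();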

Besides the fields above, the definition of the Entry class is also essential:

private static class Entry<K,V> extends WeakReference<Object> implements Map.Entry<K,V> {
    V value;
    final int hash;
    Entry<K,V> next;

    Entry(Object key, V value,
          ReferenceQueue<Object> queue,
          int hash, Entry<K,V> next) {
        super(key, queue);
        this.value = value;
        this.hash  = hash;
        this.next  = next;
    }

    @SuppressWarnings("unchecked")
    public K getKey() {
        return (K) WeakHashMap.unmaskNull(get());
    }

    ...
}

Entry is a static nested class of WeakHashMap, and it extends WeakReference. Note that Entry has no key field: its constructor calls super(key, queue), handing the key to the superclass, which is what turns the key into a weak key.

So how does an Entry get its key back? Entry has a getKey method that calls get, which is defined in the Reference class (the superclass of WeakReference) and returns the key object that was passed to super, or null once that key has been cleared.
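
The idea of keeping the key only in the WeakReference superclass can be sketched in isolation. WeakEntryDemo and WeakEntry below are simplified, hypothetical classes written for this article, not the JDK's Entry: they drop the hash and next fields and the null-key masking.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical demo class, not part of the JDK.
final class WeakEntryDemo {

    // Simplified entry: the key is stored only as the referent of the
    // WeakReference superclass, while the value is an ordinary strong field.
    static final class WeakEntry<K, V> extends WeakReference<K> {
        V value;

        WeakEntry(K key, V value, ReferenceQueue<? super K> queue) {
            super(key, queue); // the key becomes a weak referent
            this.value = value;
        }

        K getKey() {
            return get(); // null once the key has been cleared
        }
    }

    // While the caller still holds the key strongly, getKey() returns it.
    static boolean keyVisibleWhileStronglyHeld() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        String key = new String("id-1");
        WeakEntry<String, Integer> entry = new WeakEntry<>(key, 42, queue);
        return entry.value == 42 && entry.getKey() == key;
    }

    public static void main(String[] args) {
        System.out.println(keyVisibleWhileStronglyHeld());
    }
}
```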

III. Methods

Before getting to the main topic, let's look at a few of the simpler methods inside the container:

// Replace a null key with the NULL_KEY object
private static Object maskNull(Object key) {
    return (key == null) ? NULL_KEY : key;
}

// The inverse of the method above
static Object unmaskNull(Object key) {
    return (key == NULL_KEY) ? null : key;
}

// How the hash value is computed
final int hash(Object k) {
    int h = k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

// Return the number of key-value pairs in the hash table
public int size() {
    if (size == 0)
        return 0;
    expungeStaleEntries();
    return size;
}
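
To see what the bit mixing in hash buys us, the sketch below copies the two shift-and-XOR lines into a standalone method and pairs it with the usual power-of-two bucket computation h & (length - 1), which is what indexFor does. The class name HashSpreadDemo and the method name spread are made up for this illustration. Two hash codes that differ only in high bits would both land in bucket 0 of a 16-slot table without mixing; after mixing they land in different buckets.

```java
// Hypothetical demo class, not part of the JDK.
final class HashSpreadDemo {

    // Same mixing steps as in the hash method shown above.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Bucket index for a table whose length is a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000;
        int b = 0x20000;
        System.out.println("raw:    " + indexFor(a, 16) + " " + indexFor(b, 16));
        System.out.println("spread: " + indexFor(spread(a), 16) + " " + indexFor(spread(b), 16));
    }
}
```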

1. The get method

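WeakHashMap resolves collisions by separate chaining and, unlike HashMap, does not use red-black trees, so its get method is as simple as that of an ordinary hash table. The only thing to note is the get in e.get(): it is defined in the Reference class, as described in Section II.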

public V get(Object key) {
    Object k = maskNull(key);
    // Compute the hash value
    int h = hash(k);
    Entry<K,V>[] tab = getTable();
    // Locate the bucket for the key
    int index = indexFor(h, tab.length);
    Entry<K,V> e = tab[index];
    // Walk the nodes in the bucket
    while (e != null) {
        // e.get() is the get method defined in the Reference class
        if (e.hash == h && eq(k, e.get()))
            return e.value;
        e = e.next;
    }
    return null;
}

2. The put method

The put method is also much like that of an ordinary hash table, so I will not explain it in detail. The source is as follows:

public V put(K key, V value) {
    Object k = maskNull(key);
    // Compute the hash value
    int h = hash(k);
    Entry<K,V>[] tab = getTable();
    // Work out which bucket the key maps to
    int i = indexFor(h, tab.length);

    // Walk the bucket looking for an entry with an equal key
    for (Entry<K,V> e = tab[i]; e != null; e = e.next) {
        if (h == e.hash && eq(k, e.get())) {
            V oldValue = e.value;
            if (value != oldValue)
                e.value = value;
            return oldValue;
        }
    }

    modCount++;
    Entry<K,V> e = tab[i];
    // Insert the new entry at the head of the bucket
    tab[i] = new Entry<>(k, value, queue, h, e);
    // Increment the entry count and check whether a resize is needed
    if (++size >= threshold)
        resize(tab.length * 2);
    return null;
}

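If you have read my earlier articles on HashMap, the resize method is easy to understand. From the put method we can see that newCapacity is twice the current table length, so every resize doubles the table. The transfer method moves the entries from the current table into the new one, i.e. it performs the rehash, and it drops entries whose keys have already been cleared along the way. If that leaves size below threshold / 2, resize expunges the stale entries and copies everything back into the old table instead of growing. The source of the resize method is as follows: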

void resize(int newCapacity) {
    Entry<K,V>[] oldTable = getTable();
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry<K,V>[] newTable = newTable(newCapacity);
    transfer(oldTable, newTable);
    table = newTable;

    if (size >= threshold / 2) {
        threshold = (int)(newCapacity * loadFactor);
    } else {
        expungeStaleEntries();
        transfer(newTable, oldTable);
        table = oldTable;
    }
}
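
The doubling rule can be traced with plain arithmetic. The sketch below is a simplified model, not the JDK code: ThresholdDemo and its methods are hypothetical names, and it assumes the default capacity of 16 and load factor of 0.75, no stale entries (so the shrink-back branch of resize never runs), and no capacity limit.

```java
// Hypothetical demo class, not part of the JDK.
final class ThresholdDemo {
    static final int INITIAL_CAPACITY = 16;
    static final float LOAD_FACTOR = 0.75f;

    // threshold = (int)(capacity * loadFactor), as in resize above.
    static int thresholdFor(int capacity) {
        return (int) (capacity * LOAD_FACTOR);
    }

    // Capacity after inserting the given number of distinct keys,
    // applying the same "++size >= threshold" check that put uses.
    static int capacityAfter(int inserts) {
        int capacity = INITIAL_CAPACITY;
        int threshold = thresholdFor(capacity);
        int size = 0;
        for (int i = 0; i < inserts; i++) {
            if (++size >= threshold) {
                capacity *= 2;                      // resize(tab.length * 2)
                threshold = thresholdFor(capacity);
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        for (int n : new int[] { 11, 12, 24 }) {
            System.out.println(n + " inserts -> capacity " + capacityAfter(n));
        }
    }
}
```

With these assumptions the table grows from 16 to 32 on the 12th insert and from 32 to 64 on the 24th.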

3. The remove method

The remove method is also much like that of an ordinary hash table, so I will not explain it in detail. The source is as follows:

public V remove(Object key) {
    Object k = maskNull(key);
    // Compute the hash value
    int h = hash(k);
    Entry<K,V>[] tab = getTable();
    // Locate the bucket for the key
    int i = indexFor(h, tab.length);
    // Get the head of that bucket
    Entry<K,V> prev = tab[i];
    Entry<K,V> e = prev;

    // Walk the entries in the bucket to find the matching key
    while (e != null) {
        Entry<K,V> next = e.next;
        if (h == e.hash && eq(k, e.get())) {
            modCount++;
            size--;
            if (prev == e)
                tab[i] = next;
            else
                prev.next = next;
            return e.value;
        }
        prev = e;
        e = next;
    }

    return null;
}

4. The expungeStaleEntries method

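Having gone through the most commonly used methods of WeakHashMap, we have not found anything special about them. So how are weak keys cleaned up? The next method is the most important one in WeakHashMap and the heart of the class.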

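Remember the ReferenceQueue defined earlier? A ReferenceQueue acts as a bridge for passing messages between the GC and the code that manages Reference objects: it lets us react when the reachability of a monitored referent changes, and this is exactly how WeakHashMap works. Once the garbage collector has reclaimed a key, the Entry object for that key is automatically added to the queue. When and how it is added is not WeakHashMap's concern; that part is transparent to WeakHashMap. The job of expungeStaleEntries is to take Entry objects from this queue and remove them from the hash table. The source is as follows: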

private void expungeStaleEntries() {
    // Drain the elements from the queue
    for (Object x; (x = queue.poll()) != null; ) {
        synchronized (queue) {
            // This cast shows that the queue holds Entry objects
            @SuppressWarnings("unchecked")
                Entry<K,V> e = (Entry<K,V>) x;
            // Compute the bucket that entry e belongs to
            int i = indexFor(e.hash, table.length);

            Entry<K,V> prev = table[i];
            Entry<K,V> p = prev;
            // Walk the bucket, then find and unlink the node that is e
            while (p != null) {
                Entry<K,V> next = p.next;
                // The key of the entry has already been reclaimed, so we cannot compare via e.get()
                if (p == e) {
                    if (prev == e)
                        table[i] = next;
                    else
                        prev.next = next;
                    // Must not null out e.next;
                    // stale entries may be in use by a HashIterator
                    e.value = null; // Help GC
                    size--;
                    break;
                }
                prev = p;
                p = next;
            }
        }
    }
}

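Most WeakHashMap methods, including get, put and remove, call expungeStaleEntries directly or indirectly (get, put and remove do so through getTable()). So every operation on the hash table first synchronizes with the reference queue, which ensures that an entry whose key has been reclaimed by the GC is removed from the table the next time the map is used.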
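
The drain-the-queue pattern is not specific to WeakHashMap. Below is a minimal sketch of the same idea on a much simpler structure: a set of weak references that is cleaned up lazily on each access. ExpungeDemo and its methods are hypothetical names. To keep the result independent of GC timing, the sketch tracks string literals (which stay strongly reachable) and calls Reference.enqueue() by hand to stand in for what the GC would normally do.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not part of the JDK.
final class ExpungeDemo {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final Set<WeakReference<Object>> live = new HashSet<>();

    // Registers the object with the queue and remembers its reference.
    WeakReference<Object> track(Object o) {
        WeakReference<Object> ref = new WeakReference<>(o, queue);
        live.add(ref);
        return ref;
    }

    // Like WeakHashMap.size(): clean up first, then answer.
    int size() {
        expunge();
        return live.size();
    }

    // Same loop shape as expungeStaleEntries: poll until the queue is empty.
    private void expunge() {
        for (Object x; (x = queue.poll()) != null; ) {
            // WeakReference does not override equals/hashCode,
            // so this removal is identity-based.
            live.remove(x);
        }
    }

    // Returns { size before enqueue, size after enqueue }.
    static int[] run() {
        ExpungeDemo demo = new ExpungeDemo();
        WeakReference<Object> first = demo.track("alpha");
        demo.track("beta");
        int before = demo.size();
        first.enqueue(); // stands in for the GC enqueuing a cleared reference
        int after = demo.size();
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println(r[0] + " -> " + r[1]);
    }
}
```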

IV. Summary

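An Entry in WeakHashMap extends WeakReference and wraps the key in a weak reference. The existence of a weak reference does not prevent its referent from being garbage collected. Put another way, a weak reference does not count toward strong reachability: an object that is otherwise not strongly reachable stays that way even if weak references point to it. So when a GC happens, an object that is held only by weak references is still reclaimed as garbage. Consider the following code: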

public static void main(String[] args) {
    Map<String, String> map = new WeakHashMap<>();
    String hello = new String("hello");
    map.put(hello, "world");
    hello = null; // help GC
    System.gc();
    for(Map.Entry<String, String> entry : map.entrySet()) {
        System.out.println(entry.getKey() + "," + entry.getValue());
    }
}

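With a HashMap, the code above prints "hello,world". With a WeakHashMap it usually prints nothing. After hello = null; executes, the only remaining reference to the "hello" string object is the weak reference held by the WeakHashMap. Once a GC has run, that key is cleared and its entry is dropped, so the map is empty and nothing is printed. Note that System.gc() is only a request, so the JVM is not strictly required to have collected the key by the time the loop runs.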

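So, for WeakHashMap, when a key in the container is no longer strongly referenced from outside, the key is automatically removed from the container after a GC. The Entry of a key reclaimed by the GC is added to the WeakHashMap's ReferenceQueue. The next time we operate on the WeakHashMap, it first synchronizes table with queue: table holds all the entries, queue holds the Entry objects whose keys have been reclaimed by the GC, and synchronizing them means deleting from table the entries whose keys have been reclaimed.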
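
The two halves of this summary can be sketched separately. WeakKeyLifecycleDemo is a hypothetical class name. The first method relies only on guaranteed behavior: a key that is still strongly reachable is never cleared, even after System.gc(). The second method is best-effort: it drops the only strong reference and polls for a while, but since System.gc() is only a hint, no particular result is promised.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical demo class, not part of the JDK.
final class WeakKeyLifecycleDemo {

    // A strongly held key survives GC requests, so its entry stays in the map.
    static boolean retainedWhileStronglyHeld() {
        Map<Object, String> map = new WeakHashMap<>();
        Object key = new Object();
        map.put(key, "payload");
        System.gc();
        // key is used last, so it stays reachable for the whole check.
        return map.size() == 1 && map.containsKey(key);
    }

    // Best effort only: typically true on common JVMs, but not guaranteed.
    static boolean removedAfterRelease() throws InterruptedException {
        Map<Object, String> map = new WeakHashMap<>();
        map.put(new Object(), "payload"); // no strong reference to the key is kept
        for (int i = 0; i < 50 && !map.isEmpty(); i++) {
            System.gc();
            Thread.sleep(20);
        }
        return map.isEmpty();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("retained while held: " + retainedWhileStronglyHeld());
        System.out.println("removed after release: " + removedAfterRelease());
    }
}
```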

V. Supplement

To understand WeakHashMap in depth, you first need to understand ReferenceQueue and WeakReference.

1. ReferenceQueue

Reference queues, to which registered reference objects are appended by the garbage collector after the appropriate reachability changes are detected.

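Let's start with the ReferenceQueue class. The passage above is its description in the source: after the appropriate reachability changes are detected, the garbage collector appends registered reference objects to the queue.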

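Looking closer, the implementation is simple. It provides the basic queue operations, but its element type is fixed to Reference, and the queue is a linked list that uses the Reference objects themselves as nodes. Its operations also interact with sun.misc.VM. The enqueue method is not public; according to its comment, it is called only by the Reference class.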

boolean enqueue(Reference<? extends T> r); /* Called only by Reference class */
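
From user code, the queue is only visible through its public methods. The sketch below exercises poll together with the public Reference.enqueue(), which delegates to the package-private method shown above. QueuePollDemo is a hypothetical class name, and the referent is a string literal so that nothing here depends on GC timing.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical demo class, not part of the JDK.
final class QueuePollDemo {

    // Returns { empty poll is null, first enqueue succeeded,
    //           polled object is the same reference, second enqueue failed }.
    static boolean[] run() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        boolean emptyPollIsNull = queue.poll() == null; // poll never blocks

        WeakReference<Object> ref = new WeakReference<>("payload", queue);
        boolean firstEnqueue = ref.enqueue();           // registered, so this succeeds
        Reference<?> polled = queue.poll();
        boolean sameObject = polled == ref;             // the queue hands back the Reference itself
        boolean secondEnqueueFails = !ref.enqueue();    // a reference is enqueued at most once

        return new boolean[] { emptyPollIsNull, firstEnqueue, sameObject, secondEnqueueFails };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]);
    }
}
```

The point to take away is that the queue delivers the Reference object, not the referent, which is why WeakHashMap can cast what it polls back to Entry.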

2. WeakReference

Next, the WeakReference class. Its purpose was covered in Section I, so let's go straight to the source, which is only a few lines long:

public class WeakReference<T> extends Reference<T> {

    public WeakReference(T referent) {
        super(referent);
    }

    public WeakReference(T referent, ReferenceQueue<? super T> q) {
        super(referent, q);
    }

}

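As you can see, WeakReference is very simple: it extends Reference, both constructors just call the superclass constructor, and it defines no other methods. So we have to look to its superclass for the real story.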

3. Reference

Abstract base class for reference objects. This class defines the operations common to all reference objects. Because reference objects are implemented in close cooperation with the garbage collector, this class may not be subclassed directly.

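The passage above is a comment from the Reference class. It points out that reference objects are implemented in close cooperation with the garbage collector. The fields of Reference are as follows: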

// The referent, which is treated specially by the GC
private T referent;         /* Treated specially by GC */

// The reference queue. It is not created with the object: it is either passed in through the constructor or set to ReferenceQueue.NULL
volatile ReferenceQueue<? super T> queue;

/* When active:   NULL
 *     pending:   this
 *    Enqueued:   next reference in queue (or this if last)
 *    Inactive:   this
 */
Reference next;

/* When active:   next element in a discovered reference list maintained by GC (or this if last)
 *     pending:   next element in the pending list (or null if last)
 *   otherwise:   NULL
 */
transient private Reference<T> discovered;  /* used by VM */

/* List of References waiting to be enqueued.  The collector adds
 * References to this list, while the Reference-handler thread removes
 * them.  This list is protected by the above lock object. The
 * list uses the discovered field to link its elements.
 */
private static Reference<Object> pending = null;

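In the source we also find that Reference's enqueue method simply calls the method on the ReferenceQueue. Its comment explains that this method is invoked only by Java code; when the garbage collector enqueues reference objects it does not go through this method but enqueues them directly. "Directly" here means calling ReferenceQueue's enqueue method directly, which matches the earlier remark that this enqueue method is called only by the Reference class.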

/**
 * This method is invoked only by Java code; when the garbage collector
 * enqueues references it does so directly, without invoking this method.
 */
public boolean enqueue() {
    return this.queue.enqueue(this);
}

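Next, Reference also defines a ReferenceHandler, which extends Thread and is started from the static initializer block of Reference. In other words, as soon as the Reference class is initialized, it starts the ReferenceHandler thread, whose priority is set to the maximum. From the code below we can see roughly that the thread's job is to move the objects on the pending list into their ReferenceQueue.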

private static class ReferenceHandler extends Thread {

    ReferenceHandler(ThreadGroup g, String name) {
        super(g, name);
    }

    public void run() {
        for (;;) {
            Reference<Object> r;
            synchronized (lock) {
                if (pending != null) {
                    r = pending;
                    pending = r.discovered;
                    r.discovered = null;
                } else {
                    // The waiting on the lock may cause an OOME because it may try to allocate
                    // exception objects, so also catch OOME here to avoid silent exit of the
                    // reference handler thread.
                    //
                    // Explicitly define the order of the two exceptions we catch here
                    // when waiting for the lock.
                    //
                    // We do not want to try to potentially load the InterruptedException class
                    // (which would be done if this was its first use, and InterruptedException
                    // were checked first) in this situation.
                    //
                    // This may lead to the VM not ever trying to load the InterruptedException
                    // class again.
                    try {
                        try {
                            lock.wait();
                        } catch (OutOfMemoryError x) { }
                    } catch (InterruptedException x) { }
                    continue;
                }
            }

            // Fast path for cleaners
            if (r instanceof Cleaner) {
                ((Cleaner)r).clean();
                continue;
            }

            ReferenceQueue<Object> q = r.queue;
            if (q != ReferenceQueue.NULL) q.enqueue(r);
        }
    }
}

// Static initializer block
static {
    ThreadGroup tg = Thread.currentThread().getThreadGroup();
    for (ThreadGroup tgn = tg;
         tgn != null;
         tg = tgn, tgn = tg.getParent());
    Thread handler = new ReferenceHandler(tg, "Reference Handler");
    /* If there were a special system-only priority greater than
     * MAX_PRIORITY, it would be used here
     */
    handler.setPriority(Thread.MAX_PRIORITY);
    handler.setDaemon(true);
    handler.start();
}

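At this point, combining the description of the pending field with the reference queue, we can roughly infer the following: once the Reference class is loaded, it starts a thread that collects the references whose referents were reclaimed by the GC and puts them onto their reference queues. Those references are first placed on the pending list, and the thread then moves them to the reference queue, so pending is assigned by the JVM. When we supply a reference queue to a Reference, we can use that queue to receive feedback from the GC. The reference queue acts as a channel for passing messages between the GC and Reference objects, and WeakHashMap is built on exactly this queue.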

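The removal of an entry from a WeakHashMap can therefore be summarized as follows: once a weak key in the WeakHashMap has been reclaimed by the GC, the JVM adds the corresponding Entry object to the pending list of the Reference class. The ReferenceHandler thread then consumes the pending list and adds the object to the WeakHashMap's reference queue. When a WeakHashMap method is later called and triggers expungeStaleEntries, the reference queue is synchronized with the hash table, and the entries whose keys have been reclaimed are removed.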

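Why is this synchronization needed? In an Entry only the key is weakly referenced; the other fields are strong references. The GC reclaims the key but not the rest, so the references to the other fields must be cut manually before the garbage collector can reclaim them; otherwise memory would leak. Also, the reference queue stores Entry objects whose keys have already been reclaimed. Calling get on such an entry returns only null, so expungeStaleEntries can only use == to check whether two entries are the same; it cannot compare by key with equals().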
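
The need for identity comparison can be reproduced on a tiny hand-built chain. In the sketch below, IdentityUnlinkDemo, Node, unlink and length are all hypothetical names; the unlink loop mirrors the prev/next bookkeeping used in expungeStaleEntries, and clear() stands in for the GC clearing the key.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo class, not part of the JDK.
final class IdentityUnlinkDemo {

    // A chain node whose key lives only in the WeakReference superclass.
    static final class Node extends WeakReference<Object> {
        Node next;

        Node(Object key, Node next) {
            super(key);
            this.next = next;
        }
    }

    // Unlinks `stale` from the chain using ==, and returns the new head.
    static Node unlink(Node head, Node stale) {
        Node prev = head;
        Node p = head;
        while (p != null) {
            Node next = p.next;
            if (p == stale) {          // the key is gone, so only identity works
                if (prev == stale)
                    head = next;       // stale node was the head
                else
                    prev.next = next;  // bypass the stale node
                break;
            }
            prev = p;
            p = next;
        }
        return head;
    }

    static int length(Node head) {
        int n = 0;
        for (Node p = head; p != null; p = p.next) n++;
        return n;
    }

    // Returns { cleared key reads as null, chain shrank to 2, head now links to tail }.
    static boolean[] run() {
        Node tail = new Node("a", null);
        Node middle = new Node("b", tail);
        Node head = new Node("c", middle);

        middle.clear();                         // simulate the GC clearing the key
        boolean keyGone = middle.get() == null; // nothing left to pass to equals()

        head = unlink(head, middle);
        return new boolean[] { keyGone, length(head) == 2, head.next == tail };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```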