Today's Analysis: Why HashSet Has No get Method, Explained Through Its Source Code

Author: 猿码叔叔

Category: Java. Tags: java

Article link: https://blog.csdn.net/qq_42349895/article/details/115711195

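I. Introduction to HashSet

Official introduction
Source code translation

II. Use cases for HashSet

III. HashSet source code analysis

IV. Reasons why HashSet has no get method

I. Introduction to HashSet

1. Link to the official HashSet introduction

2. Source code translation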

/**
 * This class implements the <tt>Set</tt> interface, backed by a hash table
 * (actually a <tt>HashMap</tt> instance).  It makes no guarantees as to the
 * iteration order of the set; in particular, it does not guarantee that the
 * order will remain constant over time.  This class permits the <tt>null</tt>
 * element.
 *
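 * Summary: HashSet implements the Set interface and is backed by a hash table
 * (in practice a HashMap instance, not the java.util.Hashtable class). The
 * iteration order is not guaranteed and may change over time. The null element
 * is permitted.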
 *
 * <p>This class offers constant time performance for the basic operations
 * (<tt>add</tt>, <tt>remove</tt>, <tt>contains</tt> and <tt>size</tt>),
 * assuming the hash function disperses the elements properly among the
 * buckets.  Iterating over this set requires time proportional to the sum of
 * the <tt>HashSet</tt> instance's size (the number of elements) plus the
 * "capacity" of the backing <tt>HashMap</tt> instance (the number of
 * buckets).  Thus, it's very important not to set the initial capacity too
 * high (or the load factor too low) if iteration performance is important.
 *
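 * Summary: the basic operations add, remove, contains and size run in constant
 * time, provided the hash function spreads the elements evenly across the
 * buckets. Iteration time is proportional to the set's size plus the capacity
 * (number of buckets) of the backing HashMap, so if iteration performance
 * matters, do not set the initial capacity too high or the load factor too low.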
 * 
 * <p><strong>Note that this implementation is not synchronized.</strong>
 * If multiple threads access a hash set concurrently, and at least one of
 * the threads modifies the set, it <i>must</i> be synchronized externally.
 * This is typically accomplished by synchronizing on some object that
 * naturally encapsulates the set.
 *
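 * Summary: HashSet is not thread-safe. If several threads access the set
 * concurrently and at least one of them modifies it, access must be
 * synchronized externally, typically by synchronizing on an object that
 * naturally encapsulates the set.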
 *
 * If no such object exists, the set should be "wrapped" using the
 * {@link Collections#synchronizedSet Collections.synchronizedSet}
 * method.  This is best done at creation time, to prevent accidental
 * unsynchronized access to the set:<pre>
 *   Set s = Collections.synchronizedSet(new HashSet(...));</pre>
 * 
 * Summary: if no such object exists, wrap the set with
 * Collections.synchronizedSet. Doing so at creation time prevents accidental
 * unsynchronized access to the set.
 *
 * <p>The iterators returned by this class's <tt>iterator</tt> method are
 * <i>fail-fast</i>: if the set is modified at any time after the iterator is
 * created, in any way except through the iterator's own <tt>remove</tt>
 * method, the Iterator throws a {@link ConcurrentModificationException}.
 * Thus, in the face of concurrent modification, the iterator fails quickly
 * and cleanly, rather than risking arbitrary, non-deterministic behavior at
 * an undetermined time in the future.
 *
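 * Summary: the iterators returned by iterator() are fail-fast. If the set is
 * modified at any time after the iterator is created, other than through the
 * iterator's own remove method, the iterator throws
 * ConcurrentModificationException. Faced with concurrent modification, the
 * iterator fails quickly and cleanly rather than risking arbitrary,
 * non-deterministic behavior later.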
 *
 * <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
 * as it is, generally speaking, impossible to make any hard guarantees in the
 * presence of unsynchronized concurrent modification.  Fail-fast iterators
 * throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
 * Therefore, it would be wrong to write a program that depended on this
 * exception for its correctness: <i>the fail-fast behavior of iterators
 * should be used only to detect bugs.</i>
 *
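 * Summary: fail-fast behavior cannot be guaranteed, because no hard guarantees
 * are possible in the presence of unsynchronized concurrent modification.
 * ConcurrentModificationException is thrown on a best-effort basis, so a
 * program must not depend on it for correctness; it should be used only to
 * detect bugs.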
 *
 * <p>This class is a member of the
 * <a href="{@docRoot}/../technotes/guides/collections/index.html">
 * Java Collections Framework</a>.
 *
 * @param <E> the type of elements maintained by this set
 *
 * @author  Josh Bloch
 * @author  Neal Gafter
 * @see     Collection
 * @see     Set
 * @see     TreeSet
 * @see     HashMap
 * @since   1.2
 */

public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable
{
    // Some code...
}
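
The fail-fast rule described in the Javadoc can be tried out with a small sketch. The class and method names below are hypothetical, and the example runs on a single thread, where the check is easy to trigger; as the Javadoc warns, the exception is best-effort and must not be relied on for correctness.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical demo class, not part of the JDK.
public class FailFastDemo {

    // Removes elements through the set itself while iterating.
    // Returns true if the iterator reported the modification.
    public static boolean directRemovalThrows() {
        Set<String> set = new HashSet<>(Arrays.asList("a", "b", "c"));
        try {
            for (String s : set) {
                set.remove(s); // structural change made behind the iterator's back
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removes "b" through the iterator's own remove method, which is allowed.
    public static Set<String> removeViaIterator() {
        Set<String> set = new HashSet<>(Arrays.asList("a", "b", "c"));
        Iterator<String> it = set.iterator();
        while (it.hasNext()) {
            if ("b".equals(it.next())) {
                it.remove();
            }
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(directRemovalThrows()); // true
        System.out.println(removeViaIterator());   // contains "a" and "c"
    }
}
```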

II. Use cases for HashSet

1. Removing duplicate elements

   /**
    * Create a new HashSet with this constructor, passing in a Collection or any of its
    * subtypes (for example a List), and the data is deduplicated:
    * Set<E> aSet = new HashSet<>( aList );
    */
    public HashSet(Collection<? extends E> c) {
        map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
        addAll(c);
    }
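
As a minimal sketch of this use case (the class and method names are hypothetical), the copy constructor can be wrapped in a helper. Because HashSet gives no ordering guarantee, a second variant uses LinkedHashSet, which is an assumption on my part about what callers may want when first-seen order matters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical demo class, not part of the JDK.
public class DedupDemo {

    // Drops duplicates; the order of the result is unspecified.
    public static <T> List<T> dedupe(List<T> input) {
        return new ArrayList<>(new HashSet<>(input));
    }

    // Drops duplicates while keeping first-seen order, via LinkedHashSet.
    public static <T> List<T> dedupeKeepOrder(List<T> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("b", "a", "b", "c", "a");
        System.out.println(dedupe(names).size());   // 3
        System.out.println(dedupeKeepOrder(names)); // [b, a, c]
    }
}
```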

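2. Retrieving all the keys of a Map with Map#keySet(). Note that keySet() returns a Set view of the map's keys rather than a HashSet instance.

3. HashSet combines traits of both Map and Collection: the Set interface it implements extends Collection, while its elements are stored and retrieved through a backing HashMap. Whether to use HashSet can therefore be weighed against the project's requirements.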
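
For use case 2, the following sketch (hypothetical class and method names) shows one detail worth keeping in mind: Map#keySet() is a live view backed by the map, so removing a key from the view removes the mapping, whereas copying the keys into a new HashSet gives an independent snapshot.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class, not part of the JDK.
public class KeySetDemo {

    private static Map<String, Integer> sampleMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        return map;
    }

    // keySet() is a live view: removing a key from it removes the mapping too.
    public static boolean viewRemovalAffectsMap() {
        Map<String, Integer> map = sampleMap();
        map.keySet().remove("a");
        return !map.containsKey("a");
    }

    // Copying the keys into a HashSet gives an independent snapshot.
    public static boolean copyIsIndependent() {
        Map<String, Integer> map = sampleMap();
        Set<String> copy = new HashSet<>(map.keySet());
        copy.remove("a");
        return map.containsKey("a");
    }

    public static void main(String[] args) {
        System.out.println(viewRemovalAffectsMap()); // true
        System.out.println(copyIsIndependent());     // true
    }
}
```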

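III. HashSet source code analysis

First, as noted earlier, HashSet is backed by a HashMap. That raises a question: a Set does not hold key-value pairs (KVP), so how does a HashMap store the set's elements and support operations on them?

Let's start with HashSet's constructor: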

   /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * default initial capacity (16) and load factor (0.75).
     * Summary: creates a new, empty Set; the backing HashMap instance has the default
     * capacity of 16 and load factor of 0.75. In other words, it simply creates a new HashMap.
     */
    public HashSet() {
        map = new HashMap<>();
    }

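Next come the two most important fields in HashSet. The "Dummy value" comment on the second field may look unremarkable, but it is the key to building a Set-like structure on top of a Map. A dummy value is a placeholder: the HashMap requires some value for every key, and the HashSet never makes use of it.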

    /**
     * The map that stores the HashSet's elements on its behalf. Each key (E) is an
     * element added to the set; each value (Object) is the field declared below,
     * <tt>private static final Object PRESENT = new Object();</tt>,
     * which is the "dummy value".
     */
    private transient HashMap<E,Object> map;

    // Dummy value to associate with an Object in the backing Map
    private static final Object PRESENT = new Object();

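The answer seems to be getting closer. Let's look at HashSet's #add method next.
#add does nothing special: it is an ordinary map insertion. The key varies with each element, while the value is always the PRESENT field, a plain Object instance that is static final and never modified anywhere in the HashSet source. For example, if a set holds the two elements "a" and "b", the backing map contains {a=java.lang.Object@2eeeXXXX, b=java.lang.Object@2eeeXXXX}.
The code is: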

    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

Next, jump to HashMap's #put method:

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        // some code
    }
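
The `==null` check in #add relies on the documented contract of Map#put, which returns the previous value for the key, or null if there was none. A small sketch of that behavior follows; the class and method names are hypothetical, and the local `present` object is only a stand-in for HashSet's private PRESENT field.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;

// Hypothetical demo class, not part of the JDK.
public class AddReturnDemo {

    // Mirrors what add does: put returns the previous value, or null if the key was absent.
    public static boolean[] putTwice() {
        HashMap<String, Object> map = new HashMap<>();
        Object present = new Object(); // stand-in for HashSet's PRESENT
        boolean first = map.put("a", present) == null;  // key absent, so put returns null
        boolean second = map.put("a", present) == null; // key present, so put returns the old value
        return new boolean[] { first, second };
    }

    // The same two insertions through the public HashSet API.
    public static boolean[] addTwice() {
        HashSet<String> set = new HashSet<>();
        return new boolean[] { set.add("a"), set.add("a") };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(putTwice())); // [true, false]
        System.out.println(Arrays.toString(addTwice())); // [true, false]
    }
}
```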

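To confirm this, I wrote a test that reads HashSet's private fields via reflection (on JDK 16 and later it must be run with --add-opens java.base/java.util=ALL-UNNAMED, otherwise setAccessible throws InaccessibleObjectException):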

    import java.lang.reflect.Field;
    import java.util.HashMap;
    import java.util.HashSet;

    // Wrapper class added so the snippet compiles; the name is illustrative.
    public class BackingMapDemo {

        public static void main(String[] args) {
            HashSet<String> set = new HashSet<>();
            set.add("a");
            set.add("b");
            set.add("c");
            set.add("d");

            printBackingMapInstanceOfHashSet(set);
        }

        private static void printBackingMapInstanceOfHashSet(HashSet<String> set) {
            try {
                // Both fields are declared in HashSet itself, so look them up on HashSet.class.
                Field map = HashSet.class.getDeclaredField("map");
                map.setAccessible(true);
                @SuppressWarnings("unchecked")
                HashMap<String, Object> mapField = (HashMap<String, Object>) map.get(set);

                Field present = HashSet.class.getDeclaredField("PRESENT");
                present.setAccessible(true);
                // PRESENT is static, so no instance is needed to read it.
                Object presentField = present.get(null);

                // Compare by identity: every value should be the very same PRESENT object.
                mapField.forEach((k, v) -> {
                    System.out.println(v + " equals " + presentField + " ? => " + (v == presentField));
                });
            } catch (NoSuchFieldException | IllegalAccessException e) {
                e.printStackTrace();
            }
        }
    }

The output is as follows (the hexadecimal hash differs from run to run):

java.lang.Object@6f539caf equals java.lang.Object@6f539caf ? => true
java.lang.Object@6f539caf equals java.lang.Object@6f539caf ? => true
java.lang.Object@6f539caf equals java.lang.Object@6f539caf ? => true
java.lang.Object@6f539caf equals java.lang.Object@6f539caf ? => true

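The output shows that the keys of the map are the elements of the set, while every value is the same PRESENT object (the hexadecimal text after @ is derived from its hash code).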
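
Putting these observations together, a set can be sketched on top of a HashMap in a few lines. The class name TinySet is hypothetical and the code is a simplified illustration under that assumption, not the JDK source. It also hints at why a get method would add little: any value fetched from the map would always be the same PRESENT object.

```java
import java.util.HashMap;

// Hypothetical, simplified illustration of a set backed by a HashMap.
public class TinySet<E> {

    // Shared placeholder stored as the value for every key.
    private static final Object PRESENT = new Object();

    private final HashMap<E, Object> map = new HashMap<>();

    // True only if the element was not already present.
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    // True only if the element was present and has been removed.
    public boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }

    public int size() {
        return map.size();
    }

    public static void main(String[] args) {
        TinySet<String> set = new TinySet<>();
        System.out.println(set.add("a"));      // true
        System.out.println(set.add("a"));      // false
        System.out.println(set.contains("a")); // true
        System.out.println(set.size());        // 1
    }
}
```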

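IV. Summary (reasons why HashSet has no get method)

1. The most direct reason:

The underlying data structure of HashMap is an array plus linked lists plus red-black trees (as of Java 1.8). Without special handling, elements cannot be fetched directly by index; an index is generally meaningful for linear structures such as arrays.
If HashSet exposed HashMap's #get method to fetch elements, Map#get would lose its purpose of finding the value paired with a key: the caller already holds the element (the key), and the value is always PRESENT.
Before asking this question, it may be worth considering why one would choose HashSet over ArrayList or HashMap in the first place: it is a hybrid whose implementation rests on a HashMap while its parent interface extends Collection. HashSet's design naturally has its own advantages.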

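2. Secondly:

HashSet was designed as a bridge between Map and Collection, letting the two kinds of data structure work together while also providing deduplication. Giving it its own way of retrieving elements would require extra code that operates on the HashMap instance, for example reaching into the HashMap's table array and then fetching the corresponding key or value.
HashSet was born as a deduplication tool. When a Map stores a key-value pair whose key already exists, the old value for that key is overwritten, so the keys of a map never repeat. Java's designers took advantage of this property, and HashSet came into being without any extra engineering effort.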
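
In practice, the membership question is answered with contains. If the stored instance itself is needed, one option is a linear scan, sketched below with hypothetical names and under the assumption that the probe is non-null; when such lookups are frequent, a HashMap that maps each element to itself may be a better fit.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;

// Hypothetical demo class, not part of the JDK.
public class LookupDemo {

    // Linear scan for the stored element that equals the probe. O(n), unlike contains.
    public static <E> Optional<E> findEqual(Set<E> set, E probe) {
        Objects.requireNonNull(probe, "probe");
        for (E e : set) {
            if (probe.equals(e)) {
                return Optional.of(e);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        String stored = new String("key"); // distinct instance on purpose
        set.add(stored);
        String probe = new String("key");  // equal to stored, but a different object

        System.out.println(set.contains(probe));                   // true
        System.out.println(findEqual(set, probe).get() == stored); // true
    }
}
```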