26. Dissecting HashSet

那时间总是跑得很潇洒

Article link: https://blog.csdn.net/qq_31654025/article/details/136546272

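This article takes a detailed look at Java's HashSet: an implementation of the Set interface built on top of HashMap, with no duplicate elements and efficient add, remove, and lookup operations. It covers HashSet's constructors and core operations, and its use for deduplication and set arithmetic.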

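This article is a set of reading notes on the books《Java编程的逻辑》¹ and《剑指Java：核心原理与应用实践》².

HashSet implements the Set interface, and it does so by hashing.

2.1 The Set Interface

Set is the container interface for collections with no duplicate elements and no guaranteed order. It extends Collection without adding any new instance methods (the static factory methods of and copyOf arrived in Java 9 and Java 10), but it tightens the specification of several inherited ones. The full definition of the Set interface is shown below; method bodies are left out.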

public interface Set<E> extends Collection<E> {
    /**
     * Returns the number of elements in this set (its cardinality).  If this
     * set contains more than {@code Integer.MAX_VALUE} elements, returns
     * {@code Integer.MAX_VALUE}.
     */
    int size();

    /**
     * Returns {@code true} if this set contains no elements.
     */
    boolean isEmpty();

    /**
     * Returns {@code true} if this set contains the specified element.
     * More formally, returns {@code true} if and only if this set
     * contains an element {@code e} such that
     * {@code Objects.equals(o, e)}.
     */
    boolean contains(Object o);

    /**
     * Returns an iterator over the elements in this set.  The elements are
     * returned in no particular order (unless this set is an instance of some
     * class that provides a guarantee).
     */
    Iterator<E> iterator();

    /**
     * Returns an array containing all of the elements in this set.
     * If this set makes any guarantees as to what order its elements
     * are returned by its iterator, this method must return the
     * elements in the same order.
     */
    Object[] toArray();

    /**
     * Returns an array containing all of the elements in this set; the
     * runtime type of the returned array is that of the specified array.
     * If the set fits in the specified array, it is returned therein.
     * Otherwise, a new array is allocated with the runtime type of the
     * specified array and the size of this set.
     *
     * <p>If this set fits in the specified array with room to spare
     * (i.e., the array has more elements than this set), the element in
     * the array immediately following the end of the set is set to
     * {@code null}.  (This is useful in determining the length of this
     * set <i>only</i> if the caller knows that this set does not contain
     * any null elements.)
     *
     * <p>If this set makes any guarantees as to what order its elements
     * are returned by its iterator, this method must return the elements
     * in the same order.
     *
     * <p>Like the {@link #toArray()} method, this method acts as bridge between
     * array-based and collection-based APIs.  Further, this method allows
     * precise control over the runtime type of the output array, and may,
     * under certain circumstances, be used to save allocation costs.
     *
     * <p>Suppose {@code x} is a set known to contain only strings.
     * The following code can be used to dump the set into a newly allocated
     * array of {@code String}:
     *
     * <pre>
     *     String[] y = x.toArray(new String[0]);</pre>
     *
     * Note that {@code toArray(new Object[0])} is identical in function to
     * {@code toArray()}.
     */
    <T> T[] toArray(T[] a);


    // Modification Operations

    /**
     * Adds the specified element to this set if it is not already present
     * (optional operation).  More formally, adds the specified element
     * {@code e} to this set if the set contains no element {@code e2}
     * such that
     * {@code Objects.equals(e, e2)}.
     * If this set already contains the element, the call leaves the set
     * unchanged and returns {@code false}.  In combination with the
     * restriction on constructors, this ensures that sets never contain
     * duplicate elements.
     *
     * <p>The stipulation above does not imply that sets must accept all
     * elements; sets may refuse to add any particular element, including
     * {@code null}, and throw an exception, as described in the
     * specification for {@link Collection#add Collection.add}.
     * Individual set implementations should clearly document any
     * restrictions on the elements that they may contain.
     */
    boolean add(E e);


    /**
     * Removes the specified element from this set if it is present
     * (optional operation).  More formally, removes an element {@code e}
     * such that
     * {@code Objects.equals(o, e)}, if
     * this set contains such an element.  Returns {@code true} if this set
     * contained the element (or equivalently, if this set changed as a
     * result of the call).  (This set will not contain the element once the
     * call returns.)
     */
    boolean remove(Object o);


    // Bulk Operations

    /**
     * Returns {@code true} if this set contains all of the elements of the
     * specified collection.  If the specified collection is also a set, this
     * method returns {@code true} if it is a <i>subset</i> of this set.
     */
    boolean containsAll(Collection<?> c);

    /**
     * Adds all of the elements in the specified collection to this set if
     * they're not already present (optional operation).  If the specified
     * collection is also a set, the {@code addAll} operation effectively
     * modifies this set so that its value is the <i>union</i> of the two
     * sets.  The behavior of this operation is undefined if the specified
     * collection is modified while the operation is in progress.
     */
    boolean addAll(Collection<? extends E> c);

    /**
     * Retains only the elements in this set that are contained in the
     * specified collection (optional operation).  In other words, removes
     * from this set all of its elements that are not contained in the
     * specified collection.  If the specified collection is also a set, this
     * operation effectively modifies this set so that its value is the
     * <i>intersection</i> of the two sets.
     */
    boolean retainAll(Collection<?> c);

    /**
     * Removes from this set all of its elements that are contained in the
     * specified collection (optional operation).  If the specified
     * collection is also a set, this operation effectively modifies this
     * set so that its value is the <i>asymmetric set difference</i> of
     * the two sets.
     */
    boolean removeAll(Collection<?> c);

    /**
     * Removes all of the elements from this set (optional operation).
     * The set will be empty after this call returns.
     *
     * @throws UnsupportedOperationException if the {@code clear} method
     *         is not supported by this set
     */
    void clear();


    // Comparison and hashing

    /**
     * Compares the specified object with this set for equality.  Returns
     * {@code true} if the specified object is also a set, the two sets
     * have the same size, and every member of the specified set is
     * contained in this set (or equivalently, every member of this set is
     * contained in the specified set).  This definition ensures that the
     * equals method works properly across different implementations of the
     * set interface.
     */
    boolean equals(Object o);

    /**
     * Returns the hash code value for this set.  The hash code of a set is
     * defined to be the sum of the hash codes of the elements in the set,
     * where the hash code of a {@code null} element is defined to be zero.
     * This ensures that {@code s1.equals(s2)} implies that
     * {@code s1.hashCode()==s2.hashCode()} for any two sets {@code s1}
     * and {@code s2}, as required by the general contract of
     * {@link Object#hashCode}.
     */
    int hashCode();

    /**
     * Creates a {@code Spliterator} over the elements in this set.
     *
     * <p>The {@code Spliterator} reports {@link Spliterator#DISTINCT}.
     * Implementations should document the reporting of additional
     * characteristic values.
     *
     * @implSpec
     * The default implementation creates a
     * <em><a href="Spliterator.html#binding">late-binding</a></em> spliterator
     * from the set's {@code Iterator}.  The spliterator inherits the
     * <em>fail-fast</em> properties of the set's iterator.
     * <p>
     * The created {@code Spliterator} additionally reports
     * {@link Spliterator#SIZED}.
     *
     * @implNote
     * The created {@code Spliterator} additionally reports
     * {@link Spliterator#SUBSIZED}.
     */
    @Override
    default Spliterator<E> spliterator() { /* ... body omitted ... */ }

    /**
     * Returns an unmodifiable set containing zero elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    @SuppressWarnings("unchecked")
    static <E> Set<E> of() { /* ... body omitted ... */ }

    /**
     * Returns an unmodifiable set containing one element.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    static <E> Set<E> of(E e1) { /* ... body omitted ... */ }

    /**
     * Returns an unmodifiable set containing two elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    static <E> Set<E> of(E e1, E e2) { /* ... body omitted ... */ }

    /**
     * Returns an unmodifiable set containing three elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    static <E> Set<E> of(E e1, E e2, E e3) { /* ... body omitted ... */ }

    /**
     * Returns an unmodifiable set containing four elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    static <E> Set<E> of(E e1, E e2, E e3, E e4) { /* ... body omitted ... */ }

    /**
     * Returns an unmodifiable set containing five elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    static <E> Set<E> of(E e1, E e2, E e3, E e4, E e5) { /* ... body omitted ... */ }

    /**
     * Returns an unmodifiable set containing six elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    static <E> Set<E> of(E e1, E e2, E e3, E e4, E e5, E e6) { /* ... body omitted ... */ }

    /**
     * Returns an unmodifiable set containing seven elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    static <E> Set<E> of(E e1, E e2, E e3, E e4, E e5, E e6, E e7) { /* ... body omitted ... */ }

    /**
     * Returns an unmodifiable set containing eight elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    static <E> Set<E> of(E e1, E e2, E e3, E e4, E e5, E e6, E e7, E e8) { /* ... body omitted ... */ }

    /**
     * Returns an unmodifiable set containing nine elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    static <E> Set<E> of(E e1, E e2, E e3, E e4, E e5, E e6, E e7, E e8, E e9) { /* ... body omitted ... */ }

    /**
     * Returns an unmodifiable set containing ten elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    static <E> Set<E> of(E e1, E e2, E e3, E e4, E e5, E e6, E e7, E e8, E e9, E e10) { /* ... body omitted ... */ }

    /**
     * Returns an unmodifiable set containing an arbitrary number of elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    @SafeVarargs
    @SuppressWarnings("varargs")
    static <E> Set<E> of(E... elements) { /* ... body omitted ... */ }

    /**
     * Returns an <a href="#unmodifiable">unmodifiable Set</a> containing the elements
     * of the given Collection. The given Collection must not be null, and it must not
     * contain any null elements. If the given Collection contains duplicate elements,
     * an arbitrary element of the duplicates is preserved. If the given Collection is
     * subsequently modified, the returned Set will not reflect such modifications.
     *
     * @implNote
     * If the given Collection is an <a href="#unmodifiable">unmodifiable Set</a>,
     * calling copyOf will generally not create a copy.
     * @since 10
     */
    @SuppressWarnings("unchecked")
    static <E> Set<E> copyOf(Collection<? extends E> coll) { /* ... body omitted ... */ }
}
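
Two points from the Javadoc above can be checked with a short sketch: equals and hashCode depend only on a set's contents, not on its implementation class, and the sets returned by Set.of are unmodifiable. The class name SetContractDemo and its helper methods are made up for this illustration, and the sketch assumes Java 9 or later for Set.of.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class SetContractDemo {

    // equals() and hashCode() compare contents, not implementation classes.
    static boolean sameAcrossImplementations() {
        Set<String> hash = new HashSet<>(Arrays.asList("a", "b", "c"));
        Set<String> tree = new TreeSet<>(Arrays.asList("c", "b", "a"));
        return hash.equals(tree) && hash.hashCode() == tree.hashCode();
    }

    // Sets created by Set.of are unmodifiable, so add() throws.
    static boolean factorySetRejectsAdd() {
        Set<String> fixed = Set.of("a", "b");
        try {
            fixed.add("c");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sameAcrossImplementations()); // true
        System.out.println(factorySetRejectsAdd());      // true
    }
}
```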

HashSet has the following constructors:

public HashSet(int initialCapacity);
public HashSet(int initialCapacity, float loadFactor);
public HashSet(Collection<? extends E> c);
public HashSet();

initialCapacity and loadFactor mean the same thing here as they do in HashMap.

Using HashSet is straightforward. In the code below, hello is added twice but stored only once.

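import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.util.Arrays;
import java.util.HashSet;
import org.junit.Test;

// Assumes JUnit 4; the class name HashSetTest is illustrative.
public class HashSetTest {
    @Test
    public void testAddAndRemove() {
        HashSet<String> set = new HashSet<>();
        set.add("hello");
        set.add("world");
        // "hello" is already present, so only "你好" is actually added.
        set.addAll(Arrays.asList("hello", "你好"));
        assertEquals(3, set.size());
        for (String s : new String[] { "hello", "world", "你好" }) {
            set.remove(s);
        }
        assertTrue(set.isEmpty());
    }
}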

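Like HashMap, HashSet relies on its elements overriding hashCode and equals correctly: whenever two objects are equal according to equals, they must also return the same hashCode. Keep this in mind when the elements are instances of a custom class.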
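
As a minimal sketch of that requirement, the hypothetical GridPoint class below derives equals and hashCode from the same two fields, so two equal points count as one element. Both class names are invented for this example.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class GridPointSetDemo {
    static int distinctCount() {
        Set<GridPoint> points = new HashSet<>();
        points.add(new GridPoint(1, 2));
        points.add(new GridPoint(1, 2)); // equal to the first, so not stored again
        points.add(new GridPoint(3, 4));
        return points.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount()); // 2
    }
}

// Hypothetical element type used only for this illustration.
final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint other = (GridPoint) o;
        return x == other.x && y == other.y;
    }

    // Built from the same fields as equals(), so equal points share a hash code.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```

If hashCode were left at the Object default, the two equal points would most likely land in different buckets and both be kept.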

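HashSet has many uses, for example:

Deduplication: when the order of the deduplicated elements does not matter, HashSet is a convenient way to remove duplicates;
Holding special values: a Set can hold values that need special treatment, so that code handling user requests or data records can check membership to decide whether to take the special path, for example a blacklist or whitelist of IP addresses;
Set arithmetic: a Set makes mathematical set operations such as intersection and union easy, and these have practical uses. Take user tags: each user has a set of tags, the intersection of two users' tags represents the traits they share, and the size of the intersection divided by the size of the union can serve as a measure of how similar they are.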
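
The deduplication and tag-similarity uses can be sketched as follows. The ratio of intersection size to union size is commonly called the Jaccard index. The class name, method names, and sample tags are invented for this example, and returning 0.0 for two empty sets is a convention chosen here, not something the text prescribes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TagSimilarityDemo {

    // Deduplicate: copying into a HashSet drops repeats (order is not preserved).
    static <T> Set<T> distinct(List<T> items) {
        return new HashSet<>(items);
    }

    // Intersection size divided by union size.
    static double similarity(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0; // convention chosen here for two empty sets
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> alice = distinct(Arrays.asList("java", "music", "java", "hiking"));
        Set<String> bob = distinct(Arrays.asList("java", "cooking", "hiking"));
        System.out.println(alice.size());           // 3
        System.out.println(similarity(alice, bob)); // 0.5
    }
}
```

Both operations work on copies, because addAll and retainAll modify the set they are called on.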

2.2 How It Works

Internally, HashSet is implemented with a HashMap. It holds a HashMap in an instance field, as shown below:

private transient HashMap<E,Object> map;

A HashMap stores keys and values. A HashSet in effect uses only the keys; every key maps to the same fixed value, which is defined as:

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();

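With this internal structure in mind, the method implementations are easy to follow. Let's look at the code. The HashSet constructors mostly just call the corresponding HashMap constructor, for example: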

    public HashSet(int initialCapacity, float loadFactor) {
        map = new HashMap<>(initialCapacity, loadFactor);
    }

The constructor that takes a Collection is slightly different:

    public HashSet(Collection<? extends E> c) {
        map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
        addAll(c);
    }

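This is also easy to follow: c.size()/.75f computes the initialCapacity, where 0.75f is the default loadFactor, so the backing map can hold every element of c without resizing. The capacity is never set below 16.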
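
To make the arithmetic concrete, the sketch below evaluates the same expression for a few collection sizes. CapacityDemo and initialCapacityFor are names invented for this example. The value is only the capacity that gets requested; HashMap rounds it up to a power of two internally.

```java
class CapacityDemo {

    // Same expression as in the HashSet(Collection) constructor quoted above.
    static int initialCapacityFor(int size) {
        return Math.max((int) (size / .75f) + 1, 16);
    }

    public static void main(String[] args) {
        System.out.println(initialCapacityFor(3));   // 16  (the lower bound of 16 applies)
        System.out.println(initialCapacityFor(12));  // 17
        System.out.println(initialCapacityFor(100)); // 134
    }
}
```

For 100 elements the requested capacity is 134, and 134 × 0.75 is just over 100, so all 100 elements fit below the resize threshold.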

Now the add method:

    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

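It simply calls the map's put method with the element e as the key and the fixed value PRESENT as the value. put returns null when the key had no previous mapping, which means the element was newly added. A HashMap keeps only one entry per key, so adding a duplicate leaves the HashMap unchanged.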
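
The return value of put is what tells a new element from a duplicate. A small sketch of this, with AddReturnDemo and addVia as names invented for the example:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class AddReturnDemo {
    private static final Object PRESENT = new Object();

    // Mirrors add(): true only when put() found no previous mapping for the key.
    static boolean addVia(Map<String, Object> map, String e) {
        return map.put(e, PRESENT) == null;
    }

    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        System.out.println(addVia(map, "hello")); // true:  key was absent
        System.out.println(addVia(map, "hello")); // false: put returned the old PRESENT
        System.out.println(map.size());           // 1

        // The real HashSet behaves the same way.
        Set<String> set = new HashSet<>();
        System.out.println(set.add("hello"));     // true
        System.out.println(set.add("hello"));     // false
    }
}
```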

Checking whether an element is present:

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

This checks whether the map contains the corresponding key.

Removing an element:

    public boolean remove(Object o) {
        return map.remove(o)==PRESENT;
    }

It calls the map's remove method; a return value of PRESENT means the key was there and has been removed.

The iterator:

    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }

It returns the iterator of the map's keySet.
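
Putting the pieces together, the delegation pattern can be sketched as a stripped-down class. MiniHashSet is a hypothetical name; it is an illustration of the approach described above, not the JDK source, and it omits the constructors, bulk operations, and serialization support that the real HashSet has.

```java
import java.util.HashMap;
import java.util.Iterator;

// Illustrative only: a minimal set that delegates to HashMap the same way.
class MiniHashSet<E> implements Iterable<E> {
    // Shared dummy value stored against every key.
    private static final Object PRESENT = new Object();
    private final HashMap<E, Object> map = new HashMap<>();

    public boolean add(E e)           { return map.put(e, PRESENT) == null; }
    public boolean contains(Object o) { return map.containsKey(o); }
    public boolean remove(Object o)   { return map.remove(o) == PRESENT; }
    public int size()                 { return map.size(); }

    @Override
    public Iterator<E> iterator()     { return map.keySet().iterator(); }

    public static void main(String[] args) {
        MiniHashSet<String> set = new MiniHashSet<>();
        System.out.println(set.add("hello"));      // true
        System.out.println(set.add("hello"));      // false
        System.out.println(set.contains("hello")); // true
        System.out.println(set.remove("hello"));   // true
        System.out.println(set.remove("hello"));   // false
        System.out.println(set.size());            // 0
    }
}
```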

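2.3 Summary

HashSet implements the Set interface on top of a HashMap and has the following characteristics:

No duplicate elements;
Adding, removing, and membership tests are efficient, each taking $O(1)$ time on average;
No ordering.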

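HashSet makes deduplication, set arithmetic, and similar tasks convenient and efficient. To preserve insertion order, use LinkedHashSet, a subclass of HashSet. TreeSet is another important Set implementation; it keeps its elements sorted.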
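
The difference in iteration order can be seen in a short sketch. OrderedSetDemo and its methods are names invented for this example; the results are copied into lists only to make the order easy to compare.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

class OrderedSetDemo {

    // LinkedHashSet iterates in the order elements were first added.
    static List<String> insertionOrder(List<String> items) {
        return new ArrayList<>(new LinkedHashSet<>(items));
    }

    // TreeSet iterates in sorted order (natural ordering for String).
    static List<String> sortedOrder(List<String> items) {
        return new ArrayList<>(new TreeSet<>(items));
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("pear", "apple", "pear", "fig");
        System.out.println(insertionOrder(items)); // [pear, apple, fig]
        System.out.println(sortedOrder(items));    // [apple, fig, pear]
    }
}
```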

¹ 马俊昌. Java编程的逻辑 [M]. Beijing: China Machine Press, 2018.
² 尚硅谷教育. 剑指Java：核心原理与应用实践 [M]. Beijing: Publishing House of Electronics Industry, 2023.
