26. Dissecting HashSet

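This article takes a detailed look at Java's HashSet, an implementation of the Set interface that is built on HashMap. It holds no duplicate elements and supports efficient add, remove, and lookup operations. The article covers HashSet's constructors and core operations, as well as its use for deduplication, set operations, and similar tasks.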

This article is a set of reading notes on the books 《Java编程的逻辑》 [1] and 《剑指Java:核心原理与应用实践》 [2].

HashSet implements the Set interface, and it does so by means of hashing.

2.1 The Set Interface

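Set is a container interface that holds no duplicate elements and does not guarantee any ordering. It extends Collection but adds no new instance methods (the static factory methods of and copyOf were added in Java 9 and Java 10). It does, however, set out its own contract for some of the inherited methods. The definition of the Set interface is shown in the code below.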

public interface Set<E> extends Collection<E> {
    /**
     * Returns the number of elements in this set (its cardinality).  If this
     * set contains more than {@code Integer.MAX_VALUE} elements, returns
     * {@code Integer.MAX_VALUE}.
     */
    int size();

    /**
     * Returns {@code true} if this set contains no elements.
     */
    boolean isEmpty();

    /**
     * Returns {@code true} if this set contains the specified element.
     * More formally, returns {@code true} if and only if this set
     * contains an element {@code e} such that
     * {@code Objects.equals(o, e)}.
     */
    boolean contains(Object o);

    /**
     * Returns an iterator over the elements in this set.  The elements are
     * returned in no particular order (unless this set is an instance of some
     * class that provides a guarantee).
     */
    Iterator<E> iterator();

    /**
     * Returns an array containing all of the elements in this set.
     * If this set makes any guarantees as to what order its elements
     * are returned by its iterator, this method must return the
     * elements in the same order.
     */
    Object[] toArray();

    /**
     * Returns an array containing all of the elements in this set; the
     * runtime type of the returned array is that of the specified array.
     * If the set fits in the specified array, it is returned therein.
     * Otherwise, a new array is allocated with the runtime type of the
     * specified array and the size of this set.
     *
     * <p>If this set fits in the specified array with room to spare
     * (i.e., the array has more elements than this set), the element in
     * the array immediately following the end of the set is set to
     * {@code null}.  (This is useful in determining the length of this
     * set <i>only</i> if the caller knows that this set does not contain
     * any null elements.)
     *
     * <p>If this set makes any guarantees as to what order its elements
     * are returned by its iterator, this method must return the elements
     * in the same order.
     *
     * <p>Like the {@link #toArray()} method, this method acts as bridge between
     * array-based and collection-based APIs.  Further, this method allows
     * precise control over the runtime type of the output array, and may,
     * under certain circumstances, be used to save allocation costs.
     *
     * <p>Suppose {@code x} is a set known to contain only strings.
     * The following code can be used to dump the set into a newly allocated
     * array of {@code String}:
     *
     * <pre>
     *     String[] y = x.toArray(new String[0]);</pre>
     *
     * Note that {@code toArray(new Object[0])} is identical in function to
     * {@code toArray()}.
     */
    <T> T[] toArray(T[] a);


    // Modification Operations

    /**
     * Adds the specified element to this set if it is not already present
     * (optional operation).  More formally, adds the specified element
     * {@code e} to this set if the set contains no element {@code e2}
     * such that
     * {@code Objects.equals(e, e2)}.
     * If this set already contains the element, the call leaves the set
     * unchanged and returns {@code false}.  In combination with the
     * restriction on constructors, this ensures that sets never contain
     * duplicate elements.
     *
     * <p>The stipulation above does not imply that sets must accept all
     * elements; sets may refuse to add any particular element, including
     * {@code null}, and throw an exception, as described in the
     * specification for {@link Collection#add Collection.add}.
     * Individual set implementations should clearly document any
     * restrictions on the elements that they may contain.
     */
    boolean add(E e);


    /**
     * Removes the specified element from this set if it is present
     * (optional operation).  More formally, removes an element {@code e}
     * such that
     * {@code Objects.equals(o, e)}, if
     * this set contains such an element.  Returns {@code true} if this set
     * contained the element (or equivalently, if this set changed as a
     * result of the call).  (This set will not contain the element once the
     * call returns.)
     */
    boolean remove(Object o);


    // Bulk Operations

    /**
     * Returns {@code true} if this set contains all of the elements of the
     * specified collection.  If the specified collection is also a set, this
     * method returns {@code true} if it is a <i>subset</i> of this set.
     */
    boolean containsAll(Collection<?> c);

    /**
     * Adds all of the elements in the specified collection to this set if
     * they're not already present (optional operation).  If the specified
     * collection is also a set, the {@code addAll} operation effectively
     * modifies this set so that its value is the <i>union</i> of the two
     * sets.  The behavior of this operation is undefined if the specified
     * collection is modified while the operation is in progress.
     */
    boolean addAll(Collection<? extends E> c);

    /**
     * Retains only the elements in this set that are contained in the
     * specified collection (optional operation).  In other words, removes
     * from this set all of its elements that are not contained in the
     * specified collection.  If the specified collection is also a set, this
     * operation effectively modifies this set so that its value is the
     * <i>intersection</i> of the two sets.
     */
    boolean retainAll(Collection<?> c);

    /**
     * Removes from this set all of its elements that are contained in the
     * specified collection (optional operation).  If the specified
     * collection is also a set, this operation effectively modifies this
     * set so that its value is the <i>asymmetric set difference</i> of
     * the two sets.
     */
    boolean removeAll(Collection<?> c);

    /**
     * Removes all of the elements from this set (optional operation).
     * The set will be empty after this call returns.
     *
     * @throws UnsupportedOperationException if the {@code clear} method
     *         is not supported by this set
     */
    void clear();


    // Comparison and hashing

    /**
     * Compares the specified object with this set for equality.  Returns
     * {@code true} if the specified object is also a set, the two sets
     * have the same size, and every member of the specified set is
     * contained in this set (or equivalently, every member of this set is
     * contained in the specified set).  This definition ensures that the
     * equals method works properly across different implementations of the
     * set interface.
     */
    boolean equals(Object o);

    /**
     * Returns the hash code value for this set.  The hash code of a set is
     * defined to be the sum of the hash codes of the elements in the set,
     * where the hash code of a {@code null} element is defined to be zero.
     * This ensures that {@code s1.equals(s2)} implies that
     * {@code s1.hashCode()==s2.hashCode()} for any two sets {@code s1}
     * and {@code s2}, as required by the general contract of
     * {@link Object#hashCode}.
     */
    int hashCode();

    /**
     * Creates a {@code Spliterator} over the elements in this set.
     *
     * <p>The {@code Spliterator} reports {@link Spliterator#DISTINCT}.
     * Implementations should document the reporting of additional
     * characteristic values.
     *
     * @implSpec
     * The default implementation creates a
     * <em><a href="Spliterator.html#binding">late-binding</a></em> spliterator
     * from the set's {@code Iterator}.  The spliterator inherits the
     * <em>fail-fast</em> properties of the set's iterator.
     * <p>
     * The created {@code Spliterator} additionally reports
     * {@link Spliterator#SIZED}.
     *
     * @implNote
     * The created {@code Spliterator} additionally reports
     * {@link Spliterator#SUBSIZED}.
     */
    @Override
    default Spliterator<E> spliterator() { /* body omitted */ }

    /**
     * Returns an unmodifiable set containing zero elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    @SuppressWarnings("unchecked")
    static <E> Set<E> of() { /* body omitted */ }

    /**
     * Returns an unmodifiable set containing one element.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    static <E> Set<E> of(E e1) { /* body omitted */ }

    /**
     * Returns an unmodifiable set containing two elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    static <E> Set<E> of(E e1, E e2) { /* body omitted */ }

    /**
     * Returns an unmodifiable set containing three elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    static <E> Set<E> of(E e1, E e2, E e3) { /* body omitted */ }

    /**
     * Returns an unmodifiable set containing four elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    static <E> Set<E> of(E e1, E e2, E e3, E e4) {
        return new ImmutableCollections.SetN<>(e1, e2, e3, e4);
    }

    /**
     * Returns an unmodifiable set containing five elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    static <E> Set<E> of(E e1, E e2, E e3, E e4, E e5) { /* body omitted */ }

    /**
     * Returns an unmodifiable set containing six elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    static <E> Set<E> of(E e1, E e2, E e3, E e4, E e5, E e6) { /* body omitted */ }

    /**
     * Returns an unmodifiable set containing seven elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    static <E> Set<E> of(E e1, E e2, E e3, E e4, E e5, E e6, E e7) { /* body omitted */ }

    /**
     * Returns an unmodifiable set containing eight elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    static <E> Set<E> of(E e1, E e2, E e3, E e4, E e5, E e6, E e7, E e8) { /* body omitted */ }

    /**
     * Returns an unmodifiable set containing nine elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    static <E> Set<E> of(E e1, E e2, E e3, E e4, E e5, E e6, E e7, E e8, E e9) { /* body omitted */ }

    /**
     * Returns an unmodifiable set containing ten elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    static <E> Set<E> of(E e1, E e2, E e3, E e4, E e5, E e6, E e7, E e8, E e9, E e10) { /* body omitted */ }

    /**
     * Returns an unmodifiable set containing an arbitrary number of elements.
     * See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
     * @since 9
     */
    @SafeVarargs
    @SuppressWarnings("varargs")
    static <E> Set<E> of(E... elements) { /* body omitted */ }

    /**
     * Returns an <a href="#unmodifiable">unmodifiable Set</a> containing the elements
     * of the given Collection. The given Collection must not be null, and it must not
     * contain any null elements. If the given Collection contains duplicate elements,
     * an arbitrary element of the duplicates is preserved. If the given Collection is
     * subsequently modified, the returned Set will not reflect such modifications.
     *
     * @implNote
     * If the given Collection is an <a href="#unmodifiable">unmodifiable Set</a>,
     * calling copyOf will generally not create a copy.
     * @since 10
     */
    @SuppressWarnings("unchecked")
    static <E> Set<E> copyOf(Collection<? extends E> coll) { /* body omitted */ }
}
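A few points of this contract can be checked directly. The following is a minimal sketch, assuming Java 9 or later for Set.of; the class name SetContractDemo and its helper methods are made up for illustration. It checks that equality holds across different implementations, that a set's hash code is the sum of its elements' hash codes, and that sets returned by Set.of reject modification.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class; the names here are illustrative only.
class SetContractDemo {
    // Two sets are equal when they hold the same elements, whatever the implementation.
    static boolean equalAcrossImplementations() {
        Set<Integer> hash = new HashSet<>(Set.of(1, 2, 3));
        Set<Integer> tree = new TreeSet<>(Set.of(3, 2, 1));
        return hash.equals(tree) && hash.hashCode() == tree.hashCode();
    }

    // The hash code of a set is the sum of its elements' hash codes
    // (an Integer's hash code is its value, so this is 1 + 2 + 3).
    static int hashCodeOf123() {
        return new HashSet<>(Set.of(1, 2, 3)).hashCode();
    }

    // Sets returned by Set.of are unmodifiable: add throws.
    static boolean setOfIsUnmodifiable() {
        try {
            Set.of("a").add("b");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(equalAcrossImplementations());
        System.out.println(hashCodeOf123());
        System.out.println(setOfIsUnmodifiable());
    }
}
```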

HashSet has the following constructors:

public HashSet(int initialCapacity);
public HashSet(int initialCapacity, float loadFactor);
public HashSet(Collection<? extends E> c);
public HashSet();

initialCapacity and loadFactor have the same meaning as in HashMap.

HashSet is straightforward to use. In the code below, hello is added twice, but only one copy is stored.

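    // Needs java.util.Arrays and java.util.HashSet, plus JUnit's @Test
    // and a static import of assertTrue.
    @Test
    public void testContractor() {
        HashSet<String> set = new HashSet<>();
        set.add("hello");
        set.add("world");
        // "hello" is already present, so only "你好" is actually added.
        set.addAll(Arrays.asList("hello", "你好"));
        String[] stringArray = { "hello", "world", "你好" };
        for (String s : stringArray) {
            set.remove(s);
        }
        // Removing each distinct element once leaves the set empty.
        assertTrue(set.isEmpty());
    }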

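As with HashMap, HashSet requires its elements to override hashCode and equals, and for any two objects, if equals returns true then their hashCode values must also be the same. Keep this in mind when the elements are instances of a custom class.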
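To see why this matters, here is a small sketch with two made-up classes: Point overrides both methods, while RawPoint inherits the identity-based versions from Object. All names here are hypothetical and exist only for this illustration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical demo class; Point and RawPoint are illustrative only.
class HashSetEqualityDemo {
    // Value class that overrides both equals and hashCode consistently.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override public int hashCode() { return Objects.hash(x, y); }
    }

    // Class that keeps Object's identity-based equals and hashCode.
    static final class RawPoint {
        final int x, y;
        RawPoint(int x, int y) { this.x = x; this.y = y; }
    }

    static int sizeWithOverrides() {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        set.add(new Point(1, 2)); // equal to the first, so it is not stored again
        return set.size();
    }

    static int sizeWithoutOverrides() {
        Set<RawPoint> set = new HashSet<>();
        set.add(new RawPoint(1, 2));
        set.add(new RawPoint(1, 2)); // a different object, so it is stored as well
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeWithOverrides());
        System.out.println(sizeWithoutOverrides());
    }
}
```

With the overrides in place the set treats the two points as one element; without them, two logically identical points end up as separate elements.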

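HashSet has many uses, for example:

  1. Deduplication: if the order of the elements after deduplication does not matter, HashSet is a convenient way to remove duplicates;
  2. Holding special values: a Set can hold special values of any kind, so that when a program handles a user request or a data record it can decide whether special handling is needed by checking whether the value is in the set, for example a blacklist or whitelist of IP addresses;
  3. Set operations: a Set makes mathematical set operations such as intersection and union easy, and these operations have practical meaning. Take user tags: each user has a set of tags, the intersection of two users' tags represents what they have in common, and the size of the intersection divided by the size of the union can serve as a measure of how similar they are.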
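The deduplication and tag-similarity ideas above can be sketched as follows. The class SetOperationsDemo and its method names are hypothetical, and returning 0.0 when both sets are empty is an assumption made here, not something the article specifies.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class; the names are illustrative only.
class SetOperationsDemo {
    // Deduplicate by copying into a HashSet (the original order is not kept).
    static <T> Set<T> deduplicate(List<T> items) {
        return new HashSet<>(items);
    }

    // Intersection: copy a, then keep only the elements also in b.
    static <T> Set<T> intersection(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Union: copy a, then add every element of b.
    static <T> Set<T> union(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.addAll(b);
        return result;
    }

    // Similarity: size of the intersection divided by size of the union.
    static <T> double similarity(Set<T> a, Set<T> b) {
        Set<T> all = union(a, b);
        if (all.isEmpty()) {
            return 0.0; // assumption: two empty sets are treated as not similar
        }
        return (double) intersection(a, b).size() / all.size();
    }

    public static void main(String[] args) {
        Set<String> alice = new HashSet<>(Arrays.asList("java", "music", "hiking"));
        Set<String> bob = new HashSet<>(Arrays.asList("java", "hiking", "cooking"));
        System.out.println(intersection(alice, bob)); // the shared tags
        System.out.println(similarity(alice, bob));   // 2 shared out of 4 distinct tags
    }
}
```

The helpers copy their first argument before calling retainAll or addAll, because those methods modify the set they are called on.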

2.2 How It Works

Internally, HashSet is implemented with a HashMap. It holds a HashMap instance variable, declared as follows:

private transient HashMap<E,Object> map;

As we know, a HashMap has keys and values. A HashSet in effect has only keys; every value is the same fixed object, defined as:

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();

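With this internal structure in mind, the implementation is easy to follow, so let's look at the code. The HashSet constructors mainly call the corresponding HashMap constructors, for example: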

    public HashSet(int initialCapacity, float loadFactor) {
        map = new HashMap<>(initialCapacity, loadFactor);
    }

The constructor that takes a Collection is slightly different:

    public HashSet(Collection<? extends E> c) {
        map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
        addAll(c);
    }

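This is also easy to follow: c.size()/.75f computes the initialCapacity, where 0.75f is the default loadFactor, so that all elements of c fit without triggering a resize. The capacity is never set below 16.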
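The arithmetic can be tried out in isolation. The sketch below copies the expression from the constructor quoted above into a hypothetical helper, initialCapacityFor; it is not a JDK method. HashMap itself then rounds the requested capacity up to a power of two.

```java
// Hypothetical demo class; initialCapacityFor is not part of the JDK.
class InitialCapacityDemo {
    // Same expression as in the HashSet(Collection) constructor quoted above.
    static int initialCapacityFor(int size) {
        return Math.max((int) (size / .75f) + 1, 16);
    }

    public static void main(String[] args) {
        System.out.println(initialCapacityFor(3));   // small collections get the minimum of 16
        System.out.println(initialCapacityFor(12));  // 12 / 0.75 = 16, plus 1
        System.out.println(initialCapacityFor(100)); // 100 / 0.75 = 133.33, truncated, plus 1
    }
}
```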

Now for the add method:

    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

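This simply calls the map's put method, with the element e as the key and the fixed object PRESENT as the value. A return value of null from put means the key was not previously present, so the element was added. HashMap keeps only one entry per key, so adding a duplicate leaves the HashMap unchanged.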
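The link between put's return value and add's result can be observed with a plain HashMap. This is a minimal sketch; the class AddReturnValueDemo and its local PRESENT constant are stand-ins for illustration, not the JDK's own fields.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; PRESENT here is a local stand-in for HashSet's private field.
class AddReturnValueDemo {
    private static final Object PRESENT = new Object();

    // put returns null the first time (no previous value) and PRESENT the second time.
    static boolean[] putTwice() {
        HashMap<String, Object> map = new HashMap<>();
        boolean first = map.put("hello", PRESENT) == null;
        boolean second = map.put("hello", PRESENT) == null;
        return new boolean[] { first, second };
    }

    // HashSet.add reports the same thing: true when added, false for a duplicate.
    static boolean[] addTwice() {
        Set<String> set = new HashSet<>();
        boolean first = set.add("hello");
        boolean second = set.add("hello");
        return new boolean[] { first, second };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(putTwice()));
        System.out.println(java.util.Arrays.toString(addTwice()));
    }
}
```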

Checking whether an element is present:

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

This simply checks whether the map contains the corresponding key.

Removing an element:

    public boolean remove(Object o) {
        return map.remove(o)==PRESENT;
    }

This simply calls the map's remove method. A return value of PRESENT means the key was present and has been removed.
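The behaviour of contains and remove is easy to confirm from the caller's side. A short sketch follows; the class name RemoveReturnValueDemo is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; the names are illustrative only.
class RemoveReturnValueDemo {
    // Returns { contains before, first remove, second remove, contains after }.
    static boolean[] removeTwice() {
        Set<String> set = new HashSet<>();
        set.add("hello");
        boolean before = set.contains("hello");
        boolean first = set.remove("hello");  // the element was present and is removed
        boolean second = set.remove("hello"); // nothing left to remove
        boolean after = set.contains("hello");
        return new boolean[] { before, first, second, after };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(removeTwice()));
    }
}
```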

The iterator:

    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }

It simply returns the iterator of the map's keySet.
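Putting the pieces together, the delegation pattern can be sketched as a tiny class of its own. MiniHashSet is a hypothetical, simplified stand-in written for this note; it mirrors the methods quoted above but is not the JDK implementation and leaves out most of the Set interface.

```java
import java.util.HashMap;
import java.util.Iterator;

// Hypothetical, simplified set backed by a HashMap, for illustration only.
class MiniHashSet<E> implements Iterable<E> {
    // Shared dummy value stored against every key.
    private static final Object PRESENT = new Object();

    private final HashMap<E, Object> map = new HashMap<>();

    // put returns null only when the key was absent, so null means "newly added".
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    // remove returns the old value, which is PRESENT when the key existed.
    public boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }

    public int size() {
        return map.size();
    }

    // Iterating the set is iterating the map's keys.
    @Override
    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }
}
```

Because the class implements Iterable, an instance can be used in a for-each loop, for example `for (String s : miniSet) { ... }`.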

2.3 Summary

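HashSet implements the Set interface and is built internally on HashMap. It has the following characteristics:

  1. It holds no duplicate elements;
  2. Adding an element, removing an element, and checking whether an element exists are all efficient, each taking O(1) time on average;
  3. It does not guarantee any ordering.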

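HashSet makes deduplication, set operations, and similar tasks convenient and efficient. To preserve insertion order, use LinkedHashSet, a subclass of HashSet. Another important Set implementation is TreeSet, which keeps its elements sorted.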
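The difference between the ordered variants can be seen in a short sketch. The class OrderedSetsDemo and its method names are hypothetical; each method deduplicates a list and returns the elements in the order the chosen set iterates them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

// Hypothetical demo class; the names are illustrative only.
class OrderedSetsDemo {
    // LinkedHashSet iterates in insertion order (first occurrence wins).
    static List<String> insertionOrder(List<String> items) {
        return new ArrayList<>(new LinkedHashSet<>(items));
    }

    // TreeSet iterates in sorted order (natural ordering for String).
    static List<String> sortedOrder(List<String> items) {
        return new ArrayList<>(new TreeSet<>(items));
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("banana", "apple", "cherry", "apple");
        System.out.println(insertionOrder(items)); // [banana, apple, cherry]
        System.out.println(sortedOrder(items));    // [apple, banana, cherry]
    }
}
```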


  1. 马俊昌. Java编程的逻辑 [M]. Beijing: 机械工业出版社, 2018.

  2. 尚硅谷教育. 剑指Java:核心原理与应用实践 [M]. Beijing: 电子工业出版社, 2023.
