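Dissecting HashSet
This article is a set of reading notes on the books 《Java编程的逻辑》[1] and 《剑指Java:核心原理与应用实践》[2].
HashSet implements the Set interface, and its implementation is based on hashing.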
2.1 The Set interface
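Set is a container interface for collections that contain no duplicate elements and guarantee no particular order. It extends Collection and adds no new instance methods (the static factory methods of and copyOf were added in Java 9 and Java 10), but it specifies its own contract for some of the inherited methods. The complete definition of the Set interface is shown below.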
public interface Set<E> extends Collection<E> {
/**
* Returns the number of elements in this set (its cardinality). If this
* set contains more than {@code Integer.MAX_VALUE} elements, returns
* {@code Integer.MAX_VALUE}.
*/
int size();
/**
* Returns {@code true} if this set contains no elements.
*/
boolean isEmpty();
/**
* Returns {@code true} if this set contains the specified element.
* More formally, returns {@code true} if and only if this set
* contains an element {@code e} such that
* {@code Objects.equals(o, e)}.
*/
boolean contains(Object o);
/**
* Returns an iterator over the elements in this set. The elements are
* returned in no particular order (unless this set is an instance of some
* class that provides a guarantee).
*/
Iterator<E> iterator();
/**
* Returns an array containing all of the elements in this set.
* If this set makes any guarantees as to what order its elements
* are returned by its iterator, this method must return the
* elements in the same order.
*/
Object[] toArray();
/**
* Returns an array containing all of the elements in this set; the
* runtime type of the returned array is that of the specified array.
* If the set fits in the specified array, it is returned therein.
* Otherwise, a new array is allocated with the runtime type of the
* specified array and the size of this set.
*
* <p>If this set fits in the specified array with room to spare
* (i.e., the array has more elements than this set), the element in
* the array immediately following the end of the set is set to
* {@code null}. (This is useful in determining the length of this
* set <i>only</i> if the caller knows that this set does not contain
* any null elements.)
*
* <p>If this set makes any guarantees as to what order its elements
* are returned by its iterator, this method must return the elements
* in the same order.
*
* <p>Like the {@link #toArray()} method, this method acts as bridge between
* array-based and collection-based APIs. Further, this method allows
* precise control over the runtime type of the output array, and may,
* under certain circumstances, be used to save allocation costs.
*
* <p>Suppose {@code x} is a set known to contain only strings.
* The following code can be used to dump the set into a newly allocated
* array of {@code String}:
*
* <pre>
* String[] y = x.toArray(new String[0]);</pre>
*
* Note that {@code toArray(new Object[0])} is identical in function to
* {@code toArray()}.
*/
<T> T[] toArray(T[] a);
// Modification Operations
/**
* Adds the specified element to this set if it is not already present
* (optional operation). More formally, adds the specified element
* {@code e} to this set if the set contains no element {@code e2}
* such that
* {@code Objects.equals(e, e2)}.
* If this set already contains the element, the call leaves the set
* unchanged and returns {@code false}. In combination with the
* restriction on constructors, this ensures that sets never contain
* duplicate elements.
*
* <p>The stipulation above does not imply that sets must accept all
* elements; sets may refuse to add any particular element, including
* {@code null}, and throw an exception, as described in the
* specification for {@link Collection#add Collection.add}.
* Individual set implementations should clearly document any
* restrictions on the elements that they may contain.
*/
boolean add(E e);
/**
* Removes the specified element from this set if it is present
* (optional operation). More formally, removes an element {@code e}
* such that
* {@code Objects.equals(o, e)}, if
* this set contains such an element. Returns {@code true} if this set
* contained the element (or equivalently, if this set changed as a
* result of the call). (This set will not contain the element once the
* call returns.)
*/
boolean remove(Object o);
// Bulk Operations
/**
* Returns {@code true} if this set contains all of the elements of the
* specified collection. If the specified collection is also a set, this
* method returns {@code true} if it is a <i>subset</i> of this set.
*/
boolean containsAll(Collection<?> c);
/**
* Adds all of the elements in the specified collection to this set if
* they're not already present (optional operation). If the specified
* collection is also a set, the {@code addAll} operation effectively
* modifies this set so that its value is the <i>union</i> of the two
* sets. The behavior of this operation is undefined if the specified
* collection is modified while the operation is in progress.
*/
boolean addAll(Collection<? extends E> c);
/**
* Retains only the elements in this set that are contained in the
* specified collection (optional operation). In other words, removes
* from this set all of its elements that are not contained in the
* specified collection. If the specified collection is also a set, this
* operation effectively modifies this set so that its value is the
* <i>intersection</i> of the two sets.
*/
boolean retainAll(Collection<?> c);
/**
* Removes from this set all of its elements that are contained in the
* specified collection (optional operation). If the specified
* collection is also a set, this operation effectively modifies this
* set so that its value is the <i>asymmetric set difference</i> of
* the two sets.
*/
boolean removeAll(Collection<?> c);
/**
* Removes all of the elements from this set (optional operation).
* The set will be empty after this call returns.
*
* @throws UnsupportedOperationException if the {@code clear} method
* is not supported by this set
*/
void clear();
// Comparison and hashing
/**
* Compares the specified object with this set for equality. Returns
* {@code true} if the specified object is also a set, the two sets
* have the same size, and every member of the specified set is
* contained in this set (or equivalently, every member of this set is
* contained in the specified set). This definition ensures that the
* equals method works properly across different implementations of the
* set interface.
*/
boolean equals(Object o);
/**
* Returns the hash code value for this set. The hash code of a set is
* defined to be the sum of the hash codes of the elements in the set,
* where the hash code of a {@code null} element is defined to be zero.
* This ensures that {@code s1.equals(s2)} implies that
* {@code s1.hashCode()==s2.hashCode()} for any two sets {@code s1}
* and {@code s2}, as required by the general contract of
* {@link Object#hashCode}.
*/
int hashCode();
/**
* Creates a {@code Spliterator} over the elements in this set.
*
* <p>The {@code Spliterator} reports {@link Spliterator#DISTINCT}.
* Implementations should document the reporting of additional
* characteristic values.
*
* @implSpec
* The default implementation creates a
* <em><a href="Spliterator.html#binding">late-binding</a></em> spliterator
* from the set's {@code Iterator}. The spliterator inherits the
* <em>fail-fast</em> properties of the set's iterator.
* <p>
* The created {@code Spliterator} additionally reports
* {@link Spliterator#SIZED}.
*
* @implNote
* The created {@code Spliterator} additionally reports
* {@link Spliterator#SUBSIZED}.
*/
@Override
default Spliterator<E> spliterator() { /* body omitted */ }
/**
* Returns an unmodifiable set containing zero elements.
* See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
* @since 9
*/
@SuppressWarnings("unchecked")
static <E> Set<E> of() { /* body omitted */ }
/**
* Returns an unmodifiable set containing one element.
* See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
* @since 9
*/
static <E> Set<E> of(E e1) { /* body omitted */ }
/**
* Returns an unmodifiable set containing two elements.
* See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
* @since 9
*/
static <E> Set<E> of(E e1, E e2) { /* body omitted */ }
/**
* Returns an unmodifiable set containing three elements.
* See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
* @since 9
*/
static <E> Set<E> of(E e1, E e2, E e3) { /* body omitted */ }
/**
* Returns an unmodifiable set containing four elements.
* See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
* @since 9
*/
static <E> Set<E> of(E e1, E e2, E e3, E e4) {
return new ImmutableCollections.SetN<>(e1, e2, e3, e4);
}
/**
* Returns an unmodifiable set containing five elements.
* See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
* @since 9
*/
static <E> Set<E> of(E e1, E e2, E e3, E e4, E e5) { /* body omitted */ }
/**
* Returns an unmodifiable set containing six elements.
* See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
* @since 9
*/
static <E> Set<E> of(E e1, E e2, E e3, E e4, E e5, E e6) { /* body omitted */ }
/**
* Returns an unmodifiable set containing seven elements.
* See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
* @since 9
*/
static <E> Set<E> of(E e1, E e2, E e3, E e4, E e5, E e6, E e7) { /* body omitted */ }
/**
* Returns an unmodifiable set containing eight elements.
* See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
* @since 9
*/
static <E> Set<E> of(E e1, E e2, E e3, E e4, E e5, E e6, E e7, E e8) { /* body omitted */ }
/**
* Returns an unmodifiable set containing nine elements.
* See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
* @since 9
*/
static <E> Set<E> of(E e1, E e2, E e3, E e4, E e5, E e6, E e7, E e8, E e9) { /* body omitted */ }
/**
* Returns an unmodifiable set containing ten elements.
* See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
* @since 9
*/
static <E> Set<E> of(E e1, E e2, E e3, E e4, E e5, E e6, E e7, E e8, E e9, E e10) { /* body omitted */ }
/**
* Returns an unmodifiable set containing an arbitrary number of elements.
* See <a href="#unmodifiable">Unmodifiable Sets</a> for details.
* @since 9
*/
@SafeVarargs
@SuppressWarnings("varargs")
static <E> Set<E> of(E... elements) { /* body omitted */ }
/**
* Returns an <a href="#unmodifiable">unmodifiable Set</a> containing the elements
* of the given Collection. The given Collection must not be null, and it must not
* contain any null elements. If the given Collection contains duplicate elements,
* an arbitrary element of the duplicates is preserved. If the given Collection is
* subsequently modified, the returned Set will not reflect such modifications.
*
* @implNote
* If the given Collection is an <a href="#unmodifiable">unmodifiable Set</a>,
* calling copyOf will generally not create a copy.
* @since 10
*/
@SuppressWarnings("unchecked")
static <E> Set<E> copyOf(Collection<? extends E> coll) { /* body omitted */ }
}
HashSet has the following constructors:
public HashSet(int initialCapacity);
public HashSet(int initialCapacity, float loadFactor);
public HashSet(Collection<? extends E> c);
public HashSet();
initialCapacity and loadFactor have the same meaning as in HashMap.
HashSet is simple to use. In the code below, hello is added twice, but only one copy is stored.
@Test
public void testContractor() {
    HashSet<String> set = new HashSet<>();
    set.add("hello");
    set.add("world");
    // "hello" is already present, so only "你好" is actually added here
    set.addAll(Arrays.asList("hello", "你好"));
    String[] stringArray = { "hello", "world", "你好" };
    // One removal per distinct value is enough to empty the set
    for (String s : stringArray) {
        set.remove(s);
    }
    assertTrue(set.isEmpty());
}
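Like HashMap, HashSet requires element classes to override hashCode and equals consistently: if two objects are equal according to equals, their hashCode values must also be equal. Keep this in mind when the elements are instances of a custom class.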
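The requirement above can be illustrated with a small sketch. Point and PointDemo are hypothetical names invented for this example, not taken from the books; the point is only that equals and hashCode are derived from the same fields, so two equal objects land in the same bucket and are stored once.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class PointDemo {
    // Hypothetical element type, used only for illustration.
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            // Built from the same fields as equals, so equal points hash alike.
            return Objects.hash(x, y);
        }
    }

    // Adds two equal points and returns the resulting set size.
    static int sizeAfterAddingEqualPoints() {
        Set<Point> points = new HashSet<>();
        points.add(new Point(1, 2));
        points.add(new Point(1, 2)); // equal to the first, so not stored again
        return points.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAddingEqualPoints()); // prints 1
    }
}
```

If hashCode were left at the default inherited from Object, the two points would almost certainly hash differently and the set would end up holding both.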
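HashSet has many uses, for example:
- Deduplication: if the order of the deduplicated elements does not matter, HashSet is a convenient way to remove duplicates;
- Storing special values: a Set can hold special values such as an IP address blacklist or whitelist, so that a program handling user requests or data records can check whether a value is special and process it accordingly;
- Set operations: a Set makes it easy to perform mathematical set operations such as intersection and union, which have practical uses. In user tag analysis, for example, each user has a set of tags; the intersection of two users' tags represents what they have in common, and the size of the intersection divided by the size of the union can serve as a measure of their similarity.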
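The tag-similarity idea in the last bullet can be sketched as follows. The class name TagSimilarity, the helper methods, and the sample tags are all made up for illustration. Returning 0 when both sets are empty is an assumption of this sketch, since the text does not say how that case should be handled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class TagSimilarity {
    // Returns a new set holding the elements common to a and b.
    static <T> Set<T> intersection(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Returns a new set holding every element of a or b.
    static <T> Set<T> union(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.addAll(b);
        return result;
    }

    // Intersection size divided by union size.
    // Assumption: two empty sets are given similarity 0.
    static <T> double similarity(Set<T> a, Set<T> b) {
        Set<T> all = union(a, b);
        if (all.isEmpty()) {
            return 0.0;
        }
        return (double) intersection(a, b).size() / all.size();
    }

    public static void main(String[] args) {
        Set<String> alice = new HashSet<>(Arrays.asList("java", "music", "hiking"));
        Set<String> bob = new HashSet<>(Arrays.asList("java", "hiking", "chess"));
        System.out.println(intersection(alice, bob).size()); // prints 2
        System.out.println(similarity(alice, bob));          // prints 0.5
    }
}
```

Copying into a new HashSet before calling retainAll or addAll keeps the inputs unchanged, because both methods modify the set they are called on.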
2.2 How it works
Internally, HashSet is implemented with a HashMap, which it holds in an instance variable:
private transient HashMap<E,Object> map;
A HashMap has keys and values. A HashSet effectively has only keys: every key maps to the same fixed value, which is defined as:
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
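With this internal structure in mind, the implementation is easy to follow. Let's look at the code. The HashSet constructors mostly just call the corresponding HashMap constructor, for example: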
public HashSet(int initialCapacity, float loadFactor) {
map = new HashMap<>(initialCapacity, loadFactor);
}
The constructor that takes a Collection is slightly different:
public HashSet(Collection<? extends E> c) {
map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
addAll(c);
}
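This is also easy to understand: c.size()/.75f is used to compute initialCapacity, and 0.75f is the default value of loadFactor. Dividing by the load factor requests a capacity large enough to hold every element of c without resizing, with a minimum of 16.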
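To make the arithmetic concrete, the sketch below evaluates the same expression for a few collection sizes. InitialCapacityDemo and initialCapacityFor are hypothetical names; only the expression itself comes from the constructor shown above. HashMap may round the requested capacity further internally, which this sketch does not model.

```java
class InitialCapacityDemo {
    // Same expression as in the HashSet(Collection) constructor,
    // with the collection size passed in directly.
    static int initialCapacityFor(int size) {
        return Math.max((int) (size / .75f) + 1, 16);
    }

    public static void main(String[] args) {
        System.out.println(initialCapacityFor(3));   // prints 16  (3 / 0.75 = 4, 4 + 1 = 5, floor of 16 applies)
        System.out.println(initialCapacityFor(12));  // prints 17  (12 / 0.75 = 16, 16 + 1 = 17)
        System.out.println(initialCapacityFor(100)); // prints 134 (100 / 0.75 = 133.33, truncated to 133, + 1)
    }
}
```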
Here is the code for the add method:
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
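It simply calls the put method of map, with the element e as the key and the fixed value PRESENT as the value. A return value of null from put means the key was not present before, so the element was added. A HashMap stores each key only once, so adding a duplicate does not change the set of keys in the HashMap.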
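The effect of that return value is visible from the outside. The following sketch uses the real java.util.HashSet; AddReturnValueDemo and addTwice are hypothetical names chosen for this example.

```java
import java.util.HashSet;
import java.util.Set;

class AddReturnValueDemo {
    // Adds the same element twice and reports what each call returned.
    static boolean[] addTwice(Set<String> set, String element) {
        boolean first = set.add(element);  // key absent, so put returned null
        boolean second = set.add(element); // key present, so put returned the old value
        return new boolean[] { first, second };
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        boolean[] results = addTwice(set, "hello");
        System.out.println(results[0]); // prints true
        System.out.println(results[1]); // prints false
        System.out.println(set.size()); // prints 1
    }
}
```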
The code for checking whether an element is present is:
public boolean contains(Object o) {
return map.containsKey(o);
}
It simply checks whether map contains the corresponding key.
The code for removing an element is:
public boolean remove(Object o) {
return map.remove(o)==PRESENT;
}
It calls the remove method of map; a return value of PRESENT means the key was present and has been removed.
The code for the iterator is:
public Iterator<E> iterator() {
return map.keySet().iterator();
}
It simply returns the iterator of map's keySet.
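Putting the pieces from this section together, a set backed by a HashMap can be sketched in a few lines. MiniHashSet is a hypothetical teaching class, not the JDK source; it keeps only the methods discussed above and leaves out constructors with capacity arguments, serialization, and the rest of the Set interface.

```java
import java.util.HashMap;
import java.util.Iterator;

// A minimal sketch of the delegation pattern described above (not the JDK class).
class MiniHashSet<E> implements Iterable<E> {
    // Shared dummy value stored for every key.
    private static final Object PRESENT = new Object();

    private final HashMap<E, Object> map = new HashMap<>();

    // True only if the key was absent before the call.
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    // True only if the key was present and has now been removed.
    public boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }

    public int size() {
        return map.size();
    }

    // Iterating the set means iterating the map's keys.
    @Override
    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }
}
```

A short usage example: after `MiniHashSet<String> s = new MiniHashSet<>();`, the calls `s.add("a")` and `s.add("a")` return true and false, and `s.remove("a")` called twice returns true and then false.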
2.3 Summary
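HashSet implements the Set interface and is built on HashMap internally. It has the following characteristics:
- No duplicate elements;
- Efficient insertion, removal, and membership checks, each O(1) on average;
- No ordering.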
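HashSet makes deduplication, set operations, and similar tasks convenient and efficient. To preserve insertion order, use LinkedHashSet, a subclass of HashSet. Set has another important implementation, TreeSet, which keeps its elements sorted.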
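The difference between the two ordered alternatives can be seen in a short sketch. OrderedSetsDemo and its helper methods are hypothetical names, and the sample strings are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

class OrderedSetsDemo {
    // Deduplicates while keeping the order of first insertion.
    static List<String> insertionOrder(List<String> items) {
        return new ArrayList<>(new LinkedHashSet<>(items));
    }

    // Deduplicates and sorts by the natural ordering of String.
    static List<String> sortedOrder(List<String> items) {
        return new ArrayList<>(new TreeSet<>(items));
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("pear", "apple", "pear", "fig");
        System.out.println(insertionOrder(items)); // prints [pear, apple, fig]
        System.out.println(sortedOrder(items));    // prints [apple, fig, pear]
    }
}
```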