A Set contains no duplicate elements. That is one of the major reasons to use a set. There are 3 commonly used implementations of Set: HashSet, TreeSet and LinkedHashSet. When and which to use is an important question. In brief, if you need a fast set, you should use HashSet; if you need a sorted set, then TreeSet should be used; if you need a set that preserves insertion order, LinkedHashSet should be used.
1. Set Interface
The Set interface extends the Collection interface. In a set, no duplicates are allowed: every element must be unique. You can simply add elements to a set; adding an element that is already present leaves the set unchanged, and add() returns false.
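The duplicate handling described above can be sketched as follows. The class name SetDuplicateDemo and the helper distinctCount are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class SetDuplicateDemo {
    // Hypothetical helper: number of distinct values among the arguments.
    static int distinctCount(String... values) {
        Set<String> set = new HashSet<>();
        for (String v : values) {
            set.add(v); // a repeated value is ignored; add() returns false
        }
        return set.size();
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        System.out.println(set.add("a")); // true: "a" was not present yet
        System.out.println(set.add("a")); // false: duplicate, set unchanged
        System.out.println(distinctCount("a", "b", "a")); // 2
    }
}
```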
2. HashSet vs. TreeSet vs. LinkedHashSet
HashSet is implemented using a hash table. Elements are not ordered. The add, remove, and contains methods have constant time complexity O(1) on average, assuming the hash function spreads the elements well across the buckets.
TreeSet is implemented using a tree structure (a red-black tree, as described in algorithm textbooks). The elements in the set are kept sorted, but the add, remove, and contains methods have a time complexity of O(log n). It offers several methods for working with the ordered set, such as first(), last(), headSet(), tailSet(), etc.
LinkedHashSet is between HashSet and TreeSet. It is implemented as a hash table with a linked list running through it, so it iterates over its elements in insertion order. The time complexity of the basic methods is O(1).
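The TreeSet navigation methods mentioned above can be sketched as follows. The class name TreeSetNavigationDemo and the sample values are chosen only for illustration.

```java
import java.util.Arrays;
import java.util.TreeSet;

class TreeSetNavigationDemo {
    // Hypothetical sample data, inserted in unsorted order.
    static TreeSet<Integer> sample() {
        return new TreeSet<>(Arrays.asList(45, 12, 63, 34));
    }

    public static void main(String[] args) {
        TreeSet<Integer> set = sample();
        System.out.println(set.first());     // 12, the smallest element
        System.out.println(set.last());      // 63, the largest element
        System.out.println(set.headSet(45)); // [12, 34], strictly less than 45
        System.out.println(set.tailSet(45)); // [45, 63], greater than or equal to 45
    }
}
```

Note that headSet() excludes its bound while tailSet() includes it.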
3. TreeSet Example
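A minimal sketch of such an example, assuming four integers are added in arbitrary order. The class name TreeSetIntDemo and the insertion order are assumptions.

```java
import java.util.TreeSet;

class TreeSetIntDemo {
    // Builds the line to print; iteration over a TreeSet is in ascending order.
    static String render() {
        TreeSet<Integer> tree = new TreeSet<>();
        tree.add(12);
        tree.add(63);
        tree.add(34);
        tree.add(45);

        StringBuilder sb = new StringBuilder("Tree set data:");
        for (Integer value : tree) {
            sb.append(' ').append(value);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```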
Output is sorted as follows:
Tree set data: 12 34 45 63
Now let's define a Dog class as follows:
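A minimal sketch of such a class, assuming a dog is described by a single int size field (the field name is an assumption):

```java
class Dog {
    int size; // assumed attribute, used later for ordering

    Dog(int size) {
        this.size = size;
    }

    @Override
    public String toString() {
        return String.valueOf(size);
    }
}
```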
Let's add some dogs to TreeSet like the following:
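A sketch of that step, assuming a Dog class with an int size field that does not implement Comparable. The class DogTreeSetDemo and the nested Dog are illustrative names.

```java
import java.util.TreeSet;

class DogTreeSetDemo {
    // Assumed shape of the Dog class; note that it is NOT Comparable.
    static class Dog {
        int size;

        Dog(int size) {
            this.size = size;
        }

        @Override
        public String toString() {
            return String.valueOf(size);
        }
    }

    public static void main(String[] args) {
        TreeSet<Dog> dogs = new TreeSet<>(); // compiles: TreeSet<E> has no bound on E
        // TreeSet has to compare elements to place them, and Dog offers
        // no ordering, so adding fails with ClassCastException.
        dogs.add(new Dog(2));
        dogs.add(new Dog(1));
        dogs.add(new Dog(3));
        System.out.println(dogs);
    }
}
```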
The code compiles, but a run-time error occurs:
Exception in thread "main" java.lang.ClassCastException: collection.Dog cannot be cast to java.lang.Comparable
    at java.util.TreeMap.put(Unknown Source)
    at java.util.TreeSet.add(Unknown Source)
    at collection.TestTreeSet.main(TestTreeSet.java:22)
Because TreeSet is sorted, the Dog class needs to implement java.lang.Comparable and its compareTo() method like the following:
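A minimal sketch, assuming dogs are ordered by an int size field. The names ComparableDogDemo, Dog and size are illustrative.

```java
import java.util.TreeSet;

class ComparableDogDemo {
    static class Dog implements Comparable<Dog> {
        int size; // assumed ordering key

        Dog(int size) {
            this.size = size;
        }

        @Override
        public int compareTo(Dog other) {
            // Integer.compare avoids the overflow risk of "this.size - other.size".
            return Integer.compare(this.size, other.size);
        }

        @Override
        public String toString() {
            return String.valueOf(size);
        }
    }

    // Adds three dogs in unsorted order and joins them in iteration order.
    static String render() {
        TreeSet<Dog> dogs = new TreeSet<>();
        dogs.add(new Dog(2));
        dogs.add(new Dog(1));
        dogs.add(new Dog(3));

        StringBuilder sb = new StringBuilder();
        for (Dog d : dogs) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(d);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```

Keep in mind that TreeSet uses compareTo() to detect duplicates as well: two dogs for which compareTo() returns 0 count as the same element.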
The output is:
1 2 3
4. HashSet Example
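A sketch of such an example, assuming Dog objects that do not override hashCode() or equals(). Their position in the table then depends on the identity hash, which can differ from run to run. The class names and sizes are assumptions.

```java
import java.util.HashSet;
import java.util.Set;

class HashSetDogDemo {
    // Assumed Dog class; hashCode()/equals() are inherited from Object.
    static class Dog {
        final int size;

        Dog(int size) {
            this.size = size;
        }

        @Override
        public String toString() {
            return String.valueOf(size);
        }
    }

    static Set<Dog> build() {
        Set<Dog> dogs = new HashSet<>();
        for (int size : new int[] {2, 1, 3, 5, 4}) {
            dogs.add(new Dog(size));
        }
        return dogs;
    }

    public static void main(String[] args) {
        // Iteration order follows the hash table layout, not insertion order.
        for (Dog d : build()) {
            System.out.print(d + " ");
        }
        System.out.println();
    }
}
```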
Output:
5 3 2 1 4
Note that the iteration order is not guaranteed and may differ between runs.
5. LinkedHashSet Example
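A sketch of such an example, assuming Dog objects with an int size field added in the order 2, 1, 3, 5, 4. The class names and values are assumptions.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class LinkedHashSetDogDemo {
    // Assumed Dog class with a single size field.
    static class Dog {
        final int size;

        Dog(int size) {
            this.size = size;
        }

        @Override
        public String toString() {
            return String.valueOf(size);
        }
    }

    // Adds dogs in a fixed order and joins them in iteration order.
    static String render() {
        Set<Dog> dogs = new LinkedHashSet<>();
        for (int size : new int[] {2, 1, 3, 5, 4}) {
            dogs.add(new Dog(size));
        }

        StringBuilder sb = new StringBuilder();
        for (Dog d : dogs) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(d);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```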
The order of the output is certain and it is the insertion order:
2 1 3 5 4
6. Performance testing
The following method tests the performance of the three classes on the add() method.
From the output below, we can clearly see that HashSet is the fastest one.
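A naive timing sketch along those lines, assuming 1000 random integers per set and System.nanoTime() for measurement. The element type, count, seed and the helper name timeAdds are all assumptions, and there is no JIT warm-up, so the numbers are only indicative.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

class SetAddBenchmark {
    // Hypothetical helper: elapsed nanoseconds for n add() calls on the given set.
    static long timeAdds(Set<Integer> set, int n, long seed) {
        Random random = new Random(seed); // same seed gives each set the same input
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            set.add(random.nextInt(1_000_000));
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 1000;     // assumed element count
        long seed = 42L;  // arbitrary seed
        System.out.println("HashSet: " + timeAdds(new HashSet<Integer>(), n, seed));
        System.out.println("TreeSet: " + timeAdds(new TreeSet<Integer>(), n, seed));
        System.out.println("LinkedHashSet: " + timeAdds(new LinkedHashSet<Integer>(), n, seed));
    }
}
```

For measurements that should be trusted, a benchmarking harness such as JMH is the better choice.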
HashSet: 2244768
TreeSet: 3549314
LinkedHashSet: 2263320
* The test is not precise, but it reflects the basic idea that TreeSet is slower because it has to keep its elements sorted.