I. Spotting the Problem
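Here is the task: sort a group of students by score. To get an ordered collection quickly, I chose the ordered data structure TreeSet for the job. Two points convinced me that TreeSet would quickly give me an ordered set of students: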
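(1) TreeSet is implemented as a red-black tree, and a red-black tree is a balanced binary search tree, so inserting all $n$ students sorts them in $O(n \log n)$ time;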
(2) early in the insertion process, while the tree is still small, each insertion costs even less than $\log n$.
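In other words, I believed that using a TreeSet would beat collecting all the students first and then sorting them with an $O(n \log n)$ algorithm. Either way, I implemented my idea.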
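For comparison, here is a sketch of the "collect everything, then sort once" alternative. The `Student` record is a simplified stand-in for the Student class below, on the assumption that only the student number and score matter here. `Comparator.comparingInt` replaces a hand-written comparator. This approach is also $O(n \log n)$ overall. `List.sort` is documented as stable, so students with equal scores keep their original relative order and none of them is discarded.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortAfterCollect {
    // Simplified stand-in for the article's Student class (assumption: only number and score matter).
    record Student(String number, int score) {}

    // Collect everything first, then sort once: O(n log n) in total.
    static List<Student> sortByScore(List<Student> students) {
        List<Student> sorted = new ArrayList<>(students);
        // List.sort is stable: equal scores keep their input order, and nothing is dropped.
        sorted.sort(Comparator.comparingInt(Student::score));
        return sorted;
    }

    public static void main(String[] args) {
        List<Student> input = List.of(
                new Student("3", 90), new Student("2", 80), new Student("1", 90));
        System.out.println(sortByScore(input)); // numbers in the order 2, 3, 1
    }
}
```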
The Student class:
package com.liyuncong.learn.test.sortedset;
public class Student implements Comparable<Student> {
    private String studentNumber;
    private String name;
    private int score;
    public String getStudentNumber() {
        return studentNumber;
    }
    public void setStudentNumber(String studentNumber) {
        this.studentNumber = studentNumber;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public int getScore() {
        return score;
    }
    public void setScore(int score) {
        this.score = score;
    }
    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((name == null) ? 0 : name.hashCode());
        result = prime * result + score;
        result = prime * result + ((studentNumber == null) ? 0 : studentNumber.hashCode());
        return result;
    }
    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Student other = (Student) obj;
        if (name == null) {
            if (other.name != null)
                return false;
        } else if (!name.equals(other.name))
            return false;
        if (score != other.score)
            return false;
        if (studentNumber == null) {
            if (other.studentNumber != null)
                return false;
        } else if (!studentNumber.equals(other.studentNumber))
            return false;
        return true;
    }
    /**
     * Natural ordering of Student: by student number.
     */
    @Override
    public int compareTo(Student o) {
        return this.studentNumber.compareTo(o.getStudentNumber());
    }
    @Override
    public String toString() {
        return "Student [studentNumber=" + studentNumber + ", name=" + name + ", score=" + score + "]";
    }
}
Sorting the students:
package com.liyuncong.learn.test.sortedset;
import java.util.Comparator;
import java.util.TreeSet;
public class SortStudentTest {
    public static void main(String[] args) {
        Student student1 = new Student();
        student1.setStudentNumber("1");
        student1.setName("张三");
        student1.setScore(90);
        Student student2 = new Student();
        student2.setStudentNumber("2");
        student2.setName("李四");
        student2.setScore(80);
        Student student3 = new Student();
        student3.setStudentNumber("3");
        student3.setName("王二麻子");
        student3.setScore(90);
        TreeSet<Student> treeSet = new TreeSet<>(new Comparator<Student>() {
            @Override
            public int compare(Student o1, Student o2) {
                return o1.getScore() - o2.getScore();
            }
        });
        treeSet.add(student3);
        treeSet.add(student2);
        treeSet.add(student1);
        for (Student student : treeSet) {
            System.out.println(student);
        }
    }
}
Output:
Student [studentNumber=2, name=李四, score=80]
Student [studentNumber=3, name=王二麻子, score=90]
Confident in my idea, I ran it, and the result was not what I expected: three objects went into the set, but only two came out.
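The sketch below is a minimal, self-contained reproduction of the surprise. It is not the original test class. `Student` is a simplified record (an assumption), and `Comparator.comparingInt(Student::score)` stands in for the anonymous comparator above, ordering scores the same way. Printing the return value of `add` shows where the third student is lost: the call returns `false` even though no stored element `equals` it. `contains` also reports the never-stored student as present.

```java
import java.util.Comparator;
import java.util.TreeSet;

class LostStudentRepro {
    // Simplified stand-in for the article's Student class (assumption).
    record Student(String number, int score) {}

    // Builds the set like the test above: the comparator looks only at the score.
    static TreeSet<Student> build(Student... students) {
        TreeSet<Student> set = new TreeSet<>(Comparator.comparingInt(Student::score));
        for (Student s : students) {
            // add() returns false when some element already compares as 0 with s
            System.out.println("add " + s + " -> " + set.add(s));
        }
        return set;
    }

    public static void main(String[] args) {
        Student s1 = new Student("1", 90);
        Student s2 = new Student("2", 80);
        Student s3 = new Student("3", 90);
        TreeSet<Student> set = build(s3, s2, s1); // same insertion order as the article
        System.out.println(s1.equals(s3));        // false: different students
        System.out.println(set.size());           // 2
        System.out.println(set.contains(s1));     // true, although s1 was never stored
    }
}
```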
II. Finding the Cause
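The output shows that 张三 was never added. Under the Java Set contract, an element is rejected only when the set already contains an equal element (as decided by the equals method); yet when 张三 was added, the set contained no element equal to him. To get to the bottom of it, I dug into the source code, starting with TreeSet's add method: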
/**
* Adds the specified element to this set if it is not already present.
* More formally, adds the specified element {@code e} to this set if
* the set contains no element {@code e2} such that
* <tt>(e==null ? e2==null : e.equals(e2))</tt>.
* If this set already contains the element, the call leaves the set
* unchanged and returns {@code false}.
*
* @param e element to be added to this set
* @return {@code true} if this set did not already contain the specified
* element
* @throws ClassCastException if the specified object cannot be compared
* with the elements currently in this set
* @throws NullPointerException if the specified element is null
* and this set uses natural ordering, or its comparator
* does not permit null elements
*/
public boolean add(E e) {
    return m.put(e, PRESENT)==null;
}
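The Javadoc of add says that TreeSet follows the Set contract and detects duplicates with equals. But the real work is not done here: add inserts the element by calling m's put method. So what is m?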
/**
* The backing map.
*/
private transient NavigableMap<E,Object> m;
public TreeSet() {
    this(new TreeMap<E,Object>());
}
So m is a TreeMap: like HashSet, TreeSet is implemented on top of a corresponding Map. Now let's look at TreeMap's put method:
/**
* Associates the specified value with the specified key in this map.
* If the map previously contained a mapping for the key, the old
* value is replaced.
*
* @param key key with which the specified value is to be associated
* @param value value to be associated with the specified key
*
* @return the previous value associated with {@code key}, or
* {@code null} if there was no mapping for {@code key}.
* (A {@code null} return can also indicate that the map
* previously associated {@code null} with {@code key}.)
* @throws ClassCastException if the specified key cannot be compared
* with the keys currently in the map
* @throws NullPointerException if the specified key is null
* and this map uses natural ordering, or its comparator
* does not permit null keys
*/
public V put(K key, V value) {
    Entry<K,V> t = root;
    if (t == null) {
        compare(key, key); // type (and possibly null) check
        root = new Entry<>(key, value, null);
        size = 1;
        modCount++;
        return null;
    }
    int cmp;
    Entry<K,V> parent;
    // split comparator and comparable paths
    Comparator<? super K> cpr = comparator;
    if (cpr != null) {
        do {
            parent = t;
            cmp = cpr.compare(key, t.key);
            if (cmp < 0)
                t = t.left;
            else if (cmp > 0)
                t = t.right;
            else
                return t.setValue(value);
        } while (t != null);
    }
    else {
        if (key == null)
            throw new NullPointerException();
        @SuppressWarnings("unchecked")
        Comparable<? super K> k = (Comparable<? super K>) key;
        do {
            parent = t;
            cmp = k.compareTo(t.key);
            if (cmp < 0)
                t = t.left;
            else if (cmp > 0)
                t = t.right;
            else
                return t.setValue(value);
        } while (t != null);
    }
    Entry<K,V> e = new Entry<>(key, value, parent);
    if (cmp < 0)
        parent.left = e;
    else
        parent.right = e;
    fixAfterInsertion(e);
    size++;
    modCount++;
    return null;
}
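So element equality is decided by the Comparator's compare method (or Comparable's compareTo). When it returns 0, put treats the key as already present and only replaces the value. This seemed to violate the Set contract, and I thought I had found a bug in the Java class library. First, though, I had to solve my problem.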
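The descent in `put` can be reduced to a few lines. The sketch below is a hypothetical, unbalanced binary search tree, not the JDK code: it has no red-black rebalancing and no map values. It keeps only the part that matters here. The comparator's result steers left or right, and a result of 0 ends the search as a duplicate without `equals` ever being called.

```java
import java.util.Comparator;

// Hypothetical unbalanced BST set: same descent logic as TreeMap.put, minus rebalancing.
class SimpleBstSet<E> {
    private static final class Node<T> {
        final T key;
        Node<T> left;
        Node<T> right;

        Node(T key) {
            this.key = key;
        }
    }

    private final Comparator<? super E> comparator;
    private Node<E> root;
    private int size;

    SimpleBstSet(Comparator<? super E> comparator) {
        this.comparator = comparator;
    }

    boolean add(E e) {
        if (root == null) {
            root = new Node<>(e);
            size = 1;
            return true;
        }
        Node<E> t = root;
        Node<E> parent;
        int cmp;
        do {
            parent = t;
            cmp = comparator.compare(e, t.key);
            if (cmp < 0) {
                t = t.left;
            } else if (cmp > 0) {
                t = t.right;
            } else {
                return false; // compare == 0: treated as already present; equals() is never consulted
            }
        } while (t != null);
        Node<E> node = new Node<>(e);
        if (cmp < 0) {
            parent.left = node;
        } else {
            parent.right = node;
        }
        size++;
        return true;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        // Order strings by length only, so "bb" and "dd" look like the same element.
        SimpleBstSet<String> set = new SimpleBstSet<>(Comparator.comparingInt(String::length));
        System.out.println(set.add("bb"));  // true
        System.out.println(set.add("a"));   // true
        System.out.println(set.add("ccc")); // true
        System.out.println(set.add("dd"));  // false, although "dd" does not equal "bb"
        System.out.println(set.size());     // 3
    }
}
```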
III. Fixing the Problem
Knowing the cause, I only had to make a small change to the Comparator to reach my original goal:
TreeSet<Student> treeSet2 = new TreeSet<>(new Comparator<Student>() {
    @Override
    public int compare(Student o1, Student o2) {
        int result = o1.getScore() - o2.getScore();
        return result == 0 ? 1 : result;
    }
});
That is, no two elements ever compare as equal under this Comparator. Running the sort above again now gives the expected result:
Student [studentNumber=2, name=李四, score=80]
Student [studentNumber=3, name=王二麻子, score=90]
Student [studentNumber=1, name=张三, score=90]
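This workaround has one caveat. A comparator that never returns 0 breaks the `Comparator` contract, which requires `sgn(compare(x, y)) == -sgn(compare(y, x))` and therefore `compare(x, x) == 0`. The sorted output is right, but any lookup that depends on finding a 0 stops working. The sketch below shows that `contains` and duplicate detection both fail under the workaround, rewritten here with `Integer.compare` instead of subtraction. It then contrasts this with a tie-breaking comparator that orders by score and then by student number. The tie-break is an alternative suggested here, not the original code. It assumes student numbers are unique, and it orders ties by number rather than by insertion order. `Student` is again a simplified record (an assumption).

```java
import java.util.Comparator;
import java.util.TreeSet;

class TieBreakDemo {
    // Simplified stand-in for the article's Student class (assumption).
    record Student(String number, int score) {}

    // The workaround above: never report a tie. compare(x, x) returns 1 instead of 0,
    // so the contract sgn(compare(x, y)) == -sgn(compare(y, x)) no longer holds.
    static final Comparator<Student> NEVER_EQUAL = (a, b) -> {
        int result = Integer.compare(a.score(), b.score());
        return result == 0 ? 1 : result;
    };

    // A contract-respecting alternative (not the article's code): order by score,
    // then break ties by student number, assumed to be unique per student.
    static final Comparator<Student> BY_SCORE_THEN_NUMBER =
            Comparator.comparingInt(Student::score).thenComparing(Student::number);

    static TreeSet<Student> fill(Comparator<Student> comparator, Student... students) {
        TreeSet<Student> set = new TreeSet<>(comparator);
        for (Student s : students) {
            set.add(s);
        }
        return set;
    }

    public static void main(String[] args) {
        Student s1 = new Student("1", 90);
        Student s2 = new Student("2", 80);
        Student s3 = new Student("3", 90);

        TreeSet<Student> hacked = fill(NEVER_EQUAL, s3, s2, s1);
        System.out.println(hacked.contains(s1)); // false: the lookup never sees a 0
        hacked.add(s1);                          // the same object is accepted a second time
        System.out.println(hacked.size());       // 4

        TreeSet<Student> fixed = fill(BY_SCORE_THEN_NUMBER, s3, s2, s1);
        System.out.println(fixed);               // student numbers in the order 2, 1, 3
        System.out.println(fixed.contains(s1));  // true
    }
}
```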
IV. Further Thoughts
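The problem was solved, but I wasn't done: I had supposedly found a bug in the Java class library. Before telling anyone, I wanted to prepare properly and double-check, so as not to embarrass myself. I read the documentation of these interfaces and classes: Collection, Set, SortedSet, NavigableSet, TreeSet, TreeMap, Comparable, and Object. TreeMap's red-black tree is adapted from Introduction to Algorithms (a comment in TreeMap says: "Algorithms are adaptations of those in Cormen, Leiserson, and Rivest's Introduction to Algorithms"). So I also briefly reviewed that book's treatment of binary search trees and red-black trees. I also read a few blog posts about the problem I had run into. Now I feel qualified to talk about it.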
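From the standpoint of TreeSet's add method alone, this really is a bug. From the standpoint of SortedSet as a whole, however, it is only a design flaw, because the SortedSet documentation already describes the problem: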
* <p>Note that the ordering maintained by a sorted set (whether or not an
* explicit comparator is provided) must be <i>consistent with equals</i> if
* the sorted set is to correctly implement the <tt>Set</tt> interface. (See
* the <tt>Comparable</tt> interface or <tt>Comparator</tt> interface for a
* precise definition of <i>consistent with equals</i>.) This is so because
* the <tt>Set</tt> interface is defined in terms of the <tt>equals</tt>
* operation, but a sorted set performs all element comparisons using its
* <tt>compareTo</tt> (or <tt>compare</tt>) method, so two elements that are
* deemed equal by this method are, from the standpoint of the sorted set,
* equal. The behavior of a sorted set <i>is</i> well-defined even if its
* ordering is inconsistent with equals; it just fails to obey the general
* contract of the <tt>Set</tt> interface.
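The last sentence describes behavior that is well-defined but still breaks the `Set` contract. This is easy to observe with a comparator from the JDK itself. `String.CASE_INSENSITIVE_ORDER` returns 0 for strings that differ only in case, even though `equals` tells them apart. The sketch below adds the same words to a `HashSet` and to a `TreeSet` that uses this comparator. The class and method names are illustrative only.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class InconsistentWithEqualsDemo {
    // Adds the same words to a HashSet (duplicates decided by equals/hashCode) and to a
    // TreeSet whose comparator ignores case; returns both sizes as {hashSize, treeSize}.
    static int[] sizes(String... words) {
        Set<String> hashSet = new HashSet<>();
        Set<String> treeSet = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        for (String w : words) {
            hashSet.add(w);
            treeSet.add(w);
        }
        return new int[] {hashSet.size(), treeSet.size()};
    }

    public static void main(String[] args) {
        int[] s = sizes("Java", "JAVA", "java");
        System.out.println("HashSet: " + s[0] + ", TreeSet: " + s[1]); // HashSet: 3, TreeSet: 1

        // The TreeSet is still well-defined: it answers consistently according to its comparator.
        Set<String> tree = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        tree.add("Java");
        System.out.println(tree.contains("JAVA")); // true, although "Java".equals("JAVA") is false
    }
}
```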
The "precise definition of consistent with equals" (from the Comparable documentation) is:
* The natural ordering for a class <tt>C</tt> is said to be <i>consistent
* with equals</i> if and only if <tt>e1.compareTo(e2) == 0</tt> has
* the same boolean value as <tt>e1.equals(e2)</tt> for every
* <tt>e1</tt> and <tt>e2</tt> of class <tt>C</tt>.
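The JDK has a well-known natural ordering that is inconsistent with equals: `java.math.BigDecimal`. Its `compareTo` ignores scale, while `equals` does not, and the `Comparable` documentation mentions it as an exception. The helper `consistentFor` below is a hypothetical name that checks the definition for a single pair. A `true` result for one pair says nothing about the whole class, but a single `false` is enough to show that the ordering is inconsistent.

```java
import java.math.BigDecimal;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class BigDecimalConsistency {
    // Hypothetical helper: checks (a.compareTo(b) == 0) == a.equals(b) for one pair only.
    static <T extends Comparable<T>> boolean consistentFor(T a, T b) {
        return (a.compareTo(b) == 0) == a.equals(b);
    }

    public static void main(String[] args) {
        BigDecimal x = new BigDecimal("1.0");
        BigDecimal y = new BigDecimal("1.00");
        System.out.println(consistentFor(x, y)); // false: same value, different scale

        Set<BigDecimal> hash = new HashSet<>(Set.of(x, y)); // equals-based: keeps both
        Set<BigDecimal> tree = new TreeSet<>(Set.of(x, y)); // compareTo-based: keeps one
        System.out.println(hash.size() + " vs " + tree.size()); // 2 vs 1
    }
}
```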
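The documentation states that if the ordering is inconsistent with equals, the sorted set violates the Set contract. More concretely, it violates the rule that duplicates are detected with equals. Since this is documented, the problem above cannot be considered a bug. Still, in the spirit of some of the points made in Effective Java, I consider it a design flaw, for the following three reasons: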
(1) Both Comparator and Comparable exist to order objects, as their documentation shows:
Comparable
This interface imposes a total ordering on the objects of each class that
* implements it.
Comparator
* A comparison function, which imposes a <i>total ordering</i> on some
* collection of objects
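Yet SortedSet uses them not only for ordering but also, in place of equals, to decide whether two objects are equal. This violates the single-responsibility principle and makes the design ugly.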
(2) The Comparable documentation does not require consistency with equals:
* It is strongly recommended (though not required) that natural orderings be
* consistent with equals. This is so because sorted sets (and sorted maps)
* without explicit comparators behave "strangely" when they are used with
* elements (or keys) whose natural ordering is inconsistent with equals. In
* particular, such a sorted set (or sorted map) violates the general contract
* for set (or map), which is defined in terms of the <tt>equals</tt>
* method.<p>
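(3) Seen from outside any particular program, it is unreasonable in itself to require that two objects be equal if and only if they compare as equal in one particular respect.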
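Finally, let me guess why the library designers did it this way. In a binary search tree, a search for a key compares only that key, and because keys are stored in order, the object for a key can be found quickly. That key is exactly what the Comparator compares. If the implementation decided equality by something other than the key, the binary search tree would lose its advantage as a storage structure. (Personally, I think this could be solved. Suppose Java guaranteed that equal objects always compare as 0, i.e., that equals is a sufficient condition for the Comparator returning 0. The tree could then still be searched by key, with equality confirmed afterwards using equals. As long as only a few elements compare as 0 with one another, this would not increase the algorithmic complexity.)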
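The idea in the parenthesis can be sketched on top of `TreeMap`. The comparator locates a bucket in $O(\log n)$, and `equals` decides membership inside that bucket. `EqualsAwareSortedSet` is a hypothetical class, not a JDK API. It relies on the stated assumption that `equals(a, b)` implies `compare(a, b) == 0`, so equal elements always land in the same bucket. Each bucket is scanned linearly, so the extra cost grows with the number of elements that compare as 0 with each other. It stays small only when such ties are rare. `Student` is a simplified record (an assumption).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical set: locate by comparator, decide duplicates by equals.
// Assumes equals(a, b) implies compare(a, b) == 0.
class EqualsAwareSortedSet<E> {
    private final TreeMap<E, List<E>> buckets;
    private int size;

    EqualsAwareSortedSet(Comparator<? super E> comparator) {
        this.buckets = new TreeMap<>(comparator);
    }

    boolean add(E e) {
        // O(log n) descent to the bucket of elements that compare as 0 with e
        List<E> bucket = buckets.computeIfAbsent(e, k -> new ArrayList<>());
        if (bucket.contains(e)) { // linear equals() scan, bounded by the number of ties
            return false;
        }
        bucket.add(e);
        size++;
        return true;
    }

    boolean contains(E e) {
        List<E> bucket = buckets.get(e);
        return bucket != null && bucket.contains(e);
    }

    int size() {
        return size;
    }

    // Sorted by the comparator; ties appear in insertion order.
    List<E> toList() {
        List<E> out = new ArrayList<>(size);
        buckets.values().forEach(out::addAll);
        return out;
    }

    record Student(String number, int score) {}

    public static void main(String[] args) {
        EqualsAwareSortedSet<Student> set =
                new EqualsAwareSortedSet<>(Comparator.comparingInt(Student::score));
        set.add(new Student("3", 90));
        set.add(new Student("2", 80));
        set.add(new Student("1", 90));
        set.add(new Student("1", 90)); // equals an existing element: rejected
        System.out.println(set.size());   // 3
        System.out.println(set.toList()); // numbers in the order 2, 3, 1
    }
}
```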