From Misusing TreeSet to Rethinking Java's Requirement That Sorted Collections Keep Ordering Consistent with equals

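1. Discovering the Problem
The task: sort a group of students by score. To get a sorted collection quickly, I picked TreeSet, a sorted data structure. Two things made me think TreeSet would give me the sorted students quickly:
(1) TreeSet is implemented on a red-black tree, which is a self-balancing binary search tree, so inserting n elements one by one costs O(n log n) in total.
(2) Early in the insertion sequence the tree is still small, so log n is small.
In other words, I believed TreeSet would beat collecting all the students first and then sorting them with an O(n log n) algorithm. Either way, I went ahead and implemented the idea.
The Student class: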

package com.liyuncong.learn.test.sortedset;

public class Student implements Comparable<Student> {
    private String studentNumber;
    private String name;
    private int score;

    public String getStudentNumber() {
        return studentNumber;
    }

    public void setStudentNumber(String studentNumber) {
        this.studentNumber = studentNumber;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getScore() {
        return score;
    }

    public void setScore(int score) {
        this.score = score;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((name == null) ? 0 : name.hashCode());
        result = prime * result + score;
        result = prime * result + ((studentNumber == null) ? 0 : studentNumber.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Student other = (Student) obj;
        if (name == null) {
            if (other.name != null)
                return false;
        } else if (!name.equals(other.name))
            return false;
        if (score != other.score)
            return false;
        if (studentNumber == null) {
            if (other.studentNumber != null)
                return false;
        } else if (!studentNumber.equals(other.studentNumber))
            return false;
        return true;
    }

    /**
     * Natural ordering of Student: by student number.
     */
    @Override
    public int compareTo(Student o) {
        return this.studentNumber.compareTo(o.getStudentNumber());
    }

    @Override
    public String toString() {
        return "Student [studentNumber=" + studentNumber + ", name=" + name + ", score=" + score + "]";
    }

}

Sorting the students:

package com.liyuncong.learn.test.sortedset;

import java.util.Comparator;
import java.util.TreeSet;

public class SortStudentTest {
    public static void main(String[] args) {
        Student student1 = new Student();
        student1.setStudentNumber("1");
        student1.setName("张三");
        student1.setScore(90);
        Student student2 = new Student();
        student2.setStudentNumber("2");
        student2.setName("李四");
        student2.setScore(80);
        Student student3 = new Student();
        student3.setStudentNumber("3");
        student3.setName("王二麻子");
        student3.setScore(90);

        TreeSet<Student> treeSet = new TreeSet<>(new Comparator<Student>() {

            @Override
            public int compare(Student o1, Student o2) {
                // Integer.compare avoids the overflow risk of subtracting two ints.
                return Integer.compare(o1.getScore(), o2.getScore());
            }
        });
        treeSet.add(student3);
        treeSet.add(student2);
        treeSet.add(student1);
        for (Student student : treeSet) {
            System.out.println(student);
        }
    }
}

Sorted output:
Student [studentNumber=2, name=李四, score=80]
Student [studentNumber=3, name=王二麻子, score=90]
I had implemented my idea full of confidence, but the result was a surprise: three objects went into the set, and only two came out.
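The same behavior can be reproduced in a few lines. The sketch below is only an illustration: Pupil is a hypothetical, trimmed-down stand-in for Student, and it deliberately does not override equals, so the three objects are all distinct as far as equals is concerned. The return value of add already reveals that the third element was rejected.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;

class DroppedElementDemo {
    // Hypothetical stand-in for Student: only a number and a score.
    static final class Pupil {
        final String number;
        final int score;
        Pupil(String number, int score) { this.number = number; this.score = score; }
    }

    // Builds a TreeSet ordered by score only, the same ordering as in the listing above.
    static TreeSet<Pupil> newSetByScore() {
        return new TreeSet<>(Comparator.comparingInt((Pupil p) -> p.score));
    }

    // Adds three pupils, two of them with score 90, and records what add() returned.
    static boolean[] addThree(TreeSet<Pupil> set) {
        return new boolean[] {
            set.add(new Pupil("3", 90)),
            set.add(new Pupil("2", 80)),
            set.add(new Pupil("1", 90))
        };
    }

    public static void main(String[] args) {
        TreeSet<Pupil> set = newSetByScore();
        System.out.println(Arrays.toString(addThree(set))); // [true, true, false]
        System.out.println(set.size());                     // 2
    }
}
```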
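2. Finding the Cause
The output shows that “张三” was not added. According to the Set specification, an element is rejected only if the set already contains it (as decided by the equals method); but when “张三” was added, the set contained no element equal to him. To get to the bottom of it, I went into the source code. First, TreeSet's add method: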

    /**
     * Adds the specified element to this set if it is not already present.
     * More formally, adds the specified element {@code e} to this set if
     * the set contains no element {@code e2} such that
     * <tt>(e==null&nbsp;?&nbsp;e2==null&nbsp;:&nbsp;e.equals(e2))</tt>.
     * If this set already contains the element, the call leaves the set
     * unchanged and returns {@code false}.
     *
     * @param e element to be added to this set
     * @return {@code true} if this set did not already contain the specified
     *         element
     * @throws ClassCastException if the specified object cannot be compared
     *         with the elements currently in this set
     * @throws NullPointerException if the specified element is null
     *         and this set uses natural ordering, or its comparator
     *         does not permit null elements
     */
    public boolean add(E e) {
        return m.put(e, PRESENT)==null;
    }

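The comment on add says that TreeSet follows the Set specification and detects duplicates with equals. But there is no actual implementation here, so I kept reading. add calls the put method of m to insert the element. What is m?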

    /**
     * The backing map.
     */
    private transient NavigableMap<E,Object> m;

    public TreeSet() {
        this(new TreeMap<E,Object>());
    }

So m is a TreeMap. Like HashSet, TreeSet is implemented on top of the corresponding Map. Now let's look at TreeMap's put method:

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     *
     * @return the previous value associated with {@code key}, or
     *         {@code null} if there was no mapping for {@code key}.
     *         (A {@code null} return can also indicate that the map
     *         previously associated {@code null} with {@code key}.)
     * @throws ClassCastException if the specified key cannot be compared
     *         with the keys currently in the map
     * @throws NullPointerException if the specified key is null
     *         and this map uses natural ordering, or its comparator
     *         does not permit null keys
     */
    public V put(K key, V value) {
        Entry<K,V> t = root;
        if (t == null) {
            compare(key, key); // type (and possibly null) check

            root = new Entry<>(key, value, null);
            size = 1;
            modCount++;
            return null;
        }
        int cmp;
        Entry<K,V> parent;
        // split comparator and comparable paths
        Comparator<? super K> cpr = comparator;
        if (cpr != null) {
            do {
                parent = t;
                cmp = cpr.compare(key, t.key);
                if (cmp < 0)
                    t = t.left;
                else if (cmp > 0)
                    t = t.right;
                else
                    return t.setValue(value);
            } while (t != null);
        }
        else {
            if (key == null)
                throw new NullPointerException();
            @SuppressWarnings("unchecked")
                Comparable<? super K> k = (Comparable<? super K>) key;
            do {
                parent = t;
                cmp = k.compareTo(t.key);
                if (cmp < 0)
                    t = t.left;
                else if (cmp > 0)
                    t = t.right;
                else
                    return t.setValue(value);
            } while (t != null);
        }
        Entry<K,V> e = new Entry<>(key, value, parent);
        if (cmp < 0)
            parent.left = e;
        else
            parent.right = e;
        fixAfterInsertion(e);
        size++;
        modCount++;
        return null;
    }

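So element equality is decided by the Comparator's compare method (or by compareTo from the Comparable interface). That contradicts the Set interface specification, and I thought I had found a bug in the Java class library. But first I had to solve my problem.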
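The effect is not limited to add: contains goes through the same comparator-based lookup. A small sketch using only JDK classes, with String.CASE_INSENSITIVE_ORDER as the comparator (the class name ComparatorDecidesEquality is made up for this illustration):

```java
import java.util.TreeSet;

class ComparatorDecidesEquality {
    // A set whose ordering ignores case, so "Java" and "JAVA" compare as 0.
    static TreeSet<String> build() {
        TreeSet<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.add("Java");
        set.add("JAVA"); // rejected: compare(...) == 0 against "Java"
        return set;
    }

    public static void main(String[] args) {
        TreeSet<String> set = build();
        System.out.println(set);                        // [Java]
        System.out.println(set.contains("java"));       // true
        System.out.println("java".equals(set.first())); // false
    }
}
```

The set claims to contain "java" even though no element of the set is equal to it in the equals sense.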
3. Solving the Problem
Now that I knew where the problem was, a small change to the Comparator was enough to achieve the original goal:

        TreeSet<Student> treeSet2 = new TreeSet<>(new Comparator<Student>() {

            @Override
            public int compare(Student o1, Student o2) {
                int result = Integer.compare(o1.getScore(), o2.getScore());
                return result == 0 ? 1 : result;
            }
        });

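In other words, this Comparator never reports two elements as equal. Be aware that this breaks the Comparator contract (compare(a, a) should be 0, and compare(a, b) and compare(b, a) should have opposite signs), so the set is fine for iteration but contains and remove can no longer find any element. Running the sort above again now gives the expected result: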
Student [studentNumber=2, name=李四, score=80]
Student [studentNumber=3, name=王二麻子, score=90]
Student [studentNumber=1, name=张三, score=90]
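A sketch of an alternative that keeps the comparator within its contract: order by score, then break ties on a field that is unique per student. It assumes the student number is unique; Pupil, NEVER_EQUAL and SCORE_THEN_NUMBER are names made up for this illustration, and Pupil is a trimmed-down stand-in for Student.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class TieBreakDemo {
    // Hypothetical stand-in for Student: only a number and a score.
    static final class Pupil {
        final String number;
        final int score;
        Pupil(String number, int score) { this.number = number; this.score = score; }
    }

    static final Pupil P1 = new Pupil("1", 90);
    static final Pupil P2 = new Pupil("2", 80);
    static final Pupil P3 = new Pupil("3", 90);

    // The workaround from the text: a tie on score is reported as "greater".
    static final Comparator<Pupil> NEVER_EQUAL = (a, b) -> {
        int r = Integer.compare(a.score, b.score);
        return r == 0 ? 1 : r;
    };

    // Alternative: score first, then the (assumed unique) student number.
    static final Comparator<Pupil> SCORE_THEN_NUMBER =
            Comparator.comparingInt((Pupil p) -> p.score)
                      .thenComparing((Pupil p) -> p.number);

    // Inserts the three pupils in the same order as the article's test.
    static TreeSet<Pupil> fill(Comparator<Pupil> order) {
        TreeSet<Pupil> set = new TreeSet<>(order);
        set.add(P3);
        set.add(P2);
        set.add(P1);
        return set;
    }

    // Student numbers in iteration order.
    static List<String> numbersInOrder(Comparator<Pupil> order) {
        List<String> out = new ArrayList<>();
        for (Pupil p : fill(order)) {
            out.add(p.number);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fill(NEVER_EQUAL).contains(P1));       // false
        System.out.println(fill(SCORE_THEN_NUMBER).contains(P1)); // true
        System.out.println(numbersInOrder(SCORE_THEN_NUMBER));    // [2, 1, 3]
    }
}
```

With the tie-break, students with the same score come out in student-number order rather than insertion order, so the sequence differs slightly from the output shown above.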
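4. Further Thoughts
The problem was solved, but the story was not over: I had found a bug in the Java class library. Before announcing it, though, I needed to prepare thoroughly and confirm it, so as not to make a fool of myself. So I read the documentation of these interfaces and classes: Collection, Set, SortedSet, NavigableSet, TreeSet, TreeMap, Comparable and Object. Because TreeMap's red-black tree is based on the presentation in Introduction to Algorithms (a comment in TreeMap reads: Algorithms are adaptations of those in Cormen, Leiserson, and Rivest’s Introduction to Algorithms), I also briefly reviewed its treatment of binary search trees and red-black trees, and of course read some blog posts about the problem I had run into. With that done, I felt qualified to talk about it.
From the point of view of TreeSet's add method, this really is a bug; but from the point of view of SortedSet as a whole, it is only a design flaw, because the SortedSet documentation already describes the issue: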

 * <p>Note that the ordering maintained by a sorted set (whether or not an
 * explicit comparator is provided) must be <i>consistent with equals</i> if
 * the sorted set is to correctly implement the <tt>Set</tt> interface.  (See
 * the <tt>Comparable</tt> interface or <tt>Comparator</tt> interface for a
 * precise definition of <i>consistent with equals</i>.)  This is so because
 * the <tt>Set</tt> interface is defined in terms of the <tt>equals</tt>
 * operation, but a sorted set performs all element comparisons using its
 * <tt>compareTo</tt> (or <tt>compare</tt>) method, so two elements that are
 * deemed equal by this method are, from the standpoint of the sorted set,
 * equal.  The behavior of a sorted set <i>is</i> well-defined even if its
 * ordering is inconsistent with equals; it just fails to obey the general
 * contract of the <tt>Set</tt> interface.

The “precise definition of consistent with equals” is:

 * The natural ordering for a class <tt>C</tt> is said to be <i>consistent
 * with equals</i> if and only if <tt>e1.compareTo(e2) == 0</tt> has
 * the same boolean value as <tt>e1.equals(e2)</tt> 
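A well-known JDK class whose natural ordering is not consistent with equals is BigDecimal: equals takes the scale into account, compareTo does not. The sketch below (the class name is made up for this illustration) shows how a HashSet and a TreeSet then disagree about the same two values.

```java
import java.math.BigDecimal;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class ConsistentWithEqualsDemo {
    // Adds 1.0 and 1.00 to the given set and reports how many elements it holds.
    static int sizeAfterAdding(Set<BigDecimal> set) {
        set.add(new BigDecimal("1.0"));
        set.add(new BigDecimal("1.00"));
        return set.size();
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1.0");
        BigDecimal b = new BigDecimal("1.00");
        System.out.println(a.equals(b));    // false: the scales differ
        System.out.println(a.compareTo(b)); // 0: numerically equal
        System.out.println(sizeAfterAdding(new HashSet<>())); // 2, decided by equals/hashCode
        System.out.println(sizeAfterAdding(new TreeSet<>())); // 1, decided by compareTo
    }
}
```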

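The SortedSet documentation states that if the ordering is “inconsistent”, the Set contract is violated; more specifically, the rule that duplicates are detected with the equals method is violated. Since this is documented, the problem above cannot be called a bug. But, like some of the points raised in Effective Java, I do consider it a design flaw, for the following three reasons:
(1) Both Comparator and Comparable exist to order objects, as their documentation shows: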
Comparable

 * This interface imposes a total ordering on the objects of each class that
 * implements it.

Comparator

 * A comparison function, which imposes a <i>total ordering</i> on some
 * collection of objects

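SortedSet, however, uses them not only for ordering but also in place of the equals method to decide whether objects are equal. This violates the single responsibility principle and makes the design ugly.
(2) The Comparable documentation does not make consistency with equals mandatory: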

 * It is strongly recommended (though not required) that natural orderings be
 * consistent with equals.  This is so because sorted sets (and sorted maps)
 * without explicit comparators behave "strangely" when they are used with
 * elements (or keys) whose natural ordering is inconsistent with equals.  In
 * particular, such a sorted set (or sorted map) violates the general contract
 * for set (or map), which is defined in terms of the <tt>equals</tt>
 * method.<p>

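(3) Viewed from outside the program, requiring that two objects be equal if and only if they compare as equal in one particular respect is unreasonable in itself.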
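Finally, let me guess why the library designers did it this way. In a binary search tree, looking up a key compares only that key; because keys are stored in order, the object for a key can be found quickly. That key corresponds to whatever the Comparator compares. If the implementation decided equality on something other than the key, the storage advantage of the binary search tree would be lost. (Personally, I feel this could be solved. For example, if Java required that equal objects always compare as 0, that is, that equals implies compare == 0, the tree could still search by key and then confirm equality with equals, without increasing the algorithmic complexity.)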
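The idea in the parentheses can be sketched roughly as follows. This is not how the JDK works; EqualsAwareSortedCollection is a hypothetical class written only for this illustration, and it assumes that elements which are equal under equals also compare as 0. The comparator locates a bucket of tied elements in the tree, and equals decides whether the element is already present in that bucket.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical class, not part of the JDK.
class EqualsAwareSortedCollection<E> {
    // Key: first element seen for a given ordering position; value: all elements tied with it.
    private final TreeMap<E, List<E>> buckets;
    private int size;

    EqualsAwareSortedCollection(Comparator<? super E> order) {
        this.buckets = new TreeMap<>(order);
    }

    // Locates the bucket with the comparator, then checks for duplicates with equals.
    boolean add(E e) {
        List<E> bucket = buckets.computeIfAbsent(e, k -> new ArrayList<>());
        if (bucket.contains(e)) {
            return false; // already present according to equals
        }
        bucket.add(e);
        size++;
        return true;
    }

    int size() {
        return size;
    }

    // Elements in comparator order; ties keep insertion order.
    List<E> toSortedList() {
        List<E> out = new ArrayList<>(size);
        for (List<E> bucket : buckets.values()) {
            out.addAll(bucket);
        }
        return out;
    }

    public static void main(String[] args) {
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        EqualsAwareSortedCollection<String> c = new EqualsAwareSortedCollection<>(byLength);
        System.out.println(c.add("bb")); // true
        System.out.println(c.add("a"));  // true
        System.out.println(c.add("cc")); // true: same length as "bb" but not equal to it
        System.out.println(c.add("bb")); // false: equal to an existing element
        System.out.println(c.toSortedList()); // [a, bb, cc]
    }
}
```

One caveat about cost: reaching the bucket is still O(log n), but the equals check scans the elements tied with the new one, so the claim of unchanged complexity holds only when ties are few.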
