JDK Source Code Reading Plan (Day 5): Collections

Latest recommended article published on 2021-04-26 19:31:10

小凯Alex


Article link: https://blog.csdn.net/weixin_38499215/article/details/105807010


JDK11

Collections

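The difference between Collection and Collections comes up in many interview questions.

The main difference is that the former is an interface, the root of the collection hierarchy that sub-interfaces such as List, Set, and Queue extend. The latter is a utility class made up of static helper methods that operate on collections.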


/*
     * Tuning parameters for algorithms - Many of the List algorithms have
     * two implementations, one of which is appropriate for RandomAccess
     * lists, the other for "sequential."  Often, the random access variant
     * yields better performance on small sequential access lists.  The
     * tuning parameters below determine the cutoff point for what constitutes
     * a "small" sequential access list for each algorithm.  The values below
     * were empirically determined to work well for LinkedList. Hopefully
     * they should be reasonable for other sequential access List
     * implementations.  Those doing performance work on this code would
     * do well to validate the values of these parameters from time to time.
     * (The first word of each tuning parameter name is the algorithm to which
     * it applies.)
     */
    private static final int BINARYSEARCH_THRESHOLD   = 5000;
    private static final int REVERSE_THRESHOLD        =   18;
    private static final int SHUFFLE_THRESHOLD        =    5;
    private static final int FILL_THRESHOLD           =   25;
    private static final int ROTATE_THRESHOLD         =  100;
    private static final int COPY_THRESHOLD           =   10;
    private static final int REPLACEALL_THRESHOLD     =   11;
    private static final int INDEXOFSUBLIST_THRESHOLD =   35;

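The class opens with a set of tuning parameters. Many List algorithms have two implementations: one suited to random-access lists and one suited to sequential-access lists. On small sequential-access lists, the random-access variant often still performs better.

Each constant above is a cutoff point: a sequential-access list smaller than the threshold counts as "small" and is handled by the random-access variant. In other words, these thresholds, together with the RandomAccess marker interface, decide whether the indexed or the iterator-based implementation runs.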
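The dispatch rule described above can be sketched as follows. The class name, the THRESHOLD constant, and chooseVariant are hypothetical names made up for this illustration; only the `instanceof RandomAccess || size < threshold` test mirrors the JDK code.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

class ThresholdDemo {
    // Hypothetical cutoff that plays the role of REVERSE_THRESHOLD above.
    static final int THRESHOLD = 18;

    // Reports which variant a Collections-style algorithm would pick for this list.
    static String chooseVariant(List<?> list) {
        if (list instanceof RandomAccess || list.size() < THRESHOLD) {
            return "indexed";
        }
        return "iterator";
    }

    public static void main(String[] args) {
        List<Integer> linked = new LinkedList<>();
        for (int i = 0; i < 100; i++) {
            linked.add(i);
        }
        System.out.println(chooseVariant(new ArrayList<>(linked))); // indexed
        System.out.println(chooseVariant(linked));                  // iterator
        System.out.println(chooseVariant(linked.subList(0, 5)));    // indexed (small list)
    }
}
```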

Now for the methods the class provides:

sort

 public static <T extends Comparable<? super T>> void sort(List<T> list) {
      list.sort(null);
  }

 public static <T> void sort(List<T> list, Comparator<? super T> c) {
      list.sort(c);
  }

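1. The first overload requires the list elements to implement Comparable.
2. The second overload takes a Comparator argument.

Both end up calling the sort method declared in List.java. Its default implementation copies the elements into an object array, sorts the array, and writes the result back through a ListIterator (implementations such as ArrayList override it):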

   default void sort(Comparator<? super E> c) {
        Object[] a = this.toArray();
        Arrays.sort(a, (Comparator) c);
        ListIterator<E> i = this.listIterator();
        for (Object e : a) {
            i.next();
            i.set((E) e);
        }
    }
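A small usage sketch of the two overloads. The class and method names are invented for the example; the sample data is arbitrary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SortDemo {
    // Sorts a copy by natural ordering (String implements Comparable).
    static List<String> naturalOrder(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    // Sorts a copy with an explicit Comparator (shortest string first).
    static List<String> byLength(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Comparator<String> lengthOrder = Comparator.comparingInt(String::length);
        Collections.sort(copy, lengthOrder);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = List.of("pear", "fig", "banana");
        System.out.println(naturalOrder(words)); // [banana, fig, pear]
        System.out.println(byLength(words));     // [fig, pear, banana]
    }
}
```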

binarySearch

public static <T> int binarySearch(List<? extends Comparable<? super T>> list, T key) {
        if(list instanceof RandomAccess || list.size()<BINARYSEARCH_THRESHOLD) {
            return Collections.indexedBinarySearch(list, key);
        } else {
            return Collections.iteratorBinarySearch(list, key);
        }
    }

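Preconditions:

1. The list is already sorted in ascending order.
2. The list elements implement Comparable.

Processing logic:

1. If the list implements RandomAccess (supports random access), or its size is below the threshold, this method is called: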

 private static <T> int indexedBinarySearch(List<? extends Comparable<? super T>> list, T key) {
        int low = 0;
        int high = list.size() - 1;
        
        while(low<=high) {
            // >>> is the unsigned right shift, so mid stays correct even if low + high overflows
            int mid = (low + high) >>> 1;
            
            Comparable<? super T> midVal = list.get(mid);
            
            int cmp = midVal.compareTo(key);
            
            if(cmp<0) {
                low = mid + 1;
            } else if(cmp>0) {
                high = mid - 1;
            } else {
                return mid; // key found
            }
        }
        
        return -(low + 1);  // key not found
    }

2. Otherwise, this method is called:

// Binary search (natural ordering): look up key in list and return its index
    private static <T> int iteratorBinarySearch(List<? extends Comparable<? super T>> list, T key) {
        int low = 0;
        int high = list.size() - 1;
        
        ListIterator<? extends Comparable<? super T>> i = list.listIterator();
        
        while(low<=high) {
            int mid = (low + high) >>> 1;
            
            // The key difference from the first variant: the element is fetched through the get helper instead of list.get
            Comparable<? super T> midVal = get(i, mid);
            
            int cmp = midVal.compareTo(key);
            
            if(cmp<0) {
                low = mid + 1;
            } else if(cmp>0) {
                high = mid - 1;
            } else {
                return mid; // key found
            }
        }
        
        return -(low + 1);  // key not found
    }
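Both variants return -(low + 1) when the key is absent, which encodes the insertion point. A minimal sketch of how a caller might decode it (insertionPoint is a helper invented for this example, not a JDK method):

```java
import java.util.Collections;
import java.util.List;

class BinarySearchDemo {
    static int search(List<Integer> sorted, int key) {
        return Collections.binarySearch(sorted, key);
    }

    // Hypothetical helper: converts a binarySearch result into the index
    // where the key is, or where it would have to be inserted.
    static int insertionPoint(int result) {
        return result >= 0 ? result : -(result + 1);
    }

    public static void main(String[] args) {
        List<Integer> sorted = List.of(10, 20, 30, 40);
        System.out.println(search(sorted, 30));                 // 2  (found)
        System.out.println(search(sorted, 25));                 // -3 (not found)
        System.out.println(insertionPoint(search(sorted, 25))); // 2  (between 20 and 30)
    }
}
```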

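The iterator-based binarySearch does not call list.get; it calls the private Collections.get(ListIterator, int) helper, because a sequential-access list cannot jump straight to an index. The helper walks the iterator forward or backward from its current position, so a single call costs O(N) in the worst case. The search still makes only O(log N) comparisons, but it performs O(N) link traversals in total, which makes the iterator-based variant slower.

The source of the get helper:

// Fetch the element at position index by moving the given iterator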
    private static <T> T get(ListIterator<? extends T> i, int index) {
        T obj = null;
        
        int pos = i.nextIndex();
        
        if(pos<=index) {
            do {
                obj = i.next();
            } while(pos++<index);
        } else {
            do {
                obj = i.previous();
            } while(--pos>index);
        }
        
        return obj;
    }
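Because the helper is private, the sketch below reproduces it in a demo class (the class name is made up) to show that the iterator keeps its position between calls and moves only the distance it needs.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class IteratorGetDemo {
    // Copy of the private Collections.get helper, reproduced only for experimentation.
    static <T> T get(ListIterator<? extends T> i, int index) {
        T obj = null;
        int pos = i.nextIndex();
        if (pos <= index) {
            do {
                obj = i.next();       // walk forward to index
            } while (pos++ < index);
        } else {
            do {
                obj = i.previous();   // walk backward to index
            } while (--pos > index);
        }
        return obj;
    }

    public static void main(String[] args) {
        List<String> list = new LinkedList<>(List.of("a", "b", "c", "d", "e"));
        ListIterator<String> it = list.listIterator();
        System.out.println(get(it, 3) + " " + it.nextIndex()); // d 4
        System.out.println(get(it, 1) + " " + it.nextIndex()); // b 1
    }
}
```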

reverse

public static void reverse(List<?> list) {
        int size = list.size();
        
        if(size<REVERSE_THRESHOLD || list instanceof RandomAccess) {
            for(int i = 0, mid = size >> 1, j = size - 1; i<mid; i++, j--) {
                // Swap the elements at positions i and j
                swap(list, i, j);
            }
        } else {
            
            /* instead of using a raw type here, it's possible to capture the wildcard but it will require a call to a supplementary private method */
            // The iterator-based reversal is neat: one iterator walks forward from the head, the other backward from the tail
            ListIterator fwd = list.listIterator();
            ListIterator rev = list.listIterator(size);
          
            for(int i = 0, mid = list.size() >> 1; i<mid; i++) {
                Object tmp = fwd.next();
                fwd.set(rev.previous());
                rev.set(tmp);
            }
        }
    }
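The JDK comment mentions that a supplementary generic method could capture the wildcard instead of using raw types. A sketch of what such a typed two-iterator reversal might look like (reverseWithIterators is a made-up name, not the JDK implementation):

```java
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class ReverseDemo {
    // Typed variant of the iterator branch: swaps elements pairwise from both ends.
    static <T> void reverseWithIterators(List<T> list) {
        ListIterator<T> fwd = list.listIterator();
        ListIterator<T> rev = list.listIterator(list.size());
        for (int i = 0, mid = list.size() >> 1; i < mid; i++) {
            T tmp = fwd.next();        // element from the front
            fwd.set(rev.previous());   // overwrite it with the element from the back
            rev.set(tmp);              // and store the front element at the back
        }
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>(List.of(1, 2, 3, 4, 5));
        reverseWithIterators(list);
        System.out.println(list); // [5, 4, 3, 2, 1]
    }
}
```

Since set is not a structural modification, using two iterators on the same list at once does not trigger a ConcurrentModificationException.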

rotate


public static void rotate(List<?> list, int distance) {
        if(list instanceof RandomAccess || list.size()<ROTATE_THRESHOLD) {
            rotate1(list, distance);
        } else {
            rotate2(list, distance);
        }
    }

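This is essentially a cyclic shift: a positive distance moves elements right by distance positions, and a negative distance moves them left by |distance| positions.

If the list supports random access or its size is below the threshold of 100, the first method is called: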

private static <T> void rotate1(List<T> list, int distance) {
        int size = list.size();
        if(size == 0) {
            return;
        }
        
        distance = distance % size;
        // A negative distance is equivalent to rotating right by size + distance
        if(distance<0) {
            distance += size;
        }
        
        if(distance == 0) {
            return;
        }
        
        // The cycle-following move is clever.
        // src list    : [1, 2, 3, 4, 5, 6, 7, 8, 9]
        // rotate list : [8, 9, 1, 2, 3, 4, 5, 6, 7]
        // Trace the loop below with this rotate-right-by-2 example.
        for(int cycleStart = 0, nMoved = 0; nMoved != size; cycleStart++) {
            T displaced = list.get(cycleStart);
            int i = cycleStart;
            do {
                // Advance to the destination slot, wrapping around at the end of the list
                i += distance;
                if(i >= size) {
                    i -= size;
                }
                displaced = list.set(i, displaced);
                nMoved++;
            } while(i != cycleStart);
        }
    }
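To see the cycles at work, the sketch below copies the loop into a demo method that also counts how many cycles the outer loop starts. The counter and the class name are additions for illustration only. With size 6 and distance 2 the elements fall into two independent cycles.

```java
import java.util.ArrayList;
import java.util.List;

class RotateCycleDemo {
    // Same algorithm as rotate1, instrumented to return the number of cycles followed.
    static <T> int rotateCountingCycles(List<T> list, int distance) {
        int size = list.size();
        if (size == 0) {
            return 0;
        }
        distance %= size;
        if (distance < 0) {
            distance += size;
        }
        if (distance == 0) {
            return 0;
        }
        int cycles = 0;
        for (int cycleStart = 0, nMoved = 0; nMoved != size; cycleStart++) {
            cycles++;
            T displaced = list.get(cycleStart);
            int i = cycleStart;
            do {
                i += distance;
                if (i >= size) {
                    i -= size;
                }
                displaced = list.set(i, displaced); // drop the carried element, pick up the old one
                nMoved++;
            } while (i != cycleStart);
        }
        return cycles;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(List.of(0, 1, 2, 3, 4, 5));
        System.out.println(rotateCountingCycles(list, 2)); // 2
        System.out.println(list);                          // [4, 5, 0, 1, 2, 3]
    }
}
```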

Otherwise, the second method is called:

private static void rotate2(List<?> list, int distance) {
        int size = list.size();
        if(size == 0) {
            return;
        }
        
        int mid = -distance % size;
        if(mid<0) {
            mid += size;
        }
        
        if(mid == 0) {
            return;
        }
        
        reverse(list.subList(0, mid));
        reverse(list.subList(mid, size));
        reverse(list);
    }

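To rotate right by k positions (0 < k < size), reverse the first size-k elements, reverse the remaining k elements, and then reverse the whole list.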
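The three reversals can be traced on the same nine-element example. The method below is a typed copy of rotate2 placed in a demo class (names are made up); the comments show the intermediate states for distance 2.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class RotateReversalDemo {
    static <T> void rotateByReversal(List<T> list, int distance) {
        int size = list.size();
        if (size == 0) {
            return;
        }
        int mid = -distance % size;
        if (mid < 0) {
            mid += size;
        }
        if (mid == 0) {
            return;
        }
        // Start, size 9, distance 2, mid 7:   [1, 2, 3, 4, 5, 6, 7, 8, 9]
        Collections.reverse(list.subList(0, mid));    // [7, 6, 5, 4, 3, 2, 1, 8, 9]
        Collections.reverse(list.subList(mid, size)); // [7, 6, 5, 4, 3, 2, 1, 9, 8]
        Collections.reverse(list);                    // [8, 9, 1, 2, 3, 4, 5, 6, 7]
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(List.of(1, 2, 3, 4, 5, 6, 7, 8, 9));
        rotateByReversal(list, 2);
        System.out.println(list); // [8, 9, 1, 2, 3, 4, 5, 6, 7]
    }
}
```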

// Synchronized (thread-safe) collection
static class SynchronizedCollection<E> implements Collection<E>, Serializable {
private static final long serialVersionUID = 3053995032091335093L;
  
final Collection<E> c;  // Backing Collection
final Object mutex;     // Object on which to synchronize

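Every operation on the wrapper runs inside a synchronized (mutex) block, except iterator(), spliterator(), stream(), and parallelStream(), which the caller must synchronize manually.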
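A minimal sketch of the manual synchronization that iteration needs. It assumes the wrapper was created with the public factory methods, in which case the mutex is the wrapper object itself; the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class SyncIterationDemo {
    // Sums the elements while holding the wrapper's lock for the whole traversal.
    static int sum(Collection<Integer> syncColl) {
        int total = 0;
        synchronized (syncColl) {      // lock on the wrapper returned by Collections.synchronizedXxx
            for (int v : syncColl) {   // the iterator itself is not synchronized
                total += v;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>(List.of(1, 2, 3)));
        System.out.println(sum(list)); // 6
    }
}
```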

Besides synchronized collections there are also read-only collections:

static class UnmodifiableCollection<E> implements Collection<E>, Serializable {
        private static final long serialVersionUID = 1820017752578914078L; 
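        // Note: final only fixes the reference; the view is read-only because its mutating methods throw UnsupportedOperationException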
        final Collection<? extends E> c;
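The wrapper is a read-only view rather than a copy, which a short experiment can show. The class and helper names below are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class UnmodifiableDemo {
    // Returns true if the collection refuses to accept a new element.
    static boolean rejectsAdd(Collection<String> c) {
        try {
            c.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> backing = new ArrayList<>(List.of("a"));
        Collection<String> view = Collections.unmodifiableCollection(backing);
        System.out.println(rejectsAdd(view)); // true
        backing.add("b");                     // changes to the backing list...
        System.out.println(view.size());      // ...show through the view: 2
    }
}
```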

There are also type-safe (checked) collections, which verify at run time that each added element is of the declared type:

static class CheckedCollection<E> implements Collection<E>, Serializable {
        private static final long serialVersionUID = 1578914078182001775L;
        
        private E[] zeroLengthElementArray; // Lazily initialized
        
        final Collection<E> c;
        final Class<E> type;


// Core method 1:

// Type check: if o is non-null, it must be an instance of the declared type
 @SuppressWarnings("unchecked")
 E typeCheck(Object o) {
     if(o != null && !type.isInstance(o)) {
         throw new ClassCastException(badElementMsg(o));
     }
     return (E) o;
 }

Collection<E> checkedCopyOf(Collection<? extends E> coll) {
  Object[] a;
     
     try {
         // Get the zero-length array
         E[] z = zeroLengthElementArray();
         
         a = coll.toArray(z);
         
         // Defend against coll violating the toArray contract
         if(a.getClass() != z.getClass()) {
             // Convert to the expected array type
             a = Arrays.copyOf(a, a.length, z.getClass());
         }
     } catch(ArrayStoreException ignore) {
         // To get better and consistent diagnostics,
         // we call typeCheck explicitly on each element.
         // We call clone() to defend against coll retaining a
         // reference to the returned array and storing a bad
         // element into it after it has been type checked.
         // On an exception, shallow-copy the array and type-check each element in turn
         a = coll.toArray().clone();
         for(Object o : a) {
             typeCheck(o);
         }
     }
     // A slight abuse of the type system, but safe here.
     return (Collection<E>) Arrays.asList(a);
 }
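The effect of typeCheck is easiest to see when a raw-type reference tries to sneak in a wrong element. A small sketch (class and helper names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CheckedDemo {
    // Tries to add an Integer through a raw reference; reports whether the list refused it.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean rejectsWrongType(List<String> target) {
        List raw = target;
        try {
            raw.add(42);
            return false;   // heap pollution: the failure would surface later, on read
        } catch (ClassCastException e) {
            return true;    // the checked wrapper fails fast at insertion time
        }
    }

    public static void main(String[] args) {
        List<String> plain = new ArrayList<>();
        List<String> checked = Collections.checkedList(new ArrayList<>(), String.class);
        System.out.println(rejectsWrongType(plain));   // false
        System.out.println(rejectsWrongType(checked)); // true
    }
}
```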

SingletonList

private static class SingletonList<E> extends AbstractList<E> implements RandomAccess, Serializable {
        private static final long serialVersionUID = 3093736618740652951L;
       // Note: the element reference is final
        private final E element;

Returns an immutable list containing only the specified object.
SingletonSet and SingletonMap work the same way, so they are not covered here.
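A short sketch of the singleton wrappers in use. Removing every null with removeAll(Collections.singleton(null)) is a common idiom; the class and method names here are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SingletonDemo {
    // Removes every null from a copy of the input, using a one-element set as the filter.
    static List<String> withoutNulls(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.removeAll(Collections.singleton(null));
        return copy;
    }

    // Returns true if the list refuses to replace its element.
    static boolean isImmutable(List<String> list) {
        try {
            list.set(0, "changed");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(withoutNulls(Arrays.asList("a", null, "b")));   // [a, b]
        System.out.println(isImmutable(Collections.singletonList("only"))); // true
    }
}
```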

AsLIFOQueue

Wraps a Deque as a last-in-first-out (LIFO) queue.

// Stack-like (LIFO) queue: enqueue and dequeue both happen at the head
    static class AsLIFOQueue<E> extends AbstractQueue<E> implements Queue<E>, Serializable {
        private static final long serialVersionUID = 1802017725587941708L;
        
        private final Deque<E> q;
        
        AsLIFOQueue(Deque<E> q) {
            this.q = q;
        }
        
        public boolean offer(E e) {
            return q.offerFirst(e);
        }
        
        public boolean add(E e) {
            q.addFirst(e);
            return true;
        }
        
        public E poll() {
            return q.pollFirst();
        }
        
        public E remove() {
            return q.removeFirst();
        }
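The wrapper is obtained through the public Collections.asLifoQueue method. A minimal sketch showing the stack-like order (the demo class and drain helper are made up):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;

class LifoQueueDemo {
    // Polls the queue until it is empty and returns the elements in removal order.
    static List<Integer> drain(Queue<Integer> q) {
        List<Integer> out = new ArrayList<>();
        Integer v;
        while ((v = q.poll()) != null) {
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        Queue<Integer> stack = Collections.asLifoQueue(new ArrayDeque<>());
        stack.add(1);
        stack.add(2);
        stack.add(3);
        System.out.println(drain(stack)); // [3, 2, 1]
    }
}
```

This is handy when an API expects a Queue but stack semantics are wanted.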

SetFromMap

// A Set backed by a Map (only its keySet is used)
    private static class SetFromMap<E> extends AbstractSet<E> implements Set<E>, Serializable {
        private static final long serialVersionUID = 2454657854757543876L;
        
        private final Map<E, Boolean> m;  // The backing map
        private transient Set<E> s;       // Its keySet
        
        SetFromMap(Map<E, Boolean> map) {
            if(!map.isEmpty()) {
                throw new IllegalArgumentException("Map is non-empty");
            }
            m = map;
            s = map.keySet();
        }

The JDK does not define a ConcurrentHashSet class, so one option is to build a concurrent set with the following method:

public static <E> Set<E> newSetFromMap(Map<E, Boolean> map) {
     return new SetFromMap<>(map);
 }
 // Creates a new Set backed by the given Map; it holds a reference to the Map and inherits its ordering, concurrency, and performance characteristics
Set<String> concurrentSet = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
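A runnable sketch of the factory, including the empty-map precondition enforced by the constructor shown earlier. The class and method names are hypothetical. Since Java 8, ConcurrentHashMap.newKeySet() is another way to get a concurrent set.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class SetFromMapDemo {
    // Builds a thread-safe set on top of a ConcurrentHashMap.
    static Set<String> newConcurrentSet() {
        return Collections.newSetFromMap(new ConcurrentHashMap<>());
    }

    // Returns true if newSetFromMap refuses a map that already has entries.
    static boolean rejectsNonEmptyMap() {
        Map<String, Boolean> m = new HashMap<>();
        m.put("k", Boolean.TRUE);
        try {
            Collections.newSetFromMap(m);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Set<String> set = newConcurrentSet();
        set.add("a");
        set.add("a");                             // duplicate, ignored
        System.out.println(set.size());           // 1
        System.out.println(rejectsNonEmptyMap()); // true
    }
}
```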

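Reverse-order comparators come in two overloads: one reverses the natural ordering (Comparable), and the other reverses a supplied Comparator.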

public static <T> Comparator<T> reverseOrder() {
        return (Comparator<T>) ReverseComparator.REVERSE_ORDER;
}

Usage:

Collections.sort(list, Collections.reverseOrder());
// An external, user-defined Comparator can also be passed in (customComparator stands for any Comparator instance)
Collections.sort(list, Collections.reverseOrder(customComparator));
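A self-contained version of both calls. The sample data and the length-based comparator are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class ReverseOrderDemo {
    // Sorts a copy in descending natural order.
    static List<Integer> descending(List<Integer> input) {
        List<Integer> copy = new ArrayList<>(input);
        Collections.sort(copy, Collections.reverseOrder());
        return copy;
    }

    // Sorts a copy by the reverse of a custom comparator (longest string first).
    static List<String> longestFirst(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        Collections.sort(copy, Collections.reverseOrder(byLength));
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(descending(List.of(2, 3, 1)));                  // [3, 2, 1]
        System.out.println(longestFirst(List.of("pear", "fig", "banana"))); // [banana, pear, fig]
    }
}
```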

disjoint

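Returns true if the two collections have no elements in common.
Optimizations:
Call contains on the collection whose contains is cheaper (for plain collections, the larger one).
Iterate over the smaller collection, so the loop is short.

If c1 is a Set, its contains is expected to beat O(N/2). For a HashSet, contains boils down to HashMap's getNode(): locate the bucket in the table, then search that bucket's linked list or red-black tree. That is faster than scanning the whole collection, which takes N/2 comparisons on average.

For a non-Set collection such as ArrayList, contains goes through indexOf and ends up in the range search below.

// Search forward in [start, end) for element o and return the index of the first match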
    int indexOfRange(Object o, int start, int end) {
        Object[] es = elementData;
        if(o == null) {
            for(int i = start; i<end; i++) {
                if(es[i] == null) {
                    return i;
                }
            }
        } else {
            for(int i = start; i<end; i++) {
                if(o.equals(es[i])) {
                    return i;
                }
            }
        }
        return -1;
    }

public static boolean disjoint(Collection<?> c1, Collection<?> c2) {
        // The collection to be used for contains().
        // Preference is given to the collection who's contains() has lower O() complexity.
        Collection<?> contains = c2;
        
        // The collection to be iterated.
        // If the collections' contains() impl are of different O() complexity,
        // the collection with slower contains() will be used for iteration.
        // For collections who's contains() are of the same complexity then best performance is achieved by iterating the smaller collection.
        Collection<?> iterate = c1;
        
        // Performance optimization cases. The heuristics:
        //   1. Generally iterate over c1.
        //   2. If c1 is a Set then iterate over c2.
        //   3. If either collection is empty then result is always true.
        //   4. Iterate over the smaller Collection.
        if(c1 instanceof Set) {
            // Use c1 for contains as a Set's contains() is expected to perform
            // better than O(N/2)
            iterate = c2;
            contains = c1;
        } else if(!(c2 instanceof Set)) {
            // Both are mere Collections. Iterate over smaller collection.
            // Example: If c1 contains 3 elements and c2 contains 50 elements and assuming contains() requires ceiling(N/2) comparisons
            // then checking for all c1 elements in c2 would require 75 comparisons (3 * ceiling(50/2))
            // vs. checking all c2 elements in c1 requiring 100 comparisons (50 * ceiling(3/2)).
            int c1size = c1.size();
            int c2size = c2.size();
            if(c1size == 0 || c2size == 0) {
                // At least one collection is empty. Nothing will match.
                return true;
            }
            
            if(c1size>c2size) {
                iterate = c2;
                contains = c1;
            }
        }
        
        for(Object e : iterate) {
            if(contains.contains(e)) {
                // Found a common element. Collections are not disjoint.
                return false;
            }
        }
        
        // No common elements were found.
        return true;
    }
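A quick check of the return values, since the name is easy to read backwards: true means no overlap. The demo class and its wrapper method are made up for the example.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class DisjointDemo {
    static boolean noCommonElements(Collection<Integer> a, Collection<Integer> b) {
        return Collections.disjoint(a, b);
    }

    public static void main(String[] args) {
        System.out.println(noCommonElements(List.of(1, 2, 3), Set.of(4, 5))); // true
        System.out.println(noCommonElements(List.of(1, 2, 3), Set.of(3, 4))); // false (3 is shared)
        System.out.println(noCommonElements(List.of(), List.of(1)));          // true  (empty side)
    }
}
```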

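shuffle (random permutation)

The Knuth-Durstenfeld shuffle algorithm

It guarantees that list[i] is equally likely to end up at any position after the shuffle, with probability 1/size each.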

public static void shuffle(List<?> list, Random rnd) {
        int size = list.size();
        
        if(size<SHUFFLE_THRESHOLD || list instanceof RandomAccess) {
            for(int i = size; i>1; i--) {
                swap(list, i - 1, rnd.nextInt(i));
            }
        } else {
            Object[] arr = list.toArray();
            
            // Shuffle array
            for(int i = size; i>1; i--) {
                swap(arr, i - 1, rnd.nextInt(i));
            }
            
            /*
             * Dump array back into list
             * instead of using a raw type here, it's possible to capture the wildcard but it will require a call to a supplementary private method
             */
            ListIterator it = list.listIterator();
            for(Object e : arr) {
                it.next();
                it.set(e);
            }
        }
    }
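Passing a seeded Random makes a shuffle reproducible, which is useful in tests. A minimal sketch (the class name, method name, and seed value are arbitrary choices for the example):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class ShuffleDemo {
    // Returns a shuffled copy; the same seed always yields the same permutation.
    static List<Integer> shuffled(List<Integer> input, long seed) {
        List<Integer> copy = new ArrayList<>(input);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> deck = List.of(1, 2, 3, 4, 5, 6, 7, 8);
        List<Integer> first = shuffled(deck, 42L);
        List<Integer> second = shuffled(deck, 42L);
        System.out.println(first.equals(second)); // true: same seed, same order
        List<Integer> sorted = new ArrayList<>(first);
        Collections.sort(sorted);
        System.out.println(sorted.equals(deck));  // true: still a permutation of the input
    }
}
```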

swap, written in a rather fancy way

public static void swap(List<?> list, int i, int j) {
        
        /* instead of using a raw type here, it's possible to capture the wildcard but it will require a call to a supplementary private method */
        
        final List l = list;
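        // l.set returns the element previously stored at that index:
        // the inner call stores L[i] at j and returns the old L[j], which the outer call stores at i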
        l.set(i, l.set(j, l.get(i)));
    }
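Unrolled into separate statements, the nested call reads as follows. This is an equivalent sketch with invented names, not the JDK code.

```java
import java.util.ArrayList;
import java.util.List;

class SwapDemo {
    // Step-by-step equivalent of l.set(i, l.set(j, l.get(i))).
    static <T> void swapExplicit(List<T> l, int i, int j) {
        T oldI = l.get(i);        // read L[i]
        T oldJ = l.set(j, oldI);  // write it to j; set returns the previous L[j]
        l.set(i, oldJ);           // write the previous L[j] to i
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(List.of("a", "b", "c"));
        swapExplicit(list, 0, 2);
        System.out.println(list); // [c, b, a]
    }
}
```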

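indexOfSubList: brute-force matching, like naive string search

Finds the index at which the target sublist first occurs in the source list. That index can only lie in [0, sourceSize-targetSize].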

public static int indexOfSubList(List<?> source, List<?> target) {
        int sourceSize = source.size();
        int targetSize = target.size();
        int maxCandidate = sourceSize - targetSize;
        
        if(sourceSize<INDEXOFSUBLIST_THRESHOLD || (source instanceof RandomAccess && target instanceof RandomAccess)) {
nextCand:
            for(int candidate = 0; candidate<=maxCandidate; candidate++) {
                for(int i = 0, j = candidate; i<targetSize; i++, j++) {
                    if(!eq(target.get(i), source.get(j))) {
                        continue nextCand;  // Element mismatch, try next cand
                    }
                }
                return candidate;  // All elements of candidate matched target
            }
        } else {  // Iterator version of above algorithm
            ListIterator<?> si = source.listIterator();
nextCand:
            for(int candidate = 0; candidate<=maxCandidate; candidate++) {
                ListIterator<?> ti = target.listIterator();
                for(int i = 0; i<targetSize; i++) {
                    if(!eq(ti.next(), si.next())) {
                        // Back up source iterator to next candidate
                        for(int j = 0; j<i; j++) {
                            si.previous();
                        }
                        continue nextCand;
                    }
                }
                return candidate;
            }
        }
        
        return -1;  // No candidate matched the target
    }
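A few calls that exercise the edge cases of the search. The wrapper class and method are made up; the sample lists are arbitrary.

```java
import java.util.Collections;
import java.util.List;

class SubListDemo {
    static int find(List<Integer> source, List<Integer> target) {
        return Collections.indexOfSubList(source, target);
    }

    public static void main(String[] args) {
        List<Integer> source = List.of(1, 2, 3, 2, 3, 4);
        System.out.println(find(source, List.of(2, 3, 4))); // 3  (the partial match at index 1 fails on 4)
        System.out.println(find(source, List.of(3, 2)));    // 2
        System.out.println(find(source, List.of(9)));       // -1 (no match)
        System.out.println(find(source, List.of()));        // 0  (empty target matches at the start)
    }
}
```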

