Collector in JDK 8

巧笑情兮_美目盼兮

Article link: https://blog.csdn.net/ITITII/article/details/85329637

Collector

Collector was added in JDK 8. What is a Collector, and what is it for? The Javadoc in the Collector source answers both questions, so let's walk through it:

A mutable reduction operation that accumulates input elements into a mutable result container, optionally transforming the accumulated result into a final representation after all input elements have been processed. Reduction operations can be performed either sequentially or in parallel.

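In short, a `Collector` is a mutable reduction: it accumulates the input elements into a mutable result container and, once every element has been processed, may optionally transform the accumulated result into a final representation. The reduction can run sequentially or in parallel. This one sentence already describes what a `Collector` is quite precisely.

For example: 1. accumulating elements into a collection; 2. concatenating strings with a StringBuilder; 3. computing summary information about the elements, such as sum, min, max, or average.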
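The three kinds of reduction listed above can be tried out with collectors that ship with the JDK. The following is only a small sketch; the class name `BuiltInCollectorsDemo` and the sample strings are made up for illustration.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class BuiltInCollectorsDemo {
    // 1. Accumulate elements into a collection.
    static List<String> toList() {
        return Stream.of("nihao", "hello", "world").collect(Collectors.toList());
    }

    // 2. Concatenate strings with a StringBuilder as the mutable container.
    static String concat() {
        return Stream.of("nihao", "hello", "world")
                .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append)
                .toString();
    }

    // 3. Compute summary information (count, sum, min, max, average) in one pass.
    static IntSummaryStatistics lengths() {
        return Stream.of("nihao", "hello", "world")
                .collect(Collectors.summarizingInt(String::length));
    }

    public static void main(String[] args) {
        System.out.println(toList());
        System.out.println(concat());
        System.out.println(lengths());
    }
}
```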

 A {@code Collector} is specified by four functions that work together to
 accumulate entries into a mutable result container, and optionally perform
 a final transform on the result.They are:
 <ul>
    <li>creation of a new result container ({@link #supplier()})</li>
    <li>incorporating a new data element into a result container ({@link #accumulator()})</li>
    <li>combining two result containers into one ({@link #combiner()})</li>
    <li>performing an optional final transform on the container ({@link #finisher()})</li>
  </ul>

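A Collector is specified by four functions that work together to accumulate elements into a mutable result container and, optionally, perform a final transformation on the result:

supplier: creates a new result container.
accumulator: adds a new data element to a result container.
combiner: merges two result containers into one.
finisher: performs an optional final transformation on the container.

So the Javadoc tells us that Collector has four particularly important methods. Let's look at each of them: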
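To see the four functions side by side, here is a minimal sketch built with the `Collector.of` factory method from the JDK. The collector itself (`pipeJoiner`, which joins strings with `|`) and the class name are hypothetical and exist only for this illustration.

```java
import java.util.stream.Collector;
import java.util.stream.Stream;

class FourFunctionsDemo {
    // Hypothetical collector: joins strings with '|' using a StringBuilder container.
    static Collector<String, StringBuilder, String> pipeJoiner() {
        return Collector.of(
                // supplier: create a new, empty result container
                () -> new StringBuilder(),
                // accumulator: fold one element into a container
                (sb, s) -> {
                    if (sb.length() > 0) {
                        sb.append('|');
                    }
                    sb.append(s);
                },
                // combiner: merge two partial containers into one
                (left, right) -> {
                    if (left.length() > 0 && right.length() > 0) {
                        left.append('|');
                    }
                    return left.append(right);
                },
                // finisher: convert the container into the final result
                sb -> sb.toString());
    }

    public static void main(String[] args) {
        System.out.println(Stream.of("a", "b", "c").collect(pipeJoiner()));
        System.out.println(Stream.of("a", "b", "c").parallel().collect(pipeJoiner()));
    }
}
```

Because the source stream is ordered and the combiner appends the right-hand part after the left-hand part, the sequential and the parallel run should produce the same string.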

1.supplier

    /**
     * A function that creates and returns a new mutable result container.
     *
     * @return a function which returns a new, mutable result container
     */
    Supplier<A> supplier();

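This method returns a Supplier. As covered in the earlier introduction to Supplier, a Supplier is a provider (you can also think of it as a producer); here it produces a new mutable result container, typically one of the collections we use every day. Note that supplier() is not itself a generic method: A is a type parameter of the Collector<T, A, R> interface and stands for the type of the result container.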

2.accumulator

 /**
     * A function that folds a value into a mutable result container.
     *
     * @return a function which folds a value into a mutable result container
     */
    BiConsumer<A, T> accumulator();

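This method returns a BiConsumer, which can be thought of as a consumer. Its job is to fold each element into the result container produced by supplier. Note that the type parameters A and T are the container type and the element type, respectively.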

3.combiner

/**
     * A function that accepts two partial results and merges them.  The
     * combiner function may fold state from one argument into the other and
     * return that, or may return a new result container.
     *
     * @return a function which combines two partial results into a combined
     * result
     */
    BinaryOperator<A> combiner();

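This method returns a BinaryOperator, which takes two arguments of the same type and returns a result of that type; here that type is the container type A. The purpose of combiner is therefore clear: it merges several result containers into one. That raises a question: why create several containers, add elements to each of them, and then merge them? Why not simply create a single container?

This is covered in detail further down, so only briefly here: a stream can be sequential or parallel, and a parallel stream is processed by multiple threads. In that case several result containers can be created and the elements distributed among them; once the elements have been distributed, all containers must be merged into a single overall result container. Here too, the type parameter A is the result container type.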
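The idea behind combiner can be imitated by hand, without any threads. The sketch below is not how the JDK schedules a parallel stream; `collectInTwoChunks` is a hypothetical helper that simply treats two lists as two chunks of input, gives each its own container, and merges the partial results.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CombinerDemo {
    // Hypothetical helper: imitates a parallel reduction over two chunks of input.
    static <T, A, R> R collectInTwoChunks(List<T> first, List<T> second,
                                          Collector<T, A, R> collector) {
        A left = collector.supplier().get();   // one container per chunk
        A right = collector.supplier().get();
        first.forEach(t -> collector.accumulator().accept(left, t));
        second.forEach(t -> collector.accumulator().accept(right, t));
        A merged = collector.combiner().apply(left, right); // merge partial results
        return collector.finisher().apply(merged);
    }

    public static void main(String[] args) {
        List<String> result = collectInTwoChunks(
                Arrays.asList("nihao", "hello"),
                Arrays.asList("world", "welcome"),
                Collectors.<String>toList());
        System.out.println(result);
    }
}
```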

4.finisher

/**
     * Perform the final transformation from the intermediate accumulation type
     * {@code A} to the final result type {@code R}.
     *
     * <p>If the characteristic {@code IDENTITY_TRANSFORM} is
     * set, this function may be presumed to be an identity transform with an
     * unchecked cast from {@code A} to {@code R}.
     *
     * @return a function which transforms the intermediate result to the final
     * result
     */
    Function<A, R> finisher();

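This method returns a Function that takes the result container (type A) and returns the final result (type R). That makes finisher easy to understand: it performs the final conversion from the container to the result we actually want. One thing is worth noting: as the Javadoc says, if the characteristics include IDENTITY_FINISH (the Javadoc quoted above calls it IDENTITY_TRANSFORM, but the enum constant is IDENTITY_FINISH), the finisher function will not be called. The reason is explained separately below.

For now it is enough to know this: if the final result type is the mutable container type itself, there is no need to run finisher at all, which also saves some work.
The type parameters A and R are the result container type and the final result type, respectively.

That covers the main methods of Collector. Next, let's look at one more special method: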
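A quick way to see a finisher at work is `Collectors.collectingAndThen`, which wraps an existing collector and applies one more function to its result. This is only a small sketch; the class name `FinisherDemo` is made up.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FinisherDemo {
    // The container is a mutable List; the finishing step wraps it in a read-only view.
    static List<String> unmodifiable() {
        return Stream.of("nihao", "hello").collect(
                Collectors.collectingAndThen(Collectors.toList(),
                                             Collections::unmodifiableList));
    }

    public static void main(String[] args) {
        List<String> list = unmodifiable();
        System.out.println(list);
        try {
            list.add("world");
        } catch (UnsupportedOperationException e) {
            System.out.println("the finished result is read-only");
        }
    }
}
```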

/**
    * Returns a {@code Set} of {@code Collector.Characteristics} indicating
    * the characteristics of this Collector.  This set should be immutable.
    *
    * @return an immutable set of collector characteristics
    */
   Set<Characteristics> characteristics();

The characteristics method returns an immutable set describing the characteristics of this collector. So which characteristics can a collector have?

/**
     * Characteristics indicating properties of a {@code Collector}, which can
     * be used to optimize reduction implementations.
     */
    enum Characteristics {
        /**
         * Indicates that this collector is <em>concurrent</em>, meaning that
         * the result container can support the accumulator function being
         * called concurrently with the same result container from multiple
         * threads.
         *
         * <p>If a {@code CONCURRENT} collector is not also {@code UNORDERED},
         * then it should only be evaluated concurrently if applied to an
         * unordered data source.
         */
        CONCURRENT,

        /**
         * Indicates that the collection operation does not commit to preserving
         * the encounter order of input elements.  (This might be true if the
         * result container has no intrinsic order, such as a {@link Set}.)
         */
        UNORDERED,

        /**
         * Indicates that the finisher function is the identity function and
         * can be elided.  If set, it must be the case that an unchecked cast
         * from A to R will succeed.
         */
        IDENTITY_FINISH
    }

The Collector interface defines a nested enum, Characteristics, which lists the possible characteristics of a collector. Let's go through its constants one by one:

1.CONCURRENT

 /**
         * Indicates that this collector is <em>concurrent</em>, meaning that
         * the result container can support the accumulator function being
         * called concurrently with the same result container from multiple
         * threads.
         *
         * <p>If a {@code CONCURRENT} collector is not also {@code UNORDERED},
         * then it should only be evaluated concurrently if applied to an
         * unordered data source.
         */
        CONCURRENT,

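CONCURRENT marks the collector as concurrent: the result container supports the accumulator function being called on the same container from multiple threads. When such a collector is applied to a parallel stream (and the stream is unordered or the collector is also UNORDERED), it behaves as follows: 1. supplier is called once, so only one mutable container is created. 2. That single container is accumulated into by several threads at the same time. 3. combiner is not called. Conversely, without CONCURRENT a parallel stream calls supplier several times, so that each chunk of work handled by a thread gets its own container, and then calls combiner to merge them. These properties are verified below with a custom Collector implementation.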

2.UNORDERED

/**
         * Indicates that the collection operation does not commit to preserving
         * the encounter order of input elements.  (This might be true if the
         * result container has no intrinsic order, such as a {@link Set}.)
         */
        UNORDERED,

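UNORDERED is simpler. It indicates that the collection operation does not commit to preserving the encounter order of the input elements. (This may be the case when the result container has no intrinsic order, such as a Set.)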

3.IDENTITY_FINISH

        /**
         * Indicates that the finisher function is the identity function and
         * can be elided.  If set, it must be the case that an unchecked cast
         * from A to R will succeed.
         */
        IDENTITY_FINISH

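IDENTITY_FINISH means that the finisher function is the identity function, so the implementation may skip it and never call it. If this characteristic is set, a cast from type A to type R must succeed; otherwise a ClassCastException is thrown when the result is used as an R.

That covers Collector fairly thoroughly. Now let's try to write a custom collector: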
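The characteristics of the collectors that ship with the JDK can be inspected directly, which makes the three constants more concrete. The sketch below just prints them; on the JDK versions I am aware of, `toList()` reports IDENTITY_FINISH, `joining()` does not (its container is not a String), and `toConcurrentMap(...)` reports CONCURRENT. The helper `of` and the class name are hypothetical.

```java
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CharacteristicsDemo {
    // Small helper so that any collector can be inspected the same way.
    static Set<Collector.Characteristics> of(Collector<?, ?, ?> collector) {
        return collector.characteristics();
    }

    public static void main(String[] args) {
        System.out.println("toList:          " + of(Collectors.toList()));
        System.out.println("toSet:           " + of(Collectors.toSet()));
        System.out.println("joining:         " + of(Collectors.joining()));
        System.out.println("toConcurrentMap: " + of(
                Collectors.<String, String, Integer>toConcurrentMap(s -> s, String::length)));
    }
}
```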

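import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

// Collects elements of type T into a Set<T>; each method logs when the stream asks for its function.
public class MyCollector<T> implements Collector<T, Set<T>, Set<T>> {
    @Override
    public Supplier<Set<T>> supplier() {
        System.out.println("supplier invoke");
        return HashSet::new;
    }
    @Override
    public BiConsumer<Set<T>, T> accumulator() {
        System.out.println("accumulator invoke");
        return (set, item) -> set.add(item);
    }
    @Override
    public BinaryOperator<Set<T>> combiner() {
        System.out.println("combiner invoke");
        // Fold the second partial result into the first and return it.
        return (set1, set2) -> {
            set1.addAll(set2);
            return set1;
        };
    }
    @Override
    public Function<Set<T>, Set<T>> finisher() {
        System.out.println("finisher invoke");
        // The container already is the final result.
        return set -> set;
    }
    @Override
    public Set<Characteristics> characteristics() {
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.UNORDERED));
    }
    public static void main(String[] args) {
        Set<String> set = new HashSet<>(Arrays.asList("nihao","hello","world","hello","welcome"));
        set.stream().collect(new MyCollector<String>());
    }
}//output:
//supplier invoke
//accumulator invoke
//combiner invoke
//finisher invoke

The output shows the order in which the stream asks the collector for its four functions. Note that each message is printed when the method hands out its function, not each time that function is applied, which is also why "combiner invoke" appears even though the stream is sequential. Now let's add a print statement to the characteristics method: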

    @Override
    public Set<Characteristics> characteristics() {
        System.out.println("characteristics invoke");
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.UNORDERED));
    }
    //output:
    //supplier invoke
    //accumulator invoke
    //combiner invoke
    //characteristics invoke
    //characteristics invoke
    //finisher invoke

This time the output has changed: why is characteristics called before finisher, and why is it called twice?

To answer that, we can look at the implementation of the collect method:

    public final <R, A> R collect(Collector<? super P_OUT, A, R> collector) {
        A container;
        if (isParallel()
                && (collector.characteristics().contains(Collector.Characteristics.CONCURRENT))
                && (!isOrdered() || collector.characteristics().contains(Collector.Characteristics.UNORDERED))) {
            container = collector.supplier().get();
            BiConsumer<A, ? super P_OUT> accumulator = collector.accumulator();
            forEach(u -> accumulator.accept(container, u));
        }
        else {
            container = evaluate(ReduceOps.makeRef(collector));
        }
        return collector.characteristics().contains(Collector.Characteristics.IDENTITY_FINISH)
               ? (R) container
               : collector.finisher().apply(container); // finisher runs after characteristics() has been checked, and only if IDENTITY_FINISH is absent
    }

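Reading the code, finisher is only called in the final return statement, after characteristics has been consulted, and only when IDENTITY_FINISH is absent. As for the two calls: on a sequential stream the if condition stops at isParallel(), so one call comes from that return statement; the other happens earlier, inside the reduction built by ReduceOps.makeRef, which (in the JDK 8 sources) checks the collector for UNORDERED.

Next, let's rewrite the collector defined above: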
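The sequential half of this logic is small enough to imitate by hand. The helper below is a hypothetical, sequential-only sketch and not JDK code: it creates one container, feeds every element to the accumulator, and then either casts the container or applies the finisher, depending on IDENTITY_FINISH.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class ManualCollect {
    // Hypothetical helper, not JDK code: a sequential-only imitation of collect().
    @SuppressWarnings("unchecked")
    static <T, A, R> R collect(Iterable<T> source, Collector<? super T, A, R> collector) {
        A container = collector.supplier().get();
        BiConsumer<A, ? super T> accumulator = collector.accumulator();
        for (T item : source) {
            accumulator.accept(container, item);
        }
        // Same decision as in the JDK code above: skip the finisher if it is the identity.
        return collector.characteristics().contains(Collector.Characteristics.IDENTITY_FINISH)
                ? (R) container
                : collector.finisher().apply(container);
    }

    public static void main(String[] args) {
        Set<String> set = collect(Arrays.asList("a", "b", "a"), Collectors.toSet());
        String joined = collect(Arrays.asList("a", "b", "a"), Collectors.joining("-"));
        System.out.println(set);
        System.out.println(joined);
    }
}
```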

@Override
    public Supplier<Set<T>> supplier() {
        return ()->{
            System.out.println("supplier create container-------"+Thread.currentThread().getName());
            return new HashSet<T>();
        };
    }
    @Override
    public BiConsumer<Set<T>, T> accumulator() {
        return (set,item)->{
            System.out.println("accumulator:"+set+"------"+Thread.currentThread().getName());
            set.add(item);
        };
    }
    @Override
    public BinaryOperator<Set<T>> combiner() {
        return (set1,set2)->{
            System.out.println("combiner:"+ set1 + "----" + Thread.currentThread().getName());
            set1.addAll(set2);
            return set1;
        };
    }
    @Override
    public Function<Set<T>, Set<T>> finisher() {
        return (set)->{
            System.out.println("finisher:"+ set + "----" +Thread.currentThread().getName());
            return set;
        };
    }
    @Override
    public Set<Characteristics> characteristics() {
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.UNORDERED));
    }
    public static void main(String[] args) {
        Set<String> set = new HashSet<>(Arrays.asList("nihao","hello","world","hello","welcome"));
        System.out.println(set.stream().collect(new MyCollector<String>()));
    }

1. When characteristics() returns only UNORDERED

Running the code above with a sequential stream prints:

supplier create container-------main
accumulator:[]------main
accumulator:[world]------main
accumulator:[world, nihao]------main
accumulator:[world, nihao, hello]------main
finisher:[world, nihao, hello, welcome]----main
[world, nihao, hello, welcome]

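The output shows that: 1. only one result container was created; 2. only one thread (main) worked on that container; 3. the combiner function was not executed; 4. the finisher function was executed.

Now switch to a parallel stream: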

public static void main(String[] args) {
        Set<String> set = new HashSet<>(Arrays.asList("nihao","hello","world","hello","welcome"));
        System.out.println(set.stream().parallel() .collect(new MyCollector<String>()));
    }

The output after running it:

supplier create container-------ForkJoinPool.commonPool-worker-1
supplier create container-------main
supplier create container-------ForkJoinPool.commonPool-worker-2
accumulator:[]------ForkJoinPool.commonPool-worker-2
accumulator:[]------ForkJoinPool.commonPool-worker-1
accumulator:[]------main
supplier create container-------ForkJoinPool.commonPool-worker-2
combiner:[world]----ForkJoinPool.commonPool-worker-1
accumulator:[]------ForkJoinPool.commonPool-worker-2
combiner:[hello]----ForkJoinPool.commonPool-worker-2
combiner:[world, nihao]----ForkJoinPool.commonPool-worker-2
finisher:[world, nihao, hello, welcome]----main
[world, nihao, hello, welcome]

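1. Several result containers are created. 2. Several threads take part. 3. The combiner function is executed. 4. The finisher function is executed.

The output therefore confirms the following: when the set returned by characteristics contains only UNORDERED, the finisher function is executed. On a sequential stream only one result container is created and the combiner function is never executed. On a parallel stream several result containers are created and the combiner function is executed to merge them.

2. When characteristics() returns UNORDERED and IDENTITY_FINISH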
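Instead of reading the log lines, the same observation can be made by counting. The sketch below uses `Collector.of` with two counters; `ContainerCountDemo` and `count` are hypothetical names. For the sequential run the counts are fixed (one container, no combiner call), while for the parallel run they depend on how the JDK happens to split the work, so no exact numbers are claimed there.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.Stream;

class ContainerCountDemo {
    // Returns {containers created, combiner calls} for one collect() run.
    static int[] count(boolean parallel) {
        AtomicInteger containers = new AtomicInteger();
        AtomicInteger combines = new AtomicInteger();
        Collector<String, Set<String>, Set<String>> counting = Collector.of(
                () -> {
                    containers.incrementAndGet();
                    return new HashSet<String>();
                },
                (set, item) -> set.add(item),
                (left, right) -> {
                    combines.incrementAndGet();
                    left.addAll(right);
                    return left;
                },
                Collector.Characteristics.UNORDERED);
        Stream<String> stream = Stream.of("nihao", "hello", "world", "hello", "welcome");
        (parallel ? stream.parallel() : stream).collect(counting);
        return new int[] { containers.get(), combines.get() };
    }

    public static void main(String[] args) {
        int[] sequential = count(false);
        int[] parallel = count(true);
        System.out.println("sequential: containers=" + sequential[0] + ", combines=" + sequential[1]);
        System.out.println("parallel:   containers=" + parallel[0] + ", combines=" + parallel[1]);
    }
}
```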

@Override
    public Set<Characteristics> characteristics() {
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.UNORDERED,Characteristics.IDENTITY_FINISH));
    }

Output with a sequential stream:

supplier create container-------main
accumulator:[]------main
accumulator:[world]------main
accumulator:[world, nihao]------main
accumulator:[world, nihao, hello]------main
[world, nihao, hello, welcome]

Output with a parallel stream:

supplier create container-------main
supplier create container-------ForkJoinPool.commonPool-worker-2
supplier create container-------ForkJoinPool.commonPool-worker-1
supplier create container-------ForkJoinPool.commonPool-worker-3
accumulator:[]------main
accumulator:[]------ForkJoinPool.commonPool-worker-3
accumulator:[]------ForkJoinPool.commonPool-worker-1
accumulator:[]------ForkJoinPool.commonPool-worker-2
combiner:[hello]----ForkJoinPool.commonPool-worker-1
combiner:[world]----ForkJoinPool.commonPool-worker-2
combiner:[world, nihao]----ForkJoinPool.commonPool-worker-2
[world, nihao, hello, welcome]

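As the output shows, once IDENTITY_FINISH is added the collector's finisher function is no longer called. Keep in mind what that implies: with IDENTITY_FINISH the result is obtained by an unchecked cast from A to R instead of by calling finisher. You must therefore make sure that the container type A really can be cast to the final result type R; otherwise a ClassCastException is thrown.

3. When characteristics() also returns CONCURRENT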
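The ClassCastException is easy to provoke on purpose. The collector below is deliberately wrong and purely illustrative (`BadCollector` is a made-up name): its container type is StringBuilder, its result type is String, and it still declares IDENTITY_FINISH. Since the cast inside collect is unchecked, the exception is expected to surface at the call site, where the result is assigned to a String.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Stream;

class BadIdentityFinishDemo {
    // Deliberately wrong: A is StringBuilder, R is String, yet IDENTITY_FINISH is declared.
    static final class BadCollector implements Collector<String, StringBuilder, String> {
        @Override public Supplier<StringBuilder> supplier() { return StringBuilder::new; }
        @Override public BiConsumer<StringBuilder, String> accumulator() { return (sb, s) -> sb.append(s); }
        @Override public BinaryOperator<StringBuilder> combiner() { return (a, b) -> a.append(b); }
        @Override public Function<StringBuilder, String> finisher() { return StringBuilder::toString; }
        @Override public Set<Characteristics> characteristics() {
            return EnumSet.of(Characteristics.IDENTITY_FINISH);
        }
    }

    // Returns true if collecting with BadCollector fails with a ClassCastException.
    static boolean failsWithClassCast() {
        try {
            String s = Stream.of("a", "b").collect(new BadCollector());
            System.out.println("unexpectedly got: " + s);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("ClassCastException thrown: " + failsWithClassCast());
    }
}
```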

@Override
    public Set<Characteristics> characteristics() {
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.UNORDERED,Characteristics.IDENTITY_FINISH,Characteristics.CONCURRENT));
    }

Output with a sequential stream:

supplier create container-------main
accumulator:[]------main
accumulator:[world]------main
accumulator:[world, nihao]------main
accumulator:[world, nihao, hello]------main
[world, nihao, hello, welcome]

Output with a parallel stream:

supplier create container-------main
accumulator:[]------main
accumulator:[hello]------main
accumulator:[hello, welcome]------main
accumulator:[hello]------ForkJoinPool.commonPool-worker-1
[world, nihao, hello, welcome]

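As the output shows, with CONCURRENT the supplier provides only one result container even when several threads are at work, so combiner never needs to be called. One thing must be kept in mind, though: because several threads operate on the same container, the accumulator must not do anything that is not thread-safe, or you will get wrong results or even exceptions. That includes the container itself: the HashSet used in this demo is not thread-safe, and printing the set inside the accumulator while another thread is adding to it can throw a ConcurrentModificationException.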
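A CONCURRENT collector is only safe if its container is. As a minimal sketch, assuming a set-valued result is wanted, the HashSet can be swapped for the concurrent set view returned by `ConcurrentHashMap.newKeySet()`. The names `ConcurrentSetCollectorDemo` and `toConcurrentSet` are hypothetical; this is a swapped-in variant, not the collector from the article.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collector;
import java.util.stream.Stream;

class ConcurrentSetCollectorDemo {
    // Hypothetical collector: one shared, thread-safe set for all threads.
    static <T> Collector<T, Set<T>, Set<T>> toConcurrentSet() {
        return Collector.of(
                () -> ConcurrentHashMap.<T>newKeySet(),   // thread-safe container
                (set, item) -> set.add(item),             // safe to call from many threads
                (left, right) -> {                        // still required by the interface
                    left.addAll(right);
                    return left;
                },
                // This overload of Collector.of adds IDENTITY_FINISH on its own.
                Collector.Characteristics.CONCURRENT,
                Collector.Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        Set<String> result = Stream.of("nihao", "hello", "world", "hello", "welcome")
                .parallel()
                .collect(toConcurrentSet());
        System.out.println(result);
    }
}
```

The combiner is kept because the same collector may also be used on a sequential stream or in a context where the concurrent path is not taken.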