Java 8 ParallelStream: the Order of Returned Results

L.ZZ

Last modified 2023-02-23 17:30:33

First published 2020-06-20 15:12:12

Article link: https://blog.csdn.net/lijingjingchn/article/details/106872480

1. Preface

I used to think that a parallel stream always returns its results out of order. That is wrong:

Stream<String> s = Stream.of("1", "2", "3", "4", "5", "6", "7");
List<String> result = s.parallel().collect(Collectors.toList()); // always [1, 2, 3, 4, 5, 6, 7], in encounter order
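To see the claim in practice, here is a small sketch (the class name ParallelCollectOrderCheck and the number of runs are arbitrary choices) that repeats the parallel collect and compares each result with the source order. Repetition only illustrates the behaviour; the guarantee itself comes from the JDK source analysed in section 2.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParallelCollectOrderCheck {

    // Runs the parallel collect from the preface `runs` times and reports
    // whether every run returned the elements in encounter order.
    static boolean alwaysOrdered(int runs) {
        List<String> expected = Arrays.asList("1", "2", "3", "4", "5", "6", "7");
        for (int i = 0; i < runs; i++) {
            List<String> actual = Stream.of("1", "2", "3", "4", "5", "6", "7")
                    .parallel()
                    .collect(Collectors.toList());
            if (!actual.equals(expected)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(alwaysOrdered(1_000)); // prints true
    }
}
```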

2. Source code

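For collect(), whether the result preserves encounter order does not depend on whether the stream is parallel or sequential. Given an ordered source, it depends on the Collector's Characteristics (the nested enum Collector.Characteristics):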

    enum Characteristics {
        /**
         * Indicates that this collector is <em>concurrent</em>, meaning that
         * the result container can support the accumulator function being
         * called concurrently with the same result container from multiple
         * threads.
         *
         * <p>If a {@code CONCURRENT} collector is not also {@code UNORDERED},
         * then it should only be evaluated concurrently if applied to an
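         * unordered data source. // i.e. a CONCURRENT collector that is not also UNORDERED is evaluated concurrently only on an unordered source
         */
        CONCURRENT, // marks the result container as thread-safe, e.g. a ConcurrentHashMap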
 
        /**
         * Indicates that the collection operation does not commit to preserving
         * the encounter order of input elements.  (This might be true if the
         * result container has no intrinsic order, such as a {@link Set}.)
         */
        UNORDERED,
 
        /**
         * Indicates that the finisher function is the identity function and
         * can be elided.  If set, it must be the case that an unchecked cast
         * from A to R will succeed.
         */
        IDENTITY_FINISH
    }
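The characteristics of the built-in collectors can be inspected directly through Collector.characteristics(). A minimal sketch (class and method names are illustrative); the printed sets reflect the JDK implementation of these collectors:

```java
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class CollectorCharacteristicsDemo {

    static Set<Collector.Characteristics> toListCharacteristics() {
        return Collectors.<String>toList().characteristics();
    }

    static Set<Collector.Characteristics> toSetCharacteristics() {
        return Collectors.<String>toSet().characteristics();
    }

    // toConcurrentMap accumulates into a shared ConcurrentMap.
    static Set<Collector.Characteristics> toConcurrentMapCharacteristics() {
        return Collectors.<String, String, Integer>toConcurrentMap(s -> s, String::length)
                .characteristics();
    }

    public static void main(String[] args) {
        System.out.println("toList:          " + toListCharacteristics());          // [IDENTITY_FINISH]
        System.out.println("toSet:           " + toSetCharacteristics());           // [UNORDERED, IDENTITY_FINISH]
        System.out.println("toConcurrentMap: " + toConcurrentMapCharacteristics()); // [CONCURRENT, UNORDERED, IDENTITY_FINISH]
    }
}
```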

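The collector returned by Collectors.toList() has only the IDENTITY_FINISH characteristic (CH_ID); it is neither CONCURRENT nor UNORDERED. See the source of Collectors.toList():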

    /**
     * Returns a {@code Collector} that accumulates the input elements into a
     * new {@code List}. There are no guarantees on the type, mutability,
     * serializability, or thread-safety of the {@code List} returned; if more
     * control over the returned {@code List} is required, use {@link #toCollection(Supplier)}.
     *
     * @param <T> the type of the input elements
     * @return a {@code Collector} which collects all the input elements into a
     * {@code List}, in encounter order
     */
    public static <T>
    Collector<T, ?, List<T>> toList() {
        return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add,
                                   (left, right) -> { left.addAll(right); return left; },
                                   CH_ID);
    }
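The same supplier/accumulator/combiner triple can be passed to Collector.of to build an equivalent collector. A sketch, where ToListSketch and toListLike are hypothetical names: the order is preserved because every parallel subtask fills its own ArrayList and the combiner always appends the right-hand chunk to the left-hand one, not because ArrayList is thread-safe (it is not).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.IntStream;

public class ToListSketch {

    // Hypothetical stand-in for Collectors.toList(), built with Collector.of.
    // This overload of Collector.of reports only IDENTITY_FINISH, so the
    // collector is neither CONCURRENT nor UNORDERED.
    static <T> Collector<T, List<T>, List<T>> toListLike() {
        return Collector.of(
                ArrayList::new,                 // each parallel subtask gets its own list
                List::add,                      // a subtask appends its chunk in encounter order
                (left, right) -> {              // merge: left chunk first, then right chunk
                    left.addAll(right);
                    return left;
                });
    }

    static List<Integer> collectParallel(int n) {
        return IntStream.rangeClosed(1, n).boxed()
                .parallel()
                .collect(toListLike());
    }

    public static void main(String[] args) {
        System.out.println(collectParallel(10)); // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    }
}
```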

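So, because the source Stream.of(...) is ordered and toList() is not UNORDERED, s.parallel().collect(Collectors.toList()) always returns the elements in encounter order.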

Next, look at the implementation of collect() in ReferencePipeline, together with the methods it calls:

    public final <R, A> R collect(Collector<? super P_OUT, A, R> collector) {
        A container;
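        // Concurrent branch: parallel stream AND CONCURRENT collector
        // AND (the stream is unordered OR the collector is UNORDERED)
        if (isParallel()
                && (collector.characteristics().contains(Collector.Characteristics.CONCURRENT))
                && (!isOrdered() || collector.characteristics().contains(Collector.Characteristics.UNORDERED))) {
            container = collector.supplier().get();
            BiConsumer<A, ? super P_OUT> accumulator = collector.accumulator();
            forEach(u -> accumulator.accept(container, u)); // all threads accumulate into one shared container; encounter order is lost
        }
        else {
            container = evaluate(ReduceOps.makeRef(collector)); // each subtask fills its own container, merged by the combiner in encounter order; evaluation can still be parallel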
        }
        return collector.characteristics().contains(Collector.Characteristics.IDENTITY_FINISH)
               ? (R) container
               : collector.finisher().apply(container);
    }

    // ReferencePipeline.Head.forEach (source stage without intermediate operations)
    @Override
    public void forEach(Consumer<? super E_OUT> action) {
        if (!isParallel()) {
            sourceStageSpliterator().forEachRemaining(action);
        }
        else {
            super.forEach(action);
        }
    }

    // ReferencePipeline.forEach
    @Override
    public void forEach(Consumer<? super P_OUT> action) {
        evaluate(ForEachOps.makeRef(action, false)); // ordered = false
    }

    // ForEachOps.makeRef
    public static <T> TerminalOp<T, Void> makeRef(Consumer<? super T> action,
                                                  boolean ordered) {
        Objects.requireNonNull(action);
        return new ForEachOp.OfRef<>(action, ordered);
    }

    // AbstractPipeline.evaluate
    final <R> R evaluate(TerminalOp<E_OUT, R> terminalOp) {
        assert getOutputShape() == terminalOp.inputShape();
        if (linkedOrConsumed)
            throw new IllegalStateException(MSG_STREAM_LINKED);
        linkedOrConsumed = true;

        // Parallel evaluation (splitting via the Spliterator) depends only on isParallel(),
        // not on whether the collector is CONCURRENT
        return isParallel()
               ? terminalOp.evaluateParallel(this, sourceSpliterator(terminalOp.getOpFlags()))
               : terminalOp.evaluateSequential(this, sourceSpliterator(terminalOp.getOpFlags()));
    }

    // ForEachOps.ForEachOp.evaluateParallel
    @Override
    public <S> Void evaluateParallel(PipelineHelper<T> helper,
                                     Spliterator<S> spliterator) {
        if (ordered) // forEachOrdered: a parallel stream can still apply the action in encounter order
            new ForEachOrderedTask<>(helper, spliterator, this).invoke();
        else
            new ForEachTask<>(helper, spliterator, helper.wrapSink(this)).invoke();
        return null;
    }
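The two branches of collect() can be contrasted with a sketch (class and method names are hypothetical): Collectors.groupingBy is not CONCURRENT and goes through evaluate(ReduceOps.makeRef(...)), while Collectors.groupingByConcurrent is CONCURRENT and UNORDERED, so on a parallel stream it takes the forEach branch. Only the contents of the concurrent result are reliable; the order inside each list may change from run to run.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CollectBranches {

    // groupingBy is not CONCURRENT: collect() takes the evaluate(ReduceOps...) branch,
    // per-subtask maps are merged left-then-right, so every list keeps source order.
    static Map<Boolean, List<Integer>> reducing(int n) {
        return IntStream.rangeClosed(1, n).boxed()
                .parallel()
                .collect(Collectors.groupingBy(i -> i % 2 == 0));
    }

    // groupingByConcurrent is CONCURRENT and UNORDERED: on a parallel stream
    // collect() takes the forEach branch, all threads add into one shared
    // ConcurrentMap, and the order inside each list is not guaranteed.
    static ConcurrentMap<Boolean, List<Integer>> concurrent(int n) {
        return IntStream.rangeClosed(1, n).boxed()
                .parallel()
                .collect(Collectors.groupingByConcurrent(i -> i % 2 == 0));
    }

    public static void main(String[] args) {
        System.out.println(reducing(20).get(true));   // even numbers in ascending order
        System.out.println(concurrent(20).get(true)); // same elements, order may vary
    }
}
```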

3. Summary

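To keep the order throughout an entire stream, you have to check whether each of the following preserves it: the stream's source (see the documentation of the stream's source), whether the stream is sequential or parallel, every intermediate operation, and the terminal operation.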
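Whether a source is ordered is visible as the ORDERED characteristic of its Spliterator, from which the stream derives its own ordering flag. A minimal sketch using standard collections (the class name SourceOrder is illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Spliterator;

public class SourceOrder {

    static final List<Integer> DATA = Arrays.asList(3, 1, 2);

    // A List has a defined encounter order.
    static boolean listIsOrdered() {
        return new ArrayList<>(DATA).spliterator().hasCharacteristics(Spliterator.ORDERED);
    }

    // A HashSet has no encounter order.
    static boolean hashSetIsOrdered() {
        return new HashSet<>(DATA).spliterator().hasCharacteristics(Spliterator.ORDERED);
    }

    // A LinkedHashSet keeps insertion order.
    static boolean linkedHashSetIsOrdered() {
        return new LinkedHashSet<>(DATA).spliterator().hasCharacteristics(Spliterator.ORDERED);
    }

    public static void main(String[] args) {
        System.out.println("ArrayList ORDERED:     " + listIsOrdered());          // true
        System.out.println("HashSet ORDERED:       " + hashSetIsOrdered());       // false
        System.out.println("LinkedHashSet ORDERED: " + linkedHashSetIsOrdered()); // true
    }
}
```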

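Source: if the source itself is unordered, it is meaningless to talk about the order in which elements are processed.
Sequential vs. parallel:
Sequential streams: if the source is ordered and no intermediate operation such as sorting affects the order, the terminal operation processes elements in the same order as the source; if there is such an operation, it processes them in the order that results from applying the intermediate operations one after another.
Parallel streams: even if the source is ordered, the order in which the terminal operation processes elements is nondeterministic; however, forEachOrdered() guarantees that elements are processed in the source's encounter order.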
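The difference between forEach and forEachOrdered on a parallel stream can be observed by recording the elements each one sees. A sketch with hypothetical names: forEach needs a synchronized list because several threads call the action at once, while the forEachOrdered javadoc guarantees that the action for one element happens-before the action for the next, so a plain ArrayList is safe there.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ForEachVsOrdered {

    static List<Integer> input(int n) {
        return IntStream.rangeClosed(1, n).boxed().collect(Collectors.toList());
    }

    // forEach -> ForEachOps.makeRef(action, false): elements arrive in
    // whatever order the worker threads reach them.
    static List<Integer> viaForEach(int n) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        input(n).parallelStream().forEach(seen::add);
        return seen;
    }

    // forEachOrdered -> ForEachOps.makeRef(action, true): ForEachOrderedTask
    // applies the action in encounter order, possibly on different threads.
    static List<Integer> viaForEachOrdered(int n) {
        List<Integer> seen = new ArrayList<>();
        input(n).parallelStream().forEachOrdered(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(viaForEach(20));        // all 20 numbers, order may vary
        System.out.println(viaForEachOrdered(20)); // 1 to 20 in ascending order
    }
}
```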

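Note:
The order in which elements are processed and the order of the final result are not the same thing: processing may happen in any order while the final result is still ordered. For example, with List<…> result = inputList.parallelStream().map(…).filter(…).collect(Collectors.toList());
the whole operation may benefit from parallel execution, but the result list is always in the correct order, whether you use a parallel or a sequential stream.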
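To separate the two notions, this sketch (names hypothetical) records the order in which peek() observes the elements of a parallel stream alongside the collected result. The recorded processing order typically differs from the source, while the result list does not; peek() is used here only for observation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ProcessingVsResultOrder {

    public static final class Outcome {
        public final List<Integer> processed; // order in which peek() saw the elements
        public final List<Integer> result;    // order of the collected list

        Outcome(List<Integer> processed, List<Integer> result) {
            this.processed = processed;
            this.result = result;
        }
    }

    static Outcome run(List<Integer> input) {
        Queue<Integer> seen = new ConcurrentLinkedQueue<>(); // thread-safe, records arrival order
        List<Integer> result = input.parallelStream()
                .peek(seen::add)              // observation only
                .map(x -> x * 10)
                .filter(x -> x % 20 == 0)     // keeps the images of even inputs
                .collect(Collectors.toList());
        return new Outcome(new ArrayList<>(seen), result);
    }

    public static void main(String[] args) {
        List<Integer> input = IntStream.rangeClosed(1, 1000).boxed().collect(Collectors.toList());
        Outcome o = run(input);
        System.out.println("processed in source order: " + o.processed.equals(input)); // usually false
        System.out.println("first results: " + o.result.subList(0, 5));               // [20, 40, 60, 80, 100]
    }
}
```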
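Intermediate operations:
Intermediate operations such as map(), filter(), distinct() and limit() preserve encounter order; sorted() imposes a new order, and unordered() removes the ordering constraint.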
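A small sketch (class name and input values are illustrative) of how filter(), map(), sorted() and unordered() affect the collected order on a parallel stream:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class IntermediateOps {

    static final List<Integer> INPUT = Arrays.asList(5, 3, 8, 1, 9, 2, 7);

    // filter() and map() keep encounter order, even on a parallel stream.
    static List<Integer> mapFilter() {
        return INPUT.parallelStream()
                .filter(x -> x > 2)
                .map(x -> x * 2)
                .collect(Collectors.toList());
    }

    // sorted() replaces the source order with the natural order.
    static List<Integer> sortedResult() {
        return INPUT.parallelStream().sorted().collect(Collectors.toList());
    }

    // unordered() lifts the ordering constraint: same contents,
    // but the list order is no longer guaranteed.
    static List<Integer> unorderedResult() {
        return INPUT.parallelStream().unordered().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(mapFilter());    // [10, 6, 16, 18, 14]
        System.out.println(sortedResult()); // [1, 2, 3, 5, 7, 8, 9]
    }
}
```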
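Terminal operations:
The order after collect() depends on the specific collector, for example:
1) The collector returned by Collectors.toSet() is UNORDERED, while the one returned by toList() is not.
2) forEach(): the elements are logged in the order they arrive from the individual threads, which on a parallel stream is not guaranteed to be the encounter order: list.stream().parallel().forEach(e -> logger.log(Level.INFO, e));
3) forEachOrdered(): guarantees encounter order, even on a parallel stream: list.stream().parallel().forEachOrdered(e -> logger.log(Level.INFO, e));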
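If a parallel pipeline needs a set that keeps encounter order, one option (a sketch; names are illustrative) is Collectors.toCollection(LinkedHashSet::new): toCollection reports only IDENTITY_FINISH, so its chunks are merged in encounter order, unlike the UNORDERED toSet():

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class OrderedSetCollect {

    static final List<String> INPUT = Arrays.asList("d", "a", "c", "a", "b", "d");

    // toSet(): UNORDERED; HashSet iteration order is unrelated to the source.
    static Set<String> viaToSet() {
        return INPUT.parallelStream().collect(Collectors.toSet());
    }

    // toCollection(LinkedHashSet::new): not UNORDERED, so chunks are merged
    // left-then-right and the set keeps first-occurrence order.
    static Set<String> viaLinkedHashSet() {
        return INPUT.parallelStream().collect(Collectors.toCollection(LinkedHashSet::new));
    }

    static boolean toSetIsUnordered() {
        return Collectors.<String>toSet().characteristics()
                .contains(Collector.Characteristics.UNORDERED);
    }

    static boolean toCollectionIsUnordered() {
        return Collectors.<String, LinkedHashSet<String>>toCollection(LinkedHashSet::new)
                .characteristics().contains(Collector.Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        System.out.println(viaLinkedHashSet()); // [d, a, c, b]
        System.out.println(viaToSet());         // a, b, c, d in HashSet order
    }
}
```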