1. Preface
I used to assume that a parallel stream always returns its results out of order. That assumption is wrong.
Stream<String> s = Stream.of("1","2","3","4","5","6","7");
s.parallel().collect(Collectors.toList()); // always returns the elements in encounter order
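The claim above can be checked with a small, self-contained sketch. The class name ParallelToListDemo and the helper collectParallel are made up for this illustration; only the Stream and Collectors calls come from the JDK.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelToListDemo {
    // Collects 0..n-1 through a parallel stream into a List.
    static List<Integer> collectParallel(int n) {
        return IntStream.range(0, n)   // ordered source
                .parallel()
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> result = collectParallel(10_000);
        // Check that every element sits at its encounter-order position.
        boolean inOrder = IntStream.range(0, result.size())
                .allMatch(i -> result.get(i) == i);
        System.out.println("in encounter order: " + inOrder);
    }
}
```

Because IntStream.range is an ordered source and toList() preserves encounter order, the check should hold on every run, however the work is split across threads.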
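2. Source code
Whether the collected result is ordered does not depend on the stream being parallel or sequential. Given an ordered source, it depends only on the Characteristics of the Collector.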
enum Characteristics {
/**
* Indicates that this collector is <em>concurrent</em>, meaning that
* the result container can support the accumulator function being
* called concurrently with the same result container from multiple
* threads.
*
* <p>If a {@code CONCURRENT} collector is not also {@code UNORDERED},
* then it should only be evaluated concurrently if applied to an
* unordered data source. // i.e. a CONCURRENT collector that is not UNORDERED is only evaluated concurrently when the source is unordered
*/
CONCURRENT, // marks the result container as thread-safe, e.g. ConcurrentHashMap
/**
* Indicates that the collection operation does not commit to preserving
* the encounter order of input elements. (This might be true if the
* result container has no intrinsic order, such as a {@link Set}.)
*/
UNORDERED,
/**
* Indicates that the finisher function is the identity function and
* can be elided. If set, it must be the case that an unchecked cast
* from A to R will succeed.
*/
IDENTITY_FINISH
}
The collector returned by Collectors.toList(), by contrast, has only the IDENTITY_FINISH characteristic. See the source of Collectors.toList():
/**
* Returns a {@code Collector} that accumulates the input elements into a
* new {@code List}. There are no guarantees on the type, mutability,
* serializability, or thread-safety of the {@code List} returned; if more
* control over the returned {@code List} is required, use {@link #toCollection(Supplier)}.
*
* @param <T> the type of the input elements
* @return a {@code Collector} which collects all the input elements into a
* {@code List}, in encounter order
*/
public static <T>
Collector<T, ?, List<T>> toList() {
return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add,
(left, right) -> { left.addAll(right); return left; },
CH_ID);
}
So s.parallel().collect(Collectors.toList()) always returns the elements in encounter order.
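The characteristics of a collector can also be inspected at runtime. The following is a minimal sketch; the class and method names are hypothetical, and the exact flag sets printed are an implementation detail of the JDK in use, so no specific output is claimed here.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class CollectorCharacteristicsDemo {
    static Set<Collector.Characteristics> toListFlags() {
        Collector<String, ?, List<String>> c = Collectors.toList();
        return c.characteristics();
    }

    static Set<Collector.Characteristics> toSetFlags() {
        Collector<String, ?, Set<String>> c = Collectors.toSet();
        return c.characteristics();
    }

    static Set<Collector.Characteristics> groupingByConcurrentFlags() {
        Collector<String, ?, ConcurrentMap<Integer, List<String>>> c =
                Collectors.groupingByConcurrent(String::length);
        return c.characteristics();
    }

    public static void main(String[] args) {
        // Print the flags each collector reports.
        System.out.println("toList:               " + toListFlags());
        System.out.println("toSet:                " + toSetFlags());
        System.out.println("groupingByConcurrent: " + groupingByConcurrentFlags());
    }
}
```

The Javadoc documents toSet() as an unordered collector and groupingByConcurrent() as a concurrent and unordered one, so those flags should show up; toList() should report neither UNORDERED nor CONCURRENT.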
It is also worth looking at the implementation of collect() in ReferencePipeline:
public final <R, A> R collect(Collector<? super P_OUT, A, R> collector) {
A container;
// Parallel stream with a CONCURRENT collector, and either the stream is unordered or the collector is UNORDERED
if (isParallel()
&& (collector.characteristics().contains(Collector.Characteristics.CONCURRENT))
&& (!isOrdered() || collector.characteristics().contains(Collector.Characteristics.UNORDERED))) {
container = collector.supplier().get();
BiConsumer<A, ? super P_OUT> accumulator = collector.accumulator();
forEach(u -> accumulator.accept(container, u)); // all threads accumulate into one shared container, so the result is in no particular order
}
else {
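container = evaluate(ReduceOps.makeRef(collector)); // the result keeps encounter order (for an ordered stream and a collector that is not UNORDERED), yet it may still be computed in parallel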
}
return collector.characteristics().contains(Collector.Characteristics.IDENTITY_FINISH)
? (R) container
: collector.finisher().apply(container);
}
// ReferencePipeline.Head
@Override
public void forEach(Consumer<? super E_OUT> action) {
if (!isParallel()) {
sourceStageSpliterator().forEachRemaining(action);
}
else {
super.forEach(action);
}
}
// ReferencePipeline
@Override
public void forEach(Consumer<? super P_OUT> action) {
evaluate(ForEachOps.makeRef(action, false)); // the boolean "ordered" argument is false
}
// ForEachOps
public static <T> TerminalOp<T, Void> makeRef(Consumer<? super T> action,
boolean ordered) {
Objects.requireNonNull(action);
return new ForEachOp.OfRef<>(action, ordered);
}
// AbstractPipeline
final <R> R evaluate(TerminalOp<E_OUT, R> terminalOp) {
assert getOutputShape() == terminalOp.inputShape();
if (linkedOrConsumed)
throw new IllegalStateException(MSG_STREAM_LINKED);
linkedOrConsumed = true;
return isParallel() // only isParallel() decides whether evaluation runs in parallel (splitting via Spliterator); whether the collector is CONCURRENT plays no role
? terminalOp.evaluateParallel(this, sourceSpliterator(terminalOp.getOpFlags()))
: terminalOp.evaluateSequential(this, sourceSpliterator(terminalOp.getOpFlags()));
}
// ForEachOps.ForEachOp
@Override
public <S> Void evaluateParallel(PipelineHelper<T> helper,
Spliterator<S> spliterator) {
if (ordered) // even on a parallel stream, elements can still be processed in encounter order
new ForEachOrderedTask<>(helper, spliterator, this).invoke();
else
new ForEachTask<>(helper, spliterator, helper.wrapSink(this)).invoke();
return null;
}
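The two branches of collect() shown above can be observed from user code. The sketch below assumes groupingBy for the ordered reduction branch and groupingByConcurrent (documented as concurrent and unordered) for the shared-container branch; the class and helper names are invented for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class GroupingOrderDemo {
    // groupingBy is neither CONCURRENT nor UNORDERED: partial maps are
    // merged in encounter order, so each value list stays ordered.
    static Map<Boolean, List<Integer>> grouped(int n) {
        return IntStream.range(0, n).parallel().boxed()
                .collect(Collectors.groupingBy(i -> i % 2 == 0));
    }

    // groupingByConcurrent lets all threads accumulate into one shared map,
    // so the order inside each value list is not guaranteed.
    static ConcurrentMap<Boolean, List<Integer>> groupedConcurrently(int n) {
        return IntStream.range(0, n).parallel().boxed()
                .collect(Collectors.groupingByConcurrent(i -> i % 2 == 0));
    }

    static boolean isSorted(List<Integer> list) {
        for (int i = 1; i < list.size(); i++) {
            if (list.get(i - 1) > list.get(i)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("groupingBy, evens ordered: "
                + isSorted(grouped(100_000).get(true)));
        System.out.println("groupingByConcurrent, evens ordered: "
                + isSorted(groupedConcurrently(100_000).get(true)));
    }
}
```

The first line should always report true. The second may report either value from run to run, since nothing guarantees the order in which threads reach the shared container; both variants contain the same elements.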
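3. Summary
To make sure that order is maintained throughout a stream, you have to check whether each of the following maintains order: the source of the stream (see the documentation of the stream's source), whether the stream is sequential or parallel, all intermediate operations, and the terminal operation.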
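- Stream source: if the data source itself is unordered (a HashSet, for example), there is no encounter order to preserve, so discussing the order in which its elements are processed is meaningless.
- Sequential vs. parallel:
  Sequential stream: with an ordered source, if no intermediate operation affects the order (sorting, for example), the terminal operation processes the elements in the same order as they appear in the source. If such an operation is present, the terminal operation processes the elements in the order that results from applying the intermediate operations one after another.
  Parallel stream: even with an ordered source, the order in which the terminal operation processes the elements is nondeterministic. forEachOrdered() can, however, guarantee that the elements are processed in the order of the source.
  Note: the order in which elements are processed and the order of the final result are two different things. Processing may happen in any order while the final result is still ordered. For example, with something like List<…> result = inputList.parallelStream().map(…).filter(…).collect(Collectors.toList()); the whole operation may benefit from parallel execution, but the resulting list is always in the correct order, whether you use a parallel or a sequential stream.
- Intermediate operations: apart from sorted() and unordered(), intermediate operations do not affect the order of the result. (The Baeldung article linked below also lists empty(), although Stream.empty() is a static factory method rather than an intermediate operation.)
- Terminal operations:
  1) collect(): the resulting order depends on the specific collector. For example, the collector returned by Collectors.toSet() is UNORDERED, whereas the one returned by toList() is not.
  2) forEach(): logs the elements in the order they arrive from each thread.
  list.stream().parallel().forEach(e -> logger.log(Level.INFO, e));
  3) forEachOrdered(): guarantees encounter order, even on a parallel stream.
  list.stream().parallel().forEachOrdered(e -> logger.log(Level.INFO, e));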
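The difference between forEach() and forEachOrdered() on a parallel stream can be sketched as follows. Instead of the logger used above, this hypothetical TerminalOrderDemo class records the order in which elements reach the action in a synchronized list, so that the result can be inspected afterwards.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class TerminalOrderDemo {
    static List<Integer> source(int n) {
        return IntStream.range(0, n).boxed().collect(Collectors.toList());
    }

    // forEach: elements arrive in whatever order the worker threads reach them.
    static List<Integer> viaForEach(List<Integer> src) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        src.parallelStream().forEach(seen::add);
        return seen;
    }

    // forEachOrdered: elements arrive in encounter order, even in parallel.
    static List<Integer> viaForEachOrdered(List<Integer> src) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        src.parallelStream().forEachOrdered(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> src = source(100_000);
        System.out.println("forEach keeps order:        "
                + viaForEach(src).equals(src));
        System.out.println("forEachOrdered keeps order: "
                + viaForEachOrdered(src).equals(src));
    }
}
```

The forEachOrdered() line should always print true, while the forEach() line carries no such guarantee and will typically print false for a list of this size.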
See also:
- Factors that do or do not affect the order of the result: https://www.baeldung.com/java-stream-ordering
- A very good answer: https://stackoverflow.com/questions/29216588/how-to-ensure-order-of-processing-in-java8-streams