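Learning Java Streams (Part 2)
The previous article focused on what a Stream is, the business scenarios it suits, and its common operations, without digging into how it is implemented. This time we take a brief look at how Stream works under the hood.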
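1. Who actually runs a Stream
Stream is an interface. It only defines behavior, so it is not the real implementation. Who does the work in the end?
Starting from Collection, the object a Stream operates on, we find that it relies on the helper class StreamSupport to return a Stream. Tracing further, the object returned is a ReferencePipeline, or more precisely an instance of its static nested class ReferencePipeline.Head.
ReferencePipeline itself is an abstract class.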
// Collection
default Stream<E> stream() {
    return StreamSupport.stream(spliterator(), false);
}
// StreamSupport
/** (Note: creates a sequential or parallel stream from a Spliterator)
* Creates a new sequential or parallel {@code Stream} from a
* {@code Spliterator}.
*
* <p>The spliterator is only traversed, split, or queried for estimated
* size after the terminal operation of the stream pipeline commences.
*
* <p>It is strongly recommended the spliterator report a characteristic of
* {@code IMMUTABLE} or {@code CONCURRENT}, or be
* <a href="../Spliterator.html#binding">late-binding</a>. Otherwise,
* {@link #stream(java.util.function.Supplier, int, boolean)} should be used
* to reduce the scope of potential interference with the source. See
* <a href="package-summary.html#NonInterference">Non-Interference</a> for
* more details.
*
* @param <T> the type of stream elements
* @param spliterator a {@code Spliterator} describing the stream elements
* @param parallel if {@code true} then the returned stream is a parallel
* stream; if {@code false} the returned stream is a sequential
* stream.
* @return a new sequential or parallel {@code Stream}
*/
public static <T> Stream<T> stream(Spliterator<T> spliterator, boolean parallel) {
    Objects.requireNonNull(spliterator);
    return new ReferencePipeline.Head<>(spliterator,
                                        StreamOpFlag.fromCharacteristics(spliterator),
                                        parallel);
}
We can use IntelliJ IDEA to view the class diagram.
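The same hierarchy can also be checked at runtime without an IDE. The sketch below is only an illustration: the class name PipelineInspector and the method hierarchyOf are made up for this example, and the printed names assume an OpenJDK-style implementation of java.util.stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class PipelineInspector {

    /** Returns the class names from the stream's runtime class up to, but excluding, Object. */
    static List<String> hierarchyOf(Stream<?> stream) {
        List<String> names = new ArrayList<>();
        for (Class<?> c = stream.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            names.add(c.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>(Arrays.asList("a", "b", "c"));

        // Collection.stream() delegates to StreamSupport.stream(spliterator(), false).
        Stream<String> viaCollection = source.stream();
        // Calling StreamSupport directly produces the same kind of object.
        Stream<String> viaSupport = StreamSupport.stream(source.spliterator(), false);

        hierarchyOf(viaCollection).forEach(System.out::println);
        System.out.println(viaCollection.getClass() == viaSupport.getClass());
    }
}
```

On OpenJDK the first name printed is java.util.stream.ReferencePipeline$Head, followed by ReferencePipeline, AbstractPipeline and PipelineHelper, which matches the chain traced in the source above.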
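1.1 Generics in these classes
A closer look at the generic type parameters that appear in the diagram and the code:
- In Stream<T>, T is the element type of the collection or array being processed. For a List<String> it is String, matching E in Collection<E>.
- ReferencePipeline<P_IN, P_OUT> and Head<E_IN, E_OUT> use the same pair of type parameters, since Head is the concrete subclass. P_IN is the element type coming in from upstream, and P_OUT is the element type this stage outputs, that is, the T of the Stream<T> this stage implements. Note that this is per stage, because every intermediate operation returns a new stream.
- In BaseStream<T, S>, T is the same as Stream's T, and S is the stream type itself, i.e. Stream<T>.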
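A small sketch may make the "per stage" point concrete. The class and method names here (StageTypes, lengths, mapReturnsNewStream) are hypothetical and exist only for illustration; the comments describe how the public types line up with P_IN and P_OUT, not the exact internal classes involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class StageTypes {

    /** Maps each word to its length; the element type changes from String to Integer. */
    static List<Integer> lengths(List<String> words) {
        Stream<String> head = words.stream();              // T = String, same as E in List<E>
        Stream<Integer> mapped = head.map(String::length); // new stage: String in, Integer out
        return mapped.collect(Collectors.toList());
    }

    /** Shows that an intermediate operation returns a new stream object, not the receiver. */
    static boolean mapReturnsNewStream(List<String> words) {
        Stream<String> head = words.stream();
        Stream<Integer> mapped = head.map(String::length);
        // Cast to Object so the two differently parameterized types can be compared.
        return (Object) head != (Object) mapped;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "ccc");
        System.out.println(lengths(words));
        System.out.println(mapReturnsNewStream(words));
    }
}
```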
1.2 Spliterator
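As seen above, StreamSupport does not build the Head directly from the collection. It builds it from a Spliterator, an intermediate object. So what does a Spliterator do?
The Javadoc defines Spliterator as "An object for traversing and partitioning elements of a source."
In more detail, it can traverse the elements on its own, or split off part of the source as a new Spliterator (which is what parallel streams use). See the Javadoc for the full contract (it is a dense read).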
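The two roles, traversing and partitioning, can be tried directly on a collection's spliterator. This is a minimal sketch using an ArrayList as the source; SpliteratorDemo, drain and splitOnce are names invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

final class SpliteratorDemo {

    /** Drains a spliterator into a list using bulk traversal. */
    static <T> List<T> drain(Spliterator<T> sp) {
        List<T> out = new ArrayList<>();
        sp.forEachRemaining(out::add);
        return out;
    }

    /** Splits the source once and returns the two parts as [prefix, rest]. */
    static List<List<Integer>> splitOnce(List<Integer> source) {
        Spliterator<Integer> rest = source.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // null if the source cannot be split
        List<Integer> first = (prefix == null) ? new ArrayList<Integer>() : drain(prefix);
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(first);
        parts.add(drain(rest));
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));

        // Single-element traversal: tryAdvance returns false once nothing is left.
        Spliterator<Integer> sp = source.spliterator();
        sp.tryAdvance(first -> System.out.println("first element: " + first));

        // Partitioning: the step a parallel stream relies on to divide the work.
        System.out.println(splitOnce(source));
    }
}
```

For an ordered source such as ArrayList, trySplit hands back a prefix of the elements and leaves the remainder in the original spliterator, so the two parts together still cover the whole source.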
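1.3 The terminal operation collect
collect is a terminal operation of Stream. Running it returns the result you asked for. Let's look at the method.
Pay attention to the added notes and to the explanation of the generic parameters.
/** (Note: performs a mutable reduction using a Collector; this is the description of the collect operation)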
* Performs a <a href="package-summary.html#MutableReduction">mutable
* reduction</a> operation on the elements of this stream using a
* {@code Collector}. A {@code Collector}
* encapsulates the functions used as arguments to
* {@link #collect(Supplier, BiConsumer, BiConsumer)}, allowing for reuse of
* collection strategies and composition of collect operations such as
* multiple-level grouping or partitioning.
* (Note: requirements on Characteristics when the stream is parallel ...)
* <p>If the stream is parallel, and the {@code Collector}
* is {@link Collector.Characteristics#CONCURRENT concurrent}, and
* either the stream is unordered or the collector is
* {@link Collector.Characteristics#UNORDERED unordered},
* then a concurrent reduction will be performed (see {@link Collector} for
* details on concurrent reduction.)
* (Note: this is a terminal operation)
* <p>This is a <a href="package-summary.html#StreamOps">terminal
* operation</a>.
* (Note: this part concerns parallel streams and is not covered here)
* <p>When executed in parallel, multiple intermediate results may be
* instantiated, populated, and merged so as to maintain isolation of
* mutable data structures. Therefore, even when executed in parallel
* with non-thread-safe data structures (such as {@code ArrayList}), no
* additional synchronization is needed for a parallel reduction.
*
* @apiNote (Note: example, accumulating strings into an ArrayList)
* The following will accumulate strings into an ArrayList:
* <pre>{@code
* List<String> asList = stringStream.collect(Collectors.toList());
* }</pre>
* (Note: classifies personStream into a Map keyed by Person::getCity)
* <p>The following will classify {@code Person} objects by city:
* <pre>{@code
* Map<String, List<Person>> peopleByCity
* = personStream.collect(Collectors.groupingBy(Person::getCity));
* }</pre>
* (Note: two-level grouping, first into a Map by state, then the elements of each value by city)
* <p>The following will classify {@code Person} objects by state and city,
* cascading two {@code Collector}s together:
* <pre>{@code
* Map<String, Map<String, List<Person>>> peopleByStateAndCity
* = personStream.collect(Collectors.groupingBy(Person::getState,
* Collectors.groupingBy(Person::getCity)));
* }</pre>
* (Note: in example 1 the type parameter T is String)
* @param <R> the type of the result (Note: e.g. List<String> in example 1)
* @param <A> the intermediate accumulation type of the {@code Collector}
* @param collector the {@code Collector} describing the reduction
* @return the result of the reduction
* @see #collect(Supplier, BiConsumer, BiConsumer)
* @see Collectors
*/
<R, A> R collect(Collector<? super T, A, R> collector);
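To see how the one-argument collect relates to the three-argument form mentioned in the Javadoc, here is a small sketch. The class CollectDemo and its methods are hypothetical, and grouping strings by length stands in for the Javadoc's Person example, since no Person class is defined in this article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class CollectDemo {

    /** One-argument collect: the Collector bundles the reduction functions. */
    static List<String> toList(List<String> words) {
        return words.stream().collect(Collectors.toList());
    }

    /** Three-argument collect: the same reduction with each function spelled out. */
    static List<String> toListByHand(List<String> words) {
        return words.stream().collect(
                () -> new ArrayList<String>(),                                        // supplier
                (ArrayList<String> acc, String w) -> acc.add(w),                      // accumulator
                (ArrayList<String> left, ArrayList<String> right) -> left.addAll(right)); // combiner
    }

    /** Grouping, analogous to groupingBy(Person::getCity) in the Javadoc. */
    static Map<Integer, List<String>> byLength(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "cc", "ddd");
        System.out.println(toList(words));
        System.out.println(toListByHand(words));
        System.out.println(byLength(words));
    }
}
```

In the generic signature above, T is String in all three methods. R is the list type for the first two methods and Map<Integer, List<String>> for the grouping case.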
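2. Collector basics
The previous section showed that a Stream uses a Collector when it runs the reduction operation collect, and looked at Collector only briefly. This section analyzes Collector in more depth.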
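As a preview of the four functions and the associativity check that the Collector Javadoc describes, the sketch below builds a tiny collector by hand with Collector.of and runs the same "without splitting" and "with splitting" computations. JoiningSketch, concat, withoutSplitting and withSplitting are names invented for this illustration; it is not how Collectors.joining is implemented.

```java
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Stream;

final class JoiningSketch {

    /** A hand-written collector with T = String, A = StringBuilder, R = String. */
    static Collector<String, StringBuilder, String> concat() {
        Supplier<StringBuilder> supplier = StringBuilder::new;                    // new container
        BiConsumer<StringBuilder, String> accumulator = (sb, s) -> sb.append(s);  // add one element
        BinaryOperator<StringBuilder> combiner = (left, right) -> left.append(right); // merge two
        Function<StringBuilder, String> finisher = StringBuilder::toString;       // final transform
        return Collector.of(supplier, accumulator, combiner, finisher);
    }

    /** Sequential shape: one container receives both elements. */
    static <T, A, R> R withoutSplitting(Collector<T, A, R> c, T t1, T t2) {
        A a1 = c.supplier().get();
        c.accumulator().accept(a1, t1);
        c.accumulator().accept(a1, t2);
        return c.finisher().apply(a1);
    }

    /** Parallel shape: one container per element, merged by the combiner. */
    static <T, A, R> R withSplitting(Collector<T, A, R> c, T t1, T t2) {
        A a2 = c.supplier().get();
        c.accumulator().accept(a2, t1);
        A a3 = c.supplier().get();
        c.accumulator().accept(a3, t2);
        return c.finisher().apply(c.combiner().apply(a2, a3));
    }

    public static void main(String[] args) {
        System.out.println(Stream.of("a", "b", "c").collect(concat()));
        String r1 = withoutSplitting(concat(), "x", "y");
        String r2 = withSplitting(concat(), "x", "y");
        System.out.println(r1.equals(r2)); // the two computations must agree
    }
}
```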
Let's look at the source first, mainly the definition part.
2.1 Basic analysis of the source
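/** (Note: a mutable reduction operation accumulates elements into a mutable result container, optionally transforms the accumulated result, and finally yields a representation.
* In other words: elements go into a container, may be transformed along the way, and end up in some final form, which may be a Collection such as a List or an aggregated value such as an int.
* A reduction can run sequentially or in parallel.)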
* A <a href="package-summary.html#Reduction">mutable reduction operation</a> that
* accumulates input elements into a mutable result container, optionally transforming
* the accumulated result into a final representation after all input elements
* have been processed. Reduction operations can be performed either sequentially
* or in parallel.
*
* <p>Examples of mutable reduction operations include:
* accumulating elements into a {@code Collection}; concatenating
* strings using a {@code StringBuilder}; computing summary information about
* elements such as sum, min, max, or average; computing "pivot table" summaries
* such as "maximum valued transaction by seller", etc. The class {@link Collectors}
* provides implementations of many common mutable reductions.
*
* <p>A {@code Collector} is specified by four functions that work together to
* accumulate entries into a mutable result container, and optionally perform
* a final transform on the result. They are: <ul>
* <li>creation of a new result container ({@link #supplier()})</li>
* <li>incorporating a new data element into a result container ({@link #accumulator()})</li>
* <li>combining two result containers into one ({@link #combiner()})</li>
* <li>performing an optional final transform on the container ({@link #finisher()})</li>
* </ul>
* (Note: collectors also have a set of characteristics, hints that mainly matter for parallel execution; not covered here)
* <p>Collectors also have a set of characteristics, such as
* {@link Characteristics#CONCURRENT}, that provide hints that can be used by a
* reduction implementation to provide better performance.
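* (Note: a sequential reduction creates a single result container with the supplier function and calls the accumulator once for each input element.
* A parallel reduction partitions the input, creates a container for each partition, accumulates each partition into it, and then merges the partial results with the combiner.)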
* <p>A sequential implementation of a reduction using a collector would
* create a single result container using the supplier function, and invoke the
* accumulator function once for each input element. A parallel implementation
* would partition the input, create a result container for each partition,
* accumulate the contents of each partition into a subresult for that partition,
* and then use the combiner function to merge the subresults into a combined
* result.
*
* <p>To ensure that sequential and parallel executions produce equivalent
* results, the collector functions must satisfy an <em>identity</em> and an
* <a href="package-summary.html#Associativity">associativity</a> constraints.
*
* <p>The identity constraint says that for any partially accumulated result,
* combining it with an empty result container must produce an equivalent
* result. That is, for a partially accumulated result {@code a} that is the
* result of any series of accumulator and combiner invocations, {@code a} must
* be equivalent to {@code combiner.apply(a, supplier.get())}.
* (Note: the result must be equivalent whether the reduction runs in parallel or sequentially)
* <p>The associativity constraint says that splitting the computation must
* produce an equivalent result. That is, for any input elements {@code t1}
* and {@code t2}, the results {@code r1} and {@code r2} in the computation
* below must be equivalent:
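* (Note: in the code below, a1 is the result container, t1 and t2 are input elements, accumulator is the function that adds an element to the container, and finisher produces the final result R)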
* <pre>{@code
* A a1 = supplier.get();
* accumulator.accept(a1, t1);
* accumulator.accept(a1, t2); // sequential execution, no splitting
* R r1 = finisher.apply(a1); // result without splitting
*
* A a2 = supplier.get();
* accumulator.accept(a2, t1);
* A a3 = supplier.get();
* accumulator.accept(a3, t2); // split; the combiner then merges the two containers
* R r2 = finisher.apply(combiner.apply(a2, a3)); // result with splitting
* } </pre>
*
* <p>For collectors that do not have the {@code UNORDERED} characteristic,