Learning Java Streams (Part 2)


轨迹R


Column: Java Basics | Tags: java, learning, algorithms

Article link: https://blog.csdn.net/for2018/article/details/128264179


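This article digs into how Java Streams are implemented: from ReferencePipeline, the class that actually executes a Stream, to the role of Spliterator, and on to the Collector interface used by the terminal operation collect. It walks through Collector's source, implementing classes and interfaces, and ends with a personal summary: although the underlying implementation is complex, understanding how it works improves your ability to apply it and is a good way to learn architectural design.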

(Summary generated automatically by CSDN.)


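The previous article focused on what a Stream is, its business use cases and its common operations, without digging into the internal implementation. This time, let's take a light look at how Stream works under the hood.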

1. The executor behind Stream

Stream is an interface: it only defines behavior, so it is not what actually does the work. Who does, then?

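Starting from Collection, the source that a Stream operates on, we see that it uses the helper class StreamSupport to return a Stream. Tracing further, the actual implementation is ReferencePipeline, or more precisely its static nested class ReferencePipeline.Head.

ReferencePipeline is an abstract class.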

    // Collection
    default Stream<E> stream() {
        return StreamSupport.stream(spliterator(), false);
    }

    // StreamSupport source
    /**
     * Creates a new sequential or parallel {@code Stream} from a
     * {@code Spliterator}.
     *
     * <p>The spliterator is only traversed, split, or queried for estimated
     * size after the terminal operation of the stream pipeline commences.
     *
     * <p>It is strongly recommended the spliterator report a characteristic of
     * {@code IMMUTABLE} or {@code CONCURRENT}, or be
     * <a href="../Spliterator.html#binding">late-binding</a>.  Otherwise,
     * {@link #stream(java.util.function.Supplier, int, boolean)} should be used
     * to reduce the scope of potential interference with the source.  See
     * <a href="package-summary.html#NonInterference">Non-Interference</a> for
     * more details.
     *
     * @param <T> the type of stream elements
     * @param spliterator a {@code Spliterator} describing the stream elements
     * @param parallel if {@code true} then the returned stream is a parallel
     *        stream; if {@code false} the returned stream is a sequential
     *        stream.
     * @return a new sequential or parallel {@code Stream}
     */
    public static <T> Stream<T> stream(Spliterator<T> spliterator, boolean parallel) {
        Objects.requireNonNull(spliterator);
        return new ReferencePipeline.Head<>(spliterator,
                                            StreamOpFlag.fromCharacteristics(spliterator),
                                            parallel);
    }
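To see this at runtime, here is a small sketch. It assumes an OpenJDK-based runtime, where the concrete class is reported as `java.util.stream.ReferencePipeline$Head`; that name is an internal implementation detail rather than public API, so other JDKs could in principle differ. `PipelineClasses` and its method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.StreamSupport;

class PipelineClasses {
    // Runtime class of the stream returned by the default Collection.stream()
    static String headClass(List<String> list) {
        return list.stream().getClass().getName();
    }

    // Same construction path as Collection.stream(): the collection's Spliterator, parallel = false
    static String viaStreamSupport(List<String> list) {
        return StreamSupport.stream(list.spliterator(), false).getClass().getName();
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("a", "bb", "ccc"));
        System.out.println(headClass(words));        // java.util.stream.ReferencePipeline$Head on OpenJDK
        System.out.println(viaStreamSupport(words)); // same class name
    }
}
```

Both calls end up in `new ReferencePipeline.Head<>(...)`, which is why the two printed names match.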

Let's use IntelliJ IDEA to look at the class diagram:

[Figure: IDEA class diagram for ReferencePipeline.Head and its supertypes]

1.1 Generic type parameters

Here I want to focus on the generic type parameters in the diagram and in the code.

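T in Stream<T> is the element type of the collection or array being processed; for a List<String>, for example, it is String, the same type as E in Collection<E>.
ReferencePipeline<P_IN, P_OUT> and Head<E_IN, E_OUT> line up one to one, since Head extends ReferencePipeline. P_IN is the element type of the upstream source, and P_OUT is the element type this stage outputs. Note the words "this stage": every intermediate operation returns a new stream (a new pipeline stage), so the output type can change from stage to stage.
BaseStream<T, S>: T is the element type, the same as in Stream<T>, and S is the type of the stream itself (for Stream<T>, S is Stream<T>).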
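A minimal sketch of how the output type changes from stage to stage. `StageTypes` and its methods are illustrative names, not JDK API. The last method relies on the fact that `Stream<T>` extends `BaseStream<T, Stream<T>>`, so `sequential()` returns `S`, which is `Stream<String>` here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.BaseStream;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StageTypes {
    // Each intermediate operation returns a new stage with its own output type (P_OUT)
    static List<Integer> longWordLengths(List<String> words) {
        Stream<String> head = words.stream();                  // output type: String
        Stream<Integer> lengths = head.map(String::length);    // new stage, output type: Integer
        Stream<Integer> longOnes = lengths.filter(n -> n > 1); // new stage, output type still Integer
        return longOnes.collect(Collectors.toList());
    }

    // Compared as Object: Stream<String> and Stream<Integer> cannot be compared directly
    static boolean stagesAreDistinct(List<String> words) {
        Stream<String> head = words.stream();
        Object first = head;
        Object second = head.map(String::length);
        return first != second;
    }

    // S in BaseStream<T, S> is the concrete stream type, here Stream<String>
    static long countViaBaseStream(List<String> words) {
        BaseStream<String, Stream<String>> base = words.stream();
        Stream<String> same = base.sequential(); // sequential() returns S
        return same.count();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "ccc");
        System.out.println(longWordLengths(words));    // [2, 3]
        System.out.println(stagesAreDistinct(words));  // true
        System.out.println(countViaBaseStream(words)); // 3
    }
}
```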

1.2 Spliterator

As we saw above, StreamSupport does not build the Head directly from the collection; it builds it from a Spliterator, which acts as an intermediate object. So what is a Spliterator for?

Spliterator: "An object for traversing and partitioning elements of a source", i.e. an object used to traverse and partition the elements of a source.

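In more detail, it can traverse elements one at a time (tryAdvance) or in bulk (forEachRemaining), and it can partition off part of the source as a new Spliterator (trySplit), which is what parallel streams rely on. See the Javadoc for the details (it gave me a headache).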
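A small sketch of those traversal and splitting methods on a plain list. `SpliteratorDemo` is a hypothetical name. The only guarantee it relies on is the documented one that, for an ORDERED spliterator, `trySplit()` returns a spliterator covering a prefix of the elements; exactly where the split lands is up to the implementation. The second method builds a stream the same way `Collection.stream()` does, through `StreamSupport.stream(spliterator, false)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

class SpliteratorDemo {
    // Split the source once, then drain each part separately
    static List<List<Integer>> splitOnce(List<Integer> source) {
        Spliterator<Integer> rest = source.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // null if the source declines to split

        List<Integer> first = new ArrayList<>();
        if (prefix != null) {
            prefix.forEachRemaining(first::add);       // bulk traversal
        }
        List<Integer> second = new ArrayList<>();
        while (rest.tryAdvance(second::add)) {
            // tryAdvance consumes one element per call and returns false when none are left
        }
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(first);
        parts.add(second);
        return parts;
    }

    // Build a sequential stream directly from a Spliterator, as Collection.stream() does
    static List<Integer> doubled(List<Integer> source) {
        return StreamSupport.stream(source.spliterator(), false)
                .map(x -> x * 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(splitOnce(data)); // two parts that concatenate back to data
        System.out.println(doubled(data));   // [2, 4, 6, 8, 10, 12, 14, 16]
    }
}
```

A parallel stream does essentially this recursively: keep calling trySplit until the pieces are small enough, then traverse each piece on its own thread.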

1.3 The terminal operation collect

collect is a terminal operation of Stream: executing it returns the result you want. Let's look at the method.

Pay attention to the added notes and to the explanation of the generic parameters.

    /**
     * Performs a <a href="package-summary.html#MutableReduction">mutable
     * reduction</a> operation on the elements of this stream using a
     * {@code Collector}.  A {@code Collector}
     * encapsulates the functions used as arguments to
     * {@link #collect(Supplier, BiConsumer, BiConsumer)}, allowing for reuse of
     * collection strategies and composition of collect operations such as
     * multiple-level grouping or partitioning.
     *
     * <p>If the stream is parallel, and the {@code Collector}
     * is {@link Collector.Characteristics#CONCURRENT concurrent}, and
     * either the stream is unordered or the collector is
     * {@link Collector.Characteristics#UNORDERED unordered},
     * then a concurrent reduction will be performed (see {@link Collector} for
     * details on concurrent reduction.)
     *
     * <p>This is a <a href="package-summary.html#StreamOps">terminal
     * operation</a>.
     * (Note: the next paragraph concerns parallel streams and is not explained here.)
     * <p>When executed in parallel, multiple intermediate results may be
     * instantiated, populated, and merged so as to maintain isolation of
     * mutable data structures.  Therefore, even when executed in parallel
     * with non-thread-safe data structures (such as {@code ArrayList}), no
     * additional synchronization is needed for a parallel reduction.
     *
     * @apiNote
     * The following will accumulate strings into an ArrayList:
     * <pre>{@code
     *     List<String> asList = stringStream.collect(Collectors.toList());
     * }</pre>
     *
     * <p>The following will classify {@code Person} objects by city:
     * <pre>{@code
     *     Map<String, List<Person>> peopleByCity
     *         = personStream.collect(Collectors.groupingBy(Person::getCity));
     * }</pre>
     * (Note: a two-level grouping: first by state into a map, then each state's list by city.)
     * <p>The following will classify {@code Person} objects by state and city,
     * cascading two {@code Collector}s together:
     * <pre>{@code
     *     Map<String, Map<String, List<Person>>> peopleByStateAndCity
     *         = personStream.collect(Collectors.groupingBy(Person::getState,
     *                                                      Collectors.groupingBy(Person::getCity)));
     * }</pre>
     * (Note: T is the stream's element type; in the first example it is String.)
     * @param <R> the type of the result (e.g. List<String> in the first example)
     * @param <A> the intermediate accumulation type of the {@code Collector}
     * @param collector the {@code Collector} describing the reduction
     * @return the result of the reduction
     * @see #collect(Supplier, BiConsumer, BiConsumer)
     * @see Collectors
     */
    <R, A> R collect(Collector<? super T, A, R> collector);
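The three `@apiNote` examples run as written once there is something to group. `Person`, its fields and the sample data below are hypothetical, made up only to give the Javadoc snippets some input; `Collectors.toList()` and `Collectors.groupingBy` are the real JDK methods the Javadoc uses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CollectExamples {
    // Hypothetical stand-in for the Person type used in the Javadoc
    static final class Person {
        private final String name;
        private final String state;
        private final String city;

        Person(String name, String state, String city) {
            this.name = name;
            this.state = state;
            this.city = city;
        }

        String getState() { return state; }
        String getCity()  { return city; }

        @Override
        public String toString() { return name; }
    }

    // Made-up sample data
    static List<Person> samplePeople() {
        return Arrays.asList(
                new Person("Ann", "CA", "LA"),
                new Person("Bob", "CA", "SF"),
                new Person("Cid", "CA", "LA"),
                new Person("Dee", "NY", "NYC"));
    }

    // Example 1: T = String, R = List<String>
    static List<String> collectToList(Stream<String> stringStream) {
        return stringStream.collect(Collectors.toList());
    }

    // Example 2: one level of grouping, city -> people
    static Map<String, List<Person>> peopleByCity(List<Person> people) {
        return people.stream().collect(Collectors.groupingBy(Person::getCity));
    }

    // Example 3: two cascaded Collectors, state -> (city -> people)
    static Map<String, Map<String, List<Person>>> peopleByStateAndCity(List<Person> people) {
        return people.stream().collect(Collectors.groupingBy(Person::getState,
                Collectors.groupingBy(Person::getCity)));
    }

    public static void main(String[] args) {
        System.out.println(collectToList(Stream.of("x", "y")));
        System.out.println(peopleByCity(samplePeople()));         // HashMap: key order not guaranteed
        System.out.println(peopleByStateAndCity(samplePeople()));
    }
}
```

In example 3 the downstream collector decides what each value of the outer map becomes, which is exactly the "composition of collect operations" the Javadoc mentions.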

2. Collector basics

Above we saw that a stream uses Collector when it performs the mutable reduction collect, and took a quick look at Collector. Now let's analyze Collector in more depth.
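Before reading the source, a hand-written Collector may help. `Collector.of(supplier, accumulator, combiner, finisher)` is a real JDK factory method; the `CollectorDemo` class, the StringBuilder-based joining collector and the two check methods are illustrative only. The checks replay, for a couple of inputs, the identity and associativity conditions that the Collector Javadoc places on these four functions; passing them for a few inputs is of course not a proof.

```java
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Stream;

class CollectorDemo {
    // T = String (input), A = StringBuilder (mutable container), R = String (final result)
    static Collector<String, StringBuilder, String> joiningCollector() {
        return Collector.of(
                StringBuilder::new,                  // supplier: a new, empty container
                StringBuilder::append,               // accumulator: fold one element in
                (left, right) -> left.append(right), // combiner: merge two partial containers
                StringBuilder::toString);            // finisher: container -> final result
    }

    // Associativity: accumulating t1, t2 in one container == accumulating separately, then combining
    static boolean associativityHolds(Collector<String, StringBuilder, String> c, String t1, String t2) {
        Supplier<StringBuilder> supplier = c.supplier();
        BiConsumer<StringBuilder, String> accumulator = c.accumulator();
        BinaryOperator<StringBuilder> combiner = c.combiner();
        Function<StringBuilder, String> finisher = c.finisher();

        StringBuilder a1 = supplier.get();
        accumulator.accept(a1, t1);
        accumulator.accept(a1, t2);
        String r1 = finisher.apply(a1);                     // without splitting

        StringBuilder a2 = supplier.get();
        accumulator.accept(a2, t1);
        StringBuilder a3 = supplier.get();
        accumulator.accept(a3, t2);
        String r2 = finisher.apply(combiner.apply(a2, a3)); // with splitting

        return r1.equals(r2);
    }

    // Identity: combining a partial result with an empty container changes nothing
    static boolean identityHolds(Collector<String, StringBuilder, String> c, String t) {
        StringBuilder a = c.supplier().get();
        c.accumulator().accept(a, t);
        String before = c.finisher().apply(a);
        String after = c.finisher().apply(c.combiner().apply(a, c.supplier().get()));
        return before.equals(after);
    }

    public static void main(String[] args) {
        System.out.println(Stream.of("a", "b", "c").collect(joiningCollector())); // abc
        System.out.println(associativityHolds(joiningCollector(), "x", "y"));     // true
        System.out.println(identityHolds(joiningCollector(), "x"));               // true
    }
}
```

Because this collector satisfies both conditions, running it on a parallel stream gives the same string as the sequential run, even though StringBuilder itself is not thread-safe: each partition gets its own container from the supplier.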

First, the source; I've excerpted mainly the definition part.

2.1 A first pass through the source


/**
 * (Note: put simply, elements go into a container, may be transformed along the way, and finally
 * produce some representation: a Collection such as a List, or an aggregated value such as an int.)
 * A <a href="package-summary.html#Reduction">mutable reduction operation</a> that
 * accumulates input elements into a mutable result container, optionally transforming
 * the accumulated result into a final representation after all input elements
 * have been processed.  Reduction operations can be performed either sequentially
 * or in parallel.
 *
 * <p>Examples of mutable reduction operations include:
 * accumulating elements into a {@code Collection}; concatenating
 * strings using a {@code StringBuilder}; computing summary information about
 * elements such as sum, min, max, or average; computing "pivot table" summaries
 * such as "maximum valued transaction by seller", etc.  The class {@link Collectors}
 * provides implementations of many common mutable reductions.
 *
 * <p>A {@code Collector} is specified by four functions that work together to
 * accumulate entries into a mutable result container, and optionally perform
 * a final transform on the result.  They are: <ul>
 *     <li>creation of a new result container ({@link #supplier()})</li>
 *     <li>incorporating a new data element into a result container ({@link #accumulator()})</li>
 *     <li>combining two result containers into one ({@link #combiner()})</li>
 *     <li>performing an optional final transform on the container ({@link #finisher()})</li>
 * </ul>
 * (Note: characteristics exist mainly to support parallel execution and stream optimizations; not covered here.)
 * <p>Collectors also have a set of characteristics, such as
 * {@link Characteristics#CONCURRENT}, that provide hints that can be used by a
 * reduction implementation to provide better performance.
 *
 * <p>A sequential implementation of a reduction using a collector would
 * create a single result container using the supplier function, and invoke the
 * accumulator function once for each input element.  A parallel implementation
 * would partition the input, create a result container for each partition,
 * accumulate the contents of each partition into a subresult for that partition,
 * and then use the combiner function to merge the subresults into a combined
 * result.
 *
 * <p>To ensure that sequential and parallel executions produce equivalent
 * results, the collector functions must satisfy an <em>identity</em> and an
 * <a href="package-summary.html#Associativity">associativity</a> constraints.
 *
 * <p>The identity constraint says that for any partially accumulated result,
 * combining it with an empty result container must produce an equivalent
 * result.  That is, for a partially accumulated result {@code a} that is the
 * result of any series of accumulator and combiner invocations, {@code a} must
 * be equivalent to {@code combiner.apply(a, supplier.get())}.
 * (Note: whether it runs sequentially or in parallel, the result must be the same.)
 * <p>The associativity constraint says that splitting the computation must
 * produce an equivalent result.  That is, for any input elements {@code t1}
 * and {@code t2}, the results {@code r1} and {@code r2} in the computation
 * below must be equivalent:
 * (Note: in the code below, a1 is the result container, t1 and t2 are input elements, accumulator
 * folds each element into the container, and finisher produces the final result R.)
 * <pre>{@code
 *     A a1 = supplier.get();
 *     accumulator.accept(a1, t1);
 *     accumulator.accept(a1, t2);
 *     R r1 = finisher.apply(a1);  // result without splitting
 *
 *     A a2 = supplier.get();
 *     accumulator.accept(a2, t1);
 *     A a3 = supplier.get();
 *     accumulator.accept(a3, t2);
 *     R r2 = finisher.apply(combiner.apply(a2, a3));  // result with splitting
 * } </pre>
 *
 * <p>For collectors that do not have the {@code UNORDERED} characteristic,