[后端] 关于java源码中Stream#reduce的一个解读

最新推荐文章于 2024-10-10 10:03:30 发布

eason__sz

最新推荐文章于 2024-10-10 10:03:30 发布

阅读量259

点赞数 7

文章标签： java 后端

本文链接：https://blog.csdn.net/m0_46480384/article/details/119393349

版权

本文深入探讨了Java Stream#reduce方法的三种重载形式，特别是第三个方法中累加器与合并器的区别和作用。通过示例代码展示了在单线程和多线程环境下，累加器与合并器如何协同工作，揭示了它们在并行流处理中的关键角色。同时，强调了在多线程数据同步中预处理方案的重要性，并指出combiner在并行流中确保结果一致性的必要性。

摘要由CSDN通过智能技术生成

如题，最近遇到一个归约问题引发的源码思考

首先 Stream#reduce 有3个重载方法如下：

	T reduce(T identity, BinaryOperator<T> accumulator);
	
    Optional<T> reduce(BinaryOperator<T> accumulator);

    <U> U reduce(U identity,
                 BiFunction<U, ? super T, U> accumulator,
                 BinaryOperator<U> combiner);

第一个方法有两个入参：

T identity : 意思为累加器 accumulator 的起始点
BinaryOperator accumulator : 累加器

此方法比较好理解，归约的过程即对于传入的 identity，将 Stream 流中的每个元素执行传入的累加器 accumulator 中的操作后赋予 identity。

第二个方法只有一个入参，意思与上述类似，区别在于内置的 identity 初始为null，累加器 accumulator 处理完以后的值还是可能为 null，所以返回值用 Optional 包装了一下。

本文的重点在于第三个方法的三个参数：

U identity：意思为累加器 accumulator 的起始点
BiFunction<U, ? super T, U> accumulator：累加器
BinaryOperator< U > combiner：合并器

先来看下官方文档的解释：
accumulator - an associative, non-interfering, stateless function for incorporating an additional element into a result.
累加器 - 一种关联的、无干扰的、无状态的函数，用于将附加元素合并到结果中.
combiner - an associative, non-interfering, stateless function for combining two values, which must be compatible with the accumulator function
组合器 - 一个关联的、无干扰的、无状态的函数，用于组合两个值，它必须与累加器函数兼容.

其中重点在于以上标黄字体，其中官方还给了一个具体的解释：
Stream#reduce源码注释
其中框住的描述大致意思为：
标识值必须是组合器功能的标识。这意味着对于所有 u，combiner(identity, u) 之后的值类型等于 u。此外，组合器功能必须与累加器功能兼容；对于所有 u 和 t，以下必须成立：
combiner.apply(u, accumulator.apply(identity, t)) == accumulator.apply(u, t)

咋一看还是比较疑惑的，既然以上等式需要成立，为什么还要单独设计累加器和组合器呢？
让我们来看一组代码揭开迷惑：

    private static void testForReduce() {
        ArrayList<Integer> list = new ArrayList<>();
        list.add(2);
        list.add(3);
        list.add(4);
        list.add(5);
        Integer reduce = list.stream().reduce(
                1,
                (x, y) -> {
                    System.out.println("------accumulator--------"
                            + "\nvalue:" + (x + y + 2)
                    );
                    return x + y + 2;
                },
                (x, y) -> {
                    System.out.println("------combiner--------"
                            + "\nvalue:" + (x * y)
                    );
                    return x * y;
                }
        );
        System.out.println("result:"+reduce);
    }

你可以大胆猜测一下结果，如果结果是23，那么建议直接跳到文末看结论即可。
以下为上述代码执行流程：
Stream#reduce串行的流程

以上流程中细心的人可能发现了，为什么没有 combiner 的执行流程呢？
注：以上代码中，combiner 确实不参与 reduce 的整个过程中的计算.

那么设计 combiner 的意义何在呢？
下面我们来改造一下代码：

    private static void testForReduce() {
        ArrayList<Integer> list = new ArrayList<>();
        list.add(2);
        list.add(3);
        list.add(4);
        list.add(5);
        Integer reduce = list.parallelStream().reduce(
        		1,
                (x, y) -> {
                    System.out.println("------accumulator--------\n"
                            + "ThreadName: " + Thread.currentThread().getName()
                            + "\nvalue:" + (x + y + 2) + "\n"
                    );
                    return x + y + 2;
                },
                (x, y) -> {
                    System.out.println("------combiner--------\n"
                            + "ThreadName: " + Thread.currentThread().getName()
                            + "\nvalue:" + (x * y)
                    );
                    return x * y;
                }
        );
        System.out.println("result:"+reduce);
    }

以上代码相比于第一次执行的代码：一是将 stream() 改成了 parallelStream()，二是每次打印了当前Thread 名字，下面来看看日志来分析：

------accumulator--------
ThreadName: ForkJoinPool.commonPool-worker-2
value:8

------accumulator--------
ThreadName: ForkJoinPool.commonPool-worker-11
value:5

------accumulator--------
ThreadName: ForkJoinPool.commonPool-worker-9
value:6

------accumulator--------
ThreadName: main
value:7

------combiner--------
ThreadName: ForkJoinPool.commonPool-worker-9
value:30
------combiner--------
ThreadName: main
value:56
------combiner--------
ThreadName: main
value:1680

result:1680

结合以上日志，代码的执行流程为：
stream#reduce并行的执行流程
结合上面的图片，可以清晰的看到：combiner 的执行过程为多个线程之前的合并数据流，所以在之前的单线程处理时，combiner 并不生效。

结论：Stream#reduce 方法的 combiner 在单线程工作时并不生效，combiner 的生效时间在多线程合并时，所以为了保证串行流与并行流的结果一致性，设计了上文中提及的原则：
combiner.apply(u, accumulator.apply(identity, t)) == accumulator.apply(u, t)
如果不满足以上原则，则会造成本文中的两次结果不一致的情形。

引申思考：