Java 8 Stream API Basics Tutorial

藿香正气

Original link: https://www.baeldung.com/java-8-streams

Original article: The Java 8 Stream API Tutorial

1. Overview

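In this tutorial, we walk through practical uses of Java 8 Streams, from creation to parallel execution.
Readers should have a basic knowledge of Java 8 (lambda expressions, Optional, method references) and of the Stream API. For background, see the earlier articles New Features in Java 8 and Introduction to Java 8 Streams.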

2. Stream Creation

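There are many ways to create a stream instance from different sources. Once created, the instance does not modify its source, so multiple stream instances can be created from a single source.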

2.1 Empty Stream

The empty() method creates an empty stream:

Stream<String> streamEmpty = Stream.empty();

Returning Stream.empty() avoids returning null when a stream has no elements:

public Stream<String> streamOf(List<String> list) {
    return list == null || list.isEmpty() ? Stream.empty() : list.stream();
}

2.2 Stream of Collection

A stream can be created from any kind of Collection (Collection, List, Set):

Collection<String> collection = Arrays.asList("a", "b", "c");
Stream<String> streamOfCollection = collection.stream();

2.3 Stream of Array

An array can also be the source of a stream:

Stream<String> streamOfArray = Stream.of("a", "b", "c");

We can also create a stream from an existing array or from part of an array:

String[] arr = new String[]{"a", "b", "c"};
Stream<String> streamOfArrayFull = Arrays.stream(arr);
Stream<String> streamOfArrayPart = Arrays.stream(arr, 1, 3);
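The two index arguments of Arrays.stream() are easy to misread, so here is a minimal sketch of what the partial stream contains. The class and method names (ArrayRangeSketch, slice) are hypothetical and exist only for illustration; the start index is inclusive and the end index is exclusive.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ArrayRangeSketch {
    // Collects the elements of arr in the index range [from, to) into a list.
    static List<String> slice(String[] arr, int from, int to) {
        return Arrays.stream(arr, from, to).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String[] arr = {"a", "b", "c"};
        System.out.println(slice(arr, 1, 3)); // [b, c]
        System.out.println(slice(arr, 0, 3)); // [a, b, c]
    }
}
```

So Arrays.stream(arr, 1, 3) in the snippet above yields "b" and "c", not "a".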

2.4 Stream.builder()

When using the builder, specify the desired type on the right-hand side of the statement; otherwise the build() method creates a Stream<Object>:

Stream<String> streamBuilder =
  Stream.<String>builder().add("a").add("b").add("c").build();

2.5 Stream.generate()

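The generate() method accepts a Supplier<T> that produces the elements. The resulting stream is infinite, so the developer should specify the desired size; otherwise generate() keeps running until it reaches the memory limit: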

// Creates a Stream of 10 String elements, each with the value "element"
Stream<String> streamGenerated =
  Stream.generate(() -> "element").limit(10);

2.6 Stream.iterate()

Another way to create an infinite stream is the iterate() method:

Stream<Integer> streamIterated = Stream.iterate(40, n -> n + 2).limit(20);

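The first element of the resulting stream is the first argument of iterate() (40). Each following element is produced by applying the given function to the previous element. In this example, the second element is 42.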
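To make the seed-then-function behavior concrete, the following sketch collects the first few elements of the same iterate() stream. The names IterateSketch and evenNumbersFrom40 are hypothetical helpers, not part of the original article.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class IterateSketch {
    // Starts at the seed 40 and repeatedly applies n -> n + 2;
    // limit() turns the infinite stream into a finite one.
    static List<Integer> evenNumbersFrom40(int count) {
        return Stream.iterate(40, n -> n + 2)
                .limit(count)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(evenNumbersFrom40(5)); // [40, 42, 44, 46, 48]
    }
}
```

Without limit(), a terminal operation such as collect() would never finish on this stream.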

2.7 Stream of Primitives

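Java 8 makes it possible to create streams of three primitive types: int, long, and double. Because Stream<T> is a generic interface and generics cannot take primitives as type parameters, Java 8 provides three special interfaces: IntStream, LongStream, and DoubleStream.
Using these interfaces avoids unnecessary auto-boxing, which improves efficiency.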

IntStream intStream = IntStream.range(1, 3);
LongStream longStream = LongStream.rangeClosed(1, 3);

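The range(int startInclusive, int endExclusive) method creates an ordered stream from the first argument up to the second, incrementing each subsequent element by 1. The result does not include the second argument; it is only the upper bound of the sequence.
The rangeClosed(int startInclusive, int endInclusive) method differs in only one respect: the second argument is included. Both methods are available on IntStream and LongStream; DoubleStream has no range methods.
Since Java 8, the Random class provides several methods for generating streams of primitives. For example, the following code creates a DoubleStream with three elements: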

Random random = new Random();
DoubleStream doubleStream = random.doubles(3);
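The difference between the half-open and closed ranges is easiest to see by materializing both. This is a small sketch with hypothetical names (PrimitiveRangeSketch and its methods); it also shows boxed(), which converts a primitive stream back to a Stream<Integer> when an object stream is needed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.LongStream;

class PrimitiveRangeSketch {
    // range() excludes the upper bound.
    static int[] halfOpen() {
        return IntStream.range(1, 3).toArray();
    }

    // rangeClosed() includes the upper bound.
    static long[] closed() {
        return LongStream.rangeClosed(1, 3).toArray();
    }

    // boxed() wraps each int in an Integer so the usual collectors apply.
    static List<Integer> boxedClosed() {
        return IntStream.rangeClosed(1, 3).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(halfOpen())); // [1, 2]
        System.out.println(Arrays.toString(closed()));   // [1, 2, 3]
        System.out.println(boxedClosed());               // [1, 2, 3]
    }
}
```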

2.8 Stream of String

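With the chars() method of the String class, we can also use a String as the source of a stream. Since the JDK has no CharStream interface, an IntStream represents the stream of chars instead: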

IntStream streamOfChars = "abc".chars();

We can also split a String into substrings according to a regular expression:

Stream<String> streamOfString =
  Pattern.compile(", ").splitAsStream("a, b, c");
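Because chars() yields int code units rather than char values, a cast is usually needed before the elements can be treated as characters. The sketch below shows one way to do that, next to the regex split; CharStreamSketch and its method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class CharStreamSketch {
    // Each int from chars() is cast back to char and boxed to Character.
    static List<Character> toCharacters(String text) {
        return text.chars()
                .mapToObj(c -> (char) c)
                .collect(Collectors.toList());
    }

    // Splits on the literal separator ", " and collects the pieces.
    static List<String> splitOnCommaSpace(String text) {
        return Pattern.compile(", ")
                .splitAsStream(text)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(toCharacters("abc"));          // [a, b, c]
        System.out.println(splitOnCommaSpace("a, b, c")); // [a, b, c]
    }
}
```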

2.9 Stream of File

The lines() method of the Java NIO class Files generates a Stream<String> from a text file. Every line of the text becomes a separate element of the stream:

Path path = Paths.get("C:\\file.txt");
Stream<String> streamOfStrings = Files.lines(path);
Stream<String> streamWithCharset = 
  Files.lines(path, Charset.forName("UTF-8"));

A Charset can be specified explicitly as an argument of the lines() method.
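A stream returned by Files.lines() holds an open file, so it is commonly wrapped in try-with-resources. The following is a minimal sketch under that assumption; FileLinesSketch and countNonBlankLines are hypothetical names, and the demo writes a temporary file instead of relying on C:\file.txt existing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

class FileLinesSketch {
    // try-with-resources closes the stream, and with it the underlying file.
    static long countNonBlankLines(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.trim().isEmpty()).count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("stream-demo", ".txt");
        try {
            Files.write(tmp, Arrays.asList("first", "", "third"), StandardCharsets.UTF_8);
            System.out.println(countNonBlankLines(tmp)); // 2
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```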

3. Referencing a Stream

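We can instantiate a stream and keep a reference to it as long as only intermediate operations are called. Executing a terminal operation makes the stream inaccessible.
To demonstrate this, we set best practice aside for a moment and use code that is unnecessarily verbose but technically valid: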

Stream<String> stream = 
  Stream.of("a", "b", "c").filter(element -> element.contains("b"));
Optional<String> anyElement = stream.findAny();

However, trying to reuse the same reference after calling the terminal operation findAny() triggers an IllegalStateException:

Optional<String> firstElement = stream.findFirst();

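IllegalStateException is a RuntimeException, so the compiler will not signal a problem. It is therefore important to remember that Java 8 streams can't be reused.
This behavior is logical. Streams were designed to apply a finite sequence of operations to a source of elements in a functional style, not to store elements.
To make the previous code work, we can change it as follows: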

List<String> elements =
  Stream.of("a", "b", "c").filter(element -> element.contains("b"))
    .collect(Collectors.toList());
Optional<String> anyElement = elements.stream().findAny();
Optional<String> firstElement = elements.stream().findFirst();
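The reuse failure can be observed directly, and a Supplier offers another workaround besides collecting to a List: each get() call builds a fresh pipeline. This is only a sketch of that alternative, not something the original article prescribes; StreamReuseSketch and its method names are hypothetical.

```java
import java.util.function.Supplier;
import java.util.stream.Stream;

class StreamReuseSketch {
    // Returns true if the second terminal operation on the same
    // stream instance throws IllegalStateException.
    static boolean secondTerminalCallFails() {
        Stream<String> stream =
                Stream.of("a", "b", "c").filter(element -> element.contains("b"));
        stream.findAny(); // first terminal operation consumes the stream
        try {
            stream.findFirst(); // second terminal operation on the same instance
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    // Every call to get() creates a new, unconsumed stream.
    static Supplier<Stream<String>> filteredSupplier() {
        return () -> Stream.of("a", "b", "c").filter(element -> element.contains("b"));
    }

    public static void main(String[] args) {
        System.out.println(secondTerminalCallFails());                    // true
        System.out.println(filteredSupplier().get().findAny().get());    // b
        System.out.println(filteredSupplier().get().findFirst().get());  // b
    }
}
```

The Supplier variant re-runs the pipeline on every call, so collecting to a List remains the better choice when the pipeline is expensive.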

4. Stream Pipeline

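To perform a sequence of operations over the elements of a data source and aggregate their results, we need three parts: the source, one or more intermediate operations, and a terminal operation.
Intermediate operations return a new, modified stream. For example, skip() creates a new stream from an existing one without its first few elements: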

Stream<String> onceModifiedStream =
  Stream.of("abcd", "bbcd", "cbcd").skip(1);

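If we need more than one modification, we can chain intermediate operations. Here we also map each remaining element to a substring of its first three characters: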

Stream<String> twiceModifiedStream =
  Stream.of("abcd", "bbcd", "cbcd").skip(1).map(element -> element.substring(0, 3));

The map() method takes a lambda expression as its parameter. To learn more about lambdas, see Lambda Expressions and Functional Interfaces: Tips and Best Practices.

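A stream by itself is worthless; what the user cares about is the result of the terminal operation. We can only use one terminal operation per stream.
The most convenient way to use streams is a stream pipeline, which is a chain of the stream source, intermediate operations, and a terminal operation: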

List<String> list = Arrays.asList("abc1", "abc2", "abc3");
long size = list.stream().skip(1)
  .map(element -> element.substring(0, 3)).sorted().count();

5. Lazy Invocation

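Intermediate operations are lazy. This means they are invoked only when the execution of a terminal operation requires them.
For example, suppose a wasCalled() method increments a counter every time it is called: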

private long counter;
 
private void wasCalled() {
    counter++;
}

Now call wasCalled() from the intermediate operation filter():

List<String> list = Arrays.asList("abc1", "abc2", "abc3");
counter = 0;
Stream<String> stream = list.stream().filter(element -> {
    wasCalled();
    return element.contains("2");
});

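Since the source has three elements, we might assume that filter() is called three times and that counter ends up at 3. However, running this code does not change counter at all; it stays at 0, so filter() was not called even once. The reason is that the pipeline has no terminal operation.
Let's rewrite this code by adding a map() operation and a terminal operation, findFirst(), along with logging to trace the order of method calls: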

Optional<String> stream = list.stream().filter(element -> {
    log.info("filter() was called");
    return element.contains("2");
}).map(element -> {
    log.info("map() was called");
    return element.toUpperCase();
}).findFirst();

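The resulting log shows that filter() was called twice and map() once. This is because the pipeline executes vertically. In this example, the first element did not satisfy the filter's predicate; filter() was then invoked for the second element, which passed. Without invoking filter() for the third element, execution went down the pipeline to map().
The findFirst() operation is satisfied by just one element. So in this particular example, lazy invocation saved two method calls: one to filter() and one to map().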
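The call counts described above can be checked without a logger by counting invocations. The sketch below uses AtomicInteger counters in place of the article's log statements and counter field; LazyInvocationSketch and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class LazyInvocationSketch {
    // Builds a pipeline but never runs a terminal operation;
    // returns how often the filter predicate was invoked.
    static int filterCallsWithoutTerminal(List<String> list) {
        AtomicInteger filterCalls = new AtomicInteger();
        Stream<String> pending = list.stream().filter(element -> {
            filterCalls.incrementAndGet();
            return element.contains("2");
        });
        return filterCalls.get(); // pending was never consumed
    }

    // Runs filter -> map -> findFirst and returns {filterCalls, mapCalls}.
    static int[] callsWithFindFirst(List<String> list) {
        AtomicInteger filterCalls = new AtomicInteger();
        AtomicInteger mapCalls = new AtomicInteger();
        list.stream().filter(element -> {
            filterCalls.incrementAndGet();
            return element.contains("2");
        }).map(element -> {
            mapCalls.incrementAndGet();
            return element.toUpperCase();
        }).findFirst();
        return new int[] {filterCalls.get(), mapCalls.get()};
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("abc1", "abc2", "abc3");
        System.out.println(filterCallsWithoutTerminal(list));          // 0
        System.out.println(Arrays.toString(callsWithFindFirst(list))); // [2, 1]
    }
}
```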

6. Order of Execution

From a performance point of view, the right order is one of the most important aspects of chaining operations in a stream pipeline:

long size = list.stream().map(element -> {
    wasCalled();
    return element.substring(0, 3);
}).skip(2).count();

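On Java 8, executing this code increases counter by 3. This means map() was called three times, yet size is 1. So the resulting stream has just one element, and we ran map() two extra times for no reason. (Since Java 9, count() may skip the pipeline entirely when the size can be computed directly from the source, so the counter may differ on newer JDKs.)
If we swap the order of skip() and map(), counter increases by only 1, so map() is called just once: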

long size = list.stream().skip(2).map(element -> {
    wasCalled();
    return element.substring(0, 3);
}).count();

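This brings us to the following rule: intermediate operations that reduce the size of the stream should be placed before operations that apply to each element. So keep methods such as skip(), filter(), and distinct() at the top of the stream pipeline.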
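The effect of operation order can be measured by counting map() invocations for both orderings. This sketch deliberately ends the pipeline with collect() rather than count(), because newer JDKs are allowed to compute count() without running the pipeline; OrderOfExecutionSketch and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class OrderOfExecutionSketch {
    // map() runs for every element, then skip(2) discards two results.
    static int mapCallsWhenMappingFirst(List<String> list) {
        AtomicInteger mapCalls = new AtomicInteger();
        list.stream().map(element -> {
            mapCalls.incrementAndGet();
            return element.substring(0, 3);
        }).skip(2).collect(Collectors.toList());
        return mapCalls.get();
    }

    // skip(2) discards two elements first, so map() only sees the rest.
    static int mapCallsWhenSkippingFirst(List<String> list) {
        AtomicInteger mapCalls = new AtomicInteger();
        list.stream().skip(2).map(element -> {
            mapCalls.incrementAndGet();
            return element.substring(0, 3);
        }).collect(Collectors.toList());
        return mapCalls.get();
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("abc1", "abc2", "abc3");
        System.out.println(mapCallsWhenMappingFirst(list));  // 3
        System.out.println(mapCallsWhenSkippingFirst(list)); // 1
    }
}
```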

7. Stream Reduction

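The API has many terminal operations that aggregate a stream to a type or to a primitive, such as count(), max(), min(), and sum(). These operations work according to their predefined implementation. What if a developer needs to customize a stream's reduction mechanism? Two methods allow this: reduce() and collect().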

7.1 The reduce() Method

There are three variations of this method, which differ in their signatures and return types. They can have the following parameters:

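identity - the initial value for the accumulator, or the default value if the stream is empty
accumulator - a function that specifies how elements are aggregated. Because the accumulator creates a new value at every step of the reduction, the number of new values equals the stream's size and only the last value is useful, which is not very good for performance.
combiner - a function that aggregates the results of the accumulator. It is called only in parallel mode, to combine the accumulator results from different threads.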

Now let's look at these three methods in action:

// reduced = 1+2+3 = 6
OptionalInt reduced =
  IntStream.range(1, 4).reduce((a, b) -> a + b);

//reducedTwoParams = 10 + 1+2+3 = 16
int reducedTwoParams =
  IntStream.range(1, 4).reduce(10, (a, b) -> a + b);

int reducedParams = Stream.of(1, 2, 3)
  .reduce(10, (a, b) -> a + b, (a, b) -> {
     log.info("combiner was called");
     return a + b;
  });

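The result is the same as in the previous example (16), and nothing is logged, which means the combiner was not called. To make the combiner work, the stream must be parallel: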

int reducedParallel = Arrays.asList(1, 2, 3).parallelStream()
    .reduce(10, (a, b) -> a + b, (a, b) -> {
       log.info("combiner was called");
       return a + b;
    });

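The result here is different (36), and the combiner was called twice.
The reduction works as follows: the accumulator ran three times, adding each element of the stream to the identity, and these actions ran in parallel. As a result, we have (10 + 1 = 11; 10 + 2 = 12; 10 + 3 = 13). The combiner then merges these three results, which takes two iterations (12 + 13 = 25; 25 + 11 = 36). The sequential and parallel results differ because 10 is not a true identity for addition.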
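When the sequential and parallel results need to agree, the usual approach is to pass a real identity value (0 for addition) and add any offset outside the reduction. The following is a sketch of that idea with hypothetical names (ReduceIdentitySketch and its methods), not a replacement for the article's examples.

```java
import java.util.Arrays;
import java.util.List;

class ReduceIdentitySketch {
    // 0 is a true identity for addition, so how the work is split
    // between threads does not change the result.
    static int sumSequential(List<Integer> numbers) {
        return numbers.stream().reduce(0, Integer::sum, Integer::sum);
    }

    static int sumParallel(List<Integer> numbers) {
        return numbers.parallelStream().reduce(0, Integer::sum, Integer::sum);
    }

    // The offset is applied once, after the reduction has finished.
    static int sumParallelWithOffset(List<Integer> numbers, int offset) {
        return offset + sumParallel(numbers);
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3);
        System.out.println(sumSequential(numbers));             // 6
        System.out.println(sumParallel(numbers));               // 6
        System.out.println(sumParallelWithOffset(numbers, 10)); // 16
    }
}
```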

7.2 The collect() Method

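A stream can also be reduced by another terminal operation, the collect() method. It accepts an argument of type Collector, which specifies the mechanism of reduction. Collectors for the most common operations are predefined and can be accessed through the Collectors class.
The following List serves as the source for all streams in this section: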

List<Product> productList = Arrays.asList(new Product(23, "potatoes"),
  new Product(14, "orange"), new Product(13, "lemon"),
  new Product(23, "bread"), new Product(13, "sugar"));

Converting a stream to a Collection (Collection, List, or Set):

List<String> collectorCollection = 
  productList.stream().map(Product::getName).collect(Collectors.toList());

Reducing to a String:

String listToString = productList.stream().map(Product::getName)
  .collect(Collectors.joining(", ", "[", "]"));

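The joining() method can take from one to three parameters (delimiter, prefix, suffix). The most convenient thing about joining() is that the developer does not need to check whether the stream has reached its end in order to apply the suffix instead of a delimiter; the Collector takes care of that.
Computing the average value of all numeric elements of the stream: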

double averagePrice = productList.stream()
  .collect(Collectors.averagingInt(Product::getPrice));

Computing the sum of all numeric elements of the stream:

int summingPrice = productList.stream()
  .collect(Collectors.summingInt(Product::getPrice));

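The methods averagingXX(), summingXX(), and summarizingXX() work with primitives (int, long, double) as well as with their wrapper classes (Integer, Long, Double). Another powerful feature of these methods is that they perform the mapping themselves, so the developer does not need an additional map() operation before collect().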

Collecting statistical information about the stream's elements:

IntSummaryStatistics statistics = productList.stream()
  .collect(Collectors.summarizingInt(Product::getPrice));

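Using the resulting instance of type IntSummaryStatistics, the developer can create a statistical report by calling toString(). The result is a String of the familiar form "IntSummaryStatistics{count=5, sum=86, min=13, average=17.200000, max=23}". The decimal separator of the average follows the default locale, so it may appear as a comma.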

It is also easy to extract individual values from this object with the getCount(), getSum(), getMin(), getAverage(), and getMax() methods.
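The article never shows the Product class, so the sketch below assumes a minimal version with an int price and a String name, which is what the constructor calls and the getPrice()/getName() references imply. With that assumption, the individual statistics can be read as follows; SummaryStatisticsSketch and its members are hypothetical names.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

class SummaryStatisticsSketch {
    // Assumed stand-in for the article's Product class.
    static final class Product {
        private final int price;
        private final String name;

        Product(int price, String name) {
            this.price = price;
            this.name = name;
        }

        int getPrice() { return price; }
        String getName() { return name; }
    }

    // Same five products as the article's productList.
    static List<Product> sampleProducts() {
        return Arrays.asList(new Product(23, "potatoes"),
                new Product(14, "orange"), new Product(13, "lemon"),
                new Product(23, "bread"), new Product(13, "sugar"));
    }

    static IntSummaryStatistics priceStatistics(List<Product> products) {
        return products.stream()
                .collect(Collectors.summarizingInt(Product::getPrice));
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = priceStatistics(sampleProducts());
        System.out.println(stats.getCount());   // 5
        System.out.println(stats.getSum());     // 86
        System.out.println(stats.getMin());     // 13
        System.out.println(stats.getMax());     // 23
        System.out.println(stats.getAverage()); // 17.2
    }
}
```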

Grouping the stream's elements according to a specified function:

Map<Integer, List<Product>> collectorMapOfLists = productList.stream()
  .collect(Collectors.groupingBy(Product::getPrice));

Dividing the stream's elements into two groups according to a predicate:

Map<Boolean, List<Product>> mapPartioned = productList.stream()
  .collect(Collectors.partitioningBy(element -> element.getPrice() > 15));
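Both groupingBy() and partitioningBy() also accept a second, downstream collector that decides what is stored per group. The sketch below shows this with Collectors.mapping() and Collectors.counting(); the Product class is again an assumed minimal stand-in, and GroupingSketch and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingSketch {
    // Assumed stand-in for the article's Product class.
    static final class Product {
        private final int price;
        private final String name;

        Product(int price, String name) {
            this.price = price;
            this.name = name;
        }

        int getPrice() { return price; }
        String getName() { return name; }
    }

    static List<Product> sampleProducts() {
        return Arrays.asList(new Product(23, "potatoes"),
                new Product(14, "orange"), new Product(13, "lemon"),
                new Product(23, "bread"), new Product(13, "sugar"));
    }

    // Groups by price but keeps only the product names in each group.
    static Map<Integer, List<String>> namesByPrice(List<Product> products) {
        return products.stream().collect(Collectors.groupingBy(
                Product::getPrice,
                Collectors.mapping(Product::getName, Collectors.toList())));
    }

    // Counts the products on each side of the price threshold.
    static Map<Boolean, Long> countByPriceAbove(List<Product> products, int threshold) {
        return products.stream().collect(Collectors.partitioningBy(
                product -> product.getPrice() > threshold,
                Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Product> products = sampleProducts();
        System.out.println(namesByPrice(products).get(23));            // [potatoes, bread]
        System.out.println(countByPriceAbove(products, 15).get(true)); // 2
    }
}
```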

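Pushing the collector to perform an additional transformation. Here the collector converts the stream to a Set and then wraps it in an unmodifiable Set: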

Set<Product> unmodifiableSet = productList.stream()
  .collect(Collectors.collectingAndThen(Collectors.toSet(),
  Collections::unmodifiableSet));

Custom collector: if a custom collector is needed, the simplest way to create one is the of() method of the Collector type:

Collector<Product, ?, LinkedList<Product>> toLinkedList =
  Collector.of(LinkedList::new, LinkedList::add, 
    (first, second) -> { 
       first.addAll(second); 
       return first; 
    });

LinkedList<Product> linkedListOfPersons =
  productList.stream().collect(toLinkedList);
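For the specific case of collecting into a particular Collection implementation, the predefined Collectors.toCollection() gives the same result as the hand-written collector above with less code. A brief sketch, using String elements and the hypothetical names ToCollectionSketch and toLinkedList to stay self-contained:

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.stream.Collectors;

class ToCollectionSketch {
    // toCollection() takes a supplier for the target collection,
    // here the LinkedList constructor.
    static LinkedList<String> toLinkedList(List<String> items) {
        return items.stream().collect(Collectors.toCollection(LinkedList::new));
    }

    public static void main(String[] args) {
        LinkedList<String> result = toLinkedList(Arrays.asList("a", "b", "c"));
        System.out.println(result);            // [a, b, c]
        System.out.println(result.getFirst()); // a
    }
}
```

Collector.of() remains the tool to reach for when the accumulation logic itself is custom rather than just the container type.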

8. Parallel Streams

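Before Java 8, parallelization was complex. ExecutorService and ForkJoin simplified the developer's life a bit, but it was still necessary to remember how to create a specific executor, how to run it, and so on. Java 8 introduced a way of achieving parallelism in a functional style.
The API lets us create parallel streams, which perform operations in parallel mode. When the source of a stream is a Collection, this can be achieved with the parallelStream() method: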

Stream<Product> streamOfCollection = productList.parallelStream();
boolean isParallel = streamOfCollection.isParallel();
boolean bigPrice = streamOfCollection
  .map(product -> product.getPrice() * 12)
  .anyMatch(price -> price > 200);

If the source of a stream is something other than a Collection, such as an array or a primitive range, use the parallel() method:

IntStream intStreamParallel = IntStream.range(1, 150).parallel();
boolean isParallel = intStreamParallel.isParallel();

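Under the hood, the Stream API uses the ForkJoin framework to execute operations in parallel. By default, the common thread pool is used, and there is no built-in way (at least for now) to assign a custom thread pool to it. This can be overcome by using a custom set of parallel collectors; see Java Stream API Parallel Collectors - overcoming limitations of standard Parallel Streams.
When using streams in parallel mode, avoid blocking operations. Parallel mode also works best when the tasks take a similar amount of time to execute; if one task lasts much longer than the others, it can slow down the whole workflow.
A stream in parallel mode can be converted back to sequential mode with the sequential() method: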

IntStream intStreamSequential = intStreamParallel.sequential();
boolean isParallel = intStreamSequential.isParallel();
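A common concern with parallel mode is whether results come back in a different order. For an ordered source, collect(Collectors.toList()) keeps the encounter order even when the work is split across threads, as this small sketch illustrates; ParallelOrderSketch and squares are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelOrderSketch {
    // Squares 1..upTo, optionally in parallel mode. The collected list
    // follows the encounter order of the range in both modes.
    static List<Integer> squares(int upTo, boolean parallel) {
        IntStream source = IntStream.rangeClosed(1, upTo);
        if (parallel) {
            source = source.parallel();
        }
        return source.map(n -> n * n).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(squares(3, true)); // [1, 4, 9]
        System.out.println(squares(1000, true).equals(squares(1000, false))); // true
    }
}
```

Operations with side effects, such as forEach(), give no such ordering guarantee in parallel mode.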

9. Conclusion

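The Stream API is a powerful but simple-to-understand set of tools for processing a sequence of elements. Used properly, it lets us reduce a large amount of boilerplate code, create more readable programs, and improve an application's efficiency.
In the code samples in this article, we left some streams unconsumed (we applied neither the close() method nor a terminal operation). In a real application, don't leave an instantiated stream unconsumed, as that can lead to memory leaks.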
