Java 8 Study Notes: Collecting Data with Streams

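The content comes from the book "Java 8 in Action" (《java8实战》). This article is non-commercial and is meant for my own reference, as a summary and backup, and for open sharing. If it infringes any rights, please let me know and I will remove it immediately.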

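We will use the following entity class throughout (@Data is the Lombok annotation that generates the getters and toString).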

@Data
public class Dish {
    private final String name;
    private final boolean vegetarian;   // whether the dish is vegetarian
    private final int calories;  // calories
    private final Type type;  // dish type
    public enum Type{
        MEAT,FISH,OTHER;
    }
    public Dish(String name, boolean vegetarian, int calories, Type type) {
        this.name = name;
        this.vegetarian = vegetarian;
        this.calories = calories;
        this.type = type;
    }
}

The class above models one dish on a menu. Next we put several dishes into a list.

List<Dish> menu = Arrays.asList(
        new Dish("apple",true,50, Dish.Type.OTHER),
        new Dish("chicken",false,350, Dish.Type.MEAT),
        new Dish("rich",true,150, Dish.Type.OTHER),
        new Dish("pizza",true,350, Dish.Type.OTHER),
        new Dish("fish",false,250, Dish.Type.FISH),
        new Dish("orange",true,70, Dish.Type.OTHER),
        new Dish("banana",true,60, Dish.Type.OTHER));

With that in place we can start classifying the dishes on the menu. First the traditional way, grouping by type:

@Test
public void test() throws Exception {
    Map<Dish.Type,List<Dish>> groupByType = new HashMap<>();
    for (Dish dish : menu) {
        Dish.Type type = dish.getType();
        List<Dish> dishes = groupByType.get(type);
        if (dishes == null){  // null means this is the first time we see this type
            dishes = new ArrayList<>();
            groupByType.put(type,dishes);
        }
        dishes.add(dish);
    }
    System.out.println(groupByType);
}

Now the same thing with the collect method of Stream:

@Test
public void test() throws Exception {
    // The lambda parameter is named dish so that it does not shadow the menu list
    Map<Dish.Type, List<Dish>> collect = menu.stream().collect(Collectors.groupingBy(dish -> dish.getType()));
    System.out.println("collect = " + collect);
}

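This shows how much convenience collect brings. The rest of this article covers how to use collect in detail and how to write a custom collector to pass to it.

Usage

Counting the dishes on the menu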

Long collect = menu.stream().collect(Collectors.counting());
//A better way
long count = menu.stream().count();

Finding the highest- and lowest-calorie dishes

//Finding a maximum or minimum needs a comparator; this one compares the calories property of Dish
//Maximum
Comparator<Dish> comparator = Comparator.comparingInt(Dish::getCalories);
Optional<Dish> collect = menu.stream().collect(Collectors.maxBy(comparator));
//Returning an Optional makes sense: the stream may be empty, in which case there is no maximum to return
//Minimum
Optional<Dish> collect1 = menu.stream().collect(Collectors.minBy(comparator));
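
To make the Optional point concrete, here is a minimal sketch of what maxBy returns for a non-empty and for an empty input. The class name MaxBySketch and the helper maxOf are illustrative only, and plain Integer values stand in for the calories of each Dish.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class MaxBySketch {
    // Hypothetical helper: the largest calorie value in the list, if there is one.
    static Optional<Integer> maxOf(List<Integer> calories) {
        return calories.stream()
                .collect(Collectors.maxBy(Comparator.<Integer>naturalOrder()));
    }

    public static void main(String[] args) {
        System.out.println(maxOf(Arrays.asList(50, 350, 150)));          // Optional[350]
        System.out.println(maxOf(Collections.<Integer>emptyList()));     // Optional.empty
    }
}
```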

Summation: summingInt() takes a function that maps each object to the int to be summed and returns a collector that adds those values up.

//Total calories
Integer collect = menu.stream().collect(Collectors.summingInt(Dish::getCalories));
System.out.println(collect);
//The version below is simpler and avoids boxing the result into an Integer
int sum = menu.stream().mapToInt(Dish::getCalories).sum();
System.out.println("sum = " + sum);
//Besides summingInt there are also summingLong and summingDouble

Average calories

Double collect = menu.stream().collect(Collectors.averagingInt(Dish::getCalories));

One more method returns all of the information above in a single result: summarizingInt()

IntSummaryStatistics collect = menu.stream().collect(Collectors.summarizingInt(Dish::getCalories));
System.out.println("collect = " + collect);
//The output includes the count, sum, minimum, average and maximum
//Output: collect = IntSummaryStatistics{count=7, sum=1280, min=50, average=182.857143, max=350}
//Each value is available through its getter, for example getMin()
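
As a small illustration of those getters, the sketch below summarizes the seven calorie values from the menu and reads each statistic separately. The class name SummarySketch and the CALORIES constant are made up for this example.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

class SummarySketch {
    // Calorie values copied from the menu defined at the start of the article.
    static final List<Integer> CALORIES = Arrays.asList(50, 350, 150, 350, 250, 70, 60);

    static IntSummaryStatistics summarize(List<Integer> values) {
        return values.stream().collect(Collectors.summarizingInt(Integer::intValue));
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = summarize(CALORIES);
        System.out.println(stats.getCount());   // 7
        System.out.println(stats.getSum());     // 1280
        System.out.println(stats.getMin());     // 50
        System.out.println(stats.getMax());     // 350
        System.out.println(stats.getAverage()); // roughly 182.86
    }
}
```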

Joining strings

//Concatenate all dish names
String collect = menu.stream().map(Dish::getName).collect(Collectors.joining());
//collect = applechickenrichpizzafishorangebanana
System.out.println("collect = " + collect);
//The result above is hard to read; joining has overloads that take a delimiter and, optionally, a prefix and suffix
String collect1 = menu.stream().map(Dish::getName).collect(Collectors.joining(","));
//collect1 = apple,chicken,rich,pizza,fish,orange,banana
System.out.println("collect1 = " + collect1);
String collect2 = menu.stream().map(Dish::getName).collect(Collectors.joining(",","[","]"));
//collect2 = [apple,chicken,rich,pizza,fish,orange,banana]
System.out.println("collect2 = " + collect2);

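reducing

//The earlier sum can also be written with reducing
//0 is the initial value, Dish::getCalories maps each dish to a value, and Integer::sum combines the mapped values
//Arguments: initial value, mapping function, combining function
Integer totalCalories = menu.stream().collect(Collectors.reducing(0, Dish::getCalories, Integer::sum));
//The highest-calorie dish, as above
//The one-argument reducing can be seen as a special case of the three-argument form: it uses the first item of the stream as the starting point
//and returns one of its inputs, dish1 or dish2. If the stream is empty there is no starting point, so the result is an Optional
Optional<Dish> mostCalorieDish = menu.stream().collect(Collectors.reducing((dish1, dish2) -> dish1.getCalories() > dish2.getCalories() ? dish1 : dish2));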

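How does the reduce method discussed earlier differ from collect?
- reduce combines two values to produce a new value; it is an immutable reduction
- collect is designed to mutate a container and accumulate the result in it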
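
The difference can be sketched as follows. The class and method names are illustrative; the last method is an anti-pattern included only for contrast, not something to copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ReduceVsCollectSketch {
    // Immutable reduction: each step combines two values into a new value.
    static int sumWithReduce(List<Integer> numbers) {
        return numbers.stream().reduce(0, Integer::sum);
    }

    // Mutable reduction: collect fills a container created by the collector's supplier.
    static List<Integer> copyWithCollect(List<Integer> numbers) {
        return numbers.stream().collect(Collectors.toList());
    }

    // Anti-pattern: reduce that mutates its identity value.
    // It happens to work on a sequential stream, but on a parallel stream
    // every thread would share and mutate the same ArrayList.
    static List<Integer> copyWithReduceMisuse(List<Integer> numbers) {
        return numbers.stream().reduce(
                new ArrayList<Integer>(),
                (list, n) -> { list.add(n); return list; },
                (left, right) -> { left.addAll(right); return left; });
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3);
        System.out.println(sumWithReduce(numbers));        // 6
        System.out.println(copyWithCollect(numbers));      // [1, 2, 3]
        System.out.println(copyWithReduceMisuse(numbers)); // [1, 2, 3]
    }
}
```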
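
A reducing quiz

menu.stream()
      .collect(Collectors.reducing(((dish1, dish2) -> dish1.getName() + dish2.getName())));
//Does this code compile?
//No. The one-argument reducing takes a BinaryOperator, which is declared as
public interface BinaryOperator<T> extends BiFunction<T,T,T>
//Both arguments and the result must have the same type T. The stream elements are Dish, so the lambda must return a Dish, but here it returns a String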
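
For comparison, here are three ways to concatenate the names that should compile. This is a sketch: the nested Dish class is a trimmed-down stand-in for the article's Dish, and the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ReducingNamesSketch {
    // Trimmed-down stand-in for the article's Dish: only the name is needed here.
    static final class Dish {
        private final String name;
        Dish(String name) { this.name = name; }
        String getName() { return name; }
    }

    // Option 1: map to String first, so the BinaryOperator is String, String -> String.
    static String mapThenReduce(List<Dish> menu) {
        return menu.stream().map(Dish::getName)
                .collect(Collectors.reducing((s1, s2) -> s1 + s2))
                .orElse("");
    }

    // Option 2: three-argument reducing with a mapping function from Dish to String.
    static String reduceWithMapper(List<Dish> menu) {
        return menu.stream()
                .collect(Collectors.reducing("", Dish::getName, (s1, s2) -> s1 + s2));
    }

    // Option 3: joining, usually the most readable choice for concatenating strings.
    static String join(List<Dish> menu) {
        return menu.stream().map(Dish::getName).collect(Collectors.joining());
    }

    public static void main(String[] args) {
        List<Dish> menu = Arrays.asList(new Dish("apple"), new Dish("fish"));
        System.out.println(mapThenReduce(menu));    // applefish
        System.out.println(reduceWithMapper(menu)); // applefish
        System.out.println(join(menu));             // applefish
    }
}
```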

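Grouping

The very first example already used grouping: it grouped the menu by type.

//groupingBy is given a Function that extracts the type of each Dish in the stream and groups by it. This Function is called the classification function because it sorts the stream elements into groups
//The map keys are the Type values, and each map value is the list of all dishes of that type
Map<Dish.Type, List<Dish>> collect = menu.stream().collect(Collectors.groupingBy(Dish::getType));

What if the grouping criterion is not simply the return value of a method reference? Say we want to separate dishes with at most 120 calories from those with more than 120. How do we group then?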

@Test
public void test() throws Exception {
    Map<Integer, List<Dish>> collect = menu.stream()
            .collect(Collectors.groupingBy(dish -> {
                if (dish.getCalories() <= 120) return 1;   //any value works as long as it tells the groups apart
                else return 2;
            }));
    System.out.println("collect = " + collect);
}
//Output: collect = {1=[Dish(name=apple, vegetarian=true, calories=50, type=OTHER), Dish(name=orange, vegetarian=true, calories=70, type=OTHER), Dish(name=banana, vegetarian=true, calories=60, type=OTHER)],
// 2=[Dish(name=chicken, vegetarian=false, calories=350, type=MEAT), Dish(name=rich, vegetarian=true, calories=150, type=OTHER), Dish(name=pizza, vegetarian=true, calories=350, type=OTHER), Dish(name=fish, vegetarian=false, calories=250, type=FISH)]}

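Returning 1 and 2 is not very descriptive. If more than 120 calories counts as high-calorie and anything else as low-calorie, we can define an enum and return its constants from the classification function instead.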
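
A minimal sketch of that idea follows. The enum CaloricLevel with its constants LOW and HIGH, the helper levelOf, and the trimmed-down nested Dish are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CaloricLevelSketch {
    // Hypothetical enum; the constant names are illustrative.
    enum CaloricLevel { LOW, HIGH }

    // Trimmed-down stand-in for the article's Dish class.
    static final class Dish {
        private final String name;
        private final int calories;
        Dish(String name, int calories) { this.name = name; this.calories = calories; }
        String getName() { return name; }
        int getCalories() { return calories; }
    }

    // Classification function: at most 120 calories is LOW, anything above is HIGH.
    static CaloricLevel levelOf(Dish dish) {
        return dish.getCalories() <= 120 ? CaloricLevel.LOW : CaloricLevel.HIGH;
    }

    static Map<CaloricLevel, List<Dish>> byLevel(List<Dish> menu) {
        return menu.stream().collect(Collectors.groupingBy(CaloricLevelSketch::levelOf));
    }

    public static void main(String[] args) {
        List<Dish> menu = Arrays.asList(
                new Dish("apple", 50), new Dish("chicken", 350), new Dish("orange", 70));
        Map<CaloricLevel, List<Dish>> grouped = byLevel(menu);
        System.out.println(grouped.get(CaloricLevel.LOW).size());   // 2
        System.out.println(grouped.get(CaloricLevel.HIGH).size());  // 1
    }
}
```

Moving the threshold into a named method keeps the groupingBy call short and lets the same classification be reused elsewhere.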

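Multilevel grouping

What if one level is not enough? Suppose we want to group by vegetarian versus non-vegetarian first, and then by whether the calories are <= 60. This time we return enum constants instead of 1 and 2: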

enum MyEnum{YES,NO}
@Test
public void test() throws Exception {
    Map<Boolean, Map<MyEnum, List<Dish>>> collect = menu.stream().collect(Collectors.groupingBy(Dish::isVegetarian, Collectors.groupingBy(dish -> {
        if (dish.getCalories() <= 60) return MyEnum.YES;
        else return MyEnum.NO;
    })));
    System.out.println(collect);
}
//{false={NO=[Dish(name=chicken, vegetarian=false, calories=350, type=MEAT), Dish(name=fish, vegetarian=false, calories=250, type=FISH)]},
// true={YES=[Dish(name=apple, vegetarian=true, calories=50, type=OTHER), Dish(name=banana, vegetarian=true, calories=60, type=OTHER)],
// NO=[Dish(name=rich, vegetarian=true, calories=150, type=OTHER), Dish(name=pizza, vegetarian=true, calories=350, type=OTHER), Dish(name=orange, vegetarian=true, calories=70, type=OTHER)]}}

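Collecting data in subgroups

The multilevel grouping above passes a second groupingBy collector to the outer collector. The second collector passed to the first groupingBy can be of any type; it does not have to be another groupingBy.

Counting the dishes of each type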

Map<Dish.Type, Long> collect = menu.stream().collect(Collectors.groupingBy(Dish::getType, Collectors.counting()));
//collect = {MEAT=1, FISH=1, OTHER=5}
System.out.println("collect = " + collect);

Finding the highest-calorie dish in each group

@Test
public void test() throws Exception {
    Map<Dish.Type, Optional<Dish>> collect = menu.stream()
            .collect(Collectors.groupingBy(Dish::getType, Collectors.maxBy(Comparator.comparingInt(Dish::getCalories))));
    System.out.println("collect = " + collect);
    //collect = {MEAT=Optional[Dish(name=chicken, vegetarian=false, calories=350, type=MEAT)],
    // OTHER=Optional[Dish(name=pizza, vegetarian=true, calories=350, type=OTHER)],
    // FISH=Optional[Dish(name=fish, vegetarian=false, calories=250, type=FISH)]}
}

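In the code above the map values are of type Optional. But groupingBy only adds a key to the map when it first meets an element with that key, so if the menu has no dish of some type, that type never appears in the map and maxBy never has to produce an empty Optional for it. The Optional wrapper is therefore not useful here and only gets in the way of reading and using the result, so we can remove it: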

Map<Dish.Type, Dish> collect = menu.stream().collect(Collectors.groupingBy(Dish::getType,
        Collectors.collectingAndThen(Collectors.maxBy(Comparator.comparingInt(Dish::getCalories)), Optional::get)));
System.out.println("collect = " + collect);
//collect = {OTHER=Dish(name=pizza, vegetarian=true, calories=350, type=OTHER),
// FISH=Dish(name=fish, vegetarian=false, calories=250, type=FISH),
// MEAT=Dish(name=chicken, vegetarian=false, calories=350, type=MEAT)}

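Compared with the previous version, the Optional wrapper is gone. We used Collectors.collectingAndThen, which takes two arguments: the collector to adapt and a transformation function. It first finds the highest-calorie dish with maxBy and then applies the transformation Optional::get to that result, which is why the output no longer contains Optional.

More examples of collectors used together with groupingBy

Total calories of each group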

Map<Dish.Type, Integer> collect = menu.stream().collect(Collectors.groupingBy(Dish::getType, Collectors.summingInt(Dish::getCalories)));
//{OTHER=680, MEAT=350, FISH=250}

Combining with mapping

public void test() throws Exception {
    Map<Dish.Type, Set<Integer>> collect = menu.stream().collect(
            Collectors.groupingBy(Dish::getType, Collectors.mapping(dish -> {
                if (dish.getCalories() <= 120) return 1;
                else return 2;
            }, Collectors.toSet())));
    System.out.println("collect = " + collect);
}
//collect = {MEAT=[2], FISH=[2], OTHER=[1, 2]}

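Partitioning

Partitioning is a special case of grouping: the classification function is a Predicate, so there can only be two partitions, true and false.

//Separate vegetarian from non-vegetarian dishes
Map<Boolean, List<Dish>> collect = menu.stream().collect(Collectors.partitioningBy(Dish::isVegetarian));
//collect.get(true) then returns the vegetarian dishes
//filter could do this too, but a single filter keeps only one side, either the vegetarian or the non-vegetarian dishes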

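Advantages of partitioning
- As the code above shows, filter keeps only one side of the result, either the vegetarian dishes or the non-vegetarian ones, while partitioning keeps both
- An overload of partitioningBy accepts a groupingBy collector to group within each partition
```
Map<Boolean, Map<Dish.Type, List<Dish>>> collect = menu.stream().collect(Collectors.partitioningBy(Dish::isVegetarian, Collectors.groupingBy(Dish::getType)));
//After partitioning by vegetarian or not, each partition is further divided by type
```
- The second argument can also be another partitioningBy, giving a second level of partitioning
- Remember that partitioningBy only accepts an expression that returns a boolean; anything else does not compile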
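
The two-level partitioning mentioned in the list can be sketched like this. The 200-calorie threshold, the class name NestedPartitionSketch, and the trimmed-down nested Dish are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NestedPartitionSketch {
    // Trimmed-down stand-in for the article's Dish class.
    static final class Dish {
        private final String name;
        private final boolean vegetarian;
        private final int calories;
        Dish(String name, boolean vegetarian, int calories) {
            this.name = name;
            this.vegetarian = vegetarian;
            this.calories = calories;
        }
        boolean isVegetarian() { return vegetarian; }
        int getCalories() { return calories; }
    }

    // Outer key: vegetarian or not. Inner key: more than 200 calories or not.
    static Map<Boolean, Map<Boolean, List<Dish>>> partitionTwice(List<Dish> menu) {
        return menu.stream().collect(Collectors.partitioningBy(
                Dish::isVegetarian,
                Collectors.partitioningBy((Dish d) -> d.getCalories() > 200)));
    }

    public static void main(String[] args) {
        List<Dish> menu = Arrays.asList(
                new Dish("apple", true, 50), new Dish("pizza", true, 350),
                new Dish("chicken", false, 350), new Dish("fish", false, 250));
        Map<Boolean, Map<Boolean, List<Dish>>> result = partitionTwice(menu);
        System.out.println(result.get(true).get(true).size());    // 1 (pizza)
        System.out.println(result.get(true).get(false).size());   // 1 (apple)
        System.out.println(result.get(false).get(true).size());   // 2 (chicken, fish)
        System.out.println(result.get(false).get(false).size());  // 0
    }
}
```

Unlike groupingBy, the map returned by partitioningBy always contains both the true and the false key, so the last lookup yields an empty list rather than null.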

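The Collector interface

This interface is the template for implementing concrete reduction operations. The collectors seen so far, such as toList and groupingBy, are implemented through it, and you can use it to write your own reductions.

Interface definition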

public interface Collector<T, A, R> {
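  /**
    * Creates a new container
    * Returns a Supplier that creates an empty accumulator instance for use during the collection process
    */
   Supplier<A> supplier();

   /**
    * Adds an element to the result container
    * Returns the function that performs the reduction step: it takes the accumulator and the current element, and folds the element into the accumulator
    */
   BiConsumer<A, T> accumulator();

   /**
    * Applies the final transformation to the result container
    * Returns a function that is called at the end of the accumulation to convert the accumulator object into the final result of the whole collection operation
    * If the accumulator already has the expected result type, this can simply return the identity function
    */
   Function<A, R> finisher();

   /**
    * Merges two result containers; used for parallel processing
    * Returns a function that defines how the accumulators produced by reducing different subparts of the stream in parallel are merged
    */
   BinaryOperator<A> combiner();

   /**
    * Similar to the characteristics method of Spliterator
    * Returns an immutable set of Characteristics that defines the behavior of the collector,
    * in particular hints about whether the stream can be reduced in parallel and which optimizations are valid
    * There are three enum constants
    *      CONCURRENT: the accumulator function can be called from multiple threads at the same time on a shared result container. If the collector is not also marked UNORDERED, this concurrent reduction only happens when the data source is unordered
    *      UNORDERED: the result of the reduction is not affected by the order in which the items of the stream are traversed and accumulated
    *      IDENTITY_FINISH: the accumulator object is used directly as the final result of the reduction, which means it is safe to cast the accumulator A to the result R without any check
    */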
   Set<Characteristics> characteristics();
    ....
  }

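T is the generic type of the items in the stream to be collected
A is the type of the accumulator, the object in which partial results are accumulated during the collection process
R is the type of the object produced by the collect operation (the collector's result type)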
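
To see the three type parameters take different roles, here is a small sketch built with the Collector.of factory method, where T is String, A is StringBuilder and R is String. The class name TypeParamsSketch and the method concatenating are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

class TypeParamsSketch {
    // T = String (stream elements), A = StringBuilder (accumulator), R = String (result).
    static Collector<String, StringBuilder, String> concatenating() {
        return Collector.of(
                StringBuilder::new,                  // supplier: creates an empty A
                (sb, s) -> sb.append(s),             // accumulator: folds one T into A
                (left, right) -> left.append(right), // combiner: merges two partial A values
                StringBuilder::toString);            // finisher: converts A into R
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("apple", "fish", "pizza");
        System.out.println(names.stream().collect(concatenating())); // applefishpizza
    }
}
```

Because A and R differ here, the finisher does real work and the collector does not declare IDENTITY_FINISH.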

Implementing something similar to toList()

public class MyToList<T> implements Collector<T, List<T>, List<T>> {
    @Override
    public Supplier<List<T>> supplier() {
        //Create a list to serve as the accumulator
        return ArrayList::new;
    }
    @Override
    public BiConsumer<List<T>, T> accumulator() {
        //Called with the accumulator and one element each time; adds the element to the accumulator
        //Could also be written as: return (list, t) -> list.add(t);
        return List::add;
    }
    @Override
    public BinaryOperator<List<T>> combiner() {
        //For parallel streams: merges two accumulators into one
        return (l1,l2) -> {
            l1.addAll(l2);
            return l1;
        };
    }
    @Override
    public Function<List<T>, List<T>> finisher() {
        //The goal is to put the values into a List, and the accumulator already is that List, so return it unchanged
        //Function.identity() returns a function that always returns its input argument
        return Function.identity();
    }
    @Override
    public Set<Characteristics> characteristics() {
        //IDENTITY_FINISH: the accumulator is used directly as the final result; no conversion to another type is needed
        //CONCURRENT is deliberately not declared: ArrayList is not thread-safe, so the accumulator must not be called from several threads on the same list
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.IDENTITY_FINISH));
    }
    }
}
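
The same four pieces can also be assembled with the Collector.of factory method instead of a dedicated class. The sketch below is one way to do that; the names ToListSketch and toListCollector are illustrative. It declares only IDENTITY_FINISH, since an ArrayList is not safe for concurrent accumulation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

class ToListSketch {
    // Supplier, accumulator and combiner of a toList-like collector.
    // This overload of Collector.of has no finisher, so the accumulator is the result.
    static <T> Collector<T, List<T>, List<T>> toListCollector() {
        return Collector.of(
                ArrayList::new,
                List::add,
                (left, right) -> { left.addAll(right); return left; },
                Collector.Characteristics.IDENTITY_FINISH);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("apple", "chicken", "fish");
        List<String> copy = names.stream().collect(ToListSketch.<String>toListCollector());
        System.out.println(copy); // [apple, chicken, fish]
    }
}
```

A collector written as a class is used the same way, by passing an instance to collect, for example menu.stream().collect(new MyToList<>()).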

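Custom collection without implementing Collector
- For IDENTITY_FINISH collection operations there is another way to get the same result without implementing the Collector interface from scratch
- Stream has an overloaded collect method that takes three functions (supplier, accumulator, combiner), so instead of writing your own class you pass the logic directly as arguments. For example, collecting the names into a List:
```
List<String> names = menu.stream().map(Dish::getName).collect(
        ArrayList::new, //creates the accumulator
        List::add, //adds each element to the accumulator
        List::addAll); //merges two accumulators when running in parallel
```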