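The difference between min and minBy in Flink
I. Introduction to min and minBy
max, min, and sum compute the maximum, minimum, and running sum of a field, respectively; minBy and maxBy return the entire element that holds the minimum or maximum value of that field.
Aggregations is the umbrella term for aggregate functions; common ones include, but are not limited to, sum, max, and min. Aggregations run on a keyed stream, so a key must be specified first.
Example: the field to aggregate can be given by position or by name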
keyedStream.sum(0);
keyedStream.sum("key");
keyedStream.min(0);
keyedStream.min("key");
keyedStream.max(0);
keyedStream.max("key");
keyedStream.minBy(0);
keyedStream.minBy("key");
keyedStream.maxBy(0);
keyedStream.maxBy("key");
II. min and minBy code demonstration
1. min example code
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import java.util.ArrayList;
import java.util.List;
/**
 * Author : Jackson
 * Version : 2020/4/24 & 1.0
 */
public class ReduceDemo {
    public static void main(String[] args) throws Exception {
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Build the data source
        List<Tuple3<Integer, Integer, Integer>> data = new ArrayList<>();
        data.add(new Tuple3<>(0, 1, 0));
        data.add(new Tuple3<>(0, 1, 1));
        data.add(new Tuple3<>(0, 2, 2));
        data.add(new Tuple3<>(0, 1, 3));
        data.add(new Tuple3<>(1, 2, 5));
        data.add(new Tuple3<>(1, 2, 9));
        data.add(new Tuple3<>(1, 2, 11));
        data.add(new Tuple3<>(1, 2, 13));
        DataStreamSource<Tuple3<Integer, Integer, Integer>> items = env.fromCollection(data);
        // Key by field 0, then keep a rolling minimum of field 2
        items.keyBy(0).min(2).printToErr();
        // execute() must be called to trigger the job; otherwise nothing is printed
        env.execute();
    }
}
Output:
6> (0,1,0)
6> (0,1,0)
6> (0,1,0)
6> (0,1,0)
6> (1,2,5)
6> (1,2,5)
6> (1,2,5)
6> (1,2,5)
Process finished with exit code 0
2. minBy example code
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import java.util.ArrayList;
import java.util.List;
/**
 * Author : Jackson
 * Version : 2020/4/24 & 1.0
 */
public class ReduceDemo {
    public static void main(String[] args) throws Exception {
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Build the data source
        List<Tuple3<Integer, Integer, Integer>> data = new ArrayList<>();
        data.add(new Tuple3<>(0, 1, 0));
        data.add(new Tuple3<>(0, 1, 1));
        data.add(new Tuple3<>(0, 2, 2));
        data.add(new Tuple3<>(0, 1, 3));
        data.add(new Tuple3<>(1, 2, 5));
        data.add(new Tuple3<>(1, 2, 9));
        data.add(new Tuple3<>(1, 2, 11));
        data.add(new Tuple3<>(1, 2, 13));
        DataStreamSource<Tuple3<Integer, Integer, Integer>> items = env.fromCollection(data);
        // Key by field 0, then emit the whole element with the smallest field 2 (note the capital B in minBy)
        items.keyBy(0).minBy(2).printToErr();
        // execute() must be called to trigger the job; otherwise nothing is printed
        env.execute();
    }
}
Output:
6> (0,1,0)
6> (0,1,0)
6> (0,1,0)
6> (0,1,0)
6> (1,2,5)
6> (1,2,5)
6> (1,2,5)
6> (1,2,5)
Process finished with exit code 0
3. Comparison
Data source:
data.add(new Tuple3<>(0, 1, 0));
data.add(new Tuple3<>(0, 1, 1));
data.add(new Tuple3<>(0, 2, 2));
data.add(new Tuple3<>(0, 1, 3));
data.add(new Tuple3<>(1, 2, 5));
data.add(new Tuple3<>(1, 2, 9));
data.add(new Tuple3<>(1, 2, 11));
data.add(new Tuple3<>(1, 2, 13));
min output:
6> (0,1,0)
6> (0,1,0)
6> (0,1,0)
6> (0,1,0)
6> (1,2,5)
6> (1,2,5)
6> (1,2,5)
6> (1,2,5)
minBy output:
6> (0,1,0)
6> (0,1,0)
6> (0,1,0)
6> (0,1,0)
6> (1,2,5)
6> (1,2,5)
6> (1,2,5)
6> (1,2,5)
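Conclusion:
Both pick the minimum value of the aggregated field. With min, however, the other fields (the second field here) are not guaranteed to come from the record that holds the minimum. The two outputs are identical for this data only because the minimum of each key happens to be its first record.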
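Since the data above cannot tell the two operators apart, here is a minimal plain-Java sketch of the rolling semantics with input where the minimum arrives last. It is only a model under stated assumptions, not Flink's implementation: the class name MinVsMinBy, the helper rollingMin, and the three input records are made up for illustration, and no Flink classes are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MinVsMinBy {
    /**
     * Models a rolling minimum per key (field 0) on field `pos`.
     * byElement = false models min, byElement = true models minBy.
     * Hypothetical helper; it only imitates the behaviour described in the article.
     */
    static List<String> rollingMin(List<int[]> input, int pos, boolean byElement) {
        Map<Integer, int[]> state = new HashMap<>();   // one stored element per key
        List<String> out = new ArrayList<>();
        for (int[] in : input) {
            int[] cur = state.get(in[0]);
            if (cur == null) {
                cur = in.clone();                      // first element of a key is stored as-is
            } else if (in[pos] < cur[pos]) {
                if (byElement) {
                    cur = in.clone();                  // minBy: replace the whole element
                } else {
                    cur[pos] = in[pos];                // min: overwrite only the aggregated field
                }
            }
            state.put(in[0], cur);
            out.add(Arrays.toString(cur));             // one output per input, like a rolling aggregate
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up records: same key, the smallest third field comes last
        List<int[]> data = Arrays.asList(
                new int[]{0, 1, 3}, new int[]{0, 2, 2}, new int[]{0, 5, 1});
        System.out.println("min:   " + rollingMin(data, 2, false));
        // min:   [[0, 1, 3], [0, 1, 2], [0, 1, 1]]
        System.out.println("minBy: " + rollingMin(data, 2, true));
        // minBy: [[0, 1, 3], [0, 2, 2], [0, 5, 1]]
    }
}
```

In this model, min keeps the second field of the first record (1) while the third field decreases, whereas minBy emits each new minimum record whole.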
III. Likewise, max and maxBy
1. max example
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import java.util.ArrayList;
import java.util.List;
/**
 * Author : Jackson
 * Version : 2020/4/24 & 1.0
 */
public class ReduceDemo {
    public static void main(String[] args) throws Exception {
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Build the data source
        List<Tuple3<Integer, Integer, Integer>> data = new ArrayList<>();
        data.add(new Tuple3<>(0, 1, 0));
        data.add(new Tuple3<>(0, 1, 1));
        data.add(new Tuple3<>(0, 2, 2));
        data.add(new Tuple3<>(0, 1, 3));
        data.add(new Tuple3<>(1, 2, 5));
        data.add(new Tuple3<>(1, 2, 9));
        data.add(new Tuple3<>(1, 2, 11));
        data.add(new Tuple3<>(1, 2, 13));
        DataStreamSource<Tuple3<Integer, Integer, Integer>> items = env.fromCollection(data);
        // Key by field 0, then keep a rolling maximum of field 2
        items.keyBy(0).max(2).printToErr();
        // execute() must be called to trigger the job; otherwise nothing is printed
        env.execute();
    }
}
Output:
6> (0,1,0)
6> (0,1,1)
6> (0,1,2)
6> (0,1,3)
6> (1,2,5)
6> (1,2,9)
6> (1,2,11)
6> (1,2,13)
Process finished with exit code 0
2. maxBy example
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import java.util.ArrayList;
import java.util.List;
/**
 * Author : Jackson
 * Version : 2020/4/24 & 1.0
 */
public class ReduceDemo {
    public static void main(String[] args) throws Exception {
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Build the data source
        List<Tuple3<Integer, Integer, Integer>> data = new ArrayList<>();
        data.add(new Tuple3<>(0, 1, 0));
        data.add(new Tuple3<>(0, 1, 1));
        data.add(new Tuple3<>(0, 2, 2));
        data.add(new Tuple3<>(0, 1, 3));
        data.add(new Tuple3<>(1, 2, 5));
        data.add(new Tuple3<>(1, 2, 9));
        data.add(new Tuple3<>(1, 2, 11));
        data.add(new Tuple3<>(1, 2, 13));
        DataStreamSource<Tuple3<Integer, Integer, Integer>> items = env.fromCollection(data);
        // Key by field 0, then emit the whole element with the largest field 2 (maxBy, not max)
        items.keyBy(0).maxBy(2).printToErr();
        // execute() must be called to trigger the job; otherwise nothing is printed
        env.execute();
    }
}
Output:
6> (0,1,0)
6> (0,1,1)
6> (0,2,2)
6> (0,1,3)
6> (1,2,5)
6> (1,2,9)
6> (1,2,11)
6> (1,2,13)
Process finished with exit code 0
3. Comparison
Data source:
data.add(new Tuple3<>(0, 1, 0));
data.add(new Tuple3<>(0, 1, 1));
data.add(new Tuple3<>(0, 2, 2));
data.add(new Tuple3<>(0, 1, 3));
data.add(new Tuple3<>(1, 2, 5));
data.add(new Tuple3<>(1, 2, 9));
data.add(new Tuple3<>(1, 2, 11));
data.add(new Tuple3<>(1, 2, 13));
max output:
6> (0,1,0)
6> (0,1,1)
6> (0,1,2)
6> (0,1,3)
6> (1,2,5)
6> (1,2,9)
6> (1,2,11)
6> (1,2,13)
maxBy output:
6> (0,1,0)
6> (0,1,1)
6> (0,2,2)
6> (0,1,3)
6> (1,2,5)
6> (1,2,9)
6> (1,2,11)
6> (1,2,13)
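The two outputs above differ only in the third line, (0,1,2) versus (0,2,2). A small plain-Java model of the rolling semantics reproduces both sequences from the article's data. This is a sketch under stated assumptions, not Flink code: the class MaxVsMaxBy and the helpers articleData and rollingMax are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MaxVsMaxBy {
    /** The eight records used throughout the article, as {f0, f1, f2}. */
    static List<int[]> articleData() {
        return Arrays.asList(
                new int[]{0, 1, 0}, new int[]{0, 1, 1}, new int[]{0, 2, 2}, new int[]{0, 1, 3},
                new int[]{1, 2, 5}, new int[]{1, 2, 9}, new int[]{1, 2, 11}, new int[]{1, 2, 13});
    }

    /**
     * Models a rolling maximum per key (field 0) on field `pos`.
     * wholeElement = false models max, wholeElement = true models maxBy.
     */
    static List<String> rollingMax(List<int[]> input, int pos, boolean wholeElement) {
        Map<Integer, int[]> state = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (int[] in : input) {
            int[] cur = state.get(in[0]);
            if (cur == null) {
                cur = in.clone();              // first record of a key passes through unchanged
            } else if (in[pos] > cur[pos]) {
                if (wholeElement) {
                    cur = in.clone();          // maxBy: take every field from the new maximum
                } else {
                    cur[pos] = in[pos];        // max: update only the aggregated field
                }
            }
            state.put(in[0], cur);
            out.add(Arrays.toString(cur));
        }
        return out;
    }

    public static void main(String[] args) {
        // The third entry is [0, 1, 2] for max and [0, 2, 2] for maxBy
        System.out.println("max:   " + rollingMax(articleData(), 2, false));
        System.out.println("maxBy: " + rollingMax(articleData(), 2, true));
    }
}
```

For max, the second field stays at 1 (taken from the first record of key 0) even though the record that carries the value 2 in the third field has 2 in the second field; maxBy returns that record unchanged.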
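Conclusion:
max, min, and sum compute the maximum, minimum, and running sum of a field, respectively; minBy and maxBy return the entire element that holds the minimum or maximum value of that field.
Both min and minBy emit a whole element. The difference is that min takes the minimum of the field the user specified and stores it at that position, while the values of the other fields are not guaranteed to be correct. The same holds for max and maxBy.
In fact, Flink manages the state behind the Aggregations functions for us, and this state is kept per key and never cleaned up, so in production you should avoid using Aggregations on an unbounded stream wherever possible. Also, Aggregation calls cannot be chained on the same keyedStream: each call returns a regular, non-keyed stream, so a further aggregation needs another keyBy.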
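The point about state that is never cleaned up can be pictured with a plain-Java sketch. It assumes, purely for illustration, that the per-key state behaves like a map from key to the current aggregate; the class KeyedStateGrowth and its methods are hypothetical, and Flink's real state lives in its state backend rather than in a HashMap.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyedStateGrowth {
    // Hypothetical stand-in for keyed state: one rolling minimum per key
    private final Map<Integer, Integer> minPerKey = new HashMap<>();

    /** Folds a new value into the rolling minimum of its key and returns the result. */
    int update(int key, int value) {
        return minPerKey.merge(key, value, Integer::min);
    }

    /** Number of keys that currently hold state; nothing in this model ever removes an entry. */
    int stateSize() {
        return minPerKey.size();
    }

    public static void main(String[] args) {
        KeyedStateGrowth agg = new KeyedStateGrowth();
        for (int i = 0; i < 100_000; i++) {
            agg.update(i, i);                  // every record carries a key not seen before
        }
        // One entry per distinct key is retained, even for keys that never appear again
        System.out.println("entries held: " + agg.stateSize());
    }
}
```

In this model the number of entries equals the number of distinct keys seen so far, which is why a stream with an ever-growing key space is a concern.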
IV. reduce example
package flink42.day04;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import java.util.ArrayList;
import java.util.List;
/**
 * Author : Jackson
 * Version : 2020/4/24 & 1.0
 */
public class ReduceDemo {
    public static void main(String[] args) throws Exception {
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Build the data source
        List<Tuple3<Integer, Integer, Integer>> data = new ArrayList<>();
        data.add(new Tuple3<>(0, 1, 0));
        data.add(new Tuple3<>(0, 1, 1));
        data.add(new Tuple3<>(0, 2, 2));
        data.add(new Tuple3<>(0, 1, 3));
        data.add(new Tuple3<>(1, 2, 5));
        data.add(new Tuple3<>(1, 2, 9));
        data.add(new Tuple3<>(1, 2, 11));
        data.add(new Tuple3<>(1, 2, 13));
        DataStreamSource<Tuple3<Integer, Integer, Integer>> items = env.fromCollection(data);
        // reduce: keep a running sum of field 2 for each key
        SingleOutputStreamOperator<Tuple3<Integer, Integer, Integer>> reduceres = items.keyBy(0).reduce(new ReduceFunction<Tuple3<Integer, Integer, Integer>>() {
            @Override
            public Tuple3<Integer, Integer, Integer> reduce(Tuple3<Integer, Integer, Integer> t1, Tuple3<Integer, Integer, Integer> t2) throws Exception {
                Tuple3<Integer, Integer, Integer> tuple3 = new Tuple3<>();
                // Fields 0 and 1 are hard-coded to 0 here; only field 2 carries the sum
                tuple3.setFields(0, 0, t1.f2 + t2.f2);
                return tuple3;
            }
        });
        reduceres.printToErr().setParallelism(1);
        // execute() must be called to trigger the job; otherwise nothing is printed
        env.execute();
    }
}
Output:
For each key, only the last line emitted is the result we actually want. The first record of a key is emitted unchanged; every later line shows 0 in the first two fields because the reduce function hard-codes them.
(0,1,0)
(0,0,1)
(0,0,3)
(0,0,6)
(1,2,5)
(0,0,14)
(0,0,25)
(0,0,38)
Process finished with exit code 0
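The sequence above can be traced with a plain-Java model of a rolling reduce. This is a sketch under stated assumptions, not Flink's implementation: the class RollingReduceModel and the helpers articleData and rollingReduce are hypothetical, and the model assumes state is looked up by the key of the incoming record, which is consistent with the printed output. The second reducer is an illustrative variant that keeps the first two fields of the accumulated tuple instead of hard-coding them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

public class RollingReduceModel {
    /** The eight records used throughout the article, as {f0, f1, f2}. */
    static List<int[]> articleData() {
        return Arrays.asList(
                new int[]{0, 1, 0}, new int[]{0, 1, 1}, new int[]{0, 2, 2}, new int[]{0, 1, 3},
                new int[]{1, 2, 5}, new int[]{1, 2, 9}, new int[]{1, 2, 11}, new int[]{1, 2, 13});
    }

    /** Applies fn to (accumulated value, new record) per key and records every intermediate result. */
    static List<String> rollingReduce(List<int[]> input, BinaryOperator<int[]> fn) {
        Map<Integer, int[]> state = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (int[] in : input) {
            int key = in[0];                       // state is addressed by the incoming record's key
            int[] prev = state.get(key);
            int[] next = (prev == null) ? in : fn.apply(prev, in);   // first record passes through
            state.put(key, next);
            out.add(Arrays.toString(next));
        }
        return out;
    }

    public static void main(String[] args) {
        // Same logic as the article's reducer: first two fields forced to 0
        System.out.println(rollingReduce(articleData(), (a, b) -> new int[]{0, 0, a[2] + b[2]}));
        // [[0, 1, 0], [0, 0, 1], [0, 0, 3], [0, 0, 6], [1, 2, 5], [0, 0, 14], [0, 0, 25], [0, 0, 38]]

        // Variant: keep the key and second field of the accumulated tuple
        System.out.println(rollingReduce(articleData(), (a, b) -> new int[]{a[0], a[1], a[2] + b[2]}));
        // [[0, 1, 0], [0, 1, 1], [0, 1, 3], [0, 1, 6], [1, 2, 5], [1, 2, 14], [1, 2, 25], [1, 2, 38]]
    }
}
```

With the variant reducer, the last line of each key, [0, 1, 6] and [1, 2, 38], carries both the key and the final sum, so the result can be attributed to its key without relying on output order.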
Stay hungry, keep learning
Jackson_MVP