Operators
DataStream Transformations
In this tutorial, a transformation is also called an operator.
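| Transformation | Description |
|---|---|
| Map (DataStream → DataStream) | Takes one element and returns exactly one element; the input and output types may differ. |
| FlatMap (DataStream → DataStream) | Takes one element and returns zero or more elements; the types may differ. |
| Filter (DataStream → DataStream) | Evaluates a boolean function for each element; elements for which it returns false are dropped. |
| KeyBy (DataStream → KeyedStream) | Logically partitions the stream by a key taken from a field of the data. Note that the DataStream becomes a KeyedStream. |
| Aggregations (KeyedStream → DataStream) | Rolling aggregations on a KeyedStream, such as sum(), min(), max(), minBy(), and maxBy(). |
| Window (KeyedStream → WindowedStream) | Defines windows on a KeyedStream, for example a 5-second window or a 100-record window. |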
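
To make the element-count contracts in the table concrete, here is a minimal plain-Java sketch. It is only an analogy: `java.util.stream` stands in for a Flink DataStream, and the class and method names (`OperatorSemanticsSketch`, `mapToString`, `flatMapEvens`, `filterOdd`) are hypothetical and not part of the Flink API.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain-Java analogy only: no Flink classes are used here.
final class OperatorSemanticsSketch {

    // Map: exactly one output per input; the output type may differ (Long -> String).
    static List<String> mapToString(List<Long> input) {
        return input.stream()
                .map(v -> "value-" + v)
                .collect(Collectors.toList());
    }

    // FlatMap: zero or more outputs per input.
    // Odd values produce nothing; an even value v produces v and v / 2.
    static List<Long> flatMapEvens(List<Long> input) {
        return input.stream()
                .flatMap(v -> v % 2 == 0 ? Stream.of(v, v / 2) : Stream.<Long>empty())
                .collect(Collectors.toList());
    }

    // Filter: an element is kept only when the predicate returns true.
    static List<Long> filterOdd(List<Long> input) {
        return input.stream()
                .filter(v -> v % 2 != 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> input = List.of(5L, 6L, 8L, 100L);
        System.out.println(mapToString(input));  // [value-5, value-6, value-8, value-100]
        System.out.println(flatMapEvens(input)); // [6, 3, 8, 4, 100, 50]
        System.out.println(filterOdd(input));    // [5]
    }
}
```

Unlike these finite lists, a Flink stream is processed element by element as data arrives, so the sketch illustrates only how many outputs each operator may produce per input.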
Example code for each operator follows.
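- Map
DataStream → DataStream
Takes a Long element and outputs a tuple.
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MapFunctionExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Long> stream = env.fromElements(5L, 6L, 8L, 100L);
        stream.map(new MyMapFunction()).print();
        env.execute();
    }

    // Takes a Long element and outputs a (String, Long) tuple.
    public static class MyMapFunction implements MapFunction<Long, Tuple2<String, Long>> {
        @Override
        public Tuple2<String, Long> map(Long value) throws Exception {
            return Tuple2.of(String.valueOf(value), value);
        }
    }
}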
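- FlatMap
DataStream → DataStream
Emits only the values that are divisible by 2.
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class MyFlatMapExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Long> stream = env.fromElements(5L, 6L, 8L, 100L, 56L, 22326L, 89230L, 100L);
        stream.flatMap(new MyFlatMap()).print();
        env.execute();
    }

    public static class MyFlatMap implements FlatMapFunction<Long, Long> {
        @Override
        public void flatMap(Long value, Collector<Long> out) throws Exception {
            // Emit zero elements for odd values and one element for even values.
            if (value % 2 == 0) {
                out.collect(value);
            }
        }
    }
}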
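- Filter
DataStream → DataStream
Drops all even numbers, so only the odd values remain.
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MyFilterFunctionExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Long> stream = env.fromElements(5L, 6L, 8L, 100L, 101L, 54561L);
        stream.filter(new MyFilterFunction()).print();
        env.execute();
    }

    public static class MyFilterFunction implements FilterFunction<Long> {
        @Override
        public boolean filter(Long value) throws Exception {
            // Returning true keeps the element; false drops it.
            return value % 2 != 0;
        }
    }
}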
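- KeyBy Window Agg
keyBy is usually combined with a window and an aggregation.
The example groups the stream by the first tuple field, applies a count window of size 2, and outputs the sum of the second tuple field for each window. A count window fires only once a key has received 2 elements, so with this input only keys "2" and "4" should produce output; the leftover elements never fill a window.
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MyKeyByExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Tuple2<String, Integer>> stream = env.fromElements(
                Tuple2.of("2", 2),
                Tuple2.of("2", 2),
                Tuple2.of("2", 3),
                Tuple2.of("1", 21),
                Tuple2.of("4", 4),
                Tuple2.of("4", 4),
                Tuple2.of("7", 8));
        // Key by the first tuple field. The original used keyBy(0),
        // which is deprecated in newer Flink releases.
        stream.keyBy(value -> value.f0).countWindow(2).sum(1).print();
        env.execute();
    }
}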
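
To see which results the pipeline above can emit, the following plain-Java sketch simulates "key by the first field, count window of size 2, sum the second field" on the same input. It uses no Flink classes; `CountWindowSketch` and `keyedCountWindowSum` are hypothetical names, and the sketch assumes that a count window fires and is cleared only when a key has collected exactly `size` elements.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of keyBy + countWindow(size) + sum; not Flink code.
final class CountWindowSketch {

    // Buffers values per key and emits "key=sum" each time a key has collected `size` values.
    static List<String> keyedCountWindowSum(List<Map.Entry<String, Integer>> input, int size) {
        Map<String, List<Integer>> buffers = new HashMap<>();
        List<String> emitted = new ArrayList<>();
        for (Map.Entry<String, Integer> element : input) {
            List<Integer> buffer =
                    buffers.computeIfAbsent(element.getKey(), k -> new ArrayList<>());
            buffer.add(element.getValue());
            if (buffer.size() == size) {
                int sum = 0;
                for (int v : buffer) {
                    sum += v;
                }
                emitted.add(element.getKey() + "=" + sum);
                buffer.clear(); // the window is cleared after it fires
            }
        }
        // Buffers that never reached `size` elements produce no output.
        return emitted;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = List.of(
                Map.entry("2", 2), Map.entry("2", 2), Map.entry("2", 3),
                Map.entry("1", 21), Map.entry("4", 4), Map.entry("4", 4),
                Map.entry("7", 8));
        System.out.println(keyedCountWindowSum(input, 2)); // [2=4, 4=8]
    }
}
```

In a real Flink job with parallelism greater than 1, the order in which results for different keys are printed may differ from this single-threaded simulation.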
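- RichFunction
Most operators have a rich-function variant (for example, RichMapFunction). Use it to initialize member fields, typically objects that cannot or should not be serialized and shipped over the network with the function.
The open() method initializes the SimpleDateFormat object.
The close() method releases resources such as database connection pools; it is called when the operator shuts down.
import java.text.SimpleDateFormat;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class MyRichMapFunctionExample extends RichMapFunction<String, String> {
    // transient: not shipped with the serialized function; created in open() instead.
    private transient SimpleDateFormat simpleFormatter;

    @Override
    public void open(Configuration parameters) throws Exception {
        // "MM" is the month; the original pattern used "mm" (minutes) in the date part.
        simpleFormatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        super.open(parameters);
    }

    @Override
    public void close() throws Exception {
        // Release resources (for example, connection pools) here.
        super.close();
    }

    @Override
    public String map(String value) throws Exception {
        // do something
        return value;
    }
}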
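
The reason for pairing a transient field with open() can be sketched in plain Java. The example assumes the function object is shipped using standard Java serialization, which skips transient fields, so the field is null on arrival and must be rebuilt. `TransientOpenSketch` and `TimestampFormatter` are hypothetical stand-ins that do not extend any Flink class; the UTC time zone and `Locale.ROOT` are added only to make the output deterministic.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class TransientOpenSketch {

    // Hypothetical stand-in for a rich function; not a Flink class.
    static final class TimestampFormatter implements Serializable {
        private static final long serialVersionUID = 1L;

        // Skipped by Java serialization, so it is null after the object is shipped.
        private transient SimpleDateFormat formatter;

        // Plays the role of open(): builds the per-instance helper object.
        void open() {
            formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
            formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
        }

        boolean isOpen() {
            return formatter != null;
        }

        String map(long epochMillis) {
            return formatter.format(new Date(epochMillis));
        }
    }

    // Serializes and deserializes the object, imitating shipping it to a worker.
    static TimestampFormatter roundTrip(TimestampFormatter function) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(function);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TimestampFormatter) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TimestampFormatter shipped = roundTrip(new TimestampFormatter());
        System.out.println(shipped.isOpen());        // false: the transient field was not shipped
        shipped.open();                              // rebuild it on the receiving side
        System.out.println(shipped.map(2_520_000L)); // 1970-01-01 00:42:00
    }
}
```

With the pattern "yyyy-mm-dd HH:mm:ss", the same timestamp would print as 1970-42-01 00:42:00, because "mm" stands for minutes.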
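For more on how to use operators, see the Operators page of the official Flink documentation.
All example code for this tutorial has been uploaded to the GitHub repository flink-toturial.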