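Partitioning Data in Flink
Preface:
Flink's data model is not built on key-value pairs, but we can define a key on any record ourselves and group the stream by it. The examples below use the map operator to build the records and keyBy() to partition them by key.
Using keyBy()
1 Partition by an element of a tuple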
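// Imports needed by this snippet
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.functions.FlatMapIterator;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import java.util.Arrays;
import java.util.Iterator;

// Stream execution environment
StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
DataStreamSource<String> text = see.readTextFile("datas/1.txt");
text.flatMap(new FlatMapIterator<String, String>() {
    @Override
    public Iterator<String> flatMap(String s) throws Exception {
        // readTextFile already emits one line per record, so this split is effectively a pass-through
        return Arrays.asList(s.split("\n")).iterator();
    }
}).map(new MapFunction<String, Tuple2<String, Integer>>() {
    @Override
    public Tuple2<String, Integer> map(String s) throws Exception {
        // Wrap each line in a tuple: (line, 1)
        return new Tuple2<>(s, 1);
    }
}).keyBy(1).print();
// The second element is always 1 here, so keyBy(1) gives every record the same key;
// use keyBy(0) to partition by the line text instead
// Submit the job; nothing runs until execute() is called
see.execute();
// keyBy(0) partitions by the first element of the tuple
// keyBy(1) partitions by the second element of the tuple
// keyBy(1, 2) partitions by the second and third elements combined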
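To make the idea behind the positional keys more concrete, here is a minimal plain-Java sketch of keyed partitioning. It is only a simplified model: Flink's real assignment goes through key groups and its own hashing, which this sketch does not reproduce. The class KeyedPartitionSketch and the helpers partitionFor and compositeKey are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Simplified model of keyed partitioning; this is NOT Flink's real key-group logic.
class KeyedPartitionSketch {

    // Hypothetical helper: maps a key to one of `parallelism` partitions.
    static int partitionFor(Object key, int parallelism) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(Objects.hashCode(key), parallelism);
    }

    // Hypothetical helper: builds a composite key from several tuple positions,
    // in the spirit of keyBy(1, 2).
    static List<Object> compositeKey(Object[] tuple, int... positions) {
        List<Object> key = new ArrayList<>();
        for (int p : positions) {
            key.add(tuple[p]);
        }
        return key;
    }

    public static void main(String[] args) {
        int parallelism = 4;
        Object[] a = {"hello", 1, "x"};
        Object[] b = {"world", 1, "x"};

        // Position 1 holds the same value in both tuples, so they share a partition.
        System.out.println(partitionFor(a[1], parallelism) == partitionFor(b[1], parallelism));

        // Position 0 differs, so the two tuples may or may not land together.
        System.out.println(partitionFor(a[0], parallelism) + " " + partitionFor(b[0], parallelism));

        // Composite keys over positions 1 and 2 are equal, so the tuples are grouped together.
        System.out.println(compositeKey(a, 1, 2).equals(compositeKey(b, 1, 2)));
    }
}
```

The point of the sketch is only that records with equal keys always end up in the same partition, while records with different keys are not guaranteed to be separated.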
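2 Partition by a field of a POJO object
// Stream execution environment
StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
see.fromElements(
        new People("张三", 18, 1), new People("李四", 18, 1), new People("杨倩", 18, 0)
).keyBy("sex").print();
// Submit the job; nothing runs until execute() is called
see.execute();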
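Conditions a POJO class must satisfy:
The class itself must be declared public;
It must have a public no-argument constructor;
Every field must either be public or be accessible through a getter and setter;
If the POJO object itself is used as the key (rather than one of its fields), it must override hashCode().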
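The People class used in these examples is never shown, so here is one possible shape for it that meets the conditions above. Only the field sex appears in the original snippets; the field names name and age, and the wrapper class PeoplePojoSketch, are assumptions made for this sketch. In a real project People would normally be its own public top-level class.

```java
import java.util.Objects;

// Wrapper used only to keep the sketch in a single file (hypothetical name).
class PeoplePojoSketch {

    // Public class, public fields, public no-argument constructor.
    public static class People {
        public String name; // assumed field name
        public int age;     // assumed field name
        public int sex;     // 1 or 0, as passed in the examples

        // Public no-argument constructor.
        public People() {
        }

        public People(String name, int age, int sex) {
            this.name = name;
            this.age = age;
            this.sex = sex;
        }

        // equals and hashCode are overridden together so the object could serve as a key.
        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof People)) {
                return false;
            }
            People other = (People) o;
            return age == other.age && sex == other.sex && Objects.equals(name, other.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, age, sex);
        }

        @Override
        public String toString() {
            return "People{name=" + name + ", age=" + age + ", sex=" + sex + "}";
        }
    }

    public static void main(String[] args) {
        People a = new People("张三", 18, 1);
        People b = new People("张三", 18, 1);
        // Equal field values give equal objects and equal hash codes.
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
        System.out.println(a);
    }
}
```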
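3 Custom key partitioning with a "key selector" function
// Import needed by this snippet
import org.apache.flink.api.java.functions.KeySelector;

// Stream execution environment
StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
see.fromElements(
        new People("张三", 18, 1), new People("李四", 18, 1), new People("杨倩", 18, 0)
).keyBy(new KeySelector<People, String>() {
    @Override
    public String getKey(People people) throws Exception {
        // Map the 0/1 sex flag to a readable label and use it as the key
        return people.sex == 1 ? "male" : "female";
    }
}).print();
// Submit the job; nothing runs until execute() is called
see.execute();
// Records are partitioned by sex: the flag 1 is labelled male, anything else female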
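What the key selector does can also be checked without a Flink job. The following plain-Java sketch applies the same rule (1 becomes the male label, anything else the female label) through a java.util.function.Function and collects the names per key. The names KeySelectorSketch, Person, SEX_KEY and groupNames are hypothetical, and grouping into a map is only an analogy for keyed partitioning, not how Flink executes it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Plain-Java analogy for a key selector (hypothetical names, no Flink involved).
class KeySelectorSketch {

    // Minimal stand-in for the People class, keeping only what the selector needs.
    static class Person {
        final String name;
        final int sex;

        Person(String name, int sex) {
            this.name = name;
            this.sex = sex;
        }
    }

    // Same rule as the getKey method: 1 is labelled male, anything else female.
    static final Function<Person, String> SEX_KEY = p -> p.sex == 1 ? "male" : "female";

    // Derives a key for each record and collects the names that share it.
    static Map<String, List<String>> groupNames(List<Person> people,
                                                Function<Person, String> keySelector) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (Person p : people) {
            groups.computeIfAbsent(keySelector.apply(p), k -> new ArrayList<>()).add(p.name);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("张三", 1), new Person("李四", 1), new Person("杨倩", 0));
        // Prints the names collected under each key.
        System.out.println(groupNames(people, SEX_KEY));
    }
}
```

Because the key is computed by a function, any derived value can act as the key, not only a field that already exists on the record.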