1. The datasets are all local files; the three operators are tested together.
map: a 1-to-1 transformation. Pass in one string and you can get back, for example, its length.
flatmap: a 1-to-many transformation. One input string can be split into, say, three strings, but the result is still a single stream (very similar to Hive's LATERAL VIEW with a UDTF: one input row, multiple output rows).
filter: keeps only the elements that satisfy a predicate (the counterpart of the SQL WHERE clause).
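The same three semantics also exist on the JDK's own java.util.stream API, which makes them easy to try without a Flink cluster. A minimal sketch (the sample strings are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class OperatorSemantics {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("flink", "spark,hive", "sensor_1");

        // map: 1-to-1, each string becomes its length
        List<Integer> lengths = words.stream()
                .map(String::length)
                .collect(Collectors.toList());
        System.out.println(lengths); // [5, 10, 8]

        // flatMap: 1-to-many, "spark,hive" expands into two elements,
        // but everything still flows into one flat result
        List<String> split = words.stream()
                .flatMap(s -> Arrays.stream(s.split(",")))
                .collect(Collectors.toList());
        System.out.println(split); // [flink, spark, hive, sensor_1]

        // filter: keep only elements matching a predicate
        List<String> sensors = words.stream()
                .filter(s -> s.startsWith("sensor"))
                .collect(Collectors.toList());
        System.out.println(sensors); // [sensor_1]
    }
}
```

The key contrast: map always emits exactly one output per input, flatMap may emit zero, one, or many, and filter emits the input unchanged or drops it.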
2. Code example:
package com.shihuo.apitest_transform;

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class TransformTest1_Base {
    public static void main(String[] args) throws Exception {
        // Create the stream execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Global parallelism
        env.setParallelism(1);

        // Read data from a file
        String inputPath = "/Users/wangyuhang/Desktop/FlinkTutorial/src/main/resources/sensor.txt";
        DataStream<String> stringDataStream = env.readTextFile(inputPath);

        // 1. map: convert each string to its length
        // Type parameters <T, R>: T is the input type (any type works),
        // R is the return type
        DataStream<Integer> mapStream = stringDataStream.map(new MapFunction<String, Integer>() {
            @Override
            public Integer map(String value) throws Exception {
                return value.length();
            }
        });

        // 2. flatMap: split each line on commas
        DataStream<String> flatMapStream = stringDataStream.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public void flatMap(String value, Collector<String> out) throws Exception {
                String[] fields = value.split(",");
                for (String field : fields) {
                    out.collect(field);
                }
            }
        });

        // 3. filter: keep only the lines whose id starts with "sensor"
        DataStream<String> filterStream = stringDataStream.filter(new FilterFunction<String>() {
            @Override
            public boolean filter(String value) throws Exception {
                return value.startsWith("sensor");
            }
        });

        // Print the results
        mapStream.print("map");
        flatMapStream.print("flatMap");
        filterStream.print("filter");
        env.execute();
    }
}
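The anonymous-class versions above can also be written with Java 8 lambdas. A sketch (not tied to a specific Flink version): the flatMap lambda needs an explicit returns() hint because Java's type erasure hides the Collector's element type from Flink's type extractor; the map and filter lambdas usually infer fine on their own.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class TransformTest1_Lambda {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        String inputPath = "/Users/wangyuhang/Desktop/FlinkTutorial/src/main/resources/sensor.txt";
        DataStream<String> stringDataStream = env.readTextFile(inputPath);

        // map: 1-to-1, string -> its length
        DataStream<Integer> mapStream = stringDataStream.map(String::length);

        // flatMap: 1-to-many; returns() restores the erased output type
        DataStream<String> flatMapStream = stringDataStream
                .flatMap((String value, Collector<String> out) -> {
                    for (String field : value.split(",")) {
                        out.collect(field);
                    }
                })
                .returns(Types.STRING);

        // filter: keep lines whose id starts with "sensor"
        DataStream<String> filterStream = stringDataStream.filter(value -> value.startsWith("sensor"));

        mapStream.print("map");
        flatMapStream.print("flatMap");
        filterStream.print("filter");
        env.execute();
    }
}
```

The per-element logic is identical to the anonymous-class version; only the boilerplate changes.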
3. Output