Add the Maven dependencies:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>1.10.1</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.12</artifactId>
    <version>1.10.1</version>
</dependency>
WordCount: the classic introductory example that needs no explanation
- Reading data from a txt file:
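import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class WordCount {
    public static void main(String[] args) throws Exception {
        // 1. Create the batch execution environment
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        String path = "E:\\bs\\flinkjava\\src\\main\\resources\\a.txt";
        // 2. Read the data source, one element per line of the file
        DataSet<String> source = env.readTextFile(path);
        // 3. Emit (word, 1) tuples, group by the word (field 0), and sum the counts (field 1)
        DataSet<Tuple2<String, Integer>> result = source.flatMap(new MyFlatMapper())
                .groupBy(0)
                .sum(1);
        // 4. Print the result; for a DataSet, print() also triggers execution
        result.print();
    }

    // Implements the FlatMapFunction interface and overrides flatMap
    public static class MyFlatMapper implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {
            // Split the line on spaces and emit one (word, 1) tuple per word
            String[] words = s.split(" ");
            for (String word : words) {
                collector.collect(new Tuple2<>(word, 1));
            }
        }
    }
}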
Result:
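The actual output depends on the contents of a.txt, which are not shown here. As a rough illustration of what flatMap, groupBy(0) and sum(1) compute together, here is a minimal plain-Java sketch that does not use Flink at all. The class name LocalWordCount and the two input lines are hypothetical, chosen only for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java illustration only; it does not use Flink.
// LocalWordCount and the sample lines are assumptions made for this sketch.
class LocalWordCount {

    // Mirrors the batch job: flatMap (split each line into words),
    // groupBy(0) (group by the word) and sum(1) (add up a 1 per occurrence).
    static Map<String, Integer> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.groupingBy(
                        word -> word,
                        TreeMap::new, // sorted keys, so the printed order is stable
                        Collectors.summingInt(word -> 1)));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello flink", "hello world");
        System.out.println(count(lines)); // prints {flink=1, hello=2, world=1}
    }
}
```

The batch job prints one final (word, count) tuple per distinct word, which is what the map above holds. The order in which Flink prints those tuples is not guaranteed to match.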
- Reading data from a stream:
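import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class StreamWC {
    public static void main(String[] args) throws Exception {
        // Create the streaming execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        /* Alternative: read from a file instead of a socket
        String path = "E:\\bs\\flinkjava\\src\\main\\resources\\a.txt";
        DataStreamSource<String> source = env.readTextFile(path);*/
        // Use ParameterTool to extract settings from the program arguments,
        // passed in the form: --host <host> --port <port>
        ParameterTool parameterTool = ParameterTool.fromArgs(args);
        String host = parameterTool.get("host");
        int port = parameterTool.getInt("port");
        DataStreamSource<String> source = env.socketTextStream(host, port);
        // Emit (word, 1) tuples, key by the word (field 0), and keep a running sum of field 1
        DataStream<Tuple2<String, Integer>> resultStream = source.flatMap(new MyFlatMapper())
                .keyBy(0).sum(1);
        // Print the output
        resultStream.print();
        // Start the job; a streaming job does not run until execute() is called
        env.execute();
    }

    // Same logic as the batch version: split each line on spaces and emit (word, 1)
    public static class MyFlatMapper implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {
            String[] words = s.split(" ");
            for (String word : words) {
                collector.collect(new Tuple2<>(word, 1));
            }
        }
    }
}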
Experiment result:
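One difference from the batch version is worth spelling out: on an unbounded stream, keyBy(0).sum(1) is a rolling aggregation, so an updated count is emitted for every incoming word rather than a single final total. The plain-Java sketch below mimics that behavior without Flink. The class name RunningWordCount and the sample lines are hypothetical, and the output of the real job is usually prefixed with a subtask index and may interleave differently across parallel subtasks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; it does not use Flink.
// RunningWordCount and the sample lines are assumptions made for this sketch.
class RunningWordCount {

    // Processes lines in arrival order and records the updated count
    // after every single word, like a rolling sum per key.
    static List<String> process(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                // merge() stores and returns the new running total for this word
                int updated = counts.merge(word, 1, Integer::sum);
                emitted.add("(" + word + "," + updated + ")");
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello flink", "hello world");
        // prints [(hello,1), (flink,1), (hello,2), (world,1)]
        System.out.println(process(lines));
    }
}
```

Note that "hello" appears twice in the emitted list, once with count 1 and once with count 2. The batch job would only ever show the final (hello,2).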
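Summary:
The flow resembles a simple ETL pipeline:
first create the execution environment, then configure how the data source is read (E), then apply the transformation operators (T), and finally output the result (L).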