I. Project Environment
Setting up the project environment is not repeated here; just follow the steps in the earlier blog post.
Link: https://blog.csdn.net/junR_980218/article/details/125366210
II. Writing the Project
1. Create an input directory in the project root, add a words.txt file to it, and put the following content in the file:
hello world
hello flink
hello java
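If you would rather create the file from code than by hand, the step above can be sketched in plain Java as follows. This is only an illustration: the class name WordsFileSketch and the method writeWordsFile are hypothetical, and it assumes the job is later started with the project root as its working directory, so that the relative path input/words.txt resolves to this file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of the original project.
class WordsFileSketch {

    // Creates <baseDir>/input/words.txt with the three sample lines and returns its path.
    static Path writeWordsFile(Path baseDir) throws IOException {
        Path inputDir = baseDir.resolve("input");
        Files.createDirectories(inputDir); // does nothing if the directory already exists
        Path file = inputDir.resolve("words.txt");
        Files.write(file,
                Arrays.asList("hello world", "hello flink", "hello java"),
                StandardCharsets.UTF_8);
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: this is run from the project root.
        System.out.println(writeWordsFile(Paths.get("")).toAbsolutePath());
    }
}
```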
2. Write the BoundedStreamWordCount class with the following content:
package com.atguigu.wc;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

/**
 * Performs batch-style processing by treating the input file as a bounded stream.
 *
 * @author ctgu
 * @date 2022/6/20 11:09
 */
public class BoundedStreamWordCount {
    public static void main(String[] args) throws Exception {
        // 1. Create the streaming execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // 2. Read the file (the path is relative to the working directory)
        DataStreamSource<String> stringDataStreamSource = env.readTextFile("input/words.txt");
        // 3. Transform: split each line into words and emit a (word, 1) tuple per word
        SingleOutputStreamOperator<Tuple2<String, Long>> wordAndOneTuple = stringDataStreamSource
                .flatMap((String line, Collector<Tuple2<String, Long>> out) -> {
                    String[] words = line.split(" ");
                    for (String word : words) {
                        out.collect(Tuple2.of(word, 1L));
                    }
                })
                // The lambda's generic types are erased, so the output type is declared explicitly
                .returns(Types.TUPLE(Types.STRING, Types.LONG));
        // 4. Group by word (field f0)
        KeyedStream<Tuple2<String, Long>, String> wordAndOneKeyStream = wordAndOneTuple.keyBy(data -> data.f0);
        // 5. Sum the counts (tuple position 1)
        SingleOutputStreamOperator<Tuple2<String, Long>> sum = wordAndOneKeyStream.sum(1);
        // 6. Print the result
        sum.print();
        // 7. Trigger execution of the job
        env.execute();
    }
}
Execution result
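With default settings the job runs in streaming mode, so every incoming word updates the running total for its key and the updated tuple is printed right away. That is why hello is printed three times, with counts 1, 2 and 3, rather than once with a final total. The plain-Java sketch below imitates that behaviour sequentially, without any Flink dependency. The class name RunningCountSketch and the method runningCounts are made up for illustration. In a real run with parallelism greater than 1, print() also prefixes each line with the index of the subtask that emitted it, and the relative order of different words may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for keyBy(word).sum(1); not Flink code.
class RunningCountSketch {

    // Emits one "(word,count)" string per incoming word, like a keyed running sum.
    static List<String> runningCounts(List<String> lines) {
        Map<String, Long> totals = new HashMap<>();
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                long updated = totals.merge(word, 1L, Long::sum);
                emitted.add("(" + word + "," + updated + ")");
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "hello flink", "hello java");
        for (String record : runningCounts(lines)) {
            System.out.println(record);
        }
        // Prints, one per line:
        // (hello,1) (world,1) (hello,2) (flink,1) (hello,3) (java,1)
    }
}
```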
That concludes this Flink quick-start walkthrough of bounded stream processing.