一、Development Environment
1. Java 8
2. IntelliJ IDEA 2021.3
3. Maven 3.6.1
4. Flink 1.13.0
5. Git
二、Project Setup
Create a Maven project, select your local Maven repository, and add the following dependencies to pom.xml:
<properties>
    <maven.compiler.source>8</maven.compiler.source>
    <maven.compiler.target>8</maven.compiler.target>
    <flink.version>1.13.0</flink.version>
    <java.version>1.8</java.version>
    <scala.binary.version>2.12</scala.binary.version>
    <slf4j.version>1.7.30</slf4j.version>
</properties>
<dependencies>
    <!-- Flink dependencies -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <!-- Logging dependencies -->
    <!-- https://mvnrepository.com/artifact/org.slf4j/slf4j-api -->
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>${slf4j.version}</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-to-slf4j -->
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-to-slf4j</artifactId>
        <version>2.14.0</version>
    </dependency>
</dependencies>
Logging configuration: create a log4j.properties file (under src/main/resources) and add the following content:
log4j.rootLogger=error,stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
三、Writing the Project
Create an input directory at the project root, then create a words.txt file inside it with the following content:
hello world
hello flink
hello java
Create a BatchWordCount class with the following content:
package com.atguigu.wc;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.AggregateOperator;
import org.apache.flink.api.java.operators.DataSource;
import org.apache.flink.api.java.operators.FlatMapOperator;
import org.apache.flink.api.java.operators.UnsortedGrouping;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

/**
 * @author
 * @date 2022/6/20 10:11
 */
public class BatchWordCount {
    public static void main(String[] args) throws Exception {
        // 1. Create the execution environment
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // 2. Read the data from the file
        DataSource<String> stringDataSource = env.readTextFile("input/words.txt");
        // 3. Split each line into words and convert them to two-element tuples
        FlatMapOperator<String, Tuple2<String, Long>> wordAndOneTuple = stringDataSource.flatMap(
                (String line, Collector<Tuple2<String, Long>> out) -> {
                    // Tokenize one line of text
                    String[] words = line.split(" ");
                    // Emit each word as a (word, 1) tuple
                    for (String word : words) {
                        out.collect(Tuple2.of(word, 1L));
                    }
                }
        ).returns(Types.TUPLE(Types.STRING, Types.LONG));
        // 4. Group by word (tuple field 0)
        UnsortedGrouping<Tuple2<String, Long>> wordAndOneGroup = wordAndOneTuple.groupBy(0);
        // 5. Sum the counts (tuple field 1) within each group
        AggregateOperator<Tuple2<String, Long>> sum = wordAndOneGroup.sum(1);
        // 6. Print the result
        sum.print();
    }
}
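Note the `.returns(Types.TUPLE(Types.STRING, Types.LONG))` call: Java erases generic type parameters at runtime, so Flink cannot infer a lambda's output tuple type on its own and needs this explicit hint. A minimal plain-Java sketch of that erasure (the class name here is illustrative, not part of the project):

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Integer> integers = new ArrayList<>();
        // At runtime both lists share the same class: the generic parameters
        // are erased, which is why Flink's lambda-based operators need an
        // explicit .returns(...) type hint.
        System.out.println(strings.getClass() == integers.getClass()); // prints "true"
    }
}
```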
Running the program aggregates the counts of the words in the words.txt file and prints them, as shown in the figure below.
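The expected totals can be checked with a plain-Java sketch of the same split-and-count logic (no Flink required; the class name is illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExpectedCounts {
    public static void main(String[] args) {
        // The three lines from words.txt
        String[] lines = {"hello world", "hello flink", "hello java"};
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                // Increment the running count for each word
                counts.merge(word, 1L, Long::sum);
            }
        }
        System.out.println(counts); // prints "{hello=3, world=1, flink=1, java=1}"
    }
}
```

The Flink job prints the same totals as `(word, count)` tuples, though the order of groups in its output is not guaranteed.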
This completes the Flink quick start: batch processing with the DataSet API.