I. Environment
Tool | Version |
---|---|
IDEA | 2021.3.2 |
Flink | 1.10.2 |
Scala | 2.12 |
JDK | 1.8.0_181 |
II. Implementation steps
- Create a new Maven project.
- Add the Flink dependencies to pom.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>

        <groupId>com.suben.bigdata</groupId>
        <artifactId>flink-basic</artifactId>
        <version>1.0-SNAPSHOT</version>

        <properties>
            <maven.compiler.source>8</maven.compiler.source>
            <maven.compiler.target>8</maven.compiler.target>
        </properties>

        <dependencies>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-scala_2.12</artifactId>
                <version>1.10.1</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-streaming-scala_2.12</artifactId>
                <version>1.10.1</version>
            </dependency>
            <dependency>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-simple</artifactId>
                <version>1.7.25</version>
                <scope>compile</scope>
            </dependency>
        </dependencies>

        <build>
            <plugins>
                <!-- This plugin compiles Scala code into class files -->
                <plugin>
                    <groupId>net.alchim31.maven</groupId>
                    <artifactId>scala-maven-plugin</artifactId>
                    <version>3.4.6</version>
                    <executions>
                        <execution>
                            <!-- Bind the goal to Maven's compile phase -->
                            <goals>
                                <goal>compile</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-assembly-plugin</artifactId>
                    <version>3.0.0</version>
                    <configuration>
                        <descriptorRefs>
                            <descriptorRef>jar-with-dependencies</descriptorRef>
                        </descriptorRefs>
                    </configuration>
                    <executions>
                        <execution>
                            <id>make-assembly</id>
                            <phase>package</phase>
                            <goals>
                                <goal>single</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    </project>

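- Add Scala framework support to the project in IDEA.
- Prepare test data for the batch job: create a file named wc.txt with the following content:

    I love Guizhou
    I love my home
    I love Flink
    I love Bigdata
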
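- Write the batch word count: create an object named WordCountBatch with the following code:

    import org.apache.flink.api.scala.{AggregateDataSet, DataSet, ExecutionEnvironment, createTypeInformation}

    object WordCountBatch {
      def main(args: Array[String]): Unit = {
        // 1. Initialize the batch execution environment
        val env = ExecutionEnvironment.getExecutionEnvironment

        // 2. Read the input file, one element per line
        val lineDS: DataSet[String] =
          env.readTextFile("E:\\IdeaProjects\\bigdata-sets002\\flink-basic\\data\\wc.txt")

        // 3. Split each line into words, map each word to (word, 1),
        //    group by the word (tuple field 0) and sum the counts (tuple field 1)
        val aggregateDS: AggregateDataSet[(String, Int)] = lineDS
          .flatMap(_.split(" "))
          .map(x => (x, 1))
          .groupBy(0)
          .sum(1)

        // 4. Print the result; for a DataSet program, print() also triggers execution
        aggregateDS.print()
      }
    }
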
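- Run the program. It prints one (word, count) tuple for each distinct word in wc.txt, for example (I,4) and (love,4).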
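
As a rough cross-check of the batch job's output, the same flatMap, map, group, sum pipeline can be sketched on an ordinary Scala collection with no Flink dependency. The object name WordCountLocal and the helper countWords are made up for this illustration, and the sketch assumes wc.txt holds the four sentences as separate lines. It only mirrors the shape of the DataSet job, not its distributed execution.

```scala
object WordCountLocal {
  // Hypothetical helper: counts words in an in-memory Seq instead of a Flink DataSet.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))          // split every line into words
      .filter(_.nonEmpty)             // drop empty tokens caused by repeated spaces
      .map(word => (word, 1))         // pair each word with an initial count of 1
      .groupBy(_._1)                  // group the pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the counts per word

  def main(args: Array[String]): Unit = {
    // Assumed content of wc.txt, one sentence per line
    val lines = Seq("I love Guizhou", "I love my home", "I love Flink", "I love Bigdata")
    countWords(lines).toSeq.sortBy(_._1).foreach(println)
  }
}
```

With this input, "I" and "love" each map to 4 and every other word maps to 1, which is what the Flink job should report as well.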
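- Install nc. I used the Windows build of Netcat: download it, unzip it, and start it listening on port 6666, the port the streaming job connects to (with the traditional Netcat build this is typically nc -l -p 6666).
Then type the following text into the nc console:

    I love Guizhou
    I love my home
    I love Flink
    I love Bigdata
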
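
If Netcat is not at hand, a small stand-in can be sketched in Scala with the JDK's ServerSocket. This is only an illustration and not part of the original setup: the object name LineServer and the helper serveOnce are hypothetical, and the sketch assumes a single client (the Flink job) connecting to port 6666 on localhost. It waits for one connection, sends each line terminated by a newline, and then closes the connection.

```scala
import java.io.PrintWriter
import java.net.ServerSocket

object LineServer {
  // Hypothetical helper: accepts one client and sends it every line, newline-terminated.
  def serveOnce(server: ServerSocket, lines: Seq[String]): Unit = {
    val client = server.accept() // blocks until a client connects
    try {
      val out = new PrintWriter(client.getOutputStream)
      lines.foreach(line => out.print(line + "\n")) // explicit "\n" regardless of platform
      out.flush()
    } finally client.close()
  }

  def main(args: Array[String]): Unit = {
    val server = new ServerSocket(6666) // assumed port, matching the tutorial's socket source
    try serveOnce(server, Seq("I love Guizhou", "I love my home", "I love Flink", "I love Bigdata"))
    finally server.close()
  }
}
```

Start LineServer before the streaming job, because accept() waits for the job to connect. Unlike an interactive nc session, this sketch sends a fixed set of lines and then ends the input.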
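- Write the streaming word count: create an object named WordCountStream and add the following code:

    import org.apache.flink.streaming.api.scala._

    object WordCountStream {
      def main(args: Array[String]): Unit = {
        // 1. Initialize the streaming execution environment
        val env = StreamExecutionEnvironment.getExecutionEnvironment

        // 2. Read a text stream from the socket that nc is listening on
        val inputDataStream: DataStream[String] = env.socketTextStream("localhost", 6666)

        // 3. Split lines into words, drop empty tokens, map each word to (word, 1),
        //    key by the word (tuple field 0) and keep a running sum of the counts (tuple field 1)
        val resultDataStream: DataStream[(String, Int)] = inputDataStream
          .flatMap(_.split(" "))
          .filter(_.nonEmpty)
          .map((_, 1))
          .keyBy(0)
          .sum(1)

        // 4. Print the result; the print sink's parallelism is set to 1 so the output is easier to read
        resultDataStream.print().setParallelism(1)

        // 5. Start the job; a streaming program does nothing until execute() is called
        env.execute("stream word count")
      }
    }
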
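- Run the code. nc must already be listening before the job starts, otherwise the socket source cannot connect. Each word typed into nc then produces an updated (word, count) tuple in the console.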
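
The difference from the batch job is that a keyed sum on an unbounded stream emits an updated count after every incoming word instead of one final total. A minimal sketch of that behaviour on a plain Scala collection, with no Flink involved, might look like the following. The names RunningWordCount and runningCounts are invented for this illustration, and in a real job the relative order of updates for different words may differ.

```scala
object RunningWordCount {
  // Hypothetical helper: returns the (word, count) update emitted after each incoming word.
  def runningCounts(words: Seq[String]): Seq[(String, Int)] = {
    val start = (Map.empty[String, Int], Vector.empty[(String, Int)])
    val (_, emitted) = words.filter(_.nonEmpty).foldLeft(start) {
      case ((counts, out), word) =>
        val n = counts.getOrElse(word, 0) + 1       // new running count for this word
        (counts.updated(word, n), out :+ (word -> n)) // remember the state and emit the update
    }
    emitted
  }

  def main(args: Array[String]): Unit = {
    val words = "I love Guizhou I love my home".split(" ").toSeq
    runningCounts(words).foreach(println)
  }
}
```

For the two sentences in main, the sketch emits (I,1), (love,1), (Guizhou,1), (I,2), (love,2), (my,1), (home,1), which is the kind of incremental output to expect from the streaming job.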