Below is an introductory Flink example: a word count computed over time windows.
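1. Dependencies
Pick the artifacts whose suffix (_2.12 here) matches your project's Scala version; Flink 1.13 is published for Scala 2.11 and 2.12.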
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-scala_2.12</artifactId>
    <version>1.13.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-core</artifactId>
    <version>1.13.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_2.12</artifactId>
    <version>1.13.0</version>
</dependency>
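To find the suffix for your build, you can print the Scala version found on the classpath at run time. The sketch below assumes the binary version is the first two components of the full version string (for example 2.12.15 gives 2.12). The object name ScalaSuffix is hypothetical. scala.util.Properties.versionNumberString is part of the standard library.

```scala
import scala.util.Properties

// Hypothetical helper: prints which Flink artifact suffix matches the running Scala version.
object ScalaSuffix {
  /** Binary version ("2.12") taken from a full version string such as "2.12.15". */
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  def main(args: Array[String]): Unit = {
    val full = Properties.versionNumberString // version of the scala-library on the classpath
    val suffix = "_" + binaryVersion(full)
    println(s"Scala $full -> use artifacts such as flink-streaming-scala$suffix")
  }
}
```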
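2. Stream processing
The job uses 5-second tumbling processing-time windows, so the word counts of each window are printed once every 5 seconds. A window that receives no input prints nothing.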
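package ace.gjh.stream

// The wildcard import brings in StreamExecutionEnvironment and the implicit
// TypeInformation that the Scala DataStream API needs.
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

/**
 * Stream processing: word count
 *
 * @author ACE_GJH
 * @date 2021/5/7
 */
object WordCount {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Read text lines from the socket on localhost:9999 (fed by nc, see section 3)
    val dataStream = env.socketTextStream("localhost", 9999)
    dataStream
      .flatMap(_.split(" "))   // split each line into words
      .map((_, 1))              // pair every word with a count of 1
      .keyBy(_._1)              // partition the stream by word
      .window(TumblingProcessingTimeWindows.of(Time.seconds(5))) // 5 s tumbling windows
      .sum(1)                   // add up tuple field 1 (the counts) per word and window
      .print()
    // Nothing runs until execute() submits the job
    env.execute("word-count")
  }
}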
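The windowing step can be simulated without Flink to show how tumbling windows bucket the words. This is a sketch under stated assumptions. Each line carries an explicit millisecond timestamp, standing in for the processing time at which Flink would see it. Window starts use ts - ts % size, which for a zero offset and non-negative timestamps matches how Flink aligns tumbling windows to the epoch. The object TumblingWindowSketch and its methods are hypothetical.

```scala
// Hypothetical pure-Scala simulation of flatMap -> map -> keyBy -> tumbling window -> sum.
object TumblingWindowSketch {
  /** Start of the tumbling window containing `ts` (ms); assumes offset 0 and ts >= 0. */
  def windowStart(ts: Long, sizeMs: Long): Long = ts - ts % sizeMs

  /** Counts each word per window, keyed by (windowStart, word). */
  def countPerWindow(events: Seq[(Long, String)], sizeMs: Long): Map[(Long, String), Int] =
    events
      .flatMap { case (ts, line) =>
        line.split(" ").toList.map(word => (windowStart(ts, sizeMs), word))
      }
      .groupBy(identity)                                  // same word in same window
      .map { case (key, occurrences) => key -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val events = Seq(
      (1000L, "hello world"), // window [0, 5000)
      (3000L, "hello flink"), // window [0, 5000)
      (6000L, "hello")        // window [5000, 10000)
    )
    countPerWindow(events, 5000L).toSeq.sorted.foreach(println)
  }
}
```

With these inputs, "hello" is counted twice in the first window and once in the second, because the two windows do not overlap.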
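3. Data source
Start netcat listening on port 9999 before submitting the job (-l listens, -k keeps listening after a client disconnects), then type space-separated words into it:
nc -lk 9999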
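If nc is not available (for example on Windows), the sketch below is a hypothetical stand-in written in Scala. It listens on port 9999, accepts one connection from the job, and forwards each line typed on standard input. Unlike nc -lk, it serves a single connection only and exits once stdin reaches end of input. The name LineServer is made up for illustration. Lines are terminated with \n, the default delimiter of socketTextStream.

```scala
import java.io.PrintWriter
import java.net.ServerSocket
import scala.io.StdIn

// Hypothetical helper, not part of Flink: a single-connection substitute for `nc -l 9999`.
object LineServer {
  /** Waits for one client on `server`, sends every line from `lines`, then closes the client. */
  def serveOnce(server: ServerSocket, lines: Iterator[String]): Unit = {
    val client = server.accept()
    val out = new PrintWriter(client.getOutputStream)
    try lines.foreach { line =>
      out.print(line + "\n") // explicit \n so the delimiter does not depend on the OS
      out.flush()            // push each line to the job immediately
    }
    finally client.close()
  }

  def main(args: Array[String]): Unit = {
    val server = new ServerSocket(9999)
    // Read stdin line by line until end of input (readLine returns null)
    val stdin = Iterator.continually(StdIn.readLine()).takeWhile(_ != null)
    try serveOnce(server, stdin)
    finally server.close()
  }
}
```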