环境
ubuntu14、flink1.7.2、scala2.11、kafka2.3.0、jdk1.8、idea2019
步骤
- 抽取出业务时间戳,告诉 Flink 框架基于业务时间做窗口
- 过滤出点击行为(pv)数目
- 按一小时的窗口大小,每 5 分钟统计一次,做滑动窗口聚合(Sliding Window)
- 按每个窗口聚合,输出每个窗口中点击量前 N 名的商品
实现
创建maven项目,命名UserBehaviorAnalysis,其pom内容如下:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.ustc</groupId>
<artifactId>UserBehaviorAnalysis</artifactId>
<packaging>pom</packaging>
<version>1.0-SNAPSHOT</version>
<!--最外层声明所有子模块共用的版本信息-->
<properties>
<flink.version>1.7.2</flink.version>
<scala.binary.version>2.11</scala.binary.version>
<kafka.version>2.3.0</kafka.version>
</properties>
<!--热门商品统计子模块-->
<modules>
<module>HotItemsAnalysis</module>
</modules>
<dependencies>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-scala_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_${scala.binary.version}</artifactId>
<version>${kafka.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
</dependencies>
<!--引入公有插件-->
<build>
<plugins>
<!--将scala编译成class文件-->
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.4.6</version>
<executions>
<execution>
<goals>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>3.0.0</version>
<configuration>
<descriptorRefs>
<discriptorRef>jar-with-dependencies</discriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
在该项目中创建子模块(右击->module),取名 HotItemsAnalysis,其pom内容(保持默认内容即可):
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:sche