1. Create a Maven project: flink
(1) Project creation steps, as shown in the figure
Project created successfully.
2. Set up the flink Maven project
(1) pom file
<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-scala_2.11</artifactId>
        <version>1.7.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-scala -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-scala_2.11</artifactId>
        <version>1.7.0</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <!-- This plugin compiles the Scala code into class files -->
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.0</version>
            <executions>
                <execution>
                    <!-- Bind these goals to Maven's compile phase -->
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>2.4</version>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
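With the assembly plugin bound to the package phase as above, a standard Maven build should produce both the plain jar and a fat jar-with-dependencies (a sketch, assuming Maven is installed and run from the project root):

```shell
# Compile the Scala sources and build the fat jar
# (the assembly plugin's `single` goal runs automatically at `package`).
mvn clean package
# The runnable artifact lands in target/, named
# <artifactId>-<version>-jar-with-dependencies.jar
```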
(2) Add Scala framework support and a scala source folder
Prepare the data:
// Create a hello.txt file under D:\tmp with the following content:
Hello World Flink!
I Love China!
Really?
Yes,I love China,Flink?
Batch (offline) implementation:
package com.study.flink1205

import org.apache.flink.api.scala.{AggregateDataSet, DataSet, ExecutionEnvironment}
import org.apache.flink.api.scala._

// Batch (offline) word count
object DataSetWcApp {
  def main(args: Array[String]): Unit = {
    // Build the execution environment
    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    // Read the input file
    val txtDataSet: DataSet[String] = env.readTextFile("D:\\tmp\\hello.txt")
    // Split into words, group with groupBy, and aggregate with sum
    val aggSet: AggregateDataSet[(String, Int)] = txtDataSet.flatMap(_.split(" ")).map((_, 1)).groupBy(0).sum(1)
    aggSet.print()
  }
}
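To check what the flatMap -> map -> groupBy -> sum chain computes without a Flink runtime, the same logic can be sketched with plain Scala collections (`LocalWordCount` and `wordCount` are illustrative names, not part of the project above):

```scala
// Plain-Scala sketch of the batch word-count pipeline, no Flink needed.
object LocalWordCount {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))   // split each line into words
      .map((_, 1))             // pair each word with a count of 1
      .groupBy(_._1)           // group the pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum counts

  def main(args: Array[String]): Unit =
    println(wordCount(Seq("Hello World Flink!", "I Love China!")))
}
```

Flink's `groupBy(0).sum(1)` does the same grouping and summing by tuple position, but distributed across the cluster.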
Run the program; the word counts are printed to the console.
Streaming (real-time) implementation:
package com.study.flink1205

import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}
import org.apache.flink.api.scala._ // implicit conversions required by flatMap and map

// Streaming (real-time) word count
object DataStreamWcApp {
  def main(args: Array[String]): Unit = {
    // Create the stream execution environment
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // Receive a socket text stream
    val dataStream: DataStream[String] = env.socketTextStream("hadoop105", 7777)
    // Split into words, drop empty strings, key by word, and aggregate with sum
    val sumStream: DataStream[(String, Int)] = dataStream.flatMap(_.split(" ")).filter(_.nonEmpty).map((_, 1)).keyBy(0).sum(1)
    // Print the running counts
    sumStream.print()
    env.execute()
  }
}
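Unlike the batch job, `keyBy(0).sum(1)` on a stream emits a new running total for a word every time that word arrives. A minimal plain-Scala sketch of that behavior (`RunningCount` is an illustrative name):

```scala
// Sketch of streaming keyBy + sum semantics: for each incoming word,
// emit the updated running total for that word.
object RunningCount {
  def runningCounts(words: Seq[String]): Seq[(String, Int)] = {
    val totals = scala.collection.mutable.Map.empty[String, Int]
    words.map { w =>
      val n = totals.getOrElse(w, 0) + 1  // per-key state, like Flink's keyed state
      totals(w) = n
      (w, n)                              // one output record per input record
    }
  }
}
```

So typing `hello hello` into the socket produces `(hello,1)` and then `(hello,2)`, rather than a single final count.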
On the virtual machine, start a socket source:
[root@hadoop105 module]# nc -lk 7777
Start the program, then type data into the nc session on the virtual machine.
Note: Flink programs can be written in either Java or Scala.
When a package exists in both java and scala variants, make sure to import the scala one.
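The pitfall is easy to hit because both APIs define environment classes with the same names; a side-by-side comparison of the import paths (assuming the Flink 1.7 package layout used above):

```scala
// Scala API -- use these in Scala code:
//   org.apache.flink.api.scala.ExecutionEnvironment
//   org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

// Java API -- avoid these in Scala code:
//   org.apache.flink.api.java.ExecutionEnvironment
//   org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
```

Importing the Java variants will typically surface as missing implicit `TypeInformation` errors when compiling the Scala operators.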