Steps
- 1. Get the Flink streaming execution environment
- 2. Build a socket source
- 3. Connect to port 9999, which sends real-time data (see the netcat command after this list)
- 4. Use Flink operators to count the words
- 5. Print the result
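To feed the example, start a socket server on node01 and type space-separated words into it. With netcat this is typically (assuming netcat is installed on node01):

nc -lk 9999

Here -l listens on the port and -k keeps the socket open across client reconnects, so the Flink job can reconnect without restarting netcat.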
Code
import org.apache.flink.api.java.tuple.Tuple
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    /**
     * 1. Get the Flink streaming execution environment
     * 2. Build a socket source
     * 3. Connect to port 9999, which sends real-time data
     * 4. Use Flink operators to count the words
     * 5. Print the result
     */
    // 1. Get the Flink streaming execution environment
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // 2. Build a socket source
    val socketDataStream: DataStream[String] = env.socketTextStream("node01", 9999)
    // 3. Convert the received lines into (word, 1) tuples
    val wordDataStream: DataStream[(String, Int)] = socketDataStream.flatMap(x => x.split(" ")).map(x => (x, 1))
    // 4. keyBy the tuples on the word field, analogous to Spark's groupBy
    val groupedDataStream: KeyedStream[(String, Int), Tuple] = wordDataStream.keyBy(0)
    // 5. Use timeWindow to set the window length: fire every five seconds
    val windowedDataStream: WindowedStream[(String, Int), Tuple, TimeWindow] = groupedDataStream.timeWindow(Time.seconds(5))
    // 6. Sum the counts within each window
    val sumDataStream: DataStream[(String, Int)] = windowedDataStream.sum(1)
    // 7. Print the result
    sumDataStream.print()
    // A DataStream program must call execute(); a DataSet program that only prints to the console does not need this step
    env.execute("StreamingWordCount")
  }
}
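Note that the positional keyBy(0) and the timeWindow shortcut used above are deprecated in newer Flink releases. A minimal sketch of the same pipeline against the newer API, assuming Flink 1.12+ (the object name StreamingWordCountNewApi is just for illustration):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

object StreamingWordCountNewApi {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.socketTextStream("node01", 9999)
      .flatMap(_.split(" "))
      .map(w => (w, 1))
      // key selector replaces the deprecated positional keyBy(0)
      .keyBy(_._1)
      // explicit window assigner replaces the deprecated timeWindow shortcut
      .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
      .sum(1)
      .print()
    env.execute("StreamingWordCount")
  }
}

A key selector also gives the KeyedStream a typed key (String here) instead of the untyped Tuple key produced by keyBy(0).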
Run results
Console output (the N> prefix on each record is the index of the parallel subtask that emitted it):
log4j:WARN No appenders could be found for logger (org.apache.flink.api.scala.ClosureCleaner$).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
1> (spark,2)
8> (hadoop,1)
3> (hello,2)
5> (world,1)
Process finished with exit code 0
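For example, typing a hypothetical line such as "hello spark hello spark hadoop world" into the netcat session within a single five-second window would produce counts like the ones shown above.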