DataStream Transformation Operators
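Introduction
- Producing a new DataStream from one or more existing DataStreams is called a Transformation.
- Each kind of transformation is defined as a distinct Operator.
- A Flink program can combine multiple Transformations into a DataFlow topology.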
map and filter
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}

object map_Filter {
  def main(args: Array[String]): Unit = {
    // Entry point of the streaming program
    val environment: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // Brings the implicit TypeInformation for the Scala API into scope
    import org.apache.flink.api.scala._
    val sourceStream: DataStream[Int] = environment.fromElements(1, 2, 3, 4, 5, 6)
    // Multiply every element by 10
    val mapStream: DataStream[Int] = sourceStream.map(x => x * 10)
    // Keep only the multiples of 3
    val resultStream: DataStream[Int] = mapStream.filter(x => x % 3 == 0)
    resultStream.print()
    environment.execute()
  }
}
Output
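The printed result is not reproduced here, so the following minimal sketch replays the same map and filter logic on a plain Scala collection instead of a Flink DataStream. The names `MapFilterSketch` and `run` are hypothetical and exist only for illustration; the sketch shows which values survive the pipeline, not how Flink executes it.

```scala
// Plain Scala collections stand in for the DataStream; this is not Flink code.
object MapFilterSketch {
  // Same logic as the job above: multiply by 10, then keep multiples of 3.
  def run(input: Seq[Int]): Seq[Int] =
    input.map(x => x * 10).filter(x => x % 3 == 0)

  def main(args: Array[String]): Unit =
    run(Seq(1, 2, 3, 4, 5, 6)).foreach(println) // prints 30 and 60, one per line
}
```

In the real job, print() typically prefixes each line with the index of the emitting subtask when the parallelism is greater than 1, and the order across subtasks is not guaranteed.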
flatMap
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}

object flatMap_keyBy_sum {
  def main(args: Array[String]): Unit = {
    // Get the entry point of the program
    val environment: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // Read data from a socket
    val resultDataStream: DataStream[String] = environment.socketTextStream("node01", 9999)
    // Import the implicit conversions
    import org.apache.flink.api.scala._
    // Split each line into words, then pair every word with 1
    val value: DataStream[(String, Int)] = resultDataStream.flatMap(x => x.split(" ")).map(x => (x, 1))
    value.print()
    // Run the program
    environment.execute()
  }
}
Input
Output
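The input and output are not reproduced here, so the sketch below applies the same flatMap and map steps to a hypothetical pair of input lines, using plain Scala collections instead of a socket stream. `FlatMapSketch`, `toPairs`, and the sample text are assumptions made for illustration.

```scala
// Plain Scala collections stand in for the socket stream; this is not Flink code.
object FlatMapSketch {
  // Split each line on spaces (one line may yield many words),
  // then pair every word with the count 1.
  def toPairs(lines: Seq[String]): Seq[(String, Int)] =
    lines.flatMap(line => line.split(" ").toSeq).map(word => (word, 1))

  def main(args: Array[String]): Unit =
    toPairs(Seq("hello flink", "hello")).foreach(println)
}
```

Note that flatMap changes the number of elements: two input lines become three (word, 1) pairs here, whereas map always emits exactly one element per input.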
flatMap, keyBy, and sum
flatMap
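- Applies the given function to each element and emits zero, one, or more resulting elements, for example splitting a line into individual words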
keyBy (as described in the official documentation):
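- Groups elements by a specified key field; the class that carries the key (for example a case class) must be defined beforehand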
- Groups by a tuple index, so elements with different keys go to different groups
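As a rough analogy for the grouping described above, the sketch below uses groupBy on a plain Scala collection to show that pairs sharing the same first field end up together. This is only an illustration: Flink's keyBy partitions a stream logically and does not build a map of groups. `KeyBySketch` and `groupByWord` are hypothetical names.

```scala
// Analogy only: collections groupBy materializes the groups, Flink keyBy does not.
object KeyBySketch {
  // Use the first tuple field (index 0) as the key.
  def groupByWord(pairs: Seq[(String, Int)]): Map[String, Seq[(String, Int)]] =
    pairs.groupBy(pair => pair._1)

  def main(args: Array[String]): Unit =
    groupByWord(Seq(("hello", 1), ("flink", 1), ("hello", 1))).foreach(println)
}
```

In the Flink Scala API, a key-selector function such as `keyBy(_._1)` plays the role that the function argument of groupBy plays here.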
sum
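- Sums the specified field: over the data of the current window on a windowed stream, or as a running total per key on a keyed stream without a window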
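On a keyed stream without a window, sum emits an updated running total for every incoming element. A minimal sketch of that behavior with plain Scala collections follows; `RollingSumSketch`, `rollingSum`, and the sample pairs are hypothetical and not part of the Flink API.

```scala
// Emulates a per-key running total on a collection; this is not Flink code.
object RollingSumSketch {
  // For each incoming (key, value) pair, emit (key, total for that key so far).
  def rollingSum(pairs: Seq[(String, Int)]): Seq[(String, Int)] = {
    val start = (Map.empty[String, Int], Vector.empty[(String, Int)])
    val (_, emitted) = pairs.foldLeft(start) {
      case ((totals, out), (key, value)) =>
        val updated = totals.getOrElse(key, 0) + value
        (totals.updated(key, updated), out :+ (key -> updated))
    }
    emitted
  }

  def main(args: Array[String]): Unit =
    rollingSum(Seq(("hello", 1), ("flink", 1), ("hello", 1))).foreach(println)
}
```

For the sample pairs, the second occurrence of "hello" yields ("hello", 2), while "flink" stays at 1 because each key keeps its own total.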
Using keyBy, flatMap, and sum together
import org.apache.flink