Common Flink DataStream operators
1. In Flink, an operator transforms one or more DataStreams into a new DataStream; multiple transformations can be combined into a complex dataflow topology.
2. Flink has several different DataStream types, and conversions between them are performed by applying the various operators.
3. When developing with Scala, import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment together with the implicit conversions:
import org.apache.flink.streaming.api.scala._
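As a rough analogy of how transformations compose into a pipeline, the same chaining can be seen with plain Scala collections (no Flink involved; the data here is illustrative):

```scala
// Plain Scala collections analogy: transformations chain into a pipeline,
// much like DataStream operators compose into a dataflow topology.
val lines = List("flink jobmanager taskmanager", "spark streaming")

val greetings = lines
  .flatMap(_.split(" "))  // one line -> many words
  .map("i like " + _)     // one word -> exactly one greeting
```

Each step returns a new collection, just as each DataStream operator returns a new DataStream that the next operator consumes.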
1.1 The map operator
map can be understood as a mapping: each element is transformed and mapped to exactly one new element.
1.1.1 Java API
package com.kn.operator
import org.apache.flink.api.common.functions.MapFunction
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
import org.apache.flink.api.java.tuple.Tuple1
object MapOperator {
def main(args: Array[String]): Unit = {
//get the execution environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
//prepare the input data; the result type is DataStreamSource
val dataStreamSource = env.fromElements(Tuple1.of("flink")
,Tuple1.of("spark")
,Tuple1.of("hadoop"))
.map(new MapFunction[Tuple1[String],String] { //the map operation: transform each element into a new one
override def map(value: Tuple1[String]): String = {
"i like " + value.f0
}
})
.print()
env.execute("flink map operator")
}
}
Output:
2> i like flink
4> i like hadoop
3> i like spark
1.1.2 Scala API
package com.kn.operator
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.streaming.api.scala._
object MapOperator {
def main(args: Array[String]): Unit = {
//get the execution environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
//prepare the input data; the result type is DataStreamSource
val dataStreamSource = env.fromElements(Tuple1.apply("flink")
,Tuple1.apply("spark")
,Tuple1.apply("hadoop"))
.map("i like "+_._1)
.print()
env.execute("flink map operator")
}
}
Output:
3> i like hadoop
2> i like spark
1> i like flink
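Note that map is strictly one-in, one-out: the element count never changes. A minimal plain-Scala sketch (no Flink dependency) mirroring the example above:

```scala
// map produces exactly one output element per input element.
val input  = List(Tuple1("flink"), Tuple1("spark"), Tuple1("hadoop"))
val output = input.map("i like " + _._1)  // each Tuple1 maps to one String
```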
1.2 The flatMap operator
flatMap can be understood as flattening: each input element may produce 0, 1, or many output elements.
1.2.1 Java API
package com.kn.operator
import org.apache.flink.api.common.functions.FlatMapFunction
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
import org.apache.flink.api.java.tuple.Tuple1
import org.apache.flink.util.Collector
object FlatMapOperator {
def main(args: Array[String]): Unit = {
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.fromElements(Tuple1.of("flink jobmanger taskmanager")
,Tuple1.of("spark streaming")
,Tuple1.of("hadoop hdfs"))
//note the FlatMapFunction's type parameters: the first is the input type, the second the output type
.flatMap(new FlatMapFunction[Tuple1[String],Tuple1[String]](){
override def flatMap(value: Tuple1[String], out: Collector[Tuple1[String]]): Unit = {
for(s:String <- value.f0.split(" ")){
out.collect(Tuple1.of(s))
}
}
})
.print()
env.execute("flink flatmap operator")
}
}
Output:
4> (spark)
3> (flink)
1> (hadoop)
3> (jobmanger)
4> (streaming)
3> (taskmanager)
1> (hdfs)
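Unlike map, flatMap may also emit zero elements for an input, which is how it doubles as a filter. A plain-Scala sketch (the `tokens` helper is illustrative, not part of the Flink API):

```scala
// flatMap semantics: each input yields 0, 1, or many outputs.
def tokens(line: String): Seq[String] =
  line.split(" ").toSeq.filter(_.nonEmpty)  // an empty line yields 0 tokens

val out = List("", "spark", "hadoop hdfs").flatMap(tokens)
// the empty line contributes nothing; "hadoop hdfs" contributes two elements
```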
1.2.2 Scala API
package com.kn.operator
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector
object FlatMapOperator {
def main(args: Array[String]): Unit = {
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.fromElements(Tuple1.apply("flink jobmanger taskmanager")
,Tuple1.apply("spark streaming")
,Tuple1.apply("hadoop hdfs"))
//the Scala API accepts a (value, Collector) => Unit function directly
.flatMap((value: Tuple1[String], out: Collector[Tuple1[String]]) => {
for (s <- value._1.split(" ")) {
out.collect(Tuple1.apply(s))
}
})
.print()
env.execute("flink flatmap operator")
}
}