2.2.3 Multi-Stream Transformation Operators
2.2.3.1 Split and Select
Split DataStream → SplitStream: splits one DataStream into two or more DataStreams according to some characteristic of its elements.
Select SplitStream → DataStream: retrieves one or more DataStreams from a SplitStream by name.
package com.flink.transform.study

import com.flink.streamapi.study.SensorReading
import org.apache.flink.streaming.api.scala._

object TransformTest02 {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val inputPath = "/home/ied/myFlinkStudy/resources/sensor.txt"

    // Parse each line into a SensorReading (fields are assumed to be separated by ", ")
    val dataStream = env.readTextFile(inputPath)
      .map(x => {
        val arr = x.split(", ")
        SensorReading(arr(0), arr(1).toLong, arr(2).toDouble)
      })

    // Split the readings into a low-temperature stream and a high-temperature stream
    val splitStream = dataStream
      .split(x => {
        if (x.temperature > 30) Seq("high") else Seq("low")
      })

    // Select one or more of the named outputs
    val highTemp = splitStream.select("high")
    val lowTemp = splitStream.select("low")
    val allTemp = splitStream.select("high", "low")

    highTemp.print("high")
    lowTemp.print("low")
    allTemp.print("all")

    env.execute("Split Test")
  }
}
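The tagging behaviour of split and select can be modelled without Flink. The sketch below is only an illustration on plain Scala collections, not Flink's implementation: `SplitSelectSketch`, its local `SensorReading` case class, and the sample readings are all assumptions made for the example. It may also be worth knowing that split and select were deprecated in later Flink releases and, as far as I know, removed in Flink 1.12 in favor of side outputs, so the program above assumes an older version.

```scala
// Plain-Scala model of split/select semantics; it does not use Flink.
object SplitSelectSketch {
  // Assumed shape of the SensorReading class that the examples import.
  final case class SensorReading(id: String, timestamp: Long, temperature: Double)

  // "split": attach the output names chosen by the selector to every element.
  def split[T](data: Seq[T])(selector: T => Seq[String]): Seq[(T, Seq[String])] =
    data.map(x => (x, selector(x)))

  // "select": keep the elements that carry at least one of the requested names.
  def select[T](tagged: Seq[(T, Seq[String])], names: String*): Seq[T] =
    tagged.collect { case (x, tags) if tags.exists(t => names.contains(t)) => x }

  def main(args: Array[String]): Unit = {
    // Made-up sample data, for illustration only.
    val readings = Seq(
      SensorReading("sensor_1", 1L, 35.8),
      SensorReading("sensor_2", 2L, 15.4),
      SensorReading("sensor_3", 3L, 30.0) // 30.0 is not > 30, so it is tagged "low"
    )
    val tagged = split(readings)(r => if (r.temperature > 30) Seq("high") else Seq("low"))

    println(select(tagged, "high").map(_.id))
    println(select(tagged, "low").map(_.id))
    println(select(tagged, "high", "low").map(_.id))
  }
}
```

Selecting several names at once returns every element tagged with any of them, which is why `select("high", "low")` yields the whole input here.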
2.2.3.2 Connect and CoMap (the merged streams do not need the same data type)
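Connect DataStream, DataStream → ConnectedStreams: connects two data streams while preserving the type of each. After connect, the two streams are merely placed inside one stream; internally each keeps its own data and form unchanged, and the two remain independent of each other.
CoMap, CoFlatMap ConnectedStreams → DataStream: applied to a ConnectedStreams. They behave like map and flatMap, applying a separate map or flatMap function to each of the two streams inside the ConnectedStreams.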
package com.flink.transform.study

import com.flink.streamapi.study.SensorReading
import org.apache.flink.streaming.api.scala._

object TransformTest03 {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val inputPath = "/home/ied/myFlinkStudy/resources/sensor.txt"
    val streamData = env.readTextFile(inputPath)

    // Parse each line into a SensorReading (fields are assumed to be separated by ", ")
    val res = streamData
      .map(x => {
        val arr = x.split(", ")
        SensorReading(arr(0), arr(1).toLong, arr(2).toDouble)
      })

    val splitStream = res
      .split(x => {
        if (x.temperature > 30) Seq("high") else Seq("low")
      })
    val highTemp = splitStream.select("high")
    val lowTemp = splitStream.select("low")

    // Merge the streams with connect(); the two sides have different element types
    val warningStream = highTemp.map(x => (x.id, x.temperature))
    val connectStream = warningStream.connect(lowTemp)

    // Use coMap to process the data of each stream separately
    val coMapResult = connectStream
      .map(
        warningData => (warningData._1, warningData._2, "warning"),
        lowData => (lowData.id, "healthy")
      )

    coMapResult.print("coMap")

    env.execute("coMap Test")
  }
}
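One way to picture connect and CoMap is as a tagged union of the two element types. The sketch below models that idea with `Either` on plain Scala collections. It is not how Flink implements ConnectedStreams, and `ConnectCoMapSketch`, `connect`, `coMap`, and the sample values are hypothetical names and data introduced only for this illustration. A real job interleaves the two inputs in arrival order, whereas this model simply concatenates them.

```scala
// Plain-Scala model of connect/coMap semantics; it does not use Flink.
object ConnectCoMapSketch {
  // "connect": keep both element types side by side.
  // Left holds elements of the first stream, Right those of the second.
  def connect[A, B](first: Seq[A], second: Seq[B]): Seq[Either[A, B]] =
    first.map(a => Left(a): Either[A, B]) ++ second.map(b => Right(b): Either[A, B])

  // "coMap": one function per input type, both producing the same result type R.
  def coMap[A, B, R](connected: Seq[Either[A, B]])(f1: A => R, f2: B => R): Seq[R] =
    connected.map(_.fold(f1, f2))

  def main(args: Array[String]): Unit = {
    // Made-up sample data, for illustration only.
    val warnings: Seq[(String, Double)] = Seq(("sensor_1", 35.8))
    val lowIds: Seq[String] = Seq("sensor_2")

    val merged = coMap(connect(warnings, lowIds))(
      w => s"${w._1} warning at ${w._2}",
      id => s"$id healthy"
    )
    merged.foreach(println)
  }
}
```

The two inputs may have unrelated types, but both functions must return one common result type, which mirrors the two-function `map` call in the program above.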
2.2.3.3 Union (the merged streams must have the same data type)
Union DataStream → DataStream: performs a union over two or more DataStreams, producing a new DataStream that contains all elements of every input DataStream.
package com.flink.transform.study

import com.flink.streamapi.study.SensorReading
import org.apache.flink.streaming.api.scala._
import scala.collection.Seq

object TransformTest04 {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val inputPath = "/home/ied/myFlinkStudy/resources/sensor.txt"
    val dataStream = env.readTextFile(inputPath)

    // Parse each line into a SensorReading (fields are assumed to be separated by ", ")
    val res = dataStream
      .map(x => {
        val arr = x.split(", ")
        SensorReading(arr(0), arr(1).toLong, arr(2).toDouble)
      })

    val splitStream = res
      .split(x => {
        if (x.temperature > 30) Seq("high") else Seq("low")
      })
    val highStream = splitStream.select("high")
    val lowStream = splitStream.select("low")
    val allStream = splitStream.select("high", "low")

    // Union several streams of the same element type into one stream
    val unionStream = highStream.union(lowStream, allStream)

    unionStream.print("union")

    env.execute("Union Test")
  }
}
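Union does not remove duplicates, so in the program above every reading should reach the output twice: once through the high or low stream and once through the stream that selects both. The sketch below models this with plain Scala collections. `UnionSketch`, its `union` helper, and the sample temperatures are assumptions for illustration only, and a real job merges elements in arrival order rather than by concatenation.

```scala
// Plain-Scala model of union semantics; it does not use Flink.
object UnionSketch {
  // "union": merge streams of one element type; nothing is deduplicated.
  def union[T](first: Seq[T], others: Seq[T]*): Seq[T] =
    others.foldLeft(first)(_ ++ _)

  def main(args: Array[String]): Unit = {
    // Made-up sample temperatures, for illustration only.
    val high = Seq(35.8, 32.1)
    val low = Seq(15.4)
    val all = high ++ low // stands in for select("high", "low")

    val merged = union(high, low, all)
    // 2 + 1 + 3 elements: each temperature appears twice in the result.
    println(merged.size)
    println(merged)
  }
}
```

Unlike connect, all inputs here share one element type `T`, and any number of them can be merged in a single call.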