Flink Transformation Operators

1. Basic Transformations

1.1 map

DataStream → DataStream: takes one element and produces exactly one element.

val streamMap = stream.map { x => x * 2 }

1.2 flatMap

DataStream → DataStream: takes one element and produces zero, one, or more elements.

val streamFlatMap = stream.flatMap{
    x => x.split(" ")
}

1.3 filter

DataStream → DataStream: evaluates a boolean predicate for each element and keeps only the elements for which the predicate returns true.

val streamFilter = stream.filter{
    x => x == 1
}
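The three basic transformations have direct analogues on ordinary Scala collections. A minimal sketch, using a plain List in place of a DataStream (the per-element semantics are the same, only the execution model differs):

```scala
object BasicTransformsSketch {
  def main(args: Array[String]): Unit = {
    // A plain Scala List stands in for the DataStream here
    val stream = List("1 2", "3 4")

    // flatMap: one input element may produce several output elements
    val words = stream.flatMap(x => x.split(" "))   // List("1", "2", "3", "4")

    // map: one input element produces exactly one output element
    val doubled = words.map(x => x.toInt * 2)       // List(2, 4, 6, 8)

    // filter: keeps only elements for which the predicate is true
    val small = doubled.filter(x => x <= 4)         // List(2, 4)

    println(small.mkString(","))                    // prints 2,4
  }
}
```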

2. Keying

2.1 keyBy

DataStream → KeyedStream: logically partitions a stream into disjoint partitions, each containing the elements that share the same key; internally this is implemented with hash partitioning.

import org.apache.flink.api.java.functions.KeySelector
import org.apache.flink.streaming.api.scala._

object WordCount {
    def main(args: Array[String]): Unit = {


        val env = StreamExecutionEnvironment.getExecutionEnvironment

        val inputDataStream = env.fromCollection(List(
            "sensor1,193,39.5",
            "sensor1,194,38.5",
            "sensor1,195,40.5",
            "sensor2,196,39.8",
            "sensor1,197,39.1",
            "sensor2,198,34.5",
            "sensor2,199,37.1"
        ))

        // DataStream → KeyedStream: logically partitions the stream into disjoint partitions, each holding elements with the same key; implemented internally via hashing.
        // Note: the partitioning is purely logical; the two type parameters are the input element type and the key type.
        // keyBy() accepts four kinds of arguments

        val resultDataStream = inputDataStream
            .map { data =>
                val dataArray = data.split(",")
                SensorReading2(dataArray(0), dataArray(1).toLong, dataArray(2).toDouble)
            }
        // 1. By element position index
        val resultStream2: DataStream[SensorReading2] = resultDataStream.keyBy(0).sum(1)

        // 2. By a key-extractor function; note that rolling aggregates such as sum can only address the field by 1. position index or 2. field name
        val resultStream3: DataStream[SensorReading2] = resultDataStream.keyBy(data => data.id).sum(1)

        // 3. By case-class field name
        val resultStream4: DataStream[SensorReading2] = resultDataStream.keyBy("id").sum("temperature")

        // 4. By a custom KeySelector
        val resultStream5: DataStream[SensorReading2] = resultDataStream.keyBy(new MyIDSelector()).sum("temperature")

        resultStream2.print()
        resultStream3.print()
        resultStream4.print()
        resultStream5.print()

        env.execute()
    }


}

class MyIDSelector() extends KeySelector[SensorReading2, String] {
    override def getKey(value: SensorReading2): String = value.id
}

case class SensorReading2(id: String, timestamp: Long, temperature: Double)

3. Rolling Aggregation Operators

3.1 sum/min/max/minBy/maxBy

These operators aggregate over each keyed substream of a KeyedStream.

sum()
min()
max()
minBy()
maxBy()

Note:

These rolling aggregation functions can only address the field to aggregate by 1. position index or 2. field name. Also note the difference between min/max and minBy/maxBy: min/max update only the aggregated field (the other fields keep the values of the first element in the group), while minBy/maxBy return the entire element that contains the minimum/maximum value.
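The min-versus-minBy distinction is easy to miss. This plain-Scala sketch simulates the rolling behaviour on a single keyed group, without any Flink dependency, reproducing only the per-element update logic the operators apply:

```scala
object RollingAggSketch {
  // Same shape as the SensorReading2 case class used in the examples above
  case class Reading(id: String, timestamp: Long, temperature: Double)

  def main(args: Array[String]): Unit = {
    val group = List(
      Reading("sensor1", 193, 39.5),
      Reading("sensor1", 194, 38.5),
      Reading("sensor1", 195, 40.5)
    )

    // min("temperature"): only the aggregated field is updated;
    // the other fields keep the values of the first element in the group
    val minResult = group.reduce { (acc, cur) =>
      acc.copy(temperature = acc.temperature.min(cur.temperature))
    }

    // minBy("temperature"): the whole element with the minimal field wins
    val minByResult = group.reduce { (acc, cur) =>
      if (cur.temperature < acc.temperature) cur else acc
    }

    println(minResult)   // Reading(sensor1,193,38.5) — timestamp stays 193
    println(minByResult) // Reading(sensor1,194,38.5) — the full winning element
  }
}
```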

3.2 reduce

import org.apache.flink.api.common.functions.ReduceFunction
import org.apache.flink.api.java.tuple.Tuple
import org.apache.flink.streaming.api.scala._

object WordCount {


    def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment

        val collectionStream = env.fromCollection(List(
            ("sensor1", 193, 39.5),
            ("sensor1", 194, 38.5),
            ("sensor1", 195, 40.5),
            ("sensor2", 196, 39.8),
            ("sensor1", 197, 39.1),
            ("sensor2", 198, 34.5),
            ("sensor2", 199, 37.1)
        ))

        val resultStream: DataStream[SensorReading2] = collectionStream.map(data => SensorReading2(data._1, data._2.toLong, data._3.toDouble))

        val resultStream2: KeyedStream[SensorReading2, Tuple] = resultStream.keyBy(0)

        resultStream2.print("keyBy result:") // e.g. SensorReading2(sensor2,196,39.8)

        // The reduce operator accepts either 1. an anonymous function or 2. an object implementing the ReduceFunction interface

        // 1. Anonymous function
        val resultStream3: DataStream[SensorReading2] = resultStream2.reduce {
            (curData, newData) =>
                // Aggregate per id: keep the maximum timestamp and the minimum temperature within the group
                SensorReading2(curData.id, curData.timestamp.max(newData.timestamp), curData.temperature.min(newData.temperature))
        }

        // 2. An object implementing the ReduceFunction interface
        val resultStream4: DataStream[SensorReading2] = resultStream2.reduce(new MyReduceFunction)

        resultStream3.print("reduce result:")
        resultStream4.print("custom ReduceFunction result:")

        env.execute("xxxx")

    }
}

class MyReduceFunction extends ReduceFunction[SensorReading2]{
    // Combines the previously aggregated value curData with each newly arriving element newData and returns the aggregated result
    override def reduce(curData: SensorReading2, newData: SensorReading2): SensorReading2 = {
        SensorReading2(curData.id,curData.timestamp.max(newData.timestamp) , curData.temperature.min(newData.temperature))
    }
}

case class SensorReading2(id: String, timestamp: Long, temperature: Double)

4. Splitting Streams

4.1 Split and Select

import org.apache.flink.streaming.api.scala._

object SplitAndSelect {
    def main(args: Array[String]): Unit = {

        val env = StreamExecutionEnvironment.getExecutionEnvironment

        val collectionStream = env.fromCollection(List(
            ("sensor1", 193, 39.5),
            ("sensor1", 194, 38.5),
            ("sensor1", 195, 40.5),
            ("sensor2", 196, 39.8),
            ("sensor1", 197, 39.1),
            ("sensor2", 198, 34.5),
            ("sensor2", 199, 37.1)
        ))

        val resultStream = collectionStream.map {
            data => SensorReading3(data._1, data._2.toLong, data._3.toDouble)
        }

        // Split the stream by tagging each incoming element
        val splitStream = resultStream.split {
            data =>
                if (data.temperature > 39) {
                    Seq("high")
                } else {
                    Seq("low")
                }
        }

        // Select the desired streams by tag
        val hightStream: DataStream[SensorReading3] = splitStream.select("high")
        val lowStream: DataStream[SensorReading3] = splitStream.select("low")
        val allStream: DataStream[SensorReading3] = splitStream.select("high", "low")

        hightStream.print("high")
        lowStream.print("low")
        allStream.print("low-high")
        env.execute("xxxx")

    }
}

case class SensorReading3(id: String, timestamp: Long, temperature: Double)

5. Merging Streams

5.1 connect and coMap

5.1.1 connect

DataStream, DataStream → ConnectedStreams: connects two data streams while preserving their types. After the two streams are connected, they are merely placed into the same stream; internally each keeps its own data and form unchanged, and the two streams remain independent of each other.

5.1.2 coMap and coFlatMap

ConnectedStreams → DataStream: operates on ConnectedStreams with the same functionality as map and flatMap, applying a separate map/flatMap transformation to each of the two streams inside the ConnectedStreams.
        val Stream1: DataStream[(Double, String)] = hightStream.map(data => (data.temperature, "warning information"))

        // 1. Connect the two streams; their element types may differ
        val connectedStream: ConnectedStreams[(Double, String), SensorReading3] = Stream1.connect(lowStream)


        // 2. Co-map the two connected streams; the two output types may still differ.
        //    Note: map here must use parentheses rather than curly braces, or compilation fails
        val coStream: DataStream[Any] = connectedStream.map(
            stream1 => stream1._1,
            stream2 => (stream2.temperature, "normal information")
        )

        coStream.print("coStream:")

        env.execute("xxxx")

5.2 union

DataStream → DataStream: unions two or more DataStreams, producing a new DataStream containing all elements of the input streams; all unioned streams must have the same element type.

        // Note: allStream already contains the elements of hightStream and lowStream, so each element appears twice in unionStream
        val unionStream: DataStream[SensorReading3] = hightStream.union(lowStream).union(allStream)

5.3 Differences between connect and union

  1. The streams being unioned must have the same type; with connect the types may differ, and can either be unified later in coMap or kept different.
  2. connect can only operate on two streams, whereas union can operate on more than two.
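The two points above can be illustrated without Flink. A sketch that simulates coMap with Scala's Either (two differently-typed inputs folded into one output), versus union, which simply concatenates same-typed inputs; the lists and values here are hypothetical stand-ins for the streams:

```scala
object ConnectVsUnionSketch {
  def main(args: Array[String]): Unit = {
    // connect analogue: two streams of DIFFERENT types, kept apart via Either,
    // then a "coMap" applies a separate function to each side
    val warnings: List[(Double, String)] = List((40.5, "warning information"))
    val normals: List[Double] = List(38.5, 39.1)
    val connected: List[Either[(Double, String), Double]] =
      warnings.map(w => Left(w)) ++ normals.map(n => Right(n))
    val coMapped: List[String] = connected.map {
      case Left((temp, msg)) => s"$msg: $temp"
      case Right(temp)       => s"normal information: $temp"
    }

    // union analogue: element types must already be IDENTICAL
    val high = List(40.5)
    val low  = List(38.5, 39.1)
    val unioned: List[Double] = high ++ low

    println(coMapped)
    println(unioned)
  }
}
```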