Flink Study Notes 3: Introduction to the Flink Framework API

  1. reduce
    reduce is an aggregation operation: it turns a KeyedStream back into a DataStream, in essence performing a rolling accumulation per key. Example:
import org.apache.flink.api.common.functions.{RichFlatMapFunction, RichReduceFunction}
import org.apache.flink.api.java.tuple.{Tuple1, Tuple2}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
import org.apache.flink.util.Collector
object ReduceOperator {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.fromElements(Tuple1.of("flink hadoop taskmanager")
      ,Tuple1.of("spark hadoop")
      ,Tuple1.of("hadoop hdfs")).flatMap(new RichFlatMapFunction[Tuple1[String],Tuple2[String,Long]] {
        override def flatMap(value: Tuple1[String], out: Collector[Tuple2[String,Long]]): Unit = {
          for (s: String <- value.f0.split(" ")) {
            out.collect(Tuple2.of(s, 1L))
          }
        }
      }).keyBy(0)
      .reduce(new RichReduceFunction[Tuple2[String, Long]] {
        override def reduce(value1: Tuple2[String, Long], value2: Tuple2[String, Long]): Tuple2[String, Long] = {
          return Tuple2.of(value1.f0,value1.f1+value2.f1)
        }
      }).print()
    env.execute("flink reduce operator")}}
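A rolling reduce emits an updated aggregate for every arriving element, so the job above prints intermediate counts such as (hadoop,1) and later (hadoop,2), not only final totals. A plain-Scala sketch of that per-key accumulation (a model of the semantics only, not Flink API):

```scala
// Plain-Scala model of keyed rolling-reduce semantics (no Flink involved):
// every incoming word emits its updated running count, mirroring what the
// streaming job above prints per element.
object ReduceSemantics {
  def main(args: Array[String]): Unit = {
    val words = Seq("flink", "hadoop", "taskmanager", "spark", "hadoop", "hadoop", "hdfs")
    // One (word, runningCount) output per input element, like the keyed reduce.
    val emitted = words.zipWithIndex.map { case (w, i) =>
      (w, words.take(i + 1).count(_ == w).toLong)
    }
    emitted.foreach(println)
  }
}
```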
  2. union
    union merges multiple streams into one so that the merged stream can be processed together. It is a horizontal concatenation of streams; all input streams must have the same element type. Example:
import org.apache.flink.api.java.tuple.Tuple1
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
object UnionOperator {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val df1 = env.fromElements(Tuple1.of("flink")
      ,Tuple1.of("spark")
      ,Tuple1.of("hadoop"))
    val df2 = env.fromElements(Tuple1.of("oracle")
      ,Tuple1.of("mysql")
      ,Tuple1.of("sqlserver"))
    // merge multiple streams into one for unified processing; all streams must have the same element type
    df1.union(df2).print()
    env.execute("flink union operator")}}
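One point worth noting: unlike SQL UNION, Flink's union keeps duplicates, and the interleaving of elements across inputs is not guaranteed. As a plain-Scala model (not Flink API), the merged stream is simply the pooled elements of all inputs:

```scala
// Plain-Scala model of union semantics: same-typed inputs are pooled into
// one stream; nothing is deduplicated, and cross-input ordering is arbitrary
// (concatenation is used here only for illustration).
object UnionSemantics {
  def main(args: Array[String]): Unit = {
    val s1 = Seq("flink", "spark", "hadoop")
    val s2 = Seq("oracle", "mysql", "sqlserver")
    val merged = s1 ++ s2
    // union never drops elements, even exact duplicates
    assert(merged.size == s1.size + s2.size)
    merged.foreach(println)
  }
}
```

The real operator also accepts more than two inputs at once, e.g. df1.union(df2, df3).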
  3. join
    join correlates two streams on a specified key. Example:
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.triggers.CountTrigger
import org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.util.Collector
object JoinOperator {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val df1 = env.fromElements(
      Tuple2.apply("flink",1L)
      ,Tuple2.apply("spark",2L)
      ,Tuple2.apply("hadoop",3L))
    val df2 = env.fromElements(Tuple2.apply("flink",1L)
      ,Tuple2.apply("mysql",1L)
      ,Tuple2.apply("spark",1L))
    df1.join(df2)
      .where(_._1)
      .equalTo(_._1)
      .window(ProcessingTimeSessionWindows.withGap(Time.seconds(10)))
      .trigger(CountTrigger.of(1))  // fire a computation for every arriving element
      .apply((t1,t2,out:Collector[Tuple2[String,Long]]) =>{
        out.collect(Tuple2.apply(t1._1,t1._2+t2._2))
      })
      .print()
      env.execute("flink join operator")}}
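Within each window this is an inner join: only keys that appear on both sides produce output, so "hadoop" and "mysql" above yield nothing. A plain-Scala model of the pairing (semantics only, not Flink API):

```scala
// Plain-Scala model of windowed inner-join semantics: elements of the two
// inputs that fall into the same window are paired by key; unmatched keys
// produce no output.
object JoinSemantics {
  def main(args: Array[String]): Unit = {
    val left  = Seq(("flink", 1L), ("spark", 2L), ("hadoop", 3L))
    val right = Seq(("flink", 1L), ("mysql", 1L), ("spark", 1L))
    val joined = for {
      (k1, v1) <- left
      (k2, v2) <- right
      if k1 == k2
    } yield (k1, v1 + v2)
    // "hadoop" and "mysql" appear on only one side, so they are dropped
    joined.foreach(println)
  }
}
```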