Flink

Design philosophies of Spark and Flink
Spark: designed around batch processing; stream processing was bolted on later as micro-batches (pseudo real-time processing).
Flink: designed around stream processing from the start.

DataSet API: the core API for Flink batch-processing applications.
DataStream API: the core API for stream processing.
Gelly: a scalable graph-processing and analysis library.

Usage of execute
Batch processing (DataSet API): execute() must be called when sinking data to an external location; print() triggers execution by itself.
Stream processing (DataStream API): execute() must always be called, otherwise the job never runs.
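A minimal sketch of the difference, assuming a local environment (the output path and job names are made up for illustration):

import org.apache.flink.api.scala._
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

object ExecuteDemo {
  def main(args: Array[String]): Unit = {
    // Batch: print() triggers execution on its own ...
    val batchEnv = ExecutionEnvironment.getExecutionEnvironment
    val data: DataSet[String] = batchEnv.fromElements("hadoop", "spark", "hive")
    data.print()
    // ... but a sink such as writeAsText only runs once execute() is called
    data.writeAsText("/tmp/flink-batch-out")   // hypothetical output path
    batchEnv.execute("batch sink job")

    // Streaming: nothing runs until execute() is called
    val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
    streamEnv.fromElements("hadoop", "spark", "hive").print()
    streamEnv.execute("streaming job")
  }
}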

Flink architecture diagram
(figure not included)
Flink cluster architecture
(figure not included)

Flink cluster startup commands
(figure with the startup commands not included)
Start the cluster:
Start the high-availability master (JobManager) nodes: hadoop101, hadoop102
Start the worker (TaskManager) nodes

JobManager
Where the four graphs are created:
The four graphs: StreamGraph, JobGraph, ExecutionGraph, and the physical execution graph

  • On the client: StreamGraph → JobGraph
  • On the JobManager: JobGraph → ExecutionGraph
  • On the TaskManager: ExecutionGraph → physical execution graph

Responsibilities of the JobManager
1: Builds the graph: JobGraph → ExecutionGraph
2: Coordinates the distribution of the job to the TaskManagers
3: Coordinates checkpoints across the TaskManagers
4: Manages the worker (TaskManager) nodes
5: Receives execution status and heartbeat messages from the worker nodes

Parallelism
.setParallelism(N): a dynamic setting; it controls the parallelism actually used by a job or operator.
Slot: a static concept; the number of configured slots determines the cluster's capacity for executing tasks in parallel.
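A minimal sketch of the dynamic side, assuming a local streaming environment (the operator logic is made up for illustration):

import org.apache.flink.streaming.api.scala._

object ParallelismDemo {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(2)                        // default parallelism for the whole job

    env.fromElements("hadoop", "spark", "hive")
      .map(word => (word, 1)).setParallelism(4)  // per-operator parallelism overrides the job default
      .print().setParallelism(1)                 // print with a single subtask

    env.execute("parallelism demo")
  }
}

The static side, by contrast, is fixed in the cluster configuration: taskmanager.numberOfTaskSlots in flink-conf.yaml sets how many slots each TaskManager offers, and the requested parallelism cannot exceed the total number of available slots.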

Operator chains
Conditions for chaining: one-to-one (forward) data exchange between the operators and the same parallelism on both sides, as in the sketch below.
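A minimal sketch, assuming a local streaming environment: the two map operators below satisfy both conditions and are chained into one task by default, while startNewChain()/disableChaining() can be used to break the chain.

import org.apache.flink.streaming.api.scala._

object ChainingDemo {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(2)

    env.fromElements("hadoop", "spark", "hive")
      .map(_.toUpperCase)        // forward exchange + same parallelism -> chained with the next map
      .map(_.length)
      //.startNewChain()         // would start a new chain beginning at this operator
      //.disableChaining()       // would keep this operator out of any chain
      .print()

    env.execute("chaining demo")
  }
}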

union and connect operators
union: the input streams must have the same element type; more than two DataStreams can be merged at once.
connect: the two streams may have different element types; the types are unified afterwards with CoMap or CoFlatMap. Only two streams can be connected.

// union: merges streams of the same element type; more than two streams can be merged at once
// connect: merges two streams of possibly different types; CoMap/CoFlatMap unifies them afterwards
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction
import org.apache.flink.util.Collector

object UnionConnectDemo {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // union: all input streams must have the same element type
    val strStream1: DataStream[String] = env.fromCollection(List("1", "2", "3", "4"))
    val strStream2: DataStream[String] = env.fromCollection(List("mysql hive", "hbase", "hadoop", "hbase"))
    val strStream3: DataStream[String] = env.fromCollection(List("tom jerry", "hauhau dahuang", "xiaoming", "xiaohong"))
    val unioned: DataStream[String] = strStream1.union(strStream2, strStream3)
    unioned.print().setParallelism(1)

    // connect: the two streams may have different types; CoFlatMap turns them into one stream
    val intStream: DataStream[Int] = env.fromCollection(List(1, 2, 3, 4))
    val connected: ConnectedStreams[Int, String] = intStream.connect(strStream2)
    val unified: DataStream[String] = connected.flatMap(new CoFlatMapFunction[Int, String, String] {
      // invoked for elements of the first (Int) stream
      override def flatMap1(in1: Int, collector: Collector[String]): Unit = {
        collector.collect(in1.toString)
      }
      // invoked for elements of the second (String) stream
      override def flatMap2(in2: String, collector: Collector[String]): Unit = {
        for (word <- in2.split(" ")) {
          collector.collect(word)
        }
      }
    })
    unified.print().setParallelism(1)

    env.execute("union-connect demo")
  }
}

Custom UDF functions in Flink (filter functions)

Function classes

package flink.chapter3

import org.apache.flink.api.common.functions.FilterFunction
import org.apache.flink.api.scala._

class FilterFun extends FilterFunction[String]{
  override def filter(value: String): Boolean = {
    //value.startsWith("h")
    value.contains("o")
  }
}

object Enter3{
  def main(args: Array[String]): Unit = {
    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    val data: DataSet[String] = env.fromElements("hadoop","spark","hive")
    val result: DataSet[String] = data.filter(new FilterFun)
    result.print()

  }
}

Second approach (implementing the function as an anonymous class):

package flink.chapter3

import org.apache.flink.api.common.functions.RichFilterFunction
import org.apache.flink.api.scala._

object RichFilter_Demo {
  def main(args: Array[String]): Unit = {
    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    val data: DataSet[String] = env.fromElements("hadoop","spark","hive")
    val result: DataSet[String] = data.filter(new RichFilterFunction[String] {
      override def filter(t: String): Boolean = {
        t.contains("i")
      }
    })
    result.print()
  }
}

Passing parameters to a custom function class (via the constructor)

package flink.chapter3
import org.apache.flink.api.common.functions.FilterFunction
import org.apache.flink.api.scala._
class FilterFun1(word: String) extends FilterFunction[String] {
  override def filter(value: String): Boolean = {
    value.startsWith(word)
  }
}
object Enter4{
  def main(args: Array[String]): Unit = {
    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    val data: DataSet[String] = env.fromElements("hadoop","spark","hive")
    val result: DataSet[String] = data.filter(new FilterFun1("s"))
    result.print()
  }
}

Anonymous functions (Lambda Functions)

package flink.chapter3
import org.apache.flink.api.scala._
object DefineFun {
  def main(args: Array[String]): Unit = {
    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    val data: DataSet[String] = env.fromElements("hadoop","spark","hive")
    val result: DataSet[String] = data.filter(_.startsWith("s"))
    result.print()
  }
}

Rich functions (Rich Functions)

package flink.chapter3

import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration
import org.apache.flink.util.Collector

class DefineMap extends RichFlatMapFunction[String, (Int, String)] {
  var subTask = 0

  override def open(parameters: Configuration): Unit = {
    // get the index of this parallel subtask
    subTask = getRuntimeContext.getIndexOfThisSubtask
  }

  override def flatMap(in: String, collector: Collector[(Int, String)]): Unit = {
    // results are emitted through collector.collect
    collector.collect((subTask, in))
  }

  override def close(): Unit = {
  }
}

object enter5{
  def main(args: Array[String]): Unit = {
    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    val data: DataSet[String] = env.fromElements("hadoop","spark","hive")
    val result: DataSet[(Int, String)] = data.flatMap(new DefineMap)
    result.print()
  }
}