Some Notes on Flink

This post walks through Flink usage and pitfalls around window operations, custom sources, serialization, and resource tuning, including the difference between windowAll and countWindow, the difference between ProcessWindowFunction and KeyedProcessFunction, custom CDC serialization, and the details of connecting Flink to Kafka. It also covers deploying and configuring Flink on YARN, with an emphasis on tuning parallelism, slots, and resource allocation.
  • Submitting a job to YARN does not require a standalone Flink cluster; a local Flink distribution is enough: bin/flink run -m yarn-cluster. The Hadoop classpath must be configured beforehand. On a CDH cluster, add the following to .bash_profile: export HADOOP_CLASSPATH=`hadoop classpath`
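
A minimal end-to-end sketch of the submit flow (the entry class and jar name below are hypothetical placeholders):

# on a CDH gateway node, e.g. in .bash_profile
export HADOOP_CLASSPATH=`hadoop classpath`

# submit straight to YARN from a plain Flink distribution
bin/flink run -m yarn-cluster -c com.example.MyJob ./my-job.jar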

  • If you hit the error below, it is because in Flink 1.11.1 the abstract createResourceManager() method of ActiveResourceManagerFactory is not implemented, whereas Flink 1.12.0 does implement it. The application depended on Flink 1.11.1, which produced this error and prevented the job from being submitted to YARN. Fix: upgrade the Flink dependency to match the cluster (the logs show a 1.12.0 distribution running a job jar built against 1.11.1).

org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Shutting YarnJobClusterEntrypoint down with application status FAILED. Diagnostics java.lang.AbstractMethodError: org.apache.flink.runtime.resourcemanager.ResourceManagerFactory.createResourceManager(Lorg/apache/flink/configuration/Configuration;Lorg/apache/flink/runtime/clusterframework/types/ResourceID;Lorg/apache/flink/runtime/rpc/RpcService;Lorg/apache/flink/runtime/highavailability/HighAvailabilityServices;Lorg/apache/flink/runtime/heartbeat/HeartbeatServices;Lorg/apache/flink/runtime/rpc/FatalErrorHandler;Lorg/apache/flink/runtime/entrypoint/ClusterInformation;Ljava/lang/String;Lorg/apache/flink/runtime/metrics/groups/ResourceManagerMetricGroup;Lorg/apache/flink/runtime/resourcemanager/ResourceManagerRuntimeServices;)Lorg/apache/flink/runtime/resourcemanager/ResourceManager;
    at org.apache.flink.runtime.resourcemanager.ResourceManagerFactory.createResourceManager(ResourceManagerFactory.java:61)
    at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:167)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:216)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
    at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:517)
    at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:95)
.
2020-12-23 10:20:00,282 INFO  org.apache.flink.runtime.blob.BlobServer                     [] - Stopped BLOB server at 0.0.0.0:44886
2020-12-23 10:20:00,286 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Stopping Akka RPC service.
2020-12-23 10:20:00,299 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Stopping Akka RPC service.
2020-12-23 10:20:00,385 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Shutting down remote daemon.
2020-12-23 10:20:00,389 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remote daemon shut down; proceeding with flushing remote transports.
2020-12-23 10:20:00,407 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Shutting down remote daemon.
2020-12-23 10:20:00,408 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remote daemon shut down; proceeding with flushing remote transports.
2020-12-23 10:20:00,473 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remoting shut down.
2020-12-23 10:20:00,479 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remoting shut down.
2020-12-23 10:20:00,519 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Stopped Akka RPC service.
2020-12-23 10:20:00,519 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Stopped Akka RPC service.
2020-12-23 10:20:00,519 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Could not start cluster entrypoint YarnJobClusterEntrypoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint YarnJobClusterEntrypoint.
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:187) ~[model-compute-1.0.0-jar-with-dependencies.jar:?]
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:517) [model-compute-1.0.0-jar-with-dependencies.jar:?]
    at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:95) [flink-dist_2.11-1.12.0.jar:1.12.0]
Caused by: java.lang.AbstractMethodError: org.apache.flink.runtime.resourcemanager.ResourceManagerFactory.createResourceManager(Lorg/apache/flink/configuration/Configuration;Lorg/apache/flink/runtime/clusterframework/types/ResourceID;Lorg/apache/flink/runtime/rpc/RpcService;Lorg/apache/flink/runtime/highavailability/HighAvailabilityServices;Lorg/apache/flink/runtime/heartbeat/HeartbeatServices;Lorg/apache/flink/runtime/rpc/FatalErrorHandler;Lorg/apache/flink/runtime/entrypoint/ClusterInformation;Ljava/lang/String;Lorg/apache/flink/runtime/metrics/groups/ResourceManagerMetricGroup;Lorg/apache/flink/runtime/resourcemanager/ResourceManagerRuntimeServices;)Lorg/apache/flink/runtime/resourcemanager/ResourceManager;
    at org.apache.flink.runtime.resourcemanager.ResourceManagerFactory.createResourceManager(ResourceManagerFactory.java:61) ~[model-compute-1.0.0-jar-with-dependencies.jar:?]
    at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:167) ~[model-compute-1.0.0-jar-with-dependencies.jar:?]
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:216) ~[model-compute-1.0.0-jar-with-dependencies.jar:?]
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169) ~[model-compute-1.0.0-jar-with-dependencies.jar:?]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_201]
    at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_201]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) ~[model-compute-1.0.0-jar-with-dependencies.jar:?]
    at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) ~[model-compute-1.0.0-jar-with-dependencies.jar:?]
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168) ~[model-compute-1.0.0-jar-with-dependencies.jar:?]
    ... 2 more
  • Flink connecting to Kafka
    The Flink Kafka connector already bundles kafka-clients internally, so do not add a separate kafka-clients dependency; this avoids pulling unnecessary (and potentially conflicting) dependencies into the job.
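
For reference, a minimal sbt sketch, assuming Flink 1.12.0 on Scala 2.11 to match the versions in the logs above:

// build.sbt: the connector already pulls kafka-clients in transitively,
// so no explicit kafka-clients dependency should be added
libraryDependencies += "org.apache.flink" %% "flink-connector-kafka" % "1.12.0"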
  • Spark Streaming processes data in micro-batches (e.g. mapPartition operates on a whole batch at once), while Flink processes records one at a time: map and flatMap work per record, and keyBy + timeWindow gathers records into a window and processes everything inside it, as sketched below.
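
A minimal sketch of this per-record-plus-window style (word counting is assumed here purely for illustration):

// each line is split and transformed record by record,
// then counts are aggregated inside 10-second windows per key
dataStream
  .flatMap(_.split("\\s+"))     // one record in, many records out
  .map(word => (word, 1))       // per-record transformation
  .keyBy(_._1)
  .timeWindow(Time.seconds(10))
  .sum(1)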

windowAll: counting the volume of data over a period of time

dataStream.windowAll(TumblingProcessingTimeWindows.of(Time.seconds(20)))
  .aggregate(new myAggregate())
  .addSink(new MyJdbcSink(name))


class myAggregate() extends AggregateFunction[String, Long, Long] {

  // counts records in the window: start at 0, +1 per element
  override def createAccumulator(): Long = 0

  override def add(in: String, acc: Long): Long = acc + 1

  override def getResult(acc: Long): Long = acc

  override def merge(acc: Long, acc1: Long): Long = acc + acc1
}


class MyJdbcSink(name: String) extends RichSinkFunction[Long] {

  override def invoke(windowRecordsNum: Long, context: SinkFunction.Context): Unit = {

    // fetch the previously stored count (JdbcUtil is the author's own helper)
    val lastCount: String = JdbcUtil.getLastCount(name)

    if (lastCount != null) {
      // a previous value exists: update with the window count and the running total
      JdbcUtil.updateCard(name, windowRecordsNum.toString, (windowRecordsNum + lastCount.toLong).toString)
    } else {
      // first write for this name
      JdbcUtil.insertCard(name, windowRecordsNum.toString, windowRecordsNum.toString)
    }
  }
}
// the windowAll operator always runs with parallelism 1
// other (keyed) windows default to the job parallelism (-p) unless set explicitly
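
By contrast, a keyed window scales out with the job parallelism; a sketch, reusing env, dataStream, and myAggregate from the examples around it:

// keyed windows run at the job parallelism (4 here), unlike windowAll
env.setParallelism(4) // equivalent to passing -p 4 at submit time
dataStream
  .keyBy(x => x)
  .window(TumblingProcessingTimeWindows.of(Time.seconds(20)))
  .aggregate(new myAggregate())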

Using a socket to simulate receiving data from Kafka, or a custom source whose run() method loops forever:

val dataStream: DataStream[String] = env.socketTextStream("localhost", 9999)

import java.util.concurrent.TimeUnit

import org.apache.flink.streaming.api.functions.source.{RichSourceFunction, SourceFunction}

class selfSource(value: String) extends RichSourceFunction[String] {

  // checked by run() so that cancel() can stop the loop cleanly
  @volatile private var running = true

  override def run(ctx: SourceFunction.SourceContext[String]): Unit = {
    while (running) {
      ctx.collect(value)
      TimeUnit.SECONDS.sleep(5)
    }
  }

  override def cancel(): Unit = {
    running = false
  }
}

val stream: DataStream[String] = env.addSource(new selfSource("TestMsg"))

countWindow

Flink windows come in two basic flavors: TimeWindow and CountWindow.
A TimeWindow fires when its time is up; a CountWindow fires when its element count is reached.
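
A plain countWindow for illustration (a sketch): it fires for a key only once 100 elements have arrived, and with fewer it simply keeps waiting:

// fires per key only after 100 elements; no time limit at all
dataStream
  .keyBy(x => x)
  .countWindow(100)
  .reduce((a, b) => a + b)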

As its name implies, countWindow closes the window and computes based on element count. It keeps waiting for elements, and if the count is never reached it waits forever, which is unacceptable for time-sensitive business. The official docs rarely mention this drawback of countWindow. What we want is a window that closes either when the count is reached or when the time runs out.

That is, countWindow + timeout: a count window with a time limit.

There are two ways to implement it:

  • 1. A custom trigger: more complex to implement and harder to follow (shown next)
import org.apache.flink.api.common.functions.ReduceFunction
import org.apache.flink.api.common.state.{ReducingState, ReducingStateDescriptor}
import org.apache.flink.api.common.typeutils.base.LongSerializer
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.windowing.triggers.Trigger.TriggerContext
import org.apache.flink.streaming.api.windowing.triggers.{Trigger, TriggerResult}
import org.apache.flink.streaming.api.windowing.windows.TimeWindow

// timeWindow + trigger == countWindow + timeout
class CountTriggerWithTimeout[W <: TimeWindow](maxCount: Long, timeCharacteristic: TimeCharacteristic) extends Trigger[Object, W] {

  private val countState: ReducingStateDescriptor[java.lang.Long] =
    new ReducingStateDescriptor[java.lang.Long]("count", new Sum(), LongSerializer.INSTANCE)

  override def onElement(element: Object, timestamp: Long, window: W, ctx: TriggerContext): TriggerResult = {
    val count: ReducingState[java.lang.Long] = ctx.getPartitionedState(countState)
    count.add(1L)
    // fire early once the count is reached (or the element is already past the window end)
    if (count.get >= maxCount || timestamp >= window.getEnd) TriggerResult.FIRE_AND_PURGE else TriggerResult.CONTINUE
  }

  override def onProcessingTime(time: Long, window: W, ctx: TriggerContext): TriggerResult = {
    if (timeCharacteristic == TimeCharacteristic.EventTime) TriggerResult.CONTINUE else {
      // the window's cleanup timer fires at window.maxTimestamp() == getEnd - 1,
      // so time < getEnd means the end-of-window timer fired: fire and purge
      if (time >= window.getEnd) TriggerResult.CONTINUE else TriggerResult.FIRE_AND_PURGE
    }
  }

  override def onEventTime(time: Long, window: W, ctx: TriggerContext): TriggerResult = {
    if (timeCharacteristic == TimeCharacteristic.ProcessingTime) TriggerResult.CONTINUE else {
      if (time >= window.getEnd) TriggerResult.CONTINUE else TriggerResult.FIRE_AND_PURGE
    }
  }

  override def clear(window: W, ctx: TriggerContext): Unit = {
    ctx.getPartitionedState(countState).clear()
  }

  class Sum extends ReduceFunction[java.lang.Long] {
    def reduce(value1: java.lang.Long, value2: java.lang.Long): java.lang.Long = value1 + value2
  }
}


// the window type must be TimeWindow (not GlobalWindow), since timeWindow
// below produces TimeWindow and the trigger is parameterized on it
class selfProcess1() extends ProcessWindowFunction[String, String, String, TimeWindow] {

  override def process(key: String, context: Context, elements: Iterable[String], out: Collector[String]): Unit = {
    out.collect(key + "|" + elements.mkString("|"))
  }
}


dataStream
  .keyBy(x => x) // key by the record itself so the key type (String) matches selfProcess1
  .timeWindow(Time.seconds(10))
  .trigger(new CountTriggerWithTimeout[TimeWindow](10, env.getStreamTimeCharacteristic))
  .process(new selfProcess1())
  .print()

  • 2. Using a process function and Flink state: simpler code, easier to understand

import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.util.Collector

case class Status(count: Int, value: String)

class TimeCountWindowProcessFunction(count: Long, windowSize: Long) extends KeyedProcessFunction[String, String, String] {

  lazy val state: ValueState[Status] = getRuntimeContext.getState(new ValueStateDescriptor[Status]("state", classOf[Status]))

  // The original post is cut off from here on; the body below is a minimal
  // completion sketch of the stated idea: emit per key once `count` elements
  // have arrived, or once `windowSize` ms have passed, whichever comes first.
  override def processElement(value: String, ctx: KeyedProcessFunction[String, String, String]#Context, out: Collector[String]): Unit = {
    val current = Option(state.value()).getOrElse {
      // first element for this key: start the timeout timer
      ctx.timerService().registerProcessingTimeTimer(ctx.timerService().currentProcessingTime() + windowSize)
      Status(0, "")
    }
    val updated = Status(current.count + 1, current.value + "|" + value)
    if (updated.count >= count) {
      // count reached before the timeout: emit and reset
      out.collect(ctx.getCurrentKey + updated.value)
      state.clear()
    } else {
      state.update(updated)
    }
  }

  override def onTimer(timestamp: Long, ctx: KeyedProcessFunction[String, String, String]#OnTimerContext, out: Collector[String]): Unit = {
    // timeout reached before the count: emit whatever has accumulated
    val current = state.value()
    if (current != null && current.count > 0) {
      out.collect(ctx.getCurrentKey + current.value)
    }
    state.clear()
  }
}
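
A usage sketch under the same assumptions:

// emit per key every 10 elements, or after 5 seconds, whichever comes first
dataStream
  .keyBy(x => x)
  .process(new TimeCountWindowProcessFunction(10, 5000L))
  .print()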