Flink Heartbeat Service Mechanism

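A heartbeat is a mechanism for detecting whether a client or server is still alive by periodically sending requests to the peer. Two kinds of heartbeat detection are common:

  1. The heartbeat built into sockets through SO_KEEPALIVE: keep-alive probes are sent to the peer periodically, and the peer replies automatically when it receives one;
  2. A heartbeat implemented by the application itself, which likewise sends requests periodically.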
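
As a small illustration of the first approach, the sketch below enables SO_KEEPALIVE on a plain Java socket. It is not taken from Flink; the class name KeepAliveSketch is made up for this example. The probe timing is controlled by the operating system rather than by the application, which is one reason frameworks often implement their own heartbeat on top.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

// Hypothetical example class, not part of Flink.
public class KeepAliveSketch {

    /** Creates an unconnected socket, enables SO_KEEPALIVE and reads the option back. */
    public static boolean enableKeepAlive() {
        try (Socket socket = new Socket()) {
            socket.setKeepAlive(true);    // ask the OS to probe idle connections
            return socket.getKeepAlive(); // read the option back from the socket
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SO_KEEPALIVE enabled: " + enableKeepAlive());
    }
}
```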

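        Flink monitors the state of its service components through a unified heartbeat service. Like approach 2 above, the caller periodically sends heartbeat requests and the receiver answers each request with a heartbeat response. Internally the two sides call each other over RPC, and each call resets the timeout task that the callee keeps for the caller. Among Flink's service components, ResourceManager, JobMaster and TaskExecutor monitor one another with heartbeats: the ResourceManager actively sends heartbeat requests to check whether the JobMaster and the TaskExecutor are alive, and the JobMaster actively sends heartbeat requests to check whether the TaskExecutor is alive, so that tasks can be restarted or failures handled. The core of the heartbeat communication in Flink is as follows: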

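  1. Heartbeat timeout: once the heartbeat service is running, Flink schedules a task that handles the heartbeat timeout event; the task runs only when the configured heartbeat timeout elapses. Whenever a heartbeat message from the component arrives, the task is cancelled and scheduled again, which resets the timeout. The heartbeat service relies on HeartbeatListener: if no heartbeat arrives within the timeout, the timeout task fires and calls HeartbeatListener#notifyHeartbeatTimeout to handle the timeout (typically by trying to reconnect).
  2. Heartbeat request: heartbeat checking is bidirectional. One side actively issues heartbeat requests and the other side responds; both go through RPC, and each resets the other's timeout task. Take the JobManager (JM) and TaskManager (TM) as an example. When the JM starts, it begins a periodic schedule that sends heartbeat checks to the TMs registered with it by calling the TM's requestHeartbeat over RPC; on the TM this resets the timeout task for the JM, marking the JM as healthy. After receiving the requestHeartbeat call from the JM, the TM calls the JM's receiveHeartbeat over RPC; on the JM this resets the timeout task for that TM, marking the TM as healthy.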
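
The cancel-and-reschedule idea in point 1 can be sketched with the JDK's ScheduledExecutorService. This is a simplified illustration, not Flink's code: the class TimeoutResetSketch, its method names and the 500 ms timeout are assumptions made for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical example class, not part of Flink.
public class TimeoutResetSketch {
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    private final CountDownLatch timedOut = new CountDownLatch(1);
    private final long timeoutMs;
    private ScheduledFuture<?> pendingTimeout;

    public TimeoutResetSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
        reportHeartbeat(); // arm the first timeout
    }

    /** Called for every received heartbeat: cancel the pending timeout and arm a new one. */
    public synchronized void reportHeartbeat() {
        if (pendingTimeout != null) {
            pendingTimeout.cancel(false);
        }
        pendingTimeout = executor.schedule(timedOut::countDown, timeoutMs, TimeUnit.MILLISECONDS);
    }

    public boolean hasTimedOut() {
        return timedOut.getCount() == 0;
    }

    /** Returns {timed out while heartbeats arrived, timed out after they stopped}. */
    public static boolean[] runDemo() {
        TimeoutResetSketch monitor = new TimeoutResetSketch(500);
        try {
            for (int i = 0; i < 5; i++) {
                Thread.sleep(50);          // heartbeats arrive well within the timeout
                monitor.reportHeartbeat();
            }
            boolean during = monitor.hasTimedOut();
            // No more heartbeats: wait (at most 3 s) for the timeout task to fire.
            boolean after = monitor.timedOut.await(3, TimeUnit.SECONDS);
            return new boolean[] {during, after};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            monitor.executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean[] result = runDemo();
        System.out.println("timed out while heartbeats arrive: " + result[0]);
        System.out.println("timed out after heartbeats stop: " + result[1]);
    }
}
```

Each heartbeat pushes the deadline forward by a full timeout, so the timeout task only runs once heartbeats stop arriving.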

The main interfaces and classes used by the Flink heartbeat service are described below:

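The HeartbeatTarget interface:

        The core heartbeat target interface: it is used both to send heartbeat requests and to receive heartbeat responses. Heartbeat senders and receivers are both implementations of this interface, and both can carry payload information.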

public interface HeartbeatTarget<I> {
   /**
    * Sends a heartbeat response to the target. Each heartbeat response can carry a payload which
    * contains additional information for the heartbeat target.
    *
    * @param heartbeatOrigin Resource ID identifying the machine for which a heartbeat shall be reported.
    * @param heartbeatPayload Payload of the heartbeat. Null indicates an empty payload.
    */
   void receiveHeartbeat(ResourceID heartbeatOrigin, I heartbeatPayload);  // receives the heartbeat response sent by the monitored target

   /**
    * Requests a heartbeat from the target. Each heartbeat request can carry a payload which
    * contains additional information for the heartbeat target.
    *
    * @param requestOrigin Resource ID identifying the machine issuing the heartbeat request.
    * @param heartbeatPayload Payload of the heartbeat request. Null indicates an empty payload.
    */
   void requestHeartbeat(ResourceID requestOrigin, I heartbeatPayload);   // sends a heartbeat request to the monitored target
}

HeartbeatManager:

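        The heartbeat manager starts and stops the monitoring of HeartbeatTargets and reports heartbeat timeouts for them. Targets are passed in and monitored through monitorTarget; this method can be seen as the input of the whole service, telling the heartbeat service which targets to manage.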

public interface HeartbeatManager<I, O> extends HeartbeatTarget<I> {
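    // start monitoring a heartbeat target; when it times out, the HeartbeatListener
    // associated with this HeartbeatManager is notified
    void monitorTarget(ResourceID resourceID, HeartbeatTarget<O> heartbeatTarget);
    // stop monitoring a heartbeat target; the ResourceID identifies the target
    void unmonitorTarget(ResourceID resourceID);
    // stop this heartbeat manager
    void stop();
    // returns the time of the most recent heartbeat, or -1 if the target has been removed
    long getLastHeartbeatFrom(ResourceID resourceId);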
}
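
To make the contract of these methods concrete, here is a minimal in-memory sketch of the bookkeeping behind monitorTarget, unmonitorTarget and getLastHeartbeatFrom. It is an illustration only: TargetRegistrySketch is a made-up class, IDs are plain Strings instead of ResourceID, and timestamps are passed in explicitly so the behaviour is deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example class, not part of Flink.
public class TargetRegistrySketch {
    // target ID -> time of the last heartbeat (0 until the first heartbeat arrives)
    private final Map<String, Long> lastHeartbeats = new ConcurrentHashMap<>();

    public void monitorTarget(String targetId) {
        lastHeartbeats.putIfAbsent(targetId, 0L); // registering twice keeps the existing entry
    }

    public void unmonitorTarget(String targetId) {
        lastHeartbeats.remove(targetId);
    }

    public void reportHeartbeat(String targetId, long now) {
        // heartbeats from unknown targets are ignored
        lastHeartbeats.computeIfPresent(targetId, (id, previous) -> now);
    }

    /** Returns the last heartbeat time, or -1 if the target is not monitored. */
    public long getLastHeartbeatFrom(String targetId) {
        return lastHeartbeats.getOrDefault(targetId, -1L);
    }

    public static void main(String[] args) {
        TargetRegistrySketch registry = new TargetRegistrySketch();
        registry.monitorTarget("tm-1");
        registry.reportHeartbeat("tm-1", 1234L);
        System.out.println(registry.getLastHeartbeatFrom("tm-1"));
        registry.unmonitorTarget("tm-1");
        System.out.println(registry.getLastHeartbeatFrom("tm-1"));
    }
}
```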

HeartbeatListener:

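    An interface closely tied to HeartbeatManager; it can be seen as the output of the service. Its main roles are:

  • notification of heartbeat timeouts
  • receiving the payload carried by heartbeat messages
  • retrieving the payload to send out with a heartbeat response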
public interface HeartbeatListener<I, O> {
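    // called when the heartbeat of the given target times out
    void notifyHeartbeatTimeout(ResourceID resourceID);
    // called whenever a heartbeat arrives with a payload
    void reportPayload(ResourceID resourceID, I payload);
    // retrieves the payload for the next heartbeat message; returned as a future,
    // matching the CompletableFuture<O> usage in the manager code shown in this article
    CompletableFuture<O> retrievePayload(ResourceID resourceID);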
}

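Creating the heartbeat service:

        When the cluster starts, a number of services are initialized, and the heartbeat service is created in ClusterEntrypoint#initializeServices. It reads the heartbeat interval (heartbeat.interval) and the heartbeat timeout (heartbeat.timeout) from the configuration and creates a HeartbeatServices instance: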

heartbeatServices = createHeartbeatServices(configuration);

protected HeartbeatServices createHeartbeatServices(Configuration configuration) {
  return HeartbeatServices.fromConfiguration(configuration);
}

public static HeartbeatServices fromConfiguration(Configuration configuration) {
    // heartbeat interval, default 10 s (10000 ms)
    long heartbeatInterval = configuration.getLong(HeartbeatManagerOptions.HEARTBEAT_INTERVAL);
    // heartbeat timeout, default 50 s (50000 ms)
    long heartbeatTimeout = configuration.getLong(HeartbeatManagerOptions.HEARTBEAT_TIMEOUT);

    return new HeartbeatServices(heartbeatInterval, heartbeatTimeout);
}
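
The lookup-with-defaults step can be sketched without any Flink dependency. The class HeartbeatConfigSketch and its use of a plain Map are assumptions for illustration; only the two keys and their default values come from the text above. The two sanity checks (positive interval, timeout not smaller than the interval) are an assumption of this sketch about what a sensible configuration looks like.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class, not part of Flink.
public class HeartbeatConfigSketch {
    static final long DEFAULT_INTERVAL_MS = 10_000L; // heartbeat.interval default
    static final long DEFAULT_TIMEOUT_MS = 50_000L;  // heartbeat.timeout default

    /** Returns {intervalMs, timeoutMs}, falling back to the defaults for missing keys. */
    public static long[] fromConfiguration(Map<String, String> conf) {
        long interval = Long.parseLong(
            conf.getOrDefault("heartbeat.interval", String.valueOf(DEFAULT_INTERVAL_MS)));
        long timeout = Long.parseLong(
            conf.getOrDefault("heartbeat.timeout", String.valueOf(DEFAULT_TIMEOUT_MS)));

        // Assumed sanity checks: a timeout shorter than the interval would fire
        // before the next request could even be sent.
        if (interval <= 0) {
            throw new IllegalArgumentException("heartbeat.interval must be larger than 0");
        }
        if (timeout < interval) {
            throw new IllegalArgumentException("heartbeat.timeout must not be smaller than heartbeat.interval");
        }
        return new long[] {interval, timeout};
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("heartbeat.timeout", "30000");
        long[] values = fromConfiguration(conf);
        System.out.println("interval=" + values[0] + " ms, timeout=" + values[1] + " ms");
    }
}
```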

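The core methods createHeartbeatManager and createHeartbeatManagerSender:

        The two classes used by these methods, HeartbeatManagerImpl and HeartbeatManagerSenderImpl, are the key to the whole heartbeat service.

        HeartbeatManagerImpl is created by the heartbeat receiver, i.e. the responder (for example the TM), and handles the heartbeat requests sent by the initiator, i.e. the requester (the JM). It has two important fields, heartbeatListener and heartbeatTargets. heartbeatTargets is a Map whose key is the ID of a monitored component (for example a TM, from the JM's point of view) and whose value is the HeartbeatMonitor created for that component, which triggers its heartbeat timeout. Keys and monitors correspond one to one, and a timeout invokes notifyHeartbeatTimeout on the associated heartbeatListener. Note: the receiver is passive. It never sends requests of its own, and after the initial timeout armed in the HeartbeatMonitor constructor (see the code below), its timeout is reset only when the initiator calls requestHeartbeat.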

// the external caller passes in a heartbeatTarget, and a HeartbeatMonitor is created for it
public void monitorTarget(ResourceID resourceID, HeartbeatTarget<O> heartbeatTarget) {
   if (!stopped) {
       if (heartbeatTargets.containsKey(resourceID)) {
           log.debug("The target with resource ID {} is already been monitored.", resourceID);
       } else {
           // the HeartbeatMonitor holds the heartbeatTarget and associates it with the heartbeatListener that handles timeouts
           HeartbeatManagerImpl.HeartbeatMonitor<O> heartbeatMonitor = new HeartbeatManagerImpl.HeartbeatMonitor<>(
               resourceID,
               heartbeatTarget,
               mainThreadExecutor,
               heartbeatListener,
               heartbeatTimeoutIntervalMs);

           heartbeatTargets.put(
               resourceID,
               heartbeatMonitor);

           // check if we have stopped in the meantime (concurrent stop operation)
           if (stopped) {
               heartbeatMonitor.cancel();
               heartbeatTargets.remove(resourceID);
           }
       }
   }
}

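The HeartbeatMonitor manages one heartbeat target: if no heartbeat signal arrives within the timeout, the heartbeat is considered timed out and the HeartbeatListener is notified; every received heartbeat signal resets the timer.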

static class HeartbeatMonitor<O> implements Runnable {
    private final ResourceID resourceID; /** Resource ID of the monitored heartbeat target. */
    private final HeartbeatTarget<O> heartbeatTarget; /** Associated heartbeat target. */
    private final ScheduledExecutor scheduledExecutor;
    private final HeartbeatListener<?, ?> heartbeatListener; /** Listener which is notified about heartbeat timeouts. */
    private final long heartbeatTimeoutIntervalMs; /** Maximum heartbeat timeout interval. */
    private volatile ScheduledFuture<?> futureTimeout;
    private final AtomicReference<State> state = new AtomicReference<>(State.RUNNING);
    private volatile long lastHeartbeat;  // time of the most recently received heartbeat

    HeartbeatMonitor(
        ResourceID resourceID,
        HeartbeatTarget<O> heartbeatTarget,
        ScheduledExecutor scheduledExecutor,
        HeartbeatListener<?, O> heartbeatListener,
        long heartbeatTimeoutIntervalMs) {
        this.resourceID = Preconditions.checkNotNull(resourceID); // ID of the monitored machine
        this.heartbeatTarget = Preconditions.checkNotNull(heartbeatTarget); // the heartbeat target being monitored
        this.scheduledExecutor = Preconditions.checkNotNull(scheduledExecutor);
        this.heartbeatListener = Preconditions.checkNotNull(heartbeatListener); // the heartbeat listener

        Preconditions.checkArgument(heartbeatTimeoutIntervalMs > 0L, "The heartbeat timeout interval has to be larger than 0.");
        this.heartbeatTimeoutIntervalMs = heartbeatTimeoutIntervalMs;
        lastHeartbeat = 0L;
        resetHeartbeatTimeout(heartbeatTimeoutIntervalMs);
    }
    // ...
    // report a heartbeat
    void reportHeartbeat() {
        lastHeartbeat = System.currentTimeMillis();  // remember when the last heartbeat was received
        resetHeartbeatTimeout(heartbeatTimeoutIntervalMs);  // a heartbeat arrived, so reset the timeout task
    }
    // reset the timeout
    void resetHeartbeatTimeout(long heartbeatTimeout) {
        if (state.get() == State.RUNNING) {
            cancelTimeout(); // cancel the pending timeout first, then schedule a new one
            futureTimeout = scheduledExecutor.schedule(this, heartbeatTimeout, TimeUnit.MILLISECONDS); // schedule the timeout task

            // Double check for concurrent accesses (e.g. a firing of the scheduled future)
            if (state.get() != State.RUNNING) {
                cancelTimeout();
            }
        }
    }
    // ...
    // heartbeat timed out: invoke the listener's notifyHeartbeatTimeout
    @Override
    public void run() {
        // The heartbeat has timed out if we're in state running
        if (state.compareAndSet(State.RUNNING, State.TIMEOUT)) {
            heartbeatListener.notifyHeartbeatTimeout(resourceID);
        }
    }
}

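HeartbeatManagerSenderImpl is a subclass of HeartbeatManagerImpl and is created by the heartbeat requester (for example the JM). As soon as it is created it starts a periodic schedule; each round iterates over the heartbeatTargets it manages and calls heartbeatTarget.requestHeartbeat() on each, so requests are triggered actively.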

this.heartbeatPeriod = heartbeatPeriod;
mainThreadExecutor.schedule(this, 0L, TimeUnit.MILLISECONDS);

public void run() {
   if (!stopped) {
      log.debug("Trigger heartbeat request.");
      for (HeartbeatMonitor<O> heartbeatMonitor : getHeartbeatTargets()) {
         CompletableFuture<O> futurePayload = getHeartbeatListener().retrievePayload(heartbeatMonitor.getHeartbeatTargetId()); // build the current payload for this target
         final HeartbeatTarget<O> heartbeatTarget = heartbeatMonitor.getHeartbeatTarget(); // the heartbeat target to contact

         if (futurePayload != null) {
            CompletableFuture<Void> requestHeartbeatFuture = FutureUtils.thenAcceptAsyncIfNotDone(
               futurePayload,
               getMainThreadExecutor(),
               payload -> heartbeatTarget.requestHeartbeat(getOwnResourceID(), payload)); // send the request through the heartbeat target

            requestHeartbeatFuture.exceptionally(
               (Throwable failure) -> {
                  log.warn("Could not request the heartbeat from target {}.", heartbeatTarget, failure);

                  return null;
               });
         } else {
            heartbeatTarget.requestHeartbeat(getOwnResourceID(), null);
         }
      }
      getMainThreadExecutor().schedule(this, heartbeatPeriod, TimeUnit.MILLISECONDS); // schedule the next round
   }
}
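
The run() method above re-schedules itself at the end of each round instead of relying on a fixed-rate timer. A minimal sketch of that pattern with the JDK's ScheduledExecutorService follows; SelfReschedulingSketch and its latch-based round counter are made up for this example and stand in for the real request logic.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical example class, not part of Flink.
public class SelfReschedulingSketch implements Runnable {
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    private final long periodMs;
    private final CountDownLatch rounds;
    private volatile boolean stopped;

    public SelfReschedulingSketch(long periodMs, int roundsToObserve) {
        this.periodMs = periodMs;
        this.rounds = new CountDownLatch(roundsToObserve);
        executor.schedule(this, 0L, TimeUnit.MILLISECONDS); // first round runs immediately
    }

    @Override
    public void run() {
        if (stopped) {
            return;
        }
        rounds.countDown(); // stands in for "request a heartbeat from every target"
        try {
            executor.schedule(this, periodMs, TimeUnit.MILLISECONDS); // schedule the next round
        } catch (RejectedExecutionException e) {
            // the executor was shut down concurrently; nothing left to schedule
        }
    }

    public void stop() {
        stopped = true;
        executor.shutdownNow();
    }

    /** Returns true if three rounds were observed within five seconds. */
    public static boolean runDemo() {
        SelfReschedulingSketch sender = new SelfReschedulingSketch(20, 3);
        try {
            return sender.rounds.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            sender.stop();
        }
    }

    public static void main(String[] args) {
        System.out.println("three rounds observed: " + runDemo());
    }
}
```

With this pattern the next round is only scheduled after the current one has finished, and stopping the sender simply prevents further rounds from being scheduled.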

 

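Heartbeat service example:

1. Heartbeat initiator (requester): HeartbeatManagerSenderImpl in the JM

  1. After a TM registration is accepted, its heartbeatTarget is added to the set of heartbeat targets. The JM, as the initiating side, uses its HeartbeatManagerSenderImpl to iterate periodically over the heartbeatTargets it manages and call heartbeatTarget.requestHeartbeat() on each; here that is the JM's heartbeatTarget.requestHeartbeat() for the TM.
  2. requestHeartbeat calls taskExecutor#heartbeatFromJobManager over RPC, sending the heartbeat request from the JM to the TM. When the TM receives it, it calls requestHeartbeat on its HeartbeatManagerImpl, which starts or resets the timeout task and marks the JM as healthy. That method in turn calls the JM's receiveHeartbeat over RPC.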
public CompletableFuture<RegistrationResponse> registerTaskManager(
      final String taskManagerRpcAddress,
      final TaskManagerLocation taskManagerLocation,
      final Time timeout) {
   final ResourceID taskManagerId = taskManagerLocation.getResourceID();

   if (registeredTaskManagers.containsKey(taskManagerId)) {
      final RegistrationResponse response = new JMTMRegistrationSuccess(resourceId);
      return CompletableFuture.completedFuture(response);
   } else {
      return getRpcService()
         .connect(taskManagerRpcAddress, TaskExecutorGateway.class)
         .handleAsync(
            (TaskExecutorGateway taskExecutorGateway, Throwable throwable) -> {
               if (throwable != null) {
                  return new RegistrationResponse.Decline(throwable.getMessage());
               }
               
               slotPoolGateway.registerTaskManager(taskManagerId); // register the TaskManager
               registeredTaskManagers.put(taskManagerId, Tuple2.of(taskManagerLocation, taskExecutorGateway));

               // monitor the task manager as heartbeat target
               taskManagerHeartbeatManager.monitorTarget(taskManagerId, new HeartbeatTarget<AllocatedSlotReport>() {
                  @Override
                  public void receiveHeartbeat(ResourceID resourceID, AllocatedSlotReport payload) {
                     // the task manager will not request heartbeat, so this method will never be called currently
                  }

                  @Override
                  public void requestHeartbeat(ResourceID resourceID, AllocatedSlotReport allocatedSlotReport) {
                     taskExecutorGateway.heartbeatFromJobManager(resourceID, allocatedSlotReport);  // the JM sends a heartbeat request to the TM over RPC
                  }
               });

               return new JMTMRegistrationSuccess(resourceId);
            },
            getMainThreadExecutor());
   }
}

After the TM receives the JM's heartbeat request over RPC, the request is eventually handled by HeartbeatManagerImpl#requestHeartbeat, the heartbeat receiver on the TM:

//HeartbeatManagerImpl#requestHeartbeat()
public void requestHeartbeat(final ResourceID requestOrigin, I heartbeatPayload) {
   if (!stopped) {
      log.debug("Received heartbeat request from {}.", requestOrigin);
      final HeartbeatTarget<O> heartbeatTarget = reportHeartbeat(requestOrigin); // reset the timeout task and look up the heartbeatTarget, which here is the JM

      if (heartbeatTarget != null) {
         if (heartbeatPayload != null) {
            heartbeatListener.reportPayload(requestOrigin, heartbeatPayload); // hand the received payload to the listener
         }
         CompletableFuture<O> futurePayload = heartbeatListener.retrievePayload(requestOrigin); // the listener builds the payload for the response

         if (futurePayload != null) {
            CompletableFuture<Void> sendHeartbeatFuture = FutureUtils.thenAcceptAsyncIfNotDone(
               futurePayload,
               mainThreadExecutor,    // the heartbeat target calls receiveHeartbeat on the request originator (JM) over RPC
               retrievedPayload ->    heartbeatTarget.receiveHeartbeat(getOwnResourceID(), retrievedPayload)); // heartbeatTarget is the instance registered through monitorTarget, shown below

            sendHeartbeatFuture.exceptionally((Throwable failure) -> {
                  log.warn("Could not send heartbeat to target with id {}.", requestOrigin, failure);
                  return null;
               });
         } else {
            heartbeatTarget.receiveHeartbeat(ownResourceID, null);
         }
      }
   }
}

// registration of the monitored heartbeat target on the TaskExecutor: TaskExecutor#establishJobManagerConnection
// monitor the job manager as heartbeat target
jobManagerHeartbeatManager.monitorTarget(jobManagerResourceID, new HeartbeatTarget<AccumulatorReport>() {
   @Override
   public void receiveHeartbeat(ResourceID resourceID, AccumulatorReport payload) {
      jobMasterGateway.heartbeatFromTaskManager(resourceID, payload);
   }

   @Override
   public void requestHeartbeat(ResourceID resourceID, AccumulatorReport payload) {
      // request heartbeat will never be called on the task manager side
   }
});

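2. Heartbeat receiver (responder): HeartbeatManagerImpl in the TM

  • After the TM starts it connects to the JM; once the connection succeeds it creates a HeartbeatTarget for the JM and overrides receiveHeartbeat. At this point the corresponding monitor already exists in HeartbeatManagerImpl, with its initial timeout armed in the HeartbeatMonitor constructor; from then on its timeout is reset only when the JM calls requestHeartbeat.
  • Inside receiveHeartbeat, the TM directly calls the JM's heartbeatFromTaskManager over RPC, which ends up in HeartbeatManagerSenderImpl#receiveHeartbeat on the JM side (inherited from HeartbeatManagerImpl). There, reportHeartbeat resets the timeout of the JM's monitor for this TM, indicating that the TM is running normally.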
//TaskExecutor#establishJobManagerConnection
private void establishJobManagerConnection(JobID jobId, final JobMasterGateway jobMasterGateway, JMTMRegistrationSuccess registrationSuccess) {
    // ...
    ResourceID jobManagerResourceID = registrationSuccess.getResourceID();
    // monitor the job manager as heartbeat target
    jobManagerHeartbeatManager.monitorTarget(jobManagerResourceID, new HeartbeatTarget<AccumulatorReport>() {
        // the TM only receives heartbeat requests; after a request from the JM arrives, it calls back jobMasterGateway.heartbeatFromTaskManager() over RPC
        @Override
        public void receiveHeartbeat(ResourceID resourceID, AccumulatorReport payload) {
            jobMasterGateway.heartbeatFromTaskManager(resourceID, payload);
        }

        @Override
        public void requestHeartbeat(ResourceID resourceID, AccumulatorReport payload) {
            // request heartbeat will never be called on the task manager side
        }
    });
    // ...
}
// the heartbeat requester instance on the JobMaster
// creation of taskManagerHeartbeatManager; HeartbeatManagerSenderImpl extends HeartbeatManagerImpl
taskManagerHeartbeatManager = heartbeatServices.createHeartbeatManagerSender(
    resourceId,
    new TaskManagerHeartbeatListener(),
    getMainThreadExecutor(),
    log);
    
// called when a heartbeat response from the TM arrives
public void heartbeatFromTaskManager(final ResourceID resourceID, AccumulatorReport accumulatorReport) {
    taskManagerHeartbeatManager.receiveHeartbeat(resourceID, accumulatorReport);
}

// the JM receives the heartbeat response from the TM (HeartbeatManagerImpl#receiveHeartbeat)
public void receiveHeartbeat(ResourceID heartbeatOrigin, I heartbeatPayload) {
    if (!stopped) {
        log.debug("Received heartbeat from {}.", heartbeatOrigin);
        // record the heartbeat, which resets the timeout for its origin
        reportHeartbeat(heartbeatOrigin);
    
        if (heartbeatPayload != null) {
            heartbeatListener.reportPayload(heartbeatOrigin, heartbeatPayload);
        }
    }
}
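
To tie the two sides together, the following self-contained sketch simulates one request/response round trip between a JM-like sender and a TM-like receiver inside a single process. It is a simplified model, not Flink's implementation: the nested types Target, Manager and Sender are made up, IDs and payloads are plain Strings, direct method calls stand in for RPC, and counting heartbeats stands in for resetting the timeout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class, not part of Flink.
public class HeartbeatRoundTripSketch {

    /** Simplified stand-in for HeartbeatTarget. */
    interface Target {
        void requestHeartbeat(String requestOrigin, String payload);
        void receiveHeartbeat(String heartbeatOrigin, String payload);
    }

    /** Simplified stand-in for the responding manager. */
    static class Manager implements Target {
        final String ownId;
        final List<String> log;
        final Map<String, Target> targets = new HashMap<>();
        final Map<String, Integer> heartbeatsSeen = new HashMap<>();

        Manager(String ownId, List<String> log) {
            this.ownId = ownId;
            this.log = log;
        }

        void monitorTarget(String id, Target target) {
            targets.put(id, target);
        }

        /** Counts the heartbeat (the real code resets the timeout here); null if unknown. */
        Target reportHeartbeat(String origin) {
            Target target = targets.get(origin);
            if (target != null) {
                heartbeatsSeen.merge(origin, 1, Integer::sum);
            }
            return target;
        }

        @Override
        public void requestHeartbeat(String requestOrigin, String payload) {
            log.add(ownId + " got request from " + requestOrigin);
            Target origin = reportHeartbeat(requestOrigin);
            if (origin != null) {
                origin.receiveHeartbeat(ownId, "report-from-" + ownId); // answer the request
            }
        }

        @Override
        public void receiveHeartbeat(String heartbeatOrigin, String payload) {
            log.add(ownId + " got response from " + heartbeatOrigin);
            reportHeartbeat(heartbeatOrigin);
        }
    }

    /** Simplified stand-in for the sending manager: one call triggers one round. */
    static class Sender extends Manager {
        Sender(String ownId, List<String> log) {
            super(ownId, log);
        }

        void triggerRound() {
            for (Target target : targets.values()) {
                target.requestHeartbeat(ownId, "slots-from-" + ownId);
            }
        }
    }

    public static List<String> runDemo() {
        List<String> log = new ArrayList<>();
        Sender jm = new Sender("JM", log);
        Manager tm = new Manager("TM", log);

        // JM side: requesting a heartbeat "calls" the TM (stands in for the RPC to the TM).
        jm.monitorTarget("TM", new Target() {
            @Override public void requestHeartbeat(String origin, String payload) { tm.requestHeartbeat(origin, payload); }
            @Override public void receiveHeartbeat(String origin, String payload) { /* never called on this side */ }
        });
        // TM side: responding "calls" the JM (stands in for the RPC back to the JM).
        tm.monitorTarget("JM", new Target() {
            @Override public void requestHeartbeat(String origin, String payload) { /* never called on this side */ }
            @Override public void receiveHeartbeat(String origin, String payload) { jm.receiveHeartbeat(origin, payload); }
        });

        jm.triggerRound();
        return log;
    }

    public static void main(String[] args) {
        runDemo().forEach(System.out::println);
    }
}
```

Running main prints "TM got request from JM" followed by "JM got response from TM", mirroring the order of calls described in the two sections above.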

 
