druid.io index_realtime task hand off: once again explained with source code + logs

Preface

The previous post, Druid.io index_realtime real-time task source code analysis, walked through the overall flow of an index_realtime task, but some details were still not analyzed thoroughly at the source level. This post focuses on the hand off phase: what the hand off phase is, and what work actually gets done in it.

You can start with the official documentation: https://druid.apache.org/docs/0.20.0/design/architecture.html#indexing-and-handoff

Simply put, hand off means: after a real-time task finishes, its segment files are loaded onto historicals via the coordinator. Once the real-time task ends, the new segments are served by the historicals and become historical segments.

Walking through the flow with source code + logs

Now straight to the point. The whole task-completion part of the flow is shown in the figure below, which of course includes the so-called hand off phase.

[Figure: the task-completion flow, including the hand off phase]
Using the figure above together with the source code and the logs, let's analyze the concrete code logic and flow.

First of all, a tranquility index_realtime task receives and processes data through EventReceiverFirehoseFactory. The time during which data is accepted is: interval + window period. This is specific to the tranquility style of ingestion (events are pushed to the task's EventReceiverFirehose over HTTP); once the window has passed, data is no longer processed even if more arrives. In the source code this is implemented by a scheduled thread that closes the firehose, and the task log states explicitly at which point in time the shutoff will happen.

For example, the log of task index_realtime_mv2_2020-11-13T08:00:00.000Z_0_0:

2020-11-13T09:00:00.000Z + the 10-minute window period + tranquility's 5-minute delayed shutoff = 2020-11-13T09:15:00.000Z (for the exact source, see io.druid.segment.realtime.firehose.TimedShutoffFirehoseFactory.TimedShutoffFirehose).
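As a quick sanity check on that arithmetic, here is a minimal sketch (not Druid source) that derives the same shutoff timestamp with Joda-Time, which Druid itself uses for time handling; the variable names are illustrative assumptions.

import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.joda.time.Period;

public class ShutoffTimeSketch
{
  public static void main(String[] args)
  {
    // end of the task's hourly interval
    DateTime intervalEnd = DateTime.parse("2020-11-13T09:00:00.000Z");
    // the 10-minute window period configured for the datasource
    Period windowPeriod = Period.parse("PT10M");
    // tranquility's extra 5-minute delay before closing the firehose
    Period closeDelay = Period.parse("PT5M");

    DateTime shutoffTime = intervalEnd.plus(windowPeriod).plus(closeDelay).withZone(DateTimeZone.UTC);
    System.out.println(shutoffTime); // 2020-11-13T09:15:00.000Z
  }
}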
[Figure: task log showing the scheduled firehose shutoff time]
And it indeed "shuts off" right on time. The source is as follows:

TimedShutoffFirehose(InputRowParser parser, File temporaryDirectory) throws IOException
{
  firehose = delegateFactory.connect(parser, temporaryDirectory);

  exec = Execs.scheduledSingleThreaded("timed-shutoff-firehose-%d");

  exec.schedule(
      new Runnable()
      {
        @Override
        public void run()
        {
          log.info("Closing delegate firehose.");

          shutdown = true;
          try {
            firehose.close();
          }
          catch (IOException e) {
            log.warn(e, "Failed to close delegate firehose, ignoring.");
          }
        }
      },
      shutoffTime.getMillis() - System.currentTimeMillis(),
      TimeUnit.MILLISECONDS
  );

  log.info("Firehose created, will shut down at: %s", shutoffTime);
}

Back to the main task logic (see the previous post): io.druid.indexing.common.task.RealtimeIndexTask#run

run(){
....

try {
      toolbox.getDataSegmentServerAnnouncer().announce();
      toolbox.getDruidNodeAnnouncer().announce(discoveryDruidNode);


      plumber.startJob();

      // Set up metrics emission
      toolbox.getMonitorScheduler().addMonitor(metricsMonitor);

      // Firehose temporary directory is automatically removed when this RealtimeIndexTask completes.
      FileUtils.forceMkdir(firehoseTempDir);

      // Delay firehose connection to avoid claiming input resources while the plumber is starting up.
      final FirehoseFactory firehoseFactory = spec.getIOConfig().getFirehoseFactory();
      final boolean firehoseDrainableByClosing = isFirehoseDrainableByClosing(firehoseFactory);

      // Skip connecting firehose if we've been stopped before we got started.
      synchronized (this) {
        if (!gracefullyStopped) {
          firehose = firehoseFactory.connect(spec.getDataSchema().getParser(), firehoseTempDir);
          committerSupplier = Committers.supplierFromFirehose(firehose);
        }
      }

      // Time to read data!
      while (firehose != null && (!gracefullyStopped || firehoseDrainableByClosing) && firehose.hasMore()) {
        Plumbers.addNextRow(
            committerSupplier,
            firehose,
            plumber,
            tuningConfig.isReportParseExceptions(),
            metrics
        );
      }
    }
    catch (Throwable e) {
      normalExit = false;
      log.makeAlert(e, "Exception aborted realtime processing[%s]", dataSchema.getDataSource())
         .emit();
      throw e;
    }
    finally {
      if (normalExit) {
        try {
          // Persist if we had actually started.
          if (firehose != null) {
            log.info("Persisting remaining data.");

            final Committer committer = committerSupplier.get();
            final CountDownLatch persistLatch = new CountDownLatch(1);
            plumber.persist(
                new Committer()
                {
                  @Override
                  public Object getMetadata()
                  {
                    return committer.getMetadata();
                  }

                  @Override
                  public void run()
                  {
                    try {
                      committer.run();
                    }
                    finally {
                      persistLatch.countDown();
                    }
                  }
                }
            );
            persistLatch.await();
          }

          if (gracefullyStopped) {
            log.info("Gracefully stopping.");
          } else {
            log.info("Finishing the job.");
            synchronized (this) {
              if (gracefullyStopped) {
                // Someone called stopGracefully after we checked the flag. That's okay, just stop now.
                log.info("Gracefully stopping.");
              } else {
                finishingJob = true;
              }
            }

            if (finishingJob) {
              plumber.finishJob();
            }
          }
        }
        catch (InterruptedException e) {
          log.debug(e, "Interrupted while finishing the job");
        }
        catch (Exception e) {
          log.makeAlert(e, "Failed to finish realtime task").emit();
          throw e;
        }
        finally {
          if (firehose != null) {
            CloseQuietly.close(firehose);
          }
          toolbox.getMonitorScheduler().removeMonitor(metricsMonitor);
        }
      }

      toolbox.getDataSegmentServerAnnouncer().unannounce();
      toolbox.getDruidNodeAnnouncer().unannounce(discoveryDruidNode);
    }

    log.info("Job done!");
    return TaskStatus.success(getId());
  }


While the firehose is still working, the RealtimePlumber keeps processing data; see the while loop above. Once the firehose is closed, execution falls through to the finally block. Under normal conditions this is the normal exit path, so you can see the task log: Persisting remaining data.
[Figure: task log showing "Persisting remaining data."]
This is the final-phase handling of whatever data from the firehose is still left in memory.

Persisting the remaining data also uses a CountDownLatch: the code only moves on once the remaining data has been fully persisted, as in the source extracted below.

final CountDownLatch persistLatch = new CountDownLatch(1);
plumber.persist(
    new Committer()
    {
      @Override
      public Object getMetadata()
      {
        return committer.getMetadata();
      }

      @Override
      public void run()
      {
        try {
          committer.run();
        }
        finally {
          persistLatch.countDown();
        }
      }
    }
);
persistLatch.await();
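For readers less familiar with this pattern, here is a tiny standalone JDK sketch (not Druid source) of the same idea: await() blocks the calling thread until countDown() has been invoked, i.e. until the asynchronous persist/commit has actually finished.

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PersistLatchSketch
{
  public static void main(String[] args) throws InterruptedException
  {
    final CountDownLatch persistLatch = new CountDownLatch(1);
    ExecutorService persistExecutor = Executors.newSingleThreadExecutor();

    persistExecutor.submit(() -> {
      try {
        // stand-in for the real persist + committer.run()
        System.out.println("persisting remaining data...");
      }
      finally {
        // signal that the persist/commit has finished
        persistLatch.countDown();
      }
    });

    // block here until the persist thread has counted down
    persistLatch.await();
    System.out.println("safe to continue to finishJob()");
    persistExecutor.shutdown();
  }
}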

For the details of persist, see io.druid.segment.realtime.plumber.RealtimePlumber#persistHydrant and io.druid.segment.IndexMergerV9#persist; they are not expanded here. Note the task log:

[Figure: task log from the persist phase]

Next the code executes io.druid.segment.realtime.plumber.RealtimePlumber#finishJob, shown below. This is where the small segment files get their final merge and the hand off takes place.

@Override
 public void finishJob()
 {
   log.info("Shutting down...");

   shuttingDown = true;

   for (final Map.Entry<Long, Sink> entry : sinks.entrySet()) {
     persistAndMerge(entry.getKey(), entry.getValue());
   }

   final long forceEndWaitTime = System.currentTimeMillis() + config.getHandoffConditionTimeout();
   while (!sinks.isEmpty()) {
     try {
       log.info(
           "Cannot shut down yet! Sinks remaining: %s",
           Joiner.on(", ").join(
               Iterables.transform(
                   sinks.values(),
                   new Function<Sink, String>()
                   {
                     @Override
                     public String apply(Sink input)
                     {
                       return input.getSegment().getIdentifier();
                     }
                   }
               )
           )
       );

       synchronized (handoffCondition) {
         while (!sinks.isEmpty()) {
           if (config.getHandoffConditionTimeout() == 0) {
             handoffCondition.wait();
           } else {
             long curr = System.currentTimeMillis();
             if (forceEndWaitTime - curr > 0) {
               handoffCondition.wait(forceEndWaitTime - curr);
             } else {
               throw new ISE(
                   "Segment handoff wait timeout. [%s] segments might not have completed handoff.",
                   sinks.size()
               );
             }
           }
         }
       }
     }
     catch (InterruptedException e) {
       throw Throwables.propagate(e);
     }
   }

   handoffNotifier.close();
   shutdownExecutors();

   stopped = true;

   if (!cleanShutdown) {
     throw new ISE("Exception occurred during persist and merge.");
   }
 }

[Figure: task log from finishJob]

The important step here is persistAndMerge (the core); see io.druid.segment.realtime.plumber.RealtimePlumber#persistAndMerge. The main source is as follows:

// In the mergeExecutor thread: push the merged segment to deep storage & publish it to the metadata store
log.info("Pushing [%s] to deep storage", sink.getSegment().getIdentifier());

DataSegment segment = dataSegmentPusher.push(
   mergedFile,
   sink.getSegment().withDimensions(IndexMerger.getMergedDimensionsFromQueryableIndexes(indexes)),
   false
);
log.info("Inserting [%s] to the metadata store", sink.getSegment().getIdentifier());
segmentPublisher.publishSegment(segment);

// Meanwhile handoffNotifier registers a callback that fires once the hand off for this segment has completed
handoffNotifier.registerSegmentHandoffCallback(
    new SegmentDescriptor(sink.getInterval(), sink.getVersion(), config.getShardSpec().getPartitionNum()),
    mergeExecutor, new Runnable()
    {
      @Override
      public void run()
      {
        abandonSegment(sink.getInterval().getStartMillis(), sink);
        metrics.incrementHandOffCount();
      }
    }
);

At this point the hand off has begun:

  1. During the metadata-store step, the new segment's metadata is inserted into the DRUID_SEGMENTS table (a quick way to check this is sketched after this list)
  2. The Coordinator runs a scheduled task that pulls the segment metadata out of the metadata store into memory and coordinates assigning the segments across the historicals (see io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner#run)
  3. The Coordinator's assignment follows its own algorithmic logic; if anything goes wrong or gets stuck, or the retention/load rules are misconfigured, and so on, you end up with the classic "Still waiting for Handoff" problem
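When troubleshooting a "Still waiting for Handoff" situation, step 1 above can be verified directly against the metadata store. Below is a hedged JDBC sketch (not part of Druid): it assumes the default segments table name druid_segments, a MySQL metadata store with the MySQL driver on the classpath, and hypothetical connection details; adjust all of these to your environment.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CheckSegmentMetadata
{
  public static void main(String[] args) throws Exception
  {
    // hypothetical connection details; the table name depends on your metadata-store table prefix
    try (Connection conn = DriverManager.getConnection(
        "jdbc:mysql://metadata-host:3306/druid", "druid", "password")) {
      PreparedStatement stmt = conn.prepareStatement(
          "SELECT id, used, created_date FROM druid_segments"
          + " WHERE dataSource = ? AND start = ? ORDER BY created_date DESC");
      stmt.setString(1, "mv2");
      stmt.setString(2, "2020-11-13T08:00:00.000Z");
      try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
          // used = 1 means the Coordinator should pick the segment up and assign it to historicals
          System.out.printf("%s used=%s created=%s%n",
              rs.getString("id"), rs.getBoolean("used"), rs.getString("created_date"));
        }
      }
    }
  }
}

If the row is there with used = 1 but the segment never shows up on a historical, the problem lies on the Coordinator / load-rule side (steps 2 and 3 above).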


As shown in the log below, when the "Still waiting for Handoff" problem occurs you will see that log line repeated like crazy; in the log below it does eventually end with Segment Handoff complete.
[Figure: task log with repeated "Still waiting for Handoff" lines, ending in "Segment Handoff complete"]
At the end of io.druid.indexing.common.task.RealtimeIndexTask#run there are also the following two lines, which do the unannounce. Remember the earlier announce of the peon task and related nodes? Here the corresponding ZooKeeper nodes are cleaned up: the hand off is complete, the segments now belong to the historicals, and the task itself is about to disappear.

toolbox.getDataSegmentServerAnnouncer().unannounce();
toolbox.getDruidNodeAnnouncer().unannounce(discoveryDruidNode);
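If you want to confirm the cleanup, a small Apache Curator sketch like the one below (not Druid source) can list what is still announced in ZooKeeper; it assumes the default zk base path /druid and a hypothetical ZooKeeper address, so adjust both to your cluster's druid.zk.* settings.

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class CheckAnnouncements
{
  public static void main(String[] args) throws Exception
  {
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "zk-host:2181", new ExponentialBackoffRetry(1000, 3));
    client.start();
    try {
      // running druid nodes (including peons) announce themselves under this path
      for (String child : client.getChildren().forPath("/druid/announcements")) {
        System.out.println(child);
      }
    }
    finally {
      client.close();
    }
  }
}

Once the task has unannounced, its node should no longer appear in this list.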

Summary

This post explained hand off, walking through the flow with source code and logs. At the very least you should now understand what this phase is, and when you run into problems in practice you will know where to start troubleshooting.
