An Analysis of Flume Transactions

This post takes a close look at transaction handling in Flume, analyzing the put transaction (driven here by the taildir source, which ensures every event is sent at least once) and the take transaction (driven by a sink such as HDFSEventSink), and highlights KafkaChannel's advantages in data reliability and efficiency. The transaction flow of begin, commit and rollback guarantees that data is not lost inside Flume.

Transactions are an important concept in Flume: they are what guarantees the availability of the data. These transactions are different from database transactions; for example, rolling a transaction back may cause duplicate data, so what Flume guarantees is that every event is sent at least once, and in that way no data is lost.
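For reference, the canonical channel transaction pattern looks like the minimal sketch below (roughly the pattern from the Flume developer guide; the channel wiring and the event body are placeholders). Every put or take happens between begin and commit, and any failure triggers rollback, which is why an event can be re-delivered but is never silently dropped.

import org.apache.flume.Channel;
import org.apache.flume.ChannelException;
import org.apache.flume.Event;
import org.apache.flume.Transaction;
import org.apache.flume.event.EventBuilder;

public class TransactionPatternSketch {
  // Minimal sketch of the begin / commit / rollback / close pattern
  // that Flume sources and sinks use against a channel.
  public static void putWithTransaction(Channel channel, byte[] body) {
    Transaction txn = channel.getTransaction();
    txn.begin();
    try {
      Event event = EventBuilder.withBody(body);
      channel.put(event);    // buffered by the channel's Transaction.doPut
      txn.commit();          // only now does the event count as stored in the channel
    } catch (Throwable t) {
      txn.rollback();        // everything put in this transaction is discarded and
                             // re-sent later, which is why delivery is at-least-once
      throw new ChannelException("Put failed, transaction rolled back", t);
    } finally {
      txn.close();
    }
  }
}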

This post analyzes transactions along a concrete data flow: a taildir source feeding a KafkaChannel, and the same KafkaChannel feeding an HDFS sink.

KafkaChannel maintains two kinds of transactions: the put transaction and the take transaction.

Personal site: http://bigdatadecode.club/flume事务解析.html

The put transaction

KafkaChannel's put transaction is driven by the taildir source; let's trace the put flow through the code.

The entry point of taildir is TaildirSource.process, whose code is as follows:

public Status process() {
  Status status = Status.READY;
  try {
    existingInodes.clear();
    existingInodes.addAll(reader.updateTailFiles());
    for (long inode : existingInodes) {
      TailFile tf = reader.getTailFiles().get(inode);
      // Decide whether this file needs to be tailed. The rule: is the file's
      // modification time later than the last recorded tail time, or is the
      // recorded position greater than the file's length?
      if (tf.needTail()) {
        tailFileProcess(tf, true);
      }
    }
    closeTailFiles();
    ...
  } catch (Throwable t) {
    ...
  }
  return status;
}

A file needs to be tailed when its modification time is later than the last recorded tail time, or when the recorded position is greater than the file's length (in which case tailing restarts from offset 0).
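That rule, restated as a rough sketch (hypothetical parameter names; the real check lives in TailFile.needTail and is not copied verbatim here):

class NeedTailRuleSketch {
  // Rough sketch of the "does this file need tailing?" rule described above.
  static boolean needTail(long fileModifiedTime, long fileLength,
                          long lastTailTime, long recordedPosition) {
    // The file has been written to since we last tailed it ...
    boolean modifiedSinceLastTail = fileModifiedTime > lastTailTime;
    // ... or the recorded position is past the file's end (e.g. the file was
    // truncated or replaced), in which case tailing restarts from offset 0.
    boolean positionPastEnd = recordedPosition > fileLength;
    return modifiedSinceLastTail || positionPastEnd;
  }
}

The actual per-file tailing happens in tailFileProcess: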

private void tailFileProcess(TailFile tf, boolean backoffWithoutNL)
    throws IOException, InterruptedException {
  while (true) {
    reader.setCurrentFile(tf);
    // Read up to batchSize events from the file
    List<Event> events = reader.readEvents(batchSize, backoffWithoutNL);
    if (events.isEmpty()) {
      break;
    }
    sourceCounter.addToEventReceivedCount(events.size());
    sourceCounter.incrementAppendBatchReceivedCount();
    try {
      // This is where the transaction happens
      getChannelProcessor().processEventBatch(events);
      reader.commit();
    } catch (ChannelException ex) {
      logger.warn("The channel is full or unexpected failure. " +
          "The source will try again after " + retryInterval + " ms");
      TimeUnit.MILLISECONDS.sleep(retryInterval);
      retryInterval = retryInterval << 1;
      retryInterval = Math.min(retryInterval, maxRetryInterval);
      continue;
    }
    retryInterval = 1000;
    sourceCounter.addToEventAcceptedCount(events.size());
    sourceCounter.incrementAppendBatchAcceptedCount();
    // Only leaves the current file once reading catches up with the writer?
    // Do other files ever get a chance to be tailed?
    // This looks like a bug
    if (events.size() < batchSize) {
      break;
    }
  }
}

The bug above: when a fileGroup contains several files that are being written to, and one of them is written so heavily that every read returns a full batch of batchSize events, the other files never get a chance to be read. I have submitted this bug to the community as FLUME-3101.
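Without claiming this is the exact patch, one possible mitigation is to bound how many batches a single file may contribute per process() call, sketched below inside TaildirSource (maxBatchCount is an assumed, illustrative parameter; the counters and retry handling from the original method are omitted):

// Hypothetical mitigation sketch, reusing TaildirSource's existing fields
// (reader, batchSize, getChannelProcessor()): a hot file can contribute at most
// maxBatchCount batches per process() call, so other files are not starved.
private void tailFileProcessBounded(TailFile tf, boolean backoffWithoutNL, int maxBatchCount)
    throws IOException, InterruptedException {
  int batchCount = 0;
  while (batchCount++ < maxBatchCount) {
    reader.setCurrentFile(tf);
    List<Event> events = reader.readEvents(batchSize, backoffWithoutNL);
    if (events.isEmpty()) {
      break;
    }
    getChannelProcessor().processEventBatch(events);
    reader.commit();
    if (events.size() < batchSize) {
      break;   // caught up with the writer, move on to the next file
    }
    // otherwise loop again, but never more than maxBatchCount times
  }
}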

Now let's see how the transaction is actually implemented:
getChannelProcessor().processEventBatch(events) -> ChannelProcessor.processEventBatch. Here is processEventBatch:

public void processEventBatch(List<Event> events) {
  ...
  // Group the events by the channels they should go to
  for (Event event : events) {
    ...
  }
  // Process required channels
  for (Channel reqChannel : reqChannelQueue.keySet()) {
    // Get the transaction for this channel
    Transaction tx = reqChannel.getTransaction();
    Preconditions.checkNotNull(tx, "Transaction object must not be null");
    try {
      // Begin the transaction
      tx.begin();
      // Do the work: here the events are first buffered in memory,
      // and commit later writes them to Kafka in one batch
      List<Event> batch = reqChannelQueue.get(reqChannel);

      for (Event event : batch) {
        reqChannel.put(event);
      }
      // Commit the transaction, which also ends it
      tx.commit();
    } catch (Throwable t) {
      // An exception occurred, so roll the transaction back
      tx.rollback();
      if (t instanceof Error) {
        LOG.error("Error while writing to required channel: " + reqChannel, t);
        throw (Error) t;
      } else if (t instanceof ChannelException) {
        throw (ChannelException) t;
      } else {
        throw new ChannelException("Unable to put batch on required " +
            "channel: " + reqChannel, t);
      }
    } finally {
      if (tx != null) {
        tx.close();
      }
    }
  }
  // Process optional channels
  for (Channel optChannel : optChannelQueue.keySet()) {
    ...
  }
}

First the Transaction is obtained from the channel bound to this source, then begin is called to start the transaction. After the data has been processed, commit commits the transaction; if an error occurs while processing, it is caught and rollback rolls the transaction back.

Let's look at the data handling first: reqChannel.put(event) adds the event to the channel's in-memory buffer. Although this looks like a channel method, the channel's put is just a thin wrapper around the transaction's put, and the real work is done in the channel's own Transaction.doPut:
reqChannel.put(event) -> BasicChannelSemantics.put -> BasicTransactionSemantics.put -> BasicTransactionSemantics.doPut
doPut is an abstract method whose concrete implementation lives in each channel's Transaction.
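A simplified sketch of that delegation chain (condensed; the real BasicChannelSemantics and BasicTransactionSemantics carry extra state and ownership checks):

import org.apache.flume.ChannelException;
import org.apache.flume.Event;

// Simplified sketch of the put delegation chain, not the verbatim Flume source.
abstract class BasicTransactionSemanticsSketch {
  void put(Event event) {
    // (state checks omitted: the transaction must be OPEN and owned by this thread)
    try {
      doPut(event);                        // concrete channels implement doPut
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new ChannelException(e.toString(), e);
    }
  }

  protected abstract void doPut(Event event) throws InterruptedException;
}

abstract class BasicChannelSemanticsSketch {
  // Each thread works against its own transaction instance.
  private final ThreadLocal<BasicTransactionSemanticsSketch> currentTransaction =
      new ThreadLocal<>();

  public void put(Event event) {
    BasicTransactionSemanticsSketch txn = currentTransaction.get();
    if (txn == null) {
      throw new IllegalStateException("No transaction exists for this thread");
    }
    txn.put(event);                        // Channel.put just forwards to the transaction
  }
}

The channel used here is KafkaChannel, and its doPut implementation is: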

protected void doPut(Event event) throws InterruptedException {
  // Transaction type: PUT or TAKE
  type = TransactionType.PUT;
  ...
  Integer partitionId = null;
  try {
    if (staticPartitionId != null) {
      partitionId = staticPartitionId;
    }
    //Allow a specified header to override a static ID
    if (partitionHeader != null) {
      String headerVal = event.getHeaders().get(partitionHeader);
      if (headerVal != null) {
        partitionId = Integer.parseInt(headerVal);
      }
    }
    // Wrap the event in a ProducerRecord and add it to producerRecords;
    // it will be written to Kafka when the transaction is committed
    if (partitionId != null) {
      producerRecords.get().add(
          new ProducerRecord<String, byte[]>(topic.get(), partitionId, key,
                                             serializeValue(event, parseAsFlumeEvent)));
    } else {
      producerRecords.get().add(
          new ProducerRecord<String, byte[]>(topic.get(), key,
                                             serializeValue(event, parseAsFlumeEvent)));
    }
  } catch (NumberFormatException e) {
    throw new ChannelException("Non integer partition id specified", e);
  } catch (Exception e) {
    throw new ChannelException("Error while serializing event", e);
  }
}

doPut first sets the transaction type and then buffers the event in memory. If no error occurs along the way, commit is called to push the buffered events to Kafka.

Now let's look at commit. Its call chain is similar to put's, and the concrete implementation is in KafkaChannel's KafkaTransaction.doCommit:

protected void doCommit() throws InterruptedException {
  if (type.equals(TransactionType.NONE)) {
    return;
  }
  // Check which transaction type is being committed;
  // here we look at the PUT commit first
  if (type.equals(TransactionType.PUT)) {
    if (!kafkaFutures.isPresent()) {
      kafkaFutures = Optional.of(new LinkedList<Future<RecordMetadata>>());
    }
    try {
      long batchSize = producerRecords.get().size();
      long startTime = System.nanoTime();
      int index = 0;
      for (ProducerRecord<String, byte[]> record : producerRecords.get()) {
        index++;
        // A single producer instance is shared across threads
        // (recommended upstream, though you can adapt this to your own setup)
        // The producer is thread safe and sharing a single producer instance 
        // across threads will generally be faster than having multiple instances.
        kafkaFutures.get().add(producer.send(record, new ChannelCallback(index, startTime)));
      }
      //prevents linger.ms from being a problem
      // force out whatever is buffered in the producer's RecordAccumulator
      producer.flush();
      // wait until every record has actually been sent to Kafka
      for (Future<RecordMetadata> future : kafkaFutures.get()) {
        future.get();
      }
      long endTime = System.nanoTime();
      counter.addToKafkaEventSendTimer((endTime - startTime) / (1000 * 1000));
      counter.addToEventPutSuccessCount(batchSize);
      producerRecords.get().clear();
      kafkaFutures.get().clear();
    } catch (Exception ex) {
      logger.warn("Sending events to Kafka failed", ex);
      throw new ChannelException("Commit failed as send to Kafka failed",
              ex);
    }
  } else {
    ...
  }
}

Both PUT and TAKE transactions are committed in doCommit, using the plain Kafka Java API. Note that all threads share a single producer instance, and that sending to Kafka is effectively synchronous, because future.get() waits for every send to finish.
producer.flush() is also called: linger.ms may hold records back so they can be batched together, and flush forces the queued records out to Kafka.
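Standalone, the same send pattern looks roughly like the sketch below (placeholder bootstrap servers and topic; this is not the KafkaChannel code itself):

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SyncBatchSendSketch {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");          // placeholder
    props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
        "org.apache.kafka.common.serialization.ByteArraySerializer");
    props.put("linger.ms", "5");                                // batching delay

    // One producer instance, shared across threads in the real channel.
    try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
      List<Future<RecordMetadata>> futures = new ArrayList<>();
      for (int i = 0; i < 100; i++) {
        futures.add(producer.send(
            new ProducerRecord<>("flume-channel-topic", "key-" + i,
                ("event-" + i).getBytes())));    // async send, returns a Future
      }
      producer.flush();                          // don't wait out linger.ms
      for (Future<RecordMetadata> f : futures) {
        f.get();                                 // block until every record is acked
      }
    }
  }
}

Compared with calling future.get() after every single send, sending the whole batch first and only then waiting keeps the producer pipeline full while still giving a synchronous guarantee at commit time.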

Whether the error happens in doPut or in doCommit, the transaction is rolled back. Rollback is implemented in doRollback:

protected void doRollback() throws InterruptedException {
  if (type.equals(TransactionType.NONE)) {
    return;
  }
  if (type.equals(TransactionType.PUT)) {
    // On a PUT failure, simply clear the buffered data;
    // note that the number of rollbacks is not counted here
    producerRecords.get().clear();
    kafkaFutures.get().clear();
  } else {
    ...
  }
}

From the analysis above, KafkaChannel writes events into memory with doPut and then sends them to Kafka with doCommit; the transaction only ends once the events have actually been written to Kafka.
MemoryChannel, by contrast, writes events into memory (putList) with doPut and then moves them from putList into its queue with doCommit; as soon as the queue write succeeds, the transaction ends. So when the destination is Kafka, KafkaChannel is more efficient than MemoryChannel, and more importantly it keeps the data transactional all the way into Kafka.
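For comparison, here is a much-simplified sketch of MemoryChannel's put side (illustrative only; the real MemoryTransaction also handles capacity limits, byte counting, locks and the take side):

import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;
import org.apache.flume.Event;

// Much-simplified sketch of MemoryChannel's put-side transaction,
// not the verbatim Flume source (capacity checks, locks and counters omitted).
class MemoryTransactionSketch {
  private final BlockingDeque<Event> putList = new LinkedBlockingDeque<>();
  private final BlockingDeque<Event> queue;   // the channel's shared in-memory queue

  MemoryTransactionSketch(BlockingDeque<Event> channelQueue) {
    this.queue = channelQueue;
  }

  void doPut(Event event) {
    putList.offer(event);                     // buffer in the transaction's putList
  }

  void doCommit() {
    // Move the buffered events into the channel queue; the transaction ends here,
    // i.e. before any sink has pushed the data to its final destination.
    Event e;
    while ((e = putList.poll()) != null) {
      queue.offer(e);
    }
  }

  void doRollback() {
    putList.clear();                          // drop the uncommitted events
  }
}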

Now let's look at the take transaction.

The take transaction

The take transaction in KafkaChannel is triggered by the sink, in this case HDFSEventSink. Let's walk through the take side, starting from HDFSEventSink.process:

// Not thread-safe
public Status process() throws EventDeliveryException {
  // Get the channel bound to this sink
  Channel channel = getChannel();
  // Get a Transaction from the channel
  Transaction transaction = channel.getTransaction();
  List<BucketWriter> writers = Lists.newArrayList();
  // Begin the transaction
  transaction.begin();
  try {
    int txnEventCount = 0;
    for (txnEventCount = 0; txnEventCount < batchSize; txnEventCount++) {
      // Take one event from the channel
      Event event = channel.take();
      if (event == null) {
        break;
      }
      ...
      synchronized (sfWritersLock) {
        bucketWriter = sfWriters.get(lookupPath);
        // we haven't seen this file yet, so open it and cache the handle
        // No handle for this file yet, so create and cache one
        if (bucketWriter == null) {
          hdfsWriter = writerFactory.getWriter(fileType);
          bucketWriter = initializeBucketWriter(realPath, realName,
            lookupPath, hdfsWriter, closeCallback);
          sfWriters.put(lookupPath, bucketWriter);
        }
      }
      // track the buckets getting written in this transaction
      // Events taken in one transaction may come from different topics/partitions,
      // so several file handles may be open at the same time
      if (!writers.contains(bucketWriter)) {
        writers.add(bucketWriter);
      }

      // Write the data to HDFS
      try {
        bucketWriter.append(event);
      } catch (BucketClosedException ex) {
        ...
      }
    }
    ...
    // flush all pending buckets before committing the transaction
    for (BucketWriter bucketWriter : writers) {
      bucketWriter.flush();
    }
    // Commit the transaction
    transaction.commit();
    ...
  } catch (IOException eIO) {
    // Roll back the transaction on exception
    transaction.rollback();
    LOG.warn("HDFS IO error", eIO);
    return Status.BACKOFF;
  } catch (Throwable th) {
    transaction.rollback();
    LOG.error("process failed", th);
    if (th instanceof Error) {
      throw (Error) th;
    } else {
      throw new EventDeliveryException(th);
    }
  } finally {
    transaction.close();
  }
}

The sink's process first obtains a Transaction from its channel and calls begin to start the transaction, then starts processing data.
For each event it calls channel.take to take one event from the channel; take ultimately ends up in KafkaTransaction.doTake:

protected Event doTake() throws InterruptedException {
  // Transaction type
  type = TransactionType.TAKE;
  try {
    // channelUUID is final, so does one KafkaChannel instance have only one consumer?
    if (!(consumerAndRecords.get().uuid.equals(channelUUID))) {
      logger.info("UUID mismatch, creating new consumer");
      decommissionConsumerAndRecords(consumerAndRecords.get());
      consumerAndRecords.remove();
    }
  } catch (Exception ex) {
    logger.warn("Error while shutting down consumer", ex);
  }
  if (!events.isPresent()) {
    events = Optional.of(new LinkedList<Event>());
  }
  Event e;
  // Give the channel a chance to commit if there has been a rebalance
  if (rebalanceFlag.get()) {
    logger.debug("Returning null event after Consumer rebalance.");
    return null;
  }
  if (!consumerAndRecords.get().failedEvents.isEmpty()) {
    e = consumerAndRecords.get().failedEvents.removeFirst();
  } else {
    if ( logger.isTraceEnabled() ) {
      logger.trace("Assignment during take: {}",
          consumerAndRecords.get().consumer.assignment().toString());
    }
    try {
      long startTime = System.nanoTime();
      if (!consumerAndRecords.get().recordIterator.hasNext()) {
        consumerAndRecords.get().poll();
      }
      if (consumerAndRecords.get().recordIterator.hasNext()) {
        ConsumerRecord<String, byte[]> record = consumerAndRecords.get().recordIterator.next();
        e = deserializeValue(record.value(), parseAsFlumeEvent);
        TopicPartition tp = new TopicPartition(record.topic(), record.partition());
        OffsetAndMetadata oam = new OffsetAndMetadata(record.offset() + 1, batchUUID);
        consumerAndRecords.get().saveOffsets(tp,oam);

        //Add the key to the header
        if (record.key() != null) {
          e.getHeaders().put(KEY_HEADER, record.key());
        }

        long endTime = System.nanoTime();
        counter.addToKafkaEventGetTimer((endTime - startTime) / (1000 * 1000));

        if (logger.isDebugEnabled()) {
          logger.debug("{} processed output from partition {} offset {}",
              new Object[] {getName(), record.partition(), record.offset()});
        }
      } else {
        return null;
      }
    } catch (Exception ex) {
      logger.warn("Error while getting events from Kafka. This is usually caused by " +
                  "trying to read a non-flume event. Ensure the setting for " +
                  "parseAsFlumeEvent is correct", ex);
      throw new ChannelException("Error while getting events from Kafka", ex);
    }
  }
  eventTaken = true;
  events.get().add(e);
  return e;
}

doTake is essentially consuming Kafka with a consumer. Ideally a consumer would handle one partition per topic, but here the consumer is tied to channelUUID, and channelUUID is final, so does that mean a KafkaChannel instance has only one consumer?
The consumption logic is: the consumer pulls records into local memory with poll, the sink then takes them one at a time, the tracked offset advancing by one for each event taken, and once the in-memory records are exhausted poll is called again.
After taking an event, the sink routes it to the appropriate bucketWriter based on the event's metadata; once batchSize events have been taken, all the bucketWriters are flushed, and after a successful flush the transaction is committed.
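Outside of Flume, the same take-then-commit pattern looks roughly like the sketch below (placeholder brokers, topic and group id; KafkaChannel additionally keeps a failedEvents list so rolled-back events can be re-taken):

import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ManualOffsetCommitSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");      // placeholder
    props.put("group.id", "flume-channel-group");           // placeholder
    props.put("enable.auto.commit", "false");                // offsets committed by hand
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");

    try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(Collections.singletonList("flume-channel-topic"));
      Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();

      // poll(Duration) on newer clients; older clients use poll(long)
      ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(500));
      for (ConsumerRecord<String, byte[]> record : records) {
        // ... hand the record to the "sink" (write to HDFS, etc.) ...
        // Remember the *next* offset to read for this partition.
        offsets.put(new TopicPartition(record.topic(), record.partition()),
            new OffsetAndMetadata(record.offset() + 1));
      }
      // Only after the batch is safely written does the "transaction" commit the offsets.
      consumer.commitSync(offsets);
    }
  }
}

Committing record.offset() + 1 mirrors what doTake stores in OffsetAndMetadata: the offset of the next record to read, not the offset of the record just processed.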

commit ends up in doCommit; here is the code:

protected void doCommit() throws InterruptedException {
  logger.trace("Starting commit");
  if (type.equals(TransactionType.NONE)) {
    return;
  }
  if (type.equals(TransactionType.PUT)) {
    ...
  } else {
    // eventTaken ensures that events have actually been collected in this
    // transaction before the offsets are committed
    if (consumerAndRecords.get().failedEvents.isEmpty() && eventTaken) {
      logger.trace("About to commit batch");
      long startTime = System.nanoTime();
      // Commit the offsets to Kafka
      consumerAndRecords.get().commitOffsets();
      long endTime = System.nanoTime();
      counter.addToKafkaCommitTimer((endTime - startTime) / (1000 * 1000));
      if (logger.isDebugEnabled()) {
        logger.debug(consumerAndRecords.get().getCommittedOffsetsString());
      }
    }

    int takes = events.get().size();
    if (takes > 0) {
      counter.addToEventTakeSuccessCount(takes);
      events.get().clear();
    }
  }
}

The offsets are committed manually here, via the Kafka consumer API consumer.commitSync(offsets).
If an exception occurs during commit or flush, the transaction is rolled back; the code is as follows:

protected void doRollback() throws InterruptedException {
  if (type.equals(TransactionType.NONE)) {
    return;
  }
  if (type.equals(TransactionType.PUT)) {
    ...
  } else {
    // Count the rolled-back events
    counter.addToRollbackCounter(events.get().size());
    // Put the in-memory events into failedEvents so they can be taken again
    consumerAndRecords.get().failedEvents.addAll(events.get());
    events.get().clear();
  }
}

Summary

Flume's transactions guarantee that data is not lost; they are an important concept in Flume.

Open questions

Are HDFSEventSink and the KafkaChannel consumer both single-threaded?
One KafkaChannel instance has one consumer; the sink takes data from that consumer and spreads it across different bucketWriters. Is it fair to say the consumer side is single-threaded while the data processing is multi-threaded?
