While running Flume, we used a fileChannel for persistence so that events in the channel would not be lost. The setup reads Kafka data through a Kafka source. One day the Kafka volume spiked, the sink came under pressure and could no longer keep up with the source, so the fileChannel's persisted log files kept growing until they filled the disk and Flume errored out and stopped. After some log files were cleaned up, Flume started working again, but certain log files were never deleted, and the log files deleted by hand to get Flume running caused event loss. To find a way to handle this situation, the following questions need to be answered:
- Does Flume have a back-pressure mechanism similar to Flink's?
- How does the fileChannel generate its persistence files?
- How does the fileChannel's transaction mechanism work?
- How does Flume delete old log files?
1. How the persistence files are generated
After a Flume source collects data, it sends events to the channel; the event-handling logic is concentrated in ChannelProcessor's processEventBatch method:
for (Channel ch : reqChannels) {
List<Event> eventQueue = reqChannelQueue.get(ch);
if (eventQueue == null) {
eventQueue = new ArrayList<Event>();
reqChannelQueue.put(ch, eventQueue);
}
eventQueue.add(event);
}
List<Channel> optChannels = selector.getOptionalChannels(event);
for (Channel ch : optChannels) {
List<Event> eventQueue = optChannelQueue.get(ch);
if (eventQueue == null) {
eventQueue = new ArrayList<Event>();
optChannelQueue.put(ch, eventQueue);
}
eventQueue.add(event);
}
It first builds the mapping from each channel to its events, then iterates over the channels and processes each channel's events as a batch.
Flume processes each batch inside a transaction:
tx.begin();
List<Event> batch = reqChannelQueue.get(reqChannel);
for (Event event : batch) {
reqChannel.put(event);
}
tx.commit();
Note that tx.begin() does no actual work here.
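For context, the surrounding code follows the standard Flume transaction idiom; a minimal sketch using the public org.apache.flume API (batch stands in for the per-channel event list built above):
Transaction tx = reqChannel.getTransaction();
try {
  tx.begin();
  for (Event event : batch) {
    reqChannel.put(event);
  }
  tx.commit();
} catch (Throwable t) {
  tx.rollback();  // undo the puts; as we will see later, skipping this leaks inflight state
  throw new ChannelException("Unable to put batch on channel " + reqChannel, t);
} finally {
  tx.close();
}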
Stepping into reqChannel.put(event), the implementation leads to FileChannel's doPut method, which calls Log.put:
// (excerpt from Log.put: `error` is initialized to true earlier in the method,
//  so the finally block rolls the log file if anything below fails)
try {
try {
FlumeEventPointer ptr = logFiles.get(logFileIndex).put(buffer);
error = false;
return ptr;
} catch (LogFileRetryableIOException e) {
if (!open) {
throw e;
}
roll(logFileIndex, buffer);
FlumeEventPointer ptr = logFiles.get(logFileIndex).put(buffer);
error = false;
return ptr;
}
} finally {
if (error && open) {
roll(logFileIndex);
}
}
This first fetches the current LogFile.Writer from logFiles and calls its put method. Note the LogFileRetryableIOException being handled: tracing into LogFile.Writer's put, the work ultimately lands in the write method:
private Pair<Integer, Integer> write(ByteBuffer buffer)
throws IOException {
if (!isOpen()) {
throw new LogFileRetryableIOException("File closed " + file);
}
long length = position();
long expectedLength = length + (long) buffer.limit();
if (expectedLength > maxFileSize) {
throw new LogFileRetryableIOException(expectedLength + " > " +
maxFileSize);
}
int offset = (int) length;
Preconditions.checkState(offset >= 0, String.valueOf(offset));
// OP_RECORD + size + buffer
int recordLength = 1 + (int) Serialization.SIZE_OF_INT + buffer.limit();
usableSpace.decrement(recordLength);
preallocate(recordLength);
ByteBuffer toWrite = ByteBuffer.allocate(recordLength);
toWrite.put(OP_RECORD);
writeDelimitedBuffer(toWrite, buffer);
toWrite.position(0);
int wrote = getFileChannel().write(toWrite);
Preconditions.checkState(wrote == toWrite.limit());
return Pair.of(getLogFileID(), offset);
}
Here you can see the check of expectedLength against maxFileSize, where maxFileSize is the fileChannel's maxFileSize specified in the conf. Each record is laid out as OP_RECORD + a length int + the event body, so, for example, a 100-byte serialized event occupies 1 + 4 + 100 = 105 bytes in the file. If expectedLength exceeds maxFileSize, a LogFileRetryableIOException is thrown; the outer layer catches it and calls roll, which first closes the full LogFile.Writer and then creates a new one to replace it:
private synchronized void roll(int index, ByteBuffer buffer)
throws IOException {
lockShared();
try {
LogFile.Writer oldLogFile = logFiles.get(index);
// check to make sure a roll is actually required due to
// the possibility of multiple writes waiting on lock
if (oldLogFile == null || buffer == null ||
oldLogFile.isRollRequired(buffer)) {
try {
LOGGER.info("Roll start " + logDirs[index]);
int fileID = nextFileID.incrementAndGet();
File file = new File(logDirs[index], PREFIX + fileID);
LogFile.Writer writer = LogFileFactory.getWriter(file, fileID,
maxFileSize, encryptionKey, encryptionKeyAlias,
encryptionCipherProvider, usableSpaceRefreshInterval,
fsyncPerTransaction, fsyncInterval);
idLogFileMap.put(fileID, LogFileFactory.getRandomReader(file,
encryptionKeyProvider, fsyncPerTransaction));
// writer from this point on will get new reference
logFiles.set(index, writer);
// close out old log
if (oldLogFile != null) {
oldLogFile.close();
}
} finally {
LOGGER.info("Roll end");
}
}
} finally {
unlockShared();
}
}
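For reference, the knobs that drive this behavior come from the channel configuration; a minimal sketch (agent and channel names are illustrative; property names are the standard FileChannel ones, values shown are the documented defaults):
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /data/flume/checkpoint
a1.channels.c1.dataDirs = /data/flume/data
# max bytes per data file before a roll (the maxFileSize checked in write above)
a1.channels.c1.maxFileSize = 2146435071
# max events the channel will hold; see the addTail check in the next section
a1.channels.c1.capacity = 1000000
# refuse puts once free disk space drops below this many bytes
a1.channels.c1.minimumRequiredSpace = 524288000
minimumRequiredSpace is the built-in guard against exactly the disk-full scenario described at the top.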
At this point the strategy for generating the fileChannel's persistence files is clear (as the roll code shows, the data files on disk are named PREFIX plus the fileID). Next, let's look at how Flume uses the Log, i.e. the fileChannel's transaction mechanism.
2. The fileChannel transaction mechanism
Flume's fileChannel involves five queue-like structures: putList, inflightPuts, queue, inflightTakes, and takeList. When an event is persisted to the log as described above, it is also recorded in the queue's inflightPuts queue, via queue.addWithoutCommit(ptr, transactionID) in the code below. The key is the current transactionID and the value is the event's storage pointer; the pointer records the ID of the file the event was persisted to, so through it one can find out which log files are currently in use, which matters later when deleting unneeded log files.
try {
FlumeEventPointer ptr = log.put(transactionID, event);
Preconditions.checkState(putList.offer(ptr), "putList offer failed "
+ channelNameDescriptor);
queue.addWithoutCommit(ptr, transactionID);
success = true;
} catch (IOException e) {
throw new ChannelException("Put failed due to IO error "
+ channelNameDescriptor, e);
} finally {
log.unlockShared();
if (!success) {
// release slot obtained in the case
// the put fails for any reason
queueRemaining.release();
}
}
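To keep the five structures straight, here is a rough map of where each one lives (names follow the Flume source; types simplified):
// One instance per transaction (FileChannel.FileBackedTransaction):
class FileBackedTransaction {
  LinkedBlockingDeque<FlumeEventPointer> putList;    // puts not yet committed
  LinkedBlockingDeque<FlumeEventPointer> takeList;   // takes not yet committed
}
// One per channel, checkpointed to disk (FlumeEventQueue):
class FlumeEventQueue {
  EventQueueBackingStore backingStore;               // the "queue" itself: committed, takeable events
  InflightEventWrapper inflightPuts;                 // transactionID -> pointers of uncommitted puts
  InflightEventWrapper inflightTakes;                // transactionID -> pointers of uncommitted takes
}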
After the batch of puts comes the commit:
log.commitPut(transactionID);
channelCounter.addToEventPutSuccessCount(puts);
synchronized (queue) {
while (!putList.isEmpty()) {
if (!queue.addTail(putList.removeFirst())) {
StringBuilder msg = new StringBuilder();
msg.append("Queue add failed, this shouldn't be able to ");
msg.append("happen. A portion of the transaction has been ");
msg.append("added to the queue but the remaining portion ");
msg.append("cannot be added. Those messages will be consumed ");
msg.append("despite this transaction failing. Please report.");
msg.append(channelNameDescriptor);
LOG.error(msg.toString());
Preconditions.checkState(false, msg.toString());
}
}
queue.completeTransaction(transactionID);
}
To summarize, pushing events from a source into the channel is one transaction:
- put: the put operation is first persisted through the log, then putList.offer places the event pointer into putList, and on success queue.addWithoutCommit adds it to the queue's inflightPuts queue;
- commit: the commitPut operation is first persisted through the log, the pointers are moved from putList to the tail of queue, and queue.completeTransaction removes the entries for the current transactionID from inflightPuts.
Also worth noting is queue.addTail:
synchronized boolean addTail(FlumeEventPointer e) {
if (getSize() == backingStore.getCapacity()) {
return false;
}
long value = e.toLong();
Preconditions.checkArgument(value != EMPTY);
backingStore.incrementFileID(e.getFileID());
add(backingStore.getSize(), value);
return true;
}
If the number of events already held in the channel has reached capacity, addTail returns false and the commit errors out. So Flume does have a means of capping the number of events in a channel; it will not buffer without limit. This is effectively Flume's answer to back-pressure: when the channel is full the put transaction fails, the source's transaction rolls back and is retried, and for a Kafka source the offsets are not committed, so consumption pauses instead of events being dropped.
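As an aside, the pointer that addTail stores via e.toLong() packs the event's whole address into a single long slot; a sketch of that encoding, assuming the high bits carry the file ID and the low bits the offset (helper names are hypothetical):
// Mirrors what FlumeEventPointer encodes into one long.
static long toLong(int fileID, int offset) {
  return ((long) fileID << 32) | (offset & 0xFFFFFFFFL);
}
static int fileID(long value) { return (int) (value >>> 32); }
static int offset(long value) { return (int) value; }
This is also why addTail can call backingStore.incrementFileID(e.getFileID()): every queued event carries the ID of the data file that holds it, which feeds the file reference counts used for cleanup in the next section.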
A sink taking events from the channel is also a transaction; take kafkaSink as an example.
In kafkaSink's process method, a transaction is first obtained from the channel and begin is called (again doing nothing), after which channel.take fetches events:
try {
while (true) {
FlumeEventPointer ptr = queue.removeHead(transactionID);
if (ptr == null) {
return null;
} else {
try {
// first add to takeList so that if write to disk
// fails rollback actually does its work
Preconditions.checkState(takeList.offer(ptr),
"takeList offer failed "
+ channelNameDescriptor);
log.take(transactionID, ptr); // write take to disk
Event event = log.get(ptr);
return event;
} catch (IOException e) {
throw new ChannelException("Take failed due to IO error "
+ channelNameDescriptor, e);
} catch (NoopRecordException e) {
LOG.warn("Corrupt record replaced by File Channel Integrity " +
"tool found. Will retrieve next event", e);
takeList.remove(ptr);
} catch (CorruptEventException ex) {
if (fsyncPerTransaction) {
throw new ChannelException(ex);
}
LOG.warn("Corrupt record found. Event will be " +
"skipped, and next event will be read.", ex);
takeList.remove(ptr);
}
}
}
} finally {
log.unlockShared();
}
queue.removeHead(transactionID) moves the event into the inflightTakes queue; takeList.offer then adds it to takeList, and on success the log persists the take operation.
After the batch of takes succeeds comes the transaction commit:
log.lockShared();
try {
log.commitTake(transactionID);
queue.completeTransaction(transactionID);
channelCounter.addToEventTakeSuccessCount(takes);
} catch (IOException e) {
throw new ChannelException("Commit failed due to IO error "
+ channelNameDescriptor, e);
} finally {
log.unlockShared();
}
queueRemaining.release(takes);
fileChannel's doCommit method enters the logic above when takes is greater than 0.
The log first persists the commitTake operation, then queue.completeTransaction removes the transaction's events from the inflightTakes queue.
Next, the rollBack method:
// (excerpt from doRollback; in the full method, taken pointers are first
//  pushed back onto the head of the queue via queue.addHead so they can be re-delivered)
putList.clear();
takeList.clear();
queue.completeTransaction(transactionID);
This clears putList and takeList, then queue.completeTransaction removes the entries for this transactionID from inflightPuts and inflightTakes.
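For completeness, this is the shape a sink's process() is expected to have around these calls; a minimal sketch using the public Sink API (the downstream delivery is elided):
Transaction tx = channel.getTransaction();
tx.begin();
try {
  Event event = channel.take();
  if (event != null) {
    // ... deliver the event to the downstream system ...
  }
  tx.commit();
  return Status.READY;
} catch (Throwable t) {
  tx.rollback();  // returns takes to the queue and clears this transaction's inflight entries
  if (t instanceof Error) {
    throw (Error) t;
  }
  return Status.BACKOFF;
} finally {
  tx.close();
}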
That covers the transaction mechanism; what is still unclear is when the persisted log files get deleted.
3. How expired log files are deleted
The cleanup runs on a scheduled background thread inside the Log. The first step:
logFileRefCountsAll = queue.getFileIDs();
That is, it first checks which events are still held in the queue and collects the files that store those events (queue.getFileIDs() also covers the pointers sitting in inflightPuts and inflightTakes).
for (int index = 0; index < logDirs.length; index++) {
logFileRefCountsActive.add(logFiles.get(index).getLogFileID());
}
Then it adds the files the logFiles writers are currently writing, one per data directory.
Finally, every file whose ID is smaller than those of the files currently being written, and that holds no event still referenced from the queue, is found and deleted. So as long as an event is in the queue, the file persisting it will not be deleted. Normally that is fine: each time a source or sink finishes a transaction, queue.completeTransaction removes that transactionID's entries from inflightPuts and inflightTakes, and the scheduled thread will eventually pick up the expired files. The catch lies in how transactionIDs are generated: a transaction is closed once it finishes, and the next time one is opened a fresh transactionID is generated. So if rollBack is never called when an error occurs, the stale transactionID's entries stay in the inflight queues and the expired files are never deleted:
public Transaction getTransaction() {
if (!initialized) {
synchronized (this) {
if (!initialized) {
initialize();
initialized = true;
}
}
}
BasicTransactionSemantics transaction = currentTransaction.get();
if (transaction == null || transaction.getState().equals(
BasicTransactionSemantics.State.CLOSED)) {
transaction = createTransaction();
currentTransaction.set(transaction);
}
return transaction;
}
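Putting the pruning rule together, a rough sketch of the decision (the real logic runs on the Log's scheduled cleanup path; names follow the snippets above):
// A file is deletable when its ID is below every ID still in use.
SortedSet<Integer> inUse = queue.getFileIDs();        // queue + inflightPuts + inflightTakes
for (int index = 0; index < logDirs.length; index++) {
  inUse.add(logFiles.get(index).getLogFileID());      // never prune files still being written
}
int minInUse = inUse.first();
for (Integer fileID : idLogFileMap.keySet()) {
  if (fileID < minInUse) {
    // older than everything still referenced: close its reader and delete the data file
  }
}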
Looking back, the opening questions are now basically answered. But a careful read of Flume's own source and sink code shows that every error path does call rollBack, so the expired-file problem should not be able to occur. After some digging it turned out the sink was implemented in-house, and its code commits on error. commit first persists the commit operation and only then calls queue.completeTransaction; by that point the disk was already out of space, so queue.completeTransaction never ran, the stale transactionId remained in inflightTakes, and the expired files could not be deleted.