Flink: Modifying BucketingSink to Solve the Small-Files Problem
0. Background
When writing files from Flink to HDFS, we generally use an HDFS connector. Flink actually provides two of them: the Bucketing File Sink and the Streaming File Sink. Because we run Hadoop 2.6.5, which the StreamingFileSink does not support well, we chose the Bucketing File Sink when building our Flink-based data transport platform.
Some of our workloads are small, with data trickling in from Kafka intermittently. With BucketingSink we found that HDFS filled up with lots of small files, adding quite a bit of load to our already overburdened NameNode and HDFS, so we had to tackle the problem. (Of course, a scheduled job that merges small files every day is also a perfectly good option.)
1. A Look at BucketingSink
Before modifying BucketingSink, we need to understand its logic, mainly the invoke method. It obtains the target HDFS directory from getBucketPath; we can override that method to decide which directory to write to based on a business servicePath and the record's event time. The shouldRoll method decides whether a new part file should be created. The reason Flink creates so many files is the two-phase commit it performs to guarantee end-to-end exactly-once semantics; at companies with large data volumes this is simply not a problem, which is probably why the community pays little attention to it.
public void invoke(T value) throws Exception {
    Path bucketPath = bucketer.getBucketPath(clock, new Path(basePath), value);
    long currentProcessingTime = processingTimeService.getCurrentProcessingTime();

    BucketState<T> bucketState = state.getBucketState(bucketPath);
    if (bucketState == null) {
        bucketState = new BucketState<>(currentProcessingTime);
        state.addBucketState(bucketPath, bucketState);
    }

    if (shouldRoll(bucketState, currentProcessingTime)) {
        openNewPartFile(bucketPath, bucketState);
    }

    bucketState.writer.write(value);
    bucketState.lastWrittenToTime = currentProcessingTime;
}
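The bucketing idea mentioned above — routing by a business servicePath plus event time — can be illustrated without Flink. This is a hypothetical sketch of the path-building rule only; in the real sink this logic would live in a custom Bucketer implementation whose getBucketPath returns an org.apache.hadoop.fs.Path, and all names here are illustrative.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Sketch of a getBucketPath-style routing rule: bucket records under
// basePath/servicePath/yyyyMMdd/HH derived from the record's event time.
public class EventTimeBucketing {

    static String bucketPath(String basePath, String servicePath, long eventTimeMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd/HH");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return basePath + "/" + servicePath + "/" + fmt.format(new Date(eventTimeMillis));
    }

    public static void main(String[] args) {
        // 2021-01-01T00:00:00Z
        System.out.println(bucketPath("/data", "orders", 1609459200000L));
        // -> /data/orders/20210101/00
    }
}
```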
The default shouldRoll logic has two main triggers: the file size reaching the configured threshold (batchSize), or the file having been open longer than the configured rollover interval (batchRolloverInterval). In our case it was mostly the second threshold that fired, which is what produced so many small files.
private boolean shouldRoll(BucketState<T> bucketState, long currentProcessingTime) throws IOException {
    boolean shouldRoll = false;
    int subtaskIndex = getRuntimeContext().getIndexOfThisSubtask();
    if (!bucketState.isWriterOpen) {
        shouldRoll = true;
        LOG.debug("BucketingSink {} starting new bucket.", subtaskIndex);
    } else {
        long writePosition = bucketState.writer.getPos();
        if (writePosition > batchSize) {
            shouldRoll = true;
            LOG.debug(
                "BucketingSink {} starting new bucket because file position {} is above batch size {}.",
                subtaskIndex,
                writePosition,
                batchSize);
        } else {
            if (currentProcessingTime - bucketState.creationTime > batchRolloverInterval) {
                shouldRoll = true;
                LOG.debug(
                    "BucketingSink {} starting new bucket because file is older than roll over interval {}.",
                    subtaskIndex,
                    batchRolloverInterval);
            }
        }
    }
    return shouldRoll;
}
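The three roll conditions above condense into a small decision function. This is a simplified, Flink-free sketch (names and signature are hypothetical) of the logic shouldRoll implements:

```java
// Simplified sketch of BucketingSink's roll decision: roll when no part file
// is open, when the file exceeds batchSize, or when the file is older than
// the rollover interval.
public class RollDecision {

    static boolean shouldRoll(boolean writerOpen, long writePosition, long batchSize,
                              long now, long creationTime, long rolloverInterval) {
        if (!writerOpen) {
            return true;                                // no part file open yet
        }
        if (writePosition > batchSize) {
            return true;                                // size threshold reached
        }
        return now - creationTime > rolloverInterval;   // file is too old
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 10 MiB file under a 384 MiB batch size, 30 s old with a 60 s interval: keep writing
        System.out.println(shouldRoll(true, 10 * mb, 384 * mb, 30_000L, 0L, 60_000L)); // false
        // same file but 90 s old: roll
        System.out.println(shouldRoll(true, 10 * mb, 384 * mb, 90_000L, 0L, 60_000L)); // true
    }
}
```

With slow topics the size threshold is rarely reached, so the age check is what fires and cuts small files — exactly the behavior the modification below targets.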
Once a roll is needed, the openNewPartFile method opens the file. As the code shows, it probes for part-0-0, part-0-1, … and keeps incrementing the counter until it reaches a part-0-x that does not yet exist, then opens that part-0-x and continues writing there.
private void openNewPartFile(Path bucketPath, BucketState<T> bucketState) throws Exception {
    ......
    int subtaskIndex = getRuntimeContext().getIndexOfThisSubtask();
    Path partPath = assemblePartPath(bucketPath, subtaskIndex, bucketState.partCounter);
    while (fs.exists(partPath) ||
            fs.exists(getPendingPathFor(partPath)) ||
            fs.exists(getInProgressPathFor(partPath))) {
        bucketState.partCounter++;
        partPath = assemblePartPath(bucketPath, subtaskIndex, bucketState.partCounter);
    }

    // Record the creation time of the bucket
    bucketState.creationTime = processingTimeService.getCurrentProcessingTime();

    // increase, so we don't have to check for this name next time
    bucketState.partCounter++;
    ......
    bucketState.writer.open(fs, inProgressPath);
}
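The counter scan can be exercised without HDFS. In the sketch below a Set&lt;String&gt; stands in for fs.exists(), and the pending/in-progress names use the underscore prefix and suffixes that are, to our understanding, BucketingSink's defaults; all other names are illustrative:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of openNewPartFile's counter scan: bump the counter until the
// candidate name collides with no final, pending, or in-progress file.
public class PartCounterScan {

    static int nextFreeCounter(Set<String> existing, int subtaskIndex, int startCounter) {
        int counter = startCounter;
        while (existing.contains("part-" + subtaskIndex + "-" + counter)
                || existing.contains("_part-" + subtaskIndex + "-" + counter + ".pending")
                || existing.contains("_part-" + subtaskIndex + "-" + counter + ".in-progress")) {
            counter++;
        }
        return counter;
    }

    public static void main(String[] args) {
        Set<String> files = new HashSet<>();
        files.add("part-0-0");
        files.add("part-0-1");
        files.add("_part-0-2.in-progress");
        // counters 0..2 are taken, so the scan lands on 3
        System.out.println(nextFreeCounter(files, 0, 0)); // 3
    }
}
```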
2. The Modification
Since openNewPartFile blindly opens a new file every time, we add a check inside it: after the counter scan, look at the size of the previous part file; if it has not yet reached our configured batchSize, simply reopen that file and keep appending instead of creating a new one.
private void openNewPartFile(Path bucketPath, BucketState<T> bucketState) throws Exception {
    ......
    while (fs.exists(partPath) ||
            fs.exists(getPendingPathFor(partPath)) ||
            fs.exists(getInProgressPathFor(partPath))) {
        bucketState.partCounter++;
        partPath = assemblePartPath(bucketPath, subtaskIndex, bucketState.partCounter);
    }

    Path inProgressPath;
    Path lastPartPath = null;
    Path lastFilePath = null;
    boolean shouldOpenNewFile = true;
    if (bucketState.partCounter > 0) {
        lastPartPath = assemblePartPath(bucketPath, subtaskIndex, bucketState.partCounter - 1);
        if (fs.exists(lastPartPath)) {
            lastFilePath = lastPartPath;
        } else if (fs.exists(getPendingPathFor(lastPartPath))) {
            lastFilePath = getPendingPathFor(lastPartPath);
        } else if (fs.exists(getInProgressPathFor(lastPartPath))) {
            lastFilePath = getInProgressPathFor(lastPartPath);
        }
        // guard against an NPE if the previous part has vanished entirely
        shouldOpenNewFile = lastFilePath == null || fs.getFileStatus(lastFilePath).getLen() >= batchSize;
    }

    if (shouldOpenNewFile) {
        bucketState.creationTime = processingTimeService.getCurrentProcessingTime();
        // increase, so we don't have to check for this name next time
        bucketState.partCounter++;
        LOG.debug("Next part path is {}", partPath.toString());
        bucketState.currentFile = partPath.toString();
        inProgressPath = getInProgressPathFor(partPath);
    } else {
        // reopen the previous part: move it back to in-progress and append to it
        fs.rename(lastFilePath, getInProgressPathFor(lastPartPath));
        inProgressPath = getInProgressPathFor(lastPartPath);
        bucketState.currentFile = lastPartPath.toString();
    }
    ......
    bucketState.writer.open(fs, inProgressPath);
    bucketState.isWriterOpen = true;
}
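The new reuse decision can be isolated the same way. In this hypothetical sketch a Map from path to file length stands in for fs.getFileStatus on HDFS:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the modified logic: only open a brand-new part file when the
// previous part is missing or has already reached batchSize; otherwise
// reopen and append to the previous part.
public class ReuseLastPart {

    static boolean shouldOpenNewFile(Map<String, Long> fileLengths, String lastPartPath, long batchSize) {
        Long len = fileLengths.get(lastPartPath);
        // null guard: if the previous part cannot be found, fall back to a new file
        return len == null || len >= batchSize;
    }

    public static void main(String[] args) {
        Map<String, Long> lengths = new HashMap<>();
        lengths.put("part-0-4", 10L * 1024 * 1024);   // 10 MiB, still below batchSize
        long batchSize = 384L * 1024 * 1024;
        System.out.println(shouldOpenNewFile(lengths, "part-0-4", batchSize)); // false: reuse it
        System.out.println(shouldOpenNewFile(lengths, "part-0-5", batchSize)); // true: no such file
    }
}
```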
With this change we have also broken the exactly-once guarantee that Flink's checkpoint-based two-phase commit provides: on failure recovery, downstream consumers could see duplicate data, which they cannot accept. Fortunately, the checkpoint records the valid length of the file currently being written, so on recovery we can read up to that position and discard everything after it, recovering exactly-once semantics in a roundabout way. BucketingSink's recovery call chain is initializeState -> handleRestoredBucketState -> handlePendingInProgressFile. The handlePendingInProgressFile method's job on restore is to turn the in-progress or pending file back into a valid file, so that is where we make our change:
private void handlePendingInProgressFile(String file, long validLength) {
    ......
    if (refTruncate != null) {
        ......
    } else {
        Path validLengthFilePath = getValidLengthPathFor(partPath);
        if (!fs.exists(validLengthFilePath) && fs.exists(partPath)) {
            LOG.debug("Writing valid-length file for {} to specify valid length {}", partPath, validLength);
            // try (FSDataOutputStream lengthFileOut = fs.create(validLengthFilePath)) {
            //     lengthFileOut.writeUTF(Long.toString(validLength));
            // }
            LOG.info("we don't need the valid-length file, so we skip writing it");
            LOG.info("valid length is {} bytes", validLength);
            LOG.info("last written file size is {}", fs.getFileStatus(partPath).getLen());

            // copy the first validLength bytes into a temporary transaction file,
            // dropping everything after the valid length
            FSDataInputStream inputStream = fs.open(partPath);
            FSDataOutputStream outputStream = fs.create(getTranPathFor(partPath));
            byte[] buf = new byte[4096];
            long alreadyWriteSize = 0L;
            for (int bytesRead = inputStream.read(buf); bytesRead >= 0; bytesRead = inputStream.read(buf)) {
                if (validLength - alreadyWriteSize >= 4096) {
                    outputStream.write(buf, 0, bytesRead);
                    alreadyWriteSize += bytesRead;
                } else if (validLength - alreadyWriteSize >= 0) {
                    outputStream.write(buf, 0, (int) (validLength - alreadyWriteSize));
                    alreadyWriteSize += (int) (validLength - alreadyWriteSize);
                }
                // else: these bytes are beyond the valid length, discard them
            }
            inputStream.close();
            outputStream.close();

            LOG.info("removing old file and moving transaction file to writing file");
            fs.delete(partPath, false);
            fs.rename(getTranPathFor(partPath), partPath);

            LOG.info("removing files whose partCounter is larger than the current bucket's");
            try {
                Path currentFileParent = partPath.getParent();
                String partName = partPath.getName();
                int currentSubIndex = Integer.parseInt(partName.split("-")[1]);
                int validCounter = Integer.parseInt(partName.split("-")[2].split("\\.")[0]);
                for (FileStatus fileStatus : fs.listStatus(currentFileParent)) {
                    int tmpSubIndex = Integer.parseInt(fileStatus.getPath().getName().split("-")[1]);
                    int tmpCounter = Integer.parseInt(fileStatus.getPath().getName().split("-")[2].split("\\.")[0]);
                    if (tmpSubIndex == currentSubIndex && tmpCounter > validCounter) {
                        LOG.info("removing {}", fileStatus.getPath());
                        fs.delete(fileStatus.getPath(), false);
                    }
                }
            } catch (Exception e) {
                LOG.error("deleting invalid part files failed, please check it", e);
            }
        }
    }
} catch (IOException e) {
    LOG.error("Error while restoring BucketingSink state.", e);
    throw new RuntimeException("Error while restoring BucketingSink state.", e);
} catch (InvocationTargetException | IllegalAccessException e) {
    LOG.error("Could not invoke truncate.", e);
    throw new RuntimeException("Could not invoke truncate.", e);
}
}
}

private Path getTranPathFor(Path path) {
    return new Path(path.getParent(), path.getName() + ".transaction");
}
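The byte-level truncation at the heart of the recovery path can be exercised with in-memory streams. This minimal sketch mirrors the buffered copy above, with ByteArray streams standing in for HDFS's FSDataInputStream/FSDataOutputStream and the cut-off condition written against the actual read size:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Sketch of the valid-length copy used on recovery: stream the old part file
// into a new one, keeping only the first validLength bytes and dropping the
// dirty tail written after the last successful checkpoint.
public class ValidLengthCopy {

    static void copyValidLength(ByteArrayInputStream in, ByteArrayOutputStream out, long validLength) {
        byte[] buf = new byte[4096];
        long written = 0L;
        for (int n = in.read(buf, 0, buf.length); n >= 0; n = in.read(buf, 0, buf.length)) {
            long remaining = validLength - written;
            if (remaining >= n) {
                out.write(buf, 0, n);               // whole buffer is within the valid region
                written += n;
            } else if (remaining > 0) {
                out.write(buf, 0, (int) remaining); // partial buffer: cut at validLength
                written += remaining;
            }
            // else: past validLength, drop the dirty tail
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[10_000];             // 10 000 bytes on disk, only 8 192 valid
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copyValidLength(new ByteArrayInputStream(data), out, 8192L);
        System.out.println(out.size()); // 8192
    }
}
```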
The core logic on recovery: read the valid length, open the last part file, and read it byte by byte; once the valid length is reached, stop reading the dirty data that follows. The bytes read so far are written to a new temporary file; we then delete the old part, rename the temporary part to the final part name, and reopen the file for subsequent writes.