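Problems caused by concurrency
In the lakehouse architecture, one direction is to add data warehouse capabilities to the data lake so that it gains some of the properties a warehouse has. Transactions are one of the more important of these capabilities: they let us manage data better and keep it correct, and in a data warehouse they are a central part of concurrency control. Transaction support in the data lake, however, is still imperfect, and some concurrency problems remain. Suppose there is no concurrency control at all. If two requests upsert the same record at the same time, whose write wins? If two replace commits happen at the same time, one issued by drop partition and one by clustering, and both replace the same file id, but clustering has written a new file based on it, should the data of that file id be deleted or not? Many such concurrency problems need to be solved.
The blog post Lakehouse Concurrency Control: Are we too optimistic? describes three writing scenarios and analyzes the concurrency problems Hudi currently faces.
A simple but heavy-handed solution is to use locks to guarantee data correctness under concurrency. Locks are expensive in most cases, however: even when deadlocks can be avoided, locking costs a lot of performance. The community is therefore moving toward avoiding locks as much as possible.
The community has also done a lot of work to guarantee data correctness under concurrency, and optimistic concurrency control (OCC) is an important part of it.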
OCC
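To prevent such concurrent-write errors, Hudi implements a file-level, log-based concurrency control protocol on the timeline, which relies on a minimal atomic write to cloud storage. This lets Hudi offer flexible deployment models and higher concurrency than a pure OCC approach that only tracks table snapshots.
In Hudi's OCC, in the multi-writer scenario described in the blog post, a conflict check runs before every commit of metadata.
Below is the auto-commit function in BaseCommitActionExecutor. Taking it as an example, it performs the following steps:
- Create a HoodieInstant that represents the transaction currently in flight.
- Check that the transaction manager has been initialized, and throw an exception if it has not.
- Get the transaction manager instance and begin the transaction. If a last completed transaction exists, pass it to beginTransaction.
- Set the commit metadata.
- Detect write conflicts. The active timeline is reloaded here to pick up every update made after the current transaction started.
- Perform the commit, including the extra metadata.
- Finally, end the current transaction.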
protected void autoCommit(Option<Map<String, String>> extraMetadata, HoodieWriteMetadata<O> result) {
final Option<HoodieInstant> inflightInstant = Option.of(new HoodieInstant(State.INFLIGHT,
getCommitActionType(), instantTime));
ValidationUtils.checkState(this.txnManagerOption.isPresent(), "The transaction manager has not been initialized");
TransactionManager txnManager = this.txnManagerOption.get();
txnManager.beginTransaction(inflightInstant,
lastCompletedTxn.isPresent() ? Option.of(lastCompletedTxn.get().getLeft()) : Option.empty());
try {
setCommitMetadata(result);
// reload active timeline so as to get all updates after current transaction have started. hence setting last arg to true.
TransactionUtils.resolveWriteConflictIfAny(table, txnManager.getCurrentTransactionOwner(),
result.getCommitMetadata(), config, txnManager.getLastCompletedTransactionOwner(), true, pendingInflightAndRequestedInstants);
commit(extraMetadata, result);
} finally {
txnManager.endTransaction(inflightInstant);
}
}
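The begin / check / commit / end sequence can be illustrated without any Hudi classes. The sketch below is a simplified model under stated assumptions, not Hudi's implementation: OccCommitSketch, its in-memory committed list, and the snapshot and tryCommit methods are hypothetical names, and a ReentrantLock stands in for the transaction manager.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of "validate inside a short critical section, then commit".
class OccCommitSketch {
    private final ReentrantLock lock = new ReentrantLock();
    // Each entry is the set of file ids touched by one committed write.
    private final List<Set<String>> committed = new ArrayList<>();

    /** Number of commits currently visible; a writer records this when it starts. */
    int snapshot() {
        lock.lock();
        try {
            return committed.size();
        } finally {
            lock.unlock();
        }
    }

    /**
     * Tries to commit a write that started when startSnapshot commits were visible.
     * Returns false if any commit made since then touched one of the same file ids.
     */
    boolean tryCommit(int startSnapshot, Set<String> fileIds) {
        lock.lock();                               // begin transaction
        try {
            for (int i = startSnapshot; i < committed.size(); i++) {
                Set<String> overlap = new HashSet<>(committed.get(i));
                overlap.retainAll(fileIds);
                if (!overlap.isEmpty()) {
                    return false;                  // conflict: abort this write
                }
            }
            committed.add(new HashSet<>(fileIds)); // commit
            return true;
        } finally {
            lock.unlock();                         // end transaction, even on failure
        }
    }
}
```

The lock is held only while validating and publishing the commit, not while data is written, which is the property that makes the approach optimistic.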
The key is the fifth step in the list, the call to TransactionUtils.resolveWriteConflictIfAny.
public static Option<HoodieCommitMetadata> resolveWriteConflictIfAny(
    final HoodieTable table,
    final Option<HoodieInstant> currentTxnOwnerInstant,
    final Option<HoodieCommitMetadata> thisCommitMetadata,
    final HoodieWriteConfig config,
    Option<HoodieInstant> lastCompletedTxnOwnerInstant,
    boolean reloadActiveTimeline,
    Set<String> pendingInstants) throws HoodieWriteConflictException {
  // This condition checks whether we are in the multi-writer case; only then is the check needed
  if (config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl()) {
    // deal with pendingInstants
    Stream<HoodieInstant> completedInstantsDuringCurrentWriteOperation = getCompletedInstantsDuringCurrentWriteOperation(table.getMetaClient(), pendingInstants);
    // Get the configured resolutionStrategy
    ConflictResolutionStrategy resolutionStrategy = config.getWriteConflictResolutionStrategy();
    // Reload the timeline depending on the argument. There are only two cases:
    // true for auto commit, false otherwise (in which case the table has just been loaded)
    if (reloadActiveTimeline) {
      table.getMetaClient().reloadActiveTimeline();
    }
    // Get the instants that must be checked for conflicts
    Stream<HoodieInstant> instantStream = Stream.concat(resolutionStrategy.getCandidateInstants(
        table.getMetaClient(), currentTxnOwnerInstant.get(), lastCompletedTxnOwnerInstant),
        completedInstantsDuringCurrentWriteOperation);
    // Build the description of the current action
    final ConcurrentOperation thisOperation = new ConcurrentOperation(currentTxnOwnerInstant.get(), thisCommitMetadata.orElse(new HoodieCommitMetadata()));
    // Iterate over the instants and check each one for a conflict
    instantStream.forEach(instant -> {
      try {
        ConcurrentOperation otherOperation = new ConcurrentOperation(instant, table.getMetaClient());
        if (resolutionStrategy.hasConflict(thisOperation, otherOperation)) {
          LOG.info("Conflict encountered between current instant = " + thisOperation + " and instant = "
              + otherOperation + ", attempting to resolve it...");
          resolutionStrategy.resolveConflict(table, thisOperation, otherOperation);
        }
      } catch (IOException io) {
        throw new HoodieWriteConflictException("Unable to resolve conflict, if present", io);
      }
    });
    LOG.info("Successfully resolved conflicts, if any");
    return thisOperation.getCommitMetadataOption();
  }
  return thisCommitMetadata;
}
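In other words, the key steps are:
- Get the configured resolutionStrategy
- Reload the timeline
- Get the set of instants to check
- Check the current operation against that set of instants
Let's go through them one at a time.
First, the resolutionStrategy is specified through the Hudi configuration:
As the config definition shows, if no strategy is specified and the bucket index is configured, BucketIndexConcurrentFileWritesConflictResolutionStrategy is used; otherwise the default SimpleConcurrentFileWritesConflictResolutionStrategy is used.
There is also a PreferWriterConflictResolutionStrategy.
The names already give a first idea of what each one does:
BucketIndexConcurrentFileWritesConflictResolutionStrategy: detects conflicts at the bucket level.
SimpleConcurrentFileWritesConflictResolutionStrategy: the simplest check; if file ids overlap, the current operation is aborted.
PreferWriterConflictResolutionStrategy: gives priority to ongoing writes; if the current operation is a table service and it conflicts with a write in progress, it aborts itself.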
public static final ConfigProperty<String> WRITE_CONFLICT_RESOLUTION_STRATEGY_CLASS_NAME = ConfigProperty
.key(LOCK_PREFIX + "conflict.resolution.strategy")
.defaultValue(SimpleConcurrentFileWritesConflictResolutionStrategy.class.getName())
.withInferFunction(hoodieConfig -> {
if (HoodieIndex.IndexType.BUCKET.name().equalsIgnoreCase(hoodieConfig.getStringOrDefault(HoodieIndexConfig.INDEX_TYPE, null))) {
return Option.of(BucketIndexConcurrentFileWritesConflictResolutionStrategy.class.getName());
} else {
return Option.of(SimpleConcurrentFileWritesConflictResolutionStrategy.class.getName());
}
})
.markAdvanced()
.sinceVersion("0.8.0")
.withDocumentation("Lock provider class name, this should be subclass of "
+ "org.apache.hudi.client.transaction.ConflictResolutionStrategy");
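The precedence described above (explicit setting, then inference from the index type, then the default) can be restated as a small function. This is a sketch under assumptions: the two property keys are what LOCK_PREFIX and HoodieIndexConfig.INDEX_TYPE are assumed to resolve to, the class names are shortened to their simple names, and StrategyResolutionSketch itself is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical restatement of the precedence: explicit setting, then inference, then default.
class StrategyResolutionSketch {
    // Assumed property keys; LOCK_PREFIX is taken to be "hoodie.write.lock." here.
    static final String STRATEGY_KEY = "hoodie.write.lock.conflict.resolution.strategy";
    static final String INDEX_TYPE_KEY = "hoodie.index.type";

    // Simple class names only; the real config stores fully qualified names.
    static final String SIMPLE = "SimpleConcurrentFileWritesConflictResolutionStrategy";
    static final String BUCKET = "BucketIndexConcurrentFileWritesConflictResolutionStrategy";

    static String resolve(Map<String, String> props) {
        String explicit = props.get(STRATEGY_KEY);
        if (explicit != null) {
            return explicit;                       // an explicit setting always wins
        }
        String indexType = props.get(INDEX_TYPE_KEY);
        // Inference step: bucket index implies the bucket-level strategy.
        return "BUCKET".equalsIgnoreCase(indexType) ? BUCKET : SIMPLE;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put(INDEX_TYPE_KEY, "bucket");
        System.out.println(resolve(props));
    }
}
```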
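Reloading the timeline
There is not much to say about this step: on the auto-commit path the flag is true, so the active timeline is reloaded.
Getting the set of instants to check
SimpleConcurrentFileWritesConflictResolutionStrategy provides the default implementation:
It collects two timelines:
1. All commits that completed after the last successful commit
2. All pending compaction and clustering instants that started after the current instant
(There is a gap here: item 2 cannot prevent every conflict and only covers a small part of them.)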
@Override
public Stream<HoodieInstant> getCandidateInstants(HoodieTableMetaClient metaClient, HoodieInstant currentInstant,
Option<HoodieInstant> lastSuccessfulInstant) {
HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline();
// To find which instants are conflicting, we apply the following logic
// 1. Get completed instants timeline only for commits that have happened since the last successful write.
// 2. Get any scheduled or completed compaction or clustering operations that have started and/or finished
// after the current instant. We need to check for write conflicts since they may have mutated the same files
// that are being newly created by the current write.
Stream<HoodieInstant> completedCommitsInstantStream = activeTimeline
.getCommitsTimeline()
.filterCompletedInstants()
.findInstantsAfter(lastSuccessfulInstant.isPresent() ? lastSuccessfulInstant.get().getTimestamp() : HoodieTimeline.INIT_INSTANT_TS)
.getInstantsAsStream();
Stream<HoodieInstant> compactionAndClusteringPendingTimeline = activeTimeline
.getTimelineOfActions(CollectionUtils.createSet(REPLACE_COMMIT_ACTION, COMPACTION_ACTION))
.findInstantsAfter(currentInstant.getTimestamp())
.filterInflightsAndRequested()
.getInstantsAsStream();
return Stream.concat(completedCommitsInstantStream, compactionAndClusteringPendingTimeline);
}
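The two filters can be tried on a toy timeline. The sketch below is a simplified model, not Hudi code: Instant is a hypothetical stand-in for HoodieInstant, every completed instant is treated as part of the commits timeline, and timestamps are fixed-width strings so that string order equals time order.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SimpleCandidateSketch {
    // Hypothetical, stripped-down stand-in for HoodieInstant.
    static final class Instant {
        final String timestamp;   // fixed-width, so string order equals time order
        final String action;      // "commit", "deltacommit", "replacecommit" or "compaction"
        final boolean completed;

        Instant(String timestamp, String action, boolean completed) {
            this.timestamp = timestamp;
            this.action = action;
            this.completed = completed;
        }
    }

    /** Returns the timestamps of the instants the current write must be checked against. */
    static List<String> candidates(List<Instant> timeline, String lastSuccessfulTs, String currentTs) {
        // 1. Completed instants after the last successful write.
        Stream<Instant> completedSince = timeline.stream()
            .filter(i -> i.completed && i.timestamp.compareTo(lastSuccessfulTs) > 0);
        // 2. Pending compaction / replacecommit instants that started after the current instant.
        Stream<Instant> pendingServices = timeline.stream()
            .filter(i -> !i.completed)
            .filter(i -> i.action.equals("replacecommit") || i.action.equals("compaction"))
            .filter(i -> i.timestamp.compareTo(currentTs) > 0);
        return Stream.concat(completedSince, pendingServices)
            .map(i -> i.timestamp)
            .collect(Collectors.toList());
    }
}
```

Note that a pending ingestion commit from another writer is not a candidate under these rules, which is one way to see the gap mentioned in the remark on item 2.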
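PreferWriterConflictResolutionStrategy overrides this method:
It distinguishes two cases.
Case 1: the current operation is a compaction or clustering. The following instants are collected:
1. All instants that completed after the current instant started
2. In-flight ingestion instants (commit and delta commit)
Case 2: any other operation. Only item 1 is collected: all instants that completed after the current instant started.
As the name suggests, this override favors write operations: even a write that is still in flight takes priority over compaction and clustering.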
@Override
public Stream<HoodieInstant> getCandidateInstants(HoodieTableMetaClient metaClient, HoodieInstant currentInstant,
Option<HoodieInstant> lastSuccessfulInstant) {
HoodieActiveTimeline activeTimeline = metaClient.reloadActiveTimeline();
if ((REPLACE_COMMIT_ACTION.equals(currentInstant.getAction())
&& ClusteringUtils.isClusteringCommit(metaClient, currentInstant))
|| COMPACTION_ACTION.equals(currentInstant.getAction())) {
return getCandidateInstantsForTableServicesCommits(activeTimeline, currentInstant);
} else {
return getCandidateInstantsForNonTableServicesCommits(activeTimeline, currentInstant);
}
}
private Stream<HoodieInstant> getCandidateInstantsForNonTableServicesCommits(HoodieActiveTimeline activeTimeline, HoodieInstant currentInstant) {
// To find out which instants are conflicting, we apply the following logic
// Get all the completed instants timeline only for commits that have happened
// since the last successful write based on the transition times.
// We need to check for write conflicts since they may have mutated the same files
// that are being newly created by the current write.
List<HoodieInstant> completedCommitsInstants = activeTimeline
.getTimelineOfActions(CollectionUtils.createSet(COMMIT_ACTION, REPLACE_COMMIT_ACTION, COMPACTION_ACTION, DELTA_COMMIT_ACTION))
.filterCompletedInstants()
.findInstantsModifiedAfterByStateTransitionTime(currentInstant.getTimestamp())
.getInstantsOrderedByStateTransitionTime()
.collect(Collectors.toList());
LOG.info(String.format("Instants that may have conflict with %s are %s", currentInstant, completedCommitsInstants));
return completedCommitsInstants.stream();
}
/**
* To find which instants are conflicting, we apply the following logic
* Get both completed instants and ingestion inflight commits that have happened since the last successful write.
* We need to check for write conflicts since they may have mutated the same files
* that are being newly created by the current write.
*/
private Stream<HoodieInstant> getCandidateInstantsForTableServicesCommits(HoodieActiveTimeline activeTimeline, HoodieInstant currentInstant) {
// Fetch list of completed commits.
Stream<HoodieInstant> completedCommitsStream =
activeTimeline
.getTimelineOfActions(CollectionUtils.createSet(COMMIT_ACTION, REPLACE_COMMIT_ACTION, COMPACTION_ACTION, DELTA_COMMIT_ACTION))
.filterCompletedInstants()
.findInstantsModifiedAfterByStateTransitionTime(currentInstant.getTimestamp())
.getInstantsAsStream();
// Fetch list of ingestion inflight commits.
Stream<HoodieInstant> inflightIngestionCommitsStream =
activeTimeline
.getTimelineOfActions(CollectionUtils.createSet(COMMIT_ACTION, DELTA_COMMIT_ACTION))
.filterInflights()
.getInstantsAsStream();
// Merge and sort the instants and return.
List<HoodieInstant> instantsToConsider = Stream.concat(completedCommitsStream, inflightIngestionCommitsStream)
.sorted(Comparator.comparing(o -> o.getStateTransitionTime()))
.collect(Collectors.toList());
LOG.info(String.format("Instants that may have conflict with %s are %s", currentInstant, instantsToConsider));
return instantsToConsider.stream();
}
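The asymmetry between table services and regular writers shows up clearly on a toy timeline. This is a hedged sketch, not Hudi code: Instant and its completionTime field (standing in for the state transition time) are hypothetical, the caller tells the function whether the current instant is a table service, and the final sort by transition time is left out.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PreferWriterCandidateSketch {
    // Hypothetical stand-in for HoodieInstant with a state transition time.
    static final class Instant {
        final String timestamp;       // when the instant started
        final String action;          // "commit", "deltacommit", "replacecommit" or "compaction"
        final String completionTime;  // null while the instant is still in flight

        Instant(String timestamp, String action, String completionTime) {
            this.timestamp = timestamp;
            this.action = action;
            this.completionTime = completionTime;
        }

        boolean isCompleted() {
            return completionTime != null;
        }
    }

    static List<String> candidates(List<Instant> timeline, String currentTs, boolean currentIsTableService) {
        // Instants that finished after the current instant started, whenever they began.
        Stream<Instant> completedAfterStart = timeline.stream()
            .filter(i -> i.isCompleted() && i.completionTime.compareTo(currentTs) > 0);
        if (!currentIsTableService) {
            return completedAfterStart.map(i -> i.timestamp).collect(Collectors.toList());
        }
        // A table service additionally yields to ingestion writes that are still in flight.
        Stream<Instant> inflightIngestion = timeline.stream()
            .filter(i -> !i.isCompleted())
            .filter(i -> i.action.equals("commit") || i.action.equals("deltacommit"));
        return Stream.concat(completedAfterStart, inflightIngestion)
            .map(i -> i.timestamp)
            .collect(Collectors.toList());
    }
}
```

Filtering on completion time rather than start time means an instant that started earlier but finished later than the current one is still checked.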
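Checking the current operation against the instant set
The conflict check also has two implementations.
The first is the default implementation in SimpleConcurrentFileWritesConflictResolutionStrategy:
It simply tests whether the file ids touched by the current operation intersect with those touched by the other instant.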
@Override
public boolean hasConflict(ConcurrentOperation thisOperation, ConcurrentOperation otherOperation) {
// TODO : UUID's can clash even for insert/insert, handle that case.
Set<Pair<String, String>> partitionAndFileIdsSetForFirstInstant = thisOperation.getMutatedPartitionAndFileIds();
Set<Pair<String, String>> partitionAndFileIdsSetForSecondInstant = otherOperation.getMutatedPartitionAndFileIds();
Set<Pair<String, String>> intersection = new HashSet<>(partitionAndFileIdsSetForFirstInstant);
intersection.retainAll(partitionAndFileIdsSetForSecondInstant);
if (!intersection.isEmpty()) {
LOG.info("Found conflicting writes between first operation = " + thisOperation
+ ", second operation = " + otherOperation + " , intersecting file ids " + intersection);
return true;
}
return false;
}
BucketIndexConcurrentFileWritesConflictResolutionStrategy overrides this method. The only difference is that it checks for overlapping bucket ids instead of overlapping file ids; the rest of the logic is the same.
@Override
public boolean hasConflict(ConcurrentOperation thisOperation, ConcurrentOperation otherOperation) {
// TODO : UUID's can clash even for insert/insert, handle that case.
Set<String> partitionBucketIdSetForFirstInstant = thisOperation
.getMutatedPartitionAndFileIds()
.stream()
.map(partitionAndFileId ->
BucketIdentifier.partitionBucketIdStr(partitionAndFileId.getLeft(), BucketIdentifier.bucketIdFromFileId(partitionAndFileId.getRight()))
).collect(Collectors.toSet());
Set<String> partitionBucketIdSetForSecondInstant = otherOperation
.getMutatedPartitionAndFileIds()
.stream()
.map(partitionAndFileId ->
BucketIdentifier.partitionBucketIdStr(partitionAndFileId.getLeft(), BucketIdentifier.bucketIdFromFileId(partitionAndFileId.getRight()))
).collect(Collectors.toSet());
Set<String> intersection = new HashSet<>(partitionBucketIdSetForFirstInstant);
intersection.retainAll(partitionBucketIdSetForSecondInstant);
if (!intersection.isEmpty()) {
LOG.info("Found conflicting writes between first operation = " + thisOperation
+ ", second operation = " + otherOperation + " , intersecting bucket ids " + intersection);
return true;
}
return false;
}
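The practical difference between the two checks is that two distinct file ids can still land in the same bucket. The sketch below illustrates this for a single partition. It rests on an assumption labeled in the code: the bucket number is taken to be encoded in the first 8 characters of the file id. BucketConflictSketch and its method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

class BucketConflictSketch {
    // Assumption: the bucket number is the zero-padded first 8 characters of the file id.
    static int bucketIdFromFileId(String fileId) {
        return Integer.parseInt(fileId.substring(0, 8));
    }

    /** File-level check: conflict only if the same file id appears on both sides. */
    static boolean hasFileIdConflict(Set<String> thisFileIds, Set<String> otherFileIds) {
        Set<String> intersection = new HashSet<>(thisFileIds);
        intersection.retainAll(otherFileIds);
        return !intersection.isEmpty();
    }

    /** Bucket-level check: conflict if both sides touch the same bucket. */
    static boolean hasBucketConflict(Set<String> thisFileIds, Set<String> otherFileIds) {
        Set<Integer> thisBuckets = thisFileIds.stream()
            .map(BucketConflictSketch::bucketIdFromFileId)
            .collect(Collectors.toCollection(HashSet::new));
        Set<Integer> otherBuckets = otherFileIds.stream()
            .map(BucketConflictSketch::bucketIdFromFileId)
            .collect(Collectors.toCollection(HashSet::new));
        thisBuckets.retainAll(otherBuckets);
        return !thisBuckets.isEmpty();
    }
}
```

With file ids 00000003-aaaa and 00000003-bbbb, the file-level check reports no conflict while the bucket-level check does, because both ids map to bucket 3.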
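As for what happens after a conflict is detected:
If the conflicting operation is a compaction whose instant is earlier than the current one, or if the current operation is a log compaction (and the other one is not a compaction), the commit proceeds normally. Otherwise an exception is thrown.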
public Option<HoodieCommitMetadata> resolveConflict(HoodieTable table,
ConcurrentOperation thisOperation, ConcurrentOperation otherOperation) {
// A completed COMPACTION action eventually shows up as a COMMIT action on the timeline.
// We need to ensure we handle this during conflict resolution and not treat the commit from a
// compaction operation as a regular commit. Regular commits & deltacommits are candidates for conflict.
// Since the REPLACE action with CLUSTER operation does not support concurrent updates, we have
// to consider it as conflict if we see overlapping file ids. Once concurrent updates are
// supported for CLUSTER (https://issues.apache.org/jira/browse/HUDI-1042),
// add that to the below check so that concurrent updates do not conflict.
if (otherOperation.getOperationType() == WriteOperationType.COMPACT) {
if (HoodieTimeline.compareTimestamps(otherOperation.getInstantTimestamp(), HoodieTimeline.LESSER_THAN, thisOperation.getInstantTimestamp())) {
return thisOperation.getCommitMetadataOption();
}
} else if (HoodieTimeline.LOG_COMPACTION_ACTION.equals(thisOperation.getInstantActionType())) {
// Since log compaction is a rewrite operation, it can be committed along with other delta commits.
// The ordering of the commits is taken care by AbstractHoodieLogRecordReader scan method.
// Conflict arises only if the log compaction commit has a lesser timestamp compared to compaction commit.
return thisOperation.getCommitMetadataOption();
}
// just abort the current write if conflicts are found
throw new HoodieWriteConflictException(new ConcurrentModificationException("Cannot resolve conflicts for overlapping writes"));
}
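Because the two conditions sit in an if / else-if chain, the outcome for some combinations is easy to misread. The sketch below restates the decision as a pure function so the cases can be enumerated. It is a simplified model: canProceed is a hypothetical name, operations and actions are plain strings, and returning false stands for the thrown HoodieWriteConflictException.

```java
class ResolveDecisionSketch {
    /**
     * Returns true if the current write may still commit after a file overlap was found.
     * Timestamps are fixed-width strings, so compareTo gives time order.
     */
    static boolean canProceed(String thisAction, String thisTs, String otherOperation, String otherTs) {
        if ("COMPACT".equals(otherOperation)) {
            // An overlapping compaction is tolerated only if it was scheduled earlier.
            if (otherTs.compareTo(thisTs) < 0) {
                return true;
            }
        } else if ("logcompaction".equals(thisAction)) {
            // A log compaction can commit alongside other (non-compaction) writes.
            return true;
        }
        return false; // every other overlap aborts the current write
    }

    public static void main(String[] args) {
        // A log compaction that overlaps a later compaction falls through both branches.
        System.out.println(canProceed("logcompaction", "003", "COMPACT", "005"));
    }
}
```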
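Early conflict detection
In the multi-writer scenario, Hudi's existing conflict detection runs after a writer has finished writing data and before it commits metadata. All computation and data writing are therefore already done by the time a conflict is discovered at commit, which wastes resources.
An example: suppose there are two write jobs. job1 writes 10M of data to a Hudi table, including updates to file group1. job2 writes 100G of data to the same table and also updates file group1. job1 finishes and commits successfully. Several hours later, job2 finishes writing its data files (100G) and starts committing metadata. Only then does it find the conflict with job1, so job2 has to abort and be rerun after the failure. job2 has clearly wasted a large amount of compute resources and time.
Hudi currently has two important mechanisms, markers and heartbeats: the marker mechanism tracks all files that belong to active writes, and the heartbeat mechanism tracks all writers of a Hudi table.
Based on markers and heartbeats, Hudi introduced a new conflict detection method: early conflict detection. Before a writer creates a marker and before it starts writing the file, Hudi runs this new check to detect write conflicts as early as possible (for direct markers) or to fetch the result of an asynchronous conflict check (for timeline-server-based markers). When a conflict exists, the writer aborts early, which frees compute resources sooner and improves resource utilization.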
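Marker mechanism
Here is a brief introduction to the marker mechanism.
A Hudi marker file marks, at write time, whether the write has completed.
Marker files are created when createHandle/mergeHandle/appendHandle are initialized, that is, when the write starts, and they are deleted after the commit. If a marker file still exists after the write has ended, the write went wrong and needs handling such as rollback or cleanup.
A marker file identifies a write through its file name, which has three parts: the data file name, the marker extension (.marker), and the I/O operation that created the file (CREATE for inserts, MERGE for updates/deletes, or APPEND for either).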
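The three-part name can be built and taken apart with plain string operations. The sketch below assumes the parts are joined as data file name, then .marker, then a dot and the I/O type; the separator and the class MarkerNameSketch are assumptions for illustration, not taken from the Hudi source.

```java
class MarkerNameSketch {
    enum IOType { CREATE, MERGE, APPEND }

    static final String MARKER_EXTENSION = ".marker";

    /** Builds a marker name such as "f1_0-1-1_001.parquet.marker.MERGE". */
    static String markerName(String dataFileName, IOType type) {
        return dataFileName + MARKER_EXTENSION + "." + type.name();
    }

    // Position of the ".marker." part; the last occurrence is used so that
    // a data file name containing ".marker." itself would still parse.
    private static int extensionIndex(String markerName) {
        int idx = markerName.lastIndexOf(MARKER_EXTENSION + ".");
        if (idx < 0) {
            throw new IllegalArgumentException("Not a marker file name: " + markerName);
        }
        return idx;
    }

    /** Recovers the data file name from a marker name. */
    static String dataFileName(String markerName) {
        return markerName.substring(0, extensionIndex(markerName));
    }

    /** Recovers the I/O type from a marker name. */
    static IOType ioType(String markerName) {
        int start = extensionIndex(markerName) + MARKER_EXTENSION.length() + 1;
        return IOType.valueOf(markerName.substring(start));
    }
}
```

Because the data file name is recoverable from the marker name alone, leftover markers are enough to locate the data files of a failed write.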
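There are two ways to write markers:
1. Write the markers directly to storage. This is the legacy configuration.
Although this is much more efficient than scanning the whole table for uncommitted data files, the number of marker files grows with the number of data files to be written. For large writes that need to write a great many data files (for example 10K or more), this can become a performance bottleneck on cloud storage such as AWS S3. In AWS S3, every file create and delete call triggers an HTTP request, and there is a rate limit on how many requests per second each prefix in a bucket can handle. When a large number of data files and marker files are written concurrently, the marker file operations can take a long time, sometimes several minutes or more.
2. Write the markers to the timeline server, which batches marker requests before writing them to storage (the default). This option can improve write performance for large files.
Here the timeline server acts as a cache: all markers are kept in memory and written to HDFS in batches, and lookups are served from memory.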
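The caching behaviour described here (answer lookups from memory, write to storage in batches) can be modeled in a few lines. This is a hypothetical sketch and not the timeline server's implementation: MarkerBatchSketch, its method names, and the list that stands in for storage writes are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory marker cache that writes to "storage" in batches.
class MarkerBatchSketch {
    private final Set<String> allMarkers = new HashSet<>();      // serves lookups
    private final Set<String> pending = new LinkedHashSet<>();   // not yet written out
    private final List<List<String>> storageWrites = new ArrayList<>(); // stand-in for storage

    /** Registers a marker in memory; returns false if it already exists. */
    synchronized boolean create(String marker) {
        if (!allMarkers.add(marker)) {
            return false;
        }
        pending.add(marker);
        return true;
    }

    /** Lookups never touch storage. */
    synchronized boolean exists(String marker) {
        return allMarkers.contains(marker);
    }

    /** Writes all pending markers as one batch; a no-op if nothing is pending. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        storageWrites.add(new ArrayList<>(pending));
        pending.clear();
    }

    synchronized int storageWriteCount() {
        return storageWrites.size();
    }
}
```

However many markers are created between two flushes, storage sees a single write, which is what relieves the per-prefix request limits described for direct markers.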
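Early detection flow
The early detection flow is shown in the figure:
The entry point is marker file creation. If both switches, supportsOptimisticConcurrencyControl and isEarlyConflictDetectionEnable, are on, early conflict detection runs.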
public Option<Path> create(String partitionPath, String dataFileName, IOType type, HoodieWriteConfig writeConfig,
    String fileId, HoodieActiveTimeline activeTimeline) {
  if (writeConfig.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl()
      && writeConfig.isEarlyConflictDetectionEnable()) {
    // Check for a pending compaction or clustering. Early conflict detection is not yet
    // supported for table services, so in that case the marker is created directly.
    HoodieTimeline pendingCompactionTimeline = activeTimeline.filterPendingCompactionTimeline();
    HoodieTimeline pendingReplaceTimeline = activeTimeline.filterPendingReplaceTimeline();
    // TODO If current is compact or clustering then create marker directly without early conflict detection.
    // Need to support early conflict detection between table service and common writers.
    if (pendingCompactionTimeline.containsInstant(instantTime) || pendingReplaceTimeline.containsInstant(instantTime)) {
      return create(partitionPath, dataFileName, type, false);
    }
    return createWithEarlyConflictDetection(partitionPath, dataFileName, type, false, writeConfig, fileId, activeTimeline);
  }
  return create(partitionPath, dataFileName, type, false);
}
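Reduced to its inputs, the gate in create depends on three booleans. The sketch below only restates that decision so the combinations can be checked; EarlyDetectionGateSketch and its parameter names are hypothetical.

```java
class EarlyDetectionGateSketch {
    /**
     * Early conflict detection runs only when OCC is on, the early-detection switch is on,
     * and the current instant is not a pending compaction or replace (table service) instant.
     */
    static boolean useEarlyDetection(boolean occEnabled,
                                     boolean earlyDetectionEnabled,
                                     boolean currentIsPendingTableService) {
        return occEnabled && earlyDetectionEnabled && !currentIsPendingTableService;
    }

    public static void main(String[] args) {
        // A table service instant bypasses the check even when both switches are on.
        System.out.println(useEarlyDetection(true, true, true));
    }
}
```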
createWithEarlyConflictDetection mainly obtains the strategy and then detects and resolves conflicts. The logic matches that of OCC.
@Override
public Option<Path> createWithEarlyConflictDetection(String partitionPath, String dataFileName, IOType type, boolean checkIfExists,
HoodieWriteConfig config, String fileId, HoodieActiveTimeline activeTimeline) {
String strategyClassName = config.getEarlyConflictDetectionStrategyClassName();
if (!ReflectionUtils.isSubClass(strategyClassName, DirectMarkerBasedDetectionStrategy.class)) {
LOG.warn("Cannot use " + strategyClassName + " for direct markers.");
strategyClassName = getDefaultEarlyConflictDetectionStrategy(MarkerType.DIRECT);
LOG.warn("Falling back to " + strategyClassName);
}
DirectMarkerBasedDetectionStrategy strategy =
(DirectMarkerBasedDetectionStrategy) ReflectionUtils.loadClass(strategyClassName,
fs, partitionPath, fileId, instantTime, activeTimeline, config);
strategy.detectAndResolveConflictIfNecessary();
return create(getMarkerPath(partitionPath, dataFileName, type), checkIfExists);
}
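As before, we first introduce the marker conflict detection strategies.
Their relationship is as follows:
The default depends on how markers are written: it is either SimpleDirectMarkerBasedDetectionStrategy or AsyncTimelineServerBasedDetectionStrategy.
At the top is the common interface, which defines the three methods every strategy class must provide: (1) detectAndResolveConflictIfNecessary, (2) hasMarkerConflict, and (3) resolveMarkerConflict. The figure also marks which classes implement each of them. What these methods do is clear from their names; below we look at how the implementations differ.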
@PublicAPIClass(maturity = ApiMaturityLevel.EVOLVING)
public interface EarlyConflictDetectionStrategy {
/**
* Detects and resolves the write conflict if necessary.
*/
void detectAndResolveConflictIfNecessary() throws HoodieEarlyConflictDetectionException;
/**
* @return whether there's a write conflict based on markers.
*/
boolean hasMarkerConflict();
/**
* Resolves a write conflict.
*
* @param basePath Base path of the table.
* @param partitionPath Relative partition path.
* @param dataFileName Data file name.
*/
void resolveMarkerConflict(String basePath, String partitionPath, String dataFileName);
}
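Back in createWithEarlyConflictDetection, the most important call is strategy.detectAndResolveConflictIfNecessary(), which actually checks whether a conflict has occurred.
As the strategy figure shows, three classes implement this method. The implementations in AsyncTimelineServerBasedDetectionStrategy and SimpleDirectMarkerBasedDetectionStrategy are identical: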
@Override
public void detectAndResolveConflictIfNecessary() throws HoodieEarlyConflictDetectionException {
if (hasMarkerConflict()) {
resolveMarkerConflict(basePath, partitionPath, fileId);
}
}
SimpleTransactionDirectMarkerBasedDetectionStrategy, in contrast, opens a transaction and takes a lock at the granularity of the file id being written:
@Override
public void detectAndResolveConflictIfNecessary() throws HoodieEarlyConflictDetectionException {
DirectMarkerTransactionManager txnManager =
new DirectMarkerTransactionManager((HoodieWriteConfig) config, fs, partitionPath, fileId);
try {
// Need to do transaction before create marker file when using early conflict detection
txnManager.beginTransaction(instantTime);
super.detectAndResolveConflictIfNecessary();
} catch (Exception e) {
LOG.warn("Exception occurs during create marker file in early conflict detection mode within transaction.");
throw e;
} finally {
// End transaction after created marker file.
txnManager.endTransaction(instantTime);
txnManager.close();
}
}
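Leaving the locking aside, the core logic is to check whether there is a conflict and, if there is, to call the function that resolves it.
We look at the conflict check first.
SimpleDirectMarkerBasedDetectionStrategy decides whether a conflict exists by calling other functions: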
@Override
public boolean hasMarkerConflict() {
try {
return checkMarkerConflict(basePath, maxAllowableHeartbeatIntervalInMs)
|| (checkCommitConflict && MarkerUtils.hasCommitConflict(activeTimeline, Stream.of(fileId).collect(Collectors.toSet()), completedCommitInstants));
} catch (IOException e) {
LOG.warn("Exception occurs during create marker file in eager conflict detection mode.");
throw new HoodieIOException("Exception occurs during create marker file in eager conflict detection mode.", e);
}
}
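AsyncTimelineServerBasedDetectionStrategy instead returns the value of a variable. As the name indicates, the detection is asynchronous, and the variable is updated when a detection run finishes: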
@Override
public boolean hasMarkerConflict() {
return hasConflict.get();
}
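Here we focus on SimpleDirectMarkerBasedDetectionStrategy.
Below is the code of its checkMarkerConflict.
The candidate instants are obtained as follows:
- Skip the instant of the current write (currentInstantTime).
- Skip all instants after the current instant time.
- Skip instants whose writer's heartbeat has stopped.
- Skip pending compaction instants (compaction currently does no early conflict check), because a pending compaction should not block regular writes.
After obtaining the candidate instants, the method checks only the markers under the partition path if one is given, and all markers otherwise, looking for a marker with the same file id.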
/**
 * Checks whether conflicting marker files exist.
 * To reduce listing pressure, the path prefix
 * '$base_path/.temp/instant_time/partition_path' is built first, so that only the required
 * partition_path is listed instead of everything under '$base_path/.temp/'.
 */
public boolean checkMarkerConflict(String basePath, long maxAllowableHeartbeatIntervalInMs) throws IOException {
  String tempFolderPath = basePath + Path.SEPARATOR + HoodieTableMetaClient.TEMPFOLDER_NAME;
  // Get the list of candidate instants
  List<String> candidateInstants = MarkerUtils.getCandidateInstants(activeTimeline, Arrays.stream(fs.listStatus(new Path(tempFolderPath))).map(FileStatus::getPath).collect(Collectors.toList()),
      instantTime, maxAllowableHeartbeatIntervalInMs, fs, basePath);
  // Walk the candidate instants and look for conflicting marker files
  long res = candidateInstants.stream().flatMap(currentMarkerDirPath -> {
    try {
      Path markerPartitionPath;
      if (StringUtils.isNullOrEmpty(partitionPath)) {
        markerPartitionPath = new Path(currentMarkerDirPath);
      } else {
        markerPartitionPath = new Path(currentMarkerDirPath, partitionPath);
      }
      // If a partitionPath is given and markerPartitionPath does not exist, return an empty stream
      if (!StringUtils.isNullOrEmpty(partitionPath) && !fs.exists(markerPartitionPath)) {
        return Stream.empty();
      } else {
        // List the files under markerPartitionPath and keep those whose path contains fileId
        return Arrays.stream(fs.listStatus(markerPartitionPath)).parallel()
            .filter((path) -> path.toString().contains(fileId));
      }
    } catch (IOException e) {
      throw new HoodieIOException("IOException occurs during checking marker file conflict");
    }
  }).count();
  // If conflicting marker files exist, log a warning and return true
  if (res != 0L) {
    LOG.warn("Detected conflict marker files: " + partitionPath + "/" + fileId + " for " + instantTime);
    return true;
  }
  // No conflicting marker files, return false
  return false;
}
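The candidate rules and the marker scan can be combined into one in-memory sketch. This is a simplified model rather than Hudi code: Writer is a hypothetical record of another writer's instant, last heartbeat, and markers; the file system listing is replaced by a list; and a heartbeat counts as stopped once it is older than the allowed interval.

```java
import java.util.List;

class DirectMarkerCheckSketch {
    // Hypothetical view of one writer's marker directory.
    static final class Writer {
        final String instantTime;         // fixed-width, so string order equals time order
        final long lastHeartbeatMs;
        final boolean pendingCompaction;
        final List<String> markers;       // marker paths relative to the marker directory

        Writer(String instantTime, long lastHeartbeatMs, boolean pendingCompaction, List<String> markers) {
            this.instantTime = instantTime;
            this.lastHeartbeatMs = lastHeartbeatMs;
            this.pendingCompaction = pendingCompaction;
            this.markers = markers;
        }
    }

    static boolean hasMarkerConflict(List<Writer> writers, String currentInstant, String fileId,
                                     long nowMs, long maxHeartbeatIntervalMs) {
        return writers.stream()
            // Strictly earlier instants only: this skips the current instant and all later ones.
            .filter(w -> w.instantTime.compareTo(currentInstant) < 0)
            // Skip writers whose heartbeat has stopped.
            .filter(w -> nowMs - w.lastHeartbeatMs <= maxHeartbeatIntervalMs)
            // Skip pending compactions so they never block regular writes.
            .filter(w -> !w.pendingCompaction)
            .flatMap(w -> w.markers.stream())
            // A surviving marker that names the same file id is a conflict.
            .anyMatch(marker -> marker.contains(fileId));
    }
}
```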
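Next, let's see how the check works with the timeline server:
First, fetch all pending markers from the server; if the marker directory does not exist and there are no pending markers, return immediately.
Then merge the markers already written for this instant with the pending ones into the current set.
Then obtain the candidate instants and read their markers.
Finally, check whether the file ids of the two sets overlap, or whether there is a conflict with a completed commit; if so, set the conflict flag via CAS.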
@Override
public void run() {
// If a conflict among multiple writers is already detected,
// there is no need to run the detection again.
if (hasConflict.get()) {
return;
}
try {
Set<String> pendingMarkers = markerHandler.getPendingMarkersToProcess(markerDir);
if (!fs.exists(new Path(markerDir)) && pendingMarkers.isEmpty()) {
return;
}
HoodieTimer timer = HoodieTimer.start();
Set<String> currentInstantAllMarkers = new HashSet<>();
// We need to check both the markers already written to the storage
// and the markers from the requests pending processing.
currentInstantAllMarkers.addAll(markerHandler.getAllMarkers(markerDir));
currentInstantAllMarkers.addAll(pendingMarkers);
Path tempPath = new Path(basePath + Path.SEPARATOR + HoodieTableMetaClient.TEMPFOLDER_NAME);
List<Path> instants = MarkerUtils.getAllMarkerDir(tempPath, fs);
HoodieTableMetaClient metaClient =
HoodieTableMetaClient.builder().setConf(new Configuration()).setBasePath(basePath)
.setLoadActiveTimelineOnLoad(true).build();
HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline();
List<String> candidate = MarkerUtils.getCandidateInstants(activeTimeline, instants,
MarkerUtils.markerDirToInstantTime(markerDir), maxAllowableHeartbeatIntervalInMs, fs, basePath);
Set<String> tableMarkers = candidate.stream().flatMap(instant -> {
return MarkerUtils.readTimelineServerBasedMarkersFromFileSystem(instant, fs, new HoodieLocalEngineContext(new Configuration()), 100)
.values().stream().flatMap(Collection::stream);
}).collect(Collectors.toSet());
Set<String> currentFileIDs = currentInstantAllMarkers.stream().map(MarkerUtils::makerToPartitionAndFileID).collect(Collectors.toSet());
Set<String> tableFilesIDs = tableMarkers.stream().map(MarkerUtils::makerToPartitionAndFileID).collect(Collectors.toSet());
currentFileIDs.retainAll(tableFilesIDs);
if (!currentFileIDs.isEmpty()
|| (checkCommitConflict && MarkerUtils.hasCommitConflict(activeTimeline,
currentInstantAllMarkers.stream().map(MarkerUtils::makerToPartitionAndFileID).collect(Collectors.toSet()), completedCommits))) {
LOG.warn("Conflict writing detected based on markers!\n"
+ "Conflict markers: " + currentInstantAllMarkers + "\n"
+ "Table markers: " + tableMarkers);
hasConflict.compareAndSet(false, true);
}
LOG.info("Finish batching marker-based conflict detection in " + timer.endTimer() + " ms");
} catch (IOException e) {
throw new HoodieIOException("IOException occurs during checking marker conflict");
}
}
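The shape of this asynchronous check (a periodic task computes an intersection and latches a flag that writers merely read) can be shown in isolation. The sketch is a hypothetical model: AsyncMarkerDetectionSketch is an invented name, file ids are plain strings, and a Supplier stands in for reading the other writers' markers from storage.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical periodic task that latches a conflict flag.
class AsyncMarkerDetectionSketch implements Runnable {
    private final AtomicBoolean hasConflict = new AtomicBoolean(false);
    private final Set<String> currentFileIds;
    private final Supplier<Set<String>> otherWritersFileIds;

    AsyncMarkerDetectionSketch(Set<String> currentFileIds, Supplier<Set<String>> otherWritersFileIds) {
        this.currentFileIds = new HashSet<>(currentFileIds);
        this.otherWritersFileIds = otherWritersFileIds;
    }

    @Override
    public void run() {
        // Once a conflict has been seen there is no need to look again.
        if (hasConflict.get()) {
            return;
        }
        Set<String> overlap = new HashSet<>(currentFileIds);
        overlap.retainAll(otherWritersFileIds.get());
        if (!overlap.isEmpty()) {
            // CAS: the flag only ever moves from false to true.
            hasConflict.compareAndSet(false, true);
        }
    }

    /** What a writer calls before creating a marker; it never blocks on the scan. */
    boolean hasMarkerConflict() {
        return hasConflict.get();
    }
}
```

In a real deployment such a task would be scheduled repeatedly, for example with a ScheduledExecutorService; the trade-off is that a writer may see a result that is one scheduling interval old.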
resolveMarkerConflict is identical in both implementations: it simply throws an exception.
@Override
public void resolveMarkerConflict(String basePath, String markerDir, String markerName) {
throw new HoodieEarlyConflictDetectionException(new ConcurrentModificationException("Early conflict detected but cannot resolve conflicts for overlapping writes"));
}