Flink CEP internals, part 2 (source code analysis)

Flink CEP source code analysis

1 Recap of the previous post

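The previous post covered the principles behind CEP, its buffer, and its data structures:
https://blog.csdn.net/u013052725/article/details/100631259
This post looks at how Flink implements them. The CEP version analyzed is 1.6.0.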

2 First, five questions to consider:

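a) How are events, the relations between events, and their versions stored in Flink?
b) How is the computation state implemented in Flink?
c) When an event arrives, how are the applicable edges (take, proceed, ignore) determined from the computation state, and how is the computation state updated?
d) How are the version numbers between events generated?
e) How are results produced by backtracking through the buffered data and the version numbers?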

3 Source code analysis

a) First, how Flink implements the shared versioned match buffer.
The buffer is implemented by the SharedBuffer class.

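SharedBuffer holds its data in the three structures below.
Note: MapState is Flink state. It can be thought of as a map whose contents may live in memory or be persisted to disk; see the official documentation for details:
https://ci.apache.org/projects/flink/flink-docs-release-1.9/zh/dev/stream/state/state.html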

	/** The buffer holding the unique events seen so far. */
	private MapState<EventId, Lockable<V>> eventsBuffer;

	/** The number of events seen so far in the stream per timestamp. */
	private MapState<Long, Integer> eventsCount;
	private MapState<NodeId, Lockable<SharedBufferNode>> entries;

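EventId: the id assigned to each event.
NodeId: an EventId plus the name of the state the event was matched in.
SharedBufferNode: holds the edges of one node. There may be several, kept as a collection; each edge has a version number and a target node.
eventsBuffer: key is the EventId, value is the event itself. It stores every matched event so that the output step can look up the data by id.
eventsCount: key is an event timestamp, value is the number of events seen so far with that timestamp.
entries: key is the NodeId, which identifies an event within the buffer; value is the SharedBufferNode, which holds the predecessor nodes of this node for each version. Matched events are stored here together with the relations between events and their version numbers. This is the structure the matching process mainly works with.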
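
To make the three maps concrete, here is a minimal in-memory sketch in Java. It is not Flink's SharedBuffer: a plain HashMap stands in for MapState, the Lockable reference counting is left out, and the class name, the method names (registerEvent, put) and the string-encoded ids are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the three SharedBuffer maps (not Flink's code).
final class SharedBufferSketch {
    // An edge stored in a node: its version plus the previous (target) node, null for the first event.
    static final class Edge {
        final String version;
        final String target;
        Edge(String version, String target) { this.version = version; this.target = target; }
    }

    // eventsBuffer: event id -> event value; ids are encoded here as "timestamp#counter".
    final Map<String, Object> eventsBuffer = new HashMap<>();
    // eventsCount: timestamp -> number of events seen so far with that timestamp.
    final Map<Long, Integer> eventsCount = new HashMap<>();
    // entries: node id ("stateName@eventId") -> edges pointing to predecessor nodes.
    final Map<String, List<Edge>> entries = new HashMap<>();

    // Stores the event value and returns an id that is unique per (timestamp, counter).
    String registerEvent(Object value, long timestamp) {
        int counter = eventsCount.merge(timestamp, 1, Integer::sum) - 1;
        String eventId = timestamp + "#" + counter;
        eventsBuffer.put(eventId, value);
        return eventId;
    }

    // Adds an edge from the node (stateName, eventId) back to previousNode with the given version.
    String put(String stateName, String eventId, String previousNode, String version) {
        String nodeId = stateName + "@" + eventId;
        entries.computeIfAbsent(nodeId, k -> new ArrayList<>()).add(new Edge(version, previousNode));
        return nodeId;
    }

    public static void main(String[] args) {
        SharedBufferSketch buffer = new SharedBufferSketch();
        String e10 = buffer.registerEvent(10, 1000L);
        String e20 = buffer.registerEvent(20, 2000L);
        String start = buffer.put("start", e10, null, "1");
        String next = buffer.put("next", e20, start, "1.0");
        // The "next" node points back to the "start" node through an edge versioned 1.0.
        System.out.println(next + " -> " + buffer.entries.get(next).get(0).target);
    }
}
```

The point of the sketch is the direction of the edges: every node points backwards to its predecessor, so a match is read out by walking from the last event to the first.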

b) The data structure of the computation state in Flink

The number of ComputationState objects equals the number of current runs; each ComputationState holds the current state of one run.

// pointer to the NFA currentStateName of the computation
	private final String currentStateName;

	// The current version of the currentStateName to discriminate the valid pattern paths in the SharedBuffer
	private final DeweyNumber version;

	// Timestamp of the first element in the pattern
	private final long startTimestamp;

	@Nullable
	private final NodeId previousBufferEntry;

	@Nullable
	private final EventId startEventID;

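currentStateName: the NFA state the run is currently in. The transition conditions of that state are looked up by this name to decide how an arriving event moves the run.
version: the version number of the current state. It is used to compute the version of the next state and to backtrack the events when the run finishes.
startTimestamp: timestamp of the first event of the run, used to detect timeouts.
startEventID: id of the first event of the run.
previousBufferEntry: the node of the last event in the current run. If the next arriving event is taken, it becomes the new last node and points back to this one.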
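
The version field is a Dewey number, a dot-separated list of digits. The following Java sketch shows the two operations the code below relies on, increase and addStage. It is a simplified stand-in written for this post, not Flink's DeweyNumber class; the class name DeweySketch is hypothetical.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for Flink's DeweyNumber (digits separated by dots).
final class DeweySketch {
    private final int[] digits;

    DeweySketch(int... digits) { this.digits = digits.clone(); }

    // Returns a copy whose last digit is increased by `times`.
    DeweySketch increase(int times) {
        int[] copy = digits.clone();
        copy[copy.length - 1] += times;
        return new DeweySketch(copy);
    }

    // Returns a copy with one more stage: a trailing 0 digit is appended.
    DeweySketch addStage() {
        return new DeweySketch(Arrays.copyOf(digits, digits.length + 1));
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < digits.length; i++) {
            if (i > 0) sb.append('.');
            sb.append(digits[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        DeweySketch v = new DeweySketch(1);
        System.out.println(v.addStage());             // 1.0
        System.out.println(v.increase(1));            // 2
        System.out.println(v.addStage().addStage());  // 1.0.0
    }
}
```

Adding a stage keeps the old version as a prefix of the new one, which is what later lets the backtracking step tell the paths of different runs apart.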

c) Processing flow when an event arrives

The doProcess method below handles each arriving event:

	private Collection<Map<String, List<T>>> doProcess(
			final SharedBuffer<T> sharedBuffer,
			final NFAState nfaState,
			final EventWrapper event,
			final AfterMatchSkipStrategy afterMatchSkipStrategy) throws Exception {
		// ComputationStates that still need to be matched (partial matches)
		final PriorityQueue<ComputationState> newPartialMatches = new PriorityQueue<>(NFAState.COMPUTATION_STATE_COMPARATOR);
		// ComputationStates that have reached the final state (completed matches)
		final PriorityQueue<ComputationState> potentialMatches = new PriorityQueue<>(NFAState.COMPUTATION_STATE_COMPARATOR);

		// iterate over all ComputationStates that are currently waiting for events
		for (ComputationState computationState : nfaState.getPartialMatches()) {
			// compute the next ComputationState(s) from the current one and the event, and store the matched event in the SharedBuffer; this method is important and is explained below
			final Collection<ComputationState> newComputationStates = computeNextStates(
				sharedBuffer,
				computationState,
				event,
				event.getTimestamp());

			if (newComputationStates.size() != 1) {
				nfaState.setStateChanged();
			} else if (!newComputationStates.iterator().next().equals(computationState)) {
				nfaState.setStateChanged();
			}

			//delay adding new computation states in case a stop state is reached and we discard the path.
			final Collection<ComputationState> statesToRetain = new ArrayList<>();
			//if stop state reached in this path
			boolean shouldDiscardPath = false;
			// sort each new state into the list it belongs to
			for (final ComputationState newComputationState : newComputationStates) {

				if (isFinalState(newComputationState)) {
					potentialMatches.add(newComputationState);
				} else if (isStopState(newComputationState)) {
					//reached stop state. release entry for the stop state
					shouldDiscardPath = true;
					sharedBuffer.releaseNode(newComputationState.getPreviousBufferEntry());
				} else {
					// add new computation state; it will be processed once the next event arrives
					statesToRetain.add(newComputationState);
				}
			}

			if (shouldDiscardPath) {
				// a stop state was reached in this branch. release branch which results in removing previous event from
				// the buffer
				for (final ComputationState state : statesToRetain) {
					sharedBuffer.releaseNode(state.getPreviousBufferEntry());
				}
			} else {
				newPartialMatches.addAll(statesToRetain);
			}
		}

		if (!potentialMatches.isEmpty()) {
			nfaState.setStateChanged();
		}

		List<Map<String, List<T>>> result = new ArrayList<>();
		if (afterMatchSkipStrategy.isSkipStrategy()) {
			processMatchesAccordingToSkipStrategy(sharedBuffer,
				nfaState,
				afterMatchSkipStrategy,
				potentialMatches,
				newPartialMatches,
				result);
		} else {
			for (ComputationState match : potentialMatches) {
				Map<EventId, T> eventsCache = new HashMap<>();
				Map<String, List<T>> materializedMatch =
					sharedBuffer.materializeMatch(
						// backtrack the event sequence of this completed match (taken from potentialMatches); extractPatterns is important and is explained below
						sharedBuffer.extractPatterns(
							match.getPreviousBufferEntry(),
							match.getVersion()).get(0),
						eventsCache
					);

				result.add(materializedMatch);
				sharedBuffer.releaseNode(match.getPreviousBufferEntry());
			}
		}

		nfaState.setNewPartialMatches(newPartialMatches);

		return result;
	}

Next, the computeNextStates method.
It computes one or more new ComputationStates from the current computationState and the event, and stores the matched event in the SharedBuffer.

private Collection<ComputationState> computeNextStates(
			final SharedBuffer<T> sharedBuffer,
			final ComputationState computationState,
			final EventWrapper event,
			final long timestamp) throws Exception {
		
		final ConditionContext<T> context = new ConditionContext<>(this, sharedBuffer, computationState);
		// compute all transition edges for this event from the current computationState
		final OutgoingEdges<T> outgoingEdges = createDecisionGraph(context, computationState, event.getEvent());
		
		// Create the computing version based on the previously computed edges
		// We need to defer the creation of computation states until we know how many edges start
		// at this computation state so that we can assign proper version
		final List<StateTransition<T>> edges = outgoingEdges.getEdges();
		int takeBranchesToVisit = Math.max(0, outgoingEdges.getTotalTakeBranches() - 1);
		int ignoreBranchesToVisit = outgoingEdges.getTotalIgnoreBranches();
		int totalTakeToSkip = Math.max(0, outgoingEdges.getTotalTakeBranches() - 1);

		final List<ComputationState> resultingComputationStates = new ArrayList<>();
		// iterate over all transition edges
		for (StateTransition<T> edge : edges) {
			switch (edge.getAction()) {
				case IGNORE: { // IGNORE edge
					if (!isStartState(computationState)) {
						final DeweyNumber version;
						if (isEquivalentState(edge.getTargetState(), getState(computationState))) {
							//Stay in the same state (it can be either looping one or singleton)
							final int toIncrease = calculateIncreasingSelfState(
								outgoingEdges.getTotalIgnoreBranches(),
								outgoingEdges.getTotalTakeBranches());
							version = computationState.getVersion().increase(toIncrease);
						} else {
							//IGNORE after PROCEED
							version = computationState.getVersion()
								.increase(totalTakeToSkip + ignoreBranchesToVisit)
								.addStage();
							ignoreBranchesToVisit--;
						}

						addComputationState(
							sharedBuffer,
							resultingComputationStates,
							edge.getTargetState(),
							computationState.getPreviousBufferEntry(),
							version,
							computationState.getStartTimestamp(),
							computationState.getStartEventID()
						);
					}
				}
				break;
				case TAKE: // TAKE edge
					final State<T> nextState = edge.getTargetState();
					final State<T> currentState = edge.getSourceState();
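					// node of the last event buffered for the current computationState
					final NodeId previousEntry = computationState.getPreviousBufferEntry();
					// increase the version according to the number of TAKE branches still to visit
					final DeweyNumber currentVersion = computationState.getVersion().increase(takeBranchesToVisit);
					// after a TAKE, add a stage to the version, i.e. make it one digit longer; the previous version is a prefix of the new one, which is how backtracking decides whether an edge belongs to the same run (illustrated by the demo later)
					final DeweyNumber nextVersion = new DeweyNumber(currentVersion).addStage();
					takeBranchesToVisit--;
					// store the current event in the sharedBuffer with an edge to previousEntry, versioned currentVersion, for later backtracking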
					final NodeId newEntry = sharedBuffer.put(
						currentState.getName(),
						event.getEventId(),
						previousEntry,
						currentVersion);

					final long startTimestamp;
					final EventId startEventId;
					if (isStartState(computationState)) {
						startTimestamp = timestamp;
						startEventId = event.getEventId();
					} else {
						startTimestamp = computationState.getStartTimestamp();
						startEventId = computationState.getStartEventID();
					}
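					// create the updated ComputationState: previousEntry becomes newEntry (the event that just arrived), the version becomes nextVersion (with the added stage), and the state becomes nextState, the target of the TAKE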
					addComputationState(
							sharedBuffer,
							resultingComputationStates,
							nextState,
							newEntry,
							nextVersion,
							startTimestamp,
							startEventId);

					//check if newly created state is optional (have a PROCEED path to Final state)
					final State<T> finalState = findFinalStateAfterProceed(context, nextState, event.getEvent());
					if (finalState != null) {
						addComputationState(
								sharedBuffer,
								resultingComputationStates,
								finalState,
								newEntry,
								nextVersion,
								startTimestamp,
								startEventId);
					}
					break;
			}
		}

		if (isStartState(computationState)) {
			int totalBranches = calculateIncreasingSelfState(
					outgoingEdges.getTotalIgnoreBranches(),
					outgoingEdges.getTotalTakeBranches());

			DeweyNumber startVersion = computationState.getVersion().increase(totalBranches);
			ComputationState startState = ComputationState.createStartState(computationState.getCurrentStateName(), startVersion);
			resultingComputationStates.add(startState);
		}

		if (computationState.getPreviousBufferEntry() != null) {
			// release the shared entry referenced by the current computation state.
			sharedBuffer.releaseNode(computationState.getPreviousBufferEntry());
		}
		// return the updated ComputationStates
		return resultingComputationStates;
	}
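
The version formulas in the TAKE branch and in the "IGNORE after PROCEED" branch can be replayed on plain integer arrays. The sketch below does this for a run in state next with version 1.0.0 that receives the second 20 of the demo in section 4. The branch counts (one TAKE, one IGNORE) are an assumption inferred from the demo output, and the class and helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Replays the version arithmetic of the TAKE and "IGNORE after PROCEED" branches on int arrays.
final class VersionArithmeticSketch {
    // Same effect as increase(times) in the code above: bump the last digit.
    static int[] increase(int[] v, int times) {
        int[] copy = v.clone();
        copy[copy.length - 1] += times;
        return copy;
    }

    // Same effect as addStage(): append a 0 digit.
    static int[] addStage(int[] v) {
        return Arrays.copyOf(v, v.length + 1);
    }

    // Renders the digits in dotted form, e.g. 1.0.0.
    static String show(int[] v) {
        return Arrays.stream(v).mapToObj(d -> Integer.toString(d)).collect(Collectors.joining("."));
    }

    // Version of the ComputationState reached through a TAKE edge (nextVersion).
    static int[] takeVersion(int[] current, int takeBranchesToVisit) {
        return addStage(increase(current, takeBranchesToVisit));
    }

    // Version of the ComputationState reached through an IGNORE edge that follows a PROCEED.
    static int[] ignoreAfterProceedVersion(int[] current, int totalTakeToSkip, int ignoreBranchesToVisit) {
        return addStage(increase(current, totalTakeToSkip + ignoreBranchesToVisit));
    }

    public static void main(String[] args) {
        int[] current = {1, 0, 0};   // run in state "next" before the second 20 arrives
        int totalTake = 1;           // assumed: one TAKE edge
        int totalIgnore = 1;         // assumed: one IGNORE edge
        int takeBranchesToVisit = Math.max(0, totalTake - 1);
        int totalTakeToSkip = Math.max(0, totalTake - 1);
        System.out.println(show(takeVersion(current, takeBranchesToVisit)));                        // 1.0.0.0
        System.out.println(show(ignoreAfterProceedVersion(current, totalTakeToSkip, totalIgnore))); // 1.0.1.0
    }
}
```

Under these assumptions the two results, 1.0.0.0 and 1.0.1.0, agree with the versions of the next and next1 ComputationStates printed in the demo output after the second 20.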

Next, the sharedBuffer.extractPatterns(match.getPreviousBufferEntry(), match.getVersion()) method.

For each ComputationState in potentialMatches that has reached the final state, it backtracks the completed event sequence.

	public List<Map<String, List<EventId>>> extractPatterns(
			final NodeId nodeId,
			final DeweyNumber version) throws Exception {

		List<Map<String, List<EventId>>> result = new ArrayList<>();

		// stack to remember the current extraction states
		Stack<ExtractionState> extractionStates = new Stack<>();

		// get the starting shared buffer entry for the previous relation
		Lockable<SharedBufferNode> entryLock = entries.get(nodeId);

		if (entryLock != null) {
			SharedBufferNode entry = entryLock.getElement();
			extractionStates.add(new ExtractionState(Tuple2.of(nodeId, entry), version, new Stack<>()));

			// use a depth first search to reconstruct the previous relations
			while (!extractionStates.isEmpty()) {
				final ExtractionState extractionState = extractionStates.pop();
				// current path of the depth first search
				final Stack<Tuple2<NodeId, SharedBufferNode>> currentPath = extractionState.getPath();
				final Tuple2<NodeId, SharedBufferNode> currentEntry = extractionState.getEntry();

				// termination criterion
				if (currentEntry == null) {
					final Map<String, List<EventId>> completePath = new LinkedHashMap<>();

					while (!currentPath.isEmpty()) {
						final NodeId currentPathEntry = currentPath.pop().f0;

						String page = currentPathEntry.getPageName();
						List<EventId> values = completePath
							.computeIfAbsent(page, k -> new ArrayList<>());
						values.add(currentPathEntry.getEventId());
					}
					result.add(completePath);
				} else {

					// append state to the path
					currentPath.push(currentEntry);

					boolean firstMatch = true;
					for (SharedBufferEdge edge : currentEntry.f1.getEdges()) {
						// we can only proceed if the current version is compatible to the version
						// of this previous relation
						final DeweyNumber currentVersion = extractionState.getVersion();
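						// this check is the key step: if the version of this edge is a prefix of the current version, the edge's target becomes the entry of the next ExtractionState, where the next iteration starts and pushes it onto currentPath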
						if (currentVersion.isCompatibleWith(edge.getDeweyNumber())) {
							final NodeId target = edge.getTarget();
							Stack<Tuple2<NodeId, SharedBufferNode>> newPath;

							if (firstMatch) {
								// for the first match we don't have to copy the current path
								newPath = currentPath;
								firstMatch = false;
							} else {
								newPath = new Stack<>();
								newPath.addAll(currentPath);
							}
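							// if the check passes, push a new ExtractionState whose entry is this edge's target and whose version is this edge's version; the loop continues from it, so the iteration is driven by the extractionStates stack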
							extractionStates.push(new ExtractionState(
								target != null ? Tuple2.of(target, entries.get(target).getElement()) : null,
								edge.getDeweyNumber(),
								newPath));
						}
					}
				}

			}
		}
		return result;
	}
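
The compatibility check decides which edges may be followed. The Java sketch below restates the rule on plain integer arrays as I understand it: a version is compatible with another if the other is a proper prefix of it, or if both have the same length, agree on all digits but the last, and the last digit is not smaller. The same-length case is an assumption to verify against your Flink version, and the class name is hypothetical.

```java
// Hypothetical restatement of the version compatibility rule on plain int arrays.
final class CompatibilitySketch {
    // Returns true if an extraction running with version `self` may follow an edge versioned `other`.
    static boolean isCompatibleWith(int[] self, int[] other) {
        if (self.length > other.length) {
            // prefix case: every digit of `other` must match the start of `self`
            for (int i = 0; i < other.length; i++) {
                if (self[i] != other[i]) return false;
            }
            return true;
        } else if (self.length == other.length) {
            // same length (assumed rule): equal leading digits, last digit greater or equal
            int last = self.length - 1;
            for (int i = 0; i < last; i++) {
                if (self[i] != other[i]) return false;
            }
            return self[last] >= other[last];
        }
        // a shorter version is never compatible with a longer one
        return false;
    }

    public static void main(String[] args) {
        int[] match = {1, 0, 0, 0, 0};
        System.out.println(isCompatibleWith(match, new int[]{1, 0, 0, 0})); // true
        System.out.println(isCompatibleWith(match, new int[]{1, 0, 1, 0})); // false
    }
}
```

The second call shows why two matches that end in the same event do not mix: an edge whose version differs in any shared digit is rejected.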

4 Demo: buffer and ComputationState data at each stage

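import org.apache.flink.cep.scala.{CEP, PatternStream}
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.scala._

import scala.collection.Map

object DemoFlinkCep1 {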

  case class EventData(value:Int) // using Long here causes a problem; the cause still needs to be investigated

  def main(args: Array[String]): Unit = {

    val environment: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    environment.setParallelism(1)
    // input data
    val input: DataStream[String] = environment.socketTextStream("liangfeng02",9999)
    // simulated stock data
    val sharesStream: DataStream[EventData] = input.map(x=>EventData(x.toInt))
    sharesStream.print()
    // create the pattern
    val pattern: Pattern[EventData, EventData] = createPattern
    // apply the pattern to the input stream
    val patternStream: PatternStream[EventData] = CEP.pattern(sharesStream,pattern)
    // print the results
    printPatternStreamMethod1(patternStream)
    environment.execute("flinkDemo")


  }
   /**
    * Creates the pattern 10 20+ 30
    * @return
    */
  def createPattern() ={
    val pattern: Pattern[EventData, EventData] = Pattern
      .begin[EventData]("start").where(_.value.equals(10))
      .followedBy("next").where(_.value.equals(20)).oneOrMore
      .followedBy("next1").where(_.value.equals(30))
    pattern
  }
   /**
    * Prints the matched events
    * @param patternStream
    */
  def printPatternStreamMethod1(patternStream: PatternStream[EventData]): Unit ={
    val result: DataStream[Map[String, Iterable[EventData]]] = patternStream.select((data: Map[String, Iterable[EventData]]) => {
      data
    })
    result.print()
  }
}

Event sequence
EventData(10)
EventData(20)
EventData(20)
EventData(30)
Result
Map(start -> List(EventData(10)), next -> List(EventData(20), EventData(20)), next1 -> List(EventData(30)))
Map(start -> List(EventData(10)), next -> List(EventData(20)), next1 -> List(EventData(30)))

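How the ComputationState and buffer data change while the demo runs. The blocks below show, in order, the initial state and then the state after each of the four events. A buffer line reads: previous node <edge version> current node. The numbered lines are the ComputationStates in partialMatches.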

NFAState{partialMatches=[ComputationState{currentStateName=‘start’, version=1, startTimestamp=-1, previousBufferEntry=null, startEventID=null}], completedMatches=[], stateChanged=false}
ComputationState{currentStateName=‘start’, version=1, startTimestamp=-1, previousBufferEntry=null, startEventID=null}

partialMatches = {PriorityQueue@5127} size = 2
null null <1> NodeId{eventId=EventId{id=0, timestamp=1568879113365}, pageName=‘start’}
1 = {ComputationState@5918} “ComputationState{currentStateName=‘start’, version=2, startTimestamp=-1, previousBufferEntry=null, startEventID=null}”
0 = {ComputationState@5932} “ComputationState{currentStateName=‘next:1’, version=1.0, startTimestamp=1568879113365, previousBufferEntry=NodeId{eventId=EventId{id=0, timestamp=1568879113365}, pageName=‘start’}, startEventID=EventId{id=0, timestamp=1568879113365}}”

NodeId{eventId=EventId{id=0, timestamp=1568879113365}, pageName=‘start’} < 1.0 > NodeId{eventId=EventId{id=0, timestamp=1568882512873}, pageName=‘next’}

0 = {ComputationState@6594} “ComputationState{currentStateName=‘next’, version=1.0.0, startTimestamp=1568879113365, previousBufferEntry=NodeId{eventId=EventId{id=0, timestamp=1568882512873}, pageName=‘next’}, startEventID=EventId{id=0, timestamp=1568879113365}}”
1 = {ComputationState@6777} “ComputationState{currentStateName=‘start’, version=2, startTimestamp=-1, previousBufferEntry=null, startEventID=null}”


NodeId{eventId=EventId{id=0, timestamp=1568882512873}, pageName=‘next’} <1.0.0> NodeId{eventId=EventId{id=0, timestamp=1568883796484}, pageName=‘next’}

0 = {ComputationState@7417} “ComputationState{currentStateName=‘next’, version=1.0.0.0, startTimestamp=1568879113365, previousBufferEntry=NodeId{eventId=EventId{id=0, timestamp=1568883796484}, pageName=‘next’}, startEventID=EventId{id=0, timestamp=1568879113365}}”
1 = {ComputationState@7515} “ComputationState{currentStateName=‘next1’, version=1.0.1.0, startTimestamp=1568879113365, previousBufferEntry=NodeId{eventId=EventId{id=0, timestamp=1568882512873}, pageName=‘next’}, startEventID=EventId{id=0, timestamp=1568879113365}}”
2 = {ComputationState@6777} “ComputationState{currentStateName=‘start’, version=2, startTimestamp=-1, previousBufferEntry=null, startEventID=null}”


NodeId{eventId=EventId{id=0, timestamp=1568883796484}, pageName=‘next’} <1.0.0.0> NodeId{eventId=EventId{id=0, timestamp=1568884722992}, pageName=‘next1’}

NodeId{eventId=EventId{id=0, timestamp=1568882512873}, pageName=‘next’} <1.0.1.0> NodeId{eventId=EventId{id=0, timestamp=1568884722992}, pageName=‘next1’}

0 = {ComputationState@8481} “ComputationState{currentStateName=‘next:0’, version=1.0.0.1.0, startTimestamp=1568879113365, previousBufferEntry=NodeId{eventId=EventId{id=0, timestamp=1568883796484}, pageName=‘next’}, startEventID=EventId{id=0, timestamp=1568879113365}}”

1 = {ComputationState@8482} “ComputationState{currentStateName=‘$endState$’, version=1.0.0.0.0, startTimestamp=1568879113365, previousBufferEntry=NodeId{eventId=EventId{id=0, timestamp=1568884722992}, pageName=‘next1’}, startEventID=EventId{id=0, timestamp=1568879113365}}”
2 = ComputationState{currentStateName=‘$endState$’, version=1.0.1.0.0, startTimestamp=1568879113365, previousBufferEntry=NodeId{eventId=EventId{id=0, timestamp=1568884722992}, pageName=‘next1’}, startEventID=EventId{id=0, timestamp=1568879113365}}

3 = ComputationState{currentStateName=‘start’, version=2, startTimestamp=-1, previousBufferEntry=null, startEventID=null}

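The data above shows that whenever a run takes an event, a stage is added to its version. When there are several TAKE branches, or an IGNORE after a PROCEED, the current version is first increased by one and then the stage is added.
It also shows that the two ComputationStates that reached the final state at the end can be traced back to their event sequences through their versions.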
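
The backtracking of the two final states can be replayed by hand on the buffer contents listed above. The Java sketch below copies the four buffer nodes and their edge versions from the dump into a small map and walks them backwards. It uses a recursive depth-first search instead of the explicit stack in extractPatterns, shortens the node ids to readable labels, and assumes the compatibility rule described in the lead-in of the class; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Replays the backtracking on the demo's buffer contents (simplified model, hypothetical names).
final class BacktrackSketch {
    static final class Edge {
        final String version;
        final String target; // previous node, null at the beginning of a run
        Edge(String version, String target) { this.version = version; this.target = target; }
    }

    // node -> edges to predecessor nodes, copied from the dump above
    static final Map<String, List<Edge>> ENTRIES = new LinkedHashMap<>();
    static {
        ENTRIES.put("start:10", Arrays.asList(new Edge("1", null)));
        ENTRIES.put("next:20#1", Arrays.asList(new Edge("1.0", "start:10")));
        ENTRIES.put("next:20#2", Arrays.asList(new Edge("1.0.0", "next:20#1")));
        ENTRIES.put("next1:30", Arrays.asList(
            new Edge("1.0.0.0", "next:20#2"),
            new Edge("1.0.1.0", "next:20#1")));
    }

    // Assumed rule: the edge version is a proper prefix of the current version, or both have
    // the same length, equal leading digits, and the current last digit is not smaller.
    static boolean isCompatible(String current, String edge) {
        String[] c = current.split("\\.");
        String[] e = edge.split("\\.");
        if (c.length < e.length) return false;
        int prefix = c.length > e.length ? e.length : e.length - 1;
        for (int i = 0; i < prefix; i++) {
            if (!c[i].equals(e[i])) return false;
        }
        return c.length > e.length
            || Integer.parseInt(c[c.length - 1]) >= Integer.parseInt(e[e.length - 1]);
    }

    // Returns every event sequence (first event first) reachable from `node` with `version`.
    static List<List<String>> extract(String node, String version) {
        List<List<String>> result = new ArrayList<>();
        walk(node, version, new ArrayList<>(), result);
        return result;
    }

    private static void walk(String node, String version, List<String> path, List<List<String>> result) {
        if (node == null) {
            // reached the beginning of the run: the path was collected backwards, so reverse it
            List<String> complete = new ArrayList<>(path);
            Collections.reverse(complete);
            result.add(complete);
            return;
        }
        path.add(node);
        for (Edge edge : ENTRIES.get(node)) {
            if (isCompatible(version, edge.version)) {
                // continue with the edge's own version, as extractPatterns does
                walk(edge.target, edge.version, path, result);
            }
        }
        path.remove(path.size() - 1);
    }

    public static void main(String[] args) {
        System.out.println(extract("next1:30", "1.0.0.0.0")); // [[start:10, next:20#1, next:20#2, next1:30]]
        System.out.println(extract("next1:30", "1.0.1.0.0")); // [[start:10, next:20#1, next1:30]]
    }
}
```

The two paths correspond to the two result maps printed by the demo: version 1.0.0.0.0 yields the match with both 20 events, and version 1.0.1.0.0 yields the match with only the first 20.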
