A Custom Solution for Slow and Duplicate Execution of Activiti Async ServiceTasks

Environment

  1. SpringBoot V2.1.2
  2. Activiti V7.1.0.M1
  3. Java SE V11

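Problem

By default, an Activiti async ServiceTask waits 5 minutes before it is executed. In addition, if the ServiceTask takes longer than 5 minutes to run, Activiti considers the Job to have failed and executes it again, so the same async Job ends up running multiple times.

Our use case is running Python/Perl/Ansible scripts, where duplicate execution can cause errors, hence this article.

My knowledge is limited, so discussion and corrections are welcome :)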

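Activiti Async Job Processing Logic

An Activiti async Task is handled by 3 threads (there is actually also a Timer thread, which this article does not cover).

activiti-reset-expired-jobs (reset thread)

This thread fetches Jobs whose lock has expired, inserts a corresponding new Job for each one, and then deletes the expired Job. The newly inserted Job is picked up and processed by activiti-acquire-async-jobs.
The SQL is roughly as follows:

# Fetch expired Jobs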
select RES.ID_ from ACT_RU_JOB RES where RES.LOCK_EXP_TIME_ is not null and RES.LOCK_EXP_TIME_ < now() LIMIT 3 OFFSET 0;

# Insert a new Job (the new Job has LOCK_EXP_TIME_ = null)
insert into ACT_RU_JOB ( ID_, REV_, TYPE_, LOCK_OWNER_, LOCK_EXP_TIME_, EXCLUSIVE_, EXECUTION_ID_, 
PROCESS_INSTANCE_ID_, PROC_DEF_ID_, RETRIES_, 
EXCEPTION_STACK_ID_, EXCEPTION_MSG_, DUEDATE_, 
REPEAT_, HANDLER_TYPE_, HANDLER_CFG_, TENANT_ID_) values 
('e9e5033d-c94f-11e9-957c-fa163ed9ce93', 1, 'message', NULL, NULL, 1, 'e41a50e9-c94f-11e9-957c-fa163ed9ce93', 
'e419185d-c94f-11e9-957c-fa163ed9ce93', 'test_python2:3:bbb9a75c-c94a-11e9-978a-fa163ed9ce93', 1, 
NULL, NULL, NULL, NULL, 'async-continuation', NULL, '' ) 

# Delete the expired Job
delete from ACT_RU_JOB where ID_ = 'e41e6f9b-c94f-11e9-957c-fa163ed9ce93' and REV_ = 1 
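To make the reset step concrete, here is a minimal in-memory sketch of the same select/insert/delete cycle. `JobRow` and `resetExpired` are hypothetical names used only for illustration; they are not Activiti classes, and the real engine performs these steps through SQL.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

final class ResetExpiredJobsSketch {

    /** Hypothetical stand-in for one row of ACT_RU_JOB (only the columns that matter here). */
    static final class JobRow {
        final String id;
        final int rev;
        final Instant lockExpTime; // null means "not locked, ready to be acquired"

        JobRow(String id, int rev, Instant lockExpTime) {
            this.id = id;
            this.rev = rev;
            this.lockExpTime = lockExpTime;
        }
    }

    /** Replaces every row whose lock has expired with a fresh, unlocked row (REV_ = 1). */
    static List<JobRow> resetExpired(List<JobRow> table, Instant now) {
        List<JobRow> result = new ArrayList<>();
        for (JobRow row : table) {
            boolean expired = row.lockExpTime != null && row.lockExpTime.isBefore(now);
            if (expired) {
                // "insert a new Job" plus "delete the expired Job": the old row is dropped
                result.add(new JobRow(UUID.randomUUID().toString(), 1, null));
            } else {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2019-08-28T12:50:00Z");
        List<JobRow> table = new ArrayList<>();
        table.add(new JobRow("expired", 2, now.minusSeconds(1)));
        table.add(new JobRow("locked", 2, now.plusSeconds(300)));
        for (JobRow row : resetExpired(table, now)) {
            System.out.println(row.id + " rev=" + row.rev + " lockExpTime=" + row.lockExpTime);
        }
    }
}
```

The point to notice is that the reset thread decides purely on LOCK_EXP_TIME_; it has no way of telling whether the Job behind an expired lock is still running.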

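activiti-acquire-async-jobs (lock thread)

This thread fetches Jobs waiting to be executed (LOCK_EXP_TIME_ is null) and then locks them (for 5 minutes by default).
The SQL is roughly as follows:

# Fetch Jobs waiting to be executed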
select RES.* from ACT_RU_JOB RES where LOCK_EXP_TIME_ is null LIMIT 1 OFFSET 0 ;

# Lock the Job (LOCK_EXP_TIME_ = now() + 5 min)
update ACT_RU_JOB SET REV_ = 2, LOCK_EXP_TIME_ = '08/28/2019 12:54:42.366', LOCK_OWNER_ = '14f23e3a-6a4b-49f2-9ec2-5de7e8bbc517', 
RETRIES_ = 1, EXCEPTION_STACK_ID_ = NULL, EXCEPTION_MSG_ = NULL 
where ID_= 'e9e5033d-c94f-11e9-957c-fa163ed9ce93' and REV_ = 1 
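The `where ID_ = ... and REV_ = 1` clause in the update above is an optimistic lock: the update only takes effect if nobody else has changed the row in the meantime. The following sketch mimics that behaviour in memory. `JobRow` and `tryLock` are hypothetical names for illustration, not Activiti APIs.

```java
import java.time.Duration;
import java.time.Instant;

final class AcquireJobSketch {

    /** Hypothetical stand-in for one row of ACT_RU_JOB. */
    static final class JobRow {
        final String id;
        int rev = 1;
        Instant lockExpTime; // null until the Job is locked
        String lockOwner;

        JobRow(String id) {
            this.id = id;
        }
    }

    /** Mimics "update ... where ID_ = ? and REV_ = ?": succeeds only if REV_ is unchanged. */
    static boolean tryLock(JobRow row, int expectedRev, String owner, Instant now, Duration lockTime) {
        if (row.rev != expectedRev) {
            return false; // another acquirer updated the row first
        }
        row.rev = expectedRev + 1;
        row.lockExpTime = now.plus(lockTime);
        row.lockOwner = owner;
        return true;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2019-08-28T12:49:42Z");
        JobRow job = new JobRow("job-1");
        System.out.println("node-a locked: " + tryLock(job, 1, "node-a", now, Duration.ofMinutes(5)));
        System.out.println("node-b locked: " + tryLock(job, 1, "node-b", now, Duration.ofMinutes(5)));
        System.out.println("owner=" + job.lockOwner + " rev=" + job.rev + " lockExpTime=" + job.lockExpTime);
    }
}
```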

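activiti-async-job-executor-thread (execution thread)

This thread runs the Job logic. Its input comes either from activiti-acquire-async-jobs or from a call made by the previous node.
Processing steps:
1) Lock ACT_RU_EXECUTION. This lock is conditional; not every execution gets locked.
2) Run the business logic.
3) Update ACT_RU_EXECUTION so the process moves to the next node.
4) Delete the Job.
The SQL is roughly as follows:

# Lock the execution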
update ACT_RU_EXECUTION set LOCK_TIME_ = '08/28/2019 12:54:42.396' where ID_ = 'e419185d-c94f-11e9-957c-fa163ed9ce93' and (LOCK_TIME_ is null OR LOCK_TIME_ < '08/28/2019 12:54:37.396') 

# Move to the next node
update ACT_RU_EXECUTION set REV_ = 2, BUSINESS_KEY_ = NULL, PROC_DEF_ID_ = 'test_python2:3:bbb9a75c-c94a-11e9-978a-fa163ed9ce93', 
ACT_ID_ = 'EndEvent_1dj8vqa', IS_ACTIVE_ = 0, IS_CONCURRENT_ = 0, IS_SCOPE_ = 0, IS_EVENT_SCOPE_ 
= 0, IS_MI_ROOT_ = 0, PARENT_ID_ = 'e419185d-c94f-11e9-957c-fa163ed9ce93', SUPER_EXEC_ = NULL, 
ROOT_PROC_INST_ID_ = 'e419185d-c94f-11e9-957c-fa163ed9ce93', SUSPENSION_STATE_ = 1, NAME_ = 
NULL, IS_COUNT_ENABLED_ = 0, EVT_SUBSCR_COUNT_ = 0, TASK_COUNT_ = 0, JOB_COUNT_ = 0, TIMER_JOB_COUNT_ 
= 0, SUSP_JOB_COUNT_ = 0, DEADLETTER_JOB_COUNT_ = 0, VAR_COUNT_ = 0, ID_LINK_COUNT_ = 0 where 
ID_ = 'e41a50e9-c94f-11e9-957c-fa163ed9ce93' and REV_ = 1 

# Delete the Job
delete from ACT_RU_JOB where ID_ = 'e9e5033d-c94f-11e9-957c-fa163ed9ce93' and REV_ = 2 
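The execution lock in step 1 follows the condition visible in the sample SQL: the update goes through only when LOCK_TIME_ is null or older than a cutoff. In the sample the cutoff is 5 seconds before the new lock time; the sketch below assumes that gap is the configured lock time, which is a reading of the sample values rather than something the source states. `ExecutionRow` and `tryLockExecution` are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;

final class ExecutionLockSketch {

    /** Hypothetical stand-in for one row of ACT_RU_EXECUTION. */
    static final class ExecutionRow {
        Instant lockTime; // null means "not locked"
    }

    /** Mimics "set LOCK_TIME_ = now where LOCK_TIME_ is null or LOCK_TIME_ < now - timeout". */
    static boolean tryLockExecution(ExecutionRow row, Instant now, Duration lockTimeout) {
        Instant cutoff = now.minus(lockTimeout);
        if (row.lockTime == null || row.lockTime.isBefore(cutoff)) {
            row.lockTime = now;
            return true;
        }
        return false; // still held by someone else
    }

    public static void main(String[] args) {
        Duration timeout = Duration.ofSeconds(5); // assumption taken from the sample SQL
        Instant t0 = Instant.parse("2019-08-28T12:54:42Z");
        ExecutionRow row = new ExecutionRow();
        System.out.println("first lock:   " + tryLockExecution(row, t0, timeout));
        System.out.println("2 s later:    " + tryLockExecution(row, t0.plusSeconds(2), timeout));
        System.out.println("6 s later:    " + tryLockExecution(row, t0.plusSeconds(6), timeout));
    }
}
```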

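Problem Analysis

Async Jobs wait 5 minutes before being processed

When Activiti inserts a new Job, the default lock time is 5 minutes (LOCK_EXP_TIME_ = now() + 5 min), so by the time activiti-reset-expired-jobs is able to process the Job, 5 minutes have already passed.

Async Jobs are executed more than once

After activiti-acquire-async-jobs fetches a pending Job, it locks the Job, by default for 5 minutes. If your business logic takes longer than 5 minutes, the activiti-reset-expired-jobs thread picks the Job up again and creates a corresponding new Job, so your business logic is executed multiple times.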
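The duplicate-execution case comes down to a single comparison. The sketch below applies the reset thread's check (LOCK_EXP_TIME_ < now) to a Job that is assumed to be still running 4 and 6 minutes after it was locked. `looksExpired` is a hypothetical helper, not an Activiti method.

```java
import java.time.Duration;
import java.time.Instant;

final class DuplicateExecutionSketch {

    /** The reset thread's check: a lock counts as expired once LOCK_EXP_TIME_ < now. */
    static boolean looksExpired(Instant lockExpTime, Instant now) {
        return lockExpTime != null && lockExpTime.isBefore(now);
    }

    public static void main(String[] args) {
        Duration lockTime = Duration.ofMinutes(5); // the default lock time
        Instant lockedAt = Instant.parse("2019-08-28T12:00:00Z");
        Instant lockExpTime = lockedAt.plus(lockTime);

        // The script is assumed to still be running at both points in time.
        Instant after4Min = lockedAt.plus(Duration.ofMinutes(4));
        Instant after6Min = lockedAt.plus(Duration.ofMinutes(6));

        System.out.println("reset after 4 min? " + looksExpired(lockExpTime, after4Min));
        System.out.println("reset after 6 min? " + looksExpired(lockExpTime, after6Min));
    }
}
```

Once the check returns true, the reset thread re-creates the Job even though the first execution has not finished, which is how the second run starts.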

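Solution Approach

First, the 5-minute wait can be fixed through configuration alone; I changed it to 5 seconds (it was precisely because of this change that I discovered the duplicate-execution problem).

As for duplicate execution, the problem would be solved if the Job were locked as soon as it enters the execution thread, with a lock that truly holds (that is, one that activiti-reset-expired-jobs will not reset, since activiti-reset-expired-jobs unlocks Jobs by checking LOCK_EXP_TIME_ < now()).

Unfortunately, the ACT_RU_JOB table has no Status-like column. Since activiti-reset-expired-jobs checks the LOCK_EXP_TIME_ column, one option is to update LOCK_EXP_TIME_ to a time in the distant future when the Job enters the execution thread. However, I wanted to know explicitly whether a Job is in its initial state, locked and waiting for execution, or currently executing, which led to the final approach below.

The final approach uses the REV_ column, which holds the Job's version number. Its initial value is 1.
Here REV_ is given a new meaning and represents the Job's state:
1 - a newly arrived Job, in its initial state.
2 - a Job that has been locked and is waiting for execution.
3 - a Job that is currently executing.

Then modify the reset thread activiti-reset-expired-jobs to add the condition REV_ = 1, so that it only handles newly created Jobs and no longer touches Jobs that are ready to execute or already executing. This solves the duplicate-execution problem.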
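A minimal sketch of the REV_-as-state idea, kept in memory so it can run on its own. The `JobState` enum, `JobRow`, and `selectResettable` are hypothetical names introduced for illustration; the actual change is an extra `REV_ = 1` condition in the reset thread's query.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

final class RevStateSketch {

    /** The three meanings assigned to REV_ in this article. */
    enum JobState {
        NEW(1), LOCKED(2), RUNNING(3);

        final int rev;

        JobState(int rev) {
            this.rev = rev;
        }

        static JobState fromRev(int rev) {
            for (JobState state : values()) {
                if (state.rev == rev) {
                    return state;
                }
            }
            throw new IllegalArgumentException("unexpected REV_: " + rev);
        }
    }

    /** Hypothetical stand-in for one row of ACT_RU_JOB. */
    static final class JobRow {
        final String id;
        final int rev;
        final Instant lockExpTime;

        JobRow(String id, int rev, Instant lockExpTime) {
            this.id = id;
            this.rev = rev;
            this.lockExpTime = lockExpTime;
        }
    }

    /** The customized reset query: lock expired AND REV_ = 1. */
    static List<String> selectResettable(List<JobRow> table, Instant now) {
        List<String> ids = new ArrayList<>();
        for (JobRow row : table) {
            boolean expired = row.lockExpTime != null && row.lockExpTime.isBefore(now);
            if (expired && JobState.fromRev(row.rev) == JobState.NEW) {
                ids.add(row.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2019-08-28T12:50:00Z");
        Instant past = now.minusSeconds(10);
        List<JobRow> table = new ArrayList<>();
        table.add(new JobRow("new-job", 1, past));
        table.add(new JobRow("locked-job", 2, past));
        table.add(new JobRow("running-job", 3, past));
        System.out.println(selectResettable(table, now));
    }
}
```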

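As mentioned earlier, the Job execution thread has two input sources: activiti-acquire-async-jobs, and the previous node (such a Job does not go through reset/acquire and has REV_ = 1).

For a Job that has gone through reset/acquire, activiti-acquire-async-jobs already increments REV_ (to 2) when it locks the Job, which fits our scheme as it is.

For a Job that has not gone through reset/acquire, we need to set REV_ to 3 (that is, straight from 1 to 3).

So the uniform approach is to set REV_ to 3 as soon as the Job starts executing (regardless of whether it went through reset/acquire).

In summary, the following changes are needed:

  1. Change the lock time from the default 5 minutes to 5 seconds
  2. Modify the SQL that the reset thread activiti-reset-expired-jobs uses to fetch Jobs, adding the condition REV_ = 1
  3. Modify the execution thread activiti-async-job-executor-thread to update the Job's REV_ to 3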

The Code

Registering the custom AsyncExecutor with the Activiti process engine

@Component
public class OMPServerExeStarter implements CommandLineRunner
{
	private Logger logger = LoggerFactory.getLogger(OMPServerExeStarter.class);
	
	private final ProcessEngine processEngine;
	
	public OMPServerExeStarter(ProcessEngine processEngine)
	{
		this.processEngine = processEngine;
	}

	@Override
	public void run(String... args) throws Exception
	{
		System.out.println("--------OMPServerExeStarter--------");
		
		processEngine.getProcessEngineConfiguration().setAsyncExecutor(
				new MyAsyncExecutor((ProcessEngineConfigurationImpl)
						processEngine.getProcessEngineConfiguration()));
		processEngine.getProcessEngineConfiguration().setAsyncExecutorActivate(true);
		processEngine.getProcessEngineConfiguration().getAsyncExecutor().start();
	}
}

The custom AsyncExecutor

public class MyAsyncExecutor extends DefaultAsyncJobExecutor
{
	public MyAsyncExecutor(ProcessEngineConfigurationImpl processEngineConfiguration)
	{
		super();
		
		this.setAsyncJobLockTimeInMillis(5*1000); // lock for 5 seconds
		this.setResetExpiredJobsInterval(5*1000);
		this.setResetExpiredJobsRunnable(new MyResetExpiredJobsRunnable(this));
		processEngineConfiguration.setAsyncExecutorNumberOfRetries(1);
		this.setProcessEngineConfiguration(processEngineConfiguration);
	}
	
	protected Runnable createRunnableForJob(final Job job)
	{
		if (executeAsyncRunnableFactory == null)
		{
			return new MyExecuteAsyncRunnable(job, processEngineConfiguration);
		}
		else
		{
			return executeAsyncRunnableFactory.createExecuteAsyncRunnable(job, processEngineConfiguration);
		}
	}
	
}

The custom reset thread

public class MyResetExpiredJobsRunnable extends ResetExpiredJobsRunnable
{
	@Autowired
	private ProcessSrv processSrv;
	
	private static Logger log = LoggerFactory.getLogger(MyResetExpiredJobsRunnable.class);
	
	public MyResetExpiredJobsRunnable(AsyncExecutor asyncExecutor)
	{
		super(asyncExecutor);
	}
	
	
	public synchronized void run()
	{
		log.info("starting to reset expired jobs");
		Thread.currentThread().setName("activiti-reset-expired-jobs");
		
		if(this.processSrv == null)
			this.processSrv = ApplicationContextHolder.getBean(ProcessSrv.class);

		while (!isInterrupted)
		{

			try
			{
				//=================custom code===================
				List<String> expiredJobIds = processSrv.getExpiredJobs();
				//=================custom code===================
				if (expiredJobIds.size() > 0)
				{
					for(String str : expiredJobIds)
						log.info("got expired jobs:"+str);
					asyncExecutor.getProcessEngineConfiguration().getCommandExecutor()
							.execute(new ResetExpiredJobsCmd(expiredJobIds));
				}

			}
			catch (Throwable e)
			{
				if (e instanceof ActivitiOptimisticLockingException)
				{
					log.debug("Optimistic lock exception while resetting locked jobs", e);
				}
				else
				{
					log.error("exception during resetting expired jobs: {}", e.getMessage(), e);
				}
			}

			// Sleep
			try
			{

				synchronized (MONITOR)
				{
					if (!isInterrupted)
					{
						isWaiting.set(true);
						MONITOR.wait(asyncExecutor.getResetExpiredJobsInterval());
					}
				}

			}
			catch (InterruptedException e)
			{
				if (log.isDebugEnabled())
				{
					log.debug("async reset expired jobs wait interrupted");
				}
			}
			finally
			{
				isWaiting.set(false);
			}

		}

		log.info("stopped resetting expired jobs");
	}
}

The custom Job execution thread

public class MyExecuteAsyncRunnable extends ExecuteAsyncRunnable
{
	private static Logger log = LoggerFactory.getLogger(MyExecuteAsyncRunnable.class);
	private ProcessSrv processSrv;

	public MyExecuteAsyncRunnable(String jobId, ProcessEngineConfigurationImpl processEngineConfiguration)
	{
		super(jobId, processEngineConfiguration);
		this.processSrv = ApplicationContextHolder.getBean(ProcessSrv.class);
	}
	
	public MyExecuteAsyncRunnable(Job job, ProcessEngineConfigurationImpl processEngineConfiguration)
	{
		super(job, processEngineConfiguration);
		this.processSrv = ApplicationContextHolder.getBean(ProcessSrv.class);
	}
	

	protected boolean lockJobIfNeeded()
	{
		try
		{
			if (job.isExclusive())
			{
				processEngineConfiguration.getCommandExecutor().execute(new LockExclusiveJobCmd(job));
			}
			
			log.info("=========[lockJobIfNeeded] try customize upgradeJobVersion======"+job.getId());
			//============custom code==========
			processSrv.upgradeJobVersion(job.getId());
			//============custom code==========
		}
		catch (Throwable lockException)
		{
			if (log.isDebugEnabled())
			{
				log.debug("Could not lock exclusive job. Unlocking job so it can be acquired again. Caught exception: "
						+ lockException.getMessage());
			}

			// Release the job again so it can be acquired later or by another node
			unacquireJob();

			return false;
		}

		return true;
	}

}
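The `ProcessSrv` bean called in the code above is not shown in the article. As a rough idea of what its two methods might do, here is a hypothetical sketch using plain JDBC. The class name `ProcessSrvSketch`, the method signatures (including the extra `now` and `maxRows` parameters), and the exact SQL text are assumptions derived from the modification list earlier in the article, not the author's actual implementation.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

final class ProcessSrvSketch {

    // Assumed query: the stock "expired lock" condition plus the extra REV_ = 1 filter.
    static final String SELECT_EXPIRED_NEW_JOBS =
            "select RES.ID_ from ACT_RU_JOB RES"
            + " where RES.LOCK_EXP_TIME_ is not null and RES.LOCK_EXP_TIME_ < ? and RES.REV_ = 1";

    // Assumed update: mark the Job as "currently executing".
    static final String MARK_JOB_RUNNING =
            "update ACT_RU_JOB set REV_ = 3 where ID_ = ?";

    private final Connection connection;

    ProcessSrvSketch(Connection connection) {
        this.connection = connection;
    }

    /** Returns the IDs of Jobs whose lock has expired and that are still in the initial state. */
    List<String> getExpiredJobs(Instant now, int maxRows) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(SELECT_EXPIRED_NEW_JOBS)) {
            ps.setMaxRows(maxRows); // plays the role of LIMIT in the sample SQL
            ps.setTimestamp(1, Timestamp.from(now));
            try (ResultSet rs = ps.executeQuery()) {
                List<String> ids = new ArrayList<>();
                while (rs.next()) {
                    ids.add(rs.getString(1));
                }
                return ids;
            }
        }
    }

    /** Sets REV_ to 3 for the given Job and returns the number of rows updated. */
    int upgradeJobVersion(String jobId) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(MARK_JOB_RUNNING)) {
            ps.setString(1, jobId);
            return ps.executeUpdate();
        }
    }
}
```

In a Spring Boot application this logic would more likely live in a service backed by a mapper or `JdbcTemplate`; raw JDBC is used here only to keep the sketch free of extra dependencies.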