A Custom Fix for Slow and Duplicate Execution of Activiti Async ServiceTasks
Environment
- Spring Boot V2.1.2
- Activiti V7.1.0.M1
- Java SE V11
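Problem
By default, an Activiti async ServiceTask waits 5 minutes before it is executed. In addition, if the ServiceTask takes longer than 5 minutes to run, Activiti treats the Job as failed and executes it again, so a single async Job ends up running more than once.
In our case the ServiceTasks run Python/Perl/Ansible scripts, and running them twice can cause errors, which is what prompted this post.
I am no expert on the subject, so comments and corrections are welcome :)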
How Activiti processes async Jobs
An Activiti async Task is handled by three threads (there is also a Timer thread, which this post does not cover).
activiti-reset-expired-jobs (reset thread)
This thread fetches Jobs whose lock has expired, inserts a corresponding new Job for each one, and then deletes the expired Job. The newly inserted Job is later picked up and processed by activiti-acquire-async-jobs.
The SQL looks roughly like this:
-- Fetch expired Jobs
select RES.ID_ from ACT_RU_JOB RES where RES.LOCK_EXP_TIME_ is not null and RES.LOCK_EXP_TIME_ < now() LIMIT 3 OFFSET 0;
-- Insert a new Job (the new Job has LOCK_EXP_TIME_ = null)
insert into ACT_RU_JOB ( ID_, REV_, TYPE_, LOCK_OWNER_, LOCK_EXP_TIME_, EXCLUSIVE_, EXECUTION_ID_,
    PROCESS_INSTANCE_ID_, PROC_DEF_ID_, RETRIES_,
    EXCEPTION_STACK_ID_, EXCEPTION_MSG_, DUEDATE_,
    REPEAT_, HANDLER_TYPE_, HANDLER_CFG_, TENANT_ID_) values
    ('e9e5033d-c94f-11e9-957c-fa163ed9ce93', 1, 'message', NULL, NULL, 1, 'e41a50e9-c94f-11e9-957c-fa163ed9ce93',
    'e419185d-c94f-11e9-957c-fa163ed9ce93', 'test_python2:3:bbb9a75c-c94a-11e9-978a-fa163ed9ce93', 1,
    NULL, NULL, NULL, NULL, 'async-continuation', NULL, '' )
-- Delete the expired Job
delete from ACT_RU_JOB where ID_ = 'e41e6f9b-c94f-11e9-957c-fa163ed9ce93' and REV_ = 1
activiti-acquire-async-jobs (lock thread)
This thread fetches Jobs that are waiting to run (LOCK_EXP_TIME_ is null) and locks them (for 5 minutes by default).
The SQL looks roughly like this:
-- Fetch Jobs waiting to run
select RES.* from ACT_RU_JOB RES where LOCK_EXP_TIME_ is null LIMIT 1 OFFSET 0 ;
-- Lock the Job (LOCK_EXP_TIME_ = now() + 5min)
update ACT_RU_JOB SET REV_ = 2, LOCK_EXP_TIME_ = '08/28/2019 12:54:42.366', LOCK_OWNER_ = '14f23e3a-6a4b-49f2-9ec2-5de7e8bbc517',
    RETRIES_ = 1, EXCEPTION_STACK_ID_ = NULL, EXCEPTION_MSG_ = NULL
    where ID_= 'e9e5033d-c94f-11e9-957c-fa163ed9ce93' and REV_ = 1
activiti-async-job-executor-thread (execution thread)
This thread runs the Job's logic. Its input comes either from activiti-acquire-async-jobs or from a call made by the previous node.
Processing steps:
1) Lock ACT_RU_EXECUTION. This lock is conditional; not every execution gets locked.
2) Run the business logic.
3) Update ACT_RU_EXECUTION so that the process moves on to the next node.
4) Delete the Job.
The SQL looks roughly like this:
-- Lock the execution
update ACT_RU_EXECUTION set LOCK_TIME_ = '08/28/2019 12:54:42.396' where ID_ = 'e419185d-c94f-11e9-957c-fa163ed9ce93' and (LOCK_TIME_ is null OR LOCK_TIME_ < '08/28/2019 12:54:37.396')
-- Move to the next node
update ACT_RU_EXECUTION set REV_ = 2, BUSINESS_KEY_ = NULL, PROC_DEF_ID_ = 'test_python2:3:bbb9a75c-c94a-11e9-978a-fa163ed9ce93',
    ACT_ID_ = 'EndEvent_1dj8vqa', IS_ACTIVE_ = 0, IS_CONCURRENT_ = 0, IS_SCOPE_ = 0, IS_EVENT_SCOPE_ = 0,
    IS_MI_ROOT_ = 0, PARENT_ID_ = 'e419185d-c94f-11e9-957c-fa163ed9ce93', SUPER_EXEC_ = NULL,
    ROOT_PROC_INST_ID_ = 'e419185d-c94f-11e9-957c-fa163ed9ce93', SUSPENSION_STATE_ = 1, NAME_ = NULL,
    IS_COUNT_ENABLED_ = 0, EVT_SUBSCR_COUNT_ = 0, TASK_COUNT_ = 0, JOB_COUNT_ = 0, TIMER_JOB_COUNT_ = 0,
    SUSP_JOB_COUNT_ = 0, DEADLETTER_JOB_COUNT_ = 0, VAR_COUNT_ = 0, ID_LINK_COUNT_ = 0
    where ID_ = 'e41a50e9-c94f-11e9-957c-fa163ed9ce93' and REV_ = 1
-- Delete the Job
delete from ACT_RU_JOB where ID_ = 'e9e5033d-c94f-11e9-957c-fa163ed9ce93' and REV_ = 2
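Problem analysis
Async Jobs wait 5 minutes before being processed
When Activiti inserts a new Job, it locks it for 5 minutes by default (LOCK_EXP_TIME_ = now() + 5min), so activiti-reset-expired-jobs can only pick it up once those 5 minutes have passed.
Async Jobs run more than once
After activiti-acquire-async-jobs fetches a pending Job, it locks the Job, by default for 5 minutes. If your business logic takes longer than 5 minutes, activiti-reset-expired-jobs picks the Job up again and creates a corresponding new Job, so the business logic is executed multiple times.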
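The timing behind both symptoms can be illustrated with a small, self-contained model. This is only a sketch of the lock-expiry arithmetic described above, not Activiti code: the class LockTimelineSketch and its method names are made up for illustration, and the only thing taken from the engine is the reset condition LOCK_EXP_TIME_ < now().

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper (not part of Activiti) that models the lock-expiry check. */
final class LockTimelineSketch
{
    private LockTimelineSketch() {}

    /** LOCK_EXP_TIME_ as written when a Job is locked: lock time plus lock duration. */
    static Instant lockExpiry(Instant lockedAt, Duration lockTime)
    {
        return lockedAt.plus(lockTime);
    }

    /** Mirrors the reset thread's condition LOCK_EXP_TIME_ < now(). */
    static boolean isExpired(Instant lockExpiry, Instant now)
    {
        return lockExpiry.isBefore(now);
    }

    /** True if the lock runs out while the business logic is still executing. */
    static boolean resetWhileRunning(Duration lockTime, Duration runTime)
    {
        return runTime.compareTo(lockTime) > 0;
    }
}
```

With a 5-minute lock, a script that runs for 7 minutes gives resetWhileRunning(...) == true, which is the situation in which the reset thread re-creates the Job while the first run is still in progress.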
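Approach
The 5-minute wait is the easy part: it can be fixed through configuration. I reduced the lock time to 5 seconds (and it was precisely this change that exposed the duplicate-execution problem).
As for duplicate execution, the problem goes away if the Job is locked the moment it enters the execution thread, and locked for real, meaning a lock that activiti-reset-expired-jobs will not reset (that thread unlocks a Job whenever LOCK_EXP_TIME_ < now()).
Unfortunately, the ACT_RU_JOB table has no Status-like column. Since activiti-reset-expired-jobs checks LOCK_EXP_TIME_, one option is to set LOCK_EXP_TIME_ to a time in the distant future when the Job enters the execution thread. However, I also wanted to be able to tell whether a Job is in its initial state, locked and waiting to run, or currently running, which led to the final approach below.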
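The final approach uses the REV_ column, which holds the Job's revision number and starts at 1.
Here REV_ is given an additional meaning and represents the Job's state:
1 - a newly created Job, in its initial state.
2 - a Job that has been locked and is waiting to run.
3 - a Job that is currently running.
The reset thread activiti-reset-expired-jobs is then changed to add the condition REV_=1, so that it only handles newly created Jobs and leaves Jobs that are about to run or already running alone. That solves the duplicate-execution problem.
As mentioned earlier, the execution thread has two input sources: activiti-acquire-async-jobs and the previous node (a Job from the previous node does not go through reset/acquire and still has REV_=1).
For a Job that has gone through reset/acquire, activiti-acquire-async-jobs already increments REV_ (to 2) when it locks the Job, which fits the scheme.
For a Job that has not gone through reset/acquire, REV_ has to be set to 3 explicitly (that is, straight from 1 to 3).
The uniform rule is therefore to set REV_ to 3 as soon as the Job starts executing, whether or not it went through reset/acquire.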
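The state encoding and the changed reset condition can be written down as a small model. This is a sketch under assumptions: the names JobRevState and ResetFilterSketch are hypothetical and do not exist in Activiti; only the REV_ values 1, 2 and 3, their meanings, and the two conditions (LOCK_EXP_TIME_ < now(), plus REV_=1 in the customized version) come from the text above.

```java
import java.time.Instant;

/** Hypothetical names for the three REV_ values used as Job states. */
enum JobRevState
{
    NEW(1),      // newly created Job
    LOCKED(2),   // locked by the acquire thread, waiting to run
    RUNNING(3);  // picked up by the execution thread

    final int rev;

    JobRevState(int rev)
    {
        this.rev = rev;
    }
}

/** Compares the stock reset condition with the customized one. */
final class ResetFilterSketch
{
    private ResetFilterSketch() {}

    /** Stock condition: LOCK_EXP_TIME_ is not null and LOCK_EXP_TIME_ < now(). */
    static boolean stockEligible(Instant lockExpiry, Instant now)
    {
        return lockExpiry != null && lockExpiry.isBefore(now);
    }

    /** Customized condition: the stock condition plus REV_ = 1. */
    static boolean customEligible(int rev, Instant lockExpiry, Instant now)
    {
        return stockEligible(lockExpiry, now) && rev == JobRevState.NEW.rev;
    }
}
```

In this model, a Job with REV_=3 whose lock has expired is still matched by stockEligible but is rejected by customEligible, which is why a long-running Job is no longer reset.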
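In summary, the changes are:
- Reduce the lock time from the default 5 minutes to 5 seconds
- Change the SQL that the reset thread activiti-reset-expired-jobs uses to fetch Jobs, adding the condition REV_=1
- Change the execution thread activiti-async-job-executor-thread to update the Job's REV_ to 3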
The code
Registering the custom AsyncExecutor with the Activiti process engine
import org.activiti.engine.ProcessEngine;
import org.activiti.engine.impl.cfg.ProcessEngineConfigurationImpl;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

@Component
public class OMPServerExeStarter implements CommandLineRunner
{
    private static final Logger logger = LoggerFactory.getLogger(OMPServerExeStarter.class);
    private final ProcessEngine processEngine;

    public OMPServerExeStarter(ProcessEngine processEngine)
    {
        this.processEngine = processEngine;
    }

    @Override
    public void run(String... args) throws Exception
    {
        logger.info("--------OMPServerExeStarter--------");
        ProcessEngineConfigurationImpl config =
            (ProcessEngineConfigurationImpl) processEngine.getProcessEngineConfiguration();
        // Install the customized async executor, activate it and start it
        config.setAsyncExecutor(new MyAsyncExecutor(config));
        config.setAsyncExecutorActivate(true);
        config.getAsyncExecutor().start();
    }
}
The custom AsyncExecutor
import org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor;
import org.activiti.engine.impl.cfg.ProcessEngineConfigurationImpl;
import org.activiti.engine.runtime.Job;

public class MyAsyncExecutor extends DefaultAsyncJobExecutor
{
    public MyAsyncExecutor(ProcessEngineConfigurationImpl processEngineConfiguration)
    {
        super();
        this.setAsyncJobLockTimeInMillis(5 * 1000); // lock Jobs for 5 seconds
        this.setResetExpiredJobsInterval(5 * 1000); // run the reset thread every 5 seconds
        this.setResetExpiredJobsRunnable(new MyResetExpiredJobsRunnable(this));
        processEngineConfiguration.setAsyncExecutorNumberOfRetries(1);
        this.setProcessEngineConfiguration(processEngineConfiguration);
    }

    @Override
    protected Runnable createRunnableForJob(final Job job)
    {
        // Use the customized runnable unless a factory has been configured explicitly
        if (executeAsyncRunnableFactory == null)
        {
            return new MyExecuteAsyncRunnable(job, processEngineConfiguration);
        }
        else
        {
            return executeAsyncRunnableFactory.createExecuteAsyncRunnable(job, processEngineConfiguration);
        }
    }
}
The custom reset thread
import java.util.List;

import org.activiti.engine.ActivitiOptimisticLockingException;
import org.activiti.engine.impl.asyncexecutor.AsyncExecutor;
import org.activiti.engine.impl.asyncexecutor.ResetExpiredJobsCmd;
import org.activiti.engine.impl.asyncexecutor.ResetExpiredJobsRunnable;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MyResetExpiredJobsRunnable extends ResetExpiredJobsRunnable
{
    private static final Logger log = LoggerFactory.getLogger(MyResetExpiredJobsRunnable.class);

    // This object is created with "new", so Spring does not inject into it;
    // the bean is looked up from the application context in run() instead.
    private ProcessSrv processSrv;

    public MyResetExpiredJobsRunnable(AsyncExecutor asyncExecutor)
    {
        super(asyncExecutor);
    }

    @Override
    public synchronized void run()
    {
        log.info("starting to reset expired jobs");
        Thread.currentThread().setName("activiti-reset-expired-jobs");
        if (this.processSrv == null)
        {
            this.processSrv = ApplicationContextHolder.getBean(ProcessSrv.class);
        }
        while (!isInterrupted)
        {
            try
            {
                //================= custom code ===================
                // Only returns expired Jobs with REV_ = 1
                List<String> expiredJobIds = processSrv.getExpiredJobs();
                //================= custom code ===================
                if (!expiredJobIds.isEmpty())
                {
                    for (String jobId : expiredJobIds)
                    {
                        log.info("got expired job: {}", jobId);
                    }
                    asyncExecutor.getProcessEngineConfiguration().getCommandExecutor()
                        .execute(new ResetExpiredJobsCmd(expiredJobIds));
                }
            }
            catch (Throwable e)
            {
                if (e instanceof ActivitiOptimisticLockingException)
                {
                    log.debug("Optimistic lock exception while resetting locked jobs", e);
                }
                else
                {
                    log.error("exception during resetting expired jobs: {}", e.getMessage(), e);
                }
            }
            // Sleep until the next reset interval
            try
            {
                synchronized (MONITOR)
                {
                    if (!isInterrupted)
                    {
                        isWaiting.set(true);
                        MONITOR.wait(asyncExecutor.getResetExpiredJobsInterval());
                    }
                }
            }
            catch (InterruptedException e)
            {
                if (log.isDebugEnabled())
                {
                    log.debug("async reset expired jobs wait interrupted");
                }
            }
            finally
            {
                isWaiting.set(false);
            }
        }
        log.info("stopped resetting expired jobs");
    }
}
The custom Job execution thread
import org.activiti.engine.impl.asyncexecutor.ExecuteAsyncRunnable;
import org.activiti.engine.impl.cfg.ProcessEngineConfigurationImpl;
import org.activiti.engine.impl.cmd.LockExclusiveJobCmd;
import org.activiti.engine.runtime.Job;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MyExecuteAsyncRunnable extends ExecuteAsyncRunnable
{
    private static final Logger log = LoggerFactory.getLogger(MyExecuteAsyncRunnable.class);
    private final ProcessSrv processSrv;

    public MyExecuteAsyncRunnable(String jobId, ProcessEngineConfigurationImpl processEngineConfiguration)
    {
        super(jobId, processEngineConfiguration);
        this.processSrv = ApplicationContextHolder.getBean(ProcessSrv.class);
    }

    public MyExecuteAsyncRunnable(Job job, ProcessEngineConfigurationImpl processEngineConfiguration)
    {
        super(job, processEngineConfiguration);
        this.processSrv = ApplicationContextHolder.getBean(ProcessSrv.class);
    }

    @Override
    protected boolean lockJobIfNeeded()
    {
        try
        {
            if (job.isExclusive())
            {
                processEngineConfiguration.getCommandExecutor().execute(new LockExclusiveJobCmd(job));
            }
            log.info("[lockJobIfNeeded] trying custom upgradeJobVersion for job {}", job.getId());
            //============ custom code ==========
            // Mark the Job as running (REV_ = 3) so the reset thread ignores it
            processSrv.upgradeJobVersion(job.getId());
            //============ custom code ==========
        }
        catch (Throwable lockException)
        {
            if (log.isDebugEnabled())
            {
                log.debug("Could not lock exclusive job. Unlocking job so it can be acquired again. Caught exception: "
                    + lockException.getMessage());
            }
            // Release the job again so it can be acquired later or by another node
            unacquireJob();
            return false;
        }
        return true;
    }
}
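The listings above call processSrv.getExpiredJobs() and processSrv.upgradeJobVersion(...), but ProcessSrv itself is not shown. The following is a minimal sketch of what those two operations might look like with plain JDBC. Everything in it is an assumption: the class name ProcessSrvSketch, the method signatures (which take a Connection, unlike the calls above) and the exact SQL are illustrative, derived only from the rule "fetch expired Jobs with REV_=1" and "set REV_ to 3 when execution starts".

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the author's ProcessSrv; SQL and signatures are assumptions. */
final class ProcessSrvSketch
{
    // Same condition as the stock reset query, plus REV_ = 1 (new Jobs only)
    static final String FIND_EXPIRED_SQL =
        "select ID_ from ACT_RU_JOB"
        + " where LOCK_EXP_TIME_ is not null and LOCK_EXP_TIME_ < ? and REV_ = 1";

    // Marks a Job as running, regardless of whether REV_ was 1 or 2 before
    static final String MARK_RUNNING_SQL =
        "update ACT_RU_JOB set REV_ = 3 where ID_ = ?";

    private ProcessSrvSketch() {}

    /** Returns the IDs of at most maxRows expired Jobs that are still in the initial state. */
    static List<String> getExpiredJobs(Connection con, Timestamp now, int maxRows) throws SQLException
    {
        try (PreparedStatement ps = con.prepareStatement(FIND_EXPIRED_SQL))
        {
            ps.setTimestamp(1, now);
            ps.setMaxRows(maxRows);
            try (ResultSet rs = ps.executeQuery())
            {
                List<String> ids = new ArrayList<>();
                while (rs.next())
                {
                    ids.add(rs.getString(1));
                }
                return ids;
            }
        }
    }

    /** Sets REV_ to 3 for the given Job; returns true if exactly one row was updated. */
    static boolean upgradeJobVersion(Connection con, String jobId) throws SQLException
    {
        try (PreparedStatement ps = con.prepareStatement(MARK_RUNNING_SQL))
        {
            ps.setString(1, jobId);
            return ps.executeUpdate() == 1;
        }
    }
}
```

Because REV_ is also the column Activiti uses for optimistic locking, it is worth checking in your own setup that the engine still deletes the Job cleanly after the revision has been changed this way.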