Preface
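Not long ago I converted a single-node Quartz system into a multi-node one. I again chose Quartz Scheduler, clustered on MySQL, because it really is simple and easy to use. This post records how some features were implemented and the pitfalls I ran into. The main focus is how to write jobs, how to interrupt and recover them, and how to keep restarts from breaking them.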
Issues encountered
- Job serialization pitfall (Spring MethodInvoker)
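The project configures close to a hundred jobs with MethodInvokingJobDetailFactoryBean in XML. After switching Quartz to clustered mode, startup failed with NotSerializableException. Why? The source shows that the factory bean puts "methodInvoker" into the jobDataMap, pointing at this, which is a bean in the Spring context. It certainly cannot be serialized, so it cannot be persisted to the database. The Javadoc of MethodInvokingJobDetailFactoryBean also says plainly that persistence is not supported and that you should write your own job class if you need it.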
```java
// MethodInvokingJobDetailFactoryBean#afterPropertiesSet
@Override
public void afterPropertiesSet() throws ClassNotFoundException, NoSuchMethodException {
    prepare();
    // Use specific name if given, else fall back to bean name.
    String name = (this.name != null ? this.name : this.beanName);
    // Consider the concurrent flag to choose between stateful and stateless job.
    Class<? extends Job> jobClass = (this.concurrent ? MethodInvokingJob.class : StatefulMethodInvokingJob.class);
    // Build JobDetail instance.
    JobDetailImpl jdi = new JobDetailImpl();
    jdi.setName(name != null ? name : toString());
    jdi.setGroup(this.group);
    // jobClass is actually the inner class MethodInvokingJob or StatefulMethodInvokingJob
    jdi.setJobClass(jobClass);
    jdi.setDurability(true);
    // Stores the factory bean itself, which the inner MethodInvokingJob calls;
    // this entry is what cannot be serialized
    jdi.getJobDataMap().put("methodInvoker", this);
    this.jobDetail = jdi;
    postProcessJobDetail(this.jobDetail);
}

public static class MethodInvokingJob extends QuartzJobBean {...}

@DisallowConcurrentExecution
public static class StatefulMethodInvokingJob extends MethodInvokingJob {
}
```
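To see the failure in isolation, here is a minimal JDK-only sketch. It assumes, as Quartz's JDBC job store does by default (org.quartz.jobStore.useProperties=false), that the JobDataMap is written with plain Java serialization. SpringManagedBean, ReportService and generateDailyReport are hypothetical names. A map that holds a bean reference cannot be serialized, while a map that holds only strings can, which is why storing a class name and a method name instead of the bean itself avoids the problem.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

class JobDataSerializationDemo {
    // Hypothetical stand-in for a Spring-managed bean: it does not implement Serializable.
    static class SpringManagedBean {
        void doWork() { }
    }

    // Java-serializes a copy of the map, roughly what a JDBC job store does with a JobDataMap
    // before writing it to the BLOB column.
    static boolean canSerialize(Map<String, Object> jobData) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(new HashMap<>(jobData));
            return true;
        } catch (NotSerializableException e) {
            return false; // a value in the map is not Serializable
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> withBean = new HashMap<>();
        withBean.put("methodInvoker", new SpringManagedBean());

        Map<String, Object> withNames = new HashMap<>();
        withNames.put("targetClass", "com.example.ReportService");   // hypothetical class
        withNames.put("targetMethod", "generateDailyReport");        // hypothetical method

        System.out.println(canSerialize(withBean));   // false
        System.out.println(canSerialize(withNames));  // true
    }
}
```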
So I wrote my own along the same lines, adding interruption recovery and handling for proxied beans.
```java
import java.util.concurrent.ConcurrentHashMap;
import org.quartz.DisallowConcurrentExecution;
import org.quartz.Job;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.Scheduler;
import org.quartz.impl.JobDetailImpl;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.BeansException;
import org.springframework.beans.factory.BeanClassLoaderAware;
import org.springframework.beans.factory.BeanFactory;
import org.springframework.beans.factory.BeanFactoryAware;
import org.springframework.beans.factory.BeanNameAware;
import org.springframework.beans.factory.FactoryBean;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.beans.support.ArgumentConvertingMethodInvoker;
import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationContextAware;
import org.springframework.lang.Nullable;
import org.springframework.util.Assert;
import org.springframework.util.ClassUtils;
import org.springframework.util.MethodInvoker;
// AopTargetUtils is a project helper that unwraps a Spring AOP proxy (not shown here)

public class CustomizedMethodInvokingJobDetailFactoryBean extends ArgumentConvertingMethodInvoker
        implements FactoryBean<JobDetail>, BeanNameAware, BeanClassLoaderAware, BeanFactoryAware, InitializingBean, ApplicationContextAware {

    // Maps the real (unproxied) class name to its bean, which may be an AOP proxy
    private static final ConcurrentHashMap<String, Object> realClassName2ProxyObject = new ConcurrentHashMap<>();
    private static final Logger LOG = LoggerFactory.getLogger(CustomizedMethodInvokingJobDetailFactoryBean.class);

    @Nullable
    private String name;
    private String group = Scheduler.DEFAULT_GROUP;
    private boolean concurrent = true;
    @Nullable
    private String targetBeanName;
    @Nullable
    private String beanName;
    @Nullable
    private ClassLoader beanClassLoader = ClassUtils.getDefaultClassLoader();
    @Nullable
    private BeanFactory beanFactory;
    @Nullable
    private JobDetail jobDetail;
    /**
     * Whether the job is re-executed after being interrupted.
     * Whether a run counts as interrupted is decided from the records in the database tables,
     * so make the job idempotent if you enable this.
     */
    private boolean requestsRecovery = false;
    private static ApplicationContext applicationContext;

    public void setName(String name) {
        this.name = name;
    }

    public void setGroup(String group) {
        this.group = group;
    }

    public void setRequestsRecovery(boolean requestsRecovery) {
        this.requestsRecovery = requestsRecovery;
    }

    /**
     * Whether the job may run concurrently.
     */
    public void setConcurrent(boolean concurrent) {
        this.concurrent = concurrent;
    }

    public void setTargetBeanName(String targetBeanName) {
        this.targetBeanName = targetBeanName;
    }

    @Override
    public void setBeanName(String beanName) {
        this.beanName = beanName;
    }

    @Override
    public void setBeanClassLoader(ClassLoader classLoader) {
        this.beanClassLoader = classLoader;
    }

    @Override
    public void setBeanFactory(BeanFactory beanFactory) {
        this.beanFactory = beanFactory;
    }

    @Override
    protected Class<?> resolveClassName(String className) throws ClassNotFoundException {
        return ClassUtils.forName(className, this.beanClassLoader);
    }

    @Override
    public void afterPropertiesSet() throws ClassNotFoundException, NoSuchMethodException {
        prepare();
        // Use specific name if given, else fall back to bean name.
        String name = (this.name != null ? this.name : this.beanName);
        // Consider the concurrent flag to choose between stateful and stateless job.
        Class<? extends Job> jobClass = (this.concurrent ? BeanInvokingJob.class : StatefulBeanInvokingJob.class);
        // Build JobDetail instance.
        JobDetailImpl jdi = new JobDetailImpl();
        jdi.setName(name != null ? name : toString());
        jdi.setGroup(this.group);
        jdi.setJobClass(jobClass);
        jdi.setDurability(true);
        jdi.setRequestsRecovery(this.requestsRecovery);
        // Only serializable strings go into the JobDataMap: the real class name and the method name
        try {
            LOG.info("targetObject class name: {}", this.getTargetObject().getClass().getName());
            Object realObject = AopTargetUtils.getTarget(this.getTargetObject());
            jdi.getJobDataMap().put("targetClass", realObject.getClass().getName());
        } catch (Exception e) {
            // Pass the exception as the last argument so SLF4J logs its stack trace
            LOG.error("Failed to resolve the real target class for {}", name, e);
            // Fallback: strip a CGLIB-generated subclass to get the user class
            jdi.getJobDataMap().put("targetClass", ClassUtils.getUserClass(this.getTargetObject()).getName());
        }
        String targetClass = jdi.getJobDataMap().getString("targetClass");
        // Remember which bean (possibly a proxy) belongs to the real class name.
        // containsKey, not contains: ConcurrentHashMap#contains checks values, not keys.
        if (realClassName2ProxyObject.containsKey(targetClass)) {
            LOG.error("Target class {} has more than one bean/proxy bean", targetClass);
        } else {
            LOG.info("Recording targetClass: {} targetObject: {}", targetClass, this.getTargetObject());
            realClassName2ProxyObject.put(targetClass, this.getTargetObject());
        }
        jdi.getJobDataMap().put("targetMethod", this.getTargetMethod());
        this.jobDetail = jdi;
        postProcessJobDetail(this.jobDetail);
    }

    protected void postProcessJobDetail(JobDetail jobDetail) {
    }

    @Override
    public Class<?> getTargetClass() {
        Class<?> targetClass = super.getTargetClass();
        if (targetClass == null && this.targetBeanName != null) {
            Assert.state(this.beanFactory != null, "BeanFactory must be set when using 'targetBeanName'");
            targetClass = this.beanFactory.getType(this.targetBeanName);
        }
        return targetClass;
    }

    @Override
    public Object getTargetObject() {
        Object targetObject = super.getTargetObject();
        if (targetObject == null && this.targetBeanName != null) {
            Assert.state(this.beanFactory != null, "BeanFactory must be set when using 'targetBeanName'");
            targetObject = this.beanFactory.getBean(this.targetBeanName);
        }
        return targetObject;
    }

    @Override
    @Nullable
    public JobDetail getObject() {
        return this.jobDetail;
    }

    @Override
    public Class<? extends JobDetail> getObjectType() {
        return (this.jobDetail != null ? this.jobDetail.getClass() : JobDetail.class);
    }

    @Override
    public boolean isSingleton() {
        return true;
    }

    @Override
    public void setApplicationContext(ApplicationContext context) throws BeansException {
        applicationContext = context;
    }

    public static class BeanInvokingJob implements Job {
        @Override
        public void execute(JobExecutionContext context) throws JobExecutionException {
            try {
                LOG.info("start");
                String targetClass = context.getMergedJobDataMap().getString("targetClass");
                Class<?> clazz = Class.forName(targetClass);
                String targetMethod = context.getMergedJobDataMap().getString("targetMethod");
                if (targetMethod == null) {
                    throw new JobExecutionException("targetMethod cannot be null.", false);
                }
                Object argumentsObject = context.getMergedJobDataMap().get("arguments");
                Object[] arguments = (argumentsObject instanceof String) ? null : (Object[]) argumentsObject;
                // Prefer the recorded bean (possibly a proxy) so aspects such as transactions apply.
                // Look it up first: getBean(realClass) fails when the bean is a JDK dynamic proxy.
                Object bean = realClassName2ProxyObject.get(targetClass);
                if (bean == null) {
                    bean = applicationContext.getBean(clazz);
                }
                MethodInvoker beanMethod = new MethodInvoker();
                beanMethod.setTargetObject(bean);
                beanMethod.setTargetMethod(targetMethod);
                beanMethod.setArguments(arguments);
                beanMethod.prepare();
                LOG.info("Invoking Bean: {} ; Method: {}", clazz, targetMethod);
                beanMethod.invoke();
            } catch (JobExecutionException e) {
                throw e;
            } catch (Exception e) {
                throw new JobExecutionException(e);
            } finally {
                LOG.info("end");
            }
        }
    }

    @DisallowConcurrentExecution
    public static class StatefulBeanInvokingJob extends BeanInvokingJob {}
}
```
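Why keep realClassName2ProxyObject at all? The sketch below uses only java.lang.reflect.Proxy to show the two facts it relies on: a JDK dynamic proxy is not an instance of the implementation class, so looking a bean up by its real class does not find the proxy; and calling the raw target skips whatever the proxy adds (a counter here stands in for a transaction or logging aspect). ReportJob, ReportJobImpl and ProxyDemo are hypothetical. CGLIB proxies behave differently: they subclass the target, so instanceof succeeds, but getClass() returns the generated subclass, which is why the real class name is recorded separately.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

interface ReportJob {
    String run();
}

class ReportJobImpl implements ReportJob {
    @Override
    public String run() {
        return "done";
    }
}

class ProxyDemo {
    // Counts calls that pass through the "aspect"
    static final AtomicInteger adviceCalls = new AtomicInteger();

    static ReportJob wrap(ReportJob target) {
        InvocationHandler handler = (proxy, method, args) -> {
            adviceCalls.incrementAndGet();       // the cross-cutting logic
            return method.invoke(target, args);  // then delegate to the real object
        };
        return (ReportJob) Proxy.newProxyInstance(
                ReportJob.class.getClassLoader(), new Class<?>[] {ReportJob.class}, handler);
    }

    public static void main(String[] args) {
        ReportJobImpl target = new ReportJobImpl();
        ReportJob proxy = wrap(target);

        // A lookup by the implementation class cannot match the proxy
        System.out.println(proxy instanceof ReportJobImpl); // false

        target.run();                           // bypasses the advice
        proxy.run();                            // goes through the advice
        System.out.println(adviceCalls.get());  // 1
    }
}
```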
- Dynamic proxy issue
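If the method called by a method-invoking job has aspects applied, such as logging or transactions, the job must call the proxy bean. The custom class above already handles both kinds of proxy (JDK dynamic proxies and CGLIB).
- Misfire (missed executions)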
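Plenty of articles explain misfires well. In short, a misfire is a scheduled execution that was missed for some reason, and the compensation policy applied afterwards differs between SimpleTrigger and CronTrigger. For a CronTrigger whose job must not run concurrently, setting MISFIRE_INSTRUCTION_DO_NOTHING is enough. Also configure org.quartz.jobStore.misfireThreshold, which defines how late a fire must be before it counts as missed.
- Safe restart (scheduler shutdown)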
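This is mostly discussed later. Testing turned up some problems. Quartz Scheduler does have waitForJobsToCompleteOnShutdown, i.e. it waits for running jobs to finish when stopping, but the Spring integration seemed to have a problem and did not reliably wait for jobs to finish. With a custom thread pool there was another problem: when a job finished and needed to update the database, the scheduler had already stopped, giving java.lang.IllegalStateException: JobStore is shutdown.
After some debugging it worked: with Quartz's default thread pool and Spring's SchedulerFactoryBean, the scheduler does wait for jobs to finish before stopping. The earlier trouble came from also registering the quartzScheduler in Spring, so the scheduler was destroyed twice when the context was closed. I will analyze the details another time.
- Disallowing concurrent execution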
Use the @DisallowConcurrentExecution annotation; the custom class above already includes it. Note that disallowing concurrency does hold in a clustered environment:
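- even when misfire compensation fires the job
- even when the job is triggered manually, provided it is triggered through the Quartz API
- even when an interrupted job is recovered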
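As a mental model of why this holds across nodes and trigger types: when a trigger of a @DisallowConcurrentExecution job fires, Quartz's JDBC job store marks the job's other triggers as BLOCKED in the shared database until the run completes, so every node and every kind of fire sees the same state. The in-memory sketch below imitates that with a set of running job keys. It is an illustration only (NonConcurrentGate is hypothetical, not Quartz code), and unlike Quartz, which defers a blocked fire, it simply reports the fire as rejected.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class NonConcurrentGate {
    // In a Quartz cluster this state lives in the shared database; here it is in memory.
    private final Set<String> runningJobKeys = ConcurrentHashMap.newKeySet();

    // Returns true if the fire ran, false if the same job key was already running.
    // The check does not care whether the fire came from a cron trigger, a misfire
    // refire, a manual triggerJob call or a recovery.
    boolean tryFire(String jobKey, Runnable body) {
        if (!runningJobKeys.add(jobKey)) {
            return false;                  // already running: this fire is blocked
        }
        try {
            body.run();
            return true;
        } finally {
            runningJobKeys.remove(jobKey); // completion releases the block
        }
    }
}
```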
- Failure recovery
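The custom class above has a requestsRecovery property. Note, however, that a job that throws an exception is still considered to have completed normally. Recovery is actually driven by the records left behind in the qrtz_fired_triggers table, for example when a node dies in the middle of a run.
- Manually triggering a job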
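Use scheduler.triggerJob(jobKey) to fire the job once. This schedules an execution but does not guarantee it runs immediately; for example, it still has to wait for a free worker thread.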
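The misfire rule from the list above fits in a few lines. This sketch assumes a fixed-period schedule instead of a cron expression to keep the arithmetic visible; MisfireModel and its methods are hypothetical. A fire counts as misfired only when its scheduled time is older than now minus misfireThreshold (60000 ms by default), and MISFIRE_INSTRUCTION_DO_NOTHING then skips the missed fires and moves on to the first fire time after now (for a CronTrigger, Quartz computes this with getFireTimeAfter(now)).

```java
class MisfireModel {
    // Misfired when the scheduled fire time is older than now - misfireThreshold
    static boolean isMisfired(long nextFireTime, long now, long misfireThresholdMs) {
        return nextFireTime < now - misfireThresholdMs;
    }

    // Simplified DO_NOTHING for a fixed-period schedule starting at 'start':
    // skip every missed fire and return the first fire time strictly after 'now'.
    static long nextFireAfterDoNothing(long start, long periodMs, long now) {
        if (now < start) {
            return start;
        }
        long periodsElapsed = (now - start) / periodMs;
        return start + (periodsElapsed + 1) * periodMs;
    }

    public static void main(String[] args) {
        long threshold = 60_000;
        long now = 65_000;
        System.out.println(isMisfired(0, now, threshold));          // true: 65 s late
        System.out.println(isMisfired(10_000, now, threshold));     // false: 55 s late
        System.out.println(nextFireAfterDoNothing(0, 10_000, now)); // 70000
    }
}
```

A fire that is late but still within the threshold is not a misfire at all; it simply runs late, and no misfire instruction is applied.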
Key part: some thoughts on writing scheduled jobs (open for discussion)
- Short jobs with frequent scheduling vs. long jobs with infrequent scheduling
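Given Quartz's thread-pool model, a long-running job is unfriendly: it occupies a worker thread the whole time, and long jobs are also hard to handle when it comes to interruption recovery and graceful shutdown. Short jobs are probably the better approach.
- Handling exceptions in jobs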
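A job should catch and handle almost all exceptions itself, because if they are thrown to the scheduler, it does not know what to do with them either.
- How to shut down safely and respond to interruption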
- If all jobs are short, let running jobs finish before shutdown: set waitForJobsToCompleteOnShutdown to true on Spring's SchedulerFactoryBean and use Quartz's own thread pool.
```java
// SchedulerFactoryBean#destroy
@Override
public void destroy() throws SchedulerException {
    if (this.scheduler != null) {
        logger.info("Shutting down Quartz Scheduler");
        this.scheduler.shutdown(this.waitForJobsToCompleteOnShutdown);
    }
}

// QuartzScheduler#shutdown (excerpt)
public void shutdown(boolean waitForJobsToComplete) {
    //...
    schedThread.halt(waitForJobsToComplete);
    notifySchedulerListenersShuttingdown();
    if ((resources.isInterruptJobsOnShutdown() && !waitForJobsToComplete) ||
            (resources.isInterruptJobsOnShutdownWithWait() && waitForJobsToComplete)) {
        List<JobExecutionContext> jobs = getCurrentlyExecutingJobs();
        for (JobExecutionContext job : jobs) {
            if (job.getJobInstance() instanceof InterruptableJob)
                try {
                    ((InterruptableJob) job.getJobInstance()).interrupt();
                } catch (Throwable e) {
                    // do nothing, this was just a courtesy effort
                    getLog().warn("Encountered error when interrupting job {} during shutdown: {}", job.getJobDetail().getKey(), e);
                }
        }
    }
    // With a custom thread pool (a Spring TaskExecutor), this call does nothing
    resources.getThreadPool().shutdown(waitForJobsToComplete);
    closed = true;
    //...
}
```
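For the short-job case, the setup described above might translate into Quartz properties handed to SchedulerFactoryBean#setQuartzProperties, alongside setWaitForJobsToCompleteOnShutdown(true) on the factory bean. This is a sketch under assumptions: a DataSource is set on the factory bean (Spring then supplies its own LocalDataSourceJobStore when org.quartz.jobStore.class is left unset), no TaskExecutor is set, so Quartz keeps its SimpleThreadPool, and the thread count of 10 is an arbitrary example value. ClusterQuartzProperties is a hypothetical helper.

```java
import java.util.Properties;

class ClusterQuartzProperties {
    // Intended for SchedulerFactoryBean#setQuartzProperties. Do not call setTaskExecutor,
    // and do not register a second bean that also shuts the same Scheduler down.
    static Properties build() {
        Properties p = new Properties();
        p.setProperty("org.quartz.scheduler.instanceId", "AUTO");
        p.setProperty("org.quartz.jobStore.isClustered", "true");
        p.setProperty("org.quartz.jobStore.driverDelegateClass",
                "org.quartz.impl.jdbcjobstore.StdJDBCDelegate");
        // How late a fire may be before it counts as a misfire (60000 ms is the default)
        p.setProperty("org.quartz.jobStore.misfireThreshold", "60000");
        // Quartz's own pool, whose shutdown(true) really waits for running jobs
        p.setProperty("org.quartz.threadPool.class", "org.quartz.simpl.SimpleThreadPool");
        p.setProperty("org.quartz.threadPool.threadCount", "10");
        // Call interrupt() on InterruptableJob instances when shutting down with wait=true
        p.setProperty("org.quartz.scheduler.interruptJobsOnShutdownWithWait", "true");
        return p;
    }
}
```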
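- If some jobs run for a long time, do not set waitForJobsToCompleteOnShutdown; it is better to make those jobs idempotent or record their progress. If a thread running such a job never stops, the next step is kill -9, which can cause unpredictable problems.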
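- Alternatively, implement interruptible jobs and configure shutdown to wait for jobs to complete. This covers both kinds of jobs and avoids surprises in production.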
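A minimal sketch of such a job, written without the Quartz dependency so it runs on its own. In a real job the class would implement org.quartz.InterruptableJob, whose interrupt() method the shutdown code quoted above calls on running jobs when interruptJobsOnShutdown or interruptJobsOnShutdownWithWait is enabled. The job works in small units and checks a volatile flag between them, so it stops at a safe point shortly after the request. ChunkedInterruptibleJob and its afterEach hook are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntConsumer;

class ChunkedInterruptibleJob {
    // Written by the shutdown thread, read by the worker thread
    private volatile boolean interrupted = false;
    private int processed = 0;

    // Mirrors InterruptableJob#interrupt(): only sets a flag, the worker decides when to stop
    void interrupt() {
        interrupted = true;
    }

    // Handles one item at a time and checks the flag between items.
    // afterEach is a hook that receives the running count (used here to simulate shutdown).
    int execute(List<String> items, IntConsumer afterEach) {
        for (String item : items) {
            if (interrupted) {
                break; // safe point: remaining items are left for the next run
            }
            handle(item);
            processed++;
            afterEach.accept(processed);
        }
        return processed;
    }

    private void handle(String item) {
        // Hypothetical unit of work; keep it idempotent so a retried item is harmless
    }

    public static void main(String[] args) {
        ChunkedInterruptibleJob job = new ChunkedInterruptibleJob();
        // Simulate a shutdown request arriving after the third item
        int done = job.execute(Arrays.asList("a", "b", "c", "d", "e"), n -> {
            if (n == 3) {
                job.interrupt();
            }
        });
        System.out.println(done); // 3
    }
}
```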
- Quartz listeners
Whether it is a job listener or a trigger listener, catch all exceptions inside it.
- Using a custom thread pool?
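A custom thread pool may not interact well with Quartz. For example, the custom pool waits for all jobs to finish, but when they go to update the database, Quartz's job store has already shut down. Painful...
- Draw clear boundaries: scheduling is scheduling, jobs are jobs, and make jobs idempotent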
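Idempotency is what makes recovery, misfire refires and manual triggers safe to combine. A minimal sketch, assuming each run can be identified by a business key such as "settle:2024-05-01" (the key format and IdempotentTaskRunner are hypothetical). In production the completed-key set would be a database table with a unique key, so that all nodes and restarts share it; here it is in memory.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class IdempotentTaskRunner {
    private final Set<String> completedKeys = ConcurrentHashMap.newKeySet();

    // Runs the work only if this business key has not completed before.
    // Completion is recorded after the work succeeds, so a crash mid-run leads to a retry
    // rather than a lost run. Two simultaneous runs of the same key are assumed to be
    // prevented elsewhere, e.g. by @DisallowConcurrentExecution.
    boolean runOnce(String businessKey, Runnable work) {
        if (completedKeys.contains(businessKey)) {
            return false; // already done: a duplicate fire becomes a no-op
        }
        work.run();
        completedKeys.add(businessKey);
        return true;
    }
}
```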