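Purpose
In "springboot-Quartz integration: source code analysis and demo", the requirement was for a distributed cluster environment:
- Only one machine in the cluster may acquire the lock; the others must not.
- If that machine goes down, another machine must be able to take over.
This came up when I first designed the scheduling feature, and at the time I planned to implement a distributed lock on top of the database myself. It turned out that Quartz already implements one, so, in the spirit of understanding not just what works but why it works, this article looks at how Quartz does it.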
Quartz threading model
- ThreadExecutor: starts the scheduling thread (QuartzSchedulerThread)
- ThreadPool (SimpleThreadPool): the worker thread pool
Quartz tables
Table name | Description |
---|---|
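QRTZ_JOB_DETAILS | Job details, one row per job |
QRTZ_TRIGGERS | Trigger records. TRIGGER_TYPE identifies the kind of trigger: SIMPLE, CRON, DAILY_I, CAL_INT, or BLOB, corresponding to SimpleTrigger, CronTrigger, DailyTimeIntervalTrigger, CalendarIntervalTrigger, and triggers stored as blobs |
QRTZ_SIMPLE_TRIGGERS | Table for SIMPLE triggers; stores the repeat count, repeat interval, and number of times triggered |
QRTZ_BLOB_TRIGGERS | Table for BLOB triggers |
QRTZ_CRON_TRIGGERS | Table for CRON triggers |
QRTZ_FIRED_TRIGGERS | State of triggers that have been acquired or fired, plus execution information for the associated job |
QRTZ_CALENDARS | Quartz Calendar data |
QRTZ_LOCKS | Row-lock table; this is what a Quartz cluster uses to synchronize its instances |
QRTZ_SCHEDULER_STATE | Registered scheduler instances |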
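StdSchedulerFactory startup
The analysis in "springboot-Quartz integration: source code analysis and demo"
showed that starting Quartz through StdSchedulerFactory ends up calling
StdSchedulerFactory#private Scheduler instantiate() throws SchedulerException {}
Inside instantiate(),
several important properties are initialized.
Besides JobStore, these include ThreadPool and ThreadExecutor: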
private Scheduler instantiate() throws SchedulerException {
JobStore js = null;
ThreadPool tp = null;
ThreadExecutor threadExecutor;
}
ThreadPool
Quartz starts the ThreadPool (SimpleThreadPool), the worker thread pool:
private Scheduler instantiate() throws SchedulerException {
ThreadPool tp = null;
//defaults to SimpleThreadPool
String tpClass = cfg.getStringProperty(PROP_THREAD_POOL_CLASS, SimpleThreadPool.class.getName());
if (tpClass == null) {
initException = new SchedulerException(
"ThreadPool class not specified. ");
throw initException;
}
try {
tp = (ThreadPool) loadHelper.loadClass(tpClass).newInstance();
} catch (Exception e) {
...
}
tProps = cfg.getPropertyGroup(PROP_THREAD_POOL_PREFIX, true);
try {
setBeanProps(tp, tProps);
} catch (Exception e) {
...
}
...
tp.initialize();//starts the thread pool
...
}
Some default SimpleThreadPool properties are set in the
quartz.properties file bundled in quartz.jar, for example
org.quartz.threadPool.class: org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount: 10
org.quartz.threadPool.threadPriority: 5
org.quartz.threadPool.threadsInheritContextClassLoaderOfInitializingThread: true
tp.initialize() is SimpleThreadPool#initialize():
public class SimpleThreadPool implements ThreadPool {
private List<WorkerThread> workers;
private LinkedList<WorkerThread> availWorkers = new LinkedList<WorkerThread>();
private LinkedList<WorkerThread> busyWorkers = new LinkedList<WorkerThread>();
public void initialize() throws SchedulerConfigException {
...
// create the configured number of worker threads
Iterator<WorkerThread> workerThreads = createWorkerThreads(count).iterator();
// start each one in turn
while(workerThreads.hasNext()) {
WorkerThread wt = workerThreads.next();
wt.start();
//availWorkers holds the idle threads; at startup every worker is idle
availWorkers.add(wt);
}
}
}
WorkerThread
class WorkerThread extends Thread {
//AtomicBoolean flag: set to false once the thread has been told to stop
private AtomicBoolean run = new AtomicBoolean(true);
public void run(Runnable newRunnable) {
synchronized(lock) {
if(runnable != null) {
throw new IllegalStateException("Already running a Runnable!");
}
runnable = newRunnable;
lock.notifyAll();
}
}
public void run() {
boolean ran = false;
while (run.get()) {
try {
synchronized(lock) {
while (runnable == null && run.get()) {
lock.wait(500);
}
//calls run() on the supplied runnable directly, not start()
if (runnable != null) {
ran = true;
runnable.run();
}
}
} catch (InterruptedException unblock) {
...
} catch (Throwable exceptionInRunnable) {
...
} finally {
if (runOnce) {
run.set(false); clearFromBusyWorkersList(this);
} else if(ran) {
ran = false;
makeAvailable(this);
}
}
}
}
}
}
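The hand-off between the thread pool and a worker can be hard to see through the elided source. Below is a minimal sketch of the same wait/notify pattern, not Quartz's actual class: `WorkerHandOffSketch`, `Worker`, `submit`, `shutdown`, and `runOnWorker` are all hypothetical names, and the busy/available list bookkeeping is left out. It also runs the task after leaving the synchronized block, which is a simplification chosen for this sketch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class WorkerHandOffSketch {

    /** A stripped-down worker: parks on a lock until it is handed a Runnable. */
    static class Worker extends Thread {
        private final Object lock = new Object();
        private final AtomicBoolean running = new AtomicBoolean(true);
        private Runnable runnable; // guarded by lock

        /** Hands a task to the worker; fails if one is already pending. */
        void submit(Runnable task) {
            synchronized (lock) {
                if (runnable != null) {
                    throw new IllegalStateException("Already running a Runnable!");
                }
                runnable = task;
                lock.notifyAll(); // wake the worker if it is waiting
            }
        }

        /** Asks the worker loop to exit after its current task. */
        void shutdown() {
            running.set(false);
            synchronized (lock) {
                lock.notifyAll();
            }
        }

        @Override
        public void run() {
            while (running.get()) {
                Runnable task;
                synchronized (lock) {
                    while (runnable == null && running.get()) {
                        try {
                            lock.wait(500); // re-check the stop flag periodically
                        } catch (InterruptedException e) {
                            return;
                        }
                    }
                    task = runnable;
                    runnable = null;
                }
                if (task != null) {
                    task.run(); // plain method call: executes on this worker thread
                }
            }
        }
    }

    /** Runs one task on a named worker and returns the name of the thread that executed it. */
    public static String runOnWorker(String workerName) {
        Worker worker = new Worker();
        worker.setName(workerName);
        worker.start();
        String[] executedOn = new String[1];
        CountDownLatch done = new CountDownLatch(1);
        worker.submit(() -> {
            executedOn[0] = Thread.currentThread().getName();
            done.countDown();
        });
        try {
            boolean finished = done.await(5, TimeUnit.SECONDS);
            worker.shutdown();
            worker.join(5000);
            return finished ? executedOn[0] : null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnWorker("sketch-worker-1"));
    }
}
```

Because the worker calls `task.run()` rather than starting a new thread, the task reports the worker's own thread name, which is the point of the "run(), not start()" comment in the excerpt.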
ThreadExecutor
ThreadExecutor defaults to DefaultThreadExecutor. In instantiate():
private Scheduler instantiate() throws SchedulerException {
ThreadExecutor threadExecutor;
String threadExecutorClass = cfg.getStringProperty(PROP_THREAD_EXECUTOR_CLASS);
if (threadExecutorClass != null) {
tProps = cfg.getPropertyGroup(PROP_THREAD_EXECUTOR, true);
try {
threadExecutor = (ThreadExecutor) loadHelper.loadClass(threadExecutorClass).newInstance();
log.info("Using custom implementation for ThreadExecutor: " + threadExecutorClass);
setBeanProps(threadExecutor, tProps);
} catch (Exception e) {
initException = new SchedulerException(
"ThreadExecutor class '" + threadExecutorClass + "' could not be instantiated.", e);
throw initException;
}
} else {
log.info("Using default implementation for ThreadExecutor");
//fall back to the default DefaultThreadExecutor
threadExecutor = new DefaultThreadExecutor();
}
QuartzSchedulerResources rsrcs = new QuartzSchedulerResources();
rsrcs.setThreadExecutor(threadExecutor);
qs = new QuartzScheduler(rsrcs, idleWaitTime, dbFailureRetry);
}
Now look at the QuartzScheduler(rsrcs, idleWaitTime, dbFailureRetry) constructor:
public QuartzScheduler(QuartzSchedulerResources resources, long idleWaitTime, @Deprecated long dbRetryInterval)
throws SchedulerException {
...
this.schedThread = new QuartzSchedulerThread(this, resources);
//schedThreadExecutor can be set through
//org.quartz.threadExecutor.class; if it is not set, DefaultThreadExecutor is used
ThreadExecutor schedThreadExecutor = resources.getThreadExecutor();
//start the QuartzSchedulerThread
schedThreadExecutor.execute(this.schedThread);
...
}
public class DefaultThreadExecutor implements ThreadExecutor {
public void initialize() {
}
public void execute(Thread thread) {
thread.start();
}
}
QuartzSchedulerThread extends Thread. The method to look at is
QuartzSchedulerThread#run(): DefaultThreadExecutor#execute calls thread.start(), which runs
QuartzSchedulerThread's run() on a new thread.
public void run() {
...
//blocks until at least one worker thread is available
int availThreadCount = qsRsrcs.getThreadPool().blockForAvailableThreads();
if(availThreadCount > 0) {//always true at this point
List<OperableTrigger> triggers;
long now = System.currentTimeMillis();
clearSignaledSchedulingChange();
try {
//acquire the triggers that are due within the idle-wait window
triggers = qsRsrcs.getJobStore().acquireNextTriggers(
now + idleWaitTime, Math.min(availThreadCount, qsRsrcs.getMaxBatchSize()), qsRsrcs.getBatchTimeWindow());
...
List<TriggerFiredResult> res = qsRsrcs.getJobStore().triggersFired(triggers);
if(res != null)
bndles = res;
}
qsRsrcs.getJobStore().acquireNextTriggers()
qsRsrcs.getJobStore()=JobStoreSupport
public List<OperableTrigger> acquireNextTriggers(final long noLaterThan, final int maxCount, final long timeWindow)
throws JobPersistenceException {
String lockName;
if(isAcquireTriggersWithinLock() || maxCount > 1) {
lockName = LOCK_TRIGGER_ACCESS;
} else {
lockName = null;
}
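//This method first acquires the lock and runs in a non-managed transaction, meaning the code has to commit explicitly. The comment below is taken from the Quartz source.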
/**
* Execute the given callback having optionally acquired the given lock.
* This uses the non-managed transaction connection.
*
*
* @param lockName The name of the lock to acquire, for example
* "TRIGGER_ACCESS". If null, then no lock is acquired, but the
* lockCallback is still executed in a non-managed transaction.
*/
return executeInNonManagedTXLock(lockName,
new TransactionCallback<List<OperableTrigger>>() {
public List<OperableTrigger> execute(Connection conn) throws JobPersistenceException {
return acquireNextTrigger(conn, noLaterThan, maxCount, timeWindow);
}
},
new TransactionValidator<List<OperableTrigger>>() {
public Boolean validate(Connection conn, List<OperableTrigger> result) throws JobPersistenceException {
try {
List<FiredTriggerRecord> acquired = getDelegate().selectInstancesFiredTriggerRecords(conn, getInstanceId());
Set<String> fireInstanceIds = new HashSet<String>();
for (FiredTriggerRecord ft : acquired) {
fireInstanceIds.add(ft.getFireInstanceId());
}
for (OperableTrigger tr : result) {
if (fireInstanceIds.contains(tr.getFireInstanceId())) {
return true;
}
}
return false;
} catch (SQLException e) {
throw new JobPersistenceException("error validating trigger acquisition", e);
}
}
});
}
executeInNonManagedTXLock()
protected <T> T executeInNonManagedTXLock(
String lockName,
TransactionCallback<T> txCallback, final TransactionValidator<T> txValidator) throws JobPersistenceException {
boolean transOwner = false;
Connection conn = null;
try {
if (lockName != null) {
// If we aren't using db locks, then delay getting DB connection
// until after acquiring the lock since it isn't needed.
//The lock does not have to be database-backed,
//but in this setup it is: getLockHandler() returns org.quartz.impl.jdbcjobstore.StdRowLockSemaphore
if (getLockHandler().requiresConnection()) {
conn = getNonManagedTXConnection();
}
//Acquire the lock through the database.
//The row lock is taken with SELECT * FROM QRTZ_LOCKS WHERE SCHED_NAME = 'quartzScheduler' AND LOCK_NAME = 'TRIGGER_ACCESS' FOR UPDATE
transOwner = getLockHandler().obtainLock(conn, lockName);
}
if (conn == null) {
conn = getNonManagedTXConnection();
}
//Once the lock is held, every other thread or process blocks in obtainLock() above, while the holder carries on with its work
final T result = txCallback.execute(conn);
//when done, the transaction has to be committed explicitly
try {
commitConnection(conn);
} catch (JobPersistenceException e) {
..
}
Long sigTime = clearAndGetSignalSchedulingChangeOnTxCompletion();
if(sigTime != null && sigTime >= 0) {
signalSchedulingChangeImmediately(sigTime);
}
return result;
} catch (JobPersistenceException e) {
rollbackConnection(conn);
throw e;
} catch (RuntimeException e) {
rollbackConnection(conn);
throw new JobPersistenceException("Unexpected runtime exception: "
+ e.getMessage(), e);
} finally {
try {
releaseLock(lockName, transOwner);
} finally {
cleanupConnection(conn);
}
}
}
transOwner = getLockHandler().obtainLock(conn, lockName);
getLockHandler() returns
org.quartz.impl.jdbcjobstore.StdRowLockSemaphore
StdRowLockSemaphore.obtainLock(conn, lockName)
public boolean obtainLock(Connection conn, String lockName)
throws LockException {
//if this thread already owns the lock, there is no need to take the database lock again
if (!isLockOwner(lockName)) {
//!!! the key call that actually acquires the lock
executeSQL(conn, lockName, expandedSQL, expandedInsertSQL);
if(log.isDebugEnabled()) {
log.debug(
"Lock '" + lockName + "' given to: "
+ Thread.currentThread().getName());
}
getThreadLocks().add(lockName);
} else if(log.isDebugEnabled()) {
log.debug(
"Lock '" + lockName + "' Is already owned by: "
+ Thread.currentThread().getName());
}
return true;
}
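The `isLockOwner` / `getThreadLocks().add(lockName)` pair in the excerpt makes the lock re-entrant per thread: the database is only consulted when the current thread does not already hold the named lock. The sketch below illustrates that bookkeeping in isolation, assuming a per-thread set of lock names. The class name and the `databaseCalls` counter are hypothetical, and the counter merely stands in for the real `SELECT ... FOR UPDATE`.

```java
import java.util.HashSet;
import java.util.Set;

public class ReentrantLockBookkeepingSketch {

    // Each thread keeps its own set of the lock names it currently holds.
    private final ThreadLocal<Set<String>> heldLocks = ThreadLocal.withInitial(HashSet::new);

    // Hypothetical counter: how many times the database would have been hit.
    private int databaseCalls = 0;

    public boolean isLockOwner(String lockName) {
        return heldLocks.get().contains(lockName);
    }

    /** Takes the lock, going to the "database" only if this thread does not hold it yet. */
    public boolean obtainLock(String lockName) {
        if (!isLockOwner(lockName)) {
            databaseCalls++; // stand-in for executeSQL(...) in the excerpt
            heldLocks.get().add(lockName);
        }
        return true;
    }

    /** Forgets the lock for this thread; the real row lock is released by commit or rollback. */
    public void releaseLock(String lockName) {
        heldLocks.get().remove(lockName);
    }

    public int databaseCalls() {
        return databaseCalls;
    }

    public static void main(String[] args) {
        ReentrantLockBookkeepingSketch semaphore = new ReentrantLockBookkeepingSketch();
        semaphore.obtainLock("TRIGGER_ACCESS");
        semaphore.obtainLock("TRIGGER_ACCESS"); // second call is a no-op for the database
        System.out.println(semaphore.databaseCalls());
    }
}
```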
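executeSQL(conn, lockName, expandedSQL, expandedInsertSQL) acquires the lock through a database row lock. Some background on row locks in MySQL's InnoDB engine:
Q: What do InnoDB row locks attach to, and why?
A: InnoDB implements row locks on index records.
Example: select * from tab_with_index where id = 1 for update;
for update locks the rows matched by the condition, and here id is an indexed column.
If id were not indexed, InnoDB would have to scan the whole table and lock every row it examines, which behaves much like a table lock and rules out any real concurrency.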
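To make the locking pattern concrete, here is a minimal JDBC sketch of "take a row lock, do the work, commit". It is an illustration under stated assumptions rather than Quartz's implementation: `RowLockSketch`, `lockSql`, and `withRowLock` are hypothetical names, the scheduler name is bound as a parameter instead of being substituted into the SQL text, and a missing lock row is treated as an error rather than being inserted.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class RowLockSketch {

    /** Builds the locking query for a table prefix such as "QRTZ_". */
    public static String lockSql(String tablePrefix) {
        return "SELECT * FROM " + tablePrefix + "LOCKS"
                + " WHERE SCHED_NAME = ? AND LOCK_NAME = ? FOR UPDATE";
    }

    /**
     * Runs the given work while holding the row lock for lockName.
     * Other connections issuing the same SELECT ... FOR UPDATE block
     * until this transaction commits or rolls back.
     */
    public static void withRowLock(Connection conn, String schedName,
                                   String lockName, Runnable work) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // the row lock only lives inside a transaction
        try {
            try (PreparedStatement ps = conn.prepareStatement(lockSql("QRTZ_"))) {
                ps.setString(1, schedName);
                ps.setString(2, lockName);
                try (ResultSet rs = ps.executeQuery()) {
                    if (!rs.next()) {
                        // Assumption of this sketch: the lock row must already exist.
                        throw new SQLException("No row found for lock " + lockName);
                    }
                }
            }
            work.run();
            conn.commit();   // committing releases the row lock
        } catch (SQLException | RuntimeException e) {
            conn.rollback(); // rolling back releases it as well
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }

    public static void main(String[] args) {
        System.out.println(lockSql("QRTZ_"));
    }
}
```

The WHERE clause names both SCHED_NAME and LOCK_NAME, so whether the statement locks a single row depends on those columns being covered by an index, as discussed above.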
Once the lock is held, acquireNextTrigger(conn, noLaterThan, maxCount, timeWindow) runs. It collects the triggers that are due to fire; these are then handed to the ThreadPool for a WorkerThread to execute.
protected List<OperableTrigger> acquireNextTrigger(Connection conn, long noLaterThan, int maxCount, long timeWindow)
throws JobPersistenceException {
if (timeWindow < 0) {
throw new IllegalArgumentException();
}
List<OperableTrigger> acquiredTriggers = new ArrayList<OperableTrigger>();
Set<JobKey> acquiredJobKeysForNoConcurrentExec = new HashSet<JobKey>();
final int MAX_DO_LOOP_RETRY = 3;
int currentLoopCount = 0;
do {
currentLoopCount ++;
try {
//fetch the triggers due within a time window; the window is configurable
List<TriggerKey> keys = getDelegate().selectTriggerToAcquire(conn, noLaterThan + timeWindow, getMisfireTime(), maxCount);
// No trigger is ready to fire yet.
if (keys == null || keys.size() == 0)
return acquiredTriggers;
long batchEnd = noLaterThan;
for(TriggerKey triggerKey: keys) {
//query again to build the OperableTrigger. Why not load everything in getDelegate().selectTriggerToAcquire(conn, noLaterThan + timeWindow, getMisfireTime(), maxCount) itself?
OperableTrigger nextTrigger = retrieveTrigger(conn, triggerKey);
if(nextTrigger == null) {
continue; // next trigger
}
JobKey jobKey = nextTrigger.getJobKey();
JobDetail job;
try {
//use the trigger's jobKey to load the matching job from the job details table
job = retrieveJob(conn, jobKey);
} catch (JobPersistenceException jpe) {
...
continue;
}
...
//update the state in QRTZ_TRIGGERS from STATE_WAITING to STATE_ACQUIRED
int rowsUpdated = getDelegate().updateTriggerStateFromOtherState(conn, triggerKey, STATE_ACQUIRED, STATE_WAITING);
...
nextTrigger.setFireInstanceId(getFiredTriggerRecordId());
//insert a row for the about-to-fire trigger into QRTZ_FIRED_TRIGGERS
getDelegate().insertFiredTrigger(conn, nextTrigger, STATE_ACQUIRED, null);
...
acquiredTriggers.add(nextTrigger);
}
...
// We are done with the while loop.
break;
} catch (Exception e) {
...
}
} while (true);
// Return the acquired trigger list
return acquiredTriggers;
}
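The `updateTriggerStateFromOtherState(conn, triggerKey, STATE_ACQUIRED, STATE_WAITING)` call is a conditional update: it only changes the row if the trigger is still WAITING, and the row count tells the caller whether it won. The sketch below mimics that compare-and-set behaviour with an in-memory map in place of QRTZ_TRIGGERS. `TriggerStateSketch` and its methods are hypothetical names used only for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class TriggerStateSketch {

    // In-memory stand-in for the TRIGGER_STATE column, keyed by trigger name.
    private final ConcurrentMap<String, String> triggerStates = new ConcurrentHashMap<>();

    public void addWaitingTrigger(String triggerKey) {
        triggerStates.put(triggerKey, "WAITING");
    }

    /**
     * Mirrors "UPDATE ... SET TRIGGER_STATE = newState WHERE ... AND TRIGGER_STATE = oldState".
     * Returns the number of rows updated: 1 if the trigger was in oldState, otherwise 0.
     */
    public int updateStateFromOtherState(String triggerKey, String newState, String oldState) {
        return triggerStates.replace(triggerKey, oldState, newState) ? 1 : 0;
    }

    public String stateOf(String triggerKey) {
        return triggerStates.get(triggerKey);
    }

    public static void main(String[] args) {
        TriggerStateSketch table = new TriggerStateSketch();
        table.addWaitingTrigger("trigger-job2");
        // The first caller moves WAITING -> ACQUIRED; a second attempt matches no row.
        System.out.println(table.updateStateFromOtherState("trigger-job2", "ACQUIRED", "WAITING"));
        System.out.println(table.updateStateFromOtherState("trigger-job2", "ACQUIRED", "WAITING"));
    }
}
```

Even without the row lock, this conditional form means a trigger that is no longer WAITING cannot be acquired a second time by the same statement.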
How far ahead triggers are acquired
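The scheduling thread queries the qrtz_triggers table
for all triggers with NEXT_FIRE_TIME <= now + 30s + m (30s being the default idleWaitTime), ordered by fire time, and takes the first n.
Both m and n are configurable: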
- org.quartz.scheduler.batchTriggerAcquisitionMaxCount=n
- org.quartz.scheduler.batchTriggerAcquisitionFireAheadTimeWindow=m
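The cut-off follows directly from the two calls shown earlier: the scheduler thread passes `now + idleWaitTime` as `noLaterThan`, and `acquireNextTrigger` adds `timeWindow` on top. A small sketch of that arithmetic follows; the class and method names are hypothetical, and the sample timestamp is an illustrative value, not a measured one.

```java
public class AcquisitionWindowSketch {

    /** Latest NEXT_FIRE_TIME that is still eligible: now + idleWaitTime + timeWindow. */
    public static long upperBound(long nowMillis, long idleWaitTimeMillis, long timeWindowMillis) {
        long noLaterThan = nowMillis + idleWaitTimeMillis; // what the scheduler thread passes in
        return noLaterThan + timeWindowMillis;             // what the SELECT compares against
    }

    /** Number of triggers requested per pass: never more than the idle workers or the batch limit. */
    public static int batchSize(int availableThreads, int maxBatchSize) {
        return Math.min(availableThreads, maxBatchSize);
    }

    public static void main(String[] args) {
        long now = 1545293564800L; // illustrative timestamp in epoch milliseconds
        // 30 s idle wait, no fire-ahead window (m = 0)
        System.out.println(upperBound(now, 30_000L, 0L)); // prints 1545293594800
        // 10 idle workers, batch limit n = 1
        System.out.println(batchSize(10, 1));             // prints 1
    }
}
```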
SQL analysis
My local database is MySQL; the following statements turn on the general query log:
SET GLOBAL log_output = "FILE";
SET GLOBAL general_log_file = "C:/logs/query.log";
SET GLOBAL general_log = 'ON';
The logged SQL falls into the following groups:
-- acquire the lock
SELECT * FROM QRTZ_LOCKS WHERE SCHED_NAME = 'quartzScheduler' AND LOCK_NAME = 'TRIGGER_ACCESS'
for update;
-- fetch the triggers due within the time window
SELECT TRIGGER_NAME, TRIGGER_GROUP, NEXT_FIRE_TIME, PRIORITY FROM QRTZ_TRIGGERS WHERE SCHED_NAME = 'quartzScheduler' AND TRIGGER_STATE = 'WAITING' AND NEXT_FIRE_TIME <= 1545293594800 AND (MISFIRE_INSTR = -1 OR (MISFIRE_INSTR != -1 AND NEXT_FIRE_TIME >= 1545293504801)) ORDER BY NEXT_FIRE_TIME ASC, PRIORITY DESC
-- for each TRIGGER_NAME, TRIGGER_GROUP returned above, load the trigger, its type-specific row, and its job
SELECT * FROM QRTZ_TRIGGERS WHERE SCHED_NAME = 'quartzScheduler' AND TRIGGER_NAME = 'trigger-job2' AND TRIGGER_GROUP = 'group1'
SELECT * FROM QRTZ_SIMPLE_TRIGGERS WHERE SCHED_NAME = 'quartzScheduler' AND TRIGGER_NAME = 'trigger-job2' AND TRIGGER_GROUP = 'group1'
SELECT * FROM QRTZ_JOB_DETAILS WHERE SCHED_NAME = 'quartzScheduler' AND JOB_NAME = 'job2' AND JOB_GROUP = 'group1'
-- update the trigger's state in QRTZ_TRIGGERS
UPDATE QRTZ_TRIGGERS SET TRIGGER_STATE = 'ACQUIRED' WHERE SCHED_NAME = 'quartzScheduler' AND TRIGGER_NAME = 'trigger-job2' AND TRIGGER_GROUP = 'group1' AND TRIGGER_STATE = 'WAITING'
-- insert a row into QRTZ_FIRED_TRIGGERS in the ACQUIRED (about to fire) state
INSERT INTO QRTZ_FIRED_TRIGGERS (SCHED_NAME, ENTRY_ID, TRIGGER_NAME, TRIGGER_GROUP, INSTANCE_NAME, FIRED_TIME, SCHED_TIME, STATE, JOB_NAME, JOB_GROUP, IS_NONCONCURRENT, REQUESTS_RECOVERY, PRIORITY) VALUES('quartzScheduler', 'DESKTOP-TTESTAQ15452876685211545287681141', 'trigger-job2', 'group1', 'DESKTOP-TTESTAQ1545287668521', 1545293564804, 1545293566392, 'ACQUIRED', null, null, 0, 0, 5)
-- connection.commit()
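Conclusion
- Every machine in the cluster runs its own scheduler, and each scheduler has one scheduling thread. The scheduling thread queries the qrtz_triggers table for all triggers with NEXT_FIRE_TIME <= now + 30s + m, ordered by fire time, and takes the first n. Both m and n are configurable:
- org.quartz.scheduler.batchTriggerAcquisitionMaxCount=n
- org.quartz.scheduler.batchTriggerAcquisitionFireAheadTimeWindow=m
- SELECT ... FOR UPDATE on QRTZ_LOCKS acquires the row lock
- UPDATE QRTZ_TRIGGERS moves the state from WAITING to ACQUIRED
- INSERT INTO QRTZ_FIRED_TRIGGERS records the acquired, about-to-fire trigger
- connection.commit() releases the lock
- The acquired triggers are handed to a WorkerThread in the thread pool for execution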