I. Startup Flow
The scheduling center is bootstrapped by the XxlJobScheduler class. The source of its init() method is as follows:
public void init() throws Exception {
// Initialize the i18n resource files
initI18n();
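// Keeps the executor status in the scheduling center up to date:
// 1. Every 30 seconds, load the registered executor records from xxl_job_registry and use update_time to decide whether each one is alive
// 2. If update_time has not been refreshed for more than 90 seconds, the executor is treated as offline and its record is removed from xxl_job_registry
// 3. Match xxl_job_registry.registry_key against xxl_job_group.app_name to build an entity carrying the new address_list, then write that address_list back to xxl_job_group
// 4. If a group's records have all been removed from xxl_job_registry, nothing matches, the entity's address_list is null, and xxl_job_group.address_list is therefore set to null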
JobRegistryMonitorHelper.getInstance().start();
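// Handles failed triggers. alarm_status marks whether a failed log entry has been handled; 0 means not yet handled
// 1. Query failed log entries with alarm_status = 0 (covering both triggers that were not delivered and tasks that failed while running)
// 2. Change alarm_status from 0 to -1 (being handled) and load the job that the failed entry belongs to
// 3. Re-trigger the job (the number of retries equals executor_fail_retry_count)
// 4. If an email alert is configured, send it: alarm_status becomes 2 if the email was sent and 3 if sending failed; if no alert is needed it becomes 1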
JobFailMonitorHelper.getInstance().start();
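// Handles jobs that were dispatched but whose executor never reported a result
// 1. Query entries that were triggered successfully but have had no result for more than 10 minutes, and whose executor address is no longer in xxl_job_registry, i.e. the target executor has gone offline
// 2. Set handleCode of those entries to 500 (failed)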
JobLosedMonitorHelper.getInstance().start();
// Create two thread pools (a fast pool and a slow pool) used below to send triggers
JobTriggerPoolHelper.toStart();
// Compute daily counts of succeeded and failed runs and store the results in xxl_job_log_report
JobLogReportHelper.getInstance().start();
// Send triggers on schedule; the implementation is described below
JobScheduleHelper.getInstance().start();
logger.info(">>>>>>>>> init xxl-job admin success.");
}
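The registry check that JobRegistryMonitorHelper performs in steps 1 to 4 can be sketched in memory as follows. This is a hypothetical model, not xxl-job code: the Row class stands in for an xxl_job_registry row, plain collections replace the database, and keeping addresses in a sorted, de-duplicated TreeSet is a choice of this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class RegistryMonitorSketch {
    // 3 x the 30-second heartbeat interval, as in step 2 above
    static final long DEAD_TIMEOUT_MS = 90_000L;

    // Hypothetical in-memory stand-in for one xxl_job_registry row.
    static final class Row {
        final String registryKey;      // matches xxl_job_group.app_name
        final String registryValue;    // executor address
        final long updateTimeMillis;   // update_time

        Row(String registryKey, String registryValue, long updateTimeMillis) {
            this.registryKey = registryKey;
            this.registryValue = registryValue;
            this.updateTimeMillis = updateTimeMillis;
        }
    }

    // Returns the new address_list for each app name: live addresses joined by commas,
    // or null when no live executor is registered for that app.
    static Map<String, String> refreshAddressLists(List<Row> rows, Set<String> appNames, long nowMillis) {
        // Steps 1-2: keep only rows refreshed within the last 90 seconds.
        Map<String, TreeSet<String>> alive = new HashMap<>();
        for (Row row : rows) {
            if (nowMillis - row.updateTimeMillis <= DEAD_TIMEOUT_MS) {
                alive.computeIfAbsent(row.registryKey, k -> new TreeSet<>()).add(row.registryValue);
            }
        }
        // Steps 3-4: match registry_key to app_name; an unmatched group gets null.
        Map<String, String> result = new HashMap<>();
        for (String appName : appNames) {
            TreeSet<String> addresses = alive.get(appName);
            result.put(appName, addresses == null ? null : String.join(",", addresses));
        }
        return result;
    }
}
```

An app whose executors have all stopped sending heartbeats ends up with a null address list, which is exactly the case step 4 describes.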
II. Trigger Dispatch Flow
The core logic is in the XxlJobTrigger class. Its source is as follows:
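// jobId: job id
// triggerType: trigger type, e.g. manual trigger or cron (timer) trigger
// failRetryCount: number of retries on failure; a negative value means the job's own executor_fail_retry_count is used
// executorShardingParam: executor sharding parameter. It is null in normal runs; only when JobFailMonitorHelper
// retries a failed task is the executor_sharding_param value read from xxl_job_log and passed in
// executorParam: task parameter; if not null, it overrides the job's configured parameter
// addressList: executor addresses, usually empty; if set, it overrides the group's addresses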
public static void trigger(int jobId,
TriggerTypeEnum triggerType,
int failRetryCount,
String executorShardingParam,
String executorParam,
String addressList) {
// Load the job info from the database by job id
XxlJobInfo jobInfo = XxlJobAdminConfig.getAdminConfig().getXxlJobInfoDao().loadById(jobId);
if (jobInfo == null) {
logger.warn(">>>>>>>>>>>> trigger fail, jobId invalid,jobId={}", jobId);
return;
}
// Override the configured task parameter if one was passed in
if (executorParam != null) {
jobInfo.setExecutorParam(executorParam);
}
int finalFailRetryCount = failRetryCount>=0?failRetryCount:jobInfo.getExecutorFailRetryCount();
// Load the executor group (which holds the executor addresses) by jobGroup id
XxlJobGroup group = XxlJobAdminConfig.getAdminConfig().getXxlJobGroupDao().load(jobInfo.getJobGroup());
// If executor addresses were passed in explicitly, use them instead of the group's addresses
if (addressList!=null && addressList.trim().length()>0) {
group.setAddressType(1);
group.setAddressList(addressList.trim());
}
// If an executor sharding parameter was passed in (format "index/total"), parse and use it
int[] shardingParam = null;
if (executorShardingParam!=null){
String[] shardingArr = executorShardingParam.split("/");
if (shardingArr.length==2 && isNumeric(shardingArr[0]) && isNumeric(shardingArr[1])) {
shardingParam = new int[2];
shardingParam[0] = Integer.valueOf(shardingArr[0]);
shardingParam[1] = Integer.valueOf(shardingArr[1]);
}
}
if (ExecutorRouteStrategyEnum.SHARDING_BROADCAST==ExecutorRouteStrategyEnum.match(jobInfo.getExecutorRouteStrategy(), null)
&& group.getRegistryList()!=null && !group.getRegistryList().isEmpty()
&& shardingParam==null) {
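// If the routing strategy is sharding broadcast, executor addresses exist, and this is not a retry of a failed task,
// call processTrigger once for each executor address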
for (int i = 0; i < group.getRegistryList().size(); i++) {
processTrigger(group, jobInfo, finalFailRetryCount, triggerType, i, group.getRegistryList().size());
}
} else {
// Otherwise call processTrigger once, with index/total = 0/1 unless a sharding parameter was passed in
if (shardingParam == null) {
shardingParam = new int[]{0, 1};
}
processTrigger(group, jobInfo, finalFailRetryCount, triggerType, shardingParam[0], shardingParam[1]);
}
}
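The branching in trigger() decides how many times processTrigger is called and with which (index, total) pair. The sketch below isolates that decision; ShardingFanOutSketch and plan are hypothetical names, and isNumeric is assumed here to mean "parseable as an int".

```java
import java.util.ArrayList;
import java.util.List;

class ShardingFanOutSketch {
    // Parses "index/total" (e.g. "1/3"); returns null when the value is absent or malformed,
    // matching the null check in trigger().
    static int[] parseShardingParam(String executorShardingParam) {
        if (executorShardingParam == null) {
            return null;
        }
        String[] parts = executorShardingParam.split("/");
        if (parts.length == 2 && isNumeric(parts[0]) && isNumeric(parts[1])) {
            return new int[]{Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
        }
        return null;
    }

    // Assumed helper: "numeric" here means parseable as an int.
    static boolean isNumeric(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Returns the (index, total) pairs that processTrigger would be called with.
    static List<int[]> plan(boolean shardingBroadcast, int registrySize, String executorShardingParam) {
        int[] sharding = parseShardingParam(executorShardingParam);
        List<int[]> calls = new ArrayList<>();
        if (shardingBroadcast && registrySize > 0 && sharding == null) {
            // Fresh broadcast: one call per registered executor.
            for (int i = 0; i < registrySize; i++) {
                calls.add(new int[]{i, registrySize});
            }
        } else {
            // Single call: the retried shard, or 0/1 by default.
            calls.add(sharding != null ? sharding : new int[]{0, 1});
        }
        return calls;
    }
}
```

For example, plan(true, 3, null) yields 0/3, 1/3 and 2/3, while a retry with "1/3" yields only 1/3, so a failed shard is re-sent once instead of fanning out to every executor again.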
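The main point of the two dispatch methods above and below is the special handling of sharding broadcast, which makes every executor run the task once. All other routing strategies are implemented by subclasses of ExecutorRouter, each of which picks a single executor address from the executor cluster in its own way (first, round robin, random, consistent hash, and so on).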
private static void processTrigger(XxlJobGroup group, XxlJobInfo jobInfo, int finalFailRetryCount, TriggerTypeEnum triggerType, int index, int total){
// Block strategy; defaults to serial execution on a single executor (SERIAL_EXECUTION)
ExecutorBlockStrategyEnum blockStrategy = ExecutorBlockStrategyEnum.match(jobInfo.getExecutorBlockStrategy(), ExecutorBlockStrategyEnum.SERIAL_EXECUTION);
// Routing strategy
ExecutorRouteStrategyEnum executorRouteStrategyEnum = ExecutorRouteStrategyEnum.match(jobInfo.getExecutorRouteStrategy(), null);
// Sharding parameter for sharding broadcast, e.g. 0/1, 0/2
String shardingParam = (ExecutorRouteStrategyEnum.SHARDING_BROADCAST==executorRouteStrategyEnum)?String.valueOf(index).concat("/").concat(String.valueOf(total)):null;
// Save a log entry to xxl_job_log
XxlJobLog jobLog = new XxlJobLog();
jobLog.setJobGroup(jobInfo.getJobGroup());
jobLog.setJobId(jobInfo.getId());
jobLog.setTriggerTime(new Date());
XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().save(jobLog);
logger.debug(">>>>>>>>>>> xxl-job trigger start, jobId:{}", jobLog.getId());
// Build the trigger parameters that are sent to the executor
TriggerParam triggerParam = new TriggerParam();
triggerParam.setJobId(jobInfo.getId());
triggerParam.setExecutorHandler(jobInfo.getExecutorHandler());
triggerParam.setExecutorParams(jobInfo.getExecutorParam());
triggerParam.setExecutorBlockStrategy(jobInfo.getExecutorBlockStrategy());
triggerParam.setExecutorTimeout(jobInfo.getExecutorTimeout());
triggerParam.setLogId(jobLog.getId());
triggerParam.setLogDateTime(jobLog.getTriggerTime().getTime());
triggerParam.setGlueType(jobInfo.getGlueType());
triggerParam.setGlueSource(jobInfo.getGlueSource());
triggerParam.setGlueUpdatetime(jobInfo.getGlueUpdatetime().getTime());
triggerParam.setBroadcastIndex(index);
triggerParam.setBroadcastTotal(total);
String address = null;
ReturnT<String> routeAddressResult = null;
if (group.getRegistryList()!=null && !group.getRegistryList().isEmpty()) {
if (ExecutorRouteStrategyEnum.SHARDING_BROADCAST == executorRouteStrategyEnum) {
// Sharding broadcast: pick the executor address by index
if (index < group.getRegistryList().size()) {
address = group.getRegistryList().get(index);
} else {
address = group.getRegistryList().get(0);
}
} else {
// Otherwise pick an executor address using the configured routing strategy
routeAddressResult = executorRouteStrategyEnum.getRouter().route(triggerParam, group.getRegistryList());
if (routeAddressResult.getCode() == ReturnT.SUCCESS_CODE) {
address = routeAddressResult.getContent();
}
}
} else {
routeAddressResult = new ReturnT<String>(ReturnT.FAIL_CODE, I18nUtil.getString("jobconf_trigger_address_empty"));
}
// Send the trigger command to the executor over HTTP
ReturnT<String> triggerResult = null;
if (address != null) {
triggerResult = runExecutor(triggerParam, address);
} else {
triggerResult = new ReturnT<String>(ReturnT.FAIL_CODE, null);
}
StringBuffer triggerMsgSb = new StringBuffer();
triggerMsgSb.append(I18nUtil.getString("jobconf_trigger_type")).append(":").append(triggerType.getTitle());
triggerMsgSb.append("<br>").append(I18nUtil.getString("jobconf_trigger_admin_adress")).append(":").append(IpUtil.getIp());
triggerMsgSb.append("<br>").append(I18nUtil.getString("jobconf_trigger_exe_regtype")).append(":")
.append( (group.getAddressType() == 0)?I18nUtil.getString("jobgroup_field_addressType_0"):I18nUtil.getString("jobgroup_field_addressType_1") );
triggerMsgSb.append("<br>").append(I18nUtil.getString("jobconf_trigger_exe_regaddress")).append(":").append(group.getRegistryList());
triggerMsgSb.append("<br>").append(I18nUtil.getString("jobinfo_field_executorRouteStrategy")).append(":").append(executorRouteStrategyEnum.getTitle());
if (shardingParam != null) {
triggerMsgSb.append("("+shardingParam+")");
}
triggerMsgSb.append("<br>").append(I18nUtil.getString("jobinfo_field_executorBlockStrategy")).append(":").append(blockStrategy.getTitle());
triggerMsgSb.append("<br>").append(I18nUtil.getString("jobinfo_field_timeout")).append(":").append(jobInfo.getExecutorTimeout());
triggerMsgSb.append("<br>").append(I18nUtil.getString("jobinfo_field_executorFailRetryCount")).append(":").append(finalFailRetryCount);
triggerMsgSb.append("<br><br><span style=\"color:#00c0ef;\" > >>>>>>>>>>>"+ I18nUtil.getString("jobconf_trigger_run") +"<<<<<<<<<<< </span><br>")
.append((routeAddressResult!=null&&routeAddressResult.getMsg()!=null)?routeAddressResult.getMsg()+"<br><br>":"").append(triggerResult.getMsg()!=null?triggerResult.getMsg():"");
jobLog.setExecutorAddress(address);
jobLog.setExecutorHandler(jobInfo.getExecutorHandler());
jobLog.setExecutorParam(jobInfo.getExecutorParam());
jobLog.setExecutorShardingParam(shardingParam);
jobLog.setExecutorFailRetryCount(finalFailRetryCount);
jobLog.setTriggerCode(triggerResult.getCode());
jobLog.setTriggerMsg(triggerMsgSb.toString());
XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().updateTriggerInfo(jobLog);
logger.debug(">>>>>>>>>>> xxl-job trigger end, jobId:{}", jobLog.getId());
}
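processTrigger either indexes into the registry list (sharding broadcast) or delegates to executorRouteStrategyEnum.getRouter().route(...). The sketch below mimics that split with a simplified, hypothetical router interface: AddressRouter, FirstRouter, RoundRobinRouter and AddressResolver are invented names, and unlike the real ExecutorRouter the route method takes a job id instead of a TriggerParam and returns a plain address instead of a ReturnT<String>. Starting each job's round-robin counter at 0 is also a choice of this sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for ExecutorRouter.
interface AddressRouter {
    String route(int jobId, List<String> addressList);
}

// Always picks the first registered address.
class FirstRouter implements AddressRouter {
    @Override
    public String route(int jobId, List<String> addressList) {
        return addressList.isEmpty() ? null : addressList.get(0);
    }
}

// Per-job round robin: each job cycles through the address list with its own counter.
class RoundRobinRouter implements AddressRouter {
    private final Map<Integer, AtomicInteger> counters = new ConcurrentHashMap<>();

    @Override
    public String route(int jobId, List<String> addressList) {
        if (addressList.isEmpty()) {
            return null;
        }
        int n = counters.computeIfAbsent(jobId, k -> new AtomicInteger()).getAndIncrement();
        // floorMod keeps the index valid even after the counter overflows
        return addressList.get(Math.floorMod(n, addressList.size()));
    }
}

class AddressResolver {
    // Mirrors the address selection in processTrigger.
    static String resolve(boolean shardingBroadcast, AddressRouter router, int jobId,
                          int index, List<String> registryList) {
        if (registryList == null || registryList.isEmpty()) {
            return null; // processTrigger records "address empty" and fails the trigger
        }
        if (shardingBroadcast) {
            // An out-of-range index (for example, a retried shard after executors went offline)
            // falls back to the first address
            return index < registryList.size() ? registryList.get(index) : registryList.get(0);
        }
        return router.route(jobId, registryList);
    }
}
```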
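III. Table Reference
1. xxl_job_registry
Executor registry table. On startup, each executor sends its own information to the scheduling center, which stores it in this table.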
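2. xxl_job_group
Executor configuration table, holding each executor's registered app name, its IP addresses, and so on. Used together with xxl_job_registry above.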
3. xxl_job_info
Job information table; every configured job is stored here.
4. xxl_job_log
Trigger log table. Failed triggers recorded here are reported through email alerts; see the JobFailMonitorHelper class.
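IV. How Key Problems Are Solved
1. In a scheduling center cluster, how does every node see the same set of executors?
Through the database. On startup, an executor registers itself with the scheduling center and then sends a heartbeat every few dozen seconds (30 seconds by default). When the scheduling center receives a registration, it adds an executor record to xxl_job_registry; when it receives a heartbeat, it refreshes that record's update_time. A thread in the scheduling center keeps checking the records in xxl_job_registry and removes the dead ones (see JobRegistryMonitorHelper). To keep several scheduling center nodes from inserting the same executor record more than once, a node first runs an UPDATE and only INSERTs if no row was updated. I still think this can race, though: if two nodes both find nothing to update at the same moment, both will insert, leaving a duplicate executor record.
2. When one scheduling center node adds a job, will the other nodes see it?
Yes. Every node has threads that keep querying the xxl_job_info table, so consistency across nodes is guaranteed by the database. The job management page also queries the database on every load, so even if the table is edited by hand, after a page refresh both what the page shows and what gets dispatched match the database.
3. How does a scheduling center cluster avoid sending the same trigger twice?
On each scan, a node first takes an exclusive lock on the schedule_lock row of xxl_job_lock (select ... for update). While holding it, the node reads the due jobs, dispatches them, and writes back each job's next trigger time before committing. Only one node can hold the lock at a time, and the next node to acquire it already sees the updated trigger times, so each fire is dispatched by only one node.
4. Design of the scheduling center's timer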
while (!scheduleThreadToStop) {
// Scan Job
long start = System.currentTimeMillis();
Connection conn = null;
Boolean connAutoCommit = null;
PreparedStatement preparedStatement = null;
boolean preReadSuc = true;
try {
conn = XxlJobAdminConfig.getAdminConfig().getDataSource().getConnection();
connAutoCommit = conn.getAutoCommit();
conn.setAutoCommit(false);
preparedStatement = conn.prepareStatement( "select * from xxl_job_lock where lock_name = 'schedule_lock' for update" );
preparedStatement.execute();
// tx start
// 1、pre read
long nowTime = System.currentTimeMillis();
List<XxlJobInfo> scheduleList = XxlJobAdminConfig.getAdminConfig().getXxlJobInfoDao().scheduleJobQuery(nowTime + PRE_READ_MS, preReadCount);
if (scheduleList!=null && scheduleList.size()>0) {
// 2、push time-ring
for (XxlJobInfo jobInfo: scheduleList) {
// time-ring jump
if (nowTime > jobInfo.getTriggerNextTime() + PRE_READ_MS) {
// 2.1、trigger-expire > 5s:pass && make next-trigger-time
logger.warn(">>>>>>>>>>> xxl-job, schedule misfire, jobId = " + jobInfo.getId());
// fresh next
refreshNextValidTime(jobInfo, new Date());
} else if (nowTime > jobInfo.getTriggerNextTime()) {
// 2.2、trigger-expire < 5s:direct-trigger && make next-trigger-time
// 1、trigger
JobTriggerPoolHelper.trigger(jobInfo.getId(), TriggerTypeEnum.CRON, -1, null, null, null);
logger.debug(">>>>>>>>>>> xxl-job, schedule push trigger : jobId = " + jobInfo.getId() );
// 2、fresh next
refreshNextValidTime(jobInfo, new Date());
// next-trigger-time in 5s, pre-read again
if (jobInfo.getTriggerStatus()==1 && nowTime + PRE_READ_MS > jobInfo.getTriggerNextTime()) {
// 1、make ring second
int ringSecond = (int)((jobInfo.getTriggerNextTime()/1000)%60);
// 2、push time ring
pushTimeRing(ringSecond, jobInfo.getId());
// 3、fresh next
refreshNextValidTime(jobInfo, new Date(jobInfo.getTriggerNextTime()));
}
} else {
// 2.3、trigger-pre-read:time-ring trigger && make next-trigger-time
// 1、make ring second
int ringSecond = (int)((jobInfo.getTriggerNextTime()/1000)%60);
// 2、push time ring
pushTimeRing(ringSecond, jobInfo.getId());
// 3、fresh next
refreshNextValidTime(jobInfo, new Date(jobInfo.getTriggerNextTime()));
}
}
// 3、update trigger info
for (XxlJobInfo jobInfo: scheduleList) {
XxlJobAdminConfig.getAdminConfig().getXxlJobInfoDao().scheduleUpdate(jobInfo);
}
} else {
preReadSuc = false;
}
} catch (Exception e) {
if (!scheduleThreadToStop) {
logger.error(">>>>>>>>>>> xxl-job, JobScheduleHelper#scheduleThread error:{}", e);
}
} finally {
if (conn != null) {
try {
conn.commit();
} catch (SQLException e) {
if (!scheduleThreadToStop) {
logger.error(e.getMessage(), e);
}
}
try {
conn.setAutoCommit(connAutoCommit);
} catch (SQLException e) {
if (!scheduleThreadToStop) {
logger.error(e.getMessage(), e);
}
}
try {
conn.close();
} catch (SQLException e) {
if (!scheduleThreadToStop) {
logger.error(e.getMessage(), e);
}
}
}
// close PreparedStatement
if (null != preparedStatement) {
try {
preparedStatement.close();
} catch (SQLException e) {
if (!scheduleThreadToStop) {
logger.error(e.getMessage(), e);
}
}
}
}
long cost = System.currentTimeMillis()-start;
// Wait seconds, align second
if (cost < 1000) { // scan-overtime, not wait
try {
// pre-read period: success > scan each second; fail > skip this period;
TimeUnit.MILLISECONDS.sleep((preReadSuc?1000:PRE_READ_MS) - System.currentTimeMillis()%1000);
} catch (InterruptedException e) {
if (!scheduleThreadToStop) {
logger.error(e.getMessage(), e);
}
}
}
}
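Stripped of the database and locking code, the per-job branching in the loop above is a small pure function of nowTime and triggerNextTime. The sketch below is illustrative only: ScheduleDecisionSketch, Action and decide are names invented here, while PRE_READ_MS and the ring-second formula are taken from the loop.

```java
class ScheduleDecisionSketch {
    // Pre-read window used by the scan loop (5 seconds)
    static final long PRE_READ_MS = 5000L;

    enum Action {
        SKIP_MISFIRE, // more than 5 s late: only refresh the next trigger time
        TRIGGER_NOW,  // up to 5 s late: trigger immediately via the thread pool
        PUSH_RING     // not yet due: hand the job to the time ring
    }

    // Same comparisons, in the same order, as the three branches of the loop.
    static Action decide(long nowTime, long triggerNextTime) {
        if (nowTime > triggerNextTime + PRE_READ_MS) {
            return Action.SKIP_MISFIRE;
        } else if (nowTime > triggerNextTime) {
            return Action.TRIGGER_NOW;
        }
        return Action.PUSH_RING;
    }

    // Slot in the 60-slot time ring: the second-of-minute of the trigger time.
    static int ringSecond(long triggerNextTime) {
        return (int) ((triggerNextTime / 1000) % 60);
    }
}
```

For example, with nowTime = 10,000 ms, a job due at 6,000 is triggered immediately, one due at 4,000 is skipped as a misfire, and one due at 12,000 goes to ring slot 12.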
1) On each pass, for every job returned by the pre-read, compare the job's next trigger time nextTime with the current time nowTime.
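If nowTime - nextTime > 5000, the scheduling center has been in an abnormal state and one or more triggers were never sent. In this case it only refreshes the next trigger time and does not make up the missed triggers.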
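If 0 < nowTime - nextTime <= 5000, the most recent trigger was missed, but only slightly. It first sends the trigger through the thread pool and then refreshes the next trigger time. If the job is still enabled (trigger_status = 1) and nowTime + 5000 > the refreshed next trigger time, it computes the second-of-minute of that next trigger time and stores the job id in ringData under that second as the key; a separate thread keeps walking ringData, takes the jobs out, and triggers them. Finally, it refreshes the next trigger time once more.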
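If nowTime - nextTime <= 0, everything is normal: the job is due within the 5-second pre-read window. It computes the second-of-minute of the next trigger time and stores the job id in ringData under that second as the key, where the separate thread will take it out and trigger it. The next trigger time is then refreshed.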
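The ringData hand-off between the scan thread and the ring thread can be sketched as a 60-slot map keyed by second-of-minute. This is an illustration under assumptions, not JobScheduleHelper's code: TimeRingSketch, push and drain are hypothetical names, and also draining the previous second's slot is a safety choice made in this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class TimeRingSketch {
    // key = second-of-minute (0..59), value = ids of jobs due in that second
    private final Map<Integer, List<Integer>> ringData = new ConcurrentHashMap<>();

    // Scan thread side (pushTimeRing): compute() is atomic per key on ConcurrentHashMap,
    // so an add cannot interleave with drain()'s remove() of the same slot.
    void push(int ringSecond, int jobId) {
        ringData.compute(ringSecond, (second, jobs) -> {
            List<Integer> slot = (jobs == null) ? new ArrayList<>() : jobs;
            slot.add(jobId);
            return slot;
        });
    }

    // Ring thread side, called about once per second: removes and returns the jobs for
    // the current second. This sketch also drains the previous second's slot, so a tick
    // skipped because the thread woke late is not lost.
    List<Integer> drain(int nowSecond) {
        List<Integer> due = new ArrayList<>();
        for (int back = 0; back < 2; back++) {
            List<Integer> slot = ringData.remove((nowSecond + 60 - back) % 60);
            if (slot != null) {
                due.addAll(slot);
            }
        }
        return due;
    }
}
```

Because slots are keyed by second-of-minute, the ring only has to hold jobs due within the next 5 seconds, which is what the pre-read window guarantees.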
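2) After each pass, if the scan took less than 1 second, the thread sleeps so that the next pass starts on a second boundary. If the pre-read found no jobs, it sleeps (5000 - System.currentTimeMillis() % 1000) ms, i.e. a value in (4000, 5000] ms; if it found jobs, it sleeps (1000 - System.currentTimeMillis() % 1000) ms, i.e. a value in (0, 1000] ms. If the scan took 1 second or longer, the next pass starts immediately.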
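The wait computation can be written as a small function. ScanSleepSketch and sleepMillis are hypothetical names; the arithmetic is the expression passed to TimeUnit.MILLISECONDS.sleep in the loop, plus the cost < 1000 guard around it.

```java
class ScanSleepSketch {
    static final long PRE_READ_MS = 5000L;

    // How long the scan thread waits after one pass: no wait if the pass took 1 s or more,
    // otherwise sleep until the next whole second (jobs found) or until five second
    // boundaries ahead (nothing found).
    static long sleepMillis(boolean preReadSuc, long costMillis, long nowMillis) {
        if (costMillis >= 1000) {
            return 0L;
        }
        return (preReadSuc ? 1000L : PRE_READ_MS) - nowMillis % 1000;
    }
}
```

For instance, at a wall-clock time ending in 250 ms, a pass that found jobs sleeps 750 ms and one that found nothing sleeps 4750 ms; both wake exactly on a second boundary, which lines up with the second-granularity slots of the time ring.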