1. xxl-job Source Code and Technical Design Analysis
- The analysis builds on a rough understanding from the official documentation: https://www.xuxueli.com/xxl-job/
- The version analyzed here is 2.3.0.
- Start by grasping the xxl-job architecture at a high level.

Let's begin with an excerpt that gives a rough description of the whole flow:
The overall xxl-job architecture is a centralized design, split into two parts: the scheduling center (Admin) and the executors.
The Admin module exposes a trigger interface for job scheduling; based on each job's historical execution time, it dispatches the trigger into one of two thread pools for execution.
Before execution, it writes a job-start record into the xxl_job_log table, then uses the routing component to pick an executor address and sends the trigger to that executor through the ExecutorBiz proxy. The ExecutorBiz implementation is trivial: it just sends an HTTP request.
On startup, each executor uses Netty to boot an embedded HTTP server; when an instruction arrives from the scheduling center, it is handed to the EmbedHttpServerHandler.
When handling a "run" instruction, EmbedHttpServerHandler looks up the JobThread for the jobId in a cache and enqueues the trigger into that JobThread's triggerQueue.
The JobThread loops forever, pulling pending triggers from triggerQueue and delegating the actual work to IJobHandler; it then parses the IJobHandler result and enqueues it into the callBackQueue of TriggerCallbackThread.
TriggerCallbackThread likewise loops over callBackQueue, handing each callback to its doCallback method, which uses the Admin proxy AdminBizClient to post the result back to the scheduling center's callback interface, completing the job-finished notification.
That is the rough end-to-end flow of job execution in xxl-job; once the core components above are strung together, the whole logic becomes fairly clear. The key to understanding it is the JobThread component: each job maps to one JobThread instance per executor, and when a trigger lands on an executor, the matching JobThread handles it. JobThread uses a lazy-load-and-cache design: a JobThread is created only when a trigger arrives and none exists for that job yet; subsequent triggers for the same job reuse it directly.
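The lazy-load-and-cache pattern described above can be sketched in a few lines. This is an illustrative toy (the class and method names are made up, not the actual xxl-job code): a concurrent map keyed by jobId, with the thread created and started on first use.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the lazy-load-and-cache design of JobThread.
// "JobThreadRegistry" and "loadOrCreate" are illustrative names.
public class JobThreadRegistry {
    private static final Map<Integer, Thread> repository = new ConcurrentHashMap<>();

    // Return the cached thread for this job, creating and starting it on first use.
    public static Thread loadOrCreate(int jobId) {
        return repository.computeIfAbsent(jobId, id -> {
            Thread t = new Thread(() -> { /* poll triggerQueue and invoke IJobHandler */ });
            t.setName("job-thread-" + id);
            t.start();
            return t;
        });
    }

    public static void main(String[] args) {
        Thread first = loadOrCreate(1);
        Thread second = loadOrCreate(1); // same instance, served from the cache
        System.out.println(first == second); // prints "true"
    }
}
```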
Some good source-code analysis articles online:
- A walkthrough of a single scheduling run: https://blog.csdn.net/god_86/article/details/114650439
- An overview of the whole project: https://www.cnblogs.com/hanwenbo/p/14323024.html
1.1. Scheduling Center (admin)
The admin module's code layout is fairly simple and follows common project conventions.
admin mainly serves the web UI features.
The scheduling core lives in the xxl-job-core project; Maven packages this core into both the admin and executor artifacts.
1.1.1. Startup Process
After the admin project starts, the entry point is com.xxl.job.admin.core.conf.XxlJobAdminConfig#afterPropertiesSet.
The InitializingBean interface gives a bean an initialization hook, afterPropertiesSet: any class implementing the interface has this method invoked during bean initialization.
Step into xxlJobScheduler.init();
Reading the source, it is already clear from here that these calls are the startup entry points for the various features.
public void init() throws Exception {
// init i18n: load language settings (xxl.job.i18n in application.properties)
initI18n();
// admin trigger pool start: build the trigger thread pools
JobTriggerPoolHelper.toStart();
// admin registry monitor run: executor registry management; pick up executors registered within the last 30s, remove ones not re-registered within 90s (default heartbeat keep-alive 30s)
JobRegistryHelper.getInstance().start();
// admin fail-monitor run: failed-job retry handling
JobFailMonitorHelper.getInstance().start();
// admin lose-monitor run ( depend on JobTriggerPoolHelper ): if an executor gives no result within 10min, terminate the task
JobCompleteHelper.getInstance().start();
// admin log report start: aggregate execution stats for the "Run Report" page; also purge logs from the database once they pass their retention time
JobLogReportHelper.getInstance().start();
// start-schedule ( depend on JobTriggerPoolHelper ): the core of the core, responsible for triggering jobs
JobScheduleHelper.getInstance().start();
logger.info(">>>>>>>>> init xxl-job admin success.");
}
1.1.1.1. Long-Running Threads
The defining trait of this project is its use of threads to decouple triggering from execution. So the focus of the analysis is to identify the thread pools, and when and how each thread runs.
1.1.1.1.1. JobTriggerPoolHelper: Trigger Thread Pools
Starting from JobTriggerPoolHelper.toStart();
Following the code (no intervening abstract classes or interfaces, quite simple) leads to the core method com.xxl.job.admin.core.thread.JobTriggerPoolHelper#start
public void start(){
// Two thread pools are defined here: judging by the names, a "fast" and a "slow" trigger pool. Based on a job's execution history, triggers are routed into one or the other for later execution.
// Only the pools are defined here. Where are job triggers submitted to them and dispatched to the executor? The key is scheduleThread and ringThread in JobScheduleHelper.
fastTriggerPool = new ThreadPoolExecutor(
10,
XxlJobAdminConfig.getAdminConfig().getTriggerPoolFastMax(),
60L,
TimeUnit.SECONDS,
new LinkedBlockingQueue<Runnable>(1000),
new ThreadFactory() {
@Override
public Thread newThread(Runnable r) {
return new Thread(r, "xxl-job, admin JobTriggerPoolHelper-fastTriggerPool-" + r.hashCode());
}
});
slowTriggerPool = new ThreadPoolExecutor(
10,
XxlJobAdminConfig.getAdminConfig().getTriggerPoolSlowMax(),
60L,
TimeUnit.SECONDS,
new LinkedBlockingQueue<Runnable>(2000),
new ThreadFactory() {
@Override
public Thread newThread(Runnable r) {
return new Thread(r, "xxl-job, admin JobTriggerPoolHelper-slowTriggerPool-" + r.hashCode());
}
});
}
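One detail worth knowing about the pool shape above (core size 10, bounded LinkedBlockingQueue, larger max): a JDK ThreadPoolExecutor only grows beyond corePoolSize after the queue is full. The sketch below is a generic JDK demonstration with shrunken sizes, not xxl-job code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Demonstrates that non-core threads are created only once the work queue is full.
public class PoolShapeDemo {
    // Returns the pool size observed after the queue fills up.
    static int demonstrate() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>(1));
        CountDownLatch block = new CountDownLatch(1);
        pool.execute(() -> { try { block.await(); } catch (InterruptedException ignored) {} }); // occupies the single core thread
        pool.execute(() -> {}); // sits in the queue (capacity 1)
        pool.execute(() -> {}); // queue full -> a second (non-core) thread is created
        int size = pool.getPoolSize();
        block.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return size;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demonstrate()); // prints "2"
    }
}
```

So in the real pools, bursts of up to 1000 (fast) or 2000 (slow) triggers queue up before extra threads are spawned toward the configured maximum.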
1.1.1.1.2. JobRegistryHelper: Executor Registry Management
Picks up executors registered within the last 30s and removes ones not re-registered within 90s (default heartbeat keep-alive is 30s).
The code below is long but actually simple:
registryOrRemoveThreadPool: no concrete logic yet, just a thread pool; we will see later how it is used.
registryMonitorThread: the thread that handles executor registration and cleanup. XxlJobGroup is the executor-group table: executors with the same appname have their cluster's ip:port list stored in one XxlJobGroup row. When an executor goes offline and stops sending registry heartbeats, it is removed from XxlJobGroup, so no further triggers are dispatched to that address.
public void start(){
// for registry or remove
registryOrRemoveThreadPool = new ThreadPoolExecutor(
2,
10,
30L,
TimeUnit.SECONDS,
new LinkedBlockingQueue<Runnable>(2000),
new ThreadFactory() {
@Override
public Thread newThread(Runnable r) {
return new Thread(r, "xxl-job, admin JobRegistryMonitorHelper-registryOrRemoveThreadPool-" + r.hashCode());
}
},
new RejectedExecutionHandler() {
@Override
public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
r.run();
logger.warn(">>>>>>>>>>> xxl-job, registry or remove too fast, match threadpool rejected handler(run now).");
}
});
// for monitor
registryMonitorThread = new Thread(new Runnable() {
@Override
public void run() {
while (!toStop) {
try {
// auto registry group
List<XxlJobGroup> groupList = XxlJobAdminConfig.getAdminConfig().getXxlJobGroupDao().findByAddressType(0);
if (groupList!=null && !groupList.isEmpty()) {
// remove dead address (admin/executor)
List<Integer> ids = XxlJobAdminConfig.getAdminConfig().getXxlJobRegistryDao().findDead(RegistryConfig.DEAD_TIMEOUT, new Date());
if (ids!=null && ids.size()>0) {
XxlJobAdminConfig.getAdminConfig().getXxlJobRegistryDao().removeDead(ids);
}
// fresh online address (admin/executor)
HashMap<String, List<String>> appAddressMap = new HashMap<String, List<String>>();
List<XxlJobRegistry> list = XxlJobAdminConfig.getAdminConfig().getXxlJobRegistryDao().findAll(RegistryConfig.DEAD_TIMEOUT, new Date());
if (list != null) {
for (XxlJobRegistry item: list) {
if (RegistryConfig.RegistType.EXECUTOR.name().equals(item.getRegistryGroup())) {
String appname = item.getRegistryKey();
List<String> registryList = appAddressMap.get(appname);
if (registryList == null) {
registryList = new ArrayList<String>();
}
if (!registryList.contains(item.getRegistryValue())) {
registryList.add(item.getRegistryValue());
}
appAddressMap.put(appname, registryList);
}
}
}
// fresh group address
for (XxlJobGroup group: groupList) {
List<String> registryList = appAddressMap.get(group.getAppname());
String addressListStr = null;
if (registryList!=null && !registryList.isEmpty()) {
Collections.sort(registryList);
StringBuilder addressListSB = new StringBuilder();
for (String item:registryList) {
addressListSB.append(item).append(",");
}
addressListStr = addressListSB.toString();
addressListStr = addressListStr.substring(0, addressListStr.length()-1);
}
group.setAddressList(addressListStr);
group.setUpdateTime(new Date());
XxlJobAdminConfig.getAdminConfig().getXxlJobGroupDao().update(group);
}
}
} catch (Exception e) {
if (!toStop) {
logger.error(">>>>>>>>>>> xxl-job, job registry monitor thread error:{}", e);
}
}
try {
TimeUnit.SECONDS.sleep(RegistryConfig.BEAT_TIMEOUT);
} catch (InterruptedException e) {
if (!toStop) {
logger.error(">>>>>>>>>>> xxl-job, job registry monitor thread error:{}", e);
}
}
}
logger.info(">>>>>>>>>>> xxl-job, job registry monitor thread stop");
}
});
registryMonitorThread.setDaemon(true);
registryMonitorThread.setName("xxl-job, admin JobRegistryMonitorHelper-registryMonitorThread");
registryMonitorThread.start();
}
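The "fresh group address" loop above (collect live addresses per appname, de-duplicate, sort, join with commas) can be condensed into a small sketch. This is illustrative only, not the actual DAO-backed code; input rows are simplified to (appname, "ip:port") pairs:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch of how the monitor thread collapses live registry rows
// into the comma-separated address list stored on XxlJobGroup.
public class AddressListBuilder {
    public static String build(List<String[]> registryRows, String appname) {
        List<String> addresses = registryRows.stream()
                .filter(r -> r[0].equals(appname)) // rows for this executor group
                .map(r -> r[1])
                .distinct()                        // duplicate heartbeats collapse
                .sorted()                          // stable ordering, as in the source
                .collect(Collectors.toList());
        return addresses.isEmpty() ? null : String.join(",", addresses);
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[]{"demo-executor", "192.168.0.2:9999"},
                new String[]{"demo-executor", "192.168.0.1:9999"},
                new String[]{"demo-executor", "192.168.0.2:9999"}); // duplicate heartbeat
        System.out.println(build(rows, "demo-executor")); // prints "192.168.0.1:9999,192.168.0.2:9999"
    }
}
```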
1.1.1.1.3. JobScheduleHelper: Job Triggering (the Key Part)
JobScheduleHelper.getInstance().start();
A quick first look; for the detailed walkthrough see **"preRead: pre-reading jobInfo"** below.
// Two threads are involved here: scheduleThread and ringThread.
// scheduleThread pre-reads the jobInfos due soon and places their jobIds into the ringData structure.
// ringThread then takes the due jobs out of ringData and submits them to the trigger pool.
public void start(){
// schedule thread
scheduleThread = new Thread(new Runnable() {
@Override
public void run() {
try {
TimeUnit.MILLISECONDS.sleep(5000 - System.currentTimeMillis()%1000 );
} catch (InterruptedException e) {
if (!scheduleThreadToStop) {
logger.error(e.getMessage(), e);
}
}
logger.info(">>>>>>>>> init xxl-job admin scheduler success.");
// pre-read count: threadpool-size * trigger-qps (each trigger costs ~50ms, qps = 1000/50 = 20)
// max number of jobInfos pre-read in one pass: (fast pool max + slow pool max) * 20
int preReadCount = (XxlJobAdminConfig.getAdminConfig().getTriggerPoolFastMax() + XxlJobAdminConfig.getAdminConfig().getTriggerPoolSlowMax()) * 20;
while (!scheduleThreadToStop) {
// Scan Job
long start = System.currentTimeMillis();
Connection conn = null;
Boolean connAutoCommit = null;
PreparedStatement preparedStatement = null;
boolean preReadSuc = true;
try {
conn = XxlJobAdminConfig.getAdminConfig().getDataSource().getConnection();
connAutoCommit = conn.getAutoCommit();
conn.setAutoCommit(false);
// Pessimistic lock: the node that acquires the lock proceeds to read and schedule jobInfos; other admin nodes in the cluster block here and wait for the next pre-read cycle
preparedStatement = conn.prepareStatement( "select * from xxl_job_lock where lock_name = 'schedule_lock' for update" );
preparedStatement.execute();
// tx start
// 1. pre read: using preReadCount and PRE_READ_MS (5000), read the jobInfos due within the next 5s
long nowTime = System.currentTimeMillis();
List<XxlJobInfo> scheduleList = XxlJobAdminConfig.getAdminConfig().getXxlJobInfoDao().scheduleJobQuery(nowTime + PRE_READ_MS, preReadCount);
if (scheduleList!=null && scheduleList.size()>0) {
// 2、push time-ring
for (XxlJobInfo jobInfo: scheduleList) {
// time-ring jump
if (nowTime > jobInfo.getTriggerNextTime() + PRE_READ_MS) {
// 2.1、trigger-expire > 5s:pass && make next-trigger-time
// more than 5s past the job's planned trigger time: this run is skipped (subject to the misfire strategy) and the next trigger time is set
logger.warn(">>>>>>>>>>> xxl-job, schedule misfire, jobId = " + jobInfo.getId());
// 1、misfire match
MisfireStrategyEnum misfireStrategyEnum = MisfireStrategyEnum.match(jobInfo.getMisfireStrategy(), MisfireStrategyEnum.DO_NOTHING);
if (MisfireStrategyEnum.FIRE_ONCE_NOW == misfireStrategyEnum) {
// FIRE_ONCE_NOW 》 trigger
// if the misfire strategy is "fire once now", submit it to the trigger pool
JobTriggerPoolHelper.trigger(jobInfo.getId(), TriggerTypeEnum.MISFIRE, -1, null, null, null);
logger.debug(">>>>>>>>>>> xxl-job, schedule push trigger : jobId = " + jobInfo.getId() );
}
// 2、fresh next
refreshNextValidTime(jobInfo, new Date());
} else if (nowTime > jobInfo.getTriggerNextTime()) {
// 2.2、trigger-expire < 5s:direct-trigger && make next-trigger-time
// 1、trigger
JobTriggerPoolHelper.trigger(jobInfo.getId(), TriggerTypeEnum.CRON, -1, null, null, null);
logger.debug(">>>>>>>>>>> xxl-job, schedule push trigger : jobId = " + jobInfo.getId() );
// 2、fresh next
refreshNextValidTime(jobInfo, new Date());
// next-trigger-time in 5s, pre-read again
if (jobInfo.getTriggerStatus()==1 && nowTime + PRE_READ_MS > jobInfo.getTriggerNextTime()) {
// 1、make ring second
int ringSecond = (int)((jobInfo.getTriggerNextTime()/1000)%60);
// 2、push time ring
pushTimeRing(ringSecond, jobInfo.getId());
// 3、fresh next
refreshNextValidTime(jobInfo, new Date(jobInfo.getTriggerNextTime()));
}
} else {
// 2.3、trigger-pre-read:time-ring trigger && make next-trigger-time
// 1、make ring second
int ringSecond = (int)((jobInfo.getTriggerNextTime()/1000)%60);
// 2. push time ring: put the due jobId into ringData, for ringThread to consume
pushTimeRing(ringSecond, jobInfo.getId());
// 3、fresh next
refreshNextValidTime(jobInfo, new Date(jobInfo.getTriggerNextTime()));
}
}
// 3、update trigger info
for (XxlJobInfo jobInfo: scheduleList) {
XxlJobAdminConfig.getAdminConfig().getXxlJobInfoDao().scheduleUpdate(jobInfo);
}
} else {
preReadSuc = false;
}
// tx stop
} catch (Exception e) {
if (!scheduleThreadToStop) {
logger.error(">>>>>>>>>>> xxl-job, JobScheduleHelper#scheduleThread error:{}", e);
}
} finally {
// commit
if (conn != null) {
try {
conn.commit();
} catch (SQLException e) {
if (!scheduleThreadToStop) {
logger.error(e.getMessage(), e);
}
}
try {
conn.setAutoCommit(connAutoCommit);
} catch (SQLException e) {
if (!scheduleThreadToStop) {
logger.error(e.getMessage(), e);
}
}
try {
conn.close();
} catch (SQLException e) {
if (!scheduleThreadToStop) {
logger.error(e.getMessage(), e);
}
}
}
// close PreparedStatement
if (null != preparedStatement) {
try {
preparedStatement.close();
} catch (SQLException e) {
if (!scheduleThreadToStop) {
logger.error(e.getMessage(), e);
}
}
}
}
long cost = System.currentTimeMillis()-start;
// Wait seconds, align second
if (cost < 1000) { // scan-overtime, not wait
try {
// pre-read period: success > scan each second; fail > skip this period;
TimeUnit.MILLISECONDS.sleep((preReadSuc?1000:PRE_READ_MS) - System.currentTimeMillis()%1000);
} catch (InterruptedException e) {
if (!scheduleThreadToStop) {
logger.error(e.getMessage(), e);
}
}
}
}
logger.info(">>>>>>>>>>> xxl-job, JobScheduleHelper#scheduleThread stop");
}
});
scheduleThread.setDaemon(true);
scheduleThread.setName("xxl-job, admin JobScheduleHelper#scheduleThread");
scheduleThread.start();
// ring thread
// When preRead picks up jobInfos and nowTime <= jobInfo.getTriggerNextTime(), they go through `pushTimeRing(ringSecond, jobInfo.getId())`, i.e. into Map<Integer, List<Integer>> ringData. That data is consumed by this ringThread: it drains ringData and calls JobTriggerPoolHelper.trigger. See **"how precise scheduling is ensured"** later for a more detailed analysis.
ringThread = new Thread(new Runnable() {
@Override
public void run() {
while (!ringThreadToStop) {
// align second
try {
TimeUnit.MILLISECONDS.sleep(1000 - System.currentTimeMillis() % 1000);
} catch (InterruptedException e) {
if (!ringThreadToStop) {
logger.error(e.getMessage(), e);
}
}
try {
// second data
List<Integer> ringItemData = new ArrayList<>();
int nowSecond = Calendar.getInstance().get(Calendar.SECOND); // also check the previous slot, in case slow processing skipped a tick
for (int i = 0; i < 2; i++) {
List<Integer> tmpData = ringData.remove( (nowSecond+60-i)%60 );
if (tmpData != null) {
ringItemData.addAll(tmpData);
}
}
// ring trigger
logger.debug(">>>>>>>>>>> xxl-job, time-ring beat : " + nowSecond + " = " + Arrays.asList(ringItemData) );
if (ringItemData.size() > 0) {
// do trigger
for (int jobId: ringItemData) {
// do trigger
JobTriggerPoolHelper.trigger(jobId, TriggerTypeEnum.CRON, -1, null, null, null);
}
// clear
ringItemData.clear();
}
} catch (Exception e) {
if (!ringThreadToStop) {
logger.error(">>>>>>>>>>> xxl-job, JobScheduleHelper#ringThread error:{}", e);
}
}
}
logger.info(">>>>>>>>>>> xxl-job, JobScheduleHelper#ringThread stop");
}
});
ringThread.setDaemon(true);
ringThread.setName("xxl-job, admin JobScheduleHelper#ringThread");
ringThread.start();
}
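The time-ring mechanism that scheduleThread and ringThread share can be boiled down to a toy version. This is a simplified sketch (single-threaded, no locking nuances), assuming only the 60-slot layout and the drain-two-slots rule seen in the source:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy version of the time ring: scheduleThread hashes each job into one of
// 60 slots by its trigger second; ringThread drains the slot for the current
// second plus the previous one (to survive a slow tick).
public class TimeRingSketch {
    private final Map<Integer, List<Integer>> ringData = new ConcurrentHashMap<>();

    // What scheduleThread does via pushTimeRing.
    public void push(long triggerTimeMillis, int jobId) {
        int slot = (int) ((triggerTimeMillis / 1000) % 60);
        ringData.computeIfAbsent(slot, k -> new ArrayList<>()).add(jobId);
    }

    // What ringThread does each second: take slot "nowSecond" and the one before it.
    public List<Integer> drain(int nowSecond) {
        List<Integer> due = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            List<Integer> slot = ringData.remove((nowSecond + 60 - i) % 60);
            if (slot != null) due.addAll(slot);
        }
        return due;
    }

    public static void main(String[] args) {
        TimeRingSketch ring = new TimeRingSketch();
        ring.push(33_000L, 1); // due at second 33
        ring.push(32_000L, 2); // due at second 32
        System.out.println(ring.drain(33)); // prints "[1, 2]"
    }
}
```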
1.1.2. Scheduling Process
1.1.2.1. preRead: Pre-Reading jobInfo
int preReadCount = (XxlJobAdminConfig.getAdminConfig().getTriggerPoolFastMax() + XxlJobAdminConfig.getAdminConfig().getTriggerPoolSlowMax()) * 20;
List<XxlJobInfo> scheduleList = XxlJobAdminConfig.getAdminConfig().getXxlJobInfoDao().scheduleJobQuery(nowTime + PRE_READ_MS, preReadCount);
PRE_READ_MS is 5000 (i.e. 5s).
<select id="scheduleJobQuery" parameterType="java.util.HashMap" resultMap="XxlJobInfo">
SELECT <include refid="Base_Column_List" />
FROM xxl_job_info AS t
WHERE t.trigger_status = 1
and t.trigger_next_time <![CDATA[ <= ]]> #{maxNextTime}
ORDER BY id ASC
LIMIT #{pagesize}
</select>
So via pre-reading, the scheduler selects the jobInfo rows with trigger_next_time <= now + 5s, capped by a LIMIT computed as (fast pool max + slow pool max) * 20.
These soon-to-run jobInfos are read out.
For each row read, one of three branches applies:
- 2.1 trigger-expire > 5s: pass && make next-trigger-time (if nowTime > jobInfo.getTriggerNextTime() + PRE_READ_MS)
Overdue by more than 5s: too late, so skip this run (then follow the misfire strategy).
- 2.2 trigger-expire < 5s: direct-trigger && make next-trigger-time (else if nowTime > jobInfo.getTriggerNextTime())
Overdue, but by less than 5s: trigger immediately (call JobTriggerPoolHelper.trigger).
- 2.3 trigger-pre-read: time-ring trigger && make next-trigger-time (else)
Not yet due. The job goes through pushTimeRing(ringSecond, jobInfo.getId()), i.e. into Map<Integer, List<Integer>> ringData, which ringThread consumes: ringThread drains ringData and calls JobTriggerPoolHelper.trigger. scheduleThread and ringThread cooperate here: scheduleThread pre-reads and parks jobs in ringData; for example, at second 30 it learns of a task due at second 33, and ringThread, ticking once per second, reaches second 33 and picks up exactly the tasks for second 33. For concrete cases, see the **"how precise scheduling is ensured"** section.
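The three-way branch above can be condensed into a tiny classifier. This is an illustrative sketch (the return labels are made up); only the comparison logic mirrors the source:

```java
// Classifies how scheduleThread handles a pre-read job, given the current
// time and the job's planned trigger time. PRE_READ_MS = 5000 as in the source.
public class TriggerBranch {
    static final long PRE_READ_MS = 5000;

    public static String classify(long nowTime, long triggerNextTime) {
        if (nowTime > triggerNextTime + PRE_READ_MS) {
            return "MISFIRE";        // >5s late: apply misfire strategy, refresh next time
        } else if (nowTime > triggerNextTime) {
            return "DIRECT_TRIGGER"; // <5s late: trigger immediately
        } else {
            return "TIME_RING";      // not yet due: park in the ring for ringThread
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(10_000, 3_000));  // prints "MISFIRE"
        System.out.println(classify(10_000, 8_000));  // prints "DIRECT_TRIGGER"
        System.out.println(classify(10_000, 12_000)); // prints "TIME_RING"
    }
}
```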
1.1.2.2. DB Lock
In scheduleThread of com.xxl.job.admin.core.thread.JobScheduleHelper#start: select * from xxl_job_lock where lock_name = 'schedule_lock' for update
A pessimistic-lock scheme, used in admin cluster deployments to guarantee that only one admin instance reads the pending jobInfos.
After one session runs select status from t_goods where id=1 for update;, another transaction running the same select ... for update blocks until the first transaction commits; but a plain select status from t_goods where id=1; in the second transaction still returns data normally, unaffected by the first transaction.
1.1.2.3. Handling Dense Scheduling
This is ringThread's job in com.xxl.job.admin.core.thread.JobScheduleHelper#start, analyzed above.
1.1.2.4. Missed Trigger Times
// 1、pre read
long nowTime = System.currentTimeMillis();
List<XxlJobInfo> scheduleList = XxlJobAdminConfig.getAdminConfig().getXxlJobInfoDao().scheduleJobQuery(nowTime + PRE_READ_MS, preReadCount);
if (scheduleList!=null && scheduleList.size()>0) {
// 2、push time-ring
for (XxlJobInfo jobInfo: scheduleList) {
// time-ring jump
if (nowTime > jobInfo.getTriggerNextTime() + PRE_READ_MS) {
// 2.1、trigger-expire > 5s:pass && make next-trigger-time
// the current time is beyond the planned trigger time plus 5s: we have missed it by quite a bit
logger.warn(">>>>>>>>>>> xxl-job, schedule misfire, jobId = " + jobInfo.getId());
// 1、misfire match
MisfireStrategyEnum misfireStrategyEnum = MisfireStrategyEnum.match(jobInfo.getMisfireStrategy(), MisfireStrategyEnum.DO_NOTHING);
if (MisfireStrategyEnum.FIRE_ONCE_NOW == misfireStrategyEnum) {
// FIRE_ONCE_NOW > trigger: per the strategy, if "fire once on misfire", trigger one run directly
JobTriggerPoolHelper.trigger(jobInfo.getId(), TriggerTypeEnum.MISFIRE, -1, null, null, null);
logger.debug(">>>>>>>>>>> xxl-job, schedule push trigger : jobId = " + jobInfo.getId() );
}
// 2、fresh next
refreshNextValidTime(jobInfo, new Date());
} else if (nowTime > jobInfo.getTriggerNextTime()) {
// 2.2、trigger-expire < 5s:direct-trigger && make next-trigger-time
1.1.2.5. Triggering via JobTriggerPoolHelper.trigger
What happens after JobTriggerPoolHelper.trigger submits a jobInfo:
JobTriggerPoolHelper.trigger
com.xxl.job.admin.core.thread.JobTriggerPoolHelper#addTrigger
/**
* add trigger
*/
public void addTrigger(final int jobId,
final TriggerTypeEnum triggerType,
final int failRetryCount,
final String executorShardingParam,
final String executorParam,
final String addressList) {
// choose thread pool: decide which trigger pool to use based on "job-timeout 10 times in 1 min"
ThreadPoolExecutor triggerPool_ = fastTriggerPool;
AtomicInteger jobTimeoutCount = jobTimeoutCountMap.get(jobId);
if (jobTimeoutCount!=null && jobTimeoutCount.get() > 10) { // job-timeout 10 times in 1 min
triggerPool_ = slowTriggerPool;
}
// trigger: once the pool is chosen, just call its execute method. One open question: does the executor, on receiving the run message, re-check whether the trigger is on time?
triggerPool_.execute(new Runnable() {
@Override
public void run() {
long start = System.currentTimeMillis();
try {
// do trigger
XxlJobTrigger.trigger(jobId, triggerType, failRetryCount, executorShardingParam, executorParam, addressList);
} catch (Exception e) {
logger.error(e.getMessage(), e);
} finally {
// check timeout-count-map
long minTim_now = System.currentTimeMillis()/60000;
if (minTim != minTim_now) {
minTim = minTim_now;
jobTimeoutCountMap.clear();
}
// incr timeout-count-map
long cost = System.currentTimeMillis()-start;
if (cost > 500) { // job-timeout threshold 500ms
AtomicInteger timeoutCount = jobTimeoutCountMap.putIfAbsent(jobId, new AtomicInteger(1));
if (timeoutCount != null) {
timeoutCount.incrementAndGet();
}
}
}
}
});
}
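The fast/slow routing rule in addTrigger above (demote jobs that ran slowly more than 10 times in the current minute, reset counters when the minute rolls over) can be modeled in isolation. A simplified sketch, with the class name invented and the time passed in explicitly for testability:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of the fast/slow pool selection: triggers that cost >500ms
// more than 10 times within the current minute are demoted to the slow pool;
// the counter map is wiped when the minute rolls over.
public class SlowJobTracker {
    private final ConcurrentMap<Integer, AtomicInteger> timeoutCount = new ConcurrentHashMap<>();
    private volatile long minute = -1;

    public void recordCost(int jobId, long costMillis, long nowMillis) {
        long nowMinute = nowMillis / 60_000;
        if (minute != nowMinute) { // new minute window: reset all counters
            minute = nowMinute;
            timeoutCount.clear();
        }
        if (costMillis > 500) {
            timeoutCount.computeIfAbsent(jobId, k -> new AtomicInteger()).incrementAndGet();
        }
    }

    public boolean useSlowPool(int jobId) {
        AtomicInteger c = timeoutCount.get(jobId);
        return c != null && c.get() > 10;
    }

    public static void main(String[] args) {
        SlowJobTracker tracker = new SlowJobTracker();
        for (int i = 0; i < 11; i++) tracker.recordCost(7, 800, 0);
        System.out.println(tracker.useSlowPool(7)); // prints "true"
        tracker.recordCost(7, 800, 60_001);         // minute rolled over, counters reset
        System.out.println(tracker.useSlowPool(7)); // prints "false"
    }
}
```

Note that the per-minute reset means a chronically slow job returns to the fast pool for a moment at the top of each minute; the real code accepts this as a cheap approximation.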
Continue into com.xxl.job.admin.core.trigger.XxlJobTrigger#trigger
public static void trigger(int jobId,
TriggerTypeEnum triggerType,
int failRetryCount,
String executorShardingParam,
String executorParam,
String addressList) {
// load data: load the job row from the DB
XxlJobInfo jobInfo = XxlJobAdminConfig.getAdminConfig().getXxlJobInfoDao().loadById(jobId);
if (jobInfo == null) {
logger.warn(">>>>>>>>>>>> trigger fail, jobId invalid,jobId={}", jobId);
return;
}
if (executorParam != null) {
jobInfo.setExecutorParam(executorParam);
}
int finalFailRetryCount = failRetryCount>=0?failRetryCount:jobInfo.getExecutorFailRetryCount();
XxlJobGroup group = XxlJobAdminConfig.getAdminConfig().getXxlJobGroupDao().load(jobInfo.getJobGroup());
// cover addressList
if (addressList!=null && addressList.trim().length()>0) {
group.setAddressType(1);
group.setAddressList(addressList.trim());
}
// sharding param
int[] shardingParam = null;
if (executorShardingParam!=null){
String[] shardingArr = executorShardingParam.split("/");
if (shardingArr.length==2 && isNumeric(shardingArr[0]) && isNumeric(shardingArr[1])) {
shardingParam = new int[2];
shardingParam[0] = Integer.valueOf(shardingArr[0]);
shardingParam[1] = Integer.valueOf(shardingArr[1]);
}
}
// In sharding-broadcast mode, loop over registryList calling processTrigger (note the index i, which is formed into a parameter like "2/5" and passed to the executor, improving executor-side parallel throughput.)
if (ExecutorRouteStrategyEnum.SHARDING_BROADCAST==ExecutorRouteStrategyEnum.match(jobInfo.getExecutorRouteStrategy(), null)
&& group.getRegistryList()!=null && !group.getRegistryList().isEmpty()
&& shardingParam==null) {
for (int i = 0; i < group.getRegistryList().size(); i++) {
// processTrigger is analyzed below
processTrigger(group, jobInfo, finalFailRetryCount, triggerType, i, group.getRegistryList().size());
}
} else {
if (shardingParam == null) {
shardingParam = new int[]{0, 1};
}
processTrigger(group, jobInfo, finalFailRetryCount, triggerType, shardingParam[0], shardingParam[1]);
}
}
Next, processTrigger: com.xxl.job.admin.core.trigger.XxlJobTrigger#processTrigger
private static void processTrigger(XxlJobGroup group, XxlJobInfo jobInfo, int finalFailRetryCount, TriggerTypeEnum triggerType, int index, int total){
// param
ExecutorBlockStrategyEnum blockStrategy = ExecutorBlockStrategyEnum.match(jobInfo.getExecutorBlockStrategy(), ExecutorBlockStrategyEnum.SERIAL_EXECUTION); // block strategy
ExecutorRouteStrategyEnum executorRouteStrategyEnum = ExecutorRouteStrategyEnum.match(jobInfo.getExecutorRouteStrategy(), null); // route strategy; for sharding-broadcast, build a param like "2/5"
String shardingParam = (ExecutorRouteStrategyEnum.SHARDING_BROADCAST==executorRouteStrategyEnum)?String.valueOf(index).concat("/").concat(String.valueOf(total)):null;
// 1. save log-id: the jobLog info goes into triggerParam and is passed to the executor, so the two sides can be correlated
XxlJobLog jobLog = new XxlJobLog();
jobLog.setJobGroup(jobInfo.getJobGroup());
jobLog.setJobId(jobInfo.getId());
jobLog.setTriggerTime(new Date());
XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().save(jobLog);
logger.debug(">>>>>>>>>>> xxl-job trigger start, jobId:{}", jobLog.getId());
// 2. init trigger-param: these parameters are eventually sent to the executor over HTTP
TriggerParam triggerParam = new TriggerParam();
triggerParam.setJobId(jobInfo.getId());
triggerParam.setExecutorHandler(jobInfo.getExecutorHandler());
triggerParam.setExecutorParams(jobInfo.getExecutorParam());
triggerParam.setExecutorBlockStrategy(jobInfo.getExecutorBlockStrategy());
triggerParam.setExecutorTimeout(jobInfo.getExecutorTimeout());
triggerParam.setLogId(jobLog.getId());
triggerParam.setLogDateTime(jobLog.getTriggerTime().getTime());
triggerParam.setGlueType(jobInfo.getGlueType());
triggerParam.setGlueSource(jobInfo.getGlueSource());
triggerParam.setGlueUpdatetime(jobInfo.getGlueUpdatetime().getTime());
triggerParam.setBroadcastIndex(index);
triggerParam.setBroadcastTotal(total);
triggerParam.setHandlerMethod(jobInfo.getHandlerMethod());
// 3. init address: pick an executor
String address = null;
ReturnT<String> routeAddressResult = null;
if (group.getRegistryList()!=null && !group.getRegistryList().isEmpty()) {
if (ExecutorRouteStrategyEnum.SHARDING_BROADCAST == executorRouteStrategyEnum) {
if (index < group.getRegistryList().size()) {
address = group.getRegistryList().get(index);
} else {
address = group.getRegistryList().get(0);
}
} else {
// executorRouteStrategyEnum.getRouter().route deserves closer analysis; see the next subsection.
routeAddressResult = executorRouteStrategyEnum.getRouter().route(triggerParam, group.getRegistryList());
if (routeAddressResult.getCode() == ReturnT.SUCCESS_CODE) {
address = routeAddressResult.getContent();
}
}
} else {
routeAddressResult = new ReturnT<String>(ReturnT.FAIL_CODE, I18nUtil.getString("jobconf_trigger_address_empty"));
}
// 4、trigger remote executor
ReturnT<String> triggerResult = null;
if (address != null) {
// this is the entry point of the HTTP call to the executor
triggerResult = runExecutor(triggerParam, address);
} else {
triggerResult = new ReturnT<String>(ReturnT.FAIL_CODE, null);
}
// 5、collection trigger info
StringBuffer triggerMsgSb = new StringBuffer();
triggerMsgSb.append(I18nUtil.getString("jobconf_trigger_type")).append(":").append(triggerType.getTitle());
triggerMsgSb.append("<br>").append(I18nUtil.getString("jobconf_trigger_admin_adress")).append(":").append(IpUtil.getIp());
triggerMsgSb.append("<br>").append(I18nUtil.getString("jobconf_trigger_exe_regtype")).append(":")
.append( (group.getAddressType() == 0)?I18nUtil.getString("jobgroup_field_addressType_0"):I18nUtil.getString("jobgroup_field_addressType_1") );
triggerMsgSb.append("<br>").append(I18nUtil.getString("jobconf_trigger_exe_regaddress")).append(":").append(group.getRegistryList());
triggerMsgSb.append("<br>").append(I18nUtil.getString("jobinfo_field_executorRouteStrategy")).append(":").append(executorRouteStrategyEnum.getTitle());
if (shardingParam != null) {
triggerMsgSb.append("("+shardingParam+")");
}
triggerMsgSb.append("<br>").append(I18nUtil.getString("jobinfo_field_executorBlockStrategy")).append(":").append(blockStrategy.getTitle());
triggerMsgSb.append("<br>").append(I18nUtil.getString("jobinfo_field_timeout")).append(":").append(jobInfo.getExecutorTimeout());
triggerMsgSb.append("<br>").append(I18nUtil.getString("jobinfo_field_executorFailRetryCount")).append(":").append(finalFailRetryCount);
triggerMsgSb.append("<br><br><span style=\"color:#00c0ef;\" > >>>>>>>>>>>"+ I18nUtil.getString("jobconf_trigger_run") +"<<<<<<<<<<< </span><br>")
.append((routeAddressResult!=null&&routeAddressResult.getMsg()!=null)?routeAddressResult.getMsg()+"<br><br>":"").append(triggerResult.getMsg()!=null?triggerResult.getMsg():"");
// 6、save log trigger-info
jobLog.setExecutorAddress(address);
jobLog.setExecutorHandler(jobInfo.getExecutorHandler());
jobLog.setExecutorParam(jobInfo.getExecutorParam());
jobLog.setExecutorShardingParam(shardingParam);
jobLog.setExecutorFailRetryCount(finalFailRetryCount);
//jobLog.setTriggerTime();
jobLog.setTriggerCode(triggerResult.getCode());
jobLog.setTriggerMsg(triggerMsgSb.toString());
XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().updateTriggerInfo(jobLog);
logger.debug(">>>>>>>>>>> xxl-job trigger end, jobId:{}", jobLog.getId());
}
admin notifies the executor to run the task via an HTTP call to its run method: com.xxl.job.admin.core.trigger.XxlJobTrigger#runExecutor
public static ReturnT<String> runExecutor(TriggerParam triggerParam, String address){
// triggerParam wraps the job's execution parameters; address is the executor's ip:port chosen by the route strategy
ReturnT<String> runResult = null;
try {
ExecutorBiz executorBiz = XxlJobScheduler.getExecutorBiz(address);
runResult = executorBiz.run(triggerParam);
} catch (Exception e) {
logger.error(">>>>>>>>>>> xxl-job trigger error, please check if the executor[{}] is running.", address, e);
runResult = new ReturnT<String>(ReturnT.FAIL_CODE, ThrowableUtil.toString(e));
}
StringBuffer runResultSB = new StringBuffer(I18nUtil.getString("jobconf_trigger_run") + ":");
runResultSB.append("<br>address:").append(address);
runResultSB.append("<br>code:").append(runResult.getCode());
runResultSB.append("<br>msg:").append(runResult.getMsg());
runResult.setMsg(runResultSB.toString());
return runResult;
}
Next is com.xxl.job.core.biz.client.ExecutorBizClient#run
@Override
public ReturnT<String> run(TriggerParam triggerParam) {
// addressUrl is the address of the executor chosen by the route strategy to run the task
return XxlJobRemotingUtil.postBody(addressUrl + "run", accessToken, timeout, triggerParam, String.class);
}
The postBody method is fairly long, but entirely conventional:
public static ReturnT postBody(String url, String accessToken, int timeout, Object requestObj, Class returnTargClassOfT) {
HttpURLConnection connection = null;
BufferedReader bufferedReader = null;
try {
// connection
URL realUrl = new URL(url);
connection = (HttpURLConnection) realUrl.openConnection();
// trust-https
boolean useHttps = url.startsWith("https");
if (useHttps) {
HttpsURLConnection https = (HttpsURLConnection) connection;
trustAllHosts(https);
}
// connection setting
connection.setRequestMethod("POST");
connection.setDoOutput(true);
connection.setDoInput(true);
connection.setUseCaches(false);
connection.setReadTimeout(timeout * 1000);
connection.setConnectTimeout(3 * 1000);
connection.setRequestProperty("connection", "Keep-Alive");
connection.setRequestProperty("Content-Type", "application/json;charset=UTF-8");
connection.setRequestProperty("Accept-Charset", "application/json;charset=UTF-8");
if(accessToken!=null && accessToken.trim().length()>0){
connection.setRequestProperty(XXL_JOB_ACCESS_TOKEN, accessToken);
}
// do connection
connection.connect();
// write requestBody
if (requestObj != null) {
String requestBody = GsonTool.toJson(requestObj);
DataOutputStream dataOutputStream = new DataOutputStream(connection.getOutputStream());
dataOutputStream.write(requestBody.getBytes("UTF-8"));
dataOutputStream.flush();
dataOutputStream.close();
}
/*byte[] requestBodyBytes = requestBody.getBytes("UTF-8");
connection.setRequestProperty("Content-Length", String.valueOf(requestBodyBytes.length));
OutputStream outwritestream = connection.getOutputStream();
outwritestream.write(requestBodyBytes);
outwritestream.flush();
outwritestream.close();*/
// valid StatusCode
int statusCode = connection.getResponseCode();
if (statusCode != 200) {
return new ReturnT<String>(ReturnT.FAIL_CODE, "xxl-rpc remoting fail, StatusCode("+ statusCode +") invalid. for url : " + url);
}
// result
bufferedReader = new BufferedReader(new InputStreamReader(connection.getInputStream(), "UTF-8"));
StringBuilder result = new StringBuilder();
String line;
while ((line = bufferedReader.readLine()) != null) {
result.append(line);
}
String resultJson = result.toString();
// parse returnT
try {
ReturnT returnT = GsonTool.fromJson(resultJson, ReturnT.class, returnTargClassOfT);
return returnT;
} catch (Exception e) {
logger.error("xxl-rpc remoting (url="+url+") response content invalid("+ resultJson +").", e);
return new ReturnT<String>(ReturnT.FAIL_CODE, "xxl-rpc remoting (url="+url+") response content invalid("+ resultJson +").");
}
} catch (Exception e) {
logger.error(e.getMessage(), e);
return new ReturnT<String>(ReturnT.FAIL_CODE, "xxl-rpc remoting error("+ e.getMessage() +"), for url : " + url);
} finally {
try {
if (bufferedReader != null) {
bufferedReader.close();
}
if (connection != null) {
connection.disconnect();
}
} catch (Exception e2) {
logger.error(e2.getMessage(), e2);
}
}
}
At this point we have walked through the overall trigger flow of a jobInfo.
1.1.2.6. Choosing an Executor
The executor-selection code is still in com.xxl.job.admin.core.trigger.XxlJobTrigger#processTrigger
executorRouteStrategyEnum.getRouter().route
There are quite a few implementations of this router, one per route strategy selectable in the UI.
Let's take the round-robin one as an example: com.xxl.job.admin.core.route.strategy.ExecutorRouteRound#route
@Override
public ReturnT<String> route(TriggerParam triggerParam, List<String> addressList) {
String address = addressList.get(count(triggerParam.getJobId())%addressList.size());
return new ReturnT<String>(address);
}
Not much code: a per-job counter (count(jobId)) modulo the number of executors, i.e. round-robin across the address list.
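A minimal self-contained version of this routing idea, for intuition. The real ExecutorRouteRound also seeds the counter randomly and resets it daily; that is omitted here, and the class name below is illustrative:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal round-robin router in the spirit of ExecutorRouteRound: a per-job
// counter picks addresses in rotation.
public class RoundRobinRouter {
    private final ConcurrentMap<Integer, AtomicInteger> counters = new ConcurrentHashMap<>();

    public String route(int jobId, List<String> addressList) {
        int count = counters.computeIfAbsent(jobId, k -> new AtomicInteger(0)).getAndIncrement();
        return addressList.get(count % addressList.size());
    }

    public static void main(String[] args) {
        RoundRobinRouter router = new RoundRobinRouter();
        List<String> addrs = Arrays.asList("10.0.0.1:9999", "10.0.0.2:9999");
        System.out.println(router.route(1, addrs)); // prints "10.0.0.1:9999"
        System.out.println(router.route(1, addrs)); // prints "10.0.0.2:9999"
        System.out.println(router.route(1, addrs)); // prints "10.0.0.1:9999"
    }
}
```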
1.1.2.7. Killing a Task
Two things to look at here: the kill entry point, and the official manual's point that job handlers must let exceptions propagate rather than swallow them.
Searching for /kill leads to com.xxl.job.core.biz.client.ExecutorBizClient#kill, and from there the controller entry point: com.xxl.job.admin.controller.JobLogController#logKill
@RequestMapping("/logKill")
@ResponseBody
public ReturnT<String> logKill(int id){
// base check
XxlJobLog log = xxlJobLogDao.load(id);
XxlJobInfo jobInfo = xxlJobInfoDao.loadById(log.getJobId());
if (jobInfo==null) {
return new ReturnT<String>(500, I18nUtil.getString("jobinfo_glue_jobid_unvalid"));
}
if (ReturnT.SUCCESS_CODE != log.getTriggerCode()) {
return new ReturnT<String>(500, I18nUtil.getString("joblog_kill_log_limit"));
}
// request of kill
ReturnT<String> runResult = null;
try {
ExecutorBiz executorBiz = XxlJobScheduler.getExecutorBiz(log.getExecutorAddress());
// this just calls the executor's kill interface; the core logic lives on the executor side
runResult = executorBiz.kill(new KillParam(jobInfo.getId()));
} catch (Exception e) {
logger.error(e.getMessage(), e);
runResult = new ReturnT<String>(500, e.getMessage());
}
if (ReturnT.SUCCESS_CODE == runResult.getCode()) {
log.setHandleCode(ReturnT.FAIL_CODE);
log.setHandleMsg( I18nUtil.getString("joblog_kill_log_byman")+":" + (runResult.getMsg()!=null?runResult.getMsg():""));
log.setHandleTime(new Date());
XxlJobCompleter.updateHandleInfoAndFinish(log);
return new ReturnT<String>(runResult.getMsg());
} else {
return new ReturnT<String>(500, runResult.getMsg());
}
}
Continuing into the kill handling on the executor side. Note that the executor embeds a Netty server; the code is in com.xxl.job.core.server.EmbedServer
@Override
public ReturnT<String> kill(KillParam killParam) {
// kill handlerThread, and create new one: fetch the worker thread from jobThreadRepository
JobThread jobThread = XxlJobExecutor.loadJobThread(killParam.getJobId());
if (jobThread != null) {
XxlJobExecutor.removeJobThread(killParam.getJobId(), "scheduling center kill job.");
return ReturnT.SUCCESS;
}
return new ReturnT<String>(ReturnT.SUCCESS_CODE, "job thread already killed.");
}
// fetch the worker thread from the jobThreadRepository container
public static JobThread loadJobThread(int jobId){
JobThread jobThread = jobThreadRepository.get(jobId);
return jobThread;
}
// remove the worker thread
public static JobThread removeJobThread(int jobId, String removeOldReason){
JobThread oldJobThread = jobThreadRepository.remove(jobId);
if (oldJobThread != null) {
oldJobThread.toStop(removeOldReason);
oldJobThread.interrupt();
return oldJobThread;
}
return null;
}
/**
* kill job thread
*
* @param stopReason
*/
public void toStop(String stopReason) {
/**
 * Thread.interrupt only breaks a thread out of a blocking state (wait, join, sleep):
 * it throws InterruptedException at the blocking call, but does not by itself
 * terminate a thread that is running; so note that fully destroying this thread
 * also requires a shared variable.
 */
this.toStop = true;
this.stopReason = stopReason;
}
This gives us the key concept, exactly as the code comment above says: Thread.interrupt only breaks a thread out of a blocking state (wait, join, sleep), throwing InterruptedException at the blocking call; it does not terminate a running thread by itself, so fully destroying the thread requires a shared variable as well. How should this be understood?
That question is taken up in detail in the separate section on thread interrupt.
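Before that section, the comment's claim is easy to demonstrate in a minimal self-contained form (the `Worker` class below is a toy stand-in for JobThread, not xxl-job code): interrupt only wakes the thread out of its sleep, and the loop only exits once the shared volatile flag is set.

```java
class StopFlagDemo {
    static class Worker extends Thread {
        volatile boolean toStop = false; // shared variable, like JobThread.toStop
        volatile int loops = 0;

        @Override
        public void run() {
            while (!toStop) {            // interrupt alone does not end this loop
                loops++;
                try {
                    Thread.sleep(50);    // a blocking point
                } catch (InterruptedException e) {
                    // interrupt only kicks us out of the blocking call; the loop goes on
                }
            }
        }
    }

    static int runAndStop() {
        Worker w = new Worker();
        w.start();
        try {
            Thread.sleep(120);
            w.interrupt();               // wakes the sleep; the thread keeps looping
            Thread.sleep(120);
            w.toStop = true;             // only the shared flag actually ends the loop
            w.interrupt();               // wake it once more so it sees the flag
            w.join(1000);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return w.loops;
    }

    public static void main(String[] args) {
        System.out.println("loop iterations before stop: " + runAndStop());
    }
}
```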
Continuing here with how the thread is actually terminated, per the official manual:
To understand the flow: suppose a task with jobInfo.getId() == 1 already has a JobThread A running, and another trigger arrives for the same jobInfo id 1. The block strategy is then consulted:
-
If the strategy is to discard later triggers, it is simple: the new trigger is just dropped.
-
If the strategy is to cover (override) the earlier task:
a new JobThread is created and started for task B;
meanwhile the previous JobThread A is taken out, its toStop flag is set to true, and its interrupt() method is called.
When a request enters the queue, and when the previous jobThread gets interrupted, will be analyzed in the **trigger request queue (调度请求queue)** section of the executor code analysis.
1.1.2.8. failover
Official documentation:
If executors are deployed as a cluster, the scheduling center detects all online executors, e.g. "127.0.0.1:9997, 127.0.0.1:9998, 127.0.0.1:9999".
When the task's route strategy is FAILOVER, each time the scheduling center fires a trigger, it sends heartbeat checks to the executors in order; the first executor detected as alive is selected and receives the trigger request.
After a successful trigger, the "trigger remark" can be viewed on the log monitoring page, as below;
The "trigger remark" shows the trigger trace: the executor's registration mode, address list, and the task's route strategy. Under FAILOVER, the scheduling center first heartbeats the first address; it fails and is skipped, the second heartbeat also fails...
until the heartbeat on the third address "127.0.0.1:9999" succeeds and it is selected as the target executor; the trigger request is then sent to it, the trigger flow ends, and the scheduling center waits for the executor to call back with the execution result.
Now check in the code where failover is implemented.
The code is as follows:
public class ExecutorRouteFailover extends ExecutorRouter {
@Override
public ReturnT<String> route(TriggerParam triggerParam, List<String> addressList) {
StringBuffer beatResultSB = new StringBuffer();
for (String address : addressList) {
// beat
ReturnT<String> beatResult = null;
try {
ExecutorBiz executorBiz = XxlJobScheduler.getExecutorBiz(address);
// send a beat request to the executor; the first successful response is selected as the target executor
beatResult = executorBiz.beat();
} catch (Exception e) {
logger.error(e.getMessage(), e);
beatResult = new ReturnT<String>(ReturnT.FAIL_CODE, ""+e );
}
beatResultSB.append( (beatResultSB.length()>0)?"<br><br>":"")
.append(I18nUtil.getString("jobconf_beat") + ":")
.append("<br>address:").append(address)
.append("<br>code:").append(beatResult.getCode())
.append("<br>msg:").append(beatResult.getMsg());
// beat success
if (beatResult.getCode() == ReturnT.SUCCESS_CODE) {
beatResult.setMsg(beatResultSB.toString());
beatResult.setContent(address);
return beatResult;
}
}
return new ReturnT<String>(ReturnT.FAIL_CODE, beatResultSB.toString());
}
}
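Stripped of the beat-log bookkeeping, the failover selection is just "first address whose heartbeat succeeds wins". A minimal sketch, with executorBiz.beat() abstracted into a predicate (the predicate and the `route` signature here are illustrative, not xxl-job APIs):

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

class FailoverRoute {
    // first address whose "beat" succeeds wins; empty if every heartbeat fails
    static Optional<String> route(List<String> addressList, Predicate<String> beat) {
        for (String address : addressList) {
            if (beat.test(address)) {
                return Optional.of(address); // selected as the target executor
            }
        }
        return Optional.empty();             // maps to the FAIL ReturnT in the real code
    }

    public static void main(String[] args) {
        List<String> addrs = List.of("127.0.0.1:9997", "127.0.0.1:9998", "127.0.0.1:9999");
        // pretend only :9999 is alive, as in the official example
        System.out.println(route(addrs, a -> a.endsWith("9999")).orElse("none"));
    }
}
```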
1.1.3. glue
Official documentation:
Principle: the code of each "GLUE (Java)" task is actually the source of a class implementing "IJobHandler". When the executor receives a trigger from the scheduling center, it loads this code with a Groovy class loader, instantiates it as a Java object, injects the Spring services declared in the code (make sure the services and classes referenced in the Glue code exist in the executor project), and then calls the object's execute method to run the task logic.
Given this, the core of the Glue implementation should be on the executor side. Start from the executor's /run entry point:
com.xxl.job.core.biz.impl.ExecutorBizImpl#run
com.xxl.job.core.biz.impl.ExecutorBizImpl#run
} else if (GlueTypeEnum.GLUE_GROOVY == glueTypeEnum) {
// GLUE_GROOVY("GLUE(Java)", false, null, null),
// valid old jobThread
if (jobThread != null &&
!(jobThread.getHandler() instanceof GlueJobHandler
&& ((GlueJobHandler) jobThread.getHandler()).getGlueUpdatetime()==triggerParam.getGlueUpdatetime() )) {
// change handler or gluesource updated, need kill old thread
removeOldReason = "change job source or glue type, and terminate the old job thread.";
jobThread = null;
jobHandler = null;
}
// valid handler
if (jobHandler == null) {
try {
// This is the key part: the Java source is ultimately turned into an IJobHandler instance.
// triggerParam.getGlueSource() is the Java code entered when configuring the job.
IJobHandler originJobHandler = GlueFactory.getInstance().loadNewInstance(triggerParam.getGlueSource());
jobHandler = new GlueJobHandler(originJobHandler, triggerParam.getGlueUpdatetime());
} catch (Exception e) {
logger.error(e.getMessage(), e);
return new ReturnT<String>(ReturnT.FAIL_CODE, e.getMessage());
}
}
}
Let's create a Glue (Java) task and try it.
Debugging while it runs shows the following:
Note that the actual GlueFactory object here is a SpringGlueFactory.
The Glue implementation itself is fairly simple: relying on the Groovy library, it compiles the Java source into a class, checks whether the class's fields carry Spring annotations, and if so looks the matching beans up in the Spring container and assigns them to those fields.
package com.xxl.job.core.glue;
import com.xxl.job.core.glue.impl.SpringGlueFactory;
import com.xxl.job.core.handler.IJobHandler;
import groovy.lang.GroovyClassLoader;
import java.math.BigInteger;
import java.security.MessageDigest;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
/**
* glue factory, product class/object by name
*
* @author xuxueli 2016-1-2 20:02:27
*/
public class GlueFactory {
private static GlueFactory glueFactory = new GlueFactory();
public static GlueFactory getInstance(){
return glueFactory;
}
public static void refreshInstance(int type){
if (type == 0) {
glueFactory = new GlueFactory();
} else if (type == 1) {
glueFactory = new SpringGlueFactory();
}
}
/**
* groovy class loader
*/
private GroovyClassLoader groovyClassLoader = new GroovyClassLoader();
private ConcurrentMap<String, Class<?>> CLASS_CACHE = new ConcurrentHashMap<>();
/**
* load new instance, prototype
*
* @param codeSource
* @return
* @throws Exception
*/
public IJobHandler loadNewInstance(String codeSource) throws Exception{
if (codeSource!=null && codeSource.trim().length()>0) {
Class<?> clazz = getCodeSourceClass(codeSource);
if (clazz != null) {
Object instance = clazz.newInstance();
if (instance!=null) {
if (instance instanceof IJobHandler) {
// Note: this class has a subclass, SpringGlueFactory; in practice SpringGlueFactory's injectService is used, injecting the required beans from the Spring container.
this.injectService(instance);
return (IJobHandler) instance;
} else {
throw new IllegalArgumentException(">>>>>>>>>>> xxl-glue, loadNewInstance error, "
+ "cannot convert from instance["+ instance.getClass() +"] to IJobHandler");
}
}
}
}
throw new IllegalArgumentException(">>>>>>>>>>> xxl-glue, loadNewInstance error, instance is null");
}
private Class<?> getCodeSourceClass(String codeSource){
try {
// md5
byte[] md5 = MessageDigest.getInstance("MD5").digest(codeSource.getBytes());
String md5Str = new BigInteger(1, md5).toString(16);
Class<?> clazz = CLASS_CACHE.get(md5Str);
if(clazz == null){
clazz = groovyClassLoader.parseClass(codeSource);
CLASS_CACHE.putIfAbsent(md5Str, clazz);
}
return clazz;
} catch (Exception e) {
return groovyClassLoader.parseClass(codeSource);
}
}
/**
* inject service of bean field
*
* @param instance
*/
public void injectService(Object instance) {
// do something
}
}
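The caching in getCodeSourceClass — keying the compiled class by the MD5 of the source text so that identical Glue source is compiled only once — can be shown in isolation. A minimal sketch, with groovyClassLoader.parseClass replaced by a placeholder string since Groovy is not on the classpath here:

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class SourceCache {
    static final ConcurrentMap<String, String> CACHE = new ConcurrentHashMap<>();
    static final AtomicInteger COMPILATIONS = new AtomicInteger();

    // same key derivation as getCodeSourceClass: hex MD5 of the source text
    static String md5Key(String codeSource) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5").digest(codeSource.getBytes());
            return new BigInteger(1, md5).toString(16);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // "compilation" (a placeholder for groovyClassLoader.parseClass) only runs on a cache miss
    static String load(String codeSource) {
        return CACHE.computeIfAbsent(md5Key(codeSource), key -> {
            COMPILATIONS.incrementAndGet();
            return "compiled:" + key; // stands in for the parsed Class<?>
        });
    }

    public static void main(String[] args) {
        load("class A {}");
        load("class A {}"); // identical source: cache hit, no second compilation
        load("class B {}");
        System.out.println("compilations: " + COMPILATIONS.get());
    }
}
```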
com.xxl.job.core.glue.impl.SpringGlueFactory#injectService
/**
* inject action of spring
* @param instance
*/
@Override
public void injectService(Object instance){
if (instance==null) {
return;
}
if (XxlJobSpringExecutor.getApplicationContext() == null) {
return;
}
Field[] fields = instance.getClass().getDeclaredFields();
for (Field field : fields) {
if (Modifier.isStatic(field.getModifiers())) {
continue;
}
Object fieldBean = null;
// with bean-id, bean could be found by both @Resource and @Autowired, or bean could only be found by @Autowired
// resolve beans via the @Resource and @Autowired annotations and set them on the instance's fields
if (AnnotationUtils.getAnnotation(field, Resource.class) != null) {
try {
Resource resource = AnnotationUtils.getAnnotation(field, Resource.class);
if (resource.name()!=null && resource.name().length()>0){
fieldBean = XxlJobSpringExecutor.getApplicationContext().getBean(resource.name());
} else {
fieldBean = XxlJobSpringExecutor.getApplicationContext().getBean(field.getName());
}
} catch (Exception e) {
}
if (fieldBean==null ) {
fieldBean = XxlJobSpringExecutor.getApplicationContext().getBean(field.getType());
}
} else if (AnnotationUtils.getAnnotation(field, Autowired.class) != null) {
Qualifier qualifier = AnnotationUtils.getAnnotation(field, Qualifier.class);
if (qualifier!=null && qualifier.value()!=null && qualifier.value().length()>0) {
fieldBean = XxlJobSpringExecutor.getApplicationContext().getBean(qualifier.value());
} else {
fieldBean = XxlJobSpringExecutor.getApplicationContext().getBean(field.getType());
}
}
if (fieldBean!=null) {
field.setAccessible(true);
try {
field.set(instance, fieldBean);
} catch (IllegalArgumentException e) {
logger.error(e.getMessage(), e);
} catch (IllegalAccessException e) {
logger.error(e.getMessage(), e);
}
}
}
}
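Reduced to its essentials, injectService does: for each declared non-static field carrying an injection annotation, look up a matching object and set it reflectively. A minimal sketch with a hypothetical @Inject annotation and a plain Map standing in for the Spring ApplicationContext (lookup by field type only; the real code also handles @Resource and @Qualifier names):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;

class MiniInjector {
    // hypothetical marker annotation, standing in for @Resource / @Autowired
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Inject {}

    // "context" is a plain map standing in for the Spring ApplicationContext (lookup by type only)
    static void injectService(Object instance, Map<Class<?>, Object> context) {
        for (Field field : instance.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers())) continue; // skip statics, as SpringGlueFactory does
            if (field.getAnnotation(Inject.class) == null) continue;
            Object fieldBean = context.get(field.getType());
            if (fieldBean != null) {
                field.setAccessible(true);
                try {
                    field.set(instance, fieldBean);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
    }

    static class GreetService { String hello() { return "hello"; } }
    static class GlueJob { @Inject GreetService greetService; }

    static String demo() {
        GlueJob job = new GlueJob();
        injectService(job, Map.of(GreetService.class, new GreetService()));
        return job.greetService.hello(); // the field was filled in by reflection
    }

    public static void main(String[] args) {
        System.out.println(demo()); // hello
    }
}
```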
1.1.4. Log rolling and callback
Official documentation on the callback:
When the scheduling module's admin is deployed as a web service, it acts as the scheduling center on one hand and provides an API service for executors on the other.
The admin's "log callback API" lives at:
xxl-job-admin#com.xxl.job.admin.controller.JobApiController.callback
After the executor receives a task trigger request it runs the task, and when execution finishes it notifies the scheduling center of the result via a callback.
There is also log rolling; the official docs explain:
XXL-JOB generates a separate log file for every trigger. Execution logs must be written through "XxlJobHelper.log"; when the scheduling center displays execution logs, it loads the corresponding log file.
(Historical versions did this by overriding a LOG4J Appender, which imposed dependency constraints; that approach has been dropped in newer versions.)
The log file location can be customized in the executor configuration; the default path format is /data/applogs/xxl-job/jobhandler/"formatted date"/"primary key ID of the scheduling log record".log.
When a JobHandler spawns child threads, the child threads write their logs into the execution log of the parent thread, i.e. the JobHandler, which makes log tracing easier.
Let's focus on the XxlJobHelper.log method:
/**
* append log
*
* @param callInfo
* @param appendLog
*/
private static boolean logDetail(StackTraceElement callInfo, String appendLog) {
XxlJobContext xxlJobContext = XxlJobContext.getXxlJobContext();
if (xxlJobContext == null) {
return false;
}
/*// "yyyy-MM-dd HH:mm:ss [ClassName]-[MethodName]-[LineNumber]-[ThreadName] log";
StackTraceElement[] stackTraceElements = new Throwable().getStackTrace();
StackTraceElement callInfo = stackTraceElements[1];*/
// This is mostly string concatenation; the log sample further below shows what the assembled line looks like.
StringBuffer stringBuffer = new StringBuffer();
stringBuffer.append(DateUtil.formatDateTime(new Date())).append(" ")
.append("["+ callInfo.getClassName() + "#" + callInfo.getMethodName() +"]").append("-")
.append("["+ callInfo.getLineNumber() +"]").append("-")
.append("["+ Thread.currentThread().getName() +"]").append(" ")
.append(appendLog!=null?appendLog:"");
String formatAppendLog = stringBuffer.toString();
// appendlog
String logFileName = xxlJobContext.getJobLogFileName();
if (logFileName!=null && logFileName.trim().length()>0) {
XxlJobFileAppender.appendLog(logFileName, formatAppendLog);
return true;
} else {
logger.info(">>>>>>>>>>> {}", formatAppendLog);
return false;
}
}
2021-11-22 09:52:48 [com.xxl.job.core.thread.JobThread#run]-[130]-[Thread-16]
----------- xxl-job job execute start -----------
----------- Param:
2021-11-22 09:52:48 [com.xxl.job.core.handler.impl.GlueJobHandler#execute]-[25]-[Thread-16] ----------- glue.version:1637545254000 -----------
2021-11-22 09:52:48 [sun.reflect.NativeMethodAccessorImpl#invoke0]-[-2]-[Thread-16] XXL-JOB, Hello World.
2021-11-22 09:52:48 [sun.reflect.NativeMethodAccessorImpl#invoke0]-[-2]-[Thread-16] XXL-JOB, Hello KelvinLiu.
2021-11-22 09:52:48 [com.xxl.job.core.thread.JobThread#run]-[176]-[Thread-16]
----------- xxl-job job execute end(finish) -----------
----------- Result: handleCode=200, handleMsg = null
2021-11-22 09:52:48 [com.xxl.job.core.thread.TriggerCallbackThread#callbackLog]-[197]-[xxl-job, executor TriggerCallbackThread]
----------- xxl-job job callback finish.
[Load Log Finish]
How does XxlJobContext.getXxlJobContext() obtain the context? From the source, it reads straight from an InheritableThreadLocal, so somewhere setXxlJobContext must have been called first.
// An inheritable ThreadLocal, worth a closer look on its own. This is what the official docs mean by: child threads spawned inside a JobHandler write their logs into the parent (JobHandler) thread's execution log, making tracing easier.
private static InheritableThreadLocal<XxlJobContext> contextHolder = new InheritableThreadLocal<XxlJobContext>(); // support for child thread of job handler)
public static void setXxlJobContext(XxlJobContext xxlJobContext){
contextHolder.set(xxlJobContext);
}
public static XxlJobContext getXxlJobContext(){
return contextHolder.get();
}
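The inheritance behavior this relies on is easy to demonstrate: a value set in an InheritableThreadLocal by a parent thread is visible in threads the parent creates afterwards (a minimal sketch; the log file name is made up):

```java
import java.util.concurrent.atomic.AtomicReference;

class InheritableDemo {
    // like XxlJobContext.contextHolder: threads created by the parent inherit its value
    static final InheritableThreadLocal<String> CONTEXT = new InheritableThreadLocal<>();

    static String childSees() {
        CONTEXT.set("logfile-9999.log"); // set in the "JobThread" (the file name is made up)
        AtomicReference<String> seen = new AtomicReference<>();
        Thread child = new Thread(() -> seen.set(CONTEXT.get())); // child spawned by the handler
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(childSees()); // logfile-9999.log
    }
}
```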
So where is setXxlJobContext called? Back to the familiar place: com.xxl.job.core.thread.JobThread#run
@Override
public void run() {
// init
try {
handler.init();
} catch (Throwable e) {
logger.error(e.getMessage(), e);
}
// execute
while(!toStop){
running = false;
idleTimes++;
TriggerParam triggerParam = null;
try {
// to check toStop signal, we need cycle, so wo cannot use queue.take(), instand of poll(timeout)
triggerParam = triggerQueue.poll(3L, TimeUnit.SECONDS);
if (triggerParam!=null) {
running = true;
idleTimes = 0;
triggerLogIdSet.remove(triggerParam.getLogId());
// log filename, like "logPath/yyyy-MM-dd/9999.log"
String logFileName = XxlJobFileAppender.makeLogFileName(new Date(triggerParam.getLogDateTime()), triggerParam.getLogId());
// XxlJobContext is initialized here, right when the thread starts processing a trigger.
XxlJobContext xxlJobContext = new XxlJobContext(
triggerParam.getJobId(),
triggerParam.getExecutorParams(),
logFileName,
triggerParam.getBroadcastIndex(),
triggerParam.getBroadcastTotal());
// init job context
XxlJobContext.setXxlJobContext(xxlJobContext);
// execute
XxlJobHelper.log("<br>----------- xxl-job job execute start -----------<br>----------- Param:" + xxlJobContext.getJobParam());
As the official docs put it:
after receiving a task trigger request, the executor runs the task, and when execution ends it notifies the scheduling center of the result via a callback.
So: what information does the executor pass when it invokes the callback?
Address format: {admin root URL}/callback
Header:
XXL-JOB-ACCESS-TOKEN : {access token}
Request body (JSON):
[{
    "logId": 1,              // ID of this trigger's scheduling log record
    "logDateTim": 0,         // timestamp of this trigger's scheduling log record
    "executeResult": {
        "code": 200,         // 200 = executed successfully, 500 = failed
        "msg": null
    }
}]
Response body:
{
    "code": 200,             // 200 = OK, anything else = failure
    "msg": null              // error message
}
From this it appears that the actual log content is never uploaded to the admin through this callback. To verify, let's look at what "view log" does.
It opens a popup at http://localhost:8080/xxl-job-admin/joblog/logDetailPage?id=49, which makes things easy: find the endpoint com.xxl.job.admin.controller.JobLogController#logDetailPage
@RequestMapping("/logDetailPage")
public String logDetailPage(int id, Model model){
// base check
ReturnT<String> logStatue = ReturnT.SUCCESS;
XxlJobLog jobLog = xxlJobLogDao.load(id);
if (jobLog == null) {
throw new RuntimeException(I18nUtil.getString("joblog_logid_unvalid"));
}
model.addAttribute("triggerCode", jobLog.getTriggerCode());
model.addAttribute("handleCode", jobLog.getHandleCode());
model.addAttribute("executorAddress", jobLog.getExecutorAddress());
model.addAttribute("triggerTime", jobLog.getTriggerTime().getTime());
model.addAttribute("logId", jobLog.getId());
return "joblog/joblog.detail";
}
This page in turn calls com.xxl.job.admin.controller.JobLogController#logDetailCat:
@RequestMapping("/logDetailCat")
@ResponseBody
public ReturnT<LogResult> logDetailCat(String executorAddress, long triggerTime, long logId, int fromLineNum){
try {
ExecutorBiz executorBiz = XxlJobScheduler.getExecutorBiz(executorAddress);
ReturnT<LogResult> logResult = executorBiz.log(new LogParam(triggerTime, logId, fromLineNum));
// is end
if (logResult.getContent()!=null && logResult.getContent().getFromLineNum() > logResult.getContent().getToLineNum()) {
XxlJobLog jobLog = xxlJobLogDao.load(logId);
if (jobLog.getHandleCode() > 0) {
logResult.getContent().setEnd(true);
}
}
return logResult;
} catch (Exception e) {
logger.error(e.getMessage(), e);
return new ReturnT<LogResult>(ReturnT.FAIL_CODE, e.getMessage());
}
}
So the admin calls the executor's log endpoint over the API to fetch the log data. From this flow, the admin implements its "cat log" feature through repeated API calls.
@Override
public ReturnT<LogResult> log(LogParam logParam) {
return XxlJobRemotingUtil.postBody(addressUrl + "log", accessToken, timeout, logParam, LogResult.class);
}
The /log endpoint, once again, is found in EmbedServer:
@Override
public ReturnT<LogResult> log(LogParam logParam) {
// log filename: logPath/yyyy-MM-dd/9999.log
String logFileName = XxlJobFileAppender.makeLogFileName(new Date(logParam.getLogDateTim()), logParam.getLogId());
LogResult logResult = XxlJobFileAppender.readLog(logFileName, logParam.getFromLineNum());
return new ReturnT<LogResult>(logResult);
}
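The contract behind this polling — return the content from fromLineNum onward, plus the new toLineNum so the page can ask again from there — can be sketched over an in-memory list of lines. The real XxlJobFileAppender.readLog reads the log file on disk; `LogResult` here is a simplified stand-in for xxl-job's class of the same name.

```java
import java.util.List;

class LogReadDemo {
    // simplified stand-in for xxl-job's LogResult: fromLineNum, toLineNum, content
    record LogResult(int fromLineNum, int toLineNum, String content) {}

    // return lines[fromLineNum..] (1-based), mirroring XxlJobFileAppender.readLog's contract
    static LogResult readLog(List<String> lines, int fromLineNum) {
        StringBuilder content = new StringBuilder();
        int toLineNum = 0;
        for (int lineNum = 1; lineNum <= lines.size(); lineNum++) {
            if (lineNum >= fromLineNum) {
                content.append(lines.get(lineNum - 1)).append("\n");
            }
            toLineNum = lineNum; // last line number actually present in the file
        }
        return new LogResult(fromLineNum, toLineNum, content.toString());
    }

    public static void main(String[] args) {
        List<String> file = List.of("line1", "line2", "line3");
        LogResult first = readLog(file, 1);                     // initial page: whole file
        LogResult next = readLog(file, first.toLineNum() + 1);  // poll again: nothing new yet
        // fromLineNum > toLineNum is exactly the "is end" check in logDetailCat
        System.out.println(next.fromLineNum() > next.toLineNum());
    }
}
```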
1.1.5. How is precise scheduling ensured?
Precise scheduling means the trigger fires in exactly the intended second. The admin's core scheduling code is in com.xxl.job.admin.core.thread.JobScheduleHelper#start, which creates two threads: scheduleThread and ringThread.
-
scheduleThread
Uses a pessimistic database lock so that, in an admin cluster, only one node grabs the lock and schedules a given time span:
select * from xxl_job_lock where lock_name = 'schedule_lock' for update
Via pre-read, it fetches the jobInfo records due within the next 5 seconds from now. preReadCount is (fast trigger pool size + slow trigger pool size) * 20; PRE_READ_MS = 5000 ms.
List<XxlJobInfo> scheduleList = XxlJobAdminConfig.getAdminConfig().getXxlJobInfoDao().scheduleJobQuery(nowTime + PRE_READ_MS, preReadCount);
<select id="scheduleJobQuery" parameterType="java.util.HashMap" resultMap="XxlJobInfo">
    SELECT <include refid="Base_Column_List" />
    FROM xxl_job_info AS t
    WHERE t.trigger_status = 1
      and t.trigger_next_time <![CDATA[ <= ]]> #{maxNextTime}
    ORDER BY id ASC
    LIMIT #{pagesize}
</select>
Execution continues: if nowTime <= jobInfo.getTriggerNextTime(), i.e. the second in which the job should fire has not arrived yet, then
pushTimeRing(ringSecond, jobInfo.getId())
puts the job into ringData. The code:
// Put the job id into a map keyed 0-59: the key is the second, the value is the list of job ids to run in that second. From the variable naming, this can be read as a time ring (time wheel).
private void pushTimeRing(int ringSecond, int jobId){
    // push async ring
    // ringData is declared as Map<Integer, List<Integer>> ringData = new ConcurrentHashMap<>(); the keys are the seconds 0-59, and the value for, say, key 10 is the List<jobId> to execute in second 10.
    List<Integer> ringItemData = ringData.get(ringSecond);
    if (ringItemData == null) {
        ringItemData = new ArrayList<Integer>();
        ringData.put(ringSecond, ringItemData);
    }
    ringItemData.add(jobId);
    logger.debug(">>>>>>>>>>> xxl-job, schedule push time-ring : " + ringSecond + " = " + Arrays.asList(ringItemData) );
}
Next, ringThread comes on stage to process the data in ringData.
-
ringThread
// ring thread: ringThread ticks once per second on the wall clock, checks whether ringData holds a List<jobId> for the current second, and if so takes it out and calls JobTriggerPoolHelper.trigger, handing the jobs to the trigger pool, which fires the executor's /run method.
ringThread = new Thread(new Runnable() {
    @Override
    public void run() {
        while (!ringThreadToStop) {
            // align second: ringThread runs once per second, so sleep until the next second boundary
            try {
                TimeUnit.MILLISECONDS.sleep(1000 - System.currentTimeMillis() % 1000);
            } catch (InterruptedException e) {
                if (!ringThreadToStop) {
                    logger.error(e.getMessage(), e);
                }
            }
            try {
                // second data
                List<Integer> ringItemData = new ArrayList<>();
                int nowSecond = Calendar.getInstance().get(Calendar.SECOND);
                // also check one slot back, in case a slow tick skipped over a slot
                for (int i = 0; i < 2; i++) {
                    // if ringData holds entries for this second, take them out into ringItemData for execution
                    List<Integer> tmpData = ringData.remove( (nowSecond+60-i)%60 );
                    if (tmpData != null) {
                        ringItemData.addAll(tmpData);
                    }
                }
                // ring trigger
                logger.debug(">>>>>>>>>>> xxl-job, time-ring beat : " + nowSecond + " = " + Arrays.asList(ringItemData) );
                if (ringItemData.size() > 0) {
                    // do trigger: anything in ringItemData is submitted to the trigger pool
                    for (int jobId: ringItemData) {
                        // do trigger
                        JobTriggerPoolHelper.trigger(jobId, TriggerTypeEnum.CRON, -1, null, null, null);
                    }
                    // clear
                    ringItemData.clear();
                }
            } catch (Exception e) {
                if (!ringThreadToStop) {
                    logger.error(">>>>>>>>>>> xxl-job, JobScheduleHelper#ringThread error:{}", e);
                }
            }
        }
        logger.info(">>>>>>>>>>> xxl-job, JobScheduleHelper#ringThread stop");
    }
});
The scheduling process is really scheduleThread and ringThread cooperating. scheduleThread reads the jobInfo records due in the next few seconds ahead of time and puts them into
Map<Integer, List<Integer>> ringData = new ConcurrentHashMap<>()
For example, suppose it is now 2021-11-23 13:50:20 and we read a job with cron
*/22 * * * * ?
which is expected to fire at second 22. How does it get executed? At second 20, the pre-read picks this record up, sees it is about to fire, and puts it into ringData under key 22, with the job's id, say 7, in the value list. ringThread keeps ticking and checking whether ringData has anything to run for the current second:
second 20: ringData.remove(20) finds nothing
second 21: ringData.remove(21) finds nothing
second 22: ringData.remove(22) has data! jobId 7 is taken out and immediately submitted to the triggerPool for execution
second 23: ringData.remove(23) finds nothing
second 24: ringData.remove(24) finds nothing
....
This is how the job is made to fire exactly at second 22.
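The push/tick cooperation described above can be condensed into a few lines (a minimal sketch of the same ring structure; JobTriggerPoolHelper.trigger is replaced by simply returning the drained job ids):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class TimeRingDemo {
    // second-of-minute (0-59) -> jobIds due in that second, like JobScheduleHelper.ringData
    static final Map<Integer, List<Integer>> ringData = new ConcurrentHashMap<>();

    static void pushTimeRing(int ringSecond, int jobId) {
        ringData.computeIfAbsent(ringSecond, k -> new ArrayList<>()).add(jobId);
    }

    // one ringThread tick: drain the current slot plus the previous one (overrun protection)
    static List<Integer> tick(int nowSecond) {
        List<Integer> ringItemData = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            List<Integer> tmp = ringData.remove((nowSecond + 60 - i) % 60);
            if (tmp != null) ringItemData.addAll(tmp);
        }
        return ringItemData; // the real code submits each id to JobTriggerPoolHelper.trigger
    }

    // a slot missed in its own second is still drained on the next tick
    static boolean overrunCaught() {
        pushTimeRing(30, 1);
        return tick(31).equals(List.of(1));
    }

    public static void main(String[] args) {
        pushTimeRing(22, 7);          // preRead at second 20 schedules jobId 7 for second 22
        System.out.println(tick(20)); // []
        System.out.println(tick(21)); // []
        System.out.println(tick(22)); // [7]
        System.out.println(tick(23)); // []
    }
}
```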
1.2. Executor
1.2.0.1. netty server
For a brief introduction to Netty servers, see this blog post: https://www.jianshu.com/p/2842fe268a9f
Getting to the main topic: how does the executor listen for and execute /run triggers?
Since the admin calls the executor's /run method to dispatch tasks, the executor must run a server of its own; Netty is used for this. The code is in com.xxl.job.core.server.EmbedServer#start, which starts a thread listening on the executor's configured port (9999 by default).
The server startup flow is as follows:
public class XxlJobSpringExecutor extends XxlJobExecutor implements ApplicationContextAware, SmartInitializingSingleton, DisposableBean {
private static final Logger logger = LoggerFactory.getLogger(XxlJobSpringExecutor.class);
// start: this is the executor's startup entry point
@Override
public void afterSingletonsInstantiated() {
// init JobHandler Repository
/*initJobHandlerRepository(applicationContext);*/
// init JobHandler Repository (for method)
initJobHandlerMethodRepository(applicationContext);
// refresh GlueFactory
GlueFactory.refreshInstance(1);
// super start: this start() holds the actual server startup logic
try {
super.start();
} catch (Exception e) {
throw new RuntimeException(e);
}
}
public void start() throws Exception {
// init logpath
XxlJobFileAppender.initLogPath(logPath);
// init invoker, admin-client
initAdminBizList(adminAddresses, accessToken);
// init JobLogFileCleanThread
JobLogFileCleanThread.getInstance().start(logRetentionDays);
// init TriggerCallbackThread
TriggerCallbackThread.getInstance().start();
// init executor-server: the Netty server is initialized here
initEmbedServer(address, ip, port, appname, accessToken);
}
private void initEmbedServer(String address, String ip, int port, String appname, String accessToken) throws Exception {
// fill ip port
port = port>0?port: NetUtil.findAvailablePort(9999);
ip = (ip!=null&&ip.trim().length()>0)?ip: IpUtil.getIp();
// generate address
if (address==null || address.trim().length()==0) {
String ip_port_address = IpUtil.getIpPort(ip, port); // registry-address:default use address to registry , otherwise use ip:port if address is null
address = "http://{ip_port}/".replace("{ip_port}", ip_port_address);
}
// accessToken
if (accessToken==null || accessToken.trim().length()==0) {
logger.warn(">>>>>>>>>>> xxl-job accessToken is empty. To ensure system security, please set the accessToken.");
}
// start
embedServer = new EmbedServer();
// start the server with the configured address and port
embedServer.start(address, port, appname, accessToken);
}
public void start(final String address, final int port, final String appname, final String accessToken) {
executorBiz = new ExecutorBizImpl();
thread = new Thread(new Runnable() {
@Override
public void run() {
// param
EventLoopGroup bossGroup = new NioEventLoopGroup();
EventLoopGroup workerGroup = new NioEventLoopGroup();
ThreadPoolExecutor bizThreadPool = new ThreadPoolExecutor(
0,
200,
60L,
TimeUnit.SECONDS,
new LinkedBlockingQueue<Runnable>(2000),
new ThreadFactory() {
@Override
public Thread newThread(Runnable r) {
return new Thread(r, "xxl-rpc, EmbedServer bizThreadPool-" + r.hashCode());
}
},
new RejectedExecutionHandler() {
@Override
public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
throw new RuntimeException("xxl-job, EmbedServer bizThreadPool is EXHAUSTED!");
}
});
try {
// start server
ServerBootstrap bootstrap = new ServerBootstrap();
bootstrap.group(bossGroup, workerGroup)
.channel(NioServerSocketChannel.class)
.childHandler(new ChannelInitializer<SocketChannel>() {
@Override
public void initChannel(SocketChannel channel) throws Exception {
channel.pipeline()
.addLast(new IdleStateHandler(0, 0, 30 * 3, TimeUnit.SECONDS)) // beat 3N, close if idle
.addLast(new HttpServerCodec())
.addLast(new HttpObjectAggregator(5 * 1024 * 1024)) // merge request & reponse to FULL
// EmbedHttpServerHandler handles requests such as /run; this is the important part
.addLast(new EmbedHttpServerHandler(executorBiz, accessToken, bizThreadPool));
}
})
.childOption(ChannelOption.SO_KEEPALIVE, true);
// bind
ChannelFuture future = bootstrap.bind(port).sync();
logger.info(">>>>>>>>>>> xxl-job remoting server start success, nettype = {}, port = {}", EmbedServer.class, port);
// start registry
startRegistry(appname, address);
// wait util stop
future.channel().closeFuture().sync();
} catch (InterruptedException e) {
if (e instanceof InterruptedException) {
logger.info(">>>>>>>>>>> xxl-job remoting server stop.");
} else {
logger.error(">>>>>>>>>>> xxl-job remoting server error.", e);
}
} finally {
// stop
try {
workerGroup.shutdownGracefully();
bossGroup.shutdownGracefully();
} catch (Exception e) {
logger.error(e.getMessage(), e);
}
}
}
});
thread.setDaemon(true); // daemon, service jvm, user thread leave >>> daemon leave >>> jvm leave
thread.start();
}
Now look at EmbedHttpServerHandler, which contains the handling logic for each endpoint:
public static class EmbedHttpServerHandler extends SimpleChannelInboundHandler<FullHttpRequest> {
private static final Logger logger = LoggerFactory.getLogger(EmbedHttpServerHandler.class);
private ExecutorBiz executorBiz;
private String accessToken;
private ThreadPoolExecutor bizThreadPool;
public EmbedHttpServerHandler(ExecutorBiz executorBiz, String accessToken, ThreadPoolExecutor bizThreadPool) {
this.executorBiz = executorBiz;
this.accessToken = accessToken;
this.bizThreadPool = bizThreadPool;
}
// every request received by the Netty server enters this method
@Override
protected void channelRead0(final ChannelHandlerContext ctx, FullHttpRequest msg) throws Exception {
// request parse
//final byte[] requestBytes = ByteBufUtil.getBytes(msg.content()); // byteBuf.toString(io.netty.util.CharsetUtil.UTF_8);
String requestData = msg.content().toString(CharsetUtil.UTF_8);
String uri = msg.uri();                        // msg.getUri(); needs adapting on netty 4.0.27.Final
HttpMethod httpMethod = msg.method();          // msg.getMethod(); needs adapting on netty 4.0.27.Final
boolean keepAlive = HttpUtil.isKeepAlive(msg); // true; needs adapting on netty 4.0.27.Final
String accessTokenReq = msg.headers().get(XxlJobRemotingUtil.XXL_JOB_ACCESS_TOKEN);
// invoke
bizThreadPool.execute(new Runnable() {
@Override
public void run() {
// do invoke
Object responseObj = process(httpMethod, uri, requestData, accessTokenReq);
// to json
String responseJson = GsonTool.toJson(responseObj);
// write response
writeResponse(ctx, keepAlive, responseJson);
}
});
}
private Object process(HttpMethod httpMethod, String uri, String requestData, String accessTokenReq) {
// valid
if (HttpMethod.POST != httpMethod) {
return new ReturnT<String>(ReturnT.FAIL_CODE, "invalid request, HttpMethod not support.");
}
if (uri==null || uri.trim().length()==0) {
return new ReturnT<String>(ReturnT.FAIL_CODE, "invalid request, uri-mapping empty.");
}
if (accessToken!=null
&& accessToken.trim().length()>0
&& !accessToken.equals(accessTokenReq)) {
return new ReturnT<String>(ReturnT.FAIL_CODE, "The access token is wrong.");
}
// services mapping
try {
if ("/beat".equals(uri)) {
return executorBiz.beat();
} else if ("/idleBeat".equals(uri)) {
IdleBeatParam idleBeatParam = GsonTool.fromJson(requestData, IdleBeatParam.class);
return executorBiz.idleBeat(idleBeatParam);
} else if ("/run".equals(uri)) {
// the executor's core entry point for handling trigger requests
TriggerParam triggerParam = GsonTool.fromJson(requestData, TriggerParam.class);
return executorBiz.run(triggerParam);
} else if ("/kill".equals(uri)) {
KillParam killParam = GsonTool.fromJson(requestData, KillParam.class);
return executorBiz.kill(killParam);
} else if ("/log".equals(uri)) {
LogParam logParam = GsonTool.fromJson(requestData, LogParam.class);
return executorBiz.log(logParam);
} else {
return new ReturnT<String>(ReturnT.FAIL_CODE, "invalid request, uri-mapping("+ uri +") not found.");
}
} catch (Exception e) {
logger.error(e.getMessage(), e);
return new ReturnT<String>(ReturnT.FAIL_CODE, "request error:" + ThrowableUtil.toString(e));
}
}
This flow also involves other logic, such as registering with the admin.
1.2.0.2. How @XxlJob registration works
The core purpose of the @XxlJob annotation is that, when the executor starts, the beans and methods carrying this annotation are discovered and saved into a container object, so that when the admin triggers a handler, the matching bean and method can be found.
The official manual documents @XxlJob as follows:
To analyze the annotation, again start from executor initialization.
public class XxlJobSpringExecutor extends XxlJobExecutor implements ApplicationContextAware, SmartInitializingSingleton, DisposableBean {
private static final Logger logger = LoggerFactory.getLogger(XxlJobSpringExecutor.class);
// start
@Override
public void afterSingletonsInstantiated() {
// init JobHandler Repository
/*initJobHandlerRepository(applicationContext);*/
// init JobHandler Repository (for method): the entry point for annotation parsing, run when the executor initializes
initJobHandlerMethodRepository(applicationContext);
// refresh GlueFactory
GlueFactory.refreshInstance(1);
// super start
try {
super.start();
} catch (Exception e) {
throw new RuntimeException(e);
}
}
Continue into initJobHandlerMethodRepository:
private void initJobHandlerMethodRepository(ApplicationContext applicationContext) {
if (applicationContext == null) {
return;
}
// init job handler from method: iterate over all beans in the Spring container
String[] beanDefinitionNames = applicationContext.getBeanNamesForType(Object.class, false, true);
for (String beanDefinitionName : beanDefinitionNames) {
Object bean = applicationContext.getBean(beanDefinitionName);
Map<Method, XxlJob> annotatedMethods = null; // referred to :org.springframework.context.event.EventListenerMethodProcessor.processBean
try {
// this call finds the @XxlJob-annotated methods on the container-managed bean
annotatedMethods = MethodIntrospector.selectMethods(bean.getClass(),
new MethodIntrospector.MetadataLookup<XxlJob>() {
@Override
public XxlJob inspect(Method method) {
return AnnotatedElementUtils.findMergedAnnotation(method, XxlJob.class);
}
});
} catch (Throwable ex) {
logger.error("xxl-job method-jobhandler resolve error for bean[" + beanDefinitionName + "].", ex);
}
if (annotatedMethods==null || annotatedMethods.isEmpty()) {
continue;
}
for (Map.Entry<Method, XxlJob> methodXxlJobEntry : annotatedMethods.entrySet()) {
// debug to inspect what annotatedMethods contains
Method executeMethod = methodXxlJobEntry.getKey();
XxlJob xxlJob = methodXxlJobEntry.getValue();
if (xxlJob == null) {
continue;
}
String name = xxlJob.value();
if (name.trim().length() == 0) {
throw new RuntimeException("xxl-job method-jobhandler name invalid, for[" + bean.getClass() + "#" + executeMethod.getName() + "] .");
}
if (loadJobHandler(name) != null) {
throw new RuntimeException("xxl-job jobhandler[" + name + "] naming conflicts.");
}
// execute method
/*if (!(method.getParameterTypes().length == 1 && method.getParameterTypes()[0].isAssignableFrom(String.class))) {
throw new RuntimeException("xxl-job method-jobhandler param-classtype invalid, for[" + bean.getClass() + "#" + method.getName() + "] , " +
"The correct method format like \" public ReturnT<String> execute(String param) \" .");
}
if (!method.getReturnType().isAssignableFrom(ReturnT.class)) {
throw new RuntimeException("xxl-job method-jobhandler return-classtype invalid, for[" + bean.getClass() + "#" + method.getName() + "] , " +
"The correct method format like \" public ReturnT<String> execute(String param) \" .");
}*/
executeMethod.setAccessible(true);
// init and destroy: locate the jobHandler's init and destroy methods
Method initMethod = null;
Method destroyMethod = null;
if (xxlJob.init().trim().length() > 0) {
try {
initMethod = bean.getClass().getDeclaredMethod(xxlJob.init());
initMethod.setAccessible(true);
} catch (NoSuchMethodException e) {
throw new RuntimeException("xxl-job method-jobhandler initMethod invalid, for[" + bean.getClass() + "#" + executeMethod.getName() + "] .");
}
}
if (xxlJob.destroy().trim().length() > 0) {
try {
destroyMethod = bean.getClass().getDeclaredMethod(xxlJob.destroy());
destroyMethod.setAccessible(true);
} catch (NoSuchMethodException e) {
throw new RuntimeException("xxl-job method-jobhandler destroyMethod invalid, for[" + bean.getClass() + "#" + executeMethod.getName() + "] .");
}
}
// registry jobhandler: name is the value on the @XxlJob annotation (name = xxlJob.value()).
// A MethodJobHandler is constructed from: bean = applicationContext.getBean(beanDefinitionName), the @XxlJob-annotated bean currently being processed; executeMethod, the annotated method, e.g. public void com.xxl.job.executor.service.jobhandler.SampleXxlJob.demoJobHandler() throws java.lang.Exception; plus initMethod and destroyMethod, which are self-explanatory.
// registJobHandler itself is simple: it just stores the handler in a ConcurrentHashMap.
registJobHandler(name, new MethodJobHandler(bean, executeMethod, initMethod, destroyMethod));
}
}
}
registJobHandler itself is simple: the repository is just a ConcurrentHashMap. Later, when the executor is triggered, the bean and method are looked up from this container:
private static ConcurrentMap<String, IJobHandler> jobHandlerRepository = new ConcurrentHashMap<String, IJobHandler>();
public static IJobHandler loadJobHandler(String name){
return jobHandlerRepository.get(name);
}
public static IJobHandler registJobHandler(String name, IJobHandler jobHandler){
logger.info(">>>>>>>>>>> xxl-job register jobhandler success, name:{}, jobHandler:{}", name, jobHandler);
return jobHandlerRepository.put(name, jobHandler);
}
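Stripped of Spring, the registration step reduces to: reflect over a bean's methods, find the annotated ones, and map name -> handler. A minimal sketch with a hypothetical @MyJob annotation in place of @XxlJob (plain getDeclaredMethods instead of Spring's MethodIntrospector; only the name and naming-conflict checks are kept):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class JobRegistry {
    // hypothetical stand-in for @XxlJob
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface MyJob { String value(); }

    // name -> handler method (the real code stores name -> MethodJobHandler)
    final ConcurrentMap<String, Method> repo = new ConcurrentHashMap<>();

    void scan(Object bean) {
        for (Method m : bean.getClass().getDeclaredMethods()) {
            MyJob ann = m.getAnnotation(MyJob.class);
            if (ann == null) continue;                    // not a job handler method
            if (ann.value().trim().isEmpty())
                throw new RuntimeException("jobhandler name invalid");
            if (repo.putIfAbsent(ann.value(), m) != null) // mirrors the naming-conflict check
                throw new RuntimeException("jobhandler[" + ann.value() + "] naming conflicts.");
        }
    }

    static class SampleJobs {
        @MyJob("demoJobHandler")
        public void demo() { }
        public void notAJob() { }
    }

    static Set<String> registeredNames() {
        JobRegistry registry = new JobRegistry();
        registry.scan(new SampleJobs());
        return registry.repo.keySet();
    }

    public static void main(String[] args) {
        System.out.println(registeredNames()); // [demoJobHandler]
    }
}
```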
1.2.0.3. Trigger request queue
Conclusion first:
The executor holds a ConcurrentMap<Integer, JobThread> jobThreadRepository, keyed by jobId with a JobThread as the value. So for a given jobId there is only ever one entry in this container; call it jobThreadA.
If the block strategy is serial execution, later run requests for the same jobId put their triggerParam into jobThreadA's queue, a LinkedBlockingQueue triggerQueue; jobThreadA loops, polling objects off triggerQueue and executing them.
If the block strategy is to discard later triggers, it is simple: the request is not processed further, i.e. it never enters triggerQueue.
If the block strategy is to cover (override) the earlier task, then "jobThreadRepository.put(jobId, newJobThread)" is called; this put returns the container's previous jobThreadA, after which:
jobThreadA.toStop(removeOldReason); jobThreadA.interrupt();
toStop mainly sets the jobThread's field this.toStop = true, and that field guards the while loop; without setting it, the thread would just keep running.
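The cover branch of that conclusion can be sketched with a toy job-thread class (`FakeJobThread` is illustrative, not xxl-job code): ConcurrentHashMap.put returns the previous thread for the jobId, which is then flagged to stop.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;

class BlockStrategyDemo {
    // toy stand-in for JobThread: just the queue and the stop flag
    static class FakeJobThread {
        final LinkedBlockingQueue<String> triggerQueue = new LinkedBlockingQueue<>();
        volatile boolean toStop = false;
        String stopReason;
        void toStop(String reason) { this.toStop = true; this.stopReason = reason; }
    }

    static final ConcurrentMap<Integer, FakeJobThread> jobThreadRepository = new ConcurrentHashMap<>();

    // COVER_EARLY: register the new thread; put() hands back the old one, which gets stopped
    static FakeJobThread cover(int jobId, FakeJobThread newThread) {
        FakeJobThread old = jobThreadRepository.put(jobId, newThread);
        if (old != null) {
            old.toStop("block strategy effect: COVER_EARLY");
        }
        return old;
    }

    static boolean demo() {
        FakeJobThread a = new FakeJobThread();
        jobThreadRepository.put(1, a);
        a.triggerQueue.offer("trigger-1");      // SERIAL_EXECUTION would simply enqueue here
        FakeJobThread replaced = cover(1, new FakeJobThread());
        return replaced == a && a.toStop;       // the old thread came back from put() and is flagged
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```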
The core entry point is in EmbedServer, from which the call chain reaches executorBiz.run(triggerParam).
To make the request easier to follow, here is a sample triggerParam first:
{
"jobId": 1,
"executorHandler": "demoJobHandler",
"executorParams": "",
"executorBlockStrategy": "SERIAL_EXECUTION",
"executorTimeout": 0,
"logId": 11,
"logDateTime": 1637041633591,
"glueType": "EXEC-LLT",
"glueSource": "",
"glueUpdatetime": 1541254891000,
"broadcastIndex": 0,
"broadcastTotal": 1
}
public ReturnT<String> run(TriggerParam triggerParam) {
// load old:jobHandler + jobThread
JobThread jobThread = XxlJobExecutor.loadJobThread(triggerParam.getJobId());
IJobHandler jobHandler = jobThread!=null?jobThread.getHandler():null;
String removeOldReason = null;
// valid:jobHandler + jobThread
GlueTypeEnum glueTypeEnum = GlueTypeEnum.match(triggerParam.getGlueType());
if (GlueTypeEnum.BEAN == glueTypeEnum) {
// new jobhandler
IJobHandler newJobHandler = XxlJobExecutor.loadJobHandler(triggerParam.getExecutorHandler());
// valid old jobThread
if (jobThread!=null && jobHandler != newJobHandler) {
// change handler, need kill old thread
removeOldReason = "change jobhandler or glue type, and terminate the old job thread.";
jobThread = null;
jobHandler = null;
}
// valid handler
if (jobHandler == null) {
jobHandler = newJobHandler;
if (jobHandler == null) {
return new ReturnT<String>(ReturnT.FAIL_CODE, "job handler [" + triggerParam.getExecutorHandler() + "] not found.");
}
}
} else if (GlueTypeEnum.GLUE_GROOVY == glueTypeEnum) {
// valid old jobThread
if (jobThread != null &&
!(jobThread.getHandler() instanceof GlueJobHandler
&& ((GlueJobHandler) jobThread.getHandler()).getGlueUpdatetime()==triggerParam.getGlueUpdatetime() )) {
// change handler or gluesource updated, need kill old thread
removeOldReason = "change job source or glue type, and terminate the old job thread.";
jobThread = null;
jobHandler = null;
}
// valid handler
if (jobHandler == null) {
try {
IJobHandler originJobHandler = GlueFactory.getInstance().loadNewInstance(triggerParam.getGlueSource());
jobHandler = new GlueJobHandler(originJobHandler, triggerParam.getGlueUpdatetime());
} catch (Exception e) {
logger.error(e.getMessage(), e);
return new ReturnT<String>(ReturnT.FAIL_CODE, e.getMessage());
}
}
} else if (glueTypeEnum!=null && glueTypeEnum.isScript()) {
// valid old jobThread
if (jobThread != null &&
!(jobThread.getHandler() instanceof ScriptJobHandler
&& ((ScriptJobHandler) jobThread.getHandler()).getGlueUpdatetime()==triggerParam.getGlueUpdatetime() )) {
// change script or gluesource updated, need kill old thread
removeOldReason = "change job source or glue type, and terminate the old job thread.";
jobThread = null;
jobHandler = null;
}
// valid handler
if (jobHandler == null) {
jobHandler = new ScriptJobHandler(triggerParam.getJobId(), triggerParam.getGlueUpdatetime(), triggerParam.getGlueSource(), GlueTypeEnum.match(triggerParam.getGlueType()));
}
} else {
return new ReturnT<String>(ReturnT.FAIL_CODE, "glueType[" + triggerParam.getGlueType() + "] is not valid.");
}
// executor block strategy
if (jobThread != null) {
ExecutorBlockStrategyEnum blockStrategy = ExecutorBlockStrategyEnum.match(triggerParam.getExecutorBlockStrategy(), null);
if (ExecutorBlockStrategyEnum.DISCARD_LATER == blockStrategy) {
// discard when running
if (jobThread.isRunningOrHasQueue()) {
return new ReturnT<String>(ReturnT.FAIL_CODE, "block strategy effect:"+ExecutorBlockStrategyEnum.DISCARD_LATER.getTitle());
}
} else if (ExecutorBlockStrategyEnum.COVER_EARLY == blockStrategy) {
// kill running jobThread
if (jobThread.isRunningOrHasQueue()) {
removeOldReason = "block strategy effect:" + ExecutorBlockStrategyEnum.COVER_EARLY.getTitle();
jobThread = null;
}
} else {
// just queue trigger
}
}
// replace thread (new or exists invalid). From here on, the matching jobThread is located (or created if absent) and the trigger info is pushed into its triggerQueue. Once created, the jobThread keeps running and continuously drains triggerQueue; see the next section, "Take from the queue and execute".
if (jobThread == null) {
jobThread = XxlJobExecutor.registJobThread(triggerParam.getJobId(), jobHandler, removeOldReason);
}
// push data to queue
ReturnT<String> pushResult = jobThread.pushTriggerQueue(triggerParam);
return pushResult;
}
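The block-strategy branch in run() boils down to a small decision table. The following condensed sketch captures it — the method name and return strings are mine for illustration, not xxl-job API:

```java
// Condensed, hypothetical decision table for the executor block strategies.
public class BlockStrategySketch {
    enum Strategy { SERIAL_EXECUTION, DISCARD_LATER, COVER_EARLY }

    static String decide(Strategy s, boolean runningOrHasQueue) {
        if (!runningOrHasQueue) return "queue";   // idle jobThread: always just enqueue
        switch (s) {
            case DISCARD_LATER: return "discard"; // reject the new trigger with FAIL_CODE
            case COVER_EARLY:   return "replace"; // jobThread = null -> registJobThread kills the old one
            default:            return "queue";   // SERIAL_EXECUTION: queue behind the current run
        }
    }

    public static void main(String[] args) {
        System.out.println(decide(Strategy.DISCARD_LATER, true));    // discard
        System.out.println(decide(Strategy.COVER_EARLY, true));      // replace
        System.out.println(decide(Strategy.SERIAL_EXECUTION, true)); // queue
    }
}
```

Note that the strategies only differ when the thread is already busy; an idle jobThread always just accepts the trigger.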
1.2.0.4. Take from the queue and execute
This is mainly the JobThread logic. Once a jobThread is created, the thread is started immediately:
// ---------------------- job thread repository ----------------------
private static ConcurrentMap<Integer, JobThread> jobThreadRepository = new ConcurrentHashMap<Integer, JobThread>();
public static JobThread registJobThread(int jobId, IJobHandler handler, String removeOldReason){
JobThread newJobThread = new JobThread(jobId, handler);
// the jobThread is started right after creation
newJobThread.start();
logger.info(">>>>>>>>>>> xxl-job regist JobThread success, jobId:{}, handler:{}", new Object[]{jobId, handler});
JobThread oldJobThread = jobThreadRepository.put(jobId, newJobThread); // putIfAbsent | oh my god, map's put method return the old value!!!
if (oldJobThread != null) {
oldJobThread.toStop(removeOldReason);
oldJobThread.interrupt();
}
return newJobThread;
}
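The inline comment above flags the crux: Map.put returns the previous mapping, which is exactly how registJobThread obtains the old JobThread to stop. A minimal illustration:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PutReturnsOld {
    public static void main(String[] args) {
        ConcurrentMap<Integer, String> repo = new ConcurrentHashMap<>();
        // first registration: no previous mapping, so put returns null
        String old1 = repo.put(1, "jobThreadA");
        // re-registration under the same jobId: put returns the replaced value,
        // which registJobThread then calls toStop()/interrupt() on
        String old2 = repo.put(1, "jobThreadB");
        System.out.println(old1 + " / " + old2); // null / jobThreadA
    }
}
```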
Now let's follow what the jobThread does after it starts:
public void run() {
// init
try {
handler.init();
} catch (Throwable e) {
logger.error(e.getMessage(), e);
}
// execute. Once started, the thread loops instead of exiting after one run, continuously pulling from triggerQueue
while(!toStop){
running = false;
idleTimes++;
TriggerParam triggerParam = null;
try {
// to check toStop signal, we need cycle, so wo cannot use queue.take(), instand of poll(timeout)
// keep polling triggerQueue: wait up to 3s; if the returned triggerParam == null, go around the while loop and poll again
triggerParam = triggerQueue.poll(3L, TimeUnit.SECONDS);
if (triggerParam!=null) {
running = true;
// idleTimes counts consecutive empty polls; it is reset to 0 here on a successful poll. Once it reaches 30, this thread is torn down (see below).
idleTimes = 0;
triggerLogIdSet.remove(triggerParam.getLogId());
// log filename, like "logPath/yyyy-MM-dd/9999.log"
String logFileName = XxlJobFileAppender.makeLogFileName(new Date(triggerParam.getLogDateTime()), triggerParam.getLogId());
XxlJobContext xxlJobContext = new XxlJobContext(
triggerParam.getJobId(),
triggerParam.getExecutorParams(),
logFileName,
triggerParam.getBroadcastIndex(),
triggerParam.getBroadcastTotal());
// init job context
XxlJobContext.setXxlJobContext(xxlJobContext);
// execute
XxlJobHelper.log("<br>----------- xxl-job job execute start -----------<br>----------- Param:" + xxlJobContext.getJobParam());
if (triggerParam.getExecutorTimeout() > 0) {
// limit timeout
Thread futureThread = null;
try {
FutureTask<Boolean> futureTask = new FutureTask<Boolean>(new Callable<Boolean>() {
@Override
public Boolean call() throws Exception {
// init job context
XxlJobContext.setXxlJobContext(xxlJobContext);
handler.execute();
return true;
}
});
futureThread = new Thread(futureTask);
futureThread.start();
// a task with a timeout configured is executed via a FutureTask; otherwise it runs directly (else branch below)
Boolean tempResult = futureTask.get(triggerParam.getExecutorTimeout(), TimeUnit.SECONDS);
} catch (TimeoutException e) {
XxlJobHelper.log("<br>----------- xxl-job job execute timeout");
XxlJobHelper.log(e);
// handle result
XxlJobHelper.handleTimeout("job execute timeout ");
} finally {
futureThread.interrupt();
}
} else {
// just execute
handler.execute();
}
// valid execute handle data
if (XxlJobContext.getXxlJobContext().getHandleCode() <= 0) {
XxlJobHelper.handleFail("job handle result lost.");
} else {
String tempHandleMsg = XxlJobContext.getXxlJobContext().getHandleMsg();
tempHandleMsg = (tempHandleMsg!=null&&tempHandleMsg.length()>50000)
?tempHandleMsg.substring(0, 50000).concat("...")
:tempHandleMsg;
XxlJobContext.getXxlJobContext().setHandleMsg(tempHandleMsg);
}
XxlJobHelper.log("<br>----------- xxl-job job execute end(finish) -----------<br>----------- Result: handleCode="
+ XxlJobContext.getXxlJobContext().getHandleCode()
+ ", handleMsg = "
+ XxlJobContext.getXxlJobContext().getHandleMsg()
);
} else {
if (idleTimes > 30) {
if(triggerQueue.size() == 0) { // avoid concurrent trigger causes jobId-lost
// the idle counter reached 30: remove and stop this job thread
XxlJobExecutor.removeJobThread(jobId, "excutor idel times over limit.");
}
}
}
} catch (Throwable e) {
if (toStop) {
XxlJobHelper.log("<br>----------- JobThread toStop, stopReason:" + stopReason);
}
// handle result
StringWriter stringWriter = new StringWriter();
e.printStackTrace(new PrintWriter(stringWriter));
String errorMsg = stringWriter.toString();
XxlJobHelper.handleFail(errorMsg);
XxlJobHelper.log("<br>----------- JobThread Exception:" + errorMsg + "<br>----------- xxl-job job execute end(error) -----------");
} finally {
if(triggerParam != null) {
// callback handler info
if (!toStop) {
// commonm
TriggerCallbackThread.pushCallBack(new HandleCallbackParam(
triggerParam.getLogId(),
triggerParam.getLogDateTime(),
XxlJobContext.getXxlJobContext().getHandleCode(),
XxlJobContext.getXxlJobContext().getHandleMsg() )
);
} else {
// is killed
TriggerCallbackThread.pushCallBack(new HandleCallbackParam(
triggerParam.getLogId(),
triggerParam.getLogDateTime(),
XxlJobContext.HANDLE_COCE_FAIL,
stopReason + " [job running, killed]" )
);
}
}
}
}
// callback trigger request in queue. If triggerQueue still holds entries when the jobThread stops, failure callbacks are assembled for them here.
while(triggerQueue !=null && triggerQueue.size()>0){
TriggerParam triggerParam = triggerQueue.poll();
if (triggerParam!=null) {
// is killed
TriggerCallbackThread.pushCallBack(new HandleCallbackParam(
triggerParam.getLogId(),
triggerParam.getLogDateTime(),
XxlJobContext.HANDLE_COCE_FAIL,
stopReason + " [job not executed, in the job queue, killed.]")
);
}
}
// destroy
try {
handler.destroy();
} catch (Throwable e) {
logger.error(e.getMessage(), e);
}
logger.info(">>>>>>>>>>> xxl-job JobThread stoped, hashCode:{}", Thread.currentThread());
}
1.2.0.5. Log collection
The executor exposes a log-query endpoint.
1.2.0.6. Result callback queue
Per the overall architecture, the executor's execution results are reported back to admin through a callback queue.
From the official docs:
Purpose: used by the executor to report the task result back after execution finishes
------
Address format: {scheduler-center root address}/callback
Header:
XXL-JOB-ACCESS-TOKEN : {access token}
Request data, placed in the RequestBody as JSON:
[{
"logId":1, // log ID of this trigger
"logDateTim":0, // log timestamp of this trigger
"executeResult":{
"code": 200, // 200 = task executed normally, 500 = failure
"msg": null
}
}]
Response data format:
{
"code": 200, // 200 = OK, anything else = failure
"msg": null // error message
}
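To make the body format concrete, here is a hypothetical helper that assembles such a payload by hand — for illustration only; the real executor serializes HandleCallbackParam objects with its own JSON tooling:

```java
// Hypothetical sketch of the callback body the executor POSTs to
// {scheduler-center root address}/callback; field names follow the doc above.
public class CallbackBodySketch {
    static String callbackJson(long logId, long logDateTim, int code, String msg) {
        return "[{\"logId\":" + logId
             + ",\"logDateTim\":" + logDateTim
             + ",\"executeResult\":{\"code\":" + code
             + ",\"msg\":" + (msg == null ? "null" : "\"" + msg + "\"") + "}}]";
    }

    public static void main(String[] args) {
        // 200 marks a normal execution, matching the documented format
        System.out.println(callbackJson(1, 0, 200, null));
    }
}
```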
The callback entry point is again inside JobThread:
} finally {
if(triggerParam != null) {
// callback handler info
if (!toStop) {
// common. The callback info is pushed into com.xxl.job.core.thread.TriggerCallbackThread#callBackQueue, from which a dedicated thread continuously reports results back to admin
TriggerCallbackThread.pushCallBack(new HandleCallbackParam(
triggerParam.getLogId(),
triggerParam.getLogDateTime(),
XxlJobContext.getXxlJobContext().getHandleCode(),
XxlJobContext.getXxlJobContext().getHandleMsg() )
);
} else {
// is killed
TriggerCallbackThread.pushCallBack(new HandleCallbackParam(
triggerParam.getLogId(),
triggerParam.getLogDateTime(),
XxlJobContext.HANDLE_COCE_FAIL,
stopReason + " [job running, killed]" )
);
}
}
}
1.2.0.7. glue
Already analyzed in the **"glue"** section of the admin part.
1.2.0.8. Stopping a job
In job management, selecting a job and clicking Stop calls this admin endpoint:
http://localhost:8080/xxl-job-admin/jobinfo/stop
id=7
The admin code handling the stop endpoint:
@RequestMapping("/stop")
@ResponseBody
public ReturnT<String> pause(int id) {
return xxlJobService.stop(id);
}
@Override
public ReturnT<String> stop(int id) {
XxlJobInfo xxlJobInfo = xxlJobInfoDao.loadById(id);
// This stop logic only sets the jobInfo trigger status to 0 (stopped); no request is sent to the executor. So how does the executor stop the corresponding jobThread?
// As seen earlier in the JobThread code: when idleTimes > 30, XxlJobExecutor.removeJobThread(jobId, "excutor idel times over limit.") is called -- that is where the stop happens.
xxlJobInfo.setTriggerStatus(0);
xxlJobInfo.setTriggerLastTime(0);
xxlJobInfo.setTriggerNextTime(0);
xxlJobInfo.setUpdateTime(new Date());
xxlJobInfoDao.update(xxlJobInfo);
return ReturnT.SUCCESS;
}
Let's continue with XxlJobExecutor.removeJobThread:
public static JobThread removeJobThread(int jobId, String removeOldReason){
JobThread oldJobThread = jobThreadRepository.remove(jobId);
if (oldJobThread != null) {
// toStop sets jobThread.toStop = true, so the jobThread's while loop exits and the thread finishes.
// In other words, clicking Stop in admin does not notify the executor; instead the executor's idle counter reaches its threshold and the thread shuts itself down.
oldJobThread.toStop(removeOldReason);
oldJobThread.interrupt();
return oldJobThread;
}
return null;
}
public void toStop(String stopReason) {
/**
* Thread.interrupt only interrupts a thread in a blocking state (wait, join, sleep),
* throwing InterruptedException at the blocking point; it does not terminate a running thread itself.
* So note: to fully shut this thread down, a shared variable must be used.
*/
this.toStop = true;
this.stopReason = stopReason;
}
2. Some techniques used
2.1. Spring-related
- InitializingBean
- Finding bean methods that carry a specific annotation
2.2. Other
2.2.1. select for update
https://www.cnblogs.com/wxgblogs/p/6849064.html
After one transaction executes select status from t_goods where id=1 for update;, a second transaction that runs the same select ... for update statement blocks, waiting for the first transaction to commit. A plain select status from t_goods where id=1; in the second transaction, however, returns normally and is not affected by the first transaction.
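Sketched as two concurrent sessions (table and column names follow the example above):

```sql
-- Session 1
BEGIN;
SELECT status FROM t_goods WHERE id = 1 FOR UPDATE;  -- acquires a row lock on id = 1

-- Session 2 (concurrently)
SELECT status FROM t_goods WHERE id = 1 FOR UPDATE;  -- blocks until Session 1 commits
SELECT status FROM t_goods WHERE id = 1;             -- plain read: returns immediately

-- Session 1
COMMIT;  -- Session 2's blocked FOR UPDATE now proceeds
```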
2.2.2. Thread interrupt
The code contains this comment:
Thread.interrupt only interrupts a thread in a blocking state (wait, join, sleep), throwing InterruptedException at the blocking point; it does not terminate a running thread itself. So note: to fully shut the thread down, a shared variable must be used.
Reference article: https://www.cnblogs.com/duanxz/p/4457385.html
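This behavior is easy to demonstrate with the plain JDK, no xxl-job code involved:

```java
public class InterruptDemo {
    public static void main(String[] args) throws Exception {
        // 1) interrupt() wakes a thread blocked in sleep/wait/join: the blocking
        //    call throws InterruptedException and the thread can then exit.
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                System.out.println("sleeper woken by interrupt");
            }
        });
        sleeper.start();
        sleeper.interrupt();
        sleeper.join();

        // 2) A thread that never blocks is NOT stopped by interrupt(); it must
        //    poll a flag itself -- which is why JobThread needs its toStop field.
        Thread busy = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                // spinning; interrupt() merely sets the flag we are checking
            }
            System.out.println("busy thread exited via its own flag check");
        });
        busy.start();
        busy.interrupt();
        busy.join();
    }
}
```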
2.2.3. netty server
2.2.4. InheritableThreadLocal
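The section heading presumably refers to context propagation to child threads: an InheritableThreadLocal value is copied into a child thread when the child is constructed, while a plain ThreadLocal value is not. A minimal demonstration:

```java
public class InheritableDemo {
    public static void main(String[] args) throws Exception {
        ThreadLocal<String> plain = new ThreadLocal<>();
        ThreadLocal<String> inheritable = new InheritableThreadLocal<>();
        plain.set("parent-value");
        inheritable.set("parent-value");

        final String[] seen = new String[2];
        Thread child = new Thread(() -> {
            seen[0] = plain.get();        // null: plain ThreadLocal is not inherited
            seen[1] = inheritable.get();  // "parent-value": copied at child creation
        });
        child.start();
        child.join();
        System.out.println(seen[0] + " / " + seen[1]); // null / parent-value
    }
}
```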
If you have more questions, you're welcome to add me on WeChat to discuss.