Pure substance! A deep dive into the xxl-job task scheduling source code

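Xiaoqiang, put down Black Myth: Wukong and come study. This one is all substance: the task scheduling system is the core of xxl-job, and it is full of design ideas that are well worth reading closely!
Beep beep, a friendly reminder:
To follow the core scheduling source code more easily, it helps to read the earlier articles in this series first: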

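What capabilities does xxl-job actually provide?
The appetizer: xxl-job's database tables and key fields
Let's go! The xxl-job scheduler (admin) startup flow in the source code
Moving on! The xxl-job executor startup flow in the source code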


1. How are tasks prevented from being scheduled more than once?

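On every scheduling pass, the xxl-job admin runs select * from xxl_job_lock where lock_name = 'schedule_lock' for update to take a lock. Only after it holds the lock does it query the jobs due within the next 5 seconds, and each job then follows a different scheduling rule depending on its next trigger time.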


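So when several admin instances are running, only one of them can hold the lock at a time, which avoids duplicate scheduling. Locking with for update does have two problems, though:

1. The lock is too coarse. At any moment only one admin can walk through the jobs due within the next 5 seconds. If there are too many jobs, a single pass can take longer than 5 seconds, and by the next pass a large number of jobs will already have missed their trigger time and be handled as misfires.
2. Because for update holds a database lock, a transaction left uncommitted after a network failure or a crash keeps the other admin instances from acquiring the lock, and all scheduling is blocked until that lock is released.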

while (!scheduleThreadToStop) {

    // Scan Job
    long start = System.currentTimeMillis();

    Connection conn = null;
    Boolean connAutoCommit = null;
    PreparedStatement preparedStatement = null;

    boolean preReadSuc = true;
    try {

        conn = XxlJobAdminConfig.getAdminConfig().getDataSource().getConnection();
        connAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        // acquire the global database lock
        preparedStatement = conn.prepareStatement(  "select * from xxl_job_lock where lock_name = 'schedule_lock' for update" );
        preparedStatement.execute();

        // tx start

        // 1、pre read
        long nowTime = System.currentTimeMillis();
        // query the jobs that are due within the next 5 seconds
        List<XxlJobInfo> scheduleList = XxlJobAdminConfig.getAdminConfig().getXxlJobInfoDao().scheduleJobQuery(nowTime + PRE_READ_MS, preReadCount);
        if (scheduleList!=null && scheduleList.size()>0) {
            // 2、push time-ring
            for (XxlJobInfo jobInfo: scheduleList) {

                // the trigger time was missed by more than 5 seconds
                if (nowTime > jobInfo.getTriggerNextTime() + PRE_READ_MS) {
                    // 2.1、trigger-expire > 5s:pass && make next-trigger-time
                    logger.warn(">>>>>>>>>>> xxl-job, schedule misfire, jobId = " + jobInfo.getId());

                    // 1、misfire match
                    // look up the job's MisfireStrategyEnum; FIRE_ONCE_NOW means trigger it once right away
                    MisfireStrategyEnum misfireStrategyEnum = MisfireStrategyEnum.match(jobInfo.getMisfireStrategy(), MisfireStrategyEnum.DO_NOTHING);
                    if (MisfireStrategyEnum.FIRE_ONCE_NOW == misfireStrategyEnum) {
                        // FIRE_ONCE_NOW 》 trigger
                        JobTriggerPoolHelper.trigger(jobInfo.getId(), TriggerTypeEnum.MISFIRE, -1, null, null, null);
                        logger.debug(">>>>>>>>>>> xxl-job, schedule push trigger : jobId = " + jobInfo.getId() );
                    }

                    // 2、fresh next
                    // refresh the next trigger time
                    refreshNextValidTime(jobInfo, new Date());

                } else if (nowTime > jobInfo.getTriggerNextTime()) {
                    // 2.2、trigger-expire < 5s:direct-trigger && make next-trigger-time

                    // 1、trigger
                    JobTriggerPoolHelper.trigger(jobInfo.getId(), TriggerTypeEnum.CRON, -1, null, null, null);
                    logger.debug(">>>>>>>>>>> xxl-job, schedule push trigger : jobId = " + jobInfo.getId() );

                    // 2、fresh next
                    // refresh the next trigger time
                    refreshNextValidTime(jobInfo, new Date());

                    // next-trigger-time in 5s, pre-read again
                            // the next trigger is within 5 seconds: push it onto the time ring
                            if (jobInfo.getTriggerStatus()==1 && nowTime + PRE_READ_MS > jobInfo.getTriggerNextTime()) {

                                // 1、make ring second
                                int ringSecond = (int)((jobInfo.getTriggerNextTime()/1000)%60);

                                // 2、push time ring
                                pushTimeRing(ringSecond, jobInfo.getId());

                                // 3、fresh next
                                refreshNextValidTime(jobInfo, new Date(jobInfo.getTriggerNextTime()));

                            }

                        } else {
                            // 2.3、trigger-pre-read:time-ring trigger && make next-trigger-time
                            // not yet due: derive the ring slot from the next trigger time
                            // 1、make ring second
                            int ringSecond = (int)((jobInfo.getTriggerNextTime()/1000)%60);

                            // 2、push time ring
                            // push onto the time ring
                            pushTimeRing(ringSecond, jobInfo.getId());

                            // 3、fresh next
                            refreshNextValidTime(jobInfo, new Date(jobInfo.getTriggerNextTime()));

                        }

                    }

                    // 3、update trigger info
                    for (XxlJobInfo jobInfo: scheduleList) {
                        XxlJobAdminConfig.getAdminConfig().getXxlJobInfoDao().scheduleUpdate(jobInfo);
                    }

                } else {
                    preReadSuc = false;
                }

                // tx stop


            } catch (Exception e) {
                if (!scheduleThreadToStop) {
                    logger.error(">>>>>>>>>>> xxl-job, JobScheduleHelper#scheduleThread error:{}", e);
                }
            } finally {

                // commit
                if (conn != null) {
                    try {
                        conn.commit();
                    } catch (SQLException e) {
                        if (!scheduleThreadToStop) {
                            logger.error(e.getMessage(), e);
                        }
                    }
                    try {
                        conn.setAutoCommit(connAutoCommit);
                    } catch (SQLException e) {
                        if (!scheduleThreadToStop) {
                            logger.error(e.getMessage(), e);
                        }
                    }
                    try {
                        conn.close();
                    } catch (SQLException e) {
                        if (!scheduleThreadToStop) {
                            logger.error(e.getMessage(), e);
                        }
                    }
                }

                // close PreparedStatement
                if (null != preparedStatement) {
                    try {
                        preparedStatement.close();
                    } catch (SQLException e) {
                        if (!scheduleThreadToStop) {
                            logger.error(e.getMessage(), e);
                        }
                    }
                }
            }
            long cost = System.currentTimeMillis()-start;


            // Wait seconds, align second
            // if this scan took less than 1 second
            if (cost < 1000) {  // scan-overtime, not wait
                try {
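                    // if jobs were pre-read and the scan took less than 1s, sleep to the next second boundary (1s minus the current millisecond offset)
                    // if no jobs were pre-read, skip ahead and sleep for 5s minus the current millisecond offset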
                    // pre-read period: success > scan each second; fail > skip this period;
                    TimeUnit.MILLISECONDS.sleep((preReadSuc?1000:PRE_READ_MS) - System.currentTimeMillis()%1000);
                } catch (InterruptedException e) {
                    if (!scheduleThreadToStop) {
                        logger.error(e.getMessage(), e);
                    }
                }
            }

        }

        logger.info(">>>>>>>>>>> xxl-job, JobScheduleHelper#scheduleThread stop");
    }
});
scheduleThread.setDaemon(true);
scheduleThread.setName("xxl-job, admin JobScheduleHelper#scheduleThread");
scheduleThread.start();
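
The three branches in the scheduling loop boil down to comparing the current time with each job's next trigger time. Below is a minimal sketch of just that comparison, not the real scheduler. The class name, the Decision enum and both method names are made up for illustration, and PRE_READ_MS is assumed to be 5000 to match the 5-second window described above.

```java
/** Illustrative only: mirrors the three-way time comparison in the schedule loop. */
final class ScheduleDecisionSketch {

    // Assumption: the pre-read window is 5 seconds, as the article describes.
    static final long PRE_READ_MS = 5000;

    enum Decision { MISFIRE, TRIGGER_NOW, PUSH_TO_RING }

    /** Decide what to do with a job given "now" and its next trigger time (both in ms). */
    static Decision classify(long nowTime, long triggerNextTime) {
        if (nowTime > triggerNextTime + PRE_READ_MS) {
            return Decision.MISFIRE;       // missed by more than 5s: apply the misfire strategy
        } else if (nowTime > triggerNextTime) {
            return Decision.TRIGGER_NOW;   // missed by less than 5s: trigger directly
        }
        return Decision.PUSH_TO_RING;      // not yet due: park it in the time ring
    }

    /** Slot (0..59) of the time ring for a trigger time in milliseconds. */
    static int ringSecond(long triggerNextTime) {
        return (int) ((triggerNextTime / 1000) % 60);
    }

    public static void main(String[] args) {
        long now = 100_000L;
        System.out.println(classify(now, now - 6_000)); // MISFIRE
        System.out.println(classify(now, now - 2_000)); // TRIGGER_NOW
        System.out.println(classify(now, now + 3_000)); // PUSH_TO_RING
        System.out.println(ringSecond(now + 3_000));    // 43
    }
}
```

In the real loop each branch also refreshes the job's next trigger time; the sketch leaves that out on purpose.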


2. How is xxl-job's time wheel actually designed?

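Once the scheduler has queried the overdue jobs and the jobs due within the next 5 seconds, it calls pushTimeRing to place the jobs that are not yet due into the matching slot of the time wheel. How does it place them? Read on!

xxl-job uses a home-grown time wheel, and the design is fairly simple.
The underlying data structure is a ConcurrentHashMap<Integer, List<Integer>>.
The code that pushes a job into the time wheel: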

    private void pushTimeRing(int ringSecond, int jobId){
        // push async ring
        // check whether this time slot already has a list of job ids
        List<Integer> ringItemData = ringData.get(ringSecond);
        if (ringItemData == null) {
            ringItemData = new ArrayList<Integer>();
            ringData.put(ringSecond, ringItemData);
        }
        ringItemData.add(jobId);

        logger.debug(">>>>>>>>>>> xxl-job, schedule push time-ring : " + ringSecond + " = " + Arrays.asList(ringItemData) );
    }

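As you can see, the time wheel is just a map: the key is a second (0 to 59) and the value is the list of job ids due in that second.
A dedicated thread reads jobs from the time wheel once per second.
The main flow for reading from the time wheel is:

  1. Take the jobs stored under the current second and under the previous second.
  2. Submit those jobs for triggering (JobTriggerPoolHelper.trigger).
  3. Before each round the thread sleeps until the next whole second, so even a round that finishes in under 1 second waits for the next second boundary.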
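
The idea can be tried out in isolation. Here is a minimal sketch of such a time wheel, assuming a single-threaded caller. TimeRingSketch, push and drain are hypothetical names, and computeIfAbsent stands in for the get/put pair used in the real pushTimeRing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: a 60-slot time wheel keyed by second, holding job ids. */
final class TimeRingSketch {

    private final Map<Integer, List<Integer>> ring = new ConcurrentHashMap<>();

    /** Park a job id in the slot for the given second (0..59). */
    void push(int ringSecond, int jobId) {
        ring.computeIfAbsent(ringSecond, k -> new ArrayList<>()).add(jobId);
    }

    /**
     * Remove and return the jobs in the slot for nowSecond and in the slot before it.
     * Checking the previous slot covers the case where a slow round skipped a tick.
     */
    List<Integer> drain(int nowSecond) {
        List<Integer> due = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            List<Integer> slot = ring.remove((nowSecond + 60 - i) % 60);
            if (slot != null) {
                due.addAll(slot);
            }
        }
        return due;
    }

    public static void main(String[] args) {
        TimeRingSketch ring = new TimeRingSketch();
        ring.push(59, 1);
        ring.push(0, 2);
        ring.push(30, 3);
        System.out.println(ring.drain(0));  // [2, 1]  (slot 0, then slot 59)
        System.out.println(ring.drain(0));  // []      (slots were removed)
        System.out.println(ring.drain(30)); // [3]
    }
}
```

Note how `(nowSecond + 60 - i) % 60` wraps from second 0 back to second 59, which is why job 1 is picked up in the first drain.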

The source code:

// ring thread
        ringThread = new Thread(new Runnable() {
            @Override
            public void run() {

                while (!ringThreadToStop) {

                    // align second
                    try {
                        // runs once per second: even if a round finishes within 1s, wait for the next second boundary
                        TimeUnit.MILLISECONDS.sleep(1000 - System.currentTimeMillis() % 1000);
                    } catch (InterruptedException e) {
                        if (!ringThreadToStop) {
                            logger.error(e.getMessage(), e);
                        }
                    }

                    try {
                        // second data
                        List<Integer> ringItemData = new ArrayList<>();
                        int nowSecond = Calendar.getInstance().get(Calendar.SECOND);   // also check the previous slot, in case slow processing skipped past a tick
                        for (int i = 0; i < 2; i++) {
                            List<Integer> tmpData = ringData.remove( (nowSecond+60-i)%60 );
                            if (tmpData != null) {
                                ringItemData.addAll(tmpData);
                            }
                        }

                        // ring trigger
                        logger.debug(">>>>>>>>>>> xxl-job, time-ring beat : " + nowSecond + " = " + Arrays.asList(ringItemData) );
                        if (ringItemData.size() > 0) {
                            // do trigger
                            for (int jobId: ringItemData) {
                                // do trigger
                                // submit the job for triggering
                                JobTriggerPoolHelper.trigger(jobId, TriggerTypeEnum.CRON, -1, null, null, null);
                            }
                            // clear
                            ringItemData.clear();
                        }
                    } catch (Exception e) {
                        if (!ringThreadToStop) {
                            logger.error(">>>>>>>>>>> xxl-job, JobScheduleHelper#ringThread error:{}", e);
                        }
                    }
                }
                logger.info(">>>>>>>>>>> xxl-job, JobScheduleHelper#ringThread stop");
            }
        });
        ringThread.setDaemon(true);
        ringThread.setName("xxl-job, admin JobScheduleHelper#ringThread");
        ringThread.start();

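So how does the scheduler actually fire the trigger? That brings us to the next topic: the fast and slow thread pools!
First, what are the fast and slow thread pools?

3. How should xxl-job's fast and slow thread pools be understood?

Triggering a job is actually done by a thread pool, and there are two of them: a fast pool and a slow pool.
A job is triggered on the fast pool by default. Once its triggers have timed out (taken more than 500 ms) 10 times within one minute, it is moved to the slow pool.
So what is the difference between the two pools? Read on: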

public void start(){
    // fast pool: the maximum size is at least 200 (see getTriggerPoolFastMax)
    fastTriggerPool = new ThreadPoolExecutor(
            10,
            XxlJobAdminConfig.getAdminConfig().getTriggerPoolFastMax(),
            60L,
            TimeUnit.SECONDS,
            new LinkedBlockingQueue<Runnable>(1000),
            new ThreadFactory() {
                @Override
                public Thread newThread(Runnable r) {
                    return new Thread(r, "xxl-job, admin JobTriggerPoolHelper-fastTriggerPool-" + r.hashCode());
                }
            });

    // slow pool: jobs whose triggers timed out 10 times within one minute are run here
    slowTriggerPool = new ThreadPoolExecutor(
            10,
            XxlJobAdminConfig.getAdminConfig().getTriggerPoolSlowMax(),
            60L,
            TimeUnit.SECONDS,
            new LinkedBlockingQueue<Runnable>(2000),
            new ThreadFactory() {
                @Override
                public Thread newThread(Runnable r) {
                    return new Thread(r, "xxl-job, admin JobTriggerPoolHelper-slowTriggerPool-" + r.hashCode());
                }
            });
}
    public int getTriggerPoolFastMax() {
        if (triggerPoolFastMax < 200) {
            return 200;
        }
        return triggerPoolFastMax;
    }

    public int getTriggerPoolSlowMax() {
        if (triggerPoolSlowMax < 100) {
            return 100;
        }
        return triggerPoolSlowMax;
    }

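From this we can see:
The slow pool has a smaller maximum thread count and a longer wait queue.
The fast pool has a larger maximum thread count and a shorter wait queue.
The fast pool suits jobs whose trigger call returns quickly; the slow pool suits jobs whose trigger call takes a long time.
Next, let's look closely at the code the thread pool runs to trigger a job:
Hang in there, folks!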

        triggerPool_.execute(new Runnable() {
            @Override
            public void run() {

                long start = System.currentTimeMillis();

                try {
                    // do trigger
                    XxlJobTrigger.trigger(jobId, triggerType, failRetryCount, executorShardingParam, executorParam, addressList);
                } catch (Exception e) {
                    logger.error(e.getMessage(), e);
                } finally {

                    // check timeout-count-map
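                    // current time after the trigger, truncated to the minute
                    long minTim_now = System.currentTimeMillis()/60000;
                    // if we have moved into a new minute
                    if (minTim != minTim_now) {
                        // remember the new minute
                        minTim = minTim_now;
                        // clear the per-job timeout counters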
                        jobTimeoutCountMap.clear();
                    }

                    // incr timeout-count-map
                    long cost = System.currentTimeMillis()-start;
                    if (cost > 500) {       // job-timeout threshold 500ms
                        AtomicInteger timeoutCount = jobTimeoutCountMap.putIfAbsent(jobId, new AtomicInteger(1));
                        if (timeoutCount != null) {
                            timeoutCount.incrementAndGet();
                        }
                    }

                }

            }
        });

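The main flow is:

  1. Use jobTimeoutCountMap to decide whether the job is a slow job, that is, one that has timed out 10 times within the current minute; slow jobs run on the slow pool. jobTimeoutCountMap stores, per jobId, how many times its trigger has been slow, not the durations themselves. The pool is chosen before the excerpt above begins.
  2. Run the main trigger logic, XxlJobTrigger.trigger.
  3. Once a minute, clear jobTimeoutCountMap, so that a slow job is not stuck in the slow pool forever.
  4. When the cost is measured, a trigger that took more than 500 ms is recorded in jobTimeoutCountMap. If the job is already in the map, its value is incremented by 1, which shows what the value means: the number of slow triggers.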
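
To make the bookkeeping concrete, here is a minimal single-threaded sketch of the counting and the pool choice. PoolSelectorSketch, choosePool and record are hypothetical names, the pools are represented by plain strings, and the strict "more than 10" comparison is an assumption, since the selection code is not part of the excerpt.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: per-job slow-trigger counting with a once-a-minute reset. */
final class PoolSelectorSketch {

    static final long SLOW_THRESHOLD_MS = 500; // threshold taken from the excerpt
    static final int SLOW_COUNT_LIMIT = 10;    // assumption: strictly more than 10 means "slow job"

    private final ConcurrentMap<Integer, AtomicInteger> slowCounts = new ConcurrentHashMap<>();
    private long currentMinute = -1;

    /** Returns "slow" for jobs with too many slow triggers this minute, otherwise "fast". */
    String choosePool(int jobId) {
        AtomicInteger count = slowCounts.get(jobId);
        return (count != null && count.get() > SLOW_COUNT_LIMIT) ? "slow" : "fast";
    }

    /** Record one finished trigger: its cost and the wall-clock time it finished at. */
    void record(int jobId, long costMs, long nowMs) {
        long minute = nowMs / 60000;
        if (minute != currentMinute) {   // a new minute started: forget old counts
            currentMinute = minute;
            slowCounts.clear();
        }
        if (costMs > SLOW_THRESHOLD_MS) {
            slowCounts.computeIfAbsent(jobId, k -> new AtomicInteger()).incrementAndGet();
        }
    }

    public static void main(String[] args) {
        PoolSelectorSketch selector = new PoolSelectorSketch();
        long t = 600_000L;                       // some instant inside minute 10
        for (int i = 0; i < 11; i++) {
            selector.record(7, 800, t + i);      // 11 slow triggers for job 7
        }
        System.out.println(selector.choosePool(7)); // slow
        System.out.println(selector.choosePool(8)); // fast
        selector.record(8, 100, t + 60_000);        // next minute: counters are cleared
        System.out.println(selector.choosePool(7)); // fast
    }
}
```

The reset is what gives a slow job a fresh start every minute.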

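With the fast and slow pools understood, one important question remains:
as everyone knows, the xxl-job admin and the workers (executors) are deployed separately, so how does the admin dispatch a job to a worker? Let's keep going!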

4. Digging into xxl-job's underlying dispatch strategy

First, the overall execution flow in the source code:

private static void processTrigger(XxlJobGroup group, XxlJobInfo jobInfo, int finalFailRetryCount, TriggerTypeEnum triggerType, int index, int total){

        // param
        // get the block strategy; defaults to serial execution if the user has not configured one
        ExecutorBlockStrategyEnum blockStrategy = ExecutorBlockStrategyEnum.match(jobInfo.getExecutorBlockStrategy(), ExecutorBlockStrategyEnum.SERIAL_EXECUTION);  // block strategy
        // get the route strategy
        ExecutorRouteStrategyEnum executorRouteStrategyEnum = ExecutorRouteStrategyEnum.match(jobInfo.getExecutorRouteStrategy(), null);    // route strategy
        String shardingParam = (ExecutorRouteStrategyEnum.SHARDING_BROADCAST==executorRouteStrategyEnum)?String.valueOf(index).concat("/").concat(String.valueOf(total)):null;

        // 2、init address
        String address = null;
        ReturnT<String> routeAddressResult = null;
        if (group.getRegistryList()!=null && !group.getRegistryList().isEmpty()) {
            // sharding broadcast mode
            if (ExecutorRouteStrategyEnum.SHARDING_BROADCAST == executorRouteStrategyEnum) {
                // index is the shard index; use it to pick the address from registryList
                if (index < group.getRegistryList().size()) {
                    address = group.getRegistryList().get(index);
                } else {
                    address = group.getRegistryList().get(0);
                }
                // not broadcast mode: delegate to the configured route strategy
            } else {
                routeAddressResult = executorRouteStrategyEnum.getRouter().route(triggerParam, group.getRegistryList());
                if (routeAddressResult.getCode() == ReturnT.SUCCESS_CODE) {
                    address = routeAddressResult.getContent();
                }
            }
        } else {
            routeAddressResult = new ReturnT<String>(ReturnT.FAIL_CODE, I18nUtil.getString("jobconf_trigger_address_empty"));
        }

        // 3、trigger remote executor
        ReturnT<String> triggerResult = null;
        if (address != null) {
            // call the executor at that address
            triggerResult = runExecutor(triggerParam, address);
        } else {
            triggerResult = new ReturnT<String>(ReturnT.FAIL_CODE, null);
        }

        // 4、collection trigger info
        StringBuffer triggerMsgSb = new StringBuffer();
        triggerMsgSb.append(I18nUtil.getString("jobconf_trigger_type")).append(":").append(triggerType.getTitle());
        triggerMsgSb.append("<br>").append(I18nUtil.getString("jobconf_trigger_admin_adress")).append(":").append(IpUtil.getIp());
        triggerMsgSb.append("<br>").append(I18nUtil.getString("jobconf_trigger_exe_regtype")).append(":")
                .append( (group.getAddressType() == 0)?I18nUtil.getString("jobgroup_field_addressType_0"):I18nUtil.getString("jobgroup_field_addressType_1") );
        triggerMsgSb.append("<br>").append(I18nUtil.getString("jobconf_trigger_exe_regaddress")).append(":").append(group.getRegistryList());
        triggerMsgSb.append("<br>").append(I18nUtil.getString("jobinfo_field_executorRouteStrategy")).append(":").append(executorRouteStrategyEnum.getTitle());
        if (shardingParam != null) {
            triggerMsgSb.append("("+shardingParam+")");
        }
        triggerMsgSb.append("<br>").append(I18nUtil.getString("jobinfo_field_executorBlockStrategy")).append(":").append(blockStrategy.getTitle());
        triggerMsgSb.append("<br>").append(I18nUtil.getString("jobinfo_field_timeout")).append(":").append(jobInfo.getExecutorTimeout());
        triggerMsgSb.append("<br>").append(I18nUtil.getString("jobinfo_field_executorFailRetryCount")).append(":").append(finalFailRetryCount);

        triggerMsgSb.append("<br><br><span style=\"color:#00c0ef;\" > >>>>>>>>>>>"+ I18nUtil.getString("jobconf_trigger_run") +"<<<<<<<<<<< </span><br>")
                .append((routeAddressResult!=null&&routeAddressResult.getMsg()!=null)?routeAddressResult.getMsg()+"<br><br>":"").append(triggerResult.getMsg()!=null?triggerResult.getMsg():"");

        // 5、save log trigger-info
        jobLog.setExecutorAddress(address);
        jobLog.setExecutorHandler(jobInfo.getExecutorHandler());
        jobLog.setExecutorParam(jobInfo.getExecutorParam());
        jobLog.setExecutorShardingParam(shardingParam);
        jobLog.setExecutorFailRetryCount(finalFailRetryCount);
        //jobLog.setTriggerTime();
        jobLog.setTriggerCode(triggerResult.getCode());
        jobLog.setTriggerMsg(triggerMsgSb.toString());
        XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().updateTriggerInfo(jobLog);

        logger.debug(">>>>>>>>>>> xxl-job trigger end, jobId:{}", jobLog.getId());
    }

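To summarize the flow above:

  1. Get the block strategy; if none is configured, serial execution is used.
  2. Get the route strategy.
  3. Use the route strategy to pick an executor address and dispatch the job to it.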
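
The sharding broadcast branch is the one case that bypasses the routers, so it is worth a tiny sketch of its own. The class and method names here are hypothetical; only the "index/total" format and the fall-back to the first address come from processTrigger above.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: how a shard index maps to a parameter string and an address. */
final class ShardingSketch {

    /** Builds the "index/total" string passed to the executor. */
    static String shardingParam(int index, int total) {
        return index + "/" + total;
    }

    /** Picks the address for a shard; falls back to the first address if the index is out of range. */
    static String pickAddress(List<String> registryList, int index) {
        return index < registryList.size() ? registryList.get(index) : registryList.get(0);
    }

    public static void main(String[] args) {
        List<String> registry = Arrays.asList("10.0.0.1:9999", "10.0.0.2:9999");
        System.out.println(shardingParam(1, 2));       // 1/2
        System.out.println(pickAddress(registry, 1));  // 10.0.0.2:9999
        System.out.println(pickAddress(registry, 5));  // 10.0.0.1:9999
    }
}
```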

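5. How are the scheduler's route strategies designed?

How does the scheduler choose which executor runs a job? For that we need to look at the scheduler's route strategies.
Which route strategies are there? Let's start from the code: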
The parent type is the abstract class ExecutorRouter:

public abstract class ExecutorRouter {
    protected static Logger logger = LoggerFactory.getLogger(ExecutorRouter.class);

    /**
     * route address
     *  
     * @param addressList the executor addresses
     * @return  ReturnT.content=address
     */
    public abstract ReturnT<String> route(TriggerParam triggerParam, List<String> addressList);

}

Now for the commonly used route strategies and their algorithms:

5.1. ExecutorRouteFirst

Picks the first address.

public class ExecutorRouteFirst extends ExecutorRouter {

    @Override
    public ReturnT<String> route(TriggerParam triggerParam, List<String> addressList){
        return new ReturnT<String>(addressList.get(0));
    }

}

5.2. ExecutorRouteLast

Picks the last address.

public class ExecutorRouteLast extends ExecutorRouter {

    @Override
    public ReturnT<String> route(TriggerParam triggerParam, List<String> addressList) {
        return new ReturnT<String>(addressList.get(addressList.size()-1));
    }

}

5.3. ExecutorRouteRound

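Round robin does not start from the first address. It starts from a random position, and each call increments a counter and takes it modulo the address count to find the next address. To keep the counters from growing too large, the position data is cleared once every 24 hours and a new random start is chosen.
Internally it keeps one counter per job: ConcurrentMap<Integer, AtomicInteger> routeCountEachJob
The key is the job id and the value is that job's counter.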

public class ExecutorRouteRound extends ExecutorRouter {

    private static ConcurrentMap<Integer, AtomicInteger> routeCountEachJob = new ConcurrentHashMap<>();
    private static long CACHE_VALID_TIME = 0;

    private static int count(int jobId) {
        // cache clear
        if (System.currentTimeMillis() > CACHE_VALID_TIME) {
            // clear every job's counter once every 24 hours
            routeCountEachJob.clear();
            // set the time at which the cache next expires
            CACHE_VALID_TIME = System.currentTimeMillis() + 1000*60*60*24;
        }

        AtomicInteger count = routeCountEachJob.get(jobId);
        if (count == null || count.get() > 1000000) {
            // start from a random value on initialization to spread the first-call load
            count = new AtomicInteger(new Random().nextInt(100));
        } else {
            // count++
            count.addAndGet(1);
        }
        routeCountEachJob.put(jobId, count);
        return count.get();
    }


    @Override
    public ReturnT<String> route(TriggerParam triggerParam, List<String> addressList) {
        String address = addressList.get(count(triggerParam.getJobId())%addressList.size());
        return new ReturnT<String>(address);
    }

}
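
To see the rotation on its own, here is a stripped-down sketch of the same idea. It deliberately leaves out the 24-hour reset and the 1,000,000 cap, takes the Random as a constructor argument so runs can be repeated, and uses hypothetical names throughout.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: per-job round robin with a random starting offset. */
final class RoundRobinSketch {

    private final ConcurrentMap<Integer, AtomicInteger> counters = new ConcurrentHashMap<>();
    private final Random random;

    RoundRobinSketch(Random random) {
        this.random = random;
    }

    /** Returns the next address for this job, cycling through the list. */
    String route(int jobId, List<String> addresses) {
        // First call for a job: start at a random offset below 100, like the original.
        AtomicInteger counter =
                counters.computeIfAbsent(jobId, k -> new AtomicInteger(random.nextInt(100)));
        int n = counter.getAndIncrement();
        return addresses.get(n % addresses.size());
    }

    public static void main(String[] args) {
        RoundRobinSketch router = new RoundRobinSketch(new Random());
        List<String> addresses = Arrays.asList("10.0.0.1:9999", "10.0.0.2:9999", "10.0.0.3:9999");
        // Four consecutive calls: three distinct addresses, then back to the first one picked.
        for (int i = 0; i < 4; i++) {
            System.out.println(router.route(1, addresses));
        }
    }
}
```

Because the start is random, different jobs tend to begin on different executors instead of all piling onto the first address.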

5.4. ExecutorRouteConsistentHash

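To spread jobs evenly across machines, this strategy uses consistent hashing. Each address is mapped to 100 virtual nodes so that jobs are distributed more evenly.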

public class ExecutorRouteConsistentHash extends ExecutorRouter {

    private static int VIRTUAL_NODE_NUM = 100;

    /**
     * get hash code on 2^32 ring (the hash is computed from an MD5 digest)
     * @param key
     * @return
     */
    private static long hash(String key) {

        // md5 byte
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException("MD5 not supported", e);
        }
        md5.reset();
        byte[] keyBytes = null;
        try {
            keyBytes = key.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException("Unknown string :" + key, e);
        }

        md5.update(keyBytes);
        byte[] digest = md5.digest();

        // hash code, Truncate to 32-bits
        long hashCode = ((long) (digest[3] & 0xFF) << 24)
                | ((long) (digest[2] & 0xFF) << 16)
                | ((long) (digest[1] & 0xFF) << 8)
                | (digest[0] & 0xFF);

        long truncateHashCode = hashCode & 0xffffffffL;
        return truncateHashCode;
    }

    public String hashJob(int jobId, List<String> addressList) {

        // ------A1------A2-------A3------
        // -----------J1------------------
        // a TreeMap keeps the ring sorted by hash
        TreeMap<Long, String> addressRing = new TreeMap<Long, String>();
        // iterate over all addresses
        for (String address: addressList) {
            // each address gets 100 virtual nodes
            for (int i = 0; i < VIRTUAL_NODE_NUM; i++) {
                long addressHash = hash("SHARD-" + address + "-NODE-" + i);
                addressRing.put(addressHash, address);
            }
        }
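        // hash the jobId
        long jobHash = hash(String.valueOf(jobId));
        // view of the ring whose keys are greater than or equal to jobHash
        SortedMap<Long, String> lastRing = addressRing.tailMap(jobHash);
        if (!lastRing.isEmpty()) {
            // take the smallest key in that tail view
            return lastRing.get(lastRing.firstKey());
        }
        // if jobHash is larger than every key on the ring, the tail view is empty, so wrap around to the first entry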
        return addressRing.firstEntry().getValue();
    }

    @Override
    public ReturnT<String> route(TriggerParam triggerParam, List<String> addressList) {
        String address = hashJob(triggerParam.getJobId(), addressList);
        return new ReturnT<String>(address);
    }

}
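
What consistent hashing buys you shows up when an executor goes away: only the jobs that lived on that executor should move. The sketch below rebuilds the same ring in a self-contained form and counts jobs that moved even though their executor is still present. ConsistentHashSketch and countUnexpectedMoves are hypothetical names, the addresses are made up, and ceilingEntry is used in place of tailMap for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: consistent-hash routing with 100 virtual nodes per address. */
final class ConsistentHashSketch {

    static final int VIRTUAL_NODES = 100;

    /** 32-bit hash taken from the first four bytes of an MD5 digest. */
    static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[3] & 0xFF) << 24) | ((long) (d[2] & 0xFF) << 16)
                    | ((long) (d[1] & 0xFF) << 8) | (d[0] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Returns the address owning the first virtual node at or after the job's hash. */
    static String route(int jobId, List<String> addresses) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (String address : addresses) {
            for (int i = 0; i < VIRTUAL_NODES; i++) {
                ring.put(hash("SHARD-" + address + "-NODE-" + i), address);
            }
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(String.valueOf(jobId)));
        return (entry != null ? entry : ring.firstEntry()).getValue(); // wrap around if needed
    }

    /** Counts jobs 1..jobs that change address although their address was not the removed one. */
    static int countUnexpectedMoves(List<String> addresses, String removed, int jobs) {
        List<String> remaining = new ArrayList<>(addresses);
        remaining.remove(removed);
        int moved = 0;
        for (int jobId = 1; jobId <= jobs; jobId++) {
            String before = route(jobId, addresses);
            String after = route(jobId, remaining);
            if (!before.equals(removed) && !before.equals(after)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        List<String> addresses = Arrays.asList("10.0.0.1:9999", "10.0.0.2:9999", "10.0.0.3:9999");
        System.out.println("job 42 -> " + route(42, addresses));
        System.out.println("unexpected moves: " + countUnexpectedMoves(addresses, "10.0.0.3:9999", 200));
    }
}
```

Jobs whose executor is still registered keep their place on the ring, which is the property that makes this strategy attractive when executors hold per-job local state.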

5.5. ExecutorRouteFailover

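This route strategy is failover. It walks the list of executor addresses, skips any executor whose heartbeat fails, and returns the first executor that responds normally.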

public class ExecutorRouteFailover extends ExecutorRouter {

    @Override
    public ReturnT<String> route(TriggerParam triggerParam, List<String> addressList) {

        StringBuffer beatResultSB = new StringBuffer();
        // iterate over the executor addresses to find one that is alive
        for (String address : addressList) {
            // beat
            ReturnT<String> beatResult = null;
            try {
                // check whether the machine is alive (heartbeat)
                ExecutorBiz executorBiz = XxlJobScheduler.getExecutorBiz(address);
                beatResult = executorBiz.beat();
            } catch (Exception e) {
                logger.error(e.getMessage(), e);
                beatResult = new ReturnT<String>(ReturnT.FAIL_CODE, ""+e );
            }
            beatResultSB.append( (beatResultSB.length()>0)?"<br><br>":"")
                    .append(I18nUtil.getString("jobconf_beat") + ":")
                    .append("<br>address:").append(address)
                    .append("<br>code:").append(beatResult.getCode())
                    .append("<br>msg:").append(beatResult.getMsg());

            // beat success
            // return as soon as a live executor is found
            if (beatResult.getCode() == ReturnT.SUCCESS_CODE) {

                beatResult.setMsg(beatResultSB.toString());
                beatResult.setContent(address);
                return beatResult;
            }
        }
        return new ReturnT<String>(ReturnT.FAIL_CODE, beatResultSB.toString());

    }
}

5.6. ExecutorRouteBusyover

This router skips executors that are busy and returns the first idle one.

public class ExecutorRouteBusyover extends ExecutorRouter {

    @Override
    public ReturnT<String> route(TriggerParam triggerParam, List<String> addressList) {
        StringBuffer idleBeatResultSB = new StringBuffer();
        for (String address : addressList) {
            // beat
            ReturnT<String> idleBeatResult = null;
            try {
                ExecutorBiz executorBiz = XxlJobScheduler.getExecutorBiz(address);
                // check whether the executor is idle for this job
                idleBeatResult = executorBiz.idleBeat(new IdleBeatParam(triggerParam.getJobId()));
            } catch (Exception e) {
                logger.error(e.getMessage(), e);
                idleBeatResult = new ReturnT<String>(ReturnT.FAIL_CODE, ""+e );
            }
            idleBeatResultSB.append( (idleBeatResultSB.length()>0)?"<br><br>":"")
            .append(I18nUtil.getString("jobconf_idleBeat") + ":")
            .append("<br>address:").append(address)
            .append("<br>code:").append(idleBeatResult.getCode())
            .append("<br>msg:").append(idleBeatResult.getMsg());

            // beat success
            if (idleBeatResult.getCode() == ReturnT.SUCCESS_CODE) {
                idleBeatResult.setMsg(idleBeatResultSB.toString());
                idleBeatResult.setContent(address);
                return idleBeatResult;
            }
        }

        return new ReturnT<String>(ReturnT.FAIL_CODE, idleBeatResultSB.toString());
    }

}
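
Failover and Busyover share the same shape: probe the addresses in list order and return the first one that passes the check. A minimal sketch of that shape follows, with the remote beat or idleBeat call replaced by a hypothetical Predicate so it can run without any executors. All names in it are made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

/** Illustrative only: "first address that passes the probe" routing. */
final class FirstPassingSketch {

    /**
     * Probes addresses in order. A probe that throws counts as a failure,
     * just as the routers above turn an exception into a FAIL_CODE result.
     */
    static Optional<String> route(List<String> addresses, Predicate<String> probe) {
        for (String address : addresses) {
            boolean ok;
            try {
                ok = probe.test(address);
            } catch (RuntimeException e) {
                ok = false;
            }
            if (ok) {
                return Optional.of(address);
            }
        }
        return Optional.empty(); // nothing passed: the caller reports a failure
    }

    public static void main(String[] args) {
        List<String> addresses = Arrays.asList("10.0.0.1:9999", "10.0.0.2:9999", "10.0.0.3:9999");
        Set<String> down = new HashSet<>(Arrays.asList("10.0.0.1:9999"));
        System.out.println(route(addresses, a -> !down.contains(a))); // Optional[10.0.0.2:9999]
        System.out.println(route(addresses, a -> false));             // Optional.empty
    }
}
```

One consequence of probing in list order is that the earliest healthy address takes most of the load, so these strategies trade balance for availability.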

At this point the job has been dispatched. Next, let's look at how the executor runs it.

7. How does the executor actually run a job?

The source code:

public ReturnT<String> run(TriggerParam triggerParam) {
        // load old:jobHandler + jobThread
        // check whether a thread already exists for this job id
        JobThread jobThread = XxlJobExecutor.loadJobThread(triggerParam.getJobId());
        // if so, take its JobHandler
        IJobHandler jobHandler = jobThread!=null?jobThread.getHandler():null;
        String removeOldReason = null;

        // valid:jobHandler + jobThread
        GlueTypeEnum glueTypeEnum = GlueTypeEnum.match(triggerParam.getGlueType());
        if (GlueTypeEnum.BEAN == glueTypeEnum) {


            // load the jobHandler that the admin asked to run
            IJobHandler newJobHandler = XxlJobExecutor.loadJobHandler(triggerParam.getExecutorHandler());

            // valid old jobThread
            // if the requested handler differs from the old one, drop the old one
            // by setting jobThread and jobHandler to null
            if (jobThread!=null && jobHandler != newJobHandler) {
                // change handler, need kill old thread
                removeOldReason = "change jobhandler or glue type, and terminate the old job thread.";

                jobThread = null;
                jobHandler = null;
            }

            // valid handler
            // null means the jobHandler changed or this is the handler's first call
            if (jobHandler == null) {
                jobHandler = newJobHandler;
                if (jobHandler == null) {
                    return new ReturnT<String>(ReturnT.FAIL_CODE, "job handler [" + triggerParam.getExecutorHandler() + "] not found.");
                }
            }

        } else if (GlueTypeEnum.GLUE_GROOVY == glueTypeEnum) {

            // valid old jobThread
            if (jobThread != null &&
                    !(jobThread.getHandler() instanceof GlueJobHandler
                        && ((GlueJobHandler) jobThread.getHandler()).getGlueUpdatetime()==triggerParam.getGlueUpdatetime() )) {
                // change handler or gluesource updated, need kill old thread
                removeOldReason = "change job source or glue type, and terminate the old job thread.";

                jobThread = null;
                jobHandler = null;
            }

            // valid handler
            if (jobHandler == null) {
                try {
                    IJobHandler originJobHandler = GlueFactory.getInstance().loadNewInstance(triggerParam.getGlueSource());
                    jobHandler = new GlueJobHandler(originJobHandler, triggerParam.getGlueUpdatetime());
                } catch (Exception e) {
                    logger.error(e.getMessage(), e);
                    return new ReturnT<String>(ReturnT.FAIL_CODE, e.getMessage());
                }
            }
        } else if (glueTypeEnum!=null && glueTypeEnum.isScript()) {

            // valid old jobThread
            if (jobThread != null &&
                    !(jobThread.getHandler() instanceof ScriptJobHandler
                            && ((ScriptJobHandler) jobThread.getHandler()).getGlueUpdatetime()==triggerParam.getGlueUpdatetime() )) {
                // change script or gluesource updated, need kill old thread
                removeOldReason = "change job source or glue type, and terminate the old job thread.";

                jobThread = null;
                jobHandler = null;
            }

            // valid handler
            if (jobHandler == null) {
                jobHandler = new ScriptJobHandler(triggerParam.getJobId(), triggerParam.getGlueUpdatetime(), triggerParam.getGlueSource(), GlueTypeEnum.match(triggerParam.getGlueType()));
            }
        } else {
            return new ReturnT<String>(ReturnT.FAIL_CODE, "glueType[" + triggerParam.getGlueType() + "] is not valid.");
        }

        // executor block strategy
        // check which block strategy the trigger requests
        if (jobThread != null) {
            ExecutorBlockStrategyEnum blockStrategy = ExecutorBlockStrategyEnum.match(triggerParam.getExecutorBlockStrategy(), null);
            // DISCARD_LATER: reject the new trigger while the job thread is busy
            if (ExecutorBlockStrategyEnum.DISCARD_LATER == blockStrategy) {
                // discard when running
                if (jobThread.isRunningOrHasQueue()) {
                    return new ReturnT<String>(ReturnT.FAIL_CODE, "block strategy effect:"+ExecutorBlockStrategyEnum.DISCARD_LATER.getTitle());
                }
            // COVER_EARLY: replace the busy job thread with a new one that runs this trigger
            } else if (ExecutorBlockStrategyEnum.COVER_EARLY == blockStrategy) {
                // kill running jobThread
                if (jobThread.isRunningOrHasQueue()) {
                    removeOldReason = "block strategy effect:" + ExecutorBlockStrategyEnum.COVER_EARLY.getTitle();
                    // drop the reference so that a fresh JobThread is registered below
                    jobThread = null;
                }
            } else {
                // just queue trigger
            }
        }

        // replace thread (new or exists invalid)
        // register a JobThread for this jobId
        if (jobThread == null) {
            jobThread = XxlJobExecutor.registJobThread(triggerParam.getJobId(), jobHandler, removeOldReason);
        }

        // push data to queue
        // enqueue the trigger; the JobThread consumes its queue in order
        ReturnT<String> pushResult = jobThread.pushTriggerQueue(triggerParam);
        return pushResult;
    }

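What this method mainly does:

  1. Looks up, by jobId, whether a JobThread with a jobHandler already exists. If it does, the old JobHandler is compared with the one described by the incoming trigger; if they differ, the old one is invalidated. This covers the case where a JobHandler (for example GLUE source code) is changed at runtime.
  2. Determines the executor's block strategy.
  3. If the block strategy is serial execution, the trigger is appended to the blocking queue inside the JobThread for that jobId. This shows that one JobThread serves all triggers of the same jobId and uses a blocking queue to run them serially.
  4. If the block strategy is DISCARD_LATER, it checks whether the JobThread is running a task or still has queued triggers. If so, the new trigger is rejected and a failure result is returned immediately.
  5. If the block strategy is COVER_EARLY and the JobThread is busy, a new JobThread is registered to run the current trigger, and removeOldReason records why the old thread is replaced.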
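
The three block strategies listed above can be condensed into one decision function. The following is only a minimal sketch, not xxl-job's own code: `BlockStrategySketch`, `BlockStrategy`, `Decision`, and `decide` are hypothetical names, and the `busy` flag stands in for the result of `jobThread.isRunningOrHasQueue()`.

```java
public class BlockStrategySketch {

    /** Hypothetical stand-in for ExecutorBlockStrategyEnum. */
    enum BlockStrategy { SERIAL_EXECUTION, DISCARD_LATER, COVER_EARLY }

    /** Hypothetical outcome of handling one incoming trigger. */
    enum Decision { ENQUEUE, REJECT, REPLACE_THREAD_THEN_ENQUEUE }

    /**
     * @param strategy block strategy carried by the trigger
     * @param busy     true if the job's thread is running a task or has queued triggers
     */
    static Decision decide(BlockStrategy strategy, boolean busy) {
        if (!busy) {
            // An idle thread accepts the trigger under every strategy.
            return Decision.ENQUEUE;
        }
        switch (strategy) {
            case DISCARD_LATER:
                // Keep the work already in progress, refuse the newcomer.
                return Decision.REJECT;
            case COVER_EARLY:
                // Replace the busy thread, then hand the trigger to the new one.
                return Decision.REPLACE_THREAD_THEN_ENQUEUE;
            default:
                // SERIAL_EXECUTION: wait in the queue behind earlier triggers.
                return Decision.ENQUEUE;
        }
    }

    public static void main(String[] args) {
        for (BlockStrategy s : BlockStrategy.values()) {
            System.out.println(s + ", busy=true  -> " + decide(s, true));
            System.out.println(s + ", busy=false -> " + decide(s, false));
        }
    }
}
```

Note that the strategy only matters while the thread is busy; an idle JobThread simply takes the trigger in all three cases.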

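So how is the blocking queue consumed?
The queued triggers are executed through the jobHandler that xxl-job defines, so next let's look at how the JobHandler actually runs a task.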

8. How is a custom JobHandler executed?

public void run() {

    	// init
    	try {
			// run the handler's init() once, when the job thread starts
			handler.init();
		} catch (Throwable e) {
    		logger.error(e.getMessage(), e);
		}

		// execute
		while(!toStop){
			running = false;
			idleTimes++;

            TriggerParam triggerParam = null;
            try {
				// to check the toStop signal we must loop, so we cannot use queue.take(); use poll(timeout) instead
				// wait up to 3 seconds for the next trigger
				triggerParam = triggerQueue.poll(3L, TimeUnit.SECONDS);
				if (triggerParam!=null) {
					running = true;
					idleTimes = 0;
					triggerLogIdSet.remove(triggerParam.getLogId());

					// log filename, like "logPath/yyyy-MM-dd/9999.log"
					String logFileName = XxlJobFileAppender.makeLogFileName(new Date(triggerParam.getLogDateTime()), triggerParam.getLogId());
					XxlJobContext xxlJobContext = new XxlJobContext(
							triggerParam.getJobId(),
							triggerParam.getExecutorParams(),
							logFileName,
							triggerParam.getBroadcastIndex(),
							triggerParam.getBroadcastTotal());
					
					// publish the XxlJobContext for this execution
					XxlJobContext.setXxlJobContext(xxlJobContext);

					// execute
					XxlJobHelper.log("<br>----------- xxl-job job execute start -----------<br>----------- Param:" + xxlJobContext.getJobParam());

					if (triggerParam.getExecutorTimeout() > 0) {
						// limit timeout
						Thread futureThread = null;
						try {
							FutureTask<Boolean> futureTask = new FutureTask<Boolean>(new Callable<Boolean>() {
								@Override
								public Boolean call() throws Exception {

									// init job context
									XxlJobContext.setXxlJobContext(xxlJobContext);

									handler.execute();
									return true;
								}
							});
							futureThread = new Thread(futureTask);
							futureThread.start();
							Boolean tempResult = futureTask.get(triggerParam.getExecutorTimeout(), TimeUnit.SECONDS);
						} catch (TimeoutException e) {

							XxlJobHelper.log("<br>----------- xxl-job job execute timeout");
							XxlJobHelper.log(e);

							// handle result
							XxlJobHelper.handleTimeout("job execute timeout ");
						} finally {
							// interrupt the thread that runs the user's handler
							futureThread.interrupt();
						}
					} else {
						// just execute
						handler.execute();
					}

					// valid execute handle data
					if (XxlJobContext.getXxlJobContext().getHandleCode() <= 0) {
						XxlJobHelper.handleFail("job handle result lost.");
					} else {
						String tempHandleMsg = XxlJobContext.getXxlJobContext().getHandleMsg();
						tempHandleMsg = (tempHandleMsg!=null&&tempHandleMsg.length()>50000)
								?tempHandleMsg.substring(0, 50000).concat("...")
								:tempHandleMsg;
						XxlJobContext.getXxlJobContext().setHandleMsg(tempHandleMsg);
					}
					XxlJobHelper.log("<br>----------- xxl-job job execute end(finish) -----------<br>----------- Result: handleCode="
							+ XxlJobContext.getXxlJobContext().getHandleCode()
							+ ", handleMsg = "
							+ XxlJobContext.getXxlJobContext().getHandleMsg()
					);

				} else {
					// after more than 30 consecutive empty polls (roughly 90 seconds idle), remove this job thread
					if (idleTimes > 30) {
						if(triggerQueue.size() == 0) {	// avoid concurrent trigger causes jobId-lost
							XxlJobExecutor.removeJobThread(jobId, "excutor idel times over limit.");
						}
					}
				}
			} catch (Throwable e) {
				if (toStop) {
					XxlJobHelper.log("<br>----------- JobThread toStop, stopReason:" + stopReason);
				}

				// handle result
				StringWriter stringWriter = new StringWriter();
				e.printStackTrace(new PrintWriter(stringWriter));
				String errorMsg = stringWriter.toString();

				XxlJobHelper.handleFail(errorMsg);

				XxlJobHelper.log("<br>----------- JobThread Exception:" + errorMsg + "<br>----------- xxl-job job execute end(error) -----------");
			} finally {
                if(triggerParam != null) {
                    // callback handler info
                    if (!toStop) {
                        // push the execution result to the callback queue
                        TriggerCallbackThread.pushCallBack(new HandleCallbackParam(
                        		triggerParam.getLogId(),
								triggerParam.getLogDateTime(),
								XxlJobContext.getXxlJobContext().getHandleCode(),
								XxlJobContext.getXxlJobContext().getHandleMsg() )
						);
                    } else {
                        // is killed
                        TriggerCallbackThread.pushCallBack(new HandleCallbackParam(
                        		triggerParam.getLogId(),
								triggerParam.getLogDateTime(),
								XxlJobContext.HANDLE_CODE_FAIL,
								stopReason + " [job running, killed]" )
						);
                    }
                }
            }
        }

		// callback trigger request in queue
		while(triggerQueue !=null && triggerQueue.size()>0){
			TriggerParam triggerParam = triggerQueue.poll();
			if (triggerParam!=null) {
				// is killed
				TriggerCallbackThread.pushCallBack(new HandleCallbackParam(
						triggerParam.getLogId(),
						triggerParam.getLogDateTime(),
						XxlJobContext.HANDLE_CODE_FAIL,
						stopReason + " [job not executed, in the job queue, killed.]")
				);
			}
		}

		// destroy
		try {
			handler.destroy();
		} catch (Throwable e) {
			logger.error(e.getMessage(), e);
		}

		logger.info(">>>>>>>>>>> xxl-job JobThread stoped, hashCode:{}", Thread.currentThread());
	}

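What this method mainly does:

  1. Polls the blocking queue for a trigger, waiting up to 3 seconds per poll.
  2. Creates the XxlJobContext, whose status defaults to success. The reason is that if the JobHandler neither sets a result code nor throws an exception, the admin (scheduling center) should treat the run as successful.
  3. After taking a trigger, checks whether it carries a timeout. If it does, the handler runs in a FutureTask on a separate thread and the job thread waits with futureTask.get(timeout, TimeUnit.SECONDS). If no timeout is set, the user-defined task runs synchronously on the job thread.
  4. When the task finishes, the executor pushes the result onto a callback queue. The purpose of this queue is to pass results back to the scheduling side asynchronously.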
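
The timeout handling in step 3 can be tried in isolation. The sketch below assumes nothing from xxl-job: `TimeoutSketch` and `runWithTimeout` are hypothetical names, the result is a plain string instead of a handle code, and the timeout is given in milliseconds (JobThread uses seconds) so the demo finishes quickly.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutSketch {

    /** Runs the task on a helper thread and reports "success", "timeout", or "fail". */
    static String runWithTimeout(Callable<Boolean> task, long timeoutMillis) {
        FutureTask<Boolean> futureTask = new FutureTask<>(task);
        Thread worker = new Thread(futureTask);
        worker.start();
        try {
            // Wait for the task, but no longer than the configured timeout.
            futureTask.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return "success";
        } catch (TimeoutException e) {
            return "timeout";
        } catch (ExecutionException e) {
            // The task itself threw an exception.
            return "fail";
        } catch (InterruptedException e) {
            // The waiting thread was interrupted; restore the flag for callers.
            Thread.currentThread().interrupt();
            return "fail";
        } finally {
            // Same idea as JobThread's finally block: ask the worker to stop.
            worker.interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(runWithTimeout(() -> true, 2000));
        System.out.println(runWithTimeout(() -> {
            Thread.sleep(5000); // simulated slow job; interrupted once the timeout fires
            return true;
        }, 100));
    }
}
```

Interrupting the worker is only a request: a handler that never checks its interrupt status and never blocks will keep running after the timeout has been reported.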

9. How does the executor report the execution result back to the caller?

First, the execution result is put into the callback queue:

if(triggerParam != null) {
    // callback handler info
    if (!toStop) {
        // push the execution result to the callback queue
        TriggerCallbackThread.pushCallBack(new HandleCallbackParam(
                triggerParam.getLogId(),
                triggerParam.getLogDateTime(),
                XxlJobContext.getXxlJobContext().getHandleCode(),
                XxlJobContext.getXxlJobContext().getHandleMsg() )
        );
    } else {
        // is killed
        TriggerCallbackThread.pushCallBack(new HandleCallbackParam(
                triggerParam.getLogId(),
                triggerParam.getLogDateTime(),
                XxlJobContext.HANDLE_CODE_FAIL,
                stopReason + " [job running, killed]" )
        );
    }
}

TriggerCallbackThread then consumes that queue on a dedicated thread:

triggerCallbackThread = new Thread(new Runnable() {

@Override
public void run() {

    // normal callback
    while(!toStop){
        try {
            // block until an execution result is available
            HandleCallbackParam callback = getInstance().callBackQueue.take();
            if (callback != null) {

                // callback list param
                List<HandleCallbackParam> callbackParamList = new ArrayList<HandleCallbackParam>();

                // drain everything else that is already queued into the same batch
                int drainToNum = getInstance().callBackQueue.drainTo(callbackParamList);
                callbackParamList.add(callback);

                // callback, will retry if error
                if (callbackParamList!=null && callbackParamList.size()>0) {
                    doCallback(callbackParamList);
                }
            }
        } catch (Exception e) {
            if (!toStop) {
                logger.error(e.getMessage(), e);
            }
        }
    }

    // last callback
    try {
        List<HandleCallbackParam> callbackParamList = new ArrayList<HandleCallbackParam>();
        int drainToNum = getInstance().callBackQueue.drainTo(callbackParamList);
        if (callbackParamList!=null && callbackParamList.size()>0) {
            // call back to the admin side
            doCallback(callbackParamList);
        }
    } catch (Exception e) {
        if (!toStop) {
            logger.error(e.getMessage(), e);
        }
    }
    logger.info(">>>>>>>>>>> xxl-job, executor callback thread destroy.");

}
});

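1. When a timeout is configured, the JobHandler runs on a separate thread and the job thread calls futureTask.get(triggerParam.getExecutorTimeout(), TimeUnit.SECONDS) to wait for it to finish.
2. After the task finishes, TriggerCallbackThread.pushCallBack() puts the execution result into callBackQueue.
3. A dedicated callback thread (a blocking consumer, not a timer-driven task) waits on callBackQueue.take(), drains the remaining results into one batch, and passes them back to the admin side through doCallback().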
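
The take-then-drain batching used by the callback thread can be sketched on its own. This is an illustrative sketch only: `CallbackBatchSketch` and `takeBatch` are hypothetical names, plain strings stand in for HandleCallbackParam, and a LinkedBlockingQueue is assumed as the queue type. Unlike the excerpt above, the sketch adds the first element before draining so that the batch keeps queue order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

public class CallbackBatchSketch {

    /**
     * Blocks until at least one result is queued, then returns it together with
     * everything else that is already waiting. Returns an empty list if interrupted.
     */
    static List<String> takeBatch(LinkedBlockingQueue<String> queue) {
        List<String> batch = new ArrayList<>();
        try {
            // take() parks the thread while the queue is empty, so there is no busy polling.
            batch.add(queue.take());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return batch;
        }
        // drainTo() never blocks: it moves whatever is currently queued in one call.
        queue.drainTo(batch);
        return batch;
    }

    public static void main(String[] args) {
        LinkedBlockingQueue<String> callBackQueue = new LinkedBlockingQueue<>();
        callBackQueue.add("log-1");
        callBackQueue.add("log-2");
        callBackQueue.add("log-3");

        // One batch means one call back to the admin instead of three.
        System.out.println(takeBatch(callBackQueue)); // prints [log-1, log-2, log-3]
    }
}
```

Batching this way keeps latency low when results arrive one at a time, and reduces the number of calls to the admin when many jobs finish at once.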
