WARN Messages in Quartz Cluster Scheduling


Yarcl


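This is an original article by the author and may not be reproduced without the author's permission. All resources were collected from the web; if anything infringes, please contact me and I will remove it. Licensed under CC 4.0 BY-SA; when reposting, include the original source link and this notice.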

Article link: https://blog.csdn.net/u012225679/article/details/119387932


1. The warning messages are as follows:

This scheduler instance xxxx is still active but was recovered by another instance in the cluster. This may cause inconsistent behavior.
ClusterManager: detected 1 failed or restarted instances.

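Analysis:
1. The log shows these messages as coming from LocalDataSourceJobStore, but that class's source contains no such log statements. Searching up through its parent classes and interfaces leads to JobStoreSupport. The key source code is as follows: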

protected void clusterRecover(Connection conn, List<SchedulerStateRecord> failedInstances)
        throws JobPersistenceException {

        if (failedInstances.size() > 0) {

            long recoverIds = System.currentTimeMillis();

            logWarnIfNonZero(failedInstances.size(),
                    "ClusterManager: detected " + failedInstances.size()
                            + " failed or restarted instances.");
            // the remaining N lines of code are omitted
            // ....
        }
    }

protected List<SchedulerStateRecord> findFailedInstances(Connection conn)
        throws JobPersistenceException {
        try {
            List<SchedulerStateRecord> failedInstances = new LinkedList<SchedulerStateRecord>();
            boolean foundThisScheduler = false;
            long timeNow = System.currentTimeMillis();
            
            List<SchedulerStateRecord> states = getDelegate().selectSchedulerStateRecords(conn, null);

            for(SchedulerStateRecord rec: states) {
        
                // find own record...
                if (rec.getSchedulerInstanceId().equals(getInstanceId())) {
                    foundThisScheduler = true;
                    if (firstCheckIn) {
                        failedInstances.add(rec);
                    }
                } else {
                    // find failed instances...
                    if (calcFailedIfAfter(rec) < timeNow) {
                        failedInstances.add(rec);
                    }
                }
            }
            
            // The first time through, also check for orphaned fired triggers.
            if (firstCheckIn) {
                failedInstances.addAll(findOrphanedFailedInstances(conn, states));
            }
            
            // If this is not the first check-in and we did not find our own instance record
            // (i.e. no row for the current machine, and not the first check):
            if ((!foundThisScheduler) && (!firstCheckIn)) {
                // FUTURE_TODO: revisit when handle self-failed-out impl'ed (see FUTURE_TODO in clusterCheckIn() below)
                getLog().warn(
                    "This scheduler instance (" + getInstanceId() + ") is still " + 
                    "active but was recovered by another instance in the cluster.  " +
                    "This may cause inconsistent behavior.");
            }
            
            return failedInstances;
        } catch (Exception e) {
            lastCheckin = System.currentTimeMillis();
            throw new JobPersistenceException("Failure identifying failed instances when checking-in: "
                    + e.getMessage(), e);
        }
    }

Note the calcFailedIfAfter method that is called just below the // find failed instances... comment in the code:

protected long calcFailedIfAfter(SchedulerStateRecord rec) {
   return rec.getCheckinTimestamp() +
        Math.max(rec.getCheckinInterval(), 
                (System.currentTimeMillis() - lastCheckin)) +
        7500L;
}
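To make the threshold concrete, here is a minimal sketch of the same arithmetic with every input passed in explicitly. It is not Quartz code: the class name, the `looksFailed` helper, the 20-second interval and the sample timestamps are all made up for illustration, and it assumes that each node writes its check-in timestamp from its own system clock.

```java
// Simplified stand-in for the formula in calcFailedIfAfter; not Quartz code.
final class FailedIfAfterSketch {

    // Tolerance constant taken from the excerpt above.
    static final long TOLERANCE_MS = 7500L;

    // All values are epoch milliseconds as seen by the node doing the check.
    static long calcFailedIfAfter(long checkinTimestamp, long checkinInterval,
                                  long now, long lastCheckin) {
        return checkinTimestamp
                + Math.max(checkinInterval, now - lastCheckin)
                + TOLERANCE_MS;
    }

    // Mirrors the comparison "calcFailedIfAfter(rec) < timeNow".
    static boolean looksFailed(long checkinTimestamp, long checkinInterval,
                               long now, long lastCheckin) {
        return calcFailedIfAfter(checkinTimestamp, checkinInterval, now, lastCheckin) < now;
    }

    public static void main(String[] args) {
        long interval = 20_000L;            // hypothetical check-in interval
        long now = 1_000_000L;              // clock of the checking node
        long lastCheckin = now - interval;  // this node checked in one interval ago

        // Peer checked in 5 s ago and the clocks agree: not treated as failed.
        System.out.println(looksFailed(now - 5_000L, interval, now, lastCheckin));

        // Same peer, but its clock runs 30 s behind, so its timestamp looks 35 s old.
        System.out.println(looksFailed(now - 35_000L, interval, now, lastCheckin));
    }
}
```

With these sample numbers the first call prints false and the second prints true: a healthy peer is classified as failed purely because its recorded timestamp is older than the interval plus the 7.5-second tolerance.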

Because the current machine's instance record was not found in the database, and this was not the first check-in, the following log is printed:

This scheduler instance xxxx is still active but was recovered by another instance in the cluster. This may cause inconsistent behavior.
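The condition behind this message can be reduced to a few lines. The sketch below is only an illustration of the `(!foundThisScheduler) && (!firstCheckIn)` test from findFailedInstances: the class name, the `shouldWarn` helper and the node ids are hypothetical, and the scheduler state table is replaced by a plain list of instance ids.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative reduction of the warning condition; not Quartz code.
final class SelfRecoveredSketch {

    // storedInstanceIds stands in for the instance ids read from the scheduler state records.
    static boolean shouldWarn(List<String> storedInstanceIds, String ownInstanceId,
                              boolean firstCheckIn) {
        boolean foundThisScheduler = storedInstanceIds.contains(ownInstanceId);
        return !foundThisScheduler && !firstCheckIn;
    }

    public static void main(String[] args) {
        // Own record is still present: no warning.
        System.out.println(shouldWarn(Arrays.asList("node-a", "node-b"), "node-a", false));

        // Own record is missing on a later check-in: warning.
        System.out.println(shouldWarn(Arrays.asList("node-b"), "node-a", false));

        // Own record is missing on the very first check-in: no warning.
        System.out.println(shouldWarn(Arrays.asList("node-b"), "node-a", true));
    }
}
```

The second case is the one seen here: the node is running and checking in, yet its own record is gone. As the message itself suggests, the likely reason is that another node judged it failed and recovered it.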

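At the same time, another node's check-in is judged to have timed out. An instance is added to failedInstances only when its last check-in timestamp, plus the larger of its check-in interval and the time since this node's own last check-in, plus a 7.5-second tolerance, is still earlier than the current time. A large clock difference between the servers is enough to push a node past that threshold. Because a timed-out node exists, clusterRecover is invoked, and it prints the following log: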

ClusterManager: detected 1 failed or restarted instances.

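So this problem is mainly caused by the system clocks of the servers being out of sync; synchronizing the clocks of the servers in the cluster resolves it. I am still working through the source code, so if anything here is wrong, corrections are very welcome.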
