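In Dubbo, to improve fault tolerance, many places need to schedule tasks that run only once. One example is the RPC timeout mechanism: the consumer has to determine whether each RPC call has timed out and, if it has, return a timeout result to the application layer. In Dubbo's earliest implementation, every pending result (DefaultFuture) was put into one collection, and a scheduled task scanned all of the futures at a fixed interval, checking one by one whether each had timed out.
This approach is simple to implement, but it performs many pointless traversals. For example, if an RPC call has a 10-second timeout and the timeout-check task runs every 2 seconds, that call is polled about 4 times with nothing to do before it can actually expire.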
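To make the cost concrete, here is a minimal simulation of the scan-everything approach. It is only a sketch: `NaiveTimeoutScan`, `PendingCall`, and `countWastedChecks` are hypothetical names invented for illustration, not Dubbo classes, and time is simulated rather than driven by a real scheduled task.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of the scan-everything approach; not Dubbo's code. */
final class NaiveTimeoutScan {

    /** A pending call, reduced to its absolute deadline in milliseconds. */
    static final class PendingCall {
        final long deadlineMillis;

        PendingCall(long deadlineMillis) {
            this.deadlineMillis = deadlineMillis;
        }
    }

    /**
     * Simulates a scanner that wakes up every scanIntervalMillis and checks
     * every pending call. Returns how many checks found nothing to expire.
     */
    static int countWastedChecks(List<PendingCall> calls, long scanIntervalMillis) {
        if (scanIntervalMillis <= 0) {
            throw new IllegalArgumentException("scanIntervalMillis must be positive");
        }
        List<PendingCall> pending = new ArrayList<>(calls);
        int wasted = 0;
        for (long now = scanIntervalMillis; !pending.isEmpty(); now += scanIntervalMillis) {
            for (Iterator<PendingCall> it = pending.iterator(); it.hasNext(); ) {
                PendingCall call = it.next();
                if (call.deadlineMillis <= now) {
                    it.remove(); // timed out: the caller would be notified here
                } else {
                    wasted++;    // the call was inspected, but there was nothing to do
                }
            }
        }
        return wasted;
    }

    public static void main(String[] args) {
        List<PendingCall> calls = Collections.singletonList(new PendingCall(10_000));
        // One call with a 10 s timeout, scanned every 2 s.
        System.out.println(countWastedChecks(calls, 2_000));
    }
}
```

With many calls in flight, the wasted work grows with the number of pending calls multiplied by the number of scans, which is the overhead the timing wheel is meant to avoid.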
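To solve problems of this kind, Dubbo borrowed from Netty and introduced the timing wheel algorithm to schedule tasks that only need to run once. For how the algorithm works, see this article: https://blog.csdn.net/mindfloating/article/details/8033340
What follows is mainly an analysis of the timing wheel implementation in Dubbo/Netty. It is built mainly from the following types:
The Timer interface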
/**
* Schedules {@link TimerTask}s for one-time future execution in a background
* thread.
*/
public interface Timer {
/**
* Schedules the specified {@link TimerTask} for one-time execution after
* the specified delay.
*
* @return a handle which is associated with the specified task
* @throws IllegalStateException if this timer has been {@linkplain #stop() stopped} already
* @throws RejectedExecutionException if the pending timeouts are too many and creating new timeout
* can cause instability in the system.
*/
Timeout newTimeout(TimerTask task, long delay, TimeUnit unit);
/**
* Releases all resources acquired by this {@link Timer} and cancels all
* tasks which were scheduled but not executed yet.
*
* @return the handles associated with the tasks which were canceled by
* this method
*/
Set<Timeout> stop();
/**
* the timer is stop
*
* @return true for stop
*/
boolean isStop();
}
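This is the core scheduling interface. As the comment says, it is mainly used to run one-time scheduled tasks in the background. It has an isStop method to check whether the scheduler has stopped, and a stop method to stop it. The newTimeout method hands a task to the scheduler: the first parameter, of type TimerTask, is the task to run; the second, a long, is the delay before the task runs, relative to now; the third is the time unit of the second parameter. Next, let's look at its TimerTask parameter.
The TimerTask interface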
/**
* A task which is executed after the delay specified with
* {@link Timer#newTimeout(TimerTask, long, TimeUnit)}.
*/
public interface TimerTask {
/**
* Executed after the delay specified with
* {@link Timer#newTimeout(TimerTask, long, TimeUnit)}.
*
* @param timeout a handle which is associated with this task
*/
void run(Timeout timeout) throws Exception;
}
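This interface represents the task the scheduler runs. It has a single method, run, whose parameter is of type Timeout. Note that newTimeout on the Timer interface above returns a Timeout, the same type as this parameter, so a bold guess is that the Timeout passed in here is the very object returned by newTimeout. (To be verified later.)
The Timeout interface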
/**
* A handle associated with a {@link TimerTask} that is returned by a
* {@link Timer}.
*/
public interface Timeout {
/**
* Returns the {@link Timer} that created this handle.
*/
Timer timer();
/**
* Returns the {@link TimerTask} which is associated with this handle.
*/
TimerTask task();
/**
* Returns {@code true} if and only if the {@link TimerTask} associated
* with this handle has been expired.
*/
boolean isExpired();
/**
* Returns {@code true} if and only if the {@link TimerTask} associated
* with this handle has been cancelled.
*/
boolean isCancelled();
/**
* Attempts to cancel the {@link TimerTask} associated with this handle.
* If the task has been executed or cancelled already, it will return with
* no side effect.
*
* @return True if the cancellation completed successfully, otherwise false
*/
boolean cancel();
}
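Timeout represents the handling of one scheduled task. timer returns the Timer that created this Timeout; task returns the task this Timeout handles; isExpired reports whether the task has passed its preset time; isCancelled reports whether the task has been cancelled; cancel cancels the task.
Together these interfaces logically form a task scheduling system. Their parameters and return values show a neat design: typically one class creates an object of another class, and the created object can in turn return its creator through a method. This pattern also appears often in the Spring framework. It is an effective approach when designing a complex system and is worth learning from.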
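The mutual references between the three interfaces can be sketched with a toy implementation. Everything below is an assumption made for illustration: `TimerHandleSketch`, `Task`, `Handle`, and `SimpleTimer` are hypothetical stand-ins for TimerTask, Timeout, and Timer, and the timer is backed by a plain `ScheduledExecutorService` rather than a timing wheel. It shows one natural way to wire things so that the handle returned by newTimeout is the same object later passed to run; it does not by itself prove that Dubbo behaves this way.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical miniature of the Timer / TimerTask / Timeout trio. */
final class TimerHandleSketch {

    /** Stand-in for TimerTask. */
    interface Task {
        void run(Handle handle) throws Exception;
    }

    /** Stand-in for Timeout: knows both its creator and its task. */
    interface Handle {
        SimpleTimer timer();

        Task task();
    }

    /** Stand-in for Timer, backed by a single scheduler thread. */
    static final class SimpleTimer {
        private final ScheduledExecutorService executor =
                Executors.newSingleThreadScheduledExecutor();

        Handle newTimeout(Task task, long delay, TimeUnit unit) {
            SimpleTimer self = this;
            Handle handle = new Handle() {
                @Override
                public SimpleTimer timer() {
                    return self;
                }

                @Override
                public Task task() {
                    return task;
                }
            };
            // The same handle is returned to the caller and passed to the task.
            executor.schedule(() -> {
                try {
                    task.run(handle);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }, delay, unit);
            return handle;
        }

        void stop() {
            executor.shutdownNow();
        }
    }

    /** Returns true if the task saw the handle that newTimeout returned. */
    static boolean sameHandleIsPassedToTask() {
        SimpleTimer timer = new SimpleTimer();
        AtomicReference<Handle> seen = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        Handle returned = timer.newTimeout(h -> {
            seen.set(h);
            done.countDown();
        }, 50, TimeUnit.MILLISECONDS);
        try {
            boolean ran = done.await(5, TimeUnit.SECONDS);
            return ran && seen.get() == returned && returned.timer() == timer;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            timer.stop();
        }
    }

    public static void main(String[] args) {
        System.out.println(sameHandleIsPassedToTask());
    }
}
```

Passing the handle into the task lets the task inspect or cancel its own scheduling without holding a separate reference to the timer.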
Now for the focus of this article: HashedWheelTimer, the timing wheel scheduler implementation. First, the class header:
/**
* A {@link Timer} optimized for approximated I/O timeout scheduling.
*
* <h3>Tick Duration</h3>
* <p>
* As described with 'approximated', this timer does not execute the scheduled
* {@link TimerTask} on time. {@link HashedWheelTimer}, on every tick, will
* check if there are any {@link TimerTask}s behind the schedule and execute
* them.
* <p>
* You can increase or decrease the accuracy of the execution timing by
* specifying smaller or larger tick duration in the constructor. In most
* network applications, I/O timeout does not need to be accurate. Therefore,
* the default tick duration is 100 milliseconds and you will not need to try
* different configurations in most cases.
*
* <h3>Ticks per Wheel (Wheel Size)</h3>
* <p>
* {@link HashedWheelTimer} maintains a data structure called 'wheel'.
* To put simply, a wheel is a hash table of {@link TimerTask}s whose hash
* function is 'dead line of the task'. The default number of ticks per wheel
* (i.e. the size of the wheel) is 512. You could specify a larger value
* if you are going to schedule a lot of timeouts.
*
* <h3>Do not create many instances.</h3>
* <p>
* {@link HashedWheelTimer} creates a new thread whenever it is instantiated and
* started. Therefore, you should make sure to create only one instance and
* share it across your application. One of the common mistakes, that makes
* your application unresponsive, is to create a new instance for every connection.
*
* <h3>Implementation Details</h3>
* <p>
* {@link HashedWheelTimer} is based on
* <a href="http://cseweb.ucsd.edu/users/varghese/">George Varghese</a> and
* Tony Lauck's paper,
* <a href="http://cseweb.ucsd.edu/users/varghese/PAPERS/twheel.ps.Z">'Hashed
* and Hierarchical Timing Wheels: data structures to efficiently implement a
* timer facility'</a>. More comprehensive slides are located
* <a href="http://www.cse.wustl.edu/~cdgill/courses/cs6874/TimingWheels.ppt">here</a>.
*/
public class HashedWheelTimer implements Timer {
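As the comment says, this class does not provide precisely timed execution; you cannot ask for a task to run at an exact hour, minute, and second. Instead, on every tick (one "slot" of the wheel) it checks whether any TimerTask has fallen behind the current time and, if so, runs it. (Readers who already know the timing wheel algorithm should find this easy to follow.) We can raise or lower timing accuracy by choosing a smaller or larger tick duration (the length of one slot). For example, a 1-second slot and a 5-second slot differ in accuracy by a factor of 5. The comment goes on to say that in most network applications I/O timeouts need not be exact: if I ask for a 5-second timeout, the framework does not have to report it at precisely the 5-second mark, and a little later is fine. The default tick duration is therefore 100 milliseconds, and in most scenarios there is no need to change it.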
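The effect of the tick duration on accuracy can be worked through with a little arithmetic. The sketch below assumes an idealized model in which a task runs at the first tick boundary at or after its requested delay; the real worker loop may differ in detail, and `TickAccuracySketch` and its methods are hypothetical names.

```java
/** Idealized model of tick granularity; not the actual worker logic. */
final class TickAccuracySketch {

    /** First tick boundary at or after the requested delay. */
    static long firingTimeMillis(long delayMillis, long tickMillis) {
        long ticks = (delayMillis + tickMillis - 1) / tickMillis; // ceiling division
        return ticks * tickMillis;
    }

    /** How late the task runs compared with the requested delay. */
    static long latenessMillis(long delayMillis, long tickMillis) {
        return firingTimeMillis(delayMillis, tickMillis) - delayMillis;
    }

    public static void main(String[] args) {
        long delay = 5_250; // requested timeout in milliseconds
        System.out.println("tick=100ms  lateness=" + latenessMillis(delay, 100));
        System.out.println("tick=1000ms lateness=" + latenessMillis(delay, 1_000));
    }
}
```

Under this model the lateness is always smaller than one tick, which is why a smaller tick duration means tighter timing at the price of more frequent wake-ups.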
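The class maintains a data structure called a "wheel", which is the timing wheel itself. Put simply, a wheel is a hash table whose hash function is the task's deadline: the hash function places each task into the slot it belongs to, so that as time advances and we reach a given slot, the tasks in that slot are the ones that have come due. This avoids checking every task on every tick. The default number of slots is 512; if we need to schedule a very large number of tasks, we can specify a larger value.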
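The "hash function is the deadline" idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the class's actual code: it assumes the wheel size is rounded up to a power of two so that a bit mask can replace the modulo operation, and that a task whose delay spans more than one full revolution records how many extra rounds it must wait. `WheelIndexSketch` and its method names are hypothetical.

```java
/** Hypothetical sketch of mapping a delay to a wheel slot. */
final class WheelIndexSketch {

    /** Smallest power of two that is >= n, for n between 1 and 2^30. */
    static int roundUpToPowerOfTwo(int n) {
        int size = 1;
        while (size < n) {
            size <<= 1;
        }
        return size;
    }

    /** Slot index for a task due delayMillis after the wheel started. */
    static int slotIndex(long delayMillis, long tickMillis, int wheelSize) {
        long ticks = delayMillis / tickMillis; // number of ticks until the deadline
        int mask = wheelSize - 1;              // valid only when wheelSize is a power of two
        return (int) (ticks & mask);           // same result as ticks % wheelSize
    }

    /** Full revolutions the wheel must complete before the task is due. */
    static long rounds(long delayMillis, long tickMillis, int wheelSize) {
        return (delayMillis / tickMillis) / wheelSize;
    }

    public static void main(String[] args) {
        int wheelSize = roundUpToPowerOfTwo(512);
        long tickMillis = 100;
        // A 10 s timeout and a 60 s timeout on a 512-slot wheel with 100 ms ticks.
        System.out.println(slotIndex(10_000, tickMillis, wheelSize)
                + " / rounds " + rounds(10_000, tickMillis, wheelSize));
        System.out.println(slotIndex(60_000, tickMillis, wheelSize)
                + " / rounds " + rounds(60_000, tickMillis, wheelSize));
    }
}
```

One revolution of a 512-slot wheel with 100 ms ticks covers 51.2 seconds, so in this sketch the 60-second task wraps around and shares the wheel with shorter tasks, distinguished only by its round count.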
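Only one instance of this class should be created per application, because every time it is instantiated and started it creates a new thread. A common mistake is to create an instance for every connection, which makes the application unresponsive because too many threads are started. (The wording presumably comes from the original Netty comment, since the class is mainly used there for connection handling; "connection" can be read as "task" here.)
The rest of the comment points to the paper the implementation is based on, which we will skip. Let's go straight to the code, starting with the fields.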
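One way to catch the too-many-instances mistake is to count instances and log a warning once when a limit is exceeded. The sketch below is an assumption about how such a guard could look, not the class's verified logic; `InstanceLimitSketch` and `WARNINGS_EMITTED` are hypothetical, and the limit of 64 is taken from the `INSTANCE_COUNT_LIMIT` constant in the field listing.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical warn-once guard against creating too many instances. */
final class InstanceLimitSketch {
    private static final AtomicInteger INSTANCE_COUNTER = new AtomicInteger();
    private static final AtomicBoolean WARNED_TOO_MANY_INSTANCES = new AtomicBoolean();
    private static final int INSTANCE_COUNT_LIMIT = 64;

    /** Demonstration only: counts how many warnings were actually printed. */
    static final AtomicInteger WARNINGS_EMITTED = new AtomicInteger();

    InstanceLimitSketch() {
        // Count this instance; warn only the first time the limit is exceeded.
        if (INSTANCE_COUNTER.incrementAndGet() > INSTANCE_COUNT_LIMIT
                && WARNED_TOO_MANY_INSTANCES.compareAndSet(false, true)) {
            WARNINGS_EMITTED.incrementAndGet();
            System.err.println("Too many instances; share one timer across the application.");
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            new InstanceLimitSketch();
        }
        System.out.println("warnings: " + WARNINGS_EMITTED.get());
    }
}
```

The `compareAndSet(false, true)` call succeeds for exactly one thread, so the log is not flooded even if many instances are created concurrently.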
/**
* may be in spi?
*/
public static final String NAME = "hased";
private static final Logger logger = LoggerFactory.getLogger(HashedWheelTimer.class);
// Instance counter: records how many objects of this class have been created
private static final AtomicInteger INSTANCE_COUNTER = new AtomicInteger();
// Used to warn when the number of instances exceeds the limit
private static final AtomicBoolean WARNED_TOO_MANY_INSTANCES = new AtomicBoolean();
// Upper limit on the number of instances
private static final int INSTANCE_COUNT_LIMIT = 64;
// Utility for atomically updating the workerState field
private static final AtomicIntegerFieldUpdater<HashedWheelTimer> WORKER_STATE_UPDATER =
AtomicIntegerFieldUpdater.newUpdater(HashedWheelTimer.class, "workerState");
// The runnable that drives the timing wheel forward
private final Worker worker = new Worker();
// The thread bound to the worker
private final Thread workerThread;
// Worker state: initialized
private static final int WORKER_STATE_INIT = 0;
// Worker state: started
private static final int WORKER_STATE_STARTED = 1;
// Worker state: shut down
private static final int WORKER_STATE_SHUTDOWN = 2;
/**
* 0 - init, 1 - started, 2 - shut down
*/
@SuppressWarnings({"unused", "FieldMayBeFinal"})
private volatile int workerState;
// Duration of one time slot (tick)
private final long tickDuration;
// Array of time slots
private final HashedWheelBucket[] wheel;
// Mask used when computing which time slot a task belongs in
private final int mask;
// Latch used to synchronize threads on startTime being initialized
private final CountDownLatch startTimeInitialized = new CountDownLatch(1);
// Queue that holds the scheduled timeouts
private final Queue<HashedWheelTimeout> timeouts = new LinkedBlockingQueue<>();
// Queue of cancelled timeouts
private final Queue<HashedWheelTimeout> cancelledTimeouts = new LinkedBlockingQueue<>();
// Number of timeouts currently pending
private final AtomicLong pendingTimeouts = new AtomicLong(0);
// Maximum number of pending timeouts
private final long maxPendingTimeouts;
// Start time of the timing wheel
private volatile long startTime;
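The purpose of some of these fields may not be clear yet; they will make sense in light of what follows. Let's start with the class's constructor.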
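The pairing of a `volatile int` state field with an `AtomicIntegerFieldUpdater` in the field list deserves a short illustration. The sketch below shows the general technique only; `WorkerStateSketch`, `tryStart`, and `tryShutdown` are hypothetical names, and the transitions are an assumption about how such a state machine is typically driven, not the class's actual start and stop logic.

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

/** Hypothetical three-state lifecycle guarded by compare-and-set. */
final class WorkerStateSketch {
    static final int INIT = 0;
    static final int STARTED = 1;
    static final int SHUTDOWN = 2;

    // The updater works on a volatile int field looked up by name via reflection.
    private static final AtomicIntegerFieldUpdater<WorkerStateSketch> STATE_UPDATER =
            AtomicIntegerFieldUpdater.newUpdater(WorkerStateSketch.class, "state");

    private volatile int state = INIT;

    /** Moves INIT -> STARTED; only one caller can ever succeed. */
    boolean tryStart() {
        return STATE_UPDATER.compareAndSet(this, INIT, STARTED);
    }

    /** Moves STARTED -> SHUTDOWN; fails if never started or already stopped. */
    boolean tryShutdown() {
        return STATE_UPDATER.compareAndSet(this, STARTED, SHUTDOWN);
    }

    int currentState() {
        return state;
    }

    public static void main(String[] args) {
        WorkerStateSketch worker = new WorkerStateSketch();
        System.out.println(worker.tryStart());    // first start attempt
        System.out.println(worker.tryStart());    // second start attempt
        System.out.println(worker.tryShutdown()); // shutdown after start
    }
}
```

Compared with giving every instance its own `AtomicInteger`, a static field updater keeps the per-instance footprint to a plain `int` while still allowing lock-free state transitions.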
/**
* Creates a new timer.
*
* @param threadFactory a {@link ThreadFactory} that creates a
* background {@link Thread} which is dedicated to
* {@link TimerTask} execution.
* @param tickDuration the duration between tick
* @param unit the time unit of the {@code tickDuration}
* @param ticksPerWheel the size of the wheel
* @param maxPendingTimeouts The maximum number of pending timeouts after which call to
* {@code newTimeout} will result in
* {@link java.util.co