How the Timing Wheel Algorithm Works in Dubbo/Netty

Latest recommended article published on 2024-05-08 22:01:47

dbqb007


Article link: https://blog.csdn.net/dbqb007/article/details/90740839


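To make the system more fault tolerant, Dubbo needs in many places to schedule tasks that run only once. RPC call timeouts are one example: the consumer has to check whether each RPC call has timed out and, if so, return a timeout result to the application layer. Dubbo's earliest implementation put all pending results (DefaultFuture) into a collection and used a scheduled task to scan every future at a fixed interval, checking one by one whether it had timed out.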

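This approach is simple to implement, but it performs many pointless traversals. For example, if an RPC call has a 10-second timeout and the timeout check runs every 2 seconds, roughly 4 of the polls for that call accomplish nothing.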
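The scan-based approach can be sketched roughly as follows. This is an illustrative model only, not Dubbo's actual code: `PendingCall`, `NaiveTimeoutScanner` and their members are hypothetical names, and the current time is passed in explicitly so the behaviour is easy to trace.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a pending RPC result (the role DefaultFuture plays in Dubbo).
class PendingCall {
    final long deadlineMillis;
    boolean timedOut;

    PendingCall(long deadlineMillis) {
        this.deadlineMillis = deadlineMillis;
    }
}

// Naive approach: every scan walks ALL pending calls, whether or not they are due.
class NaiveTimeoutScanner {
    private final List<PendingCall> pending = new ArrayList<>();
    long checksPerformed; // how many individual deadline checks have been made

    void add(PendingCall call) {
        pending.add(call);
    }

    // Meant to be called by a periodic task; nowMillis is the current time.
    void scan(long nowMillis) {
        Iterator<PendingCall> it = pending.iterator();
        while (it.hasNext()) {
            PendingCall call = it.next();
            checksPerformed++;
            if (nowMillis >= call.deadlineMillis) {
                call.timedOut = true;
                it.remove();
            }
        }
    }
}
```

With a 10-second deadline and a scan every 2 seconds, the call is checked at 2 s, 4 s, 6 s and 8 s without effect before the scan at 10 s finally expires it, and that cost is paid for every pending call.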

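To address this kind of scenario, Dubbo borrowed from Netty and introduced the timing wheel algorithm to schedule tasks that only need to run once. For the principle behind the algorithm, see this article: https://blog.csdn.net/mindfloating/article/details/8033340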

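The rest of this article analyzes the timing wheel implementation in Dubbo/Netty. It is built mainly from the following types, each discussed below: the Timer, TimerTask and Timeout interfaces, and the HashedWheelTimer class.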

The Timer interface

/**
 * Schedules {@link TimerTask}s for one-time future execution in a background
 * thread.
 */
public interface Timer {

    /**
     * Schedules the specified {@link TimerTask} for one-time execution after
     * the specified delay.
     *
     * @return a handle which is associated with the specified task
     * @throws IllegalStateException      if this timer has been {@linkplain #stop() stopped} already
     * @throws RejectedExecutionException if the pending timeouts are too many and creating new timeout
     *                                    can cause instability in the system.
     */
    Timeout newTimeout(TimerTask task, long delay, TimeUnit unit);

    /**
     * Releases all resources acquired by this {@link Timer} and cancels all
     * tasks which were scheduled but not executed yet.
     *
     * @return the handles associated with the tasks which were canceled by
     * this method
     */
    Set<Timeout> stop();

    /**
     * the timer is stop
     *
     * @return true for stop
     */
    boolean isStop();
}

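This is the core scheduling interface. As the comment says, it schedules tasks for one-time execution in a background thread. isStop reports whether the timer has stopped, and stop stops it. newTimeout hands a task to the timer: the first parameter, of type TimerTask, is the task to run; the second, a long, is the delay (relative to now) after which the task should run; the third is the time unit of that delay. Next, look at its parameter type, TimerTask.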

The TimerTask interface

/**
 * A task which is executed after the delay specified with
 * {@link Timer#newTimeout(TimerTask, long, TimeUnit)}.
 */
public interface TimerTask {

    /**
     * Executed after the delay specified with
     * {@link Timer#newTimeout(TimerTask, long, TimeUnit)}.
     *
     * @param timeout a handle which is associated with this task
     */
    void run(Timeout timeout) throws Exception;
}

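This type represents the task the timer is to execute. It has a single method, run, whose parameter is a Timeout. Note that newTimeout in the Timer interface above returns a Timeout, the same type as this parameter, so a bold guess is that the Timeout passed in here is the very object returned by newTimeout (to be verified later in the article).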

The Timeout interface

/**
 * A handle associated with a {@link TimerTask} that is returned by a
 * {@link Timer}.
 */
public interface Timeout {

    /**
     * Returns the {@link Timer} that created this handle.
     */
    Timer timer();

    /**
     * Returns the {@link TimerTask} which is associated with this handle.
     */
    TimerTask task();

    /**
     * Returns {@code true} if and only if the {@link TimerTask} associated
     * with this handle has been expired.
     */
    boolean isExpired();

    /**
     * Returns {@code true} if and only if the {@link TimerTask} associated
     * with this handle has been cancelled.
     */
    boolean isCancelled();

    /**
     * Attempts to cancel the {@link TimerTask} associated with this handle.
     * If the task has been executed or cancelled already, it will return with
     * no side effect.
     *
     * @return True if the cancellation completed successfully, otherwise false
     */
    boolean cancel();
}

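A Timeout represents the handling of a single scheduled task. timer returns the Timer that created this Timeout; task returns the task this Timeout handles; isExpired reports whether the task has passed its scheduled time; isCancelled reports whether the task has been cancelled; cancel cancels the task.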

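Together, these interfaces form a logical task scheduling system. Their parameters and return values show a neat design: one object creates another, and the created object can in turn return its creator through a method. The same pattern appears frequently in the Spring framework, and it is an effective technique worth learning when designing a complex system.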
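The relationship between the three interfaces can be illustrated with a stripped-down sketch. Everything here is hypothetical (`SimpleTimer`, `SimpleTask`, `SimpleTimeout` and `fire` are made-up names, and `run` declares no checked exception for brevity); it performs no real scheduling and only assumes, as guessed above, that the handle passed to `run` is the one returned by `newTimeout`.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified, hypothetical counterpart of TimerTask.
interface SimpleTask {
    void run(SimpleTimeout timeout);
}

// Simplified, hypothetical counterpart of Timeout.
interface SimpleTimeout {
    SimpleTimer timer();   // back-reference to the creator
    SimpleTask task();     // the task this handle wraps
    boolean isCancelled();
    boolean cancel();
}

// Toy timer: fire() stands in for "the delay has elapsed".
class SimpleTimer {
    SimpleTimeout newTimeout(SimpleTask task) {
        return new Handle(this, task); // the timer creates the handle...
    }

    void fire(SimpleTimeout timeout) {
        if (!timeout.isCancelled()) {
            timeout.task().run(timeout); // ...and the task later receives that same handle
        }
    }

    private static final class Handle implements SimpleTimeout {
        private final SimpleTimer timer;
        private final SimpleTask task;
        private final AtomicBoolean cancelled = new AtomicBoolean();

        Handle(SimpleTimer timer, SimpleTask task) {
            this.timer = timer;
            this.task = task;
        }

        public SimpleTimer timer() { return timer; }
        public SimpleTask task() { return task; }
        public boolean isCancelled() { return cancelled.get(); }
        // Succeeds only for the first caller; later calls have no effect.
        public boolean cancel() { return cancelled.compareAndSet(false, true); }
    }
}
```

Because the handle knows both its timer and its task, code inside `run` can, for instance, inspect or reschedule itself without holding any extra references.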

Now for the focus of this article: HashedWheelTimer, the timing wheel implementation of the scheduler. First, the class header:

/**
 * A {@link Timer} optimized for approximated I/O timeout scheduling.
 *
 * <h3>Tick Duration</h3>
 * <p>
 * As described with 'approximated', this timer does not execute the scheduled
 * {@link TimerTask} on time.  {@link HashedWheelTimer}, on every tick, will
 * check if there are any {@link TimerTask}s behind the schedule and execute
 * them.
 * <p>
 * You can increase or decrease the accuracy of the execution timing by
 * specifying smaller or larger tick duration in the constructor.  In most
 * network applications, I/O timeout does not need to be accurate.  Therefore,
 * the default tick duration is 100 milliseconds and you will not need to try
 * different configurations in most cases.
 *
 * <h3>Ticks per Wheel (Wheel Size)</h3>
 * <p>
 * {@link HashedWheelTimer} maintains a data structure called 'wheel'.
 * To put simply, a wheel is a hash table of {@link TimerTask}s whose hash
 * function is 'dead line of the task'.  The default number of ticks per wheel
 * (i.e. the size of the wheel) is 512.  You could specify a larger value
 * if you are going to schedule a lot of timeouts.
 *
 * <h3>Do not create many instances.</h3>
 * <p>
 * {@link HashedWheelTimer} creates a new thread whenever it is instantiated and
 * started.  Therefore, you should make sure to create only one instance and
 * share it across your application.  One of the common mistakes, that makes
 * your application unresponsive, is to create a new instance for every connection.
 *
 * <h3>Implementation Details</h3>
 * <p>
 * {@link HashedWheelTimer} is based on
 * <a href="http://cseweb.ucsd.edu/users/varghese/">George Varghese</a> and
 * Tony Lauck's paper,
 * <a href="http://cseweb.ucsd.edu/users/varghese/PAPERS/twheel.ps.Z">'Hashed
 * and Hierarchical Timing Wheels: data structures to efficiently implement a
 * timer facility'</a>.  More comprehensive slides are located
 * <a href="http://www.cse.wustl.edu/~cdgill/courses/cs6874/TimingWheels.ppt">here</a>.
 */
public class HashedWheelTimer implements Timer {

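As the comment explains, this class does not provide exact timing; you cannot ask it to run a task at precisely a given hour, minute and second. Instead, on every tick (one "slot" of the wheel) it checks whether any TimerTask is behind schedule and, if so, executes it. (Readers who know the timing wheel algorithm should find this easy to follow.) A smaller or larger tick duration (the time span of one slot) increases or decreases timing accuracy: a wheel with 1-second slots is about five times as precise as one with 5-second slots. The comment goes on to say that in most network applications I/O timeouts need not be exact: if I ask for a 5-second timeout, the framework does not have to report it at the exact 5-second mark, and reporting it slightly later is fine. The default tick duration is therefore 100 milliseconds, and in most cases there is no need to change it.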
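The effect of tick duration on accuracy can be made concrete with a small calculation. The sketch below is a simplified model rather than code from Dubbo or Netty: `TickAccuracy` and its methods are hypothetical, and it assumes that a task becomes eligible to run only when the tick containing its deadline has ended.

```java
// Simplified model of how tick duration bounds timing accuracy.
// Assumption: a task runs when the tick that contains its deadline ends.
class TickAccuracy {
    // Time (ms since the timer started) at which a task with the given deadline runs.
    static long fireTimeMillis(long deadlineMillis, long tickDurationMillis) {
        long tickIndex = deadlineMillis / tickDurationMillis; // tick that contains the deadline
        return (tickIndex + 1) * tickDurationMillis;          // end of that tick
    }

    // How late the task runs relative to its requested deadline.
    static long latenessMillis(long deadlineMillis, long tickDurationMillis) {
        return fireTimeMillis(deadlineMillis, tickDurationMillis) - deadlineMillis;
    }
}
```

Under this model a task is never early and is late by at most one tick: a deadline of 5300 ms runs at 5400 ms with a 100 ms tick, at 6000 ms with a 1-second tick, and at 10000 ms with a 5-second tick.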

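The class maintains a data structure called a "wheel", which is the timing wheel itself. Put simply, a wheel is a hash table whose hash function is the task's deadline: the hash places each task in the slot where it belongs, so that as time advances and the wheel reaches a slot, the tasks in that slot are the ones that are due. This avoids checking every task on every tick. The default number of slots is 512; if a very large number of tasks will be scheduled, this value can be customized.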
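A hedged sketch of the "deadline as hash function" idea follows. `WheelMath` and its methods are hypothetical names, and the model makes two stated assumptions: the wheel size is rounded up to a power of two so that a bit mask can replace the modulo operation, and the wheel is at tick 0 when the task is added. A task whose deadline lies more than one full revolution away shares a slot with nearer tasks, so the sketch also computes how many full rounds the task must wait.

```java
// Illustrative slot arithmetic for a hashed timing wheel (not the library's code).
class WheelMath {
    // Rounds the requested size up to a power of two so "& mask" can replace "%".
    static int normalizeTicksPerWheel(int ticksPerWheel) {
        if (ticksPerWheel <= 0 || ticksPerWheel > (1 << 30)) {
            throw new IllegalArgumentException("ticksPerWheel out of range: " + ticksPerWheel);
        }
        int normalized = 1;
        while (normalized < ticksPerWheel) {
            normalized <<= 1;
        }
        return normalized;
    }

    // Slot for a deadline given in ms since the wheel started; wheelSize must be a power of two.
    static int slotIndex(long deadlineMillis, long tickDurationMillis, int wheelSize) {
        long ticks = deadlineMillis / tickDurationMillis; // ticks until the deadline
        int mask = wheelSize - 1;                         // e.g. 511 for 512 slots
        return (int) (ticks & mask);
    }

    // Full revolutions the wheel must complete before the task is actually due.
    static long remainingRounds(long deadlineMillis, long tickDurationMillis, int wheelSize) {
        long ticks = deadlineMillis / tickDurationMillis;
        return ticks / wheelSize;
    }
}
```

With a 100 ms tick and 512 slots, one revolution covers 51.2 seconds. A 10-second deadline lands in slot 100 with 0 rounds remaining, while a 60-second deadline is 600 ticks away and lands in slot 88 with 1 round remaining.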

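Only one instance of this class should be created per application, because each instance creates a new thread when it is instantiated and started. A common mistake is to create a new instance for every connection (this comment presumably comes from Netty, where the class is mainly used for connection handling; here "connection" can be read as "task"), which makes the application unresponsive because too many threads are started.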

The rest of the comment cites the paper on which the implementation is based, which we skip here. On to the code, starting with the fields.

    /**
     * may be in spi?
     */
    public static final String NAME = "hased";

    private static final Logger logger = LoggerFactory.getLogger(HashedWheelTimer.class);

    // Instance counter: records how many objects of this class have been created
    private static final AtomicInteger INSTANCE_COUNTER = new AtomicInteger();
    // Flag used for the warning issued when the instance count exceeds the limit
    private static final AtomicBoolean WARNED_TOO_MANY_INSTANCES = new AtomicBoolean();
    // Upper limit on the number of instances
    private static final int INSTANCE_COUNT_LIMIT = 64;
    // Utility for atomically updating the workerState field
    private static final AtomicIntegerFieldUpdater<HashedWheelTimer> WORKER_STATE_UPDATER =
            AtomicIntegerFieldUpdater.newUpdater(HashedWheelTimer.class, "workerState");
    // The worker that drives the wheel forward
    private final Worker worker = new Worker();
    // The thread bound to the worker
    private final Thread workerThread;

    // Worker state: initialized
    private static final int WORKER_STATE_INIT = 0;
    // Worker state: started
    private static final int WORKER_STATE_STARTED = 1;
    // Worker state: shut down
    private static final int WORKER_STATE_SHUTDOWN = 2;

    /**
     * 0 - init, 1 - started, 2 - shut down
     */
    @SuppressWarnings({"unused", "FieldMayBeFinal"})
    private volatile int workerState;

    // Duration of one time slot (tick)
    private final long tickDuration;
    // Array of time slots
    private final HashedWheelBucket[] wheel;
    // Mask used to compute which slot a task should be placed in
    private final int mask;
    // Synchronization aid for coordinating threads
    private final CountDownLatch startTimeInitialized = new CountDownLatch(1);
    // Queue holding the scheduled timeouts
    private final Queue<HashedWheelTimeout> timeouts = new LinkedBlockingQueue<>();
    // Queue of cancelled timeouts
    private final Queue<HashedWheelTimeout> cancelledTimeouts = new LinkedBlockingQueue<>();
    // Number of pending timeouts
    private final AtomicLong pendingTimeouts = new AtomicLong(0);
    // Maximum number of pending timeouts
    private final long maxPendingTimeouts;
    // Start time of the wheel
    private volatile long startTime;

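One pattern in the fields above that may be unfamiliar is the pairing of a `volatile int` with an `AtomicIntegerFieldUpdater`. The sketch below shows how such a pair can guard a state machine so that only one caller moves it from "init" to "started". `WorkerStateDemo` and the bodies of `start` and `stop` are assumptions made for illustration, not the actual HashedWheelTimer code.

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

// Hypothetical miniature of the worker-state handling.
class WorkerStateDemo {
    static final int WORKER_STATE_INIT = 0;
    static final int WORKER_STATE_STARTED = 1;
    static final int WORKER_STATE_SHUTDOWN = 2;

    private static final AtomicIntegerFieldUpdater<WorkerStateDemo> WORKER_STATE_UPDATER =
            AtomicIntegerFieldUpdater.newUpdater(WorkerStateDemo.class, "workerState");

    // AtomicIntegerFieldUpdater requires a volatile int field; it starts at 0 (INIT).
    private volatile int workerState;

    // Returns true only for the single caller that moves INIT -> STARTED.
    boolean start() {
        switch (WORKER_STATE_UPDATER.get(this)) {
            case WORKER_STATE_INIT:
                // compareAndSet fails if another thread won the race.
                return WORKER_STATE_UPDATER.compareAndSet(
                        this, WORKER_STATE_INIT, WORKER_STATE_STARTED);
            case WORKER_STATE_STARTED:
                return false; // already running, nothing to do
            case WORKER_STATE_SHUTDOWN:
                throw new IllegalStateException("cannot be started once stopped");
            default:
                throw new Error("Invalid worker state");
        }
    }

    void stop() {
        WORKER_STATE_UPDATER.set(this, WORKER_STATE_SHUTDOWN);
    }

    int state() {
        return workerState;
    }
}
```

Compared with an `AtomicInteger` field, the updater keeps the state as a plain `int` inside the object and shares one static helper across all instances, which avoids one extra object per timer.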
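The purpose of some of these fields may not be clear yet; the rest of the article should make them clear. First, the constructor of this class.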

    /**
     * Creates a new timer.
     *
     * @param threadFactory      a {@link ThreadFactory} that creates a
     *                           background {@link Thread} which is dedicated to
     *                           {@link TimerTask} execution.
     * @param tickDuration       the duration between tick
     * @param unit               the time unit of the {@code tickDuration}
     * @param ticksPerWheel      the size of the wheel
     * @param maxPendingTimeouts The maximum number of pending timeouts after which call to
     *                           {@code newTimeout} will result in
     *                           {@link java.util.co
