Time Wheel

wuychn

Last modified: 2023-03-02 14:26:38


First published: 2022-04-29 15:29:04

Article link: https://blog.csdn.net/qmqm011/article/details/124495387


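A time wheel is a high-performance, low-overhead data structure. It suits short, lightweight delayed tasks that do not need strict real-time precision, such as heartbeat checks, session or request timeouts, delayed message delivery, and business timeouts that cancel an order or a refund. Both Netty and Kafka use it.

Netty, for example, often manages more than a million connections, and each connection carries several timeout tasks, such as send timeouts and heartbeat intervals. Starting a separate Timer for every timed task would be inefficient and would consume a lot of resources.

A typical use in Netty is detecting whether a connection is idle. If it is (for example, the client's heartbeats cannot reach the server because of a network problem), the server closes the connection and releases its resources. Thanks to the performance of Netty's NIO model, a Netty-based server can hold a large number of long-lived connections. A single 8-core, 16 GB cloud host can hold several hundred thousand at once, so cutting inactive connections promptly matters a great deal.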

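What is a time wheel

Here is a diagram:

The diagram above sketches a time wheel. It looks like a clock face with tick marks. The diagram draws 9 slots, and each slot represents one unit of precision: if a slot stands for 1s, one revolution takes 9s, just as the second hand of a clock has a resolution of 1s and takes 60s per revolution. Each slot stores a doubly linked list of timed tasks. When the pointer reaches a slot, the wheel checks whether the tasks in that slot are due and runs the ones that are.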

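Why use a time wheel

I believe everything in this world appears for a reason, even if we cannot find the reason for most things. Technology, fortunately, follows a pattern: a technique appears either to improve performance or to improve usability, and the time wheel is no exception. Suppose you are given a batch of tasks, each with an execution time accurate to the second. How would you implement the scheduler?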

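Approach 1: start a thread that polls every task once per second, compares the current time with each task's year, month, day, hour, minute, and second, and hands the matching tasks to a thread pool.

Advantage: simple to implement.

Disadvantage: every second it walks all the tasks and compares each one, which is wasted work for the many tasks that are not yet due. With a large number of tasks, execution gets delayed. Splitting the tasks across several polling threads can mitigate this.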
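The polling approach can be sketched as follows. This is a minimal illustration, not code from any library: the class name PollingScheduler, the Task type, and the comparisons counter are all hypothetical, and time is modeled as a plain second count instead of a calendar date.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Naive scheduler (hypothetical): every poll scans all pending tasks. */
class PollingScheduler {
    /** A task with an absolute due time in seconds. */
    static final class Task {
        final String name;
        final long dueSecond;

        Task(String name, long dueSecond) {
            this.name = name;
            this.dueSecond = dueSecond;
        }
    }

    private final List<Task> pending = new ArrayList<>();
    /** Counts how many task comparisons have been made so far. */
    long comparisons = 0;

    void add(String name, long dueSecond) {
        pending.add(new Task(name, dueSecond));
    }

    /** Called once per second; returns the names of tasks due at or before nowSecond. */
    List<String> poll(long nowSecond) {
        List<String> due = new ArrayList<>();
        for (Iterator<Task> it = pending.iterator(); it.hasNext();) {
            Task task = it.next();
            comparisons++; // every pending task is inspected, due or not
            if (task.dueSecond <= nowSecond) {
                due.add(task.name);
                it.remove();
            }
        }
        return due;
    }
}
```

The comparisons counter makes the cost visible: each poll touches every pending task, so the work per second grows with the total number of tasks rather than with the number of tasks that are actually due.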

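Approach 2: keep the tasks in a min-heap ordered by execution time.

Advantage: no need to poll every task. The thread only checks whether the first node of the heap is due, and waits if it is not.

Disadvantage: with a large number of tasks, inserting into the min-heap costs O(log n). That is acceptable, since even 2^30 tasks need only about 30 steps. But it shares a problem with the first approach: tasks may still be delayed. Batching can help here as well.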
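A minimal sketch of the heap approach, assuming java.util.PriorityQueue as the min-heap. The class name HeapScheduler and its methods are hypothetical, and time is again a plain second count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Heap-based scheduler (hypothetical): only the earliest task is ever inspected. */
class HeapScheduler {
    static final class Task implements Comparable<Task> {
        final String name;
        final long dueSecond;

        Task(String name, long dueSecond) {
            this.name = name;
            this.dueSecond = dueSecond;
        }

        @Override
        public int compareTo(Task other) {
            return Long.compare(dueSecond, other.dueSecond);
        }
    }

    private final PriorityQueue<Task> heap = new PriorityQueue<>();

    /** Insertion into the heap costs O(log n). */
    void add(String name, long dueSecond) {
        heap.add(new Task(name, dueSecond));
    }

    /** Pops every task whose due time has passed, earliest first. */
    List<String> poll(long nowSecond) {
        List<String> due = new ArrayList<>();
        while (!heap.isEmpty() && heap.peek().dueSecond <= nowSecond) {
            due.add(heap.poll().name);
        }
        return due;
    }

    /** How long the thread could wait before the next task is due, or -1 if there is none. */
    long secondsUntilNext(long nowSecond) {
        return heap.isEmpty() ? -1 : Math.max(0, heap.peek().dueSecond - nowSecond);
    }
}
```

Unlike the polling version, an idle check costs one peek regardless of how many tasks are pending.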

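Neither approach groups the tasks, so let's group them. Suppose the largest delay among the tasks is 100s. We can create an array of length 100 and put tasks that run at the same time together, like this: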

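The array is divided into 100 slots of 1s each. Tasks with the same execution time sit together in a linked list. A thread then advances one slot per second and hands any tasks in that slot to a thread pool. But what if the largest delay is tens of thousands of seconds away? Do we need an array with tens of thousands of slots? Yes, and that causes problems: if many of the slots in between hold no tasks, the space is simply wasted. How can we improve on this? This is where the time wheel comes in.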

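Fix the array at 60 slots with a precision of 1s per slot, so one revolution takes 60s. Suppose there are three tasks A, B, and C whose delays, measured from the moment the polling thread starts at the first slot, are 3s, 50s, and 55s. Their slots are: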

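The polling thread runs A, B, and C as soon as it reaches their slots. But what about a task D that should run 10,000 seconds from now? Walking 10,000 seconds takes the polling thread 166 full revolutions plus 40s, so we give the task an extra attribute that records the number of rounds and store it in slot 40: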

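In other words, the polling thread walks from the first slot to the last and starts over, again and again. Once it has met task D in slot 40 for the 167th time, it runs the task.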
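The single wheel with a round counter can be simulated in a few lines. This is a sketch under stated assumptions, not Netty's implementation: the class SimpleWheel and its methods are hypothetical, one tick equals one slot, the pointer starts at slot 0, and delays must be at least one tick.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Single-level wheel (hypothetical): each task carries a remaining-rounds counter. */
class SimpleWheel {
    private static final class Entry {
        final String name;
        long remainingRounds;

        Entry(String name, long remainingRounds) {
            this.name = name;
            this.remainingRounds = remainingRounds;
        }
    }

    private final List<List<Entry>> slots = new ArrayList<>();
    private long tick = 0; // total number of slots the pointer has advanced

    SimpleWheel(int slotCount) {
        for (int i = 0; i < slotCount; i++) {
            slots.add(new ArrayList<>());
        }
    }

    /** Schedules a task delayTicks (>= 1) after the current pointer position. */
    void add(String name, long delayTicks) {
        if (delayTicks < 1) {
            throw new IllegalArgumentException("delayTicks must be >= 1");
        }
        int index = (int) ((tick + delayTicks) % slots.size());
        // Number of times the pointer passes the slot before the visit that fires the task.
        long rounds = (delayTicks - 1) / slots.size();
        slots.get(index).add(new Entry(name, rounds));
    }

    /** Advances the pointer by one slot and returns the tasks that fire there. */
    List<String> advance() {
        tick++;
        List<String> fired = new ArrayList<>();
        Iterator<Entry> it = slots.get((int) (tick % slots.size())).iterator();
        while (it.hasNext()) {
            Entry entry = it.next();
            if (entry.remainingRounds == 0) {
                fired.add(entry.name);
                it.remove();
            } else {
                entry.remainingRounds--; // not this round yet
            }
        }
        return fired;
    }
}
```

With 60 slots, a task added with a delay of 10000 lands in slot 40 with 166 rounds, is skipped on the first 166 visits, and fires on the 167th visit, at tick 10000.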

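This looks neat, but because the precision is fine and the number of slots is fixed, the linked list in each slot grows long when there are many tasks, and execution may be delayed. What then? A clock has a minute hand and an hour hand in addition to the second hand, so can we define more wheels with different precisions? Of course. Suppose a task E runs at a given hour, minute, and second. We can define three wheels, a second wheel, a minute wheel, and an hour wheel: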

Second wheel: 60 slots, 1s per slot

Minute wheel: 60 slots, 1 minute per slot

Hour wheel: 24 slots, 1 hour per slot

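Now assume all three wheels start at startTime. Let totalTick be the number of slots in a wheel, tick the number of slots the pointer has advanced so far (it keeps increasing), and tickDuration the precision of one slot. For a given task, the number of rounds and the slot index are computed as follows: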


    // Offset of the task's execution time from the wheel's start time
    duration = executionTime - startTime
    // Number of slots from the wheel's start to the execution time
    needTicks = duration / tickDuration
    // Subtract the slots already passed to get the slots still to go
    remainTicks = needTicks - tick
    // Number of full rounds still to go
    remainRounds = remainTicks / totalTick
    // An overdue task (execution time earlier than now) has needTicks < tick; clamp it so that it runs as early as possible
    ticks = Math.max(needTicks, tick)
    // Slot index. ticks keeps growing, so for speed totalTick is rounded up to a power of two
    index = ticks % totalTick
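The pseudocode above translates almost line by line into Java. In this sketch the class WheelPosition and the factory method of are hypothetical names, and duration and tickDuration are assumed to be in the same time unit.

```java
/** Rounds and slot index of a task on a single wheel (hypothetical helper). */
final class WheelPosition {
    final long remainingRounds;
    final int index;

    private WheelPosition(long remainingRounds, int index) {
        this.remainingRounds = remainingRounds;
        this.index = index;
    }

    /**
     * @param duration     task execution time minus wheel start time
     * @param tickDuration time covered by one slot, same unit as duration
     * @param tick         number of slots the pointer has advanced so far
     * @param totalTick    number of slots in the wheel
     */
    static WheelPosition of(long duration, long tickDuration, long tick, int totalTick) {
        long needTicks = duration / tickDuration;
        long remainTicks = needTicks - tick;
        // Integer division truncates toward zero, so an overdue task gets 0 rounds.
        long remainingRounds = remainTicks / totalTick;
        // An overdue task is placed in the current slot so that it runs as early as possible.
        long ticks = Math.max(needTicks, tick);
        int index = (int) (ticks % totalTick);
        return new WheelPosition(remainingRounds, index);
    }
}
```

For example, of(10000, 1, 0, 60) yields 166 rounds and index 40, matching task D above, while an overdue task such as of(5, 1, 10, 60) yields 0 rounds and the pointer's current index 10.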

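Suppose the three wheels start at the same time and task E has a duration of 24 hours, 30 minutes, and 20 seconds. E is first stored in slot 24 of the hour wheel. When the hour wheel reaches that slot, E is demoted to slot 30 of the minute wheel. When the minute wheel reaches slot 30, E moves to slot 20 of the second wheel, and when the second wheel reaches slot 20, the task runs. This arrangement is called a hierarchical time wheel.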
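The demotion path of task E can be worked out with plain integer arithmetic. The sketch below only computes how many ticks the task waits on each wheel and does not implement the wheels themselves. The class HierarchicalPlacement and the method cascade are hypothetical, and all three wheels are assumed to start together with their pointers at zero.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a delay into the ticks spent on the hour, minute, and second wheels (hypothetical). */
class HierarchicalPlacement {
    static List<String> cascade(long delaySeconds) {
        long hours = delaySeconds / 3600;
        long minutes = (delaySeconds % 3600) / 60;
        long seconds = delaySeconds % 60;

        List<String> path = new ArrayList<>();
        if (hours > 0) {
            path.add("hour wheel: " + hours);
        }
        if (minutes > 0) {
            path.add("minute wheel: " + minutes);
        }
        // A delay shorter than one minute goes straight to the second wheel.
        if (seconds > 0 || path.isEmpty()) {
            path.add("second wheel: " + seconds);
        }
        return path;
    }
}
```

For 24 hours, 30 minutes, and 20 seconds (88220 seconds) the path is hour wheel 24, minute wheel 30, second wheel 20, the same sequence as in the paragraph above.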

The time wheel implementation in Netty

Netty's time wheel is not hierarchical. It uses a round counter to decide whether a task is due.

Example

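    import io.netty.util.HashedWheelTimer;

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.TimeUnit;

    // The class name is illustrative; the original snippet shows only the field and main().
    public class TimeWheelExample {

        private static final CountDownLatch countDownLatch = new CountDownLatch(1);

        public static void main(String[] args) throws Exception {
            // tickDuration: precision of one slot
            // ticksPerWheel: number of slots
            HashedWheelTimer timer = new HashedWheelTimer(1, TimeUnit.SECONDS, 60);
            System.out.println("Order placed, current time: " + System.currentTimeMillis());
            // Add a task
            timer.newTimeout((timeout) -> {
                System.out.println("Payment timed out, order cancelled, current time: " + System.currentTimeMillis());
                countDownLatch.countDown();
            }, 30, TimeUnit.SECONDS);
            countDownLatch.await();
            timer.stop();
        }
    }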

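Class diagram

HashedWheelTimer: the time wheel itself.
HashedWheelBucket: an inner class of HashedWheelTimer that represents one slot of the wheel. HashedWheelTimer holds the slots in HashedWheelBucket[] wheel. Each HashedWheelBucket records the head and tail tasks of its list.
Worker: the polling thread of the wheel; think of it as the pointer.
HashedWheelTimeout: a delayed task. It wraps the task itself, its deadline relative to the wheel's start, its remaining rounds, the previous and next tasks, the wheel it belongs to, and the bucket it currently sits in.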
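Because each task keeps references to its previous and next neighbors and the bucket keeps the head and tail, a task can be unlinked from its bucket in constant time. The following sketch shows that idea with hypothetical names (SketchBucket, Node); it is a simplified stand-in, not the Netty source.

```java
/** A bucket as a doubly linked list with head and tail pointers (hypothetical sketch). */
class SketchBucket {
    static final class Node {
        final String task;
        Node prev;
        Node next;

        Node(String task) {
            this.task = task;
        }
    }

    Node head;
    Node tail;

    /** Appends a node at the tail in O(1). */
    void add(Node node) {
        if (head == null) {
            head = tail = node;
        } else {
            tail.next = node;
            node.prev = tail;
            tail = node;
        }
    }

    /** Unlinks a node in O(1) and returns the node that followed it. */
    Node remove(Node node) {
        Node next = node.next;
        if (node.prev != null) {
            node.prev.next = next;
        }
        if (next != null) {
            next.prev = node.prev;
        }
        if (node == head) {
            head = next;
        }
        if (node == tail) {
            tail = node.prev;
        }
        node.prev = null;
        node.next = null;
        return next;
    }
}
```

Returning the following node from remove lets a caller keep iterating over the list while it removes entries.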

The relationships among these classes are shown in the diagram:

Timer

Note that this is the Timer in the io.netty.util package, not the one in java.util:

public interface Timer {

    /**
     * Schedules the specified {@link TimerTask} for one-time execution after
     * the specified delay.
     *
     * @return a handle which is associated with the specified task
     *
     * @throws IllegalStateException       if this timer has been {@linkplain #stop() stopped} already
     * @throws RejectedExecutionException if the pending timeouts are too many and creating new timeout
     *                                    can cause instability in the system.
     */
    Timeout newTimeout(TimerTask task, long delay, TimeUnit unit);

    /**
     * Releases all resources acquired by this {@link Timer} and cancels all
     * tasks which were scheduled but not executed yet.
     *
     * @return the handles associated with the tasks which were canceled by
     *         this method
     */
    Set<Timeout> stop();
}

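newTimeout: adds a task, which runs once after the given delay.

stop: shuts down the time wheel, releases its resources, and cancels all tasks that have not run yet.

Question: there is a stop method to shut the wheel down, so why is there no start method to start it? Why was it designed this way?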

HashedWheelTimer

It has several overloaded constructors, all of which end up calling this one:

    public HashedWheelTimer(
            ThreadFactory threadFactory,
            long tickDuration, TimeUnit unit, int ticksPerWheel, boolean leakDetection,
            long maxPendingTimeouts, Executor taskExecutor) {

        checkNotNull(threadFactory, "threadFactory");
        checkNotNull(unit, "unit");
        checkPositive(tickDuration, "tickDuration");
        checkPositive(ticksPerWheel, "ticksPerWheel");
        this.taskExecutor = checkNotNull(taskExecutor, "taskExecutor");

        // Normalize ticksPerWheel to power of two and initialize the wheel.
        // Create the time wheel
        wheel = createWheel(ticksPerWheel);
        mask = wheel.length - 1;

        // Convert tickDuration to nanos.
        long duration = unit.toNanos(tickDuration);

        // Prevent overflow.
        if (duration >= Long.MAX_VALUE / wheel.length) {
            throw new IllegalArgumentException(String.format(
                    "tickDuration: %d (expected: 0 < tickDuration in nanos < %d",
                    tickDuration, Long.MAX_VALUE / wheel.length));
        }

        if (duration < MILLISECOND_NANOS) {
            logger.warn("Configured tickDuration {} smaller than {}, using 1ms.",
                        tickDuration, MILLISECOND_NANOS);
            this.tickDuration = MILLISECOND_NANOS;
        } else {
            this.tickDuration = duration;
        }

        workerThread = threadFactory.newThread(worker);

        leak = leakDetection || !workerThread.isDaemon() ? leakDetector.track(this) : null;

        this.maxPendingTimeouts = maxPendingTimeouts;

        if (INSTANCE_COUNTER.incrementAndGet() > INSTANCE_COUNT_LIMIT &&
            WARNED_TOO_MANY_INSTANCES.compareAndSet(false, true)) {
            reportTooManyInstances();
        }
    }

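Some important parameters:

threadFactory: the factory that creates the worker thread

tickDuration: the precision of one slot

ticksPerWheel: the number of slots in the wheel

The constructor mainly validates and normalizes its parameters, then creates the wheel: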

    private static HashedWheelBucket[] createWheel(int ticksPerWheel) {
        //ticksPerWheel may not be greater than 2^30
        checkInRange(ticksPerWheel, 1, 1073741824, "ticksPerWheel");

        // Round ticksPerWheel up to a power of two, mainly so that & can replace % for better performance
        ticksPerWheel = normalizeTicksPerWheel(ticksPerWheel);
        HashedWheelBucket[] wheel = new HashedWheelBucket[ticksPerWheel];
        for (int i = 0; i < wheel.length; i ++) {
            wheel[i] = new HashedWheelBucket();
        }
        return wheel;
    }

    private static int normalizeTicksPerWheel(int ticksPerWheel) {
        int normalizedTicksPerWheel = 1;
        while (normalizedTicksPerWheel < ticksPerWheel) {
            normalizedTicksPerWheel <<= 1;
        }
        return normalizedTicksPerWheel;
    }

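As the source shows, the ticksPerWheel passed in is changed to a power of two. Why? The code then creates HashedWheelBucket[] wheel, the array that represents the slots of the wheel. The rest of the constructor performs further conversions and checks.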

Submitting a delayed task

In our example, the code that submits the task is:

    timer.newTimeout((timeout) -> {
        System.out.println("Payment timed out, order cancelled, current time: " + System.currentTimeMillis());
        countDownLatch.countDown();
    }, 30, TimeUnit.SECONDS);

The task runs after 30 seconds. It prints a line and then wakes the main thread.

The source of newTimeout:

    @Override
    public Timeout newTimeout(TimerTask task, long delay, TimeUnit unit) {
        checkNotNull(task, "task");
        checkNotNull(unit, "unit");

        long pendingTimeoutsCount = pendingTimeouts.incrementAndGet();

        if (maxPendingTimeouts > 0 && pendingTimeoutsCount > maxPendingTimeouts) {
            pendingTimeouts.decrementAndGet();
            throw new RejectedExecutionException("Number of pending timeouts ("
                + pendingTimeoutsCount + ") is greater than or equal to maximum allowed pending "
                + "timeouts (" + maxPendingTimeouts + ")");
        }

        start();

        // Add the timeout to the timeout queue which will be processed on the next tick.
        // During processing all the queued HashedWheelTimeouts will be added to the correct HashedWheelBucket.
        long deadline = System.nanoTime() + unit.toNanos(delay) - startTime;

        // Guard against overflow.
        if (delay > 0 && deadline < 0) {
            deadline = Long.MAX_VALUE;
        }
        HashedWheelTimeout timeout = new HashedWheelTimeout(this, task, deadline);
        timeouts.add(timeout);
        return timeout;
    }

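The submitted task is of type TimerTask, and the call specifies its delay.

When a task is submitted, the timer checks whether the wheel has been started, and starts it if not. Starting the wheel inside the method that adds tasks keeps the wheel from spinning while there are no tasks. Does this raise a multithreading problem?

The source of start: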

    public void start() {
        switch (WORKER_STATE_UPDATER.get(this)) {
            case WORKER_STATE_INIT:
                if (WORKER_STATE_UPDATER.compareAndSet(this, WORKER_STATE_INIT, WORKER_STATE_STARTED)) {
                    workerThread.start();
                }
                break;
            case WORKER_STATE_STARTED:
                break;
            case WORKER_STATE_SHUTDOWN:
                throw new IllegalStateException("cannot be started once stopped");
            default:
                throw new Error("Invalid WorkerState");
        }

        // Wait until the startTime is initialized by the worker.
        while (startTime == 0) {
            try {
                startTimeInitialized.await();
            } catch (InterruptedException ignore) {
                // Ignore - it will be ready very soon.
            }
        }
    }

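It reads the wheel's state with WORKER_STATE_UPDATER.get(this). If the state is WORKER_STATE_INIT, it uses a CAS to change it to WORKER_STATE_STARTED and starts the polling thread (workerThread.start()). If the wheel is already started, nothing happens; if it has been shut down, an IllegalStateException is thrown.

WORKER_STATE_UPDATER is an AtomicIntegerFieldUpdater: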

    private static final AtomicIntegerFieldUpdater<HashedWheelTimer> WORKER_STATE_UPDATER =
            AtomicIntegerFieldUpdater.newUpdater(HashedWheelTimer.class, "workerState");

It locates the workerState field of HashedWheelTimer by reflection and updates it atomically. The workerState field is declared volatile:

    public static final int WORKER_STATE_INIT = 0;
    public static final int WORKER_STATE_STARTED = 1;
    public static final int WORKER_STATE_SHUTDOWN = 2;
    @SuppressWarnings({"unused", "FieldMayBeFinal"})
    private volatile int workerState; // 0 - init, 1 - started, 2 - shut down

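Only one thread can win the CAS from WORKER_STATE_INIT to WORKER_STATE_STARTED, so starting the wheel here is thread-safe.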
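The same lazy-start pattern can be tried in isolation. In this sketch the class LazyStarter, its state constants, and the startCount counter are hypothetical; the counter stands in for workerThread.start() so that the number of actual starts can be observed.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

/** Lazy, race-free start using a CAS on a volatile int field (hypothetical sketch). */
class LazyStarter {
    static final int INIT = 0;
    static final int STARTED = 1;

    private static final AtomicIntegerFieldUpdater<LazyStarter> STATE =
            AtomicIntegerFieldUpdater.newUpdater(LazyStarter.class, "state");

    // The updater requires a volatile int field.
    private volatile int state = INIT;

    /** Stand-in for workerThread.start(): counts how often the start action runs. */
    final AtomicInteger startCount = new AtomicInteger();

    void start() {
        // Only the single thread that wins the CAS performs the start action.
        if (STATE.compareAndSet(this, INIT, STARTED)) {
            startCount.incrementAndGet();
        }
    }

    /** Calls start() from several threads at once and returns how many starts happened. */
    static int race(int threadCount) {
        LazyStarter starter = new LazyStarter();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(starter::start);
            threads[i].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return starter.startCount.get();
    }
}
```

No matter how many threads call start() concurrently, the start action runs once, which is why newTimeout can call start() on every submission without extra locking.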

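Back in newTimeout: once the wheel is started, the method computes the task's execution time relative to the wheel's start time, so that the polling thread can later derive the slot index and the number of rounds: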

        // Add the timeout to the timeout queue which will be processed on the next tick.
        // During processing all the queued HashedWheelTimeouts will be added to the correct HashedWheelBucket.
        long deadline = System.nanoTime() + unit.toNanos(delay) - startTime;

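Next comes an overflow check. If delay is positive but the computed deadline is negative, the addition has overflowed, so the deadline is clamped to Long.MAX_VALUE.

Finally, the task is wrapped in a HashedWheelTimeout and added to the task queue. On the next tick, every queued HashedWheelTimeout is moved into the correct HashedWheelBucket.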
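A worked case makes the overflow guard concrete. The helper below mirrors the two lines from newTimeout with the clock reading passed in as a parameter, so the result is deterministic; the class DeadlineMath and its method name are hypothetical.

```java
/** Deadline computation with an overflow guard (hypothetical helper). */
class DeadlineMath {
    static long deadline(long nowNanos, long delayNanos, long startTimeNanos) {
        long deadline = nowNanos + delayNanos - startTimeNanos;
        // A positive delay can never produce a negative deadline unless the sum wrapped around.
        if (delayNanos > 0 && deadline < 0) {
            deadline = Long.MAX_VALUE;
        }
        return deadline;
    }
}
```

With now = 1000, delay = Long.MAX_VALUE, and start = 0, the raw sum wraps to a negative number and the guard returns Long.MAX_VALUE. A task with a negative delay keeps its negative deadline; it is simply overdue, which Math.max(calculated, tick) handles later when the task is placed in a bucket.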

HashedWheelTimeout timeout = new HashedWheelTimeout(this, task, deadline);
timeouts.add(timeout);

The task queue is a multi-producer, single-consumer queue:

private final Queue<HashedWheelTimeout> timeouts = PlatformDependent.newMpscQueue();
private final Queue<HashedWheelTimeout> cancelledTimeouts = PlatformDependent.newMpscQueue();

The polling thread

The polling logic lives in Worker, an inner class of HashedWheelTimer that implements Runnable. Its run method:

        @Override
        public void run() {
            // Initialize the startTime.
            startTime = System.nanoTime();
            if (startTime == 0) {
                // We use 0 as an indicator for the uninitialized value here, so make sure it's not 0 when initialized.
                startTime = 1;
            }

            // Notify the other threads waiting for the initialization at start().
            startTimeInitialized.countDown();

            do {
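                // Sleep until the next tick; the return value is the time elapsed since the wheel started
                final long deadline = waitForNextTick();
                if (deadline > 0) {
                    // mask = wheel.length - 1
                    // tick is the number of slots passed so far
                    // ticksPerWheel was rounded up to a power of two so that & can replace %, the same trick HashMap uses
                    // Index of the slot the pointer currently points at
                    int idx = (int) (tick & mask);
                    // Remove tasks that have been cancelled
                    processCancelledTasks();
                    // The task list of the current slot
                    HashedWheelBucket bucket =
                            wheel[idx];
                    // Move tasks from the queue into their buckets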
                    transferTimeoutsToBuckets();
                    bucket.expireTimeouts(deadline);
                    tick++;
                }
            } while (WORKER_STATE_UPDATER.get(HashedWheelTimer.this) == WORKER_STATE_STARTED);

            // Fill the unprocessedTimeouts so we can return them from stop() method.
            for (HashedWheelBucket bucket: wheel) {
                bucket.clearTimeouts(unprocessedTimeouts);
            }
            for (;;) {
                HashedWheelTimeout timeout = timeouts.poll();
                if (timeout == null) {
                    break;
                }
                if (!timeout.isCancelled()) {
                    unprocessedTimeouts.add(timeout);
                }
            }
            processCancelledTasks();
        }

The core is the do...while loop, which keeps polling as long as the wheel's state is WORKER_STATE_STARTED. waitForNextTick() sleeps until the next tick is due; with 1s slots it sleeps about one second:

        private long waitForNextTick() {
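            // tickDuration is the time the pointer takes to pass one slot
            // tick is the total number of slots passed so far; it keeps increasing
            long deadline = tickDuration * (tick + 1);

            for (;;) {
                // Time elapsed since the wheel started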
                final long currentTime = System.nanoTime() - startTime;
                long sleepTimeMs = (deadline - currentTime + 999999) / 1000000;

                if (sleepTimeMs <= 0) {
                    if (currentTime == Long.MIN_VALUE) {
                        return -Long.MAX_VALUE;
                    } else {
                        return currentTime;
                    }
                }

                // Check if we run on windows, as if thats the case we will need
                // to round the sleepTime as workaround for a bug that only affect
                // the JVM if it runs on windows.
                //
                // See https://github.com/netty/netty/issues/356
                if (PlatformDependent.isWindows()) {
                    sleepTimeMs = sleepTimeMs / 10 * 10;
                    if (sleepTimeMs == 0) {
                        sleepTimeMs = 1;
                    }
                }

                try {
                    // Sleep for the computed time
                    Thread.sleep(sleepTimeMs);
                } catch (InterruptedException ignored) {
                    if (WORKER_STATE_UPDATER.get(HashedWheelTimer.this) == WORKER_STATE_SHUTDOWN) {
                        return Long.MIN_VALUE;
                    }
                }
            }
        }
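The expression (deadline - currentTime + 999999) / 1000000 converts the remaining nanoseconds to milliseconds and rounds up, so the worker does not wake before the tick is due. A small check of that arithmetic, using a hypothetical helper class SleepMath:

```java
/** Nanoseconds-to-milliseconds conversion that rounds up (hypothetical helper). */
class SleepMath {
    static long sleepMillis(long deadlineNanos, long currentNanos) {
        // Adding 999,999 before dividing rounds any positive remainder up to a full millisecond.
        return (deadlineNanos - currentNanos + 999_999) / 1_000_000;
    }
}
```

With a deadline of one second, a worker that has just started sleeps 1000 ms, one that is a single nanosecond early still sleeps 1 ms, and one that is exactly on time or late gets a value of 0 or less, which is the branch in waitForNextTick() that returns the current time.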

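Next comes the index of the slot the pointer currently points at. This is why ticksPerWheel was adjusted when the wheel was created: tick & mask equals tick % wheel.length only when the length is a power of two.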

int idx = (int) (tick & mask);
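The equivalence is easy to verify by brute force. The helper below (MaskVsModulo is a hypothetical name) compares the two expressions for every tick up to a limit.

```java
/** Checks whether tick & (length - 1) agrees with tick % length (hypothetical helper). */
class MaskVsModulo {
    static boolean maskMatchesModulo(int length, long maxTick) {
        long mask = length - 1;
        for (long tick = 0; tick <= maxTick; tick++) {
            if ((tick & mask) != (tick % length)) {
                return false;
            }
        }
        return true;
    }
}
```

For a length of 64 the two always agree. For a length of 60 they do not: at tick 60, 60 % 60 is 0 but 60 & 59 is 56. This is why the 60 slots requested in the example are rounded up to 64 by normalizeTicksPerWheel.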

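The worker then removes cancelled tasks (processCancelledTasks()) and fetches the HashedWheelBucket of the current slot (HashedWheelBucket bucket = wheel[idx]), which is essentially the task list. It calls transferTimeoutsToBuckets() to move tasks from the task queue into their HashedWheelBuckets (what is this step for?):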

        private void transferTimeoutsToBuckets() {
            // transfer only max. 100000 timeouts per tick to prevent a thread to stale the workerThread when it just
            // adds new timeouts in a loop.
            // Transfer at most 100000 queued tasks per tick (this is a per-tick limit, not a per-bucket limit)
            for (int i = 0; i < 100000; i++) {
                HashedWheelTimeout timeout = timeouts.poll();
                if (timeout == null) {
                    // all processed
                    break;
                }
                if (timeout.state() == HashedWheelTimeout.ST_CANCELLED) {
                    // Was cancelled in the meantime.
                    continue;
                }

                // Compute the remaining rounds
                // timeout.deadline is the task's execution time relative to the wheel's start time
                // tickDuration is the time the pointer takes to pass one slot
                long calculated = timeout.deadline / tickDuration;
                timeout.remainingRounds = (calculated - tick) / wheel.length;

                final long ticks = Math.max(calculated, tick); // Ensure we don't schedule for past.
                int stopIndex = (int) (ticks & mask);

                // Add the task to the bucket at the computed index
                HashedWheelBucket bucket = wheel[stopIndex];
                bucket.addTimeout(timeout);
            }
        }

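transferTimeoutsToBuckets() moves tasks from the task queue onto the wheel's lists. The lists may already hold tasks, but the transfer is still needed because newly submitted tasks go into the queue, not directly onto a list.

Finally, the worker runs the tasks on the wheel (bucket.expireTimeouts(deadline)). The source of expireTimeouts(deadline):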
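The queue-then-transfer design, including the per-tick cap, can be sketched with standard library classes. This is an illustration under assumptions: ConcurrentLinkedQueue stands in for the MPSC queue Netty obtains from PlatformDependent.newMpscQueue(), and the class BoundedDrain, its methods, and the configurable limit are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Producers enqueue from any thread; one worker drains a bounded batch per tick (hypothetical). */
class BoundedDrain {
    // Stand-in for the multi-producer, single-consumer queue.
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();
    private final int limitPerTick;

    BoundedDrain(int limitPerTick) {
        this.limitPerTick = limitPerTick;
    }

    /** May be called from any thread; never touches the wheel's lists. */
    void submit(String task) {
        pending.add(task);
    }

    /** Called only by the worker, once per tick; moves at most limitPerTick tasks. */
    List<String> drainOnce() {
        List<String> moved = new ArrayList<>();
        for (int i = 0; i < limitPerTick; i++) {
            String task = pending.poll();
            if (task == null) {
                break; // queue is empty
            }
            moved.add(task);
        }
        return moved;
    }
}
```

Because only the worker moves tasks onto the wheel, the bucket lists need no locking, and the cap keeps a flood of submissions from stalling the tick; whatever is left in the queue is picked up on the following ticks.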

        public void expireTimeouts(long deadline) {
            HashedWheelTimeout timeout = head;

            // process all timeouts
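            // Walk the list and fire every task that is due (expire() submits the task to taskExecutor)
            while (timeout != null) {
                HashedWheelTimeout next = timeout.next;
                // Check the remaining rounds; the task runs only when they are <= 0
                if (timeout.remainingRounds <= 0) {
                    // Remove the task from the wheel's list
                    next = remove(timeout);
                    // Check that the deadline has been reached
                    if (timeout.deadline <= deadline) {
                        // Run the task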
                        timeout.expire();
                    } else {
                        // The timeout was placed into a wrong slot. This should never happen.
                        throw new IllegalStateException(String.format(
                                "timeout.deadline (%d) > deadline (%d)", timeout.deadline, deadline));
                    }
                } else if (timeout.isCancelled()) {
                    next = remove(timeout);
                } else {
                    // Not yet at the required round, so decrement the round counter
                    timeout.remainingRounds --;
                }
                timeout = next;
            }
        }

The time wheel implementation in Kafka
