The Timing Wheel Scheduling Algorithm in Kafka

dinghua_xuexi

Article link: https://blog.csdn.net/dinghua_xuexi/article/details/108998413

Background

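While a Kafka producer is running, the application produces data continuously and sends it to Kafka. Consider a scenario where the production rate fluctuates: should every record be sent as soon as it is produced? Clearly not under high throughput, because sending one record at a time gives poor I/O performance and makes poor use of the socket buffer. Should the producer then always wait until a certain amount of data has accumulated? Under low throughput that would sacrifice timeliness. In practice, the moment to send is therefore decided by combining two factors: data volume and elapsed time. Many other scenarios also need timers. Java's traditional Timer is rigid and inflexible, and it is not very efficient either. Kafka therefore does not use it, and instead adopts a flexible and efficient timer algorithm: the hierarchical timing wheel.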
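The "volume or time, whichever comes first" rule can be sketched as follows. This is only an illustration of the idea, not Kafka's producer code: the class `Batcher`, its parameters `max_records` and `max_wait_s`, and the `send` callback are all hypothetical names. In the real Kafka producer the corresponding settings are `batch.size` and `linger.ms`.

```python
import time


class Batcher:
    """Hypothetical sketch: flush on size or on age, whichever comes first."""

    def __init__(self, max_records, max_wait_s, send, clock=time.monotonic):
        self.max_records = max_records
        self.max_wait_s = max_wait_s
        self.send = send          # callback that receives a list of records
        self.clock = clock        # injectable so the sketch can be tested
        self.buffer = []
        self.first_at = None      # arrival time of the oldest buffered record

    def add(self, record):
        if not self.buffer:
            self.first_at = self.clock()
        self.buffer.append(record)
        self.poll()

    def poll(self):
        """Flush if the size threshold or the time threshold has been reached."""
        if not self.buffer:
            return
        full = len(self.buffer) >= self.max_records
        waited = self.clock() - self.first_at >= self.max_wait_s
        if full or waited:
            batch, self.buffer = self.buffer, []
            self.send(batch)
```

Note that something still has to call `poll()` when no new records arrive, which is exactly the kind of job a timer is needed for.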

Algorithm overview

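At any moment Kafka has a large number of pending timed tasks, and scheduling all of them accurately is not easy. The hierarchical timing wheel algorithm was designed to solve exactly this kind of problem efficiently.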

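In the Kafka source code, the TimingWheel class describes the algorithm in detail. I have put that description in the appendix; here I explain the algorithm according to my own understanding.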

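1 First define a wheel with 100 time slots, numbered 1 to 100, each spanning 1ms. A current-time pointer points at one slot and advances by one slot every 1ms. Whenever the pointer reaches a slot, all tasks in that slot are taken out and executed. For example, when the pointer reaches 20ms, any tasks scheduled in the 20ms slot are all taken out and run. If a task that must run after a 20ms delay arrives at this moment, we simply insert it into the 40ms slot. This is the case where the delay is within 100ms and the position does not wrap around.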

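2 Suppose the pointer has reached the 90ms slot and a task that must run after a 30ms delay arrives. We compute its slot as follows:

( 90 + 30 ) % 100 [total size of the wheel] = 20

So we insert it into the 20ms slot, because after the pointer finishes the 100ms slot it returns to the 1ms slot and keeps going. This is the case where the delay is within 100ms but the position wraps around. (With slots numbered 1 to 100, a remainder of 0 corresponds to the 100ms slot.)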
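Cases 1 and 2 can be put together in a small single-level wheel. This is a minimal sketch that follows the 1-to-100 slot numbering used above; the class `SimpleTimingWheel` and its methods are illustrative names, not part of Kafka.

```python
class SimpleTimingWheel:
    """Single-level wheel: `size` slots of 1 ms each, numbered 1..size."""

    def __init__(self, size=100, current=1):
        self.size = size
        self.current = current                       # slot the pointer is on
        self.slots = {i: [] for i in range(1, size + 1)}

    def slot_for(self, delay_ms):
        """Slot for a task due `delay_ms` from now (must fit in one rotation)."""
        if not 0 < delay_ms < self.size:
            raise ValueError("delay does not fit in a single wheel")
        slot = (self.current + delay_ms) % self.size
        return slot if slot != 0 else self.size      # remainder 0 is the last slot

    def add(self, delay_ms, task):
        slot = self.slot_for(delay_ms)
        self.slots[slot].append(task)
        return slot

    def tick(self):
        """Advance the pointer by one slot and return the tasks that are now due."""
        self.current = self.current % self.size + 1  # slot 100 wraps back to slot 1
        due, self.slots[self.current] = self.slots[self.current], []
        return due
```

With the pointer at 20, `add(20, task)` picks slot 40; with the pointer at 90, `add(30, task)` picks slot 20, and the task comes back from the thirtieth call to `tick()`.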

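3 Suppose the pointer is at 20ms and a task that must run after a 10020ms delay arrives. We can no longer simply compute (20 + 10020) % 100 = 40, because the next time the pointer reaches the 40ms slot, nowhere near 10020ms will have passed. This is where the levels of the hierarchical timing wheel come in. We introduce a second-level wheel. It also has 100 slots, but each slot spans exactly the total size of the first-level wheel, that is, 100ms. Its slots are therefore labeled 100, 200, ..., 9900, 10000, and its total size is 10000ms.

Because (10020 - 10000) = 20 < 100, the delay can be expressed by combining the first and second levels. We insert the task into the 10000ms slot of the second-level wheel. When that slot expires, the task is taken out and inserted into the first-level wheel, 20 slots after the current-time pointer (watch for wrap-around, which case 2 covers).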
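The split between the two levels can be written out as a short calculation. This sketch follows the simplified, delay-based description in this article; the function name `place` and its return shape are assumptions made for illustration. Kafka's actual TimingWheel works with absolute expiration timestamps rather than relative delays.

```python
def place(delay_ms, tick_ms=1, wheel_size=100):
    """Return (level2_slot_ms, level1_remainder_ms) for a delay.

    level2_slot_ms is 0 when the delay fits in the first-level wheel alone.
    """
    level1_span = tick_ms * wheel_size        # 100 ms: total size of level 1
    level2_span = level1_span * wheel_size    # 10000 ms: total size of level 2
    if delay_ms < level1_span:
        return (0, delay_ms)
    # Round the delay down to a whole number of level-2 slots.
    level2_slot = (delay_ms // level1_span) * level1_span
    if level2_slot > level2_span:
        raise ValueError("delay needs a third-level wheel")
    return (level2_slot, delay_ms - level2_slot)
```

For a delay of 10020ms this gives the 10000ms slot on the second level and a remainder of 20ms for the first level. A remainder of 0 means the task can run as soon as its second-level slot expires.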

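4 More levels can be added if needed. The slot span of each level equals the total size of the level below it, so four levels can already schedule any delay within 100 million ms (100^4 ms). Deleting a task is O(1). Inserting is O(m), where m is the number of wheels, which is a small constant in practice (see the appendix).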
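The four steps can be combined into a compact multi-level wheel. The sketch below is modeled loosely on the description in the appendix and uses absolute times in ms: a task is either already due, fits in this wheel, or is handed to a coarser overflow wheel created on demand. The class and method names are illustrative, not Kafka's API, and the clock is stepped tick by tick for simplicity, whereas Kafka drives it with a DelayQueue of buckets.

```python
class TimingWheel:
    """One level of a hierarchical wheel; all times are absolute integers in ms."""

    def __init__(self, tick_ms, wheel_size, start_ms):
        self.tick_ms = tick_ms
        self.wheel_size = wheel_size
        self.interval = tick_ms * wheel_size             # total size of this level
        self.current = start_ms - start_ms % tick_ms     # clock, aligned to a tick
        self.buckets = [[] for _ in range(wheel_size)]
        self.overflow = None                             # coarser level, made on demand

    def add(self, expiry_ms, task):
        """Store the task; return False if it is already due and should run now."""
        if expiry_ms < self.current + self.tick_ms:
            return False
        if expiry_ms < self.current + self.interval:
            index = (expiry_ms // self.tick_ms) % self.wheel_size
            self.buckets[index].append((expiry_ms, task))
            return True
        if self.overflow is None:
            # Each slot of the next level spans this level's total size.
            self.overflow = TimingWheel(self.interval, self.wheel_size, self.current)
        return self.overflow.add(expiry_ms, task)

    def advance(self, now_ms):
        """Move the clock to now_ms and return the (expiry, task) pairs that came due."""
        due = []
        while self.current + self.tick_ms <= now_ms:
            self.current += self.tick_ms
            if self.overflow is not None:
                # Tasks released by the coarser wheel are re-inserted at this level.
                for expiry, task in self.overflow.advance(self.current):
                    if not self.add(expiry, task):
                        due.append((expiry, task))
            index = (self.current // self.tick_ms) % self.wheel_size
            bucket, self.buckets[index] = self.buckets[index], []
            due.extend(bucket)
        return due
```

With `TimingWheel(1, 100, 20)`, a task expiring at 10040 (20 + 10020) does not fit in the first level, so it is delegated upward; when the clock reaches the start of its coarse slot the task moves back down, and `advance(10040)` returns it. Under this absolute-time scheme the task lands on a third level, because the second-level clock is aligned to 0 and covers only times below 10000.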

Appendix

The explanation of timing wheels from the TimingWheel source code in Kafka

/*
 * Hierarchical Timing Wheels
 *
 * A simple timing wheel is a circular list of buckets of timer tasks. Let u be the time unit.
 * A timing wheel with size n has n buckets and can hold timer tasks in n * u time interval.
 * Each bucket holds timer tasks that fall into the corresponding time range. At the beginning,
 * the first bucket holds tasks for [0, u), the second bucket holds tasks for [u, 2u), …,
 * the n-th bucket for [u * (n -1), u * n). Every interval of time unit u, the timer ticks and
 * moved to the next bucket then expire all timer tasks in it. So, the timer never insert a task
 * into the bucket for the current time since it is already expired. The timer immediately runs
 * the expired task. The emptied bucket is then available for the next round, so if the current
 * bucket is for the time t, it becomes the bucket for [t + u * n, t + (n + 1) * u) after a tick.
 * A timing wheel has O(1) cost for insert/delete (start-timer/stop-timer) whereas priority queue
 * based timers, such as java.util.concurrent.DelayQueue and java.util.Timer, have O(log n)
 * insert/delete cost.
 *
 * A major drawback of a simple timing wheel is that it assumes that a timer request is within
 * the time interval of n * u from the current time. If a timer request is out of this interval,
 * it is an overflow. A hierarchical timing wheel deals with such overflows. It is a hierarchically
 * organized timing wheels. The lowest level has the finest time resolution. As moving up the
 * hierarchy, time resolutions become coarser. If the resolution of a wheel at one level is u and
 * the size is n, the resolution of the next level should be n * u. At each level overflows are
 * delegated to the wheel in one level higher. When the wheel in the higher level ticks, it reinsert
 * timer tasks to the lower level. An overflow wheel can be created on-demand. When a bucket in an
 * overflow bucket expires, all tasks in it are reinserted into the timer recursively. The tasks
 * are then moved to the finer grain wheels or be executed. The insert (start-timer) cost is O(m)
 * where m is the number of wheels, which is usually very small compared to the number of requests
 * in the system, and the delete (stop-timer) cost is still O(1).
 *
 * Example
 * Let's say that u is 1 and n is 3. If the start time is c,
 * then the buckets at different levels are:
 *
 * level    buckets
 * 1        [c,c]   [c+1,c+1]  [c+2,c+2]
 * 2        [c,c+2] [c+3,c+5]  [c+6,c+8]
 * 3        [c,c+8] [c+9,c+17] [c+18,c+26]
 *
 * The bucket expiration is at the time of bucket beginning.
 * So at time = c+1, buckets [c,c], [c,c+2] and [c,c+8] are expired.
 * Level 1's clock moves to c+1, and [c+3,c+3] is created.
 * Level 2 and level3's clock stay at c since their clocks move in unit of 3 and 9, respectively.
 * So, no new buckets are created in level 2 and 3.
 *
 * Note that bucket [c,c+2] in level 2 won't receive any task since that range is already covered in level 1.
 * The same is true for the bucket [c,c+8] in level 3 since its range is covered in level 2.
 * This is a bit wasteful, but simplifies the implementation.
 *
 * 1        [c+1,c+1]  [c+2,c+2]  [c+3,c+3]
 * 2        [c,c+2]    [c+3,c+5]  [c+6,c+8]
 * 3        [c,c+8]    [c+9,c+17] [c+18,c+26]
 *
 * At time = c+2, [c+1,c+1] is newly expired.
 * Level 1 moves to c+2, and [c+4,c+4] is created,
 *
 * 1        [c+2,c+2]  [c+3,c+3]  [c+4,c+4]
 * 2        [c,c+2]    [c+3,c+5]  [c+6,c+8]
 * 3        [c,c+8]    [c+9,c+17] [c+18,c+18]
 *
 * At time = c+3, [c+2,c+2] is newly expired.
 * Level 2 moves to c+3, and [c+5,c+5] and [c+9,c+11] are created.
 * Level 3 stay at c.
 *
 * 1        [c+3,c+3]  [c+4,c+4]  [c+5,c+5]
 * 2        [c+3,c+5]  [c+6,c+8]  [c+9,c+11]
 * 3        [c,c+8]    [c+9,c+17] [c+8,c+11]
 *
 * The hierarchical timing wheels works especially well when operations are completed before they time out.
 * Even when everything times out, it still has advantageous when there are many items in the timer.
 * Its insert cost (including reinsert) and delete cost are O(m) and O(1), respectively while priority
 * queue based timers takes O(log N) for both insert and delete where N is the number of items in the queue.
 *
 * This class is not thread-safe. There should not be any add calls while advanceClock is executing.
 * It is caller's responsibility to enforce it. Simultaneous add calls are thread-safe.
 */
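The bucket tables in the example above (u = 1, n = 3) can be recomputed with a few lines. This is a sketch under the assumptions of that example: times are integers measured relative to the start time c, c is taken to be aligned to every level's tick, and `bucket_ranges` is a hypothetical helper name.

```python
def bucket_ranges(level, elapsed, u=1, n=3):
    """Inclusive (start, end) offsets from c for the n buckets of one level.

    `elapsed` is the time passed since the start time c.
    """
    tick = u * n ** (level - 1)           # slot span: 1, 3, 9 for levels 1, 2, 3
    clock = elapsed - elapsed % tick      # a level's clock moves in units of its tick
    return [(clock + i * tick, clock + (i + 1) * tick - 1) for i in range(n)]
```

At time c+3, level 1 gives [c+3,c+3] [c+4,c+4] [c+5,c+5] and level 2 gives [c+3,c+5] [c+6,c+8] [c+9,c+11]. Level 3 has not ticked yet, so its buckets are still [c,c+8] [c+9,c+17] [c+18,c+26].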