The Timing Wheel Scheduling Algorithm in Kafka

Background

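A Kafka producer generates data continuously and sends it to Kafka. Consider a scenario where the production rate varies, sometimes fast and sometimes slow. Should we send each record as soon as it is produced? Clearly not: under high throughput this gives poor I/O performance and makes little use of the socket buffer. Should we then always wait until a certain amount of data has accumulated? Under low throughput that sacrifices timeliness. So the usual approach is to decide when to send based on two factors together: data volume and elapsed time.

Many other scenarios also need a timer. Java's traditional Timer is rigid and inflexible, and it is not very efficient either. Kafka therefore drops it in favor of a flexible and efficient timer algorithm: the hierarchical timing wheel.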
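The "volume plus time" rule can be sketched in a few lines. This is only an illustration of the idea, not Kafka's producer code: the function name, parameters, and threshold values are all hypothetical (the real producer exposes comparable settings, `batch.size` and `linger.ms`).

```python
def should_send(batch_bytes, oldest_record_age_ms,
                max_batch_bytes=16384, max_wait_ms=5):
    """Hypothetical rule: send when the batch is full OR it has waited long enough."""
    batch_is_full = batch_bytes >= max_batch_bytes
    waited_too_long = oldest_record_age_ms >= max_wait_ms
    return batch_is_full or waited_too_long


# High throughput: the size limit triggers the send almost immediately.
print(should_send(batch_bytes=20000, oldest_record_age_ms=0))
# Low throughput: the time limit triggers the send even for a tiny batch.
print(should_send(batch_bytes=100, oldest_record_age_ms=5))
```

The time-based half of this rule is exactly the kind of deadline that needs an efficient timer.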

Algorithm Overview

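At any moment Kafka has a large number of timed tasks pending, and scheduling all of them accurately is hard. The hierarchical timing wheel algorithm exists to solve this kind of problem efficiently.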

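In the Kafka source code, the TimingWheel class describes the algorithm in detail. I have put the original description from the source in the appendix; here I explain it according to my own understanding.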

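1 Start with a wheel of 100 time slots, numbered 1 to 100, each spanning 1 ms, plus a current-time pointer into the wheel. The pointer advances one slot every 1 ms, and whenever it reaches a slot, all tasks stored in that slot are taken out and executed. For example, when the pointer reaches 20 ms, any tasks in the 20 ms slot are taken out and run. If a task that must run after a 20 ms delay arrives at that moment, we simply insert it into the 40 ms slot. This is the case where the delay is under 100 ms and no wraparound occurs.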

2 Suppose the pointer has reached the 90 ms slot and a task with a 30 ms delay arrives. We compute its slot as follows:

    ( 90 + 30 ) % 100 [total size of the wheel] = 20

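The result tells us to insert the task into the 20 ms slot, because once the pointer has covered all 100 ms it returns to 1 ms and continues from there. This is the case where the delay is under 100 ms but wraps around the wheel.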
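Cases 1 and 2 can be sketched as a single-level wheel. This is a minimal illustration, not Kafka's TimingWheel: the class and method names are made up for this example, and slots are indexed 0 to 99 (rather than 1 to 100) so that the modulo result can be used directly as a list index.

```python
class SimpleTimingWheel:
    """Single-level wheel: wheel_size slots, each tick_ms wide (0-based slots)."""

    def __init__(self, wheel_size=100, tick_ms=1, start_ms=0):
        self.wheel_size = wheel_size
        self.tick_ms = tick_ms
        self.current_ms = start_ms
        self.slots = [[] for _ in range(wheel_size)]

    def add(self, task, delay_ms):
        """Insert a task and return the index of the slot it landed in."""
        # Only delays of at least one tick and less than one full turn fit here;
        # anything longer needs a higher-level wheel.
        if not self.tick_ms <= delay_ms < self.wheel_size * self.tick_ms:
            raise ValueError("delay does not fit in a single wheel")
        # (current position + delay) modulo the wheel size handles wraparound.
        index = ((self.current_ms + delay_ms) // self.tick_ms) % self.wheel_size
        self.slots[index].append(task)
        return index

    def tick(self):
        """Advance the pointer by one slot and return the tasks that are now due."""
        self.current_ms += self.tick_ms
        index = (self.current_ms // self.tick_ms) % self.wheel_size
        due, self.slots[index] = self.slots[index], []
        return due


# Case 1: pointer at 20 ms, delay 20 ms, no wraparound.
print(SimpleTimingWheel(start_ms=20).add("task-a", 20))  # 40

# Case 2: pointer at 90 ms, delay 30 ms, wraps around to slot 20.
wheel = SimpleTimingWheel(start_ms=90)
print(wheel.add("task-b", 30))  # 20
```

Both insertion and expiry touch exactly one slot, which is where the constant cost of a single wheel comes from.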

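3 Suppose the pointer is at 20 ms and a task with a 10020 ms delay arrives. We can no longer simply compute (20 + 10020) % 100 = 40, because the next time the pointer reaches 40 ms, far less than 10020 ms will have passed. This is where the hierarchy in "hierarchical timing wheel" comes in. We add a second-level wheel, again with 100 slots, but each of its slots spans the total size of the first-level wheel, that is, 100 ms. Its slots are therefore numbered 100, 200, ..., 9900, 10000, and its total size is 10000 ms.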

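Because (10020 - 10000) = 20 < 100, the delay can be expressed by combining the first-level and second-level wheels. We insert the task into the 10000 ms slot of the second-level wheel. When that slot expires, the task is taken out and inserted into the first-level wheel, 20 slots after the current pointer position (watch for wraparound, which case 2 already covers).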
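The cascade in case 3 can be sketched as follows. This is a simplified illustration under stated assumptions, not Kafka's code: the `Wheel` and `Timer` names are made up, the timer is driven by explicit 1 ms ticks, and tasks are bucketed by absolute expiry time (the approach the source comment in the appendix describes). With absolute bucketing, which level a task first lands in depends on how the clocks are aligned, so the 10020 ms task may start in a wheel above the second one; it still fires at the same moment.

```python
class Wheel:
    """One level of a hierarchical wheel, bucketed by absolute expiry time."""

    def __init__(self, tick_ms, size, now_ms):
        self.tick_ms = tick_ms
        self.size = size
        self.span_ms = tick_ms * size
        self.now_ms = now_ms - now_ms % tick_ms  # round down to a tick boundary
        self.buckets = [[] for _ in range(size)]
        self.parent = None                       # coarser wheel, created on demand

    def add(self, task, expire_ms):
        """Store the task and return True, or return False if it is already due."""
        if expire_ms < self.now_ms + self.tick_ms:
            return False
        if expire_ms < self.now_ms + self.span_ms:
            index = (expire_ms // self.tick_ms) % self.size
            self.buckets[index].append((task, expire_ms))
            return True
        if self.parent is None:                  # overflow: delegate one level up
            self.parent = Wheel(self.span_ms, self.size, self.now_ms)
        return self.parent.add(task, expire_ms)


class Timer:
    """Drives the wheels by ticking the finest level one slot at a time."""

    def __init__(self, tick_ms=1, size=100, start_ms=0):
        self.root = Wheel(tick_ms, size, start_ms)

    def schedule(self, task, delay_ms):
        if not self.root.add(task, self.root.now_ms + delay_ms):
            raise ValueError("delay must be at least one tick")

    def tick(self):
        """Advance by one finest-level tick and return the tasks that became due."""
        now = self.root.now_ms + self.root.tick_ms
        expired = []
        wheel = self.root
        while wheel is not None:
            if now % wheel.tick_ms == 0:         # this level enters a new bucket
                wheel.now_ms = now
                index = (now // wheel.tick_ms) % wheel.size
                expired.extend(wheel.buckets[index])
                wheel.buckets[index] = []
            wheel = wheel.parent
        # Cascade: reinsert from the finest level; tasks that no longer fit are due.
        return [task for task, expire_ms in expired
                if not self.root.add(task, expire_ms)]


timer = Timer(start_ms=20)
timer.schedule("job", 10020)
while not timer.tick():
    pass
print(timer.root.now_ms)  # 10040
```

A real implementation would not tick every millisecond; the sketch does so only to keep the cascade logic easy to follow.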

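4 More levels can be added as needed. Each level's slot span equals the total size of the level below it, so with 100 slots per level and a 1 ms base tick, four levels can schedule any delay under 100 million ms (100^4 ms). Deleting a task costs O(1). Inserting a task costs O(m), where m is the number of levels; since m stays very small, this is effectively constant (see the source comment in the appendix).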
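The capacity figure in case 4 is easy to check with a short calculation. The helper names below are hypothetical, and the estimate ignores clock-alignment effects, which can occasionally push a task one level higher in a real implementation.

```python
def max_delay_ms(levels, wheel_size=100, tick_ms=1):
    """Total time span covered by `levels` stacked wheels of equal size."""
    return tick_ms * wheel_size ** levels


def levels_needed(delay_ms, wheel_size=100, tick_ms=1):
    """Smallest number of levels whose combined span exceeds delay_ms."""
    levels = 1
    while max_delay_ms(levels, wheel_size, tick_ms) <= delay_ms:
        levels += 1
    return levels


print(max_delay_ms(4))      # 100000000
print(levels_needed(30))    # 1
print(levels_needed(5000))  # 2
```

Because the span grows exponentially with the number of levels, m remains tiny even for very long delays.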

Appendix

The explanation of timing wheels from Kafka's TimingWheel source code

/*
 * Hierarchical Timing Wheels
 *
 * A simple timing wheel is a circular list of buckets of timer tasks. Let u be the time unit.
 * A timing wheel with size n has n buckets and can hold timer tasks in n * u time interval.
 * Each bucket holds timer tasks that fall into the corresponding time range. At the beginning,
 * the first bucket holds tasks for [0, u), the second bucket holds tasks for [u, 2u), …,
 * the n-th bucket for [u * (n -1), u * n). Every interval of time unit u, the timer ticks and
 * moved to the next bucket then expire all timer tasks in it. So, the timer never insert a task
 * into the bucket for the current time since it is already expired. The timer immediately runs
 * the expired task. The emptied bucket is then available for the next round, so if the current
 * bucket is for the time t, it becomes the bucket for [t + u * n, t + (n + 1) * u) after a tick.
 * A timing wheel has O(1) cost for insert/delete (start-timer/stop-timer) whereas priority queue
 * based timers, such as java.util.concurrent.DelayQueue and java.util.Timer, have O(log n)
 * insert/delete cost.
 *
 * A major drawback of a simple timing wheel is that it assumes that a timer request is within
 * the time interval of n * u from the current time. If a timer request is out of this interval,
 * it is an overflow. A hierarchical timing wheel deals with such overflows. It is a hierarchically
 * organized timing wheels. The lowest level has the finest time resolution. As moving up the
 * hierarchy, time resolutions become coarser. If the resolution of a wheel at one level is u and
 * the size is n, the resolution of the next level should be n * u. At each level overflows are
 * delegated to the wheel in one level higher. When the wheel in the higher level ticks, it reinsert
 * timer tasks to the lower level. An overflow wheel can be created on-demand. When a bucket in an
 * overflow bucket expires, all tasks in it are reinserted into the timer recursively. The tasks
 * are then moved to the finer grain wheels or be executed. The insert (start-timer) cost is O(m)
 * where m is the number of wheels, which is usually very small compared to the number of requests
 * in the system, and the delete (stop-timer) cost is still O(1).
 *
 * Example
 * Let's say that u is 1 and n is 3. If the start time is c,
 * then the buckets at different levels are:
 *
 * level    buckets
 * 1        [c,c]   [c+1,c+1]  [c+2,c+2]
 * 2        [c,c+2] [c+3,c+5]  [c+6,c+8]
 * 3        [c,c+8] [c+9,c+17] [c+18,c+26]
 *
 * The bucket expiration is at the time of bucket beginning.
 * So at time = c+1, buckets [c,c], [c,c+2] and [c,c+8] are expired.
 * Level 1's clock moves to c+1, and [c+3,c+3] is created.
 * Level 2 and level3's clock stay at c since their clocks move in unit of 3 and 9, respectively.
 * So, no new buckets are created in level 2 and 3.
 *
 * Note that bucket [c,c+2] in level 2 won't receive any task since that range is already covered in level 1.
 * The same is true for the bucket [c,c+8] in level 3 since its range is covered in level 2.
 * This is a bit wasteful, but simplifies the implementation.
 *
 * 1        [c+1,c+1]  [c+2,c+2]  [c+3,c+3]
 * 2        [c,c+2]    [c+3,c+5]  [c+6,c+8]
 * 3        [c,c+8]    [c+9,c+17] [c+18,c+26]
 *
 * At time = c+2, [c+1,c+1] is newly expired.
 * Level 1 moves to c+2, and [c+4,c+4] is created,
 *
 * 1        [c+2,c+2]  [c+3,c+3]  [c+4,c+4]
 * 2        [c,c+2]    [c+3,c+5]  [c+6,c+8]
 * 3        [c,c+8]    [c+9,c+17] [c+18,c+18]
 *
 * At time = c+3, [c+2,c+2] is newly expired.
 * Level 2 moves to c+3, and [c+5,c+5] and [c+9,c+11] are created.
 * Level 3 stay at c.
 *
 * 1        [c+3,c+3]  [c+4,c+4]  [c+5,c+5]
 * 2        [c+3,c+5]  [c+6,c+8]  [c+9,c+11]
 * 3        [c,c+8]    [c+9,c+17] [c+8,c+11]
 *
 * The hierarchical timing wheels works especially well when operations are completed before they time out.
 * Even when everything times out, it still has advantageous when there are many items in the timer.
 * Its insert cost (including reinsert) and delete cost are O(m) and O(1), respectively while priority
 * queue based timers takes O(log N) for both insert and delete where N is the number of items in the queue.
 *
 * This class is not thread-safe. There should not be any add calls while advanceClock is executing.
 * It is caller's responsibility to enforce it. Simultaneous add calls are thread-safe.
 */

 
