Service Rate Limiting

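Why rate limiting is needed

There are three main tools for keeping a service stable: circuit breaking with degradation, rate limiting, and fault injection. This post walks through several ways to implement rate-limiting algorithms. The rate limiting discussed here is not the kind configured at the Nginx layer; it is logical rate limiting inside business code.

User-facing services

Web services and public APIs, for example. Services of this kind can be overwhelmed for several reasons:

Rapid user growth (a good problem to have)
A trending event (such as a Weibo hot search)
Crawlers run by competitors
Malicious fake orders (order brushing)

None of these can be predicted. You never know when 10x or even 20x the normal traffic will arrive, and when it does, scaling out is far too slow (elastic scaling is wishful thinking here: try scaling out within one second).

Internal RPC services

An interface of service A may be called by several services B, C, D, and E. When B has a traffic burst, it can bring A down, and A then cannot serve C, D, or E either. This happens from time to time, and there are two solutions:

Isolate resources by giving each caller its own thread pool
Apply a rate limit to each caller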
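
As a rough illustration of keeping callers from starving each other, the sketch below gives every caller its own small set of slots. Note that it swaps the per-caller thread pool for a per-caller Semaphore, which is a simpler stand-in rather than the thread-pool approach itself. The class name PerCallerBulkhead, its parameters, and the fallback value are hypothetical and do not come from the original post.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch: caps the number of in-flight calls per caller.
final class PerCallerBulkhead {
    private final int maxConcurrentPerCaller;
    private final ConcurrentMap<String, Semaphore> slots = new ConcurrentHashMap<>();

    PerCallerBulkhead(int maxConcurrentPerCaller) {
        this.maxConcurrentPerCaller = maxConcurrentPerCaller;
    }

    /** Runs the action if the caller has a free slot; otherwise returns the fallback at once. */
    <T> T call(String callerId, Supplier<T> action, T fallback) {
        // Each caller gets its own semaphore, created atomically on first use
        Semaphore callerSlots =
                slots.computeIfAbsent(callerId, id -> new Semaphore(maxConcurrentPerCaller));
        if (!callerSlots.tryAcquire()) {
            return fallback; // this caller used up its own quota; other callers are unaffected
        }
        try {
            return action.get();
        } finally {
            callerSlots.release();
        }
    }
}
```

With this shape, a burst from B can only exhaust B's slots, so calls from C, D, and E still reach service A.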

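Rate-limiting algorithm implementations

Common rate-limiting algorithms are the counter, the token bucket, and the leaky bucket.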

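Counter algorithm

Rate limiting with a counter is simple and blunt. We usually cap the number of requests allowed through per second. With a limit of 100 QPS, for example, the timer starts when the first request arrives, and every request within the following 1 s increments the counter by 1. Once the count reaches 100, all further requests are rejected. When the 1 s window ends, the counter is reset to 0 and counting starts again.

A concrete implementation could look like this: on each service call, use AtomicLong#incrementAndGet() to add 1 to the counter and return the latest value, then compare that value with the threshold.

The drawback of this approach is well known: if 100 requests have already passed within the first 10 ms of a 1 s window, every request in the remaining 990 ms has to be rejected. We call this the "spike phenomenon".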
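
The counter idea can be sketched as follows. This is a minimal illustration, not a production limiter: the class name FixedWindowCounter is hypothetical, and the current time is passed in as a parameter (an assumption made so the behaviour is easy to follow and test).

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the counter (fixed window) algorithm.
final class FixedWindowCounter {
    private final long limit;
    private final long windowMillis;
    private final AtomicLong count = new AtomicLong();
    private long windowStart = -1; // -1 means no request has been seen yet

    FixedWindowCounter(long limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request arriving at nowMillis is allowed through. */
    synchronized boolean tryAcquire(long nowMillis) {
        // Start a new window on the first request, or once the current window has ended
        if (windowStart < 0 || nowMillis - windowStart >= windowMillis) {
            windowStart = nowMillis;
            count.set(0);
        }
        // Add 1 and compare the latest value with the threshold
        return count.incrementAndGet() <= limit;
    }
}
```

With a limit of 2 per 1000 ms, requests at 0 ms and 5 ms pass, everything else up to 999 ms is rejected, and a request at 1000 ms opens a new window. That rejected stretch is the spike phenomenon in miniature.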

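Leaky bucket algorithm

To eliminate the "spike phenomenon", rate limiting can be implemented with the leaky bucket algorithm. The name is quite descriptive: the algorithm has a container inside, much like an everyday funnel. Incoming requests are like water poured into the funnel, which then flows out slowly and evenly through the small opening at the bottom. No matter how heavy the inflow is, the outflow rate stays the same.

However unstable the callers are, a leaky bucket limiter handles, say, one request every 10 ms. The processing rate is fixed while the arrival rate is unknown, so many requests may arrive at once, and those that cannot be handled yet are kept in the bucket. Since it is a bucket, it has a capacity limit: once the bucket is full, new requests are dropped.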

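To implement it, prepare a queue that holds requests and have a thread pool periodically take requests from the queue and execute them; several can be taken at once and executed concurrently.

This algorithm has its own drawback in practice: it cannot cope with short bursts of traffic.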
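
A minimal sketch of the queue-based idea described above, using only the JDK. The class name LeakyBucket and its methods are hypothetical; the draining step is exposed as leakOne() so the behaviour can be followed step by step, and start() shows one possible way to run it at a fixed rate.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a leaky bucket backed by a bounded queue.
final class LeakyBucket {
    private final BlockingQueue<Runnable> bucket;

    LeakyBucket(int capacity) {
        this.bucket = new ArrayBlockingQueue<>(capacity);
    }

    /** Pours a request into the bucket; returns false (request dropped) when the bucket is full. */
    boolean offer(Runnable request) {
        return bucket.offer(request);
    }

    /** Lets one queued request flow out and runs it; returns false if the bucket was empty. */
    boolean leakOne() {
        Runnable request = bucket.poll();
        if (request == null) {
            return false;
        }
        request.run();
        return true;
    }

    /** Drains the bucket at a constant rate: one request every periodMillis. */
    void start(ScheduledExecutorService scheduler, long periodMillis) {
        scheduler.scheduleAtFixedRate(this::leakOne, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }
}
```

For example, start(Executors.newSingleThreadScheduledExecutor(), 10) would process one request every 10 ms. Note that a request that throws would cancel the scheduled task, so real code would want to catch exceptions inside leakOne().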

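Token bucket algorithm

In a sense, the token bucket algorithm is an improvement on the leaky bucket. The leaky bucket limits the rate of calls, whereas the token bucket limits the average rate while still allowing a certain amount of bursting.

The token bucket algorithm has a bucket that holds a fixed number of tokens, and a mechanism that puts tokens into the bucket at a given rate. Each request must first obtain a token; only with a token can it proceed. Otherwise it either waits for a token to become available or is rejected outright.

Tokens are added continuously, and once the bucket reaches its capacity, extra tokens are discarded. This means the bucket may hold a large number of available tokens, in which case incoming requests get a token and run immediately. For example, with QPS set to 100, one second after the limiter is initialized the bucket already holds 100 tokens. The service may not have finished starting by then, and when it does start serving traffic, the limiter can absorb an instantaneous burst of 100 requests. So requests only wait when the bucket is empty, and from then on they effectively run at the configured rate.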

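Implementation idea: prepare a queue that holds tokens and have a thread pool periodically generate tokens and put them into the queue. For each incoming request, take one token from the queue and then proceed.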
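
A minimal token bucket sketch, under a stated simplification: instead of a background thread that puts tokens into a queue, it tops the bucket up lazily from the elapsed time on each call. The class name TokenBucket and the explicit nowMillis parameter are assumptions made for illustration.

```java
// Hypothetical sketch of a token bucket with lazy refill.
final class TokenBucket {
    private final double capacity;
    private final double refillPerMilli;
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(double capacity, double permitsPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMilli = permitsPerSecond / 1000.0;
        this.tokens = capacity; // start with a full bucket, so an initial burst is allowed
        this.lastRefillMillis = nowMillis;
    }

    /** Takes one token if available; returns false when the bucket is empty. */
    synchronized boolean tryAcquire(long nowMillis) {
        refill(nowMillis);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    // Adds the tokens produced since the last refill, discarding anything above capacity
    private void refill(long nowMillis) {
        if (nowMillis > lastRefillMillis) {
            tokens = Math.min(capacity, tokens + (nowMillis - lastRefillMillis) * refillPerMilli);
            lastRefillMillis = nowMillis;
        }
    }
}
```

With a capacity of 2 and a rate of 1 token per second, two requests at time 0 pass as a burst, a third is rejected, and later requests are admitted at the average rate of one per second.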

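Fortunately, Google's open-source Guava library makes it easy to create a token bucket rate limiter.

Guava

Guava ships a token bucket utility class, RateLimiter, that makes rate limiting easy. RateLimiter adds some engineering optimizations on top of the plain token bucket algorithm, and the default implementation is SmoothBursty. Perhaps for simplicity, the time window in RateLimiter can only be 1 s; to limit over any other time unit you have to build your own.

An interesting feature of RateLimiter is that "the predecessor digs the hole and the successor falls in": RateLimiter lets one request take more permits than are left, but the next request pays the price and has to wait until the deficit has been repaid. There is a trade-off here: should the first request wait until enough permits have accumulated, or should it go ahead and leave the waiting to later requests? Guava's designers chose the latter: get the current work done first and deal with the rest later.

This post also builds a custom annotation so that services can be rate limited flexibly and with loose coupling.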

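Design rationale

Guava provides RateLimiter, a rate-limiting utility class based on the token bucket algorithm. Its subclass SmoothRateLimiter contains a long comment on how RateLimiter is designed; here is a rough translation:

How is a rate limiter designed, and why?

The primary job of a rate limiter is to maintain a stable rate, meaning the maximum rate allowed under normal conditions. This is usually enforced by throttling incoming requests: when a request arrives while the limiter is at its maximum rate, we compute how long the request must be throttled and make it wait until it is allowed through.

The simplest way to hold QPS at a stable rate is to keep the timestamp of the last granted request and make sure no other request is granted within the following 1/QPS seconds. For QPS=5, for example, if no request is granted within 200 ms of the previous one, the limit is enforced. If a request arrives 100 ms after the last granted request, it simply has to wait another 100 ms. By the same reasoning, 15 concurrent requests take 3 seconds in total.

Note that such a mechanism has very little memory of the past, because it only records the most recent request. What happens if no request arrived for a long time before a request is granted? The limiter immediately forgets that idle period (the past underutilization) and only records the new request's timestamp, so the next request still cannot be granted until 1/QPS after this one. That clearly does not match the QPS we expect, and it ends in either underutilization or overflow.

Past underutilization means that plenty of resources are idle, so the limiter should speed up to use them. The typical case is network bandwidth, where underutilization means buffers are almost empty and can be filled immediately.

On the other hand, a period of underutilization can also mean that "the service is not ready for new requests". This is more subtle. For example, the service's caches may have gone stale, so requests are more likely to trigger expensive operations.

To handle this dilemma we add another dimension, storedPermits, which models past underutilization. When this variable is 0 there has been no underutilization; as underutilization continues, storedPermits can grow up to maxStoredPermits. So when permits are acquired, they usually come from two sources:

stored permits (permits saved from the past, available whenever there has been underutilization)
fresh permits (whatever is still needed once the stored permits are used up has to be covered by freshly produced permits)

The following example illustrates this:

Suppose a limiter produces one permit per second, that is, QPS=1. For every second in which no request arrives, storedPermits increases by 1. If no request arrived in the past 10 seconds, storedPermits grows to 10 (assuming maxStoredPermits >= 10). A request now arrives asking for 3 permits; we serve it directly from storedPermits, which drops to 7. Then another request arrives asking for 10 permits. We take the remaining 7 permits from storedPermits, and for the other 3 we need the limiter to produce 3 fresh permits before the request is fully covered.

Since QPS=1, getting 3 fresh permits takes 3 seconds. But how long should it take to get the 7 stored permits? As discussed above, there is no single right answer. If we want to speed up and quickly make up for past underutilization, stored permits should be handed out faster than fresh permits, because underutilization means more idle resources are available. If our main concern is overflow, stored permits should be handed out more slowly than fresh permits. We therefore need a function that relates stored permits to throttling time. That function is storedPermitsToWaitTime. (The description gets more involved from here and is not covered in depth. For SmoothBursty, which is what we normally use, this function always returns 0, meaning stored permits are handed out immediately.)

Finally, consider this scenario: a limiter with QPS=1 is idle when a request arrives asking for 100 permits. Should we wait 100 seconds before starting to process it? That would most likely make the result pointless. A better strategy is to let this request through, just as if it had asked for 1 permit, and postpone subsequent requests instead. In other words, the request is granted immediately, and requests that arrive afterwards have to wait at least 100 s. This keeps the request itself timely.

This strategy has an important consequence: the limiter does not record the time of the last request but the expected time at which the next request can be granted. It also lets us tell whether a request can obtain its permits within a given timeout. In addition, this expected time tells us how long the limiter has been unused: once the expected time lies before the current time, the difference between the two is the idle duration, and that duration can be converted into stored permits (as described above, storedPermits grows with idle time).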
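
To make the QPS=1 example concrete, here is a simplified model of the bookkeeping described above. It is a sketch, not Guava's code: the class BurstyLimiterModel is hypothetical, stored permits are handed out with zero wait (the SmoothBursty behaviour mentioned above), and the current time is passed in explicitly.

```java
// Hypothetical model of storedPermits and the "next free ticket" time.
final class BurstyLimiterModel {
    private final double maxStoredPermits;
    private final double intervalMicros; // time needed to produce one fresh permit
    private double storedPermits = 0.0;
    private long nextFreeTicketMicros = 0L;

    BurstyLimiterModel(double permitsPerSecond, double maxStoredPermits) {
        this.intervalMicros = 1_000_000.0 / permitsPerSecond;
        this.maxStoredPermits = maxStoredPermits;
    }

    /** Reserves permits at nowMicros and returns how long the caller must wait, in microseconds. */
    long reserve(int permits, long nowMicros) {
        // Idle time since the next free ticket is converted into stored permits
        if (nowMicros > nextFreeTicketMicros) {
            double earned = (nowMicros - nextFreeTicketMicros) / intervalMicros;
            storedPermits = Math.min(maxStoredPermits, storedPermits + earned);
            nextFreeTicketMicros = nowMicros;
        }
        long grantedAt = nextFreeTicketMicros;
        double fromStored = Math.min(permits, storedPermits);
        double fresh = permits - fromStored;
        storedPermits -= fromStored;
        // The cost of the fresh permits is charged to whoever comes next
        nextFreeTicketMicros += (long) (fresh * intervalMicros);
        return Math.max(grantedAt - nowMicros, 0L);
    }

    double storedPermits() {
        return storedPermits;
    }
}
```

Replaying the example with new BurstyLimiterModel(1.0, 20.0) at the 10-second mark: reserve(3, 10_000_000L) returns 0 and leaves 7 stored permits; reserve(10, 10_000_000L) also returns 0, uses the 7 stored permits, and borrows 3 fresh ones; a following reserve(1, 10_000_000L) returns 3_000_000, so the 3 seconds are paid by the next request, as the last two paragraphs describe.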

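Simple test

    /**
     * The entry point of application.
     * <p>
     * An interesting feature of RateLimiter is that "the predecessor digs the hole and the successor falls in":
     * RateLimiter lets one request take more permits than are left,
     * but the next request pays the price and has to wait until the deficit has been repaid.
     * There is a trade-off here: should the first request wait until enough permits have accumulated,
     * or should it go ahead and leave the waiting to later requests?
     * Guava's designers chose the latter: get the current work done first and deal with the rest later.
     */
    @Test
    public void advanceConsumerTest() {
        // Produce 2 permits per second
        RateLimiter rateLimiter = RateLimiter.create(2);
        // acquire() returns the time spent waiting for the permits. Taking too many here means later calls must wait until the deficit is repaid.
        System.out.println(rateLimiter.acquire(10));
        System.out.println(rateLimiter.tryAcquire(2, 2, TimeUnit.SECONDS));
        System.out.println(rateLimiter.acquire(2));
        System.out.println(rateLimiter.acquire(1));
    }

The results are shown below. RateLimiter produces only 2 permits per second, so after the first call takes 10 permits at once, later calls have to wait about 5 seconds for the deficit to be repaid. That is also why tryAcquire with a 2-second timeout returns false right away.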

0.0
false
4.994628
0.995124

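Concurrency test

The following example tests the effect of rate limiting under 100 concurrent requests.

    @Test
    public void rateLimitTest() throws InterruptedException {
        CountDownLatch countDownLatch = new CountDownLatch(1);
        // Start 100 threads; they all block on the latch until it is released
        for (int i = 0; i < 100; i++) {
            Business business = new Business(countDownLatch);
            business.start();
        }
        // Release all threads at once
        countDownLatch.countDown();
        // Wait for the results. Only 10 permits are configured, so only about 10 requests get through.
        TimeUnit.SECONDS.sleep(10);
        System.out.println("所有模拟请求结束  at " + new Date()); // "All simulated requests finished at ..."
    }

    class Business extends Thread {
        CountDownLatch countDownLatch;

        public Business(CountDownLatch latch) {
            this.countDownLatch = latch;
        }

        @Override
        public void run() {
            try {
                countDownLatch.await();
                // rateLimiterService is defined elsewhere in the original project and is not shown here
                if (rateLimiterService.tryAcquire()) {
                    // Simulate business logic
                    TimeUnit.SECONDS.sleep(3);
                    System.out.println("成功处理业务" + new Date()); // "Business handled successfully"
                } else {
                    System.out.println("系统繁忙！请稍后再试!"); // "System busy, please try again later!"
                }
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

Output: only the requests that obtained a permit ran successfully, and all the others returned immediately. The excerpt below shows 11 successes, one more than the 10 configured permits, which is consistent with RateLimiter letting one request borrow a permit in advance, as in the previous test.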

.....
系统繁忙！请稍后再试!
系统繁忙！请稍后再试!
系统繁忙！请稍后再试!
系统繁忙！请稍后再试!
系统繁忙！请稍后再试!
系统繁忙！请稍后再试!
系统繁忙！请稍后再试!
系统繁忙！请稍后再试!
成功处理业务Tue Nov 20 23:45:23 CST 2018
成功处理业务Tue Nov 20 23:45:23 CST 2018
成功处理业务Tue Nov 20 23:45:23 CST 2018
成功处理业务Tue Nov 20 23:45:23 CST 2018
成功处理业务Tue Nov 20 23:45:23 CST 2018
成功处理业务Tue Nov 20 23:45:23 CST 2018
成功处理业务Tue Nov 20 23:45:23 CST 2018
成功处理业务Tue Nov 20 23:45:23 CST 2018
成功处理业务Tue Nov 20 23:45:23 CST 2018
成功处理业务Tue Nov 20 23:45:23 CST 2018
成功处理业务Tue Nov 20 23:45:23 CST 2018
所有模拟请求结束  at Tue Nov 20 23:45:30 CST 2018

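Annotation test

Finally, to make everyday use easier, I also designed a custom annotation so that a simple declaration is enough to apply a limit. Laziness, as they say, drives progress.

Here is the annotation-based design:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface RateLimiterAnnotation {
    /**
     * Name of the rate-limited service.
     *
     * @return the limiter name
     */
    String name();

    /**
     * Number of calls allowed per second.
     *
     * @return the per-second limit
     */
    double count();
}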

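Aspect implementation

import com.dashuai.learning.ratelimiter.annotation.RateLimiterAnnotation;
import com.google.common.util.concurrent.RateLimiter;
import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;
import org.aspectj.lang.reflect.MethodSignature;
import org.springframework.stereotype.Component;

import java.lang.annotation.Annotation;
import java.lang.reflect.Method;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

@Aspect
@Component
public class RateLimiterAnnotationAspect {

    // One RateLimiter per limiter name, created lazily on first use
    private final ConcurrentMap<String, RateLimiter> rateLimiterMap = new ConcurrentHashMap<>();

    /**
     * Blocks before the annotated method runs until a permit is available.
     *
     * @param point the point
     */
    @Before("@annotation(com.dashuai.learning.ratelimiter.annotation.RateLimiterAnnotation)")
    public void before(JoinPoint point) {
        RateLimiterAnnotation rateLimiterAnnotation = this.getAnnotation(point, RateLimiterAnnotation.class);
        double rateLimitCount = rateLimiterAnnotation.count();
        String rateLimitName = rateLimiterAnnotation.name();
        // computeIfAbsent creates the limiter atomically; a separate get/put check
        // could let two threads create two limiters for the same name
        RateLimiter rateLimiter =
                rateLimiterMap.computeIfAbsent(rateLimitName, name -> RateLimiter.create(rateLimitCount));
        rateLimiter.acquire();
    }

    // Reads the given annotation from the intercepted method
    private <T extends Annotation> T getAnnotation(JoinPoint pjp, Class<T> clazz) {
        MethodSignature signature = (MethodSignature) pjp.getSignature();
        Method method = signature.getMethod();
        return method.getAnnotation(clazz);
    }
}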

Usage:

@Service
public class AopTestServiceImpl implements AopTestService {
    @Override
    @RateLimiterAnnotation(name = "v1", count = 5.0)
    public String testRateLimiter(Double count, String context) {
        System.out.println(count + "   " + context);
        return "测试";
    }

    @Override
    @RateLimiterAnnotation(name = "v2", count = 7.0)
    public String testRateLimiterv2(Double count, String context) {
        System.out.println("V2版本发出:" + count + "   " + context);
        return "测试第二个";
    }
}

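The design is simple: a map holds one limiter per service name with its configured rate, and an AOP "before" advice acquires a permit ahead of each annotated call, which produces the rate-limiting effect.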

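Source code analysis

Guava has two rate-limiting modes: a stable mode (SmoothBursty: permits are produced at a constant rate) and a warm-up mode (SmoothWarmingUp: the permit rate ramps up gradually until it settles at a stable value).
The two are implemented along similar lines and differ mainly in how the wait time is computed. This post focuses on SmoothBursty.

Calling create actually instantiates SmoothBursty: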

public static RateLimiter create(double permitsPerSecond) {
    return create(permitsPerSecond, SleepingStopwatch.createFromSystemTimer());
}
 
static RateLimiter create(double permitsPerSecond, SleepingStopwatch stopwatch) {
    RateLimiter rateLimiter = new SmoothBursty(stopwatch, 1.0 /* maxBurstSeconds */);
    rateLimiter.setRate(permitsPerSecond);
    return rateLimiter;
}

First, let's look at a few key fields:

 /**
   * The currently stored permits.
   */
  double storedPermits;

  /**
   * The maximum number of stored permits, i.e. the bucket size.
   */
  double maxPermits;

  /**
   * The interval between two unit requests, at our stable rate. E.g., a stable rate of 5 permits
   * per second has a stable interval of 200ms.
   * In other words, the interval at which one permit is added to the bucket.
   */
  double stableIntervalMicros;

/**
   * The time when the next request (no matter its size) will be granted. After granting a
   * request, this is pushed further in the future. Large requests push this further than small
   * requests.
   * A request reserves this time when it is granted through acquire.
   */
  private long nextFreeTicketMicros = 0L; // could be either in the past or future

These fields map directly onto the concepts introduced earlier.

Next come a few commonly used members:

/**
   * The underlying timer; used both to measure elapsed time and sleep as necessary. A separate
   * object to facilitate testing.
   */
  private final SleepingStopwatch stopwatch;

  // Can't be initialized in the constructor because mocks don't call the constructor.
  // The mutex must not be used directly; obtain it through mutex().
  private volatile Object mutexDoNotUseDirectly;
  /**
  Lazily creates the mutex with double-checked locking; thread-safe.
  **/
  private Object mutex() {
    Object mutex = mutexDoNotUseDirectly;
    if (mutex == null) {
      synchronized (this) {
        mutex = mutexDoNotUseDirectly;
        if (mutex == null) {
          mutexDoNotUseDirectly = mutex = new Object();
        }
      }
    }
    return mutex;
  }

We start with the simplest method, acquire:

/**
   * Acquires a single permit from this {@code RateLimiter}, blocking until the
   * request can be granted. Tells the amount of time slept, if any.
   *
   * <p>This method is equivalent to {@code acquire(1)}.
   *
   * @return time spent sleeping to enforce rate, in seconds; 0.0 if not rate-limited
   * @since 16.0 (present in 13.0 with {@code void} return type})
   */
  public double acquire() {
    return acquire(1);
  }

  /**
   * Acquires the given number of permits from this {@code RateLimiter}, blocking until the
   * request can be granted. Tells the amount of time slept, if any.
   *
   * @param permits the number of permits to acquire
   * @return time spent sleeping to enforce rate, in seconds; 0.0 if not rate-limited
   * @throws IllegalArgumentException if the requested number of permits is negative or zero
   * @since 16.0 (present in 13.0 with {@code void} return type})
   */
  public double acquire(int permits) {
    long microsToWait = reserve(permits);
    stopwatch.sleepMicrosUninterruptibly(microsToWait);
    return 1.0 * microsToWait / SECONDS.toMicros(1L);
  }

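These two methods are simple. The first delegates acquire() to acquire(1). The second is slightly more involved: it calls reserve() to get the time to wait for the requested permits, sleeps for that long uninterruptibly through stopwatch, and finally returns the time waited in seconds (microsToWait is in microseconds and is converted before returning). Next, let's look into reserve: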

  /**
   * Reserves the given number of permits from this {@code RateLimiter} for future use, returning
   * the number of microseconds until the reservation can be consumed.
   *
   * @return time in microseconds to wait until the resource can be acquired, never negative
   */
  final long reserve(int permits) {
    checkPermits(permits);
    synchronized (mutex()) {
      return reserveAndGetWaitLength(permits, stopwatch.readMicros());
    }
  }

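It first validates the argument, then takes the mutex, then calls reserveAndGetWaitLength with the number of permits requested and the current time in microseconds. (As an aside, Google's code quality is admirable: from comments to naming, everything is clear at a glance.)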

   /**
   * Reserves next ticket and returns the wait time that the caller must wait for.
   *
   * @return the required wait time, never negative
   */
  final long reserveAndGetWaitLength(int permits, long nowMicros) {
    long momentAvailable = reserveEarliestAvailable(permits, nowMicros);
    return max(momentAvailable - nowMicros, 0);
  }

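This code calls reserveEarliestAvailable to get the moment, in microseconds, at which the request can be granted its permits, then computes and returns the number of microseconds to wait. Let's continue with reserveEarliestAvailable and the helpers it relies on: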

   /**
   * Updates {@code storedPermits} and {@code nextFreeTicketMicros} based on the current time.
   * Called on every request to bring the limiter's permit count up to date.
   */
  void resync(long nowMicros) {
    // if nextFreeTicket is in the past, resync to now
    // If the next grantable time lies before the current time,
    // the limiter has been idle for a while
    // and the stored permits earned during that time must be added.
    // Otherwise requests have kept coming in and nothing needs updating.
    if (nowMicros > nextFreeTicketMicros) {
      // storedPermits is capped at maxPermits;
      // the amount added depends on the idle time (nowMicros - nextFreeTicketMicros)
      storedPermits = min(maxPermits,
          storedPermits
            + (nowMicros - nextFreeTicketMicros) / coolDownIntervalMicros());
      // Move nextFreeTicketMicros to now
      nextFreeTicketMicros = nowMicros;
    }
  }


  /*
    Returns the interval between permits, i.e. 1 / QPS seconds expressed in microseconds (1,000,000 / QPS)
  */
  @Override
  double coolDownIntervalMicros() {
      return stableIntervalMicros;
  }


 /*
  In this mode stored permits are handed out immediately, so there is no wait time and 0 is returned
*/
  @Override
  long storedPermitsToWaitTime(double storedPermits, double permitsToTake) {
      return 0L;
  }

  @Override
  final long reserveEarliestAvailable(int requiredPermits, long nowMicros) {
    // Bring the token bucket up to date
    resync(nowMicros);
    // Remember nextFreeTicketMicros before it is pushed back
    long returnValue = nextFreeTicketMicros;
    // Stored permits that this request can use
    double storedPermitsToSpend = min(requiredPermits, this.storedPermits);
    // Fresh permits that still have to be produced
    double freshPermits = requiredPermits - storedPermitsToSpend;
    // Total wait = wait for the stored permits + wait for the fresh permits;
    // the fresh part is the permit interval * number of fresh permits
    long waitMicros = storedPermitsToWaitTime(this.storedPermits, storedPermitsToSpend)
        + (long) (freshPermits * stableIntervalMicros);
  
    // Push nextFreeTicketMicros back; this is what makes reservations possible
    try {
      this.nextFreeTicketMicros = LongMath.checkedAdd(nextFreeTicketMicros, waitMicros);
    } catch (ArithmeticException e) {
      this.nextFreeTicketMicros = Long.MAX_VALUE;
    }
    // Update the stored permits
    this.storedPermits -= storedPermitsToSpend;
    // Because reservations are supported, the value returned is nextFreeTicketMicros
    // from before this request's cost was added, not after
    return returnValue;
  }

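Most of the explanation is already in the code comments. One thing worth pointing out: at first I assumed that, following the description of the token bucket algorithm, there had to be a process that inserts permits on a timer, but there is no extra thread or synchronization mechanism doing that. Guava updates the bucket lazily, on demand. Each time a request arrives, it adds the permits that have accumulated and updates other state such as nextFreeTicketMicros. This uses fewer threads, saves resources, and simplifies the logic. The work is done in resync. Also note that Guava's implementation supports reserving permits in advance: when the limiter is idle and a request for a large number of permits arrives, the limiter grants it enough permits up front so that it can run immediately, and pushes back the wait for subsequent requests (as described earlier). This is why nowMicros < nextFreeTicketMicros can occur: it means the limiter is still paying off the advance granted to an earlier request, so storedPermits does not need updating. Otherwise nowMicros >= nextFreeTicketMicros holds.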

Having gone through acquire, let's look at the code for tryAcquire:

 /**
   * Acquires a permit from this {@code RateLimiter} if it can be obtained
   * without exceeding the specified {@code timeout}, or returns {@code false}
   * immediately (without waiting) if the permit would not have been granted
   * before the timeout expired.
   *
   * <p>This method is equivalent to {@code tryAcquire(1, timeout, unit)}.
   *
   * @param timeout the maximum time to wait for the permit. Negative values are treated as zero.
   * @param unit the time unit of the timeout argument
   * @return {@code true} if the permit was acquired, {@code false} otherwise
   * @throws IllegalArgumentException if the requested number of permits is negative or zero
   */
  public boolean tryAcquire(long timeout, TimeUnit unit) {
    return tryAcquire(1, timeout, unit);
  }

  /**
   * Acquires permits from this {@link RateLimiter} if it can be acquired immediately without delay.
   *
   * <p>
   * This method is equivalent to {@code tryAcquire(permits, 0, anyUnit)}.
   *
   * @param permits the number of permits to acquire
   * @return {@code true} if the permits were acquired, {@code false} otherwise
   * @throws IllegalArgumentException if the requested number of permits is negative or zero
   * @since 14.0
   */
  public boolean tryAcquire(int permits) {
    return tryAcquire(permits, 0, MICROSECONDS);
  }

  /**
   * Acquires a permit from this {@link RateLimiter} if it can be acquired immediately without
   * delay.
   *
   * <p>
   * This method is equivalent to {@code tryAcquire(1)}.
   *
   * @return {@code true} if the permit was acquired, {@code false} otherwise
   * @since 14.0
   */
  public boolean tryAcquire() {
    return tryAcquire(1, 0, MICROSECONDS);
  }

Still very simple: each overload delegates to the next. Let's look at the last tryAcquire:

/**
   * Acquires the given number of permits from this {@code RateLimiter} if it can be obtained
   * without exceeding the specified {@code timeout}, or returns {@code false}
   * immediately (without waiting) if the permits would not have been granted
   * before the timeout expired.
   *
   * @param permits the number of permits to acquire
   * @param timeout the maximum time to wait for the permits. Negative values are treated as zero.
   * @param unit the time unit of the timeout argument
   * @return {@code true} if the permits were acquired, {@code false} otherwise
   * @throws IllegalArgumentException if the requested number of permits is negative or zero
   */
  public boolean tryAcquire(int permits, long timeout, TimeUnit unit) {
    long timeoutMicros = max(unit.toMicros(timeout), 0);
    checkPermits(permits);
    long microsToWait;
    // Take the mutex
    synchronized (mutex()) {
      // Read the current time
      long nowMicros = stopwatch.readMicros();
      // Check whether the permits can be obtained within the timeout
      if (!canAcquire(nowMicros, timeoutMicros)) {
        return false;
      } else {
       // If they can, call reserveAndGetWaitLength to get the wait time,
       // which is the same path acquire takes
        microsToWait = reserveAndGetWaitLength(permits, nowMicros);
      }
    }
    // Sleep until the permits are available
    stopwatch.sleepMicrosUninterruptibly(microsToWait);
    return true;
  }

From the analysis above, the main logic is in canAcquire:

private boolean canAcquire(long nowMicros, long timeoutMicros) {
    return queryEarliestAvailable(nowMicros) - timeoutMicros <= nowMicros;
}

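canAcquire calls queryEarliestAvailable to get the earliest time a permit is available and checks whether the gap between that time and now is at most timeout. If so, the permit can be obtained within the timeout and the method returns true; otherwise it returns false.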

  @Override
  final long queryEarliestAvailable(long nowMicros) {
    return nextFreeTicketMicros;
  }

In SmoothBursty, queryEarliestAvailable simply returns nextFreeTicketMicros. That is straightforward, since nextFreeTicketMicros is by definition the earliest time a permit is available.
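
The check can be tried out with the numbers from the simple test earlier: on a limiter with 2 permits per second, acquire(10) pushes nextFreeTicketMicros about 5 s into the future, so a tryAcquire with a 2 s timeout fails immediately. The helper below restates the comparison on plain values; the class name CanAcquireDemo is hypothetical and the times are illustrative.

```java
// Hypothetical restatement of the canAcquire comparison on plain numbers.
final class CanAcquireDemo {
    /** True if the earliest available time is no later than now + timeout. */
    static boolean canAcquire(long nextFreeTicketMicros, long nowMicros, long timeoutMicros) {
        return nextFreeTicketMicros - timeoutMicros <= nowMicros;
    }
}
```

For instance, CanAcquireDemo.canAcquire(5_000_000L, 0L, 2_000_000L) is false because the permit is 5 s away and the caller waits at most 2 s, while the same call made 3 s later, canAcquire(5_000_000L, 3_000_000L, 2_000_000L), is true.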

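Distributed rate limiting

Sliding window

Limiting how often a user may perform an action within a period involves a sliding time window. Think of the score in the zset data structure: the score can be used to mark out this time window. We only need to keep the data inside the window and can discard everything outside it. What should the zset's value be? It only needs to be unique. A uuid would waste space, so a millisecond timestamp is used instead.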

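A zset records a user's action history, and each action is stored as one member of the zset. One zset is kept per user per kind of action.

To save memory, only the records within the time window are kept. If the user is a cold user with no actions in the sliding window, the zset can be removed from memory and takes no space.

Comparing the number of actions in the sliding window with the threshold max_count tells us whether the current action is allowed. In code: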

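import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;
import redis.clients.jedis.Response;

public class SimpleRateLimiter {

  private final Jedis jedis;

  public SimpleRateLimiter(Jedis jedis) {
    this.jedis = jedis;
  }

  /**
   * Returns true if the user may perform the action, allowing at most maxCount actions per period (in seconds).
   */
  public boolean isActionAllowed(String userId, String actionKey, int period, int maxCount) {
    // One zset per user and action
    String key = String.format("hist:%s:%s", userId, actionKey);
    long nowTs = System.currentTimeMillis();
    Pipeline pipe = jedis.pipelined();
    pipe.multi();
    // Record this action: score = timestamp, member = timestamp as a string.
    // Note: two actions in the same millisecond share one member and are counted once.
    pipe.zadd(key, nowTs, "" + nowTs);
    // Drop records that fall outside the window (1000L avoids int overflow for long periods)
    pipe.zremrangeByScore(key, 0, nowTs - period * 1000L);
    // Count the actions left in the window
    Response<Long> count = pipe.zcard(key);
    // Let the zsets of cold users expire, with one extra second of slack
    pipe.expire(key, period + 1);
    pipe.exec();
    pipe.close();
    return count.get() <= maxCount;
  }

  public static void main(String[] args) {
    Jedis jedis = new Jedis();
    SimpleRateLimiter limiter = new SimpleRateLimiter(jedis);
    // Allow at most 5 replies per 60 seconds
    for(int i=0;i<20;i++) {
      System.out.println(limiter.isActionAllowed("laoqian", "reply", 60, 5));
    }
  }

}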

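This code is somewhat involved and takes some time to digest. The overall idea: every time an action arrives, the time window is maintained. Records outside the window are removed and only those inside are kept. In the zset only the score matters; the value has no particular meaning and only needs to be unique.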
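
The same window maintenance can be followed without Redis. The sketch below is a single-process stand-in that keeps the timestamps in an ArrayDeque instead of a zset; the class name SlidingWindowLog and the explicit nowMillis parameter are assumptions for illustration. Like the Redis version, it records every attempt, including rejected ones.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical in-memory counterpart of the zset-based sliding window.
final class SlidingWindowLog {
    private final long windowMillis;
    private final int maxCount;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowLog(long windowMillis, int maxCount) {
        this.windowMillis = windowMillis;
        this.maxCount = maxCount;
    }

    /** Returns true if the action arriving at nowMillis is allowed. */
    synchronized boolean isActionAllowed(long nowMillis) {
        // Drop records outside the window (the role of ZREMRANGEBYSCORE)
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.pollFirst();
        }
        // Record this action (ZADD), then compare the window size (ZCARD) with the threshold
        timestamps.addLast(nowMillis);
        return timestamps.size() <= maxCount;
    }
}
```

With a 1000 ms window and a maximum of 2, actions at 0 ms and 1 ms are allowed, an action at 2 ms is rejected, and an action at 1001 ms is allowed again because the first two records have slid out of the window.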

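Redis + Lua

The key to distributed rate limiting is making the limiting operation atomic. Redis plus Lua can be used to limit the number of requests to an interface within a time window.

Lua script

--
-- Lua indexes start at 1
-- Rate-limit key
local key = KEYS[1]
-- Limit size
local limit = tonumber(ARGV[1])

local MINUTES = "MINUTES"

-- Current request count
local currentLimit = tonumber(redis.call('get', key) or "0")
-- Time unit of the window, passed as the second argument
local m = ARGV[2]
if currentLimit + 1 > limit then
    -- Limit reached: reject
    return 0;
else
    -- Below the threshold: increment the counter
    redis.call("INCRBY", key, 1)
    -- Refresh the key's TTL: 120 s for a minute window, 2 s otherwise
    if m == MINUTES then
        redis.call("EXPIRE", key, 120)
    else
        redis.call("EXPIRE", key, 2)
    end
    return currentLimit + 1
end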

Token bucket Lua script

--- Acquire permits
--- Return codes:
--- 0  no token bucket is configured
--- -1 acquiring failed, i.e. the bucket does not hold enough permits
--- 1  acquiring succeeded
--- @param key unique identifier of the permit (resource)
--- @param permits number of permits requested
--- @param curr_mill_second current time in milliseconds
--- @param context identifier of the application using the permits
local function acquire(key, permits, curr_mill_second, context)
    local rate_limit_info = redis.pcall("HMGET", key, "last_mill_second", "curr_permits", "max_permits", "rate", "apps")
    local last_mill_second = rate_limit_info[1]
    local curr_permits = tonumber(rate_limit_info[2])
    local max_permits = tonumber(rate_limit_info[3])
    local rate = rate_limit_info[4]
    local apps = rate_limit_info[5]
    --- No token bucket is configured for this key or application
    --- (contains is a helper that is not shown in the original post)
    if type(apps) == 'boolean' or apps == nil or not contains(apps, context) then
        return 0
    end
    local local_curr_permits = max_permits;
    --- If the bucket was just created, the last refill time is empty and the bucket starts full.
    --- Otherwise, add permits lazily based on the time elapsed since the last refill
    --- and update the last refill time.
    --- If less than one permit would be added, the last refill time is left unchanged.
    if (type(last_mill_second) ~= 'boolean' and last_mill_second ~= false and last_mill_second ~= nil) then
        local reverse_permits = math.floor(((curr_mill_second - last_mill_second) / 1000) * rate)
        local expect_curr_permits = reverse_permits + curr_permits;
        local_curr_permits = math.min(expect_curr_permits, max_permits);
        --- Greater than 0 means permits were added to the bucket, so record the refill time
        if (reverse_permits > 0) then
            redis.pcall("HSET", key, "last_mill_second", curr_mill_second)
        end
    else
        redis.pcall("HSET", key, "last_mill_second", curr_mill_second)
    end
    local result = -1
    --- Take the permits if enough are available; otherwise only store the refilled count
    if (local_curr_permits - permits >= 0) then
        result = 1
        redis.pcall("HSET", key, "curr_permits", local_curr_permits - permits)
    else
        redis.pcall("HSET", key, "curr_permits", local_curr_permits)
    end
    return result
end