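Problem description
During early testing the setup was a fixed thread pool of size 3 and a rate limit of 2, yet three threads obtained permits at the same moment. The log: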
2021-05-20 10:50:51.325 [0b17349516214790512533803e2612] [SystemReport-TP-0] INFO c.d.m.manager.IAcsClientManager:65 - returnAcsClient=client-1, rate=2.0
2021-05-20 10:50:51.325 [0b17349516214790512533803e2612] [SystemReport-TP-2] INFO c.d.m.manager.IAcsClientManager:65 - returnAcsClient=client-1, rate=2.0
2021-05-20 10:50:51.325 [0b17349516214790512533803e2612] [SystemReport-TP-1] INFO c.d.m.manager.IAcsClientManager:65 - returnAcsClient=client-1, rate=2.0
2021-05-20 10:50:51.959 [0b17349516214790512533803e2612] [SystemReport-TP-2] INFO c.d.m.manager.IAcsClientManager:65 - returnAcsClient=client-2, rate=2.0
2021-05-20 10:50:52.194 [0b17349516214790512533803e2612] [SystemReport-TP-0] INFO c.d.m.manager.IAcsClientManager:65 - returnAcsClient=client-1, rate=2.0
2021-05-20 10:50:52.349 [0b17349516214790512533803e2612] [SystemReport-TP-1] INFO c.d.m.manager.IAcsClientManager:65 - returnAcsClient=client-2, rate=2.0
2021-05-20 10:50:53.459 [0b17349516214790512533803e2612] [SystemReport-TP-0] INFO c.d.m.manager.IAcsClientManager:65 - returnAcsClient=client-1, rate=2.0
2021-05-20 10:50:53.504 [0b17349516214790512533803e2612] [SystemReport-TP-2] INFO c.d.m.manager.IAcsClientManager:65 - returnAcsClient=client-2, rate=2.0
2021-05-20 10:50:53.559 [0b17349516214790512533803e2612] [SystemReport-TP-1] INFO c.d.m.manager.IAcsClientManager:65 - returnAcsClient=client-1, rate=2.0
2021-05-20 10:50:54.164 [0b17349516214790512533803e2612] [SystemReport-TP-2] INFO c.d.m.manager.IAcsClientManager:65 - returnAcsClient=client-2, rate=2.0
2021-05-20 10:50:54.167 [0b17349516214790512533803e2612] [SystemReport-TP-1] INFO c.d.m.manager.IAcsClientManager:65 - returnAcsClient=client-1, rate=2.0
2021-05-20 10:50:54.209 [0b17349516214790512533803e2612] [SystemReport-TP-0] INFO c.d.m.manager.IAcsClientManager:65 - returnAcsClient=client-2, rate=2.0
2021-05-20 10:50:54.664 [0b17349516214790512533803e2612] [SystemReport-TP-2] INFO c.d.m.manager.IAcsClientManager:65 - returnAcsClient=client-1, rate=2.0
2021-05-20 10:50:54.668 [0b17349516214790512533803e2612] [SystemReport-TP-1] INFO c.d.m.manager.IAcsClientManager:65 - returnAcsClient=client-2, rate=2.0
2021-05-20 10:50:54.702 [0b17349516214790512533803e2612] [SystemReport-TP-2] INFO c.d.m.manager.IAcsClientManager:65 - returnAcsClient=client-1, rate=2.0
2021-05-20 10:50:54.709 [0b17349516214790512533803e2612] [SystemReport-TP-1] INFO c.d.m.manager.IAcsClientManager:65 - returnAcsClient=client-2, rate=2.0
2021-05-20 10:50:54.959 [0b17349516214790512533803e2612] [SystemReport-TP-2] INFO c.d.m.manager.IAcsClientManager:65 - returnAcsClient=client-1, rate=2.0
2021-05-20 10:50:55.164 [0b17349516214790512533803e2612] [SystemReport-TP-2] INFO c.d.m.manager.IAcsClientManager:65 - returnAcsClient=client-2, rate=2.0
2021-05-20 10:50:55.791 [0b17349516214790512533803e2612] [SystemReport-TP-2] INFO c.d.m.manager.IAcsClientManager:65 - returnAcsClient=client-1, rate=2.0
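In the first second the limit was exceeded: three threads acquired the resource and started working.
Analysis
Looking at the implementation: under concurrency the three threads may all read the same lastIndex, which means they all get the same rate limiter. In that case only two threads should run while the third waits, but the first batch clearly exceeded the limit.
// Rate limiter construction
RateLimiter.create(2);
// Obtain a client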
private IAcsClient getClientInner() {
    for (int i = 0; i < QueryEagleeyeConstant.WAIT_CLIENT_THRESHOLD; i++) {
        for (int j = 0; j < CLIENT_CACHE.size(); j++) {
            ClientLimiterDTO clientLimiter = CLIENT_CACHE.get(j);
            if (clientLimiter.getRateLimiter()
                    // 5 ms
                    .tryAcquire(QueryEagleeyeConstant.WAIT_PERMIT_THRESHOLD, TimeUnit.MILLISECONDS)) {
                log.info("returnAcsClient={}, rate={}",
                        ((DefaultAcsClient) clientLimiter.getIAcsClient()).getProfile()
                                .getCredential().getAccessKeyId(),
                        clientLimiter.getRateLimiter().getRate());
                return clientLimiter.getIAcsClient();
            }
        }
        try {
            // Sleep 5 ms, then retry
            Thread.sleep(QueryEagleeyeConstant.WAIT_PERMIT_THRESHOLD);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stop retrying instead of swallowing the interrupt
            Thread.currentThread().interrupt();
            return null;
        }
    }
    return null;
}
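A look at the source
Creating the rate limiter
Guava uses the token bucket algorithm and provides two implementations: SmoothBursty and SmoothWarmingUp. The default factory method creates the bursty variant, which lets requests that arrive together through as long as permits are available. The warm-up variant has a cold-start phase that keeps a sudden spike in traffic from overwhelming the service.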
public static RateLimiter create(double permitsPerSecond) {
    /*
     * The default RateLimiter configuration can save the unused permits of up to one second.
     * This is to avoid unnecessary stalls in situations like this: A RateLimiter of 1qps,
     * and 4 threads, all calling acquire() at these moments:
     *
     * T0 at 0 seconds
     * T1 at 1.05 seconds
     * T2 at 2 seconds
     * T3 at 3 seconds
     *
     * Due to the slight delay of T1, T2 would have to sleep till 2.05 seconds,
     * and T3 would also have to sleep till 3.05 seconds.
     */
    // Create the stopwatch used to build the limiter instance
    return create(SleepingStopwatch.createFromSystemTimer(), permitsPerSecond);
}

static RateLimiter create(SleepingStopwatch stopwatch, double permitsPerSecond) {
    RateLimiter rateLimiter = new SmoothBursty(stopwatch, 1.0 /* maxBurstSeconds */);
    rateLimiter.setRate(permitsPerSecond);
    return rateLimiter;
}
Setting the rate
public final void setRate(double permitsPerSecond) {
    checkArgument(
        permitsPerSecond > 0.0 && !Double.isNaN(permitsPerSecond), "rate must be positive");
    // Mutex
    synchronized (mutex()) {
        // Set the rate
        doSetRate(permitsPerSecond, stopwatch.readMicros());
    }
}

final void doSetRate(double permitsPerSecond, long nowMicros) {
    // Refresh the stored permits and the next free time based on nowMicros:
    // if nowMicros is later than nextFreeTicketMicros, nextFreeTicketMicros is moved up to nowMicros
    resync(nowMicros);
    double stableIntervalMicros = SECONDS.toMicros(1L) / permitsPerSecond;
    this.stableIntervalMicros = stableIntervalMicros;
    doSetRate(permitsPerSecond, stableIntervalMicros);
}

void doSetRate(double permitsPerSecond, double stableIntervalMicros) {
    double oldMaxPermits = this.maxPermits;
    // Bucket capacity: 1 s * permits produced per second
    maxPermits = maxBurstSeconds * permitsPerSecond;
    // storedPermits starts at 0
    if (oldMaxPermits == Double.POSITIVE_INFINITY) {
        // if we don't special-case this, we would get storedPermits == NaN, below
        storedPermits = maxPermits;
    } else {
        storedPermits = (oldMaxPermits == 0.0)
            ? 0.0 // initial state
            : storedPermits * maxPermits / oldMaxPermits;
    }
}
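To make the two derived values concrete, the following minimal sketch repeats the arithmetic of doSetRate for the rate used in this post. RateParams and its method names are hypothetical helpers written for illustration and are not part of Guava; maxBurstSeconds = 1.0 is the value passed by create(...) above.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not a Guava class): repeats the arithmetic in doSetRate.
final class RateParams {

    // Time needed to produce a single permit, in microseconds.
    static double stableIntervalMicros(double permitsPerSecond) {
        return TimeUnit.SECONDS.toMicros(1L) / permitsPerSecond;
    }

    // Bucket capacity: how many unused permits can be saved up.
    static double maxPermits(double maxBurstSeconds, double permitsPerSecond) {
        return maxBurstSeconds * permitsPerSecond;
    }

    public static void main(String[] args) {
        double rate = 2.0; // same rate as RateLimiter.create(2)
        System.out.println("stableIntervalMicros=" + stableIntervalMicros(rate));
        System.out.println("maxPermits=" + maxPermits(1.0, rate));
    }
}
```

With a rate of 2, one permit takes 500,000 microseconds to produce and the bucket holds at most 2 permits.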
Acquiring a permit
public boolean tryAcquire(int permits, long timeout, TimeUnit unit) {
    long timeoutMicros = max(unit.toMicros(timeout), 0);
    checkPermits(permits);
    long microsToWait;
    // Mutex
    synchronized (mutex()) {
        // Microseconds elapsed since the limiter's stopwatch was created
        long nowMicros = stopwatch.readMicros();
        // canAcquire decides, given the timeout, whether a permit can be obtained.
        // queryEarliestAvailable returns nextFreeTicketMicros, which starts as the limiter's creation time.
        // If the stored permits cover a request, nextFreeTicketMicros is the time of that request;
        // if they fall short, it becomes
        // nextFreeTicketMicros + the time needed to produce the missing permits.
        // canAcquire returns true when nextFreeTicketMicros - timeoutMicros <= nowMicros,
        // i.e. nextFreeTicketMicros - nowMicros <= timeoutMicros.
        // See reserveAndGetWaitLength: the wait is therefore at most timeoutMicros.
        if (!canAcquire(nowMicros, timeoutMicros)) {
            return false;
        } else {
            // Reserve the permits and get the time to wait for them
            microsToWait = reserveAndGetWaitLength(permits, nowMicros);
        }
    }
    // If supply is short of demand, sleep until the permits are produced
    stopwatch.sleepMicrosUninterruptibly(microsToWait);
    return true;
}
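The release rule, illustrated
- At time A the acquire fails and the request is not let through
- At time B the acquire succeeds and the request is let through
- The dividing line is the left edge of the timeout window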
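The rule above can be sketched as a single comparison. This is only an illustration of the check described in the tryAcquire comments; AcquireRule is a hypothetical name, and the timestamps chosen for A and B are made-up values, not taken from the original diagram.

```java
// Hypothetical illustration of the release rule; not Guava's actual class.
final class AcquireRule {

    // True when the next free slot is no further away than the timeout.
    static boolean canAcquire(long nextFreeTicketMicros, long nowMicros, long timeoutMicros) {
        return nextFreeTicketMicros - timeoutMicros <= nowMicros;
    }

    public static void main(String[] args) {
        long nextFree = 1_000_000L; // assumed: next permit is free at 1 s
        long timeout = 5_000L;      // 5 ms, as in the post

        // The left edge of the timeout window is nextFree - timeout = 995,000.
        long timeA = 990_000L;      // before the left edge
        long timeB = 996_000L;      // inside the window

        System.out.println("A let through: " + canAcquire(nextFree, timeA, timeout));
        System.out.println("B let through: " + canAcquire(nextFree, timeB, timeout));
    }
}
```

A request arriving before nextFreeTicketMicros - timeoutMicros is refused outright; one arriving at or after that edge is accepted and sleeps for whatever time remains.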
Reserving permits and computing the wait time
final long reserveAndGetWaitLength(int permits, long nowMicros) {
    // momentAvailable = nextFreeTicketMicros (its value before the increment)
    long momentAvailable = reserveEarliestAvailable(permits, nowMicros);
    // canAcquire already guaranteed nextFreeTicketMicros - nowMicros <= timeoutMicros,
    // so the wait is at most timeoutMicros
    return max(momentAvailable - nowMicros, 0);
}

final long reserveEarliestAvailable(int requiredPermits, long nowMicros) {
    // Refresh the stored permits and the next free time based on nowMicros:
    // if nowMicros is later than nextFreeTicketMicros, nextFreeTicketMicros is moved up to nowMicros,
    // because the permits produced up to now are already available
    resync(nowMicros);
    long returnValue = nextFreeTicketMicros;
    double storedPermitsToSpend = min(requiredPermits, this.storedPermits);
    double freshPermits = requiredPermits - storedPermitsToSpend;
    // 1. Stored permits >= request: no wait, waitMicros = 0 (only the warm-up variant charges a cold-start cost via storedPermitsToWaitTime)
    // 2. Stored permits < request: wait = missing permits * time per permit; the warm-up variant adds storedPermitsToWaitTime on top
    long waitMicros = storedPermitsToWaitTime(this.storedPermits, storedPermitsToSpend)
        + (long) (freshPermits * stableIntervalMicros);
    try {
        this.nextFreeTicketMicros = LongMath.checkedAdd(nextFreeTicketMicros, waitMicros);
    } catch (ArithmeticException e) {
        this.nextFreeTicketMicros = Long.MAX_VALUE;
    }
    // Deduct the stored permits that were spent
    this.storedPermits -= storedPermitsToSpend;
    // Return nextFreeTicketMicros as it was before the increment
    return returnValue;
}
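So there is no concurrency bug. Then why the overshoot? -_-!!!
Someone has already reported imprecise limiting on the official GitHub repository (https://github.com/google/guava/issues/5296), but that report does not match my scenario.
Reproducing the problem
Simulating the production scenario to try to reproduce the problem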
public static void main(String[] args) throws InterruptedException {
    ExecutorService SYSTEM_REPORT_THREAD_POOL = Executors.newFixedThreadPool(6);
    RateLimiter rateLimiter = RateLimiter.create(2);
    // Stay idle for 2 s so the bucket is full before the burst
    Thread.sleep(2000);
    for (int i = 0; i < 3; i++) {
        // Submit the task as a plain Runnable; wrapping it in a Thread is unnecessary
        SYSTEM_REPORT_THREAD_POOL.submit(() -> {
            if (rateLimiter.tryAcquire(5, TimeUnit.MILLISECONDS)) {
                log.info("do something");
            }
        });
    }
    // Keep the JVM alive so the log output can be observed
    Thread.sleep(Integer.MAX_VALUE);
}
Output
11:33:24.620 [pool-1-thread-1] INFO com.....monitor.manager.IAcsClientManager - do something
11:33:24.620 [pool-1-thread-3] INFO com.....monitor.manager.IAcsClientManager - do something
11:33:24.620 [pool-1-thread-2] INFO com.....monitor.manager.IAcsClientManager - do something
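Root cause
Stepping through the third acquisition in a debugger shows that com.google.common.util.concurrent.RateLimiter#reserveEarliestAvailable still returns a value equal to nowMicros. The method returns nextFreeTicketMicros. Because the stored permits covered the first two acquisitions, nextFreeTicketMicros was not pushed forward and stayed at the time of the previous acquisition. The third thread's nowMicros is no earlier than the second thread's, so resync moves nextFreeTicketMicros up to the third thread's nowMicros. That is a bit convoluted, so here is the flow step by step (with a diagram).
- Thread 1 runs
  - tryAcquire: thread 1 tries to acquire a permit
  - resync: nowMicros > nextFreeTicketMicros, so resync sets nextFreeTicketMicros to the current time nowMicros and refills the bucket
  - Compute the wait: waitMicros = missing permits * time to produce one permit **(bursty variant only; the warm-up variant adds warm-up time)**
  - Add to nextFreeTicketMicros: nextFreeTicketMicros = nextFreeTicketMicros + waitMicros **(waitMicros = 0 when nothing is missing)**
  - Stored permits -= min(requested permits, stored permits)
  - Return the value nextFreeTicketMicros had before the addition **(when nothing is missing, this is the current time nowMicros)**
  - Return sleep time = max(momentAvailable - nowMicros, 0). momentAvailable is nextFreeTicketMicros before the addition, which here equals nowMicros, so the sleep time is 0 and tryAcquire returns true immediately
- Thread 2 follows exactly the same flow. Because of the mutex, thread 2's nowMicros is no earlier than thread 1's. nextFreeTicketMicros ends up at thread 2's nowMicros and the sleep time is 0
- Thread 3's nowMicros is no earlier than thread 2's. This time there is a shortfall: cumulatively three permits have been requested against a stock of two, so one is missing. nextFreeTicketMicros is pushed forward to nextFreeTicketMicros + waitMicros, but the value returned is the one from before the increment. That value is thread 3's own nowMicros, because resync had already moved nextFreeTicketMicros from thread 2's nowMicros up to thread 3's nowMicros, so thread 3 also sleeps for 0
- The result is an overshoot of one permit against a limit of 2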
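The flow above can be replayed deterministically with a small model. The sketch below is a simplified re-implementation of the bookkeeping for the bursty variant only (one permit per request, no sleeping, time passed in explicitly); BurstyModel is a hypothetical class written for this illustration and is not Guava's code. The timestamps are assumptions: the limiter is created at time 0 and four requests arrive 1 microsecond apart after 2 seconds of idle time.

```java
// Simplified model of the bursty bookkeeping; NOT Guava's actual class.
final class BurstyModel {
    private final double stableIntervalMicros; // time to produce one permit
    private final double maxPermits;           // bucket capacity
    private double storedPermits;              // starts at 0
    private long nextFreeTicketMicros;         // starts at the creation time

    BurstyModel(double permitsPerSecond, double maxBurstSeconds, long createdAtMicros) {
        this.stableIntervalMicros = 1_000_000.0 / permitsPerSecond;
        this.maxPermits = maxBurstSeconds * permitsPerSecond;
        this.nextFreeTicketMicros = createdAtMicros;
    }

    // Returns the wait in microseconds for one permit, or -1 if the request is refused.
    long tryAcquire(long nowMicros, long timeoutMicros) {
        // canAcquire: refuse when the next free slot lies beyond now + timeout
        if (nextFreeTicketMicros - timeoutMicros > nowMicros) {
            return -1;
        }
        // resync: credit permits for the idle time and move the slot up to now
        if (nowMicros > nextFreeTicketMicros) {
            double newPermits = (nowMicros - nextFreeTicketMicros) / stableIntervalMicros;
            storedPermits = Math.min(maxPermits, storedPermits + newPermits);
            nextFreeTicketMicros = nowMicros;
        }
        long momentAvailable = nextFreeTicketMicros; // value BEFORE the increment
        double spend = Math.min(1.0, storedPermits);
        double fresh = 1.0 - spend;                  // shortfall paid by the NEXT caller
        nextFreeTicketMicros += (long) (fresh * stableIntervalMicros);
        storedPermits -= spend;
        return Math.max(momentAvailable - nowMicros, 0);
    }

    public static void main(String[] args) {
        BurstyModel limiter = new BurstyModel(2.0, 1.0, 0L);
        long burstStart = 2_000_000L; // 2 s idle, so the bucket is full
        for (int i = 0; i < 4; i++) {
            long wait = limiter.tryAcquire(burstStart + i, 5_000L);
            System.out.println("request " + (i + 1) + ": " + (wait < 0 ? "refused" : "wait=" + wait));
        }
    }
}
```

In this model the first three requests all get a wait of 0 and only the fourth is refused: the third request's shortfall is charged to nextFreeTicketMicros after its own return value has been captured, so the cost lands on whoever comes next.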
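Recap
When the bucket is full and several threads arrive at once, the limiter overshoots by one permit.
Solutions
- Switch to the warm-up token bucket (SmoothWarmingUp)
- In our scenario, using the rate limiters alternately also makes the problem less likely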
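One possible reading of the second option, sketched under assumptions: the post does not show how lastIndex is maintained, so the code below only illustrates handing each caller a different starting index with an atomic counter, so that concurrent threads do not all begin with the same limiter. RoundRobinPicker and nextStart are hypothetical names. This spreads a burst across the limiters; it does not remove the one-permit overshoot of an individual limiter.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: thread-safe round-robin choice of the first limiter to try.
final class RoundRobinPicker {
    private final AtomicInteger counter = new AtomicInteger();
    private final int size;

    RoundRobinPicker(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.size = size;
    }

    // Each call returns the next index in 0..size-1, even under concurrency.
    int nextStart() {
        // floorMod keeps the result non-negative after the counter overflows
        return Math.floorMod(counter.getAndIncrement(), size);
    }

    public static void main(String[] args) {
        RoundRobinPicker picker = new RoundRobinPicker(2); // two clients, as in the log
        for (int i = 0; i < 4; i++) {
            System.out.println("start at limiter " + picker.nextStart());
        }
    }
}
```

A caller would begin its scan of the limiter list at nextStart() and wrap around, rather than always starting from index 0.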
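Conclusion
Strengths
- Compared with Sentinel's rate limiting, the Guava API is simpler to use and cheaper to learn. Example usage:
// Rate limiter construction
RateLimiter rateLimiter = RateLimiter.create(2);
// Acquire a permit, waiting at most the given timeout for it to be produced.
// If the expected wait exceeds the timeout, return false immediately; otherwise sleep, then return true.
// Here the timeout is 5 ms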
if (rateLimiter.tryAcquire(5, TimeUnit.MILLISECONDS)) {
// do something
}
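Weaknesses
- The limit cannot be adjusted dynamically out of the box (setRate exists, but wiring it to a configuration source is left to the application)
- No distributed support
- Only one limiting algorithm: Guava implements the token bucket
In my view weaknesses 2 and 3 are acceptable. Distributed support can be added by wrapping the limiter, and if the load balancer in front of the service spreads traffic evenly, per-machine limiting works too, since the limit can simply be divided equally among the machines. This is just my opinion; comments and discussion are welcome.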