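Preface
Sentinel's processing flow is built on a slot chain (ProcessorSlotChain), which implements features such as flow control and circuit breaking. One of the most important slots is StatisticSlot, which collects runtime statistics; these statistics are the data source for the flow-control and circuit-breaking checks. All of StatisticSlot's statistics are computed over sliding windows, so this article walks through the source code step by step to analyze how the sliding window in StatisticSlot is implemented.
I. Source code analysis of StatisticSlot's entry method (data collection)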
public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count,
boolean prioritized, Object... args) throws Throwable {
try {
// Do some checking.
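// Invoke the entry method of the next slot in the chain.
fireEntry(context, resourceWrapper, node, count, prioritized, args);
// Getting past the entry methods of the remaining slots in the SlotChain means the request was neither rate-limited nor degraded.
// Request passed, add thread count and pass count.
// Increment the current thread count.
node.increaseThreadNum();
// Add count to the number of passed requests.
node.addPassRequest(count); //@1
// Origin node: increment the thread counter (LongAdder curThreadNum) and add count to the passed requests.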
if (context.getCurEntry().getOriginNode() != null) {
// Add count for origin node.
context.getCurEntry().getOriginNode().increaseThreadNum();
context.getCurEntry().getOriginNode().addPassRequest(count);
}
// Entry node: increment the thread counter (LongAdder curThreadNum) and add count to the passed requests.
if (resourceWrapper.getEntryType() == EntryType.IN) {
// Add count for global inbound entry node for global statistics.
Constants.ENTRY_NODE.increaseThreadNum();
Constants.ENTRY_NODE.addPassRequest(count);
}
// Handle pass event with registered entry callback handlers (statistics for registered extension points).
for (ProcessorSlotEntryCallback<DefaultNode> handler : StatisticSlotCallbackRegistry.getEntryCallbacks()) {
handler.onPass(context, resourceWrapper, node, count, args);
}
} catch (PriorityWaitException ex) {
node.increaseThreadNum();
if (context.getCurEntry().getOriginNode() != null) {
// Add count for origin node.
context.getCurEntry().getOriginNode().increaseThreadNum();
}
if (resourceWrapper.getEntryType() == EntryType.IN) {
// Add count for global inbound entry node for global statistics.
Constants.ENTRY_NODE.increaseThreadNum();
}
// Handle pass event with registered entry callback handlers.
for (ProcessorSlotEntryCallback<DefaultNode> handler : StatisticSlotCallbackRegistry.getEntryCallbacks()) {
handler.onPass(context, resourceWrapper, node, count, args);
}
} catch (BlockException e) {
// Blocked, set block exception to current entry.
context.getCurEntry().setError(e);
// Add block count.
node.increaseBlockQps(count); //@2
if (context.getCurEntry().getOriginNode() != null) {
context.getCurEntry().getOriginNode().increaseBlockQps(count);
}
if (resourceWrapper.getEntryType() == EntryType.IN) {
// Add count for global inbound entry node for global statistics.
Constants.ENTRY_NODE.increaseBlockQps(count);
}
// Handle block event with registered entry callback handlers.
for (ProcessorSlotEntryCallback<DefaultNode> handler : StatisticSlotCallbackRegistry.getEntryCallbacks()) {
handler.onBlocked(e, context, resourceWrapper, node, count, args);
}
throw e;
} catch (Throwable e) {
// Unexpected error, set error to current entry.
context.getCurEntry().setError(e);
// This should not happen.
node.increaseExceptionQps(count); //@3
if (context.getCurEntry().getOriginNode() != null) {
context.getCurEntry().getOriginNode().increaseExceptionQps(count);
}
if (resourceWrapper.getEntryType() == EntryType.IN) {
Constants.ENTRY_NODE.increaseExceptionQps(count);
}
throw e;
}
}
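As earlier articles in this series showed, the chain of responsibility is entered through each slot's entry method: a slot normally runs its own logic in entry and then calls fireEntry, which invokes the entry method of the next slot. StatisticSlot reverses that order and calls fireEntry first, so that rule checking happens before any counting. If a rule check fails, control lands in the matching catch block, which records the block or exception statistics; if every check passes, the pass statistics are recorded.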
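The "check first, count afterwards" ordering can be sketched in isolation. The class below is a minimal illustration under stated assumptions: `StatisticFirstSketch`, `NextSlot` and `BlockedException` are hypothetical stand-ins, not Sentinel types, and the counters are plain `LongAdder`s rather than sliding windows.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical, simplified stand-in for a statistics slot; not Sentinel's API. */
final class StatisticFirstSketch {

    /** Stand-in for BlockException: thrown when a rule rejects the request. */
    static final class BlockedException extends Exception {
        BlockedException(String message) {
            super(message);
        }
    }

    /** Stand-in for the rest of the chain (flow rules, degrade rules, ...). */
    interface NextSlot {
        void entry(int count) throws BlockedException;
    }

    final LongAdder pass = new LongAdder();
    final LongAdder block = new LongAdder();

    /** Same shape as StatisticSlot.entry: run the rest of the chain first, count afterwards. */
    void entry(NextSlot next, int count) throws BlockedException {
        try {
            next.entry(count); // rule checks run first
            pass.add(count);   // reached only if no rule rejected the request
        } catch (BlockedException e) {
            block.add(count);  // rejected: record it, then rethrow to the caller
            throw e;
        }
    }

    public static void main(String[] args) {
        StatisticFirstSketch slot = new StatisticFirstSketch();
        // A toy rule that allows at most two passed requests in total.
        NextSlot limitToTwo = count -> {
            if (slot.pass.sum() + count > 2) {
                throw new BlockedException("limit exceeded");
            }
        };
        for (int i = 0; i < 3; i++) {
            try {
                slot.entry(limitToTwo, 1);
            } catch (BlockedException e) {
                // the third call is rejected by the toy rule
            }
        }
        System.out.println("pass=" + slot.pass.sum() + ", block=" + slot.block.sum()); // pass=2, block=1
    }
}
```

Because the counting happens after the downstream call returns or throws, the statistics always reflect the final verdict of the rule slots.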
The methods marked @1, @2 and @3 in the source are all backed by two key fields of StatisticNode:
/**
* Defaults: sample count 2, interval 1000 ms (two windows: 0~499 and 500~999).
* Holds statistics of the recent {@code INTERVAL} seconds. The {@code INTERVAL} is divided into time spans
* by given {@code sampleCount}.
*/
private transient volatile Metric rollingCounterInSecond = new ArrayMetric(SampleCountProperty.SAMPLE_COUNT,
IntervalProperty.INTERVAL);
/**
* Defaults: sample count 60, interval 60 * 1000 ms (60 windows: 0~999, 1000~1999, ...).
* Holds statistics of the recent 60 seconds. The windowLengthInMs is deliberately set to 1000 milliseconds,
* meaning each bucket per second, in this way we can get accurate statistics of each second.
*/
private transient Metric rollingCounterInMinute = new ArrayMetric(60, 60 * 1000, false);
rollingCounterInSecond and rollingCounterInMinute can be thought of as the second-level and minute-level rolling counters, respectively.
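The window sizes quoted in the comments above follow from simple arithmetic, assuming the length of one window is the interval divided by the sample count. The helper below is a hypothetical illustration (`WindowLengthSketch` is not a Sentinel class) that reproduces the two default configurations.

```java
/** Hypothetical helper showing how the per-window length follows from the two parameters. */
final class WindowLengthSketch {

    /** Length of one window in ms, assuming interval / sampleCount with an even split. */
    static int windowLengthInMs(int sampleCount, int intervalInMs) {
        if (sampleCount <= 0 || intervalInMs <= 0 || intervalInMs % sampleCount != 0) {
            throw new IllegalArgumentException("interval must be positive and evenly divisible");
        }
        return intervalInMs / sampleCount;
    }

    public static void main(String[] args) {
        // Second-level counter: 2 samples over 1000 ms.
        System.out.println(windowLengthInMs(2, 1000));       // 500
        // Minute-level counter: 60 samples over 60 * 1000 ms.
        System.out.println(windowLengthInMs(60, 60 * 1000)); // 1000
    }
}
```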
II. Analyzing the rolling counter
Take the second-level rolling counter as an example:
private transient volatile Metric rollingCounterInSecond = new ArrayMetric(SampleCountProperty.SAMPLE_COUNT,
IntervalProperty.INTERVAL);
As the code shows, rollingCounterInSecond is an instance of ArrayMetric.
private final LeapArray<MetricBucket> data;
public ArrayMetric(int sampleCount, int intervalInMs) {
this.data = new OccupiableBucketLeapArray(sampleCount, intervalInMs);
}
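The statistics container is a LeapArray, and each piece of data is carried by a MetricBucket instance.
1. LeapArray fields
protected int windowLengthInMs; // length of a single window, in ms
protected int sampleCount; // number of samples (windows)
protected int intervalInMs; // total interval covered, in ms
// array of sampled time windows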
protected final AtomicReferenceArray<WindowWrap<T>> array;
2. LeapArray constructor
```java
public LeapArray(int sampleCount, int intervalInMs) {
    AssertUtil.isTrue(sampleCount > 0, "bucket count is invalid: " + sampleCount);
    AssertUtil.isTrue(intervalInMs > 0, "total time interval of the sliding window should be positive");
    AssertUtil.isTrue(intervalInMs % sampleCount == 0, "time span needs to be evenly divided");
    this.windowLengthInMs = intervalInMs / sampleCount;
    this.intervalInMs = intervalInMs;
    this.sampleCount = sampleCount;
    this.array = new AtomicReferenceArray<>(sampleCount);
}
```
To summarize what the code has shown so far:
The second-level rolling counter rollingCounterInSecond collects its statistics in an AtomicReferenceArray of size sampleCount that holds WindowWrap elements (WindowWrap is in fact a wrapper around MetricBucket).
Let's take a quick look at the WindowWrap wrapper class (the window wrapper):
/**
* Time length of a single window bucket in milliseconds.
*/
private final long windowLengthInMs;
/**
* Start timestamp of the window in milliseconds.
*/
private long windowStart;
/**
* Statistic data (MetricBucket by default).
*/
private T value;
/**
* @param windowLengthInMs a single window bucket's time length in milliseconds.
* @param windowStart the start timestamp of the window
* @param value statistic data
*/
public WindowWrap(long windowLengthInMs, long windowStart, T value) {
this.windowLengthInMs = windowLengthInMs;
this.windowStart = windowStart;
this.value = value;
}
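As we can see, this is a wrapper class; the T value here can be regarded as a MetricBucket.
Combined with the key fields of LeapArray, this tells us that rollingCounterInSecond counts with a sliding window.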
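To make the role of the wrapper concrete, here is a minimal sketch of a window object that knows which time range it covers. `WindowSketch` is a hypothetical simplification written for this article; the half-open range check is an assumption about how a window's span is interpreted, not a quotation of Sentinel's code.

```java
/** Hypothetical, simplified window wrapper: a start time, a fixed length, and a payload. */
final class WindowSketch<T> {
    private final long windowLengthInMs;
    private final long windowStart;
    private final T value;

    WindowSketch(long windowLengthInMs, long windowStart, T value) {
        this.windowLengthInMs = windowLengthInMs;
        this.windowStart = windowStart;
        this.value = value;
    }

    T value() {
        return value;
    }

    /** True if timeMillis lies in [windowStart, windowStart + windowLengthInMs). */
    boolean covers(long timeMillis) {
        return windowStart <= timeMillis && timeMillis < windowStart + windowLengthInMs;
    }

    public static void main(String[] args) {
        // A 500 ms window starting at t = 500 covers 500..999.
        WindowSketch<String> window = new WindowSketch<>(500, 500, "bucket");
        System.out.println(window.covers(888));  // true
        System.out.println(window.covers(1000)); // false
    }
}
```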
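III. How sliding-window counting works
[Figure: call graph of the statistics collection, using addPass(count) as an example]
As analyzed above, rollingCounterInSecond uses its LeapArray field data to collect statistics: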
WindowWrap<MetricBucket> wrap = data.currentWindow(); //@1
public WindowWrap<T> currentWindow() {
// Get (or create) the window for the current time in the window array
return currentWindow(TimeUtil.currentTimeMillis()); //@2
}
public WindowWrap<T> currentWindow(long timeMillis) {
if (timeMillis < 0) {
return null;
}
// Determine which window the current time falls into
int idx = calculateTimeIdx(timeMillis); //@3
// Calculate current bucket start time.
long windowStart = calculateWindowStart(timeMillis); //@4
/*
* Get bucket item at given time from the array.
*
* (1) Bucket is absent, then just create a new bucket and CAS update to circular array.
* (2) Bucket is up-to-date, then just return the bucket.
* (3) Bucket is deprecated, then reset current bucket and clean all deprecated buckets.
*/
while (true) {
// Get the existing window stored at this index
WindowWrap<T> old = array.get(idx);// @5
if (old == null) {
/*
* B0 B1 B2 NULL B4
* ||_______|_______|_______|_______|_______||___
* 200 400 600 800 1000 1200 timestamp
* ^
* time=888
* bucket is empty, so create new and update
*
* If the old bucket is absent, then we create a new bucket at {@code windowStart},
* then try to update circular array via a CAS operation. Only one thread can
* succeed to update, while other threads yield its time slice.
*/
WindowWrap<T> window = new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
// Install the new window via CAS
if (array.compareAndSet(idx, null, window)) {
// Successfully updated, return the created bucket.
return window;
} else {
// Contention failed, the thread will yield its time slice to wait for bucket available.
Thread.yield();
}
// If the stored window's start time equals the computed start time,
// it is the window we are looking for, so return it directly
} else if (windowStart == old.windowStart()) {
/*
* B0 B1 B2 B3 B4
* ||_______|_______|_______|_______|_______||___
* 200 400 600 800 1000 1200 timestamp
* ^
* time=888
* startTime of Bucket 3: 800, so it's up-to-date
*
* If current {@code windowStart} is equal to the start timestamp of old bucket,
* that means the time is within the bucket, so directly return the bucket.
*/
return old;
} else if (windowStart > old.windowStart()) {
/*
* (old)
* B0 B1 B2 NULL B4
* |_______||_______|_______|_______|_______|_______||___
* ... 1200 1400 1600 1800 2000 2200 timestamp
* ^
* time=1676
* startTime of Bucket 2: 400, deprecated, should be reset
*
* If the start timestamp of old bucket is behind provided time, that means
* the bucket is deprecated. We have to reset the bucket to current {@code windowStart}.
* Note that the reset and clean-up operations are hard to be atomic,
* so we need a update lock to guarantee the correctness of bucket update.
*
* The update lock is conditional (tiny scope) and will take effect only when
* bucket is deprecated, so in most cases it won't lead to performance loss.
*/
if (updateLock.tryLock()) {
try {
// The computed start time is later than the old start time, so reset the window to the new start time
// Successfully get the update lock, now we reset the bucket.
return resetWindowTo(old, windowStart);//@6
} finally {
updateLock.unlock();
}
} else {
// Contention failed, the thread will yield its time slice to wait for bucket available.
Thread.yield();
}
} else if (windowStart < old.windowStart()) {
// Should not go through here, as the provided time is already behind.
return new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
}
}
}
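@1 Get the current window from the data stored in the LeapArray
@2 Get the window for the current time
@3 Compute the array position of the window that the current time belongs to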
/**
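* calculateTimeIdx divides the current timestamp by the window length
* (windowLengthInMs = intervalInMs / sampleCount)
* and takes the result modulo the length of array. The array holds sampleCount windows;
* for the minute-level counter, for example, it has 60 slots, one per second of the 60-second interval.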
* @param timeMillis
* @return
*/
private int calculateTimeIdx(/*@Valid*/ long timeMillis) {
long timeId = timeMillis / windowLengthInMs;
// Calculate current index so we can map the timestamp to the leap array.
return (int)(timeId % array.length());
}
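A few concrete timestamps show how the index wraps around. The sketch below reimplements the same arithmetic as a standalone static method; `TimeIdxSketch` and its explicit parameters are hypothetical, and the values assume the second-level defaults quoted earlier (two windows of 500 ms).

```java
/** Hypothetical standalone version of the index arithmetic, with the parameters made explicit. */
final class TimeIdxSketch {

    static int calculateTimeIdx(long timeMillis, int windowLengthInMs, int sampleCount) {
        long timeId = timeMillis / windowLengthInMs; // how many whole windows have elapsed
        return (int) (timeId % sampleCount);         // wrap around the circular array
    }

    public static void main(String[] args) {
        // Two windows of 500 ms each.
        System.out.println(calculateTimeIdx(100, 500, 2));  // 0  (timeId 0)
        System.out.println(calculateTimeIdx(888, 500, 2));  // 1  (timeId 1)
        System.out.println(calculateTimeIdx(1200, 500, 2)); // 0  (timeId 2 wraps to slot 0)
        System.out.println(calculateTimeIdx(1676, 500, 2)); // 1  (timeId 3 wraps to slot 1)
    }
}
```

Timestamps 100 and 1200 map to the same slot even though they belong to different windows, which is why the start time also has to be compared.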
@4 Compute the actual start time of the window
/**
* The current time minus (the current time modulo the window length).
* @param timeMillis
* @return
*/
protected long calculateWindowStart(/*@Valid*/ long timeMillis) {
return timeMillis - timeMillis % windowLengthInMs;
}
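The same formula can be checked with a few numbers. This is a standalone sketch (`WindowStartSketch` is a hypothetical name) that assumes a window length of 500 ms.

```java
/** Hypothetical standalone version of the window-start arithmetic. */
final class WindowStartSketch {

    static long calculateWindowStart(long timeMillis, int windowLengthInMs) {
        // Round the timestamp down to the nearest multiple of the window length.
        return timeMillis - timeMillis % windowLengthInMs;
    }

    public static void main(String[] args) {
        System.out.println(calculateWindowStart(888, 500));  // 500
        System.out.println(calculateWindowStart(999, 500));  // 500  (same window as 888)
        System.out.println(calculateWindowStart(1000, 500)); // 1000 (first ms of the next window)
        System.out.println(calculateWindowStart(1676, 500)); // 1500
    }
}
```

Every timestamp inside one window yields the same start time, so the start time works as the window's identity when it is compared with the value stored in the array.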
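@5 Get the window stored at that position
If no window exists at that position, create one, install it into the array via CAS, and return the new window
@6 The current window start time is later than the old window's start time, so the window slides forward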
@Override
protected WindowWrap<MetricBucket> resetWindowTo(WindowWrap<MetricBucket> w, long startTime) {
// Update the start time and reset value.
w.resetTo(startTime);
w.value().reset();
return w;
}
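Putting the pieces together, the following is a minimal sliding-window counter that follows the same three cases (create, reuse, reset). It is a sketch under stated assumptions, not Sentinel's implementation: `MiniLeapArray`, `Window` and `passInInterval` are hypothetical names, timestamps are passed in explicitly so the behavior is deterministic, and the rule used in `passInInterval` to skip stale windows is a simplification chosen for this example.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical, simplified sliding-window counter; names are illustrative only. */
final class MiniLeapArray {

    /** One bucket: its start time plus a pass counter. */
    static final class Window {
        volatile long start;
        final LongAdder pass = new LongAdder();

        Window(long start) {
            this.start = start;
        }
    }

    private final int windowLengthInMs;
    private final int intervalInMs;
    private final AtomicReferenceArray<Window> array;
    private final ReentrantLock updateLock = new ReentrantLock();

    MiniLeapArray(int sampleCount, int intervalInMs) {
        this.windowLengthInMs = intervalInMs / sampleCount;
        this.intervalInMs = intervalInMs;
        this.array = new AtomicReferenceArray<>(sampleCount);
    }

    Window currentWindow(long timeMillis) {
        int idx = (int) ((timeMillis / windowLengthInMs) % array.length());
        long windowStart = timeMillis - timeMillis % windowLengthInMs;
        while (true) {
            Window old = array.get(idx);
            if (old == null) {
                // Case 1: slot is empty, install a fresh window via CAS.
                Window created = new Window(windowStart);
                if (array.compareAndSet(idx, null, created)) {
                    return created;
                }
                Thread.yield();
            } else if (windowStart == old.start) {
                // Case 2: the stored window is the current one.
                return old;
            } else if (windowStart > old.start) {
                // Case 3: the stored window is stale, reset it under a lock.
                if (updateLock.tryLock()) {
                    try {
                        old.pass.reset();
                        old.start = windowStart;
                        return old;
                    } finally {
                        updateLock.unlock();
                    }
                }
                Thread.yield();
            } else {
                // Requested time is older than the stored window: hand back a detached window.
                return new Window(windowStart);
            }
        }
    }

    void addPass(long timeMillis, int count) {
        currentWindow(timeMillis).pass.add(count);
    }

    /** Sums the windows that started less than one interval before timeMillis (simplified rule). */
    long passInInterval(long timeMillis) {
        long sum = 0;
        for (int i = 0; i < array.length(); i++) {
            Window w = array.get(i);
            if (w != null && timeMillis - w.start < intervalInMs) {
                sum += w.pass.sum();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        MiniLeapArray counter = new MiniLeapArray(2, 1000);
        counter.addPass(100, 1);                          // window [0, 500), slot 0
        counter.addPass(600, 2);                          // window [500, 1000), slot 1
        System.out.println(counter.passInInterval(900));  // 3
        counter.addPass(1200, 5);                         // slot 0 is reset to window [1000, 1500)
        System.out.println(counter.passInInterval(1300)); // 7
        System.out.println(counter.passInInterval(1600)); // 5
    }
}
```

Reusing and resetting the stale window object instead of allocating a new one keeps the array at a fixed size; the lock is only contended at the moment a slot rolls over to a new window.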