How one development question led me to read the alibaba-sentinel source code (notes on the process, part 2)

明天一定.

Category: source code. Tags: sentinel, java

Article link: https://blog.csdn.net/wai_58934/article/details/127339216

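Part 1: How one development question led me to read the alibaba-sentinel source code (notes on the process, part 1), on 明天一定.'s CSDN blog

Picking up where part 1 left off...

No way, it's you again, FlowRuleManager.loadRules(flowRules); I'm happy twice over (of course I'm happy: the fix is to do nothing at all, my doubt is resolved, and I can go back to coding in peace. I never believed a project with 20k GitHub stars would have such a basic bug, so it had to be my own thinking that was too basic).

But, and there is a but, I didn't want to just muddle through (and I'd like to earn more money per day), so I spent the weekend digging deeper!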

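Approach

At first I assumed that the rules and the recorded QPS counts were directly tied together, which is why I imagined that problem in the first place (that is, a second loadRules call overwriting the QPS recorded so far).

But since FlowRuleManager.loadRules(flowRules) can swap rules dynamically with no side effects, my initial idea was wrong (in hindsight it was pretty far-fetched). The truth is that rules and statistics are only associated with each other: loading rules replaces all previous rules at once and has nothing to do with the QPS currently stored.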
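
To make that separation concrete, here is a minimal sketch of the idea. It is not Sentinel's code: RuleSwapSketch, its maps, and its method names are all hypothetical stand-ins, and the "rules" are reduced to a list of integer thresholds per resource name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in, not Sentinel's classes: rules and counters live in separate maps.
class RuleSwapSketch {
    // resource name -> thresholds; the whole map is replaced on every load
    private volatile Map<String, List<Integer>> ruleMap = new HashMap<>();
    // resource name -> pass counter; loadRules never touches this map
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    void loadRules(Map<String, List<Integer>> newRules) {
        // Only the rule reference is swapped; statistics are left alone.
        ruleMap = new HashMap<>(newRules);
    }

    void recordPass(String resource) {
        counters.computeIfAbsent(resource, k -> new LongAdder()).increment();
    }

    long passCount(String resource) {
        LongAdder adder = counters.get(resource);
        return adder == null ? 0 : adder.sum();
    }

    List<Integer> rulesFor(String resource) {
        return ruleMap.get(resource);
    }
}
```

Recording two passes for "a", then loading a new rule set, leaves passCount("a") at 2 while rulesFor("a") returns the new thresholds.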

As usual, basic usage first (the rule-loading code is in part 1)

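        // Initialize to null so the finally block compiles and can tell whether entry succeeded.
        Entry entry = null;
        try {
            entry = SphU.entry("a");
            // Protected business logic goes here.
        } catch (BlockException e) {
            log.error("err");
        } finally {
            if (entry != null) {
                entry.exit();
            }
        }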

Starting from SphU.entry

Step into the method and keep following the calls until you land in CtSph::entryWithPriority

    private Entry entryWithPriority(ResourceWrapper resourceWrapper, int count, boolean prioritized, Object... args)
        throws BlockException {
        Context context = ContextUtil.getContext();
        if (context instanceof NullContext) {
            // The {@link NullContext} indicates that the amount of context has exceeded the threshold,
            // so here init the entry only. No rule checking will be done.
            return new CtEntry(resourceWrapper, null, context);
        }

        if (context == null) {
            // Using default context.
            context = InternalContextUtil.internalEnter(Constants.CONTEXT_DEFAULT_NAME);
        }

        // Global switch is close, no rule checking will do.
        if (!Constants.ON) {
            return new CtEntry(resourceWrapper, null, context);
        }

        ProcessorSlot<Object> chain = lookProcessChain(resourceWrapper);

        /*
         * Means amount of resources (slot chain) exceeds {@link Constants.MAX_SLOT_CHAIN_SIZE},
         * so no rule checking will be done.
         */
        if (chain == null) {
            return new CtEntry(resourceWrapper, null, context);
        }

        Entry e = new CtEntry(resourceWrapper, chain, context);
        try {
            chain.entry(context, resourceWrapper, null, count, prioritized, args);
        } catch (BlockException e1) {
            e.exit(count, args);
            throw e1;
        } catch (Throwable e1) {
            // This should not happen, unless there are errors existing in Sentinel internal.
            RecordLog.info("Sentinel unexpected exception", e1);
        }
        return e;
    }

Apart from the checks for special cases, all that remains is

// Get the call-chain context
1. Context context = ContextUtil.getContext();
// Get the chain
2. ProcessorSlot<Object> chain = lookProcessChain(resourceWrapper);
// Build the CtEntry object from the resource, the processor chain, and the context.
3. Entry e = new CtEntry(resourceWrapper, chain, context);
// Invoke entry on the chain
4. chain.entry(context, resourceWrapper, null, count, prioritized, args);

We focus on 2 and 4.

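Analysis of 2: finding the resource's processing chain

Let's look at 2 first. The word chain immediately brings to mind the Chain of Responsibility design pattern

Chain of Responsibility pattern

As the name suggests, the Chain of Responsibility pattern creates a chain of receiver objects for a request. Based on the type of the request, it decouples the sender of the request from its receivers. It is a behavioral design pattern.

In this pattern, each receiver usually holds a reference to the next receiver. If one object cannot handle the request, it passes the same request on to the next receiver, and so on.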
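
The description above can be sketched in a few lines. This is the textbook form of the pattern, not anything from Sentinel; ApprovalHandler, LimitHandler, and the approval limits are hypothetical names and values chosen only for illustration.

```java
// Textbook Chain of Responsibility: each handler either handles the request or forwards it.
abstract class ApprovalHandler {
    private ApprovalHandler next;

    // Link the next receiver and return it so calls can be chained.
    ApprovalHandler linkWith(ApprovalHandler next) {
        this.next = next;
        return next;
    }

    String handle(int amount) {
        if (canHandle(amount)) {
            return name();
        }
        // Cannot handle it: pass the same request to the next receiver, if any.
        return next == null ? "unhandled" : next.handle(amount);
    }

    abstract boolean canHandle(int amount);

    abstract String name();
}

class LimitHandler extends ApprovalHandler {
    private final String name;
    private final int limit;

    LimitHandler(String name, int limit) {
        this.name = name;
        this.limit = limit;
    }

    @Override
    boolean canHandle(int amount) {
        return amount <= limit;
    }

    @Override
    String name() {
        return name;
    }
}

class ChainOfResponsibilitySketch {
    static ApprovalHandler buildChain() {
        ApprovalHandler head = new LimitHandler("clerk", 100);
        head.linkWith(new LimitHandler("manager", 1000))
            .linkWith(new LimitHandler("director", 10000));
        return head;
    }
}
```

The sender only ever talks to the head of the chain and does not know which receiver ends up handling the request.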

Here is the code for step 2, lookProcessChain().

ProcessorSlot<Object> lookProcessChain(ResourceWrapper resourceWrapper) {
        // Look up the slot chain for the requested resource in the map
        ProcessorSlotChain chain = chainMap.get(resourceWrapper);
        // double check
        if (chain == null) {
            synchronized (LOCK) {
                chain = chainMap.get(resourceWrapper);
                if (chain == null) {
                    // Entry size limit.
                    if (chainMap.size() >= Constants.MAX_SLOT_CHAIN_SIZE) {
                        return null;
                    }
                    // No chain exists yet, so create one and add it to chainMap
                    // One chain is tied to one resource; this matters later when analyzing the node structure
                    chain = SlotChainProvider.newSlotChain();
                    Map<ResourceWrapper, ProcessorSlotChain> newMap = new HashMap<ResourceWrapper, ProcessorSlotChain>(
                        chainMap.size() + 1);
                    newMap.putAll(chainMap);
                    newMap.put(resourceWrapper, chain);
                    chainMap = newMap;
                }
            }
        }
        return chain;
    }
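
The lookup above combines double-checked locking with a copy-on-write map. Below is a minimal generic sketch of that combination under my own assumptions: CopyOnWriteCacheSketch and its members are hypothetical names, and I assume the map field is volatile so that readers outside the lock always see a fully built map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: lock-free reads, serialized writes, and a size cap.
class CopyOnWriteCacheSketch<K, V> {
    private final Object lock = new Object();
    private final int maxSize;
    private final Supplier<V> factory;
    // A published map is never mutated; writers build a copy and swap the reference.
    private volatile Map<K, V> map = new HashMap<>();

    CopyOnWriteCacheSketch(int maxSize, Supplier<V> factory) {
        this.maxSize = maxSize;
        this.factory = factory;
    }

    V lookup(K key) {
        V value = map.get(key);            // first check, without the lock
        if (value == null) {
            synchronized (lock) {
                value = map.get(key);      // second check, under the lock
                if (value == null) {
                    if (map.size() >= maxSize) {
                        return null;       // cap reached: the caller must cope with null
                    }
                    value = factory.get();
                    Map<K, V> newMap = new HashMap<>(map);
                    newMap.put(key, value);
                    map = newMap;          // publish the new map in one step
                }
            }
        }
        return value;
    }
}
```

The design trades a full map copy on every insert for reads that never block, which fits a case where new keys (resources) appear rarely and lookups happen on every request.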

SlotChainProvider.newSlotChain();

public static ProcessorSlotChain newSlotChain() {
        if (builder != null) {
            return builder.build();
        }
        // Check for an SPI extension; if there is none, fall back to builder = new DefaultSlotChainBuilder();
        resolveSlotChainBuilder();

        if (builder == null) {
            RecordLog.warn("[SlotChainProvider] Wrong state when resolving slot chain builder, using default");
            builder = new DefaultSlotChainBuilder();
        }
        return builder.build();
    }

builder.build() shows how many links the default slot chain is built with; each link is a slot

public ProcessorSlotChain build() {
        ProcessorSlotChain chain = new DefaultProcessorSlotChain();
        chain.addLast(new NodeSelectorSlot());
        chain.addLast(new ClusterBuilderSlot());
        chain.addLast(new LogSlot());
        chain.addLast(new StatisticSlot());
        chain.addLast(new SystemSlot());
        chain.addLast(new AuthoritySlot());
        chain.addLast(new FlowSlot());
        chain.addLast(new DegradeSlot());

        return chain;
    }

Analysis of 4

From analysis 2 we know this chain is a DefaultProcessorSlotChain. Step into its entry method

public void entry(Context context, ResourceWrapper resourceWrapper, Object t, int count, boolean prioritized, Object... args) throws Throwable {
        this.first.transformEntry(context, resourceWrapper, t, count, prioritized, args);
    }

Keep stepping in from transformEntry. The forwarding to the next slot is done by fireEntry

public void fireEntry(Context context, ResourceWrapper resourceWrapper, Object obj, int count, boolean prioritized, Object... args)
        throws Throwable {
        if (next != null) {
            next.transformEntry(context, resourceWrapper, obj, count, prioritized, args);
        }
    }

Next comes

public void entry(Context context, ResourceWrapper resourceWrapper, Object t, int count, boolean prioritized, Object... args) throws Throwable {
            super.fireEntry(context, resourceWrapper, t, count, prioritized, args);
        }

Then

public void fireEntry(Context context, ResourceWrapper resourceWrapper, Object obj, int count, boolean prioritized, Object... args)
        throws Throwable {
        if (next != null) {
            next.transformEntry(context, resourceWrapper, obj, count, prioritized, args);
        }
    }

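which then calls entry again, this time on the next slot

From analysis 2, slots were added in order starting with NodeSelectorSlot, so that is the first next... I won't click through every one of them. The gist is that every slot in the chain gets executed.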
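
The entry / fireEntry hand-off is easier to see in a stripped-down form. The sketch below is my own simplification, not Sentinel's classes: SketchSlot, NamedSlot, and SketchSlotChain are hypothetical, the context and resource arguments are dropped, and each slot just records its name in a trace list. Here each slot does its work before firing the next one; a real slot is free to do work after the call returns as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified slot: does its own work in entry, forwards via fireEntry.
abstract class SketchSlot {
    SketchSlot next;

    abstract void entry(List<String> trace);

    void fireEntry(List<String> trace) {
        if (next != null) {
            next.entry(trace);
        }
    }
}

class NamedSlot extends SketchSlot {
    private final String name;

    NamedSlot(String name) {
        this.name = name;
    }

    @Override
    void entry(List<String> trace) {
        trace.add(name);   // this slot's own work
        fireEntry(trace);  // then hand over to the next slot
    }
}

class SketchSlotChain {
    // Placeholder head that only forwards, so addLast never has to special-case an empty chain.
    private final SketchSlot first = new SketchSlot() {
        @Override
        void entry(List<String> trace) {
            fireEntry(trace);
        }
    };
    private SketchSlot end = first;

    void addLast(SketchSlot slot) {
        end.next = slot;
        end = slot;
    }

    List<String> entry() {
        List<String> trace = new ArrayList<>();
        first.entry(trace);
        return trace;
    }
}
```

Unlike the textbook pattern, where a receiver that can handle the request stops the chain, every slot here runs in the order it was added unless one of them throws.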

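NodeSelectorSlot: building the node tree

Each context first creates an EntranceNode and hangs it under Constants.ROOT

From this you can also see that each resource has its own node, so even if you reload the rules, the counts held in those node instances are not affected. Mystery solved!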

public void entry(Context context, ResourceWrapper resourceWrapper, Object obj, int count, boolean prioritized, Object... args)
        throws Throwable {
        // Note: the lookup key is the context name
        DefaultNode node = map.get(context.getName());
        // If there is no node yet, create it with a double check and update the map
        if (node == null) {
            synchronized (this) {
                node = map.get(context.getName());
                if (node == null) {
                    node = Env.nodeBuilder.buildTreeNode(resourceWrapper, null);
                    HashMap<String, DefaultNode> cacheMap = new HashMap<String, DefaultNode>(map.size());
                    cacheMap.putAll(map);
                    cacheMap.put(context.getName(), node);
                    map = cacheMap;
                }
                // Build invocation tree
                ((DefaultNode)context.getLastNode()).addChild(node);
            }
        }

        context.setCurNode(node);
        fireEntry(context, resourceWrapper, node, count, prioritized, args);
    }

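The structure is as follows

ClusterBuilderSlot

The same resource shares a single ClusterBuilderSlot (see the figure above); I won't paste the code

StatisticSlot: counting

Main method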

node.addPassRequest(count);

public void addPassRequest(int count) {
        // Record the request count for the current call path
        super.addPassRequest(count);
        // Also record it on the cluster node shared with the other call paths
        this.clusterNode.addPassRequest(count);
    }

Step into the super method

public void addPassRequest(int count) {
        rollingCounterInSecond.addPass(count);
        rollingCounterInMinute.addPass(count);
    }
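
The two-level counting above (one node per call path plus one shared node per resource) can be sketched like this. SketchStatNode and SketchContextNode are hypothetical names, and a plain LongAdder stands in for the rolling counters.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical base node: just accumulates passed requests.
class SketchStatNode {
    private final LongAdder pass = new LongAdder();

    void addPassRequest(int count) {
        pass.add(count);
    }

    long totalPass() {
        return pass.sum();
    }
}

// Hypothetical per-context node: counts for itself and for the shared cluster node.
class SketchContextNode extends SketchStatNode {
    private final SketchStatNode clusterNode;

    SketchContextNode(SketchStatNode clusterNode) {
        this.clusterNode = clusterNode;
    }

    @Override
    void addPassRequest(int count) {
        super.addPassRequest(count);        // this call path only
        clusterNode.addPassRequest(count);  // total for the resource across call paths
    }
}
```

With two context nodes sharing one cluster node, adding 2 on the first and 3 on the second leaves the cluster node at 5.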

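This is where the counting happens; the details are analyzed after the FlowSlot section

FlowSlot

The first thing it does on entry is a check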

checkFlow(resourceWrapper, context, node, count, prioritized);

void checkFlow(ResourceWrapper resource, Context context, DefaultNode node, int count, boolean prioritized) throws BlockException {
        // Flow rule map cannot be null.
        Map<String, List<FlowRule>> flowRules = FlowRuleManager.getFlowRuleMap();

        List<FlowRule> rules = flowRules.get(resource.getName());
        if (rules != null) {
            for (FlowRule rule : rules) {
                // Decide whether the request may pass
                if (!canPassCheck(rule, context, node, count, prioritized)) {
                    throw new FlowException(rule.getLimitApp());
                }
            }
        }
    }

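The familiar flowRuleMap shows up in this code. Remember that loadRules put the rules into this map, which confirms once more that updating the map does not affect the QPS recorded under the earlier rules.

Following canPassCheck further down leads to the passCheck method of FlowRuleChecker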

static boolean passCheck(/*@NonNull*/ FlowRule rule, Context context, DefaultNode node, int acquireCount,
                                          boolean prioritized) {
        // Get the origin (caller) that this rule applies to
        String limitApp = rule.getLimitApp();
        if (limitApp == null) {
            return true;
        }
        // Is the rule in cluster mode?
        if (rule.isClusterMode()) {
            return passClusterCheck(rule, context, node, acquireCount, prioritized);
        }
        
        return passLocalCheck(rule, context, node, acquireCount, prioritized);
    }

By default the rule is not in cluster mode, so we end up in passLocalCheck

private static boolean passLocalCheck(FlowRule rule, Context context, DefaultNode node, int acquireCount,
                                          boolean prioritized) {
        // Select the node according to the requester and the strategy
        Node selectedNode = selectNodeByRequesterAndStrategy(rule, context, node);
        if (selectedNode == null) {
            return true;
        }

        return rule.getRater().canPass(selectedNode, acquireCount);
    }

Once the node is selected, the actual check begins. Step into canPass (we mainly care about the sliding window here, so go into DefaultController::canPass)

public boolean canPass(Node node, int acquireCount, boolean prioritized) {
        // Compute the tokens already in use (the key step)
        int curCount = avgUsedTokens(node);
        if (curCount + acquireCount > count) {
            return false;
        }

        return true;
    }

Step in

private int avgUsedTokens(Node node) {
        if (node == null) {
            return -1;
        }
        // For concurrency limiting use the current thread count; otherwise use the passed QPS (rate limiting)
        return grade == RuleConstant.FLOW_GRADE_THREAD ? node.curThreadNum() : (int)node.passQps();
    }
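
Put together, the two methods above boil down to a single comparison. Here is a minimal sketch of it as a pure function, under my own assumptions: ThresholdCheckSketch, its constants, and its parameter names are hypothetical, and the node is replaced by two plain numbers.

```java
// Hypothetical pure-function version of the default check.
class ThresholdCheckSketch {
    static final int GRADE_THREAD = 0; // limit by concurrent threads (hypothetical constant)
    static final int GRADE_QPS = 1;    // limit by passed QPS (hypothetical constant)

    // Pick the "used" figure that matches the rule's grade.
    static long usedTokens(int grade, int currentThreads, double passQps) {
        return grade == GRADE_THREAD ? currentThreads : (long) passQps;
    }

    // A request for acquireCount tokens passes only if it does not push usage over the threshold.
    static boolean canPass(int grade, int currentThreads, double passQps,
                           int acquireCount, double threshold) {
        return usedTokens(grade, currentThreads, passQps) + acquireCount <= threshold;
    }
}
```

For a QPS rule with threshold 10, a request for one token passes while the measured QPS is 9 and is rejected once it has reached 10.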

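This is where the sliding time window algorithm comes in. node.passQps() returns the QPS that passed on the node, so we have to go back to the statistics that StatisticSlot records

Counting

StatisticNode::passQps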

public long passQps() {
        return rollingCounterInSecond.pass() / (long) rollingCounterInSecond.getWindowIntervalInSec();
    }

rollingCounterInSecond

private transient volatile Metric rollingCounterInSecond = new ArrayMetric(SampleCountProperty.SAMPLE_COUNT,
    IntervalProperty.INTERVAL);

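ArrayMetric: the data store, backed by an array. More on it below

Parameters:

intervalInMs is the length of the sliding time window. Default: 1000 ms
sampleCount is the number of samples, that is, how many parts the sliding window is split into. Default: 2

Back to the node counting triggered by StatisticSlot

Find rollingCounterInSecond.addPass(count); in StatisticNode::addPassRequest and step in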

public void addPass(int count) {
        // Locate the window for the current time
        WindowWrap<MetricBucket> wrap = data.currentWindow();
        // Add the count to this window
        wrap.value().addPass(count);
    }

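The sliding window looks roughly like this

But since time never ends, the ArrayMetric data store is designed as a circular array.

Something like this (only the rough shape, not the exact division)

By default a 1000 ms window is split into two parts, so only the current index and its neighbor in the array need to be looked at.

Let's go back to data.currentWindow() in the code above and see how it works

The code is long, so I'll analyze it piece by piece instead of pasting the whole body of currentWindow().

First, the calculateTimeIdx method used inside currentWindow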

protected int calculateTimeIdx(/*@Valid*/ long timeMillis) {
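        // Divide the time by the bucket length, which is 500 ms with the defaults
        long timeId = timeMillis / windowLengthInMs;
        // That gives the bucket number; taking it modulo the array length maps every bucket onto a slot of the circular array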
        // Calculate current index so we can map the timestamp to the leap array.
        return (int)(timeId % array.length());
    }

Next, how the start time of that slot is computed, calculateWindowStart(timeMillis)

protected long calculateWindowStart(/*@Valid*/ long timeMillis) {
        return timeMillis - timeMillis % windowLengthInMs;
    }

Take 1300 ms as an example: 1300 modulo 500 is 300, and 1300 - 300 is 1000, so the slot that contains 1300 starts at 1000. That's the whole calculation.
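
The same arithmetic, written out as two small functions so the numbers can be checked. WindowMathSketch is a hypothetical helper; the bucket length (500 ms) and array length (2) are the defaults mentioned earlier and are passed in explicitly.

```java
// Hypothetical helper reproducing the two calculations with explicit parameters.
class WindowMathSketch {
    // Which slot of the circular array a timestamp maps to.
    static int timeIdx(long timeMillis, int windowLengthInMs, int arrayLength) {
        long timeId = timeMillis / windowLengthInMs;
        return (int) (timeId % arrayLength);
    }

    // Start time of the bucket that contains the timestamp.
    static long windowStart(long timeMillis, int windowLengthInMs) {
        return timeMillis - timeMillis % windowLengthInMs;
    }
}
```

For 1300 ms this gives timeId 2, index 0, and window start 1000. For 1700 ms it gives timeId 3, index 1, and window start 1500. At 2100 ms the index wraps back to 0 with window start 2000, which is exactly the case where the old bucket in slot 0 has to be reset.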

Next comes deciding which window the current time falls into.

/*
         * Get bucket item at given time from the array.
         *
         * (1) Bucket is absent, then just create a new bucket and CAS update to circular array.
         * (2) Bucket is up-to-date, then just return the bucket.
         * (3) Bucket is deprecated, then reset current bucket and clean all deprecated buckets.
         */
        while (true) {
            WindowWrap<T> old = array.get(idx);
            if (old == null) {
                /*
                 *     B0       B1      B2    NULL      B4
                 * ||_______|_______|_______|_______|_______||___
                 * 200     400     600     800     1000    1200  timestamp
                 *                             ^
                 *                          time=888
                 *            bucket is empty, so create new and update
                 *
                 * If the old bucket is absent, then we create a new bucket at {@code windowStart},
                 * then try to update circular array via a CAS operation. Only one thread can
                 * succeed to update, while other threads yield its time slice.
                 */
                WindowWrap<T> window = new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket());
                if (array.compareAndSet(idx, null, window)) {
                    // Successfully updated, return the created bucket.
                    return window;
                } else {
                    // Contention failed, the thread will yield its time slice to wait for bucket available.
                    Thread.yield();
                }
            } else if (windowStart == old.windowStart()) {
                /*
                 *     B0       B1      B2     B3      B4
                 * ||_______|_______|_______|_______|_______||___
                 * 200     400     600     800     1000    1200  timestamp
                 *                             ^
                 *                          time=888
                 *            startTime of Bucket 3: 800, so it's up-to-date
                 *
                 * If current {@code windowStart} is equal to the start timestamp of old bucket,
                 * that means the time is within the bucket, so directly return the bucket.
                 */
                return old;
            } else if (windowStart > old.windowStart()) {
                /*
                 *   (old)
                 *             B0       B1      B2    NULL      B4
                 * |_______||_______|_______|_______|_______|_______||___
                 * ...    1200     1400    1600    1800    2000    2200  timestamp
                 *                              ^
                 *                           time=1676
                 *          startTime of Bucket 2: 400, deprecated, should be reset
                 *
                 * If the start timestamp of old bucket is behind provided time, that means
                 * the bucket is deprecated. We have to reset the bucket to current {@code windowStart}.
                 * Note that the reset and clean-up operations are hard to be atomic,
                 * so we need a update lock to guarantee the correctness of bucket update.
                 *
                 * The update lock is conditional (tiny scope) and will take effect only when
                 * bucket is deprecated, so in most cases it won't lead to performance loss.
                 */
                if (updateLock.tryLock()) {
                    try {
                        // Successfully get the update lock, now we reset the bucket.
                        return resetWindowTo(old, windowStart);
                    } finally {
                        updateLock.unlock();
                    }
                } else {
                    // Contention failed, the thread will yield its time slice to wait for bucket available.
                    Thread.yield();
                }
            } else if (windowStart < old.windowStart()) {
                // Should not go through here, as the provided time is already behind.
                return new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket());
            }
        }

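The comments spell it out clearly

If the bucket (the entity held by a window) does not exist, create a new bucket and install it with CAS
If the bucket is up to date, return the existing bucket directly
If the bucket is deprecated, reset it to the current window start and clean out the deprecated data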
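
Those three cases can be tried out with a single-threaded toy version. This is a deliberately simplified sketch, not the real LeapArray: MiniLeapArray and Window are hypothetical names, the time is passed in as a parameter, and the CAS, the update lock, and Thread.yield() are all left out.

```java
// Hypothetical single-threaded simplification of the circular bucket array.
class MiniLeapArray {
    static final class Window {
        long start;
        long pass;

        Window(long start) {
            this.start = start;
        }
    }

    private final int windowLengthInMs;
    private final Window[] array;

    MiniLeapArray(int sampleCount, int intervalInMs) {
        this.windowLengthInMs = intervalInMs / sampleCount;
        this.array = new Window[sampleCount];
    }

    Window currentWindow(long timeMillis) {
        int idx = (int) ((timeMillis / windowLengthInMs) % array.length);
        long windowStart = timeMillis - timeMillis % windowLengthInMs;
        Window old = array[idx];
        if (old == null) {
            // (1) Bucket is absent: create it and store it in the slot.
            array[idx] = new Window(windowStart);
            return array[idx];
        }
        if (windowStart == old.start) {
            // (2) Bucket is up to date: reuse it.
            return old;
        }
        if (windowStart > old.start) {
            // (3) Bucket is deprecated: reset the same object to the new window.
            old.start = windowStart;
            old.pass = 0;
            return old;
        }
        // The provided time is behind the stored bucket: hand back a throwaway bucket.
        return new Window(windowStart);
    }
}
```

With sampleCount 2 and a 1000 ms interval, calling currentWindow at 1300 and 1499 returns the same bucket, 1600 creates the second bucket, and 2100 lands back on the first slot and resets it.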

Finally, back to StatisticNode::passQps to see how the pass count is turned into QPS

rollingCounterInSecond.pass() / (long) rollingCounterInSecond.getWindowIntervalInSec();

Roughly, the pass count divided by the window length in seconds gives the QPS

Now the details of how the pass count is computed

public long pass() {
        // Refresh the current window (analyzed above)
        data.currentWindow();
        long pass = 0;
        // Get the buckets that are still valid (two with the defaults)
        List<MetricBucket> list = data.values();
        // Sum up the pass counts
        for (MetricBucket window : list) {
            pass += window.pass();
        }
        return pass;
    }
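
To close the loop, here is a small sketch of summing the valid buckets and dividing by the interval. It rests on an assumption of mine about what "valid" means: a bucket counts only if the current time is no more than intervalInMs past its window start. PassQpsSketch and Bucket are hypothetical names, and the current time is passed in as a parameter.

```java
import java.util.List;

// Hypothetical sketch: sum the buckets still inside the interval, then divide by seconds.
class PassQpsSketch {
    static final class Bucket {
        final long windowStart;
        final long pass;

        Bucket(long windowStart, long pass) {
            this.windowStart = windowStart;
            this.pass = pass;
        }
    }

    static long pass(List<Bucket> buckets, long now, int intervalInMs) {
        long sum = 0;
        for (Bucket bucket : buckets) {
            // Assumption: a bucket older than the whole interval is deprecated and skipped.
            if (bucket != null && now - bucket.windowStart <= intervalInMs) {
                sum += bucket.pass;
            }
        }
        return sum;
    }

    static double passQps(List<Bucket> buckets, long now, int intervalInMs) {
        return pass(buckets, now, intervalInMs) / (intervalInMs / 1000.0);
    }
}
```

With buckets starting at 1000 (4 passes) and 1500 (6 passes) and a 1000 ms interval, the pass count at 1700 ms is 10, so the QPS is 10. At 2100 ms the first bucket has aged out and only the 6 passes remain.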