Sentinel Rate-Limiting Metrics: A Source Code Analysis

tinysakurac

Article link: https://blog.csdn.net/m0_37556444/article/details/102782182

Introduction

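Sentinel can provide semaphore isolation through flow control in concurrent-thread-count mode. Combined with response-time-based circuit breaking, it can automatically degrade an unstable resource when its average response time is high. This keeps too many slow calls from exhausting the concurrency budget and dragging down the whole system.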

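Sentinel's flow control is built on counting, in the same spirit as a semaphore, so underneath it must maintain a set of flow-control metrics. The rest of this article follows a request into a Sentinel-protected resource at the source level. It shows how Sentinel records these metrics and how it compares them against the configured flow rules to decide whether to throttle the request.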
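Here is a minimal sketch of what concurrency (thread-count) limiting means: at most N calls may be in flight at once, much like a semaphore with N permits. It assumes a single in-process resource. `DemoThreadCountGate`, `tryEnter` and `exit` are hypothetical names for illustration, not Sentinel API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of thread-count (concurrency) limiting; not Sentinel's code.
final class DemoThreadCountGate {
    private final int maxConcurrent;                    // hypothetical threshold
    private final AtomicInteger current = new AtomicInteger();

    DemoThreadCountGate(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
    }

    /** Tries to admit one caller; returns false if the limit is already reached. */
    boolean tryEnter() {
        while (true) {
            int now = current.get();
            if (now >= maxConcurrent) {
                return false;                            // blocked: too many in-flight calls
            }
            if (current.compareAndSet(now, now + 1)) {
                return true;                             // admitted: one more in-flight call
            }
            // CAS lost to another thread: re-read and try again
        }
    }

    /** Must be called once per successful tryEnter(), typically in a finally block. */
    void exit() {
        current.decrementAndGet();
    }

    int inFlight() {
        return current.get();
    }
}
```

The CAS loop makes "check the limit" and "take a slot" a single atomic step. In the StatisticSlot#entry code shown later, the thread count is increased only after `fireEntry` returns, so the check and the increment happen at different points. Under heavy contention the limit can therefore be slightly exceeded.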

Official Architecture Diagram

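The official architecture diagram gives a rough picture of Sentinel's overall design. This post focuses on NodeSelectorSlot, StatisticSlot and ClusterBuilderSlot. It then uses FlowSlot to show how flow rules of different dimensions consume these statistics.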

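In Sentinel, every resource has a resource name (resourceName), and every invocation of a resource creates an Entry object. An Entry can be created automatically through adapters for mainstream frameworks, or explicitly through annotations or the SphU API. Creating an Entry also creates a chain of functional slots (the slot chain), each with its own responsibility, for example:
1. NodeSelectorSlot collects resource paths and stores the call paths of these resources as a tree, which is used for path-based flow control and degradation;
2. ClusterBuilderSlot stores a resource's statistics and caller information, such as its RT, QPS and thread count; this information is the basis for multi-dimensional flow control and degradation;
3. StatisticSlot records and aggregates runtime metrics in different dimensions;
4. FlowSlot performs flow control based on the preset flow rules and the state gathered by the preceding slots;
5. AuthoritySlot performs blacklist/whitelist control based on the configured lists and the caller's origin;
6. DegradeSlot performs circuit breaking based on statistics and the preset rules;
7. SystemSlot controls total inbound traffic based on the system state, such as load1.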
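The slot chain is a chain of responsibility: each slot does its own work and then hands the request to the next slot. The minimal sketch below shows that shape. A statistics slot first lets the rest of the chain decide, which is what the StatisticSlot#entry code later in this article does with `fireEntry`, and only then counts a pass or a block. All class names are hypothetical stand-ins, not Sentinel's real classes or signatures. The blocking exception and counters are simplified assumptions.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical, simplified slot chain; NOT Sentinel's real classes or signatures.
class DemoBlockedException extends RuntimeException {
    DemoBlockedException(String message) {
        super(message);
    }
}

final class DemoCounters {
    final LongAdder pass = new LongAdder();
    final LongAdder block = new LongAdder();
}

abstract class DemoSlot {
    DemoSlot next;

    abstract void entry(DemoCounters node);

    /** Hands the request to the next slot in the chain, if any. */
    void fireEntry(DemoCounters node) {
        if (next != null) {
            next.entry(node);
        }
    }
}

/** Counts only after the rest of the chain has decided. */
final class DemoStatSlot extends DemoSlot {
    @Override
    void entry(DemoCounters node) {
        try {
            fireEntry(node);           // later slots (e.g. flow control) check first
            node.pass.increment();     // reached only if no later slot blocked the request
        } catch (DemoBlockedException e) {
            node.block.increment();    // a later slot rejected the request
            throw e;
        }
    }
}

/** Rejects once `limit` requests have passed; a plain counter with no time window. */
final class DemoLimitSlot extends DemoSlot {
    private final long limit;

    DemoLimitSlot(long limit) {
        this.limit = limit;
    }

    @Override
    void entry(DemoCounters node) {
        if (node.pass.sum() >= limit) {
            throw new DemoBlockedException("limit reached");
        }
        fireEntry(node);
    }
}
```

Chaining `DemoStatSlot -> DemoLimitSlot(2)` and sending five requests yields two passes and three blocks. This is why the statistics slot must wrap the checking slots rather than follow them.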

Sliding Window Model

Let's start with the data structure Sentinel uses underneath to store its statistics, the sliding window:
com.alibaba.csp.sentinel.slots.statistic.base.LeapArray

public abstract class LeapArray<T> {

    // Length of a single window bucket, in milliseconds
    protected int windowLengthInMs;
    // Total number of buckets
    protected int sampleCount;
    // Total length of the sliding window, in milliseconds
    protected int intervalInMs;
    // The window buckets; the array length equals sampleCount
    protected final AtomicReferenceArray<WindowWrap<T>> array;

    /**
     * The conditional (predicate) update lock is used only when current bucket is deprecated.
     */
    private final ReentrantLock updateLock = new ReentrantLock();
    ...
}

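This abstract class holds five fields. The flow-control metrics themselves are wrapped in WindowWrap objects and kept in a thread-safe array (AtomicReferenceArray), so they can be read and updated at any time.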
com.alibaba.csp.sentinel.slots.statistic.base.WindowWrap

public class WindowWrap<T> {

    /**
     * Time length of a single window bucket in milliseconds.
     */
    private final long windowLengthInMs;

    /**
     * Start timestamp of the window in milliseconds.
     */
    private long windowStart;

    /**
     * Statistic data (the actual metrics stored in this bucket).
     */
    private T value;
    ...
}
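A bucket covers a half-open time range that starts at `windowStart` and is `windowLengthInMs` long. The hypothetical stand-in below makes that explicit. It keeps the same three fields as the excerpt and adds two illustrative helpers, `isTimeInWindow` and `resetTo`. Treat the helper names as assumptions rather than a description of the real WindowWrap API.

```java
// Hypothetical stand-in for WindowWrap<T>; the helpers are illustrative only.
final class DemoWindowWrap<T> {
    private final long windowLengthInMs;
    private long windowStart;
    private T value;

    DemoWindowWrap(long windowLengthInMs, long windowStart, T value) {
        this.windowLengthInMs = windowLengthInMs;
        this.windowStart = windowStart;
        this.value = value;
    }

    long windowStart() {
        return windowStart;
    }

    T value() {
        return value;
    }

    /** True if timeMillis lies in [windowStart, windowStart + windowLengthInMs). */
    boolean isTimeInWindow(long timeMillis) {
        return windowStart <= timeMillis && timeMillis < windowStart + windowLengthInMs;
    }

    /** Re-targets this bucket at a new start time, e.g. when a later round reuses its slot. */
    DemoWindowWrap<T> resetTo(long newStart, T newValue) {
        this.windowStart = newStart;
        this.value = newValue;
        return this;
    }
}
```

With a 500 ms bucket starting at 1000, timestamps 1000 through 1499 are inside the bucket and 1500 belongs to the next one.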

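How LeapArray Works
When the first request arrives, Sentinel creates a special time span, called a window bucket, to hold runtime data such as response time (RT), QPS and blocked requests. The number of buckets in the window is defined by sampleCount. Sentinel uses the data in the buckets that are still valid to decide whether the current request may pass. The buckets record counts such as passed and blocked requests, and the resulting QPS is compared with the threshold defined in the rule.
Requests arriving at different times are placed in different buckets according to their timestamps.
As requests keep arriving, older buckets expire and become invalid.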
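The sketch below shows the "only valid buckets count" idea using a fixed ring of buckets. Each bucket contributes to the total only while it is no older than `intervalInMs`. It is single-threaded and hypothetical. The deprecation rule `timeMillis - windowStart > intervalInMs` and the class name are assumptions of this sketch, not a copy of LeapArray.

```java
import java.util.Arrays;

// Hypothetical, single-threaded sketch of a sliding-window counter; not Sentinel's LeapArray.
final class DemoSlidingCounter {
    private final int windowLengthInMs;
    private final int intervalInMs;
    private final long[] windowStart;   // start time of each bucket, -1 = never used
    private final long[] count;         // requests recorded in each bucket

    DemoSlidingCounter(int sampleCount, int intervalInMs) {
        this.intervalInMs = intervalInMs;
        this.windowLengthInMs = intervalInMs / sampleCount;
        this.windowStart = new long[sampleCount];
        this.count = new long[sampleCount];
        Arrays.fill(windowStart, -1);
    }

    /** Records one request at timeMillis, resetting the bucket if it belongs to an old round. */
    void add(long timeMillis) {
        int idx = (int) ((timeMillis / windowLengthInMs) % count.length);
        long start = timeMillis - timeMillis % windowLengthInMs;
        if (windowStart[idx] != start) {   // empty, or left over from an earlier round
            windowStart[idx] = start;
            count[idx] = 0;
        }
        count[idx]++;
    }

    /** Sums only the buckets that are still valid at timeMillis. */
    long sum(long timeMillis) {
        long total = 0;
        for (int i = 0; i < count.length; i++) {
            boolean used = windowStart[i] >= 0;
            boolean deprecated = timeMillis - windowStart[i] > intervalInMs;
            if (used && !deprecated) {
                total += count[i];
            }
        }
        return total;
    }
}
```

With 2 buckets over 1000 ms, requests at 100, 200 and 600 ms sum to 3 at 700 ms. A request at 1100 ms reuses bucket 0 and drops the two old entries, so the sum becomes 2. By 2600 ms every bucket has expired.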

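Now let's look at a few important LeapArray methods.
com.alibaba.csp.sentinel.slots.statistic.base.LeapArray#calculateTimeIdx
Computes which bucket a request arriving at the given time belongs to, and returns that bucket's index in array.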

private int calculateTimeIdx(/*@Valid*/ long timeMillis) {
    long timeId = timeMillis / windowLengthInMs;
    // Calculate current index so we can map the timestamp to the leap array.
    return (int)(timeId % array.length());
}
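To see the ring behavior of calculateTimeIdx, here is a standalone copy of its arithmetic. It assumes a hypothetical configuration of 2 buckets of 500 ms each (a 1 s interval). `DemoTimeIdx` is an illustrative name; in the real method, `windowLengthInMs` and `array.length()` come from the LeapArray fields.

```java
// Standalone copy of the calculateTimeIdx arithmetic, with parameters passed in explicitly.
final class DemoTimeIdx {
    static int calculateTimeIdx(long timeMillis, int windowLengthInMs, int arrayLength) {
        long timeId = timeMillis / windowLengthInMs;   // number of whole buckets since the epoch
        return (int) (timeId % arrayLength);           // wrap around the ring of buckets
    }

    public static void main(String[] args) {
        long[] times = {0, 499, 500, 999, 1000, 1700};
        for (long t : times) {
            System.out.println(t + " ms -> idx " + calculateTimeIdx(t, 500, 2));
        }
    }
}
```

With these values, 0 and 499 ms map to index 0, 500 and 999 ms map to index 1, and 1000 ms wraps back to index 0. Because of that wrap-around, finding a bucket at an index is not enough: currentWindow must also compare the bucket's windowStart.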

com.alibaba.csp.sentinel.slots.statistic.base.LeapArray#calculateWindowStart
Computes the windowStart field of a bucket (WindowWrap). Every windowStart is an integer multiple of windowLengthInMs.

protected long calculateWindowStart(/*@Valid*/ long timeMillis) {
    return timeMillis - timeMillis % windowLengthInMs;
}
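The two helpers are related, as the short derivation below shows. Let $w$ be windowLengthInMs and $n$ the array length. It assumes that windowLengthInMs is derived as intervalInMs / sampleCount and that the array length equals sampleCount, as the field comment above states. Subtracting the remainder rounds $t$ down to a multiple of $w$, and the index is that multiple reduced modulo $n$.

```latex
\begin{aligned}
w &= \frac{\texttt{intervalInMs}}{\texttt{sampleCount}}, \qquad n = \texttt{sampleCount} \\
\texttt{windowStart}(t) &= t - (t \bmod w) = w \left\lfloor \tfrac{t}{w} \right\rfloor \\
\texttt{idx}(t) &= \left\lfloor \tfrac{t}{w} \right\rfloor \bmod n = \frac{\texttt{windowStart}(t)}{w} \bmod n \\
\text{e.g. } w = 500,\ n = 2,\ t = 1250:\quad \texttt{windowStart} &= 1250 - 250 = 1000,\quad \texttt{idx} = 2 \bmod 2 = 0
\end{aligned}
```

Timestamps 250 and 1250 therefore share index 0 but have different windowStart values (0 and 1000). That difference is exactly what currentWindow uses to detect a stale bucket.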

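com.alibaba.csp.sentinel.slots.statistic.base.LeapArray#currentWindow(long)
This is the most important method: it returns the bucket for the given time. Its behavior boils down to three rules: if the bucket exists and is current, return it; if it is absent, create it; if it has expired, reset it.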

public WindowWrap<T> currentWindow(long timeMillis) {
    if (timeMillis < 0) {
        return null;
    }

    int idx = calculateTimeIdx(timeMillis);
    // Calculate current bucket start time.
    long windowStart = calculateWindowStart(timeMillis);

    /*
     * Get bucket item at given time from the array.
     *
     * (1) Bucket is absent, then just create a new bucket and CAS update to circular array.
     * (2) Bucket is up-to-date, then just return the bucket.
     * (3) Bucket is deprecated, then reset current bucket and clean all deprecated buckets.
     */
    while (true) {
        WindowWrap<T> old = array.get(idx);
        if (old == null) {
            // Absent: create a new bucket
            WindowWrap<T> window = new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
            if (array.compareAndSet(idx, null, window)) {
                // Successfully updated, return the created bucket.
                return window;
            } else {
                // Contention failed, the thread will yield its time slice to wait for bucket available.
                Thread.yield();
            }
        } else if (windowStart == old.windowStart()) {
            // Up to date: return the existing bucket
            return old;
        } else if (windowStart > old.windowStart()) {
            // Expired: reset the bucket; the lock prevents concurrent resets
            if (updateLock.tryLock()) {
                try {
                    // Successfully get the update lock, now we reset the bucket.
                    return resetWindowTo(old, windowStart);
                } finally {
                    updateLock.unlock();
                }
            } else {
                // Contention failed, the thread will yield its time slice to wait for bucket available.
                Thread.yield();
            }
        } else if (windowStart < old.windowStart()) {
            // Should not go through here, as the provided time is already behind.
            // Abnormal case: this branch is not expected to be reached in normal operation
            return new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
        }
    }
}
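The three cases of currentWindow can be reproduced in a self-contained, simplified class, shown below. The bucket payload is a `LongAdder`, an assumption that stands in for the generic `T`. `DemoLeapArray` and `Bucket` are hypothetical names. One deliberate difference from the excerpt above: this sketch re-checks `windowStart` after acquiring the lock, so a thread that loses the race does not reset a bucket another thread has just reset.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReentrantLock;

// Simplified, hypothetical sketch of the currentWindow loop; not Sentinel's actual class.
final class DemoLeapArray {
    static final class Bucket {
        volatile long windowStart;
        final LongAdder counter = new LongAdder();

        Bucket(long windowStart) {
            this.windowStart = windowStart;
        }
    }

    private final int windowLengthInMs;
    private final AtomicReferenceArray<Bucket> array;
    private final ReentrantLock updateLock = new ReentrantLock();

    DemoLeapArray(int sampleCount, int intervalInMs) {
        this.windowLengthInMs = intervalInMs / sampleCount;
        this.array = new AtomicReferenceArray<>(sampleCount);
    }

    Bucket currentWindow(long timeMillis) {
        int idx = (int) ((timeMillis / windowLengthInMs) % array.length());
        long windowStart = timeMillis - timeMillis % windowLengthInMs;
        while (true) {
            Bucket old = array.get(idx);
            if (old == null) {
                // (1) Absent: publish a fresh bucket; only one CAS can win.
                Bucket created = new Bucket(windowStart);
                if (array.compareAndSet(idx, null, created)) {
                    return created;
                }
                Thread.yield();            // lost the race; loop and read the winner's bucket
            } else if (windowStart == old.windowStart) {
                return old;                // (2) Up to date: reuse it.
            } else if (windowStart > old.windowStart) {
                // (3) Deprecated: one thread resets it in place, the others retry.
                if (updateLock.tryLock()) {
                    try {
                        // Re-check under the lock: another thread may already have reset it.
                        if (windowStart > old.windowStart) {
                            old.counter.reset();          // clear old data first,
                            old.windowStart = windowStart; // then publish the new start
                            return old;
                        }
                    } finally {
                        updateLock.unlock();
                    }
                } else {
                    Thread.yield();
                }
            } else {
                // Time went backwards: hand out a detached bucket that is not stored.
                return new Bucket(windowStart);
            }
        }
    }
}
```

Reusing the bucket object in place avoids allocating a new bucket on every round. The trade-off: increments that race with a reset can be lost or attributed to the wrong round, which is usually acceptable for approximate statistics.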

StatisticSlot

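StatisticSlot is the slot in the slot chain that records statistics, so it naturally relies on LeapArray. The source code below shows how StatisticSlot records the flow-control metrics through LeapArray.
When analyzing a slot, the natural starting point is its entry method:
com.alibaba.csp.sentinel.slots.statistic.StatisticSlot#entry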

@Override
public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count,
                  boolean prioritized, Object... args) throws Throwable {
    try {
        // Do some checking.
        fireEntry(context, resourceWrapper, node, count, prioritized, args);

        // Request passed, add thread count and pass count.
        node.increaseThreadNum();
        node.addPassRequest(count);

        if (context.getCurEntry().getOriginNode() != null) {
            // Add count for origin node.
            context.getCurEntry().getOriginNode().increaseThreadNum();
            context.getCurEntry().getOriginNode().addPassRequest(count);
        }

        if (resourceWrapper.getEntryType() == EntryType.IN) {
            // Add count for global inbound entry node for global statistics.
            Constants.ENTRY_NODE.increaseThreadNum();
            Constants.ENTRY_NODE.addPassRequest(count);
        }

        // Handle pass event with registered entry callback handlers.
        for (ProcessorSlotEntryCallback<DefaultNode> handler : StatisticSlotCallbackRegistry.getEntryCallbacks()) {
            handler.onPass(context, resourceWrapper, node, count, args);
        }
    } catch (PriorityWaitException ex) {
    ...

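The excerpt above omits the exception branches. The normal flow is:

1. Increase the thread count and pass count of the node passed into the method.
2. Get the origin node of the current request from the context, and increase its thread count and pass count.
3. If the request is an inbound (EntryType.IN) entry, increase the thread count and pass count of the global ENTRY_NODE.
4. Invoke the registered entry callbacks, an extension point where users can plug in their own logic.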
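A single passed request is recorded in up to three places: the resource's own node, the caller's origin node, and the global inbound node. The minimal sketch below shows that fan-out. `DemoNode` and `DemoStatistic` are hypothetical stand-ins, not Sentinel's DefaultNode/StatisticNode. As in the excerpt, the thread count grows by one per entry while the pass count grows by `count`.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a statistics node; not Sentinel's DefaultNode.
final class DemoNode {
    final LongAdder threadNum = new LongAdder();
    final LongAdder pass = new LongAdder();

    void increaseThreadNum() {
        threadNum.increment();
    }

    void addPassRequest(int count) {
        pass.add(count);
    }
}

final class DemoStatistic {
    // Global node aggregating all inbound traffic (plays the role of Constants.ENTRY_NODE)
    static final DemoNode ENTRY_NODE = new DemoNode();

    /** Mirrors the pass branch of StatisticSlot#entry; originNode may be null. */
    static void onPass(DemoNode resourceNode, DemoNode originNode, boolean inbound, int count) {
        resourceNode.increaseThreadNum();      // one more in-flight call for this resource
        resourceNode.addPassRequest(count);
        if (originNode != null) {              // per-caller statistics
            originNode.increaseThreadNum();
            originNode.addPassRequest(count);
        }
        if (inbound) {                         // global inbound statistics
            ENTRY_NODE.increaseThreadNum();
            ENTRY_NODE.addPassRequest(count);
        }
    }
}
```

Keeping separate nodes per dimension lets later slots apply rules by resource, by caller origin, or system-wide without recomputing anything.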

All of these metrics are recorded through nodes. So what is a node?

Node

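com.alibaba.csp.sentinel.node.Node
Node is a low-level Sentinel interface that declares the methods for reading and updating flow-control metrics. Implementations provide these methods to record and query the metrics.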

public interface Node extends OccupySupport, DebugSupport {

    /**
     * Get incoming request per minute ({@code pass + block}).
     *
     * @return total request count per minute
     */
    long totalRequest();

    /**
     * Get pass count per minute.
     *
     * @return total passed request count per minute
     * @since 1.5.0
     */
    long totalPass();

    /**
     * Get {@link Entry#exit()} count per minute.
     *
     * @return total completed request count per minute
     */
    long totalSuccess();

    /**
     * Get blocked request count per minute (totalBlockRequest).
     *
     * @return total blocked request count per minute
     */
    long blockRequest();

    /**
     * Get exception count per minute.
     *
     * @return total business exception count per minute
     */
    long totalException();

    /**
     * Get pass request per second.
     *
     * @return QPS of passed requests
     */
    double passQps();

    /**
     * Get block request per second.
     *
     * @return QPS of blocked requests
     */
    double blockQps();

    /**
     * Get {@link #passQps()} + {@link #blockQps()} request per second.
     *
     * @return QPS of passed and blocked requests
     */
    double totalQps();

    /**
     * Get {@link Entry#exit()} request per second.
     *
     * @return QPS of completed requests
     */
    double successQps();

    /**
     * Get estimated max success QPS till now.
     *
     * @return max completed QPS
     */
    double maxSuccessQps();

    /**
     * Get exception count per second.
     *
     * @return QPS of exception occurs
     */
    ...
}
