A problem with gray release based on Consul and Ribbon

吊儿郎当当

Article link: https://blog.csdn.net/ruizhige/article/details/119296752

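This article examines a case where load testing against gray instances affected the normal instances, analyzes the uneven traffic distribution caused by the ZoneAvoidanceRule implementation in the Netflix load balancer, and explains the underlying cause.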

Implementation overview

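Every service registers with Consul, and a gray instance attaches a tag when it registers. When a request arrives, the configured rules decide whether it needs gray handling. If it does, the load balancer filters instances by tag and forwards the traffic only to instances that carry the specified gray tag.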
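The tag filtering described above can be sketched as follows. This is a minimal, self-contained illustration and not the author's actual code: the `Instance` class, the `eligible` method, and the tag name `gray` are hypothetical, and the sketch assumes that non-gray requests are routed only to untagged instances.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GrayTagFilterSketch {
    // Hypothetical stand-in for a service instance registered in Consul with optional tags.
    static final class Instance {
        final String id;
        final Set<String> tags;

        Instance(String id, String... tags) {
            this.id = id;
            this.tags = new HashSet<>(Arrays.asList(tags));
        }
    }

    // Gray requests keep only tagged instances; normal requests keep only untagged ones
    // (the second half is an assumption of this sketch).
    static List<Instance> eligible(List<Instance> all, String grayTag, boolean grayRequest) {
        List<Instance> result = new ArrayList<>();
        for (Instance instance : all) {
            if (instance.tags.contains(grayTag) == grayRequest) {
                result.add(instance);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Instance> all = Arrays.asList(
                new Instance("order-1"),
                new Instance("order-2"),
                new Instance("order-gray-1", "gray"));
        System.out.println(eligible(all, "gray", true).size());  // prints 1
        System.out.println(eligible(all, "gray", false).size()); // prints 2
    }
}
```

Note that the two kinds of request end up with eligible lists of different sizes, which matters for the round-robin selection discussed later.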

The problem encountered

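Load testing against the gray instances affected the normal instances: traffic became unevenly distributed, a few instances showed high CPU load, and the CPU load of the other instances dropped noticeably.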

Locating the problem

The choose method of com.netflix.loadbalancer.PredicateBasedRule, the parent class of the default IRule implementation com.netflix.loadbalancer.ZoneAvoidanceRule:

/**
 * Get a server by calling {@link AbstractServerPredicate#chooseRandomlyAfterFiltering(java.util.List, Object)}.
 * The performance for this method is O(n) where n is number of servers to be filtered.
 */
@Override
public Server choose(Object key) {
    ILoadBalancer lb = getLoadBalancer();
    Optional<Server> server = getPredicate().chooseRoundRobinAfterFiltering(lb.getAllServers(), key);
    if (server.isPresent()) {
        return server.get();
    } else {
        return null;
    }       
}

The server instance is then obtained from com.netflix.loadbalancer.AbstractServerPredicate:

/**
  * Choose a server in a round robin fashion after the predicate filters a given list of servers and load balancer key. 
  */
public Optional<Server> chooseRoundRobinAfterFiltering(List<Server> servers, Object loadBalancerKey) {
    List<Server> eligible = getEligibleServers(servers, loadBalancerKey);
    if (eligible.size() == 0) {
        return Optional.absent();
    }
    // Pick the server at the computed index
    return Optional.of(eligible.get(incrementAndGetModulo(eligible.size())));
}


/**
 * Referenced from RoundRobinRule
 * Inspired by the implementation of {@link AtomicInteger#incrementAndGet()}.
 *
 * @param modulo The modulo to bound the value of the counter.
 * @return The next value.
 */
private int incrementAndGetModulo(int modulo) {
    for (;;) {
        int current = nextIndex.get();
        int next = (current + 1) % modulo;
        if (nextIndex.compareAndSet(current, next) && current < modulo)
            return current;
    }
}

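The problem originates in the incrementAndGetModulo method. All instances of one service use the same load balancer, whether or not they registered with Consul under a gray tag. Under normal conditions (no gray requests), incrementAndGetModulo steps through increment and modulo to produce the instance index, where the parameter modulo is the number of eligible servers left after filtering. Once gray instances are introduced, modulo is no longer the same on every call: with 50 normal instances and 1 gray instance, for example, modulo is 50 for normal requests and 1 for gray requests. Because both kinds of request belong to the same service, they share one load balancer and operate on the same nextIndex. With modulo equal to 1, (current + 1) % modulo is always 0, so every gray request resets nextIndex to 0 and the normal requests that follow start again from the first instances in the list. Gray traffic therefore disrupts the counter used by normal requests, which causes the uneven traffic distribution.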
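To make the effect concrete, here is a small single-threaded simulation. It copies the logic of incrementAndGetModulo quoted above into a standalone class; the class name `SharedCounterSkewDemo`, the `simulate` helper, and the traffic pattern (one gray request after every five normal requests) are assumptions chosen for illustration, not measurements from the original incident.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SharedCounterSkewDemo {
    // One counter shared by every request of the service, as in AbstractServerPredicate.
    private final AtomicInteger nextIndex = new AtomicInteger();

    // Same logic as the incrementAndGetModulo method quoted above.
    int incrementAndGetModulo(int modulo) {
        for (;;) {
            int current = nextIndex.get();
            int next = (current + 1) % modulo;
            if (nextIndex.compareAndSet(current, next) && current < modulo) {
                return current;
            }
        }
    }

    // Sends `requests` normal requests; after every `grayEvery` of them one gray request
    // goes through the same counter (grayEvery <= 0 disables gray traffic).
    // Returns the number of normal requests each normal instance received.
    static int[] simulate(int normalCount, int grayCount, int requests, int grayEvery) {
        SharedCounterSkewDemo lb = new SharedCounterSkewDemo();
        int[] hits = new int[normalCount];
        for (int i = 1; i <= requests; i++) {
            hits[lb.incrementAndGetModulo(normalCount)]++;
            if (grayEvery > 0 && i % grayEvery == 0) {
                lb.incrementAndGetModulo(grayCount);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] withoutGray = simulate(50, 1, 1000, 0);
        int[] withGray = simulate(50, 1, 1000, 5);
        System.out.println("no gray:   first=" + withoutGray[0] + " last=" + withoutGray[49]);
        System.out.println("with gray: first=" + withGray[0] + " last=" + withGray[49]);
    }
}
```

Under these assumed numbers, the run without gray traffic gives each of the 50 instances 20 requests, while the interleaved run sends 200 requests to each of the first five instances and none to the others, which matches the symptom of a few hot instances and many idle ones.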