Dubbo Cluster Fault Tolerance API

一户董

Last modified: 2022-03-16 10:58:36

First published: 2022-02-26 17:31:30

Article link: https://blog.csdn.net/wang0907/article/details/123153069

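Preface

To keep services highly available, Dubbo provides cluster fault tolerance. This article walks through the cluster invocation interface com.alibaba.dubbo.rpc.cluster.Cluster, its implementation classes, and the related classes.

The top-level interface for cluster fault tolerance is com.alibaba.dubbo.rpc.cluster.Cluster. Its source is as follows: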

@SPI(FailoverCluster.NAME)
public interface Cluster {
    // Builds one unified Invoker from a Directory, so that the group of service methods of a service class can be managed and invoked uniformly
    @Adaptive
    <T> Invoker<T> join(Directory<T> directory) throws RpcException;
}

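@SPI(FailoverCluster.NAME) makes the extension named NAME = "failover" the default, so the failover strategy is used unless something else is configured: when a call to one node fails, other nodes are tried until one succeeds or the retries are used up, which is a typical high-availability strategy. @Adaptive marks join as an adaptive method: the target extension is looked up from the value of the cluster parameter in the URL.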
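
As a rough illustration of what this lookup amounts to, the sketch below resolves a strategy name from a URL-style parameter map and falls back to failover when the cluster parameter is absent. ClusterLookupSketch, resolveClusterName, and the plain Map standing in for Dubbo's URL are hypothetical names used only for illustration; this is not Dubbo's actual extension-loading code.

```java
import java.util.HashMap;
import java.util.Map;

final class ClusterLookupSketch {
    // Default name, mirroring @SPI(FailoverCluster.NAME) in the interface above.
    static final String DEFAULT_CLUSTER = "failover";

    // Hypothetical stand-in for reading the "cluster" parameter from a Dubbo URL.
    static String resolveClusterName(Map<String, String> urlParameters) {
        String name = urlParameters.get("cluster");
        return (name == null || name.isEmpty()) ? DEFAULT_CLUSTER : name;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        System.out.println(resolveClusterName(params)); // no cluster parameter set
        params.put("cluster", "failfast");
        System.out.println(resolveClusterName(params)); // explicit cluster parameter
    }
}
```
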
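Its implementations include the six cluster fault tolerance modes boxed in red in the figure below:

[Figure: implementation classes of Cluster, with the six fault tolerance implementations boxed in red]

The following sections look at each XXXCluster fault tolerance class in turn.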

1：FailoverCluster

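FailoverCluster implements the failover strategy: when a call fails, it automatically retries on another server. Retries add latency, so this strategy is generally used for read operations, and the number of retries can be capped with the retries parameter. A common recommendation for production is retries=2, meaning two retries in addition to the first call; if the call still fails after two retries, the problem is probably in the program itself. The principle is shown in the figure below:

[Figure: failover call flow]

The source of FailoverCluster is as follows: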

// com.alibaba.dubbo.rpc.cluster.support.FailoverCluster
public class FailoverCluster implements Cluster {
    public final static String NAME = "failover";

    @Override
    public <T> Invoker<T> join(Directory<T> directory) throws RpcException {
        return new FailoverClusterInvoker<T>(directory);
    }
}

Note that the Invoker returned by join is a FailoverClusterInvoker, which is the actual entry point when a service provider method is called; see section 1.1, FailoverClusterInvoker.

1.1：FailoverClusterInvoker

The source of this class is as follows:

// com.alibaba.dubbo.rpc.cluster.support.FailoverClusterInvoker
public class FailoverClusterInvoker<T> extends AbstractClusterInvoker<T> {
    private static final Logger logger = LoggerFactory.getLogger(FailoverClusterInvoker.class);

    public FailoverClusterInvoker(Directory<T> directory) {
        super(directory);
    }
    
    // invocation: wraps the information about the method being called: method name, parameter types, arguments, and so on.
    // For example, calling scopeRemoteService.sayHi("helloooooo")
    // ->
    // RpcInvocation [methodName=sayHi, parameterTypes=[class java.lang.String], arguments=[helloooooo], attachments={}]
    @Override
    @SuppressWarnings({"unchecked", "rawtypes"})
    public Result doInvoke(Invocation invocation, final List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        // 2022年2月26日16:02:24
        List<Invoker<T>> copyinvokers = invokers;
        // 2022年2月26日16:12:08
        checkInvokers(copyinvokers, invocation);
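        // Read the configured retry count from the URL. For example, <dubbo:reference ... retries="10"/> produces the URL zookeeper://127.0.0.1:2181/com.alibaba.dubbo.registry.RegistryService?...&retries=10...
        // In that case len is 11: the +1 is there because retries do not include the first call. The default is "DEFAULT_RETRIES = 2;"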
        int len = getUrl().getMethodParameter(invocation.getMethodName(), Constants.RETRIES_KEY, Constants.DEFAULT_RETRIES) + 1;
        if (len <= 0) {
            len = 1;
        }
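        // The exception from the most recent failed attempt
        RpcException le = null;
        // Invokers that have already been called
        List<Invoker<T>> invoked = new ArrayList<Invoker<T>>(copyinvokers.size());
        Set<String> providers = new HashSet<String>(len);
        // This is the core of the failover mechanism!
        // Loop up to len times, until a call succeeds or the retries are used up
        for (int i = 0; i < len; i++) {
            // On a retry, re-list the invokers because the set of providers may have changed; this case can be ignored when reading the code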
            if (i > 0) {
                checkWhetherDestroyed();
                copyinvokers = list(invocation);
                checkInvokers(copyinvokers, invocation);
            }
            // 2022年2月26日17:25:38
            Invoker<T> invoker = select(loadbalance, invocation, copyinvokers, invoked);
            // Add it to the list of invokers already called
            invoked.add(invoker);
            RpcContext.getContext().setInvokers((List) invoked);
            try {
                // Invoke the target method and get the result
                Result result = invoker.invoke(invocation);
                if (le != null && logger.isWarnEnabled()) {
                    logger.warn("Although retry the method " + invocation.getMethodName()
                            + " in the service " + getInterface().getName()
                            + " was successful by the provider " + invoker.getUrl().getAddress()
                            + ", but there have been failed providers " + providers
                            + " (" + providers.size() + "/" + copyinvokers.size()
                            + ") from the registry " + directory.getUrl().getAddress()
                            + " on the consumer " + NetUtils.getLocalHost()
                            + " using the dubbo version " + Version.getVersion() + ". Last error is: "
                            + le.getMessage(), le);
                }
                // Return the result
                return result;
            } catch (RpcException e) {
                if (e.isBiz()) { 
                    throw e;
                }
                le = e;
            } catch (Throwable e) {
                le = new RpcException(e.getMessage(), e);
            } finally {
                providers.add(invoker.getUrl().getAddress());
            }
        }
        throw new RpcException(le != null ? le.getCode() : 0, "Failed to invoke the method "
                + invocation.getMethodName() + " in the service " + getInterface().getName()
                + ". Tried " + len + " times of the providers " + providers
                + " (" + providers.size() + "/" + copyinvokers.size()
                + ") from the registry " + directory.getUrl().getAddress()
                + " on the consumer " + NetUtils.getLocalHost() + " using the dubbo version "
                + Version.getVersion() + ". Last error is: "
                + (le != null ? le.getMessage() : ""), le != null && le.getCause() != null ? le.getCause() : le);
    }

}

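The marker 2022年2月26日16:02:24 is where the list of all available providers for the service being called is received. For example, with the following two provider instances:

[Figure: the two service provider instances]

the value of invokers is:

[Figure: the resulting value of invokers]

The marker 2022年2月26日16:12:08 is where the invoker list is checked to be non-empty, that is, at least one provider is available. The source is as follows: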

class FakeCls {
    // com.alibaba.dubbo.rpc.cluster.support.AbstractClusterInvoker.checkInvokers
    protected void checkInvokers(List<Invoker<T>> invokers, Invocation invocation) {
        if (invokers == null || invokers.isEmpty()) {
            throw new RpcException("Failed to invoke the method "
                    + invocation.getMethodName() + " in the service " + getInterface().getName()
                    + ". No provider available for the service " + directory.getUrl().getServiceKey()
                    + " from registry " + directory.getUrl().getAddress()
                    + " on the consumer " + NetUtils.getLocalHost()
                    + " using the dubbo version " + Version.getVersion()
                    + ". Please check if the providers have been started and registered.");
        }
    }
}

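The marker 2022年2月26日17:25:38 is where an invoker is selected according to the load balancing strategy; for details see the article dubbo之负载均衡策略 (Dubbo load balancing strategies).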
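
The retry loop above can be reduced to a small, self-contained sketch. It assumes plain Supplier objects in place of Dubbo invokers and uses simple round-robin instead of a real LoadBalance (so it does not exclude already-tried providers the way select does); FailoverSketch and its invoke method are hypothetical names.

```java
import java.util.List;
import java.util.function.Supplier;

final class FailoverSketch {
    // Tries up to retries + 1 calls and returns the first successful result.
    static <T> T invoke(List<Supplier<T>> providers, int retries) {
        if (providers.isEmpty()) {
            throw new IllegalStateException("no provider available");
        }
        // Retries do not count the first call, hence the + 1 (as with len above).
        int attempts = Math.max(retries + 1, 1);
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            // Simplified selection: round-robin stands in for the load balancer.
            Supplier<T> provider = providers.get(i % providers.size());
            try {
                return provider.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next provider
            }
        }
        throw new IllegalStateException("all " + attempts + " attempts failed", last);
    }
}
```

With two providers where the first always fails, invoke(providers, 2) returns the second provider's result, while invoke(providers, 0) makes a single attempt and throws.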

2：FailfastCluster

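failfast is a fail-fast cluster fault tolerance strategy: the call is made once, and a failure is reported immediately by throwing an exception, without retrying. It is generally used for non-idempotent write operations. The source is as follows: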

// com.alibaba.dubbo.rpc.cluster.support.FailfastCluster
public class FailfastCluster implements Cluster {

    public final static String NAME = "failfast";

    @Override
    public <T> Invoker<T> join(Directory<T> directory) throws RpcException {
        return new FailfastClusterInvoker<T>(directory);
    }
}

The corresponding invoker class is FailfastClusterInvoker; see section 2.1, FailfastClusterInvoker.

2.1：FailfastClusterInvoker

The source is as follows:

// com.alibaba.dubbo.rpc.cluster.support.FailfastClusterInvoker
// Executes only once and throws immediately when the call fails; generally used for non-idempotent write operations.
public class FailfastClusterInvoker<T> extends AbstractClusterInvoker<T> {

    public FailfastClusterInvoker(Directory<T> directory) {
        super(directory);
    }

    @Override
    public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        checkInvokers(invokers, invocation);
        // 2022年3月10日16:24:12
        Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
        // If the call fails, throw immediately
        try {
            return invoker.invoke(invocation);
        } catch (Throwable e) {
            if (e instanceof RpcException && ((RpcException) e).isBiz()) {
                throw (RpcException) e;
            }
            throw new RpcException(e instanceof RpcException ? ((RpcException) e).getCode() : 0, "Failfast invoke providers " + invoker.getUrl() + " " + loadbalance.getClass().getSimpleName() + " select from all providers " + invokers + " for service " + getInterface().getName() + " method " + invocation.getMethodName() + " on consumer " + NetUtils.getLocalHost() + " use dubbo version " + Version.getVersion() + ", but no luck to perform the invocation. Last error is: " + e.getMessage(), e.getCause() != null ? e.getCause() : e);
        }
    }
}

The marker 2022年3月10日16:24:12 is where the load balancer selects an invoker; for details see the article dubbo之负载均衡策略 (Dubbo load balancing strategies).
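
For comparison with failover, here is a minimal fail-fast sketch under the same simplifications: Supplier objects stand in for invokers, and taking the first provider stands in for load balancing. FailfastSketch is a hypothetical name.

```java
import java.util.List;
import java.util.function.Supplier;

final class FailfastSketch {
    // Calls exactly one provider; any failure is rethrown at once, with no retry.
    static <T> T invoke(List<Supplier<T>> providers) {
        if (providers.isEmpty()) {
            throw new IllegalStateException("no provider available");
        }
        // Simplified selection: the first provider stands in for the load balancer.
        Supplier<T> provider = providers.get(0);
        try {
            return provider.get();
        } catch (RuntimeException e) {
            throw new IllegalStateException("failfast: invocation failed, not retrying", e);
        }
    }
}
```

The remaining providers are never touched after a failure, which is what makes the strategy safe for non-idempotent writes.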

3：FailsafeCluster

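FailsafeCluster is Dubbo's implementation of the failsafe strategy: when a call fails, the error is ignored and no exception is thrown. The source is as follows: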

// com.alibaba.dubbo.rpc.cluster.support.FailsafeCluster
public class FailsafeCluster implements Cluster {

    public final static String NAME = "failsafe";

    @Override
    public <T> Invoker<T> join(Directory<T> directory) throws RpcException {
        return new FailsafeClusterInvoker<T>(directory);
    }

}

The corresponding invoker class is FailsafeClusterInvoker; see section 3.1, FailsafeClusterInvoker.

3.1：FailsafeClusterInvoker

The source is as follows:

// com.alibaba.dubbo.rpc.cluster.support.FailsafeClusterInvoker
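// When a call fails, logs the error and ignores it by returning an empty RpcResult; generally used for things like writing audit logs (operation logs where losing a small fraction has little impact)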
public class FailsafeClusterInvoker<T> extends AbstractClusterInvoker<T> {
    private static final Logger logger = LoggerFactory.getLogger(FailsafeClusterInvoker.class);

    public FailsafeClusterInvoker(Directory<T> directory) {
        super(directory);
    }

    @Override
    public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        try {
            // Check that there are invokers
            checkInvokers(invokers, invocation);
            // 2022年3月10日16:41:40
            Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
            return invoker.invoke(invocation);
        } catch (Throwable e) {
            // An error occurred: log it
            logger.error("Failsafe ignore exception: " + e.getMessage(), e);
            // Return an empty RpcResult as the result
            return new RpcResult(); 
        }
    }
}

The marker 2022年3月10日16:41:40 is where the load balancer selects an invoker; for details see the article dubbo之负载均衡策略 (Dubbo load balancing strategies).
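
The swallow-and-log behavior can be sketched in a few lines. In this sketch a null return stands in for the empty RpcResult, and a single Supplier stands in for the selected invoker; FailsafeSketch is a hypothetical name.

```java
import java.util.function.Supplier;

final class FailsafeSketch {
    // Returns the provider's result, or null (standing in for an empty RpcResult) on failure.
    static <T> T invoke(Supplier<T> provider) {
        try {
            return provider.get();
        } catch (RuntimeException e) {
            // Log and ignore the error; the caller never sees an exception.
            System.err.println("Failsafe ignore exception: " + e.getMessage());
            return null;
        }
    }
}
```

Because the caller cannot tell a failure from an empty result, this only suits calls whose outcome the caller does not depend on.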

4：BroadcastCluster

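BroadcastCluster is Dubbo's implementation of the broadcast strategy: every provider is called in turn, and the call counts as successful only if all of them succeed; if any one fails, the whole call fails and an exception is thrown. It is generally used to update state held on the provider side. The source is as follows: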

// com.alibaba.dubbo.rpc.cluster.support.BroadcastCluster
public class BroadcastCluster implements Cluster {
    @Override
    public <T> Invoker<T> join(Directory<T> directory) throws RpcException {
        return new BroadcastClusterInvoker<T>(directory);
    }
}

The corresponding invoker is BroadcastClusterInvoker; see section 4.1, BroadcastClusterInvoker.

4.1：BroadcastClusterInvoker

The source is as follows:

// com.alibaba.dubbo.rpc.cluster.support.BroadcastClusterInvoker
public class BroadcastClusterInvoker<T> extends AbstractClusterInvoker<T> {

    private static final Logger logger = LoggerFactory.getLogger(BroadcastClusterInvoker.class);

    public BroadcastClusterInvoker(Directory<T> directory) {
        super(directory);
    }

    @Override
    @SuppressWarnings({"unchecked", "rawtypes"})
    public Result doInvoke(final Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        checkInvokers(invokers, invocation);
        RpcContext.getContext().setInvokers((List) invokers);
        RpcException exception = null;
        Result result = null;
        // Call each invoker in turn
        for (Invoker<T> invoker : invokers) {
            try {
                result = invoker.invoke(invocation);
            // On an exception, record it and log a warning
            } catch (RpcException e) {
                exception = e;
                logger.warn(e.getMessage(), e);
            } catch (Throwable e) {
                exception = new RpcException(e.getMessage(), e);
                logger.warn(e.getMessage(), e);
            }
        }
        // A non-null exception means at least one invoker failed during the loop, so throw it and treat the call as failed
        if (exception != null) {
            throw exception;
        }
        // Reaching here means every invoker succeeded; the result returned is that of the last provider invoked
        return result;
    }
}
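
One detail worth seeing in isolation is that a failure does not stop the loop: every provider is still called, and the exception is thrown only afterwards. The following is a minimal sketch with Supplier objects in place of invokers; BroadcastSketch is a hypothetical name.

```java
import java.util.List;
import java.util.function.Supplier;

final class BroadcastSketch {
    // Calls every provider; after the loop, rethrows the last failure if there was one.
    static <T> T invoke(List<Supplier<T>> providers) {
        if (providers.isEmpty()) {
            throw new IllegalStateException("no provider available");
        }
        RuntimeException failure = null;
        T result = null;
        for (Supplier<T> provider : providers) {
            try {
                result = provider.get();
            } catch (RuntimeException e) {
                failure = e; // keep going so the remaining providers are still called
            }
        }
        if (failure != null) {
            throw failure;
        }
        return result; // the result of the last provider called
    }
}
```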

5：FailbackCluster

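FailbackCluster is Dubbo's implementation of the failback strategy: failed calls are recorded and retried on a timer. It is generally used for message notification. The source is as follows: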

// com.alibaba.dubbo.rpc.cluster.support.FailbackCluster
public class FailbackCluster implements Cluster {

    public final static String NAME = "failback";

    @Override
    public <T> Invoker<T> join(Directory<T> directory) throws RpcException {
        return new FailbackClusterInvoker<T>(directory);
    }

}

The corresponding cluster invoker is FailbackClusterInvoker; see section 5.1, FailbackClusterInvoker.

5.1：FailbackClusterInvoker

The source is as follows:

// com.alibaba.dubbo.rpc.cluster.support.FailbackClusterInvoker
// On failure, records the failed request and retries it periodically. This strategy is particularly useful for notification services.
public class FailbackClusterInvoker<T> extends AbstractClusterInvoker<T> {

    private static final Logger logger = LoggerFactory.getLogger(FailbackClusterInvoker.class);
    // Retry period (milliseconds)
    private static final long RETRY_FAILED_PERIOD = 5 * 1000;

    // Thread pool for the scheduled retries
    private final ScheduledExecutorService scheduledExecutorService = Executors.newScheduledThreadPool(2,
            new NamedInternalThreadFactory("failback-cluster-timer", true));
    // Failed invocations
    private final ConcurrentMap<Invocation, AbstractClusterInvoker<?>> failed = new ConcurrentHashMap<Invocation, AbstractClusterInvoker<?>>();
    private volatile ScheduledFuture<?> retryFuture;

    public FailbackClusterInvoker(Directory<T> directory) {
        super(directory);
    }
    
    // Record the failed invocation for a later retry
    private void addFailed(Invocation invocation, AbstractClusterInvoker<?> router) {
        if (retryFuture == null) {
            synchronized (this) {
                if (retryFuture == null) {
                    retryFuture = scheduledExecutorService.scheduleWithFixedDelay(new Runnable() {
                        @Override
                        public void run() {
                            try {
                                retryFailed();
                            } catch (Throwable t) {  
                                logger.error("Unexpected error occur at collect statistic", t);
                            }
                        }
                    }, RETRY_FAILED_PERIOD, RETRY_FAILED_PERIOD, TimeUnit.MILLISECONDS);
                }
            }
        }
        failed.put(invocation, router);
    }
    
    // Retry the failed invocations
    void retryFailed() {
        if (failed.size() == 0) {
            return;
        }
        for (Map.Entry<Invocation, AbstractClusterInvoker<?>> entry : new HashMap<Invocation, AbstractClusterInvoker<?>>(
                failed).entrySet()) {
            Invocation invocation = entry.getKey();
            Invoker<?> invoker = entry.getValue();
            try {
                invoker.invoke(invocation);
                // The call succeeded: remove it from the failed records
                failed.remove(invocation);
            } catch (Throwable e) {
                // The retry failed again: log it and wait for the next round
                logger.error("Failed retry to invoke method " + invocation.getMethodName() + ", waiting again.", e);
            }
        }
    }

    // The normal first attempt
    @Override
    protected Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        try {
            // Check that an invoker is available
            checkInvokers(invokers, invocation);
            // 2022年3月10日17:47:15
            Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
            return invoker.invoke(invocation);
        } catch (Throwable e) {
            logger.error("Failback to invoke method " + invocation.getMethodName() + ", wait for retry in background. Ignored exception: "
                    + e.getMessage() + ", ", e);
            // An exception occurred: record the call in the failed map for a later retry
            addFailed(invocation, this);
            // Ignore the error by returning an empty RpcResult
            return new RpcResult();
        }
    }

}

The marker 2022年3月10日17:47:15 is where the load balancer selects an invoker; for details see the article dubbo之负载均衡策略 (Dubbo load balancing strategies).
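
The record-then-retry idea can be sketched with only JDK classes. This is a simplified illustration, not Dubbo's implementation: calls are plain Runnable objects, failed calls go into a queue rather than a map, and FailbackSketch, pendingCount, and startRetrying are hypothetical names.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class FailbackSketch {
    private final Queue<Runnable> failed = new ConcurrentLinkedQueue<>();

    // First attempt: on failure, record the call for later and swallow the error.
    void invoke(Runnable call) {
        try {
            call.run();
        } catch (RuntimeException e) {
            failed.add(call);
        }
    }

    // One retry pass: calls that fail again are queued for the next pass.
    void retryFailed() {
        int pending = failed.size();
        for (int i = 0; i < pending; i++) {
            Runnable call = failed.poll();
            if (call == null) {
                break;
            }
            try {
                call.run();
            } catch (RuntimeException e) {
                failed.add(call);
            }
        }
    }

    int pendingCount() {
        return failed.size();
    }

    // Runs retryFailed() periodically on a daemon thread; the caller shuts the executor down.
    ScheduledExecutorService startRetrying(long periodMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "failback-sketch-timer");
            t.setDaemon(true);
            return t;
        });
        timer.scheduleWithFixedDelay(this::retryFailed, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return timer;
    }
}
```

As in the Dubbo source, a call that keeps failing is retried indefinitely; a production version would likely want a cap on retries or on the size of the failed collection.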

6：ForkingCluster

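ForkingCluster is Dubbo's implementation of the forking strategy: it calls several invokers in parallel (the number is configurable) and uses whichever result comes back first. It is generally used for reads that need low latency, at the cost of more server resources. The source is as follows: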

// com.alibaba.dubbo.rpc.cluster.support.ForkingCluster
public class ForkingCluster implements Cluster {

    public final static String NAME = "forking";

    @Override
    public <T> Invoker<T> join(Directory<T> directory) throws RpcException {
        return new ForkingClusterInvoker<T>(directory);
    }

}

The corresponding cluster invoker is ForkingClusterInvoker; see section 6.1, ForkingClusterInvoker.

6.1：ForkingClusterInvoker

The source is as follows:

// com.alibaba.dubbo.rpc.cluster.support.ForkingClusterInvoker
// Calls a configured number of invokers in parallel; generally used for demanding real-time reads, but it consumes more server resources.
public class ForkingClusterInvoker<T> extends AbstractClusterInvoker<T> {
    private final ExecutorService executor = Executors.newCachedThreadPool(
            new NamedInternalThreadFactory("forking-cluster-timer", true));

    public ForkingClusterInvoker(Directory<T> directory) {
        super(directory);
    }

    @Override
    @SuppressWarnings({"unchecked", "rawtypes"})
    public Result doInvoke(final Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        try {
            // Check that an invoker is available
            checkInvokers(invokers, invocation);
            // The invokers selected to be called in parallel
            final List<Invoker<T>> selected;
            // Number of invokers to call in parallel
            final int forks = getUrl().getParameter(Constants.FORKS_KEY, Constants.DEFAULT_FORKS);
            // Timeout in milliseconds
            final int timeout = getUrl().getParameter(Constants.TIMEOUT_KEY, Constants.DEFAULT_TIMEOUT);
            // If forks is not positive, or is at least the number of invokers, use all invokers
            if (forks <= 0 || forks >= invokers.size()) {
                selected = invokers;
            // Otherwise select forks invokers from the full list
            } else {
                selected = new ArrayList<Invoker<T>>();
                // Select forks invokers
                for (int i = 0; i < forks; i++) {
                    Invoker<T> invoker = select(loadbalance, invocation, invokers, selected);
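                    // A question here: a repeated invoker is not added again, so with consistent hashing, which always picks the same invoker, would the parallel calls be lost?
                    // In the author's testing this does not happen: com.alibaba.dubbo.rpc.cluster.support.AbstractClusterInvoker.reselect first removes
                    // the invokers already selected and then selects again, so the same invoker is not picked twice
                    if (!selected.contains(invoker)) {
                        // Add it to the selected invokers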
                        selected.add(invoker);
                    }
                }
            }
            // Store the selected invokers in the RPC context
            RpcContext.getContext().setInvokers((List) selected);
            final AtomicInteger count = new AtomicInteger();
            // 2022年3月15日17:52:09
            final BlockingQueue<Object> ref = new LinkedBlockingQueue<Object>();
            // Use the thread pool to call the selected invokers concurrently
            for (final Invoker<T> invoker : selected) {
                executor.execute(new Runnable() {
                    @Override
                    public void run() {
                        try {
                            // When a result comes back, put it into the blocking queue
                            Result result = invoker.invoke(invocation);
                            ref.offer(result);
                        } catch (Throwable e) {
                            int value = count.incrementAndGet();
                            // If every call has failed, put the exception itself into the blocking queue
                            if (value >= selected.size()) {
                                ref.offer(e);
                            }
                        }
                    }
                });
            }
            try {
                // Poll the queue with a timeout; what comes out is the first result returned by the parallel calls
                Object ret = ref.poll(timeout, TimeUnit.MILLISECONDS);
                // This is true only when every invoker failed; the exception is the one from the last invoker to fail
                if (ret instanceof Throwable) {
                    Throwable e = (Throwable) ret;
                    throw new RpcException(e instanceof RpcException ? ((RpcException) e).getCode() : 0, "Failed to forking invoke provider " + selected + ", but no luck to perform the invocation. Last error is: " + e.getMessage(), e.getCause() != null ? e.getCause() : e);
                }
                return (Result) ret;
            } catch (InterruptedException e) {
                throw new RpcException("Failed to forking invoke provider " + selected + ", but no luck to perform the invocation. Last error is: " + e.getMessage(), e);
            }
        } finally {
            RpcContext.getContext().clearAttachments();
        }
    }
}
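
The first-result-wins pattern built on a blocking queue can be tried out with a self-contained sketch. It assumes Supplier objects in place of invokers and calls every provider rather than selecting forks of them; ForkingSketch is a hypothetical name. One deliberate difference: the sketch throws when the poll times out, whereas the listing above would return the null that poll yields.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class ForkingSketch {
    // Calls all providers concurrently and returns whichever result arrives first.
    static Object invoke(List<Supplier<Object>> providers, long timeoutMillis) {
        ExecutorService executor = Executors.newCachedThreadPool();
        BlockingQueue<Object> results = new LinkedBlockingQueue<>();
        AtomicInteger failures = new AtomicInteger();
        try {
            for (Supplier<Object> provider : providers) {
                executor.execute(() -> {
                    try {
                        results.offer(provider.get());
                    } catch (RuntimeException e) {
                        // Only once every call has failed is the exception itself queued.
                        if (failures.incrementAndGet() >= providers.size()) {
                            results.offer(e);
                        }
                    }
                });
            }
            Object first = results.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (first instanceof RuntimeException) {
                throw new IllegalStateException("all forked calls failed", (RuntimeException) first);
            }
            if (first == null) {
                throw new IllegalStateException("no result within " + timeoutMillis + " ms");
            }
            return first;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a result", e);
        } finally {
            executor.shutdownNow(); // the sketch owns its pool; Dubbo keeps one per invoker
        }
    }
}
```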