Dubbo: Cluster Fault Tolerance Explained

Tarzan写bug

Last modified: 2022-08-21 16:43:03

Tags: dubbo, java, distributed systems

First published: 2022-08-21 16:33:39

Article link: https://blog.csdn.net/weixin_39544175/article/details/126452186

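1. Purpose

When designing distributed middleware, think through how to provide fault tolerance in cluster mode;
In projects that use Dubbo, understand Dubbo's fault-tolerance mechanisms better, for example the problems the common retry mechanism can cause for write operations.

2. Cluster fault tolerance

In a clustered environment, when a call to one service provider fails, should the call be retried or should the error be returned? Dubbo offers several strategies, each suited to a different scenario.

3. Source code analysis

The Cluster interface: abstracts the cluster. Its implementations mainly create the corresponding ClusterInvoker implementation.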

@SPI(Cluster.DEFAULT)
public interface Cluster {
    String DEFAULT = FailoverCluster.NAME;

    /**
     * Builds a ClusterInvoker for the given directory
     */
    @Adaptive
    <T> Invoker<T> join(Directory<T> directory) throws RpcException;

    /**
     * Looks up a Cluster implementation by name through SPI
     */
    static Cluster getCluster(String name) {
        return getCluster(name, true);
    }

    static Cluster getCluster(String name, boolean wrap) {
        if (StringUtils.isEmpty(name)) {
            name = Cluster.DEFAULT;
        }
        // Load the Cluster implementation through SPI
        return ExtensionLoader.getExtensionLoader(Cluster.class).getExtension(name, wrap);
    }
}

The abstract class AbstractCluster: uses the template method pattern to define the basic skeleton.

	@Override
    public <T> Invoker<T> join(Directory<T> directory) throws RpcException {
        return buildClusterInterceptors(doJoin(directory), directory.getUrl().getParameter(REFERENCE_INTERCEPTOR_KEY));
    }

    /**
     * Creates the concrete ClusterInvoker
     */
    protected abstract <T> AbstractClusterInvoker<T> doJoin(Directory<T> directory) throws RpcException;

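doJoin() is what actually returns the ClusterInvoker, so every AbstractCluster subclass implements this method, and each implementation simply instantiates the corresponding ClusterInvoker.

For example, FailoverCluster: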

	@Override
    public <T> AbstractClusterInvoker<T> doJoin(Directory<T> directory) throws RpcException {
        return new FailoverClusterInvoker<>(directory);
    }

And FailbackCluster:

	@Override
    public <T> AbstractClusterInvoker<T> doJoin(Directory<T> directory) throws RpcException {
        return new FailbackClusterInvoker<>(directory);
    }

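The doJoin() of the other AbstractCluster subclasses is just as simple: each creates its corresponding ClusterInvoker implementation.

The ClusterInvoker interface: this is where the main fault-tolerance logic lives. Because it extends Invoker, the design treats a ClusterInvoker as if it were a single service provider.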
```
public interface ClusterInvoker<T> extends Invoker<T> {
    URL getRegistryUrl();

    Directory<T> getDirectory();
}
```

The abstract class AbstractClusterInvoker: defines the invocation template for cluster invokers. Since it is itself an invoker, we can go straight to invoke().

    @Override
    public Result invoke(final Invocation invocation) throws RpcException {
        // Check whether this invoker has already been destroyed
        checkWhetherDestroyed();

        // Bind the context attachments to the invocation
        Map<String, Object> contextAttachments = RpcContext.getContext().getObjectAttachments();
        if (contextAttachments != null && contextAttachments.size() != 0) {
            ((RpcInvocation) invocation).addObjectAttachments(contextAttachments);
        }
        // Fetch the list of invokers (service providers)
        List<Invoker<T>> invokers = list(invocation);
        // Obtain the load balancer
        LoadBalance loadbalance = initLoadBalance(invokers, invocation);
        RpcUtils.attachInvocationIdIfAsync(getUrl(), invocation);
        // Call doInvoke to perform the remote call; subclasses implement it to define each fault-tolerance strategy
        return doInvoke(invocation, invokers, loadbalance);
    }

	protected List<Invoker<T>> list(Invocation invocation) throws RpcException {
        return directory.list(invocation);
    }

    /**
     * Initializes the load balancer
     */
    protected LoadBalance initLoadBalance(List<Invoker<T>> invokers, Invocation invocation) {
        // If the invoker list is not empty, load the LoadBalance through SPI; weighted random is the default
        if (CollectionUtils.isNotEmpty(invokers)) {
            return ExtensionLoader.getExtensionLoader(LoadBalance.class).getExtension(invokers.get(0).getUrl()
                    .getMethodParameter(RpcUtils.getMethodName(invocation), LOADBALANCE_KEY, DEFAULT_LOADBALANCE));
        } else {
            // Otherwise fall back to the default (weighted random) load balancer
            return ExtensionLoader.getExtensionLoader(LoadBalance.class).getExtension(DEFAULT_LOADBALANCE);
        }
    }

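The skeleton: first fetch the list of service providers, then obtain the load-balancing strategy, and finally call the abstract method doInvoke(). Subclasses of AbstractClusterInvoker therefore mainly implement this abstract method.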
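
The three-step skeleton can be illustrated without any Dubbo dependency. The following is only a minimal sketch: SketchInvoker, SketchClusterInvoker and SingleCallClusterInvoker are hypothetical names invented for this example, requests are plain strings, and the "load balancer" simply takes the first provider.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified stand-in for Dubbo's Invoker.
interface SketchInvoker {
    String invoke(String request);
}

// Hypothetical stand-in for AbstractClusterInvoker: invoke() is the template method.
abstract class SketchClusterInvoker implements SketchInvoker {
    private final List<SketchInvoker> providers;

    SketchClusterInvoker(List<SketchInvoker> providers) {
        this.providers = providers;
    }

    @Override
    public final String invoke(String request) {
        // Step 1: fetch the current provider list (Dubbo asks the Directory).
        List<SketchInvoker> invokers = new ArrayList<>(providers);
        // Step 2: obtain a load balancer (here simply "take the first one").
        Function<List<SketchInvoker>, SketchInvoker> loadBalance = list -> list.get(0);
        // Step 3: hand over to the strategy implemented by the subclass.
        return doInvoke(request, invokers, loadBalance);
    }

    protected abstract String doInvoke(String request, List<SketchInvoker> invokers,
                                       Function<List<SketchInvoker>, SketchInvoker> loadBalance);
}

// One concrete strategy: call a single provider and let any exception propagate.
class SingleCallClusterInvoker extends SketchClusterInvoker {
    SingleCallClusterInvoker(List<SketchInvoker> providers) {
        super(providers);
    }

    @Override
    protected String doInvoke(String request, List<SketchInvoker> invokers,
                              Function<List<SketchInvoker>, SketchInvoker> loadBalance) {
        return loadBalance.apply(invokers).invoke(request);
    }
}
```

Because the cluster invoker implements the same interface as a single provider, callers cannot tell whether they are talking to one provider or to a whole cluster, which is the point of the design described above.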

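4. Retry on failure (failover)

Failover is Dubbo's default cluster fault-tolerance strategy. When a call to a provider fails, it selects another provider and calls again, up to the configured number of retries (2 by default). It suits queries: when an individual provider fails because of a machine or network problem, retrying on other providers keeps the request as a whole from failing. Note that for create or update requests the interface must be idempotent.

FailoverClusterInvoker source: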

	@Override
    @SuppressWarnings({"unchecked", "rawtypes"})
    public Result doInvoke(Invocation invocation, final List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        List<Invoker<T>> copyInvokers = invokers;
        // Make sure the invoker list is not empty
        checkInvokers(copyInvokers, invocation);
        String methodName = RpcUtils.getMethodName(invocation);
        // retries + 1, to include the initial call
        int len = getUrl().getMethodParameter(methodName, RETRIES_KEY, DEFAULT_RETRIES) + 1;
        if (len <= 0) {
            len = 1;
        }
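        // Holds the last exception
        RpcException le = null;
        // Invokers that have already been called
        List<Invoker<T>> invoked = new ArrayList<Invoker<T>>(copyInvokers.size());
        // Addresses of the providers that have already been called
        Set<String> providers = new HashSet<String>(len);
        // Loop up to retries + 1 times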
        for (int i = 0; i < len; i++) {
            // On a retry, fetch the invoker list again and re-validate it
            if (i > 0) {
                checkWhetherDestroyed();
                // Re-fetch the invoker list
                copyInvokers = list(invocation);
                // Check the invoker list again
                checkInvokers(copyInvokers, invocation);
            }
            // Select an invoker
            Invoker<T> invoker = select(loadbalance, invocation, copyInvokers, invoked);
            invoked.add(invoker);
            RpcContext.getContext().setInvokers((List) invoked);
            try {
                // Remote call to the selected provider
                Result result = invoker.invoke(invocation);
                // A retry succeeded: log the earlier failure as a warning
                if (le != null && logger.isWarnEnabled()) {
                    logger.warn("Although retry the method " + methodName
                            + " in the service " + getInterface().getName()
                            + " was successful by the provider " + invoker.getUrl().getAddress()
                            + ", but there have been failed providers " + providers
                            + " (" + providers.size() + "/" + copyInvokers.size()
                            + ") from the registry " + directory.getUrl().getAddress()
                            + " on the consumer " + NetUtils.getLocalHost()
                            + " using the dubbo version " + Version.getVersion() + ". Last error is: "
                            + le.getMessage(), le);
                }
                return result;
            } catch (RpcException e) {
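                // Business exceptions are rethrown immediately
                if (e.isBiz()) {
                    throw e;
                }
                // Keep the exception in le
                le = e;
            } catch (Throwable e) {
                // Wrap any other throwable in an RpcException
                le = new RpcException(e.getMessage(), e);
            } finally {
                // Record the address of the provider that was called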
                providers.add(invoker.getUrl().getAddress());
            }
        }
        // Still failing after all retries: throw, reporting the last exception
        throw new RpcException(le.getCode(), "Failed to invoke the method "
                + methodName + " in the service " + getInterface().getName()
                + ". Tried " + len + " times of the providers " + providers
                + " (" + providers.size() + "/" + copyInvokers.size()
                + ") from the registry " + directory.getUrl().getAddress()
                + " on the consumer " + NetUtils.getLocalHost() + " using the dubbo version "
                + Version.getVersion() + ". Last error is: "
                + le.getMessage(), le.getCause() != null ? le.getCause() : le);
    }
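
Stripped of logging and Dubbo types, the retry loop above comes down to a few lines. The following is only a minimal sketch under stated assumptions: FailoverSketch and its pick() helper are hypothetical, providers are modelled as Supplier objects, and pick() stands in for load balancing by preferring providers that have not been tried yet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Simplified failover: make up to retries + 1 attempts, preferring untried providers.
class FailoverSketch {
    static <R> R invoke(List<Supplier<R>> providers, int retries) {
        if (providers.isEmpty()) {
            throw new IllegalStateException("no provider available");
        }
        int attempts = Math.max(retries + 1, 1); // same guard as "len <= 0" in the source
        List<Supplier<R>> tried = new ArrayList<>();
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            Supplier<R> provider = pick(providers, tried, i);
            tried.add(provider);
            try {
                return provider.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next attempt
            }
        }
        throw new IllegalStateException("failed after " + attempts + " attempts", last);
    }

    // Stand-in for load balancing: first untried provider, else rotate by attempt index.
    private static <R> Supplier<R> pick(List<Supplier<R>> providers,
                                        List<Supplier<R>> tried, int attempt) {
        for (Supplier<R> candidate : providers) {
            if (!tried.contains(candidate)) {
                return candidate;
            }
        }
        return providers.get(attempt % providers.size());
    }
}
```

The sketch also makes the idempotency warning concrete: if a provider applies a write and then fails (for example, the response times out), the loop calls again and the write may be applied more than once.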

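When selecting a provider, the code calls select() from AbstractClusterInvoker. The method lives in the abstract class because other implementations use it too; it is analysed here because it is easier to follow together with the arguments passed in. Let's look at how a provider is selected: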

	/**
     * Selects an invoker
     */
    protected Invoker<T> select(LoadBalance loadbalance, Invocation invocation,
                                List<Invoker<T>> invokers, List<Invoker<T>> selected) throws RpcException {

        if (CollectionUtils.isEmpty(invokers)) {
            return null;
        }
        String methodName = invocation == null ? StringUtils.EMPTY_STRING : invocation.getMethodName();
        // Read the sticky parameter
        boolean sticky = invokers.get(0).getUrl()
                .getMethodParameter(methodName, CLUSTER_STICKY_KEY, DEFAULT_CLUSTER_STICKY);

        // If stickyInvoker is not null but is no longer in the invoker list,
        // it has gone away, so reset stickyInvoker to null
        if (stickyInvoker != null && !invokers.contains(stickyInvoker)) {
            stickyInvoker = null;
        }

        // With sticky=true and stickyInvoker still present, and stickyInvoker not among the
        // already-selected invokers (i.e. not tried yet), return stickyInvoker.
        // It is returned only when the availability check is enabled and it reports available
        if (sticky && stickyInvoker != null && (selected == null || !selected.contains(stickyInvoker))) {
            if (availablecheck && stickyInvoker.isAvailable()) {
                return stickyInvoker;
            }
        }

        // Pick an invoker through the load balancer
        Invoker<T> invoker = doSelect(loadbalance, invocation, invokers, selected);

        // With sticky enabled, remember the chosen invoker as stickyInvoker
        if (sticky) {
            stickyInvoker = invoker;
        }
        return invoker;
    }

	private Invoker<T> doSelect(LoadBalance loadbalance, Invocation invocation,
                                List<Invoker<T>> invokers, List<Invoker<T>> selected) throws RpcException {

        if (CollectionUtils.isEmpty(invokers)) {
            return null;
        }
        if (invokers.size() == 1) {
            return invokers.get(0);
        }
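        // Let the load-balancing strategy pick an invoker
        Invoker<T> invoker = loadbalance.select(invokers, getUrl(), invocation);

        // Reselect in two cases: the chosen invoker has already been called
        // (it is in selected), or the availability check is enabled and
        // the chosen invoker is unavailable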
        if ((selected != null && selected.contains(invoker))
                || (!invoker.isAvailable() && getUrl() != null && availablecheck)) {
            try {
                // Reselect
                Invoker<T> rInvoker = reselect(loadbalance, invocation, invokers, selected, availablecheck);
                if (rInvoker != null) {
                    invoker = rInvoker;
                } else {
                    // Reselect found nothing: take the invoker at index + 1,
                    // wrapping around to the first one after the last index
                    int index = invokers.indexOf(invoker);
                    try {
                        //Avoid collision
                        invoker = invokers.get((index + 1) % invokers.size());
                    } catch (Exception e) {
                        logger.warn(e.getMessage() + " may because invokers list dynamic change, ignore.", e);
                    }
                }
            } catch (Throwable t) {
                logger.error("cluster reselect fail reason is :" + t.getMessage() + " if can not solve, you can set cluster.availablecheck=false in url", t);
            }
        }
        return invoker;
    }

	/**
     * Reselects an invoker
     */
    private Invoker<T> reselect(LoadBalance loadbalance, Invocation invocation,
                                List<Invoker<T>> invokers, List<Invoker<T>> selected, boolean availablecheck) throws RpcException {

        // This is a reselect, so size the list for one invoker fewer than before
        List<Invoker<T>> reselectInvokers = new ArrayList<>(
                invokers.size() > 1 ? (invokers.size() - 1) : invokers.size());

        // Walk the invoker list, skipping those already in selected
        for (Invoker<T> invoker : invokers) {
            // Skip unavailable invokers
            if (availablecheck && !invoker.isAvailable()) {
                continue;
            }
            // Keep the invoker if selected is null or does not contain it
            if (selected == null || !selected.contains(invoker)) {
                reselectInvokers.add(invoker);
            }
        }

        // Pick among the candidates through the load balancer
        if (!reselectInvokers.isEmpty()) {
            return loadbalance.select(reselectInvokers, getUrl(), invocation);
        }

        // Every invoker has been selected before: fall back to the available ones in selected
        if (selected != null) {
            for (Invoker<T> invoker : selected) {
                if ((invoker.isAvailable()) // available first
                        && !reselectInvokers.contains(invoker)) {
                    reselectInvokers.add(invoker);
                }
            }
        }
        // Pick among the candidates through the load balancer
        if (!reselectInvokers.isEmpty()) {
            return loadbalance.select(reselectInvokers, getUrl(), invocation);
        }

        return null;
    }

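The selection process: the sticky invoker is considered first; otherwise the load-balancing strategy picks an invoker, and if that invoker has already been called, a reselect takes place.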
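
The selection order can be traced with a small stand-alone model. The following is only a minimal sketch: StickySelector is a hypothetical class, providers are plain strings, the availability check is left out, and "the load balancer's pick" is simply the first provider in the list.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of select()/doSelect()/reselect(), without availability checks.
class StickySelector {
    private String sticky; // last chosen provider while sticky is enabled

    // providers: currently listed providers; selected: providers already tried in this call.
    String select(List<String> providers, List<String> selected, boolean stickyEnabled) {
        if (providers.isEmpty()) {
            return null;
        }
        // Forget the sticky provider once it has left the provider list.
        if (sticky != null && !providers.contains(sticky)) {
            sticky = null;
        }
        // Reuse the sticky provider unless it has already been tried.
        if (stickyEnabled && sticky != null && !selected.contains(sticky)) {
            return sticky;
        }
        String chosen = providers.get(0); // stand-in for the load balancer's pick
        if (selected.contains(chosen)) {
            // Reselect among the providers that have not been tried yet.
            List<String> candidates = new ArrayList<>(providers);
            candidates.removeAll(selected);
            if (!candidates.isEmpty()) {
                chosen = candidates.get(0);
            } else {
                // Nothing untried is left: take the next provider, wrapping around.
                chosen = providers.get((providers.indexOf(chosen) + 1) % providers.size());
            }
        }
        if (stickyEnabled) {
            sticky = chosen;
        }
        return chosen;
    }
}
```

Passing the list of already-tried providers as selected is what lets a failover retry land on a different provider than the one that just failed.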

5. Fail fast (failfast)

Fail-fast throws immediately after a single failed call to a provider. It suits create and update operations.

FailfastClusterInvoker source:

	@Override
    public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        // Make sure the invoker list is not empty
        checkInvokers(invokers, invocation);
        // Select an invoker
        Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
        try {
            // Return the call result
            return invoker.invoke(invocation);
        } catch (Throwable e) {
            // Business exceptions are rethrown as-is
            if (e instanceof RpcException && ((RpcException) e).isBiz()) { // biz exception.
                throw (RpcException) e;
            }
            // Wrap everything else in an RpcException and throw it
            throw new RpcException(e instanceof RpcException ? ((RpcException) e).getCode() : 0,
                    "Failfast invoke providers " + invoker.getUrl() + " " + loadbalance.getClass().getSimpleName()
                            + " select from all providers " + invokers + " for service " + getInterface().getName()
                            + " method " + invocation.getMethodName() + " on consumer " + NetUtils.getLocalHost()
                            + " use dubbo version " + Version.getVersion()
                            + ", but no luck to perform the invocation. Last error is: " + e.getMessage(),
                    e.getCause() != null ? e.getCause() : e);
        }
    }

6. Fail safe (failsafe)

Fail-safe logs the error when a call to a provider fails and does not throw.

FailsafeClusterInvoker source:

	@Override
    public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        try {
            // Make sure the invoker list is not empty
            checkInvokers(invokers, invocation);
            // Select an invoker
            Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
            // Return the call result
            return invoker.invoke(invocation);
        } catch (Throwable e) {
            // Log the exception instead of throwing it
            logger.error("Failsafe ignore exception: " + e.getMessage(), e);
            // Return an empty result
            return AsyncRpcResult.newDefaultAsyncResult(null, null, invocation);
        }
    }

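7. Retry in the background on failure (failback)

This strategy combines failover and failsafe: when a call to a provider fails, it returns an empty result and schedules a timed task that retries in the background.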

	@Override
    protected Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        Invoker<T> invoker = null;
        try {
            // Make sure the invoker list is not empty
            checkInvokers(invokers, invocation);
            // Select an invoker
            invoker = select(loadbalance, invocation, invokers, null);
            // Call the provider
            return invoker.invoke(invocation);
        } catch (Throwable e) {
            // Log the error instead of throwing it
            logger.error("Failback to invoke method " + invocation.getMethodName() + ", wait for retry in background. Ignored exception: "
                    + e.getMessage() + ", ", e);
            // Hand the call over to the background timer
            addFailed(loadbalance, invocation, invokers, invoker);
            // Return an empty result
            return AsyncRpcResult.newDefaultAsyncResult(null, null, invocation);
        }
    }
    
    private void addFailed(LoadBalance loadbalance, Invocation invocation, List<Invoker<T>> invokers, Invoker<T> lastInvoker) {
        // Lazily initialize the timer
        if (failTimer == null) {
            synchronized (this) {
                if (failTimer == null) {
                    failTimer = new HashedWheelTimer(
                            new NamedThreadFactory("failback-cluster-timer", true),
                            1,
                            TimeUnit.SECONDS, 32, failbackTasks);
                }
            }
        }
        // Create the retry task
        RetryTimerTask retryTimerTask = new RetryTimerTask(loadbalance, invocation, invokers, lastInvoker, retries, RETRY_FAILED_PERIOD);
        try {
            // Register the task with the timer
            failTimer.newTimeout(retryTimerTask, RETRY_FAILED_PERIOD, TimeUnit.SECONDS);
        } catch (Throwable e) {
            logger.error("Failback background works error,invocation->" + invocation + ", exception: " + e.getMessage());
        }
    }

The timer HashedWheelTimer and the task RetryTimerTask are not analysed in detail here; a separate article will cover them.
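
The overall behaviour (return an empty result at once, retry later) can still be illustrated without the wheel timer. The following is only a minimal sketch that swaps in a ScheduledExecutorService from the JDK in place of HashedWheelTimer; FailbackSketch, maxRetries and periodMillis are hypothetical names chosen for this example and are not Dubbo's configuration keys.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Simplified failback: swallow the failure, return null, retry on a background timer.
class FailbackSketch implements AutoCloseable {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "failback-sketch-timer");
                thread.setDaemon(true); // do not keep the JVM alive for pending retries
                return thread;
            });
    private final int maxRetries;
    private final long periodMillis;

    FailbackSketch(int maxRetries, long periodMillis) {
        this.maxRetries = maxRetries;
        this.periodMillis = periodMillis;
    }

    // Returns the call result, or null after scheduling a background retry.
    <R> R invoke(Callable<R> call) {
        try {
            return call.call();
        } catch (Exception e) {
            scheduleRetry(call, 1);
            return null; // the caller sees an empty result, not the exception
        }
    }

    private <R> void scheduleRetry(Callable<R> call, int attempt) {
        timer.schedule(() -> {
            try {
                call.call();
            } catch (Exception e) {
                // Give up once the retry budget is used up.
                if (attempt < maxRetries) {
                    scheduleRetry(call, attempt + 1);
                }
            }
        }, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

Because the caller only ever sees the empty result, this approach fits calls whose outcome the caller does not need, which matches the "notification after registration" kind of use.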

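8. Parallel invocation (forking)

Forking calls several providers at the same time and returns as soon as any one of them returns a result; an exception is thrown only when all of them fail. It suits queries that need low response latency.

ForkingClusterInvoker source: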

    @Override
    @SuppressWarnings({"unchecked", "rawtypes"})
    public Result doInvoke(final Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        try {
            // Make sure the invoker list is not empty
            checkInvokers(invokers, invocation);
            final List<Invoker<T>> selected;
            // Number of parallel calls, 2 by default
            final int forks = getUrl().getParameter(FORKS_KEY, DEFAULT_FORKS);
            // Timeout in milliseconds, 1000 by default
            final int timeout = getUrl().getParameter(TIMEOUT_KEY, DEFAULT_TIMEOUT);
            // forks is <= 0, or at least the number of invokers
            if (forks <= 0 || forks >= invokers.size()) {
                // Call every invoker
                selected = invokers;
            } else {
                // 0 < forks < number of invokers: pick forks invokers from the list
                selected = new ArrayList<>(forks);
                // Keep going until selected holds forks invokers
                while (selected.size() < forks) {
                    // Select one invoker
                    Invoker<T> invoker = select(loadbalance, invocation, invokers, selected);
                    // Add it only if it is not already in selected,
                    // which guards against picking the same invoker twice
                    if (!selected.contains(invoker)) {
                        selected.add(invoker);
                    }
                }
            }
            RpcContext.getContext().setInvokers((List) selected);
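            // Counts failures; an AtomicInteger because the invokers are called from several threads
            final AtomicInteger count = new AtomicInteger();
            final BlockingQueue<Object> ref = new LinkedBlockingQueue<>();
            // Loop over the selected invokers
            for (final Invoker<T> invoker : selected) {
                // Call each invoker on the thread pool and put the result on the blocking queue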
                executor.execute(() -> {
                    try {
                        Result result = invoker.invoke(invocation);
                        ref.offer(result);
                    } catch (Throwable e) {
                        // On failure, increment count
                        int value = count.incrementAndGet();
                        // Once every selected invoker has failed, put the exception on the queue
                        if (value >= selected.size()) {
                            ref.offer(e);
                        }
                    }
                });
            }
            try {
                // Wait for a result, up to the timeout
                Object ret = ref.poll(timeout, TimeUnit.MILLISECONDS);
                // If the result is an exception, wrap it in an RpcException and throw
                if (ret instanceof Throwable) {
                    Throwable e = (Throwable) ret;
                    throw new RpcException(e instanceof RpcException ? ((RpcException) e).getCode() : 0, "Failed to forking invoke provider " + selected + ", but no luck to perform the invocation. Last error is: " + e.getMessage(), e.getCause() != null ? e.getCause() : e);
                }
                // Return the call result
                return (Result) ret;
            } catch (InterruptedException e) {
                // Interrupted while waiting on the queue
                throw new RpcException("Failed to forking invoke provider " + selected + ", but no luck to perform the invocation. Last error is: " + e.getMessage(), e);
            }
        } finally {
            // clear attachments which is binding to current thread.
            RpcContext.getContext().clearAttachments();
        }
    }

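The forking strategy first selects the providers to call in parallel, then invokes them on a thread pool, puts each result on a blocking queue, and finally takes the result from that queue.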
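
The queue-based "first result wins" idea can be tried out with JDK classes alone. The following is only a minimal sketch: ForkingSketch and its Outcome holder are hypothetical, every provider in the list is called (there is no forks limit), and, unlike the source above, a timeout is reported as an exception.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified forking: call all providers in parallel and return the first success.
class ForkingSketch {
    // Holder so that null values and failures can both travel through the queue.
    private static final class Outcome<R> {
        final R value;
        final Throwable error;

        Outcome(R value, Throwable error) {
            this.value = value;
            this.error = error;
        }
    }

    static <R> R invoke(List<Callable<R>> providers, long timeoutMillis) {
        if (providers.isEmpty()) {
            throw new IllegalStateException("no provider available");
        }
        ExecutorService pool = Executors.newFixedThreadPool(providers.size());
        BlockingQueue<Outcome<R>> outcomes = new LinkedBlockingQueue<>();
        AtomicInteger failures = new AtomicInteger();
        try {
            for (Callable<R> provider : providers) {
                pool.execute(() -> {
                    try {
                        outcomes.offer(new Outcome<R>(provider.call(), null));
                    } catch (Throwable e) {
                        // Report a failure only once every provider has failed.
                        if (failures.incrementAndGet() >= providers.size()) {
                            outcomes.offer(new Outcome<R>(null, e));
                        }
                    }
                });
            }
            Outcome<R> first = outcomes.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (first == null) {
                throw new IllegalStateException("no result within " + timeoutMillis + " ms");
            }
            if (first.error != null) {
                throw new IllegalStateException("all providers failed", first.error);
            }
            return first.value;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a result", e);
        } finally {
            pool.shutdownNow(); // slower calls are abandoned once a result is in
        }
    }
}
```

The price of the lower latency is load: each request costs as many provider calls as there are forks, so this is a poor fit for writes or for expensive queries.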

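9. Broadcast invocation (broadcast)

Broadcast calls every provider in turn and returns a result only when all of them succeed; otherwise it throws. It suits operations such as refreshing caches.

BroadcastClusterInvoker source: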

	@Override
    @SuppressWarnings({"unchecked", "rawtypes"})
    public Result doInvoke(final Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        // Make sure the invoker list is not empty
        checkInvokers(invokers, invocation);
        RpcContext.getContext().setInvokers((List) invokers);
        RpcException exception = null;
        Result result = null;
        // Loop over the invokers
        for (Invoker<T> invoker : invokers) {
            try {
                // Call the provider
                result = invoker.invoke(invocation);
            } catch (RpcException e) {
                // Remember the exception and keep going
                exception = e;
                logger.warn(e.getMessage(), e);
            } catch (Throwable e) {
                exception = new RpcException(e.getMessage(), e);
                logger.warn(e.getMessage(), e);
            }
        }
        // Throw if any call failed
        if (exception != null) {
            throw exception;
        }
        // Return the result of the last call
        return result;
    }
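
Two details of the loop are easy to miss: a failure does not stop the remaining providers from being called, and on success only the last provider's result is returned. The following is only a minimal sketch that makes both visible; BroadcastSketch is a hypothetical class and providers are modelled as Supplier objects.

```java
import java.util.List;
import java.util.function.Supplier;

// Simplified broadcast: call every provider, then rethrow the last failure if there was one.
class BroadcastSketch {
    static <R> R invoke(List<Supplier<R>> providers) {
        RuntimeException lastFailure = null;
        R lastResult = null;
        for (Supplier<R> provider : providers) {
            try {
                lastResult = provider.get();
            } catch (RuntimeException e) {
                lastFailure = e; // remember it, but keep calling the remaining providers
            }
        }
        if (lastFailure != null) {
            throw lastFailure;
        }
        return lastResult; // only the last provider's result survives
    }
}
```

For a cache refresh this means that a single failing node makes the call report an error even though all other nodes have already been refreshed.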

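10. Availability invocation (available)

Only an available provider is called; if no available provider is found, an exception is thrown. It suits cases where useless calls should be avoided.

AvailableClusterInvoker source: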

	@Override
    public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        // Loop over the invokers and call the first available one; throw if none is available
        for (Invoker<T> invoker : invokers) {
            if (invoker.isAvailable()) {
                return invoker.invoke(invocation);
            }
        }
        throw new RpcException("No provider available in " + invokers);
    }

The strategies so far have been simple to read. Now for the big one.

11. Result merging (mergeable)

Result merging calls multiple providers and merges their results before returning them to the client.

MergeableClusterInvoker source:

	@Override
    protected Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        // Make sure the invoker list is not empty
        checkInvokers(invokers, invocation);
        // Read the merger parameter
        String merger = getUrl().getMethodParameter(invocation.getMethodName(), MERGER_KEY);
        // With no merger configured, call only one group
        if (ConfigUtils.isEmpty(merger)) {
            for (final Invoker<T> invoker : invokers) {
                if (invoker.isAvailable()) {
                    try {
                        return invoker.invoke(invocation);
                    } catch (RpcException e) {
                        if (e.isNoInvokerAvailableAfterFilter()) {
                            log.debug("No available provider for service" + getUrl().getServiceKey() + " on group " + invoker.getUrl().getParameter(GROUP_KEY) + ", will continue to try another group.");
                        } else {
                            throw e;
                        }
                    }
                }
            }
            return invokers.iterator().next().invoke(invocation);
        }

        // Look up the method's return type
        Class<?> returnType;
        try {
            returnType = getInterface().getMethod(
                    invocation.getMethodName(), invocation.getParameterTypes()).getReturnType();
        } catch (NoSuchMethodException e) {
            returnType = null;
        }

        // Results of the calls, keyed by service key
        Map<String, Result> results = new HashMap<>();
        for (final Invoker<T> invoker : invokers) {
            RpcInvocation subInvocation = new RpcInvocation(invocation, invoker);
            subInvocation.setAttachment(ASYNC_KEY, "true");
            results.put(invoker.getUrl().getServiceKey(), invoker.invoke(subInvocation));
        }

        Object result = null;

        List<Result> resultList = new ArrayList<Result>(results.size());

        // Loop over the results
        for (Map.Entry<String, Result> entry : results.entrySet()) {
            // Wait for the asynchronous result
            Result asyncResult = entry.getValue();
            try {
                Result r = asyncResult.get();
                // Log results that carry an exception
                if (r.hasException()) {
                    log.error("Invoke " + getGroupDescFromServiceKey(entry.getKey()) +
                                    " failed: " + r.getException().getMessage(),
                            r.getException());
                } else {
                    // Collect the successful result
                    resultList.add(r);
                }
            } catch (Exception e) {
                // Waiting for the result failed: throw
                throw new RpcException("Failed to invoke service " + entry.getKey() + ": " + e.getMessage(), e);
            }
        }

        // No successful result: return an empty result
        if (resultList.isEmpty()) {
            return AsyncRpcResult.newDefaultAsyncResult(invocation);
        } else if (resultList.size() == 1) {
            // Exactly one result: return it directly
            return resultList.iterator().next();
        }

        // A void return type yields an empty result
        if (returnType == void.class) {
            return AsyncRpcResult.newDefaultAsyncResult(invocation);
        }

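        // A merger starting with "." names a method of the return type to merge with
        if (merger.startsWith(".")) {
            // Strip the leading "." to get the method name
            merger = merger.substring(1);
            Method method;
            try {
                // Look the method up by reflection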
                method = returnType.getMethod(merger, returnType);
            } catch (NoSuchMethodException e) {
                throw new RpcException("Can not merge result because missing method [ " + merger + " ] in class [ " +
                        returnType.getName() + " ]");
            }
            if (!Modifier.isPublic(method.getModifiers())) {
                method.setAccessible(true);
            }
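            // Take the first result out of the list as the starting value
            result = resultList.remove(0).getValue();
            try {
                // The merge method returns a value (not void) whose type can hold the result
                if (method.getReturnType() != void.class
                        && method.getReturnType().isAssignableFrom(result.getClass())) {
                    // Feed each remaining result to the merge method, keeping what it returns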
                    for (Result r : resultList) {
                        result = method.invoke(result, r.getValue());
                    }
                } else {
                    // The merge method returns void or an incompatible type: just run it on each result and ignore what it returns
                    for (Result r : resultList) {
                        method.invoke(result, r.getValue());
                    }
                }
            } catch (Exception e) {
                throw new RpcException("Can not merge result: " + e.getMessage(), e);
            }
        } else {
            // Merge with a Merger extension
            Merger resultMerger;
            if (ConfigUtils.isDefault(merger)) {
                resultMerger = MergerFactory.getMerger(returnType);
            } else {
                resultMerger = ExtensionLoader.getExtensionLoader(Merger.class).getExtension(merger);
            }
            if (resultMerger != null) {
                List<Object> rets = new ArrayList<Object>(resultList.size());
                for (Result r : resultList) {
                    rets.add(r.getValue());
                }
                // Run the merger
                result = resultMerger.merge(
                        rets.toArray((Object[]) Array.newInstance(returnType, 0)));
            } else {
                throw new RpcException("There is no merger to merge result.");
            }
        }
        return AsyncRpcResult.newDefaultAsyncResult(result, invocation);
    }

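Usage example:

One service has two implementations, placed in different groups

@Reference(group="spring,go")
private OrderService orderService;

There are two ways to merge:

Merge with a method of the return type, i.e. the merge method is defined on the returned class. The merger parameter is "." followed by the method name
```
ReferenceConfig<OrderService> config = new ReferenceConfig<>();
Map<String, String> map = new HashMap<>();
map.put("merger", ".merger");
config.setParameters(map);
```
Merge with a Merger extension, so the merge method does not have to live on the returned class. The merger parameter is the name of the extension, which corresponds to the returned data type (list in this example)
```
ReferenceConfig<OrderService> config = new ReferenceConfig<>();
Map<String, String> map = new HashMap<>();
map.put("merger", "list");
config.setParameters(map);
```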
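
What the two styles do to the results can be shown without Dubbo. The following is only a minimal sketch: MergeSketch and its nested Menu class are hypothetical, mergeByMethod mirrors the reflective "." style, and mergeLists illustrates what a list-style merger would be expected to do; it is not Dubbo's own Merger implementation.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MergeSketch {
    // Hypothetical return type that carries its own merge method (the ".merge" style).
    public static final class Menu {
        public final List<String> items = new ArrayList<>();

        public Menu(String... items) {
            this.items.addAll(Arrays.asList(items));
        }

        public Menu merge(Menu other) {
            Menu merged = new Menu();
            merged.items.addAll(this.items);
            merged.items.addAll(other.items);
            return merged;
        }
    }

    // Style 1: fold the results with a method of the return type, looked up by name.
    static Object mergeByMethod(String methodName, List<?> results) {
        Object merged = results.get(0);
        try {
            Method method = merged.getClass().getMethod(methodName, merged.getClass());
            for (Object next : results.subList(1, results.size())) {
                merged = method.invoke(merged, next);
            }
            return merged;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot merge with method " + methodName, e);
        }
    }

    // Style 2: a stand-alone merger for list results: concatenate them in order.
    static <E> List<E> mergeLists(List<List<E>> results) {
        List<E> merged = new ArrayList<>();
        for (List<E> part : results) {
            merged.addAll(part);
        }
        return merged;
    }
}
```

With one result per group, mergeByMethod("merge", ...) would combine the Menu objects from the spring and go groups into a single Menu, which is the kind of aggregation the menu-service use case needs.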

12. Zone-aware selection (zone-aware)

Zone-aware selection chooses among multiple registries by zone.

ZoneAwareClusterInvoker source:

	@Override
    @SuppressWarnings({"unchecked", "rawtypes"})
    public Result doInvoke(Invocation invocation, final List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        for (Invoker<T> invoker : invokers) {
            ClusterInvoker<T> clusterInvoker = (ClusterInvoker<T>) invoker;
            // An invoker whose registry is marked preferred=true is called first, if it is available
            if (clusterInvoker.isAvailable() && clusterInvoker.getRegistryUrl()
                    .getParameter(REGISTRY_KEY + "." + PREFERRED_KEY, false)) {
                return clusterInvoker.invoke(invocation);
            }
        }
        
        // Read the zone from the invocation
        String zone = invocation.getAttachment(REGISTRY_ZONE);
        if (StringUtils.isNotEmpty(zone)) {
            for (Invoker<T> invoker : invokers) {
                ClusterInvoker<T> clusterInvoker = (ClusterInvoker<T>) invoker;
                // Call the invoker whose registry is in the same zone
                if (clusterInvoker.isAvailable() && zone.equals(clusterInvoker.getRegistryUrl().getParameter(REGISTRY_KEY + "." + ZONE_KEY))) {
                    return clusterInvoker.invoke(invocation);
                }
            }
            // With registry_zone_force=true, throw if no registry in the zone was found
            String force = invocation.getAttachment(REGISTRY_ZONE_FORCE);
            if (StringUtils.isNotEmpty(force) && "true".equalsIgnoreCase(force)) {
                throw new IllegalStateException("No registry instance in zone or no available providers in the registry, zone: "
                        + zone
                        + ", registries: " + invokers.stream().map(invoker -> ((MockClusterInvoker<T>) invoker).getRegistryUrl().toString()).collect(Collectors.joining(",")));
            }
        }

        
        // Pick through the load balancer and call the pick if it is available
        Invoker<T> balancedInvoker = select(loadbalance, invocation, invokers, null);
        if (balancedInvoker.isAvailable()) {
            return balancedInvoker.invoke(invocation);
        }
        
        // Otherwise call the first available invoker
        for (Invoker<T> invoker : invokers) {
            ClusterInvoker<T> clusterInvoker = (ClusterInvoker<T>) invoker;
            if (clusterInvoker.isAvailable()) {
                return clusterInvoker.invoke(invocation);
            }
        }
        
        // As a last resort, call the first invoker
        return invokers.get(0).invoke(invocation);
    }

Usage example:

Prefer the registry marked preferred=true

<dubbo:registry address="zookeeper://127.0.0.1:2181" preferred="true"/>

Prefer the registry in the same zone

<dubbo:registry address="zookeeper://127.0.0.1:2181" zone="guangdong"/>
<dubbo:registry address="zookeeper://127.0.0.1:2181" zone="shanghai"/>
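
The order of precedence in doInvoke() (preferred, then same zone, then any available registry, then the first one) can be traced with a small model. The following is only a minimal sketch: ZoneAwareSketch and its Registry class are hypothetical, and the load-balancing step between the zone check and the availability fallback is left out.

```java
import java.util.List;

class ZoneAwareSketch {
    // Hypothetical description of one registry, as seen by the consumer.
    static final class Registry {
        final String name;
        final String zone;
        final boolean preferred;
        final boolean available;

        Registry(String name, String zone, boolean preferred, boolean available) {
            this.name = name;
            this.zone = zone;
            this.preferred = preferred;
            this.available = available;
        }
    }

    // Picks a registry: preferred first, then same zone, then any available one, then the first.
    static Registry choose(List<Registry> registries, String zone, boolean zoneForce) {
        for (Registry registry : registries) {
            if (registry.available && registry.preferred) {
                return registry;
            }
        }
        if (zone != null && !zone.isEmpty()) {
            for (Registry registry : registries) {
                if (registry.available && zone.equals(registry.zone)) {
                    return registry;
                }
            }
            // With zoneForce set, a missing zone is an error rather than a fallback.
            if (zoneForce) {
                throw new IllegalStateException("no available registry in zone " + zone);
            }
        }
        for (Registry registry : registries) {
            if (registry.available) {
                return registry;
            }
        }
        return registries.get(0);
    }
}
```

Note that a preferred registry wins even when another registry matches the caller's zone, so mixing the two settings deserves some care.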

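13. Use cases for each cluster fault-tolerance strategy

| Strategy | Behaviour | Typical use |
| --- | --- | --- |
| Failover | Dubbo's default strategy; after a failed call, retries on other providers | Query requests |
| Failfast | Throws as soon as a call fails | Create or update requests |
| Failsafe | Does not throw on failure; logs the error and returns an empty result | Writing audit logs, with manual retry afterwards |
| Failback | Logs the failure, returns an empty result, and retries in the background on a timer | Work off the critical path, such as sending an SMS after registration |
| Forking | Calls several providers at once and returns on the first success; throws when all of them fail | Requests that need fast responses |
| Broadcast | Calls every provider and returns only if all succeed; otherwise throws | Requests every provider must handle, such as operations on local caches |
| Available | Calls only an available provider | Avoiding useless calls; scenarios that need no load balancing |
| Mergeable | Calls several providers and merges the results | Menu service |
| Zone-aware | Chooses a registry by zone | Multiple registries |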

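14. Setting the retry count per method

Problem: Dubbo's default cluster strategy is failover, with 2 retries by default. A service interface usually has query, create and update methods. Queries should be retried, but create and update methods generally should not, so the retry count needs to be set per method. The example here uses an annotation.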

@DubboService(parameters = {"insertMethod.retries", "0"})

References:

https://dubbo.apache.org/zh/docsv2.7/dev/source/cluster/

Thanks for reading. That's all for this post; to be continued…