Spring Cloud Ribbon Retry Mechanism in Detail

Latest recommended article published on 2024-08-06 00:54:28

sunxy24


Category: SpringCloud. Tags: springcloud, ribbon, ribbon retry, spring-retry

Article link: https://blog.csdn.net/sunxy24/article/details/87839856


1. Version information

Spring Boot: 2.0.5.RELEASE
Spring Cloud: Finchley.RELEASE

2. Configuration

Add the spring-retry dependency:

        <!-- retry dependency -->
        <dependency>
            <groupId>org.springframework.retry</groupId>
            <artifactId>spring-retry</artifactId>
        </dependency>

Add the @LoadBalanced annotation to the RestTemplate bean. (This annotation mainly provides load balancing; it is not what makes retry take effect, so this article does not cover it.)

@Configuration
public class BeanConfigs {

    @Bean
    @LoadBalanced  // required so that services can be called by the service name registered in the registry
    RestTemplate restTemplate() {
        return new RestTemplate();
    }
}

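Ribbon retry configuration:

ribbon:
  OkToRetryOnAllOperations: true # retry all requests, not only GET
  MaxAutoRetries: 0     # number of retries on the same instance
  MaxAutoRetriesNextServer: 2   # number of times to switch to another instance
  ConnectTimeout: 250
  ReadTimeout: 300
  retryableStatusCodes: 503,500 # optional; without it, a retry happens only when the call to the instance throws an error; with it, a response carrying one of these status codes is retried too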

Why the spring-retry package is needed, and why the Ribbon configuration is written this way, is explained in detail below.

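3. Tracing the source code

Locating the retry-related code

Starting from the imported dependency, find Ribbon's Spring Boot auto-configuration class.

[Screenshot]

The retry-related initialization is in that auto-configuration class.

[Screenshot]

It registers a RibbonLoadBalancedRetryFactory bean in the Spring container. (This bean is very important and comes up repeatedly below.)

Note the @ConditionalOnClass(name = "org.springframework.retry.support.RetryTemplate") here.
[Screenshot]
This is why the spring-retry dependency is required to enable Ribbon's retry.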
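
The idea behind a class-name condition can be sketched in plain Java: try to load the class by name and treat a failure as "not on the classpath". This is only an illustration of the principle, not Spring Boot's actual condition code; ClasspathProbe is a hypothetical helper name.

```java
// Minimal sketch of a classpath-presence check, the idea behind
// @ConditionalOnClass(name = "..."). ClasspathProbe is a hypothetical helper.
final class ClasspathProbe {

    /** Returns true if the named class can be loaded (without initializing it). */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String retryTemplate = "org.springframework.retry.support.RetryTemplate";
        System.out.println("spring-retry on classpath: " + isPresent(retryTemplate));
    }
}
```

Run without spring-retry on the classpath, the probe reports false, which corresponds to the retry factory bean not being registered.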

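Ribbon's retry is one feature of its load-balancing support (it stays inactive unless spring-retry is on the classpath), so let's also look at the load-balancing configuration.

The RestTemplate load-balancing integration is not provided by the spring-cloud-starter-netflix-ribbon dependency itself but by spring-cloud-commons, so the load-balancer auto-configuration class, LoadBalancerAutoConfiguration, is found in that commons jar.

[Screenshot]
Look at these two inner classes of LoadBalancerAutoConfiguration:

Inner class 1:

[Screenshot]

Inner class 2:

[Screenshot]

These two inner classes are where Ribbon's load balancing is wired in. In each of them, the `ribbonInterceptor` method provides a load-balancing interceptor, and the `restTemplateCustomizer` method adds that interceptor to the RestTemplate. As a result, whenever RestTemplate makes an HTTP call, the interceptor intercepts the request first and applies load balancing.

Inner class 2 is clearly the load-balancing configuration with retry support. When the project includes the `spring-retry` dependency, inner class 1 is not instantiated; only inner class 2 is.

So the focus shifts to the `RetryLoadBalancerInterceptor` interceptor, which adds both load balancing and retry to HTTP requests. Here are the arguments used to construct `RetryLoadBalancerInterceptor`:

loadBalancerClient: the core load-balancing implementation; its main job is to choose a service instance.
properties: the spring.cloud.loadbalancer.retry settings from application.yml or application.properties; there is a single key, enabled, which defaults to true.
requestFactory: a factory that turns an ordinary HTTP request into a load-balanced request.
loadBalancedRetryFactory: the RibbonLoadBalancedRetryFactory bean registered in the IoC container earlier; it creates the retry policies, retry listeners, and so on.

Summary: we have located the two key classes, RibbonLoadBalancedRetryFactory and RetryLoadBalancerInterceptor.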

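Debugging the source

Next, step through the retry code in debug mode. The setup is one Eureka registry, two Eureka client instances on ports 8091 and 8092, and one Ribbon consumer.

The eureka-client provider code is as follows: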

    @Value("${server.port}")
    private String port;

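    /**
     * Endpoint for testing request-body parameter passing with Feign/Ribbon.
     * @param user the user sent in the request body
     * @return a new User echoing the name and age, plus this instance's port
     */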
    @PostMapping("/eureka_hello2")
    public User test2(@RequestBody User user) {

        return new User(user.getName(), user.getAge(), port);
    }

The request code in the ribbon-consumer service is as follows:

    @PostMapping
    public User hello2(@RequestBody User user) {
        return restTemplate.postForObject("http://eureka-client/eureka-client/eureka_hello2", user, User.class);
    }

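[Screenshot]
First, walk through a normal load-balanced call (key code only):

Send the POST request and step into RestTemplate's request-sending method:

[Screenshot]

Step into the intercept method of RetryLoadBalancerInterceptor:

First a retry policy is created:

[Screenshot]

Note: before a request is sent, Ribbon does not yet know whether it will need a retry, so it instantiates the policy up front. Once the request reaches a service instance, the policy's canRetry method decides whether to retry.

The createRetryPolicy method of RibbonLoadBalancedRetryFactory creates a new retry policy.

Look at the four constructor arguments of RibbonLoadBalancedRetryPolicy:

service: the service name, i.e. the name registered in Eureka.
lbContext: the Ribbon configuration context. Its class extends LoadBalancerContext, which provides the initWithNiwsConfig method; that method initializes the Ribbon settings from the fourth argument described below, the IClientConfig.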

/**
     * Set necessary parameters from client configuration and register with Servo monitors.
     */
    @Override
    public void initWithNiwsConfig(IClientConfig clientConfig) {
        if (clientConfig == null) {
            return;    
        }
        clientName = clientConfig.getClientName();
        if (clientName == null) {
            clientName = "default";
        }
        vipAddresses = clientConfig.resolveDeploymentContextbasedVipAddresses();
        maxAutoRetries = clientConfig.getPropertyAsInteger(CommonClientConfigKey.MaxAutoRetries, DefaultClientConfigImpl.DEFAULT_MAX_AUTO_RETRIES);
        maxAutoRetriesNextServer = clientConfig.getPropertyAsInteger(CommonClientConfigKey.MaxAutoRetriesNextServer,maxAutoRetriesNextServer);

        okToRetryOnAllOperations = clientConfig.getPropertyAsBoolean(CommonClientConfigKey.OkToRetryOnAllOperations, okToRetryOnAllOperations);
        defaultRetryHandler = new DefaultLoadBalancerRetryHandler(clientConfig);
        
        tracer = getExecuteTracer();

        Monitors.registerObject("Client_" + clientName, this);
    }

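serviceInstanceChooser: the service instance chooser. This is the LoadBalancerClient mentioned earlier, which selects a service instance by service name.
clientFactory.getClientConfig(service): returns an IClientConfig, the Ribbon configuration looked up by service name. This is the Ribbon retry configuration described in section 2:

ribbon:
  OkToRetryOnAllOperations: true # retry all requests, not only GET
  MaxAutoRetries: 0     # number of retries on the same instance
  MaxAutoRetriesNextServer: 1   # number of times to switch to another instance
  ConnectTimeout: 100
  ReadTimeout: 300

Both the second and the fourth argument give access to the Ribbon configuration. I have not yet figured out why the source passes both.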
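
To make the role of these settings concrete, here is a small stand-alone model of reading the three retry keys with fallbacks, in the spirit of initWithNiwsConfig. It is a sketch, not Ribbon code: RetrySettings and the flat Map are assumptions for illustration, and the fallback values (0, 1, false) are the Ribbon defaults as I understand them.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only: reads the retry-related keys from a flat map,
// falling back to assumed defaults (0, 1, false) when a key is absent.
final class RetrySettings {
    final int maxAutoRetries;
    final int maxAutoRetriesNextServer;
    final boolean okToRetryOnAllOperations;

    RetrySettings(Map<String, String> props) {
        maxAutoRetries = Integer.parseInt(props.getOrDefault("MaxAutoRetries", "0"));
        maxAutoRetriesNextServer = Integer.parseInt(props.getOrDefault("MaxAutoRetriesNextServer", "1"));
        okToRetryOnAllOperations = Boolean.parseBoolean(props.getOrDefault("OkToRetryOnAllOperations", "false"));
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("OkToRetryOnAllOperations", "true");
        props.put("MaxAutoRetries", "0");
        props.put("MaxAutoRetriesNextServer", "1");
        RetrySettings s = new RetrySettings(props);
        System.out.println(s.maxAutoRetries + " " + s.maxAutoRetriesNextServer + " " + s.okToRetryOnAllOperations);
    }
}
```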

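Step into the RibbonLoadBalancedRetryPolicy constructor, which builds the retry policy:

[Screenshot]

Apart from some field assignments, the point to note is that the status codes from the Ribbon configuration are read from the fourth argument, the IClientConfig, and stored in the policy's retryableStatusCodes list. The role of this list is covered below.

Back in the interceptor's intercept method, once the retry policy is built, a RetryTemplate is created and the policy is set on it.

[Screenshot]

A few words on the createRetryTemplate method: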

private RetryTemplate createRetryTemplate(String serviceName, HttpRequest request, LoadBalancedRetryPolicy retryPolicy) {
		RetryTemplate template = new RetryTemplate(); // create a new retry template
		BackOffPolicy backOffPolicy = lbRetryFactory.createBackOffPolicy(serviceName); // back-off policy (the delay between retries); the Ribbon factory returns null here
		template.setBackOffPolicy(backOffPolicy == null ? new NoBackOffPolicy() : backOffPolicy); // so NoBackOffPolicy is used and retries happen without delay
		template.setThrowLastExceptionOnExhausted(true); // once the retries are used up, rethrow the last exception
		RetryListener[] retryListeners = lbRetryFactory.createRetryListeners(serviceName);
		if (retryListeners != null && retryListeners.length != 0) {
			template.setListeners(retryListeners);
		} // register retry listeners on the template so they are notified of retry events
		template.setRetryPolicy( // install the retry policy built earlier
				!lbProperties.isEnabled() || retryPolicy == null ? new NeverRetryPolicy()
						: new InterceptorRetryPolicy(request, retryPolicy, loadBalancer,
						serviceName));
		return template;
	}

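Back in intercept, we reach RetryTemplate's execute method, where the retry decision and the retry handling happen. This code is quite convoluted, so here is a brief outline:

[Screenshot]
What the interceptor ultimately returns is the result of retryTemplate.execute, and the screenshot above shows that execute is passed two lambda expressions. Step into RetryTemplate.execute first:

It takes two callbacks and delegates to the core method doExecute, so follow it into doExecute.

doExecute is the heart of the retry logic. The method is long; the comments below reflect my own reading of it.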

protected <T, E extends Throwable> T doExecute(RetryCallback<T, E> retryCallback,
			RecoveryCallback<T> recoveryCallback, RetryState state)
			throws E, ExhaustedRetryException {

		RetryPolicy retryPolicy = this.retryPolicy; // the InterceptorRetryPolicy(request, retryPolicy, loadBalancer, serviceName) set earlier
		BackOffPolicy backOffPolicy = this.backOffPolicy; // back-off policy; a NoBackOffPolicy here, so there is no delay between attempts

		// Allow the retry policy to initialise itself...
		RetryContext context = open(retryPolicy, state); // create a retry context from the policy and state; it tracks the retry count, whether retries are exhausted, the last exception, and so on
		if (this.logger.isTraceEnabled()) {
			this.logger.trace("RetryContext retrieved: " + context);
		}

		// Make sure the context is available globally for clients who need
		// it...
		// Registers the retry context globally; not the focus here
		RetrySynchronizationManager.register(context);
		
		Throwable lastException = null; // the most recent exception

		boolean exhausted = false; // whether the retries have been used up
		try {

			// Give clients a chance to enhance the context...
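			boolean running = doOpenInterceptors(retryCallback, context); // calls open() on each RetryListener; the result is true unless a listener vetoes, and it is true in this walkthrough

			if (!running) { // running is true here, so these three lines are skipped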
				throw new TerminatedRetryException(
						"Retry terminated abnormally by interceptor before first attempt");
			}

			// Get or Start the backoff context...
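			BackOffContext backOffContext = null; // no back-off context yet
			Object resource = context.getAttribute("backOffContext"); // null on the first pass
			// With NoBackOffPolicy there is no back-off state to keep, so the next few lines have no effect here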
			if (resource instanceof BackOffContext) {
				backOffContext = (BackOffContext) resource;
			}

			if (backOffContext == null) {
				backOffContext = backOffPolicy.start(context);
				if (backOffContext != null) {
					context.setAttribute("backOffContext", backOffContext);
				}
			}

			/*
			 * We allow the whole loop to be skipped if the policy or context already
			 * forbid the first try. This is used in the case of external retry to allow a
			 * recovery in handleRetryExhausted without the callback processing (which
			 * would throw an exception).
			 */
            
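            // canRetry here ends up in InterceptorRetryPolicy.canRetry, which decides based on the retry context created above; when the context's retry count is 0 it returns true, so the first attempt always goes ahead
			while (canRetry(retryPolicy, context) && !context.isExhaustedOnly()) {
			// Even so, a request that returns normally, with no exception or timeout, is not retried; only the catch block below leads to a retry.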
				try {
					if (this.logger.isDebugEnabled()) {
						this.logger.debug("Retry: count=" + context.getRetryCount());
					}
					// Reset the last exception, so if we are successful
					// the close interceptors will not think we failed...
					lastException = null;
                    // ========1========
					return retryCallback.doWithRetry(context);
				}
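                // An exception thrown by doWithRetry above is caught here (see the discussion of marker 1 below)
                // This is the retry path; the discussion of marker 2 below covers it in detail
				catch (Throwable e) {
					// record the most recent exception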
					lastException = e;

					try {
                        // ========2========
						registerThrowable(retryPolicy, state, context, e);
					}
					catch (Exception ex) {
						throw new TerminatedRetryException("Could not register throwable",
								ex);
					}
					finally {
						doOnErrorInterceptors(retryCallback, context, e);
					}

					if (canRetry(retryPolicy, context) && !context.isExhaustedOnly()) {
						try {
							backOffPolicy.backOff(backOffContext);
						}
						catch (BackOffInterruptedException ex) {
							lastException = e;
							// back off was prevented by another thread - fail the retry
							if (this.logger.isDebugEnabled()) {
								this.logger
										.debug("Abort retry because interrupted: count="
												+ context.getRetryCount());
							}
							throw ex;
						}
					}

					if (this.logger.isDebugEnabled()) {
						this.logger.debug(
								"Checking for rethrow: count=" + context.getRetryCount());
					}

					if (shouldRethrow(retryPolicy, context, state)) {
						if (this.logger.isDebugEnabled()) {
							this.logger.debug("Rethrow in retry for policy: count="
									+ context.getRetryCount());
						}
						throw RetryTemplate.<E>wrapIfNecessary(e);
					}

				}

				/*
				 * A stateful attempt that can retry may rethrow the exception before now,
				 * but if we get this far in a stateful retry there's a reason for it,
				 * like a circuit breaker or a rollback classifier.
				 */
				if (state != null && context.hasAttribute(GLOBAL_STATE)) {
					break;
				}
			}

			if (state == null && this.logger.isDebugEnabled()) {
				this.logger.debug(
						"Retry failed last attempt: count=" + context.getRetryCount());
			}

			exhausted = true;
			return handleRetryExhausted(recoveryCallback, context, state);

		}
		catch (Throwable e) {
			throw RetryTemplate.<E>wrapIfNecessary(e);
		}
		finally {
			close(retryPolicy, context, state, lastException == null || exhausted);
			doCloseInterceptors(retryCallback, context, lastException);
			RetrySynchronizationManager.clear();
		}

	}

When the first instance is reachable and responds normally, the code above returns as soon as doWithRetry completes; the rest can be ignored for now.
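
Stripped of back-off, listeners and state handling, the shape of that loop can be modelled in a few lines. The sketch below is a simplified stand-in, not spring-retry code: MiniRetry, its Supplier-based callback and the fixed maxAttempts limit are assumptions made for illustration.

```java
import java.util.function.Supplier;

// Stripped-down model of the retry loop: no back-off, no listeners, no state.
// MiniRetry and maxAttempts are hypothetical names, not spring-retry API.
final class MiniRetry {

    /** Calls the action until it succeeds or maxAttempts is used up, then rethrows the last failure. */
    static <T> T execute(Supplier<T> action, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastException = null;
        int attempts = 0;
        while (attempts < maxAttempts) {   // plays the role of canRetry(...)
            try {
                return action.get();       // plays the role of retryCallback.doWithRetry(context)
            } catch (RuntimeException e) {
                lastException = e;         // remember the most recent failure
                attempts++;                // plays the role of registerThrowable(...)
            }
        }
        throw lastException;               // comparable to setThrowLastExceptionOnExhausted(true)
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = execute(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("instance down");
            }
            return "ok";
        }, 3);
        System.out.println(result + " after " + calls[0] + " calls"); // prints "ok after 3 calls"
    }
}
```

A successful first call returns straight from inside the try block, which matches the early return described above.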

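Now look at markers 1 and 2 above.

Marker 1:

return retryCallback.doWithRetry(context);

This simply invokes the callback. Recall that the intercept method of RetryLoadBalancerInterceptor passed functions as arguments when it called retryTemplate.execute.

[Screenshot]
Two lambda callbacks are passed, matching the two parameters of execute. Start with the first, the RetryCallback implementation: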

// The callback's parameter; it was created by open(retryPolicy, state) in doExecute, as described above.
context -> {
    // Obtain the service instance
			ServiceInstance serviceInstance = null;
    // RetryTemplate.canRetry delegates to InterceptorRetryPolicy.canRetry, which has already chosen a serviceInstance and set it on lbContext (null if choosing failed), so it is read directly here
			if (context instanceof LoadBalancedRetryContext) {
				LoadBalancedRetryContext lbContext = (LoadBalancedRetryContext) context;
				serviceInstance = lbContext.getServiceInstance();
			}
   		   // If the previous step found nothing, choose again
			if (serviceInstance == null) {
				serviceInstance = loadBalancer.choose(serviceName);
			}
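  // Key point 1: if this call throws, the exception is caught (see the comments on RetryTemplate.doExecute above) and the request is retried
  // Build and execute the HTTP request against the chosen service instance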
			ClientHttpResponse response = RetryLoadBalancerInterceptor.this.loadBalancer.execute(
					serviceName, serviceInstance,
					requestFactory.createRequest(request, body, execution));
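    // Key point 2
    // Read the status code of this HTTP call: the raw HTTP-level code, not a framework-wrapped one.
			int statusCode = response.getRawStatusCode();
    // If a retry policy exists and its configured status codes include this status code, close the response and throw an exception; that exception is caught just like in the previous step, and the request is retried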
			if (retryPolicy != null && retryPolicy.retryableStatusCode(statusCode)) {
				byte[] bodyCopy = StreamUtils.copyToByteArray(response.getBody());
				response.close();
				throw new ClientHttpResponseStatusCodeException(serviceName, response, bodyCopy);
			}
			return response;
		}

The retryableStatusCode method used above is:

public boolean retryableStatusCode(int statusCode) {
		return retryableStatusCodes.contains(statusCode);
	}

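This simply checks whether the given status code is in the list that the retry policy built from the configuration (via the fourth constructor argument above). The list need not contain only error codes; any value can be configured, so even a successful 200 response can be made to trigger a retry.

If the code is in the list, the request is retried; otherwise the response is returned as is.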
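
The check can be reproduced in isolation. The following is a sketch under my own assumptions about how a comma-separated value such as "503,500" might be parsed; RetryableStatusCodes is a hypothetical class, not the policy's real implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone version of the status-code check: parse a
// comma-separated setting once, then test response codes against it.
final class RetryableStatusCodes {
    private final List<Integer> codes = new ArrayList<>();

    RetryableStatusCodes(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return; // nothing configured: no status code triggers a retry
        }
        for (String part : configured.split(",")) {
            try {
                codes.add(Integer.valueOf(part.trim()));
            } catch (NumberFormatException ignored) {
                // skip entries that are not valid integers
            }
        }
    }

    boolean isRetryable(int statusCode) {
        return codes.contains(statusCode);
    }

    public static void main(String[] args) {
        RetryableStatusCodes codes = new RetryableStatusCodes("503,500");
        System.out.println(codes.isRetryable(500)); // prints true
        System.out.println(codes.isRetryable(200)); // prints false
    }
}
```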

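Summary: from the source analysis above, there are two situations in which a retry happens:

1) The HTTP request to the chosen service instance throws an error.

2) The status code of the response is included in the retryableStatusCodes configured for Ribbon.

Either one leads to a retry.

Since this run simulates healthy instances, the HTTP status is 200 with no error, and the status codes I configured for retry are 500 and 503 (configuring them is not really necessary), no retry happens and the call returns normally.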

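[Screenshot]

On repeated requests, the responses alternate between ports 8091 and 8092.

Next, simulate one instance of the client service suddenly going down and Ribbon retrying.

[Screenshot]

Set a breakpoint in intercept at the step that sends the HTTP request. The red box in the screenshot above shows that this request is about to go to the instance on port 8091, so kill that instance.

[Screenshot]

The HTTP request to that instance fails. The exception is first recorded in lastException, and then the retry logic begins. This is marker 2 from earlier.

Marker 2: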

registerThrowable(retryPolicy, state, context, e);

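Following that call, we end up in the registerThrowable method of RibbonLoadBalancedRetryPolicy:

[Screenshot]
This is where Ribbon implements its retry logic:

It first calls updateServerInstanceStats, which updates the status of the service instance held in the LoadBalancedRetryContext. For example, since I just killed the instance on port 8091, that instance is marked unavailable here.

Then, based on the Ribbon settings (maximum retries, whether all methods are retried, and so on), it decides whether to retry the current instance and, if not, whether to retry on the next one. When it moves on, it calls loadBalanceChooser.choose(serviceId) to select a new service instance and sets it on the LoadBalancedRetryContext.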
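
The interplay of the two limits is easier to see in a small simulation. The model below is my own simplification of the counting described above, not the policy's source: it assumes the request is eligible for retry and that every attempt fails, and RetryCounterModel with all its members is a hypothetical name.

```java
// Simplified model of the same-instance / next-instance counters.
// Assumes the request may be retried and that every attempt fails.
final class RetryCounterModel {
    private final int maxAutoRetries;
    private final int maxAutoRetriesNextServer;
    private int sameServerCount = 0;
    private int nextServerCount = 0;
    private boolean exhausted = false;

    RetryCounterModel(int maxAutoRetries, int maxAutoRetriesNextServer) {
        this.maxAutoRetries = maxAutoRetries;
        this.maxAutoRetriesNextServer = maxAutoRetriesNextServer;
    }

    /** Called after each failed attempt. */
    void registerFailure() {
        if (sameServerCount < maxAutoRetries) {
            sameServerCount++;       // retry the same instance
            return;
        }
        sameServerCount = 0;         // give up on this instance
        nextServerCount++;
        if (nextServerCount > maxAutoRetriesNextServer) {
            exhausted = true;        // no instance switches left
        }
        // otherwise the real policy would pick a new instance via choose(serviceId)
    }

    /** Number of requests sent before the retries are exhausted. */
    static int attemptsWhenAllFail(int maxAutoRetries, int maxAutoRetriesNextServer) {
        RetryCounterModel model = new RetryCounterModel(maxAutoRetries, maxAutoRetriesNextServer);
        int attempts = 0;
        while (!model.exhausted) {
            attempts++;
            model.registerFailure();
        }
        return attempts;
    }

    public static void main(String[] args) {
        // MaxAutoRetries: 0, MaxAutoRetriesNextServer: 2, as in the section 2 configuration
        System.out.println(attemptsWhenAllFail(0, 2)); // prints 3
    }
}
```

In this model the total comes out as (MaxAutoRetries + 1) * (MaxAutoRetriesNextServer + 1), so the configuration from section 2 allows at most three requests.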

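A brief note on loadBalanceChooser.choose(serviceId): given the serviceId, it picks a service instance from Ribbon's cached instance list according to the load-balancing rule. By default the selection is round-robin.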
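
Round-robin selection itself fits in a few lines of plain Java. This is only an illustration: RoundRobinChooser and the host:port strings are made up, and Ribbon's actual rule implementations are more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin chooser over a fixed instance list.
final class RoundRobinChooser {
    private final List<String> instances;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinChooser(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("no instances");
        }
        this.instances = new ArrayList<>(instances); // defensive copy
    }

    /** Returns the instances in order, wrapping around at the end of the list. */
    String choose() {
        int index = Math.floorMod(next.getAndIncrement(), instances.size());
        return instances.get(index);
    }

    public static void main(String[] args) {
        RoundRobinChooser chooser = new RoundRobinChooser(Arrays.asList("localhost:8091", "localhost:8092"));
        for (int i = 0; i < 4; i++) {
            System.out.println(chooser.choose()); // alternates between the two ports
        }
    }
}
```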

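Now for canRetry, the method the policy uses to decide, based on the retry context, whether the request may be retried:

[Screenshot]

As shown, GET requests are retried by default (provided the spring-retry dependency is present); other methods are retried only if ribbon.OkToRetryOnAllOperations = true is configured.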
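
Expressed as a stand-alone function, the rule looks like this. The sketch uses plain strings for the HTTP method, and RetryEligibility is a hypothetical name; it mirrors the rule as described here rather than quoting the policy's code.

```java
// Hypothetical stand-alone form of the eligibility rule described above.
final class RetryEligibility {

    /** GET is always eligible; other methods only when retrying all operations is enabled. */
    static boolean canRetry(String httpMethod, boolean okToRetryOnAllOperations) {
        return "GET".equalsIgnoreCase(httpMethod) || okToRetryOnAllOperations;
    }

    public static void main(String[] args) {
        System.out.println(canRetry("GET", false));  // prints true
        System.out.println(canRetry("POST", false)); // prints false
        System.out.println(canRetry("POST", true));  // prints true
    }
}
```

The consumer in this article calls the provider with a POST, which is why the test configuration sets OkToRetryOnAllOperations to true.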

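After registerThrowable finishes, control returns to the while loop in RetryTemplate.doExecute. By now the service instance in the retry context has been replaced, so when the callback retryCallback.doWithRetry(context) runs again, the chosen instance is the healthy one on port 8092, as the screenshot below shows:

[Screenshot]

The request finally goes to the 8092 instance and returns correctly:

[Screenshot]

Summary: RetryTemplate.doExecute invokes the callback that RetryLoadBalancerInterceptor.intercept passed to execute. The callback sends the HTTP request to the serviceInstance held in the lbContext. If the call throws, or the response status code is one of the configured codes, the retry logic kicks in: Ribbon selects a new service instance and repeats the steps above.

Conclusion: Ribbon's retry depends on the spring-retry package; only when it is present does Spring Boot's auto-configuration enable the retry-related Ribbon configuration. A retry is triggered, subject to the user's settings, when the request to the current instance fails, or when it succeeds but returns a status code that the user has configured as retryable.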