Retry and Load-Balancing Strategies of the Spring Cloud Feign Component
Introduction
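When building microservices with Spring Cloud, Feign is commonly used for remote calls between services. Dubbo, another remote-invocation framework for microservices, provides load-balancing strategies (round-robin, least connections, random, weighted round-robin) and failure-handling strategies (fail-fast, failover with retries). So what load-balancing strategy does Feign use? Does it retry after a failure, and if so, with what policy? With these questions in mind, I went through some documentation and the source code.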
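How Spring Cloud integrates Feign:
- The auto-configuration in `FeignAutoConfiguration` and the client registration triggered by `@EnableFeignClients` produce the Feign proxy classes;
- Through Spring's `FactoryBean` mechanism (a factory pattern), the implementation `FeignClientFactoryBean.getObject` injects the `FeignClient` bean into the Spring container;
- The proxy uses Hystrix for resource isolation (when Hystrix support is enabled), builds the `RequestTemplate` for the HTTP request sent to the server chosen by the load balancer, and performs encoding, decoding, and related steps.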
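The proxy mechanism in the first and third steps can be illustrated with a plain JDK dynamic proxy. The sketch below is a simplification written under stated assumptions, not Spring Cloud's code: `UserClient`, `MethodHandler` and `ProxySketch` are hypothetical names, and the handler only echoes a request line instead of sending HTTP. It shows the general pattern Feign's `ReflectiveFeign` follows, where each interface method is mapped to its own handler (such as `SynchronousMethodHandler`).

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Hypothetical client interface, standing in for a @FeignClient interface.
interface UserClient {
    String getUser(long id);
}

// Hypothetical per-method handler, playing the role of SynchronousMethodHandler.
interface MethodHandler {
    Object invoke(Object[] args) throws Throwable;
}

class ProxySketch {
    // Creates a JDK dynamic proxy that routes each call to the handler registered for that method.
    static <T> T newInstance(Class<T> type, Map<Method, MethodHandler> dispatch) {
        InvocationHandler h = (proxy, method, args) -> {
            MethodHandler handler = dispatch.get(method);
            if (handler == null) {
                throw new UnsupportedOperationException(method.getName());
            }
            return handler.invoke(args);
        };
        return type.cast(Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, h));
    }

    // Wires one handler for UserClient.getUser. A real handler would build a RequestTemplate,
    // send it over HTTP and decode the response; this one only echoes the request line.
    static UserClient demoClient() {
        Map<Method, MethodHandler> dispatch = new HashMap<>();
        try {
            Method getUser = UserClient.class.getMethod("getUser", long.class);
            dispatch.put(getUser, args -> "GET /users/" + args[0]);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
        return newInstance(UserClient.class, dispatch);
    }
}
```

Calling `ProxySketch.demoClient().getUser(42)` goes through the proxy to the registered handler and returns `"GET /users/42"`.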
With the overall flow in mind, let's look at some details.
Feign retry strategy
The handling logic of `SynchronousMethodHandler.invoke`:
```java
@Override
public Object invoke(Object[] argv) throws Throwable {
    // Build the request template from the method arguments
    RequestTemplate template = buildTemplateFromArgs.create(argv);
    Options options = findOptions(argv);
    // Each invocation works on its own copy, because the retryer keeps attempt state
    Retryer retryer = this.retryer.clone();
    while (true) {
        try {
            return executeAndDecode(template, options);
        } catch (RetryableException e) {
            try {
                // Sleeps and returns if another attempt is allowed, otherwise rethrows
                retryer.continueOrPropagate(e);
            } catch (RetryableException th) {
                Throwable cause = th.getCause();
                if (propagationPolicy == UNWRAP && cause != null) {
                    throw cause;
                } else {
                    throw th;
                }
            }
            if (logLevel != Logger.Level.NONE) {
                logger.logRetry(metadata.configKey(), logLevel);
            }
            continue;
        }
    }
}
```
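- The logic above is simple: build the template, make the HTTP call to the other service, and decode the result.
- When a `RetryableException` is thrown (for example, `executeAndDecode` wraps any `IOException` from the client into one), is the call retried, and how many times? To answer these questions, let's look at the source of `retryer.continueOrPropagate(e)` in `Retryer.Default`: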
```java
public void continueOrPropagate(RetryableException e) {
    // attempt starts at 1: stop once maxAttempts calls have been made
    if (attempt++ >= maxAttempts) {
        throw e;
    }
    long interval;
    if (e.retryAfter() != null) {
        // Honor the server's Retry-After time, capped at maxPeriod
        interval = e.retryAfter().getTime() - currentTimeMillis();
        if (interval > maxPeriod) {
            interval = maxPeriod;
        }
        if (interval < 0) {
            return;
        }
    } else {
        // Exponential backoff: period * 1.5^(attempt - 1), capped at maxPeriod
        interval = nextMaxInterval();
    }
    try {
        Thread.sleep(interval);
    } catch (InterruptedException ignored) {
        Thread.currentThread().interrupt();
        throw e;
    }
    sleptForMillis += interval;
}
```
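- Once `maxAttempts` calls have been made (5 by default in `Retryer.Default`, counting the initial call, so at most 4 retries), the exception is rethrown and no further retry happens. Otherwise the retryer sleeps before the next attempt: the interval is `period * 1.5^(attempt - 1)`, which starts at 150 ms with the default `period` of 100 ms and is capped at `maxPeriod`, whose default is 1 second (not 1 ms). If the exception carries a `Retry-After` time, that time is used instead, also capped at `maxPeriod`.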
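To see the counting and the backoff in isolation, here is a minimal sketch that mirrors `Retryer.Default` and the retry loop of `invoke`. Assumptions: `RetryableException`, `BackoffRetryer` and `RetryLoopSketch` are hypothetical stand-ins rather than Feign classes, `Retry-After` handling is omitted, and the sleep is injected as a `LongConsumer` so the delays can be recorded instead of actually waiting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;
import java.util.function.Supplier;

// Simplified stand-in for feign.RetryableException (no Retry-After support).
class RetryableException extends RuntimeException {
    RetryableException(String message) {
        super(message);
    }
}

// Reproduces the attempt counting and backoff of Retryer.Default.
class BackoffRetryer {
    private final long period;          // Retryer.Default: 100 ms
    private final long maxPeriod;       // Retryer.Default: 1000 ms
    private final int maxAttempts;      // Retryer.Default: 5
    private final LongConsumer sleeper; // injected so the sketch can record delays instead of sleeping
    private int attempt = 1;

    BackoffRetryer(long period, long maxPeriod, int maxAttempts, LongConsumer sleeper) {
        this.period = period;
        this.maxPeriod = maxPeriod;
        this.maxAttempts = maxAttempts;
        this.sleeper = sleeper;
    }

    void continueOrPropagate(RetryableException e) {
        if (attempt++ >= maxAttempts) {
            throw e; // attempts exhausted: give up
        }
        sleeper.accept(nextMaxInterval());
    }

    // period * 1.5^(attempt - 1), capped at maxPeriod
    private long nextMaxInterval() {
        long interval = (long) (period * Math.pow(1.5, attempt - 1));
        return Math.min(interval, maxPeriod);
    }
}

class RetryLoopSketch {
    // Same shape as SynchronousMethodHandler.invoke: call, and on failure let the retryer decide.
    static <T> T invoke(Supplier<T> call, BackoffRetryer retryer) {
        while (true) {
            try {
                return call.get();
            } catch (RetryableException e) {
                retryer.continueOrPropagate(e); // rethrows when no attempts are left
            }
        }
    }
}
```

With the default values (100, 1000, 5) and a call that always fails, the call is made 5 times and the recorded delays are 150, 225, 337 and 506 ms, so none of them reaches the 1-second cap.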
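Feign's retries should be turned off in production, for the following reasons:
- Usually, if the first call fails, the retries fail as well. Every retry adds load to a service that is already struggling, which can degrade server performance and affect core functionality.
- For interfaces that are not idempotent, a retry can easily break business logic and cause further problems, for example when a request that timed out after the server had already processed it is executed a second time.
Note that Spring Cloud OpenFeign already registers `Retryer.NEVER_RETRY` as its default `Retryer` bean, so Feign-level retries only take effect when a different `Retryer` (such as `Retryer.Default`) is configured.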
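The idempotency problem can be made concrete with a small simulation. It assumes a hypothetical `createOrder` endpoint whose reply is lost after the order has already been stored, so the client only sees a read timeout and retries. `OrderService`, `ReadTimeoutException` and `NaiveRetryClient` are illustrative names, not part of Feign.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical unchecked timeout, standing in for a read timeout seen by the client.
class ReadTimeoutException extends RuntimeException {
    ReadTimeoutException(String message) {
        super(message);
    }
}

// Hypothetical non-idempotent endpoint: every request that reaches it creates a new order.
class OrderService {
    final List<String> orders = new ArrayList<>();
    private int repliesToLose;

    OrderService(int repliesToLose) {
        this.repliesToLose = repliesToLose;
    }

    String createOrder(String item) {
        orders.add(item); // the write happens first...
        if (repliesToLose > 0) {
            repliesToLose--;
            // ...then the reply is lost, so the caller only sees a timeout
            throw new ReadTimeoutException("Read timed out");
        }
        return "order-" + orders.size();
    }
}

class NaiveRetryClient {
    // Retries on timeout, which would only be safe if createOrder were idempotent.
    static String createWithRetry(OrderService service, String item, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            try {
                return service.createOrder(item);
            } catch (ReadTimeoutException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
            }
        }
    }
}
```

With one lost reply, `createWithRetry(new OrderService(1), "book", 5)` returns `"order-2"`, and the service now holds two orders for a single user action.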
Feign load-balancing strategy
So what is the load-balancing strategy? Analyzing `SynchronousMethodHandler.executeAndDecode` answers that:
```java
Object executeAndDecode(RequestTemplate template, Options options) throws Throwable {
    Request request = targetRequest(template);
    if (logLevel != Logger.Level.NONE) {
        logger.logRequest(metadata.configKey(), logLevel, request);
    }
    Response response;
    long start = System.nanoTime();
    try {
        // Step 1: send the HTTP request through the configured Client
        response = client.execute(request, options);
        // ensure the request is set. TODO: remove in Feign 12
        response = response.toBuilder()
                .request(request)
                .requestTemplate(template)
                .build();
    } catch (IOException e) {
        if (logLevel != Logger.Level.NONE) {
            logger.logIOException(metadata.configKey(), logLevel, e, elapsedTime(start));
        }
        // I/O failures are wrapped in a RetryableException, which invoke() may retry
        throw errorExecuting(request, e);
    }
    long elapsedTime = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    // Step 2: decode the response
    if (decoder != null)
        return decoder.decode(response, metadata.returnType());
    CompletableFuture<Object> resultFuture = new CompletableFuture<>();
    asyncResponseHandler.handleResponse(resultFuture, metadata.configKey(), response,
            metadata.returnType(),
            elapsedTime);
    try {
        if (!resultFuture.isDone())
            throw new IllegalStateException("Response handling not done");
        return resultFuture.join();
    } catch (CompletionException e) {
        Throwable cause = e.getCause();
        if (cause != null)
            throw cause;
        throw e;
    }
}
```
It does two things: it sends the HTTP request and decodes the response data.
The `client` in `response = client.execute(request, options);` has two implementations relevant here, `Default` and `LoadBalancerFeignClient`; `LoadBalancerFeignClient` gets its load-balancing configuration in `FeignClientFactoryBean.getObject`.
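Which of the two clients ends up in the call path depends on the `@FeignClient` attributes. The sketch below is a simplified reading of `FeignClientFactoryBean` in the Spring Cloud OpenFeign versions that use Ribbon: when `url` is empty, `http://` plus the service name becomes the target and the load-balancing client is used; when `url` is set, the load balancer is bypassed and the plain delegate client is used. `FeignTarget` and `TargetDecisionSketch` are hypothetical names, and the real code also appends the `path` attribute and handles further cases.

```java
// Hypothetical result type: the base URL Feign will call and whether the load balancer is in the path.
final class FeignTarget {
    final String url;
    final boolean loadBalanced;

    FeignTarget(String url, boolean loadBalanced) {
        this.url = url;
        this.loadBalanced = loadBalanced;
    }
}

class TargetDecisionSketch {
    // name: the @FeignClient service name; url: the optional explicit url attribute (may be null).
    static FeignTarget resolve(String name, String url) {
        if (url == null || url.trim().isEmpty()) {
            // No explicit URL: the service name becomes the host, and the load-balancing client is used
            return new FeignTarget("http://" + name, true);
        }
        // Explicit URL: the load balancer is bypassed and the plain delegate client is used
        String full = url.startsWith("http") ? url : "http://" + url;
        return new FeignTarget(full, false);
    }
}
```

For example, `resolve("user-service", null)` targets `http://user-service` through the load balancer, while `resolve("user-service", "localhost:8080")` calls `http://localhost:8080` directly.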
Let's focus on `LoadBalancerFeignClient.execute(request, options)`:
```java
@Override
public Response execute(Request request, Request.Options options) throws IOException {
    try {
        URI asUri = URI.create(request.url());
        // The host part of the URL is the service name (the Ribbon client name)
        String clientName = asUri.getHost();
        URI uriWithoutHost = cleanUrl(request.url(), clientName);
        FeignLoadBalancer.RibbonRequest ribbonRequest = new FeignLoadBalancer.RibbonRequest(
                this.delegate, request, uriWithoutHost);
        IClientConfig requestConfig = getClientConfig(options, clientName);
        // Ribbon picks a server and executes the request through the delegate client
        return lbClient(clientName).executeWithLoadBalancer(ribbonRequest,
                requestConfig).toResponse();
    }
    catch (ClientException e) {
        IOException io = findIOException(e);
        if (io != null) {
            throw io;
        }
        throw new RuntimeException(e);
    }
}
```
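The URL handling in `execute` relies on a convention: the host part of the Feign URL is the service name (the Ribbon client name), and once Ribbon has chosen a server, the host is replaced by that server's address (Ribbon does this in `reconstructURIWithServer`). The following sketch reproduces those two steps with `java.net.URI`; `LbUrlSketch` and its methods are hypothetical helpers, not Spring Cloud code.

```java
import java.net.URI;
import java.net.URISyntaxException;

class LbUrlSketch {
    // The client name is the host of the Feign URL: http://user-service/users/42 -> "user-service".
    // java.net.URI returns null here for names containing '_', which is not legal in a hostname.
    static String clientName(String url) {
        return URI.create(url).getHost();
    }

    // Puts the chosen server (host:port) in place of the service name, keeping path and query.
    static String reconstruct(String url, String host, int port) {
        URI original = URI.create(url);
        try {
            return new URI(original.getScheme(), original.getUserInfo(), host, port,
                    original.getPath(), original.getQuery(), original.getFragment()).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

For example, `reconstruct("http://user-service/users/42?verbose=true", "10.0.0.5", 8081)` yields `http://10.0.0.5:8081/users/42?verbose=true`. Because `getHost()` returns `null` for a name such as `user_service`, service names used as hosts this way should avoid underscores.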
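The use of `FeignLoadBalancer.RibbonRequest` shows that Feign's load balancing is still implemented by Ribbon. So how does Ribbon balance the load? `executeWithLoadBalancer` hands the request to Ribbon's `LoadBalancerCommand.submit`: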
```java
public Observable<T> submit(final ServerOperation<T> operation) {
    final ExecutionInfoContext context = new ExecutionInfoContext();
    if (listenerInvoker != null) {
        try {
            listenerInvoker.onExecutionStart();
        } catch (AbortExecutionException e) {
            return Observable.error(e);
        }
    }
    final int maxRetrysSame = retryHandler.getMaxRetriesOnSameServer();
    final int maxRetrysNext = retryHandler.getMaxRetriesOnNextServer();
    // Use the load balancer: selectServer() asks the ILoadBalancer for a server
    Observable<T> o =
            (server == null ? selectServer() : Observable.just(server))
            .concatMap(new Func1<Server, Observable<T>>() {
                @Override
                // Called for each server being selected
                public Observable<T> call(Server server) {
                    context.setServer(server);
                    final ServerStats stats = loadBalancerContext.getServerStats(server);
                    // Called for each attempt and retry
                    Observable<T> o = Observable
                            .just(server)
                            .concatMap(new Func1<Server, Observable<T>>() {
                                @Override
                                public Observable<T> call(final Server server) {
                                    context.incAttemptCount();
                                    loadBalancerContext.noteOpenConnection(stats);
                                    if (listenerInvoker != null) {
                                        try {
                                            listenerInvoker.onStartWithServer(context.toExecutionInfo());
                                        } catch (AbortExecutionException e) {
                                            return Observable.error(e);
                                        }
                                    }
                                    final Stopwatch tracer = loadBalancerContext.getExecuteTracer().start();
                                    return operation.call(server).doOnEach(new Observer<T>() {
                                        private T entity;
                                        @Override
                                        public void onCompleted() {
                                            recordStats(tracer, stats, entity, null);
                                            // TODO: What to do if onNext or onError are never called?
                                        }
                                        @Override
                                        public void onError(Throwable e) {
                                            recordStats(tracer, stats, null, e);
                                            logger.debug("Got error {} when executed on server {}", e, server);
                                            if (listenerInvoker != null) {
                                                listenerInvoker.onExceptionWithServer(e, context.toExecutionInfo());
                                            }
                                        }
                                        @Override
                                        public void onNext(T entity) {
                                            this.entity = entity;
                                            if (listenerInvoker != null) {
                                                listenerInvoker.onExecutionSuccess(entity, context.toExecutionInfo());
                                            }
                                        }
                                        private void recordStats(Stopwatch tracer, ServerStats stats, Object entity, Throwable exception) {
                                            tracer.stop();
                                            loadBalancerContext.noteRequestCompletion(stats, entity, exception, tracer.getDuration(TimeUnit.MILLISECONDS), retryHandler);
                                        }
                                    });
                                }
                            });
                    // Retries on the same server
                    if (maxRetrysSame > 0)
                        o = o.retry(retryPolicy(maxRetrysSame, true));
                    return o;
                }
            });
    // Retries on the next server: re-subscribing runs selectServer() again
    if (maxRetrysNext > 0 && server == null)
        o = o.retry(retryPolicy(maxRetrysNext, false));
    return o.onErrorResumeNext(new Func1<Throwable, Observable<T>>() {
        @Override
        public Observable<T> call(Throwable e) {
            if (context.getAttemptCount() > 0) {
                if (maxRetrysNext > 0 && context.getServerAttemptCount() == (maxRetrysNext + 1)) {
                    e = new ClientException(ClientException.ErrorType.NUMBEROF_RETRIES_NEXTSERVER_EXCEEDED,
                            "Number of retries on next server exceeded max " + maxRetrysNext
                            + " retries, while making a call for: " + context.getServer(), e);
                }
                else if (maxRetrysSame > 0 && context.getAttemptCount() == (maxRetrysSame + 1)) {
                    e = new ClientException(ClientException.ErrorType.NUMBEROF_RETRIES_EXEEDED,
                            "Number of retries exceeded max " + maxRetrysSame
                            + " retries, while making a call for: " + context.getServer(), e);
                }
            }
            if (listenerInvoker != null) {
                listenerInvoker.onExecutionFailed(e, context.toFinalExecutionInfo());
            }
            return Observable.error(e);
        }
    });
}
```
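Besides choosing a server, `submit` also layers Ribbon's own retries: the inner `retry(retryPolicy(maxRetrysSame, true))` repeats the call on the same server, and the outer `retry(retryPolicy(maxRetrysNext, false))` re-subscribes the whole chain, so `selectServer()` runs again and may return a different server. The nested loops below mimic that structure without RxJava. `RibbonRetrySketch` is a hypothetical name, and unlike the real code it treats every failure as retriable (Ribbon asks its retry handler whether each exception qualifies). When every attempt fails, the number of calls is `(maxRetrysSame + 1) * (maxRetrysNext + 1)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

class RibbonRetrySketch {
    // Returns the servers tried, in order. Assumes every failure counts as retriable.
    static List<String> attempts(Supplier<String> selectServer, Predicate<String> call,
                                 int maxRetriesSame, int maxRetriesNext) {
        List<String> tried = new ArrayList<>();
        // Outer loop ~ o.retry(retryPolicy(maxRetrysNext, false)): re-runs server selection
        for (int next = 0; next <= maxRetriesNext; next++) {
            String server = selectServer.get();
            // Inner loop ~ o.retry(retryPolicy(maxRetrysSame, true)): stays on the same server
            for (int same = 0; same <= maxRetriesSame; same++) {
                tried.add(server);
                if (call.test(server)) {
                    return tried; // success ends both loops
                }
            }
        }
        return tried; // every attempt failed
    }
}
```

With `maxRetrysSame = 1`, `maxRetrysNext = 1`, a round-robin selector over `a, b, c` and a call that always fails, the servers tried are `a, a, b, b`. Ribbon's default client configuration (`MaxAutoRetries = 0`, `MaxAutoRetriesNextServer = 1`) allows at most two calls, one per server. Since each Feign attempt goes through `executeWithLoadBalancer`, these Ribbon retries multiply with any Feign-level retries.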
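In Ribbon's `submit` method shown above, the key expression is `server == null ? selectServer() : Observable.just(server)`. Ribbon, like Hystrix, is built on RxJava, but the load balancing itself is done by Ribbon, not by Hystrix: `selectServer()` picks the target `Server` and delegates that choice to an `ILoadBalancer`, whose main implementations are: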
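- `BaseLoadBalancer`: uses the round-robin rule `RoundRobinRule` by default;
- `DynamicServerListLoadBalancer`: extends `BaseLoadBalancer` and can change the server list at runtime;
- `NoOpLoadBalancer`: does nothing;
- `ZoneAwareLoadBalancer`: works with the instance list grouped by `Zone`.
In Spring Cloud Netflix, `RibbonClientConfiguration` registers `ZoneAwareLoadBalancer` as the default `ILoadBalancer` and `ZoneAvoidanceRule` as the default `IRule`; that rule filters servers by zone and availability and then picks among the remaining ones in round-robin order.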
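Round-robin, the default rule of `BaseLoadBalancer`, is easy to sketch. The class below is a simplified, thread-safe chooser written for illustration only (`RoundRobinChooser` is a hypothetical name); Ribbon's `RoundRobinRule` additionally skips servers that are not alive and differs in how its counter is advanced.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified, thread-safe round-robin selection in the spirit of Ribbon's RoundRobinRule.
class RoundRobinChooser {
    private final AtomicInteger counter = new AtomicInteger();

    // Returns the next server in turn, or null when the list is empty.
    String choose(List<String> servers) {
        int size = servers.size();
        if (size == 0) {
            return null;
        }
        // Atomically read the current position and advance it modulo size, so it never overflows
        int index = counter.getAndUpdate(i -> (i + 1) % size);
        // The list may have shrunk since the counter was advanced, so wrap again
        return servers.get(index % size);
    }
}
```

Keeping the counter bounded by the list size avoids integer overflow, and `getAndUpdate` lets concurrent callers each receive a distinct position without locking.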