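Hystrix is no longer under active development, and its maintainers recommend Resilience4j as the next-generation circuit breaker. Compared with Hystrix, Resilience4j has several advantages, such as fewer dependencies and a more modular design.
Resilience4j is a circuit-breaker library inspired by Hystrix. It helps build robust systems by managing fault tolerance for remote calls. It offers a friendlier API and a number of additional modules, including Rate Limiter, Bulkhead, circuit breaker, retry, cache and time limiter.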
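- Time limiter (TimeLimiterAutoConfiguration): add the resilience4j-timelimiter[6] dependency. TimeLimiter limits the time spent calling a remote service.
- Retry (RetryAutoConfiguration): add the resilience4j-retry[4] library.
  Configurable settings:
  - Maximum number of attempts.
  - Wait time before a retry.
  - A custom function that modifies the wait interval after a failure.
  - A custom predicate that decides whether an exception should trigger a retry.
- Bulkhead (BulkheadAutoConfiguration): add the resilience4j-bulkhead[3] dependency. It limits the number of concurrent calls to a particular service.
  Configurable settings: the maximum number of concurrent calls allowed and the maximum time a thread waits.
- Rate limiter (RateLimiterAutoConfiguration): add the resilience4j-ratelimiter[2] dependency. It limits how often certain services may be accessed. Configurable settings: the limit refresh period, the number of permissions available per refresh period, and the default time a caller waits for a permission.
- Circuit breaker (CircuitBreakerAutoConfiguration)
  The circuit breaker has three possible states:
  - Closed: the service is healthy and no short-circuiting is needed.
  - Open: the remote service is down and every request is short-circuited.
  - Half-open: some time after entering the open state, the breaker lets requests through to check whether the remote service has recovered.
  Configurable settings:
  - The threshold at which the breaker enters the open state.
  - The wait time, that is, how long the breaker stays open before it may move to half-open.
  - The size of the ring buffer while the breaker is half-open or closed.
  - Listeners that handle custom events.
  - A custom predicate that decides whether an exception counts as a failure and therefore raises the failure rate.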
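The three states and the wait time listed above can be illustrated with a small standard-library sketch. It is not the Resilience4j implementation: the class name `ToyBreaker` and its two settings are hypothetical, and it opens after a number of consecutive failures instead of tracking a failure rate over a ring buffer.

```java
import java.time.Duration;
import java.time.Instant;

/** Toy three-state breaker for illustration; not the Resilience4j implementation. */
class ToyBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // hypothetical: consecutive failures that open the breaker
    private final Duration waitInOpen;  // hypothetical: how long to stay open before probing
    private State state = State.CLOSED;
    private int failures = 0;
    private Instant openedAt;

    ToyBreaker(int failureThreshold, Duration waitInOpen) {
        this.failureThreshold = failureThreshold;
        this.waitInOpen = waitInOpen;
    }

    /** Returns true if a call may proceed at the given time. */
    synchronized boolean allow(Instant now) {
        if (state == State.OPEN && !now.isBefore(openedAt.plus(waitInOpen))) {
            state = State.HALF_OPEN; // wait elapsed: let trial calls through
        }
        return state != State.OPEN;
    }

    /** A success closes the breaker and resets the failure count. */
    synchronized void recordSuccess() {
        failures = 0;
        state = State.CLOSED;
    }

    /** A failure while half-open, or too many in a row while closed, opens the breaker. */
    synchronized void recordFailure(Instant now) {
        failures++;
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN;
            openedAt = now;
        }
    }

    synchronized State state() {
        return state;
    }

    public static void main(String[] args) {
        ToyBreaker breaker = new ToyBreaker(2, Duration.ofSeconds(10));
        Instant t0 = Instant.now();
        breaker.recordFailure(t0);
        breaker.recordFailure(t0);
        System.out.println(breaker.state());                   // OPEN
        System.out.println(breaker.allow(t0.plusSeconds(10))); // true
        System.out.println(breaker.state());                   // HALF_OPEN
    }
}
```

Passing the current time in as a parameter keeps the sketch deterministic; a real breaker reads it from a clock.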
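With DefaultTargeter, the read-timeout and connect-timeout configuration properties are controlled entirely by the FeignEncoderProperties class. Note that this differs from earlier versions.
Relationship between the classes CircuitBreakerStateMachine, CircuitBreakerState, CircuitBreakerMetrics, FixedSizeSlidingWindowMetrics and SlidingTimeWindowMetrics:
CircuitBreakerStateMachine delegates to the current CircuitBreakerState; the CircuitBreakerState calls CircuitBreakerMetrics internally; CircuitBreakerMetrics in turn records into FixedSizeSlidingWindowMetrics or SlidingTimeWindowMetrics.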
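1.Circuit-breaking strategy
The circuit breaker maintains six different states internally. ClosedState is the initial state of CircuitBreakerStateMachine.
Each state (CircuitBreakerState) is instantiated together with its own CircuitBreakerMetrics instance. The role of the CircuitBreakerMetrics class:
- It holds the interface that collects the runtime metrics for the current ID, implemented either by the count-based sliding window FixedSizeSlidingWindowMetrics or by the time-based sliding window SlidingTimeWindowMetrics.
- The SlidingWindowType property of CircuitBreakerConfig selects which of the two is used.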
public interface CircuitBreaker {
// circuitBreaker: an instance of CircuitBreakerStateMachine
static <T> Callable<T> decorateCallable(CircuitBreaker circuitBreaker, Callable<T> callable) {
return () -> {
circuitBreaker.acquirePermission();// timing point 1
long start = circuitBreaker.getCurrentTimestamp();
try {
T result = callable.call();
long duration = circuitBreaker.getCurrentTimestamp() - start;
circuitBreaker.onResult(duration, circuitBreaker.getTimestampUnit(), result);// timing point 2
return result;
} catch (Exception exception) {
long duration = circuitBreaker.getCurrentTimestamp() - start;
circuitBreaker.onError(duration, circuitBreaker.getTimestampUnit(), exception);// timing point 3
throw exception;
}
};
}
}
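As shown above, the circuit-breaking strategy is driven mainly by three methods of CircuitBreakerStateMachine (acquirePermission, onResult and onError), which in turn delegate to the current CircuitBreakerState.
Timing point 1: depending on the current state, the request either proceeds to the downstream service or is degraded (falls back) immediately.
- OpenState: if the current time is past the wait computed by waitIntervalFunctionInOpenState, the request proceeds to the downstream service; otherwise it is degraded immediately.
- HalfOpenState: if the remaining number of permitted calls (permittedNumberOfCallsInHalfOpenState, default 10) has dropped to 0, the request is degraded immediately; otherwise it proceeds to the downstream service.
- ForcedOpenState: every request is degraded immediately.
- DisabledState: every request always goes to the downstream service.
Timing points 2 & 3: their purpose is to change the state of the circuit breaker and thereby influence timing point 1. If timing point 1 throws, the request skips the downstream call and goes straight to the fallback logic.
Note: if timing points 2 and 3 never meet their conditions, or never move the breaker out of its initial state, timing point 1 always runs the closed-state logic, which has no effect on the request.
Timing points 1, 2 & 3: none of the three affects whether a request that is already executing against the downstream service runs the fallback logic.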
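The interplay of the three timing points with a fallback can be sketched without the library. The `Breaker` interface and the `callWithFallback` helper below are hypothetical stand-ins introduced only for this illustration; they mirror the shape of decorateCallable shown earlier but are not Resilience4j types.

```java
import java.util.concurrent.Callable;
import java.util.function.Supplier;

class DecorateSketch {
    /** Hypothetical stand-in for the three hooks that decorateCallable relies on. */
    interface Breaker {
        void acquirePermission();                      // timing point 1: throws to reject the call
        void onSuccess(long durationNanos);            // timing point 2
        void onError(long durationNanos, Exception e); // timing point 3
    }

    static <T> Callable<T> decorate(Breaker breaker, Callable<T> call) {
        return () -> {
            breaker.acquirePermission(); // a rejection here means the downstream call never starts
            long start = System.nanoTime();
            try {
                T result = call.call();
                breaker.onSuccess(System.nanoTime() - start);
                return result;
            } catch (Exception e) {
                breaker.onError(System.nanoTime() - start, e);
                throw e; // the original exception still reaches the caller
            }
        };
    }

    /** Runs the decorated call and returns the fallback value on any exception. */
    static <T> T callWithFallback(Callable<T> decorated, Supplier<T> fallback) {
        try {
            return decorated.call();
        } catch (Exception e) {
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        Breaker alwaysClosed = new Breaker() {
            public void acquirePermission() { }
            public void onSuccess(long durationNanos) { }
            public void onError(long durationNanos, Exception e) { }
        };
        Callable<String> failing = decorate(alwaysClosed, () -> {
            throw new IllegalStateException("downstream is down");
        });
        System.out.println(callWithFallback(failing, () -> "fallback value")); // prints "fallback value"
    }
}
```

In this sketch the fallback runs both when the breaker rejects the call at timing point 1 and when the downstream call itself throws; only the first case avoids touching the downstream service.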
1.1.CircuitBreakerState
public final class CircuitBreakerStateMachine implements CircuitBreaker {
// Holds the circuit-breaker state the current ID is in; the initial state is ClosedState
private final AtomicReference<CircuitBreakerState> stateReference;
private interface CircuitBreakerState {
boolean tryAcquirePermission();
void acquirePermission();
void releasePermission();
void onError(long duration, TimeUnit durationUnit, Throwable throwable);
void onSuccess(long duration, TimeUnit durationUnit);
CircuitBreaker.State getState();
CircuitBreakerMetrics getMetrics();
}
}
1.2.CircuitBreakerStateMachine
CircuitBreakerStateMachine maintains the different circuit-breaker states internally.
public final class CircuitBreakerStateMachine implements CircuitBreaker {
private final AtomicReference<CircuitBreakerState> stateReference;
// Note: this constructor belongs to the CircuitBreakerMetrics class and is shown here for reference.
private CircuitBreakerMetrics(int sws,SlidingWindowType swt,CircuitBreakerConfig config,Clock clock) {
if (swt == CircuitBreakerConfig.SlidingWindowType.COUNT_BASED) {// default value
this.metrics = new FixedSizeSlidingWindowMetrics(sws);
this.minimumNumberOfCalls = Math.min(config.getMinimumNumberOfCalls(), sws);
} else {
this.metrics = new SlidingTimeWindowMetrics(sws, clock);
this.minimumNumberOfCalls = config.getMinimumNumberOfCalls();
}
this.failureRateThreshold = config.getFailureRateThreshold();
this.slowCallRateThreshold = config.getSlowCallRateThreshold();
this.slowCallDurationThresholdInNanos = config.getSlowCallDurationThreshold().toNanos();
this.numberOfNotPermittedCalls = new LongAdder();
}
@Override
public void acquirePermission() {
stateReference.get().acquirePermission();
}
public void onError(long duration, TimeUnit durationUnit, Throwable throwable) {
if (throwable instanceof CompletionException || throwable instanceof ExecutionException) {
Throwable cause = throwable.getCause();
handleThrowable(duration, durationUnit, cause);
} else {
handleThrowable(duration, durationUnit, throwable);//error1
}
}
public void onSuccess(long duration, TimeUnit durationUnit) {
publishSuccessEvent(duration, durationUnit);
stateReference.get().onSuccess(duration, durationUnit);
}
private void handleThrowable(long duration, TimeUnit durationUnit, Throwable throwable) {
// resilience4j.circuitbreaker.configs.default.ignoreExceptionPredicate: decides whether the exception is ignored
if (circuitBreakerConfig.getIgnoreExceptionPredicate().test(throwable)) {
releasePermission();
publishCircuitIgnoredErrorEvent(name, duration, durationUnit, throwable);
// resilience4j.circuitbreaker.configs.default.recordExceptionPredicate: decides whether the exception is recorded (the default path)
} else if (circuitBreakerConfig.getRecordExceptionPredicate().test(throwable)) {
publishCircuitErrorEvent(name, duration, durationUnit, throwable);
// on the first failure the circuit breaker is still in ClosedState
stateReference.get().onError(duration, durationUnit, throwable);//error2
} else {
publishSuccessEvent(duration, durationUnit);
stateReference.get().onSuccess(duration, durationUnit);
}
}
public void transitionToOpenState() {
stateTransition(OPEN,currentState -> new OpenState(currentState.attempts() + 1, currentState.getMetrics()));
}
// Replaces the instance held by stateReference with the new state (for example OpenState)
private void stateTransition(State newState,UnaryOperator<CircuitBreakerState> newStateGenerator) {
CircuitBreakerState previousState = stateReference.getAndUpdate(currentState -> {
StateTransition.transitionBetween(getName(), currentState.getState(), newState);
currentState.preTransitionHook();
return newStateGenerator.apply(currentState);
});
...
}
public void transitionToHalfOpenState() {
stateTransition(HALF_OPEN, currentState -> new HalfOpenState(currentState.attempts()));
}
}
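stateReference#onSuccess || stateReference#onError: these call the corresponding method on whichever state the circuit breaker is currently in; their purpose is to move the breaker to a different state when the recorded result warrants it.
1.3.CircuitBreakerMetrics
Internally it uses FixedSizeSlidingWindowMetrics or SlidingTimeWindowMetrics to collect the metrics for the current ID.
Every execution of the target method, successful or not, makes CircuitBreakerMetrics record that call for the current ID.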
class CircuitBreakerMetrics implements CircuitBreaker.Metrics {
private final float failureRateThreshold;
private final float slowCallRateThreshold;
private final long slowCallDurationThresholdInNanos;
private final LongAdder numberOfNotPermittedCalls;
private int minimumNumberOfCalls;
// Either the count-based FixedSizeSlidingWindowMetrics or the time-based SlidingTimeWindowMetrics
private final Metrics metrics;
public Result onSuccess(long duration, TimeUnit durationUnit) {
Snapshot snapshot;
// resilience4j.circuitbreaker.configs.default.slowCallDurationThreshold (converted to nanoseconds)
if (durationUnit.toNanos(duration) > slowCallDurationThresholdInNanos) {
snapshot = metrics.record(duration, durationUnit, Outcome.SLOW_SUCCESS);
} else {
//FixedSizeSlidingWindowMetrics || SlidingTimeWindowMetrics
snapshot = metrics.record(duration, durationUnit, Outcome.SUCCESS);
}
return checkIfThresholdsExceeded(snapshot);
}
public Result onError(long duration, TimeUnit durationUnit) {//error4
Snapshot snapshot;
// resilience4j.circuitbreaker.configs.default.slowCallDurationThreshold: decides whether the call counts as slow (default 60 s, i.e. 60000000000 ns).
if (durationUnit.toNanos(duration) > slowCallDurationThresholdInNanos) {
snapshot = metrics.record(duration, durationUnit, Outcome.SLOW_ERROR);// slow call
} else {
//FixedSizeSlidingWindowMetrics || SlidingTimeWindowMetrics
snapshot = metrics.record(duration, durationUnit, Outcome.ERROR);// call within the threshold
}
return checkIfThresholdsExceeded(snapshot);
}
private Result checkIfThresholdsExceeded(Snapshot snapshot) {
float failureRateInPercentage = getFailureRate(snapshot);
float slowCallsInPercentage = getSlowCallRate(snapshot);
if (failureRateInPercentage == -1 || slowCallsInPercentage == -1) {
return Result.BELOW_MINIMUM_CALLS_THRESHOLD;// too few calls recorded to evaluate the rates
}
//resilience4j.circuitbreaker.configs.default.failureRateThreshold: failure rate in percent, default 50
//resilience4j.circuitbreaker.configs.default.slowCallRateThreshold: slow-call rate in percent, default 100
if (failureRateInPercentage >= failureRateThreshold && slowCallsInPercentage >= slowCallRateThreshold) {
return Result.ABOVE_THRESHOLDS;
}
if (failureRateInPercentage >= failureRateThreshold) {
return Result.FAILURE_RATE_ABOVE_THRESHOLDS;// only the failure rate reached its threshold
}
if (slowCallsInPercentage >= slowCallRateThreshold) {
return Result.SLOW_CALL_RATE_ABOVE_THRESHOLDS;// only the slow-call rate reached its threshold
}
return Result.BELOW_THRESHOLDS;
}
private float getSlowCallRate(Snapshot snapshot) {
// total number of calls currently recorded in the sliding window (a runtime value, not a configuration property)
int bufferedCalls = snapshot.getTotalNumberOfCalls();
if (bufferedCalls == 0 || bufferedCalls < minimumNumberOfCalls) {
return -1.0f;
}
return snapshot.getSlowCallRate();
}
private float getFailureRate(Snapshot snapshot) {
// total number of calls currently recorded in the sliding window for this ID (a runtime value, not a configuration property)
int bufferedCalls = snapshot.getTotalNumberOfCalls();
//resilience4j.circuitbreaker.configs.default.minimumNumberOfCalls: minimum number of calls configured for this ID
if (bufferedCalls == 0 || bufferedCalls < minimumNumberOfCalls) {
return -1.0f;
}
return snapshot.getFailureRate();
}
enum Result {
BELOW_THRESHOLDS,
FAILURE_RATE_ABOVE_THRESHOLDS,
SLOW_CALL_RATE_ABOVE_THRESHOLDS,
ABOVE_THRESHOLDS,
BELOW_MINIMUM_CALLS_THRESHOLD;
public static boolean hasExceededThresholds(Result result) {
return hasFailureRateExceededThreshold(result) || hasSlowCallRateExceededThreshold(result);
}
public static boolean hasFailureRateExceededThreshold(Result result) {
return result == ABOVE_THRESHOLDS || result == FAILURE_RATE_ABOVE_THRESHOLDS;
}
public static boolean hasSlowCallRateExceededThreshold(Result result) {
return result == ABOVE_THRESHOLDS || result == SLOW_CALL_RATE_ABOVE_THRESHOLDS;
}
}
}
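hasExceededThresholds: true when the failure rate, the slow-call rate, or both have reached their thresholds.
minimumNumberOfCalls: while fewer calls than this configured value have been recorded, the circuit breaker is never tripped, although individual calls can still fall back. During that time the breaker stays in the closed state.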
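The decision made by checkIfThresholdsExceeded can be condensed into one function over plain counters. This is a minimal sketch that reuses the Result names from the listing above; the class name `ThresholdSketch` and the `check` method with its parameter list are assumptions made for the example.

```java
class ThresholdSketch {
    enum Result {
        BELOW_MINIMUM_CALLS_THRESHOLD,
        BELOW_THRESHOLDS,
        FAILURE_RATE_ABOVE_THRESHOLDS,
        SLOW_CALL_RATE_ABOVE_THRESHOLDS,
        ABOVE_THRESHOLDS
    }

    /** Rates are percentages (0..100), like failureRateThreshold and slowCallRateThreshold. */
    static Result check(int totalCalls, int failedCalls, int slowCalls,
                        int minimumNumberOfCalls,
                        float failureRateThreshold, float slowCallRateThreshold) {
        // Too few recorded calls: no rate is evaluated, so the breaker cannot trip.
        if (totalCalls == 0 || totalCalls < minimumNumberOfCalls) {
            return Result.BELOW_MINIMUM_CALLS_THRESHOLD;
        }
        float failureRate = failedCalls * 100.0f / totalCalls;
        float slowCallRate = slowCalls * 100.0f / totalCalls;
        boolean failureExceeded = failureRate >= failureRateThreshold;
        boolean slowExceeded = slowCallRate >= slowCallRateThreshold;
        if (failureExceeded && slowExceeded) {
            return Result.ABOVE_THRESHOLDS;
        }
        if (failureExceeded) {
            return Result.FAILURE_RATE_ABOVE_THRESHOLDS;
        }
        if (slowExceeded) {
            return Result.SLOW_CALL_RATE_ABOVE_THRESHOLDS;
        }
        return Result.BELOW_THRESHOLDS;
    }

    public static void main(String[] args) {
        // 5 failures out of 5 calls, but minimumNumberOfCalls is 10
        System.out.println(check(5, 5, 0, 10, 50f, 100f));  // BELOW_MINIMUM_CALLS_THRESHOLD
        // 5 failures out of 10 calls is exactly 50%, which reaches the threshold
        System.out.println(check(10, 5, 0, 10, 50f, 100f)); // FAILURE_RATE_ABOVE_THRESHOLDS
    }
}
```

The first call in `main` shows why a service that fails every request can still leave the breaker closed: until minimumNumberOfCalls is reached, the rates are simply not evaluated.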
public class SnapshotImpl implements Snapshot {
private final long totalDurationInMillis;
private final int totalNumberOfSlowCalls;
private final int totalNumberOfSlowFailedCalls;
private final int totalNumberOfFailedCalls;
private final int totalNumberOfCalls;
...
@Override
public float getSlowCallRate() {
if (totalNumberOfCalls == 0) {
return 0;
}
return totalNumberOfSlowCalls * 100.0f / totalNumberOfCalls;
}
@Override
public float getFailureRate() {
if (totalNumberOfCalls == 0) {
return 0;
}
// failed calls as a percentage of all calls
return totalNumberOfFailedCalls * 100.0f / totalNumberOfCalls;
}
@Override
public Duration getAverageDuration() {
if (totalNumberOfCalls == 0) {
return Duration.ZERO;
}
return Duration.ofMillis(totalDurationInMillis / totalNumberOfCalls);
}
}
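The snapshot whose rates are computed above comes from a sliding window. A count-based window of the kind FixedSizeSlidingWindowMetrics provides can be approximated with a ring buffer, as in the sketch below. It assumes a single boolean outcome per call and ignores durations and slow calls; the class name `CountWindowSketch` is hypothetical and the real class is more elaborate.

```java
class CountWindowSketch {
    private final boolean[] failed; // ring buffer: true means the call failed
    private int next = 0;           // slot that the next outcome overwrites
    private int size = 0;           // number of outcomes recorded so far, capped at the window size
    private int failures = 0;       // running count of failures inside the window

    CountWindowSketch(int windowSize) {
        this.failed = new boolean[windowSize];
    }

    /** Records one outcome and returns the failure rate of the window in percent. */
    synchronized float record(boolean callFailed) {
        if (size == failed.length) {
            if (failed[next]) {
                failures--; // the oldest outcome leaves the window
            }
        } else {
            size++;
        }
        failed[next] = callFailed;
        if (callFailed) {
            failures++;
        }
        next = (next + 1) % failed.length;
        return failures * 100.0f / size;
    }

    public static void main(String[] args) {
        CountWindowSketch window = new CountWindowSketch(4);
        System.out.println(window.record(true));  // 100.0
        System.out.println(window.record(false)); // 50.0
    }
}
```

Keeping a running failure count means each call costs constant time, regardless of the window size.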
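1.4.Circuit breaker: CircuitBreakerState
Each concrete CircuitBreakerState switches the circuit breaker's state according to the Result returned by the CircuitBreakerMetrics methods.
1.4.1.Circuit breaker: OpenState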
public final class CircuitBreakerStateMachine implements CircuitBreaker {
// The members below belong to the inner class OpenState; its declaration is omitted here.
OpenState(final int attempts, CircuitBreakerMetrics circuitBreakerMetrics) {
this.attempts = attempts;
// After the wait returned by waitIntervalFunctionInOpenState (in milliseconds), the breaker may move to half-open
// resilience4j.circuitbreaker.configs.default.waitIntervalFunctionInOpenState: default 60 seconds (60000 ms).
final long waitDurationInMillis = circuitBreakerConfig.getWaitIntervalFunctionInOpenState().apply(attempts);
this.retryAfterWaitDuration = clock.instant().plus(waitDurationInMillis, MILLIS);
this.circuitBreakerMetrics = circuitBreakerMetrics;
// A scheduled transition to half-open is set up only when automaticTransitionFromOpenToHalfOpenEnabled is true
//resilience4j.circuitbreaker.configs.default.automaticTransitionFromOpenToHalfOpenEnabled: default false
if (circuitBreakerConfig.isAutomaticTransitionFromOpenToHalfOpenEnabled()) {
ScheduledExecutorService scheduledExecutorService = schedulerFactory.getScheduler();
transitionToHalfOpenFuture = scheduledExecutorService
.schedule(this::toHalfOpenState, waitDurationInMillis, TimeUnit.MILLISECONDS);
} else {
transitionToHalfOpenFuture = null;
}
isOpen = new AtomicBoolean(true);
}
private void toHalfOpenState() {
if (isOpen.compareAndSet(true, false)) {
transitionToHalfOpenState();//CircuitBreakerStateMachine#transitionToHalfOpenState
}
}
@Override
public boolean tryAcquirePermission() {
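// Once the current time is past the instant retryAfterWaitDuration, the breaker should move to half-open
if (clock.instant().isAfter(retryAfterWaitDuration)) {
toHalfOpenState();// switch to half-open
return true;// the request proceeds to the downstream service
}
circuitBreakerMetrics.onCallNotPermitted();
return false;// degrade (fall back) immediately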
}
@Override
public void acquirePermission() {
if (!tryAcquirePermission()) {
throw CallNotPermittedException.createCallNotPermittedException(CircuitBreakerStateMachine.this);
}
}
}
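Even when automaticTransitionFromOpenToHalfOpenEnabled is false, a waitIntervalFunctionInOpenState value always exists. As the code above shows, once that wait has elapsed, the next call to tryAcquirePermission still moves the breaker from the open state to the half-open state.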
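This lazy, request-driven transition can be reproduced with only a Clock and an AtomicBoolean. The sketch below is an assumption-laden simplification: the class name `LazyOpenSketch` is hypothetical, there is no scheduler, and "leaving the open state" is reduced to flipping a flag.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicBoolean;

class LazyOpenSketch {
    private final Clock clock;
    private final Instant retryAfter; // earliest instant at which a call may pass again
    private final AtomicBoolean isOpen = new AtomicBoolean(true);

    LazyOpenSketch(Clock clock, Duration waitInOpen) {
        this.clock = clock;
        this.retryAfter = clock.instant().plus(waitInOpen);
    }

    /** No scheduler involved: the first request after the wait performs the transition itself. */
    boolean tryAcquirePermission() {
        if (clock.instant().isAfter(retryAfter)) {
            isOpen.compareAndSet(true, false); // only one caller wins the transition
            return true;
        }
        return false;
    }

    boolean isOpen() {
        return isOpen.get();
    }

    public static void main(String[] args) {
        LazyOpenSketch open = new LazyOpenSketch(Clock.systemUTC(), Duration.ofMinutes(1));
        System.out.println(open.tryAcquirePermission()); // false: the one-minute wait has not elapsed
    }
}
```

The practical consequence is that with the automatic transition disabled, an idle breaker stays open indefinitely and only changes state when the next request arrives after the wait.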
1.4.2.Circuit breaker: HalfOpenState
public final class CircuitBreakerStateMachine implements CircuitBreaker {
private final CircuitBreakerConfig circuitBreakerConfig;
private final Clock c;
private class HalfOpenState implements CircuitBreakerState {
private final AtomicInteger permittedNumberOfCalls;
private final AtomicBoolean isHalfOpen;
private final int attempts;
private final CircuitBreakerMetrics circuitBreakerMetrics;
@Nullable
private final ScheduledFuture<?> transitionToOpenFuture;
HalfOpenState(int attempts) {
// Number of requests allowed through to the downstream service while half-open
// resilience4j.circuitbreaker.configs.default.permittedNumberOfCallsInHalfOpenState: default 10.
int permittedNumber =circuitBreakerConfig.getPermittedNumberOfCallsInHalfOpenState();
this.circuitBreakerMetrics = CircuitBreakerMetrics.forHalfOpen(permittedNumber,circuitBreakerConfig, c);
this.permittedNumberOfCalls = new AtomicInteger(permittedNumber);
this.isHalfOpen = new AtomicBoolean(true);
this.attempts = attempts;
// After maxWaitDurationInHalfOpenState has elapsed, try to move from half-open back to open
// resilience4j.circuitbreaker.configs.default.maxWaitDurationInHalfOpenState: default 0 (no scheduled transition).
final long maxWaitDurationInHalfOpenState = circuitBreakerConfig.getMaxWaitDurationInHalfOpenState().toMillis();
if (maxWaitDurationInHalfOpenState >= 1) {
ScheduledExecutorService scheduledExecutorService = schedulerFactory.getScheduler();
transitionToOpenFuture = scheduledExecutorService
.schedule(this::toOpenState, maxWaitDurationInHalfOpenState, TimeUnit.MILLISECONDS);
} else {
transitionToOpenFuture = null;
}
}
@Override
public boolean tryAcquirePermission() {
if (permittedNumberOfCalls.getAndUpdate(current -> current == 0 ? current : --current) > 0) {
return true;// the request proceeds to the downstream service
}
circuitBreakerMetrics.onCallNotPermitted();
return false;// degrade (fall back) immediately
}
@Override
public void acquirePermission() {
if (!tryAcquirePermission()) {// no permission left: throw, which triggers the fallback strategy
throw CallNotPermittedException.createCallNotPermittedException(CircuitBreakerStateMachine.this);
}
}
private void toOpenState() {
if (isHalfOpen.compareAndSet(true, false)) {
transitionToOpenState();
}
}
@Override
public void releasePermission() {
permittedNumberOfCalls.incrementAndGet();
}
@Override
public void onError(long duration, TimeUnit durationUnit, Throwable throwable) {
checkIfThresholdsExceeded(circuitBreakerMetrics.onError(duration, durationUnit));
}
@Override
public void onSuccess(long duration, TimeUnit durationUnit) {
checkIfThresholdsExceeded(circuitBreakerMetrics.onSuccess(duration, durationUnit));
}
private void checkIfThresholdsExceeded(Result result) {
if (Result.hasExceededThresholds(result)) {
if (isHalfOpen.compareAndSet(true, false)) {
transitionToOpenState();
}
}
if (result == BELOW_THRESHOLDS) {// neither the slow-call rate nor the failure rate reached its threshold
if (isHalfOpen.compareAndSet(true, false)) {
transitionToClosedState();
}
}
}
...
}
}
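In the half-open state, every completed trial call, successful or not, is recorded. Once enough calls have been recorded to evaluate the rates, the breaker moves to the open state if a threshold has been reached and to the closed state otherwise.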
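The permit counter used by tryAcquirePermission and releasePermission in the half-open state can be isolated as follows. This is a minimal sketch in which the class name `HalfOpenPermits` is hypothetical; the lambda uses `c - 1` instead of `--current` purely for readability.

```java
import java.util.concurrent.atomic.AtomicInteger;

class HalfOpenPermits {
    private final AtomicInteger permitted;

    HalfOpenPermits(int permittedNumberOfCalls) {
        this.permitted = new AtomicInteger(permittedNumberOfCalls);
    }

    /** Takes one permit if any is left; never lets the counter drop below zero. */
    boolean tryAcquirePermission() {
        // getAndUpdate returns the value from before the update,
        // so a result greater than 0 means this caller obtained a permit.
        return permitted.getAndUpdate(c -> c == 0 ? c : c - 1) > 0;
    }

    /** Gives a permit back, for example when the call's exception is ignored. */
    void releasePermission() {
        permitted.incrementAndGet();
    }

    public static void main(String[] args) {
        HalfOpenPermits permits = new HalfOpenPermits(2);
        System.out.println(permits.tryAcquirePermission()); // true
        System.out.println(permits.tryAcquirePermission()); // true
        System.out.println(permits.tryAcquirePermission()); // false
    }
}
```

Because the check and the decrement happen inside a single atomic update, concurrent callers cannot obtain more permits than were configured.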
1.4.3.Circuit breaker: ClosedState
public final class CircuitBreakerStateMachine implements CircuitBreaker {
private class ClosedState implements CircuitBreakerState {
private final CircuitBreakerMetrics circuitBreakerMetrics;
private final AtomicBoolean isClosed;// true while this ClosedState is the active state
public void onError(long duration, TimeUnit durationUnit, Throwable throwable) {
checkIfThresholdsExceeded(circuitBreakerMetrics.onError(duration, durationUnit));//error3
}
private void checkIfThresholdsExceeded(Result result) {
if (Result.hasExceededThresholds(result)) {// only certain Result values trigger a state change
if (isClosed.compareAndSet(true, false)) {
publishCircuitThresholdsExceededEvent(result, circuitBreakerMetrics);
transitionToOpenState();// switch the breaker from closed to open
}
}
}
}
}