Spring Cloud Hystrix Source Code Series: The Circuit Breaker

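circuit-breaker: "circuit" refers to an electrical circuit, which makes the Chinese term 熔断器 ("fuse") a very precise translation. Hystrix is a smart circuit breaker that recovers automatically. It lives inside the calling application and protects your system, preventing an exception or outage in a dependency from setting off a chain reaction. The principle behind Hystrix is fairly simple: within a time window, it continuously collects metrics about requests to a dependent (third-party) service (success, failure, timeout, rejection), and once the configured trip condition is met (by default, a request failure rate of 50%), it opens the circuit. This article is based on hystrix-core 1.5.18 (Hystrix has seen very few updates in recent years, so upgrading is recommended).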


1. Basic Principles

1.4 What an Open Circuit Breaker Means

2. Configuration

2.1 How CircuitBreaker Open, circuitBreaker.forceClosed and circuitBreaker.forceOpen Work

3. Source Code

3.1 HystrixCircuitBreaker

3.2 Implementing Classes

3.3 Initializing HystrixCircuitBreaker

3.4 HystrixCircuitBreakerImpl

3.6 Relationship with HystrixEventStream

3.7 HealthCountsStream

1. Basic Principles

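In statistics, you work with a certain number of samples, group them, and then analyze the groups. Hystrix works in a similar way. For example, it tallies how requests were handled in each second (the number of successful, failed, timed-out and rejected requests), and each time it takes the data from the most recent 10 seconds; if the failure rate exceeds 50%, it trips the circuit and stops processing requests. The following figure is from the official Hystrix documentation: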

1.1 Buckets

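Suppose request handling is tallied per second. Each cell in the figure above represents one second, and the numbers in a cell are the counts of each request outcome within that second. Such a cell is called a Bucket.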

1.2 Sliding Window

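If every decision is based on the data in 10 buckets, Hystrix computes the request outcomes across those 10 buckets and trips the circuit when the failure rate exceeds 50%. Ten buckets span 10 seconds, and this 10-second span is a rolling window. "Rolling" means that, while the circuit has not tripped, every time a new bucket has been filled the oldest bucket is discarded (the dark bucket [ 23 5 2 0 ] in the figure is the one being discarded).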
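To make buckets and the rolling window concrete, here is a minimal plain-Java sketch. It is not Hystrix's implementation (Hystrix builds this on RxJava streams, as section 3 shows); the class names `Bucket` and `RollingWindow` and the four outcome fields are illustrative assumptions. The window keeps at most N buckets, evicts the oldest when a new one arrives, and computes the failure rate over the buckets that remain.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical one-second bucket: counts of each request outcome.
class Bucket {
    final long success, failure, timeout, rejected;

    Bucket(long success, long failure, long timeout, long rejected) {
        this.success = success;
        this.failure = failure;
        this.timeout = timeout;
        this.rejected = rejected;
    }

    long total() { return success + failure + timeout + rejected; }

    long errors() { return failure + timeout + rejected; }
}

// Hypothetical rolling window that holds at most `size` buckets.
class RollingWindow {
    private final int size;
    private final Deque<Bucket> buckets = new ArrayDeque<>();

    RollingWindow(int size) { this.size = size; }

    // Adding a new bucket evicts the oldest one once the window is full.
    void add(Bucket b) {
        if (buckets.size() == size) {
            buckets.removeFirst();
        }
        buckets.addLast(b);
    }

    // Failure rate in percent over all buckets currently in the window.
    int errorPercentage() {
        long total = 0, errors = 0;
        for (Bucket b : buckets) {
            total += b.total();
            errors += b.errors();
        }
        return total == 0 ? 0 : (int) (errors * 100 / total);
    }

    int bucketCount() { return buckets.size(); }
}

public class RollingWindowDemo {
    public static void main(String[] args) {
        RollingWindow window = new RollingWindow(10);
        for (int i = 0; i < 12; i++) {          // 12 seconds of traffic; only the last 10 are kept
            window.add(new Bucket(5, 5, 0, 0)); // 50% errors in every second
        }
        System.out.println(window.bucketCount() + " buckets, " + window.errorPercentage() + "% errors");
    }
}
```

A deque gives O(1) eviction of the oldest bucket; the 50% comparison itself is left to the caller.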

1.3 The Official Complete Flowchart



1.4 What an Open Circuit Breaker Means

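When the circuit breaker is OPEN, the link to the dependency is considered unhealthy: when a command executes, Hystrix skips the normal logic and calls the fallback logic directly.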
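The effect of an open breaker on a single call can be sketched as follows. This is a simplified model, not Hystrix's `AbstractCommand` code: the `execute` helper is a hypothetical name, the breaker state is passed in as a `BooleanSupplier`, and routing thrown exceptions to the fallback is an assumption of the sketch.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

public class OpenBreakerDemo {
    // Hypothetical helper: while the breaker is open, the primary logic is never invoked.
    static <T> T execute(BooleanSupplier circuitOpen, Supplier<T> run, Supplier<T> fallback) {
        if (circuitOpen.getAsBoolean()) {
            return fallback.get();      // short-circuit: skip run() entirely
        }
        try {
            return run.get();
        } catch (RuntimeException e) {
            return fallback.get();      // in this sketch, failures also fall back
        }
    }

    public static void main(String[] args) {
        String result = execute(() -> true, () -> "remote result", () -> "fallback");
        System.out.println(result);
    }
}
```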

2. Configuration







2.1 How CircuitBreaker Open, circuitBreaker.forceClosed and circuitBreaker.forceOpen Work

/**
 * ForcedOpen | ForcedClosed | CircuitBreaker open due to health ||| Expected Result
 * T | T | T ||| OPEN (true)
 * T | T | F ||| OPEN (true)
 * T | F | T ||| OPEN (true)
 * T | F | F ||| OPEN (true)
 * F | T | T ||| CLOSED (false)
 * F | T | F ||| CLOSED (false)
 * F | F | T ||| OPEN (true)
 * F | F | F ||| CLOSED (false)
 * @return boolean
 */
public boolean isCircuitBreakerOpen() {
    // forceOpen wins over everything; forceClosed masks the health-based state
    return properties.circuitBreakerForceOpen().get() || (!properties.circuitBreakerForceClosed().get() && circuitBreaker.isOpen());
}
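The truth table can be checked mechanically. The sketch below copies the boolean expression from `isCircuitBreakerOpen()` into a standalone method, with plain booleans standing in for the property objects (an assumption made purely for illustration), and prints all eight combinations in the same order as the table.

```java
public class ForceFlagsDemo {
    // Same expression as isCircuitBreakerOpen(), with the properties replaced by plain booleans.
    static boolean isOpen(boolean forceOpen, boolean forceClosed, boolean openDueToHealth) {
        return forceOpen || (!forceClosed && openDueToHealth);
    }

    public static void main(String[] args) {
        boolean[] values = {true, false};
        for (boolean fo : values) {
            for (boolean fc : values) {
                for (boolean health : values) {
                    System.out.printf("%s | %s | %s ||| %s%n",
                            fo ? "T" : "F", fc ? "T" : "F", health ? "T" : "F",
                            isOpen(fo, fc, health) ? "OPEN (true)" : "CLOSED (false)");
                }
            }
        }
    }
}
```

Note the precedence this makes visible: when both flags are set, forceOpen wins.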

3. Source Code

3.1 HystrixCircuitBreaker

public interface HystrixCircuitBreaker {
    /**
     * Every {@link HystrixCommand} requests asks this if it is allowed to proceed or not.
     * <p>
     * This takes into account the half-open logic which allows some requests through when determining if it should be closed again.
     * @return boolean whether a request should be permitted
     */
    public boolean allowRequest();

    /**
     * Whether the circuit is currently open (tripped).
     * @return boolean state of circuit breaker
     */
    public boolean isOpen();

    /**
     * Invoked on successful executions from {@link HystrixCommand} as part of feedback mechanism when in a half-open state.
     */
    void markSuccess();
}

3.2 Implementing Classes


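  • NoOpCircuitBreaker: an empty (no-op) circuit breaker implementation, used when the circuit breaker feature is disabled
  • HystrixCircuitBreakerImpl: the full circuit breaker implementation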
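A no-op breaker is simply an implementation that never blocks traffic. The sketch below uses a trimmed-down copy of the interface from 3.1 (only the three methods shown there), named `CircuitBreaker`, and a class named `NoOpBreaker`; both names are illustrative and not the Hystrix classes themselves.

```java
// Trimmed copy of the three methods shown in section 3.1.
interface CircuitBreaker {
    boolean allowRequest();
    boolean isOpen();
    void markSuccess();
}

// Hypothetical no-op implementation: every request is allowed and the circuit never opens.
class NoOpBreaker implements CircuitBreaker {
    @Override
    public boolean allowRequest() { return true; }

    @Override
    public boolean isOpen() { return false; }

    @Override
    public void markSuccess() {
        // nothing to track
    }
}

public class NoOpBreakerDemo {
    public static void main(String[] args) {
        CircuitBreaker cb = new NoOpBreaker();
        System.out.println(cb.allowRequest() + " " + cb.isOpen());
    }
}
```

Returning a no-op object instead of null lets the command code call the breaker unconditionally, whether or not the feature is enabled.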

3.3 Initializing HystrixCircuitBreaker


// In AbstractCommand
private static HystrixCircuitBreaker initCircuitBreaker(boolean enabled, HystrixCircuitBreaker fromConstructor,
                                                        HystrixCommandGroupKey groupKey, HystrixCommandKey commandKey,
                                                        HystrixCommandProperties properties, HystrixCommandMetrics metrics) {
    if (enabled) { // the circuit breaker is enabled
        if (fromConstructor == null) { // none injected via the constructor: get (or create) the one cached for this commandKey
            // get the default implementation of HystrixCircuitBreaker
            return HystrixCircuitBreaker.Factory.getInstance(commandKey, groupKey, properties, metrics);
        } else {
            return fromConstructor;
        }
    } else {
        return new NoOpCircuitBreaker();
    }
}

// In HystrixCircuitBreaker.Factory
public static HystrixCircuitBreaker getInstance(HystrixCommandKey key, HystrixCommandGroupKey group, HystrixCommandProperties properties, HystrixCommandMetrics metrics) {
    // return the existing instance if there is one
    // this should find it for all but the first time
    HystrixCircuitBreaker previouslyCached = circuitBreakersByCommand.get(key.name());
    if (previouslyCached != null) {
        return previouslyCached;
    }
    // if we get here this is the first time so we need to initialize

    // Create and add to the map ... use putIfAbsent to atomically handle the possible race-condition of
    // 2 threads hitting this point at the same time and let ConcurrentHashMap provide us our thread-safety
    // If 2 threads hit here only one will get added and the other will get a non-null response instead.
    // otherwise create one and cache it
    HystrixCircuitBreaker cbForCommand = circuitBreakersByCommand.putIfAbsent(key.name(), new HystrixCircuitBreakerImpl(key, group, properties, metrics));
    if (cbForCommand == null) {
        // this means the putIfAbsent step just created a new one so let's retrieve and return it
        return circuitBreakersByCommand.get(key.name());
    } else {
        // this means a race occurred and while attempting to 'put' another one got there before
        // and we instead retrieved it and will now return it
        return cbForCommand;
    }
}
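The caching pattern in `getInstance` (check the map, then `putIfAbsent`, then use whichever instance won) can be reproduced with `ConcurrentHashMap` alone. In the sketch below, `BreakerRegistry`, `Breaker` and the key "GetUser" are hypothetical; several threads race for the same key and should all receive the same instance. Since Java 8, `computeIfAbsent` expresses the same idea in one call, shown as an alternative.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a per-command circuit breaker.
class Breaker {
    final String key;
    Breaker(String key) { this.key = key; }
}

class BreakerRegistry {
    private final ConcurrentMap<String, Breaker> byKey = new ConcurrentHashMap<>();

    // Mirrors the get / putIfAbsent sequence of HystrixCircuitBreaker.Factory.getInstance.
    Breaker getInstance(String key) {
        Breaker cached = byKey.get(key);
        if (cached != null) {
            return cached;                      // fast path: already created
        }
        Breaker previous = byKey.putIfAbsent(key, new Breaker(key));
        // null means our instance was stored; non-null means another thread won the race
        return previous == null ? byKey.get(key) : previous;
    }

    // Equivalent single-call variant available since Java 8.
    Breaker getInstanceCompact(String key) {
        return byKey.computeIfAbsent(key, Breaker::new);
    }
}

public class RegistryDemo {
    public static void main(String[] args) throws InterruptedException {
        BreakerRegistry registry = new BreakerRegistry();
        Breaker[] seen = new Breaker[8];
        Thread[] threads = new Thread[seen.length];
        for (int i = 0; i < threads.length; i++) {
            final int idx = i;
            threads[i] = new Thread(() -> seen[idx] = registry.getInstance("GetUser"));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();                           // join() also makes the writes to seen[] visible
        }
        boolean allSame = true;
        for (Breaker b : seen) {
            allSame &= (b == seen[0]);
        }
        System.out.println("all threads got the same breaker: " + allSame);
    }
}
```

One trade-off: with `putIfAbsent`, a losing thread may construct an extra instance that is then discarded, whereas `computeIfAbsent` invokes the factory only when the key is absent.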

3.4 HystrixCircuitBreakerImpl


static class HystrixCircuitBreakerImpl implements HystrixCircuitBreaker {
    private final HystrixCommandProperties properties;
    private final HystrixCommandMetrics metrics;

    /* track whether this circuit is open/closed at any given point in time (default to false==closed) */
    private AtomicBoolean circuitOpen = new AtomicBoolean(false);

    /* when the circuit was marked open or was last allowed to try a 'singleTest' */
    private AtomicLong circuitOpenedOrLastTestedTime = new AtomicLong();

    protected HystrixCircuitBreakerImpl(HystrixCommandKey key, HystrixCommandGroupKey commandGroup, HystrixCommandProperties properties, HystrixCommandMetrics metrics) {
        this.properties = properties;
        this.metrics = metrics;
    }

    // Close the circuit and reset the metrics
    public void markSuccess() {
        if (circuitOpen.get()) {
            if (circuitOpen.compareAndSet(true, false)) {
                //win the thread race to reset metrics
                //Unsubscribe from the current stream to reset the health counts stream.  This only affects the health counts view,
                //and all other metric consumers are unaffected by the reset
                // (the reset call itself is cut off in this excerpt)
            }
        }
    }

    public boolean allowRequest() {
        if (properties.circuitBreakerForceOpen().get()) {
            // properties have asked us to force the circuit open so we will allow NO requests
            return false;
        }
        if (properties.circuitBreakerForceClosed().get()) {
            // we still want to allow isOpen() to perform it's calculations so we simulate normal behavior
            isOpen();
            // properties have asked us to ignore errors so we will ignore the results of isOpen and just allow all traffic through
            return true;
        }
        return !isOpen() || allowSingleTest();
    }

    public boolean allowSingleTest() {
        long timeCircuitOpenedOrWasLastTested = circuitOpenedOrLastTestedTime.get();
        // 1) if the circuit is open
        // 2) and it's been longer than 'sleepWindow' since we opened the circuit
        if (circuitOpen.get() && System.currentTimeMillis() > timeCircuitOpenedOrWasLastTested + properties.circuitBreakerSleepWindowInMilliseconds().get()) {
            // We push the 'circuitOpenedTime' ahead by 'sleepWindow' since we have allowed one request to try.
            // If it succeeds the circuit will be closed, otherwise another singleTest will be allowed at the end of the 'sleepWindow'.
            if (circuitOpenedOrLastTestedTime.compareAndSet(timeCircuitOpenedOrWasLastTested, System.currentTimeMillis())) {
                // if this returns true that means we set the time so we'll return true to allow the singleTest
                // if it returned false it means another thread raced us and allowed the singleTest before we did
                return true;
            }
        }
        return false;
    }

    public boolean isOpen() {
        if (circuitOpen.get()) {
            // if we're open we immediately return true and don't bother attempting to 'close' ourself as that is left to allowSingleTest and a subsequent successful test to close
            return true;
        }
        // we're closed, so let's see if errors have made us so we should trip the circuit open
        HealthCounts health = metrics.getHealthCounts();

        // check if we are past the statisticalWindowVolumeThreshold
        if (health.getTotalRequests() < properties.circuitBreakerRequestVolumeThreshold().get()) {
            // we are not past the minimum volume threshold for the statisticalWindow so we'll return false immediately and not calculate anything
            return false;
        }

        if (health.getErrorPercentage() < properties.circuitBreakerErrorThresholdPercentage().get()) {
            return false;
        } else {
            // our failure rate is too high, trip the circuit
            if (circuitOpen.compareAndSet(false, true)) {
                // if the previousValue was false then we want to set the currentTime
                circuitOpenedOrLastTestedTime.set(System.currentTimeMillis());
                return true;
            } else {
                // How could previousValue be true? If another thread was going through this code at the same time a race-condition could have
                // caused another thread to set it to true already even though we were in the process of doing the same
                // In this case, we know the circuit is open, so let the other thread set the currentTime and report back that the circuit is open
                return true;
            }
        }
    }
}
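The decision logic of `isOpen()`, `allowSingleTest()` and `markSuccess()` can be condensed into a small, testable class. This is a sketch under stated assumptions: the health numbers are passed in as arguments instead of coming from `HystrixCommandMetrics`, the clock is injected as a `LongSupplier` so the sleep window can be tested, and the name `SimpleBreaker` is hypothetical. The three thresholds mirror the properties used above (request volume, error percentage, sleep window).

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical simplified breaker modeled on the boolean-based HystrixCircuitBreakerImpl above.
class SimpleBreaker {
    private final int requestVolumeThreshold;   // like circuitBreakerRequestVolumeThreshold
    private final int errorThresholdPercentage; // like circuitBreakerErrorThresholdPercentage
    private final long sleepWindowMs;           // like circuitBreakerSleepWindowInMilliseconds
    private final LongSupplier clock;           // injected so tests can control time

    private final AtomicBoolean circuitOpen = new AtomicBoolean(false);
    private final AtomicLong openedOrLastTestedTime = new AtomicLong();

    SimpleBreaker(int requestVolumeThreshold, int errorThresholdPercentage, long sleepWindowMs, LongSupplier clock) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.sleepWindowMs = sleepWindowMs;
        this.clock = clock;
    }

    // Trip only if enough requests were seen and the error rate is at or above the threshold.
    boolean isOpen(long totalRequests, int errorPercentage) {
        if (circuitOpen.get()) {
            return true;
        }
        if (totalRequests < requestVolumeThreshold || errorPercentage < errorThresholdPercentage) {
            return false;
        }
        if (circuitOpen.compareAndSet(false, true)) {
            openedOrLastTestedTime.set(clock.getAsLong()); // the thread that trips the circuit records the time
        }
        return true;
    }

    // Open + sleep window elapsed + CAS won => this caller is the single trial request.
    boolean allowSingleTest() {
        long last = openedOrLastTestedTime.get();
        long now = clock.getAsLong();
        return circuitOpen.get() && now > last + sleepWindowMs
                && openedOrLastTestedTime.compareAndSet(last, now);
    }

    boolean allowRequest(long totalRequests, int errorPercentage) {
        return !isOpen(totalRequests, errorPercentage) || allowSingleTest();
    }

    // A successful trial closes the circuit. Hystrix also resets the metrics here;
    // in this sketch the caller is responsible for passing fresh counts afterwards.
    void markSuccess() {
        circuitOpen.compareAndSet(true, false);
    }
}

public class SimpleBreakerDemo {
    public static void main(String[] args) {
        long[] now = {0};
        SimpleBreaker cb = new SimpleBreaker(20, 50, 5000, () -> now[0]);
        System.out.println(cb.allowRequest(20, 60)); // trips the circuit, request rejected
        now[0] = 6000;
        System.out.println(cb.allowRequest(20, 60)); // sleep window over: one trial request allowed
        cb.markSuccess();
        System.out.println(cb.allowRequest(0, 0));   // closed again
    }
}
```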



// HystrixCommandMetrics.HealthCounts
public static class HealthCounts {
    // total requests: success + failure + timeout + threadPoolRejected + semaphoreRejected
    private final long totalCount;
    // error requests: failure + timeout + threadPoolRejected + semaphoreRejected (success is not an error)
    private final long errorCount;
    private final int errorPercentage;

    // constructor and getters omitted

    public HealthCounts plus(long[] eventTypeCounts) {
        long updatedTotalCount = totalCount;
        long updatedErrorCount = errorCount;

        long successCount = eventTypeCounts[HystrixEventType.SUCCESS.ordinal()];
        long failureCount = eventTypeCounts[HystrixEventType.FAILURE.ordinal()];
        long timeoutCount = eventTypeCounts[HystrixEventType.TIMEOUT.ordinal()];
        long threadPoolRejectedCount = eventTypeCounts[HystrixEventType.THREAD_POOL_REJECTED.ordinal()];
        long semaphoreRejectedCount = eventTypeCounts[HystrixEventType.SEMAPHORE_REJECTED.ordinal()];

        updatedTotalCount += (successCount + failureCount + timeoutCount + threadPoolRejectedCount + semaphoreRejectedCount);
        updatedErrorCount += (failureCount + timeoutCount + threadPoolRejectedCount + semaphoreRejectedCount);
        return new HealthCounts(updatedTotalCount, updatedErrorCount);
    }
}
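The accumulation performed by `plus` can be reproduced with a small immutable class. In this sketch the event types are a hypothetical local enum rather than `HystrixEventType`, the class is named `Health`, and the error percentage is computed as errors divided by total, times 100, truncated to an int; that formula is an assumption, since the constructor that computes it is not shown above. The point to notice is that SUCCESS counts toward the total but not toward the errors, and event types that `plus` does not read are ignored.

```java
// Hypothetical subset of event types; ordinal() indexes the counts array, as in HealthCounts.plus.
enum EventType { SUCCESS, FAILURE, TIMEOUT, THREAD_POOL_REJECTED, SEMAPHORE_REJECTED, SHORT_CIRCUITED }

final class Health {
    final long totalCount;
    final long errorCount;
    final int errorPercentage;

    Health(long total, long errors) {
        this.totalCount = total;
        this.errorCount = errors;
        this.errorPercentage = total > 0 ? (int) ((double) errors / total * 100) : 0;
    }

    // Returns a new object; the receiver is never modified.
    Health plus(long[] counts) {
        long success = counts[EventType.SUCCESS.ordinal()];
        long failure = counts[EventType.FAILURE.ordinal()];
        long timeout = counts[EventType.TIMEOUT.ordinal()];
        long poolRejected = counts[EventType.THREAD_POOL_REJECTED.ordinal()];
        long semRejected = counts[EventType.SEMAPHORE_REJECTED.ordinal()];
        long errors = failure + timeout + poolRejected + semRejected;
        return new Health(totalCount + success + errors, errorCount + errors);
    }
}

public class HealthDemo {
    public static void main(String[] args) {
        long[] bucket = new long[EventType.values().length];
        bucket[EventType.SUCCESS.ordinal()] = 6;
        bucket[EventType.TIMEOUT.ordinal()] = 3;
        bucket[EventType.SHORT_CIRCUITED.ordinal()] = 100; // not read by plus()
        Health h = new Health(0, 0).plus(bucket);
        System.out.println(h.totalCount + " total, " + h.errorCount + " errors, " + h.errorPercentage + "%");
    }
}
```

Immutability is what makes this usable as a `scan` accumulator: every step yields a fresh value and never mutates a summary another subscriber may still hold.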


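The HystrixCircuitBreakerImpl shown above tracks its state with a single boolean, circuitOpen, rather than a status value (CLOSED, OPEN, HALF_OPEN). The half-open state exists only implicitly: it is the single trial request that allowSingleTest() lets through once the sleep window has elapsed. For callers, this keeps the circuit breaker API simpler.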



3.6 Relationship with HystrixEventStream

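While a Hystrix command executes, every outcome is emitted as an event, wrapped in a specific data structure, and finally merged into an event stream (HystrixEventStream). The event stream provides an observe() method that turns the stream itself into a data source (like brooks flowing into a river, from which consumers draw water). Other consumers can obtain data from it, and the circuit breaker is one of those consumers.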
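The "river" analogy amounts to a publish/subscribe relationship: producers push events into one stream, and any number of consumers observe it. Hystrix implements this with RxJava; the sketch below models only the idea in plain Java, with hypothetical names (`EventStream`, `observe`, `write`), to show that the circuit breaker's health view is just one subscriber among several.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical minimal event stream: producers call write(), consumers register via observe().
class EventStream<E> {
    private final List<Consumer<E>> subscribers = new CopyOnWriteArrayList<>();

    void observe(Consumer<E> consumer) {
        subscribers.add(consumer);
    }

    void write(E event) {
        for (Consumer<E> s : subscribers) {
            s.accept(event);   // delivered synchronously to every subscriber, in registration order
        }
    }
}

public class EventStreamDemo {
    public static void main(String[] args) {
        EventStream<String> completions = new EventStream<>();
        int[] failures = {0};
        completions.observe(e -> { if (e.equals("FAILURE")) failures[0]++; }); // e.g. a breaker's health view
        completions.observe(e -> System.out.println("dashboard saw " + e));   // e.g. a metrics publisher
        completions.write("SUCCESS");
        completions.write("FAILURE");
        System.out.println("failures counted: " + failures[0]);
    }
}
```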


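As discussed in the previous article, "Metrics Collection", HystrixEventStream links the event producers to the consumers, and the component that takes over its events is BucketedCounterStream (covered below). So how exactly does that hand-off happen? It starts with HystrixCommandMetrics.healthCountsStream, which uses HystrixCommandCompletionStream.getInstance(commandKey) to feed the event stream into a BucketedCounterStream.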

public class HystrixCommandMetrics extends HystrixMetrics {
    HystrixCommandMetrics(final HystrixCommandKey key, HystrixCommandGroupKey commandGroup, HystrixThreadPoolKey threadPoolKey, HystrixCommandProperties properties, HystrixEventNotifier eventNotifier) {
        // ...
        healthCountsStream = HealthCountsStream.getInstance(key, properties); // instantiate (or reuse) the stream
        // ...
    }
}

public class HealthCountsStream extends BucketedRollingCounterStream {
    public static HealthCountsStream getInstance(HystrixCommandKey commandKey, HystrixCommandProperties properties) {
        final int healthCountBucketSizeInMs = properties.metricsHealthSnapshotIntervalInMilliseconds().get();
        if (healthCountBucketSizeInMs == 0) {
            throw new RuntimeException("You have set the bucket size to 0ms.  Please set a positive number, so that the metric stream can be properly consumed");
        }
        final int numHealthCountBuckets = properties.metricsRollingStatisticalWindowInMilliseconds().get() / healthCountBucketSizeInMs;

        return getInstance(commandKey, numHealthCountBuckets, healthCountBucketSizeInMs);
    }

    public static HealthCountsStream getInstance(HystrixCommandKey commandKey, int numBuckets, int bucketSizeInMs) {
        HealthCountsStream initialStream = streams.get(commandKey.name());
        if (initialStream != null) {
            return initialStream;
        } else {
            final HealthCountsStream healthStream;
            synchronized (HealthCountsStream.class) {
                HealthCountsStream existingStream = streams.get(commandKey.name());
                if (existingStream == null) {
                    // the last constructor argument (reduceCommandCompletion) is cut off in the original excerpt
                    HealthCountsStream newStream = new HealthCountsStream(commandKey, numBuckets, bucketSizeInMs,
                            ...);
                    streams.putIfAbsent(commandKey.name(), newStream);
                    healthStream = newStream;
                } else {
                    healthStream = existingStream;
                }
            }
            return healthStream;
        }
    }

    private HealthCountsStream(final HystrixCommandKey commandKey, final int numBuckets, final int bucketSizeInMs,
                               Func2<long[], HystrixCommandCompletion, long[]> reduceCommandCompletion) {
        super(HystrixCommandCompletionStream.getInstance(commandKey), numBuckets, bucketSizeInMs, reduceCommandCompletion, healthCheckAccumulator);
    }
}

// inheritance chain
public abstract class BucketedRollingCounterStream extends BucketedCounterStream {
    // ...
}

// inheritance chain
public abstract class BucketedCounterStream {
    protected BucketedCounterStream(final HystrixEventStream<Event> inputEventStream, final int numBuckets, final int bucketSizeInMs,
                                    final Func2<Bucket, Event, Bucket> appendRawEventToBucket) {
        this.bucketedStream = Observable.defer(new Func0<Observable<Bucket>>() {
            public Observable<Bucket> call() {
                return inputEventStream //the key point: this is HystrixCommandCompletionStream, consumed via observe()
                        .observe()
                        .window(bucketSizeInMs, TimeUnit.MILLISECONDS) //bucket it by the counter window so we can emit to the next operator in time chunks, not on every OnNext
                        .flatMap(reduceBucketToSummary)                //for a given bucket, turn it into a long array containing counts of event types
                        .startWith(emptyEventCountsToStart);           //start it with empty arrays to make consumer logic as generic as possible (windows are always full)
            }
        });
    }
}
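What `window(bucketSizeInMs, TimeUnit.MILLISECONDS)` followed by `flatMap(reduceBucketToSummary)` achieves is: cut the event stream into fixed time slices and fold each slice into one summary (a `long[]` of counts per event type). The sketch below does the same thing offline over a list of timestamped events, without RxJava; the `Event` class, the `bucketize` method and the integer event-type indexes are hypothetical. Empty slices still produce an all-zero bucket, so every time slot is represented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical timestamped event; type is an index into the counts array.
final class Event {
    final long timestampMs;
    final int type;

    Event(long timestampMs, int type) {
        this.timestampMs = timestampMs;
        this.type = type;
    }
}

public class TimeBucketing {
    // Groups events into consecutive slices of bucketSizeMs in [startMs, endMs)
    // and reduces each slice to an array of counts indexed by event type.
    static List<long[]> bucketize(List<Event> events, long startMs, long endMs, long bucketSizeMs, int numTypes) {
        int n = (int) ((endMs - startMs) / bucketSizeMs);
        List<long[]> buckets = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            buckets.add(new long[numTypes]);   // empty slices still produce a bucket
        }
        for (Event e : events) {
            long offset = e.timestampMs - startMs;
            if (offset < 0 || offset >= (long) n * bucketSizeMs) {
                continue;                      // outside the observed interval
            }
            buckets.get((int) (offset / bucketSizeMs))[e.type]++;
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(new Event(10, 0), new Event(90, 1), new Event(250, 1));
        for (long[] b : bucketize(events, 0, 300, 100, 2)) {
            System.out.println(Arrays.toString(b));
        }
    }
}
```

The Rx version differs in being push-based and unbounded: buckets are emitted as time passes instead of being computed over a finished list.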

3.7 HealthCountsStream


In summary, metrics.getHealthCountsStream() returns statistics that have already been aggregated per "rollingWindow", and observe() actually returns the sourceStream of BucketedRollingCounterStream.

public abstract class BucketedCounterStream<Event extends HystrixEvent, Bucket, Output> {
    protected BucketedCounterStream(final HystrixEventStream<Event> inputEventStream, final int numBuckets, final int bucketSizeInMs,
                                    final Func2<Bucket, Event, Bucket> appendRawEventToBucket) {
        this.numBuckets = numBuckets;
        // reduceBucketToSummary folds the Hystrix events of one time slice into a single Bucket; it is a Func1
        this.reduceBucketToSummary = new Func1<Observable<Event>, Observable<Bucket>>() {
            // takes a source of Event values and reduces it to one value of type Bucket
            public Observable<Bucket> call(Observable<Event> eventBucket) {
                return eventBucket.reduce(getEmptyBucketSummary(), appendRawEventToBucket);
            }
        };

        final List<Bucket> emptyEventCountsToStart = new ArrayList<Bucket>();
        for (int i = 0; i < numBuckets; i++) {
            emptyEventCountsToStart.add(getEmptyBucketSummary());
        }

        this.bucketedStream = Observable.defer(new Func0<Observable<Bucket>>() {
            // inputEventStream is the HystrixEventStream discussed throughout; observe() turns it into a data source
            public Observable<Bucket> call() {
                return inputEventStream
                        .observe()
                        .window(bucketSizeInMs, TimeUnit.MILLISECONDS) //bucket it by the counter window so we can emit to the next operator in time chunks, not on every OnNext
                        .flatMap(reduceBucketToSummary)                //for a given bucket, turn it into a long array containing counts of event types
                        .startWith(emptyEventCountsToStart);           //start it with empty arrays to make consumer logic as generic as possible (windows are always full)
            }
        });
    }
}

public abstract class BucketedRollingCounterStream<Event extends HystrixEvent, Bucket, Output> extends BucketedCounterStream<Event, Bucket, Output> {
    private Observable<Output> sourceStream;
    private final AtomicBoolean isSourceCurrentlySubscribed = new AtomicBoolean(false);

    protected BucketedRollingCounterStream(HystrixEventStream<Event> stream, final int numBuckets, int bucketSizeInMs,
              final Func2<Bucket, Event, Bucket> appendRawEventToBucket,
              final Func2<Output, Bucket, Output> reduceBucket) { // for HealthCountsStream, reduceBucket is HealthCounts.plus
        super(stream, numBuckets, bucketSizeInMs, appendRawEventToBucket);
        Func1<Observable<Bucket>, Observable<Output>> reduceWindowToSummary = new Func1<Observable<Bucket>, Observable<Output>>() {
            public Observable<Output> call(Observable<Bucket> window) {
                return window.scan(getEmptyOutputValue(), reduceBucket).skip(numBuckets);
            }
        };
        // builds on the bucketedStream already aggregated by the parent class BucketedCounterStream
        this.sourceStream = bucketedStream      //stream broken up into buckets
                .window(numBuckets, 1)          //emit overlapping windows of buckets
                .flatMap(reduceWindowToSummary) //convert a window of bucket-summaries into a single summary
                .doOnSubscribe(new Action0() {
                    public void call() {
                        isSourceCurrentlySubscribed.set(true);
                    }
                })
                .doOnUnsubscribe(new Action0() {
                    public void call() {
                        isSourceCurrentlySubscribed.set(false);
                    }
                })
                .share()                        //multiple subscribers should get same data
                .onBackpressureDrop();          //if there are slow consumers, data should not buffer
    }
}

// HealthCountsStream extends the class above and exposes the health statistics
public class HealthCountsStream extends BucketedRollingCounterStream<HystrixCommandCompletion, long[], HystrixCommandMetrics.HealthCounts> {
    private static final Func2<HystrixCommandMetrics.HealthCounts, long[], HystrixCommandMetrics.HealthCounts> healthCheckAccumulator = new Func2<HystrixCommandMetrics.HealthCounts, long[], HystrixCommandMetrics.HealthCounts>() {
        public HystrixCommandMetrics.HealthCounts call(HystrixCommandMetrics.HealthCounts healthCounts, long[] bucketEventCounts) {
            return healthCounts.plus(bucketEventCounts); // accumulate
        }
    };

    private HealthCountsStream(final HystrixCommandKey commandKey, final int numBuckets, final int bucketSizeInMs,
                               Func2<long[], HystrixCommandCompletion, long[]> reduceCommandCompletion) {
        super(HystrixCommandCompletionStream.getInstance(commandKey), numBuckets, bucketSizeInMs, reduceCommandCompletion, healthCheckAccumulator);
    }
}
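As I read the operator chain, `window(numBuckets, 1)` + `scan(...)` + `skip(numBuckets)` emits, for every new bucket, one summary of the most recent `numBuckets` buckets: `scan` emits the seed plus one value per bucket, and `skip(numBuckets)` keeps only the final, complete accumulation. Because the bucket stream starts with `numBuckets` empty buckets (`startWith(emptyEventCountsToStart)`), every window is full from the first emission. The plain-Java sketch below models the values produced for a finite list of buckets, with one `long` per bucket for brevity; `rollingSums` is a hypothetical name, and the sketch ignores the lazy, push-based execution of the real stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RollingSumDemo {
    static List<Long> rollingSums(List<Long> buckets, int numBuckets) {
        // startWith(emptyEventCountsToStart): prepend numBuckets empty buckets
        List<Long> padded = new ArrayList<>(Collections.nCopies(numBuckets, 0L));
        padded.addAll(buckets);
        List<Long> out = new ArrayList<>();
        // window(numBuckets, 1): one overlapping window per starting position; only full windows survive skip(numBuckets)
        for (int start = 0; start + numBuckets <= padded.size(); start++) {
            long sum = 0;                              // scan seed, like getEmptyOutputValue()
            for (int i = start; i < start + numBuckets; i++) {
                sum += padded.get(i);                  // reduceBucket, like HealthCounts.plus
            }
            out.add(sum);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(rollingSums(Arrays.asList(1L, 2L, 3L, 4L), 3));
    }
}
```

Recomputing each window from scratch keeps the model obviously correct; the Rx chain gets the same effect by reducing each overlapping window independently.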

We have now seen which component takes over HystrixEventStream's events. The natural next question is: when is HealthCountsStream actually subscribed to and consumed?

// In HystrixCommandMetrics
public HealthCounts getHealthCounts() {
    return healthCountsStream.getLatest();
}

public abstract class BucketedCounterStream<Event extends HystrixEvent, Bucket, Output> {
    /**
     * Synchronous call to retrieve the last calculated bucket without waiting for any emissions
     * @return last calculated bucket
     */
    public Output getLatest() {
        if (counterSubject.hasValue()) {
            return counterSubject.getValue();
        } else {
            return getEmptyOutputValue();
        }
    }

    public void startCachingStreamValuesIfUnstarted() {
        if (subscription.get() == null) {
            //the stream is not yet started
            Subscription candidateSubscription = observe().subscribe(counterSubject);
            if (subscription.compareAndSet(null, candidateSubscription)) {
                //won the race to set the subscription
            } else {
                //lost the race to set the subscription, so we need to cancel this one
                candidateSubscription.unsubscribe();
            }
        }
    }
}
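`startCachingStreamValuesIfUnstarted()` uses a common lazy-start idiom: several threads may race to subscribe, only the thread that wins the `compareAndSet` keeps its subscription, and the losers cancel theirs. Meanwhile `getLatest()` never blocks; it returns the last cached value or an empty default. The sketch below reproduces both ideas without RxJava; `Subscription` here is a hypothetical stand-in interface, `CachingStream` is a hypothetical class, and the stream is simulated by a counter of live subscriptions.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for an Rx subscription.
interface Subscription {
    void unsubscribe();
}

class CachingStream {
    private final AtomicReference<Subscription> subscription = new AtomicReference<>(null);
    private final AtomicReference<Long> latest = new AtomicReference<>(null); // like counterSubject's cached value
    final AtomicInteger activeSubscriptions = new AtomicInteger();

    // Simulates observe().subscribe(...): each call starts one live subscription.
    private Subscription subscribe() {
        activeSubscriptions.incrementAndGet();
        return activeSubscriptions::decrementAndGet;
    }

    void startCachingIfUnstarted() {
        if (subscription.get() == null) {
            Subscription candidate = subscribe();
            if (!subscription.compareAndSet(null, candidate)) {
                candidate.unsubscribe();      // lost the race: cancel the extra subscription
            }
        }
    }

    // Simulates the stream pushing a new rolling-window summary into the cache.
    void publish(long value) {
        latest.set(value);
    }

    // Non-blocking read: last cached value, or an empty default if nothing has been emitted yet.
    long getLatest() {
        Long v = latest.get();
        return v != null ? v : 0L;
    }
}

public class CachingStreamDemo {
    public static void main(String[] args) throws InterruptedException {
        CachingStream stream = new CachingStream();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(stream::startCachingIfUnstarted);
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("live subscriptions: " + stream.activeSubscriptions.get()); // always 1
        System.out.println("latest before any emission: " + stream.getLatest());
    }
}
```

Subscribing before the CAS, then cancelling on loss, avoids holding a lock while subscribing; the cost is a short-lived extra subscription for each losing thread.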








