Hystrix Semaphore timeout

weixin_33895516

Original link: https://my.oschina.net/tigerlene/blog/2222699

When to use semaphore

Thread isolation carries a thread context-switch cost, but in most applications this cost is negligible (see Thread pool):

For circuits that wrap very low-latency requests (such as those that primarily hit in-memory caches) the overhead can be too high and in those cases you can use another method such as tryable semaphores which, while they do not allow for timeouts, provide most of the resilience benefits without the overhead. The overhead in general, however, is small enough that Netflix in practice usually prefers the isolation benefits of a separate thread over such techniques.

Drawbacks

It does not allow for timing out and walking away.

Why? The code shows it:

    /**
     * Semaphore that only supports tryAcquire and never blocks and that supports a dynamic permit count.
     * <p>
     * Using AtomicInteger increment/decrement instead of java.util.concurrent.Semaphore since we don't need blocking and need a custom implementation to get the dynamic permit count and since
     * AtomicInteger achieves the same behavior and performance without the more complex implementation of the actual Semaphore class using AbstractQueueSynchronizer.
     */
    /* package */static class TryableSemaphoreActual implements TryableSemaphore {
        protected final HystrixProperty<Integer> numberOfPermits;
        private final AtomicInteger count = new AtomicInteger(0);

        public TryableSemaphoreActual(HystrixProperty<Integer> numberOfPermits) {
            this.numberOfPermits = numberOfPermits;
        }

        @Override
        public boolean tryAcquire() {
            int currentCount = count.incrementAndGet();
            if (currentCount > numberOfPermits.get()) {
                count.decrementAndGet();
                return false;
            } else {
                return true;
            }
        }

        @Override
        public void release() {
            count.decrementAndGet();
        }

        @Override
        public int getNumberOfPermitsUsed() {
            return count.get();
        }

    }
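To make the drawback concrete, here is a minimal sketch of how a caller might wrap work with such a counting semaphore. `CountingSemaphore` and `callGuarded` are hypothetical names used only for illustration (they are not Hystrix APIs), and the permit count is fixed rather than dynamic. The point is that the work runs on the caller's own thread, so the only decision the semaphore can make is accept or reject at entry; nothing in this structure can abandon a call that is already running.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class SemaphoreGuardSketch {
    // Same counting idea as TryableSemaphoreActual, with a fixed permit count for brevity.
    static final class CountingSemaphore {
        private final int permits;
        private final AtomicInteger count = new AtomicInteger(0);

        CountingSemaphore(int permits) {
            this.permits = permits;
        }

        boolean tryAcquire() {
            if (count.incrementAndGet() > permits) {
                count.decrementAndGet(); // over the limit: undo and reject, never block
                return false;
            }
            return true;
        }

        void release() {
            count.decrementAndGet();
        }

        int used() {
            return count.get();
        }
    }

    // Hypothetical helper: run work on the caller's thread if a permit is free,
    // otherwise return the fallback immediately.
    static <T> T callGuarded(CountingSemaphore semaphore, Supplier<T> work, Supplier<T> fallback) {
        if (!semaphore.tryAcquire()) {
            return fallback.get();
        }
        try {
            return work.get(); // the caller is stuck here for as long as work takes
        } finally {
            semaphore.release();
        }
    }

    public static void main(String[] args) {
        CountingSemaphore semaphore = new CountingSemaphore(1);
        // The inner call runs while the outer call still holds the only permit.
        Supplier<String> inner = () -> callGuarded(semaphore, () -> "inner", () -> "rejected");
        String result = callGuarded(semaphore, inner, () -> "outer-rejected");
        System.out.println(result);           // prints "rejected"
        System.out.println(semaphore.used()); // prints 0
    }
}
```

The nested call is rejected because the outer call has not yet released its permit, and the permit count returns to zero once both calls finish.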

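The semaphore is implemented as a simple counter. Since 1.4.4, Hystrix also provides a timeout for semaphore isolation, but the timeout does not interrupt the original thread. Instead, when the timeout fires, it is reported to the circuit breaker, so the breaker can react more promptly. The timeout itself is implemented with a Timer.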


    Observable<R> execution;
    if (properties.executionTimeoutEnabled().get()) {
        execution = executeCommandWithSpecifiedIsolation(_cmd)
                .lift(new HystrixObservableTimeoutOperator<R>(_cmd));
    } else {
        execution = executeCommandWithSpecifiedIsolation(_cmd);
    }


    TimerListener listener = new TimerListener() {

        @Override
        public void tick() {
            // if we can go from NOT_EXECUTED to TIMED_OUT then we do the timeout codepath
            // otherwise it means we lost a race and the run() execution completed or did not start
            if (originalCommand.isCommandTimedOut.compareAndSet(TimedOutStatus.NOT_EXECUTED, TimedOutStatus.TIMED_OUT)) {
                // report timeout failure
                originalCommand.eventNotifier.markEvent(HystrixEventType.TIMEOUT, originalCommand.commandKey);

                // shut down the original request
                s.unsubscribe();

                final HystrixContextRunnable timeoutRunnable = new HystrixContextRunnable(originalCommand.concurrencyStrategy, hystrixRequestContext, new Runnable() {

                    @Override
                    public void run() {
                        child.onError(new HystrixTimeoutException());
                    }
                });

                timeoutRunnable.run();
                //if it did not start, then we need to mark a command start for concurrency metrics, and then issue the timeout
            }
        }

        @Override
        public int getIntervalTimeInMilliseconds() {
            return originalCommand.properties.executionTimeoutInMilliseconds().get();
        }
    };
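The race between the timer and the work can be sketched without Hystrix. The following is a simplified illustration, not the Hystrix implementation: `SemaphoreTimeoutSketch`, `Result`, and `call` are hypothetical names, a `ScheduledExecutorService` stands in for the Hystrix timer, and `Thread.sleep` stands in for the slow dependency. It assumes the work runs on the caller's thread, as it does under semaphore isolation. The timer can win the compare-and-set and mark the call as timed out, yet the caller still stays blocked until the work returns.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class SemaphoreTimeoutSketch {
    enum Status { NOT_EXECUTED, COMPLETED, TIMED_OUT }

    static final class Result {
        final Status status;
        final long elapsedMillis;

        Result(Status status, long elapsedMillis) {
            this.status = status;
            this.elapsedMillis = elapsedMillis;
        }
    }

    // Runs the "work" (a sleep) on the caller's thread while a timer races to mark a timeout.
    static Result call(long workMillis, long timeoutMillis) {
        AtomicReference<Status> status = new AtomicReference<>(Status.NOT_EXECUTED);
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        long start = System.nanoTime();
        try {
            timer.schedule(() -> {
                // Only records the timeout; it does not interrupt the caller's thread.
                status.compareAndSet(Status.NOT_EXECUTED, Status.TIMED_OUT);
            }, timeoutMillis, TimeUnit.MILLISECONDS);

            Thread.sleep(workMillis); // stand-in for the slow dependency call
            status.compareAndSet(Status.NOT_EXECUTED, Status.COMPLETED);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            timer.shutdownNow();
        }
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return new Result(status.get(), elapsedMillis);
    }

    public static void main(String[] args) {
        Result slow = call(400, 50);
        // The status is TIMED_OUT, but the elapsed time is still roughly the full work duration.
        System.out.println(slow.status + " after " + slow.elapsedMillis + " ms");

        Result fast = call(10, 2000);
        System.out.println(fast.status + " after " + fast.elapsedMillis + " ms");
    }
}
```

In the slow case the timeout is recorded early, which is the signal a circuit breaker can act on, while the calling thread is released only when the work itself finishes.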

Note: if a dependency is isolated with a semaphore and then becomes latent, the parent threads will remain blocked until the underlying network calls timeout. Semaphore rejection will start once the limit is hit but the threads filling the semaphore can not walk away.

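Semaphore isolation does not let the caller time out and walk away, so it blocks the server's request threads: when the execution inside the command becomes slow, the thread handling the current request is blocked. If the semaphore limit is large, say 100, then all 100 of those threads can be blocked. If several calls behave this way, the number of requests the client can handle drops.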

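Here is an ab (ApacheBench) test. Each request takes 3 seconds to complete by default, the client has no timeout, and the semaphore limit is 100: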

ab -n 200 -c 200 http://localhost:8990/coke/block
Percentage of the requests served within a certain time (ms)
  50%   3057
  66%   3238
  75%   3274
  80%   3321
  90%   3510
  95%   3516
  98%   3523
  99%   3524
 100%   3526 (longest request)

Expected: 100 requests pass and 100 are rejected.
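The expected split can be checked with a small local simulation. This is only a sketch under stated assumptions: it does not reproduce the HTTP test above, `SemaphoreRejectionSketch` and `simulate` are hypothetical names, and a latch stands in for the slow dependency so that every accepted request holds its permit until all 200 requests have tried to acquire one.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class SemaphoreRejectionSketch {
    // Returns {accepted, rejected} for the given number of concurrent requests and permits.
    static int[] simulate(int requests, int permits) {
        AtomicInteger inFlight = new AtomicInteger(0);
        AtomicInteger accepted = new AtomicInteger(0);
        AtomicInteger rejected = new AtomicInteger(0);
        CountDownLatch allTried = new CountDownLatch(requests);
        CountDownLatch finish = new CountDownLatch(1);

        Thread[] threads = new Thread[requests];
        for (int i = 0; i < requests; i++) {
            threads[i] = new Thread(() -> {
                // Same increment-then-check logic as tryAcquire().
                if (inFlight.incrementAndGet() > permits) {
                    inFlight.decrementAndGet();
                    rejected.incrementAndGet();
                    allTried.countDown();
                    return;
                }
                accepted.incrementAndGet();
                allTried.countDown();
                try {
                    finish.await(); // stand-in for the slow dependency: hold the permit
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.decrementAndGet(); // release()
                }
            });
            threads[i].start();
        }

        try {
            allTried.await();   // every request has either acquired a permit or been rejected
            finish.countDown(); // let the accepted requests complete
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("simulation interrupted", e);
        }
        return new int[] { accepted.get(), rejected.get() };
    }

    public static void main(String[] args) {
        int[] result = simulate(200, 100);
        System.out.println("accepted=" + result[0] + " rejected=" + result[1]);
        // prints "accepted=100 rejected=100"
    }
}
```

Because no permit is released until every request has tried, exactly `permits` requests are accepted and the rest are rejected immediately.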

Zuul and Hystrix timeout

As we know, Zuul by default wraps the Ribbon HTTP client in a HystrixCommand. The default Hystrix isolation pattern (ExecutionIsolationStrategy) for all routes is SEMAPHORE.

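Zuul uses semaphore isolation by default (Zuul mainly forwards requests, and each request is already handled on its own thread, so a second layer of thread isolation is unnecessary). The timeout is governed by the HttpClient timeout settings: only after the request times out and throws an exception is the corresponding circuit breaking and fallback triggered. If thread pool isolation is used instead, the timeout is computed as follows: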

(ribbon.ConnectTimeout + ribbon.ReadTimeout) * (ribbon.MaxAutoRetries + 1) * (ribbon.MaxAutoRetriesNextServer + 1)
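As a worked example of the formula above, the following sketch plugs in hypothetical values (ConnectTimeout 1000 ms, ReadTimeout 1000 ms, MaxAutoRetries 0, MaxAutoRetriesNextServer 1). These numbers are assumptions chosen for illustration, not defaults taken from this article, and `RibbonTimeoutSketch` is a made-up helper class.

```java
class RibbonTimeoutSketch {
    // (ConnectTimeout + ReadTimeout) * (MaxAutoRetries + 1) * (MaxAutoRetriesNextServer + 1)
    static long timeoutMillis(long connectTimeout, long readTimeout,
                              int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (connectTimeout + readTimeout)
                * (maxAutoRetries + 1)
                * (maxAutoRetriesNextServer + 1);
    }

    public static void main(String[] args) {
        // Hypothetical settings: 1000 ms connect, 1000 ms read, no same-server retry, one next-server retry.
        System.out.println(timeoutMillis(1000, 1000, 0, 1)); // prints 4000
    }
}
```

Each retry multiplies the worst-case duration of a single attempt, so the result grows quickly once both retry settings are non-zero.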

By default, Zuul uses the serviceId as the commandKey, and the default semaphore limit is 100.

Sentinel

If you trust the client and only want load shedding, you could use this approach (semaphore).

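When the service we call may become latent and the QPS is very high, a thread pool may have to be very large: at 2000 QPS with a 1-second response time, you would need about 2000 threads, which adds a lot of thread context-switching overhead. In that situation we can use alibaba/Sentinel for circuit breaking and degradation.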
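The thread count in that example follows from multiplying the request rate by the response time (Little's law). A minimal sketch of the arithmetic, with `PoolSizeSketch` and `requiredThreads` as hypothetical names:

```java
class PoolSizeSketch {
    // Concurrent requests in flight = arrival rate (requests/second) * time each one takes (seconds).
    static long requiredThreads(double qps, double responseTimeSeconds) {
        return (long) Math.ceil(qps * responseTimeSeconds);
    }

    public static void main(String[] args) {
        System.out.println(requiredThreads(2000, 1.0)); // prints 2000
    }
}
```

With thread isolation each in-flight request occupies one pool thread, which is why the pool has to be roughly this large.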

For details, see the circuit breaking documentation.

Comparison of Sentinel and Hystrix

Reposted from: https://my.oschina.net/tigerlene/blog/2222699