最近使用Spring Cloud Gateway
的时候,遇到了一个奇怪的问题:
线上有3个 API 网关实例,压力均衡,平稳运行3天后,突然有一个实例,CPU飚高,并且响应时间增加很多,从几十毫秒涨到了几分钟。
线上是 k8s 管理容器,立刻停掉了这个 pod,重建,恢复正常。
线上我们开启了 JFR
记录(可以参考我的另外系列文章:Java 监控 JFR),通过 JMC 查看下出问题的 JFR 记录。
首先我们来看 GC,我们的 GC 算法是 G1,主要通过 G1 Garbage Collection这个事件查看:
发现 GC 全部为 Young GC,且耗时比较正常,频率上也没有什么明显异常。
接下来来看,CPU 占用相关。直接看 Thread CPU Load 这个事件,看每个线程的 CPU 占用情况。发现reactor-http-epoll
线程池的线程,CPU 占用很高,加在一起,接近了 100%。
这些线程是 reactor-netty 处理业务的线程,观察其他实例,发现正常情况下,并不会有这么高的 CPU 负载。那么为啥会有这么高的负载呢?通过 Thread Dump 来看一下线程堆栈有何发现.
通过查看多个线程堆栈 dump,发现这些线程基本都处于 Runnable,并且执行的方法是原生方法,和StackWalker
相关,例如:
"reactor-http-epoll-2" #75 daemon prio=5 os_prio=0 cpu=25100145.64ms elapsed=306507.26s tid=0x0000556eddcbd000 nid=0x61 runnable [0x00007f8605443000]
java.lang.Thread.State: RUNNABLE
at java.lang.StackStreamFactory$AbstractStackWalker.callStackWalk(java.base@11.0.6/Native Method)
at java.lang.StackStreamFactory$AbstractStackWalker.beginStackWalk(java.base@11.0.6/StackStreamFactory.java:370)
at java.lang.StackStreamFactory$AbstractStackWalker.walk(java.base@11.0.6/StackStreamFactory.java:243)
at java.lang.StackWalker.walk(java.base@11.0.6/StackWalker.java:498)
at org.apache.logging.log4j.util.StackLocator.calcLocation(StackLocator.java:81)
at org.apache.logging.log4j.util.StackLocatorUtil.calcLocation(StackLocatorUtil.java:76)
at org.apache.logging.log4j.spi.AbstractLogger.getLocation(AbstractLogger.java:2201)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2144)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2127)
at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2020)
at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1891)
at org.apache.logging.log4j.spi.AbstractLogger.info(AbstractLogger.java:1436)
at com.xxx.apigateway.filter.AccessCheckFilter.filter(AccessCheckFilter.java:144)
at org.springframework.cloud.gateway.handler.FilteringWebHandler$GatewayFilterAdapter.filter(FilteringWebHandler.java:138)
at org.springframework.cloud.gateway.filter.OrderedGatewayFilter.filter(OrderedGatewayFilter.java:44)
at org.springframework.cloud.gateway.handler.FilteringWebHandler$DefaultGatewayFilterChain.lambda$filter$0(FilteringWebHandler.java:118)
at org.springframework.cloud.gateway.handler.FilteringWebHandler$DefaultGatewayFilterChain$$Lambda$1265/0x0000000800b83440.get(Unknown Source)
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:44)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55)
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55)
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55)
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55)
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55)
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55)
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55)
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55)
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
at reactor.core.publisher.Mono.subscribe(Mono.java:4105)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.drain(MonoIgnoreThen.java:172)
at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:56)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:150)
at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90)
at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:67)
at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90)
at reactor.core.publisher.MonoNext$NextSubscriber.onNext(MonoNext.java:76)
at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90)
at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.innerNext(FluxConcatMap.java:274)
at reactor.core.publisher.FluxConcatMap$ConcatMapInner.onNext(FluxConcatMap.java:851)
at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:114)
at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90)
at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:67)
at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1637)
at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:241)
at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2199)
at reactor.core.publisher.MonoFlatMap$FlatMapInner.onSubscribe(MonoFlatMap.java:230)
at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:54)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:150)
at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:114)
at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90)
at reactor.core.publisher.MonoNext$NextSubscriber.onNext(Mon