接口偶尔超时,竟又是JVM停顿的锅!

本文讲述了作者在遇到接口超时问题后,通过分析JVM safepoint日志和深入理解safepoint机制,利用async-profiler工具,最终发现问题是由于宿主机CPU负载过高导致的线程调度延迟。通过监控宿主机和容器的CPU使用情况,找到了问题的根本原因,并探讨了在容器环境中如何收集证据和避免此类问题。
摘要由CSDN通过智能技术生成

🚀 优质资源分享 🚀

学习路线指引(点击解锁) 知识定位 人群定位
🧡 Python实战微信订餐小程序 🧡 进阶级 本课程是python flask+微信小程序的完美结合,从项目搭建到腾讯云部署上线,打造一个全栈订餐系统。
💛Python量化交易实战💛 入门级 手把手带你打造一个易扩展、更安全、效率更高的量化交易系统

原创:扣钉日记(微信公众号ID:codelogs),欢迎分享,转载请保留出处。

简介

继上次我们JVM停顿十几秒的问题解决后,我们系统终于稳定了,再也不会无故重启了!
这是之前的文章:耗时几个月,终于找到了JVM停顿十几秒的原因

但有点奇怪的是,每隔一段时间,我们服务接口就会有一小波499超时,经过查看gc日志,又发现JVM停顿了好几秒!

查看safepoint日志

有了上次JVM停顿排查经验后,我马上就检查了gc日志与safepoint日志,发现如下日志:

$ cat gc-*.log | awk '/application threads were stopped/ && $(NF-6)>1'|tail
2022-05-08T16:40:53.886+0800: 78328.993: Total time for which application threads were stopped: 9.4917471 seconds, Stopping threads took: 9.3473059 seconds
2022-05-08T17:40:32.574+0800: 81907.681: Total time for which application threads were stopped: 3.9786219 seconds, Stopping threads took: 3.9038683 seconds
2022-05-08T17:41:00.063+0800: 81935.170: Total time for which application threads were stopped: 1.2607608 seconds, Stopping threads took: 1.1258499 seconds

$ cat safepoint.log | awk '/vmop/{title=$0;getline;if($(NF-2)+$(NF-4)>1000){print title;print $0}}'
         vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
78319.500: G1IncCollectionPause             [     428          0              2    ]      [     0  9347  9347     7   137    ]  0
         vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
81903.703: G1IncCollectionPause             [     428          0              4    ]      [     0  3903  3903    14    60    ]  0
         vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
81933.906: G1IncCollectionPause             [     442          0              1    ]      [     0  1125  1125     8   126    ]  0

从日志上可以看到,JVM停顿也是由safepoint导致的,而safepoint耗时主要在block阶段!

通过添加JVM参数-XX:+SafepointTimeout -XX:SafepointTimeoutDelay=1000后,可打印出哪些线程超过1000ms没有到达safepoint,如下:
image_2022-06-11_20220611122458
可以看到都是一些http或grpc的worker线程没走到safepoint,但为啥没到达safepoint,看不出关键&#

评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值