Original article by 扣钉日记 (WeChat official account ID: codelogs). Sharing is welcome; please retain the attribution when reposting.
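Introduction
After we resolved the earlier problem of the JVM pausing for more than ten seconds, our system finally became stable and no longer restarted for no apparent reason!
The previous article: "It took several months, but we finally found the reason the JVM paused for more than ten seconds"
Strangely, though, every so often our service endpoints saw a small burst of 499 timeouts. The gc log showed that the JVM had once again paused for several seconds!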
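Checking the safepoint log
With the experience from the previous JVM pause investigation, I immediately checked the gc log and the safepoint log and found the following entries: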
$ cat gc-*.log | awk '/application threads were stopped/ && $(NF-6)>1'|tail
2022-05-08T16:40:53.886+0800: 78328.993: Total time for which application threads were stopped: 9.4917471 seconds, Stopping threads took: 9.3473059 seconds
2022-05-08T17:40:32.574+0800: 81907.681: Total time for which application threads were stopped: 3.9786219 seconds, Stopping threads took: 3.9038683 seconds
2022-05-08T17:41:00.063+0800: 81935.170: Total time for which application threads were stopped: 1.2607608 seconds, Stopping threads took: 1.1258499 seconds
$ cat safepoint.log | awk '/vmop/{title=$0;getline;if($(NF-2)+$(NF-4)>1000){print title;print $0}}'
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
78319.500: G1IncCollectionPause [ 428 0 2 ] [ 0 9347 9347 7 137 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
81903.703: G1IncCollectionPause [ 428 0 4 ] [ 0 3903 3903 14 60 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
81933.906: G1IncCollectionPause [ 442 0 1 ] [ 0 1125 1125 8 126 ] 0
The logs show that this JVM pause was also caused by a safepoint, and that the safepoint time was spent mainly in the block phase!
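To make the reading of the safepoint statistics lines above more concrete, here is a minimal Python sketch that parses one data line and reports which phase took the longest. It assumes the column layout shown in the header line of the log (`spin block sync cleanup vmop`, in milliseconds). The names `SafepointStat`, `parse_line`, and `dominant_phase` are hypothetical helpers introduced for illustration and are not part of any JVM tooling. The `sync` column is left out of the comparison because, in the samples above, it equals the sum of `spin` and `block`.

```python
import re
from typing import NamedTuple, Optional


class SafepointStat(NamedTuple):
    """One parsed safepoint statistics line (all times in milliseconds)."""
    timestamp: float
    vmop: str
    spin: int
    block: int
    sync: int
    cleanup: int
    vmop_ms: int


# Assumed layout, taken from the log excerpt in this article:
# "78319.500: G1IncCollectionPause [ 428 0 2 ] [ 0 9347 9347 7 137 ] 0"
LINE_RE = re.compile(
    r"^\s*(?P<ts>\d+\.\d+):\s+(?P<vmop>.+?)\s+"
    r"\[\s*\d+\s+\d+\s+\d+\s*\]\s+"
    r"\[\s*(?P<spin>\d+)\s+(?P<block>\d+)\s+(?P<sync>\d+)"
    r"\s+(?P<cleanup>\d+)\s+(?P<op>\d+)\s*\]"
)


def parse_line(line: str) -> Optional[SafepointStat]:
    """Return the parsed statistics, or None for header or unrelated lines."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return SafepointStat(
        timestamp=float(m.group("ts")),
        vmop=m.group("vmop"),
        spin=int(m.group("spin")),
        block=int(m.group("block")),
        sync=int(m.group("sync")),
        cleanup=int(m.group("cleanup")),
        vmop_ms=int(m.group("op")),
    )


def dominant_phase(stat: SafepointStat) -> str:
    """Name the phase with the largest time; sync is skipped on purpose."""
    phases = {
        "spin": stat.spin,
        "block": stat.block,
        "cleanup": stat.cleanup,
        "vmop": stat.vmop_ms,
    }
    return max(phases, key=phases.get)


if __name__ == "__main__":
    samples = [
        "78319.500: G1IncCollectionPause [ 428 0 2 ] [ 0 9347 9347 7 137 ] 0",
        "81903.703: G1IncCollectionPause [ 428 0 4 ] [ 0 3903 3903 14 60 ] 0",
        "81933.906: G1IncCollectionPause [ 442 0 1 ] [ 0 1125 1125 8 126 ] 0",
    ]
    for raw in samples:
        stat = parse_line(raw)
        if stat is not None:
            print(stat.timestamp, stat.vmop, dominant_phase(stat))
```

For all three sample lines the largest value is in the `block` column, which matches the conclusion drawn from the logs: the time goes into waiting for threads to stop, not into the garbage collection operation itself.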
After adding the JVM options -XX:+SafepointTimeout -XX:SafepointTimeoutDelay=1000, the JVM prints the threads that failed to reach the safepoint within 1000 ms, as follows:
The output shows that the threads that did not reach the safepoint were all http or grpc worker threads, but it does not reveal why they failed to reach it.