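A Case Study in Troubleshooting an OOM
1. Discovering the problem
1.1 Alert
Calls from fund_route_center to fund_operation were timing out in large numbers, which suggested that the fund_operation service might be down.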
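1.2 Locating the affected IPs in mesh monitoring
The second step is to check the machines in mesh monitoring and identify which ones are in trouble.
The monitoring view shows that 10.0.121.209 and 10.0.121.234 are down.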
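1.3 Searching the logs to confirm an OOM
At this point an OOM is still unconfirmed; thread blocking and many other problems could produce the same symptoms.
Searching the logs directly for Java heap space does turn up OOM entries. (I once ran into a blocked dubbo thread pool; that kind of problem can only be diagnosed by dumping the thread stacks.)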
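The log search above can be sketched in a few lines of Java. This is only an illustration: the class name OomLogScanner and the sample log lines are hypothetical, and in practice the search would run on the log platform or with grep. The one fixed fact it relies on is that the JVM reports heap exhaustion as java.lang.OutOfMemoryError with the message "Java heap space".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OomLogScanner {
    // Message text the JVM uses for a heap-exhaustion OutOfMemoryError.
    static final String HEAP_OOM_MARKER = "Java heap space";

    /** Returns the log lines that contain the heap OOM marker, in their original order. */
    static List<String> findHeapOomLines(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            if (line.contains(HEAP_OOM_MARKER)) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines; real log lines will look different.
        List<String> sample = Arrays.asList(
                "INFO job started",
                "java.lang.OutOfMemoryError: Java heap space",
                "WARN request timed out");
        findHeapOomLines(sample).forEach(System.out::println);
    }
}
```

An empty result does not rule out other failure modes such as blocked threads; it only says that no heap OOM was logged.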
2. Downloading the dump file
2.1 Checking the JVM parameters
Not every service leaves a dump file after an OOM. That requires the following JVM startup parameters:
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=../log/
You can check directly on the PASS platform whether they are already configured.
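If the platform page is not available, the same two options can also be read from inside a running JVM. The sketch below assumes a HotSpot-based JVM, where com.sun.management.HotSpotDiagnosticMXBean is available; the class name HeapDumpFlagCheck is hypothetical, and the code would have to run inside the service process (or be adapted to a remote JMX connection) to report that service's settings.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class HeapDumpFlagCheck {
    /** Reads the current value of a HotSpot VM option, e.g. "HeapDumpOnOutOfMemoryError". */
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        // "true" means a heap dump is written when an OutOfMemoryError is thrown.
        System.out.println("HeapDumpOnOutOfMemoryError = " + vmOption("HeapDumpOnOutOfMemoryError"));
        // Directory or file the dump is written to; may be blank when not set.
        System.out.println("HeapDumpPath = " + vmOption("HeapDumpPath"));
    }
}
```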
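2.2 Downloading the dump file
Fortunately, credit.fund_operation.center is started with both parameters. A service without them produces no dump file, so this approach cannot go any further; fall back to analyzing the xlog logs instead.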
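2.2.1 Official guide: downloading a dump file from a container
Follow the official link; it explains how to download a file from a container to your local machine, where the dump file is easier to analyze.
2.2.2 My own guide: downloading a dump file from a container
If anything in the official guide is unclear, follow the steps below.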
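- Request jump-server access to 10.0.121.209
- Log in to 10.0.121.209 with iTerm2 (the built-in macOS Terminal works too)
- Run the following command to enter the log directory:
cd /home/log/credit.fund_operation.center/
The directory holds many log files; the *.hprof files are the OOM dump files
- Run the following two commands in turn to upload the dump file to the remote store and get back a remote link: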
curl -s https://om.yingzhongtong.com/tools/ops-file-helper -o ops-file-helper && chmod a+x ops-file-helper
./ops-file-helper -f ${Your_File}
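Dump files are usually large, so expect to wait a while (using rz to download a dump file is not recommended, since it handles large files poorly)
- On success, copy the returned https://om.yingzhongtong.com/bksrv/tmp-swap/list link and open it in a browser
- On that page, search for host 10.0.121.209. The first record is the dump file just uploaded; click to request download permission, then ask the ops assistant to approve it
- Once approved, go to "My temporary files" and download it directly
The dump file from the container is now on your local machine.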
3. Downloading the JProfiler analysis tool
VisualVM is not recommended here: it is awkward to use and lacks features. Use JProfiler instead.
破解版链接: https://pan.baidu.com/s/1LwsGCC_ZZCNOndCZ4deAfw?pwd=fwa4 提取码: fwa4
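3.1 Installing JProfiler
1. Open jprofiler_macos_11_jb51.dmg and drag JProfiler into Applications. Leave this window open afterwards, because the registration code in it is still needed.
Click the "JProfiler11注册码" (registration code) item; a dialog pops up.
2. Open JProfiler from Applications. On the license activation screen, choose "Enter license key" and fill in the fields:
- Name: JProfiler
- Company: optional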
- License key: L-J11-Everyone#speedzodiac-327a9wrs5dxvz#463a59
JProfiler is now installed.
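4. Analyzing the dump file
4.1 Loading the dump file into JProfiler
Click Start Center, then Open a Single Snapshot, and select the dump file to analyze.
Once loaded, the snapshot offers the following views:
- Classes: shows all classes and their instances; right-click a class and choose "Use Selected Instances" to drill down further
- Allocations: shows the allocation tree and allocation hot spots for all recorded objects
- Biggest Objects: the objects in the heap, sorted by size in descending order
- References: shows the reference graph for a single object and the "show paths to GC root" feature; it can also present merged incoming and outgoing reference views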
4.2 Opening the Biggest Objects view
One object stands out immediately: a 454M java.util.ArrayList holding 3651811 elements. With a suspect this obvious, we can go straight to the stack that produced it.
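As a quick sanity check on those two numbers, dividing the size by the element count gives the average footprint per element. The sketch below assumes that "454M" means 454 MiB (the unit JProfiler actually displayed is not stated); under that assumption the result is roughly 130 bytes per element, which is plausible for a small row object with two numeric fields plus object overhead. The class name RetainedSizeEstimate is hypothetical.

```java
final class RetainedSizeEstimate {
    /** Average bytes per element, given a total size in bytes and an element count. */
    static double bytesPerElement(long totalBytes, long elementCount) {
        if (elementCount <= 0) {
            throw new IllegalArgumentException("elementCount must be positive");
        }
        return (double) totalBytes / elementCount;
    }

    public static void main(String[] args) {
        long totalBytes = 454L * 1024 * 1024; // assumption: "454M" means 454 MiB
        long elements = 3_651_811L;           // element count shown in Biggest Objects
        System.out.printf("~%.0f bytes per element%n", bytesPerElement(totalBytes, elements));
    }
}
```

The per-element figure is small, so the problem is the number of rows held in memory at once, not the size of any single row.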
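4.3 Getting the stack that produced the large object
Select the java.util.ArrayList object, right-click, and choose Use Selected Objects; a dialog appears.
Select References, then Incoming references, and click OK; a new page opens.
Expand java.util.ArrayList, scroll to the far right, and click show more.
Details for Selected Element shows the stack trace.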
4.4 From the stack trace to the line of code
Copy the stack trace into a local text file so that it is easy to search.
java stack of Saturn-SendEmailJob-21-thread-1
at com.mysql.cj.protocol.a.TextResultsetReader.read(int, boolean, com.mysql.cj.protocol.a.NativePacketPayload, com.mysql.cj.protocol.ColumnDefinition, com.mysql.cj.protocol.ProtocolEntityFactory) (line: 87)
at com.mysql.cj.protocol.a.TextResultsetReader.read(int, boolean, com.mysql.cj.protocol.Message, com.mysql.cj.protocol.ColumnDefinition, com.mysql.cj.protocol.ProtocolEntityFactory) (line: 48)
at com.mysql.cj.protocol.a.NativeProtocol.read(java.lang.Class, int, boolean, com.mysql.cj.protocol.a.NativePacketPayload, boolean, com.mysql.cj.protocol.ColumnDefinition, com.mysql.cj.protocol.ProtocolEntityFactory) (line: 1691)
at com.mysql.cj.protocol.a.NativeProtocol.readAllResults(int, boolean, com.mysql.cj.protocol.a.NativePacketPayload, boolean, com.mysql.cj.protocol.ColumnDefinition, com.mysql.cj.protocol.ProtocolEntityFactory) (line: 1745)
at com.mysql.cj.protocol.a.NativeProtocol.sendQueryPacket(com.mysql.cj.Query, com.mysql.cj.protocol.a.NativePacketPayload, int, boolean, java.lang.String, com.mysql.cj.protocol.ColumnDefinition, com.mysql.cj.protocol.Protocol$GetProfilerEventHandlerInstanceFunction, com.mysql.cj.protocol.ProtocolEntityFactory) (line: 1034)
at com.mysql.cj.NativeSession.execSQL(com.mysql.cj.Query, java.lang.String, int, com.mysql.cj.protocol.a.NativePacketPayload, boolean, com.mysql.cj.protocol.ProtocolEntityFactory, java.lang.String, com.mysql.cj.protocol.ColumnDefinition, boolean) (line: 1153)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(int, com.mysql.cj.protocol.Message, boolean, boolean, com.mysql.cj.protocol.ColumnDefinition, boolean) (line: 951)
at com.mysql.cj.jdbc.ClientPreparedStatement.execute$original$xWKNNgqg() (line: 391)
at com.mysql.cj.jdbc.ClientPreparedStatement.execute$original$xWKNNgqg$accessor$w2RhkjP4()
at com.mysql.cj.jdbc.ClientPreparedStatement$auxiliary$FBOgMAum.call()
at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(java.lang.Object, java.lang.Object[ ], java.util.concurrent.Callable, java.lang.reflect.Method) (line: 86)
at com.mysql.cj.jdbc.ClientPreparedStatement.execute()
at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(com.alibaba.druid.proxy.jdbc.PreparedStatementProxy) (line: 3461)
at com.alibaba.druid.wall.WallFilter.preparedStatement_execute(com.alibaba.druid.filter.FilterChain, com.alibaba.druid.proxy.jdbc.PreparedStatementProxy) (line: 660)
at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(com.alibaba.druid.proxy.jdbc.PreparedStatementProxy) (line: 3459)
at com.alibaba.druid.filter.FilterEventAdapter.preparedStatement_execute(com.alibaba.druid.filter.FilterChain, com.alibaba.druid.proxy.jdbc.PreparedStatementProxy) (line: 440)
at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(com.alibaba.druid.proxy.jdbc.PreparedStatementProxy) (line: 3459)
at com.alibaba.druid.proxy.jdbc.PreparedStatementProxyImpl.execute() (line: 167)
at com.alibaba.druid.pool.DruidPooledPreparedStatement.execute() (line: 497)
at sun.reflect.GeneratedMethodAccessor156.invoke(java.lang.Object, java.lang.Object[ ])
at sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[ ]) (line: 43)
at java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[ ]) (line: 498)
at org.apache.ibatis.logging.jdbc.PreparedStatementLogger.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[ ]) (line: 59)
at com.sun.proxy.$Proxy335.execute()
at org.apache.ibatis.executor.statement.PreparedStatementHandler.query(java.sql.Statement, org.apache.ibatis.session.ResultHandler) (line: 64)
at org.apache.ibatis.executor.statement.RoutingStatementHandler.query(java.sql.Statement, org.apache.ibatis.session.ResultHandler) (line: 79)
at sun.reflect.GeneratedMethodAccessor155.invoke(java.lang.Object, java.lang.Object[ ])
at sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[ ]) (line: 43)
at java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[ ]) (line: 498)
at org.apache.ibatis.plugin.Plugin.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[ ]) (line: 63)
at com.sun.proxy.$Proxy333.query(java.sql.Statement, org.apache.ibatis.session.ResultHandler)
at com.baomidou.mybatisplus.core.executor.MybatisSimpleExecutor.doQuery(org.apache.ibatis.mapping.MappedStatement, java.lang.Object, org.apache.ibatis.session.RowBounds, org.apache.ibatis.session.ResultHandler, org.apache.ibatis.mapping.BoundSql) (line: 67)
at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(org.apache.ibatis.mapping.MappedStatement, java.lang.Object, org.apache.ibatis.session.RowBounds, org.apache.ibatis.session.ResultHandler, org.apache.ibatis.cache.CacheKey, org.apache.ibatis.mapping.BoundSql) (line: 324)
at org.apache.ibatis.executor.BaseExecutor.query(org.apache.ibatis.mapping.MappedStatement, java.lang.Object, org.apache.ibatis.session.RowBounds, org.apache.ibatis.session.ResultHandler, org.apache.ibatis.cache.CacheKey, org.apache.ibatis.mapping.BoundSql) (line: 156)
at com.baomidou.mybatisplus.core.executor.MybatisCachingExecutor.query(org.apache.ibatis.mapping.MappedStatement, java.lang.Object, org.apache.ibatis.session.RowBounds, org.apache.ibatis.session.ResultHandler, org.apache.ibatis.cache.CacheKey, org.apache.ibatis.mapping.BoundSql) (line: 155)
at com.baomidou.mybatisplus.core.executor.MybatisCachingExecutor.query(org.apache.ibatis.mapping.MappedStatement, java.lang.Object, org.apache.ibatis.session.RowBounds, org.apache.ibatis.session.ResultHandler) (line: 90)
at sun.reflect.GeneratedMethodAccessor134.invoke(java.lang.Object, java.lang.Object[ ])
at sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[ ]) (line: 43)
at java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[ ]) (line: 498)
at org.apache.ibatis.plugin.Plugin.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[ ]) (line: 63)
at com.sun.proxy.$Proxy332.query(org.apache.ibatis.mapping.MappedStatement, java.lang.Object, org.apache.ibatis.session.RowBounds, org.apache.ibatis.session.ResultHandler)
at sun.reflect.GeneratedMethodAccessor134.invoke(java.lang.Object, java.lang.Object[ ])
at sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[ ]) (line: 43)
at java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[ ]) (line: 498)
at org.apache.ibatis.plugin.Invocation.proceed() (line: 49)
at com.xiaoying.fundoperation.common.logsql.LogSqlHelper.intercept(org.apache.ibatis.plugin.Invocation, int, boolean) (line: 43)
at com.xiaoying.fundoperation.common.logsql.LogQueryAndUpdateSqlHandler.intercept(org.apache.ibatis.plugin.Invocation) (line: 42)
at org.apache.ibatis.plugin.Plugin.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[ ]) (line: 61)
at com.sun.proxy.$Proxy332.query(org.apache.ibatis.mapping.MappedStatement, java.lang.Object, org.apache.ibatis.session.RowBounds, org.apache.ibatis.session.ResultHandler)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(java.lang.String, java.lang.Object, org.apache.ibatis.session.RowBounds) (line: 147)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(java.lang.String, java.lang.Object) (line: 140)
at sun.reflect.GeneratedMethodAccessor139.invoke(java.lang.Object, java.lang.Object[ ])
at sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[ ]) (line: 43)
at java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[ ]) (line: 498)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[ ]) (line: 426)
at com.sun.proxy.$Proxy110.selectList(java.lang.String, java.lang.Object)
at org.mybatis.spring.SqlSessionTemplate.selectList(java.lang.String, java.lang.Object) (line: 223)
at com.baomidou.mybatisplus.core.override.MybatisMapperMethod.executeForMany(org.apache.ibatis.session.SqlSession, java.lang.Object[ ]) (line: 177)
at com.baomidou.mybatisplus.core.override.MybatisMapperMethod.execute(org.apache.ibatis.session.SqlSession, java.lang.Object[ ]) (line: 78)
at com.baomidou.mybatisplus.core.override.MybatisMapperProxy.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[ ]) (line: 96)
at com.sun.proxy.$Proxy141.selectList(com.baomidou.mybatisplus.core.conditions.Wrapper)
at sun.reflect.GeneratedMethodAccessor138.invoke(java.lang.Object, java.lang.Object[ ])
at sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[ ]) (line: 43)
at java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[ ]) (line: 498)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(java.lang.Object, java.lang.reflect.Method, java.lang.Object[ ]) (line: 343)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint() (line: 198)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed() (line: 163)
at com.baomidou.dynamic.datasource.aop.DynamicDataSourceAnnotationInterceptor.invoke(org.aopalliance.intercept.MethodInvocation) (line: 52)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed() (line: 186)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[ ]) (line: 212)
at com.sun.proxy.$Proxy142.selectList(com.baomidou.mybatisplus.core.conditions.Wrapper)
at com.xiaoying.fundoperation.repository.repository.fundsroute.impl.WaitlistCombinationRepositoryImpl.getCombinationList(java.lang.Long, java.lang.Long, java.util.List, java.util.List, java.util.List) (line: 88)
at com.xiaoying.fundoperation.repository.repository.fundsroute.impl.WaitlistCombinationRepositoryImpl$$FastClassBySpringCGLIB$$594862a6.invoke(int, java.lang.Object, java.lang.Object[ ])
at org.springframework.cglib.proxy.MethodProxy.invoke(java.lang.Object, java.lang.Object[ ]) (line: 218)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint() (line: 749)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed() (line: 163)
at com.baomidou.dynamic.datasource.aop.DynamicDataSourceAnnotationInterceptor.invoke(org.aopalliance.intercept.MethodInvocation) (line: 52)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed() (line: 186)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(java.lang.Object, java.lang.reflect.Method, java.lang.Object[ ], org.springframework.cglib.proxy.MethodProxy) (line: 688)
at com.xiaoying.fundoperation.repository.repository.fundsroute.impl.WaitlistCombinationRepositoryImpl$$EnhancerBySpringCGLIB$$d0399a0f.getCombinationList(java.lang.Long, java.lang.Long, java.util.List, java.util.List, java.util.List)
at com.xiaoying.fundoperation.service.biz.impl.FundOperationServiceImpl.getFundOperationData(java.util.List, java.util.List, com.xiaoying.fundoperation.service.api.dto.quotaview.QuotaViewProjectFieldConfig, com.xiaoying.fundoperation.service.api.dto.quotaview.QuotaViewProductFieldConfig) (line: 217)
at com.xiaoying.fundoperation.center.task.SendEmailJob.handleJavaJob(java.lang.String, java.lang.Integer, java.lang.String, com.vip.saturn.job.SaturnJobExecutionContext) (line: 59)
at sun.reflect.NativeMethodAccessorImpl.invoke0(java.lang.reflect.Method, java.lang.Object, java.lang.Object[ ])
at sun.reflect.NativeMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[ ]) (line: 62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[ ]) (line: 43)
at java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[ ]) (line: 498)
at com.vip.saturn.job.java.SaturnJavaJob$1.internalCall(java.lang.ClassLoader, java.lang.Class) (line: 236)
at com.vip.saturn.job.basic.AbstractSaturnJob$JobBusinessClassMethodCaller.call(java.lang.Object, com.vip.saturn.job.executor.SaturnExecutorService) (line: 205)
at com.vip.saturn.job.java.SaturnJavaJob.handleJavaJob(java.lang.String, java.lang.Integer, java.lang.String, com.vip.saturn.job.basic.SaturnExecutionContext, com.vip.saturn.job.basic.JavaShardingItemCallable) (line: 239)
at com.vip.saturn.job.java.SaturnJavaJob.doExecution(java.lang.String, java.lang.Integer, java.lang.String, com.vip.saturn.job.basic.SaturnExecutionContext, com.vip.saturn.job.basic.JavaShardingItemCallable) (line: 215)
at com.vip.saturn.job.basic.JavaShardingItemCallable.call() (line: 158)
at com.vip.saturn.job.basic.ShardingItemFutureTask.call() (line: 88)
at com.vip.saturn.job.basic.ShardingItemFutureTask.call() (line: 17)
at java.util.concurrent.FutureTask.run() (line: 266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) (line: 1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run() (line: 624)
at java.lang.Thread.run() (line: 748)
With this many frames, where do we look?
Searching for com.xiaoying finds the frames that belong to our own code. The most important one is the following:
at com.xiaoying.fundoperation.service.biz.impl.FundOperationServiceImpl.getFundOperationData(java.util.List, java.util.List, com.xiaoying.fundoperation.service.api.dto.quotaview.QuotaViewProjectFieldConfig, com.xiaoying.fundoperation.service.api.dto.quotaview.QuotaViewProductFieldConfig) (line: 217)
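The filtering step above can be sketched as follows. A text editor search does the same job; the code only makes the idea explicit. The class name StackFrameFilter is hypothetical, and the sample frames are abbreviated from the stack above with their parameter lists replaced by "...".

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class StackFrameFilter {
    /** Keeps only the frames whose class name starts with the given package prefix. */
    static List<String> framesInPackage(List<String> stackLines, String packagePrefix) {
        String needle = "at " + packagePrefix;
        return stackLines.stream()
                .map(String::trim)
                .filter(line -> line.startsWith(needle))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Frames abbreviated from the stack above (parameter lists omitted).
        List<String> stack = Arrays.asList(
                "at com.mysql.cj.jdbc.ClientPreparedStatement.execute()",
                "at com.xiaoying.fundoperation.service.biz.impl.FundOperationServiceImpl.getFundOperationData(...) (line: 217)",
                "at com.xiaoying.fundoperation.center.task.SendEmailJob.handleJavaJob(...) (line: 59)",
                "at java.lang.Thread.run() (line: 748)");
        framesInPackage(stack, "com.xiaoying.").forEach(System.out::println);
    }
}
```

Frames are listed innermost first, so the first match is the application code closest to the database call and the last match is the job entry point.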
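Looking up com.xiaoying.fundoperation.service.biz.impl.FundOperationServiceImpl#getFundOperationData from that frame leads to line 217. The code there queries the t_waitlist_combination table and returns a List, which is the large object found earlier with 3651811 elements. Preliminary conclusion: a slow SQL statement that fetches a massive amount of data caused the OOM.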
4.5 From the line of code to the slow SQL
On the slow-SQL platform, narrow the search to the time window of the OOM to identify the likely statement:
SELECT
FuiMatchRequestId AS matchRequestId,FuiFundItemId AS fundItemId
FROM t_waitlist_combination
WHERE (FuiCreateTime BETWEEN 0 AND 1661270399 AND FuiFundItemId IN (5,7,18,25,27,33,34,38,44,46,50,58,82,85,99,100,104,105,106,109,111,114,116,117,120,121,123,125,127,128,130,131,132,133,135,137,138,139,140,141,142,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,186,187,188,189,190,192,193,194,195,196,197,198,199,200,201,202,203,204,205,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,229,230,231,232,233,235,236,237,238,239,240,241,242,243,244,245,247,248,249,250,252,254,255,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278) AND FuiStatus = 4);
Run select count(1) with the same conditions to see how many records this SQL returns.
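The BETWEEN bounds in the query above are worth decoding before counting. Assuming FuiCreateTime stores Unix epoch seconds (the column type is not shown here) and that the service runs in UTC+8, the sketch below converts both bounds to readable timestamps. The class name EpochRange and the zone choice are assumptions made for illustration.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class EpochRange {
    /** Converts Unix epoch seconds to a date-time in the given zone. */
    static ZonedDateTime toZoned(long epochSeconds, String zoneId) {
        return Instant.ofEpochSecond(epochSeconds).atZone(ZoneId.of(zoneId));
    }

    public static void main(String[] args) {
        String zone = "Asia/Shanghai"; // assumption: the service runs in UTC+8
        System.out.println("from: " + toZoned(0L, zone));          // lower bound in the SQL
        System.out.println("to:   " + toZoned(1661270399L, zone)); // upper bound in the SQL
    }
}
```

Under those assumptions the upper bound is 2022-08-23 23:59:59 and the lower bound is the Unix epoch itself, so the time condition excludes nothing older than the end date. That would be consistent with a result set of millions of rows.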