Troubleshooting a Memory Spike in .NET Core

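0. Problem

After the new version went live, memory usage shot up and inbound traffic surged with no obvious cause. Some endpoints started returning OOM exceptions, and shortly afterwards the Pods crashed and restarted in an endless loop.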

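1. Preparation

The Pods were already hooked up to NewRelic and Graylog, but neither revealed the real culprit, so the only option left was to go into the Pod's container and capture a memory dump. Our base image is built on Alpine 3.18. After entering the container, I ran the following commands to install the required tools.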

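# Our image is based on the runtime image, so the SDK has to be installed manually for the later steps.
# bash is installed as well for interactive use, because the built-in sh is awkward to work with.
apk add dotnet6-sdk bash
# Install the dump tool.
# Global tools are placed in ~/.dotnet/tools; add that directory to PATH if the command is not found afterwards.
dotnet tool install --global dotnet-dump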

Because the container's ENTRYPOINT runs the .NET program directly, its PID is usually 1. If you are not sure of the process ID, look it up before collecting the dump.
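One way to look up the PID is the tool's own `ps` subcommand. The sketch below wraps it in a hypothetical helper, `list_dotnet_pids`, which is not part of the original write-up. When `dotnet-dump` is not on PATH it falls back to scanning /proc, which assumes the app is started through the `dotnet` host so that the process name is literally "dotnet". A self-contained executable would show up under its own name instead.

```shell
# List candidate .NET process IDs inside the container.
list_dotnet_pids() {
    if command -v dotnet-dump >/dev/null 2>&1; then
        # `dotnet-dump ps` lists the .NET processes a dump can be collected from.
        dotnet-dump ps
    else
        # Fallback (assumption): the process name in /proc/<pid>/comm is "dotnet".
        for dir in /proc/[0-9]*; do
            if [ "$(cat "$dir/comm" 2>/dev/null)" = "dotnet" ]; then
                echo "${dir#/proc/}"
            fi
        done
    fi
    return 0
}

list_dotnet_pids
```

Whatever PID this prints is the value to pass to `-p` in the next step.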

I tried running dotnet-dump collect -p 1 to collect the dump, but got the following error:

/build# dotnet-dump collect -p 1

Writing full to /build/core_20240307_090401
Write dump failed - HRESULT: 0x00000000.

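After some searching, it turned out that the Pod did not have sufficient privileges to perform the dump. I therefore edited the YAML definition of the Rollout (or Deployment) and added the following container-level securityContext, after which the dump file could be collected correctly.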

securityContext:
  capabilities:
    add:
    - SYS_PTRACE
    - SYS_ADMIN
  seccompProfile:
    type: RuntimeDefault
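To confirm that the added capabilities actually reached the container after the rollout, one option is to decode the effective capability mask from /proc. This is a minimal sketch rather than part of the original procedure. It assumes the app is PID 1 as described earlier, the helper name `has_cap` is hypothetical, and the bit numbers (19 for CAP_SYS_PTRACE, 21 for CAP_SYS_ADMIN) are taken from the Linux capability header.

```shell
# Return success (0) if bit number $2 is set in the hex capability mask $1.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Effective capability mask of PID 1, e.g. 00000000a80425fb.
# Left empty when /proc/1/status cannot be read.
cap_eff=$(awk '/^CapEff:/ {print $2}' /proc/1/status 2>/dev/null || true)

if [ -n "$cap_eff" ]; then
    # Bit numbers from <linux/capability.h>: CAP_SYS_PTRACE = 19, CAP_SYS_ADMIN = 21.
    if has_cap "$cap_eff" 19; then echo "CAP_SYS_PTRACE: present"; else echo "CAP_SYS_PTRACE: missing"; fi
    if has_cap "$cap_eff" 21; then echo "CAP_SYS_ADMIN: present"; else echo "CAP_SYS_ADMIN: missing"; fi
fi
```

If the process runs as a non-root user, the effective set can be empty even though the capability was granted, so a "missing" result there is not conclusive on its own.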

Running dotnet-dump collect -p 1 again produced the dump file. I copied it to the mounted NFS volume and then downloaded it to my local machine for debugging.

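2. Investigation

Once you have the dump file, several tools can analyze it. Here I used the dotnet-dump command: my machine runs macOS, and dotnet-dump lets me start analyzing right away. You can also open the dump with Visual Studio, dotMemory, or WinDbg, depending on your preference.

Run dotnet-dump analyze <dump file path> to enter the interactive session: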

Loading core dump: D:\dotNET_Dumps\\core_20240307_142201 ...
Ready to process analysis commands. Type 'help' to list available commands or 'help [command]' to get detailed help on a command.
Type 'quit' or 'exit' to exit the session.

First, let's look at the current state of the GC heaps:

> eeheap -gc

========================================
Number of GC Heaps: 3
----------------------------------------
Heap 0 (00007faa2a73b6b0)
generation 0 starts at 7fa2495932e8
generation 1 starts at 7fa2458279f0
generation 2 starts at 7fa232703000
ephemeral segment allocation context: none
Small object heap
         segment            begin        allocated        committed allocated size         committed size
    7fa232702000     7fa232703000     7fa249be4020     7fa252174000 0x174e1020 (390991904) 0x1fa72000 (531046400)
Large object heap starts at 7fa3b2703000
         segment            begin        allocated        committed allocated size         committed size
    7fa3b2702000     7fa3b2703000     7fa3e3dfc348     7fa3e3dfd000 0x316f9348 (829395784) 0x316fb000 (829403136)
Pinned object heap starts at 7fa6b2703000
         segment            begin        allocated        committed allocated size         committed size
    7fa6b2702000     7fa6b2703000     7fa6b27d4bb8     7fa6b27d5000 0xd1bb8 (859064)       0xd3000 (864256)
------------------------------
Heap 1 (00007faa2a68b6e0)
generation 0 starts at 7fa2c75ae080
generation 1 starts at 7fa2c40eec00
generation 2 starts at 7fa2b2703000
ephemeral segment allocation context: none
Small object heap
         segment            begin        allocated        committed allocated size         committed size
    7fa2b2702000     7fa2b2703000     7fa2c9b1ebb0     7fa2d00b8000 0x1741bbb0 (390183856) 0x1d9b6000 (496721920)
Large object heap starts at 7fa4b2703000
         segment            begin        allocated        committed allocated size         committed size
    7fa4b2702000     7fa4b2703000     7fa4e3f804f0     7fa4e3f81000 0x3187d4f0 (830985456) 0x3187f000 (830992384)
Pinned object heap starts at 7fa7b2703000
         segment            begin        allocated        committed allocated size         committed size
    7fa7b2702000     7fa7b2703000     7fa7b2703018     7fa7b2704000 0x18 (24)              0x2000 (8192)
------------------------------
Heap 2 (00007faa2a5db720)
generation 0 starts at 7fa3466d0298
generation 1 starts at 7fa343173ee0
generation 2 starts at 7fa332703000
ephemeral segment allocation context: none
Small object heap
         segment            begin        allocated        committed allocated size         committed size
    7fa332702000     7fa332703000     7fa348631878     7fa34f736000 0x15f2e878 (368240760) 0x1d034000 (486752256)
Large object heap starts at 7fa5b2703000
         segment            begin        allocated        committed allocated size         committed size
    7fa5b2702000     7fa5b2703000     7fa5e519c3b0     7fa5e519d000 0x32a993b0 (849974192) 0x32a9b000 (849981440)
Pinned object heap starts at 7fa8b2703000
         segment            begin        allocated        committed allocated size         committed size
    7fa8b2702000     7fa8b2703000     7fa8b270c0f0     7fa8b2714000 0x90f0 (37104)         0x12000 (73728)
------------------------------
GC Allocated Heap Size:    Size: 0xda315cf0 (3660668144) bytes.
GC Committed Heap Size:    Size: 0xeff58000 (4025843712) bytes.
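As a quick sanity check on the output above, the allocated sizes of the three heaps can be summed per heap type. This is only a sketch in shell arithmetic; the numbers are copied from the listing and the variable names are illustrative.

```shell
# Allocated sizes in bytes, copied from the eeheap -gc output (heaps 0, 1, 2).
soh_total=$(( 390991904 + 390183856 + 368240760 ))   # small object heap
loh_total=$(( 829395784 + 830985456 + 849974192 ))   # large object heap
poh_total=$(( 859064 + 24 + 37104 ))                 # pinned object heap
heap_total=$(( soh_total + loh_total + poh_total ))

echo "SOH:   $soh_total bytes"
echo "LOH:   $loh_total bytes"
echo "POH:   $poh_total bytes"
echo "Total: $heap_total bytes"
# Integer percentage of the allocated heap that sits on the LOH.
echo "LOH share: $(( loh_total * 100 / heap_total ))%"
```

The total matches the reported GC Allocated Heap Size of 3660668144 bytes, and the LOH accounts for roughly two thirds of it.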

There are 3 GC heaps, and most of the memory sits on the LOH (large object heap). Let's use dumpheap -stat -min 85000 to see how many objects are at least 85,000 bytes in size:

> dumpheap -stat -min 85000
Statistics:
          MT Count     TotalSize Class Name
7fa9b9be29c0     1        85,112 Serilog.Events.LogEventPropertyValue[]
7fa9ba87d710     1       117,464 Microsoft.AspNetCore.Routing.Matching.DfaState[]
7fa9b327b110     2       261,648 System.Object[]
7fa9b3348080     2       849,380 System.Int32[]
7fa9bb1e29f8     5     1,441,912 ***.Core.***.*************[]
7fa9b334d2e0     6     1,939,370 System.String
7fa9bb3589a0     1     2,097,176 ***.Core.***.***.***[]
7fa9b5200528     9     2,228,440 ***.Core.***.***[]
7fa9b5206200    20     3,670,496 ***.Core.***.***[]
7fa9bb3625e8     1     4,506,048 System.Collections.Generic.Dictionary<System.String, ***.***.***.***.***>+Entry[]
7fa9b338edd0    20     9,716,748 System.Char[]
7faa2cb14350    76    13,295,160 Free
7fa9b3d60c98 1,100 2,464,160,840 System.Byte[]
Total 1,244 objects, 2,504,369,794 bytes
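To put the System.Byte[] row in perspective, the figures can be converted into more readable units. The sketch below uses integer shell arithmetic only, with numbers copied from the statistics above and illustrative variable names.

```shell
# Figures copied from the dumpheap -stat -min 85000 output.
byte_count=1100            # number of System.Byte[] instances
byte_total=2464160840      # their total size in bytes
all_total=2504369794       # total size of all listed objects in bytes

avg_size=$(( byte_total / byte_count ))            # average bytes per array
byte_mib=$(( byte_total / 1048576 ))               # total in MiB (rounded down)
byte_share=$(( byte_total * 100 / all_total ))     # integer percentage of the listed total

echo "Average System.Byte[] size: $avg_size bytes"
echo "System.Byte[] total: $byte_mib MiB"
echo "Share of all large objects: $byte_share%"
```

That works out to an average of a little over 2 MB per array, with byte arrays making up almost all of the large-object total.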

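The output shows 1,100 System.Byte[] objects larger than 85,000 bytes, adding up to about 2.46 GB (roughly 2.3 GiB), so this is where the problem lies. Next, I used dumpheap -type System.Byte[] to list these objects individually and get their addresses: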

    7fa5d5175480     7fa9b3d60c98     18,749,311
    7fa5d6356c20     7fa9b3d60c98      6,734,857
    7fa5d69c3050     7fa9b3d60c98        878,704
    7fa5d6a998e0     7fa9b3d60c98        174,565
    7fa5d6ad21c0     7fa9b3d60c98     18,749,311
    7fa5d7cb3960     7fa9b3d60c98      6,734,857
    7fa5d831fd90     7fa9b3d60c98     10,670,254
    7fa5d8d4ce60     7fa9b3d60c98     10,670,254
    7fa5d9779f30     7fa9b3d60c98     18,749,311
    7fa5da95b6d0     7fa9b3d60c98     18,749,311
    7fa5dbb3ce70     7fa9b3d60c98      1,931,776
    7fa5dbd6b8e0     7fa9b3d60c98      6,842,488
    7fa5dc3f2178     7fa9b3d60c98      7,773,830
    7fa5dcb5c020     7fa9b3d60c98      7,773,830
    7fa5dd2c5ec8     7fa9b3d60c98      7,773,830
    7fa5dda2fd70     7fa9b3d60c98     12,585,235
    7fa5de6306a8     7fa9b3d60c98      1,889,260
    7fa5de7fdab8     7fa9b3d60c98      1,172,106
    7fa5de91bd68     7fa9b3d60c98        134,508
    7fa5de94dff8     7fa9b3d60c98      8,857,584
    7fa5df1c0808     7fa9b3d60c98      6,842,488
    7fa5df8470a0     7fa9b3d60c98      6,842,488
    7fa5dfecd938     7fa9b3d60c98      6,842,488
    7fa5e05541d0     7fa9b3d60c98      8,857,584
    7fa5e0dc69e0     7fa9b3d60c98      7,773,449
    7fa5e1530710     7fa9b3d60c98      7,773,449
    7fa5e1c9a440     7fa9b3d60c98        980,321
    7fa5e1d899c8     7fa9b3d60c98      1,052,316
    7fa5e1e8a888     7fa9b3d60c98      1,052,316
    7fa5e1f8b748     7fa9b3d60c98      7,373,509
    7fa5e2693a30     7fa9b3d60c98      7