上一篇 我们讲了基本的剖析工具的用法,这一篇我们来介绍一种新的工具。
运行时加入GODEBUG=gctrace=1,可以在标准错误中输出gc信息,如图:
我们来解析一下一些主要的含义,下面是golang官方runtime的说明(具体参考https://studygolang.com/articles/9641):
gctrace: setting gctrace=1 causes the garbage collector to emit a single line to standard
error at each collection, summarizing the amount of memory collected and the
length of the pause. Setting gctrace=2 emits the same summary but also
repeats each collection. The format of this line is subject to change.
Currently, it is:
gc # @#s #%: #+#+# ms clock, #+#/#/#+# ms cpu, #->#-># MB, # MB goal, # P
where the fields are as follows:
gc # the GC number, incremented at each GC
@#s time in seconds since program start
#% percentage of time spent in GC since program start
#+...+# wall-clock/CPU times for the phases of the GC
#->#-># MB heap size at GC start, at GC end, and live heap
# MB goal goal heap size
# P number of processors used
The phases are stop-the-world (STW) sweep termination, concurrent
mark and scan, and STW mark termination. The CPU times
for mark/scan are broken down in to assist time (GC performed in
line with allocation), background GC time, and idle GC time.
If the line ends with "(forced)", this GC was forced by a
runtime.GC() call and all phases are STW.
Setting gctrace to any value > 0 also causes the garbage collector
to emit a summary when memory is released back to the system.
This process of returning memory to the system is called scavenging.
The format of this summary is subject to change.
Currently it is:
scvg#: # MB released printed only if non-zero
scvg#: inuse: # idle: # sys: # released: # consumed: # (MB)
where the fields are as follows:
scvg# the scavenge cycle number, incremented at each scavenge
inuse: # MB used or partially used spans
idle: # MB spans pending scavenging
sys: # MB mapped from the system
released: # MB released to the system
consumed: # MB allocated from the system
实例如下:
服务停止后内存并没有下降的原因分析:
首先是上一节的工具获取内存信息,curl http://10.10.74.7:8601/debug/pprof/heap?debug=1 >1.txt
每个条目开头的数字("1: 4653056 [1: 4653056]")分别表示目前存活的对象,存活对象占据的内存, 分配对象的个数和所有分配对象占据的内存总量。
之后便是这些内存都是在何处分配的具体堆栈。这次我们找到的是 handleNewConnection的 83行,代码如下:
一眼看上去并没有什么问题,但是问题就是在这里。原来ip 是string类型,我们有60w连接,基友60w个ip小对象,这一段引起大量的内存不能正常gc。将ip直接使用session.hostname修改该问题,以减少内存的创建。修改完成后,内存下降的比较正常。