long long ago,研发提交了一个middleware的测试,要求结合go tool pprof验证单节点和多节点的并发瓶颈,内存,CPU的消耗,我所掌握的性能测试仅限于接口,简单的jmeter,到强大的商业级工具loadrunner,对于golang自带的pprof非常陌生,查了很多很多资料,基本了解了一点go tool pprof的使用,在这里记录并分享下
基本原理:
在了解基本原理前,先了解几个概念:
1.CPU概要文件
在介绍CPU概要文件的生成方法之前,我们先来简单了解一下CPU主频。CPU的主频,即CPU内核工作的时钟频率(CPU ClockSpeed)。CPU的主频的基本单位是赫兹(Hz),但更多的是以兆赫兹(MHz)或吉赫兹(GHz)为单位。时钟频率的倒数即为时钟周期。时钟周期的基本单位为秒(s),但更多的是以毫秒(ms)、微妙(us)或纳秒(ns)为单位。在一个时钟周期内,CPU执行一条运算指令。也就是说,在1000Hz的CPU主频下,每1毫秒可以执行一条CPU运算指令。在1 MHz的CPU主频下,每1微妙可以执行一条CPU运算指令。而在1GHz的CPU主频下,每1纳秒可以执行一条CPU运算指令。在默认情况下,Go语言的运行时系统会以100Hz的的频率对CPU使用情况进行取样。也就是说每秒取样100次,即每10毫秒会取样一次。为什么使用这个频率呢?因为100Hz既足够产生有用的数据,又不至于让系统产生停顿。并且100这个数上也很容易做换算,比如把总取样计数换算为每秒的取样数。实际上,这里所说的对CPU使用情况的取样就是对当前的Goroutine的堆栈上的程序计数器的取样。由此,我们就可以从样本记录中分析出哪些代码是计算时间最长或者说最耗CPU资源的部分了。我们可以通过以下代码启动对CPU使用情况的记录。
- 内存概要文件
内存概要文件用于保存在用户程序执行期间的内存使用情况。这里所说的内存使用情况,其实就是程序运行过程中堆内存的分配情况。Go语言运行时系统会对用户程序运行期间的所有的堆内存分配进行记录。不论在取样的那一时刻、堆内存已用字节数是否有增长,只要有字节被分配且数量足够,分析器就会对其进行取样
- 程序阻塞概要文件
程序阻塞概要文件用于保存用户程序中的Goroutine阻塞事件的记录。
所以基本的原理就是golang的runtime会产生运行的数据,runtime会记录Goroutine占用内存和阻塞的取样文件,我们只需将这些数据写入到我们指定的文件即可,
代码实例:
import(
"runtime/pprof"
)
func main() {
cache.Init()
/* 单节点任务队列 sJob demo */
// 初始化工作池 goroutine 个数
// worker工作携程数量,capacity池子容量,超出部分阻塞
s := sJob.Init(200, 65535, 5)
//go sJobTest()
connPool := cache.GetPool()
funcMap := make(map[string]func(interface{}))
funcMap[mJob.FunctionName(UserFunction)] = UserFunction
funcMap[mJob.FunctionName(UserDBFunction)] = UserDBFunction
startCPUProfile() //mouniter cpu
startMemProfile() //mouniter memeery
startBlockProfile() //mouniter blockrountine
m, _ := mJob.Init(connPool, "SEARCH", 1000, 65535, funcMap)
defer stopBlockProfile()
defer stopMemProfile()
defer stopCPUProfile()
// go mJobTest()
gin.SetMode(gin.ReleaseMode)
router := gin.Default()
router.GET("/sjob/test", sJobTest)
router.GET("/mjob/test", mJobTest)
//router.Run(":9111")
go genServer(router, "9111")
//循环打印输出当前进程的Goroutine 个数
go func() {
for {
fmt.Printf("runtime goroutine number: %d, sjobs: %d, mjobs: %d\n", runtime.NumGoroutine(), len(s.JobQueue), len(m.JobQueue))
fmt.Println("###########################PUT mjob count: " + strconv.FormatInt(m.GetNum(), 10))
fmt.Println("###########################PUT sjob count: " + strconv.FormatInt(s.GetNum(), 10))
time.Sleep(3 * time.Second)
}
}()
destroy()
}
func genServer(router *gin.Engine, port string) {
//_ = router.Run(":" + port)
server := &http.Server{
Addr: ":" + port,
Handler: router,
ReadTimeout: 5 * time.Second,
WriteTimeout: 10 * time.Second,
}
fmt.Printf("listening port: %s\n", port)
_ = server.ListenAndServe()
}
func destroy() {
sigs := make(chan os.Signal, 1)
done := make(chan bool, 1)
signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
go func() {
<-sigs
done <- true
}()
<-done
fmt.Println("退出程序.... ")
db.DBClose()
time.Sleep(1 * time.Second)
}
var cpuProfile = flag.String("cpuProfile", "pprof/cpuProfile", "write cpu profile")
var memProfile = flag.String("memProfile", "pprof/memProfile", "write memery profile")
var memProfileRate = flag.Int("memProfileRate", 512*1024, "default mem")
var blockProfile = flag.String("blockProfile", "pprof/blockProfile", "write memery profile")
var blockProfileRate = flag.Int("blockProfileRate", 1, "default mem")
func startCPUProfile() {
flag.Parse()
//终端用 go tool pprof mjob.prof查看文件内容
if *cpuProfile != "" {
CPUf, err := os.Create(*cpuProfile)
if err != nil {
fmt.Fprintf(os.Stderr, "Can not create cpu profile output file: %s",
err)
}
if err := pprof.StartCPUProfile(CPUf); err != nil {
fmt.Fprintf(os.Stderr, "Can not start cpu profile: %s", err)
}
}
}
func stopCPUProfile() {
flag.Parse()
if *cpuProfile != "" {
pprof.StopCPUProfile() // 把记录的概要信息写到已指定的文件
}
}
func startMemProfile() {
flag.Parse()
if *memProfile != "" && *memProfileRate > 0 {
runtime.MemProfileRate = *memProfileRate
}
}
func stopMemProfile() {
flag.Parse()
if *memProfile != "" {
f, err := os.Create(*memProfile)
if err != nil {
fmt.Fprintf(os.Stderr, "Can not create mem profile output file: %s", err)
return
}
if err = pprof.WriteHeapProfile(f); err != nil {
fmt.Fprintf(os.Stderr, "Can not write %s: %s", *memProfile, err)
}
f.Close()
}
}
func startBlockProfile() {
flag.Parse()
if *blockProfile != "" && *blockProfileRate > 0 {
runtime.SetBlockProfileRate(*blockProfileRate)
}
}
func stopBlockProfile() {
flag.Parse()
if *blockProfile != "" && *blockProfileRate >= 0 {
f, err := os.Create(*blockProfile)
if err != nil {
fmt.Fprintf(os.Stderr, "Can not create block profile output file: %s", err)
return
}
if err = pprof.Lookup("block").WriteTo(f, 0); err != nil {
fmt.Fprintf(os.Stderr, "Can not write %s: %s", *blockProfile, err)
}
f.Close()
}
}
详细分析:
1.开始cpu取样:
func startCPUProfile() {
flag.Parse()
//终端用 go tool pprof mjob.prof查看文件内容
if *cpuProfile != "" {
CPUf, err := os.Create(*cpuProfile)
if err != nil {
fmt.Fprintf(os.Stderr, "Can not create cpu profile output file: %s",
err)
}
if err := pprof.StartCPUProfile(CPUf); err != nil {
fmt.Fprintf(os.Stderr, "Can not start cpu profile: %s", err)
}
}
}
2.结束cpu取样:
func stopCPUProfile() {
flag.Parse()
if *cpuProfile != "" {
pprof.StopCPUProfile() // 把记录的概要信息写到已指定的文件
}
}
3.开始MEM取样:
func startMemProfile() {
flag.Parse()
if *memProfile != "" && *memProfileRate > 0 {
runtime.MemProfileRate = *memProfileRate
}
}
//memProfile文件路径
//memProfileRate取样间隔
4.停止MEM取样:
func stopMemProfile() {
flag.Parse()
if *memProfile != "" {
f, err := os.Create(*memProfile)
if err != nil {
fmt.Fprintf(os.Stderr, "Can not create mem profile output file: %s", err)
return
}
if err = pprof.WriteHeapProfile(f); err != nil {
fmt.Fprintf(os.Stderr, "Can not write %s: %s", *memProfile, err)
}
f.Close()
}
}
//WriteHeapProfile()方法将heap的profil信息写到指定file
5.开始协程阻塞取样:
func startBlockProfile() {
flag.Parse()
if *blockProfile != "" && *blockProfileRate > 0 {
runtime.SetBlockProfileRate(*blockProfileRate)
}
}
6.结束协程阻塞取样
func stopBlockProfile() {
flag.Parse()
if *blockProfile != "" && *blockProfileRate >= 0 {
f, err := os.Create(*blockProfile)
if err != nil {
fmt.Fprintf(os.Stderr, "Can not create block profile output file: %s", err)
return
}
if err = pprof.Lookup("block").WriteTo(f, 0); err != nil {
fmt.Fprintf(os.Stderr, "Can not write %s: %s", *blockProfile, err)
}
f.Close()
}
}
7.指定三个文件路径和取样间隔:(这里需要用到falg解析字符串和int的方式,不然读不到)
var cpuProfile = flag.String("cpuProfile", "pprof/cpuProfile", "write cpu profile")
var memProfile = flag.String("memProfile", "pprof/memProfile", "write memery profile")
var memProfileRate = flag.Int("memProfileRate", 512*1024, "default mem")
var blockProfile = flag.String("blockProfile", "pprof/blockProfile", "write memery profile")
var blockProfileRate = flag.Int("blockProfileRate", 1, "default mem")
8.开始取样:(在你要监控的代码快前面)
startCPUProfile() //mouniter cpu
startMemProfile() //mouniter memeery
startBlockProfile() //mouniter blockrountine
9.结束取样:
stopBlockProfile()
stopMemProfile()
stopCPUProfile()
10.查看profile文件
cd /文件路径
go tool pprof cpuprofile
hp@hp-HP-ProDesk-600-G3-SFF:~/Mygo/src/jobs/pprof$ go tool pprof cpuProfile
Local symbolization failed for demo: open /tmp/go-build716442338/command-line-arguments/_obj/exe/demo: no such file or directory
Some binary filenames not available. Symbolization may be incomplete.
Try setting PPROF_BINARY_PATH to the search path for local binaries.
File: demo
Type: cpu
Time: Dec 31, 2020 at 3:51pm (CST)
Duration: 7.91s, Total samples = 2.36s (29.84%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 1650ms, 69.92% of 2360ms total
Dropped 72 nodes (cum <= 11.80ms)
Showing top 10 nodes out of 105
flat flat% sum% cum cum%
710ms 30.08% 30.08% 740ms 31.36% syscall.Syscall /usr/local/go/src/syscall/asm_linux_amd64.s
230ms 9.75% 39.83% 230ms 9.75% runtime.futex /usr/local/go/src/runtime/sys_linux_amd64.s
170ms 7.20% 47.03% 170ms 7.20% runtime.epollctl /usr/local/go/src/runtime/sys_linux_amd64.s
140ms 5.93% 52.97% 140ms 5.93% runtime.epollwait /usr/local/go/src/runtime/sys_linux_amd64.s
110ms 4.66% 57.63% 110ms 4.66% syscall.RawSyscall /usr/local/go/src/syscall/asm_linux_amd64.s
100ms 4.24% 61.86% 100ms 4.24% runtime.usleep /usr/local/go/src/runtime/sys_linux_amd64.s
80ms 3.39% 65.25% 180ms 7.63% syscall.Syscall6 /usr/local/go/src/syscall/asm_linux_amd64.s
40ms 1.69% 66.95% 40ms 1.69% runtime.heapBitsSetType /usr/local/go/src/runtime/mbitmap.go
40ms 1.69% 68.64% 40ms 1.69% runtime.nextFreeFast /usr/local/go/src/runtime/malloc.go
30ms 1.27% 69.92% 30ms 1.27% runtime._ExternalCode /usr/local/go/src/runtime/proc.go
flat: 是指该函数执行耗时, 程序总耗时 570ms, main.NewNode 的 200ms 占了 35.09%
sum: 当前函数与排在它上面的其他函数的 flat 占比总和, 比如 35.09% + 12.28% = 47.37%
cum: 是指该函数加上在该函数调用之前累计的总耗时, 这个看图片格式的话会更清晰一些.
go tool pprof memProfile
Type: inuse_space
Time: Sep 24, 2020 at 6:33pm (CST)
Showing nodes accounting for 28183.67kB, 100% of 28183.67kB total
(pprof) top10
Showing nodes accounting for 2804.22kB, 100% of 2804.22kB total
flat flat% sum% cum cum%
1184.27kB 42.23% 42.23% 1184.27kB 42.23% runtime/pprof.StartCPUProfile /usr/local/go/src/runtime/pprof/pprof.go
809.97kB 28.88% 71.12% 809.97kB 28.88% jobs/mJob.NewWorkerPool /home/hp/Mygo/src/jobs/mJob/worker.go (inline)
809.97kB 28.88% 100% 809.97kB 28.88% jobs/sJob.NewWorkerPool /home/hp/Mygo/src/jobs/sJob/worker.go (inline)
0 0% 100% 809.97kB 28.88% jobs/mJob.Init /home/hp/Mygo/src/jobs/mJob/worker.go
0 0% 100% 809.97kB 28.88% jobs/sJob.Init /home/hp/Mygo/src/jobs/sJob/worker.go
0 0% 100% 2804.22kB 100% main.main /home/hp/Mygo/src/jobs/demo.go
0 0% 100% 1184.27kB 42.23% main.startCPUProfile /home/hp/Mygo/src/jobs/demo.go
0 0% 100% 2804.22kB 100% runtime.main /usr/local/go/src/runtime/proc.go
还可以借助其他工具生成可视化的图,后续在深入研究,欢迎小伙伴积极讨论
参考:https://studygolang.com/articles/12100?fr=sidebar