mysql top消耗_perf-让CPU消耗无处遁形

最新推荐文章于 2023-05-01 11:32:06 发布

高冷小笼包

最新推荐文章于 2023-05-01 11:32:06 发布

阅读量229

点赞数

文章标签： mysql top消耗

本文链接：https://blog.csdn.net/weixin_42252435/article/details/113288068

版权

前言

前几天我在看一篇公众号文章文章写得很棒，分析地也很彻底，但是更吸引我的是文中的几张图，例如

区别于我们平时的perf top所看到的，在这张图里能看到函数嵌套的调用关系以及子函数的CPU占比。

又例如：

这里能看到整个函数调用链和各自的CPU占比，这对于我们进行性能诊断的时候来确定具体是哪个函数造成的无疑是非常有帮助的，一眼就能诊断出问题出在哪里。

再比如这张火焰图

简直太炫酷了。

那么今天我们就来看下这些效果都是怎么实现的。

一、perf top

我们通过man perf top查看perf top的使用帮助，我们注意到有一个-g选项，大意就是能够追踪链路调用

-g

Enables call-graph (stack chain/backtrace) recording

我们来测试一把(13499是我的mysqld pid)

perf top -p 13499 -v -g

通过上下键选择某个函数，按回车就可以展开，是不是很方便呢。

二、调用链

要生成如前文所说的调用链的话，首先需要使用perf record记录采样数据保存在文件中，然后使用perf report进行分析生成报告输出

perf record常用选项-e record指定PMU事件

--filter event事件过滤器

-a 录取所有CPU的事件

-p 录取指定pid进程的事件

-o 指定录取保存数据的文件名

-g 使能函数调用图功能

-C 录取指定CPU的事件

实际测试

perf record -F 99 -a -g -p 13499 -- sleep 60

采集60秒以后会保存在文件perf.data中，然后使用perf report工具进行分析

perf report --stdio

生成函数调用链图，比如如下图(当时我正在用sysbench准备数据)

# Children Self Command Shared Object Symbol

# ........ ........ ....... .................. ...........................................................................................................................

95.42% 0.00% mysqld libpthread-2.17.so [.] start_thread

---start_thread

pfs_spawn_thread

handle_connection

do_command

dispatch_command

mysql_parse

mysql_execute_command

Sql_cmd_create_or_drop_index_base::execute

mysql_alter_table

mysql_inplace_alter_table

ha_innobase::inplace_alter_table

ha_innobase::inplace_alter_table_impl

row_merge_build_indexes

--95.36%--row_merge_read_clustered_index

|--39.67%--row_merge_buf_sort

| |

| --39.48%--row_merge_tuple_sort

| |

| |--33.11%--row_merge_tuple_sort

| | |

| | |--27.10%--row_merge_tuple_sort

| | | |

| | | |--22.09%--row_merge_tuple_sort

| | | | |

| | | | | |

| | | | | | |

| | | | | | | |

| | | | | | | | |

| | | | | | | | | |

| | | | | | | | | |

| | | | | | | | | | |

| | | | | | | | | | | |

| | | | | | | | | | | | |

| | | | | | | | | | | | |

| | | | | | | | | | | | |

| | | | | | | | | | | | |

| | | | | | | | | | | | | |

| | | | | | | | | | | | | |

| | | | | | | | | | | | | | |

| | | | | | | | | | | | | | |

| | | | | | | | | | | | | | |

| | | | | | | | | | | | | |

| | | | | | | | | | | | |

| | | | | | | | | | | |

| | | | | | | | | | | |

| | | | | | | | | | |

| | | | | | | | | | |

| | | | | | | | | |

| | | | | | | | | |

| | | | | | | | |

| | | | | | | |

| | | | | | |

| | | | | | |

| | | | | |

| | | | |

| | | | --3.28%--row_merge_tuple_cmp

| | | | |

| | | | --0.80%--cmp_data_data

| | | |

| | | --4.83%--row_merge_tuple_cmp

| | | |

| | | --0.80%--cmp_data_data

| | |

| | --5.75%--row_merge_tuple_cmp

| | |

| | --0.62%--cmp_data_data

| |

| --6.13%--row_merge_tuple_cmp

| |

| --0.56%--cmp_data_data

|--34.96%--buf_page_get_gen

| |

| --34.90%--Buf_fetch::single_page

| |

| |--29.08%--buf_read_ahead_linear

| | |

| | --28.96%--buf_read_page_low

| | |

| | |--27.04%--fil_io

| | | |

| | | --26.92%--Fil_shard::do_io

| | | |

| | | --26.30%--os_aio_func

| | | |

| | | |--25.62%--io_submit

| | | | |

| | | | --25.56%--system_call_fastpath

| | | | |

| | | | --25.50%--sys_io_submit

| | | | |

| | | | --25.37%--do_io_submit

| | | | |

| | | | | |

| | | | | |

| | | | | |

| | | | | |

| | | | | |

| | | | | |

| | | | | |

| | | | | |

| | | | | |

| | | | | |

| | | | |

| | | | --1.92%--ext4_file_read

| | | | generic_file_aio_read

| | | | |

| | | | --1.67%--ext4_direct_IO

| | | | ext4_ind_direct_IO

| | | | __blockdev_direct_IO

| | | | |

| | | | --1.61%--do_blockdev_direct_IO

| | | | |

| | | | --0.62%--submit_bio

| | | | generic_make_request

| | | | |

| | | | --0.56%--blk_mq_make_request

| | | |

| | | --0.56%--AIO::reserve_slot

| | |

| | --1.79%--buf_page_init_for_read

| | |

| | --1.49%--buf_LRU_get_free_block

| | |

| | --1.18%--buf_LRU_scan_and_free_block

| | |

| | --0.99%--buf_LRU_free_page

| |

| |--3.28%--0x7ff1f636be5d

| | |

| | --2.97%--system_call_fastpath

| | |

| | --2.78%--sys_nanosleep

| | |

| | --2.35%--hrtimer_nanosleep

| | |

| | --2.17%--do_nanosleep

| | |

| | |--1.30%--schedule

| | | |

| | | --1.24%--__schedule

| | | |

| | | --0.99%--finish_task_switch

| | |

| | --0.62%--hrtimer_start_range_ns

| | |

| | --0.56%--_raw_spin_unlock_irqrestore

| |

| |--0.93%--__libc_nanosleep

| |

| --0.68%--buf_wait_for_read

|--8.66%--row_merge_buf_write

| |

| --7.67%--rec_get_converted_size_temp

|--2.54%--row_build_low

|--2.48%--row_merge_buf_add

|--2.23%--rec_get_offsets_func

| |

| --1.61%--rec_init_offsets

|--1.30%--dict_table_copy_types

--0.68%--__memset_sse2三、火焰图

如何生成火焰图呢？这里介绍一个工具，叫做FlameGraph

git clone https://github.com/brendangregg/FlameGraph

里面其实就是一堆perl脚本

用perf script工具对perf.data(第二步使用perf record采集到的数据)进行解析

perf script -i perf.data &> perf.unfold

将perf.unfold中的符号进行折叠：

./stackcollapse-perf.pl perf.unfold &> perf.folded

最后生成svg图：

./flamegraph.pl perf.folded > perf.svg

如下所示：

总结：到这里主要的技巧已经介绍完了，具体深入的用法也还是需要读者们深入研究的，我这里只是给出了简单的用法

掌握了这些技巧以后，相信对以后的性能问题诊断会有巨大的帮助，但是手工处理可能还是比较麻烦的，大家可以考虑集成到自动化中使用起来更加方便

另外需要注意的是线上千万不要随便使用perf，可能会造成性能下降，慎用！参考文档：http://www.brendangregg.com/perf.html

https://www.ibm.com/developerworks/cn/linux/l-cn-perf1/

https://www.ibm.com/developerworks/cn/linux/l-cn-perf2/

高冷小笼包

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫