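1. Introduction to bcc/eBPF
eBPF is part of the Linux tracing framework; for background on tracing, see the article "Introduction to using Linux tracers". The tracing framework lets us attach hooks to kernel-space and user-space code and provides some predefined hook functions for basic debugging. When more flexible processing is needed, eBPF lets users define their own hook functions that filter, aggregate, or compute on the collected data.
bcc is a toolkit that wraps eBPF in Python to make it easier to use, and it ships with many ready-made tools. bcc's GitHub repository is https://github.com/iovisor/bcc.
The figure above shows bcc's built-in tools and the subsystems they cover, including block I/O size and latency analysis, memory leak checking, and network statistics; see the documentation on GitHub for details.
2. Building and installing
The INSTALL.md file on GitHub describes how to build and install bcc. My environment is Ubuntu 18.04.1: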
Linux xxx 5.4.0-107-generic #121~18.04.1-Ubuntu SMP Thu Mar 24 17:21:33 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
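I use this environment as the example to walk through building and installing from source.
2.1 Installing dependencies
First install the build and runtime dependencies. You may run into errors along the way; search online for fixes: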
# For Bionic (18.04 LTS)
sudo apt-get -y install bison build-essential cmake flex git libedit-dev \
libllvm6.0 llvm-6.0-dev libclang-6.0-dev python zlib1g-dev libelf-dev libfl-dev python3-distutils
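2.2 Downloading the source and building bcc
Next, download the source. The repository also has some submodules, which are downloaded when cmake runs:
git clone https://github.com/iovisor/bcc.git
git -C bcc checkout v0.24.0 # at the time of writing, the latest code failed to build in my environment, so check out this tag
Then build bcc: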
mkdir bcc/build; cd bcc/build
cmake ..
make
sudo make install
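cmake downloads the submodules from GitHub when it runs. Because of network conditions in mainland China, the download occasionally fails. If it does, delete the incomplete download and fetch it again; otherwise the build may later fail with errors such as missing files.
The build artifacts are shown below, including bcc's header files, libraries, and the bundled examples and tools: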
$ tree
.
├── include
│   └── bcc
│       ├── bcc_common.h
│       ├── bcc_elf.h
│       ├── bcc_exception.h
│       ├── bcc_proc.h
│       ├── bcc_syms.h
│       ├── bcc_usdt.h
│       ├── bcc_version.h
│       ├── BPF.h
│       ├── bpf_module.h
│       ├── BPFTable.h
│       ├── compat
│       │   └── linux
│       │       ├── bpf_common.h
│       │       ├── bpf.h
│       │       ├── btf.h
│       │       ├── if_link.h
│       │       ├── if_xdp.h
│       │       ├── netlink.h
│       │       ├── perf_event.h
│       │       ├── pkt_cls.h
│       │       └── pkt_sched.h
│       ├── file_desc.h
│       ├── libbpf.h
│       ├── perf_reader.h
│       ├── table_desc.h
│       └── table_storage.h
├── lib
│   └── x86_64-linux-gnu
│       ├── libbcc.a
│       ├── libbcc_bpf.a
│       ├── libbcc_bpf.so -> libbcc_bpf.so.0
│       ├── libbcc_bpf.so.0 -> libbcc_bpf.so.0.24.0
│       ├── libbcc_bpf.so.0.24.0
│       ├── libbcc-loader-static.a
│       ├── libbcc.so -> libbcc.so.0
│       ├── libbcc.so.0 -> libbcc.so.0.24.0
│       ├── libbcc.so.0.24.0
│       └── pkgconfig
│           └── libbcc.pc
└── share
    └── bcc
        ├── examples
        │   ├── hello_world.py
        │   ├── lua
        │   │   ├── bashreadline.c
        │   │   ├── ......
        │   ├── networking
        │   │   ├── distributed_bridge
        │   │   │   ├── main.py
        │   │   │   ├── simulation.py -> ../simulation.py
        │   │   │   ├── ......
        │   └── tracing
        │       ├── biolatpcts_example.txt
        │       ├── biolatpcts.py
        │       ├── ......
        ├── introspection
        │   └── bps
        ├── man
        │   └── man8
        │       ├── argdist.8.gz
        │       ├── bashreadline.8.gz
        │       ├── ......
        └── tools
            ├── argdist
            ├── bashreadline
            ├── ......
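You can keep these artifacts. To use bcc on a similar platform, copy them over directly instead of rebuilding.
2.3 Building and installing the Python module
Still in the build directory created earlier, run: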
cmake -DPYTHON_CMD=python3 .. # build python3 binding
pushd src/python/
make
sudo make install
popd
As the output below shows, the module is installed into Python's module directory (/usr/lib/python3/dist-packages here):
Built target bcc_py_python3
Install the project...
-- Install configuration: "Release"
running install
running build
running build_py
running install_lib
copying build/lib/bcc/containers.py -> /usr/lib/python3/dist-packages/bcc
copying build/lib/bcc/tcp.py -> /usr/lib/python3/dist-packages/bcc
copying build/lib/bcc/__init__.py -> /usr/lib/python3/dist-packages/bcc
copying build/lib/bcc/syscall.py -> /usr/lib/python3/dist-packages/bcc
copying build/lib/bcc/perf.py -> /usr/lib/python3/dist-packages/bcc
copying build/lib/bcc/usdt.py -> /usr/lib/python3/dist-packages/bcc
copying build/lib/bcc/libbcc.py -> /usr/lib/python3/dist-packages/bcc
copying build/lib/bcc/version.py -> /usr/lib/python3/dist-packages/bcc
copying build/lib/bcc/table.py -> /usr/lib/python3/dist-packages/bcc
copying build/lib/bcc/utils.py -> /usr/lib/python3/dist-packages/bcc
copying build/lib/bcc/disassembler.py -> /usr/lib/python3/dist-packages/bcc
byte-compiling /usr/lib/python3/dist-packages/bcc/containers.py to containers.cpython-38.pyc
byte-compiling /usr/lib/python3/dist-packages/bcc/tcp.py to tcp.cpython-38.pyc
byte-compiling /usr/lib/python3/dist-packages/bcc/__init__.py to __init__.cpython-38.pyc
byte-compiling /usr/lib/python3/dist-packages/bcc/syscall.py to syscall.cpython-38.pyc
byte-compiling /usr/lib/python3/dist-packages/bcc/perf.py to perf.cpython-38.pyc
byte-compiling /usr/lib/python3/dist-packages/bcc/usdt.py to usdt.cpython-38.pyc
byte-compiling /usr/lib/python3/dist-packages/bcc/libbcc.py to libbcc.cpython-38.pyc
byte-compiling /usr/lib/python3/dist-packages/bcc/version.py to version.cpython-38.pyc
byte-compiling /usr/lib/python3/dist-packages/bcc/table.py to table.cpython-38.pyc
byte-compiling /usr/lib/python3/dist-packages/bcc/utils.py to utils.cpython-38.pyc
byte-compiling /usr/lib/python3/dist-packages/bcc/disassembler.py to disassembler.cpython-38.pyc
running install_egg_info
Removing /usr/lib/python3/dist-packages/bcc-0.24.0_8f40d6f5.egg-info
Writing /usr/lib/python3/dist-packages/bcc-0.24.0_8f40d6f5.egg-info
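At this point the build and installation are complete.
2.4 Kernel requirements
Ubuntu kernels usually have the required options enabled already. If you build your own kernel, remember to enable the following options (quoted from INSTALL.md):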
In general, to use these features, a Linux kernel version 4.1 or newer is required. In addition, the kernel should have been compiled with the following flags set:
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
# [optional, for tc filters]
CONFIG_NET_CLS_BPF=m
# [optional, for tc actions]
CONFIG_NET_ACT_BPF=m
CONFIG_BPF_JIT=y
# [for Linux kernel versions 4.1 through 4.6]
CONFIG_HAVE_BPF_JIT=y
# [for Linux kernel versions 4.7 and later]
CONFIG_HAVE_EBPF_JIT=y
# [optional, for kprobes]
CONFIG_BPF_EVENTS=y
# Need kernel headers through /sys/kernel/kheaders.tar.xz
CONFIG_IKHEADERS=y
There are a few optional kernel flags needed for running bcc networking examples on vanilla kernel:
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_DUMMY=m
CONFIG_VXLAN=m
Kernel compile flags can usually be checked by looking at /proc/config.gz or /boot/config-<kernel-version>.
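Checking these options by hand is tedious, so here is a small Python sketch that does it. It assumes the config is available at one of the two locations mentioned above. It treats only the mandatory BPF options, plus CONFIG_BPF_EVENTS (needed for kprobes), as required. The helper names are my own; adjust REQUIRED to your needs.

```python
import gzip
import os
import platform

# Options this sketch treats as required, with accepted values (an assumption; extend as needed).
REQUIRED = {
    "CONFIG_BPF": {"y"},
    "CONFIG_BPF_SYSCALL": {"y"},
    "CONFIG_BPF_JIT": {"y"},
    "CONFIG_HAVE_EBPF_JIT": {"y"},  # kernels 4.7 and later
    "CONFIG_BPF_EVENTS": {"y"},     # needed for kprobes
}


def parse_config(text):
    """Parse kernel .config text into {option: value}; '# CONFIG_X is not set' lines are skipped."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        opts[key] = value.strip('"')
    return opts


def missing_options(opts, required):
    """Options from `required` that are unset or set to a value that is not accepted."""
    return [name for name, accepted in required.items() if opts.get(name) not in accepted]


def read_running_config():
    """Read the running kernel's config from /proc/config.gz, else /boot/config-<release>."""
    if os.path.exists("/proc/config.gz"):
        with gzip.open("/proc/config.gz", "rt") as f:
            return f.read()
    with open("/boot/config-" + platform.release()) as f:
        return f.read()


if __name__ == "__main__":
    try:
        config = parse_config(read_running_config())
    except OSError as err:
        print("cannot read kernel config:", err)
    else:
        missing = missing_options(config, REQUIRED)
        print("missing:", ", ".join(missing) if missing else "none")
```

Options built as modules (for example CONFIG_NET_CLS_BPF=m) can be checked by adding them with an accepted set such as {"y", "m"}.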
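3. Usage examples
Below I introduce a few built-in tools that may be useful. Everyone needs different tools, of course. If you don't want to miss one that might help you, skim the short descriptions on GitHub.
3.1 Memory leaks
Memory leak checking uses memleak.py. It tracks allocations and frees from the moment it starts. At a fixed interval, it prints the memory that has been allocated but not yet freed, including the allocating call stack and the total size and number of those allocations.
Compared with common leak detectors such as valgrind and ASan, it may offer the following advantages:
- No need to recompile or restart the program: once bcc is installed, just run the tool.
- It is fairly flexible and has several built-in options. For example, if CPU is tight you can specify a sampling rate to reduce overhead. For other needs you can modify the script directly.
- Its overhead should be much lower than valgrind's; I have not compared it with ASan.
See memleak_example.txt for usage notes and examples. Here is a sample of its output: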
# ./memleak -p $(pidof allocs) -a
Attaching to pid 5193, Ctrl+C to quit.
[11:16:33] Top 2 stacks with outstanding allocations:
addr = 948cd0 size = 16
addr = 948d10 size = 16
addr = 948d30 size = 16
addr = 948cf0 size = 16
64 bytes in 4 allocations from stack
main+0x6d [allocs]
__libc_start_main+0xf0 [libc-2.21.so]
[11:16:34] Top 2 stacks with outstanding allocations:
addr = 948d50 size = 16
addr = 948cd0 size = 16
addr = 948d10 size = 16
addr = 948d30 size = 16
addr = 948cf0 size = 16
addr = 948dd0 size = 16
addr = 948d90 size = 16
addr = 948db0 size = 16
addr = 948d70 size = 16
addr = 948df0 size = 16
160 bytes in 10 allocations from stack
main+0x6d [allocs]
__libc_start_main+0xf0 [libc-2.21.so]
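Note that the script reports memory that has been allocated but not yet freed. Use it as a hint when hunting for leaks; not every outstanding allocation is necessarily leaked.
The script can track both user-space and kernel-space memory. For user-space memory, it attaches uprobes to allocation functions such as malloc, calloc, and posix_memalign, with uretprobes to capture the returned addresses and a probe on free to drop released blocks. These probes run the counting functions defined in the script.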
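You may also hit an error about bpf_probe_read_user; changing that call to bpf_probe_read in the script may fix it. A likely cause is that the bpf_probe_read_user helper was only added in kernel 5.5, so older kernels such as the 5.4 kernel used here lack it.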
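To make "allocated but not freed" concrete, here is a toy Python model of the bookkeeping memleak performs. The allocation probe records address -> (size, stack), the free probe deletes the entry, and the report groups the surviving entries by stack. It is an illustration under simplifying assumptions (one process, no sampling, no age filter), not memleak's code. The real tool keeps this state in BPF maps, and the class and method names here are hypothetical.

```python
from collections import defaultdict


class OutstandingAllocs:
    """Toy model of memleak-style accounting: allocations that have not been freed yet."""

    def __init__(self):
        self.live = {}  # address -> (size, stack)

    def on_alloc(self, addr, size, stack):
        # Plays the role of the return probe on malloc/calloc/...: record the returned address.
        self.live[addr] = (size, tuple(stack))

    def on_free(self, addr):
        # Plays the role of the probe on free(): forget the address if it was tracked.
        self.live.pop(addr, None)

    def top_stacks(self, n=10):
        """[(total_bytes, count, stack)] for the n stacks holding the most outstanding bytes."""
        totals = defaultdict(lambda: [0, 0])
        for size, stack in self.live.values():
            totals[stack][0] += size
            totals[stack][1] += 1
        ranked = sorted(((b, c, s) for s, (b, c) in totals.items()), reverse=True)
        return ranked[:n]


if __name__ == "__main__":
    tracker = OutstandingAllocs()
    stack = ["main+0x6d [allocs]", "__libc_start_main+0xf0 [libc-2.21.so]"]
    for i in range(5):
        tracker.on_alloc(0x948cd0 + 0x20 * i, 16, stack)
    tracker.on_free(0x948cd0)
    for total, count, frames in tracker.top_stacks():
        print(f"{total} bytes in {count} allocations from stack")
        for frame in frames:
            print("\t" + frame)
```

Running it prints "64 bytes in 4 allocations from stack" followed by the two frames, which is the same shape as the first report above.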
3.2 Cache hit ratio
cachetop.py reports the page cache hit ratio of each process (the argument 5 below is the refresh interval in seconds):
# ./cachetop 5
13:01:01 Buffers MB: 76 / Cached MB: 114 / Sort: HITS / Order: ascending
PID UID CMD HITS MISSES DIRTIES READ_HIT% WRITE_HIT%
1 root systemd 2 0 0 100.0% 0.0%
680 root vminfo 3 4 2 14.3% 42.9%
567 syslog rs:main Q:Reg 10 4 2 57.1% 21.4%
986 root kworker/u2:2 10 2457 4 0.2% 99.5%
988 root kworker/u2:2 10 9 4 31.6% 36.8%
877 vagrant systemd 18 4 2 72.7% 13.6%
983 root python 148 3 143 3.3% 1.3%
981 root strace 419 3 143 65.4% 0.5%
544 messageb dbus-daemon 455 371 454 0.1% 0.4%
243 root jbd2/dm-0-8 457 371 454 0.4% 0.4%
985 root (mount) 560 2457 4 18.4% 81.4%
987 root systemd-udevd 566 9 4 97.7% 1.2%
988 root systemd-cgroups 569 9 4 97.8% 1.2%
986 root modprobe 578 9 4 97.8% 1.2%
287 root systemd-journal 598 371 454 14.9% 0.3%
985 root mount 692 2457 4 21.8% 78.0%
984 vagrant find 9529 2457 4 79.5% 20.5%
It works by attaching kprobes to several page-cache-related kernel functions and counting calls to each:
b.attach_kprobe(event="add_to_page_cache_lru", fn_name="do_count")
b.attach_kprobe(event="mark_page_accessed", fn_name="do_count")
b.attach_kprobe(event="mark_buffer_dirty", fn_name="do_count")
# Function account_page_dirtied() is changed to folio_account_dirtied() in 5.15.
if BPF.get_kprobe_functions(b'folio_account_dirtied'):
    b.attach_kprobe(event="folio_account_dirtied", fn_name="do_count")
elif BPF.get_kprobe_functions(b'account_page_dirtied'):
    b.attach_kprobe(event="account_page_dirtied", fn_name="do_count")
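3.3 Deadlock detection
Deadlock detection uses deadlock.py. It builds a lock-order graph in which each lock is a node. When a thread acquires lock B while still holding lock A, an edge A->B is added. A cycle in the graph indicates a potential deadlock.
Example: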
# ./deadlock.py 181
Tracing... Hit Ctrl-C to end.
----------------
Potential Deadlock Detected!
Cycle in lock order graph: Mutex M0 (main::static_mutex3 0x0000000000473c60) => Mutex M1 (0x00007fff6d738400) => Mutex M2 (global_mutex1 0x0000000000473be0) => Mutex M3 (global_mutex2 0x0000000000473c20) => Mutex M0 (main::static_mutex3 0x0000000000473c60)
Mutex M1 (0x00007fff6d738400) acquired here while holding Mutex M0 (main::static_mutex3 0x0000000000473c60) in Thread 357250 (lockinversion):
@ 00000000004024d0 pthread_mutex_lock
@ 0000000000406dd0 std::mutex::lock()
@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
@ 0000000000402e38 main::{lambda()#3}::operator()() const
@ 0000000000406ba8 void std::_Bind_simple<main::{lambda()#3} ()>::_M_invoke<>(std::_Index_tuple<>)
@ 0000000000406951 std::_Bind_simple<main::{lambda()#3} ()>::operator()()
@ 000000000040673a std::thread::_Impl<std::_Bind_simple<main::{lambda()#3} ()> >::_M_run()
@ 00007fd4496564e1 execute_native_thread_routine
@ 00007fd449dd57f1 start_thread
@ 00007fd44909746d __clone
Mutex M0 (main::static_mutex3 0x0000000000473c60) previously acquired by the same Thread 357250 (lockinversion) here:
@ 00000000004024d0 pthread_mutex_lock
@ 0000000000406dd0 std::mutex::lock()
@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
@ 0000000000402e22 main::{lambda()#3}::operator()() const
@ 0000000000406ba8 void std::_Bind_simple<main::{lambda()#3} ()>::_M_invoke<>(std::_Index_tuple<>)
@ 0000000000406951 std::_Bind_simple<main::{lambda()#3} ()>::operator()()
@ 000000000040673a std::thread::_Impl<std::_Bind_simple<main::{lambda()#3} ()> >::_M_run()
@ 00007fd4496564e1 execute_native_thread_routine
@ 00007fd449dd57f1 start_thread
@ 00007fd44909746d __clone
Mutex M2 (global_mutex1 0x0000000000473be0) acquired here while holding Mutex M1 (0x00007fff6d738400) in Thread 357251 (lockinversion):
@ 00000000004024d0 pthread_mutex_lock
@ 0000000000406dd0 std::mutex::lock()
@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
@ 0000000000402ea8 main::{lambda()#4}::operator()() const
@ 0000000000406b46 void std::_Bind_simple<main::{lambda()#4} ()>::_M_invoke<>(std::_Index_tuple<>)
@ 000000000040692d std::_Bind_simple<main::{lambda()#4} ()>::operator()()
@ 000000000040671c std::thread::_Impl<std::_Bind_simple<main::{lambda()#4} ()> >::_M_run()
@ 00007fd4496564e1 execute_native_thread_routine
@ 00007fd449dd57f1 start_thread
@ 00007fd44909746d __clone
Mutex M1 (0x00007fff6d738400) previously acquired by the same Thread 357251 (lockinversion) here:
@ 00000000004024d0 pthread_mutex_lock
@ 0000000000406dd0 std::mutex::lock()
@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
@ 0000000000402e97 main::{lambda()#4}::operator()() const
@ 0000000000406b46 void std::_Bind_simple<main::{lambda()#4} ()>::_M_invoke<>(std::_Index_tuple<>)
@ 000000000040692d std::_Bind_simple<main::{lambda()#4} ()>::operator()()
@ 000000000040671c std::thread::_Impl<std::_Bind_simple<main::{lambda()#4} ()> >::_M_run()
@ 00007fd4496564e1 execute_native_thread_routine
@ 00007fd449dd57f1 start_thread
@ 00007fd44909746d __clone
Mutex M3 (global_mutex2 0x0000000000473c20) acquired here while holding Mutex M2 (global_mutex1 0x0000000000473be0) in Thread 357247 (lockinversion):
@ 00000000004024d0 pthread_mutex_lock
@ 0000000000406dd0 std::mutex::lock()
@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
@ 0000000000402d5f main::{lambda()#1}::operator()() const
@ 0000000000406c6c void std::_Bind_simple<main::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>)
@ 0000000000406999 std::_Bind_simple<main::{lambda()#1} ()>::operator()()
@ 0000000000406776 std::thread::_Impl<std::_Bind_simple<main::{lambda()#1} ()> >::_M_run()
@ 00007fd4496564e1 execute_native_thread_routine
@ 00007fd449dd57f1 start_thread
@ 00007fd44909746d __clone
Mutex M2 (global_mutex1 0x0000000000473be0) previously acquired by the same Thread 357247 (lockinversion) here:
@ 00000000004024d0 pthread_mutex_lock
@ 0000000000406dd0 std::mutex::lock()
@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
@ 0000000000402d4e main::{lambda()#1}::operator()() const
@ 0000000000406c6c void std::_Bind_simple<main::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>)
@ 0000000000406999 std::_Bind_simple<main::{lambda()#1} ()>::operator()()
@ 0000000000406776 std::thread::_Impl<std::_Bind_simple<main::{lambda()#1} ()> >::_M_run()
@ 00007fd4496564e1 execute_native_thread_routine
@ 00007fd449dd57f1 start_thread
@ 00007fd44909746d __clone
Mutex M0 (main::static_mutex3 0x0000000000473c60) acquired here while holding Mutex M3 (global_mutex2 0x0000000000473c20) in Thread 357248 (lockinversion):
@ 00000000004024d0 pthread_mutex_lock
@ 0000000000406dd0 std::mutex::lock()
@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
@ 0000000000402dc9 main::{lambda()#2}::operator()() const
@ 0000000000406c0a void std::_Bind_simple<main::{lambda()#2} ()>::_M_invoke<>(std::_Index_tuple<>)
@ 0000000000406975 std::_Bind_simple<main::{lambda()#2} ()>::operator()()
@ 0000000000406758 std::thread::_Impl<std::_Bind_simple<main::{lambda()#2} ()> >::_M_run()
@ 00007fd4496564e1 execute_native_thread_routine
@ 00007fd449dd57f1 start_thread
@ 00007fd44909746d __clone
Mutex M3 (global_mutex2 0x0000000000473c20) previously acquired by the same Thread 357248 (lockinversion) here:
@ 00000000004024d0 pthread_mutex_lock
@ 0000000000406dd0 std::mutex::lock()
@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
@ 0000000000402db8 main::{lambda()#2}::operator()() const
@ 0000000000406c0a void std::_Bind_simple<main::{lambda()#2} ()>::_M_invoke<>(std::_Index_tuple<>)
@ 0000000000406975 std::_Bind_simple<main::{lambda()#2} ()>::operator()()
@ 0000000000406758 std::thread::_Impl<std::_Bind_simple<main::{lambda()#2} ()> >::_M_run()
@ 00007fd4496564e1 execute_native_thread_routine
@ 00007fd449dd57f1 start_thread
@ 00007fd44909746d __clone
Thread 357248 created by Thread 350692 (lockinversion) here:
@ 00007fd449097431 __clone
@ 00007fd449dd5ef5 pthread_create
@ 00007fd449658440 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>)
@ 00000000004033ac std::thread::thread<main::{lambda()#2}>(main::{lambda()#2}&&)
@ 000000000040308f main
@ 00007fd448faa0f6 __libc_start_main
@ 0000000000402ad8 [unknown]
Thread 357250 created by Thread 350692 (lockinversion) here:
@ 00007fd449097431 __clone
@ 00007fd449dd5ef5 pthread_create
@ 00007fd449658440 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>)
@ 00000000004034b2 std::thread::thread<main::{lambda()#3}>(main::{lambda()#3}&&)
@ 00000000004030b9 main
@ 00007fd448faa0f6 __libc_start_main
@ 0000000000402ad8 [unknown]
Thread 357251 created by Thread 350692 (lockinversion) here:
@ 00007fd449097431 __clone
@ 00007fd449dd5ef5 pthread_create
@ 00007fd449658440 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>)
@ 00000000004035b8 std::thread::thread<main::{lambda()#4}>(main::{lambda()#4}&&)
@ 00000000004030e6 main
@ 00007fd448faa0f6 __libc_start_main
@ 0000000000402ad8 [unknown]
Thread 357247 created by Thread 350692 (lockinversion) here:
@ 00007fd449097431 __clone
@ 00007fd449dd5ef5 pthread_create
@ 00007fd449658440 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>)
@ 00000000004032a6 std::thread::thread<main::{lambda()#1}>(main::{lambda()#1}&&)
@ 0000000000403070 main
@ 00007fd448faa0f6 __libc_start_main
@ 0000000000402ad8 [unknown]
This is output from a process that has a potential deadlock involving 4 mutexes and 4 threads:
- Thread 357250 acquired M1 while holding M0 (edge M0 -> M1)
- Thread 357251 acquired M2 while holding M1 (edge M1 -> M2)
- Thread 357247 acquired M3 while holding M2 (edge M2 -> M3)
- Thread 357248 acquired M0 while holding M3 (edge M3 -> M0)
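This deadlock involves 4 threads, which would be fairly hard to untangle without a tool. The summary printed at the end makes the relationships clear, and the stacks printed before it make the code easy to locate.
Note, however, that the tool must be attached before the deadlock occurs; running it after the program has already deadlocked is useless. And because it has to maintain the graph and look for cycles, its overhead can be significant when there are many locks and threads.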
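The lock-order graph idea can be illustrated with a small Python model. It records an edge held->acquired every time a thread takes a lock while holding others, then searches the graph for a cycle with a depth-first search. This sketches the technique only; it is not deadlock.py's code, where the edges come from probes on the mutex lock/unlock functions. The class name is hypothetical. The demo replays the four lock orders from the example output above.

```python
from collections import defaultdict


class LockOrderGraph:
    """Toy lock-order graph: edge A -> B means some thread acquired B while holding A."""

    def __init__(self):
        self.edges = defaultdict(set)   # lock -> locks acquired while it was held
        self.held = defaultdict(list)   # thread id -> locks it currently holds

    def lock(self, tid, mutex):
        # Every lock this thread already holds gets an edge to the newly acquired one.
        for held in self.held[tid]:
            self.edges[held].add(mutex)
        self.held[tid].append(mutex)

    def unlock(self, tid, mutex):
        self.held[tid].remove(mutex)

    def find_cycle(self):
        """Return one cycle as [A, B, ..., A], or None (iterative DFS, 3-colour marking)."""
        WHITE, GREY, BLACK = 0, 1, 2
        done = object()                 # sentinel: a node's successors are exhausted
        colour = defaultdict(int)       # unseen nodes are WHITE
        for start in list(self.edges):
            if colour[start] != WHITE:
                continue
            colour[start] = GREY
            path = [start]                              # GREY nodes on the current DFS path
            iters = [iter(sorted(self.edges[start]))]   # pending successors for each path node
            while iters:
                nxt = next(iters[-1], done)
                if nxt is done:
                    colour[path.pop()] = BLACK
                    iters.pop()
                elif colour[nxt] == GREY:               # back edge: nxt is on the path
                    return path[path.index(nxt):] + [nxt]
                elif colour[nxt] == WHITE:
                    colour[nxt] = GREY
                    path.append(nxt)
                    iters.append(iter(sorted(self.edges.get(nxt, ()))))
        return None


if __name__ == "__main__":
    g = LockOrderGraph()
    # The four lock orders reported in the deadlock.py example output.
    for tid, first, second in [(357250, "M0", "M1"), (357251, "M1", "M2"),
                               (357247, "M2", "M3"), (357248, "M3", "M0")]:
        g.lock(tid, first)
        g.lock(tid, second)
        g.unlock(tid, second)
        g.unlock(tid, first)
    print(" => ".join(g.find_cycle()))
```

No thread actually blocks in this replay, because each thread releases its locks before the next starts. The cycle M0 => M1 => M2 => M3 => M0 in the graph is what flags the potential deadlock. This is also why such a tool can report the problem before the program ever hangs, but only if it is attached early.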
3.4 bio
There are several bio-related tools:
- biolatency: distribution of bio latencies
# ./biolatency
Tracing block device I/O... Hit Ctrl-C to end.
^C
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 1 | |
128 -> 255 : 12 |******** |
256 -> 511 : 15 |********** |
512 -> 1023 : 43 |******************************* |
1024 -> 2047 : 52 |**************************************|
2048 -> 4095 : 47 |********************************** |
4096 -> 8191 : 52 |**************************************|
8192 -> 16383 : 36 |************************** |
16384 -> 32767 : 15 |********** |
32768 -> 65535 : 2 |* |
65536 -> 131071 : 2 |* |
- biotop: per-process bio volume (I/O count and Kbytes) and average latency
# ./biotop
Tracing... Output every 1 secs. Hit Ctrl-C to end
08:04:11 loadavg: 1.48 0.87 0.45 1/287 14547
PID COMM D MAJ MIN DISK I/O Kbytes AVGms
14501 cksum R 202 1 xvda1 361 28832 3.39
6961 dd R 202 1 xvda1 1628 13024 0.59
13855 dd R 202 1 xvda1 1627 13016 0.59
326 jbd2/xvda1-8 W 202 1 xvda1 3 168 3.00
1880 supervise W 202 1 xvda1 2 8 6.71
1873 supervise W 202 1 xvda1 2 8 2.51
1871 supervise W 202 1 xvda1 2 8 1.57
1876 supervise W 202 1 xvda1 2 8 1.22
1892 supervise W 202 1 xvda1 2 8 0.62
1878 supervise W 202 1 xvda1 2 8 0.78
1886 supervise W 202 1 xvda1 2 8 1.30
1894 supervise W 202 1 xvda1 2 8 3.46
1869 supervise W 202 1 xvda1 2 8 0.73
1888 supervise W 202 1 xvda1 2 8 1.48
- biopattern: ratio of random to sequential I/O; presumably it checks whether two consecutive I/Os are contiguous
# ./biopattern.py
TIME DISK %RND %SEQ COUNT KBYTES
22:03:51 vdb 0 99 788 3184
22:03:51 Unknown 0 100 4 0
22:03:51 vda 85 14 21 488
[...]
- biosnoop: per-bio details such as process, size, and latency:
# ./biosnoop
TIME(s) COMM PID DISK T SECTOR BYTES LAT(ms)
0.000004 supervise 1950 xvda1 W 13092560 4096 0.74
0.000178 supervise 1950 xvda1 W 13092432 4096 0.61
0.001469 supervise 1956 xvda1 W 13092440 4096 1.24
0.001588 supervise 1956 xvda1 W 13115128 4096 1.09
1.022346 supervise 1950 xvda1 W 13115272 4096 0.98
1.022568 supervise 1950 xvda1 W 13188496 4096 0.93
1.023534 supervise 1956 xvda1 W 13188520 4096 0.79
1.023585 supervise 1956 xvda1 W 13189512 4096 0.60
2.003920 xfsaild/md0 456 xvdc W 62901512 8192 0.23
2.003931 xfsaild/md0 456 xvdb W 62901513 512 0.25
2.004034 xfsaild/md0 456 xvdb W 62901520 8192 0.35
2.004042 xfsaild/md0 456 xvdb W 63542016 4096 0.36
2.004204 kworker/0:3 26040 xvdb W 41950344 65536 0.34
2.044352 supervise 1950 xvda1 W 13192672 4096 0.65
2.044574 supervise 1950 xvda1 W 13189072 4096 0.58
- bitesize: distribution of bio sizes per process
# ./bitesize.py
Tracing... Hit Ctrl-C to end.
^C
Process Name = 'kworker/u128:1'
Kbytes : count distribution
0 -> 1 : 1 |******************** |
2 -> 3 : 0 | |
4 -> 7 : 2 |****************************************|
Process Name = 'bitesize.py'
Kbytes : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 1 |****************************************|
Process Name = 'dd'
Kbytes : count distribution
0 -> 1 : 3 | |
2 -> 3 : 0 | |
4 -> 7 : 6 | |
8 -> 15 : 0 | |
16 -> 31 : 1 | |
32 -> 63 : 1 | |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 1 | |
512 -> 1023 : 0 | |
1024 -> 2047 : 488 |****************************************|
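Two of the aggregations behind these tools are easy to reproduce in plain Python, which helps when reading their output. biolatency (in usecs) and bitesize (in Kbytes) put each value into power-of-two buckets: 0 -> 1, 2 -> 3, 4 -> 7, and so on. biopattern, as I understand it, counts an I/O as sequential when it starts at the sector where the previous I/O on the same disk ended. The functions below are hypothetical helpers illustrating those ideas, not code taken from bcc.

```python
from collections import Counter, defaultdict


def log2_bucket(value):
    """Power-of-two bucket index: 0 and 1 share bucket 1, 2-3 -> 2, 4-7 -> 3, ..."""
    return max(1, int(value).bit_length())


def bucket_range(index):
    """(low, high) label of a bucket as the tools print it: 1 -> (0, 1), 3 -> (4, 7)."""
    low, high = (1 << index) >> 1, (1 << index) - 1
    if low == high:
        low -= 1
    return low, high


def log2_hist(values):
    """Count values per bucket: {bucket_index: count}, sorted by bucket."""
    return dict(sorted(Counter(log2_bucket(v) for v in values).items()))


def format_hist(hist, label="usecs"):
    """Render a histogram as 'low -> high : count' lines, filling empty buckets with 0."""
    lines = [f"{label} : count"]
    if not hist:
        return lines
    for i in range(1, max(hist) + 1):
        low, high = bucket_range(i)
        lines.append(f"{low} -> {high} : {hist.get(i, 0)}")
    return lines


def classify_pattern(requests):
    """Count (random, sequential) I/Os per disk from (disk, sector, nr_sectors) tuples.

    An I/O is sequential if it starts exactly where the previous I/O on the same disk
    ended; the first I/O seen on each disk only sets the reference point.
    """
    last_end = {}
    counts = defaultdict(lambda: [0, 0])  # disk -> [random, sequential]
    for disk, sector, nr_sectors in requests:
        if disk in last_end:
            counts[disk][1 if sector == last_end[disk] else 0] += 1
        last_end[disk] = sector + nr_sectors
    return {disk: tuple(c) for disk, c in counts.items()}


if __name__ == "__main__":
    print("\n".join(format_hist(log2_hist([150, 300, 700, 1500, 1800, 5000]))))
    print(classify_pattern([("vdb", 0, 8), ("vdb", 8, 8), ("vdb", 16, 8), ("vdb", 100, 8)]))
```

Log2 buckets keep the in-kernel state tiny (one counter per power of two), which is why these tools can afford to record every I/O. The price is coarse resolution: 1024 and 2047 land in the same bucket.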
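3.5 bio flame graph
This is not a bcc built-in tool; see https://www.brendangregg.com/FlameGraphs/offcpuflamegraphs.html#IO.
The script collects data by attaching kprobes to blk_account_io_start and blk_account_io_completion, which mark the start and the end of a block I/O respectively. These function names can change between kernel versions, so the script may need adjusting. For example, on my kernel the completion function is blk_account_io_done.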
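Since these names vary across kernels, a script can choose its probe points at run time. The sketch below reads /proc/kallsyms, keeps function (text) symbols, and picks the first available candidate. This is similar in spirit to the BPF.get_kprobe_functions() check in the cachetop snippet in section 3.2. The candidate lists are assumptions: the "__"-prefixed names are what I recall newer kernels using. Also, a listed symbol is not guaranteed to be kprobe-able, for example if it is blacklisted.

```python
# Candidate kernel functions per event, tried in order. These lists are assumptions:
# "__"-prefixed names appear on some newer kernels, and blk_account_io_done is the
# completion function on the 5.4 kernel used in this article.
CANDIDATES = {
    "io_start": ["__blk_account_io_start", "blk_account_io_start"],
    "io_done": ["__blk_account_io_done", "blk_account_io_done", "blk_account_io_completion"],
}


def kernel_functions(kallsyms_text):
    """Names of text (function) symbols in /proc/kallsyms format: 'address type name [module]'."""
    names = set()
    for line in kallsyms_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] in ("t", "T"):
            names.add(fields[2])
    return names


def pick_probe(candidates, functions):
    """First candidate the running kernel has, or None if none of them exists."""
    return next((name for name in candidates if name in functions), None)


if __name__ == "__main__":
    try:
        with open("/proc/kallsyms") as f:
            functions = kernel_functions(f.read())
    except OSError as err:
        print("cannot read /proc/kallsyms:", err)
    else:
        for event, names in CANDIDATES.items():
            print(event, "->", pick_probe(names, functions))
```

Unprivileged users usually see zeroed addresses in /proc/kallsyms, but the symbol names are still listed, so this check works without root.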
See the link above for usage. The result looks like this: