Reading Notes on Systems Performance, 2nd Edition (Part 2): Observability Tools

Article link: https://blog.csdn.net/qq_23662505/article/details/125427689

Brief reading notes on Systems Performance: Enterprise and the Cloud, 2nd Edition (2020).

Chapter 4: Observability Tools

4. Observability Tools

4.1 Tool Coverage

Tool coverage overview (figure not included).

4.1.1 Static Performance Tools


4.1.2 Crisis Tools

4.2 Tool Types


4.2.1 Counters

Kernels maintain various counters for providing system statistics. They are usually implemented
as unsigned integers that are incremented when events occur.
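
Because these counters only ever increase, a tool has to read them twice and subtract to get a rate. The sketch below shows that arithmetic, including a single wraparound of a fixed-width unsigned counter. The function names and the default width of 64 bits are illustrative assumptions, not something taken from the book or from any particular tool.

```python
def counter_delta(prev: int, curr: int, bits: int = 64) -> int:
    """Return how far an unsigned counter advanced, allowing one wraparound."""
    # Modular subtraction gives the right answer even if curr wrapped past zero.
    return (curr - prev) % (1 << bits)


def counter_rate(prev: int, curr: int, interval_s: float, bits: int = 64) -> float:
    """Convert two readings taken interval_s seconds apart into events/second."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return counter_delta(prev, curr, bits) / interval_s


if __name__ == "__main__":
    # 600 events in 3 seconds.
    print(counter_rate(1_000, 1_600, 3.0))
    # A 32-bit counter that wrapped between the two readings.
    print(counter_delta(4_294_967_290, 5, bits=32))
```

This is the same pattern interval-based tools follow when they print per-second columns: the first reading is only a baseline, and every later line is a difference between two snapshots.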

System-wide tools


vmstat	Virtual and physical memory statistics, system-wide
mpstat	Per-CPU usage
iostat	Per-disk I/O usage, reported from the block device interface
nstat	TCP/IP stack statistics
sar	Various statistics; can also archive them for historical reporting

Per-process tools


ps	Shows process status, shows various process statistics, including memory and CPU usage
top	Shows top processes, sorted by CPU usage or another statistic.
pmap	Lists process memory segments with usage statistics

4.2.2 Profiling

System-wide


perf	The standard Linux profiler, which includes profiling subcommands
profile	A BPF-based CPU profiler from the BCC repository (covered in Chapter 15, BPF) that frequency counts stack traces in kernel context
Intel VTune Amplifier XE	Linux and Windows profiling, with a graphical interface including source browsing

Per-process


gprof	The GNU profiling tool, which analyzes profiling information added by compilers (e.g., gcc -pg).
cachegrind	A tool from the valgrind toolkit, can profile hardware cache usage (and more) and visualize profiles using kcachegrind
Java Flight Recorder (JFR)	Programming languages often have their own special-purpose profilers that can inspect language context. For example, JFR for Java.

4.2.3 Tracing

System-wide


tcpdump	Network packet capture tool
biosnoop	Block I/O tracing (uses BCC or bpftrace)
execsnoop	New processes tracing (uses BCC or bpftrace)
perf	The standard Linux profiler, can also trace events
perf trace	A special perf subcommand that traces system calls system-wide
ftrace	The Linux built-in tracer
BCC	A BPF-based tracing library and toolkit
bpftrace	A BPF-based tracer (bpftrace(8)) and toolkit

Per-process


strace	System call tracing
gdb	A source-level debugger

4.2.4 Monitoring

Monitoring tools typically record and archive statistics so that they can be analyzed later.


sar	Collect, report, or save system activity information
snmp	Devices and operating systems can support SNMP and in some cases provide it by default, avoiding the need to install third-party agents or exporters
agents

4.3 Observability Sources

On Linux, the main sources of observability data are the /proc and /sys file systems.

Summary of Linux observability sources (figure not included).

4.3.1 The /proc File System

/proc is dynamically created by the kernel and is not backed by storage devices (it runs in-memory). It is mostly read-only, providing statistics for observability tools. Some files are writeable, for controlling process and kernel behavior.

Per-process statistics (files under /proc/PID/)



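limits	In-effect resource limits
maps	Mapped memory regions
sched	CPU scheduler statistics
schedstat	CPU runtime, latency, and time slices
smaps	Mapped memory regions with usage statistics
stat	Process status and statistics, including total CPU and memory usage
statm	Memory usage summary in units of pages
status	stat and statm information in human-readable form
fd	Directory of file descriptor symlinks (open files)
cgroup	Cgroup membership information
task	Directory of per-task (thread) statistics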
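
As a rough illustration of how a tool might consume two of these files, the sketch below parses the text of /proc/PID/status (colon-separated "Key: value" lines) and /proc/PID/schedstat (three whitespace-separated numbers: on-CPU time in ns, run-queue wait time in ns, and a timeslice count). The function names and the returned key names are my own choices for the example; the schedstat file is only present when the kernel has scheduler statistics enabled, which is why reading it is guarded.

```python
from pathlib import Path


def parse_status(text: str) -> dict[str, str]:
    """Parse the 'Key:<tab>value' lines of /proc/PID/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def parse_schedstat(text: str) -> dict[str, int]:
    """Parse /proc/PID/schedstat: on-CPU ns, run-queue wait ns, timeslices."""
    on_cpu_ns, wait_ns, timeslices = (int(v) for v in text.split()[:3])
    return {
        "on_cpu_ns": on_cpu_ns,
        "runqueue_wait_ns": wait_ns,
        "timeslices": timeslices,
    }


if __name__ == "__main__":
    proc_self = Path("/proc/self")
    try:
        status = parse_status((proc_self / "status").read_text())
        print(status.get("Name"), status.get("VmRSS"))
        print(parse_schedstat((proc_self / "schedstat").read_text()))
    except OSError as exc:
        # Not on Linux, or the file is not provided by this kernel.
        print(f"could not read /proc: {exc}")
```

Keeping the parsers separate from the file reads makes them easy to exercise with captured text from another machine.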

System-wide statistics (files under /proc/)


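cpuinfo	Physical processor information, including every virtual CPU, vendor name, clock speed, and cache sizes
diskstats	I/O statistics for all disk devices
interrupts	Interrupt counters per CPU
loadavg	Load averages
meminfo	System memory usage breakdowns
net/dev	Network interface statistics
net/netstat	System-wide networking statistics
net/tcp	Active TCP socket information
pressure	Pressure stall information (PSI) files: stall records for cpu, io, and memory, useful when analyzing problems such as OOM
schedstat	System-wide CPU scheduler statistics
self	A symlink to the current process's directory
slabinfo	Kernel slab allocator cache statistics
stat	A summary of kernel and system resource statistics: CPUs, disks, paging, swap, processes
zoneinfo	Memory zone information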
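
The PSI files under /proc/pressure/ are among the easier ones to parse: each line starts with "some" or "full", followed by key=value pairs (avg10, avg60, avg300, total). A minimal sketch of reading one of them follows; parse_psi is a hypothetical helper name, and all values are converted to float for simplicity even though total is an integer count of microseconds.

```python
from pathlib import Path


def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse a /proc/pressure/{cpu,io,memory} file into nested dicts."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, pairs = parts[0], parts[1:]  # kind is 'some' or 'full'
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result


if __name__ == "__main__":
    try:
        psi = parse_psi(Path("/proc/pressure/memory").read_text())
        print(psi)
    except OSError as exc:
        # PSI needs a kernel built with it enabled; older kernels lack the files.
        print(f"PSI not available: {exc}")
```

A non-zero avg10 on the "some" line for memory means that, over the last 10 seconds, at least one task spent that percentage of time stalled waiting on memory.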

4.3.2 The /sys File System

Unlike /proc, /sys was originally designed to provide device driver statistics, but it has since been extended to cover statistics of all kinds.

4.3.3 Delay Accounting

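When the kernel is built with CONFIG_TASK_DELAY_ACCT, it tracks the following delays for each task:

Scheduler latency: waiting for a turn on a CPU
Block I/O: waiting for block I/O to complete
Swapping: waiting for paging (memory pressure)
Memory reclaim: waiting for the memory reclaim routine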

The kernel source documents this in Documentation/accounting/delay-accounting.txt and ships an example consumer in tools/accounting/getdelays.c.

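Sample getdelays output (figure not included).

The data was collected on a heavily loaded system, where the CPU scheduling delay is severe.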

4.3.4 netlink

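netlink is one of the mechanisms for communication between user space and the kernel; generic netlink (genetlink) makes it easier to extend.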

4.3.5 tracepoints

Tracepoints are hard-coded instrumentation points placed at logical locations in kernel code.

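Examples:

There are tracepoints at the start and end of system calls, on scheduler events, on file system operations, on disk I/O, and so on. Some tracepoints require kernel support to be enabled; for example, CONFIG_RCU_TRACE enables the rcu tracepoints.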
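
To see which tracepoints a given kernel actually exposes, one option is the available_events file in tracefs, which lists one "subsystem:event" name per line. The sketch below groups those names by subsystem. The path /sys/kernel/tracing/available_events is an assumption about where tracefs is mounted (older setups use /sys/kernel/debug/tracing), reading it normally requires root, and group_tracepoints is a hypothetical helper name.

```python
from collections import defaultdict
from pathlib import Path


def group_tracepoints(text: str) -> dict[str, list[str]]:
    """Group 'subsystem:event' lines by subsystem, preserving file order."""
    groups = defaultdict(list)
    for line in text.splitlines():
        subsystem, sep, event = line.strip().partition(":")
        if sep:
            groups[subsystem].append(event)
    return dict(groups)


if __name__ == "__main__":
    # Assumed tracefs mount point; adjust for the system at hand.
    path = Path("/sys/kernel/tracing/available_events")
    try:
        groups = group_tracepoints(path.read_text())
        for subsystem in sorted(groups):
            print(f"{subsystem}: {len(groups[subsystem])} tracepoints")
    except OSError as exc:
        print(f"could not read {path}: {exc}")
```

If a subsystem such as rcu is missing from the output, that is a hint that the corresponding kernel option was not enabled at build time.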

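Tracepoint overhead

Enabling tracepoints adds CPU overhead for each event, plus the cost of recording the events (for example, writing them to a file). Whether this extra overhead perturbs the performance numbers you care about has to be judged case by case.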

4.3.6 kprobes

kprobes (short for kernel probes) is a Linux kernel event source for tracers based on dynamic instrumentation.

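kprobes can trace any kernel function or instruction.

How kprobes work: the standard approach modifies the instruction text of the running kernel to insert instrumentation at the desired point. When instrumenting a function entry, kprobes can use the existing ftrace function tracing instead, which reduces the extra overhead.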

Comparison of kprobes and tracepoints (table image not included).

kprobes can inspect function arguments; kretprobes can inspect function return values.
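
One way to try this without any extra tooling is the ftrace kprobe_events interface, which accepts one-line definitions of the form "p:EVENT SYMBOL [FETCHARGS]" for an entry probe and "r:EVENT SYMBOL [FETCHARGS]" for a return probe, where $retval fetches the return value. The sketch below only builds such strings; it does not write them anywhere. kprobe_def is a hypothetical helper, the event names are made up, and do_sys_open with x86 register names is borrowed from the kernel documentation's example and may not exist under that name on newer kernels.

```python
from typing import Iterable


def kprobe_def(
    event: str,
    symbol: str,
    fetchargs: Iterable[str] = (),
    is_return: bool = False,
) -> str:
    """Build one kprobe_events definition line (entry or return probe)."""
    kind = "r" if is_return else "p"  # 'p' = kprobe, 'r' = kretprobe
    return " ".join([f"{kind}:{event}", symbol, *fetchargs])


if __name__ == "__main__":
    # Entry probe that records two arguments from registers (x86 names).
    print(kprobe_def("myprobe", "do_sys_open", ["dfd=%ax", "filename=%dx"]))
    # Return probe that records the function's return value.
    print(kprobe_def("myretprobe", "do_sys_open", ["$retval"], is_return=True))
```

On a test machine, a line like these would be appended as root to the kprobe_events file in tracefs, and the new event would then be enabled like any other tracing event. Front ends such as perf, BCC, and bpftrace hide these details.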

4.3.7 uprobes

uprobes (user-space probes) are similar to kprobes, but for user-space.

4.3.8 USDT

User-level statically-defined tracing (USDT) is the user-space version of tracepoints.

4.3.9 Hardware Counters (PMCs)

The processor and other devices commonly support hardware counters for observing activity. The main source are the processors, where they are commonly called performance monitoring counters (PMCs). They are known by other names as well: CPU performance counters (CPCs), performance instrumentation counters (PICs), and performance monitoring unit events (PMU events). These all refer to the same thing: programmable hardware registers on the processor that provide low-level performance information at the CPU cycle level.

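In short, PMCs are programmable hardware registers on the processor that expose performance information at CPU-cycle granularity.

Challenges when using PMCs:

Accuracy of overflow sampling
Availability in cloud environments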

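4.3.10 Other Observability Sources

MSRs: model-specific registers.

ptrace: a system call used by gdb for debugging and by strace for tracing system calls.

netfilter conntrack: the netfilter connection tracking mechanism.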

4.4 sar

# How to enable sysstat data collection (Ubuntu):
# vi /etc/default/sysstat
ENABLED="true"

root@ubuntu:~# sar -u -n TCP 3 3
Linux 5.4.0-58-generic (ubuntu) 	2020年12月28日 	_x86_64_	(2 CPU)

18时05分02秒     CPU     %user     %nice   %system   %iowait    %steal     %idle
18时05分05秒     all     11.59      0.00     38.10      0.21      0.00     50.10

18时05分02秒  active/s passive/s    iseg/s    oseg/s
18时05分05秒      3.67      0.00     15.00     19.33

18时05分05秒     CPU     %user     %nice   %system   %iowait    %steal     %idle
18时05分08秒     all      9.64      0.00     32.13      3.61      0.00     54.62

18时05分05秒  active/s passive/s    iseg/s    oseg/s
18时05分08秒      4.33      0.00     92.33     95.33

18时05分08秒     CPU     %user     %nice   %system   %iowait    %steal     %idle
18时05分11秒     all     10.82      0.00     48.12      0.22      0.00     40.84

18时05分08秒  active/s passive/s    iseg/s    oseg/s
18时05分11秒      0.33      0.00     13.67     14.33

Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all     10.67      0.00     39.19      1.39      0.00     48.74

Average:     active/s passive/s    iseg/s    oseg/s
Average:         2.78      0.00     40.33     43.00
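
The CPU columns in this output come from the kernel's counters in /proc/stat: sar takes two snapshots of the aggregate "cpu" line and reports each field's share of the total time that elapsed between them. The sketch below reproduces that calculation approximately. The function names are hypothetical, and folding irq and softirq time into %system is my assumption about how sar -u groups them, so small differences from real sar output are expected.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text: str) -> dict[str, int]:
    """Extract the aggregate 'cpu' line of /proc/stat as named tick counters."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = (int(v) for v in parts[1 : 1 + len(FIELDS)])
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before: dict[str, int], after: dict[str, int]) -> dict[str, float]:
    """Turn two snapshots into sar-style percentages for the interval."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    # Assumption: interrupt time is reported as part of %system.
    system = deltas["system"] + deltas["irq"] + deltas["softirq"]
    return {
        "%user": 100 * deltas["user"] / total,
        "%nice": 100 * deltas["nice"] / total,
        "%system": 100 * system / total,
        "%iowait": 100 * deltas["iowait"] / total,
        "%steal": 100 * deltas["steal"] / total,
        "%idle": 100 * deltas["idle"] / total,
    }


if __name__ == "__main__":
    # Made-up snapshots for illustration; real ones come from /proc/stat.
    first = parse_cpu_line("cpu 0 0 0 0 0 0 0 0 0 0\ncpu0 0 0 0 0 0 0 0 0 0 0\n")
    second = parse_cpu_line("cpu 10 0 30 50 5 3 2 0 0 0\ncpu0 10 0 30 50 5 3 2 0 0 0\n")
    print(cpu_percentages(first, second))
```

To use it live, read /proc/stat, sleep for the sampling interval (3 seconds in the run above), read it again, and pass both parsed snapshots to cpu_percentages.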

4.5 Tracing Tools


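perf	The official Linux profiler; excels at CPU profiling (sampling) and PMC statistics, and can also analyze other events
ftrace	The official Linux tracer; runs without additional dependencies (requires some kernel CONFIG options to be enabled)
BPF	Extended BPF, which powers tools such as BCC and bpftrace
SystemTap	A high-level language and tracer with many tapsets (libraries) for tracing different targets. The stapbpf tool has not been explored in these notes yet.
LTTng	A tracer optimized for black-box recording: optimally recording many events for later analysis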

perf is used for CPU analysis, ftrace for kernel code tracing, and BCC/bpftrace for everything else (memory, file systems, disks, networking, and application tracing).