Background
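There are many ways to troubleshoot or observe network packet loss, for example:
1. ifconfig to observe drops at the NIC (interface) level.
2. ethtool -S to observe NIC driver-level drop statistics.
3. netstat to observe connection-level drops.
4. nettrace to analyze drops.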
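This article presents a complete approach to locating protocol-stack packet drops and tracing them back to the kernel code responsible, by combining the kernel's built-in drop_monitor module (net/core) with the user-space dropwatch tool.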
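For comparison with method 3 in the list above: netstat -s builds its per-protocol counters from /proc/net/snmp, which holds a header line and a value line for each protocol. The following is a minimal Python sketch, assuming that two-line layout, that turns the file into a dictionary so a counter such as Udp RcvbufErrors can be read directly. The sample text is synthetic and is not output from this article's test.

```python
"""Minimal parser for /proc/net/snmp-style text (a sketch, not netstat itself)."""
from typing import Dict


def parse_proc_net_snmp(text: str) -> Dict[str, Dict[str, int]]:
    """Map each protocol ("Ip", "Udp", ...) to {counter_name: value}."""
    lines = [line for line in text.splitlines() if line.strip()]
    stats: Dict[str, Dict[str, int]] = {}
    # Assumption: lines come in pairs, a header line followed by a value line,
    # both prefixed with the same "Proto:" tag.
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        proto_v, nums = values.split(":", 1)
        if proto != proto_v:
            raise ValueError(f"unpaired lines: {proto!r} vs {proto_v!r}")
        stats[proto] = dict(zip(names.split(), (int(n) for n in nums.split())))
    return stats


# Synthetic sample in the /proc/net/snmp layout (values are made up).
SAMPLE = """\
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 1000 0 450 10 450 0
"""

if __name__ == "__main__":
    # On a real target, pass open("/proc/net/snmp").read() instead of SAMPLE.
    udp = parse_proc_net_snmp(SAMPLE)["Udp"]
    print("RcvbufErrors:", udp["RcvbufErrors"])
```

Such counters only say how many packets were dropped in a given category; they do not identify the kernel function that dropped them, which is the gap drop_monitor and dropwatch fill.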
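Environment
The example in this article is based on the open-source Linux learning project linux-ps/ls on GitCode.
The project uses Yocto to build a complete qemuarm64 environment; for how to use it, see the author's earlier blog posts.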
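Implementation
1. Add iperf3 to the project
This article uses iperf3 to send UDP traffic between the QEMU virtual machine and the Ubuntu host, creating a packet-loss scenario.
The project therefore needs to include iperf3.
Adding iperf3 to the rootfs in Yocto is simple; the code change is:
Repository involved:
https://gitcode.net/linux-ps/ls
Change: IMAGE_INSTALL += "iperf3"
This installs the iperf3 package into the image.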
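2. Kernel support
The kernel used here is the one from the linux-study project; repository:
https://gitcode.net/linux-ps/kernel-note
Modify the kernel configuration to enable drop monitor (CONFIG_NET_DROP_MONITOR=y).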
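To confirm that the option actually made it into the built kernel, it can be looked up in the kernel configuration. Below is a minimal Python sketch; the file you point it at is an assumption on your side: the .config in your kernel build directory on the host, or /proc/config.gz on the target only if the kernel was built with CONFIG_IKCONFIG_PROC. The config excerpt in the demo is synthetic.

```python
import gzip
import re
from typing import Optional


def load_kconfig(path: str) -> str:
    """Read a kernel config file; .gz files (e.g. /proc/config.gz) are decompressed."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        return f.read()


def kconfig_value(config_text: str, option: str) -> Optional[str]:
    """Return the value of `option` ("y", "m", a string...), or None if unset/absent.

    Lines of the form "# CONFIG_FOO is not set" do not match and yield None.
    """
    match = re.search(rf"^{re.escape(option)}=(.*)$", config_text, re.MULTILINE)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Synthetic excerpt; in practice call load_kconfig() with a path from
    # your own build tree (no specific path is assumed here).
    text = "CONFIG_INET=y\nCONFIG_NET_DROP_MONITOR=y\n# CONFIG_DEBUG_FOO is not set\n"
    value = kconfig_value(text, "CONFIG_NET_DROP_MONITOR")
    # "y" = built in; "m" = module (load drop_monitor before running dropwatch).
    print("CONFIG_NET_DROP_MONITOR =", value)
```

A value of None means dropwatch will have no kernel service to talk to, so this check is worth doing before debugging the user-space side.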
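3. Deploy the dropwatch user-space tool
This article uses dropwatch version 1.5.4; the source repository is on the author's Gitee:
https://gitee.com/chenheyun2022/dropwatch
The source therefore needs to be pulled into the ls project: modify the project manifest so that the dropwatch source is downloaded into the source/tools directory.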
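Changes:
Writing the dropwatch .bb recipe:
It has to be said that Yocto's cross-compilation support is a real godsend here: dropwatch is built and installed with autoreconf plus Makefiles.
dropwatch depends on libnl, readline, libpcap, binutils and others. Building and installing it with a bare embedded toolchain is quite complicated, whereas under Yocto the sysroot environment is already in place and the dependencies are declared very simply.
Yocto does not support dropwatch out of the box, so the recipe is written as follows: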
LICENSE = "GPLv2"
DESCRIPTION = "DropWatch"
inherit externalsrc
inherit autotools pkgconfig gettext
METASEMI_DIR = "${THISDIR}/../../../"
DEPENDS = "libnl readline libpcap binutils"
EXTERNALSRC:pn-dropwatch = "${METASEMI_DIR}/source/tools/dropwatch-1.5.4"
EXTERNALSRC_BUILD:pn-linux-kernel = "${B}"
KBUILD_OUTPUT = "${B}"
OE_TERMINAL_EXPORTS += "KBUILD_OUTPUT"
S = "${METASEMI_DIR}/source/tools/dropwatch-1.5.4"
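Where:
1. The recipe builds from an external source tree: inherit externalsrc
2. The source path is specified with METASEMI_DIR = "${THISDIR}/../../../" and EXTERNALSRC:pn-dropwatch = "${METASEMI_DIR}/source/tools/dropwatch-1.5.4", i.e. the directory the source was downloaded to.
3. The autotools (autoreconf) build is used: inherit autotools pkgconfig gettext
It is worth mentioning that Yocto supports most build systems in a standard, clean way, such as plain Makefiles, CMake, Meson and Rust; a project can be adapted simply by inheriting the corresponding standard class.
4. Libraries the dropwatch source depends on:
DEPENDS = "libnl readline libpcap binutils"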
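Item 2 relies on plain textual expansion: BitBake substitutes ${THISDIR} into "${THISDIR}/../../../" without normalizing it, which is why the build log later prints a path containing "../../..//". The Python sketch below mirrors that expansion and then normalizes it, to show which directory EXTERNALSRC actually points at. It assumes the recipe directory shown in that log (meta-ls/recipes-support/dropwatch); it only imitates the string expansion and is not how BitBake itself evaluates the recipe.

```python
import posixpath


def expand_externalsrc(thisdir: str,
                       up: str = "../../../",
                       subdir: str = "source/tools/dropwatch-1.5.4") -> str:
    """Mimic METASEMI_DIR = "${THISDIR}/../../../" and
    EXTERNALSRC = "${METASEMI_DIR}/source/tools/dropwatch-1.5.4",
    then normalize the result to a clean absolute path."""
    metasemi_dir = f"{thisdir}/{up}"          # textual expansion, no normalization
    externalsrc = f"{metasemi_dir}/{subdir}"  # produces the "//" seen in the log
    return posixpath.normpath(externalsrc)


if __name__ == "__main__":
    # THISDIR as it appears in the bitbake log of this article.
    thisdir = "/home/samba_shar/work/linux-ps1/build/../meta-ls/recipes-support/dropwatch"
    print(expand_externalsrc(thisdir))
    # -> /home/samba_shar/work/linux-ps1/source/tools/dropwatch-1.5.4
```

Because the path is relative to the recipe file, moving the recipe to a different depth in the layer changes how many "../" segments are needed.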
Build and install:
After initializing the project, run bitbake dropwatch; the build succeeds.
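Case study
After modifying the kernel configuration and deploying the user-space tool (for the Yocto project setup, see the earlier blog posts or contact the author), build the image: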
bitbake core-image-base
Start qemuarm64 (the terminal log below shows the image build followed by runqemu):
chy@ubuntu:/home/samba_shar/work/linux-ps1/meta-ls$ bitbake core-image-base
Loading cache: 100% |#############################################################################################################################################################| Time: 0:00:00
Loaded 2572 entries from dependency cache.
WARNING: /home/samba_shar/work/linux-ps1/build/../poky/meta/recipes-core/systemd/systemd_249.7.bb: Var <do_install>:1: DeprecationWarning: invalid escape sequence \$ | ETA: --:--:--
Parsing recipes: 100% |###########################################################################################################################################################| Time: 0:00:00
Parsing of 1663 .bb files complete (1660 cached, 3 parsed). 2575 targets, 337 skipped, 0 masked, 0 errors.
NOTE: Resolving any missing task queue dependencies
Build Configuration:
BB_VERSION = "1.52.0"
BUILD_SYS = "x86_64-linux"
NATIVELSBSTRING = "universal"
TARGET_SYS = "aarch64-poky-linux"
MACHINE = "qemuarm64"
DISTRO = "poky"
DISTRO_VERSION = "3.4.4"
TUNE_FEATURES = "aarch64 armv8a crc cortexa57"
TARGET_FPU = ""
meta
meta-poky
meta-yocto-bsp = "HEAD:2cee7f9ef080f3c512745721442119b78b5db028"
meta-oe = "HEAD:ee8a85a35b18f90a3fea50b6f09f6092856a54d7"
meta-ls = "master:3a7b4c226581f8f3ec52aa5047c1d1b896121782"
WARNING: /home/samba_shar/work/linux-ps1/build/../meta-ls/recipes-kernel/linux/linux-kernel_5.10.bb:do_compile is tainted from a forced run | ETA: 0:00:00
Initialising tasks: 100% |########################################################################################################################################################| Time: 0:00:01
Sstate summary: Wanted 19 Local 1 Network 0 Missed 18 Current 957 (5% match, 98% complete)
Removing 16 stale sstate objects for arch qemuarm64: 100% |#######################################################################################################################| Time: 0:00:00
NOTE: Executing Tasks
NOTE: linux-kernel: compiling from external source tree /home/samba_shar/work/linux-ps1/build/../meta-ls/recipes-kernel/linux/../../..//source/linux
NOTE: dropwatch: compiling from external source tree /home/samba_shar/work/linux-ps1/build/../meta-ls/recipes-support/dropwatch/../../..//source/tools/dropwatch-1.5.4
NOTE: Tasks Summary: Attempted 2581 tasks of which 2527 didn't need to be rerun and all succeeded.
Summary: There were 2 WARNING messages shown.
chy@ubuntu:/home/samba_shar/work/linux-ps1/meta-ls$
chy@ubuntu:/home/samba_shar/work/linux-ps1/meta-ls$ runqemu qemuarm64 nographic
runqemu - INFO - Running MACHINE=qemuarm64 bitbake -e ...
runqemu - INFO - Continuing with the following parameters:
KERNEL: [/home/samba_shar/work/linux-ps1/build/tmp/deploy/images/qemuarm64/Image--5.10-rc7-r0-qemuarm64-20240202164940.bin]
MACHINE: [qemuarm64]
FSTYPE: [ext4]
ROOTFS: [/home/samba_shar/work/linux-ps1/build/tmp/deploy/images/qemuarm64/core-image-base-qemuarm64-20240202164940.rootfs.ext4]
CONFFILE: [/home/samba_shar/work/linux-ps1/build/tmp/deploy/images/qemuarm64/core-image-base-qemuarm64-20240202164940.qemuboot.conf]
runqemu - INFO - Using preconfigured tap device tap0
runqemu - INFO - If this is not intended, touch /tmp/qemu-tap-locks/tap0.skip to make runqemu skip tap0.
runqemu - INFO - Network configuration: ip=192.168.7.2::192.168.7.1:255.255.255.0
runqemu - INFO - Running /home/samba_shar/work/linux-ps1/build/tmp/work/x86_64-linux/qemu-helper-native/1.0-r1/recipe-sysroot-native/usr/bin/qemu-system-aarch64 -device virtio-net-device,netdev=net0,mac=52:54:00:12:34:02 -netdev tap,id=net0,ifname=tap0,script=no,downscript=no -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -drive id=disk0,file=/home/samba_shar/work/linux-ps1/build/tmp/deploy/images/qemuarm64/core-image-base-qemuarm64-20240202164940.rootfs.ext4,if=none,format=raw -device virtio-blk-device,drive=disk0 -device qemu-xhci -device usb-tablet -device usb-kbd -machine virt -cpu cortex-a57 -smp 8 -m 1024 -serial mon:stdio -serial null -nographic -device virtio-gpu-pci -kernel /home/samba_shar/work/linux-ps1/build/tmp/deploy/images/qemuarm64/Image--5.10-rc7-r0-qemuarm64-20240202164940.bin -append 'root=/dev/vda rw mem=1024M ip=192.168.7.2::192.168.7.1:255.255.255.0 console=ttyAMA0 console=hvc0 '
qemuarm64 acts as the server.
In QEMU, run: iperf3 -s
The Ubuntu host acts as the client, sending UDP at full bandwidth for 60 s:
In QEMU, start dropwatch (kernel: CONFIG_NET_DROP_MONITOR=y).
Run dropwatch:
Analysis
As this shows, the UDP loss rate in this run is 96%, and all of the drops occur at:
45324 drops at udp_queue_rcv_one_skb+3fc (0xffffffc0109b179c) [software]
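During a long run dropwatch can report the same drop location in many separate lines, so it helps to total them. The following is a minimal Python sketch, assuming dropwatch's output has been saved as text and follows the "<count> drops at <symbol>+<offset> (<address>) [<kind>]" shape shown above; it sums drops per kernel symbol and ignores every other line. Only the first sample line comes from this article; the other two, including the symbol hypothetical_func, are synthetic.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines such as:
#   45324 drops at udp_queue_rcv_one_skb+3fc (0xffffffc0109b179c) [software]
DROP_LINE = re.compile(
    r"^\s*(?P<count>\d+) drops at (?P<symbol>[^\s+]+)(?:\+(?P<offset>[0-9a-fA-F]+))?"
)


def tally_drops(lines: Iterable[str]) -> Counter:
    """Sum drop counts per kernel symbol; lines that do not match are ignored."""
    totals: Counter = Counter()
    for line in lines:
        m = DROP_LINE.match(line)
        if m:
            totals[m.group("symbol")] += int(m.group("count"))
    return totals


# First line is from this article's run; the other two are synthetic.
SAMPLE = """\
45324 drops at udp_queue_rcv_one_skb+3fc (0xffffffc0109b179c) [software]
12 drops at udp_queue_rcv_one_skb+3fc (0xffffffc0109b179c) [software]
1 drops at hypothetical_func+10 (0x0) [software]
"""

if __name__ == "__main__":
    for symbol, count in tally_drops(SAMPLE.splitlines()).most_common():
        print(f"{count:>8}  {symbol}")
```

Keying on the symbol alone merges different offsets inside one function; to tell individual drop sites apart, key on the full symbol+offset instead.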