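I recently cross-compiled the latest busybox 1.35 source for experiments on an OK6410 board and ran into a few problems. This post summarizes them.
1. gcc-linaro-7.5.0-2019.12-arm-linux-gnueabi: the newer toolchains in this series cannot be used to cross-compile busybox
The build succeeds, but the resulting system does not run: init is killed immediately with a SIGSEGV. After every rebuild the error log looks essentially like the one below. Both PC and LR are wrong: they appear to fall outside the ranges actually mapped by the kernel MMU, and the other registers do not point at any code either.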
[ 2.696508] init: unhandled page fault (11) at 0x0000000c, code 0x817
[ 2.700443] pgd = (ptrval)
[ 2.702998] [0000000c] *pgd=516bd831, *pte=00000000, *ppte=00000000
[ 2.709293] CPU: 0 PID: 1 Comm: init Not tainted 5.10.103-g43566770ca58 #61
[ 2.716247] Hardware name: Samsung S3C64xx (Flattened Device Tree)
[ 2.722404] PC is at 0xb6fad000
[ 2.725506] LR is at 0xb6facb53
[ 2.728592] pc : [<b6fad000>] lr : [<b6facb53>] psr: 00000030
[ 2.734917] sp : be8b4f00 ip : 00000000 fp : 00000000
[ 2.740135] r10: 000275e4 r9 : 00000000 r8 : 00000000
[ 2.745297] r7 : 00000000 r6 : 00000000 r5 : 00000000 r4 : 00000940
[ 2.751830] r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : be8b4f00
[ 2.758298] Flags: nzcv IRQs on FIQs on Mode USER_32 ISA Thumb Segment user
[ 2.765702] Control: 00c5387d Table: 517b0008 DAC: 00000055
[ 2.771470] CPU: 0 PID: 1 Comm: init Not tainted 5.10.103-g43566770ca58 #61
[ 2.778348] Hardware name: Samsung S3C64xx (Flattened Device Tree)
[ 2.784583] [<c0014474>] (unwind_backtrace) from [<c0012090>] (show_stack+0x10/0x14)
[ 2.792320] [<c0012090>] (show_stack) from [<c0014c18>] (__do_user_fault+0xbc/0xd0)
[ 2.799852] [<c0014c18>] (__do_user_fault) from [<c0014dd8>] (do_page_fault+0x1ac/0x29c)
[ 2.808017] [<c0014dd8>] (do_page_fault) from [<c001502c>] (do_DataAbort+0x38/0xbc)
[ 2.815603] [<c001502c>] (do_DataAbort) from [<c000903c>] (__dabt_usr+0x3c/0x40)
[ 2.822988] Exception stack(0xc0c47fb0 to 0xc0c47ff8)
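The older arm-none-linux-gnueabi-4.8.3 cross toolchain builds busybox without problems and the result runs correctly, so my first approach was to compare the 4.8.3 and 7.5.0 toolchains. Apart from its own code, how an ELF file runs also depends on the shared libraries it is linked against.
readelf -h -d <shared library or ELF file> shows the ELF header information and the shared libraries the file depends on.
$ arm-linux-gnueabi-readelf -d rootfs/bin/busybox    # show the dynamic section (library dependencies)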
Dynamic section at offset 0x10900c contains 26 entries:
Tag Type Name/Value
0x00000001 (NEEDED) Shared library: [libm.so.6]
0x00000001 (NEEDED) Shared library: [libresolv.so.2]
0x00000001 (NEEDED) Shared library: [libc.so.6]
0x0000000c (INIT) 0xc604
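Since the NEEDED entries above are what the dynamic loader will look for at run time, it can help to check mechanically that each one is present in the rootfs. The following is a minimal sketch, not part of the original investigation: the function name check_needed and the rootfs/lib directory are assumptions chosen for illustration.

```shell
#!/bin/sh
# Read `readelf -d` output on stdin, extract every NEEDED library name,
# and report whether a file of that name exists in the directory given as $1.
# Usage sketch (rootfs/lib is an assumed location):
#   arm-linux-gnueabi-readelf -d rootfs/bin/busybox | check_needed rootfs/lib
check_needed() {
    libdir=$1
    # Keep only the text between [ and ] on lines tagged (NEEDED).
    sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p' | while read -r lib; do
        if [ -e "$libdir/$lib" ]; then
            echo "ok      $lib"
        else
            echo "missing $lib"
        fi
    done
}
```

This only checks that the files exist; it says nothing about whether they were built by the same toolchain.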
$ arm-linux-gnueabi-readelf -h ~/tools/7.5.0/arm-linux-gnueabi/libc/lib/libc-2.25.so
ELF Header:
Magic: 7f 45 4c 46 01 01 01 03 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - GNU
ABI Version: 0
Type: DYN (Shared object file)
Machine: ARM
Version: 0x1
Entry point address: 0x16989
Start of program headers: 52 (bytes into file)
Start of section headers: 13886044 (bytes into file)
Flags: 0x5000200, Version5 EABI, soft-float ABI
...
$ arm-linux-gnueabi-readelf -h ~/tools/4.8.3/arm-none-linux-gnueabi/libc/lib/libc-2.18.so
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Shared object file)
Machine: ARM
Version: 0x1
Entry point address: 0x183fc
Start of program headers: 52 (bytes into file)
Start of section headers: 1374952 (bytes into file)
Flags: 0x5000202, Version5 EABI, soft-float ABI, <unknown>
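Comparing the two, the following .so files of the 7.5.0 toolchain all have their OS/ABI (Application Binary Interface) field set to "UNIX - GNU", whereas the older toolchain defaults to "UNIX - System V" (the list also includes one 4.8.3 library, librt-2.18.so, that is marked GNU):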
File: /home/golden/tools/7.5.0/arm-linux-gnueabi/libc/lib/libatomic.so
OS/ABI: UNIX - GNU
File: /home/golden/tools/7.5.0/arm-linux-gnueabi/libc/lib/libc-2.25.so
OS/ABI: UNIX - GNU
File: /home/golden/tools/7.5.0/arm-linux-gnueabi/libc/lib/libpthread-2.25.so
OS/ABI: UNIX - GNU
File: /home/golden/tools/7.5.0/arm-linux-gnueabi/libc/lib/librt-2.25.so
OS/ABI: UNIX - GNU
File: /home/golden/tools/7.5.0/arm-linux-gnueabi/libc/lib/libstdc++.so
OS/ABI: UNIX - GNU
File: /home/golden/tools/4.8.3/arm-none-linux-gnueabi/libc/lib/librt-2.18.so
OS/ABI: UNIX - GNU
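The OS/ABI value that readelf prints comes from a single byte of the ELF identification, e_ident[EI_OSABI] at offset 7. It is the eighth byte of the Magic lines shown earlier: 03 for libc-2.25.so and 00 for libc-2.18.so. As a rough sketch, the same check can be done without readelf; the function name elf_osabi is made up for this example, and only the two values seen in this post are given names.

```shell
#!/bin/sh
# Print the OS/ABI of an ELF file by reading e_ident[EI_OSABI] (byte offset 7).
# 0 is ELFOSABI_NONE, which readelf shows as "UNIX - System V".
# 3 is ELFOSABI_GNU, which readelf shows as "UNIX - GNU".
elf_osabi() {
    byte=$(od -An -tu1 -j7 -N1 "$1" | tr -d ' \n')
    case $byte in
        0) echo "System V" ;;
        3) echo "GNU" ;;
        *) echo "other ($byte)" ;;
    esac
}
# Usage sketch over a toolchain library directory:
#   for f in ~/tools/7.5.0/arm-linux-gnueabi/libc/lib/*.so*; do
#       printf '%s: %s\n' "$f" "$(elf_osabi "$f")"
#   done
```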
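According to the article "STT_GNU_IFUNC 与 libc.so 的 GNU 扩展类型 ABI 问题" ("STT_GNU_IFUNC and the GNU-extension ABI type of libc.so"), the System V ABI is the generic interface type, while the GNU ABI additionally supports the IFUNC extension, so an executable has to be paired with the matching shared libraries or it may misbehave. Oddly, the problem persisted even after I confirmed that the library files in the rootfs matched the cross toolchain. Following some suggestions found online, I then set CONFIG_STATIC to build busybox statically and sidestep any shared-library differences. The OS/ABI of the resulting bin/busybox still differs, though: the gcc 7.5.0 build is "UNIX - GNU", while the gcc 4.8.3 build is still "UNIX - System V".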
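For reference, CONFIG_STATIC can be switched on from menuconfig or by editing .config directly. The snippet below is a sketch of the second route, assuming GNU sed; the function name enable_static and the CROSS_COMPILE prefix in the comments are illustrative and should be adapted to the toolchain in use.

```shell
#!/bin/sh
# Turn on CONFIG_STATIC in a busybox config file whose path is given as $1.
enable_static() {
    cfg=$1
    # Drop any existing setting, commented out or not, then append the new one.
    sed -i -e '/^# CONFIG_STATIC is not set$/d' -e '/^CONFIG_STATIC=/d' "$cfg"
    echo 'CONFIG_STATIC=y' >> "$cfg"
}
# Usage sketch from the busybox source tree (prefix is an assumption):
#   enable_static .config
#   make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- oldconfig
#   make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi-
```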
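With busybox built statically, the boot log changed. It still crashed, but this time PC and LR were both sane and could be mapped to code. However, after each code change the crash location and cause were different: sometimes an unhandled page fault, sometimes an undefined instruction. The faulting assembly was also puzzling: for example, the kernel reported an undefined instruction at what is clearly a valid stmdb instruction.
At this point there were two suspects:
- The different OS/ABI type produced by gcc-7.5.0 might be what makes /sbin/init crash.
- The library files shipped with gcc-7.5.0 are defective.
To check these two points, I kept the test source trivial, with no function calls, and compiled it to an ELF file, so that the test mainly exercises the startup and library code supplied by the toolchain.
File: add.c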
#include <stdio.h>
int main(void)
{
    int a = 1, b = 2;
    return a + b;   /* exit status is 3 */
}
- Build add.c both dynamically and statically and compare the symbols and sizes: nothing abnormal.
$ ~/tools/4.8.3/bin/arm-none-linux-gnueabi-gcc -o add4 add.c
$ ~/tools/4.8.3/bin/arm-none-linux-gnueabi-nm -l -n add4 | grep -E " T | B "
00008394 T _init
000083e4 T _start
00008518 T main
00008558 T __libc_csu_init
000085bc T __libc_csu_fini
000085c0 T _fini
0001077c B __bss_start
0001077c B __bss_start__
00010780 B __bss_end__
00010780 B _bss_end__
00010780 B __end__
00010780 B _end
$ ~/tools/7.5.0/bin/arm-linux-gnueabi-gcc -o add7 add.c
$ ~/tools/7.5.0/bin/arm-linux-gnueabi-nm -l -n add7 | grep -E " T | B "
000102bc T _init ...~master/csu/../sysdeps/arm/crti.S:83
0001030c T _start ...~master/csu/../sysdeps/arm/start.S:79
000103cc T main
000103f4 T __libc_csu_init ...~master/csu/elf-init.c:67
00010430 T __libc_csu_fini ...~master/csu/elf-init.c:95
00010434 T _fini ...~master/csu/../sysdeps/arm/crti.S:95
00021028 B __bss_start
00021028 B __bss_start__
0002102c B __bss_end__
0002102c B _bss_end__
0002102c B __end__
0002102c B _end
$ ~/tools/4.8.3/bin/arm-none-linux-gnueabi-gcc -o add4_static add.c -static    # static build
$ ~/tools/4.8.3/bin/arm-none-linux-gnueabi-nm -l -n add4_static | grep -E " T | B "
00000010 B errno
00000010 B __libc_errno
00000018 B __libc_tsd_CTYPE_B
0000001c B __libc_tsd_CTYPE_TOUPPER
00000020 B __libc_tsd_CTYPE_TOLOWER
00008114 T _init
00008b5c T _start
00008cd0 T main
....
0008caa0 B __printf_arginfo_table
0008caa4 B __printf_va_arg_table
0008cab0 B __end__
0008cab0 B _end
$ ~/tools/7.5.0/bin/arm-linux-gnueabi-gcc -o add7_static add.c -static    # static build
$ ~/tools/7.5.0/bin/arm-linux-gnueabi-nm -l -n add7_static | grep -E " T | B "
00000010 B errno
00000010 B __libc_errno
00000018 B __libc_tsd_CTYPE_B
0000001c B __libc_tsd_CTYPE_TOUPPER
00000020 B __libc_tsd_CTYPE_TOLOWER
00010160 T _init ...~master/csu/../sysdeps/arm/crti.S:83
000102ec T _start ...~master/csu/../sysdeps/arm/start.S:79
000103e8 T main
...
0007aa58 B __printf_arginfo_table ...~master/stdio-common/reg-printf.c:27
0007aa5c B __printf_va_arg_table ...~master/stdio-common/reg-type.c:25
0007aa68 B __end__
0007aa68 B _end
- Compare the ABI types: every build is System V except the gcc-7.5.0 static build, which is GNU.
$ readelf -h add4* add7* | grep -E "File|OS/ABI"
File: add4
OS/ABI: UNIX - System V
File: add4_static
OS/ABI: UNIX - System V
File: add7
OS/ABI: UNIX - System V
File: add7_static
OS/ABI: UNIX - GNU
- Test on the board, in a system environment built with arm-linux-gnueabi-gcc version 11.1.0:
root@ok6410:/home# ./add4    # runs normally
root@ok6410:/home# ./add4_static    # runs normally
root@ok6410:/home# ./add7    # crashes
[ 4228.126974] 8<--- cut here ---
[ 4228.127056] add7: unhandled page fault (11) at 0x0005f4d2, code 0x80000007
[ 4228.131410] pgd = a395bdef
[ 4228.133984] [0005f4d2] *pgd=51ec4831, *pte=00000000, *ppte=00000000
[ 4228.140302] CPU: 0 PID: 15926 Comm: add7 Not tainted 5.10.103-g43566770ca58 #63
[ 4228.147513] Hardware name: Samsung S3C64xx (Flattened Device Tree)
[ 4228.153742] PC is at 0x5f4d2
[ 4228.156542] LR is at 0x1031f
[ 4228.159402] pc : [<0005f4d2>] lr : [<0001031f>] psr: 40000030
[ 4228.165725] sp : beed9e6c ip : 00000014 fp : 00000000
[ 4228.170937] r10: b6fed000 r9 : 00000000 r8 : 00000000
[ 4228.176063] r7 : 00000000 r6 : 0001030d r5 : 00000000 r4 : 00000000
[ 4228.182622] r3 : 0000001c r2 : beed9e74 r1 : 00000001 r0 : 00000000
[ 4228.189081] Flags: nZcv IRQs on FIQs on Mode USER_32 ISA Thumb Segment user
[ 4228.196545] Control: 00c5387d Table: 51c58008 DAC: 00000055
[ 4228.202250] CPU: 0 PID: 15926 Comm: add7 Not tainted 5.10.103-g43566770ca58 #63
[ 4228.209464] Hardware name: Samsung S3C64xx (Flattened Device Tree)
[ 4228.215751] [<c0013e1c>] (unwind_backtrace) from [<c0011b14>] (show_stack+0x10/0x14)
[ 4228.223461] [<c0011b14>] (show_stack) from [<c0014494>] (__do_user_fault+0xbc/0xd0)
[ 4228.231072] [<c0014494>] (__do_user_fault) from [<c00146e4>] (do_page_fault+0x1b4/0x294)
[ 4228.239079] [<c00146e4>] (do_page_fault) from [<c0014a40>] (do_PrefetchAbort+0x38/0x88)
[ 4228.247129] [<c0014a40>] (do_PrefetchAbort) from [<c0009228>] (ret_from_exception+0x0/0x18)
[ 4228.255446] Exception stack(0xc1d1dfb0 to 0xc1d1dff8)
[ 4228.260462] dfa0: 00000000 00000001 beed9e74 0000001c
[ 4228.268566] dfc0: 00000000 00000000 0001030d 00000000 00000000 00000000 b6fed000 00000000
[ 4228.276799] dfe0: 00000014 beed9e6c 0001031f 0005f4d2 40000030 ffffffff
Segmentation fault
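root@ok6410:/home# ./add7_static    # hangs forever
Running add7 triggers a kernel fault dump. The add7 image ends at 0002102c (B _end), so the pc value <0005f4d2> lies outside it, and what is at that address is unknown. Disassembling with arm-linux-gnueabi-objdump -D -S add7 locates the lr value <0001031f> inside _start and shows the following:
in the listing, the lr value is not the address of an instruction boundary, and no function call appears before it; the pc value makes no sense in terms of the program logic, so execution has run away.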
Disassembly of section .text:
0001030c <_start>:
1030c: f04f 0b00 mov.w fp, #0
10310: f04f 0e00 mov.w lr, #0
10314: bc02 pop {r1}
10316: 466a mov r2, sp
10318: b404 push {r2}
1031a: b401 push {r0}
1031c: f8df c010 ldr.w ip, [pc, #16] ; 10330 <_start+0x24> => lr : [<0001031f>] falls here, abnormal address
10320: f84d cd04 str.w ip, [sp, #-4]!
10324: 4803 ldr r0, [pc, #12] ; (10334 <_start+0x28>)
10326: 4b04 ldr r3, [pc, #16] ; (10338 <_start+0x2c>)
10328: f7ff efde blx 102e8 <__libc_start_main@plt>
1032c: f7ff efe8 blx 10300 <abort@plt>
10330: 00010431 andeq r0, r1, r1, lsr r4
10334: 000103cd andeq r0, r1, sp, asr #7
10338: 000103f5 strdeq r0, [r1], -r5
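One possible reading of these numbers, offered as a hypothesis rather than something established in the original investigation: mov.w and ldr.w are 32-bit Thumb-2 encodings, while the ARM1176JZF-S core in the S3C6410 is ARMv6 and executes only 16-bit Thumb. If the core decodes _start halfword by halfword, 0xf04f has the bit pattern of the first half of the classic Thumb BL pair and 0xf8df the pattern of the second half. The arithmetic below applies the BL rules to the addresses in the listing and arrives at the pc and lr values from the crash log.

```shell
#!/bin/sh
# Decode two halfwords of _start as 16-bit Thumb (pre-Thumb-2) instructions.
# 0xf04f at 0x10310, top bits 11110: BL prefix, LR = (addr + 4) + (imm11 << 12)
# 0xf8df at 0x1031c, top bits 11111: BL suffix, PC = LR + (imm11 << 1)
#                                    and LR = (addr + 2) | 1
# The 0xf04f at 0x1030c sets LR the same way but is overwritten by the one at 0x10310.
prefix_addr=$((0x10310)); prefix_imm=$((0xf04f & 0x7ff))
suffix_addr=$((0x1031c)); suffix_imm=$((0xf8df & 0x7ff))

lr_tmp=$(( prefix_addr + 4 + (prefix_imm << 12) ))
pc=$(( lr_tmp + (suffix_imm << 1) ))
lr=$(( (suffix_addr + 2) | 1 ))

printf 'pc=0x%08x lr=0x%08x\n' "$pc" "$lr"
# prints: pc=0x0005f4d2 lr=0x0001031f
```

If this reading is right, the jump to 0x5f4d2 is a side effect of running Thumb-2 startup code on a core that does not implement it, which would also fit the "ISA Thumb" flag in the register dump.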
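Analysis of the test results:
- The gcc-4.8.3 builds all run normally.
- The dynamically linked gcc-7.5.0 build crashes.
  The crash dump shows that the fault occurs in _start, which is code supplied by the gcc-7.5.0 toolchain itself.
  In addition, its OS/ABI type is not GNU, so the crash should be unrelated to the ABI type.
- The statically linked gcc-7.5.0 build blocks or loops forever.
  It is statically linked and does not depend on any .so, yet the program still misbehaves and does not do what the source says.
  This indirectly shows that the GNU ABI type is not the cause of the crash.
Conclusions:
1. The gcc-7.5.0 toolchain definitely has a problem, or comes with some special settings or restrictions.
2. The OS/ABI type of a program does not cause the crash, so it can be ruled out.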
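One way to probe the "special settings or restrictions" possibility is to look at the ARM build attributes recorded in each binary, which readelf -A prints. The helper below is only a sketch (the name arm_attrs is made up) that keeps the two attributes most relevant to a CPU mismatch. If a binary reports an architecture or Thumb level newer than what the board's ARMv6 core supports, the toolchain's default target rather than the OS/ABI field would be the likely culprit; this was not verified in the tests above.

```shell
#!/bin/sh
# Filter `readelf -A` output (read on stdin) down to the CPU architecture
# and Thumb usage attributes, with leading whitespace removed.
# Usage sketch:
#   arm-linux-gnueabi-readelf -A add7 | arm_attrs
#   arm-linux-gnueabi-readelf -A add4 | arm_attrs
arm_attrs() {
    grep -E 'Tag_CPU_arch:|Tag_THUMB_ISA_use:' | sed 's/^[[:space:]]*//'
}
```

Comparing the output for add4 and add7 side by side would show whether the two toolchains target the same architecture by default.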
2. Using Ubuntu's arm-linux-gnueabi-gcc-11: busybox build errors
Installing gcc-11
Command: sudo apt-get install gcc-11-arm-linux-gnueabi g++-11-arm-linux-gnueabi
Note that the arm-linux-gnueabi-gcc and related symlinks have to be added manually under /usr/bin/:
/usr/bin/arm-linux-gnueabi-cpp -> arm-linux-gnueabi-cpp-11
/usr/bin/arm-linux-gnueabi-g++ -> arm-linux-gnueabi-g++-11
/usr/bin/arm-linux-gnueabi-gcc -> arm-linux-gnueabi-gcc-11
/usr/bin/arm-linux-gnueabi-gcc-ar -> arm-linux-gnueabi-gcc-ar-11
/usr/bin/arm-linux-gnueabi-gcc-nm -> arm-linux-gnueabi-gcc-nm-11
/usr/bin/arm-linux-gnueabi-gcc-ranlib -> arm-linux-gnueabi-gcc-ranlib-11
/usr/bin/arm-linux-gnueabi-gcov -> arm-linux-gnueabi-gcov-11
/usr/bin/arm-linux-gnueabi-gcov-dump -> arm-linux-gnueabi-gcov-dump-11
/usr/bin/arm-linux-gnueabi-gcov-tool -> arm-linux-gnueabi-gcov-tool-11
/usr/bin/arm-linux-gnueabi-lto-dump -> arm-linux-gnueabi-lto-dump-11
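Creating these links one by one is tedious, so a small loop can do it. This is a sketch under the assumption that the versioned tools are named exactly as in the list above; the function name link_cross_tools is made up for the example.

```shell
#!/bin/sh
# Create unversioned arm-linux-gnueabi-* symlinks next to the versioned tools.
# $1: directory holding the tools, $2: version suffix (for example 11).
link_cross_tools() {
    bindir=$1; ver=$2
    for tool in cpp g++ gcc gcc-ar gcc-nm gcc-ranlib gcov gcov-dump gcov-tool lto-dump; do
        target="arm-linux-gnueabi-$tool-$ver"
        # Only link tools that are actually installed; links are relative.
        [ -e "$bindir/$target" ] && ln -sf "$target" "$bindir/arm-linux-gnueabi-$tool"
    done
    return 0
}
# Usage sketch (run as root because /usr/bin is not user-writable):
#   link_cross_tools /usr/bin 11
```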
Build errors
>>>Error 1:
CC networking/inetd.o
networking/inetd.c:255:11: fatal error: rpc/rpc.h: No such file or directory
255 | # include <rpc/rpc.h>
>>>Error 2:
networking/lib.a(inetd.o): In function`unregister_rpc':
networking/inetd.c:(.text.unregister_rpc+0x20):undefined reference to `pmap_unset'
networking/lib.a(inetd.o): In function`register_rpc':
networking/inetd.c:(.text.register_rpc+0x50):undefined reference to `pmap_unset'
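Solution
The two errors above are essentially the same problem: the RPC headers and library are missing. It can be solved by cross-compiling the libtirpc library from source.
Download the source: /libtirpc/1.3.2/libtirpc-1.3.2.tar.bz2
Unpack it: tar xjvf libtirpc-1.3.2.tar.bz2
Build steps: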
./configure --host=arm-linux-gnueabi --prefix=/usr/arm-linux-gnueabi CFLAGS=-I/usr/arm-linux-gnueabi/include LDFLAGS=-L/usr/arm-linux-gnueabi/lib --disable-gssapi
make
sudo make install
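Once the libtirpc headers and library are installed under /usr/arm-linux-gnueabi/, the build environment is ready.
One last step: the header path and the library have to be added to the busybox configuration: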
CONFIG_EXTRA_CFLAGS="-I/usr/include/tirpc"
CONFIG_EXTRA_LDLIBS="tirpc"
Append these to the .config file, or to your defconfig.
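A busybox .config normally already contains CONFIG_EXTRA_CFLAGS and CONFIG_EXTRA_LDLIBS lines, so simply appending can leave duplicates. The sketch below, assuming GNU sed and using a made-up function name set_tirpc_config, removes the old lines first and then writes the two values from this post. It overwrites any extra flags already set there. The -I path is copied as-is from above; if the headers were installed only under the --prefix used for libtirpc, it would presumably need to point at the tirpc directory under that prefix instead.

```shell
#!/bin/sh
# Set the libtirpc-related options in a busybox config file ($1),
# replacing any existing values of the same two options.
set_tirpc_config() {
    cfg=$1
    sed -i -e '/^CONFIG_EXTRA_CFLAGS=/d' -e '/^CONFIG_EXTRA_LDLIBS=/d' "$cfg"
    {
        echo 'CONFIG_EXTRA_CFLAGS="-I/usr/include/tirpc"'
        echo 'CONFIG_EXTRA_LDLIBS="tirpc"'
    } >> "$cfg"
}
# Usage sketch from the busybox source tree:
#   set_tirpc_config .config
```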